mirror of
https://github.com/beshu-tech/deltaglider.git
synced 2026-01-11 14:40:26 +01:00
Getting Started with DeltaGlider SDK
This guide will help you get up and running with the DeltaGlider Python SDK in minutes.
Prerequisites
- Python 3.11 or higher
- AWS credentials configured (or access to MinIO/S3-compatible storage)
- xdelta3 installed on your system (installed automatically with the package)
Installation
Using pip
pip install deltaglider
Using uv (faster)
uv pip install deltaglider
Development Installation
git clone https://github.com/beshu-tech/deltaglider
cd deltaglider
pip install -e ".[dev]"
Configuration
AWS Credentials
DeltaGlider uses standard AWS credential discovery:
- Environment Variables
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=us-west-2
- AWS Credentials File (~/.aws/credentials)
[default]
aws_access_key_id = your_access_key
aws_secret_access_key = your_secret_key
region = us-west-2
- IAM Role (when running on EC2/ECS/Lambda): automatically uses instance/task role credentials
Custom S3 Endpoints
For MinIO, Cloudflare R2, or other S3-compatible storage:
from deltaglider import create_client
client = create_client(endpoint_url="http://minio.local:9000")
Or via environment variable:
export AWS_ENDPOINT_URL=http://minio.local:9000
DeltaGlider Configuration
DeltaGlider supports the following environment variables:
Logging & Performance:
- DG_LOG_LEVEL: Logging level (default: INFO, options: DEBUG, INFO, WARNING, ERROR)
- DG_MAX_RATIO: Maximum delta/file ratio (default: 0.5, range: 0.0-1.0)
  - See DG_MAX_RATIO.md for complete tuning guide
  - Controls when to use delta compression vs. direct storage
  - Lower (0.2-0.3) = conservative, only high-quality compression
  - Higher (0.6-0.7) = permissive, accept modest savings
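To make the threshold concrete, here is a minimal sketch of the decision DG_MAX_RATIO describes. It is not DeltaGlider's own code: the function name `should_store_as_delta` is hypothetical, and whether the library compares with `<=` or `<` is an assumption.

```python
def should_store_as_delta(delta_size: int, file_size: int, max_ratio: float = 0.5) -> bool:
    """Return True when the delta is small enough, relative to the file, to keep.

    Hypothetical helper: it mirrors the documented "maximum delta/file ratio"
    rule, not the library's internal implementation.
    """
    if file_size <= 0:
        return False  # nothing to compare against, so store the file directly
    return delta_size / file_size <= max_ratio


# A 2 MB delta for a 100 MB file has a ratio of 0.02, well under the default 0.5.
print(should_store_as_delta(2_000_000, 100_000_000))
# A 60 MB delta fails the default threshold but passes a permissive 0.7.
print(should_store_as_delta(60_000_000, 100_000_000))
print(should_store_as_delta(60_000_000, 100_000_000, max_ratio=0.7))
```

Under this reading, lowering the value rejects more deltas (only very small ones are kept), while raising it accepts deltas that save comparatively little.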
Cache Configuration:
- DG_CACHE_BACKEND: Cache backend type (default: filesystem, options: filesystem, memory)
- DG_CACHE_MEMORY_SIZE_MB: Memory cache size in MB (default: 100)
- DG_CACHE_ENCRYPTION_KEY: Optional base64-encoded Fernet key for persistent encryption
Security:
- Encryption is always enabled (cannot be disabled)
- Ephemeral encryption keys per process (forward secrecy)
- Corrupted cache files automatically deleted
- Set DG_CACHE_ENCRYPTION_KEY only for cross-process cache sharing
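If you do need a shared key, the sketch below shows one way to produce a value in the Fernet key format (32 random bytes, URL-safe base64-encoded) using only the standard library. The helper name `generate_fernet_key` is hypothetical; `Fernet.generate_key()` from the `cryptography` package yields the same format if you already have that package installed.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return a new key in Fernet format: 32 random bytes, URL-safe base64."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


key = generate_fernet_key()
# Print a line you can paste into the environment of every process
# that should share the same cache.
print(f"export DG_CACHE_ENCRYPTION_KEY={key}")
```

Treat the printed value like any other secret: store it in your secret manager rather than in shell history or source control.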
Example:
# Use memory cache for faster performance in CI/CD
export DG_CACHE_BACKEND=memory
export DG_CACHE_MEMORY_SIZE_MB=500
# Enable debug logging
export DG_LOG_LEVEL=DEBUG
# Adjust delta compression threshold
export DG_MAX_RATIO=0.3 # More conservative: keep a delta only if it is at most 30% of the file size
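Before starting a long job, it can help to sanity-check these variables yourself. The following is a rough sketch that reads the documented variables with their documented defaults and rejects out-of-range values. `DGSettings` and `read_dg_settings` are hypothetical names for illustration, not part of the DeltaGlider API, and the library may validate differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DGSettings:
    """Hypothetical container holding the documented defaults."""
    log_level: str = "INFO"
    max_ratio: float = 0.5
    cache_backend: str = "filesystem"
    cache_memory_size_mb: int = 100


def read_dg_settings(env: Mapping[str, str] = os.environ) -> DGSettings:
    """Read DG_* variables from a mapping and validate them."""
    log_level = env.get("DG_LOG_LEVEL", "INFO").upper()
    if log_level not in {"DEBUG", "INFO", "WARNING", "ERROR"}:
        raise ValueError(f"Unsupported DG_LOG_LEVEL: {log_level}")

    max_ratio = float(env.get("DG_MAX_RATIO", "0.5"))
    if not 0.0 <= max_ratio <= 1.0:
        raise ValueError(f"DG_MAX_RATIO must be within 0.0-1.0, got {max_ratio}")

    cache_backend = env.get("DG_CACHE_BACKEND", "filesystem")
    if cache_backend not in {"filesystem", "memory"}:
        raise ValueError(f"Unsupported DG_CACHE_BACKEND: {cache_backend}")

    cache_mb = int(env.get("DG_CACHE_MEMORY_SIZE_MB", "100"))
    return DGSettings(log_level, max_ratio, cache_backend, cache_mb)


print(read_dg_settings({"DG_CACHE_BACKEND": "memory", "DG_MAX_RATIO": "0.3"}))
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to test in CI.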
Your First Upload
Basic Example
from deltaglider import create_client
# Create a client
client = create_client()
# Upload a file
summary = client.upload(
    file_path="my-app-v1.0.0.zip",
    s3_url="s3://my-bucket/releases/v1.0.0/"
)
# Check the results
print("Upload completed!")
print(f"Original size: {summary.original_size_mb:.1f} MB")
print(f"Stored size: {summary.stored_size_mb:.1f} MB")
print(f"Compression: {summary.savings_percent:.0f}%")
print(f"Is delta: {summary.is_delta}")
Understanding the Results
When you upload a file, DeltaGlider returns an UploadSummary with:
- operation: What was done (PUT for new reference, PUT_DELTA for delta)
- original_size_mb: Original file size in MB
- stored_size_mb: Actual size stored in S3
- savings_percent: Percentage of storage saved
- is_delta: Whether delta compression was used
- delta_ratio: Ratio of delta size to original (smaller is better)
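The size fields relate to each other through simple arithmetic. The sketch below shows that relationship as inferred from the field descriptions; the two helper functions are hypothetical and the exact formulas used inside DeltaGlider are an assumption.

```python
def savings_percent(original_size_mb: float, stored_size_mb: float) -> float:
    """Share of the original size that did not need to be stored, in percent."""
    if original_size_mb <= 0:
        return 0.0
    return (1 - stored_size_mb / original_size_mb) * 100


def delta_ratio(delta_size_mb: float, original_size_mb: float) -> float:
    """Delta size divided by original size (smaller is better)."""
    if original_size_mb <= 0:
        return 0.0
    return delta_size_mb / original_size_mb


# A 100 MB file stored as a 5.2 MB delta.
print(f"Savings: {savings_percent(100.0, 5.2):.1f}%")
print(f"Delta ratio: {delta_ratio(5.2, 100.0):.3f}")
```

Read this way, a delta ratio of 0.052 and a saving of roughly 95% are two views of the same upload.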
Downloading Files
# Download a file
client.download(
    s3_url="s3://my-bucket/releases/v1.0.0/my-app-v1.0.0.zip",
    output_path="downloaded-app.zip"
)
# The file is automatically reconstructed if it was stored as a delta
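Note that uploads take a prefix URL ending in `/`, while downloads take the full object URL. As a rough illustration of how such a URL breaks down into bucket and key, here is a small standard-library sketch. `split_s3_url` is a hypothetical helper, not DeltaGlider's parser.

```python
from urllib.parse import urlparse


def split_s3_url(s3_url: str) -> tuple[str, str]:
    """Split an s3:// URL into (bucket, key-or-prefix)."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URL: {s3_url!r}")
    # The bucket is the host part; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


print(split_s3_url("s3://my-bucket/releases/v1.0.0/"))
print(split_s3_url("s3://my-bucket/releases/v1.0.0/my-app-v1.0.0.zip"))
```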
Working with Multiple Versions
Here's where DeltaGlider shines - uploading multiple versions:
from deltaglider import create_client
from pathlib import Path
client = create_client()
# Upload multiple versions
versions = ["v1.0.0", "v1.0.1", "v1.0.2", "v1.1.0"]
for version in versions:
    file = f"builds/my-app-{version}.zip"
    summary = client.upload(
        file_path=file,
        s3_url=f"s3://releases/{version}/"
    )
    if summary.is_delta:
        print(f"{version}: Compressed to {summary.stored_size_mb:.1f}MB "
              f"(saved {summary.savings_percent:.0f}%)")
    else:
        print(f"{version}: Stored as reference ({summary.original_size_mb:.1f}MB)")
# Typical output:
# v1.0.0: Stored as reference (100.0MB)
# v1.0.1: Compressed to 0.2MB (saved 99.8%)
# v1.0.2: Compressed to 0.3MB (saved 99.7%)
# v1.1.0: Compressed to 5.2MB (saved 94.8%)
Verification
Verify the integrity of stored files:
# Verify a stored file
is_valid = client.verify("s3://releases/v1.0.0/my-app-v1.0.0.zip")
print(f"File integrity: {'✓ Valid' if is_valid else '✗ Corrupted'}")
Error Handling
from deltaglider import create_client
client = create_client()
try:
    summary = client.upload("app.zip", "s3://bucket/path/")
except FileNotFoundError:
    print("Local file not found")
except PermissionError:
    print("S3 access denied - check credentials")
except Exception as e:
    print(f"Upload failed: {e}")
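For transient failures such as network hiccups, you may prefer to retry rather than fail immediately. Below is a generic retry-with-exponential-backoff sketch. `with_retries` is a hypothetical helper, and the choice of `ConnectionError` and `TimeoutError` as retryable is an assumption; check which exceptions your setup actually raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying on the given exceptions with doubling delays."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise ValueError("attempts must be at least 1")


# Example (assumes a client created as shown in this guide):
# summary = with_retries(lambda: client.upload("app.zip", "s3://bucket/path/"))
```

Errors that retrying cannot fix, such as `FileNotFoundError` or `PermissionError`, are not in `retry_on` and therefore propagate on the first attempt.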
Logging
Control logging verbosity:
# Debug logging for troubleshooting
client = create_client(log_level="DEBUG")
# Quiet mode
client = create_client(log_level="WARNING")
# Default is INFO
client = create_client() # INFO level
Local Testing with MinIO
For development and testing without AWS:
- Start MinIO
docker run -p 9000:9000 -p 9001:9001 \
-e MINIO_ROOT_USER=minioadmin \
-e MINIO_ROOT_PASSWORD=minioadmin \
minio/minio server /data --console-address ":9001"
- Create a bucket (via MinIO Console at http://localhost:9001)
- Use DeltaGlider
import os
from deltaglider import create_client
# Set credentials via environment before creating the client
os.environ["AWS_ACCESS_KEY_ID"] = "minioadmin"
os.environ["AWS_SECRET_ACCESS_KEY"] = "minioadmin"
client = create_client(
    endpoint_url="http://localhost:9000"
)
# Now use normally
summary = client.upload("test.zip", "s3://test-bucket/")
Best Practices
- Group Similar Files: Upload related files to the same S3 prefix for optimal compression
- Version Naming: Use consistent naming for versions (e.g., app-v1.0.0.zip, app-v1.0.1.zip)
- Cache Management: The local reference cache improves performance - don't clear it unnecessarily
- Error Recovery: Always handle exceptions for production code
- Monitoring: Log compression ratios to track effectiveness
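For the monitoring point, one simple approach is to record the sizes reported for each upload and compute an overall figure. The sketch below assumes you copy `original_size_mb` and `stored_size_mb` from each upload summary into your own records. `UploadRecord` and `total_savings_percent` are hypothetical names, and the sizes are illustrative values.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class UploadRecord:
    """Hypothetical record of one upload, filled in from its summary."""
    name: str
    original_size_mb: float
    stored_size_mb: float


def total_savings_percent(records: Iterable[UploadRecord]) -> float:
    """Overall storage saved across all recorded uploads, in percent."""
    records = list(records)
    original = sum(r.original_size_mb for r in records)
    stored = sum(r.stored_size_mb for r in records)
    if original <= 0:
        return 0.0
    return (1 - stored / original) * 100


# Illustrative numbers: one full reference plus three deltas of 100 MB builds.
history = [
    UploadRecord("v1.0.0", 100.0, 100.0),
    UploadRecord("v1.0.1", 100.0, 0.2),
    UploadRecord("v1.0.2", 100.0, 0.3),
    UploadRecord("v1.1.0", 100.0, 5.2),
]
print(f"Overall savings: {total_savings_percent(history):.1f}%")
```

The overall figure is lower than any single delta's saving because the first upload in a prefix is stored in full as the reference.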
Next Steps
- Examples - See real-world usage patterns
- API Reference - Complete API documentation
- Architecture - Understand how it works