# DeltaGlider πŸ›Έ

**Store 4TB of similar files in 5GB. No, that's not a typo.**

DeltaGlider is a drop-in S3 replacement that achieves 99.9% compression for versioned artifacts, backups, and release archives through intelligent binary delta compression.

[![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![xdelta3](https://img.shields.io/badge/powered%20by-xdelta3-green.svg)](http://xdelta.org/)

## The Problem We Solved

You're storing hundreds of versions of your releases. Each 100MB build differs by <1% from the previous version. You're paying to store 100GB of what's essentially 100MB of unique data.

Sound familiar?

## Real-World Impact

From our [ReadOnlyREST case study](docs/case-study-readonlyrest.md):

- **Before**: 201,840 files, 3.96TB storage, $1,120/year
- **After**: Same files, 4.9GB storage, $1.32/year
- **Compression**: 99.9% (not a typo)
- **Integration time**: 5 minutes

## How It Works

```
Traditional S3:
v1.0.0.zip (100MB) β†’ S3: 100MB
v1.0.1.zip (100MB) β†’ S3: 100MB (200MB total)
v1.0.2.zip (100MB) β†’ S3: 100MB (300MB total)

With DeltaGlider:
v1.0.0.zip (100MB) β†’ S3: 100MB reference + 0KB delta
v1.0.1.zip (100MB) β†’ S3: 98KB delta (100.1MB total)
v1.0.2.zip (100MB) β†’ S3: 97KB delta (100.3MB total)
```

## Quick Start

### Installation

```bash
# Via pip (Python 3.11+)
pip install deltaglider

# Via uv (faster)
uv pip install deltaglider

# Via Docker
docker run -v ~/.aws:/root/.aws deltaglider/deltaglider --help
```

### AWS S3 Compatible Commands

DeltaGlider is a **drop-in replacement** for the AWS S3 CLI with automatic delta compression:

```bash
# Copy files to/from S3 (automatic delta compression for archives)
deltaglider cp my-app-v1.0.0.zip s3://releases/
deltaglider cp s3://releases/my-app-v1.0.0.zip ./downloaded.zip

# Recursive directory operations
deltaglider cp -r ./dist/ s3://releases/v1.0.0/
```
```bash
deltaglider cp -r s3://releases/v1.0.0/ ./local-copy/

# List buckets and objects
deltaglider ls                                  # List all buckets
deltaglider ls s3://releases/                   # List objects
deltaglider ls -r s3://releases/                # Recursive listing
deltaglider ls -h --summarize s3://releases/    # Human-readable with summary

# Remove objects
deltaglider rm s3://releases/old-version.zip    # Remove single object
deltaglider rm -r s3://releases/old/            # Recursive removal
deltaglider rm --dryrun s3://releases/test.zip  # Preview deletion

# Sync directories (only transfers changes)
deltaglider sync ./local-dir/ s3://releases/              # Sync to S3
deltaglider sync s3://releases/ ./local-backup/           # Sync from S3
deltaglider sync --delete ./src/ s3://backup/             # Mirror exactly
deltaglider sync --exclude "*.log" ./src/ s3://backup/    # Exclude patterns

# Works with MinIO, R2, and S3-compatible storage
deltaglider cp file.zip s3://bucket/ --endpoint-url http://localhost:9000
```

### Legacy Commands (still supported)

```bash
# Original DeltaGlider commands
deltaglider put my-app-v1.0.0.zip s3://releases/
deltaglider get s3://releases/my-app-v1.0.1.zip
deltaglider verify s3://releases/my-app-v1.0.1.zip.delta
```

## Intelligent File Type Detection

DeltaGlider automatically detects file types and applies the optimal strategy:

| File Type | Strategy | Typical Compression |
|-----------|----------|---------------------|
| `.zip`, `.tar`, `.gz` | Binary delta | 99%+ for similar versions |
| `.dmg`, `.deb`, `.rpm` | Binary delta | 95%+ for similar versions |
| `.jar`, `.war`, `.ear` | Binary delta | 90%+ for similar builds |
| `.exe`, `.dll`, `.so` | Direct upload | 0% (no delta benefit) |
| `.txt`, `.json`, `.xml` | Direct upload | 0% (use gzip instead) |
| `.sha1`, `.sha512`, `.md5` | Direct upload | 0% (already minimal) |

## Performance Benchmarks

Testing with real software releases:

```
# 513 Elasticsearch plugin releases (82.5MB each)
Original size:     42.3 GB
DeltaGlider size:  115 MB
Compression:       99.7%
Upload speed:      3-4 files/second
Download speed:    <100ms reconstruction
```
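The strategy table above amounts to simple extension-based routing. A minimal sketch of that idea (the suffix sets mirror the table; the helper name and defaults are assumptions, not DeltaGlider's actual internals):

```python
# Extension-based routing sketch, mirroring the strategy table above.
# Hypothetical helper -- DeltaGlider's real detection logic may differ.
from pathlib import Path

DELTA_SUFFIXES = {".zip", ".tar", ".gz", ".dmg", ".deb", ".rpm",
                  ".jar", ".war", ".ear"}
DIRECT_SUFFIXES = {".exe", ".dll", ".so", ".txt", ".json", ".xml",
                   ".sha1", ".sha512", ".md5"}

def pick_strategy(path: str) -> str:
    suffix = Path(path).suffix.lower()
    if suffix in DELTA_SUFFIXES:
        return "delta"   # archive formats: similar versions diff well
    return "direct"      # no delta benefit (or unknown type): upload as-is

assert pick_strategy("my-app-v1.0.1.zip") == "delta"
assert pick_strategy("release.md5") == "direct"
```

Defaulting unknown extensions to a direct upload is the safe choice here: a wasted delta attempt costs CPU, while a wrong direct upload costs nothing but storage.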
## Integration Examples

### Drop-in AWS CLI Replacement

```bash
# Before (aws-cli)
aws s3 cp release-v2.0.0.zip s3://releases/
aws s3 cp --recursive ./build/ s3://releases/v2.0.0/
aws s3 ls s3://releases/
aws s3 rm s3://releases/old-version.zip

# After (deltaglider) - same commands, 99% less storage!
deltaglider cp release-v2.0.0.zip s3://releases/
deltaglider cp -r ./build/ s3://releases/v2.0.0/
deltaglider ls s3://releases/
deltaglider rm s3://releases/old-version.zip
```

### CI/CD Pipeline (GitHub Actions)

```yaml
- name: Upload release with 99% compression
  run: |
    pip install deltaglider
    # Use AWS S3 compatible syntax
    deltaglider cp dist/*.zip s3://releases/${{ github.ref_name }}/
    # Or use recursive for entire directories
    deltaglider cp -r dist/ s3://releases/${{ github.ref_name }}/
```

### Backup Script

```bash
#!/bin/bash
# Daily backup with automatic deduplication
tar -czf backup-$(date +%Y%m%d).tar.gz /data
deltaglider cp backup-*.tar.gz s3://backups/
# Only the changes are stored, not the full backup

# List backups with human-readable sizes
deltaglider ls -h s3://backups/

# Clean up old backups
deltaglider rm -r s3://backups/2023/
```

### Python SDK

```python
from deltaglider import DeltaService

service = DeltaService(
    bucket="releases",
    storage_backend="s3",  # or "minio", "r2", etc.
)

# Upload with automatic compression
summary = service.put("my-app-v2.0.0.zip", "v2.0.0/")
print(f"Stored {summary.original_size} as {summary.stored_size}")
# Output: Stored 104857600 as 98304
# i.e., a 99.9% reduction

# Download with automatic reconstruction
service.get("v2.0.0/my-app-v2.0.0.zip", "local-copy.zip")
```

## Migration from AWS CLI

Migrating from `aws s3` to `deltaglider` is as simple as changing the command name:

| AWS CLI | DeltaGlider | Compression Benefit |
|---------|-------------|---------------------|
| `aws s3 cp file.zip s3://bucket/` | `deltaglider cp file.zip s3://bucket/` | βœ… 99% for similar files |
| `aws s3 cp -r dir/ s3://bucket/` | `deltaglider cp -r dir/ s3://bucket/` | βœ… 99% for archives |
| `aws s3 ls s3://bucket/` | `deltaglider ls s3://bucket/` | - |
| `aws s3 rm s3://bucket/file` | `deltaglider rm s3://bucket/file` | - |
| `aws s3 sync dir/ s3://bucket/` | `deltaglider sync dir/ s3://bucket/` | βœ… 99% incremental |

### Compatibility Flags

```bash
# All standard AWS flags work
deltaglider cp file.zip s3://bucket/ \
    --endpoint-url http://localhost:9000 \
    --profile production \
    --region us-west-2

# DeltaGlider-specific flags
deltaglider cp file.zip s3://bucket/ --no-delta        # Disable compression for this file
deltaglider cp file.zip s3://bucket/ --max-ratio 0.8   # Only use delta if compression > 20%
```

## Architecture

DeltaGlider uses a clean hexagonal architecture:

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Your App   │────▢│ DeltaGlider  │────▢│  S3/MinIO   β”‚
β”‚  (CLI/SDK)  β”‚     β”‚     Core     β”‚     β”‚   Storage   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                           β”‚
                    β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”
                    β”‚  Local Cache β”‚
                    β”‚ (References) β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

**Key Components:**

- **Binary diff engine**: xdelta3 for optimal compression
- **Intelligent routing**: automatic file type detection
- **Integrity verification**: SHA256 on every operation
- **Local caching**: fast repeated operations
- **Zero dependencies**: no database, no manifest files

## When to Use DeltaGlider

βœ… **Perfect for:**

- Software releases and versioned artifacts
- Container images and layers
- Database backups and snapshots
- Machine learning model checkpoints
- Game assets and updates
- Any versioned binary data

❌ **Not ideal for:**

- Already compressed unique files
- Streaming media files
- Frequently changing unstructured data
- Files smaller than 1MB
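The integrity verification listed under "Key Components" is plain SHA256 checking on both ends of a round-trip. A minimal sketch with the standard library (function names here are illustrative, not DeltaGlider's API):

```python
# Sketch of SHA256 integrity verification, as described in "Key Components".
# Hypothetical helper names -- DeltaGlider's internals may differ.
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_reconstruction(reconstructed: bytes, expected_sha256: str) -> None:
    """Raise if the reconstructed file does not match the recorded digest."""
    actual = sha256_hex(reconstructed)
    if actual != expected_sha256:
        raise IOError(f"integrity check failed: {actual} != {expected_sha256}")

original = b"release bytes"
digest = sha256_hex(original)            # recorded alongside the delta at upload time
verify_reconstruction(original, digest)  # passes silently on a clean round-trip
```

Because the digest is computed over the fully reconstructed bytes, a corrupted reference or delta surfaces immediately at download time rather than silently propagating.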
## Comparison

| Solution | Compression | Speed | Integration | Cost |
|----------|-------------|-------|-------------|------|
| **DeltaGlider** | 99%+ | Fast | Drop-in | Open source |
| S3 Versioning | 0% | Native | Built-in | $$ per version |
| Deduplication | 30-50% | Slow | Complex | Enterprise $$$ |
| Git LFS | Good | Slow | Git-only | $ per GB |
| Restic/Borg | 80-90% | Medium | Backup-only | Open source |

## Production Ready

- βœ… **Battle tested**: 200K+ files in production
- βœ… **Data integrity**: SHA256 verification on every operation
- βœ… **S3 compatible**: works with AWS, MinIO, Cloudflare R2, etc.
- βœ… **Atomic operations**: no partial states
- βœ… **Concurrent safe**: multiple clients supported
- βœ… **Well tested**: 95%+ code coverage

## Development

```bash
# Clone the repo
git clone https://github.com/your-org/deltaglider
cd deltaglider

# Install with dev dependencies
uv pip install -e ".[dev]"

# Run tests
uv run pytest

# Run with local MinIO
docker-compose up -d
export AWS_ENDPOINT_URL=http://localhost:9000
deltaglider put test.zip s3://test/
```

## FAQ

**Q: What if my reference file gets corrupted?**
A: Every operation includes SHA256 verification. Corruption is detected immediately.

**Q: How fast is reconstruction?**
A: Sub-100ms for typical files. The delta is applied in-memory using xdelta3.

**Q: Can I use this with existing S3 data?**
A: Yes! DeltaGlider can start optimizing new uploads immediately. Old data remains accessible.

**Q: What's the overhead for unique files?**
A: Zero. Files without similarity are uploaded directly.

**Q: Is this compatible with S3 encryption?**
A: Yes, DeltaGlider respects all S3 settings, including SSE, KMS, and bucket policies.

## The Math

For `N` versions of an `S` MB file with `D%` difference between consecutive versions:

- **Traditional S3**: `N Γ— S` MB
- **DeltaGlider**: `S + (N-1) Γ— S Γ— D%` MB

Example: 100 versions of 100MB files with 1% difference:

- **Traditional**: 10,000 MB
- **DeltaGlider**: 199 MB
- **Savings**: 98%
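The example above can be checked directly by plugging the numbers into the two formulas (with `D` expressed as a fraction rather than a percentage):

```python
# Storage cost for N versions of an S-MB file, each differing from the
# previous by fraction d (e.g. 0.01 for 1%).
def traditional_mb(n: int, s: float) -> float:
    return n * s                       # every version stored in full

def deltaglider_mb(n: int, s: float, d: float) -> float:
    return s + (n - 1) * s * d         # one full reference + (N-1) small deltas

n, s, d = 100, 100, 0.01
assert traditional_mb(n, s) == 10_000
assert deltaglider_mb(n, s, d) == 199.0

savings = 1 - deltaglider_mb(n, s, d) / traditional_mb(n, s)
assert round(savings * 100) == 98      # ~98% savings, as stated above
```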
## Contributing

We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

Key areas we're exploring:

- Cloud-native reference management
- A Rust implementation for 10x speed
- Automatic similarity detection
- Multi-threaded delta generation
- WASM support for browser usage

## License

MIT - use it freely in your projects.

## Success Stories

> "We reduced our artifact storage from 4TB to 5GB. This isn't hyperboleβ€”it's math."
> β€” [ReadOnlyREST Case Study](docs/case-study-readonlyrest.md)

> "Our CI/CD pipeline now uploads 100x faster. Deploys that took minutes now take seconds."
> β€” Platform Engineer at [redacted]

> "We were about to buy expensive deduplication storage. DeltaGlider saved us $50K/year."
> β€” CTO at [stealth startup]

---

**Try it now**: got versioned files in S3? See your potential savings:

```bash
# Analyze your S3 bucket
deltaglider analyze s3://your-bucket/
# Output: "Potential savings: 95.2% (4.8TB β†’ 237GB)"
```

Built with ❀️ by engineers who were tired of paying to store the same bytes over and over.