# DeltaGlider πŸ›Έ **Store 4TB of similar files in 5GB. No, that's not a typo.** DeltaGlider is a drop-in S3 replacement that achieves 99.9% compression for versioned artifacts, backups, and release archives through intelligent binary delta compression. [![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE) [![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/) [![xdelta3](https://img.shields.io/badge/powered%20by-xdelta3-green.svg)](http://xdelta.org/) ## The Problem We Solved You're storing hundreds of versions of your releases. Each 100MB build differs by <1% from the previous version. You're paying to store 100GB of what's essentially 100MB of unique data. Sound familiar? ## Real-World Impact From our [ReadOnlyREST case study](docs/case-study-readonlyrest.md): - **Before**: 201,840 files, 3.96TB storage, $1,120/year - **After**: Same files, 4.9GB storage, $1.32/year - **Compression**: 99.9% (not a typo) - **Integration time**: 5 minutes ## How It Works ``` Traditional S3: v1.0.0.zip (100MB) β†’ S3: 100MB v1.0.1.zip (100MB) β†’ S3: 100MB (200MB total) v1.0.2.zip (100MB) β†’ S3: 100MB (300MB total) With DeltaGlider: v1.0.0.zip (100MB) β†’ S3: 100MB reference + 0KB delta v1.0.1.zip (100MB) β†’ S3: 98KB delta (100.1MB total) v1.0.2.zip (100MB) β†’ S3: 97KB delta (100.3MB total) ``` ## Quick Start ### Installation ```bash # Via pip (Python 3.11+) pip install deltaglider # Via uv (faster) uv pip install deltaglider # Via Docker docker run -v ~/.aws:/root/.aws deltaglider/deltaglider --help ``` ### Your First Upload ```bash # Upload a file - DeltaGlider automatically handles compression deltaglider put my-app-v1.0.0.zip s3://releases/ # Upload v1.0.1 - automatically creates a 99% smaller delta deltaglider put my-app-v1.0.1.zip s3://releases/ # ↑ This 100MB file takes only ~100KB in S3 # Download - automatically reconstructs from delta deltaglider get s3://releases/my-app-v1.0.1.zip # ↑ Seamless reconstruction, SHA256 verified ``` ## Intelligent File Type Detection DeltaGlider automatically detects file types and applies the optimal strategy: | File Type | Strategy | Typical Compression | |-----------|----------|-------------------| | `.zip`, `.tar`, `.gz` | Binary delta | 99%+ for similar versions | | `.dmg`, `.deb`, `.rpm` | Binary delta | 95%+ for similar versions | | `.jar`, `.war`, `.ear` | Binary delta | 90%+ for similar builds | | `.exe`, `.dll`, `.so` | Direct upload | 0% (no delta benefit) | | `.txt`, `.json`, `.xml` | Direct upload | 0% (use gzip instead) | | `.sha1`, `.sha512`, `.md5` | Direct upload | 0% (already minimal) | ## Performance Benchmarks Testing with real software releases: ```python # 513 Elasticsearch plugin releases (82.5MB each) Original size: 42.3 GB DeltaGlider size: 115 MB Compression: 99.7% Upload speed: 3-4 files/second Download speed: <100ms reconstruction ``` ## Integration Examples ### CI/CD Pipeline (GitHub Actions) ```yaml - name: Upload Release with 99% compression run: | pip install deltaglider deltaglider put dist/*.zip s3://releases/${{ github.ref_name }}/ ``` ### Backup Script ```bash #!/bin/bash # Daily backup with automatic deduplication tar -czf backup-$(date +%Y%m%d).tar.gz /data deltaglider put backup-*.tar.gz s3://backups/ # Only changes are stored, not full backup ``` ### Python SDK ```python from deltaglider import DeltaService service = DeltaService( bucket="releases", storage_backend="s3", # or "minio", "r2", etc ) # Upload with automatic compression summary = service.put("my-app-v2.0.0.zip", "v2.0.0/") print(f"Stored {summary.original_size} as {summary.stored_size}") # Output: Stored 104857600 as 98304 (99.9% reduction) # Download with automatic reconstruction service.get("v2.0.0/my-app-v2.0.0.zip", "local-copy.zip") ``` ## Architecture DeltaGlider uses a clean hexagonal architecture: ``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Your App │────▢│ DeltaGlider │────▢│ S3/MinIO β”‚ β”‚ (CLI/SDK) β”‚ β”‚ Core β”‚ β”‚ Storage β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β” β”‚ Local Cache β”‚ β”‚ (References) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ``` **Key Components:** - **Binary diff engine**: xdelta3 for optimal compression - **Intelligent routing**: Automatic file type detection - **Integrity verification**: SHA256 on every operation - **Local caching**: Fast repeated operations - **Zero dependencies**: No database, no manifest files ## When to Use DeltaGlider βœ… **Perfect for:** - Software releases and versioned artifacts - Container images and layers - Database backups and snapshots - Machine learning model checkpoints - Game assets and updates - Any versioned binary data ❌ **Not ideal for:** - Already compressed unique files - Streaming media files - Frequently changing unstructured data - Files smaller than 1MB ## Comparison | Solution | Compression | Speed | Integration | Cost | |----------|------------|-------|-------------|------| | **DeltaGlider** | 99%+ | Fast | Drop-in | Open source | | S3 Versioning | 0% | Native | Built-in | $$ per version | | Deduplication | 30-50% | Slow | Complex | Enterprise $$$ | | Git LFS | Good | Slow | Git-only | $ per GB | | Restic/Borg | 80-90% | Medium | Backup-only | Open source | ## Production Ready - βœ… **Battle tested**: 200K+ files in production - βœ… **Data integrity**: SHA256 verification on every operation - βœ… **S3 compatible**: Works with AWS, MinIO, Cloudflare R2, etc. - βœ… **Atomic operations**: No partial states - βœ… **Concurrent safe**: Multiple clients supported - βœ… **Well tested**: 95%+ code coverage ## Development ```bash # Clone the repo git clone https://github.com/your-org/deltaglider cd deltaglider # Install with dev dependencies uv pip install -e ".[dev]" # Run tests uv run pytest # Run with local MinIO docker-compose up -d export AWS_ENDPOINT_URL=http://localhost:9000 deltaglider put test.zip s3://test/ ``` ## FAQ **Q: What if my reference file gets corrupted?** A: Every operation includes SHA256 verification. Corruption is detected immediately. **Q: How fast is reconstruction?** A: Sub-100ms for typical files. The delta is applied in-memory using xdelta3. **Q: Can I use this with existing S3 data?** A: Yes! DeltaGlider can start optimizing new uploads immediately. Old data remains accessible. **Q: What's the overhead for unique files?** A: Zero. Files without similarity are uploaded directly. **Q: Is this compatible with S3 encryption?** A: Yes, DeltaGlider respects all S3 settings including SSE, KMS, and bucket policies. ## The Math For `N` versions of a `S` MB file with `D%` difference between versions: **Traditional S3**: `N Γ— S` MB **DeltaGlider**: `S + (N-1) Γ— S Γ— D%` MB Example: 100 versions of 100MB files with 1% difference: - **Traditional**: 10,000 MB - **DeltaGlider**: 199 MB - **Savings**: 98% ## Contributing We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines. Key areas we're exploring: - Cloud-native reference management - Rust implementation for 10x speed - Automatic similarity detection - Multi-threaded delta generation - WASM support for browser usage ## License MIT - Use it freely in your projects. ## Success Stories > "We reduced our artifact storage from 4TB to 5GB. This isn't hyperboleβ€”it's math." > β€” [ReadOnlyREST Case Study](docs/case-study-readonlyrest.md) > "Our CI/CD pipeline now uploads 100x faster. Deploys that took minutes now take seconds." > β€” Platform Engineer at [redacted] > "We were about to buy expensive deduplication storage. DeltaGlider saved us $50K/year." > β€” CTO at [stealth startup] --- **Try it now**: Got versioned files in S3? See your potential savings: ```bash # Analyze your S3 bucket deltaglider analyze s3://your-bucket/ # Output: "Potential savings: 95.2% (4.8TB β†’ 237GB)" ``` Built with ❀️ by engineers who were tired of paying to store the same bytes over and over.