diff --git a/README.md b/README.md
index e5ac8d1..b05e10c 100644
--- a/README.md
+++ b/README.md
@@ -6,9 +6,8 @@
 [![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
 [![xdelta3](https://img.shields.io/badge/powered%20by-xdelta3-green.svg)](https://github.com/jmacd/xdelta)
 
-
- DeltaGlider Logo -
+> 🌟 Star if you like this! 🙏
+> Leave a message in [Issues](https://github.com/beshu-tech/deltaglider/issues) - we are listening!
 
 **Store 4TB of similar files in 5GB. No, that's not a typo.**
 
@@ -37,6 +36,10 @@ We don't expect significant benefit for multimedia content like videos, but we n
 The quickest way to start is using the GUI
 * https://github.com/beshu-tech/deltaglider_commander/
 
+
+ DeltaGlider Logo +
+
 ### CLI Installation
 
 ```bash
@@ -487,18 +490,18 @@ This is why DeltaGlider achieves 99%+ compression on versioned archives - xdelta
 
 ### System Architecture
 
-DeltaGlider uses a clean hexagonal architecture:
+DeltaGlider intelligently stores files within **DeltaSpaces** - S3 prefixes where related files share a common reference file for delta compression:
 
 ```
-┌─────────────┐     ┌──────────────┐     ┌─────────────┐
-│  Your App   │────▶│ DeltaGlider  │────▶│  S3/MinIO   │
-│  (CLI/SDK)  │     │    Core      │     │   Storage   │
-└─────────────┘     └──────────────┘     └─────────────┘
-                           │
-                    ┌──────▼───────┐
-                    │ Local Cache  │
-                    │ (References) │
-                    └──────────────┘
+┌─────────────┐     ┌──────────────┐     ┌─────────────────┐
+│  Your App   │────▶│ DeltaGlider  │────▶│   DeltaSpace    │
+│  (CLI/SDK)  │     │    Core      │     │   (S3 prefix)   │
+└─────────────┘     └──────────────┘     ├─────────────────┤
+                           │             │  reference.bin  │
+                    ┌──────▼───────┐     │  file1.delta    │
+                    │ Local Cache  │     │  file2.delta    │
+                    │ (References) │     │  file3.delta    │
+                    └──────────────┘     └─────────────────┘
 ```
 
 **Key Components:**
@@ -507,6 +510,9 @@ DeltaGlider uses a clean hexagonal architecture:
 - **Integrity verification**: SHA256 on every operation
 - **Local caching**: Fast repeated operations
 - **Zero dependencies**: No database, no manifest files
+- **Modular storage**: The storage layer is pluggable - you could easily replace S3 with a filesystem driver (using extended attributes for metadata) or any other backend
+
+The codebase follows a ports-and-adapters pattern where core business logic is decoupled from infrastructure, with storage operations abstracted through well-defined interfaces in the `ports/` directory and concrete implementations in `adapters/`.
 
 ### When to Use DeltaGlider
 
@@ -651,14 +657,8 @@ MIT - Use it freely in your projects.
 
 ## Success Stories
 
-> "We reduced our artifact storage from 4TB to 5GB. This isn't hyperbole—it's math."
-> — [ReadOnlyREST Case Study](docs/case-study-readonlyrest.md)
-
-> "Our CI/CD pipeline now uploads 100x faster. Deploys that took minutes now take seconds."
-> — Platform Engineer at [redacted]
-
-> "We were about to buy expensive deduplication storage. DeltaGlider saved us $50K/year."
-> — CTO at [stealth startup]
+> "We reduced our artifact storage from 4TB to 5GB. CI is also much faster, due to smaller uploads."
+> — [ReadonlyREST Case Study](docs/case-study-readonlyrest.md)
 
 ---
 
@@ -670,4 +670,10 @@ deltaglider analyze s3://your-bucket/
 # Output: "Potential savings: 95.2% (4.8TB → 237GB)"
 ```
 
-Built with ❤️ by engineers who were tired of paying to store the same bytes over and over.
+## Who built this?
+
+Built with ❤️ by [ReadonlyREST](https://readonlyrest.com) engineers who were tired of paying to store the same bytes over and over.
+
+We also built [Anaphora](https://anaphora.it) for aggregated reports and alerting
+
+And [Deltaglider Commander](https://github.com/beshu-tech/deltaglider_commander)
diff --git a/docs/case-study-readonlyrest.md b/docs/case-study-readonlyrest.md
index 1764e2d..a3f77e7 100644
--- a/docs/case-study-readonlyrest.md
+++ b/docs/case-study-readonlyrest.md
@@ -1,347 +1,76 @@
-# Case Study: How ReadOnlyREST Reduced Storage Costs by 99.9% with DeltaGlider
+## How ReadonlyREST Cut 4TB of S3 Storage Down to 5GB (and Saved 99.9%)
-## Executive Summary
+### TL;DR
-**The Challenge**: ReadOnlyREST, a security plugin for Elasticsearch, was facing exponential storage costs managing 145 release versions across multiple product lines, consuming nearly 4TB of S3 storage.
+We were paying to store 4TB of mostly identical plugin builds.
+DeltaGlider deduplicated everything down to 4.9GB — 99.9% smaller, $1.1k/year cheaper, and no workflow changes.
-**The Solution**: DeltaGlider, an intelligent delta compression system that reduced storage from 4,060GB to just 4.9GB.
+#### The Problem
-**The Impact**:
-- 💰 **$1,119 annual savings** on storage costs
-- 📉 **99.9% reduction** in storage usage
-- ⚡ **Zero changes** to existing workflows
-- ✅ **Full data integrity** maintained
+ReadonlyREST supports ~150 Elasticsearch/Kibana versions × multiple product lines × all our own releases.
+After years of publishing builds, our S3 archive hit `4TB` (201,840 files, $93/month).
+Glacier helped, but restoring files took 48 hours — useless for CI/CD.
----
+Every plugin ZIP was ~82MB, but `99.7% identical` to the next one. We were paying to store duplicates.
-## The Storage Crisis
+#### The Fix: DeltaGlider
-### The Numbers That Kept Us Up at Night
+DeltaGlider stores binary deltas between similar files instead of full copies.
-ReadOnlyREST maintains a comprehensive release archive:
-- **145 version folders** (v1.50.0 through v1.66.1)
-- **201,840 total files** to manage
-- **3.96 TB** of S3 storage consumed
-- **$1,120/year** in storage costs alone
-
-Each version folder contained:
-- 513 plugin ZIP files (one for each Elasticsearch version)
-- 879 checksum files (SHA1 and SHA512)
-- 3 product lines (Enterprise, Pro, Free)
-
-### The Hidden Problem
-
-What made this particularly painful wasn't just the size—it was the **redundancy**. Each 82.5MB plugin ZIP was 99.7% identical to others in the same version, differing only in minor Elasticsearch compatibility adjustments. We were essentially storing the same data hundreds of times.
-
-> "We were paying to store 4TB of data that was fundamentally just variations of the same ~250MB of unique content. It felt like photocopying War and Peace 500 times because each copy had a different page number."
->
-> — *DevOps Lead*
-
----
-
-## Enter DeltaGlider
-
-### The Lightbulb Moment
-
-The breakthrough came when we realized we didn't need to store complete files—just the *differences* between them. DeltaGlider applies this principle automatically:
-
-1. **First file becomes the reference** (stored in full)
-2. **Similar files store only deltas** (typically 0.3% of original size)
-3. **Different files uploaded directly** (no delta overhead)
-
-### Implementation: Surprisingly Simple
-
-```bash
-# Before DeltaGlider (standard S3 upload)
-aws s3 cp readonlyrest-1.66.1_es8.0.0.zip s3://releases/
-# Size on S3: 82.5MB
-
-# With DeltaGlider
-deltaglider cp readonlyrest-1.66.1_es8.0.0.zip s3://releases/
-# Size on S3: 65KB (99.92% smaller!)
+# Before
+```
+aws s3 cp readonlyrest-1.66.1_es8.0.0.zip s3://releases/  # 82MB
 ```
-The beauty? **Zero changes to our build pipeline**. DeltaGlider works as a drop-in replacement for S3 uploads.
-
----
-
-## The Results: Beyond Our Expectations
-
-### Storage Transformation
-
+# After
 ```
-BEFORE DELTAGLIDER              AFTER DELTAGLIDER
-━━━━━━━━━━━━━━━━━              ━━━━━━━━━━━━━━━━
-4,060 GB (3.96 TB)       →      4.9 GB
-$93.38/month             →      $0.11/month
-201,840 files            →      201,840 files (same!)
+deltaglider cp readonlyrest-1.66.1_es8.0.0.zip s3://releases/  # 65KB
 ```
-### Real Performance Metrics
+Drop-in replacement for `aws s3 cp`. No pipeline changes.
+Data integrity checked with SHA256, stored as metadata in S3.
-From our actual production deployment:
-| Metric | Value | Impact |
-|--------|-------|--------|
-| **Compression Ratio** | 99.9% | Near-perfect deduplication |
-| **Delta Size** | ~65KB per 82.5MB file | 1/1,269th of original |
-| **Upload Speed** | 3-4 files/second | Faster than raw S3 uploads |
-| **Download Speed** | Transparent reconstruction | No user impact |
-| **Storage Savings** | 4,055 GB | Enough for 850,000 more files |
+### The Result
-### Version-to-Version Comparison
+| Metric        | Before   | After    | Δ            |
+|-------------- |----------|----------|--------------|
+| Storage       | 4.06TB   | 4.9GB    | -99.9%       |
+| Cost          | $93/mo   | $0.11/mo | -$1,119/yr   |
+| Files         | 201,840  | 201,840  | identical    |
+| Upload speed  | 1x       | 3–4x     | faster       |
-Testing between similar versions showed incredible efficiency:
+Each “different” ZIP? Just a 65KB delta.
+Reconstruction time: <100ms.
+Zero user impact.
+
+
+## Under the Hood
+
+Uses xdelta3 diffs.
+ • Keeps one reference per group
+ • Stores deltas for near-identical files
+ • Skips small or text-based ones (.sha, .json, etc.)
+
+It’s smart enough to decide what’s worth diffing automatically.
+
+
+## Payoff
+ • 4TB → 5GB overnight
+ • Uploads 1,200× faster
+ • CI bandwidth cut 99%
+ • 100% checksum verified integrity
+ • Zero vendor lock-in (open source)
+
+## Takeaways
+
+If You Ship Versioned Artifacts
+
+This will probably save you four figures and hours of upload time per year.
 ```
-readonlyrest-1.66.1_es7.17.0.zip (82.5MB) → reference.bin (82.5MB)
-readonlyrest-1.66.1_es7.17.1.zip (82.5MB) → 64KB delta (0.08% size)
-readonlyrest-1.66.1_es7.17.2.zip (82.5MB) → 65KB delta (0.08% size)
-...
-readonlyrest-1.66.1_es8.15.0.zip (82.5MB) → 71KB delta (0.09% size)
-```
-
----
-
-## Technical Deep Dive
-
-### How DeltaGlider Achieves 99.9% Compression
-
-DeltaGlider uses binary diff algorithms (xdelta3) to identify and store only the bytes that change between files:
-
-```python
-# Simplified concept
-reference = "readonlyrest-1.66.1_es7.17.0.zip" # 82.5MB
-new_file = "readonlyrest-1.66.1_es7.17.1.zip" # 82.5MB
-
-delta = binary_diff(reference, new_file) # 65KB
-# Delta contains only:
-# - Elasticsearch version string changes
-# - Compatibility metadata updates
-# - Build timestamp differences
-```
-
-### Intelligent File Type Detection
-
-Not every file benefits from delta compression. DeltaGlider automatically:
-
-- **Applies delta compression to**: `.zip`, `.tar`, `.gz`, `.dmg`, `.jar`, `.war`
-- **Uploads directly**: `.txt`, `.sha1`, `.sha512`, `.json`, `.md`
-
-This intelligence meant our 127,455 checksum files were uploaded directly, avoiding unnecessary processing overhead.
-
-### Architecture That Scales
-
-```
-┌─────────────┐     ┌──────────────┐     ┌─────────────┐
-│   Client    │────▶│ DeltaGlider  │────▶│  S3/MinIO   │
-│   (CI/CD)   │     │              │     │             │
-└─────────────┘     └──────────────┘     └─────────────┘
-                           │
-                    ┌──────▼───────┐
-                    │ Local Cache  │
-                    │ (References) │
-                    └──────────────┘
-```
-
----
-
-## Business Impact
-
-### Immediate ROI
-
-- **Day 1**: 99.9% storage reduction
-- **Month 1**: $93 saved
-- **Year 1**: $1,119 saved
-- **5 Years**: $5,595 saved (not counting growth)
-
-### Hidden Benefits We Didn't Expect
-
-1. **Faster Deployments**: Uploading 65KB deltas is 1,200x faster than 82.5MB files
-2. **Reduced Bandwidth**: CI/CD pipeline bandwidth usage dropped 99%
-3. **Improved Reliability**: Fewer timeout errors on large file uploads
-4. **Better Compliance**: Automatic SHA256 integrity verification on every operation
-
-### Environmental Impact
-
-> "Reducing storage by 4TB means fewer drives spinning in data centers. It's a small contribution to our sustainability goals, but every bit counts."
->
-> — *CTO*
-
----
-
-## Implementation Journey
-
-### Week 1: Proof of Concept
-- Tested with 10 files
-- Achieved 99.6% compression
-- Decision to proceed
-
-### Week 2: Production Rollout
-- Uploaded all 201,840 files
-- Zero errors or failures
-- Immediate cost reduction
-
-### Week 3: Integration
-```bash
-# Simple integration into our CI/CD
-- aws s3 cp $FILE s3://releases/
-+ deltaglider cp $FILE s3://releases/
-```
-
-### Week 4: Full Migration
-- All build pipelines updated
-- Developer documentation completed
-- Monitoring dashboards configured
-
----
-
-## Lessons Learned
-
-### What Worked Well
-
-1. **Drop-in replacement**: No architectural changes needed
-2. **Automatic intelligence**: File type detection "just worked"
-3. **Preservation of structure**: Directory hierarchy maintained perfectly
-
-### Challenges Overcome
-
-1. **Initial skepticism**: "99.9% compression sounds too good to be true"
-   - *Solution*: Live demonstration with real data
-
-2. **Download concerns**: "Will it be slow to reconstruct files?"
-   - *Solution*: Benchmarking showed <100ms reconstruction time
-
-3. **Reliability questions**: "What if the reference file is corrupted?"
-   - *Solution*: SHA256 verification on every operation
-
----
-
-## For Decision Makers
-
-### Why This Matters
-
-Storage costs scale linearly with data growth. Without DeltaGlider:
-- Next 145 versions: Additional $1,120/year
-- 5-year projection: $11,200 in storage alone
-- Opportunity cost: Resources that could fund innovation
-
-### Risk Assessment
-
-| Risk | Mitigation | Status |
-|------|------------|--------|
-| Vendor lock-in | Open-source, standards-based | ✅ Mitigated |
-| Data corruption | SHA256 verification built-in | ✅ Mitigated |
-| Performance impact | Faster than original | ✅ No risk |
-| Complexity | Drop-in replacement | ✅ No risk |
-
-### Strategic Advantages
-
-1. **Cost Predictability**: Storage costs become negligible
-2. **Scalability**: Can handle 100x more versions in same space
-3. **Competitive Edge**: More resources for product development
-4. **Green IT**: Reduced carbon footprint from storage
-
----
-
-## For Engineers
-
-### Getting Started
-
-```bash
-# Install DeltaGlider
 pip install deltaglider
-
-# Upload a file (automatic compression)
-deltaglider cp my-release-v1.0.0.zip s3://releases/
-
-# Download (automatic reconstruction)
-deltaglider cp s3://releases/my-release-v1.0.0.zip .
-
-# It's that simple.
+deltaglider cp my-release.zip s3://releases/
 ```
-### Performance Characteristics
-
-```python
-# Compression ratios by similarity
-identical_files: 99.9% # Same file, different name
-minor_changes: 99.7% # Version bumps, timestamps
-moderate_changes: 95.0% # Feature additions
-major_changes: 70.0% # Significant refactoring
-completely_different: 0% # No compression (uploaded as-is)
-```
-
-### Integration Examples
-
-**GitHub Actions**:
-```yaml
-- name: Upload Release
-  run: deltaglider cp dist/*.zip s3://releases/${{ github.ref_name }}/
-```
-
-**Jenkins Pipeline**:
-```groovy
-sh "deltaglider cp ${WORKSPACE}/target/*.jar s3://artifacts/"
-```
-
-**Python Script**:
-```python
-from deltaglider import DeltaService
-service = DeltaService(bucket="releases")
-service.put("my-app-v2.0.0.zip", "v2.0.0/")
-```
-
----
-
-## The Bottom Line
-
-DeltaGlider transformed our storage crisis into a solved problem:
-
-- ✅ **4TB → 5GB** storage reduction
-- ✅ **$1,119/year** saved
-- ✅ **Zero** workflow disruption
-- ✅ **100%** data integrity maintained
-
-For ReadOnlyREST, DeltaGlider wasn't just a cost-saving tool—it was a glimpse into the future of intelligent storage. When 99.9% of your data is redundant, why pay to store it 500 times?
-
----
-
-## Next Steps
-
-### For Your Organization
-
-1. **Identify similar use cases**: Version releases, backups, build artifacts
-2. **Run the calculator**: `[Your files] × [Versions] × [Similarity] = Savings`
-3. **Start small**: Test with one project's releases
-4. **Scale confidently**: Deploy across all similar data
-
-### Get Started Today
-
-```bash
-# See your potential savings
-git clone https://github.com/beshu-tech/deltaglider
-cd deltaglider
-python calculate_savings.py --path /your/releases
-
-# Try it yourself
-docker run -p 9000:9000 minio/minio # Local S3
-pip install deltaglider
-deltaglider cp your-file.zip s3://test/
-```
-
----
-
-## About ReadOnlyREST
-
-ReadOnlyREST is the enterprise security plugin for Elasticsearch and OpenSearch, protecting clusters in production since 2015. Learn more at [readonlyrest.com](https://readonlyrest.com)
-
-## About DeltaGlider
-
-DeltaGlider is an open-source delta compression system for S3-compatible storage, turning redundant data into remarkable savings. Built with modern Python, containerized for portability, and designed for scale.
-
----
-
-*"In a world where storage is cheap but not free, and data grows exponentially but changes incrementally, DeltaGlider represents a fundamental shift in how we think about storing versioned artifacts."*
-
-**— ReadOnlyREST Engineering Team**
\ No newline at end of file
+That’s it.
\ No newline at end of file
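
Reviewer note: the rewritten case study describes the mechanism only briefly (one reference per group, small deltas for near-identical files, SHA256 verification). The sketch below illustrates that idea and is not DeltaGlider's implementation. It assumes zlib's preset-dictionary feature as a stand-in for xdelta3, and `make_delta` and `apply_delta` are hypothetical helper names. Because the deflate window is 32 KB, the stand-in only works for small payloads.

```python
import hashlib
import random
import zlib


def make_delta(reference: bytes, target: bytes) -> bytes:
    """Encode `target` relative to `reference`.

    Stand-in for xdelta3: zlib is primed with the reference as a preset
    dictionary, so bytes shared with the reference compress to short
    back-references.
    """
    compressor = zlib.compressobj(level=9, zdict=reference)
    return compressor.compress(target) + compressor.flush()


def apply_delta(reference: bytes, delta: bytes, expected_sha256: str) -> bytes:
    """Rebuild the original bytes from reference + delta and verify SHA256."""
    decompressor = zlib.decompressobj(zdict=reference)
    restored = decompressor.decompress(delta) + decompressor.flush()
    if hashlib.sha256(restored).hexdigest() != expected_sha256:
        raise ValueError("integrity check failed: SHA256 mismatch")
    return restored


# Illustrative data: a 16 KB incompressible "artifact" and a near-identical
# variant that differs only in a short version string.
reference = random.Random(0).randbytes(16 * 1024)
target = reference[:8000] + b"es8.0.1" + reference[8007:]

delta = make_delta(reference, target)
digest = hashlib.sha256(target).hexdigest()
restored = apply_delta(reference, delta, digest)

print(f"full copy: {len(target)} bytes, delta: {len(delta)} bytes")
print("round trip ok:", restored == target)
```

The reference is stored once in full, and each similar file costs only its delta plus a checksum, which is the trade the case study describes.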