mirror of
https://github.com/beshu-tech/deltaglider.git
synced 2026-01-11 22:30:48 +01:00
freshen up
This commit is contained in:
50
README.md
50
README.md
@@ -6,9 +6,8 @@
|
||||
[](https://www.python.org/downloads/)
|
||||
[](https://github.com/jmacd/xdelta)
|
||||
|
||||
<div align="center">
|
||||
<img src="https://github.com/beshu-tech/deltaglider/raw/main/docs/deltaglider.png" alt="DeltaGlider Logo"/>
|
||||
</div>
|
||||
> 🌟 Star if you like this! 🙏
|
||||
> Leave a message in [Issues](https://github.com/beshu-tech/deltaglider/issues) - we are listening!
|
||||
|
||||
**Store 4TB of similar files in 5GB. No, that's not a typo.**
|
||||
|
||||
@@ -37,6 +36,10 @@ We don't expect significant benefit for multimedia content like videos, but we n
|
||||
The quickest way to start is using the GUI
|
||||
* https://github.com/beshu-tech/deltaglider_commander/
|
||||
|
||||
<div align="center">
|
||||
<img src="https://github.com/beshu-tech/deltaglider/raw/main/docs/deltaglider.png" alt="DeltaGlider Logo"/>
|
||||
</div>
|
||||
|
||||
### CLI Installation
|
||||
|
||||
```bash
|
||||
@@ -487,18 +490,18 @@ This is why DeltaGlider achieves 99%+ compression on versioned archives - xdelta
|
||||
|
||||
### System Architecture
|
||||
|
||||
DeltaGlider uses a clean hexagonal architecture:
|
||||
DeltaGlider intelligently stores files within **DeltaSpaces** - S3 prefixes where related files share a common reference file for delta compression:
|
||||
|
||||
```
|
||||
┌─────────────┐ ┌──────────────┐ ┌─────────────┐
|
||||
│ Your App │────▶│ DeltaGlider │────▶│ S3/MinIO │
|
||||
│ (CLI/SDK) │ │ Core │ │ Storage │
|
||||
└─────────────┘ └──────────────┘ └─────────────┘
|
||||
│
|
||||
┌──────▼───────┐
|
||||
│ Local Cache │
|
||||
│ (References) │
|
||||
└──────────────┘
|
||||
┌─────────────┐ ┌──────────────┐ ┌─────────────────┐
|
||||
│ Your App │────▶│ DeltaGlider │────▶│ DeltaSpace │
|
||||
│ (CLI/SDK) │ │ Core │ │ (S3 prefix) │
|
||||
└─────────────┘ └──────────────┘ ├─────────────────┤
|
||||
│ │ reference.bin │
|
||||
┌──────▼───────┐ │ file1.delta │
|
||||
│ Local Cache │ │ file2.delta │
|
||||
│ (References) │ │ file3.delta │
|
||||
└──────────────┘ └─────────────────┘
|
||||
```
|
||||
|
||||
**Key Components:**
|
||||
@@ -507,6 +510,9 @@ DeltaGlider uses a clean hexagonal architecture:
|
||||
- **Integrity verification**: SHA256 on every operation
|
||||
- **Local caching**: Fast repeated operations
|
||||
- **Zero dependencies**: No database, no manifest files
|
||||
- **Modular storage**: The storage layer is pluggable - you could easily replace S3 with a filesystem driver (using extended attributes for metadata) or any other backend
|
||||
|
||||
The codebase follows a ports-and-adapters pattern where core business logic is decoupled from infrastructure, with storage operations abstracted through well-defined interfaces in the `ports/` directory and concrete implementations in `adapters/`.
|
||||
|
||||
### When to Use DeltaGlider
|
||||
|
||||
@@ -651,14 +657,8 @@ MIT - Use it freely in your projects.
|
||||
|
||||
## Success Stories
|
||||
|
||||
> "We reduced our artifact storage from 4TB to 5GB. This isn't hyperbole—it's math."
|
||||
> — [ReadOnlyREST Case Study](docs/case-study-readonlyrest.md)
|
||||
|
||||
> "Our CI/CD pipeline now uploads 100x faster. Deploys that took minutes now take seconds."
|
||||
> — Platform Engineer at [redacted]
|
||||
|
||||
> "We were about to buy expensive deduplication storage. DeltaGlider saved us $50K/year."
|
||||
> — CTO at [stealth startup]
|
||||
> "We reduced our artifact storage from 4TB to 5GB. CI is also much faster, due to smaller uploads."
|
||||
> — [ReadonlyREST Case Study](docs/case-study-readonlyrest.md)
|
||||
|
||||
---
|
||||
|
||||
@@ -670,4 +670,10 @@ deltaglider analyze s3://your-bucket/
|
||||
# Output: "Potential savings: 95.2% (4.8TB → 237GB)"
|
||||
```
|
||||
|
||||
Built with ❤️ by engineers who were tired of paying to store the same bytes over and over.
|
||||
## Who built this?
|
||||
|
||||
Built with ❤️ by [ReadonlyREST](https://readonlyrest.com) engineers who were tired of paying to store the same bytes over and over.
|
||||
|
||||
We also built [Anaphora](https://anaphora.it) for aggregated reports and alerting
|
||||
|
||||
And [Deltaglider Commander](https://github.com/beshu-tech/deltaglider_commander)
|
||||
|
||||
@@ -1,347 +1,76 @@
|
||||
# Case Study: How ReadOnlyREST Reduced Storage Costs by 99.9% with DeltaGlider
|
||||
## How ReadonlyREST Cut 4TB of S3 Storage Down to 5GB (and Saved 99.9%)
|
||||
|
||||
## Executive Summary
|
||||
### TL;DR
|
||||
|
||||
**The Challenge**: ReadOnlyREST, a security plugin for Elasticsearch, was facing exponential storage costs managing 145 release versions across multiple product lines, consuming nearly 4TB of S3 storage.
|
||||
We were paying to store 4TB of mostly identical plugin builds.
|
||||
DeltaGlider deduplicated everything down to 4.9GB — 99.9% smaller, $1.1k/year cheaper, and no workflow changes.
|
||||
|
||||
**The Solution**: DeltaGlider, an intelligent delta compression system that reduced storage from 4,060GB to just 4.9GB.
|
||||
#### The Problem
|
||||
|
||||
**The Impact**:
|
||||
- 💰 **$1,119 annual savings** on storage costs
|
||||
- 📉 **99.9% reduction** in storage usage
|
||||
- ⚡ **Zero changes** to existing workflows
|
||||
- ✅ **Full data integrity** maintained
|
||||
ReadonlyREST supports ~150 Elasticsearch/Kibana versions × multiple product lines × all our own releases.
|
||||
After years of publishing builds, our S3 archive hit `4TB` (201,840 files, $93/month).
|
||||
Glacier helped, but restoring files took 48 hours — useless for CI/CD.
|
||||
|
||||
---
|
||||
Every plugin ZIP was ~82MB, but `99.7% identical` to the next one. We were paying to store duplicates.
|
||||
|
||||
## The Storage Crisis
|
||||
#### The Fix: DeltaGlider
|
||||
|
||||
### The Numbers That Kept Us Up at Night
|
||||
DeltaGlider stores binary deltas between similar files instead of full copies.
|
||||
|
||||
ReadOnlyREST maintains a comprehensive release archive:
|
||||
- **145 version folders** (v1.50.0 through v1.66.1)
|
||||
- **201,840 total files** to manage
|
||||
- **3.96 TB** of S3 storage consumed
|
||||
- **$1,120/year** in storage costs alone
|
||||
|
||||
Each version folder contained:
|
||||
- 513 plugin ZIP files (one for each Elasticsearch version)
|
||||
- 879 checksum files (SHA1 and SHA512)
|
||||
- 3 product lines (Enterprise, Pro, Free)
|
||||
|
||||
### The Hidden Problem
|
||||
|
||||
What made this particularly painful wasn't just the size—it was the **redundancy**. Each 82.5MB plugin ZIP was 99.7% identical to others in the same version, differing only in minor Elasticsearch compatibility adjustments. We were essentially storing the same data hundreds of times.
|
||||
|
||||
> "We were paying to store 4TB of data that was fundamentally just variations of the same ~250MB of unique content. It felt like photocopying War and Peace 500 times because each copy had a different page number."
|
||||
>
|
||||
> — *DevOps Lead*
|
||||
|
||||
---
|
||||
|
||||
## Enter DeltaGlider
|
||||
|
||||
### The Lightbulb Moment
|
||||
|
||||
The breakthrough came when we realized we didn't need to store complete files—just the *differences* between them. DeltaGlider applies this principle automatically:
|
||||
|
||||
1. **First file becomes the reference** (stored in full)
|
||||
2. **Similar files store only deltas** (typically 0.3% of original size)
|
||||
3. **Different files uploaded directly** (no delta overhead)
|
||||
|
||||
### Implementation: Surprisingly Simple
|
||||
|
||||
```bash
|
||||
# Before DeltaGlider (standard S3 upload)
|
||||
aws s3 cp readonlyrest-1.66.1_es8.0.0.zip s3://releases/
|
||||
# Size on S3: 82.5MB
|
||||
|
||||
# With DeltaGlider
|
||||
deltaglider cp readonlyrest-1.66.1_es8.0.0.zip s3://releases/
|
||||
# Size on S3: 65KB (99.92% smaller!)
|
||||
# Before
|
||||
```
|
||||
aws s3 cp readonlyrest-1.66.1_es8.0.0.zip s3://releases/ # 82MB
|
||||
```
|
||||
|
||||
The beauty? **Zero changes to our build pipeline**. DeltaGlider works as a drop-in replacement for S3 uploads.
|
||||
|
||||
---
|
||||
|
||||
## The Results: Beyond Our Expectations
|
||||
|
||||
### Storage Transformation
|
||||
|
||||
# After
|
||||
```
|
||||
BEFORE DELTAGLIDER AFTER DELTAGLIDER
|
||||
━━━━━━━━━━━━━━━━━ ━━━━━━━━━━━━━━━━
|
||||
4,060 GB (3.96 TB) → 4.9 GB
|
||||
$93.38/month → $0.11/month
|
||||
201,840 files → 201,840 files (same!)
|
||||
deltaglider cp readonlyrest-1.66.1_es8.0.0.zip s3://releases/ # 65KB
|
||||
```
|
||||
|
||||
### Real Performance Metrics
|
||||
Drop-in replacement for `aws s3 cp`. No pipeline changes.
|
||||
Data integrity checked with SHA256, stored as metadata in S3.
|
||||
|
||||
From our actual production deployment:
|
||||
|
||||
| Metric | Value | Impact |
|
||||
|--------|-------|--------|
|
||||
| **Compression Ratio** | 99.9% | Near-perfect deduplication |
|
||||
| **Delta Size** | ~65KB per 82.5MB file | 1/1,269th of original |
|
||||
| **Upload Speed** | 3-4 files/second | Faster than raw S3 uploads |
|
||||
| **Download Speed** | Transparent reconstruction | No user impact |
|
||||
| **Storage Savings** | 4,055 GB | Enough for 850,000 more files |
|
||||
### The Result
|
||||
|
||||
### Version-to-Version Comparison
|
||||
| Metric | Before | After | Δ |
|
||||
|-------------- |----------|----------|--------------|
|
||||
| Storage | 4.06TB | 4.9GB | -99.9% |
|
||||
| Cost | $93/mo | $0.11/mo | -$1,119/yr |
|
||||
| Files | 201,840 | 201,840 | identical |
|
||||
| Upload speed | 1x | 3–4x | faster |
|
||||
|
||||
Testing between similar versions showed incredible efficiency:
|
||||
Each “different” ZIP? Just a 65KB delta.
|
||||
Reconstruction time: <100ms.
|
||||
Zero user impact.
|
||||
|
||||
|
||||
## Under the Hood
|
||||
|
||||
Uses xdelta3 diffs.
|
||||
• Keeps one reference per group
|
||||
• Stores deltas for near-identical files
|
||||
• Skips small or text-based ones (.sha, .json, etc.)
|
||||
|
||||
It’s smart enough to decide what’s worth diffing automatically.
|
||||
|
||||
|
||||
## Payoff
|
||||
• 4TB → 5GB overnight
|
||||
• Uploads 1,200× faster
|
||||
• CI bandwidth cut 99%
|
||||
• 100% checksum verified integrity
|
||||
• Zero vendor lock-in (open source)
|
||||
|
||||
## Takeaways
|
||||
|
||||
If You Ship Versioned Artifacts
|
||||
|
||||
This will probably save you four figures and hours of upload time per year.
|
||||
|
||||
```
|
||||
readonlyrest-1.66.1_es7.17.0.zip (82.5MB) → reference.bin (82.5MB)
|
||||
readonlyrest-1.66.1_es7.17.1.zip (82.5MB) → 64KB delta (0.08% size)
|
||||
readonlyrest-1.66.1_es7.17.2.zip (82.5MB) → 65KB delta (0.08% size)
|
||||
...
|
||||
readonlyrest-1.66.1_es8.15.0.zip (82.5MB) → 71KB delta (0.09% size)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Technical Deep Dive
|
||||
|
||||
### How DeltaGlider Achieves 99.9% Compression
|
||||
|
||||
DeltaGlider uses binary diff algorithms (xdelta3) to identify and store only the bytes that change between files:
|
||||
|
||||
```python
|
||||
# Simplified concept
|
||||
reference = "readonlyrest-1.66.1_es7.17.0.zip" # 82.5MB
|
||||
new_file = "readonlyrest-1.66.1_es7.17.1.zip" # 82.5MB
|
||||
|
||||
delta = binary_diff(reference, new_file) # 65KB
|
||||
# Delta contains only:
|
||||
# - Elasticsearch version string changes
|
||||
# - Compatibility metadata updates
|
||||
# - Build timestamp differences
|
||||
```
|
||||
|
||||
### Intelligent File Type Detection
|
||||
|
||||
Not every file benefits from delta compression. DeltaGlider automatically:
|
||||
|
||||
- **Applies delta compression to**: `.zip`, `.tar`, `.gz`, `.dmg`, `.jar`, `.war`
|
||||
- **Uploads directly**: `.txt`, `.sha1`, `.sha512`, `.json`, `.md`
|
||||
|
||||
This intelligence meant our 127,455 checksum files were uploaded directly, avoiding unnecessary processing overhead.
|
||||
|
||||
### Architecture That Scales
|
||||
|
||||
```
|
||||
┌─────────────┐ ┌──────────────┐ ┌─────────────┐
|
||||
│ Client │────▶│ DeltaGlider │────▶│ S3/MinIO │
|
||||
│ (CI/CD) │ │ │ │ │
|
||||
└─────────────┘ └──────────────┘ └─────────────┘
|
||||
│
|
||||
┌──────▼───────┐
|
||||
│ Local Cache │
|
||||
│ (References) │
|
||||
└──────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Business Impact
|
||||
|
||||
### Immediate ROI
|
||||
|
||||
- **Day 1**: 99.9% storage reduction
|
||||
- **Month 1**: $93 saved
|
||||
- **Year 1**: $1,119 saved
|
||||
- **5 Years**: $5,595 saved (not counting growth)
|
||||
|
||||
### Hidden Benefits We Didn't Expect
|
||||
|
||||
1. **Faster Deployments**: Uploading 65KB deltas is 1,200x faster than 82.5MB files
|
||||
2. **Reduced Bandwidth**: CI/CD pipeline bandwidth usage dropped 99%
|
||||
3. **Improved Reliability**: Fewer timeout errors on large file uploads
|
||||
4. **Better Compliance**: Automatic SHA256 integrity verification on every operation
|
||||
|
||||
### Environmental Impact
|
||||
|
||||
> "Reducing storage by 4TB means fewer drives spinning in data centers. It's a small contribution to our sustainability goals, but every bit counts."
|
||||
>
|
||||
> — *CTO*
|
||||
|
||||
---
|
||||
|
||||
## Implementation Journey
|
||||
|
||||
### Week 1: Proof of Concept
|
||||
- Tested with 10 files
|
||||
- Achieved 99.6% compression
|
||||
- Decision to proceed
|
||||
|
||||
### Week 2: Production Rollout
|
||||
- Uploaded all 201,840 files
|
||||
- Zero errors or failures
|
||||
- Immediate cost reduction
|
||||
|
||||
### Week 3: Integration
|
||||
```bash
|
||||
# Simple integration into our CI/CD
|
||||
- aws s3 cp $FILE s3://releases/
|
||||
+ deltaglider cp $FILE s3://releases/
|
||||
```
|
||||
|
||||
### Week 4: Full Migration
|
||||
- All build pipelines updated
|
||||
- Developer documentation completed
|
||||
- Monitoring dashboards configured
|
||||
|
||||
---
|
||||
|
||||
## Lessons Learned
|
||||
|
||||
### What Worked Well
|
||||
|
||||
1. **Drop-in replacement**: No architectural changes needed
|
||||
2. **Automatic intelligence**: File type detection "just worked"
|
||||
3. **Preservation of structure**: Directory hierarchy maintained perfectly
|
||||
|
||||
### Challenges Overcome
|
||||
|
||||
1. **Initial skepticism**: "99.9% compression sounds too good to be true"
|
||||
- *Solution*: Live demonstration with real data
|
||||
|
||||
2. **Download concerns**: "Will it be slow to reconstruct files?"
|
||||
- *Solution*: Benchmarking showed <100ms reconstruction time
|
||||
|
||||
3. **Reliability questions**: "What if the reference file is corrupted?"
|
||||
- *Solution*: SHA256 verification on every operation
|
||||
|
||||
---
|
||||
|
||||
## For Decision Makers
|
||||
|
||||
### Why This Matters
|
||||
|
||||
Storage costs scale linearly with data growth. Without DeltaGlider:
|
||||
- Next 145 versions: Additional $1,120/year
|
||||
- 5-year projection: $11,200 in storage alone
|
||||
- Opportunity cost: Resources that could fund innovation
|
||||
|
||||
### Risk Assessment
|
||||
|
||||
| Risk | Mitigation | Status |
|
||||
|------|------------|--------|
|
||||
| Vendor lock-in | Open-source, standards-based | ✅ Mitigated |
|
||||
| Data corruption | SHA256 verification built-in | ✅ Mitigated |
|
||||
| Performance impact | Faster than original | ✅ No risk |
|
||||
| Complexity | Drop-in replacement | ✅ No risk |
|
||||
|
||||
### Strategic Advantages
|
||||
|
||||
1. **Cost Predictability**: Storage costs become negligible
|
||||
2. **Scalability**: Can handle 100x more versions in same space
|
||||
3. **Competitive Edge**: More resources for product development
|
||||
4. **Green IT**: Reduced carbon footprint from storage
|
||||
|
||||
---
|
||||
|
||||
## For Engineers
|
||||
|
||||
### Getting Started
|
||||
|
||||
```bash
|
||||
# Install DeltaGlider
|
||||
pip install deltaglider
|
||||
|
||||
# Upload a file (automatic compression)
|
||||
deltaglider cp my-release-v1.0.0.zip s3://releases/
|
||||
|
||||
# Download (automatic reconstruction)
|
||||
deltaglider cp s3://releases/my-release-v1.0.0.zip .
|
||||
|
||||
# It's that simple.
|
||||
deltaglider cp my-release.zip s3://releases/
|
||||
```
|
||||
|
||||
### Performance Characteristics
|
||||
|
||||
```python
|
||||
# Compression ratios by similarity
|
||||
identical_files: 99.9% # Same file, different name
|
||||
minor_changes: 99.7% # Version bumps, timestamps
|
||||
moderate_changes: 95.0% # Feature additions
|
||||
major_changes: 70.0% # Significant refactoring
|
||||
completely_different: 0% # No compression (uploaded as-is)
|
||||
```
|
||||
|
||||
### Integration Examples
|
||||
|
||||
**GitHub Actions**:
|
||||
```yaml
|
||||
- name: Upload Release
|
||||
run: deltaglider cp dist/*.zip s3://releases/${{ github.ref_name }}/
|
||||
```
|
||||
|
||||
**Jenkins Pipeline**:
|
||||
```groovy
|
||||
sh "deltaglider cp ${WORKSPACE}/target/*.jar s3://artifacts/"
|
||||
```
|
||||
|
||||
**Python Script**:
|
||||
```python
|
||||
from deltaglider import DeltaService
|
||||
service = DeltaService(bucket="releases")
|
||||
service.put("my-app-v2.0.0.zip", "v2.0.0/")
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## The Bottom Line
|
||||
|
||||
DeltaGlider transformed our storage crisis into a solved problem:
|
||||
|
||||
- ✅ **4TB → 5GB** storage reduction
|
||||
- ✅ **$1,119/year** saved
|
||||
- ✅ **Zero** workflow disruption
|
||||
- ✅ **100%** data integrity maintained
|
||||
|
||||
For ReadOnlyREST, DeltaGlider wasn't just a cost-saving tool—it was a glimpse into the future of intelligent storage. When 99.9% of your data is redundant, why pay to store it 500 times?
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
### For Your Organization
|
||||
|
||||
1. **Identify similar use cases**: Version releases, backups, build artifacts
|
||||
2. **Run the calculator**: `[Your files] × [Versions] × [Similarity] = Savings`
|
||||
3. **Start small**: Test with one project's releases
|
||||
4. **Scale confidently**: Deploy across all similar data
|
||||
|
||||
### Get Started Today
|
||||
|
||||
```bash
|
||||
# See your potential savings
|
||||
git clone https://github.com/beshu-tech/deltaglider
|
||||
cd deltaglider
|
||||
python calculate_savings.py --path /your/releases
|
||||
|
||||
# Try it yourself
|
||||
docker run -p 9000:9000 minio/minio # Local S3
|
||||
pip install deltaglider
|
||||
deltaglider cp your-file.zip s3://test/
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## About ReadOnlyREST
|
||||
|
||||
ReadOnlyREST is the enterprise security plugin for Elasticsearch and OpenSearch, protecting clusters in production since 2015. Learn more at [readonlyrest.com](https://readonlyrest.com)
|
||||
|
||||
## About DeltaGlider
|
||||
|
||||
DeltaGlider is an open-source delta compression system for S3-compatible storage, turning redundant data into remarkable savings. Built with modern Python, containerized for portability, and designed for scale.
|
||||
|
||||
---
|
||||
|
||||
*"In a world where storage is cheap but not free, and data grows exponentially but changes incrementally, DeltaGlider represents a fundamental shift in how we think about storing versioned artifacts."*
|
||||
|
||||
**— ReadOnlyREST Engineering Team**
|
||||
That’s it.
|
||||
Reference in New Issue
Block a user