RFC: deltaglider – Delta-Aware S3 File Storage Wrapper
=====================================================

Author: [Senior Architect]
Status: Draft
Date: 2025-09-21
Version: 0.1

Preface
-------

Storing large binary artifacts (e.g., ZIP plugins, deliverables) on Amazon S3 is costly when multiple versions differ by only a few kilobytes. Current practice uploads every version in full, which wastes space and increases transfer times.

deltaglider is a CLI tool that transparently reduces this storage overhead by representing a directory of similar large files as:

- A single reference file (reference.bin) in each DeltaSpace S3 prefix.
- A set of delta files (.delta) encoding the differences against that reference.

Storage for a DeltaSpace then shrinks to roughly one full file plus the sum of the (typically small) deltas, while keeping simple copy-style semantics.

Goals
-----

1. Save S3 space by storing only one full copy of similar files per DeltaSpace and small binary deltas for subsequent versions.
2. Transparent developer workflow – deltaglider put/get mirrors aws s3 cp.
3. Minimal state management – no manifests, no external databases.
4. Integrity assurance – strong hashing (SHA256) stored in metadata, verified on upload and restore.
5. Extensibility – simple metadata keys as a base for future optimizations.

Non-Goals
---------

- Deduplication across multiple directories/prefixes.
- Streaming delta generation across multiple references (there is always exactly one reference per DeltaSpace).
- Automatic background compaction or garbage collection.

Terminology
-----------

- DeltaSpace: An S3 prefix containing related files for delta compression.
- Reference file: The first file uploaded to a DeltaSpace, stored as reference.bin.
- Delta file: The result of running xdelta3 against the reference, named .delta.

Architecture
------------

Reference Selection
- The first file uploaded to a DeltaSpace becomes the reference.
- It is stored as reference.bin.
- The original filename is preserved in the metadata of both reference.bin and the zero-diff delta.
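The reference-selection rule above can be expressed as a small, I/O-free planning step. The following is a minimal Python sketch under stated assumptions: `plan_put`, `PlannedUpload` and the `delta_key` parameter are hypothetical names, not part of deltaglider; S3 access and the xdelta3 invocation are omitted; and only a subset of the required metadata keys is filled in (created_at, ref_sha256 and delta_size are left out). The caller passes the delta's object key in, because this sketch deliberately does not fix a naming scheme for .delta objects.

```python
import hashlib
from dataclasses import dataclass, field

# Fixed name of the reference object inside each DeltaSpace prefix.
REFERENCE_NAME = "reference.bin"


@dataclass
class PlannedUpload:
    key: str                                       # S3 key the object would be written to
    kind: str                                      # "reference", "zero-diff delta" or "delta"
    metadata: dict = field(default_factory=dict)   # S3 user metadata (string values only)


def reference_key(prefix: str) -> str:
    """Build the reference.bin key for a DeltaSpace prefix (hypothetical helper)."""
    base = prefix.strip("/")
    return f"{base}/{REFERENCE_NAME}" if base else REFERENCE_NAME


def plan_put(prefix: str, existing_keys: set, delta_key: str,
             original_name: str, data: bytes) -> list:
    """Decide which objects a put would write (hypothetical sketch, no I/O).

    `delta_key` is supplied by the caller; this sketch does not derive it.
    Only a subset of the required metadata keys is populated.
    """
    ref_key = reference_key(prefix)
    base_meta = {
        "tool": "deltaglider/0.1.0",
        "original_name": original_name,                    # preserved on every object
        "file_sha256": hashlib.sha256(data).hexdigest(),
        "file_size": str(len(data)),                       # S3 metadata values are strings
    }
    if ref_key not in existing_keys:
        # First upload into this DeltaSpace: the file becomes reference.bin,
        # and a zero-diff delta is written that also records the original name.
        return [
            PlannedUpload(ref_key, "reference", dict(base_meta)),
            PlannedUpload(delta_key, "zero-diff delta", {**base_meta, "ref_key": ref_key}),
        ]
    # A reference already exists: only a delta against it is uploaded.
    return [PlannedUpload(delta_key, "delta", {**base_meta, "ref_key": ref_key})]
```

Keeping the decision separate from I/O makes it easy to unit-test; an actual put would then generate the delta bytes with xdelta3 against the (cached) reference and upload each planned object with its metadata.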
Delta Creation
- Every subsequent upload is turned into a delta file:
    xdelta3 -e -9 -s reference.bin .delta
- The result is uploaded under the name .delta.
- Metadata includes:
  - original_name, file_sha256, file_size, created_at, ref_key, ref_sha256, delta_size

Metadata Requirements
- Every S3 object uploaded by deltaglider must carry the following metadata keys:
  - tool: deltaglider/0.1.0
  - original_name
  - file_sha256
  - file_size
  - created_at
  - ref_key
  - ref_sha256
  - delta_size

Local Cache
- Path: /tmp/.deltaglider/reference_cache///reference.bin
- Lets deltas be computed, and files reconstructed, without downloading reference.bin from S3 on every operation.

CLI Specification
-----------------

deltaglider put
- If reference.bin does not exist: upload the file as reference.bin, then upload a zero-diff .delta.
- If reference.bin exists: create a delta and upload the .delta with metadata.
- Output a JSON summary.

deltaglider get > file.zip
- Download the reference (from the local cache or S3).
- Download the delta.
- Run xdelta3 to reconstruct the original file.

deltaglider verify
- Hydrate (reconstruct) the file locally.
- Recompute its SHA256.
- Compare the result against file_sha256 in the metadata.

Error Handling
--------------

- Abort if xdelta3 fails.
- Warn if required metadata is missing.
- Warn if the delta size exceeds a threshold (default: 0.5x the full file size).

Security Considerations
-----------------------

- Integrity is verified via SHA256.
- Metadata is treated as opaque.
- Required IAM permissions: s3:GetObject, s3:PutObject, s3:ListBucket, s3:DeleteObject.

Future Work
-----------

- Lazy caching of hydrated files.
- Support for other compression algorithms.
- Parallel restore for very large files.

End of RFC
==========