mirror of
https://github.com/beshu-tech/deltaglider.git
synced 2026-05-24 07:47:05 +02:00
2d345bc663
The Rust `deltaglider_proxy` ships proxy + CLI + UI in one binary with a byte-identical wire format. Maintaining both has been a duplication tax (metadata-namespace fix v6.1.2 had to land twice). This release is the final feature release; security/bug fixes stop here.

What this commit does:
- CLI: every invocation prints a deprecation notice to stderr pointing at github.com/beshu-tech/deltaglider_proxy with a one-line migration alias (`alias dg='deltaglider_proxy s3'`). Banner prints once per process; suppress via DG_SUPPRESS_DEPRECATION=1 for CI that hasn't migrated yet.
- README: prominent deprecation banner at the top with the migration command and the archive-timing notice (~1 week after v6.2.0 ships).
- pyproject.toml: description prefixed with "DEPRECATED" so PyPI search results show the warning. Classifier moved Beta -> Inactive.
- CHANGELOG: v6.2.0 entry under "Deprecated" documenting the migration path + archive plan, preserving the carried-forward Fixed/Changed/Added items from Unreleased.

Repo archive timing: Maintainer will archive ~1 week after v6.2.0 hits PyPI to give users a window to see the stderr notice on their next update. PyPI installs continue to work indefinitely.

No behaviour changes to the wire format, the CLI surface, or the metadata schema. Existing buckets remain readable forever.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[6.2.0] - 2026-05-22 — Final release; project deprecated
Deprecated
- The `deltaglider` Python package is deprecated as of this release. The canonical implementation is now `deltaglider_proxy`, a single Rust binary that ships the S3-compatible proxy, the `s3` CLI (every Python subcommand has a 1:1 Rust equivalent), and the web UI. Wire format is byte-identical: data written by this tool is readable by `deltaglider_proxy` and vice versa.
- Every CLI invocation now prints a deprecation banner to stderr. Set `DG_SUPPRESS_DEPRECATION=1` to silence it for CI/automation that hasn't migrated yet.
- PyPI classifier bumped to `Development Status :: 7 - Inactive`.
- Repo will be archived approximately one week after this release. PyPI installs continue to work indefinitely (PyPI never deletes published versions), but no further updates or security fixes will land. File new issues against `deltaglider_proxy`.
Migration:
brew install beshu-tech/tap/deltaglider_proxy
# or grab a binary from
# https://github.com/beshu-tech/deltaglider_proxy/releases
alias dg='deltaglider_proxy s3'
dg cp foo s3://bucket/foo
dg ls s3://bucket
dg migrate s3://src s3://dest
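The once-per-process banner and its `DG_SUPPRESS_DEPRECATION` opt-out described above could look roughly like the sketch below. This is an illustration only, not the project's code: the function name `emit_deprecation_notice`, the module-level flag, and the banner wording are assumptions. Only the environment variable name and the alias come from this entry.

```python
import os
import sys

# Hypothetical banner text; the real wording may differ.
_BANNER = (
    "DEPRECATED: the deltaglider Python package is deprecated in favour of "
    "deltaglider_proxy.\n"
    "Migrate with: alias dg='deltaglider_proxy s3'\n"
)

# Module-level flag so the notice is printed at most once per process.
_banner_printed = False


def emit_deprecation_notice(stream=None) -> bool:
    """Write the banner to stderr (or `stream`) once; return True if written."""
    global _banner_printed
    if _banner_printed:
        return False
    # CI that has not migrated yet can opt out via the environment.
    if os.environ.get("DG_SUPPRESS_DEPRECATION") == "1":
        return False
    if stream is None:
        stream = sys.stderr
    stream.write(_BANNER)
    _banner_printed = True
    return True
```

Writing to stderr rather than stdout keeps the notice out of piped command output, which matters for scripts that parse `ls` or `stats --json` results.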
Fixed (carried from Unreleased)
- Direct-upload metadata now uses the canonical `dg-*` dashed namespace. Pre-fix, files routed through `_upload_direct` (non-delta-eligible extensions: `.sha1`, `.sha512`, etc.) wrote metadata with bare underscored keys (`original_name`, `file_sha256`, `compression`) while delta and reference uploads correctly used the namespaced form (`dg-original-name`, `dg-file-sha256`, `dg-compression`). Downstream consumers, most visibly the DeltaGlider Proxy, only recognised the dashed form, so every `.sha1`/`.sha512` listing triggered a `PATHOLOGICAL | Missing/corrupt DG metadata` warning. Aligned the writer to the canonical scheme so new uploads stop producing log spam.
Changed
- Read path now resolves both schemes uniformly. The historical bare keys (`original_name`, `compression`, etc.) stay in `METADATA_KEY_ALIASES` so already-stored objects keep being recognised on read; no migration is required. Replaced ad-hoc `metadata.get("compression")` / `metadata.get("original_name")` / `metadata.get("file_size")` / `metadata.get("ref_key")` lookups in `DeltaService.get`, `DeltaService.delete`, `_delete_delta`, the recursive-delete listing path, `client.list_objects_v2`, and `client_operations.stats.get_object_info` with `resolve_metadata(meta, field)` calls so both schemes work transparently for the lifetime of the bucket. New `compression` and `source_name` entries added to the alias table.
- `DeltaService.get` "regular S3 vs DeltaGlider-managed" dispatch now uses `resolve_metadata` for the `file_sha256` presence check. Pre-fix, this check looked for the literal string `"dg-file-sha256"` in `obj_head.metadata`, which silently misclassified legacy bare-keyed direct uploads (`file_sha256` without the `dg-` prefix) as "regular S3 objects". They still served correctly because both branches call `_get_direct`, but the wrong log line fired and the wrong `file_size` value was recorded for telemetry. Caught during adversarial PR review.
Added
- Regression tests for the dual-scheme contract (`tests/unit/test_metadata_aliases.py`, 11 tests): every alias resolves, new dashed keys win when both are present, empty strings count as missing, the alias-table shape is pinned (first alias dashed, bare underscored alias always present, `compression` + `source_name` present).
- `test_direct_upload_emits_dashed_namespace` in `test_core_service.py` pins the writer to emit `dg-*`-only metadata so the original underscored regression cannot return.
- `test_get_legacy_direct_upload_not_misclassified_as_regular_s3` in `test_core_service.py` pins the `get()` dispatch to route bare-keyed legacy direct uploads through the DeltaGlider-managed branch (not the "regular S3 object" passthrough). Demonstrated to fail without the corresponding `resolve_metadata` swap, pass with it.
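The dual-scheme read contract pinned by these tests (dashed key wins, bare key still resolves, empty string counts as missing) can be sketched as follows. The names `METADATA_KEY_ALIASES` and `resolve_metadata(meta, field)` appear in this entry, but the table contents and the function body here are assumptions made for illustration; the real alias table has more entries and may be shaped differently.

```python
from typing import Mapping, Optional

# Illustrative alias table: canonical dashed key first, legacy bare key after.
# Only three fields are shown; the entries are assumptions based on the text.
METADATA_KEY_ALIASES: dict[str, tuple[str, ...]] = {
    "file_sha256": ("dg-file-sha256", "file_sha256"),
    "original_name": ("dg-original-name", "original_name"),
    "compression": ("dg-compression", "compression"),
}


def resolve_metadata(meta: Mapping[str, str], field: str) -> Optional[str]:
    """Return the first non-empty value stored under any alias of `field`."""
    for key in METADATA_KEY_ALIASES.get(field, (field,)):
        value = meta.get(key)
        if value:  # empty strings are treated as missing
            return value
    return None
```

With a lookup like this, a presence check such as `resolve_metadata(obj_meta, "file_sha256") is not None` treats legacy and current objects the same way, which is the behaviour the `DeltaService.get` dispatch fix relies on.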
[6.1.1] - 2026-03-23
Fixed
- S3-Compatible Endpoint Support: Disabled boto3 automatic request checksums (CRC32/CRC64) that were added in boto3 1.36+. S3-compatible stores like Hetzner Object Storage reject these headers with `BadRequest`, breaking direct (non-delta) file uploads. Sets `request_checksum_calculation="when_required"` to restore compatibility while still working with AWS S3.
- CI: LocalStack pinned to 4.4. `localstack/localstack:latest` now requires a paid license; pinned to last free version across all workflows and docker-compose files.
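For readers applying the same checksum workaround in their own boto3 code, a minimal sketch follows. The helper names `checksum_compat_options` and `make_s3_client` are hypothetical; `request_checksum_calculation` is a botocore `Config` option available from the 1.36 line onward, so older versions are expected to reject the keyword.

```python
from typing import Any, Optional


def checksum_compat_options() -> dict[str, str]:
    """Config options that limit request checksums to operations that need them."""
    return {"request_checksum_calculation": "when_required"}


def make_s3_client(endpoint_url: Optional[str] = None) -> Any:
    """Build an S3 client with automatic request checksums turned off."""
    # Imported lazily so the options helper stays usable without boto3 installed.
    import boto3
    from botocore.config import Config

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        config=Config(**checksum_compat_options()),
    )
```

Passing `endpoint_url=None` leaves the default AWS endpoint in place, so the same factory can serve AWS S3 and S3-compatible stores.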
Changed
- Dependency Pinning: All runtime dependencies now use major-version upper bounds (`boto3>=1.35.0,<2.0.0`, etc.) to prevent surprise breaking changes in Docker builds.
Added
- S3 Compatibility Tests: New `test_s3_compat.py` unit tests verifying the boto3 client disables automatic checksums and `put_object` doesn't pass checksum kwargs; regression protection for non-AWS S3 endpoints.
- Dependency Management Guide: Added quarterly dependency refresh checklist and known compatibility constraints to CLAUDE.md.
6.1.0 - 2025-02-07
Added
- Bucket ACL Management: New `put_bucket_acl()` and `get_bucket_acl()` methods
  - boto3-compatible passthrough to native S3 ACL operations
  - Supports canned ACLs (`private`, `public-read`, `public-read-write`, `authenticated-read`)
  - Supports grant-based ACLs (`GrantRead`, `GrantWrite`, `GrantFullControl`, etc.)
  - Supports full `AccessControlPolicy` dict for fine-grained control
  - SDK method count increased from 21 to 23
- New CLI Commands: `deltaglider put-bucket-acl` and `deltaglider get-bucket-acl`
  - Mirrors `aws s3api put-bucket-acl`/`get-bucket-acl` syntax
  - Accepts bucket name or `s3://bucket` URL format
  - JSON output for `get-bucket-acl` (compatible with AWS CLI)
  - Supports `--endpoint-url`, `--region`, `--profile` flags
- Docker Publishing: Added GitHub Actions workflow for multi-arch Docker image builds (amd64/arm64)
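As a rough usage sketch for the ACL methods listed above: because they are described as a boto3-compatible passthrough, the example assumes they accept boto3's keyword arguments (`Bucket`, `ACL`). That signature is an assumption, and the helper name `make_bucket_public_read` is hypothetical. The function takes any client object exposing the two methods.

```python
from typing import Any


def make_bucket_public_read(client: Any, bucket: str) -> dict:
    """Apply the canned `public-read` ACL, then read the bucket ACL back."""
    # Keyword names follow boto3's put_bucket_acl / get_bucket_acl.
    client.put_bucket_acl(Bucket=bucket, ACL="public-read")
    return client.get_bucket_acl(Bucket=bucket)
```

Reading the ACL back right after setting it is a cheap way to confirm that an S3-compatible store actually honoured the request.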
Changed
- Refactor: Extracted `DeltaGliderConfig` dataclass for centralized configuration management
- Refactor: Introduced typed `DeleteResult` and `RecursiveDeleteResult` dataclasses replacing raw dicts
- Refactor: Centralized S3 metadata key aliases into `core/models.py` constants
- Refactor: Extracted helper methods in `DeltaService` for improved readability
Fixed
- Removed unused imports flagged by ruff in test files
Documentation
- Updated BOTO3_COMPATIBILITY.md (coverage 20% → 23%)
- Updated AWS S3 CLI compatibility docs with ACL command examples
- Refreshed README with dark mode logo and streamlined content
- Cleaned up SDK documentation and examples
6.0.0 - 2025-10-17
Added
- EC2 Region Detection & Cost Optimization
  - Automatic detection of EC2 instance region using IMDSv2
  - Warns when EC2 region ≠ S3 client region (potential cross-region charges)
  - Different warnings for auto-detected vs. explicit `--region` flag mismatches
  - Green checkmark when regions are aligned (optimal configuration)
  - Can be disabled with `DG_DISABLE_EC2_DETECTION=true` environment variable
  - Helps users optimize for cost and performance before migration starts
- New CLI Command: `deltaglider migrate` for S3-to-S3 bucket migration with compression
  - Supports resume capability (skips already migrated files)
  - Real-time progress tracking with file count and statistics
  - Interactive confirmation prompt (use `--yes` to skip)
  - Prefix preservation by default (use `--no-preserve-prefix` to disable)
  - Dry run mode with `--dry-run` flag
  - Include/exclude pattern filtering
  - Shows compression statistics after migration
  - EC2-aware region logging: Detects EC2 instance and warns about cross-region charges
  - FIXED: Now correctly preserves original filenames during migration
- S3-to-S3 Recursive Copy: `deltaglider cp -r s3://source/ s3://dest/` now supported
  - Automatically uses migration functionality with prefix preservation
  - Applies delta compression during transfer
  - Preserves original filenames correctly
- Version Command: Added `--version` flag to show deltaglider version
  - Usage: `deltaglider --version`
- DeltaService API Enhancement: Added `override_name` parameter to `put()` method
  - Allows specifying destination filename independently of source filesystem path
  - Enables proper S3-to-S3 transfers without filesystem renaming tricks
- Rehydration & Purge: Automatic rehydration of delta-compressed files for presigned URL access
  - New `deltaglider purge` CLI command to clean expired temporary files
- Metadata Namespace: Centralized `dg-` prefixed metadata keys for all DeltaGlider metadata
- S3-Based Stats Caching: Bucket statistics cached in S3 with automatic invalidation
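The EC2 region detection item above relies on IMDSv2, which needs a session token before any metadata read. A minimal standard-library sketch of that flow follows. The function names, the timeout, and the warning text are assumptions; the link-local address, the token header names, and the `placement/region` path are the standard instance metadata interface.

```python
import urllib.error
import urllib.request
from typing import Optional

IMDS_BASE = "http://169.254.169.254/latest"


def detect_ec2_region(timeout: float = 1.0) -> Optional[str]:
    """Return the instance region via IMDSv2, or None when not running on EC2."""
    try:
        # Step 1: obtain a short-lived session token (IMDSv2 requires a PUT).
        token_req = urllib.request.Request(
            f"{IMDS_BASE}/api/token",
            method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        )
        with urllib.request.urlopen(token_req, timeout=timeout) as resp:
            token = resp.read().decode()
        # Step 2: read the region, presenting the token.
        region_req = urllib.request.Request(
            f"{IMDS_BASE}/meta-data/placement/region",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(region_req, timeout=timeout) as resp:
            return resp.read().decode().strip() or None
    except (urllib.error.URLError, OSError, ValueError):
        # Off EC2 the endpoint is unreachable; treat that as "unknown".
        return None


def region_warning(ec2_region: Optional[str], s3_region: str) -> Optional[str]:
    """Return a warning when the two regions differ, otherwise None."""
    if ec2_region is None or ec2_region == s3_region:
        return None
    return (
        f"EC2 region {ec2_region} != S3 region {s3_region}: "
        "cross-region charges may apply"
    )
```

The short timeout keeps the check cheap on machines that are not EC2 instances, where the metadata address simply does not answer.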
Fixed
- Critical: S3-to-S3 migration now preserves original filenames
  - Previously created files with temp names like `tmp1b9cpdsn.zip`
  - Now correctly uses original filenames from source S3 keys
  - Fixed by adding `override_name` parameter to `DeltaService.put()`
- CLI Region Support: `--region` flag now properly passes region to boto3 client
  - Previously only set environment variable, relied on boto3 auto-detection
  - Now explicitly passes `region_name` to `boto3.client()` via `boto3_kwargs`
  - Ensures consistent behavior with `DeltaGliderClient` SDK
Changed
- Recursive S3-to-S3 copy operations now preserve source prefix structure by default
- Migration operations show formatted output with source and destination paths
Documentation
- Added comprehensive migration guide in README.md
- Updated CLI reference with migrate command examples
- Added prefix preservation behavior documentation
[5.1.1] - 2025-01-10
Fixed
- Stats Command: Fixed incorrect compression ratio calculations
- Now correctly counts ALL files including reference.bin in compressed size
- Fixed handling of orphaned reference.bin files (reference files with no delta files)
- Added prominent warnings for orphaned reference files with cleanup commands
- Fixed stats for buckets with no compression (now shows 0% instead of negative)
- SHA1 checksum files are now properly included in calculations
Improved
- Stats Performance: Optimized metadata fetching with parallel requests
- 5-10x faster for buckets with many delta files
- Uses ThreadPoolExecutor for concurrent HEAD requests
- Single-pass calculation algorithm for better efficiency
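The parallel metadata fetch mentioned above can be sketched with `ThreadPoolExecutor`. This is a generic illustration, not the project's code: `fetch_metadata_parallel` and the `head` callable are hypothetical names, with `head` standing in for whatever issues one HEAD request per key.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def fetch_metadata_parallel(
    keys: Iterable[str],
    head: Callable[[str], dict],
    max_workers: int = 10,
) -> dict[str, dict]:
    """Call `head(key)` for every key concurrently and map key -> result."""
    key_list = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        results = pool.map(head, key_list)
        return dict(zip(key_list, results))
```

Threads suit this workload because each HEAD request spends most of its time waiting on the network rather than using the CPU.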
5.1.0 - 2025-10-10
Added
- New CLI Command: `deltaglider stats <bucket>` for bucket statistics and compression metrics
  - Supports `--detailed` flag for comprehensive analysis
  - Supports `--json` flag for machine-readable output
  - Accepts multiple formats: `s3://bucket/`, `s3://bucket`, `bucket`
- Session-Level Statistics Caching: Bucket stats now cached per client instance
  - Automatic cache invalidation on mutations (put, delete, bucket operations)
  - Intelligent cache reuse (detailed stats serve quick stat requests)
  - Enhanced `list_buckets()` includes cached stats when available
- Programmatic Cache Management: Added cache management APIs for long-running applications
  - `clear_cache()`: Clear all cached references
  - `evict_cache()`: Remove specific cached reference
  - Session-scoped cache lifecycle management
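The session-level stats caching rules above (detailed stats can answer quick requests, and any mutation invalidates the entry) can be illustrated with a small class. This is a sketch under assumptions: `StatsCache` and its methods are hypothetical names and do not describe the client's internal structure.

```python
from typing import Optional


class StatsCache:
    """Per-session bucket stats cache; a sketch, not the library's class."""

    def __init__(self) -> None:
        # bucket name -> (stats dict, whether the stats are the detailed kind)
        self._entries: dict[str, tuple[dict, bool]] = {}

    def get(self, bucket: str, detailed: bool) -> Optional[dict]:
        entry = self._entries.get(bucket)
        if entry is None:
            return None
        stats, is_detailed = entry
        # Detailed stats can serve a quick request, but not the reverse.
        if detailed and not is_detailed:
            return None
        return stats

    def put(self, bucket: str, stats: dict, detailed: bool) -> None:
        self._entries[bucket] = (stats, detailed)

    def invalidate(self, bucket: str) -> None:
        """Call after put/delete/bucket operations that change the contents."""
        self._entries.pop(bucket, None)
```

Keeping the cache on the client instance means it disappears with the session, so stale numbers cannot outlive the process that computed them.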
Changed
- Bucket statistics are now cached within client session for performance
- `list_buckets()` response includes `DeltaGliderStats` metadata when cached
Documentation
- Added comprehensive DG_MAX_RATIO tuning guide in docs/
- Updated CLI command reference in CLAUDE.md and README.md
- Added detailed cache management documentation
5.0.3 - 2025-10-10
Security
- BREAKING: Removed all legacy shared cache code for security
- BREAKING: Encryption is now ALWAYS ON (cannot be disabled)
- Ephemeral process-isolated cache is now the ONLY mode (no opt-out)
- Content-Addressed Storage (CAS): Implemented SHA256-based cache storage
  - Zero collision risk (SHA256 namespace guarantees uniqueness)
  - Automatic deduplication (same content = same filename)
  - Tampering protection (changing content changes SHA, breaks lookup)
  - Two-level directory structure for filesystem optimization
- Encrypted Cache: All cache data encrypted at rest using Fernet (AES-128-CBC + HMAC)
  - Ephemeral encryption keys per process (forward secrecy)
  - Optional persistent keys via `DG_CACHE_ENCRYPTION_KEY` for shared filesystems
  - Automatic cleanup of corrupted cache files on decryption failures
- Fixed TOCTOU vulnerabilities with atomic SHA validation at use-time
- Added `get_validated_ref()` method to prevent cache poisoning
- Eliminated multi-user data exposure through mandatory cache isolation
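The content-addressed layout and validate-at-use idea from the list above can be sketched as follows. The exact directory split (`ab/cd/<sha>`) and the helper names `cas_path`, `store`, and `load_validated` are assumptions for illustration; the changelog only states that the structure has two levels and that the SHA is checked at use time. Encryption is left out to keep the example on the standard library.

```python
import hashlib
from pathlib import Path
from typing import Optional


def cas_path(root: Path, sha256_hex: str) -> Path:
    """Map a digest to root/ab/cd/<digest> (assumed two-level fan-out)."""
    return root / sha256_hex[:2] / sha256_hex[2:4] / sha256_hex


def store(root: Path, data: bytes) -> str:
    """Write `data` under its own SHA256 and return the digest."""
    digest = hashlib.sha256(data).hexdigest()
    path = cas_path(root, digest)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)  # identical content lands on the same path
    return digest


def load_validated(root: Path, sha256_hex: str) -> Optional[bytes]:
    """Return the bytes only if they still hash to the requested digest."""
    path = cas_path(root, sha256_hex)
    if not path.is_file():
        return None
    data = path.read_bytes()
    # Hash the bytes that will actually be returned, not the file on disk,
    # so a swap between check and use cannot slip through.
    if hashlib.sha256(data).hexdigest() != sha256_hex:
        path.unlink()
        return None
    return data
```

Validating the in-memory bytes rather than re-reading the file is what closes the check-then-use gap: the caller only ever receives data that has just been hashed.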
Removed
- BREAKING: Removed `DG_UNSAFE_SHARED_CACHE` environment variable
- BREAKING: Removed `DG_CACHE_DIR` environment variable
- BREAKING: Removed `DG_CACHE_ENCRYPTION` environment variable (encryption always on)
- BREAKING: Removed `cache_dir` parameter from `create_client()`
Changed
- Cache is now auto-created in `/tmp/deltaglider-*` and cleaned on exit
- All cache operations use file locking (Unix) and SHA validation
- Added `CacheMissError` and `CacheCorruptionError` exceptions
Added
- New `ContentAddressedCache` adapter in `adapters/cache_cas.py`
- New `EncryptedCache` wrapper in `adapters/cache_encrypted.py`
- New `MemoryCache` adapter in `adapters/cache_memory.py` with LRU eviction
- Self-describing cache structure with SHA256-based filenames
- Configurable cache backends via `DG_CACHE_BACKEND` (filesystem or memory)
- Memory cache size limit via `DG_CACHE_MEMORY_SIZE_MB` (default: 100MB)
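A size-bounded in-memory cache with LRU eviction, as described for the memory backend, can be sketched with `OrderedDict`. The class name `LruMemoryCache` and its interface are hypothetical and are not the `MemoryCache` adapter's actual API; the byte limit plays the role that `DG_CACHE_MEMORY_SIZE_MB` plays in the real backend.

```python
from collections import OrderedDict
from typing import Optional


class LruMemoryCache:
    """Byte-bounded cache that evicts the least recently used entry first."""

    def __init__(self, max_bytes: int) -> None:
        self._max_bytes = max_bytes
        self._items = OrderedDict()  # key -> bytes, oldest entry first
        self._size = 0

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        if key in self._items:
            self._size -= len(self._items.pop(key))
        self._items[key] = value
        self._size += len(value)
        # Evict from the oldest end until the total fits the limit again.
        while self._size > self._max_bytes and self._items:
            _, evicted = self._items.popitem(last=False)
            self._size -= len(evicted)
```

Note that a value larger than the whole limit is evicted immediately after insertion, which keeps the size bound strict at the cost of never caching oversized entries.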
Internal
- Updated all tests to use Content-Addressed Storage and encryption
- All 119 tests passing with zero errors (99 original + 20 new cache tests)
- Type checking: 0 errors (mypy)
- Linting: All checks passed (ruff)
- Completed Phase 1, 2, and 7 of SECURITY_FIX_ROADMAP.md
- Added comprehensive test suites for encryption (13 tests) and memory cache (10 tests)
5.0.1 - 2025-01-10
Changed
- Code Organization: Refactored client.py from 1560 to 1154 lines (26% reduction)
  - Extracted client operations into modular `client_operations/` package:
    - `bucket.py` - S3 bucket management operations
    - `presigned.py` - Presigned URL generation
    - `batch.py` - Batch upload/download operations
    - `stats.py` - Analytics and statistics operations
  - Improved code maintainability with logical separation of concerns
  - Better developer experience with cleaner module structure
Internal
- Full type safety maintained with mypy (0 errors)
- All 99 tests passing
- Code quality checks passing (ruff)
- No breaking changes - all public APIs remain unchanged
5.0.0 - 2025-01-10
Added
- boto3-compatible TypedDict types for S3 responses (no boto3 import needed)
- Complete boto3 compatibility vision document
- Type-safe response builders using TypedDict patterns
Changed
- BREAKING: `list_objects()` now returns boto3-compatible dict instead of custom dataclass
  - Use `response['Contents']` instead of `response.contents`
  - Use `response.get('IsTruncated')` instead of `response.is_truncated`
  - Use `response.get('NextContinuationToken')` instead of `response.next_continuation_token`
  - DeltaGlider metadata now in `Metadata` field of each object
- Internal response building now uses TypedDict for compile-time type safety
- All S3 responses are dicts at runtime (TypedDict is a dict!)
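To show what the dict-based responses mean for calling code, here is a small pagination sketch. The `TypedDict` definitions are simplified assumptions covering only the fields named above, and `iter_objects` takes a hypothetical `list_page` callable instead of a real client so that no particular method signature is implied.

```python
from typing import Callable, Iterator, Optional, TypedDict


class S3Object(TypedDict, total=False):
    Key: str
    Size: int
    Metadata: dict[str, str]  # DeltaGlider metadata lives here


class ListObjectsResponse(TypedDict, total=False):
    Contents: list[S3Object]
    IsTruncated: bool
    NextContinuationToken: str


def iter_objects(
    list_page: Callable[[Optional[str]], ListObjectsResponse],
) -> Iterator[S3Object]:
    """Yield every object, following continuation tokens page by page."""
    token: Optional[str] = None
    while True:
        response = list_page(token)
        # Plain dict access: the response is an ordinary dict at runtime.
        yield from response.get("Contents", [])
        if not response.get("IsTruncated"):
            break
        token = response.get("NextContinuationToken")
```

With a real client, `list_page` would wrap the `list_objects()` call and pass the token through as its continuation argument; the exact keyword is left to the SDK documentation.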
Fixed
- Updated all documentation examples to use dict-based responses
- Fixed pagination examples in README and API docs
- Corrected SDK documentation with accurate method signatures
4.2.4 - 2025-01-10
Fixed
- Show only filename in `ls` output instead of full path for cleaner display
- Correct `ls` command path handling and prefix display logic
4.2.3 - 2025-01-07
Added
- Comprehensive test coverage for `delete_objects_recursive()` method with 19 thorough tests
- Tests cover delta suffix handling, error/warning aggregation, statistics tracking, and edge cases
- Better code organization with separate `client_models.py` and `client_delete_helpers.py` modules
Fixed
- Fixed all mypy type errors using proper `cast()` for type safety
- Improved type hints for dictionary operations in client code
Changed
- Refactored client code into logical modules for better maintainability
- Enhanced code quality with comprehensive linting and type checking
- All 99 integration/unit tests passing with zero type errors
Internal
- Better separation of concerns in client module
- Improved developer experience with clearer code structure
4.2.2 - 2024-10-06
Fixed
- Add `.delta` suffix fallback for `delete_object()` method
- Handle regular S3 objects without DeltaGlider metadata
- Update mypy type ignore comment for compatibility
4.2.1 - 2024-10-06
Fixed
- Make GitHub release creation non-blocking in workflows
4.2.0 - 2024-10-03
Added
- AWS credential parameters to `create_client()` function
- Support for custom endpoint URLs
- Enhanced boto3 compatibility
4.1.0 - 2024-09-29
Added
- boto3-compatible client API
- Bucket management methods
- Comprehensive SDK documentation
4.0.0 - 2024-09-21
Added
- Initial public release
- CLI with AWS S3 compatibility
- Delta compression for versioned artifacts
- 99%+ compression for similar files