Simone Scarduzio 2d345bc663 v6.2.0: deprecate Python deltaglider in favor of deltaglider_proxy (Rust) (#9)
The Rust `deltaglider_proxy` ships proxy + CLI + UI in one binary with a
byte-identical wire format. Maintaining both has been a duplication tax
(metadata-namespace fix v6.1.2 had to land twice). This release is the
final feature release; security/bug fixes stop here.

What this commit does:

- CLI: every invocation prints a deprecation notice to stderr pointing
  at github.com/beshu-tech/deltaglider_proxy with a one-line migration
  alias (`alias dg='deltaglider_proxy s3'`). Banner prints once per
  process; suppress via DG_SUPPRESS_DEPRECATION=1 for CI that hasn't
  migrated yet.
- README: prominent deprecation banner at the top with the migration
  command and the archive-timing notice (~1 week after v6.2.0 ships).
- pyproject.toml: description prefixed with "DEPRECATED" so PyPI search
  results show the warning. Classifier moved Beta -> Inactive.
- CHANGELOG: v6.2.0 entry under "Deprecated" documenting the migration
  path + archive plan, preserving the carried-forward Fixed/Changed/
  Added items from Unreleased.

Repo archive timing: Maintainer will archive ~1 week after v6.2.0 hits
PyPI to give users a window to see the stderr notice on their next
update. PyPI installs continue to work indefinitely.

No behaviour changes to the wire format, the CLI surface, or the
metadata schema. Existing buckets remain readable forever.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:57:19 +02:00

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

6.2.0 - 2026-05-22 - Final release; project deprecated

Deprecated

  • The deltaglider Python package is deprecated as of this release. The canonical implementation is now deltaglider_proxy, a single Rust binary that ships the S3-compatible proxy, the s3 CLI (every Python subcommand has a 1:1 Rust equivalent), and the web UI. Wire format is byte-identical: data written by this tool is readable by deltaglider_proxy and vice versa.
  • Every CLI invocation now prints a deprecation banner to stderr. Set DG_SUPPRESS_DEPRECATION=1 to silence it for CI/automation that hasn't migrated yet.
  • PyPI classifier bumped to Development Status :: 7 - Inactive.
  • Repo will be archived approximately one week after this release. PyPI installs continue to work indefinitely (PyPI never deletes published versions), but no further updates or security fixes will land. File new issues against deltaglider_proxy.

Migration:

brew install beshu-tech/tap/deltaglider_proxy
# or grab a binary from
# https://github.com/beshu-tech/deltaglider_proxy/releases

alias dg='deltaglider_proxy s3'
dg cp foo s3://bucket/foo
dg ls s3://bucket
dg migrate s3://src s3://dest
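
The once-per-process banner and its opt-out can be pictured with a short sketch. This is a minimal illustration, not the package's code: the banner text and the function name `maybe_print_deprecation` are hypothetical, and only the `DG_SUPPRESS_DEPRECATION=1` variable and the stderr destination come from the notes above.

```python
import os
import sys

# Hypothetical banner text; the real wording is not given in this changelog.
_BANNER = (
    "DEPRECATED: deltaglider (Python) is superseded by deltaglider_proxy. "
    "Migrate with: alias dg='deltaglider_proxy s3'"
)
_banner_shown = False  # module-level flag gives "once per process" behaviour


def maybe_print_deprecation(stream=None, env=None) -> bool:
    """Print the banner at most once; return True if it was printed."""
    global _banner_shown
    stream = sys.stderr if stream is None else stream
    env = os.environ if env is None else env
    # Suppressed for CI/automation that sets DG_SUPPRESS_DEPRECATION=1.
    if _banner_shown or env.get("DG_SUPPRESS_DEPRECATION") == "1":
        return False
    print(_BANNER, file=stream)
    _banner_shown = True
    return True
```

Writing to stderr keeps the notice out of stdout, so scripts that parse command output should be unaffected.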

Fixed (carried from Unreleased)

  • Direct-upload metadata now uses the canonical dg-* dashed namespace. Pre-fix, files routed through _upload_direct (non-delta-eligible extensions: .sha1, .sha512, etc.) wrote metadata with bare underscored keys (original_name, file_sha256, compression) while delta and reference uploads correctly used the namespaced form (dg-original-name, dg-file-sha256, dg-compression). Downstream consumers — most visibly the DeltaGlider Proxy — only recognised the dashed form, so every .sha1/.sha512 listing triggered a PATHOLOGICAL | Missing/corrupt DG metadata warning. Aligned the writer to the canonical scheme so new uploads stop producing log spam.

Changed

  • Read path now resolves both schemes uniformly. The historical bare keys (original_name, compression, etc.) stay in METADATA_KEY_ALIASES so already-stored objects keep being recognised on read — no migration required. Replaced ad-hoc metadata.get("compression") / metadata.get("original_name") / metadata.get("file_size") / metadata.get("ref_key") lookups in DeltaService.get, DeltaService.delete, _delete_delta, the recursive-delete listing path, client.list_objects_v2, and client_operations.stats.get_object_info with resolve_metadata(meta, field) calls so both schemes work transparently for the lifetime of the bucket. New compression and source_name entries added to the alias table.
  • DeltaService.get "regular S3 vs DeltaGlider-managed" dispatch now uses resolve_metadata for the file_sha256 presence check. Pre-fix, this check looked for the literal string "dg-file-sha256" in obj_head.metadata, which silently misclassified legacy bare-keyed direct uploads (file_sha256 without the dg- prefix) as "regular S3 objects" — they still served correctly because both branches call _get_direct, but the wrong log line fired and the wrong file_size value was recorded for telemetry. Caught during adversarial PR review.
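
The dual-scheme read path described above could look roughly like the following. This is a sketch under assumptions: the names `METADATA_KEY_ALIASES` and `resolve_metadata(meta, field)` appear in these notes, but the table layout, the entries shown, and the function body are illustrative guesses based on the documented contract (dashed key first, legacy bare key still accepted, empty strings treated as missing).

```python
from typing import Mapping, Optional

# Assumed shape: logical field -> aliases in priority order,
# canonical dashed key first, legacy bare key after it.
METADATA_KEY_ALIASES: dict[str, tuple[str, ...]] = {
    "original_name": ("dg-original-name", "original_name"),
    "file_sha256": ("dg-file-sha256", "file_sha256"),
    "compression": ("dg-compression", "compression"),
}


def resolve_metadata(meta: Mapping[str, str], field: str) -> Optional[str]:
    """Return the first non-empty value among the aliases of `field`."""
    for key in METADATA_KEY_ALIASES.get(field, (field,)):
        value = meta.get(key)
        if value:  # empty strings count as missing
            return value
    return None
```

With a table like this, a presence check such as `resolve_metadata(meta, "file_sha256") is not None` recognises both new and legacy direct uploads, which is the behaviour the dispatch fix relies on.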

Added

  • Regression tests for the dual-scheme contract (tests/unit/test_metadata_aliases.py, 11 tests): every alias resolves, new dashed keys win when both are present, empty strings count as missing, the alias-table shape is pinned (first alias dashed, bare underscored alias always present, compression + source_name present).
  • test_direct_upload_emits_dashed_namespace in test_core_service.py pins the writer to emit dg-*-only metadata so the original underscored regression cannot return.
  • test_get_legacy_direct_upload_not_misclassified_as_regular_s3 in test_core_service.py pins the get() dispatch to route bare-keyed legacy direct uploads through the DeltaGlider-managed branch (not the "regular S3 object" passthrough). Demonstrated to fail without the corresponding resolve_metadata swap, pass with it.

6.1.1 - 2026-03-23

Fixed

  • S3-Compatible Endpoint Support: Disabled boto3 automatic request checksums (CRC32/CRC64) that were added in boto3 1.36+. S3-compatible stores like Hetzner Object Storage reject these headers with BadRequest, breaking direct (non-delta) file uploads. Sets request_checksum_calculation="when_required" to restore compatibility while still working with AWS S3.
  • CI: LocalStack pinned to 4.4. localstack/localstack:latest now requires a paid license, so all workflows and docker-compose files are pinned to the last free version.
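
For readers applying the same checksum workaround in their own boto3 code, a minimal sketch follows. The `request_checksum_calculation="when_required"` setting is the one named above and is a botocore `Config` option in boto3 1.36 and later; the helper names `s3_config_kwargs` and `make_s3_client` are hypothetical and not part of deltaglider.

```python
def s3_config_kwargs() -> dict:
    """Keyword arguments for botocore.config.Config (boto3 >= 1.36)."""
    return {
        # Only compute request checksums when the operation requires them,
        # instead of attaching CRC headers to every upload.
        "request_checksum_calculation": "when_required",
    }


def make_s3_client(endpoint_url=None):
    """Build an S3 client with automatic request checksums disabled."""
    # Imported lazily so the helper above stays importable without boto3.
    import boto3
    from botocore.config import Config

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        config=Config(**s3_config_kwargs()),
    )
```

Passing `endpoint_url=None` leaves the default AWS endpoint in place, so the same helper can serve both AWS S3 and S3-compatible stores.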

Changed

  • Dependency Pinning: All runtime dependencies now use major-version upper bounds (boto3>=1.35.0,<2.0.0, etc.) to prevent surprise breaking changes in Docker builds.

Added

  • S3 Compatibility Tests: New test_s3_compat.py unit tests verifying the boto3 client disables automatic checksums and put_object doesn't pass checksum kwargs — regression protection for non-AWS S3 endpoints.
  • Dependency Management Guide: Added quarterly dependency refresh checklist and known compatibility constraints to CLAUDE.md.

6.1.0 - 2025-02-07

Added

  • Bucket ACL Management: New put_bucket_acl() and get_bucket_acl() methods
    • boto3-compatible passthrough to native S3 ACL operations
    • Supports canned ACLs (private, public-read, public-read-write, authenticated-read)
    • Supports grant-based ACLs (GrantRead, GrantWrite, GrantFullControl, etc.)
    • Supports full AccessControlPolicy dict for fine-grained control
    • SDK method count increased from 21 to 23
  • New CLI Commands: deltaglider put-bucket-acl and deltaglider get-bucket-acl
    • Mirrors aws s3api put-bucket-acl / get-bucket-acl syntax
    • Accepts bucket name or s3://bucket URL format
    • JSON output for get-bucket-acl (compatible with AWS CLI)
    • Supports --endpoint-url, --region, --profile flags
  • Docker Publishing: Added GitHub Actions workflow for multi-arch Docker image builds (amd64/arm64)

Changed

  • Refactor: Extracted DeltaGliderConfig dataclass for centralized configuration management
  • Refactor: Introduced typed DeleteResult and RecursiveDeleteResult dataclasses replacing raw dicts
  • Refactor: Centralized S3 metadata key aliases into core/models.py constants
  • Refactor: Extracted helper methods in DeltaService for improved readability

Fixed

  • Removed unused imports flagged by ruff in test files

Documentation

  • Updated BOTO3_COMPATIBILITY.md (coverage 20% → 23%)
  • Updated AWS S3 CLI compatibility docs with ACL command examples
  • Refreshed README with dark mode logo and streamlined content
  • Cleaned up SDK documentation and examples

6.0.0 - 2025-10-17

Added

  • EC2 Region Detection & Cost Optimization
    • Automatic detection of EC2 instance region using IMDSv2
    • Warns when EC2 region ≠ S3 client region (potential cross-region charges)
    • Different warnings for auto-detected vs. explicit --region flag mismatches
    • Green checkmark when regions are aligned (optimal configuration)
    • Can be disabled with DG_DISABLE_EC2_DETECTION=true environment variable
    • Helps users optimize for cost and performance before migration starts
  • New CLI Command: deltaglider migrate for S3-to-S3 bucket migration with compression
    • Supports resume capability (skips already migrated files)
    • Real-time progress tracking with file count and statistics
    • Interactive confirmation prompt (use --yes to skip)
    • Prefix preservation by default (use --no-preserve-prefix to disable)
    • Dry run mode with --dry-run flag
    • Include/exclude pattern filtering
    • Shows compression statistics after migration
    • EC2-aware region logging: Detects EC2 instance and warns about cross-region charges
    • FIXED: Now correctly preserves original filenames during migration
  • S3-to-S3 Recursive Copy: deltaglider cp -r s3://source/ s3://dest/ now supported
    • Automatically uses migration functionality with prefix preservation
    • Applies delta compression during transfer
    • Preserves original filenames correctly
  • Version Command: Added --version flag to show deltaglider version
    • Usage: deltaglider --version
  • DeltaService API Enhancement: Added override_name parameter to put() method
    • Allows specifying destination filename independently of source filesystem path
    • Enables proper S3-to-S3 transfers without filesystem renaming tricks
  • Rehydration & Purge: Automatic rehydration of delta-compressed files for presigned URL access
    • New deltaglider purge CLI command to clean expired temporary files
  • Metadata Namespace: Centralized dg- prefixed metadata keys for all DeltaGlider metadata
  • S3-Based Stats Caching: Bucket statistics cached in S3 with automatic invalidation
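
The EC2 region check listed above can be sketched as follows. This is an illustration under assumptions, not the shipped code: the IMDSv2 token and region endpoints are the standard AWS ones, `DG_DISABLE_EC2_DETECTION` comes from these notes, and the function names and warning wording are hypothetical.

```python
import os
import urllib.request
from typing import Optional

IMDS = "http://169.254.169.254/latest"


def detect_ec2_region(timeout: float = 1.0) -> Optional[str]:
    """Return the instance region via IMDSv2, or None when not on EC2."""
    if os.environ.get("DG_DISABLE_EC2_DETECTION", "").lower() == "true":
        return None
    try:
        # IMDSv2 step 1: obtain a short-lived session token with a PUT.
        token_req = urllib.request.Request(
            f"{IMDS}/api/token",
            method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        )
        with urllib.request.urlopen(token_req, timeout=timeout) as resp:
            token = resp.read().decode()
        # IMDSv2 step 2: present the token when reading metadata.
        region_req = urllib.request.Request(
            f"{IMDS}/meta-data/placement/region",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(region_req, timeout=timeout) as resp:
            return resp.read().decode().strip()
    except OSError:  # covers URLError and socket timeouts
        return None


def region_warning(ec2_region: Optional[str], s3_region: str,
                   explicit_flag: bool) -> Optional[str]:
    """Return a warning when the regions differ, else None."""
    if ec2_region is None or ec2_region == s3_region:
        return None
    if explicit_flag:
        return (f"--region {s3_region} differs from EC2 region {ec2_region}; "
                "cross-region transfer charges may apply")
    return (f"auto-detected S3 region {s3_region} differs from EC2 region "
            f"{ec2_region}; cross-region transfer charges may apply")
```

Keeping the network lookup separate from the comparison makes the warning logic easy to test without an EC2 instance.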

Fixed

  • Critical: S3-to-S3 migration now preserves original filenames
    • Previously created files with temp names like tmp1b9cpdsn.zip
    • Now correctly uses original filenames from source S3 keys
    • Fixed by adding override_name parameter to DeltaService.put()
  • CLI Region Support: --region flag now properly passes region to boto3 client
    • Previously only set environment variable, relied on boto3 auto-detection
    • Now explicitly passes region_name to boto3.client() via boto3_kwargs
    • Ensures consistent behavior with DeltaGliderClient SDK
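
The filename fix can be pictured with a small sketch. Only the `override_name` parameter of `DeltaService.put()` is taken from these notes; the helper functions and their bodies below are hypothetical and show just the naming decision, not the upload itself.

```python
from pathlib import Path, PurePosixPath
from typing import Optional


def name_from_source_key(source_key: str) -> str:
    """Take the final path component of an S3 key as the original filename."""
    return PurePosixPath(source_key).name


def stored_name(local_file: Path, override_name: Optional[str] = None) -> str:
    """Pick the name to record: the override if given, else the local name."""
    return override_name if override_name else local_file.name
```

During an S3-to-S3 transfer the object is first downloaded to a temporary file, so without an override the temporary name (for example `tmp1b9cpdsn.zip`) would be recorded instead of the name in the source key.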

Changed

  • Recursive S3-to-S3 copy operations now preserve source prefix structure by default
  • Migration operations show formatted output with source and destination paths

Documentation

  • Added comprehensive migration guide in README.md
  • Updated CLI reference with migrate command examples
  • Added prefix preservation behavior documentation

5.1.1 - 2025-01-10

Fixed

  • Stats Command: Fixed incorrect compression ratio calculations
    • Now correctly counts ALL files including reference.bin in compressed size
    • Fixed handling of orphaned reference.bin files (reference files with no delta files)
    • Added prominent warnings for orphaned reference files with cleanup commands
    • Fixed stats for buckets with no compression (now shows 0% instead of negative)
    • SHA1 checksum files are now properly included in calculations
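
One way to read the corrected calculation is sketched below. The formula and the function name `savings_percent` are assumptions for illustration; the notes above only state that every stored object (deltas, reference.bin, directly stored files) counts toward the compressed size and that the result is clamped at 0%.

```python
from typing import Iterable


def savings_percent(original_sizes: Iterable[int],
                    stored_sizes: Iterable[int]) -> float:
    """Percentage of bytes saved, never negative.

    original_sizes: logical size of each user-visible file.
    stored_sizes: size of every object actually kept in the bucket,
    including reference files.
    """
    original = sum(original_sizes)
    stored = sum(stored_sizes)
    if original <= 0 or stored >= original:
        return 0.0  # no compression benefit is reported as 0%
    return (original - stored) / original * 100
```

For example, two 100-byte files stored as an 80-byte reference plus deltas of 15 and 5 bytes occupy 100 bytes in total, which this sketch reports as 50% saved.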

Improved

  • Stats Performance: Optimized metadata fetching with parallel requests
    • 5-10x faster for buckets with many delta files
    • Uses ThreadPoolExecutor for concurrent HEAD requests
    • Single-pass calculation algorithm for better efficiency
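
The concurrent metadata fetch can be sketched with the standard library alone. The function name `fetch_metadata_parallel`, the `head` callable, and the default worker count are hypothetical stand-ins; in real use `head` would wrap an S3 HEAD request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_metadata_parallel(keys: Iterable[str],
                            head: Callable[[str], dict],
                            max_workers: int = 10) -> Dict[str, dict]:
    """Call head(key) for every key on a thread pool and collect results."""
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so zip pairs each key with its result.
        return dict(zip(keys, pool.map(head, keys)))
```

Threads suit this workload because each HEAD request spends most of its time waiting on the network rather than on the CPU.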

5.1.0 - 2025-10-10

Added

  • New CLI Command: deltaglider stats <bucket> for bucket statistics and compression metrics
    • Supports --detailed flag for comprehensive analysis
    • Supports --json flag for machine-readable output
    • Accepts multiple formats: s3://bucket/, s3://bucket, bucket
  • Session-Level Statistics Caching: Bucket stats now cached per client instance
    • Automatic cache invalidation on mutations (put, delete, bucket operations)
    • Intelligent cache reuse (detailed stats serve quick stat requests)
    • Enhanced list_buckets() includes cached stats when available
  • Programmatic Cache Management: Added cache management APIs for long-running applications
    • clear_cache(): Clear all cached references
    • evict_cache(): Remove specific cached reference
    • Session-scoped cache lifecycle management
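
The session-level statistics caching rules above (invalidate on mutation, let detailed stats answer quick requests) can be sketched as a small class. `StatsCache` and its methods are hypothetical names for illustration, not the client's actual internals.

```python
from typing import Any, Dict, Optional, Tuple


class StatsCache:
    """Per-client cache of bucket statistics."""

    def __init__(self) -> None:
        # bucket name -> (was_detailed, stats payload)
        self._entries: Dict[str, Tuple[bool, Any]] = {}

    def get(self, bucket: str, detailed: bool) -> Optional[Any]:
        entry = self._entries.get(bucket)
        if entry is None:
            return None
        cached_detailed, stats = entry
        # Detailed stats can serve a quick request, but not the reverse.
        if detailed and not cached_detailed:
            return None
        return stats

    def put(self, bucket: str, detailed: bool, stats: Any) -> None:
        self._entries[bucket] = (detailed, stats)

    def invalidate(self, bucket: str) -> None:
        """Call after put, delete, or bucket operations on this bucket."""
        self._entries.pop(bucket, None)
```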

Changed

  • Bucket statistics are now cached within client session for performance
  • list_buckets() response includes DeltaGliderStats metadata when cached

Documentation

  • Added comprehensive DG_MAX_RATIO tuning guide in docs/
  • Updated CLI command reference in CLAUDE.md and README.md
  • Added detailed cache management documentation

5.0.3 - 2025-10-10

Security

  • BREAKING: Removed all legacy shared cache code for security
  • BREAKING: Encryption is now ALWAYS ON (cannot be disabled)
  • Ephemeral process-isolated cache is now the ONLY mode (no opt-out)
  • Content-Addressed Storage (CAS): Implemented SHA256-based cache storage
    • Negligible collision risk (finding two contents with the same SHA256 digest is computationally infeasible)
    • Automatic deduplication (same content = same filename)
    • Tampering protection (changing content changes SHA, breaks lookup)
    • Two-level directory structure for filesystem optimization
  • Encrypted Cache: All cache data encrypted at rest using Fernet (AES-128-CBC + HMAC)
    • Ephemeral encryption keys per process (forward secrecy)
    • Optional persistent keys via DG_CACHE_ENCRYPTION_KEY for shared filesystems
    • Automatic cleanup of corrupted cache files on decryption failures
  • Fixed TOCTOU vulnerabilities with atomic SHA validation at use-time
  • Added get_validated_ref() method to prevent cache poisoning
  • Eliminated multi-user data exposure through mandatory cache isolation
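
The content-addressed layout and use-time validation can be illustrated with a short sketch. The exact directory split, the helper names `cas_path`, `store`, and `read_validated`, and the local `CacheCorruptionError` stand-in are assumptions; encryption is omitted to keep the example dependency-free.

```python
import hashlib
from pathlib import Path


class CacheCorruptionError(Exception):
    """Local stand-in for the exception of the same name in these notes."""


def cas_path(root: Path, sha256_hex: str) -> Path:
    """Two-level fan-out (assumed layout): root/ab/cd/abcd..."""
    return root / sha256_hex[:2] / sha256_hex[2:4] / sha256_hex


def store(root: Path, data: bytes) -> str:
    """Write data under its own SHA256 digest and return the digest."""
    digest = hashlib.sha256(data).hexdigest()
    path = cas_path(root, digest)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return digest


def read_validated(root: Path, expected_sha: str) -> bytes:
    """Read once, then hash the bytes that will actually be used."""
    data = cas_path(root, expected_sha).read_bytes()
    if hashlib.sha256(data).hexdigest() != expected_sha:
        raise CacheCorruptionError(f"digest mismatch for {expected_sha}")
    return data
```

Hashing the in-memory bytes rather than re-checking the file avoids the gap between check and use: whatever is returned is exactly what was verified.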

Removed

  • BREAKING: Removed DG_UNSAFE_SHARED_CACHE environment variable
  • BREAKING: Removed DG_CACHE_DIR environment variable
  • BREAKING: Removed DG_CACHE_ENCRYPTION environment variable (encryption always on)
  • BREAKING: Removed cache_dir parameter from create_client()

Changed

  • Cache is now auto-created in /tmp/deltaglider-* and cleaned on exit
  • All cache operations use file locking (Unix) and SHA validation
  • Added CacheMissError and CacheCorruptionError exceptions

Added

  • New ContentAddressedCache adapter in adapters/cache_cas.py
  • New EncryptedCache wrapper in adapters/cache_encrypted.py
  • New MemoryCache adapter in adapters/cache_memory.py with LRU eviction
  • Self-describing cache structure with SHA256-based filenames
  • Configurable cache backends via DG_CACHE_BACKEND (filesystem or memory)
  • Memory cache size limit via DG_CACHE_MEMORY_SIZE_MB (default: 100MB)
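
A size-bounded in-memory cache with LRU eviction can be sketched in a few lines. This is not the `MemoryCache` adapter itself; the class name `LRUBytesCache` and its interface are hypothetical, and the byte limit plays the role that `DG_CACHE_MEMORY_SIZE_MB` plays in the notes above.

```python
from collections import OrderedDict
from typing import Optional


class LRUBytesCache:
    """Keep values in memory, evicting least recently used entries first."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self._size = 0

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        if key in self._items:
            self._size -= len(self._items.pop(key))
        self._items[key] = value
        self._size += len(value)
        # Evict from the oldest end until the total fits the limit.
        while self._size > self.max_bytes and self._items:
            _, evicted = self._items.popitem(last=False)
            self._size -= len(evicted)
```

Note that in this sketch a single value larger than the limit is evicted immediately after insertion, so it is never served from the cache.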

Internal

  • Updated all tests to use Content-Addressed Storage and encryption
  • All 119 tests passing with zero errors (99 original + 20 new cache tests)
  • Type checking: 0 errors (mypy)
  • Linting: All checks passed (ruff)
  • Completed Phase 1, 2, and 7 of SECURITY_FIX_ROADMAP.md
  • Added comprehensive test suites for encryption (13 tests) and memory cache (10 tests)

5.0.1 - 2025-01-10

Changed

  • Code Organization: Refactored client.py from 1560 to 1154 lines (26% reduction)
  • Extracted client operations into modular client_operations/ package:
    • bucket.py - S3 bucket management operations
    • presigned.py - Presigned URL generation
    • batch.py - Batch upload/download operations
    • stats.py - Analytics and statistics operations
  • Improved code maintainability with logical separation of concerns
  • Better developer experience with cleaner module structure

Internal

  • Full type safety maintained with mypy (0 errors)
  • All 99 tests passing
  • Code quality checks passing (ruff)
  • No breaking changes - all public APIs remain unchanged

5.0.0 - 2025-01-10

Added

  • boto3-compatible TypedDict types for S3 responses (no boto3 import needed)
  • Complete boto3 compatibility vision document
  • Type-safe response builders using TypedDict patterns

Changed

  • BREAKING: list_objects() now returns boto3-compatible dict instead of custom dataclass
    • Use response['Contents'] instead of response.contents
    • Use response.get('IsTruncated') instead of response.is_truncated
    • Use response.get('NextContinuationToken') instead of response.next_continuation_token
    • DeltaGlider metadata now in Metadata field of each object
  • Internal response building now uses TypedDict for compile-time type safety
  • All S3 responses are dicts at runtime (TypedDict is a dict!)
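
The dict-based access pattern can be sketched as follows. The `TypedDict` classes here are simplified stand-ins for the library's response types, `iter_keys` is a hypothetical helper, and the `ContinuationToken` request parameter is assumed to follow boto3's `list_objects_v2` convention.

```python
from typing import Callable, Dict, Iterator, List, TypedDict


class S3Object(TypedDict, total=False):
    Key: str
    Size: int
    Metadata: Dict[str, str]  # DeltaGlider metadata lives here


class ListObjectsResponse(TypedDict, total=False):
    Contents: List[S3Object]
    IsTruncated: bool
    NextContinuationToken: str


def iter_keys(list_page: Callable[..., ListObjectsResponse]) -> Iterator[str]:
    """Yield every key, following continuation tokens across pages."""
    kwargs: Dict[str, str] = {}
    while True:
        response = list_page(**kwargs)
        for obj in response.get("Contents", []):
            yield obj["Key"]
        if not response.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```

Because a `TypedDict` is a plain dict at runtime, code written against boto3 responses can consume these values without importing any library types.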

Fixed

  • Updated all documentation examples to use dict-based responses
  • Fixed pagination examples in README and API docs
  • Corrected SDK documentation with accurate method signatures

4.2.4 - 2025-01-10

Fixed

  • Show only filename in ls output instead of full path for cleaner display
  • Correct ls command path handling and prefix display logic

4.2.3 - 2025-01-07

Added

  • Comprehensive test coverage for delete_objects_recursive() method with 19 thorough tests
  • Tests cover delta suffix handling, error/warning aggregation, statistics tracking, and edge cases
  • Better code organization with separate client_models.py and client_delete_helpers.py modules

Fixed

  • Fixed all mypy type errors using proper cast() for type safety
  • Improved type hints for dictionary operations in client code

Changed

  • Refactored client code into logical modules for better maintainability
  • Enhanced code quality with comprehensive linting and type checking
  • All 99 integration/unit tests passing with zero type errors

Internal

  • Better separation of concerns in client module
  • Improved developer experience with clearer code structure

4.2.2 - 2024-10-06

Fixed

  • Add .delta suffix fallback for delete_object() method
  • Handle regular S3 objects without DeltaGlider metadata
  • Update mypy type ignore comment for compatibility
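
The `.delta` suffix fallback can be pictured with a minimal sketch. The helper name `resolve_delete_key` and the `exists` callable are hypothetical; the real method works against S3, while this version only shows the lookup order.

```python
from typing import Callable, Optional


def resolve_delete_key(key: str, exists: Callable[[str], bool]) -> Optional[str]:
    """Return the stored key to delete: the exact key, else key + '.delta'."""
    for candidate in (key, key + ".delta"):
        if exists(candidate):
            return candidate
    return None
```

Checking the exact key first means regular S3 objects without DeltaGlider metadata are handled before the delta-specific name is tried.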

4.2.1 - 2024-10-06

Fixed

  • Make GitHub release creation non-blocking in workflows

4.2.0 - 2024-10-03

Added

  • AWS credential parameters to create_client() function
  • Support for custom endpoint URLs
  • Enhanced boto3 compatibility

4.1.0 - 2024-09-29

Added

  • boto3-compatible client API
  • Bucket management methods
  • Comprehensive SDK documentation

4.0.0 - 2024-09-21

Added

  • Initial public release
  • CLI with AWS S3 compatibility
  • Delta compression for versioned artifacts
  • 99%+ compression for similar files