mirror of
https://github.com/beshu-tech/deltaglider.git
synced 2026-05-19 13:26:54 +02:00
d81240be80
* fix(metadata): align direct-upload keys to canonical dg-* namespace

  `_upload_direct` (the path taken by non-delta-eligible files like
  .sha1 / .sha512) wrote user-metadata with bare underscored keys
  (`original_name`, `file_sha256`, `compression`) while delta and
  reference uploads correctly used the canonical dashed namespace
  (`dg-original-name`, `dg-file-sha256`, `dg-compression`).

  Downstream consumers, most visibly the DeltaGlider Proxy, only
  recognised the dashed form, so every .sha1 / .sha512 listing on
  a bucket holding deltaglider-uploaded files produced:

      WARN PATHOLOGICAL | Missing/corrupt DG metadata for
        bucket/key.sha1 -- falling back to passthrough.
        Error: Storage error: Missing dg-original-name

  This patch aligns the writer to the canonical scheme and keeps the
  read path backward-compatible with already-stored bare-keyed objects
  via `resolve_metadata`. No re-upload required.

  Changes
  -------

  * `_upload_direct` emits metadata using `f"{METADATA_PREFIX}{key}"`
    (the same pattern delta/reference uploads already use).
  * `METADATA_KEY_ALIASES` now lists `compression` and `source_name`
    so `resolve_metadata` works for both fields uniformly.
  * Replaced bare `metadata.get("compression")` /
    `metadata.get("original_name")` / `metadata.get("file_size")` /
    `metadata.get("ref_key")` lookups in `DeltaService.get`,
    `DeltaService.delete`, `_delete_delta`, the recursive-delete
    listing path, `client.list_objects_v2`, and
    `client_operations.stats.get_object_info` with `resolve_metadata`
    calls so legacy bare-keyed objects keep working forever.

  Tests
  -----

  * `tests/unit/test_metadata_aliases.py` (new, 11 tests): pins the
    alias table contract: new dashed keys, legacy bare underscored
    keys, legacy hyphenated keys, priority rule, empty-string
    handling.
  * `test_direct_upload_emits_dashed_namespace` in
    `tests/unit/test_core_service.py`: pins the writer to emit only
    dg-* keys.
  * Existing tests using the legacy bare `compression: "none"` form
    in `test_s3_compat.py` and `test_recursive_delete_reference_*.py`
    still pass, proving the dual-scheme read contract holds.

  Full unit suite: 87/87 pass, mypy clean, ruff clean.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(metadata): also resolve legacy file_sha256 in get() dispatch

  Adversarial review of the original patch caught a second
  asymmetry: DeltaService.get's "is this a regular S3 object or
  DeltaGlider-managed?" dispatch was a literal-string check
  `"dg-file-sha256" not in obj_head.metadata`. After the writer
  fix, NEW direct uploads have `dg-file-sha256` so they route
  correctly. But ~4400 pre-fix `.sha1` / `.sha512` files in
  production have the bare `file_sha256` key, and they were
  silently being routed through the "regular S3 object" branch
  instead of the "direct upload" branch.

  Both branches call `_get_direct` so file content was still
  served correctly, but the wrong log message fired
  ("Downloading regular S3 object (no DeltaGlider metadata)") and
  the recorded file-size for telemetry came from obj_head.size
  instead of the metadata's `file_size` (same value for direct
  uploads, but still semantically wrong).

  Swap the literal-string check for `resolve_metadata(meta,
  "file_sha256") is None` so both schemes route to the
  DeltaGlider-managed branch.

  Added regression test
  `test_get_legacy_direct_upload_not_misclassified_as_regular_s3`
  that builds a HEAD response with the legacy bare-keyed metadata
  shape (exactly what's stored on Hetzner today for the .sha files),
  captures the log messages, and fails if the "regular S3 object"
  canary fires.

  Demonstrated locally: revert the dispatch back to literal-string
  check → new test fails with the canary log line. Restore →
  88/88 pass.

  CHANGELOG updated to document both fixes (writer + dispatch).

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
318 lines
17 KiB
Markdown
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Fixed

- **Direct-upload metadata now uses the canonical `dg-*` dashed namespace.** Pre-fix, files routed through `_upload_direct` (non-delta-eligible extensions: `.sha1`, `.sha512`, etc.) wrote metadata with bare underscored keys (`original_name`, `file_sha256`, `compression`) while delta and reference uploads correctly used the namespaced form (`dg-original-name`, `dg-file-sha256`, `dg-compression`). Downstream consumers, most visibly the [DeltaGlider Proxy](https://github.com/beshu-tech/deltaglider_proxy), only recognised the dashed form, so every `.sha1`/`.sha512` listing triggered a `PATHOLOGICAL | Missing/corrupt DG metadata` warning. Aligned the writer to the canonical scheme so new uploads stop producing log spam.

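
The writer-side change described above can be sketched roughly as follows. This is an illustration only: it assumes `METADATA_PREFIX` is the string `"dg-"` and that field names are hyphenated when prefixed (consistent with the `dg-original-name` form quoted above), and `build_direct_metadata` is a hypothetical helper, not the project's actual function.

```python
METADATA_PREFIX = "dg-"  # assumed value, inferred from the dg-* keys quoted above


def build_direct_metadata(fields: dict[str, str]) -> dict[str, str]:
    """Return user-metadata with every key in the dashed dg-* namespace."""
    return {
        f"{METADATA_PREFIX}{name.replace('_', '-')}": value
        for name, value in fields.items()
    }


# Example values are made up for illustration.
metadata = build_direct_metadata(
    {"original_name": "app.zip.sha1", "file_sha256": "ab12", "compression": "none"}
)
```

With a single helper producing the keys, the direct-upload path cannot drift away from the namespace that delta and reference uploads use.
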
### Changed

- **Read path now resolves both schemes uniformly.** The historical bare keys (`original_name`, `compression`, etc.) stay in `METADATA_KEY_ALIASES` so already-stored objects keep being recognised on read; no migration required. Replaced ad-hoc `metadata.get("compression")` / `metadata.get("original_name")` / `metadata.get("file_size")` / `metadata.get("ref_key")` lookups in `DeltaService.get`, `DeltaService.delete`, `_delete_delta`, the recursive-delete listing path, `client.list_objects_v2`, and `client_operations.stats.get_object_info` with `resolve_metadata(meta, field)` calls so both schemes work transparently for the lifetime of the bucket. New `compression` and `source_name` entries added to the alias table.
- **`DeltaService.get` "regular S3 vs DeltaGlider-managed" dispatch** now uses `resolve_metadata` for the `file_sha256` presence check. Pre-fix, this check looked for the literal string `"dg-file-sha256"` in `obj_head.metadata`, which silently misclassified legacy bare-keyed direct uploads (`file_sha256` without the `dg-` prefix) as "regular S3 objects". They still served correctly because both branches call `_get_direct`, but the wrong log line fired and the wrong `file_size` value was recorded for telemetry. Caught during adversarial PR review.

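
The dual-scheme lookup and the dispatch check can be pictured with the sketch below. It is not the project's code: the contents of the alias table, the function bodies, and the helper `is_deltaglider_managed` are assumptions chosen to match the behaviour described above (dashed key first, legacy spellings after it, empty strings treated as missing).

```python
from typing import Mapping, Optional

# Hypothetical alias table: canonical dashed key first, then legacy spellings.
METADATA_KEY_ALIASES: dict[str, tuple[str, ...]] = {
    "file_sha256": ("dg-file-sha256", "file-sha256", "file_sha256"),
    "compression": ("dg-compression", "compression"),
    "original_name": ("dg-original-name", "original-name", "original_name"),
}


def resolve_metadata(metadata: Mapping[str, str], field: str) -> Optional[str]:
    """Return the first non-empty value among the aliases of `field`, else None."""
    for key in METADATA_KEY_ALIASES.get(field, (field,)):
        value = metadata.get(key)
        if value:  # empty strings count as missing
            return value
    return None


def is_deltaglider_managed(metadata: Mapping[str, str]) -> bool:
    """Dispatch check: either naming scheme marks the object as managed."""
    return resolve_metadata(metadata, "file_sha256") is not None


legacy = {"file_sha256": "ab12", "compression": "none"}      # pre-fix direct upload
current = {"dg-file-sha256": "cd34", "file_sha256": "stale"}  # dashed key wins
```

A literal `"dg-file-sha256" in metadata` test would return `False` for `legacy`, which is the misclassification the dispatch fix removes.
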
### Added

- **Regression tests for the dual-scheme contract** (`tests/unit/test_metadata_aliases.py`, 11 tests): every alias resolves, new dashed keys win when both are present, empty strings count as missing, the alias-table shape is pinned (first alias dashed, bare underscored alias always present, `compression` + `source_name` present).
- **`test_direct_upload_emits_dashed_namespace`** in `test_core_service.py` pins the writer to emit `dg-*`-only metadata so the original underscored regression cannot return.
- **`test_get_legacy_direct_upload_not_misclassified_as_regular_s3`** in `test_core_service.py` pins the `get()` dispatch to route bare-keyed legacy direct uploads through the DeltaGlider-managed branch (not the "regular S3 object" passthrough). Demonstrated to fail without the corresponding `resolve_metadata` swap, pass with it.

## [6.1.1] - 2026-03-23

### Fixed

- **S3-Compatible Endpoint Support**: Disabled boto3 automatic request checksums (CRC32/CRC64) that were added in boto3 1.36+. S3-compatible stores like Hetzner Object Storage reject these headers with `BadRequest`, breaking direct (non-delta) file uploads. Sets `request_checksum_calculation="when_required"` to restore compatibility while still working with AWS S3.
- **CI: LocalStack pinned to 4.4**: `localstack/localstack:latest` now requires a paid license; pinned to last free version across all workflows and docker-compose files.

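
As a rough illustration of the checksum setting mentioned above, the sketch below passes `request_checksum_calculation="when_required"` through `botocore.config.Config`. The helper names and the endpoint argument are hypothetical, and older botocore releases that predate this option may reject the keyword.

```python
def s3_config_kwargs() -> dict[str, str]:
    """Keyword arguments intended for botocore.config.Config."""
    # Only compute request checksums when the operation requires them.
    return {"request_checksum_calculation": "when_required"}


def make_s3_client(endpoint_url: str):
    """Build an S3 client for an S3-compatible endpoint (hypothetical helper)."""
    # Imported lazily so s3_config_kwargs() stays usable without boto3 installed.
    import boto3
    from botocore.config import Config

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        config=Config(**s3_config_kwargs()),
    )
```

Keeping the option in one place makes it straightforward for a unit test to assert that the client is always built with it.
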
### Changed

- **Dependency Pinning**: All runtime dependencies now use major-version upper bounds (`boto3>=1.35.0,<2.0.0`, etc.) to prevent surprise breaking changes in Docker builds.

### Added

- **S3 Compatibility Tests**: New `test_s3_compat.py` unit tests verifying the boto3 client disables automatic checksums and `put_object` doesn't pass checksum kwargs, as regression protection for non-AWS S3 endpoints.
- **Dependency Management Guide**: Added quarterly dependency refresh checklist and known compatibility constraints to CLAUDE.md.

## [6.1.0] - 2025-02-07

### Added

- **Bucket ACL Management**: New `put_bucket_acl()` and `get_bucket_acl()` methods
  - boto3-compatible passthrough to native S3 ACL operations
  - Supports canned ACLs (`private`, `public-read`, `public-read-write`, `authenticated-read`)
  - Supports grant-based ACLs (`GrantRead`, `GrantWrite`, `GrantFullControl`, etc.)
  - Supports full `AccessControlPolicy` dict for fine-grained control
  - SDK method count increased from 21 to 23
- **New CLI Commands**: `deltaglider put-bucket-acl` and `deltaglider get-bucket-acl`
  - Mirrors `aws s3api put-bucket-acl` / `get-bucket-acl` syntax
  - Accepts bucket name or `s3://bucket` URL format
  - JSON output for `get-bucket-acl` (compatible with AWS CLI)
  - Supports `--endpoint-url`, `--region`, `--profile` flags
- **Docker Publishing**: Added GitHub Actions workflow for multi-arch Docker image builds (amd64/arm64)

### Changed

- **Refactor**: Extracted `DeltaGliderConfig` dataclass for centralized configuration management
- **Refactor**: Introduced typed `DeleteResult` and `RecursiveDeleteResult` dataclasses replacing raw dicts
- **Refactor**: Centralized S3 metadata key aliases into `core/models.py` constants
- **Refactor**: Extracted helper methods in `DeltaService` for improved readability

### Fixed

- Removed unused imports flagged by ruff in test files

### Documentation

- Updated BOTO3_COMPATIBILITY.md (coverage 20% → 23%)
- Updated AWS S3 CLI compatibility docs with ACL command examples
- Refreshed README with dark mode logo and streamlined content
- Cleaned up SDK documentation and examples

## [6.0.0] - 2025-10-17

### Added

- **EC2 Region Detection & Cost Optimization**
  - Automatic detection of EC2 instance region using IMDSv2
  - Warns when EC2 region ≠ S3 client region (potential cross-region charges)
  - Different warnings for auto-detected vs. explicit `--region` flag mismatches
  - Green checkmark when regions are aligned (optimal configuration)
  - Can be disabled with `DG_DISABLE_EC2_DETECTION=true` environment variable
  - Helps users optimize for cost and performance before migration starts
- **New CLI Command**: `deltaglider migrate` for S3-to-S3 bucket migration with compression
  - Supports resume capability (skips already migrated files)
  - Real-time progress tracking with file count and statistics
  - Interactive confirmation prompt (use `--yes` to skip)
  - Prefix preservation by default (use `--no-preserve-prefix` to disable)
  - Dry run mode with `--dry-run` flag
  - Include/exclude pattern filtering
  - Shows compression statistics after migration
  - **EC2-aware region logging**: Detects EC2 instance and warns about cross-region charges
  - **FIXED**: Now correctly preserves original filenames during migration
- **S3-to-S3 Recursive Copy**: `deltaglider cp -r s3://source/ s3://dest/` now supported
  - Automatically uses migration functionality with prefix preservation
  - Applies delta compression during transfer
  - Preserves original filenames correctly
- **Version Command**: Added `--version` flag to show deltaglider version
  - Usage: `deltaglider --version`
- **DeltaService API Enhancement**: Added `override_name` parameter to `put()` method
  - Allows specifying destination filename independently of source filesystem path
  - Enables proper S3-to-S3 transfers without filesystem renaming tricks
- **Rehydration & Purge**: Automatic rehydration of delta-compressed files for presigned URL access
  - New `deltaglider purge` CLI command to clean expired temporary files
- **Metadata Namespace**: Centralized `dg-` prefixed metadata keys for all DeltaGlider metadata
- **S3-Based Stats Caching**: Bucket statistics cached in S3 with automatic invalidation

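
For readers unfamiliar with IMDSv2, the region detection listed above could look roughly like this. The two-step token flow and the metadata paths are the standard EC2 instance-metadata interface; the function names, the timeout, and the warning text are assumptions and not DeltaGlider's actual implementation.

```python
import os
import urllib.request
from typing import Optional

IMDS_BASE = "http://169.254.169.254/latest"


def detect_ec2_region(timeout: float = 1.0) -> Optional[str]:
    """Return the instance region via IMDSv2, or None when not on EC2."""
    if os.environ.get("DG_DISABLE_EC2_DETECTION") == "true":
        return None
    try:
        # Step 1: obtain a short-lived session token with a PUT request.
        token_req = urllib.request.Request(
            f"{IMDS_BASE}/api/token",
            method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        )
        with urllib.request.urlopen(token_req, timeout=timeout) as resp:
            token = resp.read().decode()
        # Step 2: read the region, presenting the token.
        region_req = urllib.request.Request(
            f"{IMDS_BASE}/meta-data/placement/region",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(region_req, timeout=timeout) as resp:
            return resp.read().decode().strip()
    except OSError:  # URLError and socket timeouts are OSError subclasses
        return None


def region_mismatch_warning(ec2_region: Optional[str], s3_region: str) -> Optional[str]:
    """Return a warning when the regions differ, else None."""
    if ec2_region is None or ec2_region == s3_region:
        return None
    return f"EC2 region {ec2_region} != S3 region {s3_region}: cross-region charges may apply"
```

Outside EC2 the metadata address is unreachable, so a short timeout keeps the check from delaying normal CLI use.
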
### Fixed

- **Critical**: S3-to-S3 migration now preserves original filenames
  - Previously created files with temp names like `tmp1b9cpdsn.zip`
  - Now correctly uses original filenames from source S3 keys
  - Fixed by adding `override_name` parameter to `DeltaService.put()`
- **CLI Region Support**: `--region` flag now properly passes region to boto3 client
  - Previously only set environment variable, relied on boto3 auto-detection
  - Now explicitly passes `region_name` to `boto3.client()` via `boto3_kwargs`
  - Ensures consistent behavior with `DeltaGliderClient` SDK

### Changed

- Recursive S3-to-S3 copy operations now preserve source prefix structure by default
- Migration operations show formatted output with source and destination paths

### Documentation

- Added comprehensive migration guide in README.md
- Updated CLI reference with migrate command examples
- Added prefix preservation behavior documentation

## [5.1.1] - 2025-01-10

### Fixed

- **Stats Command**: Fixed incorrect compression ratio calculations
  - Now correctly counts ALL files including reference.bin in compressed size
  - Fixed handling of orphaned reference.bin files (reference files with no delta files)
  - Added prominent warnings for orphaned reference files with cleanup commands
  - Fixed stats for buckets with no compression (now shows 0% instead of negative)
  - SHA1 checksum files are now properly included in calculations

### Improved

- **Stats Performance**: Optimized metadata fetching with parallel requests
  - 5-10x faster for buckets with many delta files
  - Uses ThreadPoolExecutor for concurrent HEAD requests
  - Single-pass calculation algorithm for better efficiency

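
The concurrent metadata fetch mentioned above follows a common pattern, sketched below. `fetch_metadata_parallel`, `fake_head`, the worker count, and the `dg-file-size` key are illustrative assumptions; a real caller would pass a function that issues an S3 HEAD request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fetch_metadata_parallel(
    keys: list[str],
    head: Callable[[str], dict[str, str]],
    max_workers: int = 10,
) -> dict[str, dict[str, str]]:
    """Run `head` for every key concurrently and map key -> metadata."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each key with its result.
        return dict(zip(keys, pool.map(head, keys)))


def fake_head(key: str) -> dict[str, str]:
    # Stand-in for an S3 HEAD request; returns made-up metadata.
    return {"dg-file-size": str(len(key))}


result = fetch_metadata_parallel(["a.zip.delta", "b.zip.delta"], fake_head)
```

Because HEAD requests are I/O-bound, threads overlap the network waits even under the GIL, which is where a speed-up of this kind comes from.
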
## [5.1.0] - 2025-10-10

### Added

- **New CLI Command**: `deltaglider stats <bucket>` for bucket statistics and compression metrics
  - Supports `--detailed` flag for comprehensive analysis
  - Supports `--json` flag for machine-readable output
  - Accepts multiple formats: `s3://bucket/`, `s3://bucket`, `bucket`
- **Session-Level Statistics Caching**: Bucket stats now cached per client instance
  - Automatic cache invalidation on mutations (put, delete, bucket operations)
  - Intelligent cache reuse (detailed stats serve quick stat requests)
  - Enhanced `list_buckets()` includes cached stats when available
- **Programmatic Cache Management**: Added cache management APIs for long-running applications
  - `clear_cache()`: Clear all cached references
  - `evict_cache()`: Remove specific cached reference
  - Session-scoped cache lifecycle management

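
The caching rules listed above (invalidate on mutation, let detailed stats answer quick requests) can be modelled in a few lines. The `StatsCache` class and its method names are hypothetical and only illustrate the reuse rule; they are not the client's real internals.

```python
from typing import Optional


class StatsCache:
    """Per-client cache of bucket stats, keyed by bucket name."""

    def __init__(self) -> None:
        # bucket -> (stats, is_detailed)
        self._entries: dict[str, tuple[dict, bool]] = {}

    def get(self, bucket: str, detailed: bool) -> Optional[dict]:
        entry = self._entries.get(bucket)
        if entry is None:
            return None
        stats, is_detailed = entry
        # Detailed stats can answer a quick request, but not the reverse.
        return stats if is_detailed or not detailed else None

    def put(self, bucket: str, stats: dict, detailed: bool) -> None:
        self._entries[bucket] = (stats, detailed)

    def invalidate(self, bucket: str) -> None:
        """Call after any mutation (put, delete, bucket operation)."""
        self._entries.pop(bucket, None)


cache = StatsCache()
cache.put("releases", {"object_count": 3}, detailed=True)
quick = cache.get("releases", detailed=False)   # served from the detailed entry
cache.invalidate("releases")
after = cache.get("releases", detailed=False)   # cache miss after a mutation
```
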
### Changed

- Bucket statistics are now cached within client session for performance
- `list_buckets()` response includes `DeltaGliderStats` metadata when cached

### Documentation

- Added comprehensive DG_MAX_RATIO tuning guide in docs/
- Updated CLI command reference in CLAUDE.md and README.md
- Added detailed cache management documentation

## [5.0.3] - 2025-10-10

### Security

- **BREAKING**: Removed all legacy shared cache code for security
- **BREAKING**: Encryption is now ALWAYS ON (cannot be disabled)
- Ephemeral process-isolated cache is now the ONLY mode (no opt-out)
- **Content-Addressed Storage (CAS)**: Implemented SHA256-based cache storage
  - Zero collision risk (SHA256 namespace guarantees uniqueness)
  - Automatic deduplication (same content = same filename)
  - Tampering protection (changing content changes SHA, breaks lookup)
  - Two-level directory structure for filesystem optimization
- **Encrypted Cache**: All cache data encrypted at rest using Fernet (AES-128-CBC + HMAC)
  - Ephemeral encryption keys per process (forward secrecy)
  - Optional persistent keys via `DG_CACHE_ENCRYPTION_KEY` for shared filesystems
  - Automatic cleanup of corrupted cache files on decryption failures
- Fixed TOCTOU vulnerabilities with atomic SHA validation at use-time
- Added `get_validated_ref()` method to prevent cache poisoning
- Eliminated multi-user data exposure through mandatory cache isolation

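
To make the content-addressed layout above concrete, here is a small sketch of SHA256-based storage with validation at use time. The exact directory layout (`<first 2 hex>/<next 2 hex>/<digest>`) and the function names are assumptions, and the encryption layer is left out entirely.

```python
import hashlib
from pathlib import Path


def cas_path(root: Path, content: bytes) -> Path:
    """Map content to root/<first 2 hex>/<next 2 hex>/<full sha256>."""
    digest = hashlib.sha256(content).hexdigest()
    return root / digest[:2] / digest[2:4] / digest


def store(root: Path, content: bytes) -> Path:
    """Write content under its digest; identical content is stored only once."""
    path = cas_path(root, content)
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        path.write_bytes(content)
    return path


def load_validated(path: Path) -> bytes:
    """Re-hash at use time and reject data that no longer matches its name."""
    data = path.read_bytes()
    if hashlib.sha256(data).hexdigest() != path.name:
        raise ValueError(f"cache entry {path.name} failed SHA256 validation")
    return data
```

Checking the digest on the bytes actually read, instead of in a separate earlier step, is what narrows the time-of-check/time-of-use window.
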
### Removed

- **BREAKING**: Removed `DG_UNSAFE_SHARED_CACHE` environment variable
- **BREAKING**: Removed `DG_CACHE_DIR` environment variable
- **BREAKING**: Removed `DG_CACHE_ENCRYPTION` environment variable (encryption always on)
- **BREAKING**: Removed `cache_dir` parameter from `create_client()`

### Changed

- Cache is now auto-created in `/tmp/deltaglider-*` and cleaned on exit
- All cache operations use file locking (Unix) and SHA validation
- Added `CacheMissError` and `CacheCorruptionError` exceptions

### Added

- New `ContentAddressedCache` adapter in `adapters/cache_cas.py`
- New `EncryptedCache` wrapper in `adapters/cache_encrypted.py`
- New `MemoryCache` adapter in `adapters/cache_memory.py` with LRU eviction
- Self-describing cache structure with SHA256-based filenames
- Configurable cache backends via `DG_CACHE_BACKEND` (filesystem or memory)
- Memory cache size limit via `DG_CACHE_MEMORY_SIZE_MB` (default: 100MB)

### Internal

- Updated all tests to use Content-Addressed Storage and encryption
- All 119 tests passing with zero errors (99 original + 20 new cache tests)
- Type checking: 0 errors (mypy)
- Linting: All checks passed (ruff)
- Completed Phase 1, 2, and 7 of SECURITY_FIX_ROADMAP.md
- Added comprehensive test suites for encryption (13 tests) and memory cache (10 tests)

## [5.0.1] - 2025-01-10

### Changed

- **Code Organization**: Refactored client.py from 1560 to 1154 lines (26% reduction)
  - Extracted client operations into modular `client_operations/` package:
    - `bucket.py` - S3 bucket management operations
    - `presigned.py` - Presigned URL generation
    - `batch.py` - Batch upload/download operations
    - `stats.py` - Analytics and statistics operations
  - Improved code maintainability with logical separation of concerns
  - Better developer experience with cleaner module structure

### Internal

- Full type safety maintained with mypy (0 errors)
- All 99 tests passing
- Code quality checks passing (ruff)
- No breaking changes - all public APIs remain unchanged

## [5.0.0] - 2025-01-10

### Added

- boto3-compatible TypedDict types for S3 responses (no boto3 import needed)
- Complete boto3 compatibility vision document
- Type-safe response builders using TypedDict patterns

### Changed

- **BREAKING**: `list_objects()` now returns boto3-compatible dict instead of custom dataclass
  - Use `response['Contents']` instead of `response.contents`
  - Use `response.get('IsTruncated')` instead of `response.is_truncated`
  - Use `response.get('NextContinuationToken')` instead of `response.next_continuation_token`
  - DeltaGlider metadata now in `Metadata` field of each object
- Internal response building now uses TypedDict for compile-time type safety
- All S3 responses are dicts at runtime (TypedDict is a dict!)

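
The dict-based access pattern from the breaking change above can be tried without any S3 connection. In this sketch the type names `S3Object` and `ListObjectsResponse` and the sample pages are made up for illustration; only the field names (`Contents`, `IsTruncated`, `NextContinuationToken`, `Metadata`) come from the entry above.

```python
from typing import TypedDict


class S3Object(TypedDict, total=False):
    Key: str
    Size: int
    Metadata: dict[str, str]


class ListObjectsResponse(TypedDict, total=False):
    Contents: list[S3Object]
    IsTruncated: bool
    NextContinuationToken: str


def collect_keys(pages: list[ListObjectsResponse]) -> list[str]:
    """Walk paginated responses using plain dict access."""
    keys: list[str] = []
    for response in pages:
        keys.extend(obj["Key"] for obj in response.get("Contents", []))
        if not response.get("IsTruncated"):
            break  # last page reached
    return keys


# Two hand-written pages standing in for successive list_objects() calls.
pages: list[ListObjectsResponse] = [
    {"Contents": [{"Key": "v1.zip", "Size": 10}], "IsTruncated": True, "NextContinuationToken": "t1"},
    {"Contents": [{"Key": "v2.zip", "Size": 12, "Metadata": {"dg-compression": "none"}}], "IsTruncated": False},
]
```

Since a `TypedDict` is an ordinary `dict` at runtime, the annotations help a type checker without changing what callers receive.
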
### Fixed

- Updated all documentation examples to use dict-based responses
- Fixed pagination examples in README and API docs
- Corrected SDK documentation with accurate method signatures

## [4.2.4] - 2025-01-10

### Fixed

- Show only filename in `ls` output instead of full path for cleaner display
- Correct `ls` command path handling and prefix display logic

## [4.2.3] - 2025-01-07

### Added

- Comprehensive test coverage for `delete_objects_recursive()` method with 19 thorough tests
- Tests cover delta suffix handling, error/warning aggregation, statistics tracking, and edge cases
- Better code organization with separate `client_models.py` and `client_delete_helpers.py` modules

### Fixed

- Fixed all mypy type errors using proper `cast()` for type safety
- Improved type hints for dictionary operations in client code

### Changed

- Refactored client code into logical modules for better maintainability
- Enhanced code quality with comprehensive linting and type checking
- All 99 integration/unit tests passing with zero type errors

### Internal

- Better separation of concerns in client module
- Improved developer experience with clearer code structure

## [4.2.2] - 2024-10-06

### Fixed

- Add .delta suffix fallback for `delete_object()` method
- Handle regular S3 objects without DeltaGlider metadata
- Update mypy type ignore comment for compatibility

## [4.2.1] - 2024-10-06

### Fixed

- Make GitHub release creation non-blocking in workflows

## [4.2.0] - 2024-10-03

### Added

- AWS credential parameters to `create_client()` function
- Support for custom endpoint URLs
- Enhanced boto3 compatibility

## [4.1.0] - 2024-09-29

### Added

- boto3-compatible client API
- Bucket management methods
- Comprehensive SDK documentation

## [4.0.0] - 2024-09-21

### Added

- Initial public release
- CLI with AWS S3 compatibility
- Delta compression for versioned artifacts
- 99%+ compression for similar files

[6.1.0]: https://github.com/beshu-tech/deltaglider/compare/v6.0.2...v6.1.0
[6.0.0]: https://github.com/beshu-tech/deltaglider/compare/v5.1.1...v6.0.0
[5.1.0]: https://github.com/beshu-tech/deltaglider/compare/v5.0.3...v5.1.0
[5.0.3]: https://github.com/beshu-tech/deltaglider/compare/v5.0.1...v5.0.3
[5.0.1]: https://github.com/beshu-tech/deltaglider/compare/v5.0.0...v5.0.1
[5.0.0]: https://github.com/beshu-tech/deltaglider/compare/v4.2.4...v5.0.0
[4.2.4]: https://github.com/beshu-tech/deltaglider/compare/v4.2.3...v4.2.4
[4.2.3]: https://github.com/beshu-tech/deltaglider/compare/v4.2.2...v4.2.3
[4.2.2]: https://github.com/beshu-tech/deltaglider/compare/v4.2.1...v4.2.2
[4.2.1]: https://github.com/beshu-tech/deltaglider/compare/v4.2.0...v4.2.1
[4.2.0]: https://github.com/beshu-tech/deltaglider/compare/v4.1.0...v4.2.0
[4.1.0]: https://github.com/beshu-tech/deltaglider/compare/v4.0.0...v4.1.0
[4.0.0]: https://github.com/beshu-tech/deltaglider/releases/tag/v4.0.0