The Rust `deltaglider_proxy` ships proxy + CLI + UI in one binary with a
byte-identical wire format. Maintaining both has been a duplication tax
(metadata-namespace fix v6.1.2 had to land twice). This release is the
final feature release; security/bug fixes stop here.
What this commit does:
- CLI: every invocation prints a deprecation notice to stderr pointing
at github.com/beshu-tech/deltaglider_proxy with a one-line migration
alias (`alias dg='deltaglider_proxy s3'`). Banner prints once per
process; suppress via DG_SUPPRESS_DEPRECATION=1 for CI that hasn't
migrated yet.
- README: prominent deprecation banner at the top with the migration
command and the archive-timing notice (~1 week after v6.2.0 ships).
- pyproject.toml: description prefixed with "DEPRECATED" so PyPI search
results show the warning. Classifier moved Beta -> Inactive.
- CHANGELOG: v6.2.0 entry under "Deprecated" documenting the migration
path + archive plan, preserving the carried-forward Fixed/Changed/
Added items from Unreleased.
Repo archive timing: Maintainer will archive ~1 week after v6.2.0 hits
PyPI to give users a window to see the stderr notice on their next
update. PyPI installs continue to work indefinitely.
No behaviour changes to the wire format, the CLI surface, or the
metadata schema. Existing buckets remain readable forever.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(metadata): align direct-upload keys to canonical dg-* namespace
`_upload_direct` (the path taken by non-delta-eligible files like
.sha1 / .sha512) wrote user-metadata with bare underscored keys
(`original_name`, `file_sha256`, `compression`) while delta and
reference uploads correctly used the canonical dashed namespace
(`dg-original-name`, `dg-file-sha256`, `dg-compression`).
Downstream consumers — most visibly the DeltaGlider Proxy — only
recognised the dashed form, so every .sha1 / .sha512 listing on
a bucket holding deltaglider-uploaded files produced:
WARN PATHOLOGICAL | Missing/corrupt DG metadata for
bucket/key.sha1 -- falling back to passthrough.
Error: Storage error: Missing dg-original-name
This patch aligns the writer to the canonical scheme and keeps the
read path backward-compatible with already-stored bare-keyed objects
via `resolve_metadata`. No re-upload required.
Changes
-------
* `_upload_direct` emits metadata using `f"{METADATA_PREFIX}{key}"`
(the same pattern delta/reference uploads already use).
* `METADATA_KEY_ALIASES` now lists `compression` and `source_name`
so `resolve_metadata` works for both fields uniformly.
* Replaced bare `metadata.get("compression")` /
`metadata.get("original_name")` / `metadata.get("file_size")` /
`metadata.get("ref_key")` lookups in `DeltaService.get`,
`DeltaService.delete`, `_delete_delta`, the recursive-delete
listing path, `client.list_objects_v2`, and
`client_operations.stats.get_object_info` with `resolve_metadata`
calls so legacy bare-keyed objects keep working forever.
Tests
-----
* `tests/unit/test_metadata_aliases.py` (new, 11 tests) — pins the
alias table contract: new dashed keys, legacy bare underscored
keys, legacy hyphenated keys, priority rule, empty-string
handling.
* `test_direct_upload_emits_dashed_namespace` in
`tests/unit/test_core_service.py` — pins the writer to emit only
dg-* keys.
* Existing tests using the legacy bare `compression: "none"` form
in `test_s3_compat.py` and `test_recursive_delete_reference_*.py`
still pass — proving the dual-scheme read contract holds.
Full unit suite: 87/87 pass, mypy clean, ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(metadata): also resolve legacy file_sha256 in get() dispatch
Adversarial review of the original patch caught a second
asymmetry: DeltaService.get's "is this a regular S3 object or
DeltaGlider-managed?" dispatch was a literal-string check
`"dg-file-sha256" not in obj_head.metadata`. After the writer
fix, NEW direct uploads have `dg-file-sha256` so they route
correctly. But ~4400 pre-fix `.sha1` / `.sha512` files in
production have the bare `file_sha256` key, and they were
silently being routed through the "regular S3 object" branch
instead of the "direct upload" branch.
Both branches call `_get_direct` so file content was still
served correctly — but the wrong log message fired
("Downloading regular S3 object (no DeltaGlider metadata)") and
the recorded file-size for telemetry came from obj_head.size
instead of the metadata's `file_size` (same value for direct
uploads, but still semantically wrong).
Swap the literal-string check for `resolve_metadata(meta,
"file_sha256") is None` so both schemes route to the
DeltaGlider-managed branch.
Added regression test `test_get_legacy_direct_upload_not_
misclassified_as_regular_s3` that builds a HEAD response with
the legacy bare-keyed metadata shape (exactly what's stored on
Hetzner today for the .sha files), captures the log messages,
and fails if the "regular S3 object" canary fires.
Demonstrated locally: revert the dispatch back to literal-string
check → new test fails with the canary log line. Restore →
88/88 pass.
CHANGELOG updated to document both fixes (writer + dispatch).
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
S3-compatible endpoints (Hetzner) occasionally return transient
BadRequest errors. Retries up to 3 times with exponential backoff
(1s, 2s) before giving up.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
boto3 1.36+ sends CRC32/CRC64 checksums by default on PUT requests.
S3-compatible stores like Hetzner Object Storage reject these with
BadRequest, breaking direct (non-delta) file uploads. This sets
request_checksum_calculation="when_required" to restore compatibility
while still working with AWS S3.
Also pins runtime deps to major version ranges and adds S3 compat tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The sha tag template `prefix={{branch}}-` produces `:-hash` on tag
pushes because {{branch}} is empty, resulting in an invalid Docker
tag like `beshultd/deltaglider:-482f45f`. Only emit sha tags on
branch pushes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add boto3-compatible bucket ACL operations as pure S3 passthroughs,
following the existing create_bucket/delete_bucket pattern. Includes
CLI commands (put-bucket-acl, get-bucket-acl), 7 integration tests,
and documentation updates (method count 21→23).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace dict[str,Any] returns in delete/delete_recursive with DeleteResult
and RecursiveDeleteResult dataclasses for type safety
- Extract _delete_reference/_delete_delta/_classify_objects_for_deletion
helper methods from oversized delete methods in service.py
- Centralize metadata key aliases in METADATA_KEY_ALIASES dict with
resolve_metadata() replacing duplicated _meta_value() lookups
- Add DeltaGliderConfig dataclass with from_env() for centralized config
- Add ObjectKey.full_key property, remove dead _multipart_uploads dict
- Update all consumers (client, CLI, tests) for dataclass access patterns
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Added `rehydrate_for_download` method to download and decompress deltaglider-compressed files, re-uploading them with expiration metadata.
- Introduced `generate_presigned_url_with_rehydration` method to generate presigned URLs that automatically handle rehydration for both regular and deltaglider files.
- Implemented `purge_temp_files` command in CLI to delete expired temporary files from the .deltaglider/tmp/ directory, with options for dry run and JSON output.
- Enhanced service methods to support the new rehydration and purging features, including detailed logging and metrics tracking.
- Added object_limit_reached attribute to BucketStats for tracking limits.
- Introduced QUICK_LIST_LIMIT and SAMPLED_LIST_LIMIT constants to manage listing limits.
- Implemented _first_metadata_value helper function for improved metadata retrieval.
- Updated get_bucket_stats to log when listing is capped due to limits.
- Refactored DeltaMeta to streamline metadata extraction with error handling.
- Enhanced object listing to support max_objects parameter and limit tracking.
This document describes the old metadata format without dg- prefix.
Since v6.0.0 uses the new dg- prefixed format and requires all files
to be re-uploaded (greenfield approach), this diagnosis doc is no longer
relevant.
This is a major release with breaking changes to metadata format.
BREAKING CHANGES:
- All metadata keys now use 'dg-' namespace prefix (becomes 'x-amz-meta-dg-*' in S3)
- Old metadata format is not supported - all files must be re-uploaded
- Stats behavior changed: quick mode no longer shows misleading warnings
Features:
- Metadata now uses real package version (dg-tool: deltaglider/VERSION)
- All metadata keys properly namespaced with 'dg-' prefix
- Clean stats output in quick mode (no per-file warning spam)
- Fixed nonsensical negative compression ratios in quick mode
Fixes:
- Stats now correctly handles delta files without metadata
- Space saved shows 0 instead of negative numbers when metadata unavailable
- Removed misleading warnings in quick mode (metadata not fetched is expected)
- Fixed metadata keys to use hyphens instead of underscores
Documentation:
- Added comprehensive metadata documentation
- Added stats calculation behavior guide
- Added real version tracking documentation
Tests:
- Updated all tests to use new dg- prefixed metadata keys
- All 73 unit tests passing
- All quality checks passing (ruff, mypy)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix pagination bug using continuation_token instead of start_after
- Add stats caching to prevent blocking web apps
- Improve code formatting and type checking
- Add comprehensive unit tests for new features
- Fix test mock usage in object_listing tests
**Performance Issues Fixed:**
1. aws_compat.py: Changed to use cached stats only (no bucket scans after uploads)
2. stats.py: Added safety mechanisms to prevent infinite hangs
- Max 10k iterations (10M object limit)
- 10 min timeout on metadata fetching
- Missing pagination token detection
- Graceful error recovery with partial stats
**Refactoring:**
- Reduced nesting in get_bucket_stats from 5 levels to 2 levels
- Extracted 5 helper functions for better maintainability
- Main function reduced from 300+ lines to 33 lines
- 100% backward compatible - no API changes
**Benefits:**
- Web apps no longer hang on upload/delete operations
- Explicit get_bucket_stats() calls complete within bounded time
- Better error handling and logging
- Easier to test and maintain
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove unnecessary concurrent.futures patches in tests
- Update test_detailed_stats_flag to match current implementation behavior
- Tests now properly handle parallel metadata fetching without mocking