140 Commits

Author SHA1 Message Date
Simone Scarduzio 2d345bc663 v6.2.0: deprecate Python deltaglider in favor of deltaglider_proxy (Rust) (#9)
The Rust `deltaglider_proxy` ships proxy + CLI + UI in one binary with a
byte-identical wire format. Maintaining both has been a duplication tax
(metadata-namespace fix v6.1.2 had to land twice). This release is the
final feature release; security/bug fixes stop here.

What this commit does:

- CLI: every invocation prints a deprecation notice to stderr pointing
  at github.com/beshu-tech/deltaglider_proxy with a one-line migration
  alias (`alias dg='deltaglider_proxy s3'`). Banner prints once per
  process; suppress via DG_SUPPRESS_DEPRECATION=1 for CI that hasn't
  migrated yet.
- README: prominent deprecation banner at the top with the migration
  command and the archive-timing notice (~1 week after v6.2.0 ships).
- pyproject.toml: description prefixed with "DEPRECATED" so PyPI search
  results show the warning. Classifier moved Beta -> Inactive.
- CHANGELOG: v6.2.0 entry under "Deprecated" documenting the migration
  path + archive plan, preserving the carried-forward Fixed/Changed/
  Added items from Unreleased.
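
A minimal sketch of the once-per-process banner described in the CLI bullet above. Only the `DG_SUPPRESS_DEPRECATION=1` switch, the stderr destination, and the alias come from this message; the function name, the module-level flag, and the exact notice wording are assumptions for illustration.

```python
import os
import sys

# Hypothetical wording; the real notice text is not quoted in the commit message.
_NOTICE = (
    "deltaglider (Python) is deprecated. Migrate to "
    "github.com/beshu-tech/deltaglider_proxy: alias dg='deltaglider_proxy s3'"
)
_already_printed = False  # module-level flag gives "once per process" behaviour


def print_deprecation_notice(emit=None) -> bool:
    """Emit the notice at most once per process; return True if it was emitted."""
    global _already_printed
    if _already_printed or os.environ.get("DG_SUPPRESS_DEPRECATION") == "1":
        return False
    if emit is None:
        print(_NOTICE, file=sys.stderr)  # default destination is stderr
    else:
        emit(_NOTICE)  # injectable sink, handy for tests
    _already_printed = True
    return True
```

Calling it from every CLI entry point is then safe, since the second and later calls are no-ops.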

Repo archive timing: Maintainer will archive ~1 week after v6.2.0 hits
PyPI to give users a window to see the stderr notice on their next
update. PyPI installs continue to work indefinitely.

No behaviour changes to the wire format, the CLI surface, or the
metadata schema. Existing buckets remain readable forever.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:57:19 +02:00
Simone Scarduzio d81240be80 fix(metadata): align direct-upload keys to canonical dg-* namespace (#8)
* fix(metadata): align direct-upload keys to canonical dg-* namespace

`_upload_direct` (the path taken by non-delta-eligible files like
.sha1 / .sha512) wrote user-metadata with bare underscored keys
(`original_name`, `file_sha256`, `compression`) while delta and
reference uploads correctly used the canonical dashed namespace
(`dg-original-name`, `dg-file-sha256`, `dg-compression`).

Downstream consumers — most visibly the DeltaGlider Proxy — only
recognised the dashed form, so every .sha1 / .sha512 listing on
a bucket holding deltaglider-uploaded files produced:

    WARN PATHOLOGICAL | Missing/corrupt DG metadata for
    bucket/key.sha1 -- falling back to passthrough.
    Error: Storage error: Missing dg-original-name

This patch aligns the writer to the canonical scheme and keeps the
read path backward-compatible with already-stored bare-keyed objects
via `resolve_metadata`. No re-upload required.

Changes
-------
* `_upload_direct` emits metadata using `f"{METADATA_PREFIX}{key}"`
  (the same pattern delta/reference uploads already use).
* `METADATA_KEY_ALIASES` now lists `compression` and `source_name`
  so `resolve_metadata` works for both fields uniformly.
* Replaced bare `metadata.get("compression")` /
  `metadata.get("original_name")` / `metadata.get("file_size")` /
  `metadata.get("ref_key")` lookups in `DeltaService.get`,
  `DeltaService.delete`, `_delete_delta`, the recursive-delete
  listing path, `client.list_objects_v2`, and
  `client_operations.stats.get_object_info` with `resolve_metadata`
  calls so legacy bare-keyed objects keep working forever.
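
The dual-scheme read path can be sketched roughly as below. The names `METADATA_PREFIX`, `METADATA_KEY_ALIASES`, and `resolve_metadata` appear in this message, but the table contents, the priority order (dashed key first), and the treatment of empty strings as missing are assumptions here, not the project's confirmed behaviour.

```python
from typing import Mapping, Optional

METADATA_PREFIX = "dg-"

# Assumed shape: logical name -> candidate stored keys, in priority order.
METADATA_KEY_ALIASES: dict[str, tuple[str, ...]] = {
    "original_name": ("dg-original-name", "original-name", "original_name"),
    "file_sha256": ("dg-file-sha256", "file-sha256", "file_sha256"),
    "compression": ("dg-compression", "compression"),
}


def resolve_metadata(metadata: Mapping[str, str], name: str) -> Optional[str]:
    """Return the first non-empty value stored under any alias of `name`."""
    for key in METADATA_KEY_ALIASES.get(name, (name,)):
        value = metadata.get(key)
        if value:  # assumption: an empty string counts as missing
            return value
    return None
```

With a lookup of this kind, an object written before the fix (bare `compression`) and one written after it (`dg-compression`) resolve to the same value, which is why no re-upload is needed.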

Tests
-----
* `tests/unit/test_metadata_aliases.py` (new, 11 tests) — pins the
  alias table contract: new dashed keys, legacy bare underscored
  keys, legacy hyphenated keys, priority rule, empty-string
  handling.
* `test_direct_upload_emits_dashed_namespace` in
  `tests/unit/test_core_service.py` — pins the writer to emit only
  dg-* keys.
* Existing tests using the legacy bare `compression: "none"` form
  in `test_s3_compat.py` and `test_recursive_delete_reference_*.py`
  still pass — proving the dual-scheme read contract holds.

Full unit suite: 87/87 pass, mypy clean, ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(metadata): also resolve legacy file_sha256 in get() dispatch

Adversarial review of the original patch caught a second
asymmetry: DeltaService.get's "is this a regular S3 object or
DeltaGlider-managed?" dispatch was a literal-string check
`"dg-file-sha256" not in obj_head.metadata`. After the writer
fix, NEW direct uploads have `dg-file-sha256` so they route
correctly. But ~4400 pre-fix `.sha1` / `.sha512` files in
production have the bare `file_sha256` key, and they were
silently being routed through the "regular S3 object" branch
instead of the "direct upload" branch.

Both branches call `_get_direct` so file content was still
served correctly — but the wrong log message fired
("Downloading regular S3 object (no DeltaGlider metadata)") and
the recorded file-size for telemetry came from obj_head.size
instead of the metadata's `file_size` (same value for direct
uploads, but still semantically wrong).

Swap the literal-string check for `resolve_metadata(meta,
"file_sha256") is None` so both schemes route to the
DeltaGlider-managed branch.

Added regression test `test_get_legacy_direct_upload_not_
misclassified_as_regular_s3` that builds a HEAD response with
the legacy bare-keyed metadata shape (exactly what's stored on
Hetzner today for the .sha files), captures the log messages,
and fails if the "regular S3 object" canary fires.

Demonstrated locally: revert the dispatch back to literal-string
check → new test fails with the canary log line. Restore →
88/88 pass.

CHANGELOG updated to document both fixes (writer + dispatch).

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v6.1.2
2026-05-17 10:28:25 +02:00
Simone Scarduzio a98fc7c178 style: format storage_s3.py for ruff format compliance
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v6.1.1
2026-03-23 19:15:09 +01:00
Simone Scarduzio 82e00623de fix: verbose diagnostic logging on put_object retries
On retry: logs bucket, key, body size, content type, metadata keys,
endpoint URL, HTTP status, error code/message, request ID, and full
HTTP response headers. Enables botocore DEBUG logging for wire-level
HTTP traces on subsequent retry attempts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 19:08:35 +01:00
Simone Scarduzio e8c76f1dc7 fix: add retry with backoff for put_object on transient S3 failures
S3-compatible endpoints (Hetzner) occasionally return transient
BadRequest errors. Retries up to 3 times with exponential backoff
(1s, 2s) before giving up.
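
A rough sketch of the retry loop, reading the description as three attempts in total with sleeps of 1s and 2s between them. The helper name and its parameters are hypothetical; the actual change wraps `put_object` and may decide which errors count as transient differently.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    is_transient: Callable[[Exception], bool] = lambda exc: True,
) -> T:
    """Run `operation`, retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:
            last_attempt = attempt == attempts - 1
            if last_attempt or not is_transient(exc):
                raise  # give up: out of attempts or not worth retrying
            sleep(base_delay * 2**attempt)  # 1s, then 2s with the defaults
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Passing `sleep` and `is_transient` as parameters keeps the helper testable without real delays.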

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 19:02:51 +01:00
Simone Scarduzio c492a5087b feat: log deltaglider version on every CLI invocation
Helps verify which version is running in Docker containers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 13:58:15 +01:00
Simone Scarduzio 85af5a95c8 docs: update CHANGELOG for v6.1.1 release
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 11:52:49 +01:00
Simone Scarduzio 60b70309fa fix: pin LocalStack to 4.4 (latest now requires paid license)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 11:50:05 +01:00
Simone Scarduzio b0699f952a fix: disable boto3 auto-checksums for S3-compatible endpoint support
boto3 1.36+ sends CRC32/CRC64 checksums by default on PUT requests.
S3-compatible stores like Hetzner Object Storage reject these with
BadRequest, breaking direct (non-delta) file uploads. This sets
request_checksum_calculation="when_required" to restore compatibility
while still working with AWS S3.
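
For reference, the setting is passed through `botocore.config.Config`. The sketch below assumes a hypothetical `make_s3_client` helper and is not the project's actual client factory; older botocore releases that predate the option reject the keyword argument.

```python
# Keyword arguments for botocore.config.Config (option available in botocore 1.36+).
CHECKSUM_COMPAT_KWARGS = {"request_checksum_calculation": "when_required"}


def make_s3_client(endpoint_url: str):
    """Build an S3 client that sends checksums only when an operation requires them."""
    # Imported lazily so the settings above can be inspected without boto3 installed.
    import boto3
    from botocore.config import Config

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        config=Config(**CHECKSUM_COMPAT_KWARGS),
    )
```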

Also pins runtime deps to major version ranges and adds S3 compat tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 11:45:05 +01:00
Simone Scarduzio 9bfe121f44 style: format files for ruff format --check compliance
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
v6.1.0
2026-02-07 16:02:40 +01:00
Simone Scarduzio 6cab3de9a0 fix: disable sha tag on tag pushes to avoid invalid Docker tag
The sha tag template `prefix={{branch}}-` produces `:-hash` on tag
pushes because {{branch}} is empty, resulting in an invalid Docker
tag like `beshultd/deltaglider:-482f45f`. Only emit sha tags on
branch pushes.
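
The failure is easy to reproduce outside CI. The sketch below mimics the template with a plain f-string (an illustration, not the workflow's real templating engine) and checks the result against Docker's tag grammar, which forbids a leading dash or period.

```python
import re

# Docker tag grammar: one word character, then up to 127 word characters,
# periods, or dashes.
DOCKER_TAG = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def sha_tag(branch: str, short_sha: str) -> str:
    """Mimic a `{{branch}}-<sha>` style tag template (illustrative only)."""
    return f"{branch}-{short_sha}"


def is_valid_tag(tag: str) -> bool:
    """Return True if `tag` is a syntactically valid Docker tag."""
    return DOCKER_TAG.fullmatch(tag) is not None
```

On a tag push the branch is empty, so `sha_tag("", "482f45f")` yields `-482f45f`, which `is_valid_tag` rejects; with a branch such as `main` the tag is accepted.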

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 15:57:37 +01:00
Simone Scarduzio 482f45fc02 docs: update CHANGELOG for v6.1.0 release
Add v6.1.0 section with bucket ACL support, Docker publishing,
config/model refactoring. Backfill v6.0.0 section from previously
unreleased entries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 15:55:50 +01:00
Simone Scarduzio 6b3245266e feat: add put_bucket_acl and get_bucket_acl support
Add boto3-compatible bucket ACL operations as pure S3 passthroughs,
following the existing create_bucket/delete_bucket pattern. Includes
CLI commands (put-bucket-acl, get-bucket-acl), 7 integration tests,
and documentation updates (method count 21→23).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 15:53:33 +01:00
Simone Scarduzio 20053acb5f fix: remove unused imports flagged by ruff
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 08:48:22 +01:00
Simone Scarduzio 87f425734f refactor: typed result dataclasses, centralized metadata aliases, config extraction
- Replace dict[str,Any] returns in delete/delete_recursive with DeleteResult
  and RecursiveDeleteResult dataclasses for type safety
- Extract _delete_reference/_delete_delta/_classify_objects_for_deletion
  helper methods from oversized delete methods in service.py
- Centralize metadata key aliases in METADATA_KEY_ALIASES dict with
  resolve_metadata() replacing duplicated _meta_value() lookups
- Add DeltaGliderConfig dataclass with from_env() for centralized config
- Add ObjectKey.full_key property, remove dead _multipart_uploads dict
- Update all consumers (client, CLI, tests) for dataclass access patterns
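
To illustrate the shape of this refactor, a sketch under stated assumptions: the class names `DeleteResult` and `DeltaGliderConfig` and the `from_env()` classmethod come from this message, while every field name, default value, and environment variable below is hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DeleteResult:
    # Field names are illustrative, not the project's actual schema.
    key: str
    deleted: bool
    warnings: tuple[str, ...] = ()


@dataclass(frozen=True)
class DeltaGliderConfig:
    # Hypothetical settings and defaults.
    max_ratio: float = 0.5
    log_level: str = "INFO"

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "DeltaGliderConfig":
        """Read settings from environment variables, falling back to defaults."""
        env = os.environ if env is None else env
        return cls(
            max_ratio=float(env.get("DG_MAX_RATIO", cls.max_ratio)),
            log_level=env.get("DG_LOG_LEVEL", cls.log_level),
        )
```

Callers then write `result.deleted` instead of `result["deleted"]`, so a misspelled field is caught by mypy rather than at runtime.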

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 23:16:57 +01:00
Simone Scarduzio 012662c377 updates 2025-11-11 17:20:43 +01:00
Simone Scarduzio 284f030fae updates to docs 2025-11-11 17:05:50 +01:00
Simone Scarduzio 7a4d30a007 freshen up 2025-11-11 11:18:06 +01:00
Simone Scarduzio 0d46283ff0 width 2025-11-11 09:55:52 +01:00
Simone Scarduzio 805e2967bc dark mode 2025-11-11 09:53:54 +01:00
Simone Scarduzio 2ef1741d51 freshen up readme 2025-11-11 09:48:34 +01:00
Simone Scarduzio 2c1d756e7b tweak readme 2025-11-06 16:14:29 +01:00
Simone Scarduzio c6cee7ae26 docker 2025-11-06 15:56:15 +01:00
Simone Scarduzio cee9a9fd2d higher limits why not
v6.0.2
2025-10-17 18:43:46 +02:00
Simone Scarduzio 0507e6ebcd format 2025-10-16 17:14:37 +02:00
Simone Scarduzio fa9c4fa42d feat: Implement rehydration and purge functionality for deltaglider files
- Added `rehydrate_for_download` method to download and decompress deltaglider-compressed files, re-uploading them with expiration metadata.
- Introduced `generate_presigned_url_with_rehydration` method to generate presigned URLs that automatically handle rehydration for both regular and deltaglider files.
- Implemented `purge_temp_files` command in CLI to delete expired temporary files from the .deltaglider/tmp/ directory, with options for dry run and JSON output.
- Enhanced service methods to support the new rehydration and purging features, including detailed logging and metrics tracking.
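
A minimal sketch of the purge step with a dry-run option. It assumes each temporary object is already paired with its expiry timestamp; how that timestamp is stored and read is not described here, so the function signature is hypothetical.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable, Optional

TMP_PREFIX = ".deltaglider/tmp/"


def purge_temp_files(
    objects: Iterable[tuple[str, datetime]],
    delete: Callable[[str], None],
    now: Optional[datetime] = None,
    dry_run: bool = False,
) -> list[str]:
    """Delete expired objects under TMP_PREFIX; return the keys selected."""
    now = now or datetime.now(timezone.utc)
    expired = [
        key
        for key, expires_at in objects
        if key.startswith(TMP_PREFIX) and expires_at <= now
    ]
    if not dry_run:
        for key in expired:
            delete(key)
    return expired  # in dry-run mode this is the list that would be deleted
```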
2025-10-16 17:02:00 +02:00
Simone Scarduzio 934d83975c fix: format models.py
v6.0.1
2025-10-16 11:21:33 +02:00
Simone Scarduzio c32d5265d9 feat: Enhance metadata handling and bucket statistics
- Added object_limit_reached attribute to BucketStats for tracking limits.
- Introduced QUICK_LIST_LIMIT and SAMPLED_LIST_LIMIT constants to manage listing limits.
- Implemented _first_metadata_value helper function for improved metadata retrieval.
- Updated get_bucket_stats to log when listing is capped due to limits.
- Refactored DeltaMeta to streamline metadata extraction with error handling.
- Enhanced object listing to support max_objects parameter and limit tracking.
2025-10-16 11:17:13 +02:00
Simone Scarduzio 1cf7e3ad21 import 2025-10-15 18:52:56 +02:00
Simone Scarduzio 9b36087438 not mandatory to have the command metadata field set 2025-10-15 18:16:43 +02:00
Simone Scarduzio 60877966f2 docs: Remove outdated METADATA_ISSUE_DIAGNOSIS.md
This document describes the old metadata format without dg- prefix.
Since v6.0.0 uses the new dg- prefixed format and requires all files
to be re-uploaded (greenfield approach), this diagnosis doc is no longer
relevant.
2025-10-15 11:45:52 +02:00
Simone Scarduzio fbd44ea3c3 style: Format integration test files with ruff
v6.0.0
2025-10-15 11:38:17 +02:00
Simone Scarduzio 3f689fc601 fix: Update integration tests for new metadata format and caching behavior
- Fix sync tests: Add list_objects.side_effect = NotImplementedError() to mock
- Fix sync tests: Add side_effect for put() to avoid hanging
- Fix MockStorage: Add continuation_token parameter to list_objects()
- Fix stats tests: Update assertions to include use_cache and refresh_cache params
- Fix bucket management test: Update caching expectations for S3-based cache

All 97 integration tests now pass.
2025-10-15 11:34:43 +02:00
Simone Scarduzio 3753212f96 style: Format test file with ruff
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-15 11:22:00 +02:00
Simone Scarduzio db7d14f8a8 feat: Add metadata namespace and fix stats calculation
This is a major release with breaking changes to metadata format.

BREAKING CHANGES:
- All metadata keys now use 'dg-' namespace prefix (becomes 'x-amz-meta-dg-*' in S3)
- Old metadata format is not supported - all files must be re-uploaded
- Stats behavior changed: quick mode no longer shows misleading warnings
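
As a small illustration of the naming rule (hyphens instead of underscores, `dg-` prefix, and S3 exposing user metadata as `x-amz-meta-*` headers), assuming hypothetical helper names:

```python
METADATA_PREFIX = "dg-"


def namespaced_key(name: str) -> str:
    """Map a logical field name to its dg- prefixed, hyphenated metadata key."""
    return f"{METADATA_PREFIX}{name.replace('_', '-')}"


def s3_header_name(name: str) -> str:
    """Header under which S3 exposes the user-metadata entry for `name`."""
    return f"x-amz-meta-{namespaced_key(name)}"
```

For example, the logical field `file_sha256` is stored as `dg-file-sha256` and appears on the wire as `x-amz-meta-dg-file-sha256`.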

Features:
- Metadata now uses real package version (dg-tool: deltaglider/VERSION)
- All metadata keys properly namespaced with 'dg-' prefix
- Clean stats output in quick mode (no per-file warning spam)
- Fixed nonsensical negative compression ratios in quick mode

Fixes:
- Stats now correctly handles delta files without metadata
- Space saved shows 0 instead of negative numbers when metadata unavailable
- Removed misleading warnings in quick mode (metadata not fetched is expected)
- Fixed metadata keys to use hyphens instead of underscores

Documentation:
- Added comprehensive metadata documentation
- Added stats calculation behavior guide
- Added real version tracking documentation

Tests:
- Updated all tests to use new dg- prefixed metadata keys
- All 73 unit tests passing
- All quality checks passing (ruff, mypy)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-15 11:19:10 +02:00
Simone Scarduzio e1259b7ea8 fix: Code quality improvements for v5.2.2 release
- Fix pagination bug using continuation_token instead of start_after
- Add stats caching to prevent blocking web apps
- Improve code formatting and type checking
- Add comprehensive unit tests for new features
- Fix test mock usage in object_listing tests
v5.2.2
2025-10-14 23:54:49 +02:00
Simone Scarduzio ff05e77c24 fix: Prevent get_bucket_stats from blocking web apps indefinitely
**Performance Issues Fixed:**
1. aws_compat.py: Changed to use cached stats only (no bucket scans after uploads)
2. stats.py: Added safety mechanisms to prevent infinite hangs
   - Max 10k iterations (10M object limit)
   - 10 min timeout on metadata fetching
   - Missing pagination token detection
   - Graceful error recovery with partial stats
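
The safety mechanisms listed above can be sketched as a bounded pagination loop. This is a simplification under assumptions: the page-fetcher signature is hypothetical, and the timeout here guards the listing loop, whereas the commit applies its 10 min timeout to metadata fetching.

```python
import time
from typing import Callable, Optional

# A page fetcher takes a continuation token and returns
# (keys, next_token, is_truncated).
PageFetcher = Callable[[Optional[str]], tuple[list[str], Optional[str], bool]]


def list_all_bounded(
    fetch_page: PageFetcher,
    max_iterations: int = 10_000,
    timeout_s: float = 600.0,
    clock: Callable[[], float] = time.monotonic,
) -> tuple[list[str], bool]:
    """Collect keys page by page; return (keys, complete)."""
    keys: list[str] = []
    token: Optional[str] = None
    deadline = clock() + timeout_s
    for _ in range(max_iterations):
        if clock() > deadline:
            return keys, False  # timed out: hand back partial results
        page, next_token, truncated = fetch_page(token)
        keys.extend(page)
        if not truncated:
            return keys, True  # last page reached
        if not next_token:
            return keys, False  # truncated but no token: stop instead of looping
        token = next_token
    return keys, False  # iteration cap reached
```

The boolean in the result lets callers report partial stats instead of hanging or raising.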

**Refactoring:**
- Reduced nesting in get_bucket_stats from 5 levels to 2 levels
- Extracted 5 helper functions for better maintainability
- Main function reduced from 300+ lines to 33 lines
- 100% backward compatible - no API changes

**Benefits:**
- Web apps no longer hang on upload/delete operations
- Explicit get_bucket_stats() calls complete within bounded time
- Better error handling and logging
- Easier to test and maintain

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
v5.2.1
2025-10-14 14:47:39 +02:00
Simone Scarduzio c3d385bf18 fix tests
v5.2.0
2025-10-13 17:26:35 +02:00
Simone Scarduzio aea5cb5d9a feat: Enhance S3 migration CLI with new commands and EC2 detection option 2025-10-12 23:12:32 +02:00
Simone Scarduzio b2ca59490b feat: Add EC2 region detection and cost optimization features 2025-10-12 22:41:48 +02:00
Simone Scarduzio 4f56c4b600 fix: Preserve original filenames during S3-to-S3 migration 2025-10-12 18:10:04 +02:00
Simone Scarduzio 14c6af0f35 handle version in cli 2025-10-12 17:47:05 +02:00
Simone Scarduzio 67792b2031 migrate CLI support 2025-10-12 17:37:44 +02:00
Simone Scarduzio a9a1396e6e style: Format test_stats_algorithm.py with ruff
v5.1.1
2025-10-11 14:17:49 +02:00
Simone Scarduzio 52eb5bba21 fix: Fix unit test import issues for concurrent.futures
- Remove unnecessary concurrent.futures patches in tests
- Update test_detailed_stats_flag to match current implementation behavior
- Tests now properly handle parallel metadata fetching without mocking
2025-10-11 14:13:40 +02:00
Simone Scarduzio f75db142e8 fix: Correct logging message formatting in get_bucket_stats and update test assertions for clarity. 2025-10-11 14:05:54 +02:00
Simone Scarduzio 35d34d4862 chore: Update CHANGELOG for v5.1.1 release
- Document stats command fixes
- Document performance improvements
2025-10-10 19:57:11 +02:00
Simone Scarduzio 9230cbd762 test 2025-10-10 19:52:15 +02:00
Simone Scarduzio 2eba6e8d38 optimisation 2025-10-10 19:50:33 +02:00
Simone Scarduzio 656726b57b algorithm correctness 2025-10-10 19:46:39 +02:00