Commit Graph

17 Commits

Author SHA1 Message Date
Simone Scarduzio 2d345bc663 v6.2.0: deprecate Python deltaglider in favor of deltaglider_proxy (Rust) (#9)
The Rust `deltaglider_proxy` ships proxy + CLI + UI in one binary with a
byte-identical wire format. Maintaining both has been a duplication tax
(metadata-namespace fix v6.1.2 had to land twice). This release is the
final feature release; security/bug fixes stop here.

What this commit does:

- CLI: every invocation prints a deprecation notice to stderr pointing
  at github.com/beshu-tech/deltaglider_proxy with a one-line migration
  alias (`alias dg='deltaglider_proxy s3'`). Banner prints once per
  process; suppress via DG_SUPPRESS_DEPRECATION=1 for CI that hasn't
  migrated yet.
- README: prominent deprecation banner at the top with the migration
  command and the archive-timing notice (~1 week after v6.2.0 ships).
- pyproject.toml: description prefixed with "DEPRECATED" so PyPI search
  results show the warning. Classifier moved Beta -> Inactive.
- CHANGELOG: v6.2.0 entry under "Deprecated" documenting the migration
  path + archive plan, preserving the carried-forward Fixed/Changed/
  Added items from Unreleased.

Repo archive timing: Maintainer will archive ~1 week after v6.2.0 hits
PyPI to give users a window to see the stderr notice on their next
update. PyPI installs continue to work indefinitely.

No behaviour changes to the wire format, the CLI surface, or the
metadata schema. Existing buckets remain readable forever.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:57:19 +02:00
Simone Scarduzio d81240be80 fix(metadata): align direct-upload keys to canonical dg-* namespace (#8)
* fix(metadata): align direct-upload keys to canonical dg-* namespace

`_upload_direct` (the path taken by non-delta-eligible files like
.sha1 / .sha512) wrote user-metadata with bare underscored keys
(`original_name`, `file_sha256`, `compression`) while delta and
reference uploads correctly used the canonical dashed namespace
(`dg-original-name`, `dg-file-sha256`, `dg-compression`).

Downstream consumers — most visibly the DeltaGlider Proxy — only
recognised the dashed form, so every .sha1 / .sha512 listing on
a bucket holding deltaglider-uploaded files produced:

    WARN PATHOLOGICAL | Missing/corrupt DG metadata for
    bucket/key.sha1 -- falling back to passthrough.
    Error: Storage error: Missing dg-original-name

This patch aligns the writer to the canonical scheme and keeps the
read path backward-compatible with already-stored bare-keyed objects
via `resolve_metadata`. No re-upload required.

Changes
-------
* `_upload_direct` emits metadata using `f"{METADATA_PREFIX}{key}"`
  (the same pattern delta/reference uploads already use).
* `METADATA_KEY_ALIASES` now lists `compression` and `source_name`
  so `resolve_metadata` works for both fields uniformly.
* Replaced bare `metadata.get("compression")` /
  `metadata.get("original_name")` / `metadata.get("file_size")` /
  `metadata.get("ref_key")` lookups in `DeltaService.get`,
  `DeltaService.delete`, `_delete_delta`, the recursive-delete
  listing path, `client.list_objects_v2`, and
  `client_operations.stats.get_object_info` with `resolve_metadata`
  calls so legacy bare-keyed objects keep working forever.

Tests
-----
* `tests/unit/test_metadata_aliases.py` (new, 11 tests) — pins the
  alias table contract: new dashed keys, legacy bare underscored
  keys, legacy hyphenated keys, priority rule, empty-string
  handling.
* `test_direct_upload_emits_dashed_namespace` in
  `tests/unit/test_core_service.py` — pins the writer to emit only
  dg-* keys.
* Existing tests using the legacy bare `compression: "none"` form
  in `test_s3_compat.py` and `test_recursive_delete_reference_*.py`
  still pass — proving the dual-scheme read contract holds.

Full unit suite: 87/87 pass, mypy clean, ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(metadata): also resolve legacy file_sha256 in get() dispatch

Adversarial review of the original patch caught a second
asymmetry: DeltaService.get's "is this a regular S3 object or
DeltaGlider-managed?" dispatch was a literal-string check
`"dg-file-sha256" not in obj_head.metadata`. After the writer
fix, NEW direct uploads have `dg-file-sha256` so they route
correctly. But ~4400 pre-fix `.sha1` / `.sha512` files in
production have the bare `file_sha256` key, and they were
silently being routed through the "regular S3 object" branch
instead of the "direct upload" branch.

Both branches call `_get_direct` so file content was still
served correctly — but the wrong log message fired
("Downloading regular S3 object (no DeltaGlider metadata)") and
the recorded file-size for telemetry came from obj_head.size
instead of the metadata's `file_size` (same value for direct
uploads, but still semantically wrong).

Swap the literal-string check for `resolve_metadata(meta,
"file_sha256") is None` so both schemes route to the
DeltaGlider-managed branch.

Added regression test `test_get_legacy_direct_upload_not_
misclassified_as_regular_s3` that builds a HEAD response with
the legacy bare-keyed metadata shape (exactly what's stored on
Hetzner today for the .sha files), captures the log messages,
and fails if the "regular S3 object" canary fires.

Demonstrated locally: revert the dispatch back to literal-string
check → new test fails with the canary log line. Restore →
88/88 pass.

CHANGELOG updated to document both fixes (writer + dispatch).

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 10:28:25 +02:00
Simone Scarduzio 85af5a95c8 docs: update CHANGELOG for v6.1.1 release
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 11:52:49 +01:00
Simone Scarduzio 482f45fc02 docs: update CHANGELOG for v6.1.0 release
Add v6.1.0 section with bucket ACL support, Docker publishing,
config/model refactoring. Backfill v6.0.0 section from previously
unreleased entries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 15:55:50 +01:00
Simone Scarduzio b2ca59490b feat: Add EC2 region detection and cost optimization features 2025-10-12 22:41:48 +02:00
Simone Scarduzio 4f56c4b600 fix: Preserve original filenames during S3-to-S3 migration 2025-10-12 18:10:04 +02:00
Simone Scarduzio 14c6af0f35 handle version in cli 2025-10-12 17:47:05 +02:00
Simone Scarduzio 35d34d4862 chore: Update CHANGELOG for v5.1.1 release
- Document stats command fixes
- Document performance improvements
2025-10-10 19:57:11 +02:00
Simone Scarduzio dbd2632cae docs: Update SDK documentation for v5.1.0 features
- Add session-level caching documentation to API reference
- Document clear_cache() and evict_cache() methods
- Add comprehensive bucket statistics examples
- Update list_buckets() with DeltaGliderStats metadata
- Add cache management patterns and best practices
- Update CHANGELOG comparison links

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-10 18:34:44 +02:00
Simone Scarduzio 3d04a407c0 feat: Add stats command with session-level caching (v5.1.0)
New Features:
- Add 'deltaglider stats' CLI command for bucket compression metrics
- Session-level bucket statistics caching for performance
- Enhanced list_buckets() with cached stats metadata

Technical Changes:
- Automatic cache invalidation on bucket mutations
- Intelligent cache reuse (detailed → quick fallback)
- Comprehensive test coverage (106+ new test lines)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-10 18:30:05 +02:00
Simone Scarduzio ac7d4e067f security: Make encryption always-on with auto-cleanup
BREAKING CHANGES:
- Encryption is now ALWAYS enabled (cannot be disabled)
- Removed DG_CACHE_ENCRYPTION environment variable

Security Enhancements:
- Encryption is mandatory for all cache operations
- Ephemeral encryption keys per process (forward secrecy)
- Automatic deletion of corrupted cache files on decryption failures
- Auto-cleanup on both decryption failures and SHA mismatches

Changes:
- Removed DG_CACHE_ENCRYPTION toggle from CLI and SDK
- Updated EncryptedCache to auto-delete corrupted files
- Simplified cache initialization (always wrapped with encryption)
- DG_CACHE_ENCRYPTION_KEY remains optional for persistent keys

Documentation:
- Updated CLAUDE.md with encryption always-on behavior
- Updated CHANGELOG.md with breaking changes
- Clarified security model and auto-cleanup behavior

Testing:
- All 119 tests passing with encryption always-on
- Type checking: 0 errors (mypy)
- Linting: All checks passed (ruff)

Rationale:
- Zero-trust cache architecture requires encryption
- Corrupted cache is security risk - auto-deletion prevents exploitation
- Ephemeral keys provide maximum security by default
- Users who need cross-process sharing can opt-in with persistent keys

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-10 09:51:29 +02:00
Simone Scarduzio 90a342dc33 feat: Implement Content-Addressed Storage (CAS) cache
Implemented SHA256-based Content-Addressed Storage to eliminate
cache collisions and enable automatic deduplication.

Key Features:
- Zero collision risk: SHA256 namespace guarantees uniqueness
- Automatic deduplication: same content = same filename
- Tampering protection: changing content changes SHA, breaks lookup
- Two-level directory structure (ab/cd/abcdef...) for filesystem optimization

Changes:
- Added ContentAddressedCache adapter in adapters/cache_cas.py
- Updated CLI and SDK to use CAS instead of FsCacheAdapter
- Updated all tests to use ContentAddressedCache
- Documented CAS architecture in CLAUDE.md and SECURITY_FIX_ROADMAP.md

Security Benefits:
- Eliminates cross-endpoint collision vulnerabilities
- Self-describing cache (filename IS the checksum)
- Natural cache validation without external metadata

All quality checks passing:
- 99 tests passing (0 failures)
- Type checking: 0 errors (mypy)
- Linting: All checks passed (ruff)

Completed Phase 2 of SECURITY_FIX_ROADMAP.md
2025-10-10 09:06:29 +02:00
Simone Scarduzio f9f2b036e3 docs: Update CHANGELOG.md for v5.0.3 release 2025-10-10 08:57:52 +02:00
Simone Scarduzio fb2877bfd3 docs: Update CHANGELOG.md for v5.0.1 release
- Document code organization improvements
- Note 26% reduction in client.py size
- List new client_operations/ package modules
- Maintain full backward compatibility
- All tests passing, type safety maintained
2025-10-09 08:31:09 +02:00
Simone Scarduzio 8bc0a0eaf3 docs: Fix outdated examples and update documentation for boto3-compatible responses
Updated all documentation to reflect the boto3-compatible dict responses:
- Fixed pagination examples in README.md to use dict access
- Updated docs/sdk/api.md with correct list_objects() signature and examples
- Added return type documentation for list_objects()
- Updated CHANGELOG.md with breaking changes and migration info

All examples now use:
- response['Contents'] instead of response.contents
- response.get('IsTruncated') instead of response.is_truncated
- response.get('NextContinuationToken') for pagination

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-08 14:33:03 +02:00
Simone Scarduzio fa1f8b85a9 docs: Update CHANGELOG for v4.2.4 2025-10-08 14:09:30 +02:00
Simone Scarduzio e706ddebdd docs: Add CHANGELOG and update documentation for v4.2.3
- Create CHANGELOG.md with release history
- Update SDK documentation with test coverage and type safety info
- Highlight 99 integration/unit tests and comprehensive coverage
- Add quality assurance badges (mypy, ruff)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-07 23:19:19 +02:00