mirror of
https://github.com/beshu-tech/deltaglider.git
synced 2026-05-19 05:16:54 +02:00
d81240be80
* fix(metadata): align direct-upload keys to canonical dg-* namespace
`_upload_direct` (the path taken by non-delta-eligible files like
.sha1 / .sha512) wrote user-metadata with bare underscored keys
(`original_name`, `file_sha256`, `compression`) while delta and
reference uploads correctly used the canonical dashed namespace
(`dg-original-name`, `dg-file-sha256`, `dg-compression`).
Downstream consumers — most visibly the DeltaGlider Proxy — only
recognised the dashed form, so every .sha1 / .sha512 listing on
a bucket holding deltaglider-uploaded files produced:
WARN PATHOLOGICAL | Missing/corrupt DG metadata for
bucket/key.sha1 -- falling back to passthrough.
Error: Storage error: Missing dg-original-name
This patch aligns the writer to the canonical scheme and keeps the
read path backward-compatible with already-stored bare-keyed objects
via `resolve_metadata`. No re-upload required.
Changes
-------
* `_upload_direct` emits metadata using `f"{METADATA_PREFIX}{key}"`
(the same pattern delta/reference uploads already use).
* `METADATA_KEY_ALIASES` now lists `compression` and `source_name`
so `resolve_metadata` works for both fields uniformly.
* Replaced bare `metadata.get("compression")` /
`metadata.get("original_name")` / `metadata.get("file_size")` /
`metadata.get("ref_key")` lookups in `DeltaService.get`,
`DeltaService.delete`, `_delete_delta`, the recursive-delete
listing path, `client.list_objects_v2`, and
`client_operations.stats.get_object_info` with `resolve_metadata`
calls so legacy bare-keyed objects keep working forever.
Tests
-----
* `tests/unit/test_metadata_aliases.py` (new, 11 tests) — pins the
alias table contract: new dashed keys, legacy bare underscored
keys, legacy hyphenated keys, priority rule, empty-string
handling.
* `test_direct_upload_emits_dashed_namespace` in
`tests/unit/test_core_service.py` — pins the writer to emit only
dg-* keys.
* Existing tests using the legacy bare `compression: "none"` form
in `test_s3_compat.py` and `test_recursive_delete_reference_*.py`
still pass — proving the dual-scheme read contract holds.
Full unit suite: 87/87 pass, mypy clean, ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(metadata): also resolve legacy file_sha256 in get() dispatch
Adversarial review of the original patch caught a second
asymmetry: DeltaService.get's "is this a regular S3 object or
DeltaGlider-managed?" dispatch was a literal-string check
`"dg-file-sha256" not in obj_head.metadata`. After the writer
fix, NEW direct uploads have `dg-file-sha256` so they route
correctly. But ~4400 pre-fix `.sha1` / `.sha512` files in
production have the bare `file_sha256` key, and they were
silently being routed through the "regular S3 object" branch
instead of the "direct upload" branch.
Both branches call `_get_direct` so file content was still
served correctly — but the wrong log message fired
("Downloading regular S3 object (no DeltaGlider metadata)") and
the recorded file-size for telemetry came from obj_head.size
instead of the metadata's `file_size` (same value for direct
uploads, but still semantically wrong).
Swap the literal-string check for `resolve_metadata(meta,
"file_sha256") is None` so both schemes route to the
DeltaGlider-managed branch.
Added regression test `test_get_legacy_direct_upload_not_
misclassified_as_regular_s3` that builds a HEAD response with
the legacy bare-keyed metadata shape (exactly what's stored on
Hetzner today for the .sha files), captures the log messages,
and fails if the "regular S3 object" canary fires.
Demonstrated locally: revert the dispatch back to literal-string
check → new test fails with the canary log line. Restore →
88/88 pass.
CHANGELOG updated to document both fixes (writer + dispatch).
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>