fix(metadata): align direct-upload keys to canonical dg-* namespace (#8)

* fix(metadata): align direct-upload keys to canonical dg-* namespace

`_upload_direct` (the path taken by non-delta-eligible files like
.sha1 / .sha512) wrote user-metadata with bare underscored keys
(`original_name`, `file_sha256`, `compression`) while delta and
reference uploads correctly used the canonical dashed namespace
(`dg-original-name`, `dg-file-sha256`, `dg-compression`).

Downstream consumers — most visibly the DeltaGlider Proxy — only
recognised the dashed form, so every .sha1 / .sha512 listing on
a bucket holding deltaglider-uploaded files produced:

    WARN PATHOLOGICAL | Missing/corrupt DG metadata for
    bucket/key.sha1 -- falling back to passthrough.
    Error: Storage error: Missing dg-original-name

This patch aligns the writer to the canonical scheme and keeps the
read path backward-compatible with already-stored bare-keyed objects
via `resolve_metadata`. No re-upload required.

Changes
-------
* `_upload_direct` emits metadata using `f"{METADATA_PREFIX}{key}"`
  (the same pattern delta/reference uploads already use).
* `METADATA_KEY_ALIASES` now lists `compression` and `source_name`
  so `resolve_metadata` works for both fields uniformly.
* Replaced bare `metadata.get("compression")` /
  `metadata.get("original_name")` / `metadata.get("file_size")` /
  `metadata.get("ref_key")` lookups in `DeltaService.get`,
  `DeltaService.delete`, `_delete_delta`, the recursive-delete
  listing path, `client.list_objects_v2`, and
  `client_operations.stats.get_object_info` with `resolve_metadata`
  calls so legacy bare-keyed objects remain readable without re-upload.
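
A minimal sketch of the dual-scheme contract described above, not the
shipped implementation. The alias entries, their priority order
(canonical dashed key first) and the treatment of empty strings as
absent are assumptions made for illustration; only the names
`METADATA_PREFIX`, `METADATA_KEY_ALIASES` and `resolve_metadata` come
from this commit. `direct_upload_metadata` is a hypothetical helper.

```python
from typing import Mapping, Optional

METADATA_PREFIX = "dg-"  # canonical namespace named in the commit message

# Assumed alias table: logical name -> stored keys, highest priority first.
METADATA_KEY_ALIASES = {
    "original_name": (f"{METADATA_PREFIX}original-name", "original_name", "original-name"),
    "file_sha256": (f"{METADATA_PREFIX}file-sha256", "file_sha256", "file-sha256"),
    "file_size": (f"{METADATA_PREFIX}file-size", "file_size", "file-size"),
    "compression": (f"{METADATA_PREFIX}compression", "compression"),
}


def resolve_metadata(metadata: Mapping[str, str], logical_key: str) -> Optional[str]:
    """Return the first non-empty value stored under any alias, else None."""
    for candidate in METADATA_KEY_ALIASES.get(logical_key, ()):
        value = metadata.get(candidate)
        if value:  # assumption: an empty string counts as missing
            return value
    return None


def direct_upload_metadata(fields: Mapping[str, str]) -> dict:
    """Hypothetical writer helper: emit only canonical dg-* keys."""
    return {f"{METADATA_PREFIX}{key}": value for key, value in fields.items()}
```

Under these assumptions a newly written object and a legacy one resolve
to the same value, for example
`resolve_metadata({"compression": "none"}, "compression")` and
`resolve_metadata({"dg-compression": "none"}, "compression")`.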

Tests
-----
* `tests/unit/test_metadata_aliases.py` (new, 11 tests) — pins the
  alias table contract: new dashed keys, legacy bare underscored
  keys, legacy hyphenated keys, priority rule, empty-string
  handling.
* `test_direct_upload_emits_dashed_namespace` in
  `tests/unit/test_core_service.py` — pins the writer to emit only
  dg-* keys.
* Existing tests using the legacy bare `compression: "none"` form
  in `test_s3_compat.py` and `test_recursive_delete_reference_*.py`
  still pass — proving the dual-scheme read contract holds.

Full unit suite: 87/87 pass, mypy clean, ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(metadata): also resolve legacy file_sha256 in get() dispatch

Adversarial review of the original patch caught a second
asymmetry: DeltaService.get's "is this a regular S3 object or
DeltaGlider-managed?" dispatch was a literal-string check
`"dg-file-sha256" not in obj_head.metadata`. After the writer
fix, NEW direct uploads have `dg-file-sha256` so they route
correctly. But ~4400 pre-fix `.sha1` / `.sha512` files in
production have the bare `file_sha256` key, and they were
silently being routed through the "regular S3 object" branch
instead of the "direct upload" branch.

Both branches call `_get_direct` so file content was still
served correctly — but the wrong log message fired
("Downloading regular S3 object (no DeltaGlider metadata)") and
the recorded file-size for telemetry came from obj_head.size
instead of the metadata's `file_size` (same value for direct
uploads, but still semantically wrong).

Swap the literal-string check for `resolve_metadata(meta,
"file_sha256") is None` so both schemes route to the
DeltaGlider-managed branch.
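
The difference between the two checks can be sketched as follows. This
is an illustration only: `lookup_file_sha256` is a simplified stand-in
for `resolve_metadata(meta, "file_sha256")` that knows just the two
spellings named in this message, and both predicate names are
hypothetical.

```python
from typing import Mapping, Optional

# Stand-in alias list: canonical key first, then the legacy bare key.
_FILE_SHA256_KEYS = ("dg-file-sha256", "file_sha256")


def lookup_file_sha256(metadata: Mapping[str, str]) -> Optional[str]:
    """Simplified stand-in for resolve_metadata(meta, "file_sha256")."""
    for key in _FILE_SHA256_KEYS:
        value = metadata.get(key)
        if value:
            return value
    return None


def is_regular_s3_object_before(metadata: Mapping[str, str]) -> bool:
    # Old dispatch: literal-string check, blind to the legacy bare key.
    return "dg-file-sha256" not in metadata


def is_regular_s3_object_after(metadata: Mapping[str, str]) -> bool:
    # New dispatch: any recognised spelling marks the object as managed.
    return lookup_file_sha256(metadata) is None
```

For a legacy HEAD shape such as `{"file_sha256": "abc"}` the old
predicate returns `True` (misclassified as a regular S3 object) while
the new one returns `False`.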

Added regression test
`test_get_legacy_direct_upload_not_misclassified_as_regular_s3`
that builds a HEAD response with the legacy bare-keyed metadata
shape (exactly what's stored on Hetzner today for the .sha files),
captures the log messages, and fails if the "regular S3 object"
canary fires.

Demonstrated locally: reverting the dispatch to the literal-string
check makes the new test fail on the canary log line; restoring the
fix gives 88/88 passing.
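
The shape of such a canary test can be sketched with the standard
`logging` module. Everything here is a hypothetical stand-in: `get`
mimics only the dispatch decision, the "direct upload" log text and the
metadata values are invented for illustration, and the real test in
`tests/unit/test_core_service.py` may be structured differently. Only
the canary string is quoted from this message.

```python
import logging

CANARY = "Downloading regular S3 object (no DeltaGlider metadata)"
logger = logging.getLogger("dispatch_sketch")
logger.setLevel(logging.INFO)


def get(metadata):
    """Hypothetical stand-in for the dispatch inside DeltaService.get."""
    sha = metadata.get("dg-file-sha256") or metadata.get("file_sha256")
    if sha is None:
        logger.info(CANARY)
        return "regular"
    logger.info("Downloading direct upload")  # invented message
    return "direct"


class ListHandler(logging.Handler):
    """Collects formatted log messages so a test can inspect them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def captured_messages(metadata):
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        get(metadata)
    finally:
        logger.removeHandler(handler)
    return handler.messages


def test_legacy_direct_upload_not_misclassified():
    # Illustrative legacy bare-keyed HEAD metadata; values are made up.
    legacy = {"file_sha256": "0" * 64, "original_name": "artifact.sha1"}
    assert CANARY not in captured_messages(legacy)
```

Asserting on the absence of the canary line, instead of on the returned
content, matters here because both branches serve the same bytes.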

CHANGELOG updated to document both fixes (writer + dispatch).

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Simone Scarduzio
2026-05-17 10:28:25 +02:00
committed by GitHub
parent a98fc7c178
commit d81240be80
7 changed files with 311 additions and 22 deletions
+11 -4
@@ -42,7 +42,7 @@ from .client_operations.stats import StatsMode
 from .core import DeltaService, DeltaSpace, ObjectKey
 from .core.errors import NotFoundError
-from .core.models import DeleteResult
+from .core.models import DeleteResult, resolve_metadata
 from .core.object_listing import ObjectListing, list_objects_page
 from .core.s3_uri import parse_s3_url
 from .response_builders import (
@@ -398,10 +398,17 @@ class DeltaGliderClient:
 obj_head = self.service.storage.head(f"{Bucket}/{obj['key']}")
 if obj_head and obj_head.metadata:
     metadata = obj_head.metadata
-    # Update with actual compression stats
-    original_size = int(metadata.get("file_size", obj["size"]))
+    # Update with actual compression stats. Use
+    # `resolve_metadata` so we accept both the new
+    # dashed `dg-*` keys and the legacy bare ones.
+    file_size_raw = resolve_metadata(metadata, "file_size")
+    original_size = int(file_size_raw) if file_size_raw else obj["size"]
+    # `compression_ratio` isn't in the alias table
+    # (it's a derived stat, not part of the core
+    # metadata contract) so fall back to plain
+    # get() with the legacy bare key.
     compression_ratio = float(metadata.get("compression_ratio", 0.0))
-    reference_key = metadata.get("ref_key")
+    reference_key = resolve_metadata(metadata, "ref_key")
     deltaglider_metadata["deltaglider-original-size"] = str(original_size)
     deltaglider_metadata["deltaglider-compression-ratio"] = str(