mirror of
https://github.com/beshu-tech/deltaglider.git
synced 2026-05-19 05:16:54 +02:00
fix(metadata): align direct-upload keys to canonical dg-* namespace (#8)
* fix(metadata): align direct-upload keys to canonical dg-* namespace
`_upload_direct` (the path taken by non-delta-eligible files like
.sha1 / .sha512) wrote user-metadata with bare underscored keys
(`original_name`, `file_sha256`, `compression`) while delta and
reference uploads correctly used the canonical dashed namespace
(`dg-original-name`, `dg-file-sha256`, `dg-compression`).
Downstream consumers — most visibly the DeltaGlider Proxy — only
recognised the dashed form, so every .sha1 / .sha512 listing on
a bucket holding deltaglider-uploaded files produced:
WARN PATHOLOGICAL | Missing/corrupt DG metadata for
bucket/key.sha1 -- falling back to passthrough.
Error: Storage error: Missing dg-original-name
This patch aligns the writer to the canonical scheme and keeps the
read path backward-compatible with already-stored bare-keyed objects
via `resolve_metadata`. No re-upload required.
Changes
-------
* `_upload_direct` emits metadata using `f"{METADATA_PREFIX}{key}"`
(the same pattern delta/reference uploads already use).
* `METADATA_KEY_ALIASES` now lists `compression` and `source_name`
so `resolve_metadata` works for both fields uniformly.
* Replaced bare `metadata.get("compression")` /
`metadata.get("original_name")` / `metadata.get("file_size")` /
`metadata.get("ref_key")` lookups in `DeltaService.get`,
`DeltaService.delete`, `_delete_delta`, the recursive-delete
listing path, `client.list_objects_v2`, and
`client_operations.stats.get_object_info` with `resolve_metadata`
calls so legacy bare-keyed objects keep working forever.
Tests
-----
* `tests/unit/test_metadata_aliases.py` (new, 11 tests) — pins the
alias table contract: new dashed keys, legacy bare underscored
keys, legacy hyphenated keys, priority rule, empty-string
handling.
* `test_direct_upload_emits_dashed_namespace` in
`tests/unit/test_core_service.py` — pins the writer to emit only
dg-* keys.
* Existing tests using the legacy bare `compression: "none"` form
in `test_s3_compat.py` and `test_recursive_delete_reference_*.py`
still pass — proving the dual-scheme read contract holds.
Full unit suite: 87/87 pass, mypy clean, ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(metadata): also resolve legacy file_sha256 in get() dispatch
Adversarial review of the original patch caught a second
asymmetry: DeltaService.get's "is this a regular S3 object or
DeltaGlider-managed?" dispatch was a literal-string check
`"dg-file-sha256" not in obj_head.metadata`. After the writer
fix, NEW direct uploads have `dg-file-sha256` so they route
correctly. But ~4400 pre-fix `.sha1` / `.sha512` files in
production have the bare `file_sha256` key, and they were
silently being routed through the "regular S3 object" branch
instead of the "direct upload" branch.
Both branches call `_get_direct` so file content was still
served correctly — but the wrong log message fired
("Downloading regular S3 object (no DeltaGlider metadata)") and
the recorded file-size for telemetry came from obj_head.size
instead of the metadata's `file_size` (same value for direct
uploads, but still semantically wrong).
Swap the literal-string check for `resolve_metadata(meta,
"file_sha256") is None` so both schemes route to the
DeltaGlider-managed branch.
Added regression test `test_get_legacy_direct_upload_not_
misclassified_as_regular_s3` that builds a HEAD response with
the legacy bare-keyed metadata shape (exactly what's stored on
Hetzner today for the .sha files), captures the log messages,
and fails if the "regular S3 object" canary fires.
Demonstrated locally: revert the dispatch back to literal-string
check → new test fails with the canary log line. Restore →
88/88 pass.
CHANGELOG updated to document both fixes (writer + dispatch).
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -5,6 +5,20 @@ All notable changes to this project will be documented in this file.
|
||||
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
|
||||
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
||||
|
||||
## [Unreleased]
|
||||
|
||||
### Fixed
|
||||
- **Direct-upload metadata now uses the canonical `dg-*` dashed namespace.** Pre-fix, files routed through `_upload_direct` (non-delta-eligible extensions: `.sha1`, `.sha512`, etc.) wrote metadata with bare underscored keys (`original_name`, `file_sha256`, `compression`) while delta and reference uploads correctly used the namespaced form (`dg-original-name`, `dg-file-sha256`, `dg-compression`). Downstream consumers — most visibly the [DeltaGlider Proxy](https://github.com/beshu-tech/deltaglider_proxy) — only recognised the dashed form, so every `.sha1`/`.sha512` listing triggered a `PATHOLOGICAL | Missing/corrupt DG metadata` warning. Aligned the writer to the canonical scheme so new uploads stop producing log spam.
|
||||
|
||||
### Changed
|
||||
- **Read path now resolves both schemes uniformly.** The historical bare keys (`original_name`, `compression`, etc.) stay in `METADATA_KEY_ALIASES` so already-stored objects keep being recognised on read — no migration required. Replaced ad-hoc `metadata.get("compression")` / `metadata.get("original_name")` / `metadata.get("file_size")` / `metadata.get("ref_key")` lookups in `DeltaService.get`, `DeltaService.delete`, `_delete_delta`, the recursive-delete listing path, `client.list_objects_v2`, and `client_operations.stats.get_object_info` with `resolve_metadata(meta, field)` calls so both schemes work transparently for the lifetime of the bucket. New `compression` and `source_name` entries added to the alias table.
|
||||
- **`DeltaService.get` "regular S3 vs DeltaGlider-managed" dispatch** now uses `resolve_metadata` for the `file_sha256` presence check. Pre-fix, this check looked for the literal string `"dg-file-sha256"` in `obj_head.metadata`, which silently misclassified legacy bare-keyed direct uploads (`file_sha256` without the `dg-` prefix) as "regular S3 objects" — they still served correctly because both branches call `_get_direct`, but the wrong log line fired and the wrong `file_size` value was recorded for telemetry. Caught during adversarial PR review.
|
||||
|
||||
### Added
|
||||
- **Regression tests for the dual-scheme contract** (`tests/unit/test_metadata_aliases.py`, 11 tests): every alias resolves, new dashed keys win when both are present, empty strings count as missing, the alias-table shape is pinned (first alias dashed, bare underscored alias always present, `compression` + `source_name` present).
|
||||
- **`test_direct_upload_emits_dashed_namespace`** in `test_core_service.py` pins the writer to emit `dg-*`-only metadata so the original underscored regression cannot return.
|
||||
- **`test_get_legacy_direct_upload_not_misclassified_as_regular_s3`** in `test_core_service.py` pins the `get()` dispatch to route bare-keyed legacy direct uploads through the DeltaGlider-managed branch (not the "regular S3 object" passthrough). Demonstrated to fail without the corresponding `resolve_metadata` swap, pass with it.
|
||||
|
||||
## [6.1.1] - 2026-03-23
|
||||
|
||||
### Fixed
|
||||
|
||||
@@ -42,7 +42,7 @@ from .client_operations.stats import StatsMode
|
||||
|
||||
from .core import DeltaService, DeltaSpace, ObjectKey
|
||||
from .core.errors import NotFoundError
|
||||
from .core.models import DeleteResult
|
||||
from .core.models import DeleteResult, resolve_metadata
|
||||
from .core.object_listing import ObjectListing, list_objects_page
|
||||
from .core.s3_uri import parse_s3_url
|
||||
from .response_builders import (
|
||||
@@ -398,10 +398,17 @@ class DeltaGliderClient:
|
||||
obj_head = self.service.storage.head(f"{Bucket}/{obj['key']}")
|
||||
if obj_head and obj_head.metadata:
|
||||
metadata = obj_head.metadata
|
||||
# Update with actual compression stats
|
||||
original_size = int(metadata.get("file_size", obj["size"]))
|
||||
# Update with actual compression stats. Use
|
||||
# `resolve_metadata` so we accept both the new
|
||||
# dashed `dg-*` keys and the legacy bare ones.
|
||||
file_size_raw = resolve_metadata(metadata, "file_size")
|
||||
original_size = int(file_size_raw) if file_size_raw else obj["size"]
|
||||
# `compression_ratio` isn't in the alias table
|
||||
# (it's a derived stat, not part of the core
|
||||
# metadata contract) so fall back to plain
|
||||
# get() with the legacy bare key.
|
||||
compression_ratio = float(metadata.get("compression_ratio", 0.0))
|
||||
reference_key = metadata.get("ref_key")
|
||||
reference_key = resolve_metadata(metadata, "ref_key")
|
||||
|
||||
deltaglider_metadata["deltaglider-original-size"] = str(original_size)
|
||||
deltaglider_metadata["deltaglider-compression-ratio"] = str(
|
||||
|
||||
@@ -17,6 +17,7 @@ from typing import Any, Literal
|
||||
|
||||
from ..client_models import BucketStats, CompressionEstimate, ObjectInfo
|
||||
from ..core.delta_extensions import is_delta_candidate
|
||||
from ..core.models import resolve_metadata
|
||||
from ..core.object_listing import list_all_objects
|
||||
from ..core.s3_uri import parse_s3_url
|
||||
|
||||
@@ -549,16 +550,22 @@ def get_object_info(
|
||||
metadata = obj_head.metadata
|
||||
is_delta = key.endswith(".delta")
|
||||
|
||||
# Use resolve_metadata for the dg-* namespace keys so we read
|
||||
# both new (dashed-prefixed) and legacy (bare underscored) uploads
|
||||
# transparently. `last_modified`, `etag`, `compression_ratio` are
|
||||
# not part of the dg-* contract — they're per-listing or derived
|
||||
# fields and stay on direct .get() lookups.
|
||||
file_size_raw = resolve_metadata(metadata, "file_size")
|
||||
return ObjectInfo(
|
||||
key=key,
|
||||
size=obj_head.size,
|
||||
last_modified=metadata.get("last_modified", ""),
|
||||
etag=metadata.get("etag"),
|
||||
original_size=int(metadata.get("file_size", obj_head.size)),
|
||||
original_size=int(file_size_raw) if file_size_raw else obj_head.size,
|
||||
compressed_size=obj_head.size,
|
||||
compression_ratio=float(metadata.get("compression_ratio", 0.0)),
|
||||
is_delta=is_delta,
|
||||
reference_key=metadata.get("ref_key"),
|
||||
reference_key=resolve_metadata(metadata, "ref_key"),
|
||||
)
|
||||
|
||||
|
||||
|
||||
@@ -58,6 +58,20 @@ METADATA_KEY_ALIASES: dict[str, tuple[str, ...]] = {
|
||||
"delta-cmd",
|
||||
),
|
||||
"note": (f"{METADATA_PREFIX}note", "dg_note", "note"),
|
||||
# `compression` was historically written bare (no prefix) by the
|
||||
# direct-upload path; v6.1.2 aligned it to the dashed namespace.
|
||||
# Both forms must continue to resolve so already-stored objects
|
||||
# keep being recognised on read.
|
||||
"compression": (f"{METADATA_PREFIX}compression", "dg_compression", "compression"),
|
||||
# `source-name` is reference-only metadata. Listed here so a
|
||||
# single call to `resolve_metadata(meta, "source_name")` works
|
||||
# uniformly with the rest of this table.
|
||||
"source_name": (
|
||||
f"{METADATA_PREFIX}source-name",
|
||||
"dg_source_name",
|
||||
"source_name",
|
||||
"source-name",
|
||||
),
|
||||
}
|
||||
|
||||
|
||||
|
||||
@@ -30,6 +30,7 @@ from .errors import (
|
||||
PolicyViolationWarning,
|
||||
)
|
||||
from .models import (
|
||||
METADATA_PREFIX,
|
||||
DeleteResult,
|
||||
DeltaMeta,
|
||||
DeltaSpace,
|
||||
@@ -177,9 +178,15 @@ class DeltaService:
|
||||
if obj_head is None:
|
||||
raise NotFoundError(f"Object not found: {object_key.key}")
|
||||
|
||||
# Check if this is a regular S3 object (not uploaded via DeltaGlider)
|
||||
# Regular S3 objects won't have DeltaGlider metadata (dg-file-sha256 key)
|
||||
if "dg-file-sha256" not in obj_head.metadata:
|
||||
# Check if this is a regular S3 object (not uploaded via
|
||||
# DeltaGlider). A DeltaGlider-managed object always carries a
|
||||
# `file_sha256` field — could be the canonical `dg-file-sha256`
|
||||
# (new direct + all delta + all reference uploads) OR the
|
||||
# legacy bare `file_sha256` (pre-v6.1.2 direct uploads). Use
|
||||
# `resolve_metadata` so both schemes route to the
|
||||
# DeltaGlider-managed download branches instead of the
|
||||
# "regular S3 object" passthrough.
|
||||
if resolve_metadata(obj_head.metadata, "file_sha256") is None:
|
||||
# This is a regular S3 object, download it directly
|
||||
self.logger.info(
|
||||
"Downloading regular S3 object (no DeltaGlider metadata)",
|
||||
@@ -198,8 +205,11 @@ class DeltaService:
|
||||
self.metrics.timing("deltaglider.get.duration", duration)
|
||||
return
|
||||
|
||||
# Check if this is a direct upload (non-delta) uploaded via DeltaGlider
|
||||
if obj_head.metadata.get("compression") == "none":
|
||||
# Check if this is a direct upload (non-delta) uploaded via
|
||||
# DeltaGlider. Use `resolve_metadata` so we recognise both the
|
||||
# legacy bare `compression` key and the new dashed
|
||||
# `dg-compression` key.
|
||||
if resolve_metadata(obj_head.metadata, "compression") == "none":
|
||||
# Direct download without delta processing
|
||||
self._get_direct(object_key, obj_head, out)
|
||||
duration = (self.clock.now() - start_time).total_seconds()
|
||||
@@ -591,14 +601,22 @@ class DeltaService:
|
||||
key = original_name
|
||||
full_key = f"{delta_space.bucket}/{key}"
|
||||
|
||||
# Create metadata for the file
|
||||
# Create metadata for the file using the dashed `dg-*`
|
||||
# namespace so direct uploads match the same scheme as delta /
|
||||
# reference uploads. Pre-v6.1.2 versions wrote these keys bare
|
||||
# (e.g. `original_name` instead of `dg-original-name`); the
|
||||
# METADATA_KEY_ALIASES table in core/models.py keeps the bare
|
||||
# forms resolvable on read so already-stored objects keep
|
||||
# working. New uploads emit the canonical dashed form so
|
||||
# downstream consumers (the Rust S3 proxy in particular) stop
|
||||
# logging PATHOLOGICAL warnings on every .sha1 / .sha512 list.
|
||||
metadata = {
|
||||
"tool": self.tool_version,
|
||||
"original_name": original_name,
|
||||
"file_sha256": file_sha256,
|
||||
"file_size": str(file_size),
|
||||
"created_at": self.clock.now().isoformat(),
|
||||
"compression": "none", # Mark as non-compressed
|
||||
f"{METADATA_PREFIX}tool": self.tool_version,
|
||||
f"{METADATA_PREFIX}original-name": original_name,
|
||||
f"{METADATA_PREFIX}file-sha256": file_sha256,
|
||||
f"{METADATA_PREFIX}file-size": str(file_size),
|
||||
f"{METADATA_PREFIX}created-at": self.clock.now().isoformat(),
|
||||
f"{METADATA_PREFIX}compression": "none", # Mark as non-compressed
|
||||
}
|
||||
|
||||
# Upload the file directly
|
||||
@@ -642,11 +660,13 @@ class DeltaService:
|
||||
self._delete_reference(object_key, full_key, result)
|
||||
elif object_key.key.endswith(".delta"):
|
||||
self._delete_delta(object_key, full_key, obj_head, result)
|
||||
elif obj_head.metadata.get("compression") == "none":
|
||||
elif resolve_metadata(obj_head.metadata, "compression") == "none":
|
||||
self.storage.delete(full_key)
|
||||
result.deleted = True
|
||||
result.type = "direct"
|
||||
result.original_name = obj_head.metadata.get("original_name", object_key.key)
|
||||
result.original_name = (
|
||||
resolve_metadata(obj_head.metadata, "original_name") or object_key.key
|
||||
)
|
||||
else:
|
||||
self.storage.delete(full_key)
|
||||
result.deleted = True
|
||||
@@ -712,7 +732,7 @@ class DeltaService:
|
||||
self.storage.delete(full_key)
|
||||
result.deleted = True
|
||||
result.type = "delta"
|
||||
result.original_name = obj_head.metadata.get("original_name", "unknown")
|
||||
result.original_name = resolve_metadata(obj_head.metadata, "original_name") or "unknown"
|
||||
|
||||
if "/" not in object_key.key:
|
||||
return
|
||||
@@ -841,7 +861,7 @@ class DeltaService:
|
||||
affected_deltaspaces.add("/".join(obj.key.split("/")[:-1]))
|
||||
else:
|
||||
obj_head = self.storage.head(f"{bucket}/{obj.key}")
|
||||
if obj_head and obj_head.metadata.get("compression") == "none":
|
||||
if obj_head and resolve_metadata(obj_head.metadata, "compression") == "none":
|
||||
direct_uploads.append(obj.key)
|
||||
else:
|
||||
other_objects.append(obj.key)
|
||||
|
||||
@@ -132,6 +132,47 @@ class TestDeltaServicePut:
|
||||
assert issubclass(w[0].category, PolicyViolationWarning)
|
||||
assert "exceeds threshold" in str(w[0].message)
|
||||
|
||||
def test_direct_upload_emits_dashed_namespace(self, service, temp_dir, mock_storage):
|
||||
"""Direct-upload (non-delta-eligible files like .sha1) must
|
||||
write metadata in the canonical dg-* dashed namespace.
|
||||
|
||||
Pre-v6.1.2 this path wrote bare underscored keys
|
||||
(``original_name``, ``file_sha256``, ``compression``) which
|
||||
downstream tools — most notably the Rust S3 proxy — didn't
|
||||
recognise, producing a PATHOLOGICAL warning for every
|
||||
listing. Pin the writer to the canonical scheme so the
|
||||
regression doesn't return.
|
||||
"""
|
||||
# .sha1 is in the non-delta extensions list → use_delta=False
|
||||
non_archive = temp_dir / "build.zip.sha1"
|
||||
non_archive.write_text("deadbeef build.zip\n")
|
||||
|
||||
delta_space = DeltaSpace(bucket="test-bucket", prefix="releases/v1")
|
||||
mock_storage.put.return_value = PutResult(etag="direct123")
|
||||
|
||||
summary = service.put(non_archive, delta_space)
|
||||
|
||||
assert summary.operation == "upload_direct"
|
||||
# Capture the metadata dict that was passed to storage.put
|
||||
# (call_args is the LAST call; direct upload makes exactly one).
|
||||
assert mock_storage.put.called
|
||||
_full_key, _local_file, emitted_meta = mock_storage.put.call_args[0]
|
||||
|
||||
# Every key must be in the dg-* dashed namespace.
|
||||
for key in emitted_meta.keys():
|
||||
assert key.startswith("dg-"), (
|
||||
f"Direct-upload metadata key {key!r} must use the dg-* "
|
||||
f"namespace (got: {list(emitted_meta.keys())})"
|
||||
)
|
||||
|
||||
# Spot-check the canonical keys carry the expected values.
|
||||
assert emitted_meta["dg-original-name"] == "build.zip.sha1"
|
||||
assert emitted_meta["dg-compression"] == "none"
|
||||
assert emitted_meta["dg-file-size"] == str(non_archive.stat().st_size)
|
||||
assert emitted_meta["dg-tool"].startswith("deltaglider/")
|
||||
assert "dg-file-sha256" in emitted_meta
|
||||
assert "dg-created-at" in emitted_meta
|
||||
|
||||
|
||||
class TestDeltaServiceGet:
|
||||
"""Test DeltaService.get method."""
|
||||
@@ -178,6 +219,70 @@ class TestDeltaServiceGet:
|
||||
assert output_path.exists()
|
||||
assert output_path.read_bytes() == test_content
|
||||
|
||||
def test_get_legacy_direct_upload_not_misclassified_as_regular_s3(
|
||||
self, service, mock_storage, temp_dir
|
||||
):
|
||||
"""Pre-v6.1.2 direct uploads have BARE metadata keys
|
||||
(``file_sha256``, ``compression``, ``original_name``) rather
|
||||
than the dashed ``dg-*`` namespace. The "is this a regular S3
|
||||
object or a DeltaGlider-managed one?" dispatch in ``get()``
|
||||
must recognise both schemes — otherwise pre-fix uploads end
|
||||
up in the wrong code path and the "Downloading regular S3
|
||||
object" log line lies about what's actually happening.
|
||||
|
||||
Regression for the dispatch asymmetry caught during PR review.
|
||||
"""
|
||||
import hashlib
|
||||
from unittest.mock import MagicMock
|
||||
|
||||
key = ObjectKey(bucket="test-bucket", key="releases/v1/build.zip.sha1")
|
||||
content = b"deadbeef build.zip\n"
|
||||
real_sha = hashlib.sha256(content).hexdigest()
|
||||
|
||||
# Legacy direct-upload shape — exactly what's stored on
|
||||
# Hetzner today for ~4400 .sha1 / .sha512 files.
|
||||
legacy_direct_meta = {
|
||||
"tool": "deltaglider/6.1.1",
|
||||
"original_name": "build.zip.sha1",
|
||||
"file_sha256": real_sha,
|
||||
"file_size": str(len(content)),
|
||||
"created_at": "2026-05-16T03:28:01.000000",
|
||||
"compression": "none",
|
||||
}
|
||||
mock_storage.head.return_value = ObjectHead(
|
||||
key="releases/v1/build.zip.sha1",
|
||||
size=len(content),
|
||||
etag="legacy",
|
||||
last_modified=None,
|
||||
metadata=legacy_direct_meta,
|
||||
)
|
||||
mock_stream = MagicMock()
|
||||
mock_stream.read.side_effect = [content, b""]
|
||||
mock_storage.get.return_value = mock_stream
|
||||
|
||||
# Capture the log messages so we can assert which branch fired.
|
||||
captured = []
|
||||
orig_info = service.logger.info
|
||||
|
||||
def _capture(msg, **kw):
|
||||
captured.append((msg, kw))
|
||||
orig_info(msg, **kw)
|
||||
|
||||
service.logger.info = _capture
|
||||
try:
|
||||
service.get(key, temp_dir / "out.sha1")
|
||||
finally:
|
||||
service.logger.info = orig_info
|
||||
|
||||
msgs = [m for m, _ in captured]
|
||||
# The dispatch must NOT have mistaken this for a "regular S3
|
||||
# object" — that branch's log message is the canary.
|
||||
assert "Downloading regular S3 object (no DeltaGlider metadata)" not in msgs, (
|
||||
"Legacy bare-keyed direct upload was misclassified as a "
|
||||
"regular S3 object — `get()` dispatch isn't using "
|
||||
"resolve_metadata for the file_sha256 presence check."
|
||||
)
|
||||
|
||||
|
||||
class TestDeltaServiceVerify:
|
||||
"""Test DeltaService.verify method."""
|
||||
|
||||
@@ -0,0 +1,122 @@
|
||||
"""Regression tests for the dual-scheme metadata read/write contract.
|
||||
|
||||
The CLI historically wrote direct-upload metadata with bare,
|
||||
underscored keys (``original_name``, ``file_sha256``, ``compression``)
|
||||
while delta uploads used the canonical dashed namespace
|
||||
(``dg-original-name``, ``dg-file-sha256``, etc.). Downstream
|
||||
consumers — most notably the Rust S3 proxy — only knew the dashed
|
||||
form, so every ``.sha1`` / ``.sha512`` direct upload triggered a
|
||||
PATHOLOGICAL warning when listed.
|
||||
|
||||
v6.1.2 aligned the writer to the dashed form, but the read path
|
||||
must keep recognising the legacy bare keys forever so already-stored
|
||||
objects don't break. These tests pin both halves of the contract.
|
||||
"""
|
||||
|
||||
from deltaglider.core.models import (
|
||||
METADATA_KEY_ALIASES,
|
||||
METADATA_PREFIX,
|
||||
resolve_metadata,
|
||||
)
|
||||
|
||||
|
||||
class TestResolveMetadataAliases:
|
||||
"""Verify resolve_metadata accepts every documented alias."""
|
||||
|
||||
def test_new_dashed_keys_resolve(self):
|
||||
"""The current canonical scheme: dg-*-with-dashes."""
|
||||
meta = {
|
||||
f"{METADATA_PREFIX}tool": "deltaglider/6.1.2",
|
||||
f"{METADATA_PREFIX}original-name": "build.zip",
|
||||
f"{METADATA_PREFIX}file-sha256": "deadbeef",
|
||||
f"{METADATA_PREFIX}file-size": "1024",
|
||||
f"{METADATA_PREFIX}created-at": "2026-05-17T00:00:00Z",
|
||||
f"{METADATA_PREFIX}compression": "none",
|
||||
}
|
||||
assert resolve_metadata(meta, "tool") == "deltaglider/6.1.2"
|
||||
assert resolve_metadata(meta, "original_name") == "build.zip"
|
||||
assert resolve_metadata(meta, "file_sha256") == "deadbeef"
|
||||
assert resolve_metadata(meta, "file_size") == "1024"
|
||||
assert resolve_metadata(meta, "created_at") == "2026-05-17T00:00:00Z"
|
||||
assert resolve_metadata(meta, "compression") == "none"
|
||||
|
||||
def test_legacy_bare_underscored_keys_resolve(self):
|
||||
"""Pre-v6.1.2 direct-upload shape used by historical .sha files."""
|
||||
meta = {
|
||||
"tool": "deltaglider/6.1.1",
|
||||
"original_name": "build.zip.sha1",
|
||||
"file_sha256": "feedface",
|
||||
"file_size": "41",
|
||||
"created_at": "2026-05-16T03:28:01.000000",
|
||||
"compression": "none",
|
||||
}
|
||||
assert resolve_metadata(meta, "tool") == "deltaglider/6.1.1"
|
||||
assert resolve_metadata(meta, "original_name") == "build.zip.sha1"
|
||||
assert resolve_metadata(meta, "file_sha256") == "feedface"
|
||||
assert resolve_metadata(meta, "file_size") == "41"
|
||||
assert resolve_metadata(meta, "created_at") == "2026-05-16T03:28:01.000000"
|
||||
assert resolve_metadata(meta, "compression") == "none"
|
||||
|
||||
def test_legacy_hyphenated_keys_resolve(self):
|
||||
"""Some old paths used hyphens without the dg- prefix."""
|
||||
meta = {
|
||||
"original-name": "old.zip",
|
||||
"file-sha256": "cafe1234",
|
||||
"file-size": "2048",
|
||||
}
|
||||
assert resolve_metadata(meta, "original_name") == "old.zip"
|
||||
assert resolve_metadata(meta, "file_sha256") == "cafe1234"
|
||||
assert resolve_metadata(meta, "file_size") == "2048"
|
||||
|
||||
def test_priority_new_wins_when_both_present(self):
|
||||
"""If both schemes happen to coexist on one object, prefer the
|
||||
canonical dashed key — that's the writer's current intent."""
|
||||
meta = {
|
||||
f"{METADATA_PREFIX}original-name": "new.zip",
|
||||
"original_name": "old.zip",
|
||||
}
|
||||
assert resolve_metadata(meta, "original_name") == "new.zip"
|
||||
|
||||
def test_missing_returns_none(self):
|
||||
assert resolve_metadata({}, "tool") is None
|
||||
assert resolve_metadata({"unrelated": "x"}, "original_name") is None
|
||||
|
||||
def test_empty_string_treated_as_missing(self):
|
||||
"""Empty values must not satisfy the resolver — callers rely on
|
||||
None to trigger the fallback branch."""
|
||||
meta = {f"{METADATA_PREFIX}original-name": ""}
|
||||
assert resolve_metadata(meta, "original_name") is None
|
||||
|
||||
|
||||
class TestAliasTableContract:
|
||||
"""Pin the alias-table shape so a future regression on the
|
||||
ordering (which would break `priority_new_wins_when_both_present`)
|
||||
is caught immediately."""
|
||||
|
||||
def test_every_field_lists_new_dashed_first(self):
|
||||
"""The first alias in each tuple must be the canonical
|
||||
dg-*-with-dashes form. This is what `resolve_metadata` relies
|
||||
on for the "new wins over legacy when both present" rule."""
|
||||
for field, aliases in METADATA_KEY_ALIASES.items():
|
||||
assert aliases[0].startswith(METADATA_PREFIX), (
|
||||
f"{field}: first alias {aliases[0]!r} must be dashed namespace"
|
||||
)
|
||||
|
||||
def test_every_field_includes_legacy_underscored_form(self):
|
||||
"""Backward compat: bare underscored key must always be in the
|
||||
alias list. Pre-v6.1.2 direct uploads use them, and they
|
||||
must keep resolving forever."""
|
||||
for field, aliases in METADATA_KEY_ALIASES.items():
|
||||
assert field in aliases, (
|
||||
f"{field}: alias list must include the bare underscored "
|
||||
f"key {field!r} for legacy-upload compatibility"
|
||||
)
|
||||
|
||||
def test_compression_field_present(self):
|
||||
"""v6.1.2 added `compression` to the alias table so the
|
||||
direct-upload sentinel works on both schemes."""
|
||||
assert "compression" in METADATA_KEY_ALIASES
|
||||
|
||||
def test_source_name_field_present(self):
|
||||
"""Reference files' source_name should resolve uniformly."""
|
||||
assert "source_name" in METADATA_KEY_ALIASES
|
||||
Reference in New Issue
Block a user