mirror of
https://github.com/beshu-tech/deltaglider.git
synced 2026-05-21 06:17:07 +02:00
Compare commits
1 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| ea8b9aa78b |
@@ -98,7 +98,7 @@ jobs:
|
||||
runs-on: ubuntu-latest
|
||||
services:
|
||||
localstack:
|
||||
image: localstack/localstack:4.4
|
||||
image: localstack/localstack:latest
|
||||
ports:
|
||||
- 4566:4566
|
||||
env:
|
||||
|
||||
@@ -146,7 +146,7 @@ jobs:
|
||||
runs-on: ubuntu-latest
|
||||
services:
|
||||
localstack:
|
||||
image: localstack/localstack:4.4
|
||||
image: localstack/localstack:latest
|
||||
ports:
|
||||
- 4566:4566
|
||||
env:
|
||||
|
||||
@@ -150,7 +150,7 @@ jobs:
|
||||
runs-on: ubuntu-latest
|
||||
services:
|
||||
localstack:
|
||||
image: localstack/localstack:4.4
|
||||
image: localstack/localstack:latest
|
||||
ports:
|
||||
- 4566:4566
|
||||
env:
|
||||
|
||||
@@ -5,33 +5,6 @@ All notable changes to this project will be documented in this file.
|
||||
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
|
||||
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
||||
|
||||
## [Unreleased]
|
||||
|
||||
### Fixed
|
||||
- **Direct-upload metadata now uses the canonical `dg-*` dashed namespace.** Pre-fix, files routed through `_upload_direct` (non-delta-eligible extensions: `.sha1`, `.sha512`, etc.) wrote metadata with bare underscored keys (`original_name`, `file_sha256`, `compression`) while delta and reference uploads correctly used the namespaced form (`dg-original-name`, `dg-file-sha256`, `dg-compression`). Downstream consumers — most visibly the [DeltaGlider Proxy](https://github.com/beshu-tech/deltaglider_proxy) — only recognised the dashed form, so every `.sha1`/`.sha512` listing triggered a `PATHOLOGICAL | Missing/corrupt DG metadata` warning. Aligned the writer to the canonical scheme so new uploads stop producing log spam.
|
||||
|
||||
### Changed
|
||||
- **Read path now resolves both schemes uniformly.** The historical bare keys (`original_name`, `compression`, etc.) stay in `METADATA_KEY_ALIASES` so already-stored objects keep being recognised on read — no migration required. Replaced ad-hoc `metadata.get("compression")` / `metadata.get("original_name")` / `metadata.get("file_size")` / `metadata.get("ref_key")` lookups in `DeltaService.get`, `DeltaService.delete`, `_delete_delta`, the recursive-delete listing path, `client.list_objects_v2`, and `client_operations.stats.get_object_info` with `resolve_metadata(meta, field)` calls so both schemes work transparently for the lifetime of the bucket. New `compression` and `source_name` entries added to the alias table.
|
||||
- **`DeltaService.get` "regular S3 vs DeltaGlider-managed" dispatch** now uses `resolve_metadata` for the `file_sha256` presence check. Pre-fix, this check looked for the literal string `"dg-file-sha256"` in `obj_head.metadata`, which silently misclassified legacy bare-keyed direct uploads (`file_sha256` without the `dg-` prefix) as "regular S3 objects" — they still served correctly because both branches call `_get_direct`, but the wrong log line fired and the wrong `file_size` value was recorded for telemetry. Caught during adversarial PR review.
|
||||
|
||||
### Added
|
||||
- **Regression tests for the dual-scheme contract** (`tests/unit/test_metadata_aliases.py`, 11 tests): every alias resolves, new dashed keys win when both are present, empty strings count as missing, the alias-table shape is pinned (first alias dashed, bare underscored alias always present, `compression` + `source_name` present).
|
||||
- **`test_direct_upload_emits_dashed_namespace`** in `test_core_service.py` pins the writer to emit `dg-*`-only metadata so the original underscored regression cannot return.
|
||||
- **`test_get_legacy_direct_upload_not_misclassified_as_regular_s3`** in `test_core_service.py` pins the `get()` dispatch to route bare-keyed legacy direct uploads through the DeltaGlider-managed branch (not the "regular S3 object" passthrough). Demonstrated to fail without the corresponding `resolve_metadata` swap, pass with it.
|
||||
|
||||
## [6.1.1] - 2026-03-23
|
||||
|
||||
### Fixed
|
||||
- **S3-Compatible Endpoint Support**: Disabled boto3 automatic request checksums (CRC32/CRC64) that were added in boto3 1.36+. S3-compatible stores like Hetzner Object Storage reject these headers with `BadRequest`, breaking direct (non-delta) file uploads. Sets `request_checksum_calculation="when_required"` to restore compatibility while still working with AWS S3.
|
||||
- **CI: LocalStack pinned to 4.4** — `localstack/localstack:latest` now requires a paid license; pinned to last free version across all workflows and docker-compose files.
|
||||
|
||||
### Changed
|
||||
- **Dependency Pinning**: All runtime dependencies now use major-version upper bounds (`boto3>=1.35.0,<2.0.0`, etc.) to prevent surprise breaking changes in Docker builds.
|
||||
|
||||
### Added
|
||||
- **S3 Compatibility Tests**: New `test_s3_compat.py` unit tests verifying the boto3 client disables automatic checksums and `put_object` doesn't pass checksum kwargs — regression protection for non-AWS S3 endpoints.
|
||||
- **Dependency Management Guide**: Added quarterly dependency refresh checklist and known compatibility constraints to CLAUDE.md.
|
||||
|
||||
## [6.1.0] - 2025-02-07
|
||||
|
||||
### Added
|
||||
|
||||
@@ -256,24 +256,4 @@ Core delta logic is in `src/deltaglider/core/service.py`:
|
||||
- **Auto-Cleanup**: Corrupted or tampered cache files automatically deleted on decryption failures
|
||||
- **Persistent Keys**: Set `DG_CACHE_ENCRYPTION_KEY` only for cross-process cache sharing (use secrets management)
|
||||
- **Content-Addressed Storage**: SHA256-based filenames prevent collision attacks
|
||||
- **Zero-Trust Cache**: All cache operations include cryptographic validation
|
||||
|
||||
## Dependency Management
|
||||
|
||||
### Pinning Strategy
|
||||
Runtime dependencies in `pyproject.toml` use **compatible range pins** (`>=x.y.z,<NEXT_MAJOR`). This prevents surprise breaking changes from major versions while allowing patch/minor updates.
|
||||
|
||||
**Critical dependency: `boto3`** — This is the most breakage-prone dependency. AWS periodically changes default behaviors in minor releases (e.g., boto3 1.36+ added automatic request checksums that break S3-compatible stores like Hetzner Object Storage). The S3 adapter (`adapters/storage_s3.py`) explicitly sets `request_checksum_calculation="when_required"` to maintain compatibility with non-AWS S3 endpoints.
|
||||
|
||||
### Quarterly Dependency Refresh (do every ~3 months)
|
||||
1. **Check for updates**: `uv pip compile pyproject.toml --upgrade --dry-run`
|
||||
2. **Update in a branch**: bump version floors in `pyproject.toml` to current stable releases
|
||||
3. **Run full test suite**: `uv run pytest` (unit + integration)
|
||||
4. **Test against S3-compatible stores**: test a small file upload against Hetzner (or whichever non-AWS endpoint is in use) — boto3 updates are the most likely to break this
|
||||
5. **Rebuild Docker image** and test the same upload from the container
|
||||
6. **Check changelogs** for boto3, cryptography, and click for any deprecation notices or behavior changes
|
||||
|
||||
### Known Compatibility Constraints
|
||||
- **boto3**: Must use `request_checksum_calculation="when_required"` for Hetzner/MinIO compatibility. If upgrading past a new major behavior change, test direct uploads (non-delta path) of small files to non-AWS endpoints.
|
||||
- **cryptography**: Fernet API has been stable, but major versions may drop old OpenSSL support. Verify cache encryption still works after upgrades.
|
||||
- **click**: CLI argument parsing. Major versions may change decorator behavior. Run integration tests (`test_aws_cli_commands_v2.py`) after upgrades.
|
||||
- **Zero-Trust Cache**: All cache operations include cryptographic validation
|
||||
@@ -2,7 +2,7 @@ version: '3.8'
|
||||
|
||||
services:
|
||||
localstack:
|
||||
image: localstack/localstack:4.4
|
||||
image: localstack/localstack:latest
|
||||
ports:
|
||||
- "4566:4566"
|
||||
environment:
|
||||
|
||||
+1
-1
@@ -22,7 +22,7 @@ services:
|
||||
retries: 5
|
||||
|
||||
localstack:
|
||||
image: localstack/localstack:4.4
|
||||
image: localstack/localstack:latest
|
||||
container_name: deltaglider-localstack
|
||||
ports:
|
||||
- "4566:4566"
|
||||
|
||||
+5
-5
@@ -49,11 +49,11 @@ classifiers = [
|
||||
]
|
||||
|
||||
dependencies = [
|
||||
"boto3>=1.35.0,<2.0.0",
|
||||
"click>=8.1.0,<9.0.0",
|
||||
"cryptography>=42.0.0,<45.0.0",
|
||||
"python-dateutil>=2.9.0,<3.0.0",
|
||||
"requests>=2.32.0,<3.0.0",
|
||||
"boto3>=1.35.0",
|
||||
"click>=8.1.0",
|
||||
"cryptography>=42.0.0",
|
||||
"python-dateutil>=2.9.0",
|
||||
"requests>=2.32.0",
|
||||
]
|
||||
|
||||
[project.urls]
|
||||
|
||||
@@ -7,7 +7,6 @@ from pathlib import Path
|
||||
from typing import TYPE_CHECKING, Any, BinaryIO, Optional
|
||||
|
||||
import boto3
|
||||
from botocore.config import Config
|
||||
from botocore.exceptions import ClientError
|
||||
|
||||
from ..ports.storage import ObjectHead, PutResult, StoragePort
|
||||
@@ -43,13 +42,6 @@ class S3StorageAdapter(StoragePort):
|
||||
client_params: dict[str, Any] = {
|
||||
"service_name": "s3",
|
||||
"endpoint_url": endpoint_url or os.environ.get("AWS_ENDPOINT_URL"),
|
||||
# Disable automatic request checksums (CRC32/CRC64) added in
|
||||
# boto3 1.36+. S3-compatible stores like Hetzner Object Storage
|
||||
# reject the checksum headers with BadRequest.
|
||||
"config": Config(
|
||||
request_checksum_calculation="when_required",
|
||||
response_checksum_validation="when_required",
|
||||
),
|
||||
}
|
||||
|
||||
# Merge in any additional boto3 kwargs (credentials, region, etc.)
|
||||
@@ -233,94 +225,47 @@ class S3StorageAdapter(StoragePort):
|
||||
f"AWS S3 limit (2KB). Some metadata may be lost!"
|
||||
)
|
||||
|
||||
import time
|
||||
try:
|
||||
response = self.client.put_object(
|
||||
Bucket=bucket,
|
||||
Key=object_key,
|
||||
Body=body_data,
|
||||
ContentType=content_type,
|
||||
Metadata=clean_metadata,
|
||||
)
|
||||
|
||||
max_retries = 3
|
||||
last_error: ClientError | None = None
|
||||
# VERIFICATION: Check if metadata was actually stored (especially for delta files)
|
||||
if object_key.endswith(".delta") and clean_metadata:
|
||||
try:
|
||||
# Verify metadata was stored by doing a HEAD immediately
|
||||
verify_response = self.client.head_object(Bucket=bucket, Key=object_key)
|
||||
stored_metadata = verify_response.get("Metadata", {})
|
||||
|
||||
for attempt in range(max_retries):
|
||||
try:
|
||||
response = self.client.put_object(
|
||||
Bucket=bucket,
|
||||
Key=object_key,
|
||||
Body=body_data,
|
||||
ContentType=content_type,
|
||||
Metadata=clean_metadata,
|
||||
)
|
||||
if not stored_metadata:
|
||||
logger.error(
|
||||
f"PUT {object_key}: CRITICAL - Metadata was sent but NOT STORED! "
|
||||
f"Sent {len(clean_metadata)} keys, received 0 keys back."
|
||||
)
|
||||
elif len(stored_metadata) < len(clean_metadata):
|
||||
missing_keys = set(clean_metadata.keys()) - set(stored_metadata.keys())
|
||||
logger.warning(
|
||||
f"PUT {object_key}: Metadata partially stored. "
|
||||
f"Sent {len(clean_metadata)} keys, stored {len(stored_metadata)} keys. "
|
||||
f"Missing keys: {missing_keys}"
|
||||
)
|
||||
elif logger.isEnabledFor(logging.DEBUG):
|
||||
logger.debug(
|
||||
f"PUT {object_key}: Metadata verified - all {len(clean_metadata)} keys stored"
|
||||
)
|
||||
except Exception as e:
|
||||
logger.warning(f"PUT {object_key}: Could not verify metadata: {e}")
|
||||
|
||||
# VERIFICATION: Check if metadata was actually stored (especially for delta files)
|
||||
if object_key.endswith(".delta") and clean_metadata:
|
||||
try:
|
||||
# Verify metadata was stored by doing a HEAD immediately
|
||||
verify_response = self.client.head_object(Bucket=bucket, Key=object_key)
|
||||
stored_metadata = verify_response.get("Metadata", {})
|
||||
|
||||
if not stored_metadata:
|
||||
logger.error(
|
||||
f"PUT {object_key}: CRITICAL - Metadata was sent but NOT STORED! "
|
||||
f"Sent {len(clean_metadata)} keys, received 0 keys back."
|
||||
)
|
||||
elif len(stored_metadata) < len(clean_metadata):
|
||||
missing_keys = set(clean_metadata.keys()) - set(stored_metadata.keys())
|
||||
logger.warning(
|
||||
f"PUT {object_key}: Metadata partially stored. "
|
||||
f"Sent {len(clean_metadata)} keys, stored {len(stored_metadata)} keys. "
|
||||
f"Missing keys: {missing_keys}"
|
||||
)
|
||||
elif logger.isEnabledFor(logging.DEBUG):
|
||||
logger.debug(
|
||||
f"PUT {object_key}: Metadata verified - "
|
||||
f"all {len(clean_metadata)} keys stored"
|
||||
)
|
||||
except Exception as e:
|
||||
logger.warning(f"PUT {object_key}: Could not verify metadata: {e}")
|
||||
|
||||
return PutResult(
|
||||
etag=response["ETag"].strip('"'),
|
||||
version_id=response.get("VersionId"),
|
||||
)
|
||||
except ClientError as e:
|
||||
last_error = e
|
||||
if attempt < max_retries - 1:
|
||||
delay = 2**attempt # 1s, 2s
|
||||
# Log full error details
|
||||
error_response = e.response if hasattr(e, "response") else {}
|
||||
http_headers = error_response.get("ResponseMetadata", {}).get("HTTPHeaders", {})
|
||||
logger.warning(
|
||||
f"PUT {object_key}: Attempt {attempt + 1}/{max_retries} failed: {e}. "
|
||||
f"Retrying in {delay}s... "
|
||||
f"Details: bucket={bucket}, key={object_key}, "
|
||||
f"body_size={len(body_data)}, content_type={content_type}, "
|
||||
f"metadata_keys={list(clean_metadata.keys())}, "
|
||||
f"endpoint={self.client.meta.endpoint_url}, "
|
||||
f"http_status={error_response.get('ResponseMetadata', {}).get('HTTPStatusCode')}, "
|
||||
f"error_code={error_response.get('Error', {}).get('Code')}, "
|
||||
f"error_message={error_response.get('Error', {}).get('Message')}, "
|
||||
f"request_id={error_response.get('ResponseMetadata', {}).get('RequestId')}, "
|
||||
f"http_headers={dict(http_headers)}"
|
||||
)
|
||||
# Enable botocore wire-level logging for the retry
|
||||
logging.getLogger("botocore").setLevel(logging.DEBUG)
|
||||
time.sleep(delay)
|
||||
else:
|
||||
# Final attempt failed — log everything
|
||||
error_response = e.response if hasattr(e, "response") else {}
|
||||
http_headers = error_response.get("ResponseMetadata", {}).get("HTTPHeaders", {})
|
||||
logger.error(
|
||||
f"PUT {object_key}: All {max_retries} attempts failed. "
|
||||
f"Last error: {e}. "
|
||||
f"Details: bucket={bucket}, key={object_key}, "
|
||||
f"body_size={len(body_data)}, content_type={content_type}, "
|
||||
f"metadata={clean_metadata}, "
|
||||
f"endpoint={self.client.meta.endpoint_url}, "
|
||||
f"http_status={error_response.get('ResponseMetadata', {}).get('HTTPStatusCode')}, "
|
||||
f"error_code={error_response.get('Error', {}).get('Code')}, "
|
||||
f"error_message={error_response.get('Error', {}).get('Message')}, "
|
||||
f"request_id={error_response.get('ResponseMetadata', {}).get('RequestId')}, "
|
||||
f"http_headers={dict(http_headers)}"
|
||||
)
|
||||
|
||||
raise RuntimeError(f"Failed to put object: {last_error}") from last_error
|
||||
return PutResult(
|
||||
etag=response["ETag"].strip('"'),
|
||||
version_id=response.get("VersionId"),
|
||||
)
|
||||
except ClientError as e:
|
||||
raise RuntimeError(f"Failed to put object: {e}") from e
|
||||
|
||||
def delete(self, key: str) -> None:
|
||||
"""Delete object."""
|
||||
|
||||
@@ -155,11 +155,8 @@ def _version_callback(ctx: click.Context, param: click.Parameter, value: bool) -
|
||||
@click.pass_context
|
||||
def cli(ctx: click.Context, debug: bool) -> None:
|
||||
"""DeltaGlider - Delta-aware S3 file storage wrapper."""
|
||||
import logging
|
||||
|
||||
log_level = "DEBUG" if debug else os.environ.get("DG_LOG_LEVEL", "INFO")
|
||||
ctx.obj = create_service(log_level)
|
||||
logging.getLogger("deltaglider").info("deltaglider %s", __version__)
|
||||
|
||||
|
||||
@cli.command()
|
||||
|
||||
@@ -42,7 +42,7 @@ from .client_operations.stats import StatsMode
|
||||
|
||||
from .core import DeltaService, DeltaSpace, ObjectKey
|
||||
from .core.errors import NotFoundError
|
||||
from .core.models import DeleteResult, resolve_metadata
|
||||
from .core.models import DeleteResult
|
||||
from .core.object_listing import ObjectListing, list_objects_page
|
||||
from .core.s3_uri import parse_s3_url
|
||||
from .response_builders import (
|
||||
@@ -398,17 +398,10 @@ class DeltaGliderClient:
|
||||
obj_head = self.service.storage.head(f"{Bucket}/{obj['key']}")
|
||||
if obj_head and obj_head.metadata:
|
||||
metadata = obj_head.metadata
|
||||
# Update with actual compression stats. Use
|
||||
# `resolve_metadata` so we accept both the new
|
||||
# dashed `dg-*` keys and the legacy bare ones.
|
||||
file_size_raw = resolve_metadata(metadata, "file_size")
|
||||
original_size = int(file_size_raw) if file_size_raw else obj["size"]
|
||||
# `compression_ratio` isn't in the alias table
|
||||
# (it's a derived stat, not part of the core
|
||||
# metadata contract) so fall back to plain
|
||||
# get() with the legacy bare key.
|
||||
# Update with actual compression stats
|
||||
original_size = int(metadata.get("file_size", obj["size"]))
|
||||
compression_ratio = float(metadata.get("compression_ratio", 0.0))
|
||||
reference_key = resolve_metadata(metadata, "ref_key")
|
||||
reference_key = metadata.get("ref_key")
|
||||
|
||||
deltaglider_metadata["deltaglider-original-size"] = str(original_size)
|
||||
deltaglider_metadata["deltaglider-compression-ratio"] = str(
|
||||
@@ -1453,7 +1446,7 @@ def create_client(
|
||||
)
|
||||
|
||||
# SECURITY: Always use ephemeral process-isolated cache
|
||||
cache_dir = Path(tempfile.mkdtemp(prefix="deltaglider-", dir="/tmp"))
|
||||
cache_dir = Path(tempfile.mkdtemp(prefix="deltaglider-"))
|
||||
# Register cleanup handler to remove cache on exit
|
||||
atexit.register(lambda: shutil.rmtree(cache_dir, ignore_errors=True))
|
||||
|
||||
|
||||
@@ -17,7 +17,6 @@ from typing import Any, Literal
|
||||
|
||||
from ..client_models import BucketStats, CompressionEstimate, ObjectInfo
|
||||
from ..core.delta_extensions import is_delta_candidate
|
||||
from ..core.models import resolve_metadata
|
||||
from ..core.object_listing import list_all_objects
|
||||
from ..core.s3_uri import parse_s3_url
|
||||
|
||||
@@ -550,22 +549,16 @@ def get_object_info(
|
||||
metadata = obj_head.metadata
|
||||
is_delta = key.endswith(".delta")
|
||||
|
||||
# Use resolve_metadata for the dg-* namespace keys so we read
|
||||
# both new (dashed-prefixed) and legacy (bare underscored) uploads
|
||||
# transparently. `last_modified`, `etag`, `compression_ratio` are
|
||||
# not part of the dg-* contract — they're per-listing or derived
|
||||
# fields and stay on direct .get() lookups.
|
||||
file_size_raw = resolve_metadata(metadata, "file_size")
|
||||
return ObjectInfo(
|
||||
key=key,
|
||||
size=obj_head.size,
|
||||
last_modified=metadata.get("last_modified", ""),
|
||||
etag=metadata.get("etag"),
|
||||
original_size=int(file_size_raw) if file_size_raw else obj_head.size,
|
||||
original_size=int(metadata.get("file_size", obj_head.size)),
|
||||
compressed_size=obj_head.size,
|
||||
compression_ratio=float(metadata.get("compression_ratio", 0.0)),
|
||||
is_delta=is_delta,
|
||||
reference_key=resolve_metadata(metadata, "ref_key"),
|
||||
reference_key=metadata.get("ref_key"),
|
||||
)
|
||||
|
||||
|
||||
|
||||
@@ -58,20 +58,6 @@ METADATA_KEY_ALIASES: dict[str, tuple[str, ...]] = {
|
||||
"delta-cmd",
|
||||
),
|
||||
"note": (f"{METADATA_PREFIX}note", "dg_note", "note"),
|
||||
# `compression` was historically written bare (no prefix) by the
|
||||
# direct-upload path; v6.1.2 aligned it to the dashed namespace.
|
||||
# Both forms must continue to resolve so already-stored objects
|
||||
# keep being recognised on read.
|
||||
"compression": (f"{METADATA_PREFIX}compression", "dg_compression", "compression"),
|
||||
# `source-name` is reference-only metadata. Listed here so a
|
||||
# single call to `resolve_metadata(meta, "source_name")` works
|
||||
# uniformly with the rest of this table.
|
||||
"source_name": (
|
||||
f"{METADATA_PREFIX}source-name",
|
||||
"dg_source_name",
|
||||
"source_name",
|
||||
"source-name",
|
||||
),
|
||||
}
|
||||
|
||||
|
||||
|
||||
@@ -30,7 +30,6 @@ from .errors import (
|
||||
PolicyViolationWarning,
|
||||
)
|
||||
from .models import (
|
||||
METADATA_PREFIX,
|
||||
DeleteResult,
|
||||
DeltaMeta,
|
||||
DeltaSpace,
|
||||
@@ -178,15 +177,9 @@ class DeltaService:
|
||||
if obj_head is None:
|
||||
raise NotFoundError(f"Object not found: {object_key.key}")
|
||||
|
||||
# Check if this is a regular S3 object (not uploaded via
|
||||
# DeltaGlider). A DeltaGlider-managed object always carries a
|
||||
# `file_sha256` field — could be the canonical `dg-file-sha256`
|
||||
# (new direct + all delta + all reference uploads) OR the
|
||||
# legacy bare `file_sha256` (pre-v6.1.2 direct uploads). Use
|
||||
# `resolve_metadata` so both schemes route to the
|
||||
# DeltaGlider-managed download branches instead of the
|
||||
# "regular S3 object" passthrough.
|
||||
if resolve_metadata(obj_head.metadata, "file_sha256") is None:
|
||||
# Check if this is a regular S3 object (not uploaded via DeltaGlider)
|
||||
# Regular S3 objects won't have DeltaGlider metadata (dg-file-sha256 key)
|
||||
if "dg-file-sha256" not in obj_head.metadata:
|
||||
# This is a regular S3 object, download it directly
|
||||
self.logger.info(
|
||||
"Downloading regular S3 object (no DeltaGlider metadata)",
|
||||
@@ -205,11 +198,8 @@ class DeltaService:
|
||||
self.metrics.timing("deltaglider.get.duration", duration)
|
||||
return
|
||||
|
||||
# Check if this is a direct upload (non-delta) uploaded via
|
||||
# DeltaGlider. Use `resolve_metadata` so we recognise both the
|
||||
# legacy bare `compression` key and the new dashed
|
||||
# `dg-compression` key.
|
||||
if resolve_metadata(obj_head.metadata, "compression") == "none":
|
||||
# Check if this is a direct upload (non-delta) uploaded via DeltaGlider
|
||||
if obj_head.metadata.get("compression") == "none":
|
||||
# Direct download without delta processing
|
||||
self._get_direct(object_key, obj_head, out)
|
||||
duration = (self.clock.now() - start_time).total_seconds()
|
||||
@@ -601,22 +591,14 @@ class DeltaService:
|
||||
key = original_name
|
||||
full_key = f"{delta_space.bucket}/{key}"
|
||||
|
||||
# Create metadata for the file using the dashed `dg-*`
|
||||
# namespace so direct uploads match the same scheme as delta /
|
||||
# reference uploads. Pre-v6.1.2 versions wrote these keys bare
|
||||
# (e.g. `original_name` instead of `dg-original-name`); the
|
||||
# METADATA_KEY_ALIASES table in core/models.py keeps the bare
|
||||
# forms resolvable on read so already-stored objects keep
|
||||
# working. New uploads emit the canonical dashed form so
|
||||
# downstream consumers (the Rust S3 proxy in particular) stop
|
||||
# logging PATHOLOGICAL warnings on every .sha1 / .sha512 list.
|
||||
# Create metadata for the file
|
||||
metadata = {
|
||||
f"{METADATA_PREFIX}tool": self.tool_version,
|
||||
f"{METADATA_PREFIX}original-name": original_name,
|
||||
f"{METADATA_PREFIX}file-sha256": file_sha256,
|
||||
f"{METADATA_PREFIX}file-size": str(file_size),
|
||||
f"{METADATA_PREFIX}created-at": self.clock.now().isoformat(),
|
||||
f"{METADATA_PREFIX}compression": "none", # Mark as non-compressed
|
||||
"tool": self.tool_version,
|
||||
"original_name": original_name,
|
||||
"file_sha256": file_sha256,
|
||||
"file_size": str(file_size),
|
||||
"created_at": self.clock.now().isoformat(),
|
||||
"compression": "none", # Mark as non-compressed
|
||||
}
|
||||
|
||||
# Upload the file directly
|
||||
@@ -660,13 +642,11 @@ class DeltaService:
|
||||
self._delete_reference(object_key, full_key, result)
|
||||
elif object_key.key.endswith(".delta"):
|
||||
self._delete_delta(object_key, full_key, obj_head, result)
|
||||
elif resolve_metadata(obj_head.metadata, "compression") == "none":
|
||||
elif obj_head.metadata.get("compression") == "none":
|
||||
self.storage.delete(full_key)
|
||||
result.deleted = True
|
||||
result.type = "direct"
|
||||
result.original_name = (
|
||||
resolve_metadata(obj_head.metadata, "original_name") or object_key.key
|
||||
)
|
||||
result.original_name = obj_head.metadata.get("original_name", object_key.key)
|
||||
else:
|
||||
self.storage.delete(full_key)
|
||||
result.deleted = True
|
||||
@@ -732,7 +712,7 @@ class DeltaService:
|
||||
self.storage.delete(full_key)
|
||||
result.deleted = True
|
||||
result.type = "delta"
|
||||
result.original_name = resolve_metadata(obj_head.metadata, "original_name") or "unknown"
|
||||
result.original_name = obj_head.metadata.get("original_name", "unknown")
|
||||
|
||||
if "/" not in object_key.key:
|
||||
return
|
||||
@@ -861,7 +841,7 @@ class DeltaService:
|
||||
affected_deltaspaces.add("/".join(obj.key.split("/")[:-1]))
|
||||
else:
|
||||
obj_head = self.storage.head(f"{bucket}/{obj.key}")
|
||||
if obj_head and resolve_metadata(obj_head.metadata, "compression") == "none":
|
||||
if obj_head and obj_head.metadata.get("compression") == "none":
|
||||
direct_uploads.append(obj.key)
|
||||
else:
|
||||
other_objects.append(obj.key)
|
||||
|
||||
@@ -132,47 +132,6 @@ class TestDeltaServicePut:
|
||||
assert issubclass(w[0].category, PolicyViolationWarning)
|
||||
assert "exceeds threshold" in str(w[0].message)
|
||||
|
||||
def test_direct_upload_emits_dashed_namespace(self, service, temp_dir, mock_storage):
|
||||
"""Direct-upload (non-delta-eligible files like .sha1) must
|
||||
write metadata in the canonical dg-* dashed namespace.
|
||||
|
||||
Pre-v6.1.2 this path wrote bare underscored keys
|
||||
(``original_name``, ``file_sha256``, ``compression``) which
|
||||
downstream tools — most notably the Rust S3 proxy — didn't
|
||||
recognise, producing a PATHOLOGICAL warning for every
|
||||
listing. Pin the writer to the canonical scheme so the
|
||||
regression doesn't return.
|
||||
"""
|
||||
# .sha1 is in the non-delta extensions list → use_delta=False
|
||||
non_archive = temp_dir / "build.zip.sha1"
|
||||
non_archive.write_text("deadbeef build.zip\n")
|
||||
|
||||
delta_space = DeltaSpace(bucket="test-bucket", prefix="releases/v1")
|
||||
mock_storage.put.return_value = PutResult(etag="direct123")
|
||||
|
||||
summary = service.put(non_archive, delta_space)
|
||||
|
||||
assert summary.operation == "upload_direct"
|
||||
# Capture the metadata dict that was passed to storage.put
|
||||
# (call_args is the LAST call; direct upload makes exactly one).
|
||||
assert mock_storage.put.called
|
||||
_full_key, _local_file, emitted_meta = mock_storage.put.call_args[0]
|
||||
|
||||
# Every key must be in the dg-* dashed namespace.
|
||||
for key in emitted_meta.keys():
|
||||
assert key.startswith("dg-"), (
|
||||
f"Direct-upload metadata key {key!r} must use the dg-* "
|
||||
f"namespace (got: {list(emitted_meta.keys())})"
|
||||
)
|
||||
|
||||
# Spot-check the canonical keys carry the expected values.
|
||||
assert emitted_meta["dg-original-name"] == "build.zip.sha1"
|
||||
assert emitted_meta["dg-compression"] == "none"
|
||||
assert emitted_meta["dg-file-size"] == str(non_archive.stat().st_size)
|
||||
assert emitted_meta["dg-tool"].startswith("deltaglider/")
|
||||
assert "dg-file-sha256" in emitted_meta
|
||||
assert "dg-created-at" in emitted_meta
|
||||
|
||||
|
||||
class TestDeltaServiceGet:
|
||||
"""Test DeltaService.get method."""
|
||||
@@ -219,70 +178,6 @@ class TestDeltaServiceGet:
|
||||
assert output_path.exists()
|
||||
assert output_path.read_bytes() == test_content
|
||||
|
||||
def test_get_legacy_direct_upload_not_misclassified_as_regular_s3(
|
||||
self, service, mock_storage, temp_dir
|
||||
):
|
||||
"""Pre-v6.1.2 direct uploads have BARE metadata keys
|
||||
(``file_sha256``, ``compression``, ``original_name``) rather
|
||||
than the dashed ``dg-*`` namespace. The "is this a regular S3
|
||||
object or a DeltaGlider-managed one?" dispatch in ``get()``
|
||||
must recognise both schemes — otherwise pre-fix uploads end
|
||||
up in the wrong code path and the "Downloading regular S3
|
||||
object" log line lies about what's actually happening.
|
||||
|
||||
Regression for the dispatch asymmetry caught during PR review.
|
||||
"""
|
||||
import hashlib
|
||||
from unittest.mock import MagicMock
|
||||
|
||||
key = ObjectKey(bucket="test-bucket", key="releases/v1/build.zip.sha1")
|
||||
content = b"deadbeef build.zip\n"
|
||||
real_sha = hashlib.sha256(content).hexdigest()
|
||||
|
||||
# Legacy direct-upload shape — exactly what's stored on
|
||||
# Hetzner today for ~4400 .sha1 / .sha512 files.
|
||||
legacy_direct_meta = {
|
||||
"tool": "deltaglider/6.1.1",
|
||||
"original_name": "build.zip.sha1",
|
||||
"file_sha256": real_sha,
|
||||
"file_size": str(len(content)),
|
||||
"created_at": "2026-05-16T03:28:01.000000",
|
||||
"compression": "none",
|
||||
}
|
||||
mock_storage.head.return_value = ObjectHead(
|
||||
key="releases/v1/build.zip.sha1",
|
||||
size=len(content),
|
||||
etag="legacy",
|
||||
last_modified=None,
|
||||
metadata=legacy_direct_meta,
|
||||
)
|
||||
mock_stream = MagicMock()
|
||||
mock_stream.read.side_effect = [content, b""]
|
||||
mock_storage.get.return_value = mock_stream
|
||||
|
||||
# Capture the log messages so we can assert which branch fired.
|
||||
captured = []
|
||||
orig_info = service.logger.info
|
||||
|
||||
def _capture(msg, **kw):
|
||||
captured.append((msg, kw))
|
||||
orig_info(msg, **kw)
|
||||
|
||||
service.logger.info = _capture
|
||||
try:
|
||||
service.get(key, temp_dir / "out.sha1")
|
||||
finally:
|
||||
service.logger.info = orig_info
|
||||
|
||||
msgs = [m for m, _ in captured]
|
||||
# The dispatch must NOT have mistaken this for a "regular S3
|
||||
# object" — that branch's log message is the canary.
|
||||
assert "Downloading regular S3 object (no DeltaGlider metadata)" not in msgs, (
|
||||
"Legacy bare-keyed direct upload was misclassified as a "
|
||||
"regular S3 object — `get()` dispatch isn't using "
|
||||
"resolve_metadata for the file_sha256 presence check."
|
||||
)
|
||||
|
||||
|
||||
class TestDeltaServiceVerify:
|
||||
"""Test DeltaService.verify method."""
|
||||
|
||||
@@ -1,122 +0,0 @@
|
||||
"""Regression tests for the dual-scheme metadata read/write contract.
|
||||
|
||||
The CLI historically wrote direct-upload metadata with bare,
|
||||
underscored keys (``original_name``, ``file_sha256``, ``compression``)
|
||||
while delta uploads used the canonical dashed namespace
|
||||
(``dg-original-name``, ``dg-file-sha256``, etc.). Downstream
|
||||
consumers — most notably the Rust S3 proxy — only knew the dashed
|
||||
form, so every ``.sha1`` / ``.sha512`` direct upload triggered a
|
||||
PATHOLOGICAL warning when listed.
|
||||
|
||||
v6.1.2 aligned the writer to the dashed form, but the read path
|
||||
must keep recognising the legacy bare keys forever so already-stored
|
||||
objects don't break. These tests pin both halves of the contract.
|
||||
"""
|
||||
|
||||
from deltaglider.core.models import (
|
||||
METADATA_KEY_ALIASES,
|
||||
METADATA_PREFIX,
|
||||
resolve_metadata,
|
||||
)
|
||||
|
||||
|
||||
class TestResolveMetadataAliases:
|
||||
"""Verify resolve_metadata accepts every documented alias."""
|
||||
|
||||
def test_new_dashed_keys_resolve(self):
|
||||
"""The current canonical scheme: dg-*-with-dashes."""
|
||||
meta = {
|
||||
f"{METADATA_PREFIX}tool": "deltaglider/6.1.2",
|
||||
f"{METADATA_PREFIX}original-name": "build.zip",
|
||||
f"{METADATA_PREFIX}file-sha256": "deadbeef",
|
||||
f"{METADATA_PREFIX}file-size": "1024",
|
||||
f"{METADATA_PREFIX}created-at": "2026-05-17T00:00:00Z",
|
||||
f"{METADATA_PREFIX}compression": "none",
|
||||
}
|
||||
assert resolve_metadata(meta, "tool") == "deltaglider/6.1.2"
|
||||
assert resolve_metadata(meta, "original_name") == "build.zip"
|
||||
assert resolve_metadata(meta, "file_sha256") == "deadbeef"
|
||||
assert resolve_metadata(meta, "file_size") == "1024"
|
||||
assert resolve_metadata(meta, "created_at") == "2026-05-17T00:00:00Z"
|
||||
assert resolve_metadata(meta, "compression") == "none"
|
||||
|
||||
def test_legacy_bare_underscored_keys_resolve(self):
|
||||
"""Pre-v6.1.2 direct-upload shape used by historical .sha files."""
|
||||
meta = {
|
||||
"tool": "deltaglider/6.1.1",
|
||||
"original_name": "build.zip.sha1",
|
||||
"file_sha256": "feedface",
|
||||
"file_size": "41",
|
||||
"created_at": "2026-05-16T03:28:01.000000",
|
||||
"compression": "none",
|
||||
}
|
||||
assert resolve_metadata(meta, "tool") == "deltaglider/6.1.1"
|
||||
assert resolve_metadata(meta, "original_name") == "build.zip.sha1"
|
||||
assert resolve_metadata(meta, "file_sha256") == "feedface"
|
||||
assert resolve_metadata(meta, "file_size") == "41"
|
||||
assert resolve_metadata(meta, "created_at") == "2026-05-16T03:28:01.000000"
|
||||
assert resolve_metadata(meta, "compression") == "none"
|
||||
|
||||
def test_legacy_hyphenated_keys_resolve(self):
|
||||
"""Some old paths used hyphens without the dg- prefix."""
|
||||
meta = {
|
||||
"original-name": "old.zip",
|
||||
"file-sha256": "cafe1234",
|
||||
"file-size": "2048",
|
||||
}
|
||||
assert resolve_metadata(meta, "original_name") == "old.zip"
|
||||
assert resolve_metadata(meta, "file_sha256") == "cafe1234"
|
||||
assert resolve_metadata(meta, "file_size") == "2048"
|
||||
|
||||
def test_priority_new_wins_when_both_present(self):
|
||||
"""If both schemes happen to coexist on one object, prefer the
|
||||
canonical dashed key — that's the writer's current intent."""
|
||||
meta = {
|
||||
f"{METADATA_PREFIX}original-name": "new.zip",
|
||||
"original_name": "old.zip",
|
||||
}
|
||||
assert resolve_metadata(meta, "original_name") == "new.zip"
|
||||
|
||||
def test_missing_returns_none(self):
|
||||
assert resolve_metadata({}, "tool") is None
|
||||
assert resolve_metadata({"unrelated": "x"}, "original_name") is None
|
||||
|
||||
def test_empty_string_treated_as_missing(self):
|
||||
"""Empty values must not satisfy the resolver — callers rely on
|
||||
None to trigger the fallback branch."""
|
||||
meta = {f"{METADATA_PREFIX}original-name": ""}
|
||||
assert resolve_metadata(meta, "original_name") is None
|
||||
|
||||
|
||||
class TestAliasTableContract:
|
||||
"""Pin the alias-table shape so a future regression on the
|
||||
ordering (which would break `priority_new_wins_when_both_present`)
|
||||
is caught immediately."""
|
||||
|
||||
def test_every_field_lists_new_dashed_first(self):
|
||||
"""The first alias in each tuple must be the canonical
|
||||
dg-*-with-dashes form. This is what `resolve_metadata` relies
|
||||
on for the "new wins over legacy when both present" rule."""
|
||||
for field, aliases in METADATA_KEY_ALIASES.items():
|
||||
assert aliases[0].startswith(METADATA_PREFIX), (
|
||||
f"{field}: first alias {aliases[0]!r} must be dashed namespace"
|
||||
)
|
||||
|
||||
def test_every_field_includes_legacy_underscored_form(self):
|
||||
"""Backward compat: bare underscored key must always be in the
|
||||
alias list. Pre-v6.1.2 direct uploads use them, and they
|
||||
must keep resolving forever."""
|
||||
for field, aliases in METADATA_KEY_ALIASES.items():
|
||||
assert field in aliases, (
|
||||
f"{field}: alias list must include the bare underscored "
|
||||
f"key {field!r} for legacy-upload compatibility"
|
||||
)
|
||||
|
||||
def test_compression_field_present(self):
|
||||
"""v6.1.2 added `compression` to the alias table so the
|
||||
direct-upload sentinel works on both schemes."""
|
||||
assert "compression" in METADATA_KEY_ALIASES
|
||||
|
||||
def test_source_name_field_present(self):
|
||||
"""Reference files' source_name should resolve uniformly."""
|
||||
assert "source_name" in METADATA_KEY_ALIASES
|
||||
@@ -1,70 +0,0 @@
|
||||
"""Tests for S3-compatible storage compatibility.
|
||||
|
||||
Ensures the S3 adapter works with non-AWS S3 endpoints (Hetzner, MinIO, etc.)
|
||||
that don't support newer AWS-specific features like automatic request checksums.
|
||||
"""
|
||||
|
||||
from unittest.mock import MagicMock, patch
|
||||
|
||||
from deltaglider.adapters.storage_s3 import S3StorageAdapter
|
||||
|
||||
|
||||
class TestS3CompatibleEndpoints:
|
||||
"""Verify S3 adapter configuration for non-AWS endpoint compatibility."""
|
||||
|
||||
def test_client_disables_automatic_checksums(self):
|
||||
"""boto3 1.36+ sends CRC32/CRC64 checksums by default.
|
||||
|
||||
S3-compatible stores (Hetzner, MinIO) reject these with BadRequest.
|
||||
The adapter must set request_checksum_calculation='when_required'.
|
||||
"""
|
||||
with patch("deltaglider.adapters.storage_s3.boto3.client") as mock_client:
|
||||
S3StorageAdapter(endpoint_url="https://example.com")
|
||||
|
||||
mock_client.assert_called_once()
|
||||
call_kwargs = mock_client.call_args
|
||||
config = call_kwargs.kwargs.get("config") or call_kwargs[1].get("config")
|
||||
|
||||
assert config is not None, "boto3 client must be created with a Config object"
|
||||
assert config.request_checksum_calculation == "when_required"
|
||||
assert config.response_checksum_validation == "when_required"
|
||||
|
||||
def test_put_object_no_checksum_kwargs(self, temp_dir):
|
||||
"""put_object must not pass ChecksumAlgorithm or similar kwargs."""
|
||||
mock_client = MagicMock()
|
||||
mock_client.put_object.return_value = {"ETag": '"abc123"'}
|
||||
|
||||
adapter = S3StorageAdapter(client=mock_client)
|
||||
|
||||
test_file = temp_dir / "test.sha1"
|
||||
test_file.write_text("abc123")
|
||||
|
||||
adapter.put(
|
||||
"my-bucket/test/test.sha1",
|
||||
test_file,
|
||||
{"compression": "none", "tool": "deltaglider"},
|
||||
)
|
||||
|
||||
mock_client.put_object.assert_called_once()
|
||||
call_kwargs = mock_client.put_object.call_args.kwargs
|
||||
|
||||
checksum_keys = {
|
||||
"ChecksumAlgorithm",
|
||||
"ChecksumCRC32",
|
||||
"ChecksumCRC32C",
|
||||
"ChecksumCRC64NVME",
|
||||
"ChecksumSHA1",
|
||||
"ChecksumSHA256",
|
||||
"ContentMD5",
|
||||
}
|
||||
passed_checksum_keys = checksum_keys & set(call_kwargs.keys())
|
||||
assert not passed_checksum_keys, (
|
||||
f"put_object must not pass checksum kwargs for S3-compatible "
|
||||
f"endpoint support, but found: {passed_checksum_keys}"
|
||||
)
|
||||
|
||||
def test_preconfigured_client_is_used_as_is(self):
|
||||
"""When a pre-configured client is passed, it should be used directly."""
|
||||
mock_client = MagicMock()
|
||||
adapter = S3StorageAdapter(client=mock_client)
|
||||
assert adapter.client is mock_client
|
||||
Reference in New Issue
Block a user