mirror of
https://github.com/beshu-tech/deltaglider.git
synced 2026-04-30 20:24:35 +02:00
Compare commits
7 Commits
fix/tempdi
...
v6.1.1
| Author | SHA1 | Date | |
|---|---|---|---|
| | a98fc7c178 | | |
| | 82e00623de | | |
| | e8c76f1dc7 | | |
| | c492a5087b | | |
| | 85af5a95c8 | | |
| | 60b70309fa | | |
| | b0699f952a | | |
2
.github/workflows/ci.yml
vendored
@@ -98,7 +98,7 @@ jobs:
     runs-on: ubuntu-latest
     services:
       localstack:
-        image: localstack/localstack:latest
+        image: localstack/localstack:4.4
         ports:
           - 4566:4566
         env:
2
.github/workflows/release-manual.yml
vendored
@@ -146,7 +146,7 @@ jobs:
     runs-on: ubuntu-latest
     services:
       localstack:
-        image: localstack/localstack:latest
+        image: localstack/localstack:4.4
         ports:
           - 4566:4566
         env:
2
.github/workflows/release.yml
vendored
@@ -150,7 +150,7 @@ jobs:
     runs-on: ubuntu-latest
     services:
       localstack:
-        image: localstack/localstack:latest
+        image: localstack/localstack:4.4
         ports:
           - 4566:4566
         env:
13
CHANGELOG.md
@@ -5,6 +5,19 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [6.1.1] - 2026-03-23
+
+### Fixed
+- **S3-Compatible Endpoint Support**: Disabled boto3 automatic request checksums (CRC32/CRC64) that were added in boto3 1.36+. S3-compatible stores like Hetzner Object Storage reject these headers with `BadRequest`, breaking direct (non-delta) file uploads. Sets `request_checksum_calculation="when_required"` to restore compatibility while still working with AWS S3.
+- **CI: LocalStack pinned to 4.4** — `localstack/localstack:latest` now requires a paid license; pinned to last free version across all workflows and docker-compose files.
+
+### Changed
+- **Dependency Pinning**: All runtime dependencies now use major-version upper bounds (`boto3>=1.35.0,<2.0.0`, etc.) to prevent surprise breaking changes in Docker builds.
+
+### Added
+- **S3 Compatibility Tests**: New `test_s3_compat.py` unit tests verifying the boto3 client disables automatic checksums and `put_object` doesn't pass checksum kwargs — regression protection for non-AWS S3 endpoints.
+- **Dependency Management Guide**: Added quarterly dependency refresh checklist and known compatibility constraints to CLAUDE.md.
+
 ## [6.1.0] - 2025-02-07
 
 ### Added
22
CLAUDE.md
@@ -256,4 +256,24 @@ Core delta logic is in `src/deltaglider/core/service.py`:
 - **Auto-Cleanup**: Corrupted or tampered cache files automatically deleted on decryption failures
 - **Persistent Keys**: Set `DG_CACHE_ENCRYPTION_KEY` only for cross-process cache sharing (use secrets management)
 - **Content-Addressed Storage**: SHA256-based filenames prevent collision attacks
-- **Zero-Trust Cache**: All cache operations include cryptographic validation
+- **Zero-Trust Cache**: All cache operations include cryptographic validation
+
+## Dependency Management
+
+### Pinning Strategy
+Runtime dependencies in `pyproject.toml` use **compatible range pins** (`>=x.y.z,<NEXT_MAJOR`). This prevents surprise breaking changes from major versions while allowing patch/minor updates.
+
+**Critical dependency: `boto3`** — This is the most breakage-prone dependency. AWS periodically changes default behaviors in minor releases (e.g., boto3 1.36+ added automatic request checksums that break S3-compatible stores like Hetzner Object Storage). The S3 adapter (`adapters/storage_s3.py`) explicitly sets `request_checksum_calculation="when_required"` to maintain compatibility with non-AWS S3 endpoints.
+
+### Quarterly Dependency Refresh (do every ~3 months)
+1. **Check for updates**: `uv pip compile pyproject.toml --upgrade --dry-run`
+2. **Update in a branch**: bump version floors in `pyproject.toml` to current stable releases
+3. **Run full test suite**: `uv run pytest` (unit + integration)
+4. **Test against S3-compatible stores**: test a small file upload against Hetzner (or whichever non-AWS endpoint is in use) — boto3 updates are the most likely to break this
+5. **Rebuild Docker image** and test the same upload from the container
+6. **Check changelogs** for boto3, cryptography, and click for any deprecation notices or behavior changes
+
+### Known Compatibility Constraints
+- **boto3**: Must use `request_checksum_calculation="when_required"` for Hetzner/MinIO compatibility. If upgrading past a new major behavior change, test direct uploads (non-delta path) of small files to non-AWS endpoints.
+- **cryptography**: Fernet API has been stable, but major versions may drop old OpenSSL support. Verify cache encryption still works after upgrades.
+- **click**: CLI argument parsing. Major versions may change decorator behavior. Run integration tests (`test_aws_cli_commands_v2.py`) after upgrades.
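The `>=x.y.z,<NEXT_MAJOR` pinning strategy from the CLAUDE.md hunk can be illustrated with a small helper. This is only a sketch and not part of the commit: `parse_version`, `compatible_range`, and `satisfies` are hypothetical names, and version parsing is simplified to dotted integers (no pre-release or local tags). Note that a next-major cap would not reproduce every pin in this change; `cryptography` is capped at `<45.0.0` by hand.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain dotted version such as '1.35.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def compatible_range(name: str, floor: str) -> str:
    """Build a requirement string capped below the next major version."""
    major = parse_version(floor)[0]
    return f"{name}>={floor},<{major + 1}.0.0"


def satisfies(version: str, floor: str, ceiling: str) -> bool:
    """Return True when floor <= version < ceiling (tuple comparison)."""
    return parse_version(floor) <= parse_version(version) < parse_version(ceiling)
```

Under these assumptions, `compatible_range("boto3", "1.35.0")` yields the same string as the boto3 pin in the changelog entry, and `satisfies` shows why a 1.36 release is still admitted by that range while a 2.0 release is not.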
@@ -2,7 +2,7 @@ version: '3.8'
 
 services:
   localstack:
-    image: localstack/localstack:latest
+    image: localstack/localstack:4.4
     ports:
       - "4566:4566"
     environment:
@@ -22,7 +22,7 @@ services:
       retries: 5
 
   localstack:
-    image: localstack/localstack:latest
+    image: localstack/localstack:4.4
     container_name: deltaglider-localstack
     ports:
       - "4566:4566"
@@ -49,11 +49,11 @@ classifiers = [
 ]
 
 dependencies = [
-    "boto3>=1.35.0",
-    "click>=8.1.0",
-    "cryptography>=42.0.0",
-    "python-dateutil>=2.9.0",
-    "requests>=2.32.0",
+    "boto3>=1.35.0,<2.0.0",
+    "click>=8.1.0,<9.0.0",
+    "cryptography>=42.0.0,<45.0.0",
+    "python-dateutil>=2.9.0,<3.0.0",
+    "requests>=2.32.0,<3.0.0",
 ]
 
 [project.urls]
@@ -7,6 +7,7 @@ from pathlib import Path
 from typing import TYPE_CHECKING, Any, BinaryIO, Optional
 
 import boto3
+from botocore.config import Config
 from botocore.exceptions import ClientError
 
 from ..ports.storage import ObjectHead, PutResult, StoragePort
@@ -42,6 +43,13 @@ class S3StorageAdapter(StoragePort):
             client_params: dict[str, Any] = {
                 "service_name": "s3",
                 "endpoint_url": endpoint_url or os.environ.get("AWS_ENDPOINT_URL"),
+                # Disable automatic request checksums (CRC32/CRC64) added in
+                # boto3 1.36+. S3-compatible stores like Hetzner Object Storage
+                # reject the checksum headers with BadRequest.
+                "config": Config(
+                    request_checksum_calculation="when_required",
+                    response_checksum_validation="when_required",
+                ),
             }
 
             # Merge in any additional boto3 kwargs (credentials, region, etc.)
@@ -225,47 +233,94 @@ class S3StorageAdapter(StoragePort):
                 f"AWS S3 limit (2KB). Some metadata may be lost!"
             )
 
-        try:
-            response = self.client.put_object(
-                Bucket=bucket,
-                Key=object_key,
-                Body=body_data,
-                ContentType=content_type,
-                Metadata=clean_metadata,
-            )
+        import time
 
-            # VERIFICATION: Check if metadata was actually stored (especially for delta files)
-            if object_key.endswith(".delta") and clean_metadata:
-                try:
-                    # Verify metadata was stored by doing a HEAD immediately
-                    verify_response = self.client.head_object(Bucket=bucket, Key=object_key)
-                    stored_metadata = verify_response.get("Metadata", {})
+        max_retries = 3
+        last_error: ClientError | None = None
 
-                    if not stored_metadata:
-                        logger.error(
-                            f"PUT {object_key}: CRITICAL - Metadata was sent but NOT STORED! "
-                            f"Sent {len(clean_metadata)} keys, received 0 keys back."
-                        )
-                    elif len(stored_metadata) < len(clean_metadata):
-                        missing_keys = set(clean_metadata.keys()) - set(stored_metadata.keys())
-                        logger.warning(
-                            f"PUT {object_key}: Metadata partially stored. "
-                            f"Sent {len(clean_metadata)} keys, stored {len(stored_metadata)} keys. "
-                            f"Missing keys: {missing_keys}"
-                        )
-                    elif logger.isEnabledFor(logging.DEBUG):
-                        logger.debug(
-                            f"PUT {object_key}: Metadata verified - all {len(clean_metadata)} keys stored"
-                        )
-                except Exception as e:
-                    logger.warning(f"PUT {object_key}: Could not verify metadata: {e}")
+        for attempt in range(max_retries):
+            try:
+                response = self.client.put_object(
+                    Bucket=bucket,
+                    Key=object_key,
+                    Body=body_data,
+                    ContentType=content_type,
+                    Metadata=clean_metadata,
+                )
 
-            return PutResult(
-                etag=response["ETag"].strip('"'),
-                version_id=response.get("VersionId"),
-            )
-        except ClientError as e:
-            raise RuntimeError(f"Failed to put object: {e}") from e
+                # VERIFICATION: Check if metadata was actually stored (especially for delta files)
+                if object_key.endswith(".delta") and clean_metadata:
+                    try:
+                        # Verify metadata was stored by doing a HEAD immediately
+                        verify_response = self.client.head_object(Bucket=bucket, Key=object_key)
+                        stored_metadata = verify_response.get("Metadata", {})
+
+                        if not stored_metadata:
+                            logger.error(
+                                f"PUT {object_key}: CRITICAL - Metadata was sent but NOT STORED! "
+                                f"Sent {len(clean_metadata)} keys, received 0 keys back."
+                            )
+                        elif len(stored_metadata) < len(clean_metadata):
+                            missing_keys = set(clean_metadata.keys()) - set(stored_metadata.keys())
+                            logger.warning(
+                                f"PUT {object_key}: Metadata partially stored. "
+                                f"Sent {len(clean_metadata)} keys, stored {len(stored_metadata)} keys. "
+                                f"Missing keys: {missing_keys}"
+                            )
+                        elif logger.isEnabledFor(logging.DEBUG):
+                            logger.debug(
+                                f"PUT {object_key}: Metadata verified - "
+                                f"all {len(clean_metadata)} keys stored"
+                            )
+                    except Exception as e:
+                        logger.warning(f"PUT {object_key}: Could not verify metadata: {e}")
+
+                return PutResult(
+                    etag=response["ETag"].strip('"'),
+                    version_id=response.get("VersionId"),
+                )
+            except ClientError as e:
+                last_error = e
+                if attempt < max_retries - 1:
+                    delay = 2**attempt  # 1s, 2s
+                    # Log full error details
+                    error_response = e.response if hasattr(e, "response") else {}
+                    http_headers = error_response.get("ResponseMetadata", {}).get("HTTPHeaders", {})
+                    logger.warning(
+                        f"PUT {object_key}: Attempt {attempt + 1}/{max_retries} failed: {e}. "
+                        f"Retrying in {delay}s... "
+                        f"Details: bucket={bucket}, key={object_key}, "
+                        f"body_size={len(body_data)}, content_type={content_type}, "
+                        f"metadata_keys={list(clean_metadata.keys())}, "
+                        f"endpoint={self.client.meta.endpoint_url}, "
+                        f"http_status={error_response.get('ResponseMetadata', {}).get('HTTPStatusCode')}, "
+                        f"error_code={error_response.get('Error', {}).get('Code')}, "
+                        f"error_message={error_response.get('Error', {}).get('Message')}, "
+                        f"request_id={error_response.get('ResponseMetadata', {}).get('RequestId')}, "
+                        f"http_headers={dict(http_headers)}"
+                    )
+                    # Enable botocore wire-level logging for the retry
+                    logging.getLogger("botocore").setLevel(logging.DEBUG)
+                    time.sleep(delay)
+                else:
+                    # Final attempt failed — log everything
+                    error_response = e.response if hasattr(e, "response") else {}
+                    http_headers = error_response.get("ResponseMetadata", {}).get("HTTPHeaders", {})
+                    logger.error(
+                        f"PUT {object_key}: All {max_retries} attempts failed. "
+                        f"Last error: {e}. "
+                        f"Details: bucket={bucket}, key={object_key}, "
+                        f"body_size={len(body_data)}, content_type={content_type}, "
+                        f"metadata={clean_metadata}, "
+                        f"endpoint={self.client.meta.endpoint_url}, "
+                        f"http_status={error_response.get('ResponseMetadata', {}).get('HTTPStatusCode')}, "
+                        f"error_code={error_response.get('Error', {}).get('Code')}, "
+                        f"error_message={error_response.get('Error', {}).get('Message')}, "
+                        f"request_id={error_response.get('ResponseMetadata', {}).get('RequestId')}, "
+                        f"http_headers={dict(http_headers)}"
+                    )
+
+        raise RuntimeError(f"Failed to put object: {last_error}") from last_error
 
     def delete(self, key: str) -> None:
         """Delete object."""
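The retry loop introduced in this hunk follows a common pattern: up to three attempts with exponential backoff (`2**attempt`, so 1s then 2s), then a `RuntimeError` chained to the last failure. A stripped-down sketch of that control flow, without any S3 specifics, might look like the following. The name `retry_with_backoff` and its parameters are hypothetical and do not appear in the commit; the injectable `sleep` is an assumption added so the delays can be observed without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: tuple = (Exception,),
) -> T:
    """Call `operation` until it succeeds or `max_retries` attempts fail."""
    last_error: Optional[BaseException] = None
    for attempt in range(max_retries):
        try:
            return operation()
        except retry_on as exc:
            last_error = exc
            # Back off only when another attempt will follow.
            if attempt < max_retries - 1:
                sleep(2**attempt)
    raise RuntimeError(
        f"Operation failed after {max_retries} attempts: {last_error}"
    ) from last_error
```

Passing `sleep=delays.append` with a list named `delays` records the backoff schedule instead of sleeping, which keeps a unit test of the retry path fast.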
@@ -155,8 +155,11 @@ def _version_callback(ctx: click.Context, param: click.Parameter, value: bool) -
 @click.pass_context
 def cli(ctx: click.Context, debug: bool) -> None:
     """DeltaGlider - Delta-aware S3 file storage wrapper."""
+    import logging
+
     log_level = "DEBUG" if debug else os.environ.get("DG_LOG_LEVEL", "INFO")
     ctx.obj = create_service(log_level)
+    logging.getLogger("deltaglider").info("deltaglider %s", __version__)
 
 
 @cli.command()
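The log-level precedence visible in the `cli` hunk (the `debug` flag wins, then the `DG_LOG_LEVEL` environment variable, then `INFO`) can be isolated as a pure function. This is a sketch for illustration only; `resolve_log_level` and its `env` parameter are hypothetical and not part of the commit.

```python
import os
from collections.abc import Mapping


def resolve_log_level(debug: bool, env: Mapping[str, str] = os.environ) -> str:
    """Return DEBUG when the flag is set, else DG_LOG_LEVEL, else INFO."""
    return "DEBUG" if debug else env.get("DG_LOG_LEVEL", "INFO")
```

Taking the environment as a parameter makes the precedence easy to check with a plain dictionary instead of mutating `os.environ`.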
70
tests/unit/test_s3_compat.py
Normal file
@@ -0,0 +1,70 @@
+"""Tests for S3-compatible storage compatibility.
+
+Ensures the S3 adapter works with non-AWS S3 endpoints (Hetzner, MinIO, etc.)
+that don't support newer AWS-specific features like automatic request checksums.
+"""
+
+from unittest.mock import MagicMock, patch
+
+from deltaglider.adapters.storage_s3 import S3StorageAdapter
+
+
+class TestS3CompatibleEndpoints:
+    """Verify S3 adapter configuration for non-AWS endpoint compatibility."""
+
+    def test_client_disables_automatic_checksums(self):
+        """boto3 1.36+ sends CRC32/CRC64 checksums by default.
+
+        S3-compatible stores (Hetzner, MinIO) reject these with BadRequest.
+        The adapter must set request_checksum_calculation='when_required'.
+        """
+        with patch("deltaglider.adapters.storage_s3.boto3.client") as mock_client:
+            S3StorageAdapter(endpoint_url="https://example.com")
+
+            mock_client.assert_called_once()
+            call_kwargs = mock_client.call_args
+            config = call_kwargs.kwargs.get("config") or call_kwargs[1].get("config")
+
+            assert config is not None, "boto3 client must be created with a Config object"
+            assert config.request_checksum_calculation == "when_required"
+            assert config.response_checksum_validation == "when_required"
+
+    def test_put_object_no_checksum_kwargs(self, temp_dir):
+        """put_object must not pass ChecksumAlgorithm or similar kwargs."""
+        mock_client = MagicMock()
+        mock_client.put_object.return_value = {"ETag": '"abc123"'}
+
+        adapter = S3StorageAdapter(client=mock_client)
+
+        test_file = temp_dir / "test.sha1"
+        test_file.write_text("abc123")
+
+        adapter.put(
+            "my-bucket/test/test.sha1",
+            test_file,
+            {"compression": "none", "tool": "deltaglider"},
+        )
+
+        mock_client.put_object.assert_called_once()
+        call_kwargs = mock_client.put_object.call_args.kwargs
+
+        checksum_keys = {
+            "ChecksumAlgorithm",
+            "ChecksumCRC32",
+            "ChecksumCRC32C",
+            "ChecksumCRC64NVME",
+            "ChecksumSHA1",
+            "ChecksumSHA256",
+            "ContentMD5",
+        }
+        passed_checksum_keys = checksum_keys & set(call_kwargs.keys())
+        assert not passed_checksum_keys, (
+            f"put_object must not pass checksum kwargs for S3-compatible "
+            f"endpoint support, but found: {passed_checksum_keys}"
+        )
+
+    def test_preconfigured_client_is_used_as_is(self):
+        """When a pre-configured client is passed, it should be used directly."""
+        mock_client = MagicMock()
+        adapter = S3StorageAdapter(client=mock_client)
+        assert adapter.client is mock_client
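The checksum-kwarg assertion in `test_put_object_no_checksum_kwargs` boils down to a set intersection between the keyword arguments of a call and a fixed list of checksum parameter names. A standalone sketch of that check, reusable outside the test, could look like this. The names `CHECKSUM_KWARGS` and `checksum_kwargs_in` are hypothetical; the parameter names themselves are copied from the test above.

```python
# Parameter names taken from the checksum_keys set in test_s3_compat.py.
CHECKSUM_KWARGS = frozenset(
    {
        "ChecksumAlgorithm",
        "ChecksumCRC32",
        "ChecksumCRC32C",
        "ChecksumCRC64NVME",
        "ChecksumSHA1",
        "ChecksumSHA256",
        "ContentMD5",
    }
)


def checksum_kwargs_in(call_kwargs: dict) -> set:
    """Return the checksum-related keys present in a put_object kwargs dict."""
    return set(CHECKSUM_KWARGS.intersection(call_kwargs))
```

An empty result means the call carries no explicit checksum parameters, which is the condition the test asserts for S3-compatible endpoints.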