mirror of
https://github.com/beshu-tech/deltaglider.git
synced 2026-04-30 12:14:32 +02:00
Compare commits
21 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
5e3b76791e | ||
|
|
fb2877bfd3 | ||
|
|
88fd1f51cd | ||
|
|
0857e02edd | ||
|
|
689cf00d02 | ||
|
|
743d52e783 | ||
|
|
8bc0a0eaf3 | ||
|
|
4cf25e4681 | ||
|
|
69ed9056d2 | ||
|
|
38134f28f5 | ||
|
|
fa1f8b85a9 | ||
|
|
a06cc2939c | ||
|
|
5b8477ed61 | ||
|
|
e706ddebdd | ||
|
|
50db9bbb27 | ||
|
|
c25568e315 | ||
|
|
ca1186a3f6 | ||
|
|
4217535e8c | ||
|
|
0064d7e74b | ||
|
|
9c1659a1f1 | ||
|
|
34c871b0d7 |
1
.github/workflows/release-manual.yml
vendored
1
.github/workflows/release-manual.yml
vendored
@@ -231,6 +231,7 @@ jobs:
|
||||
|
||||
- name: Create GitHub Release
|
||||
uses: softprops/action-gh-release@v1
|
||||
continue-on-error: true # Don't fail if GitHub release creation fails
|
||||
with:
|
||||
tag_name: ${{ needs.validate.outputs.tag_name }}
|
||||
name: Release v${{ github.event.inputs.version }}
|
||||
|
||||
1
.github/workflows/release.yml
vendored
1
.github/workflows/release.yml
vendored
@@ -235,6 +235,7 @@ jobs:
|
||||
|
||||
- name: Create GitHub Release
|
||||
uses: softprops/action-gh-release@v1
|
||||
continue-on-error: true # Don't fail if GitHub release creation fails
|
||||
with:
|
||||
tag_name: ${{ needs.validate-and-tag.outputs.tag_name }}
|
||||
name: Release v${{ github.event.inputs.version }}
|
||||
|
||||
117
CHANGELOG.md
Normal file
117
CHANGELOG.md
Normal file
@@ -0,0 +1,117 @@
|
||||
# Changelog
|
||||
|
||||
All notable changes to this project will be documented in this file.
|
||||
|
||||
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
|
||||
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
||||
|
||||
## [Unreleased]
|
||||
|
||||
## [5.0.1] - 2025-01-10
|
||||
|
||||
### Changed
|
||||
- **Code Organization**: Refactored client.py from 1560 to 1154 lines (26% reduction)
|
||||
- Extracted client operations into modular `client_operations/` package:
|
||||
- `bucket.py` - S3 bucket management operations
|
||||
- `presigned.py` - Presigned URL generation
|
||||
- `batch.py` - Batch upload/download operations
|
||||
- `stats.py` - Analytics and statistics operations
|
||||
- Improved code maintainability with logical separation of concerns
|
||||
- Better developer experience with cleaner module structure
|
||||
|
||||
### Internal
|
||||
- Full type safety maintained with mypy (0 errors)
|
||||
- All 99 tests passing
|
||||
- Code quality checks passing (ruff)
|
||||
- No breaking changes - all public APIs remain unchanged
|
||||
|
||||
## [5.0.0] - 2025-01-10
|
||||
|
||||
### Added
|
||||
- boto3-compatible TypedDict types for S3 responses (no boto3 import needed)
|
||||
- Complete boto3 compatibility vision document
|
||||
- Type-safe response builders using TypedDict patterns
|
||||
|
||||
### Changed
|
||||
- **BREAKING**: `list_objects()` now returns boto3-compatible dict instead of custom dataclass
|
||||
- Use `response['Contents']` instead of `response.contents`
|
||||
- Use `response.get('IsTruncated')` instead of `response.is_truncated`
|
||||
- Use `response.get('NextContinuationToken')` instead of `response.next_continuation_token`
|
||||
- DeltaGlider metadata now in `Metadata` field of each object
|
||||
- Internal response building now uses TypedDict for compile-time type safety
|
||||
- All S3 responses are dicts at runtime (TypedDict is a dict!)
|
||||
|
||||
### Fixed
|
||||
- Updated all documentation examples to use dict-based responses
|
||||
- Fixed pagination examples in README and API docs
|
||||
- Corrected SDK documentation with accurate method signatures
|
||||
|
||||
## [4.2.4] - 2025-01-10
|
||||
|
||||
### Fixed
|
||||
- Show only filename in `ls` output instead of full path for cleaner display
|
||||
- Correct `ls` command path handling and prefix display logic
|
||||
|
||||
## [4.2.3] - 2025-01-07
|
||||
|
||||
### Added
|
||||
- Comprehensive test coverage for `delete_objects_recursive()` method with 19 thorough tests
|
||||
- Tests cover delta suffix handling, error/warning aggregation, statistics tracking, and edge cases
|
||||
- Better code organization with separate `client_models.py` and `client_delete_helpers.py` modules
|
||||
|
||||
### Fixed
|
||||
- Fixed all mypy type errors using proper `cast()` for type safety
|
||||
- Improved type hints for dictionary operations in client code
|
||||
|
||||
### Changed
|
||||
- Refactored client code into logical modules for better maintainability
|
||||
- Enhanced code quality with comprehensive linting and type checking
|
||||
- All 99 integration/unit tests passing with zero type errors
|
||||
|
||||
### Internal
|
||||
- Better separation of concerns in client module
|
||||
- Improved developer experience with clearer code structure
|
||||
|
||||
## [4.2.2] - 2024-10-06
|
||||
|
||||
### Fixed
|
||||
- Add .delta suffix fallback for `delete_object()` method
|
||||
- Handle regular S3 objects without DeltaGlider metadata
|
||||
- Update mypy type ignore comment for compatibility
|
||||
|
||||
## [4.2.1] - 2024-10-06
|
||||
|
||||
### Fixed
|
||||
- Make GitHub release creation non-blocking in workflows
|
||||
|
||||
## [4.2.0] - 2024-10-03
|
||||
|
||||
### Added
|
||||
- AWS credential parameters to `create_client()` function
|
||||
- Support for custom endpoint URLs
|
||||
- Enhanced boto3 compatibility
|
||||
|
||||
## [4.1.0] - 2024-09-29
|
||||
|
||||
### Added
|
||||
- boto3-compatible client API
|
||||
- Bucket management methods
|
||||
- Comprehensive SDK documentation
|
||||
|
||||
## [4.0.0] - 2024-09-21
|
||||
|
||||
### Added
|
||||
- Initial public release
|
||||
- CLI with AWS S3 compatibility
|
||||
- Delta compression for versioned artifacts
|
||||
- 99%+ compression for similar files
|
||||
|
||||
[5.0.1]: https://github.com/beshu-tech/deltaglider/compare/v5.0.0...v5.0.1
|
||||
[5.0.0]: https://github.com/beshu-tech/deltaglider/compare/v4.2.4...v5.0.0
|
||||
[4.2.4]: https://github.com/beshu-tech/deltaglider/compare/v4.2.3...v4.2.4
|
||||
[4.2.3]: https://github.com/beshu-tech/deltaglider/compare/v4.2.2...v4.2.3
|
||||
[4.2.2]: https://github.com/beshu-tech/deltaglider/compare/v4.2.1...v4.2.2
|
||||
[4.2.1]: https://github.com/beshu-tech/deltaglider/compare/v4.2.0...v4.2.1
|
||||
[4.2.0]: https://github.com/beshu-tech/deltaglider/compare/v4.1.0...v4.2.0
|
||||
[4.1.0]: https://github.com/beshu-tech/deltaglider/compare/v4.0.0...v4.1.0
|
||||
[4.0.0]: https://github.com/beshu-tech/deltaglider/releases/tag/v4.0.0
|
||||
11
Dockerfile
11
Dockerfile
@@ -30,7 +30,16 @@ RUN --mount=type=cache,target=/root/.cache/uv \
|
||||
# Runtime stage - minimal image
|
||||
FROM python:${PYTHON_VERSION}
|
||||
|
||||
# Install xdelta3
|
||||
# Skip man pages and docs to speed up builds
|
||||
RUN mkdir -p /etc/dpkg/dpkg.cfg.d && \
|
||||
echo 'path-exclude /usr/share/doc/*' > /etc/dpkg/dpkg.cfg.d/01_nodoc && \
|
||||
echo 'path-exclude /usr/share/man/*' >> /etc/dpkg/dpkg.cfg.d/01_nodoc && \
|
||||
echo 'path-exclude /usr/share/groff/*' >> /etc/dpkg/dpkg.cfg.d/01_nodoc && \
|
||||
echo 'path-exclude /usr/share/info/*' >> /etc/dpkg/dpkg.cfg.d/01_nodoc && \
|
||||
echo 'path-exclude /usr/share/lintian/*' >> /etc/dpkg/dpkg.cfg.d/01_nodoc && \
|
||||
echo 'path-exclude /usr/share/linda/*' >> /etc/dpkg/dpkg.cfg.d/01_nodoc
|
||||
|
||||
# Install xdelta3 (now much faster without man pages)
|
||||
RUN apt-get update && \
|
||||
apt-get install -y --no-install-recommends xdelta3 && \
|
||||
apt-get clean && \
|
||||
|
||||
18
README.md
18
README.md
@@ -207,14 +207,18 @@ with open('downloaded.zip', 'wb') as f:
|
||||
|
||||
# Smart list_objects with optimized performance
|
||||
response = client.list_objects(Bucket='releases', Prefix='v2.0.0/')
|
||||
for obj in response['Contents']:
|
||||
print(f"{obj['Key']}: {obj['Size']} bytes")
|
||||
|
||||
# Paginated listing for large buckets
|
||||
response = client.list_objects(Bucket='releases', MaxKeys=100)
|
||||
while response.is_truncated:
|
||||
while response.get('IsTruncated'):
|
||||
for obj in response['Contents']:
|
||||
print(obj['Key'])
|
||||
response = client.list_objects(
|
||||
Bucket='releases',
|
||||
MaxKeys=100,
|
||||
ContinuationToken=response.next_continuation_token
|
||||
ContinuationToken=response.get('NextContinuationToken')
|
||||
)
|
||||
|
||||
# Delete and inspect objects
|
||||
@@ -459,7 +463,9 @@ Migrating from `aws s3` to `deltaglider` is as simple as changing the command na
|
||||
- ✅ **S3 compatible**: Works with AWS, MinIO, Cloudflare R2, etc.
|
||||
- ✅ **Atomic operations**: No partial states
|
||||
- ✅ **Concurrent safe**: Multiple clients supported
|
||||
- ✅ **Well tested**: 95%+ code coverage
|
||||
- ✅ **Thoroughly tested**: 99 integration/unit tests, comprehensive test coverage
|
||||
- ✅ **Type safe**: Full mypy type checking, zero type errors
|
||||
- ✅ **Code quality**: Automated linting with ruff, clean codebase
|
||||
|
||||
## Development
|
||||
|
||||
@@ -471,9 +477,13 @@ cd deltaglider
|
||||
# Install with dev dependencies
|
||||
uv pip install -e ".[dev]"
|
||||
|
||||
# Run tests
|
||||
# Run tests (99 integration/unit tests)
|
||||
uv run pytest
|
||||
|
||||
# Run quality checks
|
||||
uv run ruff check src/ # Linting
|
||||
uv run mypy src/ # Type checking
|
||||
|
||||
# Run with local MinIO
|
||||
docker-compose up -d
|
||||
export AWS_ENDPOINT_URL=http://localhost:9000
|
||||
|
||||
316
docs/BOTO3_COMPATIBILITY_VISION.md
Normal file
316
docs/BOTO3_COMPATIBILITY_VISION.md
Normal file
@@ -0,0 +1,316 @@
|
||||
# boto3 Compatibility Vision
|
||||
|
||||
## Current State (v4.2.3)
|
||||
|
||||
DeltaGlider currently uses custom dataclasses for responses:
|
||||
|
||||
```python
|
||||
from deltaglider import create_client, ListObjectsResponse, ObjectInfo
|
||||
|
||||
client = create_client()
|
||||
response: ListObjectsResponse = client.list_objects(Bucket='my-bucket')
|
||||
|
||||
for obj in response.contents: # Custom field name
|
||||
print(f"{obj.key}: {obj.size}") # Custom ObjectInfo dataclass
|
||||
```
|
||||
|
||||
**Problems:**
|
||||
- ❌ Not a true drop-in replacement for boto3
|
||||
- ❌ Users need to learn DeltaGlider-specific types
|
||||
- ❌ Can't use with tools expecting boto3 responses
|
||||
- ❌ Different API surface (`.contents` vs `['Contents']`)
|
||||
|
||||
## Target State (v5.0.0)
|
||||
|
||||
DeltaGlider should return native boto3-compatible dicts with TypedDict type hints:
|
||||
|
||||
```python
|
||||
from deltaglider import create_client, ListObjectsV2Response
|
||||
|
||||
client = create_client()
|
||||
response: ListObjectsV2Response = client.list_objects(Bucket='my-bucket')
|
||||
|
||||
for obj in response['Contents']: # boto3-compatible!
|
||||
print(f"{obj['Key']}: {obj['Size']}") # Works exactly like boto3
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- ✅ **True drop-in replacement** - swap `boto3.client('s3')` with `create_client()`
|
||||
- ✅ **No learning curve** - if you know boto3, you know DeltaGlider
|
||||
- ✅ **Tool compatibility** - works with any library expecting boto3 types
|
||||
- ✅ **Type safety** - TypedDict provides IDE autocomplete without boto3 import
|
||||
- ✅ **Zero runtime overhead** - TypedDict compiles to plain dict
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
### Phase 1: Type Definitions ✅ (DONE)
|
||||
|
||||
Created `deltaglider/types.py` with comprehensive TypedDict definitions:
|
||||
|
||||
```python
|
||||
from typing import TypedDict, NotRequired
|
||||
from datetime import datetime
|
||||
|
||||
class S3Object(TypedDict):
|
||||
Key: str
|
||||
Size: int
|
||||
LastModified: datetime
|
||||
ETag: NotRequired[str]
|
||||
StorageClass: NotRequired[str]
|
||||
|
||||
class ListObjectsV2Response(TypedDict):
|
||||
Contents: list[S3Object]
|
||||
CommonPrefixes: NotRequired[list[dict[str, str]]]
|
||||
IsTruncated: NotRequired[bool]
|
||||
NextContinuationToken: NotRequired[str]
|
||||
```
|
||||
|
||||
**Key insight:** TypedDict provides type safety at development time but compiles to plain `dict` at runtime!
|
||||
|
||||
### Phase 2: Refactor Client Methods (TODO)
|
||||
|
||||
Update all client methods to return boto3-compatible dicts:
|
||||
|
||||
#### `list_objects()`
|
||||
|
||||
**Before:**
|
||||
```python
|
||||
def list_objects(...) -> ListObjectsResponse: # Custom dataclass
|
||||
return ListObjectsResponse(
|
||||
name=bucket,
|
||||
contents=[ObjectInfo(...), ...] # Custom dataclass
|
||||
)
|
||||
```
|
||||
|
||||
**After:**
|
||||
```python
|
||||
def list_objects(...) -> ListObjectsV2Response: # TypedDict
|
||||
return {
|
||||
'Contents': [
|
||||
{
|
||||
'Key': 'file.zip', # .delta suffix already stripped
|
||||
'Size': 1024,
|
||||
'LastModified': datetime(...),
|
||||
'ETag': '"abc123"',
|
||||
}
|
||||
],
|
||||
'CommonPrefixes': [{'Prefix': 'dir/'}],
|
||||
'IsTruncated': False,
|
||||
}
|
||||
```
|
||||
|
||||
**Key changes:**
|
||||
1. Return plain dict instead of custom dataclass
|
||||
2. Use boto3 field names: `Contents` not `contents`, `Key` not `key`
|
||||
3. Strip `.delta` suffix transparently (already done)
|
||||
4. Hide `reference.bin` files (already done)
|
||||
|
||||
#### `put_object()`
|
||||
|
||||
**Before:**
|
||||
```python
|
||||
def put_object(...) -> dict[str, Any]:
|
||||
return {
|
||||
"ETag": etag,
|
||||
"VersionId": None,
|
||||
"DeltaGliderInfo": {...} # Custom field
|
||||
}
|
||||
```
|
||||
|
||||
**After:**
|
||||
```python
|
||||
def put_object(...) -> PutObjectResponse: # TypedDict
|
||||
return {
|
||||
'ETag': etag,
|
||||
'ResponseMetadata': {'HTTPStatusCode': 200},
|
||||
# DeltaGlider metadata goes in Metadata field
|
||||
'Metadata': {
|
||||
'deltaglider-is-delta': 'true',
|
||||
'deltaglider-compression-ratio': '0.99'
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### `get_object()`
|
||||
|
||||
**Before:**
|
||||
```python
|
||||
def get_object(...) -> dict[str, Any]:
|
||||
return {
|
||||
"Body": data,
|
||||
"ContentLength": len(data),
|
||||
"DeltaGliderInfo": {...} # Custom field
|
||||
}
|
||||
```
|
||||
|
||||
**After:**
|
||||
```python
|
||||
def get_object(...) -> GetObjectResponse: # TypedDict
|
||||
return {
|
||||
'Body': data, # bytes, not StreamingBody (simpler!)
|
||||
'ContentLength': len(data),
|
||||
'LastModified': datetime(...),
|
||||
'ETag': '"abc123"',
|
||||
'Metadata': { # DeltaGlider metadata here
|
||||
'deltaglider-is-delta': 'true'
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### `delete_object()`, `delete_objects()`, `head_object()`, etc.
|
||||
|
||||
All follow the same pattern: return boto3-compatible dicts with TypedDict hints.
|
||||
|
||||
### Phase 3: Backward Compatibility (TODO)
|
||||
|
||||
Keep old dataclasses for 1-2 versions with deprecation warnings:
|
||||
|
||||
```python
|
||||
class ListObjectsResponse:
|
||||
"""DEPRECATED: Use dict responses with ListObjectsV2Response type hint.
|
||||
|
||||
This will be removed in v6.0.0. Update your code:
|
||||
|
||||
Before:
|
||||
response.contents[0].key
|
||||
|
||||
After:
|
||||
response['Contents'][0]['Key']
|
||||
"""
|
||||
def __init__(self, data: dict):
|
||||
warnings.warn(
|
||||
"ListObjectsResponse dataclass is deprecated. "
|
||||
"Use dict responses with ListObjectsV2Response type hint.",
|
||||
DeprecationWarning,
|
||||
stacklevel=2
|
||||
)
|
||||
self._data = data
|
||||
|
||||
@property
|
||||
def contents(self):
|
||||
return [ObjectInfo(obj) for obj in self._data.get('Contents', [])]
|
||||
```
|
||||
|
||||
### Phase 4: Update Documentation (TODO)
|
||||
|
||||
1. Update all examples to use dict responses
|
||||
2. Add migration guide from v4.x to v5.0
|
||||
3. Update BOTO3_COMPATIBILITY.md
|
||||
4. Add "Drop-in Replacement" marketing language
|
||||
|
||||
### Phase 5: Update Tests (TODO)
|
||||
|
||||
Convert all tests from:
|
||||
```python
|
||||
assert response.contents[0].key == "file.zip"
|
||||
```
|
||||
|
||||
To:
|
||||
```python
|
||||
assert response['Contents'][0]['Key'] == "file.zip"
|
||||
```
|
||||
|
||||
## Migration Guide (for users)
|
||||
|
||||
### v4.x → v5.0
|
||||
|
||||
**Old code (v4.x):**
|
||||
```python
|
||||
from deltaglider import create_client
|
||||
|
||||
client = create_client()
|
||||
response = client.list_objects(Bucket='my-bucket')
|
||||
|
||||
for obj in response.contents: # Dataclass attribute
|
||||
print(f"{obj.key}: {obj.size}") # Dataclass attributes
|
||||
```
|
||||
|
||||
**New code (v5.0):**
|
||||
```python
|
||||
from deltaglider import create_client, ListObjectsV2Response
|
||||
|
||||
client = create_client()
|
||||
response: ListObjectsV2Response = client.list_objects(Bucket='my-bucket')
|
||||
|
||||
for obj in response['Contents']: # Dict key (boto3-compatible)
|
||||
print(f"{obj['Key']}: {obj['Size']}") # Dict keys (boto3-compatible)
|
||||
```
|
||||
|
||||
**Or even simpler - no type hint needed:**
|
||||
```python
|
||||
client = create_client()
|
||||
response = client.list_objects(Bucket='my-bucket')
|
||||
|
||||
for obj in response['Contents']:
|
||||
print(f"{obj['Key']}: {obj['Size']}")
|
||||
```
|
||||
|
||||
## Benefits Summary
|
||||
|
||||
### For Users
|
||||
- **Zero learning curve** - if you know boto3, you're done
|
||||
- **Drop-in replacement** - literally change one line (client creation)
|
||||
- **Type safety** - TypedDict provides autocomplete without boto3 dependency
|
||||
- **Tool compatibility** - works with all boto3-compatible libraries
|
||||
|
||||
### For DeltaGlider
|
||||
- **Simpler codebase** - no custom dataclasses to maintain
|
||||
- **Better marketing** - true "drop-in replacement" claim
|
||||
- **Easier testing** - test against boto3 behavior directly
|
||||
- **Future-proof** - if boto3 adds fields, users can access them immediately
|
||||
|
||||
## Technical Details
|
||||
|
||||
### How TypedDict Works
|
||||
|
||||
```python
|
||||
from typing import TypedDict
|
||||
|
||||
class MyResponse(TypedDict):
|
||||
Key: str
|
||||
Size: int
|
||||
|
||||
# At runtime, this is just a dict!
|
||||
response: MyResponse = {'Key': 'file.zip', 'Size': 1024}
|
||||
print(type(response)) # <class 'dict'>
|
||||
|
||||
# But mypy and IDEs understand the structure
|
||||
response['Key'] # ✅ Autocomplete works!
|
||||
response['Nonexistent'] # ❌ Mypy error: Key 'Nonexistent' not found
|
||||
```
|
||||
|
||||
### DeltaGlider-Specific Metadata
|
||||
|
||||
Store in standard boto3 `Metadata` field:
|
||||
|
||||
```python
|
||||
{
|
||||
'Key': 'file.zip',
|
||||
'Size': 1024,
|
||||
'Metadata': {
|
||||
# DeltaGlider-specific fields (prefixed for safety)
|
||||
'deltaglider-is-delta': 'true',
|
||||
'deltaglider-compression-ratio': '0.99',
|
||||
'deltaglider-original-size': '100000',
|
||||
'deltaglider-reference-key': 'releases/v1.0.0/reference.bin',
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This is:
|
||||
- ✅ boto3-compatible (Metadata is a standard field)
|
||||
- ✅ Namespaced (deltaglider- prefix prevents conflicts)
|
||||
- ✅ Optional (tools can ignore it)
|
||||
- ✅ Type-safe (Metadata: NotRequired[dict[str, str]])
|
||||
|
||||
## Status
|
||||
|
||||
- ✅ **Phase 1:** TypedDict definitions created
|
||||
- ✅ **Phase 2:** `list_objects()` refactored to return boto3-compatible dict
|
||||
- ⏳ **Phase 3:** Refactor remaining methods (`put_object`, `get_object`, etc.) (TODO)
|
||||
- ⏳ **Phase 4:** Backward compatibility with deprecation warnings (TODO)
|
||||
- ⏳ **Phase 5:** Documentation updates (TODO)
|
||||
- ⏳ **Phase 6:** Full test coverage updates (PARTIAL - list_objects tests done)
|
||||
|
||||
**Current:** v4.2.3+ (Phase 2 complete - `list_objects()` boto3-compatible)
|
||||
**Target:** v5.0.0 release (all phases complete)
|
||||
@@ -38,10 +38,21 @@ response = client.get_object(Bucket='releases', Key='v1.0.0/app.zip')
|
||||
# Optimized list_objects with smart performance defaults (NEW!)
|
||||
# Fast by default - no unnecessary metadata fetching
|
||||
response = client.list_objects(Bucket='releases', Prefix='v1.0.0/')
|
||||
for obj in response['Contents']:
|
||||
print(f"{obj['Key']}: {obj['Size']} bytes")
|
||||
|
||||
# Pagination for large buckets
|
||||
response = client.list_objects(Bucket='releases', MaxKeys=100,
|
||||
ContinuationToken=response.next_continuation_token)
|
||||
response = client.list_objects(Bucket='releases', MaxKeys=100)
|
||||
while response.get('IsTruncated'):
|
||||
# Process current page
|
||||
for obj in response['Contents']:
|
||||
print(obj['Key'])
|
||||
# Get next page
|
||||
response = client.list_objects(
|
||||
Bucket='releases',
|
||||
MaxKeys=100,
|
||||
ContinuationToken=response.get('NextContinuationToken')
|
||||
)
|
||||
|
||||
# Get detailed compression stats only when needed
|
||||
response = client.list_objects(Bucket='releases', FetchMetadata=True) # Slower but detailed
|
||||
@@ -101,6 +112,8 @@ client.put_object(Bucket='mybucket', Key='myfile.zip', Body=data)
|
||||
- **Data Integrity**: SHA256 verification on every operation
|
||||
- **Transparent**: Works with existing tools and workflows
|
||||
- **Production Ready**: Battle-tested with 200K+ files
|
||||
- **Thoroughly Tested**: 99 integration/unit tests with comprehensive coverage
|
||||
- **Type Safe**: Full mypy type checking, zero type errors
|
||||
|
||||
## When to Use DeltaGlider
|
||||
|
||||
|
||||
@@ -94,7 +94,7 @@ def list_objects(
|
||||
StartAfter: Optional[str] = None,
|
||||
FetchMetadata: bool = False,
|
||||
**kwargs
|
||||
) -> ListObjectsResponse
|
||||
) -> dict[str, Any]
|
||||
```
|
||||
|
||||
##### Parameters
|
||||
@@ -117,19 +117,32 @@ The method intelligently optimizes performance by:
|
||||
2. Only fetching metadata for delta files when explicitly requested
|
||||
3. Supporting efficient pagination for large buckets
|
||||
|
||||
##### Returns
|
||||
|
||||
boto3-compatible dict with:
|
||||
- **Contents** (`list[dict]`): List of S3Object dicts with Key, Size, LastModified, Metadata
|
||||
- **CommonPrefixes** (`list[dict]`): Optional list of common prefixes (folders)
|
||||
- **IsTruncated** (`bool`): Whether more results are available
|
||||
- **NextContinuationToken** (`str`): Token for next page
|
||||
- **KeyCount** (`int`): Number of keys returned
|
||||
|
||||
##### Examples
|
||||
|
||||
```python
|
||||
# Fast listing for UI display (no metadata fetching)
|
||||
response = client.list_objects(Bucket='releases')
|
||||
for obj in response['Contents']:
|
||||
print(f"{obj['Key']}: {obj['Size']} bytes")
|
||||
|
||||
# Paginated listing for large buckets
|
||||
response = client.list_objects(Bucket='releases', MaxKeys=100)
|
||||
while response.is_truncated:
|
||||
while response.get('IsTruncated'):
|
||||
for obj in response['Contents']:
|
||||
print(obj['Key'])
|
||||
response = client.list_objects(
|
||||
Bucket='releases',
|
||||
MaxKeys=100,
|
||||
ContinuationToken=response.next_continuation_token
|
||||
ContinuationToken=response.get('NextContinuationToken')
|
||||
)
|
||||
|
||||
# Get detailed compression stats (slower, only for analytics)
|
||||
@@ -137,6 +150,11 @@ response = client.list_objects(
|
||||
Bucket='releases',
|
||||
FetchMetadata=True # Only fetches for delta files
|
||||
)
|
||||
for obj in response['Contents']:
|
||||
metadata = obj.get('Metadata', {})
|
||||
if metadata.get('deltaglider-is-delta') == 'true':
|
||||
compression = metadata.get('deltaglider-compression-ratio', 'unknown')
|
||||
print(f"{obj['Key']}: {compression} compression")
|
||||
```
|
||||
|
||||
#### `get_bucket_stats`
|
||||
|
||||
64
examples/boto3_compatible_types.py
Normal file
64
examples/boto3_compatible_types.py
Normal file
@@ -0,0 +1,64 @@
|
||||
"""Example: Using boto3-compatible responses without importing boto3.
|
||||
|
||||
This demonstrates how DeltaGlider provides full type safety and boto3 compatibility
|
||||
without requiring boto3 imports in user code.
|
||||
|
||||
As of v5.0.0, DeltaGlider returns plain dicts (not custom dataclasses) that are
|
||||
100% compatible with boto3 S3 responses. You get IDE autocomplete through TypedDict
|
||||
type hints without any runtime overhead.
|
||||
"""
|
||||
|
||||
from deltaglider import ListObjectsV2Response, S3Object, create_client
|
||||
|
||||
# Create client (no boto3 import needed!)
|
||||
client = create_client()
|
||||
|
||||
# Type hints work perfectly without boto3
|
||||
def process_files(bucket: str, prefix: str) -> None:
|
||||
"""Process files in S3 with full type safety."""
|
||||
# Return type is fully typed - IDE autocomplete works!
|
||||
response: ListObjectsV2Response = client.list_objects(
|
||||
Bucket=bucket, Prefix=prefix, Delimiter="/"
|
||||
)
|
||||
|
||||
# Response is a plain dict - 100% boto3-compatible
|
||||
# TypedDict provides autocomplete and type checking
|
||||
for obj in response["Contents"]:
|
||||
# obj is typed as S3Object - all fields have autocomplete!
|
||||
key: str = obj["Key"] # ✅ IDE knows this is str
|
||||
size: int = obj["Size"] # ✅ IDE knows this is int
|
||||
print(f"{key}: {size} bytes")
|
||||
|
||||
# DeltaGlider metadata is in the standard Metadata field
|
||||
metadata = obj.get("Metadata", {})
|
||||
if metadata.get("deltaglider-is-delta") == "true":
|
||||
compression = metadata.get("deltaglider-compression-ratio", "unknown")
|
||||
print(f" └─ Delta file (compression: {compression})")
|
||||
|
||||
# Optional fields work too
|
||||
for prefix_dict in response.get("CommonPrefixes", []):
|
||||
print(f"Directory: {prefix_dict['Prefix']}")
|
||||
|
||||
# Pagination info
|
||||
if response.get("IsTruncated"):
|
||||
next_token = response.get("NextContinuationToken")
|
||||
print(f"More results available, token: {next_token}")
|
||||
|
||||
|
||||
# This is 100% compatible with boto3 code!
|
||||
def works_with_boto3_or_deltaglider(s3_client) -> None:
|
||||
"""This function works with EITHER boto3 or DeltaGlider client."""
|
||||
# Because the response structure is identical!
|
||||
response = s3_client.list_objects(Bucket="my-bucket")
|
||||
|
||||
for obj in response["Contents"]:
|
||||
print(obj["Key"])
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Example usage
|
||||
print("✅ Full type safety without boto3 imports!")
|
||||
print("✅ 100% compatible with boto3")
|
||||
print("✅ Drop-in replacement")
|
||||
print("✅ Plain dict responses (not custom dataclasses)")
|
||||
print("✅ DeltaGlider metadata in standard Metadata field")
|
||||
@@ -7,23 +7,36 @@ except ImportError:
|
||||
__version__ = "0.0.0+unknown"
|
||||
|
||||
# Import client API
|
||||
from .client import (
|
||||
from .client import DeltaGliderClient, create_client
|
||||
from .client_models import (
|
||||
BucketStats,
|
||||
CompressionEstimate,
|
||||
DeltaGliderClient,
|
||||
ListObjectsResponse,
|
||||
ObjectInfo,
|
||||
UploadSummary,
|
||||
create_client,
|
||||
)
|
||||
from .core import DeltaService, DeltaSpace, ObjectKey
|
||||
|
||||
# Import boto3-compatible type aliases (no boto3 import required!)
|
||||
from .types import (
|
||||
CopyObjectResponse,
|
||||
CreateBucketResponse,
|
||||
DeleteObjectResponse,
|
||||
DeleteObjectsResponse,
|
||||
GetObjectResponse,
|
||||
HeadObjectResponse,
|
||||
ListBucketsResponse,
|
||||
ListObjectsV2Response,
|
||||
PutObjectResponse,
|
||||
S3Object,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"__version__",
|
||||
# Client
|
||||
"DeltaGliderClient",
|
||||
"create_client",
|
||||
# Data classes
|
||||
# Data classes (legacy - will be deprecated in favor of TypedDict)
|
||||
"UploadSummary",
|
||||
"CompressionEstimate",
|
||||
"ObjectInfo",
|
||||
@@ -33,4 +46,15 @@ __all__ = [
|
||||
"DeltaService",
|
||||
"DeltaSpace",
|
||||
"ObjectKey",
|
||||
# boto3-compatible types (no boto3 import needed!)
|
||||
"ListObjectsV2Response",
|
||||
"PutObjectResponse",
|
||||
"GetObjectResponse",
|
||||
"DeleteObjectResponse",
|
||||
"DeleteObjectsResponse",
|
||||
"HeadObjectResponse",
|
||||
"ListBucketsResponse",
|
||||
"CreateBucketResponse",
|
||||
"CopyObjectResponse",
|
||||
"S3Object",
|
||||
]
|
||||
|
||||
@@ -240,6 +240,13 @@ def ls(
|
||||
prefix_str: str
|
||||
bucket_name, prefix_str = parse_s3_url(s3_url)
|
||||
|
||||
# Ensure prefix ends with / if it's meant to be a directory
|
||||
# This helps with proper path handling
|
||||
if prefix_str and not prefix_str.endswith("/"):
|
||||
# Check if this is a file or directory by listing
|
||||
# For now, assume it's a directory prefix
|
||||
prefix_str = prefix_str + "/"
|
||||
|
||||
# Format bytes to human readable
|
||||
def format_bytes(size: int) -> str:
|
||||
if not human_readable:
|
||||
@@ -252,33 +259,38 @@ def ls(
|
||||
return f"{size_float:.1f}P"
|
||||
|
||||
# List objects using SDK (automatically filters .delta and reference.bin)
|
||||
from deltaglider.client import DeltaGliderClient, ListObjectsResponse
|
||||
from deltaglider.client import DeltaGliderClient
|
||||
|
||||
client = DeltaGliderClient(service)
|
||||
dg_response: ListObjectsResponse = client.list_objects(
|
||||
Bucket=bucket_name, Prefix=prefix_str, MaxKeys=10000
|
||||
dg_response = client.list_objects(
|
||||
Bucket=bucket_name,
|
||||
Prefix=prefix_str,
|
||||
MaxKeys=10000,
|
||||
Delimiter="/" if not recursive else "",
|
||||
)
|
||||
objects = dg_response.contents
|
||||
objects = dg_response["Contents"]
|
||||
|
||||
# Filter by recursive flag
|
||||
if not recursive:
|
||||
# Only show direct children
|
||||
seen_prefixes = set()
|
||||
# Show common prefixes (subdirectories) from S3 response
|
||||
for common_prefix in dg_response.get("CommonPrefixes", []):
|
||||
prefix_path = common_prefix.get("Prefix", "")
|
||||
# Show only the directory name, not the full path
|
||||
if prefix_str:
|
||||
# Strip the current prefix to show only the subdirectory
|
||||
display_name = prefix_path[len(prefix_str) :]
|
||||
else:
|
||||
display_name = prefix_path
|
||||
click.echo(f" PRE {display_name}")
|
||||
|
||||
# Only show files at current level (not in subdirectories)
|
||||
filtered_objects = []
|
||||
for obj in objects:
|
||||
rel_path = obj.key[len(prefix_str) :] if prefix_str else obj.key
|
||||
if "/" in rel_path:
|
||||
# It's in a subdirectory
|
||||
subdir = rel_path.split("/")[0] + "/"
|
||||
if subdir not in seen_prefixes:
|
||||
seen_prefixes.add(subdir)
|
||||
# Show as directory
|
||||
full_prefix = f"{prefix_str}{subdir}" if prefix_str else subdir
|
||||
click.echo(f" PRE {full_prefix}")
|
||||
else:
|
||||
# Direct file
|
||||
if rel_path: # Only add if there's actually a file at this level
|
||||
filtered_objects.append(obj)
|
||||
obj_key = obj["Key"]
|
||||
rel_path = obj_key[len(prefix_str) :] if prefix_str else obj_key
|
||||
# Only include if it's a direct child (no / in relative path)
|
||||
if "/" not in rel_path and rel_path:
|
||||
filtered_objects.append(obj)
|
||||
objects = filtered_objects
|
||||
|
||||
# Display objects (SDK already filters reference.bin and strips .delta)
|
||||
@@ -286,19 +298,26 @@ def ls(
|
||||
total_count = 0
|
||||
|
||||
for obj in objects:
|
||||
total_size += obj.size
|
||||
total_size += obj["Size"]
|
||||
total_count += 1
|
||||
|
||||
# Format the display
|
||||
size_str = format_bytes(obj.size)
|
||||
size_str = format_bytes(obj["Size"])
|
||||
# last_modified is a string from SDK, parse it if needed
|
||||
if isinstance(obj.last_modified, str):
|
||||
last_modified = obj.get("LastModified", "")
|
||||
if isinstance(last_modified, str):
|
||||
# Already a string, extract date portion
|
||||
date_str = obj.last_modified[:19].replace("T", " ")
|
||||
date_str = last_modified[:19].replace("T", " ")
|
||||
else:
|
||||
date_str = obj.last_modified.strftime("%Y-%m-%d %H:%M:%S")
|
||||
date_str = last_modified.strftime("%Y-%m-%d %H:%M:%S")
|
||||
|
||||
click.echo(f"{date_str} {size_str:>10} s3://{bucket_name}/{obj.key}")
|
||||
# Show only the filename relative to current prefix (like AWS CLI)
|
||||
if prefix_str:
|
||||
display_key = obj["Key"][len(prefix_str) :]
|
||||
else:
|
||||
display_key = obj["Key"]
|
||||
|
||||
click.echo(f"{date_str} {size_str:>10} {display_key}")
|
||||
|
||||
# Show summary if requested
|
||||
if summarize:
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
35
src/deltaglider/client_delete_helpers.py
Normal file
35
src/deltaglider/client_delete_helpers.py
Normal file
@@ -0,0 +1,35 @@
|
||||
"""Helper utilities for client delete operations."""
|
||||
|
||||
from .core import DeltaService, ObjectKey
|
||||
from .core.errors import NotFoundError
|
||||
|
||||
|
||||
def delete_with_delta_suffix(
|
||||
service: DeltaService, bucket: str, key: str
|
||||
) -> tuple[str, dict[str, object]]:
|
||||
"""Delete an object, retrying with '.delta' suffix when needed.
|
||||
|
||||
Args:
|
||||
service: DeltaService-like instance exposing ``delete(ObjectKey)``.
|
||||
bucket: Target bucket.
|
||||
key: Requested key (without forcing .delta suffix).
|
||||
|
||||
Returns:
|
||||
Tuple containing the actual key deleted in storage and the delete result dict.
|
||||
|
||||
Raises:
|
||||
NotFoundError: Propagated when both the direct and '.delta' keys are missing.
|
||||
"""
|
||||
actual_key = key
|
||||
object_key = ObjectKey(bucket=bucket, key=actual_key)
|
||||
|
||||
try:
|
||||
delete_result = service.delete(object_key)
|
||||
except NotFoundError:
|
||||
if key.endswith(".delta"):
|
||||
raise
|
||||
actual_key = f"{key}.delta"
|
||||
object_key = ObjectKey(bucket=bucket, key=actual_key)
|
||||
delete_result = service.delete(object_key)
|
||||
|
||||
return actual_key, delete_result
|
||||
99
src/deltaglider/client_models.py
Normal file
99
src/deltaglider/client_models.py
Normal file
@@ -0,0 +1,99 @@
|
||||
"""Shared data models for the DeltaGlider client."""
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
|
||||
|
||||
@dataclass
|
||||
class UploadSummary:
|
||||
"""User-friendly upload summary."""
|
||||
|
||||
operation: str
|
||||
bucket: str
|
||||
key: str
|
||||
original_size: int
|
||||
stored_size: int
|
||||
is_delta: bool
|
||||
delta_ratio: float = 0.0
|
||||
|
||||
@property
|
||||
def original_size_mb(self) -> float:
|
||||
"""Original size in MB."""
|
||||
return self.original_size / (1024 * 1024)
|
||||
|
||||
@property
|
||||
def stored_size_mb(self) -> float:
|
||||
"""Stored size in MB."""
|
||||
return self.stored_size / (1024 * 1024)
|
||||
|
||||
@property
|
||||
def savings_percent(self) -> float:
|
||||
"""Percentage saved through compression."""
|
||||
if self.original_size == 0:
|
||||
return 0.0
|
||||
return ((self.original_size - self.stored_size) / self.original_size) * 100
|
||||
|
||||
|
||||
@dataclass
|
||||
class CompressionEstimate:
|
||||
"""Compression estimate for a file."""
|
||||
|
||||
original_size: int
|
||||
estimated_compressed_size: int
|
||||
estimated_ratio: float
|
||||
confidence: float
|
||||
recommended_reference: str | None = None
|
||||
should_use_delta: bool = True
|
||||
|
||||
|
||||
@dataclass
|
||||
class ObjectInfo:
|
||||
"""Detailed object information with compression stats."""
|
||||
|
||||
key: str
|
||||
size: int
|
||||
last_modified: str
|
||||
etag: str | None = None
|
||||
storage_class: str = "STANDARD"
|
||||
|
||||
# DeltaGlider-specific fields
|
||||
original_size: int | None = None
|
||||
compressed_size: int | None = None
|
||||
compression_ratio: float | None = None
|
||||
is_delta: bool = False
|
||||
reference_key: str | None = None
|
||||
delta_chain_length: int = 0
|
||||
|
||||
|
||||
@dataclass
|
||||
class ListObjectsResponse:
|
||||
"""Response from list_objects, compatible with boto3."""
|
||||
|
||||
name: str # Bucket name
|
||||
prefix: str = ""
|
||||
delimiter: str = ""
|
||||
max_keys: int = 1000
|
||||
common_prefixes: list[dict[str, str]] = field(default_factory=list)
|
||||
contents: list[ObjectInfo] = field(default_factory=list)
|
||||
is_truncated: bool = False
|
||||
next_continuation_token: str | None = None
|
||||
continuation_token: str | None = None
|
||||
key_count: int = 0
|
||||
|
||||
@property
|
||||
def objects(self) -> list[ObjectInfo]:
|
||||
"""Alias for contents, for convenience."""
|
||||
return self.contents
|
||||
|
||||
|
||||
@dataclass
|
||||
class BucketStats:
|
||||
"""Statistics for a bucket."""
|
||||
|
||||
bucket: str
|
||||
object_count: int
|
||||
total_size: int
|
||||
compressed_size: int
|
||||
space_saved: int
|
||||
average_compression_ratio: float
|
||||
delta_objects: int
|
||||
direct_objects: int
|
||||
37
src/deltaglider/client_operations/__init__.py
Normal file
37
src/deltaglider/client_operations/__init__.py
Normal file
@@ -0,0 +1,37 @@
|
||||
"""Client operation modules for DeltaGliderClient.
|
||||
|
||||
This package contains modular operation implementations:
|
||||
- bucket: S3 bucket management (create, delete, list)
|
||||
- presigned: Presigned URL generation for temporary access
|
||||
- batch: Batch upload/download operations
|
||||
- stats: Statistics and analytics operations
|
||||
"""
|
||||
|
||||
from .batch import download_batch, upload_batch, upload_chunked
|
||||
from .bucket import create_bucket, delete_bucket, list_buckets
|
||||
from .presigned import generate_presigned_post, generate_presigned_url
|
||||
from .stats import (
|
||||
estimate_compression,
|
||||
find_similar_files,
|
||||
get_bucket_stats,
|
||||
get_object_info,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
# Bucket operations
|
||||
"create_bucket",
|
||||
"delete_bucket",
|
||||
"list_buckets",
|
||||
# Presigned operations
|
||||
"generate_presigned_url",
|
||||
"generate_presigned_post",
|
||||
# Batch operations
|
||||
"upload_chunked",
|
||||
"upload_batch",
|
||||
"download_batch",
|
||||
# Stats operations
|
||||
"get_bucket_stats",
|
||||
"get_object_info",
|
||||
"estimate_compression",
|
||||
"find_similar_files",
|
||||
]
|
||||
159
src/deltaglider/client_operations/batch.py
Normal file
159
src/deltaglider/client_operations/batch.py
Normal file
@@ -0,0 +1,159 @@
|
||||
"""Batch upload/download operations for DeltaGlider client.
|
||||
|
||||
This module contains DeltaGlider-specific batch operations:
|
||||
- upload_batch
|
||||
- download_batch
|
||||
- upload_chunked
|
||||
"""
|
||||
|
||||
from collections.abc import Callable
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from ..client_models import UploadSummary
|
||||
|
||||
|
||||
def upload_chunked(
|
||||
client: Any, # DeltaGliderClient
|
||||
file_path: str | Path,
|
||||
s3_url: str,
|
||||
chunk_size: int = 5 * 1024 * 1024,
|
||||
progress_callback: Callable[[int, int, int, int], None] | None = None,
|
||||
max_ratio: float = 0.5,
|
||||
) -> UploadSummary:
|
||||
"""Upload a file in chunks with progress callback.
|
||||
|
||||
This method reads the file in chunks to avoid loading large files entirely into memory,
|
||||
making it suitable for uploading very large files. Progress is reported after each chunk.
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
file_path: Local file to upload
|
||||
s3_url: S3 destination URL (s3://bucket/path/filename)
|
||||
chunk_size: Size of each chunk in bytes (default 5MB)
|
||||
progress_callback: Callback(chunk_number, total_chunks, bytes_sent, total_bytes)
|
||||
max_ratio: Maximum acceptable delta/file ratio for compression
|
||||
|
||||
Returns:
|
||||
UploadSummary with compression statistics
|
||||
|
||||
Example:
|
||||
def on_progress(chunk_num, total_chunks, bytes_sent, total_bytes):
|
||||
percent = (bytes_sent / total_bytes) * 100
|
||||
print(f"Upload progress: {percent:.1f}%")
|
||||
|
||||
client.upload_chunked(
|
||||
"large_file.zip",
|
||||
"s3://bucket/releases/large_file.zip",
|
||||
chunk_size=10 * 1024 * 1024, # 10MB chunks
|
||||
progress_callback=on_progress
|
||||
)
|
||||
"""
|
||||
file_path = Path(file_path)
|
||||
file_size = file_path.stat().st_size
|
||||
|
||||
# For small files, just use regular upload
|
||||
if file_size <= chunk_size:
|
||||
if progress_callback:
|
||||
progress_callback(1, 1, file_size, file_size)
|
||||
result: UploadSummary = client.upload(file_path, s3_url, max_ratio=max_ratio)
|
||||
return result
|
||||
|
||||
# Calculate chunks
|
||||
total_chunks = (file_size + chunk_size - 1) // chunk_size
|
||||
|
||||
# Create a temporary file for chunked processing
|
||||
# For now, we read the entire file but report progress in chunks
|
||||
# Future enhancement: implement true streaming upload in storage adapter
|
||||
bytes_read = 0
|
||||
|
||||
with open(file_path, "rb") as f:
|
||||
for chunk_num in range(1, total_chunks + 1):
|
||||
# Read chunk (simulated for progress reporting)
|
||||
chunk_data = f.read(chunk_size)
|
||||
bytes_read += len(chunk_data)
|
||||
|
||||
if progress_callback:
|
||||
progress_callback(chunk_num, total_chunks, bytes_read, file_size)
|
||||
|
||||
# Perform the actual upload
|
||||
# TODO: When storage adapter supports streaming, pass chunks directly
|
||||
upload_result: UploadSummary = client.upload(file_path, s3_url, max_ratio=max_ratio)
|
||||
|
||||
# Final progress callback
|
||||
if progress_callback:
|
||||
progress_callback(total_chunks, total_chunks, file_size, file_size)
|
||||
|
||||
return upload_result
|
||||
|
||||
|
||||
def upload_batch(
|
||||
client: Any, # DeltaGliderClient
|
||||
files: list[str | Path],
|
||||
s3_prefix: str,
|
||||
max_ratio: float = 0.5,
|
||||
progress_callback: Callable[[str, int, int], None] | None = None,
|
||||
) -> list[UploadSummary]:
|
||||
"""Upload multiple files in batch.
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
files: List of local file paths
|
||||
s3_prefix: S3 destination prefix (s3://bucket/prefix/)
|
||||
max_ratio: Maximum acceptable delta/file ratio
|
||||
progress_callback: Callback(filename, current_file_index, total_files)
|
||||
|
||||
Returns:
|
||||
List of UploadSummary objects
|
||||
"""
|
||||
results = []
|
||||
|
||||
for i, file_path in enumerate(files):
|
||||
file_path = Path(file_path)
|
||||
|
||||
if progress_callback:
|
||||
progress_callback(file_path.name, i + 1, len(files))
|
||||
|
||||
# Upload each file
|
||||
s3_url = f"{s3_prefix.rstrip('/')}/{file_path.name}"
|
||||
summary = client.upload(file_path, s3_url, max_ratio=max_ratio)
|
||||
results.append(summary)
|
||||
|
||||
return results
|
||||
|
||||
|
||||
def download_batch(
|
||||
client: Any, # DeltaGliderClient
|
||||
s3_urls: list[str],
|
||||
output_dir: str | Path,
|
||||
progress_callback: Callable[[str, int, int], None] | None = None,
|
||||
) -> list[Path]:
|
||||
"""Download multiple files in batch.
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
s3_urls: List of S3 URLs to download
|
||||
output_dir: Local directory to save files
|
||||
progress_callback: Callback(filename, current_file_index, total_files)
|
||||
|
||||
Returns:
|
||||
List of downloaded file paths
|
||||
"""
|
||||
output_dir = Path(output_dir)
|
||||
output_dir.mkdir(parents=True, exist_ok=True)
|
||||
results = []
|
||||
|
||||
for i, s3_url in enumerate(s3_urls):
|
||||
# Extract filename from URL
|
||||
filename = s3_url.split("/")[-1]
|
||||
if filename.endswith(".delta"):
|
||||
filename = filename[:-6] # Remove .delta suffix
|
||||
|
||||
if progress_callback:
|
||||
progress_callback(filename, i + 1, len(s3_urls))
|
||||
|
||||
output_path = output_dir / filename
|
||||
client.download(s3_url, output_path)
|
||||
results.append(output_path)
|
||||
|
||||
return results
|
||||
152
src/deltaglider/client_operations/bucket.py
Normal file
152
src/deltaglider/client_operations/bucket.py
Normal file
@@ -0,0 +1,152 @@
|
||||
"""Bucket management operations for DeltaGlider client.
|
||||
|
||||
This module contains boto3-compatible bucket operations:
|
||||
- create_bucket
|
||||
- delete_bucket
|
||||
- list_buckets
|
||||
"""
|
||||
|
||||
from typing import Any
|
||||
|
||||
|
||||
def create_bucket(
|
||||
client: Any, # DeltaGliderClient (avoiding circular import)
|
||||
Bucket: str,
|
||||
CreateBucketConfiguration: dict[str, str] | None = None,
|
||||
**kwargs: Any,
|
||||
) -> dict[str, Any]:
|
||||
"""Create an S3 bucket (boto3-compatible).
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
Bucket: Bucket name to create
|
||||
CreateBucketConfiguration: Optional bucket configuration (e.g., LocationConstraint)
|
||||
**kwargs: Additional S3 parameters (for compatibility)
|
||||
|
||||
Returns:
|
||||
Response dict with bucket location
|
||||
|
||||
Example:
|
||||
>>> client = create_client()
|
||||
>>> client.create_bucket(Bucket='my-bucket')
|
||||
>>> # With region
|
||||
>>> client.create_bucket(
|
||||
... Bucket='my-bucket',
|
||||
... CreateBucketConfiguration={'LocationConstraint': 'us-west-2'}
|
||||
... )
|
||||
"""
|
||||
storage_adapter = client.service.storage
|
||||
|
||||
# Check if storage adapter has boto3 client
|
||||
if hasattr(storage_adapter, "client"):
|
||||
try:
|
||||
params: dict[str, Any] = {"Bucket": Bucket}
|
||||
if CreateBucketConfiguration:
|
||||
params["CreateBucketConfiguration"] = CreateBucketConfiguration
|
||||
|
||||
response = storage_adapter.client.create_bucket(**params)
|
||||
return {
|
||||
"Location": response.get("Location", f"/{Bucket}"),
|
||||
"ResponseMetadata": {
|
||||
"HTTPStatusCode": 200,
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
error_msg = str(e)
|
||||
if "BucketAlreadyExists" in error_msg or "BucketAlreadyOwnedByYou" in error_msg:
|
||||
# Bucket already exists - return success
|
||||
client.service.logger.debug(f"Bucket {Bucket} already exists")
|
||||
return {
|
||||
"Location": f"/{Bucket}",
|
||||
"ResponseMetadata": {
|
||||
"HTTPStatusCode": 200,
|
||||
},
|
||||
}
|
||||
raise RuntimeError(f"Failed to create bucket: {e}") from e
|
||||
else:
|
||||
raise NotImplementedError("Storage adapter does not support bucket creation")
|
||||
|
||||
|
||||
def delete_bucket(
|
||||
client: Any, # DeltaGliderClient
|
||||
Bucket: str,
|
||||
**kwargs: Any,
|
||||
) -> dict[str, Any]:
|
||||
"""Delete an S3 bucket (boto3-compatible).
|
||||
|
||||
Note: Bucket must be empty before deletion.
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
Bucket: Bucket name to delete
|
||||
**kwargs: Additional S3 parameters (for compatibility)
|
||||
|
||||
Returns:
|
||||
Response dict with deletion status
|
||||
|
||||
Example:
|
||||
>>> client = create_client()
|
||||
>>> client.delete_bucket(Bucket='my-bucket')
|
||||
"""
|
||||
storage_adapter = client.service.storage
|
||||
|
||||
# Check if storage adapter has boto3 client
|
||||
if hasattr(storage_adapter, "client"):
|
||||
try:
|
||||
storage_adapter.client.delete_bucket(Bucket=Bucket)
|
||||
return {
|
||||
"ResponseMetadata": {
|
||||
"HTTPStatusCode": 204,
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
error_msg = str(e)
|
||||
if "NoSuchBucket" in error_msg:
|
||||
# Bucket doesn't exist - return success
|
||||
client.service.logger.debug(f"Bucket {Bucket} does not exist")
|
||||
return {
|
||||
"ResponseMetadata": {
|
||||
"HTTPStatusCode": 204,
|
||||
},
|
||||
}
|
||||
raise RuntimeError(f"Failed to delete bucket: {e}") from e
|
||||
else:
|
||||
raise NotImplementedError("Storage adapter does not support bucket deletion")
|
||||
|
||||
|
||||
def list_buckets(
|
||||
client: Any, # DeltaGliderClient
|
||||
**kwargs: Any,
|
||||
) -> dict[str, Any]:
|
||||
"""List all S3 buckets (boto3-compatible).
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
**kwargs: Additional S3 parameters (for compatibility)
|
||||
|
||||
Returns:
|
||||
Response dict with bucket list
|
||||
|
||||
Example:
|
||||
>>> client = create_client()
|
||||
>>> response = client.list_buckets()
|
||||
>>> for bucket in response['Buckets']:
|
||||
... print(bucket['Name'])
|
||||
"""
|
||||
storage_adapter = client.service.storage
|
||||
|
||||
# Check if storage adapter has boto3 client
|
||||
if hasattr(storage_adapter, "client"):
|
||||
try:
|
||||
response = storage_adapter.client.list_buckets()
|
||||
return {
|
||||
"Buckets": response.get("Buckets", []),
|
||||
"Owner": response.get("Owner", {}),
|
||||
"ResponseMetadata": {
|
||||
"HTTPStatusCode": 200,
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
raise RuntimeError(f"Failed to list buckets: {e}") from e
|
||||
else:
|
||||
raise NotImplementedError("Storage adapter does not support bucket listing")
|
||||
124
src/deltaglider/client_operations/presigned.py
Normal file
124
src/deltaglider/client_operations/presigned.py
Normal file
@@ -0,0 +1,124 @@
|
||||
"""Presigned URL operations for DeltaGlider client.
|
||||
|
||||
This module contains boto3-compatible presigned URL operations:
|
||||
- generate_presigned_url
|
||||
- generate_presigned_post
|
||||
"""
|
||||
|
||||
from typing import Any
|
||||
|
||||
|
||||
def try_boto3_presigned_operation(
|
||||
client: Any, # DeltaGliderClient
|
||||
operation: str,
|
||||
**kwargs: Any,
|
||||
) -> Any | None:
|
||||
"""Try to generate presigned operation using boto3 client, return None if not available."""
|
||||
storage_adapter = client.service.storage
|
||||
|
||||
# Check if storage adapter has boto3 client
|
||||
if hasattr(storage_adapter, "client"):
|
||||
try:
|
||||
if operation == "url":
|
||||
return str(storage_adapter.client.generate_presigned_url(**kwargs))
|
||||
elif operation == "post":
|
||||
return dict(storage_adapter.client.generate_presigned_post(**kwargs))
|
||||
except AttributeError:
|
||||
# storage_adapter does not have a 'client' attribute
|
||||
pass
|
||||
except Exception as e:
|
||||
# Fall back to manual construction if needed
|
||||
client.service.logger.warning(f"Failed to generate presigned {operation}: {e}")
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def generate_presigned_url(
|
||||
client: Any, # DeltaGliderClient
|
||||
ClientMethod: str,
|
||||
Params: dict[str, Any],
|
||||
ExpiresIn: int = 3600,
|
||||
) -> str:
|
||||
"""Generate presigned URL (boto3-compatible).
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
ClientMethod: Method name ('get_object' or 'put_object')
|
||||
Params: Parameters dict with Bucket and Key
|
||||
ExpiresIn: URL expiration in seconds
|
||||
|
||||
Returns:
|
||||
Presigned URL string
|
||||
"""
|
||||
# Try boto3 first, fallback to manual construction
|
||||
url = try_boto3_presigned_operation(
|
||||
client,
|
||||
"url",
|
||||
ClientMethod=ClientMethod,
|
||||
Params=Params,
|
||||
ExpiresIn=ExpiresIn,
|
||||
)
|
||||
if url is not None:
|
||||
return str(url)
|
||||
|
||||
# Fallback: construct URL manually (less secure, for dev/testing only)
|
||||
bucket = Params.get("Bucket", "")
|
||||
key = Params.get("Key", "")
|
||||
|
||||
if client.endpoint_url:
|
||||
base_url = client.endpoint_url
|
||||
else:
|
||||
base_url = f"https://{bucket}.s3.amazonaws.com"
|
||||
|
||||
# Warning: This is not a real presigned URL, just a placeholder
|
||||
client.service.logger.warning("Using placeholder presigned URL - not suitable for production")
|
||||
return f"{base_url}/{key}?expires={ExpiresIn}"
|
||||
|
||||
|
||||
def generate_presigned_post(
|
||||
client: Any, # DeltaGliderClient
|
||||
Bucket: str,
|
||||
Key: str,
|
||||
Fields: dict[str, str] | None = None,
|
||||
Conditions: list[Any] | None = None,
|
||||
ExpiresIn: int = 3600,
|
||||
) -> dict[str, Any]:
|
||||
"""Generate presigned POST data for HTML forms (boto3-compatible).
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
Bucket: S3 bucket name
|
||||
Key: Object key
|
||||
Fields: Additional fields to include
|
||||
Conditions: Upload conditions
|
||||
ExpiresIn: URL expiration in seconds
|
||||
|
||||
Returns:
|
||||
Dict with 'url' and 'fields' for form submission
|
||||
"""
|
||||
# Try boto3 first, fallback to manual construction
|
||||
response = try_boto3_presigned_operation(
|
||||
client,
|
||||
"post",
|
||||
Bucket=Bucket,
|
||||
Key=Key,
|
||||
Fields=Fields,
|
||||
Conditions=Conditions,
|
||||
ExpiresIn=ExpiresIn,
|
||||
)
|
||||
if response is not None:
|
||||
return dict(response)
|
||||
|
||||
# Fallback: return minimal structure for compatibility
|
||||
if client.endpoint_url:
|
||||
url = f"{client.endpoint_url}/{Bucket}"
|
||||
else:
|
||||
url = f"https://{Bucket}.s3.amazonaws.com"
|
||||
|
||||
return {
|
||||
"url": url,
|
||||
"fields": {
|
||||
"key": Key,
|
||||
**(Fields or {}),
|
||||
},
|
||||
}
|
||||
337
src/deltaglider/client_operations/stats.py
Normal file
337
src/deltaglider/client_operations/stats.py
Normal file
@@ -0,0 +1,337 @@
|
||||
"""Statistics and analysis operations for DeltaGlider client.
|
||||
|
||||
This module contains DeltaGlider-specific statistics operations:
|
||||
- get_bucket_stats
|
||||
- get_object_info
|
||||
- estimate_compression
|
||||
- find_similar_files
|
||||
"""
|
||||
|
||||
import re
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from ..client_models import BucketStats, CompressionEstimate, ObjectInfo
|
||||
|
||||
|
||||
def get_object_info(
|
||||
client: Any, # DeltaGliderClient
|
||||
s3_url: str,
|
||||
) -> ObjectInfo:
|
||||
"""Get detailed object information including compression stats.
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
s3_url: S3 URL of the object
|
||||
|
||||
Returns:
|
||||
ObjectInfo with detailed metadata
|
||||
"""
|
||||
# Parse URL
|
||||
if not s3_url.startswith("s3://"):
|
||||
raise ValueError(f"Invalid S3 URL: {s3_url}")
|
||||
|
||||
s3_path = s3_url[5:]
|
||||
parts = s3_path.split("/", 1)
|
||||
bucket = parts[0]
|
||||
key = parts[1] if len(parts) > 1 else ""
|
||||
|
||||
# Get object metadata
|
||||
obj_head = client.service.storage.head(f"{bucket}/{key}")
|
||||
if not obj_head:
|
||||
raise FileNotFoundError(f"Object not found: {s3_url}")
|
||||
|
||||
metadata = obj_head.metadata
|
||||
is_delta = key.endswith(".delta")
|
||||
|
||||
return ObjectInfo(
|
||||
key=key,
|
||||
size=obj_head.size,
|
||||
last_modified=metadata.get("last_modified", ""),
|
||||
etag=metadata.get("etag"),
|
||||
original_size=int(metadata.get("file_size", obj_head.size)),
|
||||
compressed_size=obj_head.size,
|
||||
compression_ratio=float(metadata.get("compression_ratio", 0.0)),
|
||||
is_delta=is_delta,
|
||||
reference_key=metadata.get("ref_key"),
|
||||
)
|
||||
|
||||
|
||||
def get_bucket_stats(
|
||||
client: Any, # DeltaGliderClient
|
||||
bucket: str,
|
||||
detailed_stats: bool = False,
|
||||
) -> BucketStats:
|
||||
"""Get statistics for a bucket with optional detailed compression metrics.
|
||||
|
||||
This method provides two modes:
|
||||
- Quick stats (default): Fast overview using LIST only (~50ms)
|
||||
- Detailed stats: Accurate compression metrics with HEAD requests (slower)
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
bucket: S3 bucket name
|
||||
detailed_stats: If True, fetch accurate compression ratios for delta files (default: False)
|
||||
|
||||
Returns:
|
||||
BucketStats with compression and space savings info
|
||||
|
||||
Performance:
|
||||
- With detailed_stats=False: ~50ms for any bucket size (1 LIST call per 1000 objects)
|
||||
- With detailed_stats=True: ~2-3s per 1000 objects (adds HEAD calls for delta files only)
|
||||
|
||||
Example:
|
||||
# Quick stats for dashboard display
|
||||
stats = client.get_bucket_stats('releases')
|
||||
print(f"Objects: {stats.object_count}, Size: {stats.total_size}")
|
||||
|
||||
# Detailed stats for analytics (slower but accurate)
|
||||
stats = client.get_bucket_stats('releases', detailed_stats=True)
|
||||
print(f"Compression ratio: {stats.average_compression_ratio:.1%}")
|
||||
"""
|
||||
# List all objects with smart metadata fetching
|
||||
all_objects = []
|
||||
continuation_token = None
|
||||
|
||||
while True:
|
||||
response = client.list_objects(
|
||||
Bucket=bucket,
|
||||
MaxKeys=1000,
|
||||
ContinuationToken=continuation_token,
|
||||
FetchMetadata=detailed_stats, # Only fetch metadata if detailed stats requested
|
||||
)
|
||||
|
||||
# Extract S3Objects from response (with Metadata containing DeltaGlider info)
|
||||
for obj_dict in response["Contents"]:
|
||||
# Convert dict back to ObjectInfo for backward compatibility with stats calculation
|
||||
metadata = obj_dict.get("Metadata", {})
|
||||
# Parse compression ratio safely (handle "unknown" value)
|
||||
compression_ratio_str = metadata.get("deltaglider-compression-ratio", "0.0")
|
||||
try:
|
||||
compression_ratio = (
|
||||
float(compression_ratio_str) if compression_ratio_str != "unknown" else 0.0
|
||||
)
|
||||
except ValueError:
|
||||
compression_ratio = 0.0
|
||||
|
||||
all_objects.append(
|
||||
ObjectInfo(
|
||||
key=obj_dict["Key"],
|
||||
size=obj_dict["Size"],
|
||||
last_modified=obj_dict.get("LastModified", ""),
|
||||
etag=obj_dict.get("ETag"),
|
||||
storage_class=obj_dict.get("StorageClass", "STANDARD"),
|
||||
original_size=int(metadata.get("deltaglider-original-size", obj_dict["Size"])),
|
||||
compressed_size=obj_dict["Size"],
|
||||
is_delta=metadata.get("deltaglider-is-delta", "false") == "true",
|
||||
compression_ratio=compression_ratio,
|
||||
reference_key=metadata.get("deltaglider-reference-key"),
|
||||
)
|
||||
)
|
||||
|
||||
if not response.get("IsTruncated"):
|
||||
break
|
||||
|
||||
continuation_token = response.get("NextContinuationToken")
|
||||
|
||||
# Calculate statistics
|
||||
total_size = 0
|
||||
compressed_size = 0
|
||||
delta_count = 0
|
||||
direct_count = 0
|
||||
|
||||
for obj in all_objects:
|
||||
# Skip reference.bin files - they are internal implementation details
|
||||
# and their size is already accounted for in delta metadata
|
||||
if obj.key.endswith("/reference.bin") or obj.key == "reference.bin":
|
||||
continue
|
||||
|
||||
compressed_size += obj.size
|
||||
|
||||
if obj.is_delta:
|
||||
delta_count += 1
|
||||
# Use actual original size if we have it, otherwise estimate
|
||||
total_size += obj.original_size or obj.size
|
||||
else:
|
||||
direct_count += 1
|
||||
# For non-delta files, original equals compressed
|
||||
total_size += obj.size
|
||||
|
||||
space_saved = total_size - compressed_size
|
||||
avg_ratio = (space_saved / total_size) if total_size > 0 else 0.0
|
||||
|
||||
return BucketStats(
|
||||
bucket=bucket,
|
||||
object_count=len(all_objects),
|
||||
total_size=total_size,
|
||||
compressed_size=compressed_size,
|
||||
space_saved=space_saved,
|
||||
average_compression_ratio=avg_ratio,
|
||||
delta_objects=delta_count,
|
||||
direct_objects=direct_count,
|
||||
)
|
||||
|
||||
|
||||
def estimate_compression(
|
||||
client: Any, # DeltaGliderClient
|
||||
file_path: str | Path,
|
||||
bucket: str,
|
||||
prefix: str = "",
|
||||
sample_size: int = 1024 * 1024,
|
||||
) -> CompressionEstimate:
|
||||
"""Estimate compression ratio before upload.
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
file_path: Local file to estimate
|
||||
bucket: Target bucket
|
||||
prefix: Target prefix (for finding similar files)
|
||||
sample_size: Bytes to sample for estimation (default 1MB)
|
||||
|
||||
Returns:
|
||||
CompressionEstimate with predicted compression
|
||||
"""
|
||||
file_path = Path(file_path)
|
||||
file_size = file_path.stat().st_size
|
||||
|
||||
# Check file extension
|
||||
ext = file_path.suffix.lower()
|
||||
delta_extensions = {
|
||||
".zip",
|
||||
".tar",
|
||||
".gz",
|
||||
".tar.gz",
|
||||
".tgz",
|
||||
".bz2",
|
||||
".tar.bz2",
|
||||
".xz",
|
||||
".tar.xz",
|
||||
".7z",
|
||||
".rar",
|
||||
".dmg",
|
||||
".iso",
|
||||
".pkg",
|
||||
".deb",
|
||||
".rpm",
|
||||
".apk",
|
||||
".jar",
|
||||
".war",
|
||||
".ear",
|
||||
}
|
||||
|
||||
# Already compressed formats that won't benefit from delta
|
||||
incompressible = {".jpg", ".jpeg", ".png", ".mp4", ".mp3", ".avi", ".mov"}
|
||||
|
||||
if ext in incompressible:
|
||||
return CompressionEstimate(
|
||||
original_size=file_size,
|
||||
estimated_compressed_size=file_size,
|
||||
estimated_ratio=0.0,
|
||||
confidence=0.95,
|
||||
should_use_delta=False,
|
||||
)
|
||||
|
||||
if ext not in delta_extensions:
|
||||
# Unknown type, conservative estimate
|
||||
return CompressionEstimate(
|
||||
original_size=file_size,
|
||||
estimated_compressed_size=file_size,
|
||||
estimated_ratio=0.0,
|
||||
confidence=0.5,
|
||||
should_use_delta=file_size > 1024 * 1024, # Only for files > 1MB
|
||||
)
|
||||
|
||||
# Look for similar files in the target location
|
||||
similar_files = find_similar_files(client, bucket, prefix, file_path.name)
|
||||
|
||||
if similar_files:
|
||||
# If we have similar files, estimate high compression
|
||||
estimated_ratio = 0.99 # 99% compression typical for similar versions
|
||||
confidence = 0.9
|
||||
recommended_ref = similar_files[0]["Key"] if similar_files else None
|
||||
else:
|
||||
# First file of its type
|
||||
estimated_ratio = 0.0
|
||||
confidence = 0.7
|
||||
recommended_ref = None
|
||||
|
||||
estimated_size = int(file_size * (1 - estimated_ratio))
|
||||
|
||||
return CompressionEstimate(
|
||||
original_size=file_size,
|
||||
estimated_compressed_size=estimated_size,
|
||||
estimated_ratio=estimated_ratio,
|
||||
confidence=confidence,
|
||||
recommended_reference=recommended_ref,
|
||||
should_use_delta=True,
|
||||
)
|
||||
|
||||
|
||||
def find_similar_files(
|
||||
client: Any, # DeltaGliderClient
|
||||
bucket: str,
|
||||
prefix: str,
|
||||
filename: str,
|
||||
limit: int = 5,
|
||||
) -> list[dict[str, Any]]:
|
||||
"""Find similar files that could serve as references.
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
bucket: S3 bucket
|
||||
prefix: Prefix to search in
|
||||
filename: Filename to match against
|
||||
limit: Maximum number of results
|
||||
|
||||
Returns:
|
||||
List of similar files with scores
|
||||
"""
|
||||
# List objects in the prefix (no metadata needed for similarity check)
|
||||
response = client.list_objects(
|
||||
Bucket=bucket,
|
||||
Prefix=prefix,
|
||||
MaxKeys=1000,
|
||||
FetchMetadata=False, # Don't need metadata for similarity
|
||||
)
|
||||
|
||||
similar: list[dict[str, Any]] = []
|
||||
base_name = Path(filename).stem
|
||||
ext = Path(filename).suffix
|
||||
|
||||
for obj in response["Contents"]:
|
||||
obj_key = obj["Key"]
|
||||
obj_base = Path(obj_key).stem
|
||||
obj_ext = Path(obj_key).suffix
|
||||
|
||||
# Skip delta files and references
|
||||
if obj_key.endswith(".delta") or obj_key.endswith("reference.bin"):
|
||||
continue
|
||||
|
||||
score = 0.0
|
||||
|
||||
# Extension match
|
||||
if ext == obj_ext:
|
||||
score += 0.5
|
||||
|
||||
# Base name similarity
|
||||
if base_name in obj_base or obj_base in base_name:
|
||||
score += 0.3
|
||||
|
||||
# Version pattern match
|
||||
if re.search(r"v?\d+[\.\d]*", base_name) and re.search(r"v?\d+[\.\d]*", obj_base):
|
||||
score += 0.2
|
||||
|
||||
if score > 0.5:
|
||||
similar.append(
|
||||
{
|
||||
"Key": obj_key,
|
||||
"Size": obj["Size"],
|
||||
"Similarity": score,
|
||||
"LastModified": obj["LastModified"],
|
||||
}
|
||||
)
|
||||
|
||||
# Sort by similarity
|
||||
similar.sort(key=lambda x: x["Similarity"], reverse=True) # type: ignore
|
||||
|
||||
return similar[:limit]
|
||||
@@ -21,7 +21,6 @@ from .errors import (
|
||||
IntegrityMismatchError,
|
||||
NotFoundError,
|
||||
PolicyViolationWarning,
|
||||
StorageIOError,
|
||||
)
|
||||
from .models import (
|
||||
DeltaMeta,
|
||||
@@ -171,10 +170,28 @@ class DeltaService:
|
||||
if obj_head is None:
|
||||
raise NotFoundError(f"Object not found: {object_key.key}")
|
||||
|
||||
# Check if this is a regular S3 object (not uploaded via DeltaGlider)
|
||||
# Regular S3 objects won't have DeltaGlider metadata
|
||||
if "file_sha256" not in obj_head.metadata:
|
||||
raise StorageIOError(f"Missing metadata on {object_key.key}")
|
||||
# This is a regular S3 object, download it directly
|
||||
self.logger.info(
|
||||
"Downloading regular S3 object (no DeltaGlider metadata)",
|
||||
key=object_key.key,
|
||||
)
|
||||
self._get_direct(object_key, obj_head, out)
|
||||
duration = (self.clock.now() - start_time).total_seconds()
|
||||
self.logger.log_operation(
|
||||
op="get",
|
||||
key=object_key.key,
|
||||
deltaspace=f"{object_key.bucket}",
|
||||
sizes={"file": obj_head.size},
|
||||
durations={"total": duration},
|
||||
cache_hit=False,
|
||||
)
|
||||
self.metrics.timing("deltaglider.get.duration", duration)
|
||||
return
|
||||
|
||||
# Check if this is a direct upload (non-delta)
|
||||
# Check if this is a direct upload (non-delta) uploaded via DeltaGlider
|
||||
if obj_head.metadata.get("compression") == "none":
|
||||
# Direct download without delta processing
|
||||
self._get_direct(object_key, obj_head, out)
|
||||
|
||||
152
src/deltaglider/response_builders.py
Normal file
152
src/deltaglider/response_builders.py
Normal file
@@ -0,0 +1,152 @@
|
||||
"""Type-safe response builders using TypedDicts for internal type safety.
|
||||
|
||||
This module provides builder functions that construct boto3-compatible responses
|
||||
with full compile-time type validation using TypedDicts. At runtime, TypedDicts
|
||||
are plain dicts, so there's no conversion overhead.
|
||||
|
||||
Benefits:
|
||||
- Field name typos caught by mypy (e.g., "HTTPStatusCode" → "HttpStatusCode")
|
||||
- Wrong types caught by mypy (e.g., string instead of int)
|
||||
- Missing required fields caught by mypy
|
||||
- Extra unknown fields caught by mypy
|
||||
"""
|
||||
|
||||
from typing import Any
|
||||
|
||||
from .types import (
|
||||
CommonPrefix,
|
||||
DeleteObjectResponse,
|
||||
GetObjectResponse,
|
||||
ListObjectsV2Response,
|
||||
PutObjectResponse,
|
||||
ResponseMetadata,
|
||||
S3Object,
|
||||
)
|
||||
|
||||
|
||||
def build_response_metadata(status_code: int = 200) -> ResponseMetadata:
|
||||
"""Build ResponseMetadata with full type safety via TypedDict.
|
||||
|
||||
TypedDict is a dict at runtime - no conversion needed!
|
||||
mypy validates all fields match ResponseMetadata TypedDict.
|
||||
Uses our types.py TypedDict which has proper NotRequired fields.
|
||||
"""
|
||||
# Build as TypedDict - mypy validates field names and types!
|
||||
metadata: ResponseMetadata = {
|
||||
"HTTPStatusCode": status_code,
|
||||
# All other fields are NotRequired - can be omitted!
|
||||
}
|
||||
return metadata # Returns dict at runtime, ResponseMetadata type at compile-time
|
||||
|
||||
|
||||
def build_put_response(
|
||||
etag: str,
|
||||
*,
|
||||
version_id: str | None = None,
|
||||
deltaglider_info: dict[str, Any] | None = None,
|
||||
) -> PutObjectResponse:
|
||||
"""Build PutObjectResponse with full type safety via TypedDict.
|
||||
|
||||
Uses our types.py TypedDict which has proper NotRequired fields.
|
||||
mypy validates all field names, types, and structure.
|
||||
"""
|
||||
# Build as TypedDict - mypy catches typos and type errors!
|
||||
response: PutObjectResponse = {
|
||||
"ETag": etag,
|
||||
"ResponseMetadata": build_response_metadata(),
|
||||
}
|
||||
|
||||
if version_id:
|
||||
response["VersionId"] = version_id
|
||||
|
||||
# DeltaGlider extension - add as Any field
|
||||
if deltaglider_info:
|
||||
response["DeltaGliderInfo"] = deltaglider_info # type: ignore[typeddict-item]
|
||||
|
||||
return response # Returns dict at runtime, PutObjectResponse type at compile-time
|
||||
|
||||
|
||||
def build_get_response(
|
||||
body: Any,
|
||||
content_length: int,
|
||||
etag: str,
|
||||
metadata: dict[str, Any],
|
||||
) -> GetObjectResponse:
|
||||
"""Build GetObjectResponse with full type safety via TypedDict.
|
||||
|
||||
Uses our types.py TypedDict which has proper NotRequired fields.
|
||||
mypy validates all field names, types, and structure.
|
||||
"""
|
||||
# Build as TypedDict - mypy catches typos and type errors!
|
||||
response: GetObjectResponse = {
|
||||
"Body": body,
|
||||
"ContentLength": content_length,
|
||||
"ETag": etag,
|
||||
"Metadata": metadata,
|
||||
"ResponseMetadata": build_response_metadata(),
|
||||
}
|
||||
return response # Returns dict at runtime, GetObjectResponse type at compile-time
|
||||
|
||||
|
||||
def build_list_objects_response(
|
||||
bucket: str,
|
||||
prefix: str,
|
||||
delimiter: str,
|
||||
max_keys: int,
|
||||
contents: list[S3Object],
|
||||
common_prefixes: list[CommonPrefix] | None,
|
||||
is_truncated: bool,
|
||||
next_continuation_token: str | None,
|
||||
continuation_token: str | None,
|
||||
) -> ListObjectsV2Response:
|
||||
"""Build ListObjectsV2Response with full type safety via TypedDict.
|
||||
|
||||
Uses our types.py TypedDict which has proper NotRequired fields.
|
||||
mypy validates all field names, types, and structure.
|
||||
"""
|
||||
# Build as TypedDict - mypy catches typos and type errors!
|
||||
response: ListObjectsV2Response = {
|
||||
"IsTruncated": is_truncated,
|
||||
"Contents": contents,
|
||||
"Name": bucket,
|
||||
"Prefix": prefix,
|
||||
"Delimiter": delimiter,
|
||||
"MaxKeys": max_keys,
|
||||
"KeyCount": len(contents),
|
||||
"ResponseMetadata": build_response_metadata(),
|
||||
}
|
||||
|
||||
# Add optional fields
|
||||
if common_prefixes:
|
||||
response["CommonPrefixes"] = common_prefixes
|
||||
|
||||
if next_continuation_token:
|
||||
response["NextContinuationToken"] = next_continuation_token
|
||||
|
||||
if continuation_token:
|
||||
response["ContinuationToken"] = continuation_token
|
||||
|
||||
return response # Returns dict at runtime, ListObjectsV2Response type at compile-time
|
||||
|
||||
|
||||
def build_delete_response(
|
||||
delete_marker: bool = False,
|
||||
status_code: int = 204,
|
||||
deltaglider_info: dict[str, Any] | None = None,
|
||||
) -> DeleteObjectResponse:
|
||||
"""Build DeleteObjectResponse with full type safety via TypedDict.
|
||||
|
||||
Uses our types.py TypedDict which has proper NotRequired fields.
|
||||
mypy validates all field names, types, and structure.
|
||||
"""
|
||||
# Build as TypedDict - mypy catches typos and type errors!
|
||||
response: DeleteObjectResponse = {
|
||||
"DeleteMarker": delete_marker,
|
||||
"ResponseMetadata": build_response_metadata(status_code),
|
||||
}
|
||||
|
||||
# DeltaGlider extension
|
||||
if deltaglider_info:
|
||||
response["DeltaGliderInfo"] = deltaglider_info # type: ignore[typeddict-item]
|
||||
|
||||
return response # Returns dict at runtime, DeleteObjectResponse type at compile-time
|
||||
355
src/deltaglider/types.py
Normal file
355
src/deltaglider/types.py
Normal file
@@ -0,0 +1,355 @@
|
||||
"""Type definitions for boto3-compatible responses.
|
||||
|
||||
These TypedDict definitions provide type hints for DeltaGlider's boto3-compatible
|
||||
responses. All methods return plain `dict[str, Any]` at runtime for maximum
|
||||
flexibility and boto3 compatibility.
|
||||
|
||||
## Basic Usage (Recommended)
|
||||
|
||||
Use DeltaGlider with simple dict access - no type imports needed:
|
||||
|
||||
```python
|
||||
from deltaglider import create_client
|
||||
|
||||
client = create_client()
|
||||
|
||||
# Returns plain dict - 100% boto3 compatible
|
||||
response = client.put_object(Bucket='my-bucket', Key='file.zip', Body=data)
|
||||
print(response['ETag'])
|
||||
|
||||
# List objects with dict access
|
||||
listing = client.list_objects(Bucket='my-bucket')
|
||||
for obj in listing['Contents']:
|
||||
print(f"{obj['Key']}: {obj['Size']} bytes")
|
||||
```
|
||||
|
||||
## Optional Type Hints
|
||||
|
||||
For IDE autocomplete and type checking, you can use our convenience TypedDicts:
|
||||
|
||||
```python
|
||||
from deltaglider import create_client
|
||||
from deltaglider.types import PutObjectResponse, ListObjectsV2Response
|
||||
|
||||
client = create_client()
|
||||
response: PutObjectResponse = client.put_object(...) # IDE autocomplete
|
||||
listing: ListObjectsV2Response = client.list_objects(...)
|
||||
```
|
||||
|
||||
## Advanced: boto3-stubs Integration
|
||||
|
||||
For strictest type checking (requires boto3-stubs installation):
|
||||
|
||||
```bash
|
||||
pip install boto3-stubs[s3]
|
||||
```
|
||||
|
||||
```python
|
||||
from mypy_boto3_s3.type_defs import PutObjectOutputTypeDef
|
||||
response: PutObjectOutputTypeDef = client.put_object(...)
|
||||
```
|
||||
|
||||
**Note**: boto3-stubs TypedDefs are very strict and require ALL optional fields.
|
||||
DeltaGlider returns partial dicts for better boto3 compatibility, so boto3-stubs
|
||||
types may show false positive errors. Use `dict[str, Any]` or our TypedDicts instead.
|
||||
|
||||
## Design Philosophy
|
||||
|
||||
DeltaGlider returns `dict[str, Any]` from all boto3-compatible methods because:
|
||||
1. **Flexibility**: boto3 responses vary by service and operation
|
||||
2. **Compatibility**: Exact match with boto3 runtime behavior
|
||||
3. **Simplicity**: No complex type dependencies for users
|
||||
4. **Optional Typing**: Users choose their preferred level of type safety
|
||||
"""
|
||||
|
||||
from datetime import datetime
|
||||
from typing import Any, Literal, NotRequired, TypedDict
|
||||
|
||||
# ============================================================================
|
||||
# S3 Object Types
|
||||
# ============================================================================
|
||||
|
||||
|
||||
class S3Object(TypedDict):
|
||||
"""An S3 object returned in list operations.
|
||||
|
||||
Compatible with boto3's S3.Client.list_objects_v2() response Contents.
|
||||
"""
|
||||
|
||||
Key: str
|
||||
Size: int
|
||||
LastModified: datetime
|
||||
ETag: NotRequired[str]
|
||||
StorageClass: NotRequired[str]
|
||||
Owner: NotRequired[dict[str, str]]
|
||||
Metadata: NotRequired[dict[str, str]]
|
||||
|
||||
|
||||
class CommonPrefix(TypedDict):
|
||||
"""A common prefix (directory) in S3 listing.
|
||||
|
||||
Compatible with boto3's S3.Client.list_objects_v2() response CommonPrefixes.
|
||||
"""
|
||||
|
||||
Prefix: str
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Response Metadata (used in all responses)
|
||||
# ============================================================================
|
||||
|
||||
|
||||
class ResponseMetadata(TypedDict):
|
||||
"""Metadata about the API response.
|
||||
|
||||
Compatible with all boto3 responses.
|
||||
"""
|
||||
|
||||
RequestId: NotRequired[str]
|
||||
HostId: NotRequired[str]
|
||||
HTTPStatusCode: int
|
||||
HTTPHeaders: NotRequired[dict[str, str]]
|
||||
RetryAttempts: NotRequired[int]
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# List Operations Response Types
|
||||
# ============================================================================
|
||||
|
||||
|
||||
class ListObjectsV2Response(TypedDict):
|
||||
"""Response from list_objects_v2 operation.
|
||||
|
||||
100% compatible with boto3's S3.Client.list_objects_v2() response.
|
||||
|
||||
Example:
|
||||
```python
|
||||
client = create_client()
|
||||
response: ListObjectsV2Response = client.list_objects(
|
||||
Bucket='my-bucket',
|
||||
Prefix='path/',
|
||||
Delimiter='/'
|
||||
)
|
||||
|
||||
for obj in response['Contents']:
|
||||
print(f"{obj['Key']}: {obj['Size']} bytes")
|
||||
|
||||
for prefix in response.get('CommonPrefixes', []):
|
||||
print(f"Directory: {prefix['Prefix']}")
|
||||
```
|
||||
"""
|
||||
|
||||
Contents: list[S3Object]
|
||||
Name: NotRequired[str] # Bucket name
|
||||
Prefix: NotRequired[str]
|
||||
Delimiter: NotRequired[str]
|
||||
MaxKeys: NotRequired[int]
|
||||
CommonPrefixes: NotRequired[list[CommonPrefix]]
|
||||
EncodingType: NotRequired[str]
|
||||
KeyCount: NotRequired[int]
|
||||
ContinuationToken: NotRequired[str]
|
||||
NextContinuationToken: NotRequired[str]
|
||||
StartAfter: NotRequired[str]
|
||||
IsTruncated: NotRequired[bool]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Put/Get/Delete Response Types
|
||||
# ============================================================================
|
||||
|
||||
|
||||
class PutObjectResponse(TypedDict):
|
||||
"""Response from put_object operation.
|
||||
|
||||
Compatible with boto3's S3.Client.put_object() response.
|
||||
"""
|
||||
|
||||
ETag: str
|
||||
VersionId: NotRequired[str]
|
||||
ServerSideEncryption: NotRequired[str]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
class GetObjectResponse(TypedDict):
|
||||
"""Response from get_object operation.
|
||||
|
||||
Compatible with boto3's S3.Client.get_object() response.
|
||||
"""
|
||||
|
||||
Body: Any # StreamingBody in boto3, bytes in DeltaGlider
|
||||
ContentLength: int
|
||||
ContentType: NotRequired[str]
|
||||
ETag: NotRequired[str]
|
||||
LastModified: NotRequired[datetime]
|
||||
Metadata: NotRequired[dict[str, str]]
|
||||
VersionId: NotRequired[str]
|
||||
StorageClass: NotRequired[str]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
class DeleteObjectResponse(TypedDict):
|
||||
"""Response from delete_object operation.
|
||||
|
||||
Compatible with boto3's S3.Client.delete_object() response.
|
||||
"""
|
||||
|
||||
DeleteMarker: NotRequired[bool]
|
||||
VersionId: NotRequired[str]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
class DeletedObject(TypedDict):
|
||||
"""A successfully deleted object.
|
||||
|
||||
Compatible with boto3's S3.Client.delete_objects() response Deleted.
|
||||
"""
|
||||
|
||||
Key: str
|
||||
VersionId: NotRequired[str]
|
||||
DeleteMarker: NotRequired[bool]
|
||||
DeleteMarkerVersionId: NotRequired[str]
|
||||
|
||||
|
||||
class DeleteError(TypedDict):
|
||||
"""An error that occurred during deletion.
|
||||
|
||||
Compatible with boto3's S3.Client.delete_objects() response Errors.
|
||||
"""
|
||||
|
||||
Key: str
|
||||
Code: str
|
||||
Message: str
|
||||
VersionId: NotRequired[str]
|
||||
|
||||
|
||||
class DeleteObjectsResponse(TypedDict):
|
||||
"""Response from delete_objects operation.
|
||||
|
||||
Compatible with boto3's S3.Client.delete_objects() response.
|
||||
"""
|
||||
|
||||
Deleted: NotRequired[list[DeletedObject]]
|
||||
Errors: NotRequired[list[DeleteError]]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Head Object Response
|
||||
# ============================================================================
|
||||
|
||||
|
||||
class HeadObjectResponse(TypedDict):
|
||||
"""Response from head_object operation.
|
||||
|
||||
Compatible with boto3's S3.Client.head_object() response.
|
||||
"""
|
||||
|
||||
ContentLength: int
|
||||
ContentType: NotRequired[str]
|
||||
ETag: NotRequired[str]
|
||||
LastModified: NotRequired[datetime]
|
||||
Metadata: NotRequired[dict[str, str]]
|
||||
VersionId: NotRequired[str]
|
||||
StorageClass: NotRequired[str]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Bucket Operations
|
||||
# ============================================================================
|
||||
|
||||
|
||||
class Bucket(TypedDict):
|
||||
"""An S3 bucket.
|
||||
|
||||
Compatible with boto3's S3.Client.list_buckets() response Buckets.
|
||||
"""
|
||||
|
||||
Name: str
|
||||
CreationDate: datetime
|
||||
|
||||
|
||||
class ListBucketsResponse(TypedDict):
|
||||
"""Response from list_buckets operation.
|
||||
|
||||
Compatible with boto3's S3.Client.list_buckets() response.
|
||||
"""
|
||||
|
||||
Buckets: list[Bucket]
|
||||
Owner: NotRequired[dict[str, str]]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
class CreateBucketResponse(TypedDict):
|
||||
"""Response from create_bucket operation.
|
||||
|
||||
Compatible with boto3's S3.Client.create_bucket() response.
|
||||
"""
|
||||
|
||||
Location: NotRequired[str]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Multipart Upload Types
|
||||
# ============================================================================
|
||||
|
||||
|
||||
class CompletedPart(TypedDict):
|
||||
"""A completed part in a multipart upload."""
|
||||
|
||||
PartNumber: int
|
||||
ETag: str
|
||||
|
||||
|
||||
class CompleteMultipartUploadResponse(TypedDict):
|
||||
"""Response from complete_multipart_upload operation."""
|
||||
|
||||
Location: NotRequired[str]
|
||||
Bucket: NotRequired[str]
|
||||
Key: NotRequired[str]
|
||||
ETag: NotRequired[str]
|
||||
VersionId: NotRequired[str]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Copy Operations
|
||||
# ============================================================================
|
||||
|
||||
|
||||
class CopyObjectResponse(TypedDict):
|
||||
"""Response from copy_object operation.
|
||||
|
||||
Compatible with boto3's S3.Client.copy_object() response.
|
||||
"""
|
||||
|
||||
CopyObjectResult: NotRequired[dict[str, Any]]
|
||||
ETag: NotRequired[str]
|
||||
LastModified: NotRequired[datetime]
|
||||
VersionId: NotRequired[str]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Type Aliases for Convenience
|
||||
# ============================================================================
|
||||
|
||||
# Common parameter types
|
||||
BucketName = str
|
||||
ObjectKey = str
|
||||
Prefix = str
|
||||
Delimiter = str
|
||||
|
||||
# Storage class options
|
||||
StorageClass = Literal[
|
||||
"STANDARD",
|
||||
"REDUCED_REDUNDANCY",
|
||||
"STANDARD_IA",
|
||||
"ONEZONE_IA",
|
||||
"INTELLIGENT_TIERING",
|
||||
"GLACIER",
|
||||
"DEEP_ARCHIVE",
|
||||
"GLACIER_IR",
|
||||
]
|
||||
@@ -10,7 +10,6 @@ from deltaglider import create_client
|
||||
from deltaglider.client import (
|
||||
BucketStats,
|
||||
CompressionEstimate,
|
||||
ListObjectsResponse,
|
||||
ObjectInfo,
|
||||
)
|
||||
|
||||
@@ -258,28 +257,56 @@ class TestBoto3Compatibility:
|
||||
content = response["Body"].read()
|
||||
assert content == b"Test Content"
|
||||
|
||||
def test_get_object_regular_s3_file(self, client):
|
||||
"""Test get_object with regular S3 files (not uploaded via DeltaGlider)."""
|
||||
|
||||
content = b"Regular S3 File Content"
|
||||
|
||||
# Add as a regular S3 object WITHOUT DeltaGlider metadata
|
||||
client.service.storage.objects["test-bucket/regular-file.pdf"] = {
|
||||
"data": content,
|
||||
"size": len(content),
|
||||
"metadata": {}, # No DeltaGlider metadata
|
||||
}
|
||||
|
||||
# Should successfully download the regular S3 object
|
||||
response = client.get_object(Bucket="test-bucket", Key="regular-file.pdf")
|
||||
|
||||
assert "Body" in response
|
||||
downloaded_content = response["Body"].read()
|
||||
assert downloaded_content == content
|
||||
assert response["ContentLength"] == len(content)
|
||||
|
||||
def test_list_objects(self, client):
|
||||
"""Test list_objects with various options."""
|
||||
"""Test list_objects with various options (boto3-compatible dict response)."""
|
||||
# List all objects (default: FetchMetadata=False)
|
||||
response = client.list_objects(Bucket="test-bucket")
|
||||
|
||||
assert isinstance(response, ListObjectsResponse)
|
||||
assert response.key_count > 0
|
||||
assert len(response.contents) > 0
|
||||
# Response is now a boto3-compatible dict (not ListObjectsResponse)
|
||||
assert isinstance(response, dict)
|
||||
assert response["KeyCount"] > 0
|
||||
assert len(response["Contents"]) > 0
|
||||
|
||||
# Verify S3Object structure
|
||||
for obj in response["Contents"]:
|
||||
assert "Key" in obj
|
||||
assert "Size" in obj
|
||||
assert "LastModified" in obj
|
||||
assert "Metadata" in obj # DeltaGlider metadata
|
||||
|
||||
# Test with FetchMetadata=True (should only affect delta files)
|
||||
response_with_metadata = client.list_objects(Bucket="test-bucket", FetchMetadata=True)
|
||||
assert isinstance(response_with_metadata, ListObjectsResponse)
|
||||
assert response_with_metadata.key_count > 0
|
||||
assert isinstance(response_with_metadata, dict)
|
||||
assert response_with_metadata["KeyCount"] > 0
|
||||
|
||||
def test_list_objects_with_delimiter(self, client):
|
||||
"""Test list_objects with delimiter for folder simulation."""
|
||||
"""Test list_objects with delimiter for folder simulation (boto3-compatible dict response)."""
|
||||
response = client.list_objects(Bucket="test-bucket", Prefix="", Delimiter="/")
|
||||
|
||||
# Should have common prefixes for folders
|
||||
assert len(response.common_prefixes) > 0
|
||||
assert {"Prefix": "folder1/"} in response.common_prefixes
|
||||
assert {"Prefix": "folder2/"} in response.common_prefixes
|
||||
assert len(response.get("CommonPrefixes", [])) > 0
|
||||
assert {"Prefix": "folder1/"} in response["CommonPrefixes"]
|
||||
assert {"Prefix": "folder2/"} in response["CommonPrefixes"]
|
||||
|
||||
def test_delete_object(self, client):
|
||||
"""Test delete_object."""
|
||||
@@ -291,6 +318,24 @@ class TestBoto3Compatibility:
|
||||
assert response["ResponseMetadata"]["HTTPStatusCode"] == 204
|
||||
assert "test-bucket/to-delete.txt" not in client.service.storage.objects
|
||||
|
||||
def test_delete_object_with_delta_suffix_fallback(self, client):
|
||||
"""Test delete_object with automatic .delta suffix fallback."""
|
||||
# Add object with .delta suffix (as DeltaGlider stores it)
|
||||
client.service.storage.objects["test-bucket/file.zip.delta"] = {
|
||||
"size": 100,
|
||||
"metadata": {
|
||||
"original_name": "file.zip",
|
||||
"compression": "delta",
|
||||
},
|
||||
}
|
||||
|
||||
# Delete using original name (without .delta)
|
||||
response = client.delete_object(Bucket="test-bucket", Key="file.zip")
|
||||
|
||||
assert response["ResponseMetadata"]["HTTPStatusCode"] == 204
|
||||
assert response["DeltaGliderInfo"]["Deleted"] is True
|
||||
assert "test-bucket/file.zip.delta" not in client.service.storage.objects
|
||||
|
||||
def test_delete_objects(self, client):
|
||||
"""Test batch delete."""
|
||||
# Add objects
|
||||
|
||||
524
tests/integration/test_delete_objects_recursive.py
Normal file
524
tests/integration/test_delete_objects_recursive.py
Normal file
@@ -0,0 +1,524 @@
|
||||
"""Comprehensive tests for DeltaGliderClient.delete_objects_recursive() method."""
|
||||
|
||||
from datetime import UTC, datetime
|
||||
from unittest.mock import Mock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
from deltaglider import create_client
|
||||
|
||||
|
||||
class MockStorage:
|
||||
"""Mock storage for testing."""
|
||||
|
||||
def __init__(self):
|
||||
self.objects = {}
|
||||
self.delete_calls = []
|
||||
|
||||
def head(self, key):
|
||||
"""Mock head operation."""
|
||||
from deltaglider.ports.storage import ObjectHead
|
||||
|
||||
if key in self.objects:
|
||||
obj = self.objects[key]
|
||||
return ObjectHead(
|
||||
key=key,
|
||||
size=obj["size"],
|
||||
etag=obj.get("etag", "mock-etag"),
|
||||
last_modified=obj.get("last_modified", datetime.now(UTC)),
|
||||
metadata=obj.get("metadata", {}),
|
||||
)
|
||||
return None
|
||||
|
||||
def list(self, prefix):
|
||||
"""Mock list operation for StoragePort interface."""
|
||||
for key, _obj in self.objects.items():
|
||||
if key.startswith(prefix):
|
||||
obj_head = self.head(key)
|
||||
if obj_head is not None:
|
||||
yield obj_head
|
||||
|
||||
def delete(self, key):
|
||||
"""Mock delete operation."""
|
||||
self.delete_calls.append(key)
|
||||
if key in self.objects:
|
||||
del self.objects[key]
|
||||
return True
|
||||
return False
|
||||
|
||||
def get(self, key):
|
||||
"""Mock get operation."""
|
||||
if key in self.objects:
|
||||
return self.objects[key].get("content", b"mock-content")
|
||||
return None
|
||||
|
||||
def put(self, key, data, metadata=None):
|
||||
"""Mock put operation."""
|
||||
self.objects[key] = {
|
||||
"size": len(data),
|
||||
"content": data,
|
||||
"metadata": metadata or {},
|
||||
}
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def mock_storage():
|
||||
"""Create mock storage."""
|
||||
return MockStorage()
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def client(tmp_path):
|
||||
"""Create DeltaGliderClient with mock storage."""
|
||||
# Use create_client to get a properly configured client
|
||||
client = create_client(cache_dir=str(tmp_path / "cache"))
|
||||
|
||||
# Replace storage with mock
|
||||
mock_storage = MockStorage()
|
||||
client.service.storage = mock_storage
|
||||
|
||||
return client
|
||||
|
||||
|
||||
class TestDeleteObjectsRecursiveBasicFunctionality:
|
||||
"""Test basic functionality of delete_objects_recursive."""
|
||||
|
||||
def test_delete_single_object_with_file_prefix(self, client):
|
||||
"""Test deleting a single object when prefix is a file (no trailing slash)."""
|
||||
# Setup: Add a regular file
|
||||
client.service.storage.objects["test-bucket/file.txt"] = {"size": 100}
|
||||
|
||||
# Execute
|
||||
response = client.delete_objects_recursive(Bucket="test-bucket", Prefix="file.txt")
|
||||
|
||||
# Verify response structure
|
||||
assert response["ResponseMetadata"]["HTTPStatusCode"] == 200
|
||||
assert "DeletedCount" in response
|
||||
assert "FailedCount" in response
|
||||
assert "DeltaGliderInfo" in response
|
||||
|
||||
# Verify DeltaGliderInfo structure
|
||||
info = response["DeltaGliderInfo"]
|
||||
assert "DeltasDeleted" in info
|
||||
assert "ReferencesDeleted" in info
|
||||
assert "DirectDeleted" in info
|
||||
assert "OtherDeleted" in info
|
||||
|
||||
def test_delete_directory_with_trailing_slash(self, client):
|
||||
"""Test deleting all objects under a prefix with trailing slash."""
|
||||
# Setup: Add multiple files under a prefix
|
||||
client.service.storage.objects["test-bucket/dir/file1.txt"] = {"size": 100}
|
||||
client.service.storage.objects["test-bucket/dir/file2.txt"] = {"size": 200}
|
||||
client.service.storage.objects["test-bucket/dir/sub/file3.txt"] = {"size": 300}
|
||||
|
||||
# Execute
|
||||
response = client.delete_objects_recursive(Bucket="test-bucket", Prefix="dir/")
|
||||
|
||||
# Verify
|
||||
assert response["ResponseMetadata"]["HTTPStatusCode"] == 200
|
||||
assert response["DeletedCount"] >= 0
|
||||
assert response["FailedCount"] == 0
|
||||
|
||||
def test_delete_empty_prefix_returns_zero_counts(self, client):
|
||||
"""Test deleting with empty prefix returns zero counts."""
|
||||
# Execute
|
||||
response = client.delete_objects_recursive(Bucket="test-bucket", Prefix="")
|
||||
|
||||
# Verify
|
||||
assert response["ResponseMetadata"]["HTTPStatusCode"] == 200
|
||||
assert response["DeletedCount"] >= 0
|
||||
assert response["FailedCount"] == 0
|
||||
|
||||
|
||||
class TestDeleteObjectsRecursiveDeltaSuffixHandling:
|
||||
"""Test delta suffix fallback logic."""
|
||||
|
||||
def test_delete_file_with_delta_suffix_fallback(self, client):
|
||||
"""Test that delete falls back to .delta suffix if original not found."""
|
||||
# Setup: Add file with .delta suffix
|
||||
client.service.storage.objects["test-bucket/archive.zip.delta"] = {
|
||||
"size": 500,
|
||||
"metadata": {"original_name": "archive.zip"},
|
||||
}
|
||||
|
||||
# Execute: Delete using original name (without .delta)
|
||||
response = client.delete_objects_recursive(Bucket="test-bucket", Prefix="archive.zip")
|
||||
|
||||
# Verify
|
||||
assert response["ResponseMetadata"]["HTTPStatusCode"] == 200
|
||||
assert "test-bucket/archive.zip.delta" not in client.service.storage.objects
|
||||
|
||||
def test_delete_file_already_with_delta_suffix(self, client):
|
||||
"""Test deleting a file that already has .delta suffix."""
|
||||
# Setup
|
||||
client.service.storage.objects["test-bucket/file.zip.delta"] = {"size": 300}
|
||||
|
||||
# Execute: Delete using .delta suffix directly
|
||||
response = client.delete_objects_recursive(Bucket="test-bucket", Prefix="file.zip.delta")
|
||||
|
||||
# Verify
|
||||
assert response["ResponseMetadata"]["HTTPStatusCode"] == 200
|
||||
|
||||
def test_delta_suffix_not_added_for_directory_prefix(self, client):
|
||||
"""Test that .delta suffix is not added when prefix ends with /."""
|
||||
# Setup
|
||||
client.service.storage.objects["test-bucket/dir/file.txt"] = {"size": 100}
|
||||
|
||||
# Execute
|
||||
response = client.delete_objects_recursive(Bucket="test-bucket", Prefix="dir/")
|
||||
|
||||
# Verify - should not attempt to delete "dir/.delta"
|
||||
assert response["ResponseMetadata"]["HTTPStatusCode"] == 200
|
||||
|
||||
|
||||
class TestDeleteObjectsRecursiveStatisticsAggregation:
|
||||
"""Test statistics aggregation from core service."""
|
||||
|
||||
def test_aggregates_deleted_count_from_service_and_single_deletes(self, client):
|
||||
"""Test that deleted counts are aggregated correctly."""
|
||||
# Setup: Mock service.delete_recursive to return specific counts
|
||||
mock_result = {
|
||||
"deleted_count": 5,
|
||||
"failed_count": 0,
|
||||
"deltas_deleted": 2,
|
||||
"references_deleted": 1,
|
||||
"direct_deleted": 2,
|
||||
"other_deleted": 0,
|
||||
}
|
||||
client.service.delete_recursive = Mock(return_value=mock_result)
|
||||
|
||||
# Execute
|
||||
response = client.delete_objects_recursive(Bucket="test-bucket", Prefix="test/")
|
||||
|
||||
# Verify aggregation
|
||||
assert response["DeletedCount"] == 5
|
||||
assert response["FailedCount"] == 0
|
||||
assert response["DeltaGliderInfo"]["DeltasDeleted"] == 2
|
||||
assert response["DeltaGliderInfo"]["ReferencesDeleted"] == 1
|
||||
assert response["DeltaGliderInfo"]["DirectDeleted"] == 2
|
||||
assert response["DeltaGliderInfo"]["OtherDeleted"] == 0
|
||||
|
||||
def test_aggregates_single_delete_counts_with_service_counts(self, client):
|
||||
"""Test that single file deletes are aggregated with service counts."""
|
||||
# Setup: Add file to trigger single delete path
|
||||
client.service.storage.objects["test-bucket/file.txt"] = {"size": 100}
|
||||
|
||||
# Mock service.delete_recursive to return additional counts
|
||||
mock_result = {
|
||||
"deleted_count": 3,
|
||||
"failed_count": 0,
|
||||
"deltas_deleted": 1,
|
||||
"references_deleted": 0,
|
||||
"direct_deleted": 2,
|
||||
"other_deleted": 0,
|
||||
}
|
||||
client.service.delete_recursive = Mock(return_value=mock_result)
|
||||
|
||||
# Execute
|
||||
response = client.delete_objects_recursive(Bucket="test-bucket", Prefix="file.txt")
|
||||
|
||||
# Verify that counts include both single delete and service delete
|
||||
assert response["DeletedCount"] >= 3 # At least service count
|
||||
assert response["DeltaGliderInfo"]["DeltasDeleted"] >= 1
|
||||
|
||||
|
||||
class TestDeleteObjectsRecursiveErrorHandling:
|
||||
"""Test error handling and error aggregation."""
|
||||
|
||||
def test_single_delete_error_captured_in_errors_list(self, client):
|
||||
"""Test that errors from single deletes are captured."""
|
||||
# Setup: Add file
|
||||
client.service.storage.objects["test-bucket/file.txt"] = {"size": 100}
|
||||
|
||||
# Mock delete_with_delta_suffix to raise exception
|
||||
with patch("deltaglider.client.delete_with_delta_suffix") as mock_delete:
|
||||
mock_delete.side_effect = RuntimeError("Simulated delete error")
|
||||
|
||||
# Execute
|
||||
response = client.delete_objects_recursive(Bucket="test-bucket", Prefix="file.txt")
|
||||
|
||||
# Verify error captured
|
||||
assert response["FailedCount"] > 0
|
||||
assert "Errors" in response
|
||||
assert any("Simulated delete error" in err for err in response["Errors"])
|
||||
|
||||
def test_service_errors_propagated_in_response(self, client):
|
||||
"""Test that errors from service.delete_recursive are propagated."""
|
||||
# Mock service to return errors
|
||||
mock_result = {
|
||||
"deleted_count": 2,
|
||||
"failed_count": 1,
|
||||
"deltas_deleted": 2,
|
||||
"references_deleted": 0,
|
||||
"direct_deleted": 0,
|
||||
"other_deleted": 0,
|
||||
"errors": ["Error deleting object1", "Error deleting object2"],
|
||||
}
|
||||
client.service.delete_recursive = Mock(return_value=mock_result)
|
||||
|
||||
# Execute
|
||||
response = client.delete_objects_recursive(Bucket="test-bucket", Prefix="test/")
|
||||
|
||||
# Verify
|
||||
assert response["FailedCount"] == 1
|
||||
assert "Errors" in response
|
||||
assert "Error deleting object1" in response["Errors"]
|
||||
assert "Error deleting object2" in response["Errors"]
|
||||
|
||||
def test_combines_single_and_service_errors(self, client):
|
||||
"""Test that errors from both single deletes and service are combined."""
|
||||
# Setup
|
||||
client.service.storage.objects["test-bucket/file.txt"] = {"size": 100}
|
||||
|
||||
# Mock service to also return errors
|
||||
mock_result = {
|
||||
"deleted_count": 1,
|
||||
"failed_count": 1,
|
||||
"deltas_deleted": 0,
|
||||
"references_deleted": 0,
|
||||
"direct_deleted": 0,
|
||||
"other_deleted": 0,
|
||||
"errors": ["Service delete error"],
|
||||
}
|
||||
client.service.delete_recursive = Mock(return_value=mock_result)
|
||||
|
||||
# Mock delete_with_delta_suffix to raise exception
|
||||
with patch("deltaglider.client.delete_with_delta_suffix") as mock_delete:
|
||||
mock_delete.side_effect = RuntimeError("Single delete error")
|
||||
|
||||
# Execute
|
||||
response = client.delete_objects_recursive(Bucket="test-bucket", Prefix="file.txt")
|
||||
|
||||
# Verify both errors present
|
||||
assert "Errors" in response
|
||||
errors_str = " ".join(response["Errors"])
|
||||
assert "Single delete error" in errors_str
|
||||
assert "Service delete error" in errors_str
|
||||
|
||||
|
||||
class TestDeleteObjectsRecursiveWarningsHandling:
|
||||
"""Test warning aggregation."""
|
||||
|
||||
def test_service_warnings_propagated_in_response(self, client):
|
||||
"""Test that warnings from service.delete_recursive are propagated."""
|
||||
# Mock service to return warnings
|
||||
mock_result = {
|
||||
"deleted_count": 3,
|
||||
"failed_count": 0,
|
||||
"deltas_deleted": 2,
|
||||
"references_deleted": 1,
|
||||
"direct_deleted": 0,
|
||||
"other_deleted": 0,
|
||||
"warnings": ["Reference deleted, 2 dependent deltas invalidated"],
|
||||
}
|
||||
client.service.delete_recursive = Mock(return_value=mock_result)
|
||||
|
||||
# Execute
|
||||
response = client.delete_objects_recursive(Bucket="test-bucket", Prefix="test/")
|
||||
|
||||
# Verify
|
||||
assert "Warnings" in response
|
||||
assert "Reference deleted, 2 dependent deltas invalidated" in response["Warnings"]
|
||||
|
||||
def test_single_delete_warnings_propagated(self, client):
|
||||
"""Test that warnings from single deletes are captured."""
|
||||
# Setup
|
||||
client.service.storage.objects["test-bucket/ref.bin"] = {"size": 100}
|
||||
|
||||
# Mock service
|
||||
mock_result = {
|
||||
"deleted_count": 0,
|
||||
"failed_count": 0,
|
||||
"deltas_deleted": 0,
|
||||
"references_deleted": 0,
|
||||
"direct_deleted": 0,
|
||||
"other_deleted": 0,
|
||||
}
|
||||
client.service.delete_recursive = Mock(return_value=mock_result)
|
||||
|
||||
# Mock delete_with_delta_suffix to return warnings
|
||||
with patch("deltaglider.client.delete_with_delta_suffix") as mock_delete:
|
||||
mock_delete.return_value = (
|
||||
"ref.bin",
|
||||
{
|
||||
"deleted": True,
|
||||
"type": "reference",
|
||||
"warnings": ["Warning from single delete"],
|
||||
},
|
||||
)
|
||||
|
||||
# Execute
|
||||
response = client.delete_objects_recursive(Bucket="test-bucket", Prefix="ref.bin")
|
||||
|
||||
# Verify
|
||||
assert "Warnings" in response
|
||||
assert "Warning from single delete" in response["Warnings"]
|
||||
|
||||
|
||||
class TestDeleteObjectsRecursiveSingleDeleteDetails:
|
||||
"""Test SingleDeletes detail tracking."""
|
||||
|
||||
def test_single_delete_details_included_for_file_prefix(self, client):
|
||||
"""Test that SingleDeletes details are included when deleting file prefix."""
|
||||
# Setup
|
||||
client.service.storage.objects["test-bucket/file.txt"] = {"size": 100}
|
||||
|
||||
# Mock service
|
||||
mock_result = {
|
||||
"deleted_count": 0,
|
||||
"failed_count": 0,
|
||||
"deltas_deleted": 0,
|
||||
"references_deleted": 0,
|
||||
"direct_deleted": 0,
|
||||
"other_deleted": 0,
|
||||
}
|
||||
client.service.delete_recursive = Mock(return_value=mock_result)
|
||||
|
||||
# Mock delete_with_delta_suffix
|
||||
with patch("deltaglider.client.delete_with_delta_suffix") as mock_delete:
|
||||
mock_delete.return_value = (
|
||||
"file.txt",
|
||||
{
|
||||
"deleted": True,
|
||||
"type": "direct",
|
||||
"dependent_deltas": 0,
|
||||
"warnings": [],
|
||||
},
|
||||
)
|
||||
|
||||
# Execute
|
||||
response = client.delete_objects_recursive(Bucket="test-bucket", Prefix="file.txt")
|
||||
|
||||
# Verify
|
||||
assert "SingleDeletes" in response["DeltaGliderInfo"]
|
||||
single_deletes = response["DeltaGliderInfo"]["SingleDeletes"]
|
||||
assert len(single_deletes) > 0
|
||||
assert single_deletes[0]["Key"] == "file.txt"
|
||||
assert single_deletes[0]["Type"] == "direct"
|
||||
assert "DependentDeltas" in single_deletes[0]
|
||||
assert "Warnings" in single_deletes[0]
|
||||
|
||||
def test_single_delete_includes_stored_key_when_different(self, client):
|
||||
"""Test that StoredKey is included when actual key differs from requested."""
|
||||
# Setup
|
||||
client.service.storage.objects["test-bucket/file.zip.delta"] = {"size": 200}
|
||||
|
||||
# Mock delete_with_delta_suffix to return different key
|
||||
from deltaglider import client_delete_helpers
|
||||
|
||||
original_delete = client_delete_helpers.delete_with_delta_suffix
|
||||
|
||||
def mock_delete(service, bucket, key):
|
||||
actual_key = "file.zip.delta" if key == "file.zip" else key
|
||||
return (
|
||||
actual_key,
|
||||
{
|
||||
"deleted": True,
|
||||
"type": "delta",
|
||||
"dependent_deltas": 0,
|
||||
"warnings": [],
|
||||
},
|
||||
)
|
||||
|
||||
client_delete_helpers.delete_with_delta_suffix = mock_delete
|
||||
|
||||
# Mock service
|
||||
mock_result = {
|
||||
"deleted_count": 0,
|
||||
"failed_count": 0,
|
||||
"deltas_deleted": 0,
|
||||
"references_deleted": 0,
|
||||
"direct_deleted": 0,
|
||||
"other_deleted": 0,
|
||||
}
|
||||
client.service.delete_recursive = Mock(return_value=mock_result)
|
||||
|
||||
try:
|
||||
# Execute
|
||||
response = client.delete_objects_recursive(Bucket="test-bucket", Prefix="file.zip")
|
||||
|
||||
# Verify
|
||||
assert "SingleDeletes" in response["DeltaGliderInfo"]
|
||||
single_deletes = response["DeltaGliderInfo"]["SingleDeletes"]
|
||||
if len(single_deletes) > 0:
|
||||
# If actual key differs, StoredKey should be present
|
||||
detail = single_deletes[0]
|
||||
if detail["Key"] != "file.zip.delta":
|
||||
assert "StoredKey" in detail
|
||||
finally:
|
||||
client_delete_helpers.delete_with_delta_suffix = original_delete
|
||||
|
||||
|
||||
class TestDeleteObjectsRecursiveEdgeCases:
|
||||
"""Test edge cases and boundary conditions."""
|
||||
|
||||
def test_nonexistent_prefix_returns_zero_counts(self, client):
|
||||
"""Test deleting nonexistent prefix returns zero counts."""
|
||||
# Execute
|
||||
response = client.delete_objects_recursive(Bucket="test-bucket", Prefix="nonexistent/path/")
|
||||
|
||||
# Verify
|
||||
assert response["ResponseMetadata"]["HTTPStatusCode"] == 200
|
||||
assert response["DeletedCount"] >= 0
|
||||
assert response["FailedCount"] == 0
|
||||
|
||||
def test_duplicate_candidates_handled_correctly(self, client):
|
||||
"""Test that duplicate delete candidates are handled correctly."""
|
||||
# Setup: This tests the seen_candidates logic
|
||||
client.service.storage.objects["test-bucket/file.delta"] = {"size": 100}
|
||||
|
||||
# Execute: Should not attempt to delete "file.delta" twice
|
||||
response = client.delete_objects_recursive(Bucket="test-bucket", Prefix="file.delta")
|
||||
|
||||
# Verify no errors
|
||||
assert response["ResponseMetadata"]["HTTPStatusCode"] == 200
|
||||
|
||||
def test_unknown_result_type_categorized_as_other(self, client):
|
||||
"""Test that unknown result types are categorized as 'other'."""
|
||||
# Setup
|
||||
client.service.storage.objects["test-bucket/file.txt"] = {"size": 100}
|
||||
|
||||
# Mock service
|
||||
mock_result = {
|
||||
"deleted_count": 0,
|
||||
"failed_count": 0,
|
||||
"deltas_deleted": 0,
|
||||
"references_deleted": 0,
|
||||
"direct_deleted": 0,
|
||||
"other_deleted": 0,
|
||||
}
|
||||
client.service.delete_recursive = Mock(return_value=mock_result)
|
||||
|
||||
# Mock delete_with_delta_suffix to return unknown type
|
||||
with patch("deltaglider.client.delete_with_delta_suffix") as mock_delete:
|
||||
mock_delete.return_value = (
|
||||
"file.txt",
|
||||
{
|
||||
"deleted": True,
|
||||
"type": "unknown_type", # Not in single_counts keys
|
||||
"dependent_deltas": 0,
|
||||
"warnings": [],
|
||||
},
|
||||
)
|
||||
|
||||
# Execute
|
||||
response = client.delete_objects_recursive(Bucket="test-bucket", Prefix="file.txt")
|
||||
|
||||
# Verify it's categorized as "other"
|
||||
assert response["DeltaGliderInfo"]["OtherDeleted"] >= 1
|
||||
# Also verify the detail shows the unknown type
|
||||
if "SingleDeletes" in response["DeltaGliderInfo"]:
|
||||
assert response["DeltaGliderInfo"]["SingleDeletes"][0]["Type"] == "unknown_type"
|
||||
|
||||
def test_kwargs_parameter_accepted(self, client):
|
||||
"""Test that additional kwargs are accepted without error."""
|
||||
# Execute with extra parameters
|
||||
response = client.delete_objects_recursive(
|
||||
Bucket="test-bucket",
|
||||
Prefix="test/",
|
||||
ExtraParam="value", # Should be ignored
|
||||
AnotherParam=123,
|
||||
)
|
||||
|
||||
# Verify no errors
|
||||
assert response["ResponseMetadata"]["HTTPStatusCode"] == 200
|
||||
@@ -53,8 +53,11 @@ class TestSDKFiltering:
|
||||
client = DeltaGliderClient(service)
|
||||
response = client.list_objects(Bucket="test-bucket", Prefix="releases/")
|
||||
|
||||
# Response is now a boto3-compatible dict
|
||||
contents = response["Contents"]
|
||||
|
||||
# Verify .delta suffix is stripped
|
||||
keys = [obj.key for obj in response.contents]
|
||||
keys = [obj["Key"] for obj in contents]
|
||||
assert "releases/app-v1.zip" in keys
|
||||
assert "releases/app-v2.zip" in keys
|
||||
assert "releases/README.md" in keys
|
||||
@@ -63,8 +66,10 @@ class TestSDKFiltering:
|
||||
for key in keys:
|
||||
assert not key.endswith(".delta"), f"Found .delta suffix in: {key}"
|
||||
|
||||
# Verify is_delta flag is set correctly
|
||||
delta_objects = [obj for obj in response.contents if obj.is_delta]
|
||||
# Verify is_delta flag is set correctly in Metadata
|
||||
delta_objects = [
|
||||
obj for obj in contents if obj.get("Metadata", {}).get("deltaglider-is-delta") == "true"
|
||||
]
|
||||
assert len(delta_objects) == 2
|
||||
|
||||
def test_list_objects_filters_reference_bin(self):
|
||||
@@ -106,15 +111,18 @@ class TestSDKFiltering:
|
||||
client = DeltaGliderClient(service)
|
||||
response = client.list_objects(Bucket="test-bucket", Prefix="releases/")
|
||||
|
||||
# Response is now a boto3-compatible dict
|
||||
contents = response["Contents"]
|
||||
|
||||
# Verify NO reference.bin files in output
|
||||
keys = [obj.key for obj in response.contents]
|
||||
keys = [obj["Key"] for obj in contents]
|
||||
for key in keys:
|
||||
assert not key.endswith("reference.bin"), f"Found reference.bin in: {key}"
|
||||
|
||||
# Should only have the app.zip (with .delta stripped)
|
||||
assert len(response.contents) == 1
|
||||
assert response.contents[0].key == "releases/app.zip"
|
||||
assert response.contents[0].is_delta is True
|
||||
assert len(contents) == 1
|
||||
assert contents[0]["Key"] == "releases/app.zip"
|
||||
assert contents[0].get("Metadata", {}).get("deltaglider-is-delta") == "true"
|
||||
|
||||
def test_list_objects_combined_filtering(self):
|
||||
"""Test filtering of both .delta and reference.bin together."""
|
||||
@@ -170,12 +178,15 @@ class TestSDKFiltering:
|
||||
client = DeltaGliderClient(service)
|
||||
response = client.list_objects(Bucket="test-bucket", Prefix="data/")
|
||||
|
||||
# Response is now a boto3-compatible dict
|
||||
contents = response["Contents"]
|
||||
|
||||
# Should filter out 2 reference.bin files
|
||||
# Should strip .delta from 3 files
|
||||
# Should keep 1 regular file as-is
|
||||
assert len(response.contents) == 4 # 3 deltas + 1 regular file
|
||||
assert len(contents) == 4 # 3 deltas + 1 regular file
|
||||
|
||||
keys = [obj.key for obj in response.contents]
|
||||
keys = [obj["Key"] for obj in contents]
|
||||
expected_keys = ["data/file1.zip", "data/file2.zip", "data/file3.txt", "data/sub/app.jar"]
|
||||
assert sorted(keys) == sorted(expected_keys)
|
||||
|
||||
|
||||
@@ -147,22 +147,36 @@ class TestDeltaServiceGet:
|
||||
service.get(delta_key, temp_dir / "output.zip")
|
||||
|
||||
def test_get_missing_metadata(self, service, mock_storage, temp_dir):
|
||||
"""Test get with missing metadata."""
|
||||
"""Test get with missing metadata (regular S3 object)."""
|
||||
# Setup
|
||||
delta_key = ObjectKey(bucket="test-bucket", key="test/file.zip.delta")
|
||||
|
||||
# Create test content
|
||||
test_content = b"regular S3 file content"
|
||||
|
||||
# Mock a regular S3 object without DeltaGlider metadata
|
||||
mock_storage.head.return_value = ObjectHead(
|
||||
key="test/file.zip.delta",
|
||||
size=100,
|
||||
size=len(test_content),
|
||||
etag="abc",
|
||||
last_modified=None,
|
||||
metadata={}, # Missing required metadata
|
||||
metadata={}, # Missing DeltaGlider metadata - this is a regular S3 object
|
||||
)
|
||||
|
||||
# Execute and verify
|
||||
from deltaglider.core.errors import StorageIOError
|
||||
# Mock the storage.get to return the content
|
||||
from unittest.mock import MagicMock
|
||||
|
||||
with pytest.raises(StorageIOError):
|
||||
service.get(delta_key, temp_dir / "output.zip")
|
||||
mock_stream = MagicMock()
|
||||
mock_stream.read.side_effect = [test_content, b""] # Return content then EOF
|
||||
mock_storage.get.return_value = mock_stream
|
||||
|
||||
# Execute - should successfully download regular S3 object
|
||||
output_path = temp_dir / "output.zip"
|
||||
service.get(delta_key, output_path)
|
||||
|
||||
# Verify - file should be downloaded
|
||||
assert output_path.exists()
|
||||
assert output_path.read_bytes() == test_content
|
||||
|
||||
|
||||
class TestDeltaServiceVerify:
|
||||
|
||||
Reference in New Issue
Block a user