Mirror of https://github.com/beshu-tech/deltaglider.git, synced 2026-04-30 20:24:35 +02:00
Compare commits
10 Commits
| SHA1 |
|---|
| 5e3b76791e |
| fb2877bfd3 |
| 88fd1f51cd |
| 0857e02edd |
| 689cf00d02 |
| 743d52e783 |
| 8bc0a0eaf3 |
| 4cf25e4681 |
| 69ed9056d2 |
| 38134f28f5 |
43 CHANGELOG.md
@@ -5,6 +5,47 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [5.0.1] - 2025-01-10

### Changed
- **Code Organization**: Refactored client.py from 1560 to 1154 lines (26% reduction)
  - Extracted client operations into a modular `client_operations/` package (the delegation pattern is sketched below):
    - `bucket.py` - S3 bucket management operations
    - `presigned.py` - Presigned URL generation
    - `batch.py` - Batch upload/download operations
    - `stats.py` - Analytics and statistics operations
  - Improved code maintainability with logical separation of concerns
  - Better developer experience with cleaner module structure
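A minimal sketch of the delegation pattern this refactor introduces (simplified, hypothetical signatures; the real implementations live in `client_operations/`):

```python
from typing import Any

# Module-level implementation (lives in client_operations/bucket.py in the
# real package; body simplified here for illustration).
def _create_bucket(client: Any, bucket: str, **kwargs: Any) -> dict[str, Any]:
    return {"Location": f"/{bucket}", "ResponseMetadata": {"HTTPStatusCode": 200}}

class DeltaGliderClient:
    def create_bucket(self, Bucket: str, **kwargs: Any) -> dict[str, Any]:
        # The public boto3-style method stays on the client and just delegates.
        return _create_bucket(self, Bucket, **kwargs)
```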

### Internal
- Full type safety maintained with mypy (0 errors)
- All 99 tests passing
- Code quality checks passing (ruff)
- No breaking changes - all public APIs remain unchanged

## [5.0.0] - 2025-01-10

### Added
- boto3-compatible TypedDict types for S3 responses (no boto3 import needed)
- Complete boto3 compatibility vision document
- Type-safe response builders using TypedDict patterns

### Changed
- **BREAKING**: `list_objects()` now returns boto3-compatible dict instead of custom dataclass (see the sketch below)
  - Use `response['Contents']` instead of `response.contents`
  - Use `response.get('IsTruncated')` instead of `response.is_truncated`
  - Use `response.get('NextContinuationToken')` instead of `response.next_continuation_token`
  - DeltaGlider metadata now in `Metadata` field of each object
- Internal response building now uses TypedDict for compile-time type safety
- All S3 responses are dicts at runtime (TypedDict is a dict!)
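A minimal before/after sketch of the access-pattern change (bucket name illustrative):

```python
from deltaglider import create_client

client = create_client()
response = client.list_objects(Bucket="releases")

# v4.x (removed): attribute access on a custom dataclass
#   for obj in response.contents:
#       print(obj.key, obj.size)

# v5.0: plain dict with boto3-style keys
for obj in response["Contents"]:
    print(obj["Key"], obj["Size"])
```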

### Fixed
- Updated all documentation examples to use dict-based responses
- Fixed pagination examples in README and API docs
- Corrected SDK documentation with accurate method signatures

## [4.2.4] - 2025-01-10

### Fixed
@@ -65,6 +106,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Delta compression for versioned artifacts
- 99%+ compression for similar files

[5.0.1]: https://github.com/beshu-tech/deltaglider/compare/v5.0.0...v5.0.1
[5.0.0]: https://github.com/beshu-tech/deltaglider/compare/v4.2.4...v5.0.0
[4.2.4]: https://github.com/beshu-tech/deltaglider/compare/v4.2.3...v4.2.4
[4.2.3]: https://github.com/beshu-tech/deltaglider/compare/v4.2.2...v4.2.3
[4.2.2]: https://github.com/beshu-tech/deltaglider/compare/v4.2.1...v4.2.2
11 Dockerfile
@@ -30,7 +30,16 @@ RUN --mount=type=cache,target=/root/.cache/uv \
# Runtime stage - minimal image
FROM python:${PYTHON_VERSION}

-# Install xdelta3
+# Skip man pages and docs to speed up builds
+RUN mkdir -p /etc/dpkg/dpkg.cfg.d && \
+    echo 'path-exclude /usr/share/doc/*' > /etc/dpkg/dpkg.cfg.d/01_nodoc && \
+    echo 'path-exclude /usr/share/man/*' >> /etc/dpkg/dpkg.cfg.d/01_nodoc && \
+    echo 'path-exclude /usr/share/groff/*' >> /etc/dpkg/dpkg.cfg.d/01_nodoc && \
+    echo 'path-exclude /usr/share/info/*' >> /etc/dpkg/dpkg.cfg.d/01_nodoc && \
+    echo 'path-exclude /usr/share/lintian/*' >> /etc/dpkg/dpkg.cfg.d/01_nodoc && \
+    echo 'path-exclude /usr/share/linda/*' >> /etc/dpkg/dpkg.cfg.d/01_nodoc
+
+# Install xdelta3 (now much faster without man pages)
RUN apt-get update && \
    apt-get install -y --no-install-recommends xdelta3 && \
    apt-get clean && \
@@ -207,14 +207,18 @@ with open('downloaded.zip', 'wb') as f:

# Smart list_objects with optimized performance
response = client.list_objects(Bucket='releases', Prefix='v2.0.0/')
for obj in response['Contents']:
    print(f"{obj['Key']}: {obj['Size']} bytes")

# Paginated listing for large buckets
response = client.list_objects(Bucket='releases', MaxKeys=100)
-while response.is_truncated:
+while response.get('IsTruncated'):
    for obj in response['Contents']:
        print(obj['Key'])
    response = client.list_objects(
        Bucket='releases',
        MaxKeys=100,
-        ContinuationToken=response.next_continuation_token
+        ContinuationToken=response.get('NextContinuationToken')
    )

# Delete and inspect objects
316 docs/BOTO3_COMPATIBILITY_VISION.md (new file)
@@ -0,0 +1,316 @@
# boto3 Compatibility Vision

## Current State (v4.2.3)

DeltaGlider currently uses custom dataclasses for responses:

```python
from deltaglider import create_client, ListObjectsResponse, ObjectInfo

client = create_client()
response: ListObjectsResponse = client.list_objects(Bucket='my-bucket')

for obj in response.contents:  # Custom field name
    print(f"{obj.key}: {obj.size}")  # Custom ObjectInfo dataclass
```

**Problems:**
- ❌ Not a true drop-in replacement for boto3
- ❌ Users need to learn DeltaGlider-specific types
- ❌ Can't use with tools expecting boto3 responses
- ❌ Different API surface (`.contents` vs `['Contents']`)

## Target State (v5.0.0)

DeltaGlider should return native boto3-compatible dicts with TypedDict type hints:

```python
from deltaglider import create_client, ListObjectsV2Response

client = create_client()
response: ListObjectsV2Response = client.list_objects(Bucket='my-bucket')

for obj in response['Contents']:  # boto3-compatible!
    print(f"{obj['Key']}: {obj['Size']}")  # Works exactly like boto3
```

**Benefits:**
- ✅ **True drop-in replacement** - swap `boto3.client('s3')` with `create_client()`
- ✅ **No learning curve** - if you know boto3, you know DeltaGlider
- ✅ **Tool compatibility** - works with any library expecting boto3 types
- ✅ **Type safety** - TypedDict provides IDE autocomplete without boto3 import
- ✅ **Zero runtime overhead** - TypedDict compiles to plain dict

## Implementation Plan

### Phase 1: Type Definitions ✅ (DONE)

Created `deltaglider/types.py` with comprehensive TypedDict definitions:

```python
from typing import TypedDict, NotRequired
from datetime import datetime

class S3Object(TypedDict):
    Key: str
    Size: int
    LastModified: datetime
    ETag: NotRequired[str]
    StorageClass: NotRequired[str]

class ListObjectsV2Response(TypedDict):
    Contents: list[S3Object]
    CommonPrefixes: NotRequired[list[dict[str, str]]]
    IsTruncated: NotRequired[bool]
    NextContinuationToken: NotRequired[str]
```

**Key insight:** TypedDict provides type safety at development time but compiles to plain `dict` at runtime!

### Phase 2: Refactor Client Methods (TODO)

Update all client methods to return boto3-compatible dicts:

#### `list_objects()`

**Before:**
```python
def list_objects(...) -> ListObjectsResponse:  # Custom dataclass
    return ListObjectsResponse(
        name=bucket,
        contents=[ObjectInfo(...), ...]  # Custom dataclass
    )
```

**After:**
```python
def list_objects(...) -> ListObjectsV2Response:  # TypedDict
    return {
        'Contents': [
            {
                'Key': 'file.zip',  # .delta suffix already stripped
                'Size': 1024,
                'LastModified': datetime(...),
                'ETag': '"abc123"',
            }
        ],
        'CommonPrefixes': [{'Prefix': 'dir/'}],
        'IsTruncated': False,
    }
```

**Key changes:**
1. Return plain dict instead of custom dataclass
2. Use boto3 field names: `Contents` not `contents`, `Key` not `key`
3. Strip `.delta` suffix transparently (already done - see the sketch below)
4. Hide `reference.bin` files (already done)
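A minimal sketch of the key normalization in points 3 and 4 (assumed logic, simplified from the client implementation):

```python
def normalize_listing_key(key: str) -> str | None:
    """Return the user-facing key, or None for internal files."""
    # Hide internal reference files entirely (point 4)
    if key == "reference.bin" or key.endswith("/reference.bin"):
        return None
    # Present delta-compressed objects under their original name (point 3)
    if key.endswith(".delta"):
        return key[: -len(".delta")]
    return key

assert normalize_listing_key("v1.0.0/app.zip.delta") == "v1.0.0/app.zip"
assert normalize_listing_key("v1.0.0/reference.bin") is None
```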

#### `put_object()`

**Before:**
```python
def put_object(...) -> dict[str, Any]:
    return {
        "ETag": etag,
        "VersionId": None,
        "DeltaGliderInfo": {...}  # Custom field
    }
```

**After:**
```python
def put_object(...) -> PutObjectResponse:  # TypedDict
    return {
        'ETag': etag,
        'ResponseMetadata': {'HTTPStatusCode': 200},
        # DeltaGlider metadata goes in Metadata field
        'Metadata': {
            'deltaglider-is-delta': 'true',
            'deltaglider-compression-ratio': '0.99'
        }
    }
```

#### `get_object()`

**Before:**
```python
def get_object(...) -> dict[str, Any]:
    return {
        "Body": data,
        "ContentLength": len(data),
        "DeltaGliderInfo": {...}  # Custom field
    }
```

**After:**
```python
def get_object(...) -> GetObjectResponse:  # TypedDict
    return {
        'Body': data,  # bytes, not StreamingBody (simpler!)
        'ContentLength': len(data),
        'LastModified': datetime(...),
        'ETag': '"abc123"',
        'Metadata': {  # DeltaGlider metadata here
            'deltaglider-is-delta': 'true'
        }
    }
```

#### `delete_object()`, `delete_objects()`, `head_object()`, etc.

All follow the same pattern: return boto3-compatible dicts with TypedDict hints; one possible shape is sketched below.
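As one concrete illustration, a minimal sketch of what the TypedDict for `head_object()` could look like (the field names mirror boto3's response; this exact definition is an assumption, not the shipped `types.py`):

```python
from datetime import datetime
from typing import NotRequired, TypedDict

class HeadObjectResponse(TypedDict):
    ContentLength: int
    LastModified: NotRequired[datetime]
    ETag: NotRequired[str]
    Metadata: NotRequired[dict[str, str]]

# At runtime this is still a plain dict - boto3-compatible by construction.
resp: HeadObjectResponse = {
    "ContentLength": 1024,
    "Metadata": {"deltaglider-is-delta": "true"},
}
```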

### Phase 3: Backward Compatibility (TODO)

Keep old dataclasses for 1-2 versions with deprecation warnings:

```python
import warnings

class ListObjectsResponse:
    """DEPRECATED: Use dict responses with ListObjectsV2Response type hint.

    This will be removed in v6.0.0. Update your code:

    Before:
        response.contents[0].key

    After:
        response['Contents'][0]['Key']
    """
    def __init__(self, data: dict):
        warnings.warn(
            "ListObjectsResponse dataclass is deprecated. "
            "Use dict responses with ListObjectsV2Response type hint.",
            DeprecationWarning,
            stacklevel=2
        )
        self._data = data

    @property
    def contents(self):
        return [ObjectInfo(obj) for obj in self._data.get('Contents', [])]
```

### Phase 4: Update Documentation (TODO)

1. Update all examples to use dict responses
2. Add migration guide from v4.x to v5.0
3. Update BOTO3_COMPATIBILITY.md
4. Add "Drop-in Replacement" marketing language

### Phase 5: Update Tests (TODO)

Convert all tests from:
```python
assert response.contents[0].key == "file.zip"
```

To:
```python
assert response['Contents'][0]['Key'] == "file.zip"
```

## Migration Guide (for users)

### v4.x → v5.0

**Old code (v4.x):**
```python
from deltaglider import create_client

client = create_client()
response = client.list_objects(Bucket='my-bucket')

for obj in response.contents:  # Dataclass attribute
    print(f"{obj.key}: {obj.size}")  # Dataclass attributes
```

**New code (v5.0):**
```python
from deltaglider import create_client, ListObjectsV2Response

client = create_client()
response: ListObjectsV2Response = client.list_objects(Bucket='my-bucket')

for obj in response['Contents']:  # Dict key (boto3-compatible)
    print(f"{obj['Key']}: {obj['Size']}")  # Dict keys (boto3-compatible)
```

**Or even simpler - no type hint needed:**
```python
client = create_client()
response = client.list_objects(Bucket='my-bucket')

for obj in response['Contents']:
    print(f"{obj['Key']}: {obj['Size']}")
```
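For codebases that cannot switch every attribute access at once, a small transition shim can bridge the two styles (a hypothetical helper, not part of DeltaGlider):

```python
from typing import Any

class AttrDict(dict):
    """Transition shim: a dict that also maps snake_case attributes to
    boto3-style keys, e.g. resp.is_truncated -> resp['IsTruncated']."""

    def __getattr__(self, name: str) -> Any:
        key = "".join(part.capitalize() for part in name.split("_"))
        try:
            return self[key]
        except KeyError as exc:
            raise AttributeError(name) from exc

resp = AttrDict({"IsTruncated": False, "Contents": []})
assert resp.is_truncated is False    # old style keeps working during migration
assert resp["IsTruncated"] is False  # new style is the target
```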

## Benefits Summary

### For Users
- **Zero learning curve** - if you know boto3, you're done
- **Drop-in replacement** - literally change one line (client creation)
- **Type safety** - TypedDict provides autocomplete without boto3 dependency
- **Tool compatibility** - works with all boto3-compatible libraries

### For DeltaGlider
- **Simpler codebase** - no custom dataclasses to maintain
- **Better marketing** - true "drop-in replacement" claim
- **Easier testing** - test against boto3 behavior directly
- **Future-proof** - if boto3 adds fields, users can access them immediately

## Technical Details

### How TypedDict Works

```python
from typing import TypedDict

class MyResponse(TypedDict):
    Key: str
    Size: int

# At runtime, this is just a dict!
response: MyResponse = {'Key': 'file.zip', 'Size': 1024}
print(type(response))  # <class 'dict'>

# But mypy and IDEs understand the structure
response['Key']  # ✅ Autocomplete works!
response['Nonexistent']  # ❌ Mypy error: Key 'Nonexistent' not found
```
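One nuance worth spelling out: keys declared `NotRequired` may be absent at runtime, so `.get()` is the safe access pattern (a small illustrative sketch):

```python
from typing import NotRequired, TypedDict

class MyListing(TypedDict):
    Contents: list[dict]            # always present
    IsTruncated: NotRequired[bool]  # may be missing entirely

resp: MyListing = {"Contents": []}

# resp["IsTruncated"] would raise KeyError here; .get() returns a default.
if resp.get("IsTruncated", False):
    print("more pages to fetch")
```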

### DeltaGlider-Specific Metadata

Store in standard boto3 `Metadata` field:

```python
{
    'Key': 'file.zip',
    'Size': 1024,
    'Metadata': {
        # DeltaGlider-specific fields (prefixed for safety)
        'deltaglider-is-delta': 'true',
        'deltaglider-compression-ratio': '0.99',
        'deltaglider-original-size': '100000',
        'deltaglider-reference-key': 'releases/v1.0.0/reference.bin',
    }
}
```

This is:
- ✅ boto3-compatible (`Metadata` is a standard field)
- ✅ Namespaced (`deltaglider-` prefix prevents conflicts)
- ✅ Optional (tools can ignore it)
- ✅ Type-safe (`Metadata: NotRequired[dict[str, str]]`)
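A consumer-side sketch of reading these fields defensively (the key names come from the example above; the helper itself is hypothetical):

```python
def delta_stats(obj: dict) -> tuple[bool, float | None]:
    """Extract DeltaGlider stats from a listed object, tolerating absence."""
    meta = obj.get("Metadata", {})
    is_delta = meta.get("deltaglider-is-delta") == "true"
    ratio_str = meta.get("deltaglider-compression-ratio")
    try:
        ratio = float(ratio_str) if ratio_str is not None else None
    except ValueError:  # e.g. an "unknown" placeholder
        ratio = None
    return is_delta, ratio

obj = {"Key": "file.zip",
       "Metadata": {"deltaglider-is-delta": "true",
                    "deltaglider-compression-ratio": "0.99"}}
assert delta_stats(obj) == (True, 0.99)
```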

## Status

- ✅ **Phase 1:** TypedDict definitions created
- ✅ **Phase 2:** `list_objects()` refactored to return boto3-compatible dict
- ⏳ **Phase 3:** Refactor remaining methods (`put_object`, `get_object`, etc.) (TODO)
- ⏳ **Phase 4:** Backward compatibility with deprecation warnings (TODO)
- ⏳ **Phase 5:** Documentation updates (TODO)
- ⏳ **Phase 6:** Full test coverage updates (PARTIAL - list_objects tests done)

**Current:** v4.2.3+ (Phase 2 complete - `list_objects()` boto3-compatible)
**Target:** v5.0.0 release (all phases complete)
@@ -38,10 +38,21 @@ response = client.get_object(Bucket='releases', Key='v1.0.0/app.zip')
# Optimized list_objects with smart performance defaults (NEW!)
# Fast by default - no unnecessary metadata fetching
response = client.list_objects(Bucket='releases', Prefix='v1.0.0/')
for obj in response['Contents']:
    print(f"{obj['Key']}: {obj['Size']} bytes")

# Pagination for large buckets
-response = client.list_objects(Bucket='releases', MaxKeys=100,
-                               ContinuationToken=response.next_continuation_token)
+response = client.list_objects(Bucket='releases', MaxKeys=100)
+while response.get('IsTruncated'):
+    # Process current page
+    for obj in response['Contents']:
+        print(obj['Key'])
+    # Get next page
+    response = client.list_objects(
+        Bucket='releases',
+        MaxKeys=100,
+        ContinuationToken=response.get('NextContinuationToken')
+    )

# Get detailed compression stats only when needed
response = client.list_objects(Bucket='releases', FetchMetadata=True)  # Slower but detailed
@@ -94,7 +94,7 @@ def list_objects(
    StartAfter: Optional[str] = None,
    FetchMetadata: bool = False,
    **kwargs
-) -> ListObjectsResponse
+) -> dict[str, Any]
```

##### Parameters
@@ -117,19 +117,32 @@ The method intelligently optimizes performance by:
2. Only fetching metadata for delta files when explicitly requested
3. Supporting efficient pagination for large buckets

##### Returns

boto3-compatible dict with:
- **Contents** (`list[dict]`): List of S3Object dicts with Key, Size, LastModified, Metadata
- **CommonPrefixes** (`list[dict]`): Optional list of common prefixes (folders)
- **IsTruncated** (`bool`): Whether more results are available
- **NextContinuationToken** (`str`): Token for next page
- **KeyCount** (`int`): Number of keys returned

##### Examples

```python
# Fast listing for UI display (no metadata fetching)
response = client.list_objects(Bucket='releases')
for obj in response['Contents']:
    print(f"{obj['Key']}: {obj['Size']} bytes")

# Paginated listing for large buckets
response = client.list_objects(Bucket='releases', MaxKeys=100)
-while response.is_truncated:
+while response.get('IsTruncated'):
    for obj in response['Contents']:
        print(obj['Key'])
    response = client.list_objects(
        Bucket='releases',
        MaxKeys=100,
-        ContinuationToken=response.next_continuation_token
+        ContinuationToken=response.get('NextContinuationToken')
    )

# Get detailed compression stats (slower, only for analytics)
@@ -137,6 +150,11 @@ response = client.list_objects(
    Bucket='releases',
    FetchMetadata=True  # Only fetches for delta files
)
for obj in response['Contents']:
    metadata = obj.get('Metadata', {})
    if metadata.get('deltaglider-is-delta') == 'true':
        compression = metadata.get('deltaglider-compression-ratio', 'unknown')
        print(f"{obj['Key']}: {compression} compression")
```

#### `get_bucket_stats`
64 examples/boto3_compatible_types.py (new file)
@@ -0,0 +1,64 @@
"""Example: Using boto3-compatible responses without importing boto3.

This demonstrates how DeltaGlider provides full type safety and boto3 compatibility
without requiring boto3 imports in user code.

As of v5.0.0, DeltaGlider returns plain dicts (not custom dataclasses) that are
100% compatible with boto3 S3 responses. You get IDE autocomplete through TypedDict
type hints without any runtime overhead.
"""

from deltaglider import ListObjectsV2Response, S3Object, create_client

# Create client (no boto3 import needed!)
client = create_client()

# Type hints work perfectly without boto3
def process_files(bucket: str, prefix: str) -> None:
    """Process files in S3 with full type safety."""
    # Return type is fully typed - IDE autocomplete works!
    response: ListObjectsV2Response = client.list_objects(
        Bucket=bucket, Prefix=prefix, Delimiter="/"
    )

    # Response is a plain dict - 100% boto3-compatible
    # TypedDict provides autocomplete and type checking
    for obj in response["Contents"]:
        # obj is typed as S3Object - all fields have autocomplete!
        key: str = obj["Key"]  # ✅ IDE knows this is str
        size: int = obj["Size"]  # ✅ IDE knows this is int
        print(f"{key}: {size} bytes")

        # DeltaGlider metadata is in the standard Metadata field
        metadata = obj.get("Metadata", {})
        if metadata.get("deltaglider-is-delta") == "true":
            compression = metadata.get("deltaglider-compression-ratio", "unknown")
            print(f"  └─ Delta file (compression: {compression})")

    # Optional fields work too
    for prefix_dict in response.get("CommonPrefixes", []):
        print(f"Directory: {prefix_dict['Prefix']}")

    # Pagination info
    if response.get("IsTruncated"):
        next_token = response.get("NextContinuationToken")
        print(f"More results available, token: {next_token}")


# This is 100% compatible with boto3 code!
def works_with_boto3_or_deltaglider(s3_client) -> None:
    """This function works with EITHER boto3 or DeltaGlider client."""
    # Because the response structure is identical!
    response = s3_client.list_objects(Bucket="my-bucket")

    for obj in response["Contents"]:
        print(obj["Key"])


if __name__ == "__main__":
    # Example usage
    print("✅ Full type safety without boto3 imports!")
    print("✅ 100% compatible with boto3")
    print("✅ Drop-in replacement")
    print("✅ Plain dict responses (not custom dataclasses)")
    print("✅ DeltaGlider metadata in standard Metadata field")
@@ -17,12 +17,26 @@ from .client_models import (
)
from .core import DeltaService, DeltaSpace, ObjectKey

+# Import boto3-compatible type aliases (no boto3 import required!)
+from .types import (
+    CopyObjectResponse,
+    CreateBucketResponse,
+    DeleteObjectResponse,
+    DeleteObjectsResponse,
+    GetObjectResponse,
+    HeadObjectResponse,
+    ListBucketsResponse,
+    ListObjectsV2Response,
+    PutObjectResponse,
+    S3Object,
+)

__all__ = [
    "__version__",
    # Client
    "DeltaGliderClient",
    "create_client",
-    # Data classes
+    # Data classes (legacy - will be deprecated in favor of TypedDict)
    "UploadSummary",
    "CompressionEstimate",
    "ObjectInfo",
@@ -32,4 +46,15 @@ __all__ = [
    "DeltaService",
    "DeltaSpace",
    "ObjectKey",
+    # boto3-compatible types (no boto3 import needed!)
+    "ListObjectsV2Response",
+    "PutObjectResponse",
+    "GetObjectResponse",
+    "DeleteObjectResponse",
+    "DeleteObjectsResponse",
+    "HeadObjectResponse",
+    "ListBucketsResponse",
+    "CreateBucketResponse",
+    "CopyObjectResponse",
+    "S3Object",
]
@@ -259,23 +259,26 @@ def ls(
        return f"{size_float:.1f}P"

    # List objects using SDK (automatically filters .delta and reference.bin)
-    from deltaglider.client import DeltaGliderClient, ListObjectsResponse
+    from deltaglider.client import DeltaGliderClient

    client = DeltaGliderClient(service)
-    dg_response: ListObjectsResponse = client.list_objects(
-        Bucket=bucket_name, Prefix=prefix_str, MaxKeys=10000, Delimiter="/" if not recursive else ""
+    dg_response = client.list_objects(
+        Bucket=bucket_name,
+        Prefix=prefix_str,
+        MaxKeys=10000,
+        Delimiter="/" if not recursive else "",
    )
-    objects = dg_response.contents
+    objects = dg_response["Contents"]

    # Filter by recursive flag
    if not recursive:
        # Show common prefixes (subdirectories) from S3 response
-        for common_prefix in dg_response.common_prefixes:
+        for common_prefix in dg_response.get("CommonPrefixes", []):
            prefix_path = common_prefix.get("Prefix", "")
            # Show only the directory name, not the full path
            if prefix_str:
                # Strip the current prefix to show only the subdirectory
-                display_name = prefix_path[len(prefix_str):]
+                display_name = prefix_path[len(prefix_str) :]
            else:
                display_name = prefix_path
            click.echo(f"  PRE {display_name}")
@@ -283,7 +286,8 @@ def ls(
        # Only show files at current level (not in subdirectories)
        filtered_objects = []
        for obj in objects:
-            rel_path = obj.key[len(prefix_str):] if prefix_str else obj.key
+            obj_key = obj["Key"]
+            rel_path = obj_key[len(prefix_str) :] if prefix_str else obj_key
            # Only include if it's a direct child (no / in relative path)
            if "/" not in rel_path and rel_path:
                filtered_objects.append(obj)
@@ -294,23 +298,24 @@ def ls(
        total_count = 0

        for obj in objects:
-            total_size += obj.size
+            total_size += obj["Size"]
            total_count += 1

            # Format the display
-            size_str = format_bytes(obj.size)
+            size_str = format_bytes(obj["Size"])
            # last_modified is a string from SDK, parse it if needed
-            if isinstance(obj.last_modified, str):
+            last_modified = obj.get("LastModified", "")
+            if isinstance(last_modified, str):
                # Already a string, extract date portion
-                date_str = obj.last_modified[:19].replace("T", " ")
+                date_str = last_modified[:19].replace("T", " ")
            else:
-                date_str = obj.last_modified.strftime("%Y-%m-%d %H:%M:%S")
+                date_str = last_modified.strftime("%Y-%m-%d %H:%M:%S")

            # Show only the filename relative to current prefix (like AWS CLI)
            if prefix_str:
-                display_key = obj.key[len(prefix_str):]
+                display_key = obj["Key"][len(prefix_str) :]
            else:
-                display_key = obj.key
+                display_key = obj["Key"]

            click.echo(f"{date_str} {size_str:>10} {display_key}")
@@ -1,5 +1,6 @@
"""DeltaGlider client with boto3-compatible APIs and advanced features."""

+# ruff: noqa: I001
import tempfile
from collections.abc import Callable
from pathlib import Path
@@ -10,12 +11,36 @@ from .client_delete_helpers import delete_with_delta_suffix
from .client_models import (
    BucketStats,
    CompressionEstimate,
-    ListObjectsResponse,
    ObjectInfo,
    UploadSummary,
)

+# fmt: off - Keep all client_operations imports together
+from .client_operations import (
+    create_bucket as _create_bucket,
+    delete_bucket as _delete_bucket,
+    download_batch as _download_batch,
+    estimate_compression as _estimate_compression,
+    find_similar_files as _find_similar_files,
+    generate_presigned_post as _generate_presigned_post,
+    generate_presigned_url as _generate_presigned_url,
+    get_bucket_stats as _get_bucket_stats,
+    get_object_info as _get_object_info,
+    list_buckets as _list_buckets,
+    upload_batch as _upload_batch,
+    upload_chunked as _upload_chunked,
+)
+# fmt: on

from .core import DeltaService, DeltaSpace, ObjectKey
from .core.errors import NotFoundError
+from .response_builders import (
+    build_delete_response,
+    build_get_response,
+    build_list_objects_response,
+    build_put_response,
+)
+from .types import CommonPrefix, S3Object

class DeltaGliderClient:
@@ -123,21 +148,33 @@ class DeltaGliderClient:
            # Calculate ETag from file content
            sha256_hash = self.service.hasher.sha256(tmp_path)

-            # Return boto3-compatible response with delta info
-            return {
-                "ETag": f'"{sha256_hash}"',
-                "ResponseMetadata": {
-                    "HTTPStatusCode": 200,
-                },
-                "DeltaGlider": {
-                    "original_size": summary.file_size,
-                    "stored_size": summary.delta_size or summary.file_size,
-                    "is_delta": summary.delta_size is not None,
-                    "compression_ratio": summary.delta_ratio or 1.0,
-                    "stored_as": summary.key,
-                    "operation": summary.operation,
-                },
+            # Build DeltaGlider compression info
+            deltaglider_info: dict[str, Any] = {
+                "OriginalSizeMB": summary.file_size / (1024 * 1024),
+                "StoredSizeMB": (summary.delta_size or summary.file_size) / (1024 * 1024),
+                "IsDelta": summary.delta_size is not None,
+                "CompressionRatio": summary.delta_ratio or 1.0,
+                "SavingsPercent": (
+                    (
+                        (summary.file_size - (summary.delta_size or summary.file_size))
+                        / summary.file_size
+                        * 100
+                    )
+                    if summary.file_size > 0
+                    else 0.0
+                ),
+                "StoredAs": summary.key,
+                "Operation": summary.operation,
            }
+
+            # Return as dict[str, Any] for public API (TypedDict is a dict at runtime!)
+            return cast(
+                dict[str, Any],
+                build_put_response(
+                    etag=f'"{sha256_hash}"',
+                    deltaglider_info=deltaglider_info,
+                ),
+            )
        finally:
            # Clean up temp file
            if tmp_path.exists():
@@ -173,19 +210,19 @@ class DeltaGliderClient:

        # Get metadata
        obj_head = self.service.storage.head(f"{Bucket}/{Key}")
+        file_size = tmp_path.stat().st_size
+        etag = f'"{self.service.hasher.sha256(tmp_path)}"'

-        return {
-            "Body": body,  # File-like object
-            "ContentLength": tmp_path.stat().st_size,
-            "ContentType": obj_head.metadata.get("content_type", "binary/octet-stream")
-            if obj_head
-            else "binary/octet-stream",
-            "ETag": f'"{self.service.hasher.sha256(tmp_path)}"',
-            "Metadata": obj_head.metadata if obj_head else {},
-            "ResponseMetadata": {
-                "HTTPStatusCode": 200,
-            },
-        }
+        # Return as dict[str, Any] for public API (TypedDict is a dict at runtime!)
+        return cast(
+            dict[str, Any],
+            build_get_response(
+                body=body,  # type: ignore[arg-type]  # File object is compatible with bytes
+                content_length=file_size,
+                etag=etag,
+                metadata=obj_head.metadata if obj_head else {},
+            ),
+        )

    def list_objects(
        self,
@@ -197,7 +234,7 @@ class DeltaGliderClient:
        StartAfter: str | None = None,
        FetchMetadata: bool = False,
        **kwargs: Any,
-    ) -> ListObjectsResponse:
+    ) -> dict[str, Any]:
        """List objects in bucket with smart metadata fetching.

        This method optimizes performance by:
@@ -227,11 +264,11 @@ class DeltaGliderClient:
            # Fast listing for UI display (no metadata)
            response = client.list_objects(Bucket='releases', MaxKeys=100)

-            # Paginated listing
+            # Paginated listing (boto3-compatible dict response)
            response = client.list_objects(
                Bucket='releases',
                MaxKeys=50,
-                ContinuationToken=response.next_continuation_token
+                ContinuationToken=response.get('NextContinuationToken')
            )

            # Detailed listing with compression stats (slower, only for analytics)
@@ -265,8 +302,8 @@ class DeltaGliderClient:
                "is_truncated": False,
            }

-        # Convert to ObjectInfo objects with smart metadata fetching
-        contents = []
+        # Convert to boto3-compatible S3Object TypedDicts (type-safe!)
+        contents: list[S3Object] = []
        for obj in result.get("objects", []):
            # Skip reference.bin files (internal files, never exposed to users)
            if obj["key"].endswith("/reference.bin") or obj["key"] == "reference.bin":
@@ -280,20 +317,12 @@ class DeltaGliderClient:
            if is_delta:
                display_key = display_key[:-6]  # Remove .delta suffix

-            # Create object info with basic data (no HEAD request)
-            info = ObjectInfo(
-                key=display_key,  # Use cleaned key without .delta
-                size=obj["size"],
-                last_modified=obj.get("last_modified", ""),
-                etag=obj.get("etag"),
-                storage_class=obj.get("storage_class", "STANDARD"),
-                # DeltaGlider fields
-                original_size=obj["size"],  # For non-delta, original = stored
-                compressed_size=obj["size"],
-                is_delta=is_delta,
-                compression_ratio=0.0 if not is_delta else None,
-                reference_key=None,
-            )
+            # Build DeltaGlider metadata
+            deltaglider_metadata: dict[str, str] = {
+                "deltaglider-is-delta": str(is_delta).lower(),
+                "deltaglider-original-size": str(obj["size"]),
+                "deltaglider-compression-ratio": "0.0" if not is_delta else "unknown",
+            }

            # SMART METADATA FETCHING:
            # 1. NEVER fetch metadata for non-delta files (no point)
@@ -304,30 +333,52 @@ class DeltaGliderClient:
                    if obj_head and obj_head.metadata:
                        metadata = obj_head.metadata
                        # Update with actual compression stats
-                        info.original_size = int(metadata.get("file_size", obj["size"]))
-                        info.compression_ratio = float(metadata.get("compression_ratio", 0.0))
-                        info.reference_key = metadata.get("ref_key")
+                        original_size = int(metadata.get("file_size", obj["size"]))
+                        compression_ratio = float(metadata.get("compression_ratio", 0.0))
+                        reference_key = metadata.get("ref_key")
+
+                        deltaglider_metadata["deltaglider-original-size"] = str(original_size)
+                        deltaglider_metadata["deltaglider-compression-ratio"] = str(
+                            compression_ratio
+                        )
+                        if reference_key:
+                            deltaglider_metadata["deltaglider-reference-key"] = reference_key
                except Exception as e:
                    # Log but don't fail the listing
                    self.service.logger.debug(f"Failed to fetch metadata for {obj['key']}: {e}")

-            contents.append(info)
+            # Create boto3-compatible S3Object TypedDict - mypy validates structure!
+            s3_obj: S3Object = {
+                "Key": display_key,  # Use cleaned key without .delta
+                "Size": obj["size"],
+                "LastModified": obj.get("last_modified", ""),
+                "ETag": obj.get("etag"),
+                "StorageClass": obj.get("storage_class", "STANDARD"),
+                "Metadata": deltaglider_metadata,
+            }
+            contents.append(s3_obj)

-        # Build response with pagination support
-        response = ListObjectsResponse(
-            name=Bucket,
-            prefix=Prefix,
-            delimiter=Delimiter,
-            max_keys=MaxKeys,
-            contents=contents,
-            common_prefixes=[{"Prefix": p} for p in result.get("common_prefixes", [])],
-            is_truncated=result.get("is_truncated", False),
-            next_continuation_token=result.get("next_continuation_token"),
-            continuation_token=ContinuationToken,
-            key_count=len(contents),
+        # Build type-safe boto3-compatible CommonPrefix TypedDicts
+        common_prefixes = result.get("common_prefixes", [])
+        common_prefix_dicts: list[CommonPrefix] | None = (
+            [CommonPrefix(Prefix=p) for p in common_prefixes] if common_prefixes else None
        )

-        return response
+        # Return as dict[str, Any] for public API (TypedDict is a dict at runtime!)
+        return cast(
+            dict[str, Any],
+            build_list_objects_response(
+                bucket=Bucket,
+                prefix=Prefix,
+                delimiter=Delimiter,
+                max_keys=MaxKeys,
+                contents=contents,
+                common_prefixes=common_prefix_dicts,
+                is_truncated=result.get("is_truncated", False),
+                next_continuation_token=result.get("next_continuation_token"),
+                continuation_token=ContinuationToken,
+            ),
+        )

    def delete_object(
        self,
@@ -347,32 +398,31 @@ class DeltaGliderClient:
        """
        _, delete_result = delete_with_delta_suffix(self.service, Bucket, Key)

-        response = {
-            "DeleteMarker": False,
-            "ResponseMetadata": {
-                "HTTPStatusCode": 204,
-            },
-            "DeltaGliderInfo": {
-                "Type": delete_result.get("type"),
-                "Deleted": delete_result.get("deleted", False),
-            },
+        # Build DeltaGlider-specific info
+        deltaglider_info: dict[str, Any] = {
+            "Type": delete_result.get("type"),
+            "Deleted": delete_result.get("deleted", False),
        }

        # Add warnings if any
        warnings = delete_result.get("warnings")
        if warnings:
-            delta_info = response.get("DeltaGliderInfo")
-            if delta_info and isinstance(delta_info, dict):
-                delta_info["Warnings"] = warnings
+            deltaglider_info["Warnings"] = warnings

        # Add dependent delta count for references
        dependent_deltas = delete_result.get("dependent_deltas")
        if dependent_deltas:
-            delta_info = response.get("DeltaGliderInfo")
-            if delta_info and isinstance(delta_info, dict):
-                delta_info["DependentDeltas"] = dependent_deltas
+            deltaglider_info["DependentDeltas"] = dependent_deltas

-        return response
+        # Return as dict[str, Any] for public API (TypedDict is a dict at runtime!)
+        return cast(
+            dict[str, Any],
+            build_delete_response(
+                delete_marker=False,
+                status_code=204,
+                deltaglider_info=deltaglider_info,
+            ),
+        )

    def delete_objects(
        self,
@@ -760,40 +810,9 @@ class DeltaGliderClient:
            progress_callback=on_progress
        )
        """
-        file_path = Path(file_path)
-        file_size = file_path.stat().st_size
-
-        # For small files, just use regular upload
-        if file_size <= chunk_size:
-            if progress_callback:
-                progress_callback(1, 1, file_size, file_size)
-            return self.upload(file_path, s3_url, max_ratio=max_ratio)
-
-        # Calculate chunks
-        total_chunks = (file_size + chunk_size - 1) // chunk_size
-
-        # Create a temporary file for chunked processing
-        # For now, we read the entire file but report progress in chunks
-        # Future enhancement: implement true streaming upload in storage adapter
-        bytes_read = 0
-
-        with open(file_path, "rb") as f:
-            for chunk_num in range(1, total_chunks + 1):
-                # Read chunk (simulated for progress reporting)
-                chunk_data = f.read(chunk_size)
-                bytes_read += len(chunk_data)
-
-                if progress_callback:
-                    progress_callback(chunk_num, total_chunks, bytes_read, file_size)
-
-        # Perform the actual upload
-        # TODO: When storage adapter supports streaming, pass chunks directly
-        result = self.upload(file_path, s3_url, max_ratio=max_ratio)
-
-        # Final progress callback
-        if progress_callback:
-            progress_callback(total_chunks, total_chunks, file_size, file_size)
-
+        result: UploadSummary = _upload_chunked(
+            self, file_path, s3_url, chunk_size, progress_callback, max_ratio
+        )
        return result

    def upload_batch(
        self,
@@ -814,20 +833,7 @@ class DeltaGliderClient:
        Returns:
            List of UploadSummary objects
        """
-        results = []
-
-        for i, file_path in enumerate(files):
-            file_path = Path(file_path)
-
-            if progress_callback:
-                progress_callback(file_path.name, i + 1, len(files))
-
-            # Upload each file
-            s3_url = f"{s3_prefix.rstrip('/')}/{file_path.name}"
-            summary = self.upload(file_path, s3_url, max_ratio=max_ratio)
-            results.append(summary)
-
-        return results
+        return _upload_batch(self, files, s3_prefix, max_ratio, progress_callback)

    def download_batch(
        self,
@@ -845,24 +851,7 @@ class DeltaGliderClient:
        Returns:
            List of downloaded file paths
        """
-        output_dir = Path(output_dir)
-        output_dir.mkdir(parents=True, exist_ok=True)
-        results = []
-
-        for i, s3_url in enumerate(s3_urls):
-            # Extract filename from URL
-            filename = s3_url.split("/")[-1]
-            if filename.endswith(".delta"):
-                filename = filename[:-6]  # Remove .delta suffix
-
-            if progress_callback:
-                progress_callback(filename, i + 1, len(s3_urls))
-
-            output_path = output_dir / filename
-            self.download(s3_url, output_path)
-            results.append(output_path)
-
-        return results
+        return _download_batch(self, s3_urls, output_dir, progress_callback)

    def estimate_compression(
        self,
@@ -882,80 +871,10 @@ class DeltaGliderClient:
        Returns:
            CompressionEstimate with predicted compression
        """
-        file_path = Path(file_path)
-        file_size = file_path.stat().st_size
-
-        # Check file extension
-        ext = file_path.suffix.lower()
-        delta_extensions = {
-            ".zip",
-            ".tar",
-            ".gz",
-            ".tar.gz",
-            ".tgz",
-            ".bz2",
-            ".tar.bz2",
-            ".xz",
-            ".tar.xz",
-            ".7z",
-            ".rar",
-            ".dmg",
-            ".iso",
-            ".pkg",
-            ".deb",
-            ".rpm",
-            ".apk",
-            ".jar",
-            ".war",
-            ".ear",
-        }
-
-        # Already compressed formats that won't benefit from delta
-        incompressible = {".jpg", ".jpeg", ".png", ".mp4", ".mp3", ".avi", ".mov"}
-
-        if ext in incompressible:
-            return CompressionEstimate(
-                original_size=file_size,
-                estimated_compressed_size=file_size,
-                estimated_ratio=0.0,
-                confidence=0.95,
-                should_use_delta=False,
-            )
-
-        if ext not in delta_extensions:
-            # Unknown type, conservative estimate
-            return CompressionEstimate(
-                original_size=file_size,
-                estimated_compressed_size=file_size,
-                estimated_ratio=0.0,
-                confidence=0.5,
-                should_use_delta=file_size > 1024 * 1024,  # Only for files > 1MB
-            )
-
-        # Look for similar files in the target location
-        similar_files = self.find_similar_files(bucket, prefix, file_path.name)
-
-        if similar_files:
-            # If we have similar files, estimate high compression
-            estimated_ratio = 0.99  # 99% compression typical for similar versions
-            confidence = 0.9
-            recommended_ref = similar_files[0]["Key"] if similar_files else None
-        else:
-            # First file of its type
-            estimated_ratio = 0.0
-            confidence = 0.7
-            recommended_ref = None
-
-        estimated_size = int(file_size * (1 - estimated_ratio))
-
-        return CompressionEstimate(
-            original_size=file_size,
-            estimated_compressed_size=estimated_size,
-            estimated_ratio=estimated_ratio,
-            confidence=confidence,
-            recommended_reference=recommended_ref,
-            should_use_delta=True,
+        result: CompressionEstimate = _estimate_compression(
+            self, file_path, bucket, prefix, sample_size
        )
+        return result

    def find_similar_files(
        self,
@@ -975,56 +894,7 @@ class DeltaGliderClient:
        Returns:
            List of similar files with scores
        """
-        # List objects in the prefix (no metadata needed for similarity check)
-        response = self.list_objects(
-            Bucket=bucket,
-            Prefix=prefix,
-            MaxKeys=1000,
-            FetchMetadata=False,  # Don't need metadata for similarity
-        )
-
-        similar: list[dict[str, Any]] = []
-        base_name = Path(filename).stem
-        ext = Path(filename).suffix
-
-        for obj in response.contents:
-            obj_base = Path(obj.key).stem
-            obj_ext = Path(obj.key).suffix
-
-            # Skip delta files and references
-            if obj.key.endswith(".delta") or obj.key.endswith("reference.bin"):
-                continue
-
-            score = 0.0
-
-            # Extension match
-            if ext == obj_ext:
-                score += 0.5
-
-            # Base name similarity
-            if base_name in obj_base or obj_base in base_name:
-                score += 0.3
-
-            # Version pattern match
-            import re
-
-            if re.search(r"v?\d+[\.\d]*", base_name) and re.search(r"v?\d+[\.\d]*", obj_base):
-                score += 0.2
-
-            if score > 0.5:
-                similar.append(
-                    {
-                        "Key": obj.key,
-                        "Size": obj.size,
-                        "Similarity": score,
-                        "LastModified": obj.last_modified,
-                    }
-                )
-
-        # Sort by similarity
-        similar.sort(key=lambda x: x["Similarity"], reverse=True)  # type: ignore
-
-        return similar[:limit]
+        return _find_similar_files(self, bucket, prefix, filename, limit)

    def get_object_info(self, s3_url: str) -> ObjectInfo:
        """Get detailed object information including compression stats.
@@ -1035,34 +905,8 @@ class DeltaGliderClient:
        Returns:
            ObjectInfo with detailed metadata
        """
-        # Parse URL
-        if not s3_url.startswith("s3://"):
-            raise ValueError(f"Invalid S3 URL: {s3_url}")
-
-        s3_path = s3_url[5:]
-        parts = s3_path.split("/", 1)
-        bucket = parts[0]
-        key = parts[1] if len(parts) > 1 else ""
-
-        # Get object metadata
-        obj_head = self.service.storage.head(f"{bucket}/{key}")
-        if not obj_head:
-            raise FileNotFoundError(f"Object not found: {s3_url}")
-
-        metadata = obj_head.metadata
-        is_delta = key.endswith(".delta")
-
-        return ObjectInfo(
-            key=key,
-            size=obj_head.size,
-            last_modified=metadata.get("last_modified", ""),
-            etag=metadata.get("etag"),
-            original_size=int(metadata.get("file_size", obj_head.size)),
-            compressed_size=obj_head.size,
-            compression_ratio=float(metadata.get("compression_ratio", 0.0)),
-            is_delta=is_delta,
-            reference_key=metadata.get("ref_key"),
-        )
+        result: ObjectInfo = _get_object_info(self, s3_url)
+        return result

    def get_bucket_stats(self, bucket: str, detailed_stats: bool = False) -> BucketStats:
        """Get statistics for a bucket with optional detailed compression metrics.
@@ -1091,76 +935,8 @@ class DeltaGliderClient:
            stats = client.get_bucket_stats('releases', detailed_stats=True)
            print(f"Compression ratio: {stats.average_compression_ratio:.1%}")
        """
-        # List all objects with smart metadata fetching
-        all_objects = []
-        continuation_token = None
-
-        while True:
-            response = self.list_objects(
-                Bucket=bucket,
-                MaxKeys=1000,
-                ContinuationToken=continuation_token,
-                FetchMetadata=detailed_stats,  # Only fetch metadata if detailed stats requested
-            )
-
-            all_objects.extend(response.contents)
-
-            if not response.is_truncated:
-                break
-
-            continuation_token = response.next_continuation_token
-
-        # Calculate statistics
-        total_size = 0
-        compressed_size = 0
-        delta_count = 0
-        direct_count = 0
-
-        for obj in all_objects:
-            compressed_size += obj.size
-
-            if obj.is_delta:
-                delta_count += 1
-                # Use actual original size if we have it, otherwise estimate
-                total_size += obj.original_size or obj.size
-            else:
-                direct_count += 1
-                # For non-delta files, original equals compressed
-                total_size += obj.size
-
-        space_saved = total_size - compressed_size
-        avg_ratio = (space_saved / total_size) if total_size > 0 else 0.0
-
-        return BucketStats(
-            bucket=bucket,
-            object_count=len(all_objects),
-            total_size=total_size,
-            compressed_size=compressed_size,
-            space_saved=space_saved,
-            average_compression_ratio=avg_ratio,
-            delta_objects=delta_count,
-            direct_objects=direct_count,
-        )
-
-    def _try_boto3_presigned_operation(self, operation: str, **kwargs: Any) -> Any | None:
-        """Try to generate presigned operation using boto3 client, return None if not available."""
-        storage_adapter = self.service.storage
-
-        # Check if storage adapter has boto3 client
-        if hasattr(storage_adapter, "client"):
-            try:
-                if operation == "url":
-                    return str(storage_adapter.client.generate_presigned_url(**kwargs))
-                elif operation == "post":
-                    return dict(storage_adapter.client.generate_presigned_post(**kwargs))
-            except AttributeError:
-                # storage_adapter does not have a 'client' attribute
-                pass
-            except Exception as e:
-                # Fall back to manual construction if needed
-                self.service.logger.warning(f"Failed to generate presigned {operation}: {e}")
-
-        return None
+        result: BucketStats = _get_bucket_stats(self, bucket, detailed_stats)
+        return result

    def generate_presigned_url(
        self,
@@ -1178,28 +954,7 @@ class DeltaGliderClient:
        Returns:
            Presigned URL string
        """
-        # Try boto3 first, fallback to manual construction
-        url = self._try_boto3_presigned_operation(
-            "url",
-            ClientMethod=ClientMethod,
-            Params=Params,
-            ExpiresIn=ExpiresIn,
-        )
-        if url is not None:
-            return str(url)
-
-        # Fallback: construct URL manually (less secure, for dev/testing only)
-        bucket = Params.get("Bucket", "")
-        key = Params.get("Key", "")
-
-        if self.endpoint_url:
-            base_url = self.endpoint_url
-        else:
-            base_url = f"https://{bucket}.s3.amazonaws.com"
-
-        # Warning: This is not a real presigned URL, just a placeholder
-        self.service.logger.warning("Using placeholder presigned URL - not suitable for production")
-        return f"{base_url}/{key}?expires={ExpiresIn}"
+        return _generate_presigned_url(self, ClientMethod, Params, ExpiresIn)

    def generate_presigned_post(
        self,
@@ -1221,31 +976,7 @@ class DeltaGliderClient:
        Returns:
            Dict with 'url' and 'fields' for form submission
        """
-        # Try boto3 first, fallback to manual construction
-        response = self._try_boto3_presigned_operation(
-            "post",
-            Bucket=Bucket,
-            Key=Key,
-            Fields=Fields,
-            Conditions=Conditions,
-            ExpiresIn=ExpiresIn,
-        )
-        if response is not None:
-            return dict(response)
-
-        # Fallback: return minimal structure for compatibility
-        if self.endpoint_url:
-            url = f"{self.endpoint_url}/{Bucket}"
-        else:
-            url = f"https://{Bucket}.s3.amazonaws.com"
-
-        return {
-            "url": url,
-            "fields": {
-                "key": Key,
-                **(Fields or {}),
-            },
-        }
+        return _generate_presigned_post(self, Bucket, Key, Fields, Conditions, ExpiresIn)

    # ============================================================================
    # Bucket Management APIs (boto3-compatible)
@@ -1276,36 +1007,7 @@ class DeltaGliderClient:
            ... CreateBucketConfiguration={'LocationConstraint': 'us-west-2'}
            ... )
        """
-        storage_adapter = self.service.storage
-
-        # Check if storage adapter has boto3 client
-        if hasattr(storage_adapter, "client"):
-            try:
-                params: dict[str, Any] = {"Bucket": Bucket}
-                if CreateBucketConfiguration:
-                    params["CreateBucketConfiguration"] = CreateBucketConfiguration
-
-                response = storage_adapter.client.create_bucket(**params)
-                return {
-                    "Location": response.get("Location", f"/{Bucket}"),
-                    "ResponseMetadata": {
-                        "HTTPStatusCode": 200,
-                    },
-                }
-            except Exception as e:
-                error_msg = str(e)
-                if "BucketAlreadyExists" in error_msg or "BucketAlreadyOwnedByYou" in error_msg:
-                    # Bucket already exists - return success
-                    self.service.logger.debug(f"Bucket {Bucket} already exists")
-                    return {
-                        "Location": f"/{Bucket}",
-                        "ResponseMetadata": {
-                            "HTTPStatusCode": 200,
-                        },
-                    }
-                raise RuntimeError(f"Failed to create bucket: {e}") from e
-        else:
-            raise NotImplementedError("Storage adapter does not support bucket creation")
+        return _create_bucket(self, Bucket, CreateBucketConfiguration, **kwargs)

    def delete_bucket(
        self,
@@ -1327,30 +1029,7 @@ class DeltaGliderClient:
        >>> client = create_client()
        >>> client.delete_bucket(Bucket='my-bucket')
        """
-        storage_adapter = self.service.storage
-
-        # Check if storage adapter has boto3 client
-        if hasattr(storage_adapter, "client"):
-            try:
-                storage_adapter.client.delete_bucket(Bucket=Bucket)
-                return {
-                    "ResponseMetadata": {
-                        "HTTPStatusCode": 204,
-                    },
-                }
-            except Exception as e:
-                error_msg = str(e)
-                if "NoSuchBucket" in error_msg:
-                    # Bucket doesn't exist - return success
-                    self.service.logger.debug(f"Bucket {Bucket} does not exist")
-                    return {
-                        "ResponseMetadata": {
-                            "HTTPStatusCode": 204,
-                        },
-                    }
-                raise RuntimeError(f"Failed to delete bucket: {e}") from e
-        else:
-            raise NotImplementedError("Storage adapter does not support bucket deletion")
+        return _delete_bucket(self, Bucket, **kwargs)

    def list_buckets(self, **kwargs: Any) -> dict[str, Any]:
        """List all S3 buckets (boto3-compatible).
@@ -1367,23 +1046,7 @@ class DeltaGliderClient:
        >>> for bucket in response['Buckets']:
        ...     print(bucket['Name'])
        """
-        storage_adapter = self.service.storage
-
-        # Check if storage adapter has boto3 client
-        if hasattr(storage_adapter, "client"):
-            try:
-                response = storage_adapter.client.list_buckets()
-                return {
-                    "Buckets": response.get("Buckets", []),
-                    "Owner": response.get("Owner", {}),
-                    "ResponseMetadata": {
-                        "HTTPStatusCode": 200,
-                    },
-                }
-            except Exception as e:
-                raise RuntimeError(f"Failed to list buckets: {e}") from e
-        else:
-            raise NotImplementedError("Storage adapter does not support bucket listing")
+        return _list_buckets(self, **kwargs)

    def _parse_tagging(self, tagging: str) -> dict[str, str]:
        """Parse URL-encoded tagging string to dict."""
@@ -1480,7 +1143,7 @@ def create_client(
    metrics = NoopMetricsAdapter()

    # Get default values
-    tool_version = kwargs.pop("tool_version", "deltaglider/0.2.0")
+    tool_version = kwargs.pop("tool_version", "deltaglider/5.0.0")
    max_ratio = kwargs.pop("max_ratio", 0.5)

    # Create service
37 src/deltaglider/client_operations/__init__.py (new file)
@@ -0,0 +1,37 @@
"""Client operation modules for DeltaGliderClient.

This package contains modular operation implementations:
- bucket: S3 bucket management (create, delete, list)
- presigned: Presigned URL generation for temporary access
- batch: Batch upload/download operations
- stats: Statistics and analytics operations
"""

from .batch import download_batch, upload_batch, upload_chunked
from .bucket import create_bucket, delete_bucket, list_buckets
from .presigned import generate_presigned_post, generate_presigned_url
from .stats import (
    estimate_compression,
    find_similar_files,
    get_bucket_stats,
    get_object_info,
)

__all__ = [
    # Bucket operations
    "create_bucket",
    "delete_bucket",
    "list_buckets",
    # Presigned operations
    "generate_presigned_url",
    "generate_presigned_post",
    # Batch operations
    "upload_chunked",
    "upload_batch",
    "download_batch",
    # Stats operations
    "get_bucket_stats",
    "get_object_info",
    "estimate_compression",
    "find_similar_files",
]
159 src/deltaglider/client_operations/batch.py (new file)
@@ -0,0 +1,159 @@
"""Batch upload/download operations for DeltaGlider client.

This module contains DeltaGlider-specific batch operations:
- upload_batch
- download_batch
- upload_chunked
"""

from collections.abc import Callable
from pathlib import Path
from typing import Any

from ..client_models import UploadSummary


def upload_chunked(
    client: Any,  # DeltaGliderClient
    file_path: str | Path,
    s3_url: str,
    chunk_size: int = 5 * 1024 * 1024,
    progress_callback: Callable[[int, int, int, int], None] | None = None,
    max_ratio: float = 0.5,
) -> UploadSummary:
    """Upload a file in chunks with progress callback.

    This method reads the file in chunks to avoid loading large files entirely into memory,
    making it suitable for uploading very large files. Progress is reported after each chunk.

    Args:
        client: DeltaGliderClient instance
        file_path: Local file to upload
        s3_url: S3 destination URL (s3://bucket/path/filename)
        chunk_size: Size of each chunk in bytes (default 5MB)
        progress_callback: Callback(chunk_number, total_chunks, bytes_sent, total_bytes)
        max_ratio: Maximum acceptable delta/file ratio for compression

    Returns:
        UploadSummary with compression statistics

    Example:
        def on_progress(chunk_num, total_chunks, bytes_sent, total_bytes):
            percent = (bytes_sent / total_bytes) * 100
            print(f"Upload progress: {percent:.1f}%")

        client.upload_chunked(
            "large_file.zip",
            "s3://bucket/releases/large_file.zip",
            chunk_size=10 * 1024 * 1024,  # 10MB chunks
            progress_callback=on_progress
        )
    """
    file_path = Path(file_path)
    file_size = file_path.stat().st_size

    # For small files, just use regular upload
    if file_size <= chunk_size:
        if progress_callback:
            progress_callback(1, 1, file_size, file_size)
        result: UploadSummary = client.upload(file_path, s3_url, max_ratio=max_ratio)
        return result

    # Calculate chunks
    total_chunks = (file_size + chunk_size - 1) // chunk_size

    # Create a temporary file for chunked processing
    # For now, we read the entire file but report progress in chunks
    # Future enhancement: implement true streaming upload in storage adapter
    bytes_read = 0

    with open(file_path, "rb") as f:
        for chunk_num in range(1, total_chunks + 1):
            # Read chunk (simulated for progress reporting)
            chunk_data = f.read(chunk_size)
            bytes_read += len(chunk_data)

            if progress_callback:
                progress_callback(chunk_num, total_chunks, bytes_read, file_size)

    # Perform the actual upload
    # TODO: When storage adapter supports streaming, pass chunks directly
    upload_result: UploadSummary = client.upload(file_path, s3_url, max_ratio=max_ratio)

    # Final progress callback
    if progress_callback:
        progress_callback(total_chunks, total_chunks, file_size, file_size)

    return upload_result
def upload_batch(
|
||||
client: Any, # DeltaGliderClient
|
||||
files: list[str | Path],
|
||||
s3_prefix: str,
|
||||
max_ratio: float = 0.5,
|
||||
progress_callback: Callable[[str, int, int], None] | None = None,
|
||||
) -> list[UploadSummary]:
|
||||
"""Upload multiple files in batch.
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
files: List of local file paths
|
||||
s3_prefix: S3 destination prefix (s3://bucket/prefix/)
|
||||
max_ratio: Maximum acceptable delta/file ratio
|
||||
progress_callback: Callback(filename, current_file_index, total_files)
|
||||
|
||||
Returns:
|
||||
List of UploadSummary objects
|
||||
"""
|
||||
results = []
|
||||
|
||||
for i, file_path in enumerate(files):
|
||||
file_path = Path(file_path)
|
||||
|
||||
if progress_callback:
|
||||
progress_callback(file_path.name, i + 1, len(files))
|
||||
|
||||
# Upload each file
|
||||
s3_url = f"{s3_prefix.rstrip('/')}/{file_path.name}"
|
||||
summary = client.upload(file_path, s3_url, max_ratio=max_ratio)
|
||||
results.append(summary)
|
||||
|
||||
return results
|
||||
|
||||
|
||||
def download_batch(
|
||||
client: Any, # DeltaGliderClient
|
||||
s3_urls: list[str],
|
||||
output_dir: str | Path,
|
||||
progress_callback: Callable[[str, int, int], None] | None = None,
|
||||
) -> list[Path]:
|
||||
"""Download multiple files in batch.
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
s3_urls: List of S3 URLs to download
|
||||
output_dir: Local directory to save files
|
||||
progress_callback: Callback(filename, current_file_index, total_files)
|
||||
|
||||
Returns:
|
||||
List of downloaded file paths
|
||||
"""
|
||||
output_dir = Path(output_dir)
|
||||
output_dir.mkdir(parents=True, exist_ok=True)
|
||||
results = []
|
||||
|
||||
for i, s3_url in enumerate(s3_urls):
|
||||
# Extract filename from URL
|
||||
filename = s3_url.split("/")[-1]
|
||||
if filename.endswith(".delta"):
|
||||
filename = filename[:-6] # Remove .delta suffix
|
||||
|
||||
if progress_callback:
|
||||
progress_callback(filename, i + 1, len(s3_urls))
|
||||
|
||||
output_path = output_dir / filename
|
||||
client.download(s3_url, output_path)
|
||||
results.append(output_path)
|
||||
|
||||
return results
|
||||
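A hedged usage sketch for the two batch helpers via the client-facing methods (bucket and file names are hypothetical, and the method exposure follows the delegation pattern noted above):

```python
from pathlib import Path
from deltaglider import create_client

client = create_client()

def on_file(name: str, index: int, total: int) -> None:
    print(f"[{index}/{total}] {name}")

# Upload three sequential builds under one prefix...
client.upload_batch(
    ["app-v1.zip", "app-v2.zip", "app-v3.zip"],
    "s3://releases/builds/",
    progress_callback=on_file,
)

# ...then restore them; .delta suffixes are stripped from output filenames.
paths = client.download_batch(
    [f"s3://releases/builds/app-v{i}.zip" for i in (1, 2, 3)],
    Path("./restored"),
    progress_callback=on_file,
)
```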
src/deltaglider/client_operations/bucket.py (new file, 152 lines)
@@ -0,0 +1,152 @@
"""Bucket management operations for DeltaGlider client.
|
||||
|
||||
This module contains boto3-compatible bucket operations:
|
||||
- create_bucket
|
||||
- delete_bucket
|
||||
- list_buckets
|
||||
"""
|
||||
|
||||
from typing import Any
|
||||
|
||||
|
||||
def create_bucket(
|
||||
client: Any, # DeltaGliderClient (avoiding circular import)
|
||||
Bucket: str,
|
||||
CreateBucketConfiguration: dict[str, str] | None = None,
|
||||
**kwargs: Any,
|
||||
) -> dict[str, Any]:
|
||||
"""Create an S3 bucket (boto3-compatible).
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
Bucket: Bucket name to create
|
||||
CreateBucketConfiguration: Optional bucket configuration (e.g., LocationConstraint)
|
||||
**kwargs: Additional S3 parameters (for compatibility)
|
||||
|
||||
Returns:
|
||||
Response dict with bucket location
|
||||
|
||||
Example:
|
||||
>>> client = create_client()
|
||||
>>> client.create_bucket(Bucket='my-bucket')
|
||||
>>> # With region
|
||||
>>> client.create_bucket(
|
||||
... Bucket='my-bucket',
|
||||
... CreateBucketConfiguration={'LocationConstraint': 'us-west-2'}
|
||||
... )
|
||||
"""
|
||||
storage_adapter = client.service.storage
|
||||
|
||||
# Check if storage adapter has boto3 client
|
||||
if hasattr(storage_adapter, "client"):
|
||||
try:
|
||||
params: dict[str, Any] = {"Bucket": Bucket}
|
||||
if CreateBucketConfiguration:
|
||||
params["CreateBucketConfiguration"] = CreateBucketConfiguration
|
||||
|
||||
response = storage_adapter.client.create_bucket(**params)
|
||||
return {
|
||||
"Location": response.get("Location", f"/{Bucket}"),
|
||||
"ResponseMetadata": {
|
||||
"HTTPStatusCode": 200,
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
error_msg = str(e)
|
||||
if "BucketAlreadyExists" in error_msg or "BucketAlreadyOwnedByYou" in error_msg:
|
||||
# Bucket already exists - return success
|
||||
client.service.logger.debug(f"Bucket {Bucket} already exists")
|
||||
return {
|
||||
"Location": f"/{Bucket}",
|
||||
"ResponseMetadata": {
|
||||
"HTTPStatusCode": 200,
|
||||
},
|
||||
}
|
||||
raise RuntimeError(f"Failed to create bucket: {e}") from e
|
||||
else:
|
||||
raise NotImplementedError("Storage adapter does not support bucket creation")
|
||||
|
||||
|
||||
def delete_bucket(
|
||||
client: Any, # DeltaGliderClient
|
||||
Bucket: str,
|
||||
**kwargs: Any,
|
||||
) -> dict[str, Any]:
|
||||
"""Delete an S3 bucket (boto3-compatible).
|
||||
|
||||
Note: Bucket must be empty before deletion.
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
Bucket: Bucket name to delete
|
||||
**kwargs: Additional S3 parameters (for compatibility)
|
||||
|
||||
Returns:
|
||||
Response dict with deletion status
|
||||
|
||||
Example:
|
||||
>>> client = create_client()
|
||||
>>> client.delete_bucket(Bucket='my-bucket')
|
||||
"""
|
||||
storage_adapter = client.service.storage
|
||||
|
||||
# Check if storage adapter has boto3 client
|
||||
if hasattr(storage_adapter, "client"):
|
||||
try:
|
||||
storage_adapter.client.delete_bucket(Bucket=Bucket)
|
||||
return {
|
||||
"ResponseMetadata": {
|
||||
"HTTPStatusCode": 204,
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
error_msg = str(e)
|
||||
if "NoSuchBucket" in error_msg:
|
||||
# Bucket doesn't exist - return success
|
||||
client.service.logger.debug(f"Bucket {Bucket} does not exist")
|
||||
return {
|
||||
"ResponseMetadata": {
|
||||
"HTTPStatusCode": 204,
|
||||
},
|
||||
}
|
||||
raise RuntimeError(f"Failed to delete bucket: {e}") from e
|
||||
else:
|
||||
raise NotImplementedError("Storage adapter does not support bucket deletion")
|
||||
|
||||
|
||||
def list_buckets(
|
||||
client: Any, # DeltaGliderClient
|
||||
**kwargs: Any,
|
||||
) -> dict[str, Any]:
|
||||
"""List all S3 buckets (boto3-compatible).
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
**kwargs: Additional S3 parameters (for compatibility)
|
||||
|
||||
Returns:
|
||||
Response dict with bucket list
|
||||
|
||||
Example:
|
||||
>>> client = create_client()
|
||||
>>> response = client.list_buckets()
|
||||
>>> for bucket in response['Buckets']:
|
||||
... print(bucket['Name'])
|
||||
"""
|
||||
storage_adapter = client.service.storage
|
||||
|
||||
# Check if storage adapter has boto3 client
|
||||
if hasattr(storage_adapter, "client"):
|
||||
try:
|
||||
response = storage_adapter.client.list_buckets()
|
||||
return {
|
||||
"Buckets": response.get("Buckets", []),
|
||||
"Owner": response.get("Owner", {}),
|
||||
"ResponseMetadata": {
|
||||
"HTTPStatusCode": 200,
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
raise RuntimeError(f"Failed to list buckets: {e}") from e
|
||||
else:
|
||||
raise NotImplementedError("Storage adapter does not support bucket listing")
|
||||
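Because `create_bucket` and `delete_bucket` absorb the already-exists and no-such-bucket errors, both calls are effectively idempotent. A short sketch (bucket name hypothetical):

```python
from deltaglider import create_client

client = create_client()

# Repeated creation succeeds: "BucketAlreadyOwnedByYou" is caught above
# and mapped to a 200 response rather than raised.
for _ in range(2):
    resp = client.create_bucket(Bucket="my-bucket")
    assert resp["ResponseMetadata"]["HTTPStatusCode"] == 200

# Likewise, deleting a missing bucket returns 204 instead of raising.
client.delete_bucket(Bucket="my-bucket")
client.delete_bucket(Bucket="my-bucket")
```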
src/deltaglider/client_operations/presigned.py (new file, 124 lines)
@@ -0,0 +1,124 @@
"""Presigned URL operations for DeltaGlider client.
|
||||
|
||||
This module contains boto3-compatible presigned URL operations:
|
||||
- generate_presigned_url
|
||||
- generate_presigned_post
|
||||
"""
|
||||
|
||||
from typing import Any
|
||||
|
||||
|
||||
def try_boto3_presigned_operation(
|
||||
client: Any, # DeltaGliderClient
|
||||
operation: str,
|
||||
**kwargs: Any,
|
||||
) -> Any | None:
|
||||
"""Try to generate presigned operation using boto3 client, return None if not available."""
|
||||
storage_adapter = client.service.storage
|
||||
|
||||
# Check if storage adapter has boto3 client
|
||||
if hasattr(storage_adapter, "client"):
|
||||
try:
|
||||
if operation == "url":
|
||||
return str(storage_adapter.client.generate_presigned_url(**kwargs))
|
||||
elif operation == "post":
|
||||
return dict(storage_adapter.client.generate_presigned_post(**kwargs))
|
||||
except AttributeError:
|
||||
# storage_adapter does not have a 'client' attribute
|
||||
pass
|
||||
except Exception as e:
|
||||
# Fall back to manual construction if needed
|
||||
client.service.logger.warning(f"Failed to generate presigned {operation}: {e}")
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def generate_presigned_url(
|
||||
client: Any, # DeltaGliderClient
|
||||
ClientMethod: str,
|
||||
Params: dict[str, Any],
|
||||
ExpiresIn: int = 3600,
|
||||
) -> str:
|
||||
"""Generate presigned URL (boto3-compatible).
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
ClientMethod: Method name ('get_object' or 'put_object')
|
||||
Params: Parameters dict with Bucket and Key
|
||||
ExpiresIn: URL expiration in seconds
|
||||
|
||||
Returns:
|
||||
Presigned URL string
|
||||
"""
|
||||
# Try boto3 first, fallback to manual construction
|
||||
url = try_boto3_presigned_operation(
|
||||
client,
|
||||
"url",
|
||||
ClientMethod=ClientMethod,
|
||||
Params=Params,
|
||||
ExpiresIn=ExpiresIn,
|
||||
)
|
||||
if url is not None:
|
||||
return str(url)
|
||||
|
||||
# Fallback: construct URL manually (less secure, for dev/testing only)
|
||||
bucket = Params.get("Bucket", "")
|
||||
key = Params.get("Key", "")
|
||||
|
||||
if client.endpoint_url:
|
||||
base_url = client.endpoint_url
|
||||
else:
|
||||
base_url = f"https://{bucket}.s3.amazonaws.com"
|
||||
|
||||
# Warning: This is not a real presigned URL, just a placeholder
|
||||
client.service.logger.warning("Using placeholder presigned URL - not suitable for production")
|
||||
return f"{base_url}/{key}?expires={ExpiresIn}"
|
||||
|
||||
|
||||
def generate_presigned_post(
|
||||
client: Any, # DeltaGliderClient
|
||||
Bucket: str,
|
||||
Key: str,
|
||||
Fields: dict[str, str] | None = None,
|
||||
Conditions: list[Any] | None = None,
|
||||
ExpiresIn: int = 3600,
|
||||
) -> dict[str, Any]:
|
||||
"""Generate presigned POST data for HTML forms (boto3-compatible).
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
Bucket: S3 bucket name
|
||||
Key: Object key
|
||||
Fields: Additional fields to include
|
||||
Conditions: Upload conditions
|
||||
ExpiresIn: URL expiration in seconds
|
||||
|
||||
Returns:
|
||||
Dict with 'url' and 'fields' for form submission
|
||||
"""
|
||||
# Try boto3 first, fallback to manual construction
|
||||
response = try_boto3_presigned_operation(
|
||||
client,
|
||||
"post",
|
||||
Bucket=Bucket,
|
||||
Key=Key,
|
||||
Fields=Fields,
|
||||
Conditions=Conditions,
|
||||
ExpiresIn=ExpiresIn,
|
||||
)
|
||||
if response is not None:
|
||||
return dict(response)
|
||||
|
||||
# Fallback: return minimal structure for compatibility
|
||||
if client.endpoint_url:
|
||||
url = f"{client.endpoint_url}/{Bucket}"
|
||||
else:
|
||||
url = f"https://{Bucket}.s3.amazonaws.com"
|
||||
|
||||
return {
|
||||
"url": url,
|
||||
"fields": {
|
||||
"key": Key,
|
||||
**(Fields or {}),
|
||||
},
|
||||
}
|
||||
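`generate_presigned_url` carries no docstring example, so here is a hedged sketch (bucket and key are hypothetical):

```python
from deltaglider import create_client

client = create_client()

# Share a one-hour download link. Note the fallback path above returns a
# non-functional placeholder URL (and logs a warning) when the storage
# adapter has no boto3 client, so only trust the result in boto3-backed setups.
url = client.generate_presigned_url(
    ClientMethod="get_object",
    Params={"Bucket": "releases", "Key": "app-v1.0.0.zip"},
    ExpiresIn=3600,
)
print(url)
```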
src/deltaglider/client_operations/stats.py (new file, 337 lines)
@@ -0,0 +1,337 @@
"""Statistics and analysis operations for DeltaGlider client.
|
||||
|
||||
This module contains DeltaGlider-specific statistics operations:
|
||||
- get_bucket_stats
|
||||
- get_object_info
|
||||
- estimate_compression
|
||||
- find_similar_files
|
||||
"""
|
||||
|
||||
import re
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from ..client_models import BucketStats, CompressionEstimate, ObjectInfo
|
||||
|
||||
|
||||
def get_object_info(
|
||||
client: Any, # DeltaGliderClient
|
||||
s3_url: str,
|
||||
) -> ObjectInfo:
|
||||
"""Get detailed object information including compression stats.
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
s3_url: S3 URL of the object
|
||||
|
||||
Returns:
|
||||
ObjectInfo with detailed metadata
|
||||
"""
|
||||
# Parse URL
|
||||
if not s3_url.startswith("s3://"):
|
||||
raise ValueError(f"Invalid S3 URL: {s3_url}")
|
||||
|
||||
s3_path = s3_url[5:]
|
||||
parts = s3_path.split("/", 1)
|
||||
bucket = parts[0]
|
||||
key = parts[1] if len(parts) > 1 else ""
|
||||
|
||||
# Get object metadata
|
||||
obj_head = client.service.storage.head(f"{bucket}/{key}")
|
||||
if not obj_head:
|
||||
raise FileNotFoundError(f"Object not found: {s3_url}")
|
||||
|
||||
metadata = obj_head.metadata
|
||||
is_delta = key.endswith(".delta")
|
||||
|
||||
return ObjectInfo(
|
||||
key=key,
|
||||
size=obj_head.size,
|
||||
last_modified=metadata.get("last_modified", ""),
|
||||
etag=metadata.get("etag"),
|
||||
original_size=int(metadata.get("file_size", obj_head.size)),
|
||||
compressed_size=obj_head.size,
|
||||
compression_ratio=float(metadata.get("compression_ratio", 0.0)),
|
||||
is_delta=is_delta,
|
||||
reference_key=metadata.get("ref_key"),
|
||||
)
|
||||
|
||||
|
||||
def get_bucket_stats(
|
||||
client: Any, # DeltaGliderClient
|
||||
bucket: str,
|
||||
detailed_stats: bool = False,
|
||||
) -> BucketStats:
|
||||
"""Get statistics for a bucket with optional detailed compression metrics.
|
||||
|
||||
This method provides two modes:
|
||||
- Quick stats (default): Fast overview using LIST only (~50ms)
|
||||
- Detailed stats: Accurate compression metrics with HEAD requests (slower)
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
bucket: S3 bucket name
|
||||
detailed_stats: If True, fetch accurate compression ratios for delta files (default: False)
|
||||
|
||||
Returns:
|
||||
BucketStats with compression and space savings info
|
||||
|
||||
Performance:
|
||||
- With detailed_stats=False: ~50ms for any bucket size (1 LIST call per 1000 objects)
|
||||
- With detailed_stats=True: ~2-3s per 1000 objects (adds HEAD calls for delta files only)
|
||||
|
||||
Example:
|
||||
# Quick stats for dashboard display
|
||||
stats = client.get_bucket_stats('releases')
|
||||
print(f"Objects: {stats.object_count}, Size: {stats.total_size}")
|
||||
|
||||
# Detailed stats for analytics (slower but accurate)
|
||||
stats = client.get_bucket_stats('releases', detailed_stats=True)
|
||||
print(f"Compression ratio: {stats.average_compression_ratio:.1%}")
|
||||
"""
|
||||
# List all objects with smart metadata fetching
|
||||
all_objects = []
|
||||
continuation_token = None
|
||||
|
||||
while True:
|
||||
response = client.list_objects(
|
||||
Bucket=bucket,
|
||||
MaxKeys=1000,
|
||||
ContinuationToken=continuation_token,
|
||||
FetchMetadata=detailed_stats, # Only fetch metadata if detailed stats requested
|
||||
)
|
||||
|
||||
# Extract S3Objects from response (with Metadata containing DeltaGlider info)
|
||||
for obj_dict in response["Contents"]:
|
||||
# Convert dict back to ObjectInfo for backward compatibility with stats calculation
|
||||
metadata = obj_dict.get("Metadata", {})
|
||||
# Parse compression ratio safely (handle "unknown" value)
|
||||
compression_ratio_str = metadata.get("deltaglider-compression-ratio", "0.0")
|
||||
try:
|
||||
compression_ratio = (
|
||||
float(compression_ratio_str) if compression_ratio_str != "unknown" else 0.0
|
||||
)
|
||||
except ValueError:
|
||||
compression_ratio = 0.0
|
||||
|
||||
all_objects.append(
|
||||
ObjectInfo(
|
||||
key=obj_dict["Key"],
|
||||
size=obj_dict["Size"],
|
||||
last_modified=obj_dict.get("LastModified", ""),
|
||||
etag=obj_dict.get("ETag"),
|
||||
storage_class=obj_dict.get("StorageClass", "STANDARD"),
|
||||
original_size=int(metadata.get("deltaglider-original-size", obj_dict["Size"])),
|
||||
compressed_size=obj_dict["Size"],
|
||||
is_delta=metadata.get("deltaglider-is-delta", "false") == "true",
|
||||
compression_ratio=compression_ratio,
|
||||
reference_key=metadata.get("deltaglider-reference-key"),
|
||||
)
|
||||
)
|
||||
|
||||
if not response.get("IsTruncated"):
|
||||
break
|
||||
|
||||
continuation_token = response.get("NextContinuationToken")
|
||||
|
||||
# Calculate statistics
|
||||
total_size = 0
|
||||
compressed_size = 0
|
||||
delta_count = 0
|
||||
direct_count = 0
|
||||
|
||||
for obj in all_objects:
|
||||
# Skip reference.bin files - they are internal implementation details
|
||||
# and their size is already accounted for in delta metadata
|
||||
if obj.key.endswith("/reference.bin") or obj.key == "reference.bin":
|
||||
continue
|
||||
|
||||
compressed_size += obj.size
|
||||
|
||||
if obj.is_delta:
|
||||
delta_count += 1
|
||||
# Use actual original size if we have it, otherwise estimate
|
||||
total_size += obj.original_size or obj.size
|
||||
else:
|
||||
direct_count += 1
|
||||
# For non-delta files, original equals compressed
|
||||
total_size += obj.size
|
||||
|
||||
space_saved = total_size - compressed_size
|
||||
avg_ratio = (space_saved / total_size) if total_size > 0 else 0.0
|
||||
|
||||
return BucketStats(
|
||||
bucket=bucket,
|
||||
object_count=len(all_objects),
|
||||
total_size=total_size,
|
||||
compressed_size=compressed_size,
|
||||
space_saved=space_saved,
|
||||
average_compression_ratio=avg_ratio,
|
||||
delta_objects=delta_count,
|
||||
direct_objects=direct_count,
|
||||
)
|
||||
|
||||
|
||||
def estimate_compression(
|
||||
client: Any, # DeltaGliderClient
|
||||
file_path: str | Path,
|
||||
bucket: str,
|
||||
prefix: str = "",
|
||||
sample_size: int = 1024 * 1024,
|
||||
) -> CompressionEstimate:
|
||||
"""Estimate compression ratio before upload.
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
file_path: Local file to estimate
|
||||
bucket: Target bucket
|
||||
prefix: Target prefix (for finding similar files)
|
||||
sample_size: Bytes to sample for estimation (default 1MB)
|
||||
|
||||
Returns:
|
||||
CompressionEstimate with predicted compression
|
||||
"""
|
||||
file_path = Path(file_path)
|
||||
file_size = file_path.stat().st_size
|
||||
|
||||
# Check file extension
|
||||
ext = file_path.suffix.lower()
|
||||
delta_extensions = {
|
||||
".zip",
|
||||
".tar",
|
||||
".gz",
|
||||
".tar.gz",
|
||||
".tgz",
|
||||
".bz2",
|
||||
".tar.bz2",
|
||||
".xz",
|
||||
".tar.xz",
|
||||
".7z",
|
||||
".rar",
|
||||
".dmg",
|
||||
".iso",
|
||||
".pkg",
|
||||
".deb",
|
||||
".rpm",
|
||||
".apk",
|
||||
".jar",
|
||||
".war",
|
||||
".ear",
|
||||
}
|
||||
|
||||
# Already compressed formats that won't benefit from delta
|
||||
incompressible = {".jpg", ".jpeg", ".png", ".mp4", ".mp3", ".avi", ".mov"}
|
||||
|
||||
if ext in incompressible:
|
||||
return CompressionEstimate(
|
||||
original_size=file_size,
|
||||
estimated_compressed_size=file_size,
|
||||
estimated_ratio=0.0,
|
||||
confidence=0.95,
|
||||
should_use_delta=False,
|
||||
)
|
||||
|
||||
if ext not in delta_extensions:
|
||||
# Unknown type, conservative estimate
|
||||
return CompressionEstimate(
|
||||
original_size=file_size,
|
||||
estimated_compressed_size=file_size,
|
||||
estimated_ratio=0.0,
|
||||
confidence=0.5,
|
||||
should_use_delta=file_size > 1024 * 1024, # Only for files > 1MB
|
||||
)
|
||||
|
||||
# Look for similar files in the target location
|
||||
similar_files = find_similar_files(client, bucket, prefix, file_path.name)
|
||||
|
||||
if similar_files:
|
||||
# If we have similar files, estimate high compression
|
||||
estimated_ratio = 0.99 # 99% compression typical for similar versions
|
||||
confidence = 0.9
|
||||
recommended_ref = similar_files[0]["Key"] if similar_files else None
|
||||
else:
|
||||
# First file of its type
|
||||
estimated_ratio = 0.0
|
||||
confidence = 0.7
|
||||
recommended_ref = None
|
||||
|
||||
estimated_size = int(file_size * (1 - estimated_ratio))
|
||||
|
||||
return CompressionEstimate(
|
||||
original_size=file_size,
|
||||
estimated_compressed_size=estimated_size,
|
||||
estimated_ratio=estimated_ratio,
|
||||
confidence=confidence,
|
||||
recommended_reference=recommended_ref,
|
||||
should_use_delta=True,
|
||||
)
|
||||
|
||||
|
||||
def find_similar_files(
|
||||
client: Any, # DeltaGliderClient
|
||||
bucket: str,
|
||||
prefix: str,
|
||||
filename: str,
|
||||
limit: int = 5,
|
||||
) -> list[dict[str, Any]]:
|
||||
"""Find similar files that could serve as references.
|
||||
|
||||
Args:
|
||||
client: DeltaGliderClient instance
|
||||
bucket: S3 bucket
|
||||
prefix: Prefix to search in
|
||||
filename: Filename to match against
|
||||
limit: Maximum number of results
|
||||
|
||||
Returns:
|
||||
List of similar files with scores
|
||||
"""
|
||||
# List objects in the prefix (no metadata needed for similarity check)
|
||||
response = client.list_objects(
|
||||
Bucket=bucket,
|
||||
Prefix=prefix,
|
||||
MaxKeys=1000,
|
||||
FetchMetadata=False, # Don't need metadata for similarity
|
||||
)
|
||||
|
||||
similar: list[dict[str, Any]] = []
|
||||
base_name = Path(filename).stem
|
||||
ext = Path(filename).suffix
|
||||
|
||||
for obj in response["Contents"]:
|
||||
obj_key = obj["Key"]
|
||||
obj_base = Path(obj_key).stem
|
||||
obj_ext = Path(obj_key).suffix
|
||||
|
||||
# Skip delta files and references
|
||||
if obj_key.endswith(".delta") or obj_key.endswith("reference.bin"):
|
||||
continue
|
||||
|
||||
score = 0.0
|
||||
|
||||
# Extension match
|
||||
if ext == obj_ext:
|
||||
score += 0.5
|
||||
|
||||
# Base name similarity
|
||||
if base_name in obj_base or obj_base in base_name:
|
||||
score += 0.3
|
||||
|
||||
# Version pattern match
|
||||
if re.search(r"v?\d+[\.\d]*", base_name) and re.search(r"v?\d+[\.\d]*", obj_base):
|
||||
score += 0.2
|
||||
|
||||
if score > 0.5:
|
||||
similar.append(
|
||||
{
|
||||
"Key": obj_key,
|
||||
"Size": obj["Size"],
|
||||
"Similarity": score,
|
||||
"LastModified": obj["LastModified"],
|
||||
}
|
||||
)
|
||||
|
||||
# Sort by similarity
|
||||
similar.sort(key=lambda x: x["Similarity"], reverse=True) # type: ignore
|
||||
|
||||
return similar[:limit]
|
||||
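A hedged pre-upload sketch combining the two helpers above, assuming they are exposed as client methods like the other operations (file and bucket names are hypothetical):

```python
from deltaglider import create_client

client = create_client()

# Heuristic estimate: ~99% predicted savings when a similar archive already
# exists under the prefix, 0% for a first-of-its-kind file.
est = client.estimate_compression("app-v1.2.4.zip", bucket="releases", prefix="builds/")
print(
    f"predicted ratio {est.estimated_ratio:.0%}, "
    f"confidence {est.confidence:.0%}, delta? {est.should_use_delta}"
)

# The candidate references behind that estimate, best match first:
for match in client.find_similar_files("releases", "builds/", "app-v1.2.4.zip"):
    print(match["Key"], match["Similarity"])
```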
src/deltaglider/response_builders.py (new file, 152 lines)
@@ -0,0 +1,152 @@
"""Type-safe response builders using TypedDicts for internal type safety.
|
||||
|
||||
This module provides builder functions that construct boto3-compatible responses
|
||||
with full compile-time type validation using TypedDicts. At runtime, TypedDicts
|
||||
are plain dicts, so there's no conversion overhead.
|
||||
|
||||
Benefits:
|
||||
- Field name typos caught by mypy (e.g., "HTTPStatusCode" → "HttpStatusCode")
|
||||
- Wrong types caught by mypy (e.g., string instead of int)
|
||||
- Missing required fields caught by mypy
|
||||
- Extra unknown fields caught by mypy
|
||||
"""
|
||||
|
||||
from typing import Any
|
||||
|
||||
from .types import (
|
||||
CommonPrefix,
|
||||
DeleteObjectResponse,
|
||||
GetObjectResponse,
|
||||
ListObjectsV2Response,
|
||||
PutObjectResponse,
|
||||
ResponseMetadata,
|
||||
S3Object,
|
||||
)
|
||||
|
||||
|
||||
def build_response_metadata(status_code: int = 200) -> ResponseMetadata:
|
||||
"""Build ResponseMetadata with full type safety via TypedDict.
|
||||
|
||||
TypedDict is a dict at runtime - no conversion needed!
|
||||
mypy validates all fields match ResponseMetadata TypedDict.
|
||||
Uses our types.py TypedDict which has proper NotRequired fields.
|
||||
"""
|
||||
# Build as TypedDict - mypy validates field names and types!
|
||||
metadata: ResponseMetadata = {
|
||||
"HTTPStatusCode": status_code,
|
||||
# All other fields are NotRequired - can be omitted!
|
||||
}
|
||||
return metadata # Returns dict at runtime, ResponseMetadata type at compile-time
|
||||
|
||||
|
||||
def build_put_response(
|
||||
etag: str,
|
||||
*,
|
||||
version_id: str | None = None,
|
||||
deltaglider_info: dict[str, Any] | None = None,
|
||||
) -> PutObjectResponse:
|
||||
"""Build PutObjectResponse with full type safety via TypedDict.
|
||||
|
||||
Uses our types.py TypedDict which has proper NotRequired fields.
|
||||
mypy validates all field names, types, and structure.
|
||||
"""
|
||||
# Build as TypedDict - mypy catches typos and type errors!
|
||||
response: PutObjectResponse = {
|
||||
"ETag": etag,
|
||||
"ResponseMetadata": build_response_metadata(),
|
||||
}
|
||||
|
||||
if version_id:
|
||||
response["VersionId"] = version_id
|
||||
|
||||
# DeltaGlider extension - add as Any field
|
||||
if deltaglider_info:
|
||||
response["DeltaGliderInfo"] = deltaglider_info # type: ignore[typeddict-item]
|
||||
|
||||
return response # Returns dict at runtime, PutObjectResponse type at compile-time
|
||||
|
||||
|
||||
def build_get_response(
|
||||
body: Any,
|
||||
content_length: int,
|
||||
etag: str,
|
||||
metadata: dict[str, Any],
|
||||
) -> GetObjectResponse:
|
||||
"""Build GetObjectResponse with full type safety via TypedDict.
|
||||
|
||||
Uses our types.py TypedDict which has proper NotRequired fields.
|
||||
mypy validates all field names, types, and structure.
|
||||
"""
|
||||
# Build as TypedDict - mypy catches typos and type errors!
|
||||
response: GetObjectResponse = {
|
||||
"Body": body,
|
||||
"ContentLength": content_length,
|
||||
"ETag": etag,
|
||||
"Metadata": metadata,
|
||||
"ResponseMetadata": build_response_metadata(),
|
||||
}
|
||||
return response # Returns dict at runtime, GetObjectResponse type at compile-time
|
||||
|
||||
|
||||
def build_list_objects_response(
|
||||
bucket: str,
|
||||
prefix: str,
|
||||
delimiter: str,
|
||||
max_keys: int,
|
||||
contents: list[S3Object],
|
||||
common_prefixes: list[CommonPrefix] | None,
|
||||
is_truncated: bool,
|
||||
next_continuation_token: str | None,
|
||||
continuation_token: str | None,
|
||||
) -> ListObjectsV2Response:
|
||||
"""Build ListObjectsV2Response with full type safety via TypedDict.
|
||||
|
||||
Uses our types.py TypedDict which has proper NotRequired fields.
|
||||
mypy validates all field names, types, and structure.
|
||||
"""
|
||||
# Build as TypedDict - mypy catches typos and type errors!
|
||||
response: ListObjectsV2Response = {
|
||||
"IsTruncated": is_truncated,
|
||||
"Contents": contents,
|
||||
"Name": bucket,
|
||||
"Prefix": prefix,
|
||||
"Delimiter": delimiter,
|
||||
"MaxKeys": max_keys,
|
||||
"KeyCount": len(contents),
|
||||
"ResponseMetadata": build_response_metadata(),
|
||||
}
|
||||
|
||||
# Add optional fields
|
||||
if common_prefixes:
|
||||
response["CommonPrefixes"] = common_prefixes
|
||||
|
||||
if next_continuation_token:
|
||||
response["NextContinuationToken"] = next_continuation_token
|
||||
|
||||
if continuation_token:
|
||||
response["ContinuationToken"] = continuation_token
|
||||
|
||||
return response # Returns dict at runtime, ListObjectsV2Response type at compile-time
|
||||
|
||||
|
||||
def build_delete_response(
|
||||
delete_marker: bool = False,
|
||||
status_code: int = 204,
|
||||
deltaglider_info: dict[str, Any] | None = None,
|
||||
) -> DeleteObjectResponse:
|
||||
"""Build DeleteObjectResponse with full type safety via TypedDict.
|
||||
|
||||
Uses our types.py TypedDict which has proper NotRequired fields.
|
||||
mypy validates all field names, types, and structure.
|
||||
"""
|
||||
# Build as TypedDict - mypy catches typos and type errors!
|
||||
response: DeleteObjectResponse = {
|
||||
"DeleteMarker": delete_marker,
|
||||
"ResponseMetadata": build_response_metadata(status_code),
|
||||
}
|
||||
|
||||
# DeltaGlider extension
|
||||
if deltaglider_info:
|
||||
response["DeltaGliderInfo"] = deltaglider_info # type: ignore[typeddict-item]
|
||||
|
||||
return response # Returns dict at runtime, DeleteObjectResponse type at compile-time
|
||||
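What the builders buy at type-check time, as a small sketch: mypy validates keys and value types against the TypedDicts, while at runtime the result is an ordinary dict (the ETag value here is a made-up example):

```python
from deltaglider.response_builders import build_put_response

resp = build_put_response('"9b2cf535f27731c974343645a3985328"', version_id="v1")
assert resp["ETag"].startswith('"')  # plain dict access at runtime
assert resp["ResponseMetadata"]["HTTPStatusCode"] == 200

# resp["Etag"]  # mypy error: TypedDict "PutObjectResponse" has no key "Etag"
```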
src/deltaglider/types.py (new file, 355 lines)
@@ -0,0 +1,355 @@
"""Type definitions for boto3-compatible responses.
|
||||
|
||||
These TypedDict definitions provide type hints for DeltaGlider's boto3-compatible
|
||||
responses. All methods return plain `dict[str, Any]` at runtime for maximum
|
||||
flexibility and boto3 compatibility.
|
||||
|
||||
## Basic Usage (Recommended)
|
||||
|
||||
Use DeltaGlider with simple dict access - no type imports needed:
|
||||
|
||||
```python
|
||||
from deltaglider import create_client
|
||||
|
||||
client = create_client()
|
||||
|
||||
# Returns plain dict - 100% boto3 compatible
|
||||
response = client.put_object(Bucket='my-bucket', Key='file.zip', Body=data)
|
||||
print(response['ETag'])
|
||||
|
||||
# List objects with dict access
|
||||
listing = client.list_objects(Bucket='my-bucket')
|
||||
for obj in listing['Contents']:
|
||||
print(f"{obj['Key']}: {obj['Size']} bytes")
|
||||
```
|
||||
|
||||
## Optional Type Hints
|
||||
|
||||
For IDE autocomplete and type checking, you can use our convenience TypedDicts:
|
||||
|
||||
```python
|
||||
from deltaglider import create_client
|
||||
from deltaglider.types import PutObjectResponse, ListObjectsV2Response
|
||||
|
||||
client = create_client()
|
||||
response: PutObjectResponse = client.put_object(...) # IDE autocomplete
|
||||
listing: ListObjectsV2Response = client.list_objects(...)
|
||||
```
|
||||
|
||||
## Advanced: boto3-stubs Integration
|
||||
|
||||
For strictest type checking (requires boto3-stubs installation):
|
||||
|
||||
```bash
|
||||
pip install boto3-stubs[s3]
|
||||
```
|
||||
|
||||
```python
|
||||
from mypy_boto3_s3.type_defs import PutObjectOutputTypeDef
|
||||
response: PutObjectOutputTypeDef = client.put_object(...)
|
||||
```
|
||||
|
||||
**Note**: boto3-stubs TypedDefs are very strict and require ALL optional fields.
|
||||
DeltaGlider returns partial dicts for better boto3 compatibility, so boto3-stubs
|
||||
types may show false positive errors. Use `dict[str, Any]` or our TypedDicts instead.
|
||||
|
||||
## Design Philosophy
|
||||
|
||||
DeltaGlider returns `dict[str, Any]` from all boto3-compatible methods because:
|
||||
1. **Flexibility**: boto3 responses vary by service and operation
|
||||
2. **Compatibility**: Exact match with boto3 runtime behavior
|
||||
3. **Simplicity**: No complex type dependencies for users
|
||||
4. **Optional Typing**: Users choose their preferred level of type safety
|
||||
"""
|
||||
|
||||
from datetime import datetime
|
||||
from typing import Any, Literal, NotRequired, TypedDict
|
||||
|
||||
# ============================================================================
|
||||
# S3 Object Types
|
||||
# ============================================================================
|
||||
|
||||
|
||||
class S3Object(TypedDict):
|
||||
"""An S3 object returned in list operations.
|
||||
|
||||
Compatible with boto3's S3.Client.list_objects_v2() response Contents.
|
||||
"""
|
||||
|
||||
Key: str
|
||||
Size: int
|
||||
LastModified: datetime
|
||||
ETag: NotRequired[str]
|
||||
StorageClass: NotRequired[str]
|
||||
Owner: NotRequired[dict[str, str]]
|
||||
Metadata: NotRequired[dict[str, str]]
|
||||
|
||||
|
||||
class CommonPrefix(TypedDict):
|
||||
"""A common prefix (directory) in S3 listing.
|
||||
|
||||
Compatible with boto3's S3.Client.list_objects_v2() response CommonPrefixes.
|
||||
"""
|
||||
|
||||
Prefix: str
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Response Metadata (used in all responses)
|
||||
# ============================================================================
|
||||
|
||||
|
||||
class ResponseMetadata(TypedDict):
|
||||
"""Metadata about the API response.
|
||||
|
||||
Compatible with all boto3 responses.
|
||||
"""
|
||||
|
||||
RequestId: NotRequired[str]
|
||||
HostId: NotRequired[str]
|
||||
HTTPStatusCode: int
|
||||
HTTPHeaders: NotRequired[dict[str, str]]
|
||||
RetryAttempts: NotRequired[int]
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# List Operations Response Types
|
||||
# ============================================================================
|
||||
|
||||
|
||||
class ListObjectsV2Response(TypedDict):
|
||||
"""Response from list_objects_v2 operation.
|
||||
|
||||
100% compatible with boto3's S3.Client.list_objects_v2() response.
|
||||
|
||||
Example:
|
||||
```python
|
||||
client = create_client()
|
||||
response: ListObjectsV2Response = client.list_objects(
|
||||
Bucket='my-bucket',
|
||||
Prefix='path/',
|
||||
Delimiter='/'
|
||||
)
|
||||
|
||||
for obj in response['Contents']:
|
||||
print(f"{obj['Key']}: {obj['Size']} bytes")
|
||||
|
||||
for prefix in response.get('CommonPrefixes', []):
|
||||
print(f"Directory: {prefix['Prefix']}")
|
||||
```
|
||||
"""
|
||||
|
||||
Contents: list[S3Object]
|
||||
Name: NotRequired[str] # Bucket name
|
||||
Prefix: NotRequired[str]
|
||||
Delimiter: NotRequired[str]
|
||||
MaxKeys: NotRequired[int]
|
||||
CommonPrefixes: NotRequired[list[CommonPrefix]]
|
||||
EncodingType: NotRequired[str]
|
||||
KeyCount: NotRequired[int]
|
||||
ContinuationToken: NotRequired[str]
|
||||
NextContinuationToken: NotRequired[str]
|
||||
StartAfter: NotRequired[str]
|
||||
IsTruncated: NotRequired[bool]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Put/Get/Delete Response Types
|
||||
# ============================================================================
|
||||
|
||||
|
||||
class PutObjectResponse(TypedDict):
|
||||
"""Response from put_object operation.
|
||||
|
||||
Compatible with boto3's S3.Client.put_object() response.
|
||||
"""
|
||||
|
||||
ETag: str
|
||||
VersionId: NotRequired[str]
|
||||
ServerSideEncryption: NotRequired[str]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
class GetObjectResponse(TypedDict):
|
||||
"""Response from get_object operation.
|
||||
|
||||
Compatible with boto3's S3.Client.get_object() response.
|
||||
"""
|
||||
|
||||
Body: Any # StreamingBody in boto3, bytes in DeltaGlider
|
||||
ContentLength: int
|
||||
ContentType: NotRequired[str]
|
||||
ETag: NotRequired[str]
|
||||
LastModified: NotRequired[datetime]
|
||||
Metadata: NotRequired[dict[str, str]]
|
||||
VersionId: NotRequired[str]
|
||||
StorageClass: NotRequired[str]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
class DeleteObjectResponse(TypedDict):
|
||||
"""Response from delete_object operation.
|
||||
|
||||
Compatible with boto3's S3.Client.delete_object() response.
|
||||
"""
|
||||
|
||||
DeleteMarker: NotRequired[bool]
|
||||
VersionId: NotRequired[str]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
class DeletedObject(TypedDict):
|
||||
"""A successfully deleted object.
|
||||
|
||||
Compatible with boto3's S3.Client.delete_objects() response Deleted.
|
||||
"""
|
||||
|
||||
Key: str
|
||||
VersionId: NotRequired[str]
|
||||
DeleteMarker: NotRequired[bool]
|
||||
DeleteMarkerVersionId: NotRequired[str]
|
||||
|
||||
|
||||
class DeleteError(TypedDict):
|
||||
"""An error that occurred during deletion.
|
||||
|
||||
Compatible with boto3's S3.Client.delete_objects() response Errors.
|
||||
"""
|
||||
|
||||
Key: str
|
||||
Code: str
|
||||
Message: str
|
||||
VersionId: NotRequired[str]
|
||||
|
||||
|
||||
class DeleteObjectsResponse(TypedDict):
|
||||
"""Response from delete_objects operation.
|
||||
|
||||
Compatible with boto3's S3.Client.delete_objects() response.
|
||||
"""
|
||||
|
||||
Deleted: NotRequired[list[DeletedObject]]
|
||||
Errors: NotRequired[list[DeleteError]]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Head Object Response
|
||||
# ============================================================================
|
||||
|
||||
|
||||
class HeadObjectResponse(TypedDict):
|
||||
"""Response from head_object operation.
|
||||
|
||||
Compatible with boto3's S3.Client.head_object() response.
|
||||
"""
|
||||
|
||||
ContentLength: int
|
||||
ContentType: NotRequired[str]
|
||||
ETag: NotRequired[str]
|
||||
LastModified: NotRequired[datetime]
|
||||
Metadata: NotRequired[dict[str, str]]
|
||||
VersionId: NotRequired[str]
|
||||
StorageClass: NotRequired[str]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Bucket Operations
|
||||
# ============================================================================
|
||||
|
||||
|
||||
class Bucket(TypedDict):
|
||||
"""An S3 bucket.
|
||||
|
||||
Compatible with boto3's S3.Client.list_buckets() response Buckets.
|
||||
"""
|
||||
|
||||
Name: str
|
||||
CreationDate: datetime
|
||||
|
||||
|
||||
class ListBucketsResponse(TypedDict):
|
||||
"""Response from list_buckets operation.
|
||||
|
||||
Compatible with boto3's S3.Client.list_buckets() response.
|
||||
"""
|
||||
|
||||
Buckets: list[Bucket]
|
||||
Owner: NotRequired[dict[str, str]]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
class CreateBucketResponse(TypedDict):
|
||||
"""Response from create_bucket operation.
|
||||
|
||||
Compatible with boto3's S3.Client.create_bucket() response.
|
||||
"""
|
||||
|
||||
Location: NotRequired[str]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Multipart Upload Types
|
||||
# ============================================================================
|
||||
|
||||
|
||||
class CompletedPart(TypedDict):
|
||||
"""A completed part in a multipart upload."""
|
||||
|
||||
PartNumber: int
|
||||
ETag: str
|
||||
|
||||
|
||||
class CompleteMultipartUploadResponse(TypedDict):
|
||||
"""Response from complete_multipart_upload operation."""
|
||||
|
||||
Location: NotRequired[str]
|
||||
Bucket: NotRequired[str]
|
||||
Key: NotRequired[str]
|
||||
ETag: NotRequired[str]
|
||||
VersionId: NotRequired[str]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Copy Operations
|
||||
# ============================================================================
|
||||
|
||||
|
||||
class CopyObjectResponse(TypedDict):
|
||||
"""Response from copy_object operation.
|
||||
|
||||
Compatible with boto3's S3.Client.copy_object() response.
|
||||
"""
|
||||
|
||||
CopyObjectResult: NotRequired[dict[str, Any]]
|
||||
ETag: NotRequired[str]
|
||||
LastModified: NotRequired[datetime]
|
||||
VersionId: NotRequired[str]
|
||||
ResponseMetadata: NotRequired[ResponseMetadata]
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Type Aliases for Convenience
|
||||
# ============================================================================
|
||||
|
||||
# Common parameter types
|
||||
BucketName = str
|
||||
ObjectKey = str
|
||||
Prefix = str
|
||||
Delimiter = str
|
||||
|
||||
# Storage class options
|
||||
StorageClass = Literal[
|
||||
"STANDARD",
|
||||
"REDUCED_REDUNDANCY",
|
||||
"STANDARD_IA",
|
||||
"ONEZONE_IA",
|
||||
"INTELLIGENT_TIERING",
|
||||
"GLACIER",
|
||||
"DEEP_ARCHIVE",
|
||||
"GLACIER_IR",
|
||||
]
|
||||
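Since most fields above are `NotRequired`, defensive access with `.get()` is the safest pattern; a short sketch (bucket name hypothetical):

```python
from deltaglider import create_client

client = create_client()
listing = client.list_objects(Bucket="my-bucket")

# Required fields can be indexed directly; NotRequired ones may be absent.
for obj in listing["Contents"]:
    print(obj["Key"], obj["Size"], obj.get("StorageClass", "STANDARD"))

for cp in listing.get("CommonPrefixes", []):
    print("dir:", cp["Prefix"])
```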
@@ -10,7 +10,6 @@ from deltaglider import create_client
 from deltaglider.client import (
     BucketStats,
     CompressionEstimate,
-    ListObjectsResponse,
     ObjectInfo,
 )

@@ -279,27 +278,35 @@ class TestBoto3Compatibility:
         assert response["ContentLength"] == len(content)

     def test_list_objects(self, client):
-        """Test list_objects with various options."""
+        """Test list_objects with various options (boto3-compatible dict response)."""
         # List all objects (default: FetchMetadata=False)
         response = client.list_objects(Bucket="test-bucket")

-        assert isinstance(response, ListObjectsResponse)
-        assert response.key_count > 0
-        assert len(response.contents) > 0
+        # Response is now a boto3-compatible dict (not ListObjectsResponse)
+        assert isinstance(response, dict)
+        assert response["KeyCount"] > 0
+        assert len(response["Contents"]) > 0
+
+        # Verify S3Object structure
+        for obj in response["Contents"]:
+            assert "Key" in obj
+            assert "Size" in obj
+            assert "LastModified" in obj
+            assert "Metadata" in obj  # DeltaGlider metadata

         # Test with FetchMetadata=True (should only affect delta files)
         response_with_metadata = client.list_objects(Bucket="test-bucket", FetchMetadata=True)
-        assert isinstance(response_with_metadata, ListObjectsResponse)
-        assert response_with_metadata.key_count > 0
+        assert isinstance(response_with_metadata, dict)
+        assert response_with_metadata["KeyCount"] > 0

     def test_list_objects_with_delimiter(self, client):
-        """Test list_objects with delimiter for folder simulation."""
+        """Test list_objects with delimiter for folder simulation (boto3-compatible dict response)."""
         response = client.list_objects(Bucket="test-bucket", Prefix="", Delimiter="/")

         # Should have common prefixes for folders
-        assert len(response.common_prefixes) > 0
-        assert {"Prefix": "folder1/"} in response.common_prefixes
-        assert {"Prefix": "folder2/"} in response.common_prefixes
+        assert len(response.get("CommonPrefixes", [])) > 0
+        assert {"Prefix": "folder1/"} in response["CommonPrefixes"]
+        assert {"Prefix": "folder2/"} in response["CommonPrefixes"]

     def test_delete_object(self, client):
         """Test delete_object."""

@@ -53,8 +53,11 @@ class TestSDKFiltering:
         client = DeltaGliderClient(service)
         response = client.list_objects(Bucket="test-bucket", Prefix="releases/")

+        # Response is now a boto3-compatible dict
+        contents = response["Contents"]
+
         # Verify .delta suffix is stripped
-        keys = [obj.key for obj in response.contents]
+        keys = [obj["Key"] for obj in contents]
         assert "releases/app-v1.zip" in keys
         assert "releases/app-v2.zip" in keys
         assert "releases/README.md" in keys

@@ -63,8 +66,10 @@ class TestSDKFiltering:
         for key in keys:
             assert not key.endswith(".delta"), f"Found .delta suffix in: {key}"

-        # Verify is_delta flag is set correctly
-        delta_objects = [obj for obj in response.contents if obj.is_delta]
+        # Verify is_delta flag is set correctly in Metadata
+        delta_objects = [
+            obj for obj in contents if obj.get("Metadata", {}).get("deltaglider-is-delta") == "true"
+        ]
         assert len(delta_objects) == 2

     def test_list_objects_filters_reference_bin(self):

@@ -106,15 +111,18 @@ class TestSDKFiltering:
         client = DeltaGliderClient(service)
         response = client.list_objects(Bucket="test-bucket", Prefix="releases/")

+        # Response is now a boto3-compatible dict
+        contents = response["Contents"]
+
         # Verify NO reference.bin files in output
-        keys = [obj.key for obj in response.contents]
+        keys = [obj["Key"] for obj in contents]
         for key in keys:
             assert not key.endswith("reference.bin"), f"Found reference.bin in: {key}"

         # Should only have the app.zip (with .delta stripped)
-        assert len(response.contents) == 1
-        assert response.contents[0].key == "releases/app.zip"
-        assert response.contents[0].is_delta is True
+        assert len(contents) == 1
+        assert contents[0]["Key"] == "releases/app.zip"
+        assert contents[0].get("Metadata", {}).get("deltaglider-is-delta") == "true"

     def test_list_objects_combined_filtering(self):
         """Test filtering of both .delta and reference.bin together."""

@@ -170,12 +178,15 @@ class TestSDKFiltering:
         client = DeltaGliderClient(service)
         response = client.list_objects(Bucket="test-bucket", Prefix="data/")

+        # Response is now a boto3-compatible dict
+        contents = response["Contents"]
+
         # Should filter out 2 reference.bin files
         # Should strip .delta from 3 files
         # Should keep 1 regular file as-is
-        assert len(response.contents) == 4  # 3 deltas + 1 regular file
+        assert len(contents) == 4  # 3 deltas + 1 regular file

-        keys = [obj.key for obj in response.contents]
+        keys = [obj["Key"] for obj in contents]
         expected_keys = ["data/file1.zip", "data/file2.zip", "data/file3.txt", "data/sub/app.jar"]
         assert sorted(keys) == sorted(expected_keys)
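The migration these tests pin down, rolled into one pagination loop; a hedged sketch mirroring the loop in stats.py above (bucket name hypothetical):

```python
from deltaglider import create_client

client = create_client()

# Dict-based pagination replacing the old attribute-style response.
token = None
while True:
    page = client.list_objects(Bucket="releases", MaxKeys=1000, ContinuationToken=token)
    for obj in page["Contents"]:
        print(obj["Key"], obj["Size"])
    if not page.get("IsTruncated"):
        break
    token = page.get("NextContinuationToken")
```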