10 Commits

Author SHA1 Message Date
Simone Scarduzio
5e3b76791e fix: Exclude reference.bin from bucket stats calculations
reference.bin files are internal implementation details used for delta
compression. Their size was being incorrectly counted in both total_size
and compressed_size, resulting in 0% savings contribution.

Since delta file metadata already contains the original file_size that
the delta represents, including reference.bin would double-count storage.

This fix skips reference.bin files during stats calculation, consistent
with how they're filtered in other parts of the codebase (aws_compat.py,
sync.py, client.py).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-09 22:20:32 +02:00
Simone Scarduzio
fb2877bfd3 docs: Update CHANGELOG.md for v5.0.1 release
- Document code organization improvements
- Note 26% reduction in client.py size
- List new client_operations/ package modules
- Maintain full backward compatibility
- All tests passing, type safety maintained
2025-10-09 08:31:09 +02:00
Simone Scarduzio
88fd1f51cd refactor 2025-10-08 22:27:32 +02:00
Simone Scarduzio
0857e02edd perf: Skip man pages in Docker build to speed up xdelta3 installation
Added dpkg configuration to exclude man pages, docs, and other unnecessary
files during apt-get install. This significantly speeds up Docker builds
by skipping the slow man-db triggers.

Before: ~30-60 seconds processing man pages
After: <5 seconds

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-08 14:43:01 +02:00
Simone Scarduzio
689cf00d02 ruff 2025-10-08 14:39:23 +02:00
Simone Scarduzio
743d52e783 docs: Fix pagination examples in SDK README
Updated docs/sdk/README.md with correct boto3-compatible dict response patterns
for list_objects() pagination and iteration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-08 14:33:47 +02:00
Simone Scarduzio
8bc0a0eaf3 docs: Fix outdated examples and update documentation for boto3-compatible responses
Updated all documentation to reflect the boto3-compatible dict responses:
- Fixed pagination examples in README.md to use dict access
- Updated docs/sdk/api.md with correct list_objects() signature and examples
- Added return type documentation for list_objects()
- Updated CHANGELOG.md with breaking changes and migration info

All examples now use:
- response['Contents'] instead of response.contents
- response.get('IsTruncated') instead of response.is_truncated
- response.get('NextContinuationToken') for pagination

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-08 14:33:03 +02:00
Simone Scarduzio
4cf25e4681 docs: Update vision doc with Phase 2 completion status 2025-10-08 14:24:16 +02:00
Simone Scarduzio
69ed9056d2 feat: Implement boto3-compatible dict responses (Phase 2)
Changed list_objects() to return boto3-compatible dict instead of custom
ListObjectsResponse dataclass. This makes DeltaGlider a true drop-in replacement
for boto3.client('s3').

Changes:
- list_objects() now returns dict[str, Any] with boto3-compatible structure:
  * Contents: list[S3Object] (dict with Key, Size, LastModified, etc.)
  * CommonPrefixes: list[dict] for folder simulation
  * IsTruncated, NextContinuationToken for pagination
  * DeltaGlider metadata stored in standard Metadata field

- Updated all client methods that use list_objects() to work with dict responses:
  * find_similar_files()
  * get_bucket_stats()
  * CLI ls command

- Updated all tests to use dict access (response['Contents']) instead of
  dataclass access (response.contents)

- Updated examples/boto3_compatible_types.py to demonstrate usage

- DeltaGlider-specific metadata now in Metadata field:
  * deltaglider-is-delta: "true"/"false"
  * deltaglider-original-size: string number
  * deltaglider-compression-ratio: string number or "unknown"
  * deltaglider-reference-key: optional string

Benefits:
- True drop-in replacement for boto3
- No learning curve - if you know boto3, you know DeltaGlider
- Works with any boto3-compatible library
- Type safety through TypedDict (no boto3 import needed)
- Zero runtime overhead (TypedDict compiles to plain dict)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-08 14:23:50 +02:00
Simone Scarduzio
38134f28f5 feat: Add boto3-compatible TypedDict types (no boto3 import needed)
Add comprehensive TypedDict definitions for all boto3 S3 response types.
This provides full type safety without requiring boto3 imports in user code.

Benefits:
-  Type safety: IDE autocomplete and mypy type checking
-  No boto3 dependency: Just typing module (stdlib)
-  Runtime compatibility: TypedDict compiles to plain dict
-  Drop-in replacement: Exact same structure as boto3 responses

Types added:
- ListObjectsV2Response, S3Object, CommonPrefix
- PutObjectResponse, GetObjectResponse, DeleteObjectResponse
- HeadObjectResponse, DeleteObjectsResponse
- ListBucketsResponse, CreateBucketResponse, CopyObjectResponse
- ResponseMetadata, and more

Next step: Refactor client methods to return these dicts instead of
custom dataclasses (ListObjectsResponse, ObjectInfo, etc.)

Example usage:
```python
from deltaglider import ListObjectsV2Response, create_client

client = create_client()
response: ListObjectsV2Response = client.list_objects(Bucket='my-bucket')

for obj in response['Contents']:
    print(f"{obj['Key']}: {obj['Size']} bytes")  # Full autocomplete!
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-08 14:14:37 +02:00
19 changed files with 2020 additions and 528 deletions

View File

@@ -5,6 +5,47 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
## [5.0.1] - 2025-01-10
### Changed
- **Code Organization**: Refactored client.py from 1560 to 1154 lines (26% reduction)
- Extracted client operations into modular `client_operations/` package:
- `bucket.py` - S3 bucket management operations
- `presigned.py` - Presigned URL generation
- `batch.py` - Batch upload/download operations
- `stats.py` - Analytics and statistics operations
- Improved code maintainability with logical separation of concerns
- Better developer experience with cleaner module structure
### Internal
- Full type safety maintained with mypy (0 errors)
- All 99 tests passing
- Code quality checks passing (ruff)
- No breaking changes - all public APIs remain unchanged
## [5.0.0] - 2025-01-10
### Added
- boto3-compatible TypedDict types for S3 responses (no boto3 import needed)
- Complete boto3 compatibility vision document
- Type-safe response builders using TypedDict patterns
### Changed
- **BREAKING**: `list_objects()` now returns boto3-compatible dict instead of custom dataclass
- Use `response['Contents']` instead of `response.contents`
- Use `response.get('IsTruncated')` instead of `response.is_truncated`
- Use `response.get('NextContinuationToken')` instead of `response.next_continuation_token`
- DeltaGlider metadata now in `Metadata` field of each object
- Internal response building now uses TypedDict for compile-time type safety
- All S3 responses are dicts at runtime (TypedDict is a dict!)
### Fixed
- Updated all documentation examples to use dict-based responses
- Fixed pagination examples in README and API docs
- Corrected SDK documentation with accurate method signatures
## [4.2.4] - 2025-01-10
### Fixed
@@ -65,6 +106,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Delta compression for versioned artifacts
- 99%+ compression for similar files
[5.0.1]: https://github.com/beshu-tech/deltaglider/compare/v5.0.0...v5.0.1
[5.0.0]: https://github.com/beshu-tech/deltaglider/compare/v4.2.4...v5.0.0
[4.2.4]: https://github.com/beshu-tech/deltaglider/compare/v4.2.3...v4.2.4
[4.2.3]: https://github.com/beshu-tech/deltaglider/compare/v4.2.2...v4.2.3
[4.2.2]: https://github.com/beshu-tech/deltaglider/compare/v4.2.1...v4.2.2

View File

@@ -30,7 +30,16 @@ RUN --mount=type=cache,target=/root/.cache/uv \
# Runtime stage - minimal image
FROM python:${PYTHON_VERSION}
# Install xdelta3
# Skip man pages and docs to speed up builds
RUN mkdir -p /etc/dpkg/dpkg.cfg.d && \
echo 'path-exclude /usr/share/doc/*' > /etc/dpkg/dpkg.cfg.d/01_nodoc && \
echo 'path-exclude /usr/share/man/*' >> /etc/dpkg/dpkg.cfg.d/01_nodoc && \
echo 'path-exclude /usr/share/groff/*' >> /etc/dpkg/dpkg.cfg.d/01_nodoc && \
echo 'path-exclude /usr/share/info/*' >> /etc/dpkg/dpkg.cfg.d/01_nodoc && \
echo 'path-exclude /usr/share/lintian/*' >> /etc/dpkg/dpkg.cfg.d/01_nodoc && \
echo 'path-exclude /usr/share/linda/*' >> /etc/dpkg/dpkg.cfg.d/01_nodoc
# Install xdelta3 (now much faster without man pages)
RUN apt-get update && \
apt-get install -y --no-install-recommends xdelta3 && \
apt-get clean && \

View File

@@ -207,14 +207,18 @@ with open('downloaded.zip', 'wb') as f:
# Smart list_objects with optimized performance
response = client.list_objects(Bucket='releases', Prefix='v2.0.0/')
for obj in response['Contents']:
print(f"{obj['Key']}: {obj['Size']} bytes")
# Paginated listing for large buckets
response = client.list_objects(Bucket='releases', MaxKeys=100)
while response.is_truncated:
while response.get('IsTruncated'):
for obj in response['Contents']:
print(obj['Key'])
response = client.list_objects(
Bucket='releases',
MaxKeys=100,
ContinuationToken=response.next_continuation_token
ContinuationToken=response.get('NextContinuationToken')
)
# Delete and inspect objects

View File

@@ -0,0 +1,316 @@
# boto3 Compatibility Vision
## Current State (v4.2.3)
DeltaGlider currently uses custom dataclasses for responses:
```python
from deltaglider import create_client, ListObjectsResponse, ObjectInfo
client = create_client()
response: ListObjectsResponse = client.list_objects(Bucket='my-bucket')
for obj in response.contents: # Custom field name
print(f"{obj.key}: {obj.size}") # Custom ObjectInfo dataclass
```
**Problems:**
- ❌ Not a true drop-in replacement for boto3
- ❌ Users need to learn DeltaGlider-specific types
- ❌ Can't use with tools expecting boto3 responses
- ❌ Different API surface (`.contents` vs `['Contents']`)
## Target State (v5.0.0)
DeltaGlider should return native boto3-compatible dicts with TypedDict type hints:
```python
from deltaglider import create_client, ListObjectsV2Response
client = create_client()
response: ListObjectsV2Response = client.list_objects(Bucket='my-bucket')
for obj in response['Contents']: # boto3-compatible!
print(f"{obj['Key']}: {obj['Size']}") # Works exactly like boto3
```
**Benefits:**
-**True drop-in replacement** - swap `boto3.client('s3')` with `create_client()`
-**No learning curve** - if you know boto3, you know DeltaGlider
-**Tool compatibility** - works with any library expecting boto3 types
-**Type safety** - TypedDict provides IDE autocomplete without boto3 import
-**Zero runtime overhead** - TypedDict compiles to plain dict
## Implementation Plan
### Phase 1: Type Definitions ✅ (DONE)
Created `deltaglider/types.py` with comprehensive TypedDict definitions:
```python
from typing import TypedDict, NotRequired
from datetime import datetime
class S3Object(TypedDict):
Key: str
Size: int
LastModified: datetime
ETag: NotRequired[str]
StorageClass: NotRequired[str]
class ListObjectsV2Response(TypedDict):
Contents: list[S3Object]
CommonPrefixes: NotRequired[list[dict[str, str]]]
IsTruncated: NotRequired[bool]
NextContinuationToken: NotRequired[str]
```
**Key insight:** TypedDict provides type safety at development time but compiles to plain `dict` at runtime!
### Phase 2: Refactor Client Methods (TODO)
Update all client methods to return boto3-compatible dicts:
#### `list_objects()`
**Before:**
```python
def list_objects(...) -> ListObjectsResponse: # Custom dataclass
return ListObjectsResponse(
name=bucket,
contents=[ObjectInfo(...), ...] # Custom dataclass
)
```
**After:**
```python
def list_objects(...) -> ListObjectsV2Response: # TypedDict
return {
'Contents': [
{
'Key': 'file.zip', # .delta suffix already stripped
'Size': 1024,
'LastModified': datetime(...),
'ETag': '"abc123"',
}
],
'CommonPrefixes': [{'Prefix': 'dir/'}],
'IsTruncated': False,
}
```
**Key changes:**
1. Return plain dict instead of custom dataclass
2. Use boto3 field names: `Contents` not `contents`, `Key` not `key`
3. Strip `.delta` suffix transparently (already done)
4. Hide `reference.bin` files (already done)
#### `put_object()`
**Before:**
```python
def put_object(...) -> dict[str, Any]:
return {
"ETag": etag,
"VersionId": None,
"DeltaGliderInfo": {...} # Custom field
}
```
**After:**
```python
def put_object(...) -> PutObjectResponse: # TypedDict
return {
'ETag': etag,
'ResponseMetadata': {'HTTPStatusCode': 200},
# DeltaGlider metadata goes in Metadata field
'Metadata': {
'deltaglider-is-delta': 'true',
'deltaglider-compression-ratio': '0.99'
}
}
```
#### `get_object()`
**Before:**
```python
def get_object(...) -> dict[str, Any]:
return {
"Body": data,
"ContentLength": len(data),
"DeltaGliderInfo": {...} # Custom field
}
```
**After:**
```python
def get_object(...) -> GetObjectResponse: # TypedDict
return {
'Body': data, # bytes, not StreamingBody (simpler!)
'ContentLength': len(data),
'LastModified': datetime(...),
'ETag': '"abc123"',
'Metadata': { # DeltaGlider metadata here
'deltaglider-is-delta': 'true'
}
}
```
#### `delete_object()`, `delete_objects()`, `head_object()`, etc.
All follow the same pattern: return boto3-compatible dicts with TypedDict hints.
### Phase 3: Backward Compatibility (TODO)
Keep old dataclasses for 1-2 versions with deprecation warnings:
```python
class ListObjectsResponse:
"""DEPRECATED: Use dict responses with ListObjectsV2Response type hint.
This will be removed in v6.0.0. Update your code:
Before:
response.contents[0].key
After:
response['Contents'][0]['Key']
"""
def __init__(self, data: dict):
warnings.warn(
"ListObjectsResponse dataclass is deprecated. "
"Use dict responses with ListObjectsV2Response type hint.",
DeprecationWarning,
stacklevel=2
)
self._data = data
@property
def contents(self):
return [ObjectInfo(obj) for obj in self._data.get('Contents', [])]
```
### Phase 4: Update Documentation (TODO)
1. Update all examples to use dict responses
2. Add migration guide from v4.x to v5.0
3. Update BOTO3_COMPATIBILITY.md
4. Add "Drop-in Replacement" marketing language
### Phase 5: Update Tests (TODO)
Convert all tests from:
```python
assert response.contents[0].key == "file.zip"
```
To:
```python
assert response['Contents'][0]['Key'] == "file.zip"
```
## Migration Guide (for users)
### v4.x → v5.0
**Old code (v4.x):**
```python
from deltaglider import create_client
client = create_client()
response = client.list_objects(Bucket='my-bucket')
for obj in response.contents: # Dataclass attribute
print(f"{obj.key}: {obj.size}") # Dataclass attributes
```
**New code (v5.0):**
```python
from deltaglider import create_client, ListObjectsV2Response
client = create_client()
response: ListObjectsV2Response = client.list_objects(Bucket='my-bucket')
for obj in response['Contents']: # Dict key (boto3-compatible)
print(f"{obj['Key']}: {obj['Size']}") # Dict keys (boto3-compatible)
```
**Or even simpler - no type hint needed:**
```python
client = create_client()
response = client.list_objects(Bucket='my-bucket')
for obj in response['Contents']:
print(f"{obj['Key']}: {obj['Size']}")
```
## Benefits Summary
### For Users
- **Zero learning curve** - if you know boto3, you're done
- **Drop-in replacement** - literally change one line (client creation)
- **Type safety** - TypedDict provides autocomplete without boto3 dependency
- **Tool compatibility** - works with all boto3-compatible libraries
### For DeltaGlider
- **Simpler codebase** - no custom dataclasses to maintain
- **Better marketing** - true "drop-in replacement" claim
- **Easier testing** - test against boto3 behavior directly
- **Future-proof** - if boto3 adds fields, users can access them immediately
## Technical Details
### How TypedDict Works
```python
from typing import TypedDict
class MyResponse(TypedDict):
Key: str
Size: int
# At runtime, this is just a dict!
response: MyResponse = {'Key': 'file.zip', 'Size': 1024}
print(type(response)) # <class 'dict'>
# But mypy and IDEs understand the structure
response['Key'] # ✅ Autocomplete works!
response['Nonexistent'] # ❌ Mypy error: Key 'Nonexistent' not found
```
### DeltaGlider-Specific Metadata
Store in standard boto3 `Metadata` field:
```python
{
'Key': 'file.zip',
'Size': 1024,
'Metadata': {
# DeltaGlider-specific fields (prefixed for safety)
'deltaglider-is-delta': 'true',
'deltaglider-compression-ratio': '0.99',
'deltaglider-original-size': '100000',
'deltaglider-reference-key': 'releases/v1.0.0/reference.bin',
}
}
```
This is:
- ✅ boto3-compatible (Metadata is a standard field)
- ✅ Namespaced (deltaglider- prefix prevents conflicts)
- ✅ Optional (tools can ignore it)
- ✅ Type-safe (Metadata: NotRequired[dict[str, str]])
## Status
-**Phase 1:** TypedDict definitions created
-**Phase 2:** `list_objects()` refactored to return boto3-compatible dict
-**Phase 3:** Refactor remaining methods (`put_object`, `get_object`, etc.) (TODO)
-**Phase 4:** Backward compatibility with deprecation warnings (TODO)
-**Phase 5:** Documentation updates (TODO)
-**Phase 6:** Full test coverage updates (PARTIAL - list_objects tests done)
**Current:** v4.2.3+ (Phase 2 complete - `list_objects()` boto3-compatible)
**Target:** v5.0.0 release (all phases complete)

View File

@@ -38,10 +38,21 @@ response = client.get_object(Bucket='releases', Key='v1.0.0/app.zip')
# Optimized list_objects with smart performance defaults (NEW!)
# Fast by default - no unnecessary metadata fetching
response = client.list_objects(Bucket='releases', Prefix='v1.0.0/')
for obj in response['Contents']:
print(f"{obj['Key']}: {obj['Size']} bytes")
# Pagination for large buckets
response = client.list_objects(Bucket='releases', MaxKeys=100,
ContinuationToken=response.next_continuation_token)
response = client.list_objects(Bucket='releases', MaxKeys=100)
while response.get('IsTruncated'):
# Process current page
for obj in response['Contents']:
print(obj['Key'])
# Get next page
response = client.list_objects(
Bucket='releases',
MaxKeys=100,
ContinuationToken=response.get('NextContinuationToken')
)
# Get detailed compression stats only when needed
response = client.list_objects(Bucket='releases', FetchMetadata=True) # Slower but detailed

View File

@@ -94,7 +94,7 @@ def list_objects(
StartAfter: Optional[str] = None,
FetchMetadata: bool = False,
**kwargs
) -> ListObjectsResponse
) -> dict[str, Any]
```
##### Parameters
@@ -117,19 +117,32 @@ The method intelligently optimizes performance by:
2. Only fetching metadata for delta files when explicitly requested
3. Supporting efficient pagination for large buckets
##### Returns
boto3-compatible dict with:
- **Contents** (`list[dict]`): List of S3Object dicts with Key, Size, LastModified, Metadata
- **CommonPrefixes** (`list[dict]`): Optional list of common prefixes (folders)
- **IsTruncated** (`bool`): Whether more results are available
- **NextContinuationToken** (`str`): Token for next page
- **KeyCount** (`int`): Number of keys returned
##### Examples
```python
# Fast listing for UI display (no metadata fetching)
response = client.list_objects(Bucket='releases')
for obj in response['Contents']:
print(f"{obj['Key']}: {obj['Size']} bytes")
# Paginated listing for large buckets
response = client.list_objects(Bucket='releases', MaxKeys=100)
while response.is_truncated:
while response.get('IsTruncated'):
for obj in response['Contents']:
print(obj['Key'])
response = client.list_objects(
Bucket='releases',
MaxKeys=100,
ContinuationToken=response.next_continuation_token
ContinuationToken=response.get('NextContinuationToken')
)
# Get detailed compression stats (slower, only for analytics)
@@ -137,6 +150,11 @@ response = client.list_objects(
Bucket='releases',
FetchMetadata=True # Only fetches for delta files
)
for obj in response['Contents']:
metadata = obj.get('Metadata', {})
if metadata.get('deltaglider-is-delta') == 'true':
compression = metadata.get('deltaglider-compression-ratio', 'unknown')
print(f"{obj['Key']}: {compression} compression")
```
#### `get_bucket_stats`

View File

@@ -0,0 +1,64 @@
"""Example: Using boto3-compatible responses without importing boto3.
This demonstrates how DeltaGlider provides full type safety and boto3 compatibility
without requiring boto3 imports in user code.
As of v5.0.0, DeltaGlider returns plain dicts (not custom dataclasses) that are
100% compatible with boto3 S3 responses. You get IDE autocomplete through TypedDict
type hints without any runtime overhead.
"""
from deltaglider import ListObjectsV2Response, S3Object, create_client
# Create client (no boto3 import needed!)
client = create_client()
# Type hints work perfectly without boto3
def process_files(bucket: str, prefix: str) -> None:
"""Process files in S3 with full type safety."""
# Return type is fully typed - IDE autocomplete works!
response: ListObjectsV2Response = client.list_objects(
Bucket=bucket, Prefix=prefix, Delimiter="/"
)
# Response is a plain dict - 100% boto3-compatible
# TypedDict provides autocomplete and type checking
for obj in response["Contents"]:
# obj is typed as S3Object - all fields have autocomplete!
key: str = obj["Key"] # ✅ IDE knows this is str
size: int = obj["Size"] # ✅ IDE knows this is int
print(f"{key}: {size} bytes")
# DeltaGlider metadata is in the standard Metadata field
metadata = obj.get("Metadata", {})
if metadata.get("deltaglider-is-delta") == "true":
compression = metadata.get("deltaglider-compression-ratio", "unknown")
print(f" └─ Delta file (compression: {compression})")
# Optional fields work too
for prefix_dict in response.get("CommonPrefixes", []):
print(f"Directory: {prefix_dict['Prefix']}")
# Pagination info
if response.get("IsTruncated"):
next_token = response.get("NextContinuationToken")
print(f"More results available, token: {next_token}")
# This is 100% compatible with boto3 code!
def works_with_boto3_or_deltaglider(s3_client) -> None:
"""This function works with EITHER boto3 or DeltaGlider client."""
# Because the response structure is identical!
response = s3_client.list_objects(Bucket="my-bucket")
for obj in response["Contents"]:
print(obj["Key"])
if __name__ == "__main__":
# Example usage
print("✅ Full type safety without boto3 imports!")
print("✅ 100% compatible with boto3")
print("✅ Drop-in replacement")
print("✅ Plain dict responses (not custom dataclasses)")
print("✅ DeltaGlider metadata in standard Metadata field")

View File

@@ -17,12 +17,26 @@ from .client_models import (
)
from .core import DeltaService, DeltaSpace, ObjectKey
# Import boto3-compatible type aliases (no boto3 import required!)
from .types import (
CopyObjectResponse,
CreateBucketResponse,
DeleteObjectResponse,
DeleteObjectsResponse,
GetObjectResponse,
HeadObjectResponse,
ListBucketsResponse,
ListObjectsV2Response,
PutObjectResponse,
S3Object,
)
__all__ = [
"__version__",
# Client
"DeltaGliderClient",
"create_client",
# Data classes
# Data classes (legacy - will be deprecated in favor of TypedDict)
"UploadSummary",
"CompressionEstimate",
"ObjectInfo",
@@ -32,4 +46,15 @@ __all__ = [
"DeltaService",
"DeltaSpace",
"ObjectKey",
# boto3-compatible types (no boto3 import needed!)
"ListObjectsV2Response",
"PutObjectResponse",
"GetObjectResponse",
"DeleteObjectResponse",
"DeleteObjectsResponse",
"HeadObjectResponse",
"ListBucketsResponse",
"CreateBucketResponse",
"CopyObjectResponse",
"S3Object",
]

View File

@@ -259,23 +259,26 @@ def ls(
return f"{size_float:.1f}P"
# List objects using SDK (automatically filters .delta and reference.bin)
from deltaglider.client import DeltaGliderClient, ListObjectsResponse
from deltaglider.client import DeltaGliderClient
client = DeltaGliderClient(service)
dg_response: ListObjectsResponse = client.list_objects(
Bucket=bucket_name, Prefix=prefix_str, MaxKeys=10000, Delimiter="/" if not recursive else ""
dg_response = client.list_objects(
Bucket=bucket_name,
Prefix=prefix_str,
MaxKeys=10000,
Delimiter="/" if not recursive else "",
)
objects = dg_response.contents
objects = dg_response["Contents"]
# Filter by recursive flag
if not recursive:
# Show common prefixes (subdirectories) from S3 response
for common_prefix in dg_response.common_prefixes:
for common_prefix in dg_response.get("CommonPrefixes", []):
prefix_path = common_prefix.get("Prefix", "")
# Show only the directory name, not the full path
if prefix_str:
# Strip the current prefix to show only the subdirectory
display_name = prefix_path[len(prefix_str):]
display_name = prefix_path[len(prefix_str) :]
else:
display_name = prefix_path
click.echo(f" PRE {display_name}")
@@ -283,7 +286,8 @@ def ls(
# Only show files at current level (not in subdirectories)
filtered_objects = []
for obj in objects:
rel_path = obj.key[len(prefix_str):] if prefix_str else obj.key
obj_key = obj["Key"]
rel_path = obj_key[len(prefix_str) :] if prefix_str else obj_key
# Only include if it's a direct child (no / in relative path)
if "/" not in rel_path and rel_path:
filtered_objects.append(obj)
@@ -294,23 +298,24 @@ def ls(
total_count = 0
for obj in objects:
total_size += obj.size
total_size += obj["Size"]
total_count += 1
# Format the display
size_str = format_bytes(obj.size)
size_str = format_bytes(obj["Size"])
# last_modified is a string from SDK, parse it if needed
if isinstance(obj.last_modified, str):
last_modified = obj.get("LastModified", "")
if isinstance(last_modified, str):
# Already a string, extract date portion
date_str = obj.last_modified[:19].replace("T", " ")
date_str = last_modified[:19].replace("T", " ")
else:
date_str = obj.last_modified.strftime("%Y-%m-%d %H:%M:%S")
date_str = last_modified.strftime("%Y-%m-%d %H:%M:%S")
# Show only the filename relative to current prefix (like AWS CLI)
if prefix_str:
display_key = obj.key[len(prefix_str):]
display_key = obj["Key"][len(prefix_str) :]
else:
display_key = obj.key
display_key = obj["Key"]
click.echo(f"{date_str} {size_str:>10} {display_key}")

View File

@@ -1,5 +1,6 @@
"""DeltaGlider client with boto3-compatible APIs and advanced features."""
# ruff: noqa: I001
import tempfile
from collections.abc import Callable
from pathlib import Path
@@ -10,12 +11,36 @@ from .client_delete_helpers import delete_with_delta_suffix
from .client_models import (
BucketStats,
CompressionEstimate,
ListObjectsResponse,
ObjectInfo,
UploadSummary,
)
# fmt: off - Keep all client_operations imports together
from .client_operations import (
create_bucket as _create_bucket,
delete_bucket as _delete_bucket,
download_batch as _download_batch,
estimate_compression as _estimate_compression,
find_similar_files as _find_similar_files,
generate_presigned_post as _generate_presigned_post,
generate_presigned_url as _generate_presigned_url,
get_bucket_stats as _get_bucket_stats,
get_object_info as _get_object_info,
list_buckets as _list_buckets,
upload_batch as _upload_batch,
upload_chunked as _upload_chunked,
)
# fmt: on
from .core import DeltaService, DeltaSpace, ObjectKey
from .core.errors import NotFoundError
from .response_builders import (
build_delete_response,
build_get_response,
build_list_objects_response,
build_put_response,
)
from .types import CommonPrefix, S3Object
class DeltaGliderClient:
@@ -123,21 +148,33 @@ class DeltaGliderClient:
# Calculate ETag from file content
sha256_hash = self.service.hasher.sha256(tmp_path)
# Return boto3-compatible response with delta info
return {
"ETag": f'"{sha256_hash}"',
"ResponseMetadata": {
"HTTPStatusCode": 200,
},
"DeltaGlider": {
"original_size": summary.file_size,
"stored_size": summary.delta_size or summary.file_size,
"is_delta": summary.delta_size is not None,
"compression_ratio": summary.delta_ratio or 1.0,
"stored_as": summary.key,
"operation": summary.operation,
},
# Build DeltaGlider compression info
deltaglider_info: dict[str, Any] = {
"OriginalSizeMB": summary.file_size / (1024 * 1024),
"StoredSizeMB": (summary.delta_size or summary.file_size) / (1024 * 1024),
"IsDelta": summary.delta_size is not None,
"CompressionRatio": summary.delta_ratio or 1.0,
"SavingsPercent": (
(
(summary.file_size - (summary.delta_size or summary.file_size))
/ summary.file_size
* 100
)
if summary.file_size > 0
else 0.0
),
"StoredAs": summary.key,
"Operation": summary.operation,
}
# Return as dict[str, Any] for public API (TypedDict is a dict at runtime!)
return cast(
dict[str, Any],
build_put_response(
etag=f'"{sha256_hash}"',
deltaglider_info=deltaglider_info,
),
)
finally:
# Clean up temp file
if tmp_path.exists():
@@ -173,19 +210,19 @@ class DeltaGliderClient:
# Get metadata
obj_head = self.service.storage.head(f"{Bucket}/{Key}")
file_size = tmp_path.stat().st_size
etag = f'"{self.service.hasher.sha256(tmp_path)}"'
return {
"Body": body, # File-like object
"ContentLength": tmp_path.stat().st_size,
"ContentType": obj_head.metadata.get("content_type", "binary/octet-stream")
if obj_head
else "binary/octet-stream",
"ETag": f'"{self.service.hasher.sha256(tmp_path)}"',
"Metadata": obj_head.metadata if obj_head else {},
"ResponseMetadata": {
"HTTPStatusCode": 200,
},
}
# Return as dict[str, Any] for public API (TypedDict is a dict at runtime!)
return cast(
dict[str, Any],
build_get_response(
body=body, # type: ignore[arg-type] # File object is compatible with bytes
content_length=file_size,
etag=etag,
metadata=obj_head.metadata if obj_head else {},
),
)
def list_objects(
self,
@@ -197,7 +234,7 @@ class DeltaGliderClient:
StartAfter: str | None = None,
FetchMetadata: bool = False,
**kwargs: Any,
) -> ListObjectsResponse:
) -> dict[str, Any]:
"""List objects in bucket with smart metadata fetching.
This method optimizes performance by:
@@ -227,11 +264,11 @@ class DeltaGliderClient:
# Fast listing for UI display (no metadata)
response = client.list_objects(Bucket='releases', MaxKeys=100)
# Paginated listing
# Paginated listing (boto3-compatible dict response)
response = client.list_objects(
Bucket='releases',
MaxKeys=50,
ContinuationToken=response.next_continuation_token
ContinuationToken=response.get('NextContinuationToken')
)
# Detailed listing with compression stats (slower, only for analytics)
@@ -265,8 +302,8 @@ class DeltaGliderClient:
"is_truncated": False,
}
# Convert to ObjectInfo objects with smart metadata fetching
contents = []
# Convert to boto3-compatible S3Object TypedDicts (type-safe!)
contents: list[S3Object] = []
for obj in result.get("objects", []):
# Skip reference.bin files (internal files, never exposed to users)
if obj["key"].endswith("/reference.bin") or obj["key"] == "reference.bin":
@@ -280,20 +317,12 @@ class DeltaGliderClient:
if is_delta:
display_key = display_key[:-6] # Remove .delta suffix
# Create object info with basic data (no HEAD request)
info = ObjectInfo(
key=display_key, # Use cleaned key without .delta
size=obj["size"],
last_modified=obj.get("last_modified", ""),
etag=obj.get("etag"),
storage_class=obj.get("storage_class", "STANDARD"),
# DeltaGlider fields
original_size=obj["size"], # For non-delta, original = stored
compressed_size=obj["size"],
is_delta=is_delta,
compression_ratio=0.0 if not is_delta else None,
reference_key=None,
)
# Build DeltaGlider metadata
deltaglider_metadata: dict[str, str] = {
"deltaglider-is-delta": str(is_delta).lower(),
"deltaglider-original-size": str(obj["size"]),
"deltaglider-compression-ratio": "0.0" if not is_delta else "unknown",
}
# SMART METADATA FETCHING:
# 1. NEVER fetch metadata for non-delta files (no point)
@@ -304,30 +333,52 @@ class DeltaGliderClient:
if obj_head and obj_head.metadata:
metadata = obj_head.metadata
# Update with actual compression stats
info.original_size = int(metadata.get("file_size", obj["size"]))
info.compression_ratio = float(metadata.get("compression_ratio", 0.0))
info.reference_key = metadata.get("ref_key")
original_size = int(metadata.get("file_size", obj["size"]))
compression_ratio = float(metadata.get("compression_ratio", 0.0))
reference_key = metadata.get("ref_key")
deltaglider_metadata["deltaglider-original-size"] = str(original_size)
deltaglider_metadata["deltaglider-compression-ratio"] = str(
compression_ratio
)
if reference_key:
deltaglider_metadata["deltaglider-reference-key"] = reference_key
except Exception as e:
# Log but don't fail the listing
self.service.logger.debug(f"Failed to fetch metadata for {obj['key']}: {e}")
contents.append(info)
# Create boto3-compatible S3Object TypedDict - mypy validates structure!
s3_obj: S3Object = {
"Key": display_key, # Use cleaned key without .delta
"Size": obj["size"],
"LastModified": obj.get("last_modified", ""),
"ETag": obj.get("etag"),
"StorageClass": obj.get("storage_class", "STANDARD"),
"Metadata": deltaglider_metadata,
}
contents.append(s3_obj)
# Build response with pagination support
response = ListObjectsResponse(
name=Bucket,
prefix=Prefix,
delimiter=Delimiter,
max_keys=MaxKeys,
contents=contents,
common_prefixes=[{"Prefix": p} for p in result.get("common_prefixes", [])],
is_truncated=result.get("is_truncated", False),
next_continuation_token=result.get("next_continuation_token"),
continuation_token=ContinuationToken,
key_count=len(contents),
# Build type-safe boto3-compatible CommonPrefix TypedDicts
common_prefixes = result.get("common_prefixes", [])
common_prefix_dicts: list[CommonPrefix] | None = (
[CommonPrefix(Prefix=p) for p in common_prefixes] if common_prefixes else None
)
return response
# Return as dict[str, Any] for public API (TypedDict is a dict at runtime!)
return cast(
dict[str, Any],
build_list_objects_response(
bucket=Bucket,
prefix=Prefix,
delimiter=Delimiter,
max_keys=MaxKeys,
contents=contents,
common_prefixes=common_prefix_dicts,
is_truncated=result.get("is_truncated", False),
next_continuation_token=result.get("next_continuation_token"),
continuation_token=ContinuationToken,
),
)
def delete_object(
self,
@@ -347,32 +398,31 @@ class DeltaGliderClient:
"""
_, delete_result = delete_with_delta_suffix(self.service, Bucket, Key)
response = {
"DeleteMarker": False,
"ResponseMetadata": {
"HTTPStatusCode": 204,
},
"DeltaGliderInfo": {
"Type": delete_result.get("type"),
"Deleted": delete_result.get("deleted", False),
},
# Build DeltaGlider-specific info
deltaglider_info: dict[str, Any] = {
"Type": delete_result.get("type"),
"Deleted": delete_result.get("deleted", False),
}
# Add warnings if any
warnings = delete_result.get("warnings")
if warnings:
delta_info = response.get("DeltaGliderInfo")
if delta_info and isinstance(delta_info, dict):
delta_info["Warnings"] = warnings
deltaglider_info["Warnings"] = warnings
# Add dependent delta count for references
dependent_deltas = delete_result.get("dependent_deltas")
if dependent_deltas:
delta_info = response.get("DeltaGliderInfo")
if delta_info and isinstance(delta_info, dict):
delta_info["DependentDeltas"] = dependent_deltas
deltaglider_info["DependentDeltas"] = dependent_deltas
return response
# Return as dict[str, Any] for public API (TypedDict is a dict at runtime!)
return cast(
dict[str, Any],
build_delete_response(
delete_marker=False,
status_code=204,
deltaglider_info=deltaglider_info,
),
)
def delete_objects(
self,
@@ -760,40 +810,9 @@ class DeltaGliderClient:
progress_callback=on_progress
)
"""
file_path = Path(file_path)
file_size = file_path.stat().st_size
# For small files, just use regular upload
if file_size <= chunk_size:
if progress_callback:
progress_callback(1, 1, file_size, file_size)
return self.upload(file_path, s3_url, max_ratio=max_ratio)
# Calculate chunks
total_chunks = (file_size + chunk_size - 1) // chunk_size
# Create a temporary file for chunked processing
# For now, we read the entire file but report progress in chunks
# Future enhancement: implement true streaming upload in storage adapter
bytes_read = 0
with open(file_path, "rb") as f:
for chunk_num in range(1, total_chunks + 1):
# Read chunk (simulated for progress reporting)
chunk_data = f.read(chunk_size)
bytes_read += len(chunk_data)
if progress_callback:
progress_callback(chunk_num, total_chunks, bytes_read, file_size)
# Perform the actual upload
# TODO: When storage adapter supports streaming, pass chunks directly
result = self.upload(file_path, s3_url, max_ratio=max_ratio)
# Final progress callback
if progress_callback:
progress_callback(total_chunks, total_chunks, file_size, file_size)
result: UploadSummary = _upload_chunked(
self, file_path, s3_url, chunk_size, progress_callback, max_ratio
)
return result
def upload_batch(
@@ -814,20 +833,7 @@ class DeltaGliderClient:
Returns:
List of UploadSummary objects
"""
results = []
for i, file_path in enumerate(files):
file_path = Path(file_path)
if progress_callback:
progress_callback(file_path.name, i + 1, len(files))
# Upload each file
s3_url = f"{s3_prefix.rstrip('/')}/{file_path.name}"
summary = self.upload(file_path, s3_url, max_ratio=max_ratio)
results.append(summary)
return results
return _upload_batch(self, files, s3_prefix, max_ratio, progress_callback)
def download_batch(
self,
@@ -845,24 +851,7 @@ class DeltaGliderClient:
Returns:
List of downloaded file paths
"""
output_dir = Path(output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
results = []
for i, s3_url in enumerate(s3_urls):
# Extract filename from URL
filename = s3_url.split("/")[-1]
if filename.endswith(".delta"):
filename = filename[:-6] # Remove .delta suffix
if progress_callback:
progress_callback(filename, i + 1, len(s3_urls))
output_path = output_dir / filename
self.download(s3_url, output_path)
results.append(output_path)
return results
return _download_batch(self, s3_urls, output_dir, progress_callback)
def estimate_compression(
self,
@@ -882,80 +871,10 @@ class DeltaGliderClient:
Returns:
CompressionEstimate with predicted compression
"""
file_path = Path(file_path)
file_size = file_path.stat().st_size
# Check file extension
ext = file_path.suffix.lower()
delta_extensions = {
".zip",
".tar",
".gz",
".tar.gz",
".tgz",
".bz2",
".tar.bz2",
".xz",
".tar.xz",
".7z",
".rar",
".dmg",
".iso",
".pkg",
".deb",
".rpm",
".apk",
".jar",
".war",
".ear",
}
# Already compressed formats that won't benefit from delta
incompressible = {".jpg", ".jpeg", ".png", ".mp4", ".mp3", ".avi", ".mov"}
if ext in incompressible:
return CompressionEstimate(
original_size=file_size,
estimated_compressed_size=file_size,
estimated_ratio=0.0,
confidence=0.95,
should_use_delta=False,
)
if ext not in delta_extensions:
# Unknown type, conservative estimate
return CompressionEstimate(
original_size=file_size,
estimated_compressed_size=file_size,
estimated_ratio=0.0,
confidence=0.5,
should_use_delta=file_size > 1024 * 1024, # Only for files > 1MB
)
# Look for similar files in the target location
similar_files = self.find_similar_files(bucket, prefix, file_path.name)
if similar_files:
# If we have similar files, estimate high compression
estimated_ratio = 0.99 # 99% compression typical for similar versions
confidence = 0.9
recommended_ref = similar_files[0]["Key"] if similar_files else None
else:
# First file of its type
estimated_ratio = 0.0
confidence = 0.7
recommended_ref = None
estimated_size = int(file_size * (1 - estimated_ratio))
return CompressionEstimate(
original_size=file_size,
estimated_compressed_size=estimated_size,
estimated_ratio=estimated_ratio,
confidence=confidence,
recommended_reference=recommended_ref,
should_use_delta=True,
result: CompressionEstimate = _estimate_compression(
self, file_path, bucket, prefix, sample_size
)
return result
def find_similar_files(
self,
@@ -975,56 +894,7 @@ class DeltaGliderClient:
Returns:
List of similar files with scores
"""
# List objects in the prefix (no metadata needed for similarity check)
response = self.list_objects(
Bucket=bucket,
Prefix=prefix,
MaxKeys=1000,
FetchMetadata=False, # Don't need metadata for similarity
)
similar: list[dict[str, Any]] = []
base_name = Path(filename).stem
ext = Path(filename).suffix
for obj in response.contents:
obj_base = Path(obj.key).stem
obj_ext = Path(obj.key).suffix
# Skip delta files and references
if obj.key.endswith(".delta") or obj.key.endswith("reference.bin"):
continue
score = 0.0
# Extension match
if ext == obj_ext:
score += 0.5
# Base name similarity
if base_name in obj_base or obj_base in base_name:
score += 0.3
# Version pattern match
import re
if re.search(r"v?\d+[\.\d]*", base_name) and re.search(r"v?\d+[\.\d]*", obj_base):
score += 0.2
if score > 0.5:
similar.append(
{
"Key": obj.key,
"Size": obj.size,
"Similarity": score,
"LastModified": obj.last_modified,
}
)
# Sort by similarity
similar.sort(key=lambda x: x["Similarity"], reverse=True) # type: ignore
return similar[:limit]
return _find_similar_files(self, bucket, prefix, filename, limit)
def get_object_info(self, s3_url: str) -> ObjectInfo:
"""Get detailed object information including compression stats.
@@ -1035,34 +905,8 @@ class DeltaGliderClient:
Returns:
ObjectInfo with detailed metadata
"""
# Parse URL
if not s3_url.startswith("s3://"):
raise ValueError(f"Invalid S3 URL: {s3_url}")
s3_path = s3_url[5:]
parts = s3_path.split("/", 1)
bucket = parts[0]
key = parts[1] if len(parts) > 1 else ""
# Get object metadata
obj_head = self.service.storage.head(f"{bucket}/{key}")
if not obj_head:
raise FileNotFoundError(f"Object not found: {s3_url}")
metadata = obj_head.metadata
is_delta = key.endswith(".delta")
return ObjectInfo(
key=key,
size=obj_head.size,
last_modified=metadata.get("last_modified", ""),
etag=metadata.get("etag"),
original_size=int(metadata.get("file_size", obj_head.size)),
compressed_size=obj_head.size,
compression_ratio=float(metadata.get("compression_ratio", 0.0)),
is_delta=is_delta,
reference_key=metadata.get("ref_key"),
)
result: ObjectInfo = _get_object_info(self, s3_url)
return result
def get_bucket_stats(self, bucket: str, detailed_stats: bool = False) -> BucketStats:
"""Get statistics for a bucket with optional detailed compression metrics.
@@ -1091,76 +935,8 @@ class DeltaGliderClient:
stats = client.get_bucket_stats('releases', detailed_stats=True)
print(f"Compression ratio: {stats.average_compression_ratio:.1%}")
"""
# List all objects with smart metadata fetching
all_objects = []
continuation_token = None
while True:
response = self.list_objects(
Bucket=bucket,
MaxKeys=1000,
ContinuationToken=continuation_token,
FetchMetadata=detailed_stats, # Only fetch metadata if detailed stats requested
)
all_objects.extend(response.contents)
if not response.is_truncated:
break
continuation_token = response.next_continuation_token
# Calculate statistics
total_size = 0
compressed_size = 0
delta_count = 0
direct_count = 0
for obj in all_objects:
compressed_size += obj.size
if obj.is_delta:
delta_count += 1
# Use actual original size if we have it, otherwise estimate
total_size += obj.original_size or obj.size
else:
direct_count += 1
# For non-delta files, original equals compressed
total_size += obj.size
space_saved = total_size - compressed_size
avg_ratio = (space_saved / total_size) if total_size > 0 else 0.0
return BucketStats(
bucket=bucket,
object_count=len(all_objects),
total_size=total_size,
compressed_size=compressed_size,
space_saved=space_saved,
average_compression_ratio=avg_ratio,
delta_objects=delta_count,
direct_objects=direct_count,
)
def _try_boto3_presigned_operation(self, operation: str, **kwargs: Any) -> Any | None:
"""Try to generate presigned operation using boto3 client, return None if not available."""
storage_adapter = self.service.storage
# Check if storage adapter has boto3 client
if hasattr(storage_adapter, "client"):
try:
if operation == "url":
return str(storage_adapter.client.generate_presigned_url(**kwargs))
elif operation == "post":
return dict(storage_adapter.client.generate_presigned_post(**kwargs))
except AttributeError:
# storage_adapter does not have a 'client' attribute
pass
except Exception as e:
# Fall back to manual construction if needed
self.service.logger.warning(f"Failed to generate presigned {operation}: {e}")
return None
result: BucketStats = _get_bucket_stats(self, bucket, detailed_stats)
return result
def generate_presigned_url(
self,
@@ -1178,28 +954,7 @@ class DeltaGliderClient:
Returns:
Presigned URL string
"""
# Try boto3 first, fallback to manual construction
url = self._try_boto3_presigned_operation(
"url",
ClientMethod=ClientMethod,
Params=Params,
ExpiresIn=ExpiresIn,
)
if url is not None:
return str(url)
# Fallback: construct URL manually (less secure, for dev/testing only)
bucket = Params.get("Bucket", "")
key = Params.get("Key", "")
if self.endpoint_url:
base_url = self.endpoint_url
else:
base_url = f"https://{bucket}.s3.amazonaws.com"
# Warning: This is not a real presigned URL, just a placeholder
self.service.logger.warning("Using placeholder presigned URL - not suitable for production")
return f"{base_url}/{key}?expires={ExpiresIn}"
return _generate_presigned_url(self, ClientMethod, Params, ExpiresIn)
def generate_presigned_post(
self,
@@ -1221,31 +976,7 @@ class DeltaGliderClient:
Returns:
Dict with 'url' and 'fields' for form submission
"""
# Try boto3 first, fallback to manual construction
response = self._try_boto3_presigned_operation(
"post",
Bucket=Bucket,
Key=Key,
Fields=Fields,
Conditions=Conditions,
ExpiresIn=ExpiresIn,
)
if response is not None:
return dict(response)
# Fallback: return minimal structure for compatibility
if self.endpoint_url:
url = f"{self.endpoint_url}/{Bucket}"
else:
url = f"https://{Bucket}.s3.amazonaws.com"
return {
"url": url,
"fields": {
"key": Key,
**(Fields or {}),
},
}
return _generate_presigned_post(self, Bucket, Key, Fields, Conditions, ExpiresIn)
# ============================================================================
# Bucket Management APIs (boto3-compatible)
@@ -1276,36 +1007,7 @@ class DeltaGliderClient:
... CreateBucketConfiguration={'LocationConstraint': 'us-west-2'}
... )
"""
storage_adapter = self.service.storage
# Check if storage adapter has boto3 client
if hasattr(storage_adapter, "client"):
try:
params: dict[str, Any] = {"Bucket": Bucket}
if CreateBucketConfiguration:
params["CreateBucketConfiguration"] = CreateBucketConfiguration
response = storage_adapter.client.create_bucket(**params)
return {
"Location": response.get("Location", f"/{Bucket}"),
"ResponseMetadata": {
"HTTPStatusCode": 200,
},
}
except Exception as e:
error_msg = str(e)
if "BucketAlreadyExists" in error_msg or "BucketAlreadyOwnedByYou" in error_msg:
# Bucket already exists - return success
self.service.logger.debug(f"Bucket {Bucket} already exists")
return {
"Location": f"/{Bucket}",
"ResponseMetadata": {
"HTTPStatusCode": 200,
},
}
raise RuntimeError(f"Failed to create bucket: {e}") from e
else:
raise NotImplementedError("Storage adapter does not support bucket creation")
return _create_bucket(self, Bucket, CreateBucketConfiguration, **kwargs)
def delete_bucket(
self,
@@ -1327,30 +1029,7 @@ class DeltaGliderClient:
>>> client = create_client()
>>> client.delete_bucket(Bucket='my-bucket')
"""
storage_adapter = self.service.storage
# Check if storage adapter has boto3 client
if hasattr(storage_adapter, "client"):
try:
storage_adapter.client.delete_bucket(Bucket=Bucket)
return {
"ResponseMetadata": {
"HTTPStatusCode": 204,
},
}
except Exception as e:
error_msg = str(e)
if "NoSuchBucket" in error_msg:
# Bucket doesn't exist - return success
self.service.logger.debug(f"Bucket {Bucket} does not exist")
return {
"ResponseMetadata": {
"HTTPStatusCode": 204,
},
}
raise RuntimeError(f"Failed to delete bucket: {e}") from e
else:
raise NotImplementedError("Storage adapter does not support bucket deletion")
return _delete_bucket(self, Bucket, **kwargs)
def list_buckets(self, **kwargs: Any) -> dict[str, Any]:
"""List all S3 buckets (boto3-compatible).
@@ -1367,23 +1046,7 @@ class DeltaGliderClient:
>>> for bucket in response['Buckets']:
... print(bucket['Name'])
"""
storage_adapter = self.service.storage
# Check if storage adapter has boto3 client
if hasattr(storage_adapter, "client"):
try:
response = storage_adapter.client.list_buckets()
return {
"Buckets": response.get("Buckets", []),
"Owner": response.get("Owner", {}),
"ResponseMetadata": {
"HTTPStatusCode": 200,
},
}
except Exception as e:
raise RuntimeError(f"Failed to list buckets: {e}") from e
else:
raise NotImplementedError("Storage adapter does not support bucket listing")
return _list_buckets(self, **kwargs)
def _parse_tagging(self, tagging: str) -> dict[str, str]:
"""Parse URL-encoded tagging string to dict."""
@@ -1480,7 +1143,7 @@ def create_client(
metrics = NoopMetricsAdapter()
# Get default values
tool_version = kwargs.pop("tool_version", "deltaglider/0.2.0")
tool_version = kwargs.pop("tool_version", "deltaglider/5.0.0")
max_ratio = kwargs.pop("max_ratio", 0.5)
# Create service

View File

@@ -0,0 +1,37 @@
"""Client operation modules for DeltaGliderClient.
This package contains modular operation implementations:
- bucket: S3 bucket management (create, delete, list)
- presigned: Presigned URL generation for temporary access
- batch: Batch upload/download operations
- stats: Statistics and analytics operations
"""
from .batch import download_batch, upload_batch, upload_chunked
from .bucket import create_bucket, delete_bucket, list_buckets
from .presigned import generate_presigned_post, generate_presigned_url
from .stats import (
estimate_compression,
find_similar_files,
get_bucket_stats,
get_object_info,
)
__all__ = [
# Bucket operations
"create_bucket",
"delete_bucket",
"list_buckets",
# Presigned operations
"generate_presigned_url",
"generate_presigned_post",
# Batch operations
"upload_chunked",
"upload_batch",
"download_batch",
# Stats operations
"get_bucket_stats",
"get_object_info",
"estimate_compression",
"find_similar_files",
]

View File

@@ -0,0 +1,159 @@
"""Batch upload/download operations for DeltaGlider client.
This module contains DeltaGlider-specific batch operations:
- upload_batch
- download_batch
- upload_chunked
"""
from collections.abc import Callable
from pathlib import Path
from typing import Any
from ..client_models import UploadSummary
def upload_chunked(
client: Any, # DeltaGliderClient
file_path: str | Path,
s3_url: str,
chunk_size: int = 5 * 1024 * 1024,
progress_callback: Callable[[int, int, int, int], None] | None = None,
max_ratio: float = 0.5,
) -> UploadSummary:
"""Upload a file in chunks with progress callback.
This method reads the file in chunks to avoid loading large files entirely into memory,
making it suitable for uploading very large files. Progress is reported after each chunk.
Args:
client: DeltaGliderClient instance
file_path: Local file to upload
s3_url: S3 destination URL (s3://bucket/path/filename)
chunk_size: Size of each chunk in bytes (default 5MB)
progress_callback: Callback(chunk_number, total_chunks, bytes_sent, total_bytes)
max_ratio: Maximum acceptable delta/file ratio for compression
Returns:
UploadSummary with compression statistics
Example:
def on_progress(chunk_num, total_chunks, bytes_sent, total_bytes):
percent = (bytes_sent / total_bytes) * 100
print(f"Upload progress: {percent:.1f}%")
client.upload_chunked(
"large_file.zip",
"s3://bucket/releases/large_file.zip",
chunk_size=10 * 1024 * 1024, # 10MB chunks
progress_callback=on_progress
)
"""
file_path = Path(file_path)
file_size = file_path.stat().st_size
# For small files, just use regular upload
if file_size <= chunk_size:
if progress_callback:
progress_callback(1, 1, file_size, file_size)
result: UploadSummary = client.upload(file_path, s3_url, max_ratio=max_ratio)
return result
# Calculate chunks
total_chunks = (file_size + chunk_size - 1) // chunk_size
# Create a temporary file for chunked processing
# For now, we read the entire file but report progress in chunks
# Future enhancement: implement true streaming upload in storage adapter
bytes_read = 0
with open(file_path, "rb") as f:
for chunk_num in range(1, total_chunks + 1):
# Read chunk (simulated for progress reporting)
chunk_data = f.read(chunk_size)
bytes_read += len(chunk_data)
if progress_callback:
progress_callback(chunk_num, total_chunks, bytes_read, file_size)
# Perform the actual upload
# TODO: When storage adapter supports streaming, pass chunks directly
upload_result: UploadSummary = client.upload(file_path, s3_url, max_ratio=max_ratio)
# Final progress callback
if progress_callback:
progress_callback(total_chunks, total_chunks, file_size, file_size)
return upload_result
def upload_batch(
client: Any, # DeltaGliderClient
files: list[str | Path],
s3_prefix: str,
max_ratio: float = 0.5,
progress_callback: Callable[[str, int, int], None] | None = None,
) -> list[UploadSummary]:
"""Upload multiple files in batch.
Args:
client: DeltaGliderClient instance
files: List of local file paths
s3_prefix: S3 destination prefix (s3://bucket/prefix/)
max_ratio: Maximum acceptable delta/file ratio
progress_callback: Callback(filename, current_file_index, total_files)
Returns:
List of UploadSummary objects
"""
results = []
for i, file_path in enumerate(files):
file_path = Path(file_path)
if progress_callback:
progress_callback(file_path.name, i + 1, len(files))
# Upload each file
s3_url = f"{s3_prefix.rstrip('/')}/{file_path.name}"
summary = client.upload(file_path, s3_url, max_ratio=max_ratio)
results.append(summary)
return results
def download_batch(
client: Any, # DeltaGliderClient
s3_urls: list[str],
output_dir: str | Path,
progress_callback: Callable[[str, int, int], None] | None = None,
) -> list[Path]:
"""Download multiple files in batch.
Args:
client: DeltaGliderClient instance
s3_urls: List of S3 URLs to download
output_dir: Local directory to save files
progress_callback: Callback(filename, current_file_index, total_files)
Returns:
List of downloaded file paths
"""
output_dir = Path(output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
results = []
for i, s3_url in enumerate(s3_urls):
# Extract filename from URL
filename = s3_url.split("/")[-1]
if filename.endswith(".delta"):
filename = filename[:-6] # Remove .delta suffix
if progress_callback:
progress_callback(filename, i + 1, len(s3_urls))
output_path = output_dir / filename
client.download(s3_url, output_path)
results.append(output_path)
return results

View File

@@ -0,0 +1,152 @@
"""Bucket management operations for DeltaGlider client.
This module contains boto3-compatible bucket operations:
- create_bucket
- delete_bucket
- list_buckets
"""
from typing import Any
def create_bucket(
client: Any, # DeltaGliderClient (avoiding circular import)
Bucket: str,
CreateBucketConfiguration: dict[str, str] | None = None,
**kwargs: Any,
) -> dict[str, Any]:
"""Create an S3 bucket (boto3-compatible).
Args:
client: DeltaGliderClient instance
Bucket: Bucket name to create
CreateBucketConfiguration: Optional bucket configuration (e.g., LocationConstraint)
**kwargs: Additional S3 parameters (for compatibility)
Returns:
Response dict with bucket location
Example:
>>> client = create_client()
>>> client.create_bucket(Bucket='my-bucket')
>>> # With region
>>> client.create_bucket(
... Bucket='my-bucket',
... CreateBucketConfiguration={'LocationConstraint': 'us-west-2'}
... )
"""
storage_adapter = client.service.storage
# Check if storage adapter has boto3 client
if hasattr(storage_adapter, "client"):
try:
params: dict[str, Any] = {"Bucket": Bucket}
if CreateBucketConfiguration:
params["CreateBucketConfiguration"] = CreateBucketConfiguration
response = storage_adapter.client.create_bucket(**params)
return {
"Location": response.get("Location", f"/{Bucket}"),
"ResponseMetadata": {
"HTTPStatusCode": 200,
},
}
except Exception as e:
error_msg = str(e)
if "BucketAlreadyExists" in error_msg or "BucketAlreadyOwnedByYou" in error_msg:
# Bucket already exists - return success
client.service.logger.debug(f"Bucket {Bucket} already exists")
return {
"Location": f"/{Bucket}",
"ResponseMetadata": {
"HTTPStatusCode": 200,
},
}
raise RuntimeError(f"Failed to create bucket: {e}") from e
else:
raise NotImplementedError("Storage adapter does not support bucket creation")
def delete_bucket(
client: Any, # DeltaGliderClient
Bucket: str,
**kwargs: Any,
) -> dict[str, Any]:
"""Delete an S3 bucket (boto3-compatible).
Note: Bucket must be empty before deletion.
Args:
client: DeltaGliderClient instance
Bucket: Bucket name to delete
**kwargs: Additional S3 parameters (for compatibility)
Returns:
Response dict with deletion status
Example:
>>> client = create_client()
>>> client.delete_bucket(Bucket='my-bucket')
"""
storage_adapter = client.service.storage
# Check if storage adapter has boto3 client
if hasattr(storage_adapter, "client"):
try:
storage_adapter.client.delete_bucket(Bucket=Bucket)
return {
"ResponseMetadata": {
"HTTPStatusCode": 204,
},
}
except Exception as e:
error_msg = str(e)
if "NoSuchBucket" in error_msg:
# Bucket doesn't exist - return success
client.service.logger.debug(f"Bucket {Bucket} does not exist")
return {
"ResponseMetadata": {
"HTTPStatusCode": 204,
},
}
raise RuntimeError(f"Failed to delete bucket: {e}") from e
else:
raise NotImplementedError("Storage adapter does not support bucket deletion")
def list_buckets(
client: Any, # DeltaGliderClient
**kwargs: Any,
) -> dict[str, Any]:
"""List all S3 buckets (boto3-compatible).
Args:
client: DeltaGliderClient instance
**kwargs: Additional S3 parameters (for compatibility)
Returns:
Response dict with bucket list
Example:
>>> client = create_client()
>>> response = client.list_buckets()
>>> for bucket in response['Buckets']:
... print(bucket['Name'])
"""
storage_adapter = client.service.storage
# Check if storage adapter has boto3 client
if hasattr(storage_adapter, "client"):
try:
response = storage_adapter.client.list_buckets()
return {
"Buckets": response.get("Buckets", []),
"Owner": response.get("Owner", {}),
"ResponseMetadata": {
"HTTPStatusCode": 200,
},
}
except Exception as e:
raise RuntimeError(f"Failed to list buckets: {e}") from e
else:
raise NotImplementedError("Storage adapter does not support bucket listing")

View File

@@ -0,0 +1,124 @@
"""Presigned URL operations for DeltaGlider client.
This module contains boto3-compatible presigned URL operations:
- generate_presigned_url
- generate_presigned_post
"""
from typing import Any
def try_boto3_presigned_operation(
client: Any, # DeltaGliderClient
operation: str,
**kwargs: Any,
) -> Any | None:
"""Try to generate presigned operation using boto3 client, return None if not available."""
storage_adapter = client.service.storage
# Check if storage adapter has boto3 client
if hasattr(storage_adapter, "client"):
try:
if operation == "url":
return str(storage_adapter.client.generate_presigned_url(**kwargs))
elif operation == "post":
return dict(storage_adapter.client.generate_presigned_post(**kwargs))
except AttributeError:
# storage_adapter does not have a 'client' attribute
pass
except Exception as e:
# Fall back to manual construction if needed
client.service.logger.warning(f"Failed to generate presigned {operation}: {e}")
return None
def generate_presigned_url(
client: Any, # DeltaGliderClient
ClientMethod: str,
Params: dict[str, Any],
ExpiresIn: int = 3600,
) -> str:
"""Generate presigned URL (boto3-compatible).
Args:
client: DeltaGliderClient instance
ClientMethod: Method name ('get_object' or 'put_object')
Params: Parameters dict with Bucket and Key
ExpiresIn: URL expiration in seconds
Returns:
Presigned URL string
"""
# Try boto3 first, fallback to manual construction
url = try_boto3_presigned_operation(
client,
"url",
ClientMethod=ClientMethod,
Params=Params,
ExpiresIn=ExpiresIn,
)
if url is not None:
return str(url)
# Fallback: construct URL manually (less secure, for dev/testing only)
bucket = Params.get("Bucket", "")
key = Params.get("Key", "")
if client.endpoint_url:
base_url = client.endpoint_url
else:
base_url = f"https://{bucket}.s3.amazonaws.com"
# Warning: This is not a real presigned URL, just a placeholder
client.service.logger.warning("Using placeholder presigned URL - not suitable for production")
return f"{base_url}/{key}?expires={ExpiresIn}"
def generate_presigned_post(
client: Any, # DeltaGliderClient
Bucket: str,
Key: str,
Fields: dict[str, str] | None = None,
Conditions: list[Any] | None = None,
ExpiresIn: int = 3600,
) -> dict[str, Any]:
"""Generate presigned POST data for HTML forms (boto3-compatible).
Args:
client: DeltaGliderClient instance
Bucket: S3 bucket name
Key: Object key
Fields: Additional fields to include
Conditions: Upload conditions
ExpiresIn: URL expiration in seconds
Returns:
Dict with 'url' and 'fields' for form submission
"""
# Try boto3 first, fallback to manual construction
response = try_boto3_presigned_operation(
client,
"post",
Bucket=Bucket,
Key=Key,
Fields=Fields,
Conditions=Conditions,
ExpiresIn=ExpiresIn,
)
if response is not None:
return dict(response)
# Fallback: return minimal structure for compatibility
if client.endpoint_url:
url = f"{client.endpoint_url}/{Bucket}"
else:
url = f"https://{Bucket}.s3.amazonaws.com"
return {
"url": url,
"fields": {
"key": Key,
**(Fields or {}),
},
}

View File

@@ -0,0 +1,337 @@
"""Statistics and analysis operations for DeltaGlider client.
This module contains DeltaGlider-specific statistics operations:
- get_bucket_stats
- get_object_info
- estimate_compression
- find_similar_files
"""
import re
from pathlib import Path
from typing import Any
from ..client_models import BucketStats, CompressionEstimate, ObjectInfo
def get_object_info(
client: Any, # DeltaGliderClient
s3_url: str,
) -> ObjectInfo:
"""Get detailed object information including compression stats.
Args:
client: DeltaGliderClient instance
s3_url: S3 URL of the object
Returns:
ObjectInfo with detailed metadata
"""
# Parse URL
if not s3_url.startswith("s3://"):
raise ValueError(f"Invalid S3 URL: {s3_url}")
s3_path = s3_url[5:]
parts = s3_path.split("/", 1)
bucket = parts[0]
key = parts[1] if len(parts) > 1 else ""
# Get object metadata
obj_head = client.service.storage.head(f"{bucket}/{key}")
if not obj_head:
raise FileNotFoundError(f"Object not found: {s3_url}")
metadata = obj_head.metadata
is_delta = key.endswith(".delta")
return ObjectInfo(
key=key,
size=obj_head.size,
last_modified=metadata.get("last_modified", ""),
etag=metadata.get("etag"),
original_size=int(metadata.get("file_size", obj_head.size)),
compressed_size=obj_head.size,
compression_ratio=float(metadata.get("compression_ratio", 0.0)),
is_delta=is_delta,
reference_key=metadata.get("ref_key"),
)
def get_bucket_stats(
client: Any, # DeltaGliderClient
bucket: str,
detailed_stats: bool = False,
) -> BucketStats:
"""Get statistics for a bucket with optional detailed compression metrics.
This method provides two modes:
- Quick stats (default): Fast overview using LIST only (~50ms)
- Detailed stats: Accurate compression metrics with HEAD requests (slower)
Args:
client: DeltaGliderClient instance
bucket: S3 bucket name
detailed_stats: If True, fetch accurate compression ratios for delta files (default: False)
Returns:
BucketStats with compression and space savings info
Performance:
- With detailed_stats=False: ~50ms for any bucket size (1 LIST call per 1000 objects)
- With detailed_stats=True: ~2-3s per 1000 objects (adds HEAD calls for delta files only)
Example:
# Quick stats for dashboard display
stats = client.get_bucket_stats('releases')
print(f"Objects: {stats.object_count}, Size: {stats.total_size}")
# Detailed stats for analytics (slower but accurate)
stats = client.get_bucket_stats('releases', detailed_stats=True)
print(f"Compression ratio: {stats.average_compression_ratio:.1%}")
"""
# List all objects with smart metadata fetching
all_objects = []
continuation_token = None
while True:
response = client.list_objects(
Bucket=bucket,
MaxKeys=1000,
ContinuationToken=continuation_token,
FetchMetadata=detailed_stats, # Only fetch metadata if detailed stats requested
)
# Extract S3Objects from response (with Metadata containing DeltaGlider info)
for obj_dict in response["Contents"]:
# Convert dict back to ObjectInfo for backward compatibility with stats calculation
metadata = obj_dict.get("Metadata", {})
# Parse compression ratio safely (handle "unknown" value)
compression_ratio_str = metadata.get("deltaglider-compression-ratio", "0.0")
try:
compression_ratio = (
float(compression_ratio_str) if compression_ratio_str != "unknown" else 0.0
)
except ValueError:
compression_ratio = 0.0
all_objects.append(
ObjectInfo(
key=obj_dict["Key"],
size=obj_dict["Size"],
last_modified=obj_dict.get("LastModified", ""),
etag=obj_dict.get("ETag"),
storage_class=obj_dict.get("StorageClass", "STANDARD"),
original_size=int(metadata.get("deltaglider-original-size", obj_dict["Size"])),
compressed_size=obj_dict["Size"],
is_delta=metadata.get("deltaglider-is-delta", "false") == "true",
compression_ratio=compression_ratio,
reference_key=metadata.get("deltaglider-reference-key"),
)
)
if not response.get("IsTruncated"):
break
continuation_token = response.get("NextContinuationToken")
# Calculate statistics
total_size = 0
compressed_size = 0
delta_count = 0
direct_count = 0
for obj in all_objects:
# Skip reference.bin files - they are internal implementation details
# and their size is already accounted for in delta metadata
if obj.key.endswith("/reference.bin") or obj.key == "reference.bin":
continue
compressed_size += obj.size
if obj.is_delta:
delta_count += 1
# Use actual original size if we have it, otherwise estimate
total_size += obj.original_size or obj.size
else:
direct_count += 1
# For non-delta files, original equals compressed
total_size += obj.size
space_saved = total_size - compressed_size
avg_ratio = (space_saved / total_size) if total_size > 0 else 0.0
return BucketStats(
bucket=bucket,
object_count=len(all_objects),
total_size=total_size,
compressed_size=compressed_size,
space_saved=space_saved,
average_compression_ratio=avg_ratio,
delta_objects=delta_count,
direct_objects=direct_count,
)
def estimate_compression(
client: Any, # DeltaGliderClient
file_path: str | Path,
bucket: str,
prefix: str = "",
sample_size: int = 1024 * 1024,
) -> CompressionEstimate:
"""Estimate compression ratio before upload.
Args:
client: DeltaGliderClient instance
file_path: Local file to estimate
bucket: Target bucket
prefix: Target prefix (for finding similar files)
sample_size: Bytes to sample for estimation (default 1MB)
Returns:
CompressionEstimate with predicted compression
"""
file_path = Path(file_path)
file_size = file_path.stat().st_size
# Check file extension
ext = file_path.suffix.lower()
delta_extensions = {
".zip",
".tar",
".gz",
".tar.gz",
".tgz",
".bz2",
".tar.bz2",
".xz",
".tar.xz",
".7z",
".rar",
".dmg",
".iso",
".pkg",
".deb",
".rpm",
".apk",
".jar",
".war",
".ear",
}
# Already compressed formats that won't benefit from delta
incompressible = {".jpg", ".jpeg", ".png", ".mp4", ".mp3", ".avi", ".mov"}
if ext in incompressible:
return CompressionEstimate(
original_size=file_size,
estimated_compressed_size=file_size,
estimated_ratio=0.0,
confidence=0.95,
should_use_delta=False,
)
if ext not in delta_extensions:
# Unknown type, conservative estimate
return CompressionEstimate(
original_size=file_size,
estimated_compressed_size=file_size,
estimated_ratio=0.0,
confidence=0.5,
should_use_delta=file_size > 1024 * 1024, # Only for files > 1MB
)
# Look for similar files in the target location
similar_files = find_similar_files(client, bucket, prefix, file_path.name)
if similar_files:
# If we have similar files, estimate high compression
estimated_ratio = 0.99 # 99% compression typical for similar versions
confidence = 0.9
recommended_ref = similar_files[0]["Key"] if similar_files else None
else:
# First file of its type
estimated_ratio = 0.0
confidence = 0.7
recommended_ref = None
estimated_size = int(file_size * (1 - estimated_ratio))
return CompressionEstimate(
original_size=file_size,
estimated_compressed_size=estimated_size,
estimated_ratio=estimated_ratio,
confidence=confidence,
recommended_reference=recommended_ref,
should_use_delta=True,
)
def find_similar_files(
client: Any, # DeltaGliderClient
bucket: str,
prefix: str,
filename: str,
limit: int = 5,
) -> list[dict[str, Any]]:
"""Find similar files that could serve as references.
Args:
client: DeltaGliderClient instance
bucket: S3 bucket
prefix: Prefix to search in
filename: Filename to match against
limit: Maximum number of results
Returns:
List of similar files with scores
"""
# List objects in the prefix (no metadata needed for similarity check)
response = client.list_objects(
Bucket=bucket,
Prefix=prefix,
MaxKeys=1000,
FetchMetadata=False, # Don't need metadata for similarity
)
similar: list[dict[str, Any]] = []
base_name = Path(filename).stem
ext = Path(filename).suffix
for obj in response["Contents"]:
obj_key = obj["Key"]
obj_base = Path(obj_key).stem
obj_ext = Path(obj_key).suffix
# Skip delta files and references
if obj_key.endswith(".delta") or obj_key.endswith("reference.bin"):
continue
score = 0.0
# Extension match
if ext == obj_ext:
score += 0.5
# Base name similarity
if base_name in obj_base or obj_base in base_name:
score += 0.3
# Version pattern match
if re.search(r"v?\d+[\.\d]*", base_name) and re.search(r"v?\d+[\.\d]*", obj_base):
score += 0.2
if score > 0.5:
similar.append(
{
"Key": obj_key,
"Size": obj["Size"],
"Similarity": score,
"LastModified": obj["LastModified"],
}
)
# Sort by similarity
similar.sort(key=lambda x: x["Similarity"], reverse=True) # type: ignore
return similar[:limit]

View File

@@ -0,0 +1,152 @@
"""Type-safe response builders using TypedDicts for internal type safety.
This module provides builder functions that construct boto3-compatible responses
with full compile-time type validation using TypedDicts. At runtime, TypedDicts
are plain dicts, so there's no conversion overhead.
Benefits:
- Field name typos caught by mypy (e.g., "HTTPStatusCode""HttpStatusCode")
- Wrong types caught by mypy (e.g., string instead of int)
- Missing required fields caught by mypy
- Extra unknown fields caught by mypy
"""
from typing import Any
from .types import (
CommonPrefix,
DeleteObjectResponse,
GetObjectResponse,
ListObjectsV2Response,
PutObjectResponse,
ResponseMetadata,
S3Object,
)
def build_response_metadata(status_code: int = 200) -> ResponseMetadata:
"""Build ResponseMetadata with full type safety via TypedDict.
TypedDict is a dict at runtime - no conversion needed!
mypy validates all fields match ResponseMetadata TypedDict.
Uses our types.py TypedDict which has proper NotRequired fields.
"""
# Build as TypedDict - mypy validates field names and types!
metadata: ResponseMetadata = {
"HTTPStatusCode": status_code,
# All other fields are NotRequired - can be omitted!
}
return metadata # Returns dict at runtime, ResponseMetadata type at compile-time
def build_put_response(
etag: str,
*,
version_id: str | None = None,
deltaglider_info: dict[str, Any] | None = None,
) -> PutObjectResponse:
"""Build PutObjectResponse with full type safety via TypedDict.
Uses our types.py TypedDict which has proper NotRequired fields.
mypy validates all field names, types, and structure.
"""
# Build as TypedDict - mypy catches typos and type errors!
response: PutObjectResponse = {
"ETag": etag,
"ResponseMetadata": build_response_metadata(),
}
if version_id:
response["VersionId"] = version_id
# DeltaGlider extension - add as Any field
if deltaglider_info:
response["DeltaGliderInfo"] = deltaglider_info # type: ignore[typeddict-item]
return response # Returns dict at runtime, PutObjectResponse type at compile-time
def build_get_response(
body: Any,
content_length: int,
etag: str,
metadata: dict[str, Any],
) -> GetObjectResponse:
"""Build GetObjectResponse with full type safety via TypedDict.
Uses our types.py TypedDict which has proper NotRequired fields.
mypy validates all field names, types, and structure.
"""
# Build as TypedDict - mypy catches typos and type errors!
response: GetObjectResponse = {
"Body": body,
"ContentLength": content_length,
"ETag": etag,
"Metadata": metadata,
"ResponseMetadata": build_response_metadata(),
}
return response # Returns dict at runtime, GetObjectResponse type at compile-time
def build_list_objects_response(
bucket: str,
prefix: str,
delimiter: str,
max_keys: int,
contents: list[S3Object],
common_prefixes: list[CommonPrefix] | None,
is_truncated: bool,
next_continuation_token: str | None,
continuation_token: str | None,
) -> ListObjectsV2Response:
"""Build ListObjectsV2Response with full type safety via TypedDict.
Uses our types.py TypedDict which has proper NotRequired fields.
mypy validates all field names, types, and structure.
"""
# Build as TypedDict - mypy catches typos and type errors!
response: ListObjectsV2Response = {
"IsTruncated": is_truncated,
"Contents": contents,
"Name": bucket,
"Prefix": prefix,
"Delimiter": delimiter,
"MaxKeys": max_keys,
"KeyCount": len(contents),
"ResponseMetadata": build_response_metadata(),
}
# Add optional fields
if common_prefixes:
response["CommonPrefixes"] = common_prefixes
if next_continuation_token:
response["NextContinuationToken"] = next_continuation_token
if continuation_token:
response["ContinuationToken"] = continuation_token
return response # Returns dict at runtime, ListObjectsV2Response type at compile-time
def build_delete_response(
delete_marker: bool = False,
status_code: int = 204,
deltaglider_info: dict[str, Any] | None = None,
) -> DeleteObjectResponse:
"""Build DeleteObjectResponse with full type safety via TypedDict.
Uses our types.py TypedDict which has proper NotRequired fields.
mypy validates all field names, types, and structure.
"""
# Build as TypedDict - mypy catches typos and type errors!
response: DeleteObjectResponse = {
"DeleteMarker": delete_marker,
"ResponseMetadata": build_response_metadata(status_code),
}
# DeltaGlider extension
if deltaglider_info:
response["DeltaGliderInfo"] = deltaglider_info # type: ignore[typeddict-item]
return response # Returns dict at runtime, DeleteObjectResponse type at compile-time

355
src/deltaglider/types.py Normal file
View File

@@ -0,0 +1,355 @@
"""Type definitions for boto3-compatible responses.
These TypedDict definitions provide type hints for DeltaGlider's boto3-compatible
responses. All methods return plain `dict[str, Any]` at runtime for maximum
flexibility and boto3 compatibility.
## Basic Usage (Recommended)
Use DeltaGlider with simple dict access - no type imports needed:
```python
from deltaglider import create_client
client = create_client()
# Returns plain dict - 100% boto3 compatible
response = client.put_object(Bucket='my-bucket', Key='file.zip', Body=data)
print(response['ETag'])
# List objects with dict access
listing = client.list_objects(Bucket='my-bucket')
for obj in listing['Contents']:
print(f"{obj['Key']}: {obj['Size']} bytes")
```
## Optional Type Hints
For IDE autocomplete and type checking, you can use our convenience TypedDicts:
```python
from deltaglider import create_client
from deltaglider.types import PutObjectResponse, ListObjectsV2Response
client = create_client()
response: PutObjectResponse = client.put_object(...) # IDE autocomplete
listing: ListObjectsV2Response = client.list_objects(...)
```
## Advanced: boto3-stubs Integration
For strictest type checking (requires boto3-stubs installation):
```bash
pip install boto3-stubs[s3]
```
```python
from mypy_boto3_s3.type_defs import PutObjectOutputTypeDef
response: PutObjectOutputTypeDef = client.put_object(...)
```
**Note**: boto3-stubs TypedDefs are very strict and require ALL optional fields.
DeltaGlider returns partial dicts for better boto3 compatibility, so boto3-stubs
types may show false positive errors. Use `dict[str, Any]` or our TypedDicts instead.
## Design Philosophy
DeltaGlider returns `dict[str, Any]` from all boto3-compatible methods because:
1. **Flexibility**: boto3 responses vary by service and operation
2. **Compatibility**: Exact match with boto3 runtime behavior
3. **Simplicity**: No complex type dependencies for users
4. **Optional Typing**: Users choose their preferred level of type safety
"""
from datetime import datetime
from typing import Any, Literal, NotRequired, TypedDict
# ============================================================================
# S3 Object Types
# ============================================================================
class S3Object(TypedDict):
"""An S3 object returned in list operations.
Compatible with boto3's S3.Client.list_objects_v2() response Contents.
"""
Key: str
Size: int
LastModified: datetime
ETag: NotRequired[str]
StorageClass: NotRequired[str]
Owner: NotRequired[dict[str, str]]
Metadata: NotRequired[dict[str, str]]
class CommonPrefix(TypedDict):
"""A common prefix (directory) in S3 listing.
Compatible with boto3's S3.Client.list_objects_v2() response CommonPrefixes.
"""
Prefix: str
# ============================================================================
# Response Metadata (used in all responses)
# ============================================================================
class ResponseMetadata(TypedDict):
"""Metadata about the API response.
Compatible with all boto3 responses.
"""
RequestId: NotRequired[str]
HostId: NotRequired[str]
HTTPStatusCode: int
HTTPHeaders: NotRequired[dict[str, str]]
RetryAttempts: NotRequired[int]
# ============================================================================
# List Operations Response Types
# ============================================================================
class ListObjectsV2Response(TypedDict):
"""Response from list_objects_v2 operation.
100% compatible with boto3's S3.Client.list_objects_v2() response.
Example:
```python
client = create_client()
response: ListObjectsV2Response = client.list_objects(
Bucket='my-bucket',
Prefix='path/',
Delimiter='/'
)
for obj in response['Contents']:
print(f"{obj['Key']}: {obj['Size']} bytes")
for prefix in response.get('CommonPrefixes', []):
print(f"Directory: {prefix['Prefix']}")
```
"""
Contents: list[S3Object]
Name: NotRequired[str] # Bucket name
Prefix: NotRequired[str]
Delimiter: NotRequired[str]
MaxKeys: NotRequired[int]
CommonPrefixes: NotRequired[list[CommonPrefix]]
EncodingType: NotRequired[str]
KeyCount: NotRequired[int]
ContinuationToken: NotRequired[str]
NextContinuationToken: NotRequired[str]
StartAfter: NotRequired[str]
IsTruncated: NotRequired[bool]
ResponseMetadata: NotRequired[ResponseMetadata]
# ============================================================================
# Put/Get/Delete Response Types
# ============================================================================
class PutObjectResponse(TypedDict):
"""Response from put_object operation.
Compatible with boto3's S3.Client.put_object() response.
"""
ETag: str
VersionId: NotRequired[str]
ServerSideEncryption: NotRequired[str]
ResponseMetadata: NotRequired[ResponseMetadata]
class GetObjectResponse(TypedDict):
"""Response from get_object operation.
Compatible with boto3's S3.Client.get_object() response.
"""
Body: Any # StreamingBody in boto3, bytes in DeltaGlider
ContentLength: int
ContentType: NotRequired[str]
ETag: NotRequired[str]
LastModified: NotRequired[datetime]
Metadata: NotRequired[dict[str, str]]
VersionId: NotRequired[str]
StorageClass: NotRequired[str]
ResponseMetadata: NotRequired[ResponseMetadata]
class DeleteObjectResponse(TypedDict):
"""Response from delete_object operation.
Compatible with boto3's S3.Client.delete_object() response.
"""
DeleteMarker: NotRequired[bool]
VersionId: NotRequired[str]
ResponseMetadata: NotRequired[ResponseMetadata]
class DeletedObject(TypedDict):
"""A successfully deleted object.
Compatible with boto3's S3.Client.delete_objects() response Deleted.
"""
Key: str
VersionId: NotRequired[str]
DeleteMarker: NotRequired[bool]
DeleteMarkerVersionId: NotRequired[str]
class DeleteError(TypedDict):
"""An error that occurred during deletion.
Compatible with boto3's S3.Client.delete_objects() response Errors.
"""
Key: str
Code: str
Message: str
VersionId: NotRequired[str]
class DeleteObjectsResponse(TypedDict):
"""Response from delete_objects operation.
Compatible with boto3's S3.Client.delete_objects() response.
"""
Deleted: NotRequired[list[DeletedObject]]
Errors: NotRequired[list[DeleteError]]
ResponseMetadata: NotRequired[ResponseMetadata]
# ============================================================================
# Head Object Response
# ============================================================================
class HeadObjectResponse(TypedDict):
"""Response from head_object operation.
Compatible with boto3's S3.Client.head_object() response.
"""
ContentLength: int
ContentType: NotRequired[str]
ETag: NotRequired[str]
LastModified: NotRequired[datetime]
Metadata: NotRequired[dict[str, str]]
VersionId: NotRequired[str]
StorageClass: NotRequired[str]
ResponseMetadata: NotRequired[ResponseMetadata]
# ============================================================================
# Bucket Operations
# ============================================================================
class Bucket(TypedDict):
"""An S3 bucket.
Compatible with boto3's S3.Client.list_buckets() response Buckets.
"""
Name: str
CreationDate: datetime
class ListBucketsResponse(TypedDict):
"""Response from list_buckets operation.
Compatible with boto3's S3.Client.list_buckets() response.
"""
Buckets: list[Bucket]
Owner: NotRequired[dict[str, str]]
ResponseMetadata: NotRequired[ResponseMetadata]
class CreateBucketResponse(TypedDict):
"""Response from create_bucket operation.
Compatible with boto3's S3.Client.create_bucket() response.
"""
Location: NotRequired[str]
ResponseMetadata: NotRequired[ResponseMetadata]
# ============================================================================
# Multipart Upload Types
# ============================================================================
class CompletedPart(TypedDict):
"""A completed part in a multipart upload."""
PartNumber: int
ETag: str
class CompleteMultipartUploadResponse(TypedDict):
"""Response from complete_multipart_upload operation."""
Location: NotRequired[str]
Bucket: NotRequired[str]
Key: NotRequired[str]
ETag: NotRequired[str]
VersionId: NotRequired[str]
ResponseMetadata: NotRequired[ResponseMetadata]
# ============================================================================
# Copy Operations
# ============================================================================
class CopyObjectResponse(TypedDict):
"""Response from copy_object operation.
Compatible with boto3's S3.Client.copy_object() response.
"""
CopyObjectResult: NotRequired[dict[str, Any]]
ETag: NotRequired[str]
LastModified: NotRequired[datetime]
VersionId: NotRequired[str]
ResponseMetadata: NotRequired[ResponseMetadata]
# ============================================================================
# Type Aliases for Convenience
# ============================================================================
# Common parameter types
BucketName = str
ObjectKey = str
Prefix = str
Delimiter = str
# Storage class options
StorageClass = Literal[
"STANDARD",
"REDUCED_REDUNDANCY",
"STANDARD_IA",
"ONEZONE_IA",
"INTELLIGENT_TIERING",
"GLACIER",
"DEEP_ARCHIVE",
"GLACIER_IR",
]

View File

@@ -10,7 +10,6 @@ from deltaglider import create_client
from deltaglider.client import (
BucketStats,
CompressionEstimate,
ListObjectsResponse,
ObjectInfo,
)
@@ -279,27 +278,35 @@ class TestBoto3Compatibility:
assert response["ContentLength"] == len(content)
def test_list_objects(self, client):
"""Test list_objects with various options."""
"""Test list_objects with various options (boto3-compatible dict response)."""
# List all objects (default: FetchMetadata=False)
response = client.list_objects(Bucket="test-bucket")
assert isinstance(response, ListObjectsResponse)
assert response.key_count > 0
assert len(response.contents) > 0
# Response is now a boto3-compatible dict (not ListObjectsResponse)
assert isinstance(response, dict)
assert response["KeyCount"] > 0
assert len(response["Contents"]) > 0
# Verify S3Object structure
for obj in response["Contents"]:
assert "Key" in obj
assert "Size" in obj
assert "LastModified" in obj
assert "Metadata" in obj # DeltaGlider metadata
# Test with FetchMetadata=True (should only affect delta files)
response_with_metadata = client.list_objects(Bucket="test-bucket", FetchMetadata=True)
assert isinstance(response_with_metadata, ListObjectsResponse)
assert response_with_metadata.key_count > 0
assert isinstance(response_with_metadata, dict)
assert response_with_metadata["KeyCount"] > 0
def test_list_objects_with_delimiter(self, client):
"""Test list_objects with delimiter for folder simulation."""
"""Test list_objects with delimiter for folder simulation (boto3-compatible dict response)."""
response = client.list_objects(Bucket="test-bucket", Prefix="", Delimiter="/")
# Should have common prefixes for folders
assert len(response.common_prefixes) > 0
assert {"Prefix": "folder1/"} in response.common_prefixes
assert {"Prefix": "folder2/"} in response.common_prefixes
assert len(response.get("CommonPrefixes", [])) > 0
assert {"Prefix": "folder1/"} in response["CommonPrefixes"]
assert {"Prefix": "folder2/"} in response["CommonPrefixes"]
def test_delete_object(self, client):
"""Test delete_object."""

View File

@@ -53,8 +53,11 @@ class TestSDKFiltering:
client = DeltaGliderClient(service)
response = client.list_objects(Bucket="test-bucket", Prefix="releases/")
# Response is now a boto3-compatible dict
contents = response["Contents"]
# Verify .delta suffix is stripped
keys = [obj.key for obj in response.contents]
keys = [obj["Key"] for obj in contents]
assert "releases/app-v1.zip" in keys
assert "releases/app-v2.zip" in keys
assert "releases/README.md" in keys
@@ -63,8 +66,10 @@ class TestSDKFiltering:
for key in keys:
assert not key.endswith(".delta"), f"Found .delta suffix in: {key}"
# Verify is_delta flag is set correctly
delta_objects = [obj for obj in response.contents if obj.is_delta]
# Verify is_delta flag is set correctly in Metadata
delta_objects = [
obj for obj in contents if obj.get("Metadata", {}).get("deltaglider-is-delta") == "true"
]
assert len(delta_objects) == 2
def test_list_objects_filters_reference_bin(self):
@@ -106,15 +111,18 @@ class TestSDKFiltering:
client = DeltaGliderClient(service)
response = client.list_objects(Bucket="test-bucket", Prefix="releases/")
# Response is now a boto3-compatible dict
contents = response["Contents"]
# Verify NO reference.bin files in output
keys = [obj.key for obj in response.contents]
keys = [obj["Key"] for obj in contents]
for key in keys:
assert not key.endswith("reference.bin"), f"Found reference.bin in: {key}"
# Should only have the app.zip (with .delta stripped)
assert len(response.contents) == 1
assert response.contents[0].key == "releases/app.zip"
assert response.contents[0].is_delta is True
assert len(contents) == 1
assert contents[0]["Key"] == "releases/app.zip"
assert contents[0].get("Metadata", {}).get("deltaglider-is-delta") == "true"
def test_list_objects_combined_filtering(self):
"""Test filtering of both .delta and reference.bin together."""
@@ -170,12 +178,15 @@ class TestSDKFiltering:
client = DeltaGliderClient(service)
response = client.list_objects(Bucket="test-bucket", Prefix="data/")
# Response is now a boto3-compatible dict
contents = response["Contents"]
# Should filter out 2 reference.bin files
# Should strip .delta from 3 files
# Should keep 1 regular file as-is
assert len(response.contents) == 4 # 3 deltas + 1 regular file
assert len(contents) == 4 # 3 deltas + 1 regular file
keys = [obj.key for obj in response.contents]
keys = [obj["Key"] for obj in contents]
expected_keys = ["data/file1.zip", "data/file2.zip", "data/file3.txt", "data/sub/app.jar"]
assert sorted(keys) == sorted(expected_keys)