5 Commits

Author SHA1 Message Date
Simone Scarduzio
db0662c175 fix: Update mypy type ignore comment for compatibility
- Change type: ignore[return-value] to type: ignore[no-any-return]
- Ensures mypy type checking passes in CI/CD pipeline

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-06 09:40:12 +02:00
Simone Scarduzio
2efa760785 feat: Add AWS credential parameters to create_client()
- Add aws_access_key_id, aws_secret_access_key, aws_session_token, and region_name parameters
- Pass credentials through to S3StorageAdapter and boto3.client()
- Enables multi-tenant scenarios with different AWS accounts
- Maintains backward compatibility (uses boto3 default credential chain when omitted)
- Add comprehensive tests for credential handling
- Add examples/credentials_example.py with usage examples

Fixes credential conflicts when multiple SDK instances need different credentials.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-06 09:07:40 +02:00
Simone Scarduzio
74207f4ee4 clearer readme 2025-10-03 23:28:35 +02:00
Simone Scarduzio
4668b10c3f fix tests 2025-10-03 21:49:13 +02:00
Simone Scarduzio
8cea5a3527 fix test 2025-10-03 21:41:26 +02:00
6 changed files with 422 additions and 223 deletions

README.md
View File

@@ -12,11 +12,11 @@
**Store 4TB of similar files in 5GB. No, that's not a typo.**
DeltaGlider is a drop-in S3 replacement that achieves 99.9% compression for versioned artifacts, backups, and release archives through intelligent binary delta compression.
DeltaGlider is a drop-in S3 replacement that may achieve 99.9% size reduction for versioned compressed artifacts, backups, and release archives through intelligent binary delta compression (via xdelta3).
## The Problem We Solved
You're storing hundreds of versions of your releases. Each 100MB build differs by <1% from the previous version. You're paying to store 100GB of what's essentially 100MB of unique data.
You're storing hundreds of versions of your software releases. Each 100MB build differs by <1% from the previous version. You're paying to store 100GB of what's essentially 100MB of unique data.
Sound familiar?
@@ -28,7 +28,45 @@ From our [ReadOnlyREST case study](docs/case-study-readonlyrest.md):
- **Compression**: 99.9% (not a typo)
- **Integration time**: 5 minutes
## How It Works
## Quick Start
The quickest way to get started is the GUI:
* https://github.com/sscarduzio/dg_commander/
### CLI Installation
```bash
# Via pip (Python 3.11+)
pip install deltaglider
# Via uv (faster)
uv pip install deltaglider
# Via Docker
docker run -v ~/.aws:/root/.aws deltaglider/deltaglider --help
```
### Basic Usage
```bash
# Upload a file (automatic delta compression)
deltaglider cp my-app-v1.0.0.zip s3://releases/
# Download a file (automatic delta reconstruction)
deltaglider cp s3://releases/my-app-v1.0.0.zip ./downloaded.zip
# List objects
deltaglider ls s3://releases/
# Sync directories
deltaglider sync ./dist/ s3://releases/v1.0.0/
```
**That's it!** DeltaGlider automatically detects similar files and can apply 99%+ compression. For more commands and options, see [CLI Reference](#cli-reference).
## Core Concepts
### How It Works
```
Traditional S3:
@@ -42,24 +80,32 @@ With DeltaGlider:
v1.0.2.zip (100MB) → S3: 97KB delta (100.3MB total)
```
## Quick Start
DeltaGlider stores the first file as a reference and subsequent similar files as tiny deltas (differences). When you download, it reconstructs the original file perfectly using the reference + delta.
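Under the hood this is plain binary delta encoding. Here's a minimal sketch of the round-trip using the xdelta3 CLI directly, assuming xdelta3 is installed; DeltaGlider does the equivalent automatically on upload and download:
```python
# Sketch of the reference + delta round-trip with the xdelta3 CLI
# (-e = encode, -d = decode, -s = source/reference, -f = overwrite output).
import subprocess

def make_delta(reference: str, new_file: str, delta_out: str) -> None:
    """Encode new_file as a delta against reference."""
    subprocess.run(["xdelta3", "-e", "-f", "-s", reference, new_file, delta_out], check=True)

def reconstruct(reference: str, delta: str, restored: str) -> None:
    """Rebuild the original file byte-for-byte from reference + delta."""
    subprocess.run(["xdelta3", "-d", "-f", "-s", reference, delta, restored], check=True)

# v1.0.0.zip is stored once as the reference; v1.0.1.zip becomes a tiny delta.
make_delta("v1.0.0.zip", "v1.0.1.zip", "v1.0.1.zip.delta")
reconstruct("v1.0.0.zip", "v1.0.1.zip.delta", "v1.0.1-restored.zip")
```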
### Installation
### Intelligent File Type Detection
```bash
# Via pip (Python 3.11+)
pip install deltaglider
DeltaGlider automatically detects file types and applies the optimal strategy:
# Via uv (faster)
uv pip install deltaglider
| File Type | Strategy | Typical Compression | Why It Works |
|-----------|----------|---------------------|--------------|
| `.zip`, `.tar`, `.gz` | Binary delta | 99%+ for similar versions | Archive structure remains consistent between versions |
| `.dmg`, `.deb`, `.rpm` | Binary delta | 95%+ for similar versions | Package formats with predictable structure |
| `.jar`, `.war`, `.ear` | Binary delta | 90%+ for similar builds | Java archives with mostly unchanged classes |
| `.exe`, `.dll`, `.so` | Direct upload | 0% (no delta benefit) | Compiled code changes unpredictably |
| `.txt`, `.json`, `.xml` | Direct upload | 0% (use gzip instead) | Text files benefit more from standard compression |
| `.sha1`, `.sha512`, `.md5` | Direct upload | 0% (already minimal) | Hash files are unique by design |
# Via Docker
docker run -v ~/.aws:/root/.aws deltaglider/deltaglider --help
```
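The routing itself is simple in spirit. Here's an illustrative sketch of extension-based strategy selection mirroring the table above (DeltaGlider's real detection logic may differ):
```python
# Illustrative only: route archive-like files to delta compression,
# everything else to direct upload.
from pathlib import Path

DELTA_EXTENSIONS = {".zip", ".tar", ".gz", ".dmg", ".deb", ".rpm", ".jar", ".war", ".ear"}

def pick_strategy(filename: str) -> str:
    """Return 'delta' for archive-like files, 'direct' for everything else."""
    return "delta" if Path(filename).suffix.lower() in DELTA_EXTENSIONS else "direct"

print(pick_strategy("my-app-v1.0.1.zip"))  # delta
print(pick_strategy("checksum.sha256"))    # direct
```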
### Key Features
### AWS S3 Compatible Commands
- **AWS CLI Replacement**: Same commands as `aws s3` with automatic compression
- **boto3-Compatible SDK**: Works with existing boto3 code with minimal changes
- **Zero Configuration**: No databases, no manifest files, no complex setup
- **Data Integrity**: SHA256 verification on every operation
- **S3 Compatible**: Works with AWS S3, MinIO, Cloudflare R2, and any S3-compatible storage
DeltaGlider is a **drop-in replacement** for AWS S3 CLI with automatic delta compression:
## CLI Reference
### All Commands
```bash
# Copy files to/from S3 (automatic delta compression for archives)
@@ -91,84 +137,35 @@ deltaglider sync --exclude "*.log" ./src/ s3://backup/ # Exclude patterns
deltaglider cp file.zip s3://bucket/ --endpoint-url http://localhost:9000
```
## Why xdelta3 Excels at Archive Compression
Traditional diff algorithms (like `diff` or `git diff`) work line-by-line on text files. Binary diff tools like `bsdiff` or `courgette` are optimized for executables. But **xdelta3** is uniquely suited for compressed archives because:
1. **Block-level matching**: xdelta3 uses a rolling hash algorithm to find matching byte sequences at any offset, not just line boundaries. This is crucial for archives where small file changes can shift all subsequent byte positions.
2. **Large window support**: xdelta3 can use reference windows up to 2GB, allowing it to find matches even when content has moved significantly within the archive. Other delta algorithms typically use much smaller windows (64KB-1MB).
3. **Compression-aware**: When you update one file in a ZIP/TAR archive, the archive format itself remains largely identical - same compression dictionary, same structure. xdelta3 preserves these similarities while other algorithms might miss them.
4. **Format agnostic**: Unlike specialized tools (e.g., `courgette` for Chrome updates), xdelta3 works on raw bytes without understanding the file format, making it perfect for any archive type.
### Real-World Example
When you rebuild a JAR file with one class changed:
- **Text diff**: 100% different (it's binary data!)
- **bsdiff**: ~30-40% of original size (optimized for executables, not archives)
- **xdelta3**: ~0.1-1% of original size (finds the unchanged parts regardless of position)
This is why DeltaGlider achieves 99%+ compression on versioned archives - xdelta3 can identify that 99% of the archive structure and content remains identical between versions.
## Intelligent File Type Detection
DeltaGlider automatically detects file types and applies the optimal strategy:
| File Type | Strategy | Typical Compression | Why It Works |
|-----------|----------|-------------------|--------------|
| `.zip`, `.tar`, `.gz` | Binary delta | 99%+ for similar versions | Archive structure remains consistent between versions |
| `.dmg`, `.deb`, `.rpm` | Binary delta | 95%+ for similar versions | Package formats with predictable structure |
| `.jar`, `.war`, `.ear` | Binary delta | 90%+ for similar builds | Java archives with mostly unchanged classes |
| `.exe`, `.dll`, `.so` | Direct upload | 0% (no delta benefit) | Compiled code changes unpredictably |
| `.txt`, `.json`, `.xml` | Direct upload | 0% (use gzip instead) | Text files benefit more from standard compression |
| `.sha1`, `.sha512`, `.md5` | Direct upload | 0% (already minimal) | Hash files are unique by design |
## Performance Benchmarks
Testing with real software releases:
```python
# 513 Elasticsearch plugin releases (82.5MB each)
Original size: 42.3 GB
DeltaGlider size: 115 MB
Compression: 99.7%
Upload speed: 3-4 files/second
Download speed: <100ms reconstruction
```
## Integration Examples
### Drop-in AWS CLI Replacement
### Command Flags
```bash
# Before (aws-cli)
aws s3 cp release-v2.0.0.zip s3://releases/
aws s3 cp --recursive ./build/ s3://releases/v2.0.0/
aws s3 ls s3://releases/
aws s3 rm s3://releases/old-version.zip
# All standard AWS flags work
deltaglider cp file.zip s3://bucket/ \
--endpoint-url http://localhost:9000 \
--profile production \
--region us-west-2
# After (deltaglider) - Same commands, 99% less storage!
deltaglider cp release-v2.0.0.zip s3://releases/
deltaglider cp -r ./build/ s3://releases/v2.0.0/
deltaglider ls s3://releases/
deltaglider rm s3://releases/old-version.zip
# DeltaGlider-specific flags
deltaglider cp file.zip s3://bucket/ \
--no-delta # Disable compression for specific files
--max-ratio 0.8 # Only use delta if compression > 20%
```
### CI/CD Pipeline (GitHub Actions)
### CI/CD Integration
#### GitHub Actions
```yaml
- name: Upload Release with 99% compression
run: |
pip install deltaglider
# Use AWS S3 compatible syntax
deltaglider cp dist/*.zip s3://releases/${{ github.ref_name }}/
# Or use recursive for entire directories
# Or recursive for entire directories
deltaglider cp -r dist/ s3://releases/${{ github.ref_name }}/
```
### Backup Script
#### Daily Backup Script
```bash
#!/bin/bash
@@ -177,18 +174,15 @@ tar -czf backup-$(date +%Y%m%d).tar.gz /data
deltaglider cp backup-*.tar.gz s3://backups/
# Only changes are stored, not full backup
# List backups with human-readable sizes
deltaglider ls -h s3://backups/
# Clean up old backups
deltaglider rm -r s3://backups/2023/
```
### Python SDK - boto3-Compatible API
## Python SDK
**[📚 Full SDK Documentation](docs/sdk/README.md)** | **[API Reference](docs/sdk/api.md)** | **[Examples](docs/sdk/examples.md)** | **[boto3 Compatibility Guide](BOTO3_COMPATIBILITY.md)**
#### Quick Start - boto3 Compatible API (Recommended)
### boto3-Compatible API (Recommended)
DeltaGlider provides a **boto3-compatible API** for core S3 operations (21 methods covering 80% of use cases):
@@ -211,8 +205,7 @@ response = client.get_object(Bucket='releases', Key='v2.0.0/my-app.zip')
with open('downloaded.zip', 'wb') as f:
f.write(response['Body'].read())
# Smart list_objects with optimized performance (NEW!)
# Fast listing (default) - no metadata fetching, ~50ms for 1000 objects
# Smart list_objects with optimized performance
response = client.list_objects(Bucket='releases', Prefix='v2.0.0/')
# Paginated listing for large buckets
@@ -224,22 +217,14 @@ while response.is_truncated:
ContinuationToken=response.next_continuation_token
)
# Get bucket statistics with smart defaults
stats = client.get_bucket_stats('releases') # Quick stats (50ms)
stats = client.get_bucket_stats('releases', detailed_stats=True) # With compression metrics
# Delete and inspect objects
client.delete_object(Bucket='releases', Key='old-version.zip')
client.head_object(Bucket='releases', Key='v2.0.0/my-app.zip')
# Bucket management - no boto3 needed!
client.create_bucket(Bucket='my-new-bucket')
client.list_buckets()
client.delete_bucket(Bucket='my-new-bucket')
```
#### Bucket Management (NEW!)
### Bucket Management
**No boto3 required!** DeltaGlider now provides complete bucket management:
**No boto3 required!** DeltaGlider provides complete bucket management:
```python
from deltaglider import create_client
@@ -264,15 +249,9 @@ for bucket in response['Buckets']:
client.delete_bucket(Bucket='my-old-bucket')
```
**Benefits:**
- ✅ No need to import boto3 separately for bucket operations
- ✅ Consistent API with DeltaGlider object operations
- ✅ Works with AWS S3, MinIO, and S3-compatible storage
- ✅ Idempotent operations (safe to retry)
See [examples/bucket_management.py](examples/bucket_management.py) for complete example.
#### Simple API (Alternative)
### Simple API (Alternative)
For simpler use cases, DeltaGlider also provides a streamlined API:
@@ -290,15 +269,16 @@ print(f"Saved {summary.savings_percent:.0f}% storage space")
client.download("s3://releases/v2.0.0/my-app-v2.0.0.zip", "local-app.zip")
```
#### Real-World Example: Software Release Storage with boto3 API
### Real-World Examples
#### Software Release Storage
```python
from deltaglider import create_client
# Works exactly like boto3, but with 99% compression!
client = create_client()
# Upload multiple versions using boto3-compatible API
# Upload multiple versions
versions = ["v1.0.0", "v1.0.1", "v1.0.2", "v1.1.0"]
for version in versions:
with open(f"dist/my-app-{version}.zip", 'rb') as f:
@@ -323,27 +303,19 @@ for version in versions:
# v1.0.1: Stored as 0.2MB delta (saved 99.8%)
# v1.0.2: Stored as 0.3MB delta (saved 99.7%)
# v1.1.0: Stored as 5.2MB delta (saved 94.8%)
# Download using standard boto3 API
response = client.get_object(Bucket='releases', Key='v1.1.0/my-app-v1.1.0.zip')
with open('my-app-latest.zip', 'wb') as f:
f.write(response['Body'].read())
```
#### Advanced Example: Automated Backup with boto3 API
#### Automated Database Backup
```python
from datetime import datetime
from deltaglider import create_client
# Works with any S3-compatible storage
client = create_client(endpoint_url="http://minio.internal:9000")
def backup_database():
"""Daily database backup with automatic deduplication using boto3 API."""
"""Daily database backup with automatic deduplication."""
date = datetime.now().strftime("%Y%m%d")
# Create database dump
dump_file = f"backup-{date}.sql.gz"
# Upload using boto3-compatible API
@@ -356,63 +328,80 @@ def backup_database():
Metadata={'date': date, 'source': 'production'}
)
# Check compression effectiveness (DeltaGlider extension)
# Check compression effectiveness
if 'DeltaGliderInfo' in response:
info = response['DeltaGliderInfo']
if info['DeltaRatio'] > 0.1: # If delta is >10% of original
if info['DeltaRatio'] > 0.1:
print(f"Warning: Low compression ({info['SavingsPercent']:.0f}%), "
"database might have significant changes")
print(f"Backup stored: {info['StoredSizeMB']:.1f}MB "
f"(compressed from {info['OriginalSizeMB']:.1f}MB)")
# List recent backups using boto3 API
response = client.list_objects(
Bucket='backups',
Prefix='postgres/',
MaxKeys=30
)
# Clean up backups older than 30 days
def days_old(key_part: str) -> int:
    """Days since the date in a key part like 'backup-YYYYMMDD.sql.gz' (assumed format)."""
    date_str = key_part.split("-")[1].split(".")[0]
    return (datetime.now() - datetime.strptime(date_str, "%Y%m%d")).days

for obj in response.get('Contents', []):
    # Parse the date-bearing part of the key, e.g. 'postgres/backup-20250101.sql.gz'
    obj_date = obj['Key'].split('/')[1]
    if days_old(obj_date) > 30:
        client.delete_object(Bucket='backups', Key=obj['Key'])
# Run backup
backup_database()
```
For more examples and detailed API documentation, see the [SDK Documentation](docs/sdk/README.md).
## Migration from AWS CLI
## Performance & Benchmarks
Migrating from `aws s3` to `deltaglider` is as simple as changing the command name:
### Real-World Results
| AWS CLI | DeltaGlider | Compression Benefit |
|---------|------------|-------------------|
| `aws s3 cp file.zip s3://bucket/` | `deltaglider cp file.zip s3://bucket/` | ✅ 99% for similar files |
| `aws s3 cp -r dir/ s3://bucket/` | `deltaglider cp -r dir/ s3://bucket/` | ✅ 99% for archives |
| `aws s3 ls s3://bucket/` | `deltaglider ls s3://bucket/` | - |
| `aws s3 rm s3://bucket/file` | `deltaglider rm s3://bucket/file` | - |
| `aws s3 sync dir/ s3://bucket/` | `deltaglider sync dir/ s3://bucket/` | ✅ 99% incremental |
Testing with 513 Elasticsearch plugin releases (82.5MB each):
### Compatibility Flags
```bash
# All standard AWS flags work
deltaglider cp file.zip s3://bucket/ \
--endpoint-url http://localhost:9000 \
--profile production \
--region us-west-2
# DeltaGlider-specific flags
deltaglider cp file.zip s3://bucket/ \
--no-delta # Disable compression for specific files
--max-ratio 0.8 # Only use delta if compression > 20%
```
```
Original size: 42.3 GB
DeltaGlider size: 115 MB
Compression: 99.7%
Upload speed: 3-4 files/second
Download speed: <100ms reconstruction
```
## Architecture
### The Math
For `N` versions of a `S` MB file with `D%` difference between versions:
**Traditional S3**: `N × S` MB
**DeltaGlider**: `S + (N-1) × S × D%` MB
Example: 100 versions of 100MB files with 1% difference:
- **Traditional**: 10,000 MB
- **DeltaGlider**: 199 MB
- **Savings**: 98%
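A quick sanity check of the formula in Python (the numbers match the example above):
```python
# S + (N-1) * S * D, with D as a fraction (1% -> 0.01)
def deltaglider_size_mb(n: int, size_mb: float, diff: float) -> float:
    return size_mb + (n - 1) * size_mb * diff

print(100 * 100)                            # Traditional: 10000 MB
print(deltaglider_size_mb(100, 100, 0.01))  # DeltaGlider: 199.0 MB
```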
### Comparison
| Solution | Compression | Speed | Integration | Cost |
|----------|------------|-------|-------------|------|
| **DeltaGlider** | 99%+ | Fast | Drop-in | Open source |
| S3 Versioning | 0% | Native | Built-in | $$ per version |
| Deduplication | 30-50% | Slow | Complex | Enterprise $$$ |
| Git LFS | Good | Slow | Git-only | $ per GB |
| Restic/Borg | 80-90% | Medium | Backup-only | Open source |
## Architecture & Technical Deep Dive
### Why xdelta3 Excels at Archive Compression
Traditional diff algorithms (like `diff` or `git diff`) work line-by-line on text files. Binary diff tools like `bsdiff` or `courgette` are optimized for executables. But **xdelta3** is uniquely suited for compressed archives because:
1. **Block-level matching**: xdelta3 uses a rolling hash algorithm to find matching byte sequences at any offset, not just line boundaries. This is crucial for archives where small file changes can shift all subsequent byte positions.
2. **Large window support**: xdelta3 can use reference windows up to 2GB, allowing it to find matches even when content has moved significantly within the archive. Other delta algorithms typically use much smaller windows (64KB-1MB).
3. **Compression-aware**: When you update one file in a ZIP/TAR archive, the archive format itself remains largely identical - same compression dictionary, same structure. xdelta3 preserves these similarities while other algorithms might miss them.
4. **Format agnostic**: Unlike specialized tools (e.g., `courgette` for Chrome updates), xdelta3 works on raw bytes without understanding the file format, making it perfect for any archive type.
#### Real-World Example
When you rebuild a JAR file with one class changed:
- **Text diff**: 100% different (it's binary data!)
- **bsdiff**: ~30-40% of original size (optimized for executables, not archives)
- **xdelta3**: ~0.1-1% of original size (finds the unchanged parts regardless of position)
This is why DeltaGlider achieves 99%+ compression on versioned archives - xdelta3 can identify that 99% of the archive structure and content remains identical between versions.
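To make point 1 concrete, here's a toy rolling hash in Python. It only illustrates the principle (locating a block at any byte offset in a single linear pass) and is not xdelta3's actual implementation:
```python
# Toy Rabin-Karp rolling hash: the hash of a sliding window is updated in
# O(1) per byte, so a block can be found at ANY offset in one linear scan.
def find_block(data: bytes, block: bytes) -> list[int]:
    """Return every offset where `block` occurs in `data`."""
    n, base, mod = len(block), 257, (1 << 61) - 1
    if n == 0 or n > len(data):
        return []
    target = 0
    for b in block:
        target = (target * base + b) % mod
    power = pow(base, n - 1, mod)  # weight of the byte that slides out
    h = 0
    for b in data[:n]:
        h = (h * base + b) % mod
    hits = [0] if h == target and data[:n] == block else []
    for i in range(n, len(data)):
        h = ((h - data[i - n] * power) * base + data[i]) % mod
        if h == target and data[i - n + 1 : i + 1] == block:
            hits.append(i - n + 1)
    return hits

# The block is still found even though an insertion shifted every later offset.
print(find_block(b"header|payload-v1|footer", b"footer"))     # [18]
print(find_block(b"header|XX|payload-v1|footer", b"footer"))  # [21]
```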
### System Architecture
DeltaGlider uses a clean hexagonal architecture:
@@ -435,7 +424,7 @@ DeltaGlider uses a clean hexagonal architecture:
- **Local caching**: Fast repeated operations
- **Zero dependencies**: No database, no manifest files
## When to Use DeltaGlider
### When to Use DeltaGlider
**Perfect for:**
- Software releases and versioned artifacts
@@ -446,20 +435,22 @@ DeltaGlider uses a clean hexagonal architecture:
- Any versioned binary data
**Not ideal for:**
- Already compressed unique files
- Streaming media files
- Already compressed **unique** files
- Streaming or multimedia files
- Frequently changing unstructured data
- Files smaller than 1MB
## Comparison
## Migration from AWS CLI
| Solution | Compression | Speed | Integration | Cost |
|----------|------------|-------|-------------|------|
| **DeltaGlider** | 99%+ | Fast | Drop-in | Open source |
| S3 Versioning | 0% | Native | Built-in | $$ per version |
| Deduplication | 30-50% | Slow | Complex | Enterprise $$$ |
| Git LFS | Good | Slow | Git-only | $ per GB |
| Restic/Borg | 80-90% | Medium | Backup-only | Open source |
Migrating from `aws s3` to `deltaglider` is as simple as changing the command name:
| AWS CLI | DeltaGlider | Compression Benefit |
|---------|------------|---------------------|
| `aws s3 cp file.zip s3://bucket/` | `deltaglider cp file.zip s3://bucket/` | ✅ 99% for similar files |
| `aws s3 cp -r dir/ s3://bucket/` | `deltaglider cp -r dir/ s3://bucket/` | ✅ 99% for archives |
| `aws s3 ls s3://bucket/` | `deltaglider ls s3://bucket/` | - |
| `aws s3 rm s3://bucket/file` | `deltaglider rm s3://bucket/file` | - |
| `aws s3 sync dir/ s3://bucket/` | `deltaglider sync dir/ s3://bucket/` | ✅ 99% incremental |
## Production Ready
@@ -506,18 +497,6 @@ A: Zero. Files without similarity are uploaded directly.
**Q: Is this compatible with S3 encryption?**
A: Yes, DeltaGlider respects all S3 settings including SSE, KMS, and bucket policies.
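For example, a hedged sketch: this assumes DeltaGlider's boto3-compatible `put_object` forwards standard SSE kwargs, per the answer above, and the KMS key alias is a placeholder:
```python
# Assumes put_object forwards boto3 encryption kwargs; adjust to your key setup.
from deltaglider import create_client

client = create_client()
with open('my-app.zip', 'rb') as f:
    client.put_object(
        Bucket='releases',
        Key='v2.0.0/my-app.zip',
        Body=f.read(),
        ServerSideEncryption='aws:kms',
        SSEKMSKeyId='alias/releases-key',  # placeholder alias
    )
```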
## The Math
For `N` versions of a `S` MB file with `D%` difference between versions:
**Traditional S3**: `N × S` MB
**DeltaGlider**: `S + (N-1) × S × D%` MB
Example: 100 versions of 100MB files with 1% difference:
- **Traditional**: 10,000 MB
- **DeltaGlider**: 199 MB
- **Savings**: 98%
## Contributing
We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
@@ -554,4 +533,4 @@ deltaglider analyze s3://your-bucket/
# Output: "Potential savings: 95.2% (4.8TB → 237GB)"
```
Built with ❤️ by engineers who were tired of paying to store the same bytes over and over.

examples/credentials_example.py

View File

@@ -0,0 +1,101 @@
"""Example: Using explicit AWS credentials with DeltaGlider.
This example demonstrates how to pass AWS credentials directly to
DeltaGlider's create_client() function, which is useful when:
1. You need to use different credentials than your environment default
2. You're working with temporary credentials (session tokens)
3. You want to avoid relying on environment variables
4. You're implementing multi-tenant systems with different AWS accounts
"""
from deltaglider import create_client
def example_basic_credentials():
"""Use basic AWS credentials (access key + secret key)."""
client = create_client(
aws_access_key_id="AKIAIOSFODNN7EXAMPLE",
aws_secret_access_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
region_name="us-west-2",
)
# Now use the client normally
# client.put_object(Bucket="my-bucket", Key="file.zip", Body=b"data")
print("✓ Created client with explicit credentials")
def example_temporary_credentials():
"""Use temporary AWS credentials (with session token)."""
client = create_client(
aws_access_key_id="ASIAIOSFODNN7EXAMPLE",
aws_secret_access_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
aws_session_token="FwoGZXIvYXdzEBEaDH...", # From STS
region_name="us-east-1",
)
print("✓ Created client with temporary credentials")
def example_environment_credentials():
"""Use default credential chain (environment variables, IAM role, etc.)."""
# When credentials are omitted, DeltaGlider uses boto3's default credential chain:
# 1. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
# 2. AWS credentials file (~/.aws/credentials)
# 3. IAM role (for EC2 instances)
client = create_client()
print("✓ Created client with default credential chain")
def example_minio_credentials():
"""Use credentials for MinIO or other S3-compatible services."""
client = create_client(
endpoint_url="http://localhost:9000",
aws_access_key_id="minioadmin",
aws_secret_access_key="minioadmin",
)
print("✓ Created client for MinIO with custom credentials")
def example_multi_tenant():
"""Example: Different credentials for different tenants."""
# Tenant A uses one AWS account
tenant_a_client = create_client(
aws_access_key_id="TENANT_A_KEY",
aws_secret_access_key="TENANT_A_SECRET",
region_name="us-west-2",
)
# Tenant B uses a different AWS account
tenant_b_client = create_client(
aws_access_key_id="TENANT_B_KEY",
aws_secret_access_key="TENANT_B_SECRET",
region_name="eu-west-1",
)
print("✓ Created separate clients for multi-tenant scenario")
if __name__ == "__main__":
print("DeltaGlider Credentials Examples\n" + "=" * 40)
print("\n1. Basic credentials:")
example_basic_credentials()
print("\n2. Temporary credentials:")
example_temporary_credentials()
print("\n3. Environment credentials:")
example_environment_credentials()
print("\n4. MinIO credentials:")
example_minio_credentials()
print("\n5. Multi-tenant scenario:")
example_multi_tenant()
print("\n" + "=" * 40)
print("All examples completed successfully!")

View File

@@ -21,13 +21,31 @@ class S3StorageAdapter(StoragePort):
self,
client: Optional["S3Client"] = None,
endpoint_url: str | None = None,
boto3_kwargs: dict[str, Any] | None = None,
):
"""Initialize with S3 client."""
"""Initialize with S3 client.
Args:
client: Pre-configured S3 client (if None, one will be created)
endpoint_url: S3 endpoint URL override (for MinIO, LocalStack, etc.)
boto3_kwargs: Additional kwargs to pass to boto3.client() including:
- aws_access_key_id: AWS access key
- aws_secret_access_key: AWS secret key
- aws_session_token: AWS session token (for temporary credentials)
- region_name: AWS region name
"""
if client is None:
self.client = boto3.client(
"s3",
endpoint_url=endpoint_url or os.environ.get("AWS_ENDPOINT_URL"),
)
# Build boto3 client parameters
client_params: dict[str, Any] = {
"service_name": "s3",
"endpoint_url": endpoint_url or os.environ.get("AWS_ENDPOINT_URL"),
}
# Merge in any additional boto3 kwargs (credentials, region, etc.)
if boto3_kwargs:
client_params.update(boto3_kwargs)
self.client = boto3.client(**client_params)
else:
self.client = client
@@ -145,7 +163,7 @@ class S3StorageAdapter(StoragePort):
try:
response = self.client.get_object(Bucket=bucket, Key=object_key)
return response["Body"] # type: ignore[return-value]
return response["Body"] # type: ignore[no-any-return]
except ClientError as e:
if e.response["Error"]["Code"] == "NoSuchKey":
raise FileNotFoundError(f"Object not found: {key}") from e
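
For completeness, a sketch of constructing the adapter directly with the new parameter; in normal use `create_client()` builds this for you, and the MinIO endpoint and credentials below are placeholders:
```python
# Direct construction with explicit credentials via boto3_kwargs.
adapter = S3StorageAdapter(
    endpoint_url="http://localhost:9000",
    boto3_kwargs={
        "aws_access_key_id": "minioadmin",
        "aws_secret_access_key": "minioadmin",
        "region_name": "us-east-1",
    },
)
```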

View File

@@ -1396,6 +1396,10 @@ def create_client(
endpoint_url: str | None = None,
log_level: str = "INFO",
cache_dir: str = "/tmp/.deltaglider/cache",
aws_access_key_id: str | None = None,
aws_secret_access_key: str | None = None,
aws_session_token: str | None = None,
region_name: str | None = None,
**kwargs: Any,
) -> DeltaGliderClient:
"""Create a DeltaGlider client with boto3-compatible APIs.
@@ -1411,18 +1415,28 @@ def create_client(
endpoint_url: Optional S3 endpoint URL (for MinIO, R2, etc.)
log_level: Logging level
cache_dir: Directory for reference cache
aws_access_key_id: AWS access key ID (None to use environment/IAM)
aws_secret_access_key: AWS secret access key (None to use environment/IAM)
aws_session_token: AWS session token for temporary credentials (None if not using)
region_name: AWS region name (None for default)
**kwargs: Additional arguments
Returns:
DeltaGliderClient instance
Examples:
>>> # Boto3-compatible usage
>>> # Boto3-compatible usage with default credentials
>>> client = create_client()
>>> client.put_object(Bucket='my-bucket', Key='file.zip', Body=b'data')
>>> response = client.get_object(Bucket='my-bucket', Key='file.zip')
>>> data = response['Body'].read()
>>> # With explicit credentials
>>> client = create_client(
... aws_access_key_id='AKIAIOSFODNN7EXAMPLE',
... aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
... )
>>> # Batch operations
>>> results = client.upload_batch(['v1.zip', 'v2.zip'], 's3://bucket/releases/')
@@ -1441,9 +1455,20 @@ def create_client(
XdeltaAdapter,
)
# Build boto3 client kwargs
boto3_kwargs = {}
if aws_access_key_id is not None:
boto3_kwargs["aws_access_key_id"] = aws_access_key_id
if aws_secret_access_key is not None:
boto3_kwargs["aws_secret_access_key"] = aws_secret_access_key
if aws_session_token is not None:
boto3_kwargs["aws_session_token"] = aws_session_token
if region_name is not None:
boto3_kwargs["region_name"] = region_name
# Create adapters
hasher = Sha256Adapter()
storage = S3StorageAdapter(endpoint_url=endpoint_url)
storage = S3StorageAdapter(endpoint_url=endpoint_url, boto3_kwargs=boto3_kwargs)
diff = XdeltaAdapter()
cache = FsCacheAdapter(Path(cache_dir), hasher)
clock = UtcClockAdapter()

View File

@@ -15,10 +15,19 @@ from deltaglider.app.cli.main import cli
def extract_json_from_cli_output(output: str) -> dict:
"""Extract JSON from CLI output that may contain log messages."""
lines = output.split("\n")
json_start = next(i for i, line in enumerate(lines) if line.strip().startswith("{"))
json_end = next(i for i in range(json_start, len(lines)) if lines[i].strip() == "}") + 1
json_text = "\n".join(lines[json_start:json_end])
return json.loads(json_text)
for i, line in enumerate(lines):
if line.strip().startswith("{"):
json_start = i
json_end = (
next(
(j for j in range(json_start, len(lines)) if lines[j].strip() == "}"),
len(lines) - 1,
)
+ 1
)
json_text = "\n".join(lines[json_start:json_end])
return json.loads(json_text)
raise ValueError("No JSON found in CLI output")
@pytest.mark.e2e
@@ -74,23 +83,25 @@ class TestLocalStackE2E:
# Upload first file (becomes reference)
result = runner.invoke(cli, ["cp", str(file1), f"s3://{test_bucket}/plugins/"])
assert result.exit_code == 0
output1 = extract_json_from_cli_output(result.output)
assert output1["operation"] == "create_reference"
assert output1["key"] == "plugins/reference.bin"
assert "reference" in result.output.lower() or "upload:" in result.output
# Verify reference was created
objects = s3_client.list_objects_v2(Bucket=test_bucket, Prefix="plugins/")
# Verify reference was created (deltaspace is root, files are at root level)
objects = s3_client.list_objects_v2(Bucket=test_bucket)
assert "Contents" in objects
keys = [obj["Key"] for obj in objects["Contents"]]
assert "plugins/reference.bin" in keys
assert "plugins/plugin-v1.0.0.zip.delta" in keys
# Files are stored at root level: reference.bin and plugin-v1.0.0.zip.delta
assert "reference.bin" in keys
assert "plugin-v1.0.0.zip.delta" in keys
# Upload second file (creates delta)
result = runner.invoke(cli, ["cp", str(file2), f"s3://{test_bucket}/plugins/"])
assert result.exit_code == 0
output2 = extract_json_from_cli_output(result.output)
assert output2["operation"] == "create_delta"
assert output2["key"] == "plugins/plugin-v1.0.1.zip.delta"
assert "delta_ratio" in output2
assert "upload:" in result.output
# Verify delta was created
objects = s3_client.list_objects_v2(Bucket=test_bucket)
keys = [obj["Key"] for obj in objects["Contents"]]
assert "plugin-v1.0.1.zip.delta" in keys
# Download and verify second file
output_file = tmpdir / "downloaded.zip"
@@ -98,7 +109,7 @@ class TestLocalStackE2E:
cli,
[
"cp",
f"s3://{test_bucket}/plugins/plugin-v1.0.1.zip.delta",
f"s3://{test_bucket}/plugin-v1.0.1.zip.delta",
str(output_file),
],
)
@@ -108,41 +119,42 @@ class TestLocalStackE2E:
# Verify integrity
result = runner.invoke(
cli,
["verify", f"s3://{test_bucket}/plugins/plugin-v1.0.1.zip.delta"],
["verify", f"s3://{test_bucket}/plugin-v1.0.1.zip.delta"],
)
assert result.exit_code == 0
verify_output = extract_json_from_cli_output(result.output)
assert verify_output["valid"] is True
def test_multiple_deltaspaces(self, test_bucket, s3_client):
"""Test multiple deltaspace directories with separate references."""
"""Test shared deltaspace with multiple files."""
runner = CliRunner()
with tempfile.TemporaryDirectory() as tmpdir:
tmpdir = Path(tmpdir)
# Create test files for different deltaspaces
# Create test files for the same deltaspace
file_a1 = tmpdir / "app-a-v1.zip"
file_a1.write_text("Application A version 1")
file_b1 = tmpdir / "app-b-v1.zip"
file_b1.write_text("Application B version 1")
# Upload to different deltaspaces
# Upload to same deltaspace (apps/) with different target paths
result = runner.invoke(cli, ["cp", str(file_a1), f"s3://{test_bucket}/apps/app-a/"])
assert result.exit_code == 0
result = runner.invoke(cli, ["cp", str(file_b1), f"s3://{test_bucket}/apps/app-b/"])
assert result.exit_code == 0
# Verify each deltaspace has its own reference
objects_a = s3_client.list_objects_v2(Bucket=test_bucket, Prefix="apps/app-a/")
keys_a = [obj["Key"] for obj in objects_a["Contents"]]
assert "apps/app-a/reference.bin" in keys_a
objects_b = s3_client.list_objects_v2(Bucket=test_bucket, Prefix="apps/app-b/")
keys_b = [obj["Key"] for obj in objects_b["Contents"]]
assert "apps/app-b/reference.bin" in keys_b
# Verify deltaspace has reference (both files share apps/ deltaspace)
objects = s3_client.list_objects_v2(Bucket=test_bucket, Prefix="apps/")
assert "Contents" in objects
keys = [obj["Key"] for obj in objects["Contents"]]
# Should have: apps/reference.bin, apps/app-a-v1.zip.delta, apps/app-b-v1.zip.delta
# Both files share the same deltaspace (apps/) so only one reference
assert "apps/reference.bin" in keys
assert "apps/app-a-v1.zip.delta" in keys
assert "apps/app-b-v1.zip.delta" in keys
def test_large_delta_warning(self, test_bucket, s3_client):
"""Test delta compression with different content."""
@@ -174,9 +186,11 @@ class TestLocalStackE2E:
], # Very low threshold
)
assert result.exit_code == 0
# Even with completely different content, xdelta3 is efficient
output = extract_json_from_cli_output(result.output)
assert output["operation"] == "create_delta"
# Delta ratio should be small even for different files (xdelta3 is very efficient)
assert "delta_ratio" in output
assert output["delta_ratio"] > 0.01 # Should exceed the very low threshold we set
# Should still upload successfully even though delta exceeds threshold
assert "upload:" in result.output
# Verify delta was created
objects = s3_client.list_objects_v2(Bucket=test_bucket)
assert "Contents" in objects
keys = [obj["Key"] for obj in objects["Contents"]]
assert "file2.zip.delta" in keys

View File

@@ -146,6 +146,68 @@ def client(tmp_path):
return client
class TestCredentialHandling:
"""Test AWS credential passing."""
def test_create_client_with_explicit_credentials(self, tmp_path):
"""Test that credentials can be passed directly to create_client."""
# This test verifies the API accepts credentials, not that they work
# (we'd need a real S3 or LocalStack for that)
client = create_client(
aws_access_key_id="AKIAIOSFODNN7EXAMPLE",
aws_secret_access_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
region_name="us-west-2",
cache_dir=str(tmp_path / "cache"),
)
# Verify the client was created
assert client is not None
assert client.service is not None
# Verify credentials were passed to the storage adapter's boto3 client
# The storage adapter should have a client with these credentials
storage = client.service.storage
assert hasattr(storage, "client")
# Check that the boto3 client was configured with our credentials
# Note: boto3 doesn't expose credentials directly, so we only verify
# that client construction accepted them (boto3 validates credentials
# on the first request, not at creation time)
assert storage.client is not None
def test_create_client_with_session_token(self, tmp_path):
"""Test passing temporary credentials with session token."""
client = create_client(
aws_access_key_id="ASIAIOSFODNN7EXAMPLE",
aws_secret_access_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
aws_session_token="FwoGZXIvYXdzEBEaDH...",
cache_dir=str(tmp_path / "cache"),
)
assert client is not None
assert client.service.storage.client is not None
def test_create_client_without_credentials_uses_environment(self, tmp_path):
"""Test that omitting credentials falls back to environment/IAM."""
# This should use boto3's default credential chain
client = create_client(cache_dir=str(tmp_path / "cache"))
assert client is not None
assert client.service.storage.client is not None
def test_create_client_with_endpoint_and_credentials(self, tmp_path):
"""Test passing both endpoint URL and credentials."""
client = create_client(
endpoint_url="http://localhost:9000",
aws_access_key_id="minioadmin",
aws_secret_access_key="minioadmin",
cache_dir=str(tmp_path / "cache"),
)
assert client is not None
# Endpoint should be available
assert client.endpoint_url == "http://localhost:9000"
class TestBoto3Compatibility:
"""Test boto3-compatible methods."""