mirror of
https://github.com/beshu-tech/deltaglider.git
synced 2026-03-29 05:11:52 +02:00
Add simplified SDK client API and comprehensive documentation
- Create DeltaGliderClient with a user-friendly interface
- Add create_client() factory function with sensible defaults
- Implement UploadSummary dataclass with helpful properties
- Expose the simplified API through the main package
- Add comprehensive SDK documentation under docs/sdk/:
  - Getting started guide with installation and examples
  - Complete API reference documentation
  - Real-world usage examples for 8 common scenarios
  - Architecture deep dive explaining how DeltaGlider works
  - Automatic documentation generation scripts
- Update CONTRIBUTING.md with SDK documentation guidelines
- All tests pass and code quality checks succeed

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
583 docs/sdk/api.md Normal file
# DeltaGlider API Reference

Complete API documentation for the DeltaGlider Python SDK.

## Table of Contents

- [Client Creation](#client-creation)
- [DeltaGliderClient](#deltagliderclient)
- [UploadSummary](#uploadsummary)
- [DeltaService](#deltaservice)
- [Models](#models)
- [Exceptions](#exceptions)
- [Environment Variables](#environment-variables)
- [Thread Safety](#thread-safety)
- [Performance Considerations](#performance-considerations)
- [Logging](#logging)
- [Version Compatibility](#version-compatibility)
- [Support](#support)
## Client Creation

### `create_client`

Factory function to create a configured DeltaGlider client with sensible defaults.

```python
def create_client(
    endpoint_url: Optional[str] = None,
    log_level: str = "INFO",
    cache_dir: str = "/tmp/.deltaglider/cache",
    **kwargs
) -> DeltaGliderClient
```

#### Parameters

- **endpoint_url** (`Optional[str]`): S3 endpoint URL for MinIO, R2, or other S3-compatible storage. If None, uses AWS S3.
- **log_level** (`str`): Logging verbosity level. Options: "DEBUG", "INFO", "WARNING", "ERROR". Default: "INFO".
- **cache_dir** (`str`): Directory for the local reference cache. Default: "/tmp/.deltaglider/cache".
- **kwargs**: Additional arguments passed to `DeltaService`:
  - **tool_version** (`str`): Version string for metadata. Default: "deltaglider/0.1.0"
  - **max_ratio** (`float`): Maximum acceptable delta/file ratio. Default: 0.5

#### Returns

`DeltaGliderClient`: Configured client instance ready for use.
#### Examples

```python
# Default AWS S3 configuration
client = create_client()

# Custom endpoint for MinIO
client = create_client(endpoint_url="http://localhost:9000")

# Debug mode with custom cache
client = create_client(
    log_level="DEBUG",
    cache_dir="/var/cache/deltaglider"
)

# Custom delta ratio threshold
client = create_client(max_ratio=0.3)  # Only use delta if <30% of original
```
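The `max_ratio` threshold reduces to a simple size comparison: a delta is kept only when it is small enough relative to the original file. A minimal pure-Python sketch of that decision rule (illustrative only; the actual logic lives inside `DeltaService`):

```python
def should_store_as_delta(delta_size: int, original_size: int,
                          max_ratio: float = 0.5) -> bool:
    """Keep the delta only if it is at most max_ratio of the original size."""
    if original_size == 0:
        return False  # nothing to compare against
    return delta_size / original_size <= max_ratio

# A 10 MB file whose delta is 2 MB easily passes the default 0.5 threshold
print(should_store_as_delta(2_000_000, 10_000_000))       # True
# With max_ratio=0.1 the same delta is rejected; the full file is stored
print(should_store_as_delta(2_000_000, 10_000_000, 0.1))  # False
```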
## DeltaGliderClient

Main client class for interacting with DeltaGlider.

### Constructor

```python
class DeltaGliderClient:
    def __init__(
        self,
        service: DeltaService,
        endpoint_url: Optional[str] = None
    )
```

**Note**: Use `create_client()` instead of instantiating directly.

### Methods
#### `upload`

Upload a file to S3 with automatic delta compression.

```python
def upload(
    self,
    file_path: str | Path,
    s3_url: str,
    tags: Optional[Dict[str, str]] = None,
    max_ratio: float = 0.5
) -> UploadSummary
```

##### Parameters

- **file_path** (`str | Path`): Local file path to upload.
- **s3_url** (`str`): S3 destination URL in the format `s3://bucket/prefix/`.
- **tags** (`Optional[Dict[str, str]]`): S3 object tags to attach. (Future feature)
- **max_ratio** (`float`): Maximum acceptable delta/file size ratio. Default: 0.5.

##### Returns

`UploadSummary`: Object containing upload statistics and compression details.

##### Raises

- `FileNotFoundError`: If the local file doesn't exist.
- `ValueError`: If the S3 URL is invalid.
- `PermissionError`: If S3 access is denied.

##### Examples

```python
# Simple upload
summary = client.upload("app.zip", "s3://releases/v1.0.0/")

# With custom compression threshold
summary = client.upload(
    "large-file.tar.gz",
    "s3://backups/",
    max_ratio=0.3  # Only use delta if compression > 70%
)

# Check results
if summary.is_delta:
    print(f"Stored as delta: {summary.stored_size_mb:.1f} MB")
else:
    print(f"Stored as full file: {summary.original_size_mb:.1f} MB")
```
#### `download`

Download and reconstruct a file from S3.

```python
def download(
    self,
    s3_url: str,
    output_path: str | Path
) -> None
```

##### Parameters

- **s3_url** (`str`): S3 source URL in the format `s3://bucket/key`.
- **output_path** (`str | Path`): Local destination path.

##### Returns

None. The file is written to `output_path`.

##### Raises

- `ValueError`: If the S3 URL is invalid or missing a key.
- `FileNotFoundError`: If the S3 object doesn't exist.
- `PermissionError`: If the local path is not writable or S3 access is denied.

##### Examples

```python
# Download a file
client.download("s3://releases/v1.0.0/app.zip", "downloaded.zip")

# Auto-detects the .delta suffix if needed:
# tries app.zip first, then app.zip.delta if not found
client.download("s3://releases/v1.0.0/app.zip", "app.zip")

# Download to a specific directory
from pathlib import Path
output = Path("/tmp/downloads/app.zip")
output.parent.mkdir(parents=True, exist_ok=True)
client.download("s3://releases/v1.0.0/app.zip", output)
```
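The `.delta` fallback described above amounts to a two-step key resolution. A hedged sketch, with a hypothetical `exists` callback standing in for a real S3 HEAD request:

```python
from typing import Callable, Optional

def resolve_key(key: str, exists: Callable[[str], bool]) -> Optional[str]:
    """Try the literal key first, then the .delta variant, as download() does."""
    if exists(key):
        return key
    if exists(key + ".delta"):
        return key + ".delta"
    return None  # caller raises FileNotFoundError

# Simulate a bucket that only holds the delta variant
stored = {"releases/v1.0.0/app.zip.delta"}
print(resolve_key("releases/v1.0.0/app.zip", stored.__contains__))
# releases/v1.0.0/app.zip.delta
```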
#### `verify`

Verify the integrity of a stored file using SHA256 checksums.

```python
def verify(
    self,
    s3_url: str
) -> bool
```

##### Parameters

- **s3_url** (`str`): S3 URL of the file to verify.

##### Returns

`bool`: True if verification passed, False if the file is corrupted.

##### Raises

- `ValueError`: If the S3 URL is invalid.
- `FileNotFoundError`: If the S3 object doesn't exist.

##### Examples

```python
# Verify file integrity
is_valid = client.verify("s3://releases/v1.0.0/app.zip")

if is_valid:
    print("✓ File integrity verified")
else:
    print("✗ File is corrupted!")
    # Re-upload or investigate
```
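Conceptually, verification reduces to recomputing a SHA256 digest over the reconstructed bytes and comparing it with the recorded one. A self-contained sketch of that comparison:

```python
import hashlib

def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Recompute SHA256 and compare against the recorded digest."""
    return hashlib.sha256(data).hexdigest() == expected_hex

payload = b"release artifact bytes"
digest = hashlib.sha256(payload).hexdigest()
print(sha256_matches(payload, digest))       # True
print(sha256_matches(b"corrupted", digest))  # False
```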
#### `lifecycle_policy`

Set a lifecycle policy for an S3 prefix (placeholder for future implementation).

```python
def lifecycle_policy(
    self,
    s3_prefix: str,
    days_before_archive: int = 30,
    days_before_delete: int = 90
) -> None
```

**Note**: This method is a placeholder for future S3 lifecycle policy management.
## UploadSummary

Data class containing upload operation results.

```python
@dataclass
class UploadSummary:
    operation: str            # Operation type: "PUT" or "PUT_DELTA"
    bucket: str               # S3 bucket name
    key: str                  # S3 object key
    original_size: int        # Original file size in bytes
    stored_size: int          # Actual stored size in bytes
    is_delta: bool            # Whether delta compression was used
    delta_ratio: float = 0.0  # Ratio of delta size to original
```

### Properties
#### `original_size_mb`

Original file size in megabytes.

```python
@property
def original_size_mb(self) -> float
```

#### `stored_size_mb`

Stored size in megabytes (after compression, if applicable).

```python
@property
def stored_size_mb(self) -> float
```

#### `savings_percent`

Percentage saved through compression.

```python
@property
def savings_percent(self) -> float
```
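Assuming these properties are straightforward derivations from the dataclass fields (not confirmed against the source, but consistent with how they are used elsewhere in this reference), they can be sketched as:

```python
from dataclasses import dataclass

@dataclass
class UploadSummarySketch:
    """Illustrative mirror of UploadSummary's size fields."""
    original_size: int  # bytes
    stored_size: int    # bytes

    @property
    def original_size_mb(self) -> float:
        return self.original_size / (1024 * 1024)

    @property
    def stored_size_mb(self) -> float:
        return self.stored_size / (1024 * 1024)

    @property
    def savings_percent(self) -> float:
        if self.original_size == 0:
            return 0.0
        return (1 - self.stored_size / self.original_size) * 100

s = UploadSummarySketch(original_size=100 * 1024 * 1024,
                        stored_size=5 * 1024 * 1024)
print(f"{s.original_size_mb:.0f} MB -> {s.stored_size_mb:.0f} MB, "
      f"saved {s.savings_percent:.0f}%")
# 100 MB -> 5 MB, saved 95%
```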
### Example Usage

```python
summary = client.upload("app.zip", "s3://releases/")

print(f"Operation: {summary.operation}")
print(f"Location: s3://{summary.bucket}/{summary.key}")
print(f"Original: {summary.original_size_mb:.1f} MB")
print(f"Stored: {summary.stored_size_mb:.1f} MB")
print(f"Saved: {summary.savings_percent:.0f}%")
print(f"Delta used: {summary.is_delta}")

if summary.is_delta:
    print(f"Delta ratio: {summary.delta_ratio:.2%}")
```
## DeltaService

Core service class handling the delta compression logic.

```python
class DeltaService:
    def __init__(
        self,
        storage: StoragePort,
        diff: DiffPort,
        hasher: HashPort,
        cache: CachePort,
        clock: ClockPort,
        logger: LoggerPort,
        metrics: MetricsPort,
        tool_version: str = "deltaglider/0.1.0",
        max_ratio: float = 0.5
    )
```

### Methods

#### `put`

Upload a file with automatic delta compression.

```python
def put(
    self,
    file: Path,
    delta_space: DeltaSpace,
    max_ratio: Optional[float] = None
) -> PutSummary
```

#### `get`

Download and reconstruct a file.

```python
def get(
    self,
    object_key: ObjectKey,
    output_path: Path
) -> GetSummary
```

#### `verify`

Verify file integrity.

```python
def verify(
    self,
    object_key: ObjectKey
) -> VerifyResult
```
## Models

### DeltaSpace

Represents a compression space in S3.

```python
@dataclass(frozen=True)
class DeltaSpace:
    bucket: str  # S3 bucket name
    prefix: str  # S3 prefix for related files
```

### ObjectKey

Represents an S3 object location.

```python
@dataclass(frozen=True)
class ObjectKey:
    bucket: str  # S3 bucket name
    key: str     # S3 object key
```
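An `s3://bucket/key` URL maps naturally onto these models. A hypothetical parser showing that mapping (the SDK's actual URL handling may differ; `ObjectKeySketch` is an illustrative stand-in, not the real class):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ObjectKeySketch:
    bucket: str
    key: str

def parse_s3_url(url: str) -> ObjectKeySketch:
    """Split s3://bucket/key into its components."""
    if not url.startswith("s3://"):
        raise ValueError(f"Not an S3 URL: {url}")
    bucket, _, key = url[len("s3://"):].partition("/")
    if not bucket or not key:
        raise ValueError(f"Missing bucket or key: {url}")
    return ObjectKeySketch(bucket=bucket, key=key)

print(parse_s3_url("s3://releases/v1.0.0/app.zip"))
# ObjectKeySketch(bucket='releases', key='v1.0.0/app.zip')
```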
### PutSummary

Detailed upload operation results.

```python
@dataclass
class PutSummary:
    operation: str                 # "PUT" or "PUT_DELTA"
    bucket: str                    # S3 bucket
    key: str                       # S3 key
    file_size: int                 # Original file size
    file_hash: str                 # SHA256 of original file
    delta_size: Optional[int]      # Size of delta (if used)
    delta_hash: Optional[str]      # SHA256 of delta
    delta_ratio: Optional[float]   # Delta/original ratio
    reference_hash: Optional[str]  # Reference file hash
```

### GetSummary

Download operation results.

```python
@dataclass
class GetSummary:
    operation: str       # "GET" or "GET_DELTA"
    bucket: str          # S3 bucket
    key: str             # S3 key
    size: int            # Downloaded size
    hash: str            # SHA256 hash
    reconstructed: bool  # Whether reconstruction was needed
```

### VerifyResult

Verification operation results.

```python
@dataclass
class VerifyResult:
    valid: bool                 # Verification result
    operation: str              # "VERIFY" or "VERIFY_DELTA"
    expected_hash: str          # Expected SHA256
    actual_hash: Optional[str]  # Actual SHA256 (if computed)
    details: Optional[str]      # Error details if invalid
```
## Exceptions

DeltaGlider uses standard Python exceptions with descriptive messages:

### Common Exceptions

- **FileNotFoundError**: Local file or S3 object not found
- **PermissionError**: Access denied (S3 or local filesystem)
- **ValueError**: Invalid parameters (malformed URLs, invalid ratios)
- **IOError**: I/O operation failed
- **RuntimeError**: xdelta3 binary not found or failed

### Exception Handling Example

```python
from deltaglider import create_client

client = create_client()

try:
    summary = client.upload("app.zip", "s3://bucket/path/")

except FileNotFoundError as e:
    print(f"File not found: {e}")

except PermissionError as e:
    print(f"Permission denied: {e}")
    print("Check AWS credentials and S3 bucket permissions")

except ValueError as e:
    print(f"Invalid parameters: {e}")

except RuntimeError as e:
    print(f"System error: {e}")
    print("Ensure xdelta3 is installed: apt-get install xdelta3")

except Exception as e:
    print(f"Unexpected error: {e}")
    # Log for investigation
    import traceback
    traceback.print_exc()
```
## Environment Variables

DeltaGlider respects these environment variables:

### AWS Configuration

- **AWS_ACCESS_KEY_ID**: AWS access key
- **AWS_SECRET_ACCESS_KEY**: AWS secret key
- **AWS_DEFAULT_REGION**: AWS region (default: us-east-1)
- **AWS_ENDPOINT_URL**: Custom S3 endpoint (for MinIO/R2)
- **AWS_PROFILE**: AWS profile to use

### DeltaGlider Configuration

- **DG_LOG_LEVEL**: Logging level (DEBUG, INFO, WARNING, ERROR)
- **DG_CACHE_DIR**: Local cache directory
- **DG_MAX_RATIO**: Default maximum delta ratio

### Example

```bash
# Configure for MinIO
export AWS_ENDPOINT_URL=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin

# Configure DeltaGlider
export DG_LOG_LEVEL=DEBUG
export DG_CACHE_DIR=/var/cache/deltaglider
export DG_MAX_RATIO=0.3

# Now use normally
python my_script.py
```
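One plausible way the `DG_*` variables map onto `create_client` arguments is shown below. This is hypothetical glue code using the defaults documented above; the SDK may read these variables internally and differently:

```python
import os

def client_kwargs_from_env() -> dict:
    """Collect DeltaGlider settings from the environment,
    falling back to the defaults documented in this reference."""
    return {
        "endpoint_url": os.environ.get("AWS_ENDPOINT_URL"),  # None -> AWS S3
        "log_level": os.environ.get("DG_LOG_LEVEL", "INFO"),
        "cache_dir": os.environ.get("DG_CACHE_DIR", "/tmp/.deltaglider/cache"),
        "max_ratio": float(os.environ.get("DG_MAX_RATIO", "0.5")),
    }

# client = create_client(**client_kwargs_from_env())
```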
## Thread Safety

DeltaGlider clients are thread-safe for read operations but should not be shared across threads for write operations. For multi-threaded applications, create one client per thread:

```python
import threading
from deltaglider import create_client

def worker(file_path, s3_url):
    client = create_client()  # Each thread gets its own client
    summary = client.upload(file_path, s3_url)
    print(f"Thread {threading.current_thread().name}: {summary.savings_percent:.0f}%")

# (path, destination) pairs to upload
files_to_upload = [
    ("app-v1.zip", "s3://releases/v1/"),
    ("app-v2.zip", "s3://releases/v2/"),
]

# Create threads
threads = []
for i, (file, url) in enumerate(files_to_upload):
    t = threading.Thread(target=worker, args=(file, url), name=f"Worker-{i}")
    threads.append(t)
    t.start()

# Wait for completion
for t in threads:
    t.join()
```
## Performance Considerations

### Upload Performance

- **First file**: No compression overhead (it becomes the reference)
- **Similar files**: 3-4 files/second with compression
- **Network bound**: Limited by S3 upload speed
- **CPU bound**: xdelta3 compression for large files

### Download Performance

- **Direct files**: Limited by S3 download speed
- **Delta files**: <100 ms reconstruction overhead
- **Cache hits**: Near-instant for cached references

### Optimization Tips

1. **Group related files**: Upload similar files to the same prefix
2. **Batch operations**: Use concurrent uploads for independent files
3. **Cache management**: Don't clear the cache during operations
4. **Compression threshold**: Tune `max_ratio` for your use case
5. **Network optimization**: Use S3 Transfer Acceleration if available
## Logging

DeltaGlider uses Python's standard logging framework:

```python
import logging

# Configure logging before creating the client
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('deltaglider.log'),
        logging.StreamHandler()
    ]
)

# Create the client (it will use the configured logging)
client = create_client(log_level="DEBUG")
```

### Log Levels

- **DEBUG**: Detailed operations, xdelta3 commands
- **INFO**: Normal operations, compression statistics
- **WARNING**: Non-critical issues, fallbacks
- **ERROR**: Operation failures, exceptions
## Version Compatibility

- **Python**: 3.11 or higher required
- **boto3**: 1.35.0 or higher
- **xdelta3**: System binary required
- **S3 API**: Compatible with the S3 API (Signature Version 4)

## Support

- **GitHub Issues**: [github.com/beshu-tech/deltaglider/issues](https://github.com/beshu-tech/deltaglider/issues)
- **Documentation**: [github.com/beshu-tech/deltaglider](https://github.com/beshu-tech/deltaglider)
- **PyPI Package**: [pypi.org/project/deltaglider](https://pypi.org/project/deltaglider)