Add simplified SDK client API and comprehensive documentation

- Create DeltaGliderClient with user-friendly interface
- Add create_client() factory function with sensible defaults
- Implement UploadSummary dataclass with helpful properties
- Expose simplified API through main package
- Add comprehensive SDK documentation under docs/sdk/:
  - Getting started guide with installation and examples
  - Complete API reference documentation
  - Real-world usage examples for 8 common scenarios
  - Architecture deep dive explaining how DeltaGlider works
  - Automatic documentation generation scripts
- Update CONTRIBUTING.md with SDK documentation guidelines
- All tests pass and code quality checks succeed

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Simone Scarduzio · 2025-09-23 13:44:38 +02:00
commit edcbd2c7d0 (parent f08960b6c5) · 10 changed files, 3136 additions, 1 deletion

docs/sdk/api.md
@@ -0,0 +1,583 @@
# DeltaGlider API Reference
Complete API documentation for the DeltaGlider Python SDK.
## Table of Contents
- [Client Creation](#client-creation)
- [DeltaGliderClient](#deltagliderclient)
- [UploadSummary](#uploadsummary)
- [DeltaService](#deltaservice)
- [Models](#models)
- [Exceptions](#exceptions)
- [Environment Variables](#environment-variables)
- [Thread Safety](#thread-safety)
- [Performance Considerations](#performance-considerations)
- [Logging](#logging)
- [Version Compatibility](#version-compatibility)
- [Support](#support)
## Client Creation
### `create_client`
Factory function to create a configured DeltaGlider client with sensible defaults.
```python
def create_client(
    endpoint_url: Optional[str] = None,
    log_level: str = "INFO",
    cache_dir: str = "/tmp/.deltaglider/cache",
    **kwargs,
) -> DeltaGliderClient
```
#### Parameters
- **endpoint_url** (`Optional[str]`): S3 endpoint URL for MinIO, R2, or other S3-compatible storage. If None, uses AWS S3.
- **log_level** (`str`): Logging verbosity level. Options: "DEBUG", "INFO", "WARNING", "ERROR". Default: "INFO".
- **cache_dir** (`str`): Directory for local reference cache. Default: "/tmp/.deltaglider/cache".
- **kwargs**: Additional arguments passed to `DeltaService`:
- **tool_version** (`str`): Version string for metadata. Default: "deltaglider/0.1.0"
- **max_ratio** (`float`): Maximum acceptable delta/file ratio. Default: 0.5
#### Returns
`DeltaGliderClient`: Configured client instance ready for use.
#### Examples
```python
# Default AWS S3 configuration
client = create_client()

# Custom endpoint for MinIO
client = create_client(endpoint_url="http://localhost:9000")

# Debug mode with custom cache
client = create_client(
    log_level="DEBUG",
    cache_dir="/var/cache/deltaglider",
)

# Custom delta ratio threshold
client = create_client(max_ratio=0.3)  # Only use delta if <30% of original
```
## DeltaGliderClient
Main client class for interacting with DeltaGlider.
### Constructor
```python
class DeltaGliderClient:
    def __init__(
        self,
        service: DeltaService,
        endpoint_url: Optional[str] = None,
    )
```
**Note**: Use `create_client()` instead of instantiating directly.
### Methods
#### `upload`
Upload a file to S3 with automatic delta compression.
```python
def upload(
    self,
    file_path: str | Path,
    s3_url: str,
    tags: Optional[Dict[str, str]] = None,
    max_ratio: float = 0.5,
) -> UploadSummary
```
##### Parameters
- **file_path** (`str | Path`): Local file path to upload.
- **s3_url** (`str`): S3 destination URL in format `s3://bucket/prefix/`.
- **tags** (`Optional[Dict[str, str]]`): S3 object tags to attach. (Future feature)
- **max_ratio** (`float`): Maximum acceptable delta/file size ratio. Default: 0.5.
##### Returns
`UploadSummary`: Object containing upload statistics and compression details.
##### Raises
- `FileNotFoundError`: If local file doesn't exist.
- `ValueError`: If S3 URL is invalid.
- `PermissionError`: If S3 access is denied.
##### Examples
```python
# Simple upload
summary = client.upload("app.zip", "s3://releases/v1.0.0/")

# With custom compression threshold
summary = client.upload(
    "large-file.tar.gz",
    "s3://backups/",
    max_ratio=0.3,  # Only use delta if compression > 70%
)

# Check results
if summary.is_delta:
    print(f"Stored as delta: {summary.stored_size_mb:.1f} MB")
else:
    print(f"Stored as full file: {summary.original_size_mb:.1f} MB")
```
#### `download`
Download and reconstruct a file from S3.
```python
def download(
    self,
    s3_url: str,
    output_path: str | Path,
) -> None
```
##### Parameters
- **s3_url** (`str`): S3 source URL in format `s3://bucket/key`.
- **output_path** (`str | Path`): Local destination path.
##### Returns
None. File is written to `output_path`.
##### Raises
- `ValueError`: If S3 URL is invalid or missing key.
- `FileNotFoundError`: If S3 object doesn't exist.
- `PermissionError`: If local path is not writable or S3 access denied.
##### Examples
```python
# Download a file
client.download("s3://releases/v1.0.0/app.zip", "downloaded.zip")

# Auto-detects the .delta suffix if needed: tries app.zip first,
# then app.zip.delta if not found
client.download("s3://releases/v1.0.0/app.zip", "app.zip")

# Download to a specific directory
from pathlib import Path

output = Path("/tmp/downloads/app.zip")
output.parent.mkdir(parents=True, exist_ok=True)
client.download("s3://releases/v1.0.0/app.zip", output)
```
#### `verify`
Verify the integrity of a stored file using SHA256 checksums.
```python
def verify(
    self,
    s3_url: str,
) -> bool
```
##### Parameters
- **s3_url** (`str`): S3 URL of the file to verify.
##### Returns
`bool`: True if verification passed, False if corrupted.
##### Raises
- `ValueError`: If S3 URL is invalid.
- `FileNotFoundError`: If S3 object doesn't exist.
##### Examples
```python
# Verify file integrity
is_valid = client.verify("s3://releases/v1.0.0/app.zip")
if is_valid:
    print("✓ File integrity verified")
else:
    print("✗ File is corrupted!")
    # Re-upload or investigate
```
#### `lifecycle_policy`
Set lifecycle policy for S3 prefix (placeholder for future implementation).
```python
def lifecycle_policy(
    self,
    s3_prefix: str,
    days_before_archive: int = 30,
    days_before_delete: int = 90,
) -> None
```
**Note**: This method is a placeholder for future S3 lifecycle policy management.
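Since the method currently performs no action, the call below is only a sketch of the intended interface; the prefix format shown is an assumption.
```python
# Placeholder API: a no-op today; this only illustrates the intended call shape.
client.lifecycle_policy(
    "s3://releases/",
    days_before_archive=30,
    days_before_delete=90,
)
```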
## UploadSummary
Data class containing upload operation results.
```python
@dataclass
class UploadSummary:
    operation: str            # Operation type: "PUT" or "PUT_DELTA"
    bucket: str               # S3 bucket name
    key: str                  # S3 object key
    original_size: int        # Original file size in bytes
    stored_size: int          # Actual stored size in bytes
    is_delta: bool            # Whether delta compression was used
    delta_ratio: float = 0.0  # Ratio of delta size to original
```
### Properties
#### `original_size_mb`
Original file size in megabytes.
```python
@property
def original_size_mb(self) -> float
```
#### `stored_size_mb`
Stored size in megabytes (after compression if applicable).
```python
@property
def stored_size_mb(self) -> float
```
#### `savings_percent`
Percentage saved through compression.
```python
@property
def savings_percent(self) -> float
```
### Example Usage
```python
summary = client.upload("app.zip", "s3://releases/")

print(f"Operation: {summary.operation}")
print(f"Location: s3://{summary.bucket}/{summary.key}")
print(f"Original: {summary.original_size_mb:.1f} MB")
print(f"Stored: {summary.stored_size_mb:.1f} MB")
print(f"Saved: {summary.savings_percent:.0f}%")
print(f"Delta used: {summary.is_delta}")

if summary.is_delta:
    print(f"Delta ratio: {summary.delta_ratio:.2%}")
```
## DeltaService
Core service class handling delta compression logic.
```python
class DeltaService:
    def __init__(
        self,
        storage: StoragePort,
        diff: DiffPort,
        hasher: HashPort,
        cache: CachePort,
        clock: ClockPort,
        logger: LoggerPort,
        metrics: MetricsPort,
        tool_version: str = "deltaglider/0.1.0",
        max_ratio: float = 0.5,
    )
```
### Methods
#### `put`
Upload a file with automatic delta compression.
```python
def put(
    self,
    file: Path,
    delta_space: DeltaSpace,
    max_ratio: Optional[float] = None,
) -> PutSummary
```
#### `get`
Download and reconstruct a file.
```python
def get(
    self,
    object_key: ObjectKey,
    output_path: Path,
) -> GetSummary
```
#### `verify`
Verify file integrity.
```python
def verify(
    self,
    object_key: ObjectKey,
) -> VerifyResult
```
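For orientation, here is a minimal sketch of how these methods compose. It assumes a `service` variable already bound to a wired `DeltaService` (for example, the one `create_client()` constructs internally) and `DeltaSpace`/`ObjectKey` imported from the package; the port wiring and exact import paths are omitted here.
```python
from pathlib import Path

# `service` is assumed to be an already-wired DeltaService instance;
# DeltaSpace and ObjectKey are the core models documented below.
space = DeltaSpace(bucket="releases", prefix="v1.0.0")
put = service.put(Path("app.zip"), space)      # stores a full file or a delta
print(put.operation, put.delta_ratio)

key = ObjectKey(bucket="releases", key="v1.0.0/app.zip")
get = service.get(key, Path("restored.zip"))   # reconstructs if stored as delta
print(get.reconstructed)
```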
## Models
### DeltaSpace
Represents a compression space in S3.
```python
@dataclass(frozen=True)
class DeltaSpace:
    bucket: str  # S3 bucket name
    prefix: str  # S3 prefix for related files
```
### ObjectKey
Represents an S3 object location.
```python
@dataclass(frozen=True)
class ObjectKey:
    bucket: str  # S3 bucket name
    key: str     # S3 object key
```
### PutSummary
Detailed upload operation results.
```python
@dataclass
class PutSummary:
    operation: str                 # "PUT" or "PUT_DELTA"
    bucket: str                    # S3 bucket
    key: str                       # S3 key
    file_size: int                 # Original file size
    file_hash: str                 # SHA256 of original file
    delta_size: Optional[int]      # Size of delta (if used)
    delta_hash: Optional[str]      # SHA256 of delta
    delta_ratio: Optional[float]   # Delta/original ratio
    reference_hash: Optional[str]  # Reference file hash
```
### GetSummary
Download operation results.
```python
@dataclass
class GetSummary:
    operation: str       # "GET" or "GET_DELTA"
    bucket: str          # S3 bucket
    key: str             # S3 key
    size: int            # Downloaded size
    hash: str            # SHA256 hash
    reconstructed: bool  # Whether reconstruction was needed
```
### VerifyResult
Verification operation results.
```python
@dataclass
class VerifyResult:
    valid: bool                 # Verification result
    operation: str              # "VERIFY" or "VERIFY_DELTA"
    expected_hash: str          # Expected SHA256
    actual_hash: Optional[str]  # Actual SHA256 (if computed)
    details: Optional[str]      # Error details if invalid
```
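As a sketch, again assuming the wired `service` from the `DeltaService` example above, a failed verification can be reported from the `VerifyResult` fields:
```python
# Assumes `service` (DeltaService) and ObjectKey are in scope as above.
result = service.verify(ObjectKey(bucket="releases", key="v1.0.0/app.zip.delta"))
if not result.valid:
    print(f"{result.operation} failed: expected {result.expected_hash}, "
          f"got {result.actual_hash}")
    if result.details:
        print(f"Details: {result.details}")
```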
## Exceptions
DeltaGlider uses standard Python exceptions with descriptive messages:
### Common Exceptions
- **FileNotFoundError**: Local file or S3 object not found
- **PermissionError**: Access denied (S3 or local filesystem)
- **ValueError**: Invalid parameters (malformed URLs, invalid ratios)
- **IOError**: I/O operations failed
- **RuntimeError**: xdelta3 binary not found or failed
### Exception Handling Example
```python
from deltaglider import create_client

client = create_client()

try:
    summary = client.upload("app.zip", "s3://bucket/path/")
except FileNotFoundError as e:
    print(f"File not found: {e}")
except PermissionError as e:
    print(f"Permission denied: {e}")
    print("Check AWS credentials and S3 bucket permissions")
except ValueError as e:
    print(f"Invalid parameters: {e}")
except RuntimeError as e:
    print(f"System error: {e}")
    print("Ensure xdelta3 is installed: apt-get install xdelta3")
except Exception as e:
    print(f"Unexpected error: {e}")
    # Log for investigation
    import traceback
    traceback.print_exc()
```
## Environment Variables
DeltaGlider respects these environment variables:
### AWS Configuration
- **AWS_ACCESS_KEY_ID**: AWS access key
- **AWS_SECRET_ACCESS_KEY**: AWS secret key
- **AWS_DEFAULT_REGION**: AWS region (default: us-east-1)
- **AWS_ENDPOINT_URL**: Custom S3 endpoint (for MinIO/R2)
- **AWS_PROFILE**: AWS profile to use
### DeltaGlider Configuration
- **DG_LOG_LEVEL**: Logging level (DEBUG, INFO, WARNING, ERROR)
- **DG_CACHE_DIR**: Local cache directory
- **DG_MAX_RATIO**: Default maximum delta ratio
### Example
```bash
# Configure for MinIO
export AWS_ENDPOINT_URL=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin

# Configure DeltaGlider
export DG_LOG_LEVEL=DEBUG
export DG_CACHE_DIR=/var/cache/deltaglider
export DG_MAX_RATIO=0.3

# Now use normally
python my_script.py
```
## Thread Safety
DeltaGlider clients are thread-safe for read operations but should not be shared across threads for write operations. For multi-threaded applications:
```python
import threading

from deltaglider import create_client

# Illustrative (local_path, s3_url) pairs to upload
files_to_upload = [
    ("app-v1.zip", "s3://releases/v1/"),
    ("app-v2.zip", "s3://releases/v2/"),
]

# Create a separate client per thread
def worker(file_path, s3_url):
    client = create_client()  # Each thread gets its own client
    summary = client.upload(file_path, s3_url)
    print(f"Thread {threading.current_thread().name}: {summary.savings_percent:.0f}%")

# Create threads
threads = []
for i, (file, url) in enumerate(files_to_upload):
    t = threading.Thread(target=worker, args=(file, url), name=f"Worker-{i}")
    threads.append(t)
    t.start()

# Wait for completion
for t in threads:
    t.join()
```
## Performance Considerations
### Upload Performance
- **First file**: No compression overhead (becomes reference)
- **Similar files**: 3-4 files/second with compression
- **Network bound**: Limited by S3 upload speed
- **CPU bound**: xdelta3 compression for large files
### Download Performance
- **Direct files**: Limited by S3 download speed
- **Delta files**: <100ms reconstruction overhead
- **Cache hits**: Near-instant for cached references
### Optimization Tips
1. **Group related files**: Upload similar files to same prefix
2. **Batch operations**: Use concurrent uploads for independent files (see the sketch after this list)
3. **Cache management**: Don't clear cache during operations
4. **Compression threshold**: Tune `max_ratio` for your use case
5. **Network optimization**: Use S3 Transfer Acceleration if available
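A minimal sketch combining tips 1, 2, and 4, using illustrative file names and a hypothetical worker count; each worker builds its own client, per the Thread Safety guidance above.
```python
from concurrent.futures import ThreadPoolExecutor

from deltaglider import create_client

# Illustrative jobs: related builds share one prefix so they can share a reference (tip 1).
jobs = [
    ("build-1.0.0.zip", "s3://releases/nightly/"),
    ("build-1.0.1.zip", "s3://releases/nightly/"),
]

def upload_one(job):
    path, url = job
    client = create_client(max_ratio=0.3)  # tuned compression threshold (tip 4)
    return client.upload(path, url)

# Concurrent uploads of independent files (tip 2); one client per worker.
with ThreadPoolExecutor(max_workers=4) as pool:
    for summary in pool.map(upload_one, jobs):
        print(f"{summary.key}: saved {summary.savings_percent:.0f}%")
```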
## Logging
DeltaGlider uses Python's standard logging framework:
```python
import logging

# Configure logging before creating client
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('deltaglider.log'),
        logging.StreamHandler(),
    ],
)

# Create client (will use configured logging)
client = create_client(log_level="DEBUG")
```
### Log Levels
- **DEBUG**: Detailed operations, xdelta3 commands
- **INFO**: Normal operations, compression statistics
- **WARNING**: Non-critical issues, fallbacks
- **ERROR**: Operation failures, exceptions
## Version Compatibility
- **Python**: 3.11 or higher required
- **boto3**: 1.35.0 or higher
- **xdelta3**: System binary required
- **S3 API**: Compatible with S3 API v4
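A quick, optional sanity check for these requirements (a sketch; it verifies only the interpreter version and that the xdelta3 binary is on PATH):
```python
import shutil
import sys

# Fail fast if the environment does not meet the requirements above.
assert sys.version_info >= (3, 11), "DeltaGlider requires Python 3.11+"
assert shutil.which("xdelta3"), "xdelta3 binary not found on PATH"
```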
## Support
- **GitHub Issues**: [github.com/beshu-tech/deltaglider/issues](https://github.com/beshu-tech/deltaglider/issues)
- **Documentation**: [github.com/beshu-tech/deltaglider](https://github.com/beshu-tech/deltaglider)
- **PyPI Package**: [pypi.org/project/deltaglider](https://pypi.org/project/deltaglider)