DeltaGlider API Reference

Complete API documentation for the DeltaGlider Python SDK.

Client Creation

create_client

Factory function to create a configured DeltaGlider client with sensible defaults.

def create_client(
    endpoint_url: Optional[str] = None,
    log_level: str = "INFO",
    cache_dir: str = "/tmp/.deltaglider/cache",
    **kwargs
) -> DeltaGliderClient

Parameters

  • endpoint_url (Optional[str]): S3 endpoint URL for MinIO, R2, or other S3-compatible storage. If None, uses AWS S3.
  • log_level (str): Logging verbosity level. Options: "DEBUG", "INFO", "WARNING", "ERROR". Default: "INFO".
  • cache_dir (str): Directory for local reference cache. Default: "/tmp/.deltaglider/cache".
  • kwargs: Additional arguments passed to DeltaService:
    • tool_version (str): Version string for metadata. Default: "deltaglider/0.1.0"
    • max_ratio (float): Maximum acceptable delta/file ratio. Default: 0.5

Returns

DeltaGliderClient: Configured client instance ready for use.

Examples

# Default AWS S3 configuration
client = create_client()

# Custom endpoint for MinIO
client = create_client(endpoint_url="http://localhost:9000")

# Debug mode with custom cache
client = create_client(
    log_level="DEBUG",
    cache_dir="/var/cache/deltaglider"
)

# Custom delta ratio threshold
client = create_client(max_ratio=0.3)  # Only use delta if <30% of original

DeltaGliderClient

Main client class for interacting with DeltaGlider.

Constructor

class DeltaGliderClient:
    def __init__(
        self,
        service: DeltaService,
        endpoint_url: Optional[str] = None
    )

Note: Use create_client() instead of instantiating directly.

Methods

upload

Upload a file to S3 with automatic delta compression.

def upload(
    self,
    file_path: str | Path,
    s3_url: str,
    tags: Optional[Dict[str, str]] = None,
    max_ratio: float = 0.5
) -> UploadSummary
Parameters
  • file_path (str | Path): Local file path to upload.
  • s3_url (str): S3 destination URL in format s3://bucket/prefix/.
  • tags (Optional[Dict[str, str]]): S3 object tags to attach. (Future feature)
  • max_ratio (float): Maximum acceptable delta/file size ratio. Default: 0.5.
Returns

UploadSummary: Object containing upload statistics and compression details.

Raises
  • FileNotFoundError: If local file doesn't exist.
  • ValueError: If S3 URL is invalid.
  • PermissionError: If S3 access is denied.
Examples
# Simple upload
summary = client.upload("app.zip", "s3://releases/v1.0.0/")

# With custom compression threshold
summary = client.upload(
    "large-file.tar.gz",
    "s3://backups/",
    max_ratio=0.3  # Only use delta if compression > 70%
)

# Check results
if summary.is_delta:
    print(f"Stored as delta: {summary.stored_size_mb:.1f} MB")
else:
    print(f"Stored as full file: {summary.original_size_mb:.1f} MB")

download

Download and reconstruct a file from S3.

def download(
    self,
    s3_url: str,
    output_path: str | Path
) -> None
Parameters
  • s3_url (str): S3 source URL in format s3://bucket/key.
  • output_path (str | Path): Local destination path.
Returns

None. File is written to output_path.

Raises
  • ValueError: If S3 URL is invalid or missing key.
  • FileNotFoundError: If S3 object doesn't exist.
  • PermissionError: If local path is not writable or S3 access denied.
Examples
# Download a file
client.download("s3://releases/v1.0.0/app.zip", "downloaded.zip")

# Auto-detects .delta suffix if needed
client.download("s3://releases/v1.0.0/app.zip", "app.zip")
# Will try app.zip first, then app.zip.delta if not found

# Download to specific directory
from pathlib import Path
output = Path("/tmp/downloads/app.zip")
output.parent.mkdir(parents=True, exist_ok=True)
client.download("s3://releases/v1.0.0/app.zip", output)

verify

Verify the integrity of a stored file using SHA256 checksums.

def verify(
    self,
    s3_url: str
) -> bool
Parameters
  • s3_url (str): S3 URL of the file to verify.
Returns

bool: True if verification passed, False if corrupted.

Raises
  • ValueError: If S3 URL is invalid.
  • FileNotFoundError: If S3 object doesn't exist.
Examples
# Verify file integrity
is_valid = client.verify("s3://releases/v1.0.0/app.zip")

if is_valid:
    print("✓ File integrity verified")
else:
    print("✗ File is corrupted!")
    # Re-upload or investigate

lifecycle_policy

Set lifecycle policy for S3 prefix (placeholder for future implementation).

def lifecycle_policy(
    self,
    s3_prefix: str,
    days_before_archive: int = 30,
    days_before_delete: int = 90
) -> None

Note: This method is a placeholder for future S3 lifecycle policy management.

UploadSummary

Data class containing upload operation results.

@dataclass
class UploadSummary:
    operation: str           # Operation type: "PUT" or "PUT_DELTA"
    bucket: str              # S3 bucket name
    key: str                 # S3 object key
    original_size: int       # Original file size in bytes
    stored_size: int         # Actual stored size in bytes
    is_delta: bool           # Whether delta compression was used
    delta_ratio: float = 0.0 # Ratio of delta size to original

Properties

original_size_mb

Original file size in megabytes.

@property
def original_size_mb(self) -> float

stored_size_mb

Stored size in megabytes (after compression if applicable).

@property
def stored_size_mb(self) -> float

savings_percent

Percentage saved through compression.

@property
def savings_percent(self) -> float
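These properties are plain arithmetic over the two byte counts. A sketch of the assumed computation (illustrative, not taken from the implementation):

```python
# Assumed arithmetic behind the convenience properties (not the actual source).
original_size = 100 * 1024 * 1024   # 100 MiB original file
stored_size = 5 * 1024 * 1024       # 5 MiB stored delta

original_size_mb = original_size / (1024 * 1024)
stored_size_mb = stored_size / (1024 * 1024)
savings_percent = (1 - stored_size / original_size) * 100 if original_size else 0.0

print(f"{original_size_mb:.1f} MB -> {stored_size_mb:.1f} MB ({savings_percent:.0f}% saved)")
```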

Example Usage

summary = client.upload("app.zip", "s3://releases/")

print(f"Operation: {summary.operation}")
print(f"Location: s3://{summary.bucket}/{summary.key}")
print(f"Original: {summary.original_size_mb:.1f} MB")
print(f"Stored: {summary.stored_size_mb:.1f} MB")
print(f"Saved: {summary.savings_percent:.0f}%")
print(f"Delta used: {summary.is_delta}")

if summary.is_delta:
    print(f"Delta ratio: {summary.delta_ratio:.2%}")

DeltaService

Core service class handling delta compression logic.

class DeltaService:
    def __init__(
        self,
        storage: StoragePort,
        diff: DiffPort,
        hasher: HashPort,
        cache: CachePort,
        clock: ClockPort,
        logger: LoggerPort,
        metrics: MetricsPort,
        tool_version: str = "deltaglider/0.1.0",
        max_ratio: float = 0.5
    )

Methods

put

Upload a file with automatic delta compression.

def put(
    self,
    file: Path,
    delta_space: DeltaSpace,
    max_ratio: Optional[float] = None
) -> PutSummary

get

Download and reconstruct a file.

def get(
    self,
    object_key: ObjectKey,
    output_path: Path
) -> GetSummary

verify

Verify file integrity.

def verify(
    self,
    object_key: ObjectKey
) -> VerifyResult

Models

DeltaSpace

Represents a compression space in S3.

@dataclass(frozen=True)
class DeltaSpace:
    bucket: str  # S3 bucket name
    prefix: str  # S3 prefix for related files

ObjectKey

Represents an S3 object location.

@dataclass(frozen=True)
class ObjectKey:
    bucket: str  # S3 bucket name
    key: str     # S3 object key
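Because both models are frozen dataclasses, they are immutable and hashable, which makes them usable as dictionary keys (for example, keying a local reference cache). A sketch using local stand-ins that mirror the definitions above; the cache path is hypothetical:

```python
from dataclasses import dataclass

# Local stand-ins mirroring the definitions above, for illustration only.
@dataclass(frozen=True)
class DeltaSpace:
    bucket: str
    prefix: str

@dataclass(frozen=True)
class ObjectKey:
    bucket: str
    key: str

space = DeltaSpace(bucket="releases", prefix="v1.0.0")
obj = ObjectKey(bucket="releases", key="v1.0.0/app.zip.delta")

# Frozen dataclasses are hashable, so they work as dict keys
# (hypothetical cache layout shown below):
reference_cache = {space: "/tmp/.deltaglider/cache/releases/v1.0.0/reference.bin"}
print(reference_cache[space])
```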

PutSummary

Detailed upload operation results.

@dataclass
class PutSummary:
    operation: str              # "PUT" or "PUT_DELTA"
    bucket: str                 # S3 bucket
    key: str                    # S3 key
    file_size: int              # Original file size
    file_hash: str              # SHA256 of original file
    delta_size: Optional[int]   # Size of delta (if used)
    delta_hash: Optional[str]   # SHA256 of delta
    delta_ratio: Optional[float] # Delta/original ratio
    reference_hash: Optional[str] # Reference file hash

GetSummary

Download operation results.

@dataclass
class GetSummary:
    operation: str    # "GET" or "GET_DELTA"
    bucket: str       # S3 bucket
    key: str          # S3 key
    size: int         # Downloaded size
    hash: str         # SHA256 hash
    reconstructed: bool # Whether reconstruction was needed

VerifyResult

Verification operation results.

@dataclass
class VerifyResult:
    valid: bool           # Verification result
    operation: str        # "VERIFY" or "VERIFY_DELTA"
    expected_hash: str    # Expected SHA256
    actual_hash: Optional[str] # Actual SHA256 (if computed)
    details: Optional[str] # Error details if invalid

Exceptions

DeltaGlider uses standard Python exceptions with descriptive messages:

Common Exceptions

  • FileNotFoundError: Local file or S3 object not found
  • PermissionError: Access denied (S3 or local filesystem)
  • ValueError: Invalid parameters (malformed URLs, invalid ratios)
  • IOError: I/O operations failed
  • RuntimeError: xdelta3 binary not found or failed

Exception Handling Example

from deltaglider import create_client

client = create_client()

try:
    summary = client.upload("app.zip", "s3://bucket/path/")

except FileNotFoundError as e:
    print(f"File not found: {e}")

except PermissionError as e:
    print(f"Permission denied: {e}")
    print("Check AWS credentials and S3 bucket permissions")

except ValueError as e:
    print(f"Invalid parameters: {e}")

except RuntimeError as e:
    print(f"System error: {e}")
    print("Ensure xdelta3 is installed: apt-get install xdelta3")

except Exception as e:
    print(f"Unexpected error: {e}")
    # Log for investigation
    import traceback
    traceback.print_exc()

Environment Variables

DeltaGlider respects these environment variables:

AWS Configuration

  • AWS_ACCESS_KEY_ID: AWS access key
  • AWS_SECRET_ACCESS_KEY: AWS secret key
  • AWS_DEFAULT_REGION: AWS region (default: us-east-1)
  • AWS_ENDPOINT_URL: Custom S3 endpoint (for MinIO/R2)
  • AWS_PROFILE: AWS profile to use

DeltaGlider Configuration

  • DG_LOG_LEVEL: Logging level (DEBUG, INFO, WARNING, ERROR)
  • DG_CACHE_DIR: Local cache directory
  • DG_MAX_RATIO: Default maximum delta ratio

Example

# Configure for MinIO
export AWS_ENDPOINT_URL=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin

# Configure DeltaGlider
export DG_LOG_LEVEL=DEBUG
export DG_CACHE_DIR=/var/cache/deltaglider
export DG_MAX_RATIO=0.3

# Now use normally
python my_script.py
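Inside a script like the one above, the same variables can also be read explicitly and passed to create_client. A sketch using only the standard library (the precedence create_client itself applies between environment variables and keyword arguments is an assumption here):

```python
import os

# Read DeltaGlider settings from the environment, falling back to the
# documented defaults.
log_level = os.environ.get("DG_LOG_LEVEL", "INFO")
cache_dir = os.environ.get("DG_CACHE_DIR", "/tmp/.deltaglider/cache")
max_ratio = float(os.environ.get("DG_MAX_RATIO", "0.5"))

print(log_level, cache_dir, max_ratio)
```

These values could then be handed to create_client(log_level=..., cache_dir=..., max_ratio=...).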

Thread Safety

DeltaGlider clients are thread-safe for read operations but should not be shared across threads for write operations. For multi-threaded applications:

import threading
from deltaglider import create_client

# Create separate client per thread
def worker(file_path, s3_url):
    client = create_client()  # Each thread gets its own client
    summary = client.upload(file_path, s3_url)
    print(f"Thread {threading.current_thread().name}: {summary.savings_percent:.0f}%")

# Create threads
threads = []
for i, (file, url) in enumerate(files_to_upload):
    t = threading.Thread(target=worker, args=(file, url), name=f"Worker-{i}")
    threads.append(t)
    t.start()

# Wait for completion
for t in threads:
    t.join()

Performance Considerations

Upload Performance

  • First file: No compression overhead (becomes reference)
  • Similar files: 3-4 files/second with compression
  • Network bound: Limited by S3 upload speed
  • CPU bound: xdelta3 compression for large files

Download Performance

  • Direct files: Limited by S3 download speed
  • Delta files: <100ms reconstruction overhead
  • Cache hits: Near-instant for cached references

Optimization Tips

  1. Group related files: Upload similar files to same prefix
  2. Batch operations: Use concurrent uploads for independent files
  3. Cache management: Don't clear cache during operations
  4. Compression threshold: Tune max_ratio for your use case
  5. Network optimization: Use S3 Transfer Acceleration if available
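Tip 2 (batch operations) can be sketched with a thread pool. The upload_one helper below is a hypothetical stand-in; the real DeltaGlider call is shown in comments, with one client per worker as recommended in the Thread Safety section:

```python
from concurrent.futures import ThreadPoolExecutor

def upload_one(job):
    """Hypothetical worker: replace the stand-in body with a real upload."""
    path, url = job
    # With DeltaGlider installed, each worker would create its own client:
    #   from deltaglider import create_client
    #   client = create_client()
    #   return client.upload(path, url)
    return f"{path} -> {url}"  # stand-in result for illustration

jobs = [("build-101.zip", "s3://releases/v1/"), ("build-102.zip", "s3://releases/v1/")]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(upload_one, jobs))
```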

Logging

DeltaGlider uses Python's standard logging framework:

import logging

# Configure logging before creating client
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('deltaglider.log'),
        logging.StreamHandler()
    ]
)

# Create client (will use configured logging)
client = create_client(log_level="DEBUG")

Log Levels

  • DEBUG: Detailed operations, xdelta3 commands
  • INFO: Normal operations, compression statistics
  • WARNING: Non-critical issues, fallbacks
  • ERROR: Operation failures, exceptions

Version Compatibility

  • Python: 3.11 or higher required
  • boto3: 1.35.0 or higher
  • xdelta3: System binary required
  • S3 API: Compatible with AWS Signature Version 4 (SigV4)
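These prerequisites can be checked at startup. A sketch using only the standard library, with the thresholds taken from the list above:

```python
import shutil
import sys

def check_prerequisites() -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < (3, 11):
        problems.append("Python 3.11 or higher is required")
    if shutil.which("xdelta3") is None:
        problems.append("xdelta3 binary not found on PATH (e.g. apt-get install xdelta3)")
    return problems

for problem in check_prerequisites():
    print(f"WARNING: {problem}")
```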

Support