
DeltaGlider API Reference

Complete API documentation for the DeltaGlider Python SDK.


Client Creation

create_client

Factory function to create a configured DeltaGlider client with sensible defaults.

def create_client(
    endpoint_url: Optional[str] = None,
    log_level: str = "INFO",
    **kwargs
) -> DeltaGliderClient

Parameters

  • endpoint_url (Optional[str]): S3 endpoint URL for MinIO, R2, or other S3-compatible storage. If None, uses AWS S3.
  • log_level (str): Logging verbosity level. Options: "DEBUG", "INFO", "WARNING", "ERROR". Default: "INFO".
  • kwargs: Additional arguments passed to DeltaService:
    • tool_version (str): Version string for metadata. Default: "deltaglider/0.1.0"
    • max_ratio (float): Maximum acceptable delta/file ratio. Default: 0.5

Security Note: DeltaGlider automatically uses ephemeral, process-isolated cache (/tmp/deltaglider-*) that is cleaned up on exit. No configuration needed.

Returns

DeltaGliderClient: Configured client instance ready for use.

Examples

from deltaglider import create_client

# Default AWS S3 configuration
client = create_client()

# Custom endpoint for MinIO
client = create_client(endpoint_url="http://localhost:9000")

# Debug mode
client = create_client(log_level="DEBUG")

# Custom delta ratio threshold
client = create_client(max_ratio=0.3)  # Only use delta if <30% of original

DeltaGliderClient

Main client class for interacting with DeltaGlider.

Constructor

class DeltaGliderClient:
    def __init__(
        self,
        service: DeltaService,
        endpoint_url: Optional[str] = None
    )

Note: Use create_client() instead of instantiating directly.

Boto3-Compatible Methods

These methods provide compatibility with boto3's core S3 client operations. DeltaGlider implements 21 essential S3 methods, covering roughly 80% of common use cases. See BOTO3_COMPATIBILITY.md for complete coverage details.

list_objects

List objects in a bucket with smart performance optimizations.

def list_objects(
    self,
    Bucket: str,
    Prefix: str = "",
    Delimiter: str = "",
    MaxKeys: int = 1000,
    ContinuationToken: Optional[str] = None,
    StartAfter: Optional[str] = None,
    FetchMetadata: bool = False,
    **kwargs
) -> dict[str, Any]
Parameters
  • Bucket (str): S3 bucket name.
  • Prefix (str): Filter results to keys beginning with prefix.
  • Delimiter (str): Delimiter for grouping keys (e.g., '/' for folders).
  • MaxKeys (int): Maximum number of keys to return (for pagination). Default: 1000.
  • ContinuationToken (Optional[str]): Token from previous response for pagination.
  • StartAfter (Optional[str]): Start listing after this key (alternative pagination).
  • FetchMetadata (bool): If True, fetch compression metadata for delta files only. Default: False.
    • IMPORTANT: Non-delta files NEVER trigger metadata fetching (no performance impact).
    • With FetchMetadata=False: ~50ms for 1000 objects (1 API call)
    • With FetchMetadata=True: ~2-3s for 1000 objects (1 + N delta files API calls)
Performance Optimization

The method intelligently optimizes performance by:

  1. Never fetching metadata for non-delta files (they don't need it)
  2. Only fetching metadata for delta files when explicitly requested
  3. Supporting efficient pagination for large buckets
Returns

boto3-compatible dict with:

  • Contents (list[dict]): List of S3Object dicts with Key, Size, LastModified, Metadata
  • CommonPrefixes (list[dict]): Optional list of common prefixes (folders)
  • IsTruncated (bool): Whether more results are available
  • NextContinuationToken (str): Token for next page
  • KeyCount (int): Number of keys returned
Examples
# Fast listing for UI display (no metadata fetching)
response = client.list_objects(Bucket='releases')
for obj in response['Contents']:
    print(f"{obj['Key']}: {obj['Size']} bytes")

# Paginated listing for large buckets
response = client.list_objects(Bucket='releases', MaxKeys=100)
while True:
    for obj in response['Contents']:
        print(obj['Key'])
    if not response.get('IsTruncated'):
        break
    response = client.list_objects(
        Bucket='releases',
        MaxKeys=100,
        ContinuationToken=response['NextContinuationToken']
    )

# Get detailed compression stats (slower, only for analytics)
response = client.list_objects(
    Bucket='releases',
    FetchMetadata=True  # Only fetches for delta files
)
for obj in response['Contents']:
    metadata = obj.get('Metadata', {})
    if metadata.get('deltaglider-is-delta') == 'true':
        compression = metadata.get('deltaglider-compression-ratio', 'unknown')
        print(f"{obj['Key']}: {compression} compression")

get_bucket_stats

Get statistics for a bucket with optional detailed compression metrics. Results are cached inside the bucket for performance.

def get_bucket_stats(
    self,
    bucket: str,
    mode: Literal["quick", "sampled", "detailed"] = "quick",
    use_cache: bool = True,
    refresh_cache: bool = False,
) -> BucketStats
Parameters
  • bucket (str): S3 bucket name.
  • mode (Literal[...]): Accuracy/cost trade-off:
    • "quick" (default): LIST-only scan; compression ratios for deltas are estimated.
    • "sampled": HEAD one delta per deltaspace and reuse the ratio.
    • "detailed": HEAD every delta object; slowest but exact.
  • use_cache (bool): If True, read/write .deltaglider/stats_{mode}.json in the bucket for reuse.
  • refresh_cache (bool): Force recomputation even if a cache file is valid.
Caching Behavior
  • Stats are cached per mode directly inside the bucket at .deltaglider/stats_{mode}.json.
  • Every call validates cache freshness via a quick LIST (object count + compressed size).
  • refresh_cache=True skips cache validation and recomputes immediately.
  • use_cache=False bypasses both reading and writing cache artifacts.
Returns

BucketStats: Dataclass containing:

  • bucket (str): Bucket name
  • object_count (int): Total number of objects
  • total_size (int): Original size in bytes (before compression)
  • compressed_size (int): Actual stored size in bytes
  • space_saved (int): Bytes saved through compression
  • average_compression_ratio (float): Average compression ratio (0.0-1.0)
  • delta_objects (int): Number of delta-compressed objects
  • direct_objects (int): Number of directly stored objects
Examples
# Quick stats (fast LIST-only)
stats = client.get_bucket_stats('releases')
print(f"Objects: {stats.object_count}, Size: {stats.total_size}")

# Sampled/detailed modes for analytics
sampled = client.get_bucket_stats('releases', mode='sampled')
detailed = client.get_bucket_stats('releases', mode='detailed')
print(f"Compression ratio: {detailed.average_compression_ratio:.1%}")

# Force refresh if an external tool modified the bucket
fresh = client.get_bucket_stats('releases', mode='quick', refresh_cache=True)

# Skip cache entirely when running ad-hoc diagnostics
uncached = client.get_bucket_stats('releases', use_cache=False)

put_object

Upload an object to S3 with automatic delta compression (boto3-compatible).

def put_object(
    self,
    Bucket: str,
    Key: str,
    Body: bytes | str | Path | None = None,
    Metadata: Optional[Dict[str, str]] = None,
    ContentType: Optional[str] = None,
    **kwargs
) -> Dict[str, Any]
Parameters
  • Bucket (str): S3 bucket name.
  • Key (str): Object key (path in bucket).
  • Body (bytes | str | Path | None): Object data. Default: None.
  • Metadata (Optional[Dict[str, str]]): Custom metadata.
  • ContentType (Optional[str]): MIME type (for compatibility).
Returns

Dict with ETag and DeltaGlider compression info.
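
Examples

A minimal usage sketch (assumes a client from create_client(); the bucket, key, and metadata values are illustrative):

from pathlib import Path

# Upload raw bytes; DeltaGlider applies delta compression automatically
response = client.put_object(
    Bucket='releases',
    Key='v1.0.1/app.zip',
    Body=Path('app.zip').read_bytes(),
    Metadata={'build': 'nightly'},
)
print(response['ETag'])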

get_object

Download an object from S3 with automatic delta reconstruction (boto3-compatible).

def get_object(
    self,
    Bucket: str,
    Key: str,
    **kwargs
) -> Dict[str, Any]
Returns

Dict with Body stream and metadata (identical to boto3).
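
Examples

A minimal usage sketch (bucket and key are illustrative):

# Download and transparently reconstruct a delta-compressed object
response = client.get_object(Bucket='releases', Key='v1.0.1/app.zip')
data = response['Body'].read()  # Body is a stream, as in boto3
with open('app.zip', 'wb') as f:
    f.write(data)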

create_bucket

Create an S3 bucket (boto3-compatible).

def create_bucket(
    self,
    Bucket: str,
    CreateBucketConfiguration: Optional[Dict[str, str]] = None,
    **kwargs
) -> Dict[str, Any]
Parameters
  • Bucket (str): Name of the bucket to create.
  • CreateBucketConfiguration (Optional[Dict[str, str]]): Bucket configuration with optional LocationConstraint.
Returns

Dict with Location of created bucket.

Notes
  • Idempotent: Creating an existing bucket returns success
  • Use for basic bucket creation without advanced S3 features
Examples
# Create bucket in default region
client.create_bucket(Bucket='my-releases')

# Create bucket in specific region
client.create_bucket(
    Bucket='my-backups',
    CreateBucketConfiguration={'LocationConstraint': 'eu-west-1'}
)

delete_bucket

Delete an S3 bucket (boto3-compatible).

def delete_bucket(
    self,
    Bucket: str,
    **kwargs
) -> Dict[str, Any]
Parameters
  • Bucket (str): Name of the bucket to delete.
Returns

Dict confirming deletion.

Notes
  • Idempotent: Deleting a non-existent bucket returns success
  • Bucket must be empty before deletion
Examples
# Delete empty bucket
client.delete_bucket(Bucket='old-releases')

list_buckets

List all S3 buckets (boto3-compatible).

def list_buckets(
    self,
    **kwargs
) -> Dict[str, Any]
Returns

Dict with the same structure boto3 returns (Buckets, Owner, ResponseMetadata). DeltaGlider does not inject additional metadata; use get_bucket_stats() for compression data.

Examples
response = client.list_buckets()
for bucket in response['Buckets']:
    print(f"{bucket['Name']} - Created: {bucket['CreationDate']}")

# Combine with get_bucket_stats for deeper insights
stats = client.get_bucket_stats('releases', mode='detailed')
print(f"releases -> {stats.object_count} objects, {stats.space_saved/(1024**3):.2f} GB saved")

Simple API Methods

upload

Upload a file to S3 with automatic delta compression.

def upload(
    self,
    file_path: str | Path,
    s3_url: str,
    tags: Optional[Dict[str, str]] = None,
    max_ratio: float = 0.5
) -> UploadSummary
Parameters
  • file_path (str | Path): Local file path to upload.
  • s3_url (str): S3 destination URL in format s3://bucket/prefix/.
  • tags (Optional[Dict[str, str]]): S3 object tags to attach. (Future feature)
  • max_ratio (float): Maximum acceptable delta/file size ratio. Default: 0.5.
Returns

UploadSummary: Object containing upload statistics and compression details.

Raises
  • FileNotFoundError: If local file doesn't exist.
  • ValueError: If S3 URL is invalid.
  • PermissionError: If S3 access is denied.
Examples
# Simple upload
summary = client.upload("app.zip", "s3://releases/v1.0.0/")

# With custom compression threshold
summary = client.upload(
    "large-file.tar.gz",
    "s3://backups/",
    max_ratio=0.3  # Only store a delta if it is <30% of the original size
)

# Check results
if summary.is_delta:
    print(f"Stored as delta: {summary.stored_size_mb:.1f} MB")
else:
    print(f"Stored as full file: {summary.original_size_mb:.1f} MB")

download

Download and reconstruct a file from S3.

def download(
    self,
    s3_url: str,
    output_path: str | Path
) -> None
Parameters
  • s3_url (str): S3 source URL in format s3://bucket/key.
  • output_path (str | Path): Local destination path.
Returns

None. File is written to output_path.

Raises
  • ValueError: If S3 URL is invalid or missing key.
  • FileNotFoundError: If S3 object doesn't exist.
  • PermissionError: If local path is not writable or S3 access denied.
Examples
# Download a file
client.download("s3://releases/v1.0.0/app.zip", "downloaded.zip")

# Auto-detects .delta suffix if needed
client.download("s3://releases/v1.0.0/app.zip", "app.zip")
# Will try app.zip first, then app.zip.delta if not found

# Download to specific directory
from pathlib import Path
output = Path("/tmp/downloads/app.zip")
output.parent.mkdir(parents=True, exist_ok=True)
client.download("s3://releases/v1.0.0/app.zip", output)

verify

Verify the integrity of a stored file using SHA256 checksums.

def verify(
    self,
    s3_url: str
) -> bool
Parameters
  • s3_url (str): S3 URL of the file to verify.
Returns

bool: True if verification passed, False if corrupted.

Raises
  • ValueError: If S3 URL is invalid.
  • FileNotFoundError: If S3 object doesn't exist.
Examples
# Verify file integrity
is_valid = client.verify("s3://releases/v1.0.0/app.zip")

if is_valid:
    print("✓ File integrity verified")
else:
    print("✗ File is corrupted!")
    # Re-upload or investigate

Cache Management Methods

clear_cache

Clear all locally cached reference files.

def clear_cache(self) -> None
Description

Removes all cached reference files from the local filesystem. Useful for:

  • Freeing disk space in long-running applications
  • Ensuring the next upload/download fetches fresh references from S3
  • Resetting cache after configuration or credential changes
  • Testing and development workflows
Cache Scope
  • Reference Cache: Binary reference files stored in /tmp/deltaglider-*/
    • Encrypted at rest with ephemeral keys
    • Content-addressed storage (SHA256-based filenames)
    • Automatically cleaned up on process exit
  • Statistics Cache: Stored inside the bucket as .deltaglider/stats_{mode}.json.
    • clear_cache() does not remove these S3 objects; use refresh_cache=True or delete the objects manually if needed.
Examples
# Long-running application
client = create_client()

# Work with files
for i in range(1000):
    client.upload(f"file_{i}.zip", "s3://bucket/")

    # Periodic cache cleanup to prevent disk buildup
    if i % 100 == 0:
        client.clear_cache()

# Force fresh statistics after external changes (skip cache instead of clearing)
stats_before = client.get_bucket_stats('releases')
stats_after = client.get_bucket_stats('releases', refresh_cache=True)

# Development workflow
client.clear_cache()  # Start with clean state

UploadSummary

Data class containing upload operation results.

@dataclass
class UploadSummary:
    operation: str           # Operation type: "PUT" or "PUT_DELTA"
    bucket: str              # S3 bucket name
    key: str                 # S3 object key
    original_size: int       # Original file size in bytes
    stored_size: int         # Actual stored size in bytes
    is_delta: bool           # Whether delta compression was used
    delta_ratio: float = 0.0 # Ratio of delta size to original

Properties

original_size_mb

Original file size in megabytes.

@property
def original_size_mb(self) -> float

stored_size_mb

Stored size in megabytes (after compression if applicable).

@property
def stored_size_mb(self) -> float

savings_percent

Percentage saved through compression.

@property
def savings_percent(self) -> float

Example Usage

summary = client.upload("app.zip", "s3://releases/")

print(f"Operation: {summary.operation}")
print(f"Location: s3://{summary.bucket}/{summary.key}")
print(f"Original: {summary.original_size_mb:.1f} MB")
print(f"Stored: {summary.stored_size_mb:.1f} MB")
print(f"Saved: {summary.savings_percent:.0f}%")
print(f"Delta used: {summary.is_delta}")

if summary.is_delta:
    print(f"Delta ratio: {summary.delta_ratio:.2%}")

DeltaService

Core service class handling delta compression logic.

class DeltaService:
    def __init__(
        self,
        storage: StoragePort,
        diff: DiffPort,
        hasher: HashPort,
        cache: CachePort,
        clock: ClockPort,
        logger: LoggerPort,
        metrics: MetricsPort,
        tool_version: str = "deltaglider/0.1.0",
        max_ratio: float = 0.5
    )

Methods

put

Upload a file with automatic delta compression.

def put(
    self,
    file: Path,
    delta_space: DeltaSpace,
    max_ratio: Optional[float] = None
) -> PutSummary

get

Download and reconstruct a file.

def get(
    self,
    object_key: ObjectKey,
    output_path: Path
) -> GetSummary

verify

Verify file integrity.

def verify(
    self,
    object_key: ObjectKey
) -> VerifyResult

Models

DeltaSpace

Represents a compression space in S3.

@dataclass(frozen=True)
class DeltaSpace:
    bucket: str  # S3 bucket name
    prefix: str  # S3 prefix for related files

ObjectKey

Represents an S3 object location.

@dataclass(frozen=True)
class ObjectKey:
    bucket: str  # S3 bucket name
    key: str     # S3 object key

PutSummary

Detailed upload operation results.

@dataclass
class PutSummary:
    operation: str              # "PUT" or "PUT_DELTA"
    bucket: str                 # S3 bucket
    key: str                    # S3 key
    file_size: int              # Original file size
    file_hash: str              # SHA256 of original file
    delta_size: Optional[int]   # Size of delta (if used)
    delta_hash: Optional[str]   # SHA256 of delta
    delta_ratio: Optional[float] # Delta/original ratio
    reference_hash: Optional[str] # Reference file hash

GetSummary

Download operation results.

@dataclass
class GetSummary:
    operation: str    # "GET" or "GET_DELTA"
    bucket: str       # S3 bucket
    key: str          # S3 key
    size: int         # Downloaded size
    hash: str         # SHA256 hash
    reconstructed: bool # Whether reconstruction was needed

VerifyResult

Verification operation results.

@dataclass
class VerifyResult:
    valid: bool           # Verification result
    operation: str        # "VERIFY" or "VERIFY_DELTA"
    expected_hash: str    # Expected SHA256
    actual_hash: Optional[str] # Actual SHA256 (if computed)
    details: Optional[str] # Error details if invalid
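
Example Usage

A hedged sketch tying the core service and models together. It assumes service is an already configured DeltaService (for example, the one create_client() builds internally); the import path for the models is an assumption and may differ between versions:

from pathlib import Path

# Import path for the models is an assumption; adjust to your version
from deltaglider import DeltaSpace, ObjectKey

# Group related builds under one deltaspace so they can share a reference
space = DeltaSpace(bucket="releases", prefix="v1.0.0")
put_summary = service.put(Path("app.zip"), space, max_ratio=0.3)
print(put_summary.operation)  # "PUT" or "PUT_DELTA"

# Verify a stored object by explicit key (delta objects use a .delta suffix)
key = ObjectKey(bucket="releases", key="v1.0.0/app.zip.delta")
result = service.verify(key)
if not result.valid:
    print(f"Integrity check failed: {result.details}")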

Exceptions

DeltaGlider uses standard Python exceptions with descriptive messages:

Common Exceptions

  • FileNotFoundError: Local file or S3 object not found
  • PermissionError: Access denied (S3 or local filesystem)
  • ValueError: Invalid parameters (malformed URLs, invalid ratios)
  • IOError: I/O operations failed
  • RuntimeError: xdelta3 binary not found or failed

Exception Handling Example

from deltaglider import create_client

client = create_client()

try:
    summary = client.upload("app.zip", "s3://bucket/path/")

except FileNotFoundError as e:
    print(f"File not found: {e}")

except PermissionError as e:
    print(f"Permission denied: {e}")
    print("Check AWS credentials and S3 bucket permissions")

except ValueError as e:
    print(f"Invalid parameters: {e}")

except RuntimeError as e:
    print(f"System error: {e}")
    print("Ensure xdelta3 is installed: apt-get install xdelta3")

except Exception as e:
    print(f"Unexpected error: {e}")
    # Log for investigation
    import traceback
    traceback.print_exc()

Environment Variables

DeltaGlider respects these environment variables:

AWS Configuration

  • AWS_ACCESS_KEY_ID: AWS access key
  • AWS_SECRET_ACCESS_KEY: AWS secret key
  • AWS_DEFAULT_REGION: AWS region (default: us-east-1)
  • AWS_ENDPOINT_URL: Custom S3 endpoint (for MinIO/R2)
  • AWS_PROFILE: AWS profile to use

DeltaGlider Configuration

  • DG_LOG_LEVEL: Logging level (DEBUG, INFO, WARNING, ERROR)
  • DG_MAX_RATIO: Default maximum delta ratio

Note: Cache is automatically managed (ephemeral, process-isolated) and requires no configuration.

Example

# Configure for MinIO
export AWS_ENDPOINT_URL=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin

# Configure DeltaGlider
export DG_LOG_LEVEL=DEBUG
export DG_MAX_RATIO=0.3

# Now use normally (cache managed automatically)
python my_script.py

Thread Safety

DeltaGlider clients are thread-safe for read operations but should not be shared across threads for write operations. For multi-threaded applications:

import threading
from deltaglider import create_client

# Create separate client per thread
def worker(file_path, s3_url):
    client = create_client()  # Each thread gets its own client
    summary = client.upload(file_path, s3_url)
    print(f"Thread {threading.current_thread().name}: {summary.savings_percent:.0f}%")

# Create threads (files_to_upload: an iterable of (file_path, s3_url) pairs)
threads = []
for i, (file, url) in enumerate(files_to_upload):
    t = threading.Thread(target=worker, args=(file, url), name=f"Worker-{i}")
    threads.append(t)
    t.start()

# Wait for completion
for t in threads:
    t.join()

Performance Considerations

Upload Performance

  • First file: No compression overhead (becomes reference)
  • Similar files: 3-4 files/second with compression
  • Network bound: Limited by S3 upload speed
  • CPU bound: xdelta3 compression for large files

Download Performance

  • Direct files: Limited by S3 download speed
  • Delta files: <100ms reconstruction overhead
  • Cache hits: Near-instant for cached references

Optimization Tips

  1. Group related files: Upload similar files to same prefix
  2. Batch operations: Use concurrent uploads for independent files (see the sketch below)
  3. Cache management: Don't clear cache during operations
  4. Compression threshold: Tune max_ratio for your use case
  5. Network optimization: Use S3 Transfer Acceleration if available
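
For tip 2, here is a sketch of concurrent uploads using concurrent.futures, following the thread-safety guidance above (one client per worker; the file names and URLs are illustrative):

from concurrent.futures import ThreadPoolExecutor

from deltaglider import create_client

files_to_upload = [  # illustrative (file_path, s3_url) pairs
    ("app-v1.zip", "s3://releases/v1/"),
    ("app-v2.zip", "s3://releases/v2/"),
]

def upload_one(job):
    file_path, s3_url = job
    client = create_client()  # each worker gets its own client
    return client.upload(file_path, s3_url)

with ThreadPoolExecutor(max_workers=4) as pool:
    for summary in pool.map(upload_one, files_to_upload):
        print(f"{summary.key}: saved {summary.savings_percent:.0f}%")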

Logging

DeltaGlider uses Python's standard logging framework:

import logging

# Configure logging before creating client
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('deltaglider.log'),
        logging.StreamHandler()
    ]
)

# Create client (will use configured logging)
client = create_client(log_level="DEBUG")

Log Levels

  • DEBUG: Detailed operations, xdelta3 commands
  • INFO: Normal operations, compression statistics
  • WARNING: Non-critical issues, fallbacks
  • ERROR: Operation failures, exceptions

Version Compatibility

  • Python: 3.11 or higher required
  • boto3: 1.35.0 or higher
  • xdelta3: System binary required
  • S3 API: Compatible with the S3 API using Signature Version 4 (SigV4)
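
A quick preflight check for these requirements (the error messages are illustrative):

import shutil
import sys

# Fail fast if the runtime requirements above are not met
if sys.version_info < (3, 11):
    raise RuntimeError("DeltaGlider requires Python 3.11 or higher")
if shutil.which("xdelta3") is None:
    raise RuntimeError("xdelta3 not found; install it (e.g. apt-get install xdelta3)")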

Support