# DeltaGlider API Reference

Complete API documentation for the DeltaGlider Python SDK.

## Table of Contents

- [Client Creation](#client-creation)
- [DeltaGliderClient](#deltagliderclient)
- [UploadSummary](#uploadsummary)
- [DeltaService](#deltaservice)
- [Models](#models)
- [Exceptions](#exceptions)
- [Environment Variables](#environment-variables)
- [Thread Safety](#thread-safety)
- [Performance Considerations](#performance-considerations)
- [Logging](#logging)
- [Version Compatibility](#version-compatibility)
- [Support](#support)

## Client Creation

### `create_client`

Factory function to create a configured DeltaGlider client with sensible defaults.

```python
def create_client(
    endpoint_url: Optional[str] = None,
    log_level: str = "INFO",
    **kwargs
) -> DeltaGliderClient
```

#### Parameters

- **endpoint_url** (`Optional[str]`): S3 endpoint URL for MinIO, R2, or other S3-compatible storage. If None, uses AWS S3.
- **log_level** (`str`): Logging verbosity level. Options: "DEBUG", "INFO", "WARNING", "ERROR". Default: "INFO".
- **kwargs**: Additional arguments passed to `DeltaService`:
  - **tool_version** (`str`): Version string for metadata. Default: "deltaglider/0.1.0"
  - **max_ratio** (`float`): Maximum acceptable delta/file ratio. Default: 0.5

**Security Note**: DeltaGlider automatically uses an ephemeral, process-isolated cache (`/tmp/deltaglider-*`) that is cleaned up on exit. No configuration needed.

#### Returns

`DeltaGliderClient`: Configured client instance ready for use.

#### Examples

```python
# Default AWS S3 configuration
client = create_client()

# Custom endpoint for MinIO
client = create_client(endpoint_url="http://localhost:9000")

# Debug mode
client = create_client(log_level="DEBUG")

# Custom delta ratio threshold
client = create_client(max_ratio=0.3)  # Only use delta if <30% of original
```

## DeltaGliderClient

Main client class for interacting with DeltaGlider.

### Constructor

```python
class DeltaGliderClient:
    def __init__(
        self,
        service: DeltaService,
        endpoint_url: Optional[str] = None
    )
```

**Note**: Use `create_client()` instead of instantiating directly.

### boto3-Compatible Methods (Recommended)

These methods provide compatibility with boto3's core S3 client operations. DeltaGlider implements 21 essential S3 methods covering ~80% of common use cases. See [BOTO3_COMPATIBILITY.md](../../BOTO3_COMPATIBILITY.md) for complete coverage details.

#### `list_objects`

List objects in a bucket with smart performance optimizations.

```python
def list_objects(
    self,
    Bucket: str,
    Prefix: str = "",
    Delimiter: str = "",
    MaxKeys: int = 1000,
    ContinuationToken: Optional[str] = None,
    StartAfter: Optional[str] = None,
    FetchMetadata: bool = False,
    **kwargs
) -> dict[str, Any]
```

##### Parameters

- **Bucket** (`str`): S3 bucket name.
- **Prefix** (`str`): Filter results to keys beginning with prefix.
- **Delimiter** (`str`): Delimiter for grouping keys (e.g., '/' for folders).
- **MaxKeys** (`int`): Maximum number of keys to return (for pagination). Default: 1000.
- **ContinuationToken** (`Optional[str]`): Token from previous response for pagination.
- **StartAfter** (`Optional[str]`): Start listing after this key (alternative pagination).
- **FetchMetadata** (`bool`): If True, fetch compression metadata for delta files only. Default: False.
  - **IMPORTANT**: Non-delta files NEVER trigger metadata fetching (no performance impact).
  - With `FetchMetadata=False`: ~50ms for 1000 objects (1 API call)
  - With `FetchMetadata=True`: ~2-3s for 1000 objects (1 + N delta files API calls)

##### Performance Optimization

The method intelligently optimizes performance by:

1. **Never** fetching metadata for non-delta files (they don't need it)
2. Only fetching metadata for delta files when explicitly requested
3. Supporting efficient pagination for large buckets

##### Returns

boto3-compatible dict with:

- **Contents** (`list[dict]`): List of S3Object dicts with Key, Size, LastModified, Metadata
- **CommonPrefixes** (`list[dict]`): Optional list of common prefixes (folders)
- **IsTruncated** (`bool`): Whether more results are available
- **NextContinuationToken** (`str`): Token for next page
- **KeyCount** (`int`): Number of keys returned

##### Examples

```python
# Fast listing for UI display (no metadata fetching)
response = client.list_objects(Bucket='releases')
for obj in response['Contents']:
    print(f"{obj['Key']}: {obj['Size']} bytes")

# Paginated listing for large buckets
response = client.list_objects(Bucket='releases', MaxKeys=100)
while True:
    for obj in response['Contents']:
        print(obj['Key'])
    if not response.get('IsTruncated'):
        break
    response = client.list_objects(
        Bucket='releases',
        MaxKeys=100,
        ContinuationToken=response.get('NextContinuationToken')
    )

# Get detailed compression stats (slower, only for analytics)
response = client.list_objects(
    Bucket='releases',
    FetchMetadata=True  # Only fetches for delta files
)
for obj in response['Contents']:
    metadata = obj.get('Metadata', {})
    if metadata.get('deltaglider-is-delta') == 'true':
        compression = metadata.get('deltaglider-compression-ratio', 'unknown')
        print(f"{obj['Key']}: {compression} compression")
```
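For very large buckets, the pagination fields above can be wrapped in a small generator so callers never deal with continuation tokens directly. This is a minimal sketch; the helper name `iter_objects` is illustrative and not part of the SDK:

```python
from typing import Any, Iterator


def iter_objects(client: Any, bucket: str, prefix: str = "") -> Iterator[dict[str, Any]]:
    """Yield every object under a prefix, following continuation tokens."""
    token: str | None = None
    while True:
        kwargs: dict[str, Any] = {"Bucket": bucket, "Prefix": prefix, "MaxKeys": 1000}
        if token:
            kwargs["ContinuationToken"] = token
        response = client.list_objects(**kwargs)
        yield from response.get("Contents", [])
        if not response.get("IsTruncated"):
            break
        token = response.get("NextContinuationToken")


# Usage:
# for obj in iter_objects(client, "releases", "v1.0.0/"):
#     print(obj["Key"], obj["Size"])
```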
#### `get_bucket_stats`

Get statistics for a bucket with optional detailed compression metrics. Results are cached inside the bucket for performance.

```python
def get_bucket_stats(
    self,
    bucket: str,
    mode: Literal["quick", "sampled", "detailed"] = "quick",
    use_cache: bool = True,
    refresh_cache: bool = False,
) -> BucketStats
```

##### Parameters

- **bucket** (`str`): S3 bucket name.
- **mode** (`Literal[...]`): Accuracy/cost trade-off:
  - `"quick"` (default): LIST-only scan; compression ratios for deltas are estimated.
  - `"sampled"`: HEAD one delta per deltaspace and reuse the ratio.
  - `"detailed"`: HEAD every delta object; slowest but exact.
- **use_cache** (`bool`): If True, read/write `.deltaglider/stats_{mode}.json` in the bucket for reuse.
- **refresh_cache** (`bool`): Force recomputation even if a cache file is valid.

##### Caching Behavior

- Stats are cached per mode directly inside the bucket at `.deltaglider/stats_{mode}.json`.
- Every call validates cache freshness via a quick LIST (object count + compressed size).
- `refresh_cache=True` skips cache validation and recomputes immediately.
- `use_cache=False` bypasses both reading and writing cache artifacts.

##### Returns

`BucketStats`: Dataclass containing:

- **bucket** (`str`): Bucket name
- **object_count** (`int`): Total number of objects
- **total_size** (`int`): Original size in bytes (before compression)
- **compressed_size** (`int`): Actual stored size in bytes
- **space_saved** (`int`): Bytes saved through compression
- **average_compression_ratio** (`float`): Average compression ratio (0.0-1.0)
- **delta_objects** (`int`): Number of delta-compressed objects
- **direct_objects** (`int`): Number of directly stored objects

##### Examples

```python
# Quick stats (fast LIST-only)
stats = client.get_bucket_stats('releases')
print(f"Objects: {stats.object_count}, Size: {stats.total_size}")

# Sampled/detailed modes for analytics
sampled = client.get_bucket_stats('releases', mode='sampled')
detailed = client.get_bucket_stats('releases', mode='detailed')
print(f"Compression ratio: {detailed.average_compression_ratio:.1%}")

# Force refresh if an external tool modified the bucket
fresh = client.get_bucket_stats('releases', mode='quick', refresh_cache=True)

# Skip cache entirely when running ad-hoc diagnostics
uncached = client.get_bucket_stats('releases', use_cache=False)
```
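The returned fields combine naturally into a savings report. A minimal sketch using only the documented `BucketStats` fields (the `print_savings` helper is illustrative, not part of the SDK):

```python
def print_savings(stats) -> None:
    """Summarize a BucketStats result in human-readable form."""
    saved_pct = (stats.space_saved / stats.total_size * 100) if stats.total_size else 0.0
    print(f"Bucket:        {stats.bucket}")
    print(f"Objects:       {stats.object_count} "
          f"({stats.delta_objects} delta, {stats.direct_objects} direct)")
    print(f"Original size: {stats.total_size / (1024**2):.1f} MB")
    print(f"Stored size:   {stats.compressed_size / (1024**2):.1f} MB")
    print(f"Space saved:   {stats.space_saved / (1024**2):.1f} MB ({saved_pct:.1f}%)")


# Usage:
# print_savings(client.get_bucket_stats("releases", mode="sampled"))
```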
#### `put_object`

Upload an object to S3 with automatic delta compression (boto3-compatible).

```python
def put_object(
    self,
    Bucket: str,
    Key: str,
    Body: bytes | str | Path | None = None,
    Metadata: Optional[Dict[str, str]] = None,
    ContentType: Optional[str] = None,
    **kwargs
) -> Dict[str, Any]
```

##### Parameters

- **Bucket** (`str`): S3 bucket name.
- **Key** (`str`): Object key (path in bucket).
- **Body** (`bytes | str | Path`): Object data.
- **Metadata** (`Optional[Dict[str, str]]`): Custom metadata.
- **ContentType** (`Optional[str]`): MIME type (for compatibility).

##### Returns

Dict with ETag and DeltaGlider compression info.

#### `get_object`

Download an object from S3 with automatic delta reconstruction (boto3-compatible).

```python
def get_object(
    self,
    Bucket: str,
    Key: str,
    **kwargs
) -> Dict[str, Any]
```

##### Returns

Dict with Body stream and metadata (identical to boto3).
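A round-trip using the two calls above, assuming the returned `Body` behaves like boto3's streaming body (i.e., supports `.read()`):

```python
# Upload raw bytes with custom metadata
client.put_object(
    Bucket="releases",
    Key="v1.0.0/config.json",
    Body=b'{"feature_flags": {"beta": true}}',
    ContentType="application/json",
    Metadata={"team": "platform"},
)

# Download and read the object back
response = client.get_object(Bucket="releases", Key="v1.0.0/config.json")
data = response["Body"].read()  # assumes a boto3-style streaming body
print(data.decode("utf-8"))
```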
#### `create_bucket`

Create an S3 bucket (boto3-compatible).

```python
def create_bucket(
    self,
    Bucket: str,
    CreateBucketConfiguration: Optional[Dict[str, str]] = None,
    **kwargs
) -> Dict[str, Any]
```

##### Parameters

- **Bucket** (`str`): Name of the bucket to create.
- **CreateBucketConfiguration** (`Optional[Dict[str, str]]`): Bucket configuration with optional LocationConstraint.

##### Returns

Dict with Location of created bucket.

##### Notes

- Idempotent: Creating an existing bucket returns success
- Use for basic bucket creation without advanced S3 features

##### Examples

```python
# Create bucket in default region
client.create_bucket(Bucket='my-releases')

# Create bucket in specific region
client.create_bucket(
    Bucket='my-backups',
    CreateBucketConfiguration={'LocationConstraint': 'eu-west-1'}
)
```

#### `delete_bucket`

Delete an S3 bucket (boto3-compatible).

```python
def delete_bucket(
    self,
    Bucket: str,
    **kwargs
) -> Dict[str, Any]
```

##### Parameters

- **Bucket** (`str`): Name of the bucket to delete.

##### Returns

Dict confirming deletion.

##### Notes

- Idempotent: Deleting a non-existent bucket returns success
- Bucket must be empty before deletion

##### Examples

```python
# Delete empty bucket
client.delete_bucket(Bucket='old-releases')
```

#### `list_buckets`

List all S3 buckets (boto3-compatible).

```python
def list_buckets(
    self,
    **kwargs
) -> Dict[str, Any]
```

##### Returns

Dict with the same structure boto3 returns (`Buckets`, `Owner`, `ResponseMetadata`). DeltaGlider does not inject additional metadata; use `get_bucket_stats()` for compression data.

##### Examples

```python
response = client.list_buckets()
for bucket in response['Buckets']:
    print(f"{bucket['Name']} - Created: {bucket['CreationDate']}")

# Combine with get_bucket_stats for deeper insights
stats = client.get_bucket_stats('releases', mode='detailed')
print(f"releases -> {stats.object_count} objects, {stats.space_saved/(1024**3):.2f} GB saved")
```

### Simple API Methods

#### `upload`

Upload a file to S3 with automatic delta compression.

```python
def upload(
    self,
    file_path: str | Path,
    s3_url: str,
    tags: Optional[Dict[str, str]] = None,
    max_ratio: float = 0.5
) -> UploadSummary
```

##### Parameters

- **file_path** (`str | Path`): Local file path to upload.
- **s3_url** (`str`): S3 destination URL in format `s3://bucket/prefix/`.
- **tags** (`Optional[Dict[str, str]]`): S3 object tags to attach. (Future feature)
- **max_ratio** (`float`): Maximum acceptable delta/file size ratio. Default: 0.5.

##### Returns

`UploadSummary`: Object containing upload statistics and compression details.

##### Raises

- `FileNotFoundError`: If local file doesn't exist.
- `ValueError`: If S3 URL is invalid.
- `PermissionError`: If S3 access is denied.

##### Examples

```python
# Simple upload
summary = client.upload("app.zip", "s3://releases/v1.0.0/")

# With custom compression threshold
summary = client.upload(
    "large-file.tar.gz",
    "s3://backups/",
    max_ratio=0.3  # Only store a delta if it is <30% of the original size
)

# Check results
if summary.is_delta:
    print(f"Stored as delta: {summary.stored_size_mb:.1f} MB")
else:
    print(f"Stored as full file: {summary.original_size_mb:.1f} MB")
```

#### `download`

Download and reconstruct a file from S3.

```python
def download(
    self,
    s3_url: str,
    output_path: str | Path
) -> None
```

##### Parameters

- **s3_url** (`str`): S3 source URL in format `s3://bucket/key`.
- **output_path** (`str | Path`): Local destination path.

##### Returns

None. The file is written to `output_path`.

##### Raises

- `ValueError`: If S3 URL is invalid or missing key.
- `FileNotFoundError`: If S3 object doesn't exist.
- `PermissionError`: If local path is not writable or S3 access is denied.

##### Examples

```python
# Download a file
client.download("s3://releases/v1.0.0/app.zip", "downloaded.zip")

# Auto-detects .delta suffix if needed
client.download("s3://releases/v1.0.0/app.zip", "app.zip")
# Will try app.zip first, then app.zip.delta if not found

# Download to specific directory
from pathlib import Path
output = Path("/tmp/downloads/app.zip")
output.parent.mkdir(parents=True, exist_ok=True)
client.download("s3://releases/v1.0.0/app.zip", output)
```

#### `verify`

Verify the integrity of a stored file using SHA256 checksums.

```python
def verify(
    self,
    s3_url: str
) -> bool
```

##### Parameters

- **s3_url** (`str`): S3 URL of the file to verify.

##### Returns

`bool`: True if verification passed, False if the file is corrupted.

##### Raises

- `ValueError`: If S3 URL is invalid.
- `FileNotFoundError`: If S3 object doesn't exist.

##### Examples

```python
# Verify file integrity
is_valid = client.verify("s3://releases/v1.0.0/app.zip")
if is_valid:
    print("✓ File integrity verified")
else:
    print("✗ File is corrupted!")
    # Re-upload or investigate
```
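The simple API calls compose naturally into an upload-then-verify workflow. A minimal sketch using only the documented `upload` and `verify` calls; it assumes the stored object's URL can be rebuilt as `s3://{bucket}/{key}` from the returned `UploadSummary` (see the UploadSummary section below):

```python
def upload_and_verify(client, file_path, dest_prefix: str) -> bool:
    """Upload a file, then confirm the stored object passes verification."""
    summary = client.upload(file_path, dest_prefix)
    # Rebuild the object URL from the UploadSummary fields (assumption noted above)
    object_url = f"s3://{summary.bucket}/{summary.key}"
    ok = client.verify(object_url)
    print(
        f"{summary.key}: {summary.savings_percent:.0f}% saved, "
        f"delta={summary.is_delta}, verified={ok}"
    )
    return ok


# Usage:
# upload_and_verify(client, "app-v1.0.1.zip", "s3://releases/v1.0.1/")
```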
### Cache Management Methods

#### `clear_cache`

Clear all locally cached reference files.

```python
def clear_cache(self) -> None
```

##### Description

Removes all cached reference files from the local filesystem. Useful for:

- Freeing disk space in long-running applications
- Ensuring the next upload/download fetches fresh references from S3
- Resetting cache after configuration or credential changes
- Testing and development workflows

##### Cache Scope

- **Reference Cache**: Binary reference files stored in `/tmp/deltaglider-*/`
  - Encrypted at rest with ephemeral keys
  - Content-addressed storage (SHA256-based filenames)
  - Automatically cleaned up on process exit
- **Statistics Cache**: Stored inside the bucket as `.deltaglider/stats_{mode}.json`.
  - `clear_cache()` does *not* remove these S3 objects; use `refresh_cache=True` or delete the objects manually if needed.

##### Examples

```python
# Long-running application
client = create_client()

# Work with files
for i in range(1000):
    client.upload(f"file_{i}.zip", "s3://bucket/")

    # Periodic cache cleanup to prevent disk buildup
    if i % 100 == 0:
        client.clear_cache()

# Force fresh statistics after external changes (skip cache instead of clearing)
stats_before = client.get_bucket_stats('releases')
stats_after = client.get_bucket_stats('releases', refresh_cache=True)

# Development workflow
client.clear_cache()  # Start with clean state
```

## UploadSummary

Data class containing upload operation results.

```python
@dataclass
class UploadSummary:
    operation: str            # Operation type: "PUT" or "PUT_DELTA"
    bucket: str               # S3 bucket name
    key: str                  # S3 object key
    original_size: int        # Original file size in bytes
    stored_size: int          # Actual stored size in bytes
    is_delta: bool            # Whether delta compression was used
    delta_ratio: float = 0.0  # Ratio of delta size to original
```

### Properties

#### `original_size_mb`

Original file size in megabytes.

```python
@property
def original_size_mb(self) -> float
```

#### `stored_size_mb`

Stored size in megabytes (after compression if applicable).

```python
@property
def stored_size_mb(self) -> float
```

#### `savings_percent`

Percentage saved through compression.

```python
@property
def savings_percent(self) -> float
```

### Example Usage

```python
summary = client.upload("app.zip", "s3://releases/")

print(f"Operation: {summary.operation}")
print(f"Location: s3://{summary.bucket}/{summary.key}")
print(f"Original: {summary.original_size_mb:.1f} MB")
print(f"Stored: {summary.stored_size_mb:.1f} MB")
print(f"Saved: {summary.savings_percent:.0f}%")
print(f"Delta used: {summary.is_delta}")

if summary.is_delta:
    print(f"Delta ratio: {summary.delta_ratio:.2%}")
```

## DeltaService

Core service class handling delta compression logic.

```python
class DeltaService:
    def __init__(
        self,
        storage: StoragePort,
        diff: DiffPort,
        hasher: HashPort,
        cache: CachePort,
        clock: ClockPort,
        logger: LoggerPort,
        metrics: MetricsPort,
        tool_version: str = "deltaglider/0.1.0",
        max_ratio: float = 0.5
    )
```

### Methods

#### `put`

Upload a file with automatic delta compression.

```python
def put(
    self,
    file: Path,
    delta_space: DeltaSpace,
    max_ratio: Optional[float] = None
) -> PutSummary
```

#### `get`

Download and reconstruct a file.

```python
def get(
    self,
    object_key: ObjectKey,
    output_path: Path
) -> GetSummary
```

#### `verify`

Verify file integrity.

```python
def verify(
    self,
    object_key: ObjectKey
) -> VerifyResult
```
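These service methods mirror the client's simple API at a lower level. The sketch below shows how they might be called directly, assuming you already hold a configured `DeltaService` instance (normally the one wired up by `create_client()`); the import path and the logical key layout are assumptions, not confirmed by this reference:

```python
from pathlib import Path

# Hypothetical import path -- the actual module layout may differ.
from deltaglider.core import DeltaService, DeltaSpace, ObjectKey


def store_and_fetch(service: DeltaService) -> None:
    # Group related files under one deltaspace (bucket + prefix)
    space = DeltaSpace(bucket="releases", prefix="v1.0.0")
    put_summary = service.put(Path("app.zip"), space, max_ratio=0.5)
    print(put_summary.operation, put_summary.delta_ratio)

    # Reconstruct the stored file to a local path
    # (assumes the logical key is "<prefix>/<filename>")
    key = ObjectKey(bucket="releases", key="v1.0.0/app.zip")
    get_summary = service.get(key, Path("/tmp/app.zip"))
    print(get_summary.operation, get_summary.reconstructed)
```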
## Models

### DeltaSpace

Represents a compression space in S3.

```python
@dataclass(frozen=True)
class DeltaSpace:
    bucket: str  # S3 bucket name
    prefix: str  # S3 prefix for related files
```

### ObjectKey

Represents an S3 object location.

```python
@dataclass(frozen=True)
class ObjectKey:
    bucket: str  # S3 bucket name
    key: str     # S3 object key
```

### PutSummary

Detailed upload operation results.

```python
@dataclass
class PutSummary:
    operation: str                 # "PUT" or "PUT_DELTA"
    bucket: str                    # S3 bucket
    key: str                       # S3 key
    file_size: int                 # Original file size
    file_hash: str                 # SHA256 of original file
    delta_size: Optional[int]      # Size of delta (if used)
    delta_hash: Optional[str]      # SHA256 of delta
    delta_ratio: Optional[float]   # Delta/original ratio
    reference_hash: Optional[str]  # Reference file hash
```

### GetSummary

Download operation results.

```python
@dataclass
class GetSummary:
    operation: str       # "GET" or "GET_DELTA"
    bucket: str          # S3 bucket
    key: str             # S3 key
    size: int            # Downloaded size
    hash: str            # SHA256 hash
    reconstructed: bool  # Whether reconstruction was needed
```

### VerifyResult

Verification operation results.

```python
@dataclass
class VerifyResult:
    valid: bool                 # Verification result
    operation: str              # "VERIFY" or "VERIFY_DELTA"
    expected_hash: str          # Expected SHA256
    actual_hash: Optional[str]  # Actual SHA256 (if computed)
    details: Optional[str]      # Error details if invalid
```

## Exceptions

DeltaGlider uses standard Python exceptions with descriptive messages.

### Common Exceptions

- **FileNotFoundError**: Local file or S3 object not found
- **PermissionError**: Access denied (S3 or local filesystem)
- **ValueError**: Invalid parameters (malformed URLs, invalid ratios)
- **IOError**: I/O operations failed
- **RuntimeError**: xdelta3 binary not found or failed

### Exception Handling Example

```python
from deltaglider import create_client

client = create_client()

try:
    summary = client.upload("app.zip", "s3://bucket/path/")
except FileNotFoundError as e:
    print(f"File not found: {e}")
except PermissionError as e:
    print(f"Permission denied: {e}")
    print("Check AWS credentials and S3 bucket permissions")
except ValueError as e:
    print(f"Invalid parameters: {e}")
except RuntimeError as e:
    print(f"System error: {e}")
    print("Ensure xdelta3 is installed: apt-get install xdelta3")
except Exception as e:
    print(f"Unexpected error: {e}")
    # Log for investigation
    import traceback
    traceback.print_exc()
```

## Environment Variables

DeltaGlider respects these environment variables:

### AWS Configuration

- **AWS_ACCESS_KEY_ID**: AWS access key
- **AWS_SECRET_ACCESS_KEY**: AWS secret key
- **AWS_DEFAULT_REGION**: AWS region (default: us-east-1)
- **AWS_ENDPOINT_URL**: Custom S3 endpoint (for MinIO/R2)
- **AWS_PROFILE**: AWS profile to use

### DeltaGlider Configuration

- **DG_LOG_LEVEL**: Logging level (DEBUG, INFO, WARNING, ERROR)
- **DG_MAX_RATIO**: Default maximum delta ratio

**Note**: Cache is automatically managed (ephemeral, process-isolated) and requires no configuration.

### Example

```bash
# Configure for MinIO
export AWS_ENDPOINT_URL=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin

# Configure DeltaGlider
export DG_LOG_LEVEL=DEBUG
export DG_MAX_RATIO=0.3

# Now use normally (cache managed automatically)
python my_script.py
```
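The same configuration can be applied from Python in tests or scripts. A sketch, assuming the variables are read when `create_client()` runs and therefore must be set beforehand:

```python
import os

from deltaglider import create_client

# Assumption: these variables are read at client-creation time,
# so set them before calling create_client().
os.environ["AWS_ENDPOINT_URL"] = "http://localhost:9000"
os.environ["AWS_ACCESS_KEY_ID"] = "minioadmin"
os.environ["AWS_SECRET_ACCESS_KEY"] = "minioadmin"
os.environ["DG_LOG_LEVEL"] = "DEBUG"
os.environ["DG_MAX_RATIO"] = "0.3"

client = create_client()
```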
## Thread Safety

DeltaGlider clients are thread-safe for read operations but should not be shared across threads for write operations. For multi-threaded applications:

```python
import threading

from deltaglider import create_client

# Create separate client per thread
def worker(file_path, s3_url):
    client = create_client()  # Each thread gets its own client
    summary = client.upload(file_path, s3_url)
    print(f"Thread {threading.current_thread().name}: {summary.savings_percent:.0f}%")

# Create threads
threads = []
for i, (file, url) in enumerate(files_to_upload):
    t = threading.Thread(target=worker, args=(file, url), name=f"Worker-{i}")
    threads.append(t)
    t.start()

# Wait for completion
for t in threads:
    t.join()
```

## Performance Considerations

### Upload Performance

- **First file**: No compression overhead (becomes reference)
- **Similar files**: 3-4 files/second with compression
- **Network bound**: Limited by S3 upload speed
- **CPU bound**: xdelta3 compression for large files

### Download Performance

- **Direct files**: Limited by S3 download speed
- **Delta files**: <100ms reconstruction overhead
- **Cache hits**: Near-instant for cached references

### Optimization Tips

1. **Group related files**: Upload similar files to same prefix
2. **Batch operations**: Use concurrent uploads for independent files
3. **Cache management**: Don't clear cache during operations
4. **Compression threshold**: Tune `max_ratio` for your use case
5. **Network optimization**: Use S3 Transfer Acceleration if available

## Logging

DeltaGlider uses Python's standard logging framework:

```python
import logging

# Configure logging before creating client
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('deltaglider.log'),
        logging.StreamHandler()
    ]
)

# Create client (will use configured logging)
client = create_client(log_level="DEBUG")
```

### Log Levels

- **DEBUG**: Detailed operations, xdelta3 commands
- **INFO**: Normal operations, compression statistics
- **WARNING**: Non-critical issues, fallbacks
- **ERROR**: Operation failures, exceptions

## Version Compatibility

- **Python**: 3.11 or higher required
- **boto3**: 1.35.0 or higher
- **xdelta3**: System binary required
- **S3 API**: Compatible with S3 API v4

## Support

- **GitHub Issues**: [github.com/beshu-tech/deltaglider/issues](https://github.com/beshu-tech/deltaglider/issues)
- **Documentation**: [github.com/beshu-tech/deltaglider](https://github.com/beshu-tech/deltaglider)
- **PyPI Package**: [pypi.org/project/deltaglider](https://pypi.org/project/deltaglider)