docs: Update SDK documentation for v5.1.0 features

- Add session-level caching documentation to API reference
- Document clear_cache() and evict_cache() methods
- Add comprehensive bucket statistics examples
- Update list_buckets() with DeltaGliderStats metadata
- Add cache management patterns and best practices
- Update CHANGELOG comparison links

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Simone Scarduzio
2025-10-10 18:34:44 +02:00
parent 3d04a407c0
commit dbd2632cae
3 changed files with 498 additions and 14 deletions


@@ -177,6 +177,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Delta compression for versioned artifacts
- 99%+ compression for similar files
[5.1.0]: https://github.com/beshu-tech/deltaglider/compare/v5.0.3...v5.1.0
[5.0.3]: https://github.com/beshu-tech/deltaglider/compare/v5.0.1...v5.0.3
[5.0.1]: https://github.com/beshu-tech/deltaglider/compare/v5.0.0...v5.0.1
[5.0.0]: https://github.com/beshu-tech/deltaglider/compare/v4.2.4...v5.0.0
[4.2.4]: https://github.com/beshu-tech/deltaglider/compare/v4.2.3...v4.2.4


@@ -156,7 +156,7 @@ for obj in response['Contents']:
#### `get_bucket_stats`
Get statistics for a bucket with optional detailed compression metrics. Results are cached per client session for performance.
```python
def get_bucket_stats(
@@ -173,16 +173,46 @@ def get_bucket_stats(
- With `detailed_stats=False`: ~50ms for any bucket size (LIST calls only)
- With `detailed_stats=True`: ~2-3s per 1000 objects (adds HEAD calls for delta files)
##### Caching Behavior
- **Session-scoped cache**: Results cached within client instance lifetime
- **Automatic invalidation**: Cache cleared on bucket mutations (put, delete, bucket operations); see the sketch below
- **Intelligent reuse**: Detailed stats can serve quick stat requests
- **Manual cache control**: Use `clear_cache()` to invalidate all cached stats
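A minimal sketch of the automatic invalidation described above (bucket and key names are illustrative; `put_object` is the boto3-compatible method documented later in this reference):

```python
stats = client.get_bucket_stats('releases')   # computed, then cached for this client
stats = client.get_bucket_stats('releases')   # served from the session cache

# Any mutation through this client (put, delete, bucket operations) drops the cached entry
client.put_object(Bucket='releases', Key='app-v2.0.0.zip', Body=b'...')

stats = client.get_bucket_stats('releases')   # recomputed after the mutation
```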
##### Returns
`BucketStats`: Dataclass containing:
- **bucket** (`str`): Bucket name
- **object_count** (`int`): Total number of objects
- **total_size** (`int`): Original size in bytes (before compression)
- **compressed_size** (`int`): Actual stored size in bytes
- **space_saved** (`int`): Bytes saved through compression
- **average_compression_ratio** (`float`): Average compression ratio (0.0-1.0)
- **delta_objects** (`int`): Number of delta-compressed objects
- **direct_objects** (`int`): Number of directly stored objects
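For reference, a minimal sketch of the return shape as a dataclass; the SDK's actual `BucketStats` definition may carry additional fields or helpers:

```python
from dataclasses import dataclass

@dataclass
class BucketStats:
    """Illustrative layout mirroring the documented attributes above."""
    bucket: str
    object_count: int
    total_size: int                    # original bytes, before compression
    compressed_size: int               # bytes actually stored
    space_saved: int                   # bytes saved through compression
    average_compression_ratio: float   # 0.0-1.0
    delta_objects: int
    direct_objects: int
```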
##### Examples
```python
# Quick stats for dashboard display (cached after first call)
stats = client.get_bucket_stats('releases')
print(f"Objects: {stats.object_count}, Size: {stats.total_size}")
# Second call hits cache (instant response)
stats = client.get_bucket_stats('releases')
print(f"Space saved: {stats.space_saved} bytes")
# Detailed stats for analytics (slower but accurate, also cached)
stats = client.get_bucket_stats('releases', detailed_stats=True)
print(f"Compression ratio: {stats.average_compression_ratio:.1%}")
# Quick call after detailed call reuses detailed cache (more accurate)
quick_stats = client.get_bucket_stats('releases') # Uses detailed cache
# Clear cache to force refresh
client.clear_cache()
stats = client.get_bucket_stats('releases') # Fresh computation
```
#### `put_object`
@@ -304,7 +334,7 @@ client.delete_bucket(Bucket='old-releases')
#### `list_buckets`
List all S3 buckets (boto3-compatible). Includes cached statistics when available.
```python
def list_buckets(
@@ -315,7 +345,32 @@ def list_buckets(
##### Returns
Dict with list of buckets and owner information (identical to boto3). Each bucket may include optional `DeltaGliderStats` metadata if statistics have been previously cached.
##### Response Structure
```python
{
'Buckets': [
{
'Name': 'bucket-name',
'CreationDate': datetime(2025, 1, 1),
'DeltaGliderStats': { # Optional, only if cached
'Cached': True,
'Detailed': bool, # Whether detailed stats were fetched
'ObjectCount': int,
'TotalSize': int,
'CompressedSize': int,
'SpaceSaved': int,
'AverageCompressionRatio': float,
'DeltaObjects': int,
'DirectObjects': int
}
}
],
'Owner': {...}
}
```
##### Examples
@@ -324,6 +379,17 @@ Dict with list of buckets and owner information (identical to boto3).
response = client.list_buckets()
for bucket in response['Buckets']:
print(f"{bucket['Name']} - Created: {bucket['CreationDate']}")
# Check if stats are cached
if 'DeltaGliderStats' in bucket:
stats = bucket['DeltaGliderStats']
print(f" Cached stats: {stats['ObjectCount']} objects, "
f"{stats['AverageCompressionRatio']:.1%} compression")
# Fetch stats first, then list buckets to see cached data
client.get_bucket_stats('my-bucket', detailed_stats=True)
response = client.list_buckets()
# Now 'my-bucket' will include DeltaGliderStats in response
```
### Simple API Methods
@@ -460,6 +526,104 @@ else:
# Re-upload or investigate
```
### Cache Management Methods
DeltaGlider maintains two types of caches for performance optimization:
1. **Reference cache**: Binary reference files used for delta reconstruction
2. **Statistics cache**: Bucket statistics (session-scoped)
#### `clear_cache`
Clear all cached data including reference files and bucket statistics.
```python
def clear_cache(self) -> None
```
##### Description
Removes all cached reference files from the local filesystem and invalidates all bucket statistics. Useful for:
- Forcing fresh statistics computation
- Freeing disk space in long-running applications
- Ensuring latest data after external bucket modifications
- Testing and development workflows
##### Cache Types Cleared
1. **Reference Cache**: Binary reference files stored in `/tmp/deltaglider-*/`
- Encrypted at rest with ephemeral keys
- Content-addressed storage (SHA256-based filenames)
- Automatically cleaned up on process exit
2. **Statistics Cache**: Bucket statistics cached per client session
- Metadata about compression ratios and object counts
- Session-scoped (not persisted to disk)
- Automatically invalidated on bucket mutations
##### Examples
```python
# Long-running application
client = create_client()
# Work with files
for i in range(1000):
client.upload(f"file_{i}.zip", "s3://bucket/")
# Periodic cache cleanup to prevent disk buildup
if i % 100 == 0:
client.clear_cache()
# Force fresh statistics after external changes
stats_before = client.get_bucket_stats('releases') # Cached
# ... external tool modifies bucket ...
client.clear_cache()
stats_after = client.get_bucket_stats('releases') # Fresh data
# Development workflow
client.clear_cache() # Start with clean state
```
#### `evict_cache`
Remove a specific cached reference file from the local cache.
```python
def evict_cache(self, s3_url: str) -> None
```
##### Parameters
- **s3_url** (`str`): S3 URL of the reference file to evict (e.g., `s3://bucket/prefix/reference.bin`)
##### Description
Removes a specific reference file from the cache without affecting other cached files or statistics. Useful for:
- Selective cache invalidation when specific references are updated
- Memory management in applications with many delta spaces
- Testing specific delta compression scenarios
##### Examples
```python
# Evict specific reference after update
client.upload("new-reference.zip", "s3://releases/v2.0.0/")
client.evict_cache("s3://releases/v2.0.0/reference.bin")
# Next upload will fetch fresh reference
client.upload("similar-file.zip", "s3://releases/v2.0.0/")
# Selective eviction for specific delta spaces
delta_spaces = ["v1.0.0", "v1.1.0", "v1.2.0"]
for space in delta_spaces:
client.evict_cache(f"s3://releases/{space}/reference.bin")
```
##### See Also
- [docs/CACHE_MANAGEMENT.md](../../CACHE_MANAGEMENT.md): Complete cache management guide
- `clear_cache()`: Clear all caches
#### `lifecycle_policy`
Set lifecycle policy for S3 prefix (placeholder for future implementation).


@@ -5,15 +5,17 @@ Real-world examples and patterns for using DeltaGlider in production application
## Table of Contents
1. [Performance-Optimized Bucket Listing](#performance-optimized-bucket-listing)
2. [Bucket Statistics and Monitoring](#bucket-statistics-and-monitoring)
3. [Session-Level Cache Management](#session-level-cache-management)
4. [Bucket Management](#bucket-management)
5. [Software Release Management](#software-release-management)
6. [Database Backup System](#database-backup-system)
7. [CI/CD Pipeline Integration](#cicd-pipeline-integration)
8. [Container Registry Storage](#container-registry-storage)
9. [Machine Learning Model Versioning](#machine-learning-model-versioning)
10. [Game Asset Distribution](#game-asset-distribution)
11. [Log Archive Management](#log-archive-management)
12. [Multi-Region Replication](#multi-region-replication)
## Performance-Optimized Bucket Listing
@@ -199,6 +201,322 @@ performance_comparison('releases')
2. **Never Fetch for Non-Deltas**: The SDK automatically skips metadata fetching for non-delta files even when `FetchMetadata=True`.
## Bucket Statistics and Monitoring
DeltaGlider provides powerful bucket statistics with session-level caching for performance.
### Quick Dashboard Stats (Cached)
```python
from deltaglider import create_client
client = create_client()
def show_bucket_dashboard(bucket: str):
"""Display real-time bucket statistics with caching."""
# First call: computes stats (~50ms)
stats = client.get_bucket_stats(bucket)
# Second call: instant (cached)
stats = client.get_bucket_stats(bucket)
print(f"Dashboard for {stats.bucket}")
print(f"=" * 60)
print(f"Total Objects: {stats.object_count:,}")
print(f" Delta Objects: {stats.delta_objects:,}")
print(f" Direct Objects: {stats.direct_objects:,}")
print()
print(f"Original Size: {stats.total_size / (1024**3):.2f} GB")
print(f"Stored Size: {stats.compressed_size / (1024**3):.2f} GB")
print(f"Space Saved: {stats.space_saved / (1024**3):.2f} GB")
print(f"Compression Ratio: {stats.average_compression_ratio:.1%}")
# Example: Show stats for multiple buckets (each cached separately)
for bucket_name in ['releases', 'backups', 'archives']:
show_bucket_dashboard(bucket_name)
```
### Detailed Compression Analysis
```python
def detailed_compression_report(bucket: str):
"""Generate detailed compression report with accurate ratios."""
# Detailed stats fetch metadata for delta files (slower, accurate)
stats = client.get_bucket_stats(bucket, detailed_stats=True)
efficiency = (stats.space_saved / stats.total_size * 100) if stats.total_size > 0 else 0
print(f"Detailed Compression Report: {stats.bucket}")
print(f"=" * 60)
print(f"Object Distribution:")
print(f" Total: {stats.object_count:,}")
print(f" Delta-Compressed: {stats.delta_objects:,} ({stats.delta_objects/stats.object_count*100:.1f}%)")
print(f" Direct Storage: {stats.direct_objects:,} ({stats.direct_objects/stats.object_count*100:.1f}%)")
print()
print(f"Storage Efficiency:")
print(f" Original Data: {stats.total_size / (1024**3):.2f} GB")
print(f" Actual Storage: {stats.compressed_size / (1024**3):.2f} GB")
print(f" Space Saved: {stats.space_saved / (1024**3):.2f} GB")
print(f" Efficiency: {efficiency:.1f}%")
print(f" Avg Compression: {stats.average_compression_ratio:.2%}")
# Calculate estimated monthly costs (example: $0.023/GB S3 Standard)
cost_without = stats.total_size / (1024**3) * 0.023
cost_with = stats.compressed_size / (1024**3) * 0.023
monthly_savings = cost_without - cost_with
print()
print(f"Estimated Monthly S3 Costs ($0.023/GB):")
print(f" Without DeltaGlider: ${cost_without:.2f}")
print(f" With DeltaGlider: ${cost_with:.2f}")
print(f" Monthly Savings: ${monthly_savings:.2f}")
# Example: Detailed report
detailed_compression_report('releases')
```
### List Buckets with Cached Stats
```python
def list_buckets_with_stats():
"""List all buckets and show cached statistics if available."""
# Pre-fetch stats for important buckets
important_buckets = ['releases', 'backups']
for bucket_name in important_buckets:
client.get_bucket_stats(bucket_name, detailed_stats=True)
# List all buckets (includes cached stats automatically)
response = client.list_buckets()
print("All Buckets:")
print(f"{'Name':<30} {'Objects':<10} {'Compression':<15} {'Cached'}")
print("=" * 70)
for bucket in response['Buckets']:
name = bucket['Name']
# Check if stats are cached
if 'DeltaGliderStats' in bucket:
stats = bucket['DeltaGliderStats']
obj_count = f"{stats['ObjectCount']:,}"
compression = f"{stats['AverageCompressionRatio']:.1%}"
cached = "✓ (detailed)" if stats['Detailed'] else "✓ (quick)"
else:
obj_count = "N/A"
compression = "N/A"
cached = ""
print(f"{name:<30} {obj_count:<10} {compression:<15} {cached}")
# Example: List with stats
list_buckets_with_stats()
```
### Monitoring Dashboard (Real-Time)
```python
import time
def monitoring_dashboard(buckets: list[str], refresh_seconds: int = 60):
"""Real-time monitoring dashboard with periodic refresh."""
while True:
print("\033[2J\033[H") # Clear screen
print(f"DeltaGlider Monitoring Dashboard - {time.strftime('%Y-%m-%d %H:%M:%S')}")
print("=" * 80)
for bucket_name in buckets:
# Get cached stats (instant) or compute fresh
stats = client.get_bucket_stats(bucket_name)
print(f"\n{bucket_name}:")
print(f" Objects: {stats.object_count:,} | "
f"Delta: {stats.delta_objects:,} | "
f"Direct: {stats.direct_objects:,}")
print(f" Size: {stats.compressed_size/(1024**3):.2f} GB | "
f"Saved: {stats.space_saved/(1024**3):.2f} GB | "
f"Compression: {stats.average_compression_ratio:.1%}")
print(f"\n{'=' * 80}")
print(f"Refreshing in {refresh_seconds} seconds... (Ctrl+C to exit)")
time.sleep(refresh_seconds)
# Clear cache for fresh data on next iteration
client.clear_cache()
# Example: Monitor key buckets
try:
monitoring_dashboard(['releases', 'backups', 'archives'], refresh_seconds=30)
except KeyboardInterrupt:
print("\nMonitoring stopped.")
```
## Session-Level Cache Management
DeltaGlider maintains session-level caches for optimal performance in long-running applications.
### Long-Running Application Pattern
```python
from deltaglider import create_client
import time
def long_running_upload_service():
"""Upload service with periodic cache cleanup."""
client = create_client()
processed_count = 0
while True:
# Simulate file processing
files_to_upload = get_pending_files() # Your file queue
for file_path in files_to_upload:
try:
summary = client.upload(file_path, "s3://releases/")
processed_count += 1
print(f"Uploaded {file_path}: {summary.savings_percent:.0f}% saved")
# Periodic cache cleanup (every 100 files)
if processed_count % 100 == 0:
client.clear_cache()
print(f"Cache cleared after {processed_count} files")
except Exception as e:
print(f"Error uploading {file_path}: {e}")
time.sleep(60) # Check for new files every minute
# Example: Run upload service
# long_running_upload_service()
```
### Cache Invalidation After External Changes
```python
def handle_external_bucket_changes(bucket: str):
"""Refresh statistics after external tools modify bucket."""
# Get initial stats (cached)
stats_before = client.get_bucket_stats(bucket)
print(f"Before: {stats_before.object_count} objects")
# External process modifies bucket
print("External backup tool running...")
run_external_backup_tool(bucket) # Your external tool
# Clear cache to get fresh data
client.clear_cache()
# Get updated stats
stats_after = client.get_bucket_stats(bucket)
print(f"After: {stats_after.object_count} objects")
print(f"Added: {stats_after.object_count - stats_before.object_count} objects")
# Example usage
handle_external_bucket_changes('backups')
```
### Selective Cache Eviction
```python
def selective_cache_management():
"""Manage cache for specific delta spaces."""
client = create_client()
# Upload to multiple delta spaces
versions = ['v1.0.0', 'v1.1.0', 'v1.2.0']
for version in versions:
client.upload(f"app-{version}.zip", f"s3://releases/{version}/")
# Update reference for specific version
print("Updating v1.1.0 reference...")
client.upload("new-reference.zip", "s3://releases/v1.1.0/")
# Evict only v1.1.0 cache (others remain cached)
client.evict_cache("s3://releases/v1.1.0/reference.bin")
# Next upload to v1.1.0 fetches fresh reference
# v1.0.0 and v1.2.0 still use cached references
client.upload("similar-file.zip", "s3://releases/v1.1.0/")
# Example: Selective eviction
selective_cache_management()
```
### Testing with Clean Cache
```python
import pytest
from deltaglider import create_client
def test_upload_workflow():
"""Test with clean cache state."""
client = create_client()
client.clear_cache() # Start with clean state
# Test first upload (no reference exists)
summary1 = client.upload("file1.zip", "s3://test-bucket/prefix/")
assert not summary1.is_delta # First file is reference
# Test subsequent upload (uses cached reference)
summary2 = client.upload("file2.zip", "s3://test-bucket/prefix/")
assert summary2.is_delta # Should use delta
# Clear and test again
client.clear_cache()
summary3 = client.upload("file3.zip", "s3://test-bucket/prefix/")
assert summary3.is_delta # Still delta (reference in S3)
# Run test
# test_upload_workflow()
```
### Cache Performance Monitoring
```python
import time
def measure_cache_performance(bucket: str):
"""Measure performance impact of caching."""
client = create_client()
# Test 1: Cold cache
client.clear_cache()
start = time.time()
stats1 = client.get_bucket_stats(bucket, detailed_stats=True)
cold_time = (time.time() - start) * 1000
# Test 2: Warm cache
start = time.time()
stats2 = client.get_bucket_stats(bucket, detailed_stats=True)
warm_time = (time.time() - start) * 1000
# Test 3: Quick stats from detailed cache
start = time.time()
stats3 = client.get_bucket_stats(bucket, detailed_stats=False)
reuse_time = (time.time() - start) * 1000
print(f"Cache Performance for {bucket}:")
print(f" Cold Cache (detailed): {cold_time:.0f}ms")
print(f" Warm Cache (detailed): {warm_time:.0f}ms")
print(f" Cache Reuse (quick): {reuse_time:.0f}ms")
print(f" Speedup (detailed): {cold_time/warm_time:.1f}x")
print(f" Speedup (reuse): {cold_time/reuse_time:.1f}x")
# Example: Measure cache performance
measure_cache_performance('releases')
```
3. **Use Pagination**: For large buckets, use `MaxKeys` and `ContinuationToken` to paginate results (see the sketch after this list).
4. **Cache Results**: If you need metadata frequently, consider caching the results to avoid repeated HEAD requests.
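A hedged sketch for item 3, assuming the boto3-style `list_objects` call used in the listing examples above and the usual boto3 pagination fields (`IsTruncated`, `NextContinuationToken`); adjust to the exact response shape your SDK version returns:

```python
def iter_objects(client, bucket: str, prefix: str = ""):
    """Yield objects one page at a time instead of listing the whole bucket at once."""
    token = None
    while True:
        kwargs = {"Bucket": bucket, "Prefix": prefix, "MaxKeys": 1000}
        if token:
            kwargs["ContinuationToken"] = token
        page = client.list_objects(**kwargs)
        yield from page.get("Contents", [])
        if not page.get("IsTruncated"):
            break
        token = page.get("NextContinuationToken")

# Count objects without holding the full listing in memory
total = sum(1 for _ in iter_objects(client, "releases"))
print(f"releases holds {total} objects")
```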