docs: Update SDK documentation for v5.1.0 features

- Add session-level caching documentation to API reference - Document clear_cache() and evict_cache() methods - Add comprehensive bucket statistics examples - Update list_buckets() with DeltaGliderStats metadata - Add cache management patterns and best practices - Update CHANGELOG comparison links 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2026-07-02 03:01:43 +02:00 · 2025-10-10 18:34:44 +02:00
parent 3d04a407c0
commit dbd2632cae
3 changed files with 498 additions and 14 deletions
@@ -5,15 +5,17 @@ Real-world examples and patterns for using DeltaGlider in production application
 ## Table of Contents

 1. [Performance-Optimized Bucket Listing](#performance-optimized-bucket-listing)
-2. [Bucket Management](#bucket-management)
-3. [Software Release Management](#software-release-management)
-4. [Database Backup System](#database-backup-system)
-5. [CI/CD Pipeline Integration](#cicd-pipeline-integration)
-6. [Container Registry Storage](#container-registry-storage)
-7. [Machine Learning Model Versioning](#machine-learning-model-versioning)
-8. [Game Asset Distribution](#game-asset-distribution)
-9. [Log Archive Management](#log-archive-management)
-10. [Multi-Region Replication](#multi-region-replication)
+2. [Bucket Statistics and Monitoring](#bucket-statistics-and-monitoring)
+3. [Session-Level Cache Management](#session-level-cache-management)
+4. [Bucket Management](#bucket-management)
+5. [Software Release Management](#software-release-management)
+6. [Database Backup System](#database-backup-system)
+7. [CI/CD Pipeline Integration](#cicd-pipeline-integration)
+8. [Container Registry Storage](#container-registry-storage)
+9. [Machine Learning Model Versioning](#machine-learning-model-versioning)
+10. [Game Asset Distribution](#game-asset-distribution)
+11. [Log Archive Management](#log-archive-management)
+12. [Multi-Region Replication](#multi-region-replication)

 ## Performance-Optimized Bucket Listing

@@ -199,6 +201,322 @@ performance_comparison('releases')

 2. **Never Fetch for Non-Deltas**: The SDK automatically skips metadata fetching for non-delta files even when `FetchMetadata=True`.

+## Bucket Statistics and Monitoring
+
+DeltaGlider provides powerful bucket statistics with session-level caching for performance.
+
+### Quick Dashboard Stats (Cached)
+
+```python
+from deltaglider import create_client
+
+client = create_client()
+
+def show_bucket_dashboard(bucket: str):
+    """Display real-time bucket statistics with caching."""
+
+    # First call: computes stats (~50ms)
+    stats = client.get_bucket_stats(bucket)
+
+    # Second call: instant (cached)
+    stats = client.get_bucket_stats(bucket)
+
+    print(f"Dashboard for {stats.bucket}")
+    print(f"=" * 60)
+    print(f"Total Objects:        {stats.object_count:,}")
+    print(f"  Delta Objects:      {stats.delta_objects:,}")
+    print(f"  Direct Objects:     {stats.direct_objects:,}")
+    print()
+    print(f"Original Size:        {stats.total_size / (1024**3):.2f} GB")
+    print(f"Stored Size:          {stats.compressed_size / (1024**3):.2f} GB")
+    print(f"Space Saved:          {stats.space_saved / (1024**3):.2f} GB")
+    print(f"Compression Ratio:    {stats.average_compression_ratio:.1%}")
+
+# Example: Show stats for multiple buckets (each cached separately)
+for bucket_name in ['releases', 'backups', 'archives']:
+    show_bucket_dashboard(bucket_name)
+```
+
+### Detailed Compression Analysis
+
+```python
+def detailed_compression_report(bucket: str):
+    """Generate detailed compression report with accurate ratios."""
+
+    # Detailed stats fetch metadata for delta files (slower, accurate)
+    stats = client.get_bucket_stats(bucket, detailed_stats=True)
+
+    efficiency = (stats.space_saved / stats.total_size * 100) if stats.total_size > 0 else 0
+
+    print(f"Detailed Compression Report: {stats.bucket}")
+    print(f"=" * 60)
+    print(f"Object Distribution:")
+    print(f"  Total:              {stats.object_count:,}")
+    print(f"  Delta-Compressed:   {stats.delta_objects:,} ({stats.delta_objects/stats.object_count*100:.1f}%)")
+    print(f"  Direct Storage:     {stats.direct_objects:,} ({stats.direct_objects/stats.object_count*100:.1f}%)")
+    print()
+    print(f"Storage Efficiency:")
+    print(f"  Original Data:      {stats.total_size / (1024**3):.2f} GB")
+    print(f"  Actual Storage:     {stats.compressed_size / (1024**3):.2f} GB")
+    print(f"  Space Saved:        {stats.space_saved / (1024**3):.2f} GB")
+    print(f"  Efficiency:         {efficiency:.1f}%")
+    print(f"  Avg Compression:    {stats.average_compression_ratio:.2%}")
+
+    # Calculate estimated monthly costs (example: $0.023/GB S3 Standard)
+    cost_without = stats.total_size / (1024**3) * 0.023
+    cost_with = stats.compressed_size / (1024**3) * 0.023
+    monthly_savings = cost_without - cost_with
+
+    print()
+    print(f"Estimated Monthly S3 Costs ($0.023/GB):")
+    print(f"  Without DeltaGlider: ${cost_without:.2f}")
+    print(f"  With DeltaGlider:    ${cost_with:.2f}")
+    print(f"  Monthly Savings:     ${monthly_savings:.2f}")
+
+# Example: Detailed report
+detailed_compression_report('releases')
+```
+
+### List Buckets with Cached Stats
+
+```python
+def list_buckets_with_stats():
+    """List all buckets and show cached statistics if available."""
+
+    # Pre-fetch stats for important buckets
+    important_buckets = ['releases', 'backups']
+    for bucket_name in important_buckets:
+        client.get_bucket_stats(bucket_name, detailed_stats=True)
+
+    # List all buckets (includes cached stats automatically)
+    response = client.list_buckets()
+
+    print("All Buckets:")
+    print(f"{'Name':<30} {'Objects':<10} {'Compression':<15} {'Cached'}")
+    print("=" * 70)
+
+    for bucket in response['Buckets']:
+        name = bucket['Name']
+
+        # Check if stats are cached
+        if 'DeltaGliderStats' in bucket:
+            stats = bucket['DeltaGliderStats']
+            obj_count = f"{stats['ObjectCount']:,}"
+            compression = f"{stats['AverageCompressionRatio']:.1%}"
+            cached = "✓ (detailed)" if stats['Detailed'] else "✓ (quick)"
+        else:
+            obj_count = "N/A"
+            compression = "N/A"
+            cached = "✗"
+
+        print(f"{name:<30} {obj_count:<10} {compression:<15} {cached}")
+
+# Example: List with stats
+list_buckets_with_stats()
+```
+
+### Monitoring Dashboard (Real-Time)
+
+```python
+import time
+
+def monitoring_dashboard(buckets: list[str], refresh_seconds: int = 60):
+    """Real-time monitoring dashboard with periodic refresh."""
+
+    while True:
+        print("\033[2J\033[H")  # Clear screen
+        print(f"DeltaGlider Monitoring Dashboard - {time.strftime('%Y-%m-%d %H:%M:%S')}")
+        print("=" * 80)
+
+        for bucket_name in buckets:
+            # Get cached stats (instant) or compute fresh
+            stats = client.get_bucket_stats(bucket_name)
+
+            print(f"\n{bucket_name}:")
+            print(f"  Objects: {stats.object_count:,} | "
+                  f"Delta: {stats.delta_objects:,} | "
+                  f"Direct: {stats.direct_objects:,}")
+            print(f"  Size: {stats.compressed_size/(1024**3):.2f} GB | "
+                  f"Saved: {stats.space_saved/(1024**3):.2f} GB | "
+                  f"Compression: {stats.average_compression_ratio:.1%}")
+
+        print(f"\n{'=' * 80}")
+        print(f"Refreshing in {refresh_seconds} seconds... (Ctrl+C to exit)")
+
+        time.sleep(refresh_seconds)
+
+        # Clear cache for fresh data on next iteration
+        client.clear_cache()
+
+# Example: Monitor key buckets
+try:
+    monitoring_dashboard(['releases', 'backups', 'archives'], refresh_seconds=30)
+except KeyboardInterrupt:
+    print("\nMonitoring stopped.")
+```
+
+## Session-Level Cache Management
+
+DeltaGlider maintains session-level caches for optimal performance in long-running applications.
+
+### Long-Running Application Pattern
+
+```python
+from deltaglider import create_client
+import time
+
+def long_running_upload_service():
+    """Upload service with periodic cache cleanup."""
+
+    client = create_client()
+    processed_count = 0
+
+    while True:
+        # Simulate file processing
+        files_to_upload = get_pending_files()  # Your file queue
+
+        for file_path in files_to_upload:
+            try:
+                summary = client.upload(file_path, "s3://releases/")
+                processed_count += 1
+
+                print(f"Uploaded {file_path}: {summary.savings_percent:.0f}% saved")
+
+                # Periodic cache cleanup (every 100 files)
+                if processed_count % 100 == 0:
+                    client.clear_cache()
+                    print(f"Cache cleared after {processed_count} files")
+
+            except Exception as e:
+                print(f"Error uploading {file_path}: {e}")
+
+        time.sleep(60)  # Check for new files every minute
+
+# Example: Run upload service
+# long_running_upload_service()
+```
+
+### Cache Invalidation After External Changes
+
+```python
+def handle_external_bucket_changes(bucket: str):
+    """Refresh statistics after external tools modify bucket."""
+
+    # Get initial stats (cached)
+    stats_before = client.get_bucket_stats(bucket)
+    print(f"Before: {stats_before.object_count} objects")
+
+    # External process modifies bucket
+    print("External backup tool running...")
+    run_external_backup_tool(bucket)  # Your external tool
+
+    # Clear cache to get fresh data
+    client.clear_cache()
+
+    # Get updated stats
+    stats_after = client.get_bucket_stats(bucket)
+    print(f"After: {stats_after.object_count} objects")
+    print(f"Added: {stats_after.object_count - stats_before.object_count} objects")
+
+# Example usage
+handle_external_bucket_changes('backups')
+```
+
+### Selective Cache Eviction
+
+```python
+def selective_cache_management():
+    """Manage cache for specific delta spaces."""
+
+    client = create_client()
+
+    # Upload to multiple delta spaces
+    versions = ['v1.0.0', 'v1.1.0', 'v1.2.0']
+
+    for version in versions:
+        client.upload(f"app-{version}.zip", f"s3://releases/{version}/")
+
+    # Update reference for specific version
+    print("Updating v1.1.0 reference...")
+    client.upload("new-reference.zip", "s3://releases/v1.1.0/")
+
+    # Evict only v1.1.0 cache (others remain cached)
+    client.evict_cache("s3://releases/v1.1.0/reference.bin")
+
+    # Next upload to v1.1.0 fetches fresh reference
+    # v1.0.0 and v1.2.0 still use cached references
+    client.upload("similar-file.zip", "s3://releases/v1.1.0/")
+
+# Example: Selective eviction
+selective_cache_management()
+```
+
+### Testing with Clean Cache
+
+```python
+import pytest
+from deltaglider import create_client
+
+def test_upload_workflow():
+    """Test with clean cache state."""
+
+    client = create_client()
+    client.clear_cache()  # Start with clean state
+
+    # Test first upload (no reference exists)
+    summary1 = client.upload("file1.zip", "s3://test-bucket/prefix/")
+    assert not summary1.is_delta  # First file is reference
+
+    # Test subsequent upload (uses cached reference)
+    summary2 = client.upload("file2.zip", "s3://test-bucket/prefix/")
+    assert summary2.is_delta  # Should use delta
+
+    # Clear and test again
+    client.clear_cache()
+    summary3 = client.upload("file3.zip", "s3://test-bucket/prefix/")
+    assert summary3.is_delta  # Still delta (reference in S3)
+
+# Run test
+# test_upload_workflow()
+```
+
+### Cache Performance Monitoring
+
+```python
+import time
+
+def measure_cache_performance(bucket: str):
+    """Measure performance impact of caching."""
+
+    client = create_client()
+
+    # Test 1: Cold cache
+    client.clear_cache()
+    start = time.time()
+    stats1 = client.get_bucket_stats(bucket, detailed_stats=True)
+    cold_time = (time.time() - start) * 1000
+
+    # Test 2: Warm cache
+    start = time.time()
+    stats2 = client.get_bucket_stats(bucket, detailed_stats=True)
+    warm_time = (time.time() - start) * 1000
+
+    # Test 3: Quick stats from detailed cache
+    start = time.time()
+    stats3 = client.get_bucket_stats(bucket, detailed_stats=False)
+    reuse_time = (time.time() - start) * 1000
+
+    print(f"Cache Performance for {bucket}:")
+    print(f"  Cold Cache (detailed):     {cold_time:.0f}ms")
+    print(f"  Warm Cache (detailed):     {warm_time:.0f}ms")
+    print(f"  Cache Reuse (quick):       {reuse_time:.0f}ms")
+    print(f"  Speedup (detailed):        {cold_time/warm_time:.1f}x")
+    print(f"  Speedup (reuse):           {cold_time/reuse_time:.1f}x")
+
+# Example: Measure cache performance
+measure_cache_performance('releases')
+```
+
 3. **Use Pagination**: For large buckets, use `MaxKeys` and `ContinuationToken` to paginate results.

 4. **Cache Results**: If you need metadata frequently, consider caching the results to avoid repeated HEAD requests.