Mirror of https://github.com/beshu-tech/deltaglider.git (synced 2026-04-30 12:14:32 +02:00)

Compare commits: v5.1.0...fix/tempdi (42 commits)
Commits in this range (SHA1):
ea8b9aa78b, 9bfe121f44, 6cab3de9a0, 482f45fc02, 6b3245266e, 20053acb5f,
87f425734f, 012662c377, 284f030fae, 7a4d30a007, 0d46283ff0, 805e2967bc,
2ef1741d51, 2c1d756e7b, c6cee7ae26, cee9a9fd2d, 0507e6ebcd, fa9c4fa42d,
934d83975c, c32d5265d9, 1cf7e3ad21, 9b36087438, 60877966f2, fbd44ea3c3,
3f689fc601, 3753212f96, db7d14f8a8, e1259b7ea8, ff05e77c24, c3d385bf18,
aea5cb5d9a, b2ca59490b, 4f56c4b600, 14c6af0f35, 67792b2031, a9a1396e6e,
52eb5bba21, f75db142e8, 35d34d4862, 9230cbd762, 2eba6e8d38, 656726b57b
.github/workflows/docker-publish.yml (new file, vendored, +92 lines)

```yaml
name: Build and Publish Docker Images

on:
  push:
    branches:
      - main
      - develop
    tags:
      - 'v*'
  pull_request:
    branches:
      - main
  workflow_dispatch:

env:
  REGISTRY: docker.io
  IMAGE_NAME: beshultd/deltaglider

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for proper git describe

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to Docker Hub
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Extract version from git
        id: version
        run: |
          # Get version from git tags
          VERSION=$(git describe --tags --always --abbrev=0 2>/dev/null || echo "dev")
          # Remove 'v' prefix if present
          VERSION=${VERSION#v}
          echo "version=${VERSION}" >> $GITHUB_OUTPUT
          echo "Version: ${VERSION}"

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.IMAGE_NAME }}
          tags: |
            # For main branch: tag as 'latest'
            type=raw,value=latest,enable=${{ github.ref == 'refs/heads/main' }}
            # For develop branch: tag as 'develop'
            type=raw,value=develop,enable=${{ github.ref == 'refs/heads/develop' }}
            # For version tags: use semver patterns
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=semver,pattern={{major}}
            # For PRs: tag as pr-<number>
            type=ref,event=pr
            # Include git sha for traceability (only on branch pushes, not tags)
            type=sha,prefix={{branch}}-,enable=${{ startsWith(github.ref, 'refs/heads/') }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          build-args: |
            VERSION=${{ steps.version.outputs.version }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Docker Hub Description
        if: github.event_name != 'pull_request' && github.ref == 'refs/heads/main'
        uses: peter-evans/dockerhub-description@v4
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
          repository: ${{ env.IMAGE_NAME }}
          short-description: "Store 4TB in 5GB: S3-compatible storage with 99.9% compression"
          readme-filepath: ./README.md
```
BOTO3_COMPATIBILITY.md

```diff
@@ -2,7 +2,7 @@

 DeltaGlider implements a **subset** of boto3's S3 client API, focusing on the most commonly used operations. This is **not** a 100% drop-in replacement, but covers the core functionality needed for most use cases.

-## ✅ Implemented Methods (21 core methods)
+## ✅ Implemented Methods (23 core methods)

 ### Object Operations
 - ✅ `put_object()` - Upload objects (with automatic delta compression)
@@ -17,6 +17,8 @@ DeltaGlider implements a **subset** of boto3's S3 client API, focusing on the most commonly used operations.
 - ✅ `create_bucket()` - Create buckets
 - ✅ `delete_bucket()` - Delete empty buckets
 - ✅ `list_buckets()` - List all buckets
+- ✅ `put_bucket_acl()` - Set bucket ACL (passthrough to S3)
+- ✅ `get_bucket_acl()` - Get bucket ACL (passthrough to S3)

 ### Presigned URLs
 - ✅ `generate_presigned_url()` - Generate presigned URLs
@@ -46,8 +48,6 @@ DeltaGlider implements a **subset** of boto3's S3 client API, focusing on the most commonly used operations.
 - ❌ `list_parts()`

 ### Access Control (ACL)
-- ❌ `get_bucket_acl()`
-- ❌ `put_bucket_acl()`
 - ❌ `get_object_acl()`
 - ❌ `put_object_acl()`
 - ❌ `get_public_access_block()`
@@ -135,9 +135,9 @@ DeltaGlider implements a **subset** of boto3's S3 client API, focusing on the most commonly used operations.

 ## Coverage Analysis

-**Implemented:** ~21 methods
+**Implemented:** ~23 methods
 **Total boto3 S3 methods:** ~100+ methods
-**Coverage:** ~20%
+**Coverage:** ~23%

 ## What's Covered
```
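The ACL passthroughs added above mirror their boto3 counterparts. A minimal sketch of how a caller might exercise them, assuming the canned-ACL call shape the changelog describes (the bucket name is illustrative):

```python
from deltaglider import create_client

# Drop-in client; same surface as boto3.client("s3") for the implemented subset
client = create_client()

# Set a canned ACL on the bucket (passthrough to native S3 ACLs)
client.put_bucket_acl(Bucket="releases", ACL="private")

# Read the ACL back; the response mirrors boto3's Owner/Grants shape
acl = client.get_bucket_acl(Bucket="releases")
for grant in acl["Grants"]:
    print(grant["Grantee"], "->", grant["Permission"])
```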
CHANGELOG.md (102 changed lines)

```diff
@@ -5,7 +5,105 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## [Unreleased]
+## [6.1.0] - 2025-02-07
+
+### Added
+- **Bucket ACL Management**: New `put_bucket_acl()` and `get_bucket_acl()` methods
+  - boto3-compatible passthrough to native S3 ACL operations
+  - Supports canned ACLs (`private`, `public-read`, `public-read-write`, `authenticated-read`)
+  - Supports grant-based ACLs (`GrantRead`, `GrantWrite`, `GrantFullControl`, etc.)
+  - Supports full `AccessControlPolicy` dict for fine-grained control
+  - SDK method count increased from 21 to 23
+- **New CLI Commands**: `deltaglider put-bucket-acl` and `deltaglider get-bucket-acl`
+  - Mirrors `aws s3api put-bucket-acl` / `get-bucket-acl` syntax
+  - Accepts bucket name or `s3://bucket` URL format
+  - JSON output for `get-bucket-acl` (compatible with AWS CLI)
+  - Supports `--endpoint-url`, `--region`, `--profile` flags
+- **Docker Publishing**: Added GitHub Actions workflow for multi-arch Docker image builds (amd64/arm64)
+
+### Changed
+- **Refactor**: Extracted `DeltaGliderConfig` dataclass for centralized configuration management
+- **Refactor**: Introduced typed `DeleteResult` and `RecursiveDeleteResult` dataclasses replacing raw dicts
+- **Refactor**: Centralized S3 metadata key aliases into `core/models.py` constants
+- **Refactor**: Extracted helper methods in `DeltaService` for improved readability
+
+### Fixed
+- Removed unused imports flagged by ruff in test files
+
+### Documentation
+- Updated BOTO3_COMPATIBILITY.md (coverage 20% → 23%)
+- Updated AWS S3 CLI compatibility docs with ACL command examples
+- Refreshed README with dark mode logo and streamlined content
+- Cleaned up SDK documentation and examples
+
+## [6.0.0] - 2025-10-17
+
+### Added
+- **EC2 Region Detection & Cost Optimization**
+  - Automatic detection of EC2 instance region using IMDSv2
+  - Warns when EC2 region ≠ S3 client region (potential cross-region charges)
+  - Different warnings for auto-detected vs. explicit `--region` flag mismatches
+  - Green checkmark when regions are aligned (optimal configuration)
+  - Can be disabled with `DG_DISABLE_EC2_DETECTION=true` environment variable
+  - Helps users optimize for cost and performance before migration starts
+- **New CLI Command**: `deltaglider migrate` for S3-to-S3 bucket migration with compression
+  - Supports resume capability (skips already migrated files)
+  - Real-time progress tracking with file count and statistics
+  - Interactive confirmation prompt (use `--yes` to skip)
+  - Prefix preservation by default (use `--no-preserve-prefix` to disable)
+  - Dry run mode with `--dry-run` flag
+  - Include/exclude pattern filtering
+  - Shows compression statistics after migration
+- **EC2-aware region logging**: Detects EC2 instance and warns about cross-region charges
+- **FIXED**: Now correctly preserves original filenames during migration
+- **S3-to-S3 Recursive Copy**: `deltaglider cp -r s3://source/ s3://dest/` now supported
+  - Automatically uses migration functionality with prefix preservation
+  - Applies delta compression during transfer
+  - Preserves original filenames correctly
+- **Version Command**: Added `--version` flag to show deltaglider version
+  - Usage: `deltaglider --version`
+- **DeltaService API Enhancement**: Added `override_name` parameter to `put()` method
+  - Allows specifying destination filename independently of source filesystem path
+  - Enables proper S3-to-S3 transfers without filesystem renaming tricks
+- **Rehydration & Purge**: Automatic rehydration of delta-compressed files for presigned URL access
+  - New `deltaglider purge` CLI command to clean expired temporary files
+- **Metadata Namespace**: Centralized `dg-` prefixed metadata keys for all DeltaGlider metadata
+- **S3-Based Stats Caching**: Bucket statistics cached in S3 with automatic invalidation
+
+### Fixed
+- **Critical**: S3-to-S3 migration now preserves original filenames
+  - Previously created files with temp names like `tmp1b9cpdsn.zip`
+  - Now correctly uses original filenames from source S3 keys
+  - Fixed by adding `override_name` parameter to `DeltaService.put()`
+- **CLI Region Support**: `--region` flag now properly passes region to boto3 client
+  - Previously only set environment variable, relied on boto3 auto-detection
+  - Now explicitly passes `region_name` to `boto3.client()` via `boto3_kwargs`
+  - Ensures consistent behavior with `DeltaGliderClient` SDK
+
+### Changed
+- Recursive S3-to-S3 copy operations now preserve source prefix structure by default
+- Migration operations show formatted output with source and destination paths
+
+### Documentation
+- Added comprehensive migration guide in README.md
+- Updated CLI reference with migrate command examples
+- Added prefix preservation behavior documentation
+
+## [5.1.1] - 2025-01-10
+
+### Fixed
+- **Stats Command**: Fixed incorrect compression ratio calculations
+  - Now correctly counts ALL files including reference.bin in compressed size
+  - Fixed handling of orphaned reference.bin files (reference files with no delta files)
+  - Added prominent warnings for orphaned reference files with cleanup commands
+  - Fixed stats for buckets with no compression (now shows 0% instead of negative)
+  - SHA1 checksum files are now properly included in calculations
+
+### Improved
+- **Stats Performance**: Optimized metadata fetching with parallel requests
+  - 5-10x faster for buckets with many delta files
+  - Uses ThreadPoolExecutor for concurrent HEAD requests
+  - Single-pass calculation algorithm for better efficiency

 ## [5.1.0] - 2025-10-10

@@ -177,6 +275,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - Delta compression for versioned artifacts
 - 99%+ compression for similar files

+[6.1.0]: https://github.com/beshu-tech/deltaglider/compare/v6.0.2...v6.1.0
+[6.0.0]: https://github.com/beshu-tech/deltaglider/compare/v5.1.1...v6.0.0
 [5.1.0]: https://github.com/beshu-tech/deltaglider/compare/v5.0.3...v5.1.0
 [5.0.3]: https://github.com/beshu-tech/deltaglider/compare/v5.0.1...v5.0.3
 [5.0.1]: https://github.com/beshu-tech/deltaglider/compare/v5.0.0...v5.0.1
```
CLAUDE.md (14 changed lines)

````diff
@@ -79,12 +79,14 @@ deltaglider stats test-bucket  # Get bucket statistics

 ### Available CLI Commands
 ```bash
-cp      # Copy files to/from S3 (AWS S3 compatible)
-ls      # List S3 buckets or objects (AWS S3 compatible)
-rm      # Remove S3 objects (AWS S3 compatible)
-sync    # Synchronize directories with S3 (AWS S3 compatible)
-stats   # Get bucket statistics and compression metrics
-verify  # Verify integrity of delta file
+cp              # Copy files to/from S3 (AWS S3 compatible)
+ls              # List S3 buckets or objects (AWS S3 compatible)
+rm              # Remove S3 objects (AWS S3 compatible)
+sync            # Synchronize directories with S3 (AWS S3 compatible)
+stats           # Get bucket statistics and compression metrics
+verify          # Verify integrity of delta file
+put-bucket-acl  # Set bucket ACL (s3api compatible passthrough)
+get-bucket-acl  # Get bucket ACL (s3api compatible passthrough)
 ```

 ## Architecture
````
Dockerfile (17 changed lines)

```diff
@@ -1,6 +1,7 @@
 # Multi-stage build for deltaglider
 ARG PYTHON_VERSION=3.12-slim
 ARG UV_VERSION=0.5.13
+ARG VERSION=6.0.2

 # Builder stage - install UV and dependencies
 FROM ghcr.io/astral-sh/uv:$UV_VERSION AS uv
@@ -16,16 +17,15 @@ WORKDIR /build
 COPY pyproject.toml ./
 COPY README.md ./

-# Install dependencies with UV caching
-RUN --mount=type=cache,target=/root/.cache/uv \
-    uv pip install --compile-bytecode .
-
-# Copy source code
+# Copy source code - needed for setuptools-scm to write version file
 COPY src ./src

-# Install the package (force reinstall to ensure it's properly installed)
+# Install dependencies and package with UV caching
+# Set SETUPTOOLS_SCM_PRETEND_VERSION to avoid needing .git directory
+ARG VERSION
+ENV SETUPTOOLS_SCM_PRETEND_VERSION_FOR_DELTAGLIDER=${VERSION}
 RUN --mount=type=cache,target=/root/.cache/uv \
-    uv pip install --compile-bytecode --no-deps --force-reinstall .
+    uv pip install --compile-bytecode .

 # Runtime stage - minimal image
 FROM python:${PYTHON_VERSION}
@@ -90,9 +90,10 @@ ENV DG_CACHE_MEMORY_SIZE_MB=100
 # ENV AWS_DEFAULT_REGION=us-east-1

 # Labels
+ARG VERSION
 LABEL org.opencontainers.image.title="DeltaGlider" \
       org.opencontainers.image.description="Delta-aware S3 file storage wrapper with encryption" \
-      org.opencontainers.image.version="5.0.3" \
+      org.opencontainers.image.version="${VERSION}" \
       org.opencontainers.image.authors="Beshu Limited" \
       org.opencontainers.image.source="https://github.com/beshu-tech/deltaglider"
```
README.md (140 changed lines)

````diff
@@ -6,14 +6,13 @@
 [](https://www.python.org/downloads/)
 [](https://github.com/jmacd/xdelta)

-<div align="center">
-  <img src="https://github.com/beshu-tech/deltaglider/raw/main/docs/deltaglider.png" alt="DeltaGlider Logo" width="500"/>
-</div>

 **Store 4TB of similar files in 5GB. No, that's not a typo.**

 DeltaGlider is a drop-in S3 replacement that may achieve 99.9% size reduction for versioned compressed artifacts, backups, and release archives through intelligent binary delta compression (via xdelta3).

+> 🌟 Star if you like this! Or leave a message in [Issues](https://github.com/beshu-tech/deltaglider/issues) - we are listening!
+
 ## The Problem We Solved

 You're storing hundreds of versions of your software releases. Each 100MB build differs by <1% from the previous version. You're paying to store 100GB of what's essentially 100MB of unique data.
@@ -26,12 +25,20 @@ From our [ReadOnlyREST case study](docs/case-study-readonlyrest.md):
 - **Before**: 201,840 files, 3.96TB storage, $1,120/year
 - **After**: Same files, 4.9GB storage, $1.32/year
 - **Compression**: 99.9% (not a typo)
 - **Integration time**: 5 minutes
+- **Data migration**: `deltaglider migrate s3://origin-bucket s3://dest-bucket`
+
+DeltaGlider is great for compressed archives of similar content, like multiple releases of the same software, DB backups, etc.
+We don't expect significant benefit for multimedia content like videos, but we never tried.

 ## Quick Start

-The quickest way to start is using the GUI
-* https://github.com/sscarduzio/dg_commander/
+DeltaGlider comes as an SDK and a CLI, but we also have a GUI:
+* https://github.com/beshu-tech/deltaglider_commander/
+
+<div align="center">
+  <img src="https://github.com/beshu-tech/deltaglider/raw/main/docs/deltaglider.png" alt="DeltaGlider Logo"/>
+</div>

 ### CLI Installation

@@ -89,6 +96,7 @@ docker run -v /shared-cache:/tmp/.deltaglider \
 - `DG_CACHE_BACKEND`: Cache backend (default: `filesystem`, options: `filesystem`, `memory`)
 - `DG_CACHE_MEMORY_SIZE_MB`: Memory cache size in MB (default: `100`)
 - `DG_CACHE_ENCRYPTION_KEY`: Optional base64-encoded encryption key for cross-process cache sharing
+- `DG_DISABLE_EC2_DETECTION`: Disable EC2 instance detection (default: `false`, set to `true` to disable)
 - `AWS_ENDPOINT_URL`: S3 endpoint URL (default: AWS S3)
 - `AWS_ACCESS_KEY_ID`: AWS access key
 - `AWS_SECRET_ACCESS_KEY`: AWS secret key
@@ -116,6 +124,9 @@ deltaglider ls s3://releases/

 # Sync directories
 deltaglider sync ./dist/ s3://releases/v1.0.0/
+
+# Migrate existing S3 bucket to DeltaGlider-compressed storage
+deltaglider migrate s3://old-bucket/ s3://new-bucket/
 ```

 **That's it!** DeltaGlider automatically detects similar files and applies 99%+ compression. For more commands and options, see [CLI Reference](#cli-reference).
@@ -132,11 +143,11 @@ Traditional S3:

 With DeltaGlider:
 v1.0.0.zip (100MB) → S3: 100MB reference + 0KB delta
-v1.0.1.zip (100MB) → S3: 98KB delta (100.1MB total)
-v1.0.2.zip (100MB) → S3: 97KB delta (100.3MB total)
+v1.0.1.zip (100MB) → S3: 98KB delta (from 100.1MB total)
+v1.0.2.zip (100MB) → S3: 97KB delta (from 100.3MB total)
 ```

-DeltaGlider stores the first file as a reference and subsequent similar files as tiny deltas (differences). When you download, it reconstructs the original file perfectly using the reference + delta.
+DeltaGlider stores the first file in a directory (deltaspace) as a reference and subsequent similar files as tiny deltas (differences). When you download, it reconstructs the original file perfectly using the reference + delta.

 ### Intelligent File Type Detection
````
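The reference + delta mechanics above can be made concrete with the xdelta3 tool itself. A minimal Python sketch of the round trip (an illustration of the tool DeltaGlider builds on, not its internal code; the file names are hypothetical):

```python
import subprocess

def encode_delta(reference: str, target: str, delta_out: str) -> None:
    """Encode `target` as a small delta against `reference` (xdelta3 -e)."""
    subprocess.run(["xdelta3", "-e", "-s", reference, target, delta_out], check=True)

def decode_delta(reference: str, delta: str, target_out: str) -> None:
    """Reconstruct the original file from reference + delta (xdelta3 -d)."""
    subprocess.run(["xdelta3", "-d", "-s", reference, delta, target_out], check=True)

# v1.0.0.zip is stored once as the reference; later versions become tiny deltas.
encode_delta("v1.0.0.zip", "v1.0.1.zip", "v1.0.1.zip.delta")
decode_delta("v1.0.0.zip", "v1.0.1.zip.delta", "v1.0.1.restored.zip")
```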
````diff
@@ -156,7 +167,7 @@ DeltaGlider automatically detects file types and applies the optimal strategy:
 - **AWS CLI Replacement**: Same commands as `aws s3` with automatic compression
 - **boto3-Compatible SDK**: Works with existing boto3 code with minimal changes
 - **Zero Configuration**: No databases, no manifest files, no complex setup
-- **Data Integrity**: SHA256 verification on every operation
+- **Data Integrity**: the original file's SHA256 checksum is saved within S3 metadata and verified on every reconstruction
 - **S3 Compatible**: Works with AWS S3, MinIO, Cloudflare R2, and any S3-compatible storage

 ## CLI Reference

@@ -189,13 +200,28 @@
 deltaglider sync s3://releases/ ./local-backup/         # Sync from S3
 deltaglider sync --delete ./src/ s3://backup/           # Mirror exactly
 deltaglider sync --exclude "*.log" ./src/ s3://backup/  # Exclude patterns

-# Get bucket statistics (compression metrics)
-deltaglider stats my-bucket                    # Quick stats overview
+# Get bucket statistics with intelligent S3-based caching
+deltaglider stats my-bucket                    # Quick stats (~100ms with cache)
 deltaglider stats s3://my-bucket               # Also accepts s3:// format
 deltaglider stats s3://my-bucket/              # With or without trailing slash
-deltaglider stats my-bucket --detailed         # Detailed compression metrics (slower)
+deltaglider stats my-bucket --sampled          # Balanced (one sample per deltaspace)
+deltaglider stats my-bucket --detailed         # Most accurate (slower, all metadata)
+deltaglider stats my-bucket --refresh          # Force cache refresh
+deltaglider stats my-bucket --no-cache         # Skip caching entirely
 deltaglider stats my-bucket --json             # JSON output for automation

+# Integrity verification & maintenance
+deltaglider verify s3://releases/file.zip      # Validate stored SHA256
+deltaglider purge my-bucket                    # Clean expired .deltaglider/tmp files
+deltaglider purge my-bucket --dry-run          # Preview purge results
+deltaglider purge my-bucket --json             # Machine-readable purge stats
+
+# Migrate existing S3 buckets to DeltaGlider compression
+deltaglider migrate s3://old-bucket/ s3://new-bucket/        # Interactive migration
+deltaglider migrate s3://old-bucket/ s3://new-bucket/ --yes  # Skip confirmation
+deltaglider migrate --dry-run s3://old-bucket/ s3://new/     # Preview migration
+deltaglider migrate s3://bucket/v1/ s3://bucket/v2/          # Migrate prefixes
+
 # Works with MinIO, R2, and S3-compatible storage
 deltaglider cp file.zip s3://bucket/ --endpoint-url http://localhost:9000
 ```
@@ -470,18 +496,18 @@ This is why DeltaGlider achieves 99%+ compression on versioned archives - xdelta

 ### System Architecture

-DeltaGlider uses a clean hexagonal architecture:
+DeltaGlider intelligently stores files within **DeltaSpaces** - S3 prefixes where related files share a common reference file for delta compression:

 ```
-┌─────────────┐     ┌──────────────┐     ┌─────────────┐
-│  Your App   │────▶│ DeltaGlider  │────▶│  S3/MinIO   │
-│  (CLI/SDK)  │     │    Core      │     │   Storage   │
-└─────────────┘     └──────────────┘     └─────────────┘
-                           │
-                    ┌──────▼───────┐
-                    │ Local Cache  │
-                    │ (References) │
-                    └──────────────┘
+┌─────────────┐     ┌──────────────┐     ┌─────────────────┐
+│  Your App   │────▶│ DeltaGlider  │────▶│   DeltaSpace    │
+│  (CLI/SDK)  │     │    Core      │     │   (S3 prefix)   │
+└─────────────┘     └──────────────┘     ├─────────────────┤
+                           │             │ reference.bin   │
+                    ┌──────▼───────┐     │ file1.delta     │
+                    │ Local Cache  │     │ file2.delta     │
+                    │ (References) │     │ file3.delta     │
+                    └──────────────┘     └─────────────────┘
 ```

 **Key Components:**
@@ -490,6 +516,9 @@ DeltaGlider uses a clean hexagonal architecture:
 - **Integrity verification**: SHA256 on every operation
 - **Local caching**: Fast repeated operations
 - **Zero dependencies**: No database, no manifest files
+- **Modular storage**: The storage layer is pluggable - you could easily replace S3 with a filesystem driver (using extended attributes for metadata) or any other backend
+
+The codebase follows a ports-and-adapters pattern where core business logic is decoupled from infrastructure, with storage operations abstracted through well-defined interfaces in the `ports/` directory and concrete implementations in `adapters/`.

 ### When to Use DeltaGlider

@@ -519,10 +548,57 @@ Migrating from `aws s3` to `deltaglider` is as simple as changing the command name:
 | `aws s3 rm s3://bucket/file` | `deltaglider rm s3://bucket/file` | - |
 | `aws s3 sync dir/ s3://bucket/` | `deltaglider sync dir/ s3://bucket/` | ✅ 99% incremental |

+### Migrating Existing S3 Buckets
+
+DeltaGlider provides a dedicated `migrate` command to compress your existing S3 data:
+
+```bash
+# Migrate an entire bucket
+deltaglider migrate s3://old-bucket/ s3://compressed-bucket/
+
+# Migrate a prefix (preserves prefix structure by default)
+deltaglider migrate s3://bucket/releases/ s3://bucket/archive/
+# Result: s3://bucket/archive/releases/ contains the files
+
+# Migrate without preserving source prefix
+deltaglider migrate --no-preserve-prefix s3://bucket/v1/ s3://bucket/archive/
+# Result: Files go directly into s3://bucket/archive/
+
+# Preview migration (dry run)
+deltaglider migrate --dry-run s3://old/ s3://new/
+
+# Skip confirmation prompt
+deltaglider migrate --yes s3://old/ s3://new/
+
+# Exclude certain file patterns
+deltaglider migrate --exclude "*.log" s3://old/ s3://new/
+```
+
+**Key Features:**
+- **Resume Support**: Migration automatically skips files that already exist in the destination (see the sketch after this section)
+- **Progress Tracking**: Shows real-time migration progress and statistics
+- **Safety First**: Interactive confirmation shows file count before starting
+- **EC2 Cost Optimization**: Automatically detects EC2 instance region and warns about cross-region charges
+  - ✅ Green checkmark when regions align (no extra charges)
+  - ℹ️ INFO when auto-detected mismatch (suggests optimal region)
+  - ⚠️ WARNING when user explicitly set wrong `--region` (expect data transfer costs)
+  - Disable with `DG_DISABLE_EC2_DETECTION=true` if needed
+- **AWS Region Transparency**: Displays the actual AWS region being used
+- **Prefix Preservation**: By default, source prefix is preserved in destination (use `--no-preserve-prefix` to disable)
+- **S3-to-S3 Transfer**: Both regular S3 and DeltaGlider buckets supported
+
+**Prefix Preservation Examples:**
+- `s3://src/data/` → `s3://dest/` creates `s3://dest/data/`
+- `s3://src/a/b/c/` → `s3://dest/x/` creates `s3://dest/x/c/`
+- Use `--no-preserve-prefix` to place files directly in destination without the source prefix
+
+The migration preserves all file names and structure while applying DeltaGlider's compression transparently.
+
 ## Production Ready

 - ✅ **Battle tested**: 200K+ files in production
 - ✅ **Data integrity**: SHA256 verification on every operation
+- ✅ **Cost optimization**: Automatic EC2 region detection warns about cross-region charges - [📖 EC2 Detection Guide](docs/EC2_REGION_DETECTION.md)
 - ✅ **S3 compatible**: Works with AWS, MinIO, Cloudflare R2, etc.
 - ✅ **Atomic operations**: No partial states
 - ✅ **Concurrent safe**: Multiple clients supported
````
|
||||
|
||||
## Success Stories
|
||||
|
||||
> "We reduced our artifact storage from 4TB to 5GB. This isn't hyperbole—it's math."
|
||||
> — [ReadOnlyREST Case Study](docs/case-study-readonlyrest.md)
|
||||
|
||||
> "Our CI/CD pipeline now uploads 100x faster. Deploys that took minutes now take seconds."
|
||||
> — Platform Engineer at [redacted]
|
||||
|
||||
> "We were about to buy expensive deduplication storage. DeltaGlider saved us $50K/year."
|
||||
> — CTO at [stealth startup]
|
||||
> "We reduced our artifact storage from 4TB to 5GB. CI is also much faster, due to smaller uploads."
|
||||
> — [ReadonlyREST Case Study](docs/case-study-readonlyrest.md)
|
||||
|
||||
---
|
||||
|
||||
@@ -606,4 +676,10 @@ deltaglider analyze s3://your-bucket/
|
||||
# Output: "Potential savings: 95.2% (4.8TB → 237GB)"
|
||||
```
|
||||
|
||||
Built with ❤️ by engineers who were tired of paying to store the same bytes over and over.
|
||||
## Who built this?
|
||||
|
||||
Built with ❤️ by [ReadonlyREST](https://readonlyrest.com) engineers who were tired of paying to store the same bytes over and over.
|
||||
|
||||
We also built [Anaphora](https://anaphora.it) for aggregated reports and alerting
|
||||
|
||||
And [Deltaglider Commander](https://github.com/beshu-tech/deltaglider_commander)
|
||||
|
||||
````diff
@@ -1,28 +1,18 @@
 # boto3 Compatibility Vision

-## Current State (v4.2.3)
+DeltaGlider is a drop-in replacement for boto3's S3 client. This document spells out what "drop-in"
+means in practice so new projects can adopt the SDK with confidence.

-DeltaGlider currently uses custom dataclasses for responses:
+## Current State (v5.x and newer)

-```python
-from deltaglider import create_client, ListObjectsResponse, ObjectInfo
-
-client = create_client()
-response: ListObjectsResponse = client.list_objects(Bucket='my-bucket')
-
-for obj in response.contents:  # Custom field name
-    print(f"{obj.key}: {obj.size}")  # Custom ObjectInfo dataclass
-```
-
-**Problems:**
-- ❌ Not a true drop-in replacement for boto3
-- ❌ Users need to learn DeltaGlider-specific types
-- ❌ Can't use with tools expecting boto3 responses
-- ❌ Different API surface (`.contents` vs `['Contents']`)
-
-## Target State (v5.0.0)
-
-DeltaGlider should return native boto3-compatible dicts with TypedDict type hints:
+- `DeltaGliderClient` methods such as `list_objects`, `put_object`, `get_object`, `delete_object`,
+  `delete_objects`, `head_object`, etc. return **boto3-compatible dicts**.
+- TypedDict aliases in `deltaglider.types` (e.g. `ListObjectsV2Response`, `PutObjectResponse`) give
+  IDE/type-checking support without importing boto3.
+- DeltaGlider-specific metadata lives inside standard boto3 fields (typically `Metadata`), so tools
+  that ignore those keys see the exact same structures as they would from boto3.
+- Tests and documentation exercise and describe the boto3-style responses (`response['Contents']`
+  instead of `response.contents`).

 ```python
 from deltaglider import create_client, ListObjectsV2Response
@@ -30,239 +20,35 @@ from deltaglider import create_client, ListObjectsV2Response
 client = create_client()
 response: ListObjectsV2Response = client.list_objects(Bucket='my-bucket')

 for obj in response['Contents']:  # boto3-compatible!
     print(f"{obj['Key']}: {obj['Size']}")  # Works exactly like boto3
 ```

-**Benefits:**
-- ✅ **True drop-in replacement** - swap `boto3.client('s3')` with `create_client()`
-- ✅ **No learning curve** - if you know boto3, you know DeltaGlider
-- ✅ **Tool compatibility** - works with any library expecting boto3 types
-- ✅ **Type safety** - TypedDict provides IDE autocomplete without boto3 import
-- ✅ **Zero runtime overhead** - TypedDict compiles to plain dict
-
-## Implementation Plan
-
-### Phase 1: Type Definitions ✅ (DONE)
-
-Created `deltaglider/types.py` with comprehensive TypedDict definitions:
-
-```python
-from typing import TypedDict, NotRequired
-from datetime import datetime
-
-class S3Object(TypedDict):
-    Key: str
-    Size: int
-    LastModified: datetime
-    ETag: NotRequired[str]
-    StorageClass: NotRequired[str]
-
-class ListObjectsV2Response(TypedDict):
-    Contents: list[S3Object]
-    CommonPrefixes: NotRequired[list[dict[str, str]]]
-    IsTruncated: NotRequired[bool]
-    NextContinuationToken: NotRequired[str]
-```
-
-**Key insight:** TypedDict provides type safety at development time but compiles to plain `dict` at runtime!
-
-### Phase 2: Refactor Client Methods (TODO)
-
-Update all client methods to return boto3-compatible dicts:
-
-#### `list_objects()`
-
-**Before:**
-```python
-def list_objects(...) -> ListObjectsResponse:  # Custom dataclass
-    return ListObjectsResponse(
-        name=bucket,
-        contents=[ObjectInfo(...), ...]  # Custom dataclass
-    )
-```
-
-**After:**
-```python
-def list_objects(...) -> ListObjectsV2Response:  # TypedDict
-    return {
-        'Contents': [
-            {
-                'Key': 'file.zip',  # .delta suffix already stripped
-                'Size': 1024,
-                'LastModified': datetime(...),
-                'ETag': '"abc123"',
-            }
-        ],
-        'CommonPrefixes': [{'Prefix': 'dir/'}],
-        'IsTruncated': False,
-    }
-```
-
-**Key changes:**
-1. Return plain dict instead of custom dataclass
-2. Use boto3 field names: `Contents` not `contents`, `Key` not `key`
-3. Strip `.delta` suffix transparently (already done)
-4. Hide `reference.bin` files (already done)
-
-#### `put_object()`
-
-**Before:**
-```python
-def put_object(...) -> dict[str, Any]:
-    return {
-        "ETag": etag,
-        "VersionId": None,
-        "DeltaGliderInfo": {...}  # Custom field
-    }
-```
-
-**After:**
-```python
-def put_object(...) -> PutObjectResponse:  # TypedDict
-    return {
-        'ETag': etag,
-        'ResponseMetadata': {'HTTPStatusCode': 200},
-        # DeltaGlider metadata goes in Metadata field
-        'Metadata': {
-            'deltaglider-is-delta': 'true',
-            'deltaglider-compression-ratio': '0.99'
-        }
-    }
-```
-
-#### `get_object()`
-
-**Before:**
-```python
-def get_object(...) -> dict[str, Any]:
-    return {
-        "Body": data,
-        "ContentLength": len(data),
-        "DeltaGliderInfo": {...}  # Custom field
-    }
-```
-
-**After:**
-```python
-def get_object(...) -> GetObjectResponse:  # TypedDict
-    return {
-        'Body': data,  # bytes, not StreamingBody (simpler!)
-        'ContentLength': len(data),
-        'LastModified': datetime(...),
-        'ETag': '"abc123"',
-        'Metadata': {  # DeltaGlider metadata here
-            'deltaglider-is-delta': 'true'
-        }
-    }
-```
-
-#### `delete_object()`, `delete_objects()`, `head_object()`, etc.
-
-All follow the same pattern: return boto3-compatible dicts with TypedDict hints.
-
-### Phase 3: Backward Compatibility (TODO)
-
-Keep old dataclasses for 1-2 versions with deprecation warnings:
-
-```python
-class ListObjectsResponse:
-    """DEPRECATED: Use dict responses with ListObjectsV2Response type hint.
-
-    This will be removed in v6.0.0. Update your code:
-
-    Before:
-        response.contents[0].key
-
-    After:
-        response['Contents'][0]['Key']
-    """
-    def __init__(self, data: dict):
-        warnings.warn(
-            "ListObjectsResponse dataclass is deprecated. "
-            "Use dict responses with ListObjectsV2Response type hint.",
-            DeprecationWarning,
-            stacklevel=2
-        )
-        self._data = data
-
-    @property
-    def contents(self):
-        return [ObjectInfo(obj) for obj in self._data.get('Contents', [])]
-```
-
-### Phase 4: Update Documentation (TODO)
-
-1. Update all examples to use dict responses
-2. Add migration guide from v4.x to v5.0
-3. Update BOTO3_COMPATIBILITY.md
-4. Add "Drop-in Replacement" marketing language
-
-### Phase 5: Update Tests (TODO)
-
-Convert all tests from:
-```python
-assert response.contents[0].key == "file.zip"
-```
-
-To:
-```python
-assert response['Contents'][0]['Key'] == "file.zip"
-```
-
-## Migration Guide (for users)
-
-### v4.x → v5.0
-
-**Old code (v4.x):**
-```python
-from deltaglider import create_client
-
-client = create_client()
-response = client.list_objects(Bucket='my-bucket')
-
-for obj in response.contents:  # Dataclass attribute
-    print(f"{obj.key}: {obj.size}")  # Dataclass attributes
-```
-
-**New code (v5.0):**
-```python
-from deltaglider import create_client, ListObjectsV2Response
-
-client = create_client()
-response: ListObjectsV2Response = client.list_objects(Bucket='my-bucket')
-
-for obj in response['Contents']:  # Dict key (boto3-compatible)
-    print(f"{obj['Key']}: {obj['Size']}")  # Dict keys (boto3-compatible)
-```
-
 **Or even simpler - no type hint needed:**
 ```python
 client = create_client()
 response = client.list_objects(Bucket='my-bucket')

 for obj in response['Contents']:
-    print(f"{obj['Key']}: {obj['Size']}")
+    print(f"{obj['Key']}: {obj['Size']} bytes")
 ```

+## Key Design Points
+
+- **TypedDict everywhere** – `put_object`, `get_object`, `list_objects`, `delete_object`, etc.
+  return the same shapes boto3 does. Use the provided aliases (`ListObjectsV2Response`,
+  `PutObjectResponse`, …) for IDE/completion help.
+- **Metadata namespace** – DeltaGlider-specific flags such as `deltaglider-is-delta` live under the
+  regular `Metadata` key so every response remains valid boto3 output.
+- **No shims required** – responses are plain dicts. If you already know boto3, you already know how
+  to consume DeltaGlider outputs.
+
 ## Benefits Summary

 ### For Users
-- **Zero learning curve** - if you know boto3, you're done
-- **Drop-in replacement** - literally change one line (client creation)
-- **Type safety** - TypedDict provides autocomplete without boto3 dependency
-- **Tool compatibility** - works with all boto3-compatible libraries
+- **Zero learning curve** – identical data structures to boto3.
+- **Tooling compatibility** – works with any boto3-aware tool or library.
+- **Type safety** – TypedDicts provide IDE autocomplete even without boto3 installed.

 ### For DeltaGlider
-- **Simpler codebase** - no custom dataclasses to maintain
-- **Better marketing** - true "drop-in replacement" claim
-- **Easier testing** - test against boto3 behavior directly
-- **Future-proof** - if boto3 adds fields, users can access them immediately
+- **Cleaner internals** – no custom dataclasses to maintain.
+- **Simpler docs/tests** – examples mirror boto3 verbatim.
+- **Marketing accuracy** – "drop-in replacement" is now literal.

 ## Technical Details

-### How TypedDict Works
-
+### TypedDict refresher
 ```python
 from typing import TypedDict

@@ -270,47 +56,29 @@ class MyResponse(TypedDict):
     Key: str
     Size: int

-# At runtime, this is just a dict!
-response: MyResponse = {'Key': 'file.zip', 'Size': 1024}
-print(type(response))  # <class 'dict'>
-
-# But mypy and IDEs understand the structure
-response['Key']  # ✅ Autocomplete works!
-response['Nonexistent']  # ❌ Mypy error: Key 'Nonexistent' not found
+resp: MyResponse = {'Key': 'file.zip', 'Size': 1024}
+print(type(resp))  # <class 'dict'>
 ```
+At runtime the structure is still a plain `dict`, but static type-checkers understand the shape.

-### DeltaGlider-Specific Metadata
-
-Store in standard boto3 `Metadata` field:
+### DeltaGlider Metadata

+Delta-specific fields live inside the standard `Metadata` map. Example list_objects entry:
 ```python
 {
     'Key': 'file.zip',
     'Size': 1024,
     'Metadata': {
-        # DeltaGlider-specific fields (prefixed for safety)
         'deltaglider-is-delta': 'true',
         'deltaglider-compression-ratio': '0.99',
-        'deltaglider-original-size': '100000',
-        'deltaglider-reference-key': 'releases/v1.0.0/reference.bin',
+        'deltaglider-original-size': '50000000',
     }
 }
 ```
+These keys are namespaced (`deltaglider-...`) so they are safe to ignore if not needed.

-This is:
-- ✅ boto3-compatible (Metadata is a standard field)
-- ✅ Namespaced (deltaglider- prefix prevents conflicts)
-- ✅ Optional (tools can ignore it)
-- ✅ Type-safe (Metadata: NotRequired[dict[str, str]])
+## Status Snapshot

-## Status
-
-- ✅ **Phase 1:** TypedDict definitions created
-- ✅ **Phase 2:** `list_objects()` refactored to return boto3-compatible dict
-- ⏳ **Phase 3:** Refactor remaining methods (`put_object`, `get_object`, etc.) (TODO)
-- ⏳ **Phase 4:** Backward compatibility with deprecation warnings (TODO)
-- ⏳ **Phase 5:** Documentation updates (TODO)
-- ⏳ **Phase 6:** Full test coverage updates (PARTIAL - list_objects tests done)
-
-**Current:** v4.2.3+ (Phase 2 complete - `list_objects()` boto3-compatible)
-**Target:** v5.0.0 release (all phases complete)
+- ✅ TypedDict builders are used everywhere (`build_list_objects_response`, etc.).
+- ✅ Tests assert boto3-style dict access (`response['Contents']`).
+- ✅ Documentation (README, SDK docs, examples) shows the boto3 syntax.
````
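Reading those namespaced keys back requires nothing beyond ordinary boto3-style dict access. A minimal sketch, assuming a delta-stored object (the bucket and key names are illustrative):

```python
from deltaglider import create_client

client = create_client()
head = client.head_object(Bucket="releases", Key="v1.0.1.zip")

# DeltaGlider flags ride along in the standard boto3 Metadata map
meta = head.get("Metadata", {})
if meta.get("deltaglider-is-delta") == "true":
    print("stored as delta; original size:", meta.get("deltaglider-original-size"))
else:
    print("stored as-is (no delta applied)")
```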
docs/DOCKER.md (new file, +364 lines)

# Docker Support for DeltaGlider

This document describes how to build, run, and publish Docker images for DeltaGlider.

## Quick Start

### Pull and run the latest image

```bash
docker pull beshultd/deltaglider:latest
docker run --rm beshultd/deltaglider:latest --help
```

### Run with AWS credentials

```bash
docker run --rm \
  -e AWS_ACCESS_KEY_ID=your_key \
  -e AWS_SECRET_ACCESS_KEY=your_secret \
  -e AWS_DEFAULT_REGION=us-east-1 \
  beshultd/deltaglider:latest ls s3://your-bucket/
```

### Run with MinIO (local S3 alternative)

```bash
# Start MinIO
docker run -d \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  --name minio \
  minio/minio server /data --console-address ":9001"

# Use DeltaGlider with MinIO
docker run --rm \
  -e AWS_ENDPOINT_URL=http://host.docker.internal:9000 \
  -e AWS_ACCESS_KEY_ID=minioadmin \
  -e AWS_SECRET_ACCESS_KEY=minioadmin \
  -e AWS_DEFAULT_REGION=us-east-1 \
  beshultd/deltaglider:latest ls
```

## Building Locally

### Build with current git version

```bash
VERSION=$(git describe --tags --always --abbrev=0 | sed 's/^v//')
docker build --build-arg VERSION=${VERSION} -t beshultd/deltaglider:${VERSION} .
```

### Build with custom version

```bash
docker build --build-arg VERSION=6.0.2 -t beshultd/deltaglider:6.0.2 .
```

### Multi-platform build

```bash
# Create a buildx builder (one-time setup)
docker buildx create --name deltaglider-builder --use

# Build for multiple platforms
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --build-arg VERSION=6.0.2 \
  -t beshultd/deltaglider:6.0.2 \
  --push \
  .
```

## Testing the Image

### Basic functionality test

```bash
# Check version
docker run --rm beshultd/deltaglider:test --version

# Check help
docker run --rm beshultd/deltaglider:test --help

# List available commands
docker run --rm beshultd/deltaglider:test
```

### Integration test with MinIO

```bash
# 1. Start MinIO
docker run -d \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  --name minio \
  minio/minio server /data --console-address ":9001"

# 2. Create a test file
echo "Hello DeltaGlider" > test.txt

# 3. Upload to S3/MinIO
docker run --rm \
  -v $(pwd):/data \
  -w /data \
  -e AWS_ENDPOINT_URL=http://host.docker.internal:9000 \
  -e AWS_ACCESS_KEY_ID=minioadmin \
  -e AWS_SECRET_ACCESS_KEY=minioadmin \
  -e AWS_DEFAULT_REGION=us-east-1 \
  beshultd/deltaglider:test cp test.txt s3://test-bucket/

# 4. List bucket contents
docker run --rm \
  -e AWS_ENDPOINT_URL=http://host.docker.internal:9000 \
  -e AWS_ACCESS_KEY_ID=minioadmin \
  -e AWS_SECRET_ACCESS_KEY=minioadmin \
  -e AWS_DEFAULT_REGION=us-east-1 \
  beshultd/deltaglider:test ls s3://test-bucket/

# 5. Get statistics
docker run --rm \
  -e AWS_ENDPOINT_URL=http://host.docker.internal:9000 \
  -e AWS_ACCESS_KEY_ID=minioadmin \
  -e AWS_SECRET_ACCESS_KEY=minioadmin \
  -e AWS_DEFAULT_REGION=us-east-1 \
  beshultd/deltaglider:test stats test-bucket

# 6. Cleanup
docker stop minio && docker rm minio
rm test.txt
```

## Publishing to Docker Hub

### Manual Publishing

```bash
# 1. Log in to Docker Hub
docker login

# 2. Build the image
VERSION=$(git describe --tags --always --abbrev=0 | sed 's/^v//')
docker build --build-arg VERSION=${VERSION} \
  -t beshultd/deltaglider:${VERSION} \
  -t beshultd/deltaglider:latest \
  .

# 3. Push to Docker Hub
docker push beshultd/deltaglider:${VERSION}
docker push beshultd/deltaglider:latest
```

### Multi-platform Publishing

```bash
# Create builder (one-time setup)
docker buildx create --name deltaglider-builder --use

# Build and push for multiple platforms
VERSION=$(git describe --tags --always --abbrev=0 | sed 's/^v//')
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --build-arg VERSION=${VERSION} \
  -t beshultd/deltaglider:${VERSION} \
  -t beshultd/deltaglider:latest \
  --push \
  .
```

## GitHub Actions Automation

The repository includes a GitHub Actions workflow (`.github/workflows/docker-publish.yml`) that automatically builds and publishes Docker images.

### Automatic Publishing Triggers

- **On main branch push**: Tags as `latest`
- **On develop branch push**: Tags as `develop`
- **On version tag push** (e.g., `v6.0.2`): Tags with semver patterns:
  - `6.0.2` (full version)
  - `6.0` (major.minor)
  - `6` (major)
- **On pull request**: Builds but doesn't push (testing only)

### Required GitHub Secrets

Set these secrets in your GitHub repository settings (`Settings > Secrets and variables > Actions`):

1. **DOCKERHUB_USERNAME**: Your Docker Hub username (e.g., `beshultd`)
2. **DOCKERHUB_TOKEN**: Docker Hub access token (create at https://hub.docker.com/settings/security)

### Manual Workflow Trigger

You can manually trigger the Docker build workflow from the GitHub Actions tab:

1. Go to the **Actions** tab
2. Select **Build and Publish Docker Images**
3. Click **Run workflow**
4. Select a branch and click **Run workflow**

## Docker Image Details

### Image Layers

The Dockerfile uses a multi-stage build:

1. **Builder stage**: Installs UV and Python dependencies
2. **Runtime stage**: Minimal Python 3.12-slim with only runtime dependencies

### Image Features

- **Size**: ~150MB (compressed)
- **Platforms**: linux/amd64, linux/arm64
- **User**: Runs as non-root user `deltaglider` (UID 1000)
- **Base**: Python 3.12-slim (Debian)
- **Dependencies**:
  - Python 3.12
  - xdelta3 (binary diff tool)
  - All Python dependencies from `pyproject.toml`

### Environment Variables

The image supports the following environment variables:

```bash
# Logging
DG_LOG_LEVEL=INFO              # DEBUG, INFO, WARNING, ERROR

# Performance & Compression
DG_MAX_RATIO=0.5               # Max delta/file ratio (0.0-1.0)

# Cache Configuration
DG_CACHE_BACKEND=filesystem    # filesystem or memory
DG_CACHE_MEMORY_SIZE_MB=100    # Memory cache size
DG_CACHE_ENCRYPTION_KEY=       # Optional encryption key

# AWS Configuration
AWS_ENDPOINT_URL=              # S3 endpoint (for MinIO/LocalStack)
AWS_ACCESS_KEY_ID=             # AWS access key
AWS_SECRET_ACCESS_KEY=         # AWS secret key
AWS_DEFAULT_REGION=us-east-1   # AWS region
```

### Health Check

The image includes a health check that runs every 30 seconds:

```bash
docker inspect --format='{{.State.Health.Status}}' <container-id>
```

## Troubleshooting

### Build Issues

#### "setuptools-scm was unable to detect version"

**Cause**: Git metadata is not available during the build.

**Solution**: Always use the `VERSION` build arg:

```bash
docker build --build-arg VERSION=6.0.2 -t beshultd/deltaglider:6.0.2 .
```

#### Cache issues

**Cause**: The Docker build cache is causing stale builds.

**Solution**: Use the `--no-cache` flag:

```bash
docker build --no-cache --build-arg VERSION=6.0.2 -t beshultd/deltaglider:6.0.2 .
```

### Runtime Issues

#### "unauthorized: access token has insufficient scopes"

**Cause**: Not logged in to Docker Hub, or invalid credentials.

**Solution**:

```bash
docker login
# Enter your Docker Hub credentials
```

#### "Cannot connect to MinIO/LocalStack"

**Cause**: Using `localhost` instead of `host.docker.internal` from inside the container.

**Solution**: Use `host.docker.internal` on Mac/Windows or `172.17.0.1` on Linux:

```bash
# Mac/Windows
-e AWS_ENDPOINT_URL=http://host.docker.internal:9000

# Linux
-e AWS_ENDPOINT_URL=http://172.17.0.1:9000
```

## Docker Compose

For local development with MinIO:

```yaml
version: '3.8'

services:
  minio:
    image: minio/minio:latest
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    command: server /data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 10s
      timeout: 5s
      retries: 5

  deltaglider:
    image: beshultd/deltaglider:latest
    environment:
      AWS_ENDPOINT_URL: http://minio:9000
      AWS_ACCESS_KEY_ID: minioadmin
      AWS_SECRET_ACCESS_KEY: minioadmin
      AWS_DEFAULT_REGION: us-east-1
      DG_LOG_LEVEL: DEBUG
    depends_on:
      - minio
    volumes:
      - ./data:/data
    working_dir: /data
    command: ["--help"]
```

Run with:

```bash
docker-compose up -d
docker-compose run --rm deltaglider ls
```

## Best Practices

1. **Always specify version**: Use `--build-arg VERSION=x.y.z` when building
2. **Use multi-stage builds**: Keeps the final image small
3. **Tag with semantic versions**: Follow semver (major.minor.patch)
4. **Test before pushing**: Run integration tests locally
5. **Use secrets**: Never hardcode credentials in images
6. **Multi-platform builds**: Support both amd64 and arm64
7. **Update README**: Keep the Docker Hub description in sync with README.md

## Additional Resources

- [Docker Hub Repository](https://hub.docker.com/r/beshultd/deltaglider)
- [GitHub Repository](https://github.com/beshu-tech/deltaglider)
- [MinIO Documentation](https://min.io/docs/minio/container/index.html)
- [Docker Buildx Documentation](https://docs.docker.com/buildx/working-with-buildx/)
242
docs/EC2_REGION_DETECTION.md
Normal file
242
docs/EC2_REGION_DETECTION.md
Normal file
@@ -0,0 +1,242 @@
|
||||
# EC2 Region Detection & Cost Optimization
|
||||
|
||||
DeltaGlider automatically detects when you're running on an EC2 instance and warns you about potential cross-region data transfer charges.
|
||||
|
||||
## Overview
|
||||
|
||||
When running `deltaglider migrate` on an EC2 instance, DeltaGlider:
|
||||
|
||||
1. **Detects EC2 Environment**: Uses IMDSv2 (Instance Metadata Service v2) to determine if running on EC2
|
||||
2. **Retrieves Instance Region**: Gets the actual AWS region where your EC2 instance is running
|
||||
3. **Compares Regions**: Checks if your EC2 region matches the S3 client region
|
||||
4. **Warns About Costs**: Displays clear warnings when regions don't match
|
||||
|
||||
## Why This Matters
|
||||
|
||||
**AWS Cross-Region Data Transfer Costs**:
|
||||
- **Same region**: No additional charges for data transfer
|
||||
- **Cross-region**: $0.02 per GB transferred (can add up quickly for large migrations)
|
||||
- **NAT Gateway**: Additional charges if going through NAT
|
||||
|
||||
**Example Cost Impact**:
|
||||
- Migrating 1TB from `us-east-1` EC2 → `us-west-2` S3 = ~$20 in data transfer charges
|
||||
- Same migration within same region = $0 in data transfer charges
|
||||
|
||||
## Output Examples

### Scenario 1: Regions Aligned (Optimal) ✅

```bash
$ deltaglider migrate s3://old-bucket/ s3://new-bucket/
EC2 Instance: us-east-1a
S3 Client Region: us-east-1
✓ Regions aligned - no cross-region charges
Migrating from s3://old-bucket/
to s3://new-bucket/
...
```

**Result**: No warnings, optimal configuration, no extra charges.

---

### Scenario 2: Auto-Detected Mismatch (INFO) ℹ️

```bash
$ deltaglider migrate s3://old-bucket/ s3://new-bucket/
EC2 Instance: us-west-2a
S3 Client Region: us-east-1

ℹ️ INFO: EC2 region (us-west-2) differs from configured S3 region (us-east-1)
Consider using --region us-west-2 to avoid cross-region charges.

Migrating from s3://old-bucket/
to s3://new-bucket/
...
```

**Result**: Informational warning that suggests the optimal region. The user didn't explicitly set the wrong region, so it likely comes from their AWS config.

---

### Scenario 3: Explicit Region Override Mismatch (WARNING) ⚠️

```bash
$ deltaglider migrate --region us-east-1 s3://old-bucket/ s3://new-bucket/
EC2 Instance: us-west-2a
S3 Client Region: us-east-1

⚠️ WARNING: EC2 region=us-west-2 != S3 client region=us-east-1
Expect cross-region/NAT data charges. Align regions (set client region=us-west-2)
before proceeding. Or drop --region for automatic region resolution.

Migrating from s3://old-bucket/
to s3://new-bucket/
...
```

**Result**: Strong warning because the user explicitly set the wrong region with the `--region` flag. They might not realize the cost implications.

---

### Scenario 4: Not on EC2

```bash
$ deltaglider migrate s3://old-bucket/ s3://new-bucket/
S3 Client Region: us-east-1
Migrating from s3://old-bucket/
to s3://new-bucket/
...
```

**Result**: Simple region display, no EC2 warnings (not applicable).
## Configuration

### Disable EC2 Detection

If you want to disable EC2 detection (e.g., for testing or if it causes issues):

```bash
export DG_DISABLE_EC2_DETECTION=true
deltaglider migrate s3://old/ s3://new/
```

Or in your script:

```python
import os
os.environ["DG_DISABLE_EC2_DETECTION"] = "true"
```
### How It Works

DeltaGlider uses **IMDSv2** (Instance Metadata Service v2) for security:

1. **Token Request** (PUT with TTL):
   ```
   PUT http://169.254.169.254/latest/api/token
   X-aws-ec2-metadata-token-ttl-seconds: 21600
   ```

2. **Metadata Request** (GET with token):
   ```
   GET http://169.254.169.254/latest/meta-data/placement/region
   X-aws-ec2-metadata-token: <token>
   ```

3. **Fast Timeout**: 1 second timeout for non-EC2 environments (no delay if not on EC2)
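For reference, both IMDSv2 calls can be reproduced with nothing but the Python standard library. This is a minimal sketch of the probe, not DeltaGlider's actual implementation (the real adapter lives in `src/deltaglider/adapters/ec2_metadata.py`):

```python
# Minimal IMDSv2 probe sketch; illustrative only, not DeltaGlider's code.
import urllib.request

IMDS = "http://169.254.169.254"

def detect_ec2_region(timeout: float = 1.0) -> str | None:
    """Return the instance region, or None when not running on EC2."""
    try:
        token_req = urllib.request.Request(
            f"{IMDS}/latest/api/token",
            method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
        )
        with urllib.request.urlopen(token_req, timeout=timeout) as resp:
            token = resp.read().decode()
        region_req = urllib.request.Request(
            f"{IMDS}/latest/meta-data/placement/region",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(region_req, timeout=timeout) as resp:
            return resp.read().decode()
    except OSError:
        # Connection refused or timeout: not on EC2, or IMDS is blocked.
        return None
```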
### Security Notes

- **IMDSv2 Only**: DeltaGlider uses the more secure IMDSv2, not the legacy IMDSv1
- **No Credentials**: Only reads metadata, never accesses credentials
- **Graceful Fallback**: Silently skips detection if IMDS is unavailable
- **No Network Impact**: Uses the link-local IP (169.254.169.254), so traffic never leaves the instance
## Best Practices

### For Cost Optimization

1. **Same Region**: Always try to keep the EC2 instance and S3 bucket in the same region
2. **Check First**: Run with `--dry-run` to verify the setup before the actual migration
3. **Use Auto-Detection**: Don't specify `--region` unless you have a specific reason
4. **Monitor Costs**: Use AWS Cost Explorer to track cross-region data transfer

### For Terraform/IaC

In Terraform, the region comes from the provider, so pin EC2 and S3 resources to the same provider region:

```hcl
# Good: EC2 and S3 in the same region (inherited from the provider)
provider "aws" {
  region = "us-west-2"
}

resource "aws_instance" "app" {
  # ... AMI, instance type, etc.
}

resource "aws_s3_bucket" "data" {
  bucket = "my-data-bucket" # created in us-west-2 via the same provider
}
```

### For Multi-Region Setups

If you MUST do cross-region transfers:

1. **Use VPC Endpoints**: Reduce NAT Gateway costs
2. **Compress First**: DeltaGlider's deltas shrink the bytes that actually cross regions
3. **Consider S3 Transfer Acceleration**: May be cheaper for very large transfers
4. **Batch Operations**: Minimize the number of API calls
## Technical Details

### EC2MetadataAdapter

Location: `src/deltaglider/adapters/ec2_metadata.py`

Key methods:
- `is_running_on_ec2()`: Detects EC2 environment
- `get_region()`: Returns AWS region code (e.g., "us-east-1")
- `get_availability_zone()`: Returns AZ (e.g., "us-east-1a")
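A hypothetical usage sketch of those methods (the zero-argument constructor is an assumption; check the module for the real signature):

```python
# Hypothetical usage; the zero-argument constructor is assumed, not confirmed.
from deltaglider.adapters.ec2_metadata import EC2MetadataAdapter

metadata = EC2MetadataAdapter()
if metadata.is_running_on_ec2():
    print(f"Region: {metadata.get_region()}")           # e.g. "us-east-1"
    print(f"AZ: {metadata.get_availability_zone()}")    # e.g. "us-east-1a"
else:
    print("Not on EC2; region checks will be skipped")
```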
### Region Logging

Location: `src/deltaglider/app/cli/aws_compat.py`

Function: `log_aws_region(service, region_override=False)`

Logic:
- If not EC2: Show S3 region only
- If EC2 + regions match: Green checkmark ✅
- If EC2 + auto-detected mismatch: Blue INFO ℹ️
- If EC2 + `--region` mismatch: Yellow WARNING ⚠️
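The table above reduces to a small branch; an illustrative sketch (the real `log_aws_region` may differ):

```python
# Illustrative reduction of the logic table; not the actual implementation.
def classify_region_state(ec2_region: str | None, s3_region: str, region_override: bool) -> str:
    if ec2_region is None:                      # not running on EC2
        return f"S3 Client Region: {s3_region}"
    if ec2_region == s3_region:                 # optimal configuration
        return "✓ Regions aligned - no cross-region charges"
    if region_override:                         # user forced a mismatching --region
        return f"⚠️ WARNING: EC2 region={ec2_region} != S3 client region={s3_region}"
    return f"ℹ️ INFO: EC2 region ({ec2_region}) differs from configured S3 region ({s3_region})"
```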
## Troubleshooting

### "Cannot connect to IMDS"

**Cause**: Network policy blocks access to 169.254.169.254

**Solution**:
```bash
# Test IMDS connectivity
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/placement/region

# If it fails, disable detection
export DG_DISABLE_EC2_DETECTION=true
```

### "Wrong region detected"

**Cause**: Cached metadata or a race condition

**Solution**: DeltaGlider caches metadata for performance. Restart the process to refresh.

### "Warning appears but I want cross-region"

**Cause**: You intentionally need a cross-region transfer

**Solution**: This is just a warning, not an error. The migration will proceed; the warning helps you confirm you understand the cost implications.
## FAQ

**Q: Does this slow down my migrations?**
A: No. EC2 detection happens once before migration starts (< 100ms). It doesn't affect migration performance.

**Q: What if I'm not on EC2 but the detection is slow?**
A: The timeout is 1 second. If IMDS is unreachable, it fails fast. Disable with `DG_DISABLE_EC2_DETECTION=true`.

**Q: Does this work on Fargate/ECS/Lambda?**
A: Partially. ECS tasks on EC2 instances can reach IMDSv2, but Fargate and Lambda don't expose the EC2 instance metadata endpoint. There, detection fails fast (1-second timeout) and DeltaGlider simply skips the region check.

**Q: Can I use this with LocalStack/MinIO?**
A: Yes. When using `--endpoint-url`, DeltaGlider skips EC2 detection (not applicable for non-AWS S3).

**Q: Will this detect VPC endpoints?**
A: No. VPC endpoints don't change the "region" from an EC2 perspective. The warning still applies if regions don't match.

## Related Documentation

- [AWS Data Transfer Pricing](https://aws.amazon.com/ec2/pricing/on-demand/#Data_Transfer)
- [AWS IMDSv2 Documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html)
- [S3 Transfer Costs](https://aws.amazon.com/s3/pricing/)
342
docs/STATS_CACHING.md
Normal file
@@ -0,0 +1,342 @@
# Bucket Statistics Caching

**TL;DR**: Bucket stats are now cached in S3 with automatic validation. What took 20 minutes now takes ~100ms when the bucket hasn't changed.

## Overview

DeltaGlider's `get_bucket_stats()` operation now includes intelligent S3-based caching that dramatically improves performance for read-heavy workloads while maintaining accuracy through automatic validation.

## The Problem

Computing bucket statistics requires:
1. **LIST operation**: Get all objects (~50-100ms per 1000 objects)
2. **HEAD operations**: Fetch metadata for delta files (expensive!)
   - For a bucket with 10,000 delta files: 10,000 HEAD calls
   - Even with 10 parallel workers: ~1,000 sequential batches
   - At ~100ms per batch: **100+ seconds minimum**
   - With network issues or throttling: **20+ minutes** 😱

This made monitoring dashboards and repeated stats checks impractical.
## The Solution

### S3-Based Cache with Automatic Validation

Statistics are cached in S3 at `.deltaglider/stats_{mode}.json` (one per mode). On every call:

1. **Quick LIST operation** (~50-100ms) - always performed for validation
2. **Compare** current object_count + compressed_size with cache
3. **If unchanged** → Return cached stats instantly ✅ (**~100ms total**)
4. **If changed** → Recompute and update cache automatically

### Three Stats Modes

```bash
# Quick mode (default): Fast listing-only, approximate compression metrics
deltaglider stats my-bucket

# Sampled mode: One HEAD per deltaspace, balanced accuracy/speed
deltaglider stats my-bucket --sampled

# Detailed mode: All HEAD calls, most accurate (slowest)
deltaglider stats my-bucket --detailed
```

Each mode has its own independent cache file.
## Performance

| Scenario | Before | After | Speedup |
|----------|--------|-------|---------|
| **First run** (cold cache) | 20 min | 20 min | 1x (must compute) |
| **Bucket unchanged** (warm cache) | 20 min | **100ms** | **~12,000x** ✨ |
| **Bucket changed** (stale cache) | 20 min | 20 min | 1x (auto-recompute) |
| **Dashboard monitoring** | 20 min/check | **100ms/check** | **~12,000x** ✨ |
## CLI Usage

### Basic Usage

```bash
# Use cache (default behavior)
deltaglider stats my-bucket

# Force recomputation even if cache valid
deltaglider stats my-bucket --refresh

# Skip cache entirely (both read and write)
deltaglider stats my-bucket --no-cache

# Different modes with caching
deltaglider stats my-bucket --sampled
deltaglider stats my-bucket --detailed
```

### Cache Control Flags

| Flag | Description | Use Case |
|------|-------------|----------|
| *(none)* | Use cache if valid | **Default** - Fast monitoring |
| `--refresh` | Force recomputation | Updated data needed now |
| `--no-cache` | Skip caching entirely | Testing, one-off analysis |
| `--sampled` | Balanced mode | Good accuracy, faster than detailed |
| `--detailed` | Most accurate mode | Analytics, reports |
## Python SDK Usage

```python
from deltaglider import create_client

client = create_client()

# Use cache (fast, ~100ms with cache hit)
stats = client.get_bucket_stats('releases')

# Force refresh (slow, recomputes everything)
stats = client.get_bucket_stats('releases', refresh_cache=True)

# Skip cache entirely
stats = client.get_bucket_stats('releases', use_cache=False)

# Different modes with caching
stats = client.get_bucket_stats('releases', mode='quick')     # Fast
stats = client.get_bucket_stats('releases', mode='sampled')   # Balanced
stats = client.get_bucket_stats('releases', mode='detailed')  # Accurate
```
## Cache Structure

Cache files are stored at `.deltaglider/stats_{mode}.json` in your bucket:

```json
{
  "version": "1.0",
  "mode": "quick",
  "computed_at": "2025-10-14T10:30:00Z",
  "validation": {
    "object_count": 1523,
    "compressed_size": 1234567890
  },
  "stats": {
    "bucket": "releases",
    "object_count": 1523,
    "total_size": 50000000000,
    "compressed_size": 1234567890,
    "space_saved": 48765432110,
    "average_compression_ratio": 0.9753,
    "delta_objects": 1500,
    "direct_objects": 23
  }
}
```
## How Validation Works

**Smart Staleness Detection**:
1. Always perform a quick LIST operation (required anyway, ~50-100ms)
2. Calculate current `object_count` and `compressed_size` from the LIST
3. Compare with cached values
4. If **both match** → Cache valid, return instantly
5. If **either differs** → Bucket changed, recompute automatically

This catches:
- ✅ Objects added (count increases)
- ✅ Objects removed (count decreases)
- ✅ Objects replaced (size changes)
- ✅ Content modified (size changes)

**Edge Case**: If only metadata changes (tags, headers) but not content/count/size, the cache remains valid. This is acceptable since metadata changes are rare and don't affect core statistics.
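A minimal sketch of that validate-or-recompute flow, assuming a boto3-style client; `list_current_totals` and `compute_stats` are illustrative helpers, not DeltaGlider's API:

```python
# Sketch of the cache flow; helper names are illustrative, not DeltaGlider's API.
import json

def get_stats_with_cache(s3, bucket: str, mode: str = "quick") -> dict:
    cache_key = f".deltaglider/stats_{mode}.json"
    current = list_current_totals(s3, bucket)   # {'object_count': N, 'compressed_size': M}
    try:
        body = s3.get_object(Bucket=bucket, Key=cache_key)["Body"].read()
        cached = json.loads(body)
        if cached["validation"] == current:     # count AND size unchanged
            return cached["stats"]              # cache hit: ~100ms total
    except Exception:
        pass                                    # missing/corrupt cache is non-fatal
    stats = compute_stats(s3, bucket, mode)     # slow path: full recompute
    payload = {"version": "1.0", "mode": mode, "validation": current, "stats": stats}
    s3.put_object(Bucket=bucket, Key=cache_key, Body=json.dumps(payload).encode())
    return stats
```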
## Use Cases

### ✅ Perfect For

1. **Monitoring Dashboards**
   - Check stats every minute
   - Bucket rarely changes
   - **20 min → 100ms per check** ✨

2. **CI/CD Status Checks**
   - Verify upload success
   - Check compression effectiveness
   - Near-instant feedback

3. **Repeated Analysis**
   - Multiple stats queries during an investigation
   - Cache persists across sessions
   - Huge time savings

### ⚠️ Less Beneficial For

1. **Write-Heavy Buckets**
   - Bucket changes on every check
   - Cache always stale
   - **No benefit, but no harm either** (graceful degradation)

2. **One-Off Queries**
   - Single stats check
   - Cache doesn't help (cold cache)
   - Still works normally
## Cache Management

### Automatic Management

- **Creation**: Automatic on first `get_bucket_stats()` call
- **Validation**: Automatic on every call (always current)
- **Updates**: Automatic when the bucket changes
- **Cleanup**: Not needed (cache files are tiny, ~1-10KB)

### Manual Management

```bash
# View cache files
deltaglider ls s3://my-bucket/.deltaglider/

# Delete cache manually (will be recreated automatically)
deltaglider rm s3://my-bucket/.deltaglider/stats_quick.json
deltaglider rm s3://my-bucket/.deltaglider/stats_sampled.json
deltaglider rm s3://my-bucket/.deltaglider/stats_detailed.json

# Or delete the entire .deltaglider prefix
deltaglider rm -r s3://my-bucket/.deltaglider/
```
## Technical Details

### Cache Files

- **Location**: `.deltaglider/` prefix in each bucket
- **Naming**: `stats_{mode}.json` (quick, sampled, detailed)
- **Size**: ~1-10KB per file
- **Format**: JSON with version, mode, validation data, and stats

### Validation Logic

```python
def is_cache_valid(cached: dict, current: dict) -> bool:
    """Cache is valid if object count and size are unchanged."""
    return (
        cached['object_count'] == current['object_count'] and
        cached['compressed_size'] == current['compressed_size']
    )
```

### Error Handling

Cache operations are **non-fatal**:
- ✅ Cache read fails → Compute normally, log warning
- ✅ Cache write fails → Return computed stats, log warning
- ✅ Corrupted cache → Ignore, recompute, overwrite
- ✅ Version mismatch → Ignore, recompute with new version
- ✅ Permission denied → Log warning, continue without caching

**The stats operation never fails due to cache issues.**
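That non-fatal behavior amounts to guarding every cache touch; a sketch of the pattern (the logger name is illustrative):

```python
# Best-effort cache write: failures are logged, never raised.
import json
import logging

logger = logging.getLogger("deltaglider.stats")  # illustrative logger name

def try_write_cache(s3, bucket: str, key: str, payload: dict) -> None:
    try:
        s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(payload).encode())
    except Exception as exc:  # AccessDenied, throttling, network errors, ...
        logger.warning("Stats cache write failed for s3://%s/%s: %s", bucket, key, exc)
```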
## Future Enhancements

Potential improvements for the future:

1. **TTL-Based Expiration**: Auto-refresh after N hours even if unchanged
2. **Cache Cleanup Command**: `deltaglider cache clear` for manual invalidation
3. **Cache Statistics**: Show hit/miss rates and staleness info
4. **Async Cache Updates**: Background refresh for very large buckets
5. **Cross-Bucket Cache**: Share reference data across related buckets

## Comparison with Old Implementation

| Aspect | Old (In-Memory) | New (S3-Based) |
|--------|----------------|----------------|
| **Storage** | Process memory | S3 bucket |
| **Persistence** | Lost on restart | Survives restarts |
| **Sharing** | Per-process | Shared across all clients |
| **Validation** | None | Automatic on every call |
| **Staleness** | Always fresh | Automatically detected |
| **Use Case** | Single session | Monitoring, dashboards |
## Examples

### Example 1: Monitoring Dashboard

```python
from deltaglider import create_client
import time

client = create_client()

while True:
    # Fast stats check (~100ms with cache)
    stats = client.get_bucket_stats('releases')
    print(f"Objects: {stats.object_count}, "
          f"Compression: {stats.average_compression_ratio:.1%}")

    time.sleep(60)  # Check every minute

# First run: 20 min (computes and caches)
# All subsequent runs: ~100ms (cache hit)
```

### Example 2: CI/CD Pipeline

```python
from deltaglider import create_client

client = create_client()

# Upload new release
client.upload("v2.0.0.zip", "s3://releases/v2.0.0/")

# Quick verification (fast with cache)
stats = client.get_bucket_stats('releases')
if stats.average_compression_ratio < 0.90:
    print("Warning: Lower than expected compression")
```

### Example 3: Force Fresh Stats

```python
from deltaglider import create_client

client = create_client()

# Force recomputation for an accurate report
stats = client.get_bucket_stats(
    'releases',
    mode='detailed',
    refresh_cache=True,
)

print("Accurate compression report:")
print(f"  Original: {stats.total_size / 1e9:.1f} GB")
print(f"  Stored: {stats.compressed_size / 1e9:.1f} GB")
print(f"  Saved: {stats.space_saved / 1e9:.1f} GB ({stats.average_compression_ratio:.1%})")
```
## FAQ

**Q: Does caching affect accuracy?**
A: No! The cache is automatically validated on every call. If the bucket changed, stats are recomputed automatically.

**Q: What if I need fresh stats immediately?**
A: Use the `--refresh` flag (CLI) or `refresh_cache=True` (SDK) to force recomputation.

**Q: Can I disable caching?**
A: Yes, use the `--no-cache` flag (CLI) or `use_cache=False` (SDK).

**Q: How much space do cache files use?**
A: ~1-10KB per mode, negligible for any bucket.

**Q: What happens if a cache write fails?**
A: The operation continues normally - computed stats are returned and a warning is logged. Caching is optional and non-fatal.

**Q: Do I need to clean up cache files?**
A: No, they're tiny and automatically managed. But you can delete the `.deltaglider/` prefix if desired.

**Q: Does the cache work across different modes?**
A: Each mode (quick, sampled, detailed) has its own independent cache file.

---

**Implementation**: See [PR #XX] for complete implementation details and test coverage.

**Related**: [SDK Documentation](sdk/README.md) | [CLI Reference](../README.md#cli-reference) | [Architecture](sdk/architecture.md)
@@ -9,7 +9,11 @@ DeltaGlider provides AWS S3 CLI compatible commands with automatic delta compres
- `deltaglider ls [s3_url]` - List buckets and objects
- `deltaglider rm <s3_url>` - Remove objects
- `deltaglider sync <source> <destination>` - Synchronize directories
- `deltaglider migrate <source> <destination>` - Migrate S3 buckets with compression and EC2 cost warnings
- `deltaglider stats <bucket>` - Get bucket statistics and compression metrics
- `deltaglider verify <s3_url>` - Verify file integrity
- `deltaglider put-bucket-acl <bucket>` - Set bucket ACL (s3api compatible)
- `deltaglider get-bucket-acl <bucket>` - Get bucket ACL (s3api compatible)

### Current Usage Examples
```bash
@@ -21,6 +25,14 @@ deltaglider cp s3://bucket/path/to/file.zip .

# Verify integrity
deltaglider verify s3://bucket/path/to/file.zip.delta

# Set bucket ACL
deltaglider put-bucket-acl my-bucket --acl public-read
deltaglider put-bucket-acl my-bucket --acl private
deltaglider put-bucket-acl my-bucket --grant-read id=12345

# Get bucket ACL
deltaglider get-bucket-acl my-bucket
```

## Target State: AWS S3 CLI Compatibility
@@ -1,347 +1,76 @@
# Case Study: How ReadOnlyREST Reduced Storage Costs by 99.9% with DeltaGlider
## How ReadonlyREST Cut 4TB of S3 Storage Down to 5GB (and Saved 99.9%)

## Executive Summary
### TL;DR

**The Challenge**: ReadOnlyREST, a security plugin for Elasticsearch, was facing exponential storage costs managing 145 release versions across multiple product lines, consuming nearly 4TB of S3 storage.
We were paying to store 4TB of mostly identical plugin builds.
DeltaGlider deduplicated everything down to 4.9GB — 99.9% smaller, $1.1k/year cheaper, and no workflow changes.

**The Solution**: DeltaGlider, an intelligent delta compression system that reduced storage from 4,060GB to just 4.9GB.
#### The Problem
**The Impact**:
- 💰 **$1,119 annual savings** on storage costs
- 📉 **99.9% reduction** in storage usage
- ⚡ **Zero changes** to existing workflows
- ✅ **Full data integrity** maintained
ReadonlyREST supports ~150 Elasticsearch/Kibana versions × multiple product lines × all our own releases.
After years of publishing builds, our S3 archive hit `4TB` (201,840 files, $93/month).
Glacier helped, but restoring files took 48 hours — useless for CI/CD.

---
Every plugin ZIP was ~82MB, but `99.7% identical` to the next one. We were paying to store duplicates.

## The Storage Crisis
#### The Fix: DeltaGlider

### The Numbers That Kept Us Up at Night
DeltaGlider stores binary deltas between similar files instead of full copies.

ReadOnlyREST maintains a comprehensive release archive:
- **145 version folders** (v1.50.0 through v1.66.1)
- **201,840 total files** to manage
- **3.96 TB** of S3 storage consumed
- **$1,120/year** in storage costs alone

Each version folder contained:
- 513 plugin ZIP files (one for each Elasticsearch version)
- 879 checksum files (SHA1 and SHA512)
- 3 product lines (Enterprise, Pro, Free)

### The Hidden Problem

What made this particularly painful wasn't just the size—it was the **redundancy**. Each 82.5MB plugin ZIP was 99.7% identical to others in the same version, differing only in minor Elasticsearch compatibility adjustments. We were essentially storing the same data hundreds of times.

> "We were paying to store 4TB of data that was fundamentally just variations of the same ~250MB of unique content. It felt like photocopying War and Peace 500 times because each copy had a different page number."
>
> — *DevOps Lead*

---

## Enter DeltaGlider

### The Lightbulb Moment

The breakthrough came when we realized we didn't need to store complete files—just the *differences* between them. DeltaGlider applies this principle automatically:

1. **First file becomes the reference** (stored in full)
2. **Similar files store only deltas** (typically 0.3% of original size)
3. **Different files uploaded directly** (no delta overhead)

### Implementation: Surprisingly Simple
```bash
# Before DeltaGlider (standard S3 upload)
aws s3 cp readonlyrest-1.66.1_es8.0.0.zip s3://releases/
# Size on S3: 82.5MB

# With DeltaGlider
deltaglider cp readonlyrest-1.66.1_es8.0.0.zip s3://releases/
# Size on S3: 65KB (99.92% smaller!)
```

# Before
```
aws s3 cp readonlyrest-1.66.1_es8.0.0.zip s3://releases/   # 82MB
```

# After
```
deltaglider cp readonlyrest-1.66.1_es8.0.0.zip s3://releases/   # 65KB
```

The beauty? **Zero changes to our build pipeline**. DeltaGlider works as a drop-in replacement for S3 uploads.

---

## The Results: Beyond Our Expectations

### Storage Transformation

```
BEFORE DELTAGLIDER          AFTER DELTAGLIDER
━━━━━━━━━━━━━━━━━           ━━━━━━━━━━━━━━━━
4,060 GB (3.96 TB)    →     4.9 GB
$93.38/month          →     $0.11/month
201,840 files         →     201,840 files (same!)
```
### Real Performance Metrics
Drop-in replacement for `aws s3 cp`. No pipeline changes.
Data integrity checked with SHA256, stored as metadata in S3.

From our actual production deployment:

| Metric | Value | Impact |
|--------|-------|--------|
| **Compression Ratio** | 99.9% | Near-perfect deduplication |
| **Delta Size** | ~65KB per 82.5MB file | 1/1,269th of original |
| **Upload Speed** | 3-4 files/second | Faster than raw S3 uploads |
| **Download Speed** | Transparent reconstruction | No user impact |
| **Storage Savings** | 4,055 GB | Enough for 850,000 more files |
### The Result

### Version-to-Version Comparison
| Metric | Before | After | Δ |
|---------------|----------|----------|--------------|
| Storage | 4.06TB | 4.9GB | -99.9% |
| Cost | $93/mo | $0.11/mo | -$1,119/yr |
| Files | 201,840 | 201,840 | identical |
| Upload speed | 1x | 3–4x | faster |

Testing between similar versions showed incredible efficiency:
Each “different” ZIP? Just a 65KB delta.
Reconstruction time: <100ms.
Zero user impact.
## Under the Hood

Uses xdelta3 diffs.
- Keeps one reference per group
- Stores deltas for near-identical files
- Skips small or text-based ones (.sha, .json, etc.)

It’s smart enough to decide what’s worth diffing automatically.
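For the curious, the encode/decode round-trip can be reproduced with the xdelta3 CLI; the file names below are illustrative:

```python
# Reproducing the delta round-trip with the xdelta3 CLI (illustrative paths).
import subprocess

# Encode: store only the difference between the reference and the new build.
subprocess.run(
    ["xdelta3", "-e", "-s", "reference.bin", "new-build.zip", "new-build.zip.delta"],
    check=True,
)

# Decode: reconstruct the full file from reference + delta.
subprocess.run(
    ["xdelta3", "-d", "-s", "reference.bin", "new-build.zip.delta", "restored.zip"],
    check=True,
)
```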
## Payoff
- 4TB → 5GB overnight
- Uploads 1,200× faster
- CI bandwidth cut 99%
- 100% checksum-verified integrity
- Zero vendor lock-in (open source)

## Takeaways

If you ship versioned artifacts, this will probably save you four figures and hours of upload time per year.

```
readonlyrest-1.66.1_es7.17.0.zip (82.5MB) → reference.bin (82.5MB)
readonlyrest-1.66.1_es7.17.1.zip (82.5MB) → 64KB delta (0.08% size)
readonlyrest-1.66.1_es7.17.2.zip (82.5MB) → 65KB delta (0.08% size)
...
readonlyrest-1.66.1_es8.15.0.zip (82.5MB) → 71KB delta (0.09% size)
```
---

## Technical Deep Dive

### How DeltaGlider Achieves 99.9% Compression

DeltaGlider uses binary diff algorithms (xdelta3) to identify and store only the bytes that change between files:

```python
# Simplified concept
reference = "readonlyrest-1.66.1_es7.17.0.zip"  # 82.5MB
new_file = "readonlyrest-1.66.1_es7.17.1.zip"   # 82.5MB

delta = binary_diff(reference, new_file)  # 65KB
# Delta contains only:
# - Elasticsearch version string changes
# - Compatibility metadata updates
# - Build timestamp differences
```

### Intelligent File Type Detection

Not every file benefits from delta compression. DeltaGlider automatically:

- **Applies delta compression to**: `.zip`, `.tar`, `.gz`, `.dmg`, `.jar`, `.war`
- **Uploads directly**: `.txt`, `.sha1`, `.sha512`, `.json`, `.md`

This intelligence meant our 127,455 checksum files were uploaded directly, avoiding unnecessary processing overhead. A minimal sketch of that routing follows.
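The sketch assumes routing is a simple suffix check (DeltaGlider's real heuristics may be richer):

```python
# Assumed suffix-based routing for the file types listed above.
from pathlib import Path

DELTA_SUFFIXES = {".zip", ".tar", ".gz", ".dmg", ".jar", ".war"}

def should_delta_compress(path: str) -> bool:
    """True when a file is likely to benefit from delta compression."""
    return Path(path).suffix.lower() in DELTA_SUFFIXES

assert should_delta_compress("readonlyrest-1.66.1_es8.0.0.zip")
assert not should_delta_compress("readonlyrest-1.66.1_es8.0.0.zip.sha1")
```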
### Architecture That Scales

```
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Client    │────▶│  DeltaGlider │────▶│  S3/MinIO   │
│   (CI/CD)   │     │              │     │             │
└─────────────┘     └──────────────┘     └─────────────┘
                           │
                    ┌──────▼───────┐
                    │ Local Cache  │
                    │ (References) │
                    └──────────────┘
```
---

## Business Impact

### Immediate ROI

- **Day 1**: 99.9% storage reduction
- **Month 1**: $93 saved
- **Year 1**: $1,119 saved
- **5 Years**: $5,595 saved (not counting growth)

### Hidden Benefits We Didn't Expect

1. **Faster Deployments**: Uploading 65KB deltas is 1,200x faster than 82.5MB files
2. **Reduced Bandwidth**: CI/CD pipeline bandwidth usage dropped 99%
3. **Improved Reliability**: Fewer timeout errors on large file uploads
4. **Better Compliance**: Automatic SHA256 integrity verification on every operation

### Environmental Impact

> "Reducing storage by 4TB means fewer drives spinning in data centers. It's a small contribution to our sustainability goals, but every bit counts."
>
> — *CTO*

---

## Implementation Journey

### Week 1: Proof of Concept
- Tested with 10 files
- Achieved 99.6% compression
- Decision to proceed

### Week 2: Production Rollout
- Uploaded all 201,840 files
- Zero errors or failures
- Immediate cost reduction

### Week 3: Integration
```bash
# Simple integration into our CI/CD
- aws s3 cp $FILE s3://releases/
+ deltaglider cp $FILE s3://releases/
```

### Week 4: Full Migration
- All build pipelines updated
- Developer documentation completed
- Monitoring dashboards configured
---

## Lessons Learned

### What Worked Well

1. **Drop-in replacement**: No architectural changes needed
2. **Automatic intelligence**: File type detection "just worked"
3. **Preservation of structure**: Directory hierarchy maintained perfectly

### Challenges Overcome

1. **Initial skepticism**: "99.9% compression sounds too good to be true"
   - *Solution*: Live demonstration with real data

2. **Download concerns**: "Will it be slow to reconstruct files?"
   - *Solution*: Benchmarking showed <100ms reconstruction time

3. **Reliability questions**: "What if the reference file is corrupted?"
   - *Solution*: SHA256 verification on every operation

---

## For Decision Makers

### Why This Matters

Storage costs scale linearly with data growth. Without DeltaGlider:
- Next 145 versions: Additional $1,120/year
- 5-year projection: $11,200 in storage alone
- Opportunity cost: Resources that could fund innovation

### Risk Assessment

| Risk | Mitigation | Status |
|------|------------|--------|
| Vendor lock-in | Open-source, standards-based | ✅ Mitigated |
| Data corruption | SHA256 verification built-in | ✅ Mitigated |
| Performance impact | Faster than original | ✅ No risk |
| Complexity | Drop-in replacement | ✅ No risk |

### Strategic Advantages

1. **Cost Predictability**: Storage costs become negligible
2. **Scalability**: Can handle 100x more versions in the same space
3. **Competitive Edge**: More resources for product development
4. **Green IT**: Reduced carbon footprint from storage
---

## For Engineers

### Getting Started

```bash
# Install DeltaGlider
pip install deltaglider

# Upload a file (automatic compression)
deltaglider cp my-release-v1.0.0.zip s3://releases/

# Download (automatic reconstruction)
deltaglider cp s3://releases/my-release-v1.0.0.zip .

# It's that simple.
deltaglider cp my-release.zip s3://releases/
```

### Performance Characteristics

```python
# Typical compression ratio by file similarity
compression_by_similarity = {
    "identical_files": "99.9%",      # Same file, different name
    "minor_changes": "99.7%",        # Version bumps, timestamps
    "moderate_changes": "95.0%",     # Feature additions
    "major_changes": "70.0%",        # Significant refactoring
    "completely_different": "0%",    # No compression (uploaded as-is)
}
```

### Integration Examples

**GitHub Actions**:
```yaml
- name: Upload Release
  run: deltaglider cp dist/*.zip s3://releases/${{ github.ref_name }}/
```

**Jenkins Pipeline**:
```groovy
sh "deltaglider cp ${WORKSPACE}/target/*.jar s3://artifacts/"
```

**Python Script**:
```python
from deltaglider import create_client

client = create_client()
client.upload("my-app-v2.0.0.zip", "s3://releases/v2.0.0/")
```
---

## The Bottom Line

DeltaGlider transformed our storage crisis into a solved problem:

- ✅ **4TB → 5GB** storage reduction
- ✅ **$1,119/year** saved
- ✅ **Zero** workflow disruption
- ✅ **100%** data integrity maintained

For ReadOnlyREST, DeltaGlider wasn't just a cost-saving tool—it was a glimpse into the future of intelligent storage. When 99.9% of your data is redundant, why pay to store it 500 times?

---

## Next Steps

### For Your Organization

1. **Identify similar use cases**: Version releases, backups, build artifacts
2. **Run the calculator**: `[Your files] × [Versions] × [Similarity] = Savings`
3. **Start small**: Test with one project's releases
4. **Scale confidently**: Deploy across all similar data

### Get Started Today

```bash
# See your potential savings
git clone https://github.com/beshu-tech/deltaglider
cd deltaglider
python calculate_savings.py --path /your/releases

# Try it yourself
docker run -p 9000:9000 minio/minio server /data  # Local S3
pip install deltaglider
deltaglider cp your-file.zip s3://test/
```

---

## About ReadOnlyREST

ReadOnlyREST is the enterprise security plugin for Elasticsearch and OpenSearch, protecting clusters in production since 2015. Learn more at [readonlyrest.com](https://readonlyrest.com)

## About DeltaGlider

DeltaGlider is an open-source delta compression system for S3-compatible storage, turning redundant data into remarkable savings. Built with modern Python, containerized for portability, and designed for scale.

---

*"In a world where storage is cheap but not free, and data grows exponentially but changes incrementally, DeltaGlider represents a fundamental shift in how we think about storing versioned artifacts."*

**— ReadOnlyREST Engineering Team**
That’s it.

Binary file not shown. (Before: 267 KiB → After: 4.0 MiB)
@@ -1,6 +1,6 @@
# DeltaGlider Python SDK Documentation

The DeltaGlider Python SDK provides a **boto3-compatible API for core S3 operations** (~20% of methods covering 80% of use cases), while achieving 99%+ compression for versioned artifacts through intelligent binary delta compression.
The DeltaGlider Python SDK provides a **boto3-compatible API for core S3 operations** (~20% of methods covering 80% of use cases), while achieving 99%+ compression for very similar versioned artifacts through intelligent binary delta compression.

## 🎯 Key Highlights

@@ -57,9 +57,10 @@ while response.get('IsTruncated'):
# Get detailed compression stats only when needed
response = client.list_objects(Bucket='releases', FetchMetadata=True)  # Slower but detailed

# Quick bucket statistics
stats = client.get_bucket_stats('releases')  # Fast overview
stats = client.get_bucket_stats('releases', detailed_stats=True)  # With compression metrics
# Bucket statistics with intelligent S3-based caching (NEW!)
stats = client.get_bucket_stats('releases')  # Fast (~100ms with cache)
stats = client.get_bucket_stats('releases', mode='detailed')  # Accurate compression metrics
stats = client.get_bucket_stats('releases', refresh_cache=True)  # Force fresh computation

client.delete_object(Bucket='releases', Key='old-version.zip')
```

@@ -205,10 +206,17 @@ from deltaglider import create_client
client = create_client(
    endpoint_url="http://minio.internal:9000",  # Custom S3 endpoint
    log_level="DEBUG",                          # Detailed logging
    cache_dir="/var/cache/deltaglider",         # Custom cache location
    aws_access_key_id="minio",
    aws_secret_access_key="minio",
    region_name="eu-west-1",
    max_ratio=0.3,                              # Stricter delta acceptance
)
```

> ℹ️ The SDK now manages an encrypted, process-isolated cache automatically in `/tmp/deltaglider-*`.
> Tune cache behavior via environment variables such as `DG_CACHE_BACKEND`,
> `DG_CACHE_MEMORY_SIZE_MB`, and `DG_CACHE_ENCRYPTION_KEY` instead of passing a `cache_dir` argument.
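A quick sketch of tuning those variables before client creation (the values shown are illustrative, not documented defaults):

```python
# Illustrative values for the cache environment variables named above.
import os

os.environ["DG_CACHE_MEMORY_SIZE_MB"] = "256"         # assumed MB-sized budget
os.environ["DG_CACHE_ENCRYPTION_KEY"] = "<your-key>"  # placeholder, supply your own

from deltaglider import create_client

client = create_client()  # picks up the environment configuration
```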
## Real-World Example

```python
@@ -298,4 +306,4 @@ url = client.generate_presigned_url(

## License

MIT License - See [LICENSE](https://github.com/beshu-tech/deltaglider/blob/main/LICENSE) for details.
182
docs/sdk/api.md
@@ -156,29 +156,34 @@ for obj in response['Contents']:
#### `get_bucket_stats`

Get statistics for a bucket with optional detailed compression metrics. Results are cached per client session for performance.
Get statistics for a bucket with optional detailed compression metrics. Results are cached inside the bucket for performance.

```python
def get_bucket_stats(
    self,
    bucket: str,
    detailed_stats: bool = False
    mode: Literal["quick", "sampled", "detailed"] = "quick",
    use_cache: bool = True,
    refresh_cache: bool = False,
) -> BucketStats
```

##### Parameters

- **bucket** (`str`): S3 bucket name.
- **detailed_stats** (`bool`): If True, fetch accurate compression ratios for delta files. Default: False.
  - With `detailed_stats=False`: ~50ms for any bucket size (LIST calls only)
  - With `detailed_stats=True`: ~2-3s per 1000 objects (adds HEAD calls for delta files)
- **mode** (`Literal[...]`): Accuracy/cost trade-off:
  - `"quick"` (default): LIST-only scan; compression ratios for deltas are estimated.
  - `"sampled"`: HEAD one delta per deltaspace and reuse the ratio.
  - `"detailed"`: HEAD every delta object; slowest but exact.
- **use_cache** (`bool`): If True, read/write `.deltaglider/stats_{mode}.json` in the bucket for reuse.
- **refresh_cache** (`bool`): Force recomputation even if a cache file is valid.

##### Caching Behavior

- **Session-scoped cache**: Results cached within client instance lifetime
- **Automatic invalidation**: Cache cleared on bucket mutations (put, delete, bucket operations)
- **Intelligent reuse**: Detailed stats can serve quick stat requests
- **Manual cache control**: Use `clear_cache()` to invalidate all cached stats
- Stats are cached per mode directly inside the bucket at `.deltaglider/stats_{mode}.json`.
- Every call validates cache freshness via a quick LIST (object count + compressed size).
- `refresh_cache=True` skips cache validation and recomputes immediately.
- `use_cache=False` bypasses both reading and writing cache artifacts.
##### Returns

@@ -195,24 +200,20 @@ def get_bucket_stats(
##### Examples

```python
# Quick stats for dashboard display (cached after first call)
# Quick stats (fast LIST-only)
stats = client.get_bucket_stats('releases')
print(f"Objects: {stats.object_count}, Size: {stats.total_size}")

# Second call hits cache (instant response)
stats = client.get_bucket_stats('releases')
print(f"Space saved: {stats.space_saved} bytes")
# Sampled/detailed modes for analytics
sampled = client.get_bucket_stats('releases', mode='sampled')
detailed = client.get_bucket_stats('releases', mode='detailed')
print(f"Compression ratio: {detailed.average_compression_ratio:.1%}")

# Detailed stats for analytics (slower but accurate, also cached)
stats = client.get_bucket_stats('releases', detailed_stats=True)
print(f"Compression ratio: {stats.average_compression_ratio:.1%}")
# Force refresh if an external tool modified the bucket
fresh = client.get_bucket_stats('releases', mode='quick', refresh_cache=True)

# Quick call after detailed call reuses detailed cache (more accurate)
quick_stats = client.get_bucket_stats('releases')  # Uses detailed cache

# Clear cache to force refresh
client.clear_cache()
stats = client.get_bucket_stats('releases')  # Fresh computation
# Skip cache entirely when running ad-hoc diagnostics
uncached = client.get_bucket_stats('releases', use_cache=False)
```
#### `put_object`

@@ -334,7 +335,7 @@ client.delete_bucket(Bucket='old-releases')

#### `list_buckets`

List all S3 buckets (boto3-compatible). Includes cached statistics when available.
List all S3 buckets (boto3-compatible).

```python
def list_buckets(
@@ -345,51 +346,18 @@ def list_buckets(

##### Returns

Dict with list of buckets and owner information (identical to boto3). Each bucket may include optional `DeltaGliderStats` metadata if statistics have been previously cached.

##### Response Structure

```python
{
    'Buckets': [
        {
            'Name': 'bucket-name',
            'CreationDate': datetime(2025, 1, 1),
            'DeltaGliderStats': {  # Optional, only if cached
                'Cached': True,
                'Detailed': bool,  # Whether detailed stats were fetched
                'ObjectCount': int,
                'TotalSize': int,
                'CompressedSize': int,
                'SpaceSaved': int,
                'AverageCompressionRatio': float,
                'DeltaObjects': int,
                'DirectObjects': int
            }
        }
    ],
    'Owner': {...}
}
```
Dict with the same structure boto3 returns (`Buckets`, `Owner`, `ResponseMetadata`). DeltaGlider does not inject additional metadata; use `get_bucket_stats()` for compression data.

##### Examples

```python
# List all buckets
response = client.list_buckets()
for bucket in response['Buckets']:
    print(f"{bucket['Name']} - Created: {bucket['CreationDate']}")

    # Check if stats are cached
    if 'DeltaGliderStats' in bucket:
        stats = bucket['DeltaGliderStats']
        print(f"  Cached stats: {stats['ObjectCount']} objects, "
              f"{stats['AverageCompressionRatio']:.1%} compression")

# Fetch stats first, then list buckets to see cached data
client.get_bucket_stats('my-bucket', detailed_stats=True)
response = client.list_buckets()
# Now 'my-bucket' will include DeltaGliderStats in response
# Combine with get_bucket_stats for deeper insights
stats = client.get_bucket_stats('releases', mode='detailed')
print(f"releases -> {stats.object_count} objects, {stats.space_saved/(1024**3):.2f} GB saved")
```
### Simple API Methods

@@ -528,13 +496,9 @@ else:

### Cache Management Methods

DeltaGlider maintains two types of caches for performance optimization:
1. **Reference cache**: Binary reference files used for delta reconstruction
2. **Statistics cache**: Bucket statistics (session-scoped)

#### `clear_cache`

Clear all cached data including reference files and bucket statistics.
Clear all locally cached reference files.

```python
def clear_cache(self) -> None
@@ -542,23 +506,20 @@ def clear_cache(self) -> None

##### Description

Removes all cached reference files from the local filesystem and invalidates all bucket statistics. Useful for:
- Forcing fresh statistics computation
Removes all cached reference files from the local filesystem. Useful for:
- Freeing disk space in long-running applications
- Ensuring latest data after external bucket modifications
- Ensuring the next upload/download fetches fresh references from S3
- Resetting cache after configuration or credential changes
- Testing and development workflows

##### Cache Types Cleared
##### Cache Scope

1. **Reference Cache**: Binary reference files stored in `/tmp/deltaglider-*/`
   - Encrypted at rest with ephemeral keys
   - Content-addressed storage (SHA256-based filenames)
   - Automatically cleaned up on process exit

2. **Statistics Cache**: Bucket statistics cached per client session
   - Metadata about compression ratios and object counts
   - Session-scoped (not persisted to disk)
   - Automatically invalidated on bucket mutations
- **Reference Cache**: Binary reference files stored in `/tmp/deltaglider-*/`
  - Encrypted at rest with ephemeral keys
  - Content-addressed storage (SHA256-based filenames)
  - Automatically cleaned up on process exit
- **Statistics Cache**: Stored inside the bucket as `.deltaglider/stats_{mode}.json`.
  - `clear_cache()` does *not* remove these S3 objects; use `refresh_cache=True` or delete the objects manually if needed.
##### Examples

@@ -574,71 +535,14 @@ for i in range(1000):
    if i % 100 == 0:
        client.clear_cache()

# Force fresh statistics after external changes
stats_before = client.get_bucket_stats('releases')  # Cached
# ... external tool modifies bucket ...
client.clear_cache()
stats_after = client.get_bucket_stats('releases')  # Fresh data
# Force fresh statistics after external changes (skip cache instead of clearing)
stats_before = client.get_bucket_stats('releases')
stats_after = client.get_bucket_stats('releases', refresh_cache=True)

# Development workflow
client.clear_cache()  # Start with clean state
```

#### `evict_cache`

Remove a specific cached reference file from the local cache.

```python
def evict_cache(self, s3_url: str) -> None
```

##### Parameters

- **s3_url** (`str`): S3 URL of the reference file to evict (e.g., `s3://bucket/prefix/reference.bin`)

##### Description

Removes a specific reference file from the cache without affecting other cached files or statistics. Useful for:
- Selective cache invalidation when specific references are updated
- Memory management in applications with many delta spaces
- Testing specific delta compression scenarios

##### Examples

```python
# Evict specific reference after update
client.upload("new-reference.zip", "s3://releases/v2.0.0/")
client.evict_cache("s3://releases/v2.0.0/reference.bin")

# Next upload will fetch fresh reference
client.upload("similar-file.zip", "s3://releases/v2.0.0/")

# Selective eviction for specific delta spaces
delta_spaces = ["v1.0.0", "v1.1.0", "v1.2.0"]
for space in delta_spaces:
    client.evict_cache(f"s3://releases/{space}/reference.bin")
```

##### See Also

- [docs/CACHE_MANAGEMENT.md](../../CACHE_MANAGEMENT.md): Complete cache management guide
- `clear_cache()`: Clear all caches
#### `lifecycle_policy`

Set lifecycle policy for an S3 prefix (placeholder for future implementation).

```python
def lifecycle_policy(
    self,
    s3_prefix: str,
    days_before_archive: int = 30,
    days_before_delete: int = 90
) -> None
```

**Note**: This method is a placeholder for future S3 lifecycle policy management.

## UploadSummary

Data class containing upload operation results.

@@ -995,4 +899,4 @@ client = create_client(log_level="DEBUG")

- **GitHub Issues**: [github.com/beshu-tech/deltaglider/issues](https://github.com/beshu-tech/deltaglider/issues)
- **Documentation**: [github.com/beshu-tech/deltaglider](https://github.com/beshu-tech/deltaglider)
- **PyPI Package**: [pypi.org/project/deltaglider](https://pypi.org/project/deltaglider)
@@ -25,6 +25,7 @@ DeltaGlider's smart `list_objects` method eliminates the N+1 query problem by in

```python
from deltaglider import create_client
from deltaglider.client_models import BucketStats
import time

client = create_client()

@@ -41,19 +42,19 @@ def fast_bucket_listing(bucket: str):

    # Process objects for display
    items = []
    for obj in response.contents:
    for obj in response['Contents']:
        metadata = obj.get("Metadata", {})
        items.append({
            "key": obj.key,
            "size": obj.size,
            "last_modified": obj.last_modified,
            "is_delta": obj.is_delta,  # Determined from filename
            # No compression_ratio - would require HEAD request
            "key": obj["Key"],
            "size": obj["Size"],
            "last_modified": obj["LastModified"],
            "is_delta": metadata.get("deltaglider-is-delta") == "true",
        })

    elapsed = time.time() - start
    print(f"Listed {len(items)} objects in {elapsed*1000:.0f}ms")

    return items, response.next_continuation_token
    return items, response.get("NextContinuationToken")

# Example: List first page
items, next_token = fast_bucket_listing('releases')
@@ -75,12 +76,12 @@ def paginated_listing(bucket: str, page_size: int = 50):
            FetchMetadata=False  # Keep it fast
        )

        all_objects.extend(response.contents)
        all_objects.extend(response["Contents"])

        if not response.is_truncated:
        if not response.get("IsTruncated"):
            break

        continuation_token = response.next_continuation_token
        continuation_token = response.get("NextContinuationToken")
        print(f"Fetched {len(all_objects)} objects so far...")

    return all_objects
@@ -96,8 +97,8 @@ print(f"Total objects: {len(all_objects)}")
def dashboard_with_stats(bucket: str):
    """Dashboard view with optional detailed stats."""

    # Quick overview (fast - no metadata)
    stats = client.get_bucket_stats(bucket, detailed_stats=False)
    # Quick overview (fast LIST-only)
    stats = client.get_bucket_stats(bucket)

    print(f"Quick Stats for {bucket}:")
    print(f"  Total Objects: {stats.object_count}")
@@ -108,7 +109,7 @@ def dashboard_with_stats(bucket: str):

    # Detailed compression analysis (slower - fetches metadata for deltas only)
    if stats.delta_objects > 0:
        detailed_stats = client.get_bucket_stats(bucket, detailed_stats=True)
        detailed_stats = client.get_bucket_stats(bucket, mode='detailed')
        print(f"\nDetailed Compression Stats:")
        print(f"  Average Compression: {detailed_stats.average_compression_ratio:.1%}")
        print(f"  Space Saved: {detailed_stats.space_saved / (1024**3):.2f} GB")
@@ -131,11 +132,25 @@ def compression_analysis(bucket: str, prefix: str = ""):
    )

    # Analyze compression effectiveness
    delta_files = [obj for obj in response.contents if obj.is_delta]
    delta_files: list[dict[str, float | int | str]] = []
    for obj in response["Contents"]:
        metadata = obj.get("Metadata", {})
        if metadata.get("deltaglider-is-delta") != "true":
            continue
        original_size = int(metadata.get("deltaglider-original-size", obj["Size"]))
        compression_ratio = float(metadata.get("deltaglider-compression-ratio", 0.0))
        delta_files.append(
            {
                "key": obj["Key"],
                "original": original_size,
                "compressed": obj["Size"],
                "ratio": compression_ratio,
            }
        )

    if delta_files:
        total_original = sum(obj.original_size for obj in delta_files)
        total_compressed = sum(obj.compressed_size for obj in delta_files)
        total_original = sum(obj["original"] for obj in delta_files)
        total_compressed = sum(obj["compressed"] for obj in delta_files)
        avg_ratio = (total_original - total_compressed) / total_original

        print(f"Compression Analysis for {prefix or 'all files'}:")
@@ -145,11 +160,11 @@ def compression_analysis(bucket: str, prefix: str = ""):
        print(f"  Average Compression: {avg_ratio:.1%}")

        # Find best and worst compression
        best = max(delta_files, key=lambda x: x.compression_ratio or 0)
        worst = min(delta_files, key=lambda x: x.compression_ratio or 1)
        best = max(delta_files, key=lambda x: x["ratio"])
        worst = min(delta_files, key=lambda x: x["ratio"])

        print(f"  Best Compression: {best.key} ({best.compression_ratio:.1%})")
        print(f"  Worst Compression: {worst.key} ({worst.compression_ratio:.1%})")
        print(f"  Best Compression: {best['key']} ({best['ratio']:.1%})")
        print(f"  Worst Compression: {worst['key']} ({worst['ratio']:.1%})")

# Example: Analyze v2.0 releases
compression_analysis('releases', 'v2.0/')
@@ -180,7 +195,11 @@ def performance_comparison(bucket: str):
    )
    time_detailed = (time.time() - start) * 1000

    delta_count = sum(1 for obj in response_fast.contents if obj.is_delta)
    delta_count = sum(
        1
        for obj in response_fast["Contents"]
        if obj.get("Metadata", {}).get("deltaglider-is-delta") == "true"
    )

    print(f"Performance Comparison for {bucket}:")
    print(f"  Fast Listing: {time_fast:.0f}ms (1 API call)")
@@ -203,7 +222,7 @@ performance_comparison('releases')

## Bucket Statistics and Monitoring

DeltaGlider provides powerful bucket statistics with session-level caching for performance.
DeltaGlider provides powerful bucket statistics with S3-backed caching for performance.

### Quick Dashboard Stats (Cached)
@@ -244,7 +263,7 @@ def detailed_compression_report(bucket: str):
    """Generate detailed compression report with accurate ratios."""

    # Detailed stats fetch metadata for delta files (slower, accurate)
    stats = client.get_bucket_stats(bucket, detailed_stats=True)
    stats = client.get_bucket_stats(bucket, mode='detailed')

    efficiency = (stats.space_saved / stats.total_size * 100) if stats.total_size > 0 else 0

@@ -281,15 +300,18 @@ detailed_compression_report('releases')

```python
def list_buckets_with_stats():
    """List all buckets and show cached statistics if available."""
    """List buckets and augment with cached stats fetched on demand."""

    # Pre-fetch stats for important buckets
    important_buckets = ['releases', 'backups']
    for bucket_name in important_buckets:
        client.get_bucket_stats(bucket_name, detailed_stats=True)

    # List all buckets (includes cached stats automatically)
    response = client.list_buckets()
    stats_cache: dict[str, BucketStats | None] = {}

    def ensure_stats(bucket_name: str) -> BucketStats | None:
        if bucket_name not in stats_cache:
            try:
                stats_cache[bucket_name] = client.get_bucket_stats(bucket_name)
            except Exception:
                stats_cache[bucket_name] = None
        return stats_cache[bucket_name]

    print("All Buckets:")
    print(f"{'Name':<30} {'Objects':<10} {'Compression':<15} {'Cached'}")
@@ -297,13 +319,12 @@ def list_buckets_with_stats():
|
||||
|
||||
for bucket in response['Buckets']:
|
||||
name = bucket['Name']
|
||||
stats = ensure_stats(name)
|
||||
|
||||
# Check if stats are cached
|
||||
if 'DeltaGliderStats' in bucket:
|
||||
stats = bucket['DeltaGliderStats']
|
||||
obj_count = f"{stats['ObjectCount']:,}"
|
||||
compression = f"{stats['AverageCompressionRatio']:.1%}"
|
||||
cached = "✓ (detailed)" if stats['Detailed'] else "✓ (quick)"
|
||||
if stats:
|
||||
obj_count = f"{stats.object_count:,}"
|
||||
compression = f"{stats.average_compression_ratio:.1%}"
|
||||
cached = "✓ (S3 cache)"
|
||||
else:
|
||||
obj_count = "N/A"
|
||||
compression = "N/A"
|
||||
@@ -357,7 +378,7 @@ except KeyboardInterrupt:
|
||||
|
||||
## Session-Level Cache Management
|
||||
|
||||
DeltaGlider maintains session-level caches for optimal performance in long-running applications.
|
||||
DeltaGlider maintains an encrypted reference cache for optimal performance in long-running applications.
|
||||
|
||||
### Long-Running Application Pattern
|
||||
|
||||
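A hedged sketch of the long-running pattern this heading introduces (the `next_build` work source is hypothetical; `create_client` and `client.upload` follow the API used throughout these examples):

```python
client = create_client()  # one client per process; the encrypted reference cache stays warm

for build in iter(next_build, None):  # hypothetical: yields builds until None
    client.upload(build.path, f"s3://releases/{build.version}/")
```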
@@ -410,11 +431,8 @@ def handle_external_bucket_changes(bucket: str):
    print("External backup tool running...")
    run_external_backup_tool(bucket)  # Your external tool

    # Clear cache to get fresh data
    client.clear_cache()

    # Get updated stats
    stats_after = client.get_bucket_stats(bucket)
    # Force a recompute of the cached stats
    stats_after = client.get_bucket_stats(bucket, refresh_cache=True)
    print(f"After: {stats_after.object_count} objects")
    print(f"Added: {stats_after.object_count - stats_before.object_count} objects")

@@ -422,35 +440,6 @@ def handle_external_bucket_changes(bucket: str):
handle_external_bucket_changes('backups')
```

### Selective Cache Eviction

```python
def selective_cache_management():
    """Manage cache for specific delta spaces."""

    client = create_client()

    # Upload to multiple delta spaces
    versions = ['v1.0.0', 'v1.1.0', 'v1.2.0']

    for version in versions:
        client.upload(f"app-{version}.zip", f"s3://releases/{version}/")

    # Update reference for specific version
    print("Updating v1.1.0 reference...")
    client.upload("new-reference.zip", "s3://releases/v1.1.0/")

    # Evict only v1.1.0 cache (others remain cached)
    client.evict_cache("s3://releases/v1.1.0/reference.bin")

    # Next upload to v1.1.0 fetches fresh reference
    # v1.0.0 and v1.2.0 still use cached references
    client.upload("similar-file.zip", "s3://releases/v1.1.0/")

# Example: Selective eviction
selective_cache_management()
```

### Testing with Clean Cache

```python
@@ -491,19 +480,18 @@ def measure_cache_performance(bucket: str):
    client = create_client()

    # Test 1: Cold cache
    client.clear_cache()
    start = time.time()
    stats1 = client.get_bucket_stats(bucket, detailed_stats=True)
    stats1 = client.get_bucket_stats(bucket, mode='detailed', refresh_cache=True)
    cold_time = (time.time() - start) * 1000

    # Test 2: Warm cache
    start = time.time()
    stats2 = client.get_bucket_stats(bucket, detailed_stats=True)
    stats2 = client.get_bucket_stats(bucket, mode='detailed')
    warm_time = (time.time() - start) * 1000

    # Test 3: Quick stats from detailed cache
    start = time.time()
    stats3 = client.get_bucket_stats(bucket, detailed_stats=False)
    stats3 = client.get_bucket_stats(bucket, mode='quick')
    reuse_time = (time.time() - start) * 1000

    print(f"Cache Performance for {bucket}:")
@@ -1707,4 +1695,4 @@ files_to_upload = [

results = uploader.upload_batch(files_to_upload)
```

These examples demonstrate real-world usage patterns for DeltaGlider across various domains. Each example includes error handling, monitoring, and best practices for production deployments.

@@ -53,6 +53,7 @@ dependencies = [
    "click>=8.1.0",
    "cryptography>=42.0.0",
    "python-dateutil>=2.9.0",
    "requests>=2.32.0",
]

[project.urls]
@@ -109,6 +110,7 @@ dev-dependencies = [
    "mypy>=1.13.0",
    "boto3-stubs[s3]>=1.35.0",
    "types-python-dateutil>=2.9.0",
    "types-requests>=2.32.0",
    "setuptools-scm>=8.0.0",
]

101
scripts/check_metadata.py
Normal file
@@ -0,0 +1,101 @@
#!/usr/bin/env python3
"""Check which delta files are missing metadata."""

import sys
from pathlib import Path

# Add src to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent / "src"))

from deltaglider import create_client


def check_bucket_metadata(bucket: str) -> None:
    """Check all delta files in a bucket for missing metadata.

    Args:
        bucket: S3 bucket name
    """
    client = create_client()

    print(f"Checking delta files in bucket: {bucket}\n")
    print("=" * 80)

    # List all objects
    response = client.service.storage.list_objects(bucket=bucket, max_keys=10000)

    missing_metadata = []
    has_metadata = []
    total_delta_files = 0

    for obj in response["objects"]:
        key = obj["key"]

        # Only check .delta files
        if not key.endswith(".delta"):
            continue

        total_delta_files += 1

        # Get metadata
        obj_head = client.service.storage.head(f"{bucket}/{key}")

        if not obj_head:
            print(f"❌ {key}: Object not found")
            continue

        metadata = obj_head.metadata

        # Check for required metadata fields
        required_fields = ["file_size", "file_sha256", "ref_key", "ref_sha256", "delta_size"]
        missing_fields = [f for f in required_fields if f not in metadata]

        if missing_fields:
            missing_metadata.append({
                "key": key,
                "missing_fields": missing_fields,
                "has_metadata": bool(metadata),
                "available_keys": list(metadata.keys()) if metadata else [],
            })
            status = "⚠️ MISSING"
            detail = f"missing: {', '.join(missing_fields)}"
        else:
            has_metadata.append(key)
            status = "✅ OK"
            detail = f"file_size={metadata.get('file_size')}"

        print(f"{status} {key}")
        print(f"   {detail}")
        if metadata:
            print(f"   Available keys: {', '.join(metadata.keys())}")
        print()

    # Summary
    print("=" * 80)
    print(f"\nSummary:")
    print(f"  Total delta files: {total_delta_files}")
    print(f"  With complete metadata: {len(has_metadata)} ({len(has_metadata)/total_delta_files*100:.1f}%)")
    print(f"  Missing metadata: {len(missing_metadata)} ({len(missing_metadata)/total_delta_files*100:.1f}%)")

    if missing_metadata:
        print(f"\n❌ Files with missing metadata:")
        for item in missing_metadata:
            print(f"  - {item['key']}")
            print(f"    Missing: {', '.join(item['missing_fields'])}")
            if item['available_keys']:
                print(f"    Has: {', '.join(item['available_keys'])}")

        print(f"\n💡 Recommendation:")
        print(f"  These files should be re-uploaded to get proper metadata and accurate stats.")
        print(f"  You can re-upload with: deltaglider cp <local-file> s3://{bucket}/<path>")
    else:
        print(f"\n✅ All delta files have complete metadata!")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python check_metadata.py <bucket-name>")
        sys.exit(1)

    bucket_name = sys.argv[1]
    check_bucket_metadata(bucket_name)
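Assuming AWS credentials are configured in the environment, the checker runs as `python scripts/check_metadata.py <bucket-name>`. One caveat: the summary's percentage lines divide by `total_delta_files`, so pointing it at a bucket containing no `.delta` files will raise a ZeroDivisionError.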
@@ -6,20 +6,22 @@ from .cache_fs import FsCacheAdapter
from .cache_memory import MemoryCache
from .clock_utc import UtcClockAdapter
from .diff_xdelta import XdeltaAdapter
from .ec2_metadata import EC2MetadataAdapter
from .hash_sha import Sha256Adapter
from .logger_std import StdLoggerAdapter
from .metrics_noop import NoopMetricsAdapter
from .storage_s3 import S3StorageAdapter

__all__ = [
    "S3StorageAdapter",
    "XdeltaAdapter",
    "Sha256Adapter",
    "FsCacheAdapter",
    "ContentAddressedCache",
    "EC2MetadataAdapter",
    "EncryptedCache",
    "FsCacheAdapter",
    "MemoryCache",
    "UtcClockAdapter",
    "StdLoggerAdapter",
    "NoopMetricsAdapter",
    "S3StorageAdapter",
    "Sha256Adapter",
    "StdLoggerAdapter",
    "UtcClockAdapter",
    "XdeltaAdapter",
]

126
src/deltaglider/adapters/ec2_metadata.py
Normal file
@@ -0,0 +1,126 @@
"""EC2 Instance Metadata Service (IMDS) adapter.

Provides access to EC2 instance metadata using IMDSv2 with token-based authentication.
Falls back gracefully when not running on EC2.
"""

import os

import requests


class EC2MetadataAdapter:
    """Adapter for EC2 Instance Metadata Service (IMDSv2)."""

    IMDS_BASE_URL = "http://169.254.169.254/latest"
    TOKEN_URL = f"{IMDS_BASE_URL}/api/token"
    TOKEN_TTL_SECONDS = 21600  # 6 hours
    TOKEN_HEADER = "X-aws-ec2-metadata-token"
    TIMEOUT_SECONDS = 1  # Fast timeout for non-EC2 environments

    def __init__(self) -> None:
        """Initialize EC2 metadata adapter."""
        self._token: str | None = None
        self._is_ec2: bool | None = None
        self._region: str | None = None

    def is_running_on_ec2(self) -> bool:
        """Check if running on an EC2 instance.

        Returns:
            True if running on EC2, False otherwise

        Note:
            Result is cached after first check for performance.
        """
        if self._is_ec2 is not None:
            return self._is_ec2

        # Skip check if explicitly disabled
        if os.environ.get("DG_DISABLE_EC2_DETECTION", "").lower() in ("true", "1", "yes"):
            self._is_ec2 = False
            return False

        try:
            # Try to get IMDSv2 token
            self._token = self._get_token()
            self._is_ec2 = self._token is not None
        except Exception:
            self._is_ec2 = False

        return self._is_ec2

    def get_region(self) -> str | None:
        """Get the EC2 instance's AWS region.

        Returns:
            AWS region code (e.g., "us-east-1") or None if not on EC2

        Note:
            Result is cached after first successful fetch.
        """
        if not self.is_running_on_ec2():
            return None

        if self._region is not None:
            return self._region

        try:
            if self._token:
                response = requests.get(
                    f"{self.IMDS_BASE_URL}/meta-data/placement/region",
                    headers={self.TOKEN_HEADER: self._token},
                    timeout=self.TIMEOUT_SECONDS,
                )
                if response.status_code == 200:
                    self._region = response.text.strip()
                    return self._region
        except Exception:
            pass

        return None

    def get_availability_zone(self) -> str | None:
        """Get the EC2 instance's availability zone.

        Returns:
            Availability zone (e.g., "us-east-1a") or None if not on EC2
        """
        if not self.is_running_on_ec2():
            return None

        try:
            if self._token:
                response = requests.get(
                    f"{self.IMDS_BASE_URL}/meta-data/placement/availability-zone",
                    headers={self.TOKEN_HEADER: self._token},
                    timeout=self.TIMEOUT_SECONDS,
                )
                if response.status_code == 200:
                    return str(response.text.strip())
        except Exception:
            pass

        return None

    def _get_token(self) -> str | None:
        """Get IMDSv2 token for authenticated metadata requests.

        Returns:
            IMDSv2 token or None if unable to retrieve

        Note:
            Uses IMDSv2 for security. IMDSv1 is not supported.
        """
        try:
            response = requests.put(
                self.TOKEN_URL,
                headers={"X-aws-ec2-metadata-token-ttl-seconds": str(self.TOKEN_TTL_SECONDS)},
                timeout=self.TIMEOUT_SECONDS,
            )
            if response.status_code == 200:
                return response.text.strip()
        except Exception:
            pass

        return None
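A minimal usage sketch for the adapter above (the import path matches the `deltaglider.adapters` export added in this diff; the printed fields are illustrative):

```python
from deltaglider.adapters import EC2MetadataAdapter

imds = EC2MetadataAdapter()
if imds.is_running_on_ec2():
    # Both calls reuse the IMDSv2 token cached by the EC2 check
    print(f"region={imds.get_region()} az={imds.get_availability_zone()}")
else:
    # Off EC2 (or DG_DISABLE_EC2_DETECTION=true): calls return None quickly
    print("not on EC2; region/AZ unavailable")
```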
@@ -1,5 +1,6 @@
"""S3 storage adapter."""

import logging
import os
from collections.abc import Iterator
from pathlib import Path
@@ -8,11 +9,13 @@ from typing import TYPE_CHECKING, Any, BinaryIO, Optional
import boto3
from botocore.exceptions import ClientError

from ..ports.storage import ObjectHead, PutResult, StoragePort

logger = logging.getLogger(__name__)

if TYPE_CHECKING:
    from mypy_boto3_s3.client import S3Client

from ..ports.storage import ObjectHead, PutResult, StoragePort


class S3StorageAdapter(StoragePort):
    """S3 implementation of StoragePort."""
@@ -55,12 +58,21 @@ class S3StorageAdapter(StoragePort):

        try:
            response = self.client.head_object(Bucket=bucket, Key=object_key)
            extracted_metadata = self._extract_metadata(response.get("Metadata", {}))

            # Debug: Log metadata received (to verify it's stored correctly)
            if logger.isEnabledFor(logging.DEBUG):
                logger.debug(
                    f"HEAD {object_key}: Received metadata with {len(extracted_metadata)} keys: "
                    f"{list(extracted_metadata.keys())}"
                )

            return ObjectHead(
                key=object_key,
                size=response["ContentLength"],
                etag=response["ETag"].strip('"'),
                last_modified=response["LastModified"],
                metadata=self._extract_metadata(response.get("Metadata", {})),
                metadata=extracted_metadata,
            )
        except ClientError as e:
            if e.response["Error"]["Code"] == "404":
@@ -97,6 +109,7 @@ class S3StorageAdapter(StoragePort):
        delimiter: str = "",
        max_keys: int = 1000,
        start_after: str | None = None,
        continuation_token: str | None = None,
    ) -> dict[str, Any]:
        """List objects with S3-compatible response.

@@ -105,7 +118,8 @@ class S3StorageAdapter(StoragePort):
            prefix: Filter results to keys beginning with prefix
            delimiter: Delimiter for grouping keys (e.g., '/' for folders)
            max_keys: Maximum number of keys to return
            start_after: Start listing after this key
            start_after: Start listing after this key (for first page only)
            continuation_token: Token from previous response for pagination

        Returns:
            Dict with objects, common_prefixes, and pagination info
@@ -119,7 +133,11 @@ class S3StorageAdapter(StoragePort):
            params["Prefix"] = prefix
        if delimiter:
            params["Delimiter"] = delimiter
        if start_after:

        # Use ContinuationToken for pagination if available, otherwise StartAfter
        if continuation_token:
            params["ContinuationToken"] = continuation_token
        elif start_after:
            params["StartAfter"] = start_after

        try:
@@ -191,6 +209,22 @@ class S3StorageAdapter(StoragePort):
        # AWS requires lowercase metadata keys
        clean_metadata = {k.lower(): v for k, v in metadata.items()}

        # Calculate total metadata size (AWS has 2KB limit)
        total_metadata_size = sum(len(k) + len(v) for k, v in clean_metadata.items())

        if logger.isEnabledFor(logging.DEBUG):
            logger.debug(
                f"PUT {object_key}: Sending metadata with {len(clean_metadata)} keys "
                f"({total_metadata_size} bytes): {list(clean_metadata.keys())}"
            )

        # Warn if approaching AWS metadata size limit (2KB per key, 2KB total for user metadata)
        if total_metadata_size > 1800:  # Warn at 1.8KB
            logger.warning(
                f"PUT {object_key}: Metadata size ({total_metadata_size} bytes) approaching "
                f"AWS S3 limit (2KB). Some metadata may be lost!"
            )

        try:
            response = self.client.put_object(
                Bucket=bucket,
@@ -199,6 +233,33 @@ class S3StorageAdapter(StoragePort):
                ContentType=content_type,
                Metadata=clean_metadata,
            )

            # VERIFICATION: Check if metadata was actually stored (especially for delta files)
            if object_key.endswith(".delta") and clean_metadata:
                try:
                    # Verify metadata was stored by doing a HEAD immediately
                    verify_response = self.client.head_object(Bucket=bucket, Key=object_key)
                    stored_metadata = verify_response.get("Metadata", {})

                    if not stored_metadata:
                        logger.error(
                            f"PUT {object_key}: CRITICAL - Metadata was sent but NOT STORED! "
                            f"Sent {len(clean_metadata)} keys, received 0 keys back."
                        )
                    elif len(stored_metadata) < len(clean_metadata):
                        missing_keys = set(clean_metadata.keys()) - set(stored_metadata.keys())
                        logger.warning(
                            f"PUT {object_key}: Metadata partially stored. "
                            f"Sent {len(clean_metadata)} keys, stored {len(stored_metadata)} keys. "
                            f"Missing keys: {missing_keys}"
                        )
                    elif logger.isEnabledFor(logging.DEBUG):
                        logger.debug(
                            f"PUT {object_key}: Metadata verified - all {len(clean_metadata)} keys stored"
                        )
                except Exception as e:
                    logger.warning(f"PUT {object_key}: Could not verify metadata: {e}")

            return PutResult(
                etag=response["ETag"].strip('"'),
                version_id=response.get("VersionId"),

@@ -1,28 +1,120 @@
"""AWS S3 CLI compatible commands."""

import shutil
import sys
from pathlib import Path

import click

from ...core import DeltaService, DeltaSpace, ObjectKey
from ...core import (
    DeltaService,
    DeltaSpace,
    ObjectKey,
    build_s3_url,
    is_s3_url,
)
from ...core import parse_s3_url as core_parse_s3_url
from .sync import fetch_s3_object_heads

__all__ = [
    "is_s3_path",
    "parse_s3_url",
    "determine_operation",
    "upload_file",
    "download_file",
    "copy_s3_to_s3",
    "migrate_s3_to_s3",
    "handle_recursive",
    "log_aws_region",
]


def log_aws_region(service: DeltaService, region_override: bool = False) -> None:
    """Log the AWS region being used and warn about cross-region charges.

    This function:
    1. Detects if running on EC2
    2. Compares EC2 region with S3 client region
    3. Warns about potential cross-region data transfer charges
    4. Helps users optimize for cost and performance

    Args:
        service: DeltaService instance with storage adapter
        region_override: True if user explicitly specified --region flag
    """
    try:
        from ...adapters.ec2_metadata import EC2MetadataAdapter
        from ...adapters.storage_s3 import S3StorageAdapter

        if not isinstance(service.storage, S3StorageAdapter):
            return  # Not using S3 storage, skip

        # Get S3 client region
        s3_region = service.storage.client.meta.region_name
        if not s3_region:
            s3_region = "us-east-1"  # boto3 default

        # Check if running on EC2
        ec2_metadata = EC2MetadataAdapter()
        if ec2_metadata.is_running_on_ec2():
            ec2_region = ec2_metadata.get_region()
            ec2_az = ec2_metadata.get_availability_zone()

            # Log EC2 context
            click.echo(f"EC2 Instance: {ec2_az or ec2_region or 'unknown'}")
            click.echo(f"S3 Client Region: {s3_region}")

            # Check for region mismatch
            if ec2_region and ec2_region != s3_region:
                if region_override:
                    # User explicitly set --region, warn about costs
                    click.echo("")
                    click.secho(
                        f"⚠️  WARNING: EC2 region={ec2_region} != S3 client region={s3_region}",
                        fg="yellow",
                        bold=True,
                    )
                    click.secho(
                        f"   Expect cross-region/NAT data charges. Align regions (set client region={ec2_region})",
                        fg="yellow",
                    )
                    click.secho(
                        "   before proceeding. Or drop --region for automatic region resolution.",
                        fg="yellow",
                    )
                    click.echo("")
                else:
                    # Auto-detected mismatch, but user can still cancel
                    click.echo("")
                    click.secho(
                        f"ℹ️  INFO: EC2 region ({ec2_region}) differs from configured S3 region ({s3_region})",
                        fg="cyan",
                    )
                    click.secho(
                        f"   Consider using --region {ec2_region} to avoid cross-region charges.",
                        fg="cyan",
                    )
                    click.echo("")
            elif ec2_region and ec2_region == s3_region:
                # Regions match - optimal configuration
                click.secho("✓ Regions aligned - no cross-region charges", fg="green")
        else:
            # Not on EC2, just show S3 region
            click.echo(f"S3 Client Region: {s3_region}")

    except Exception:
        pass  # Silently ignore errors getting region info


def is_s3_path(path: str) -> bool:
    """Check if path is an S3 URL."""
    return path.startswith("s3://")
    return is_s3_url(path)


def parse_s3_url(url: str) -> tuple[str, str]:
    """Parse S3 URL into bucket and key."""
    if not url.startswith("s3://"):
        raise ValueError(f"Invalid S3 URL: {url}")

    s3_path = url[5:].rstrip("/")
    parts = s3_path.split("/", 1)
    bucket = parts[0]
    key = parts[1] if len(parts) > 1 else ""
    return bucket, key
    parsed = core_parse_s3_url(url, strip_trailing_slash=True)
    return parsed.bucket, parsed.key


def determine_operation(source: str, dest: str) -> str:
@@ -57,6 +149,8 @@ def upload_file(

    delta_space = DeltaSpace(bucket=bucket, prefix="/".join(key.split("/")[:-1]))

    dest_url = build_s3_url(bucket, key)

    try:
        # Check if delta should be disabled
        if no_delta:
@@ -66,7 +160,7 @@

            if not quiet:
                file_size = local_path.stat().st_size
                click.echo(f"upload: '{local_path}' to 's3://{bucket}/{key}' ({file_size} bytes)")
                click.echo(f"upload: '{local_path}' to '{dest_url}' ({file_size} bytes)")
        else:
            # Use delta compression
            summary = service.put(local_path, delta_space, max_ratio)
@@ -75,12 +169,12 @@
                if summary.delta_size:
                    ratio = round((summary.delta_size / summary.file_size) * 100, 1)
                    click.echo(
                        f"upload: '{local_path}' to 's3://{bucket}/{summary.key}' "
                        f"upload: '{local_path}' to '{build_s3_url(bucket, summary.key)}' "
                        f"(delta: {ratio}% of original)"
                    )
                else:
                    click.echo(
                        f"upload: '{local_path}' to 's3://{bucket}/{summary.key}' "
                        f"upload: '{local_path}' to '{build_s3_url(bucket, summary.key)}' "
                        f"(reference: {summary.file_size} bytes)"
                    )

@@ -112,7 +206,7 @@ def download_file(
            actual_key = delta_key
            obj_key = ObjectKey(bucket=bucket, key=delta_key)
            if not quiet:
                click.echo(f"Auto-detected delta: s3://{bucket}/{delta_key}")
                click.echo(f"Auto-detected delta: {build_s3_url(bucket, delta_key)}")

    # Determine output path
    if local_path is None:
@@ -136,7 +230,7 @@
        if not quiet:
            file_size = local_path.stat().st_size
            click.echo(
                f"download: 's3://{bucket}/{actual_key}' to '{local_path}' ({file_size} bytes)"
                f"download: '{build_s3_url(bucket, actual_key)}' to '{local_path}' ({file_size} bytes)"
            )

    except Exception as e:
@@ -149,31 +243,310 @@ def copy_s3_to_s3(
    source_url: str,
    dest_url: str,
    quiet: bool = False,
    max_ratio: float | None = None,
    no_delta: bool = False,
) -> None:
    """Copy object between S3 locations."""
    # For now, implement as download + upload
    # TODO: Optimize with server-side copy when possible
    """Copy object between S3 locations with optional delta compression.

    This performs a direct S3-to-S3 transfer using streaming to preserve
    the original file content and apply delta compression at the destination.
    """
    source_bucket, source_key = parse_s3_url(source_url)
    dest_bucket, dest_key = parse_s3_url(dest_url)

    if not quiet:
        click.echo(f"copy: 's3://{source_bucket}/{source_key}' to 's3://{dest_bucket}/{dest_key}'")
        click.echo(
            f"copy: '{build_s3_url(source_bucket, source_key)}' "
            f"to '{build_s3_url(dest_bucket, dest_key)}'"
        )

    # Use temporary file
    import tempfile
    try:
        # Get the source object as a stream
        source_stream = service.storage.get(f"{source_bucket}/{source_key}")

        with tempfile.NamedTemporaryFile(suffix=Path(source_key).suffix) as tmp:
            tmp_path = Path(tmp.name)
        # Determine the destination deltaspace
        dest_key_parts = dest_key.split("/")
        if len(dest_key_parts) > 1:
            dest_prefix = "/".join(dest_key_parts[:-1])
        else:
            dest_prefix = ""

            # Download from source
            download_file(service, source_url, tmp_path, quiet=True)
        dest_deltaspace = DeltaSpace(bucket=dest_bucket, prefix=dest_prefix)

            # Upload to destination
            upload_file(service, tmp_path, dest_url, quiet=True)
        # If delta is disabled or max_ratio specified, use direct put
        if no_delta:
            # Direct storage put without delta compression
            service.storage.put(f"{dest_bucket}/{dest_key}", source_stream, {})
            if not quiet:
                click.echo("Copy completed (no delta compression)")
        else:
            # Write to a temporary file and use override_name to preserve original filename
            import tempfile

            # Extract original filename from source
            original_filename = Path(source_key).name

            with tempfile.NamedTemporaryFile(delete=False, suffix=Path(source_key).suffix) as tmp:
                tmp_path = Path(tmp.name)

                # Write stream to temp file
                with open(tmp_path, "wb") as f:
                    shutil.copyfileobj(source_stream, f)

            try:
                # Use DeltaService.put() with override_name to preserve original filename
                summary = service.put(
                    tmp_path, dest_deltaspace, max_ratio, override_name=original_filename
                )

                if not quiet:
                    if summary.delta_size:
                        ratio = round((summary.delta_size / summary.file_size) * 100, 1)
                        click.echo(f"Copy completed with delta compression ({ratio}% of original)")
                    else:
                        click.echo("Copy completed (stored as reference)")
            finally:
                # Clean up temp file
                tmp_path.unlink(missing_ok=True)

    except Exception as e:
        click.echo(f"S3-to-S3 copy failed: {e}", err=True)
        raise


def migrate_s3_to_s3(
    service: DeltaService,
    source_url: str,
    dest_url: str,
    exclude: str | None = None,
    include: str | None = None,
    quiet: bool = False,
    no_delta: bool = False,
    max_ratio: float | None = None,
    dry_run: bool = False,
    skip_confirm: bool = False,
    preserve_prefix: bool = False,
    region_override: bool = False,
) -> None:
    """Migrate objects from one S3 location to another with delta compression.

    Features:
    - Resume support: Only copies files that don't exist in destination
    - Progress tracking: Shows migration progress
    - Confirmation prompt: Shows file count before starting
    - Prefix preservation: Optionally preserves source prefix structure in destination
    - EC2 region detection: Warns about cross-region data transfer charges

    Args:
        service: DeltaService instance
        source_url: Source S3 URL
        dest_url: Destination S3 URL
        exclude: Pattern to exclude files
        include: Pattern to include files
        quiet: Suppress output
        no_delta: Disable delta compression
        max_ratio: Maximum delta/file ratio
        dry_run: Show what would be migrated without migrating
        skip_confirm: Skip confirmation prompt
        preserve_prefix: Preserve source prefix in destination
        region_override: True if user explicitly specified --region flag
    """
    import fnmatch

    source_bucket, source_prefix = parse_s3_url(source_url)
    dest_bucket, dest_prefix = parse_s3_url(dest_url)

    # Ensure prefixes end with / if they exist
    if source_prefix and not source_prefix.endswith("/"):
        source_prefix += "/"
    if dest_prefix and not dest_prefix.endswith("/"):
        dest_prefix += "/"

    # Determine the effective destination prefix based on preserve_prefix setting
    effective_dest_prefix = dest_prefix
    if preserve_prefix and source_prefix:
        # Extract the last component of the source prefix (e.g., "prefix1/" from "path/to/prefix1/")
        source_prefix_name = source_prefix.rstrip("/").split("/")[-1]
        if source_prefix_name:
            # Append source prefix name to destination
            effective_dest_prefix = (dest_prefix or "") + source_prefix_name + "/"

    if not quiet:
        # Log AWS region being used (helps users verify their configuration)
        # Pass region_override to warn about cross-region charges if user explicitly set --region
        log_aws_region(service, region_override=region_override)

        source_display = build_s3_url(source_bucket, source_prefix)
        dest_display = build_s3_url(dest_bucket, dest_prefix)
        effective_dest_display = build_s3_url(dest_bucket, effective_dest_prefix)

        if preserve_prefix and source_prefix:
            click.echo(f"Migrating from {source_display}")
            click.echo(f"          to {effective_dest_display}")
        else:
            click.echo(f"Migrating from {source_display} to {dest_display}")
        click.echo("Scanning source and destination buckets...")

    # List source objects
    source_list_prefix = f"{source_bucket}/{source_prefix}" if source_prefix else source_bucket
    source_objects = []

    for obj in service.storage.list(source_list_prefix):
        # Skip reference.bin files (internal delta reference)
        if obj.key.endswith("/reference.bin"):
            continue
        # Skip .delta files in source (we'll handle the original files)
        if obj.key.endswith(".delta"):
            continue

        # Apply include/exclude filters
        rel_key = obj.key.removeprefix(source_prefix) if source_prefix else obj.key
        if exclude and fnmatch.fnmatch(rel_key, exclude):
            continue
        if include and not fnmatch.fnmatch(rel_key, include):
            continue

        source_objects.append(obj)

    # List destination objects to detect what needs copying
    dest_list_prefix = (
        f"{dest_bucket}/{effective_dest_prefix}" if effective_dest_prefix else dest_bucket
    )
    dest_keys = set()

    for obj in service.storage.list(dest_list_prefix):
        # Get the relative key in destination
        rel_key = obj.key.removeprefix(effective_dest_prefix) if effective_dest_prefix else obj.key
        # Remove .delta suffix for comparison
        if rel_key.endswith(".delta"):
            rel_key = rel_key[:-6]
        # Skip reference.bin
        if not rel_key.endswith("/reference.bin"):
            dest_keys.add(rel_key)

    # Determine files to migrate (not in destination)
    files_to_migrate = []
    total_size = 0

    for source_obj in source_objects:
        # Get relative path from source prefix
        rel_key = source_obj.key.removeprefix(source_prefix) if source_prefix else source_obj.key

        # Check if already exists in destination
        if rel_key not in dest_keys:
            files_to_migrate.append((source_obj, rel_key))
            total_size += source_obj.size

    # Show summary and ask for confirmation
    if not files_to_migrate:
        if not quiet:
            click.echo("Copy completed")
            click.echo("All files are already migrated. Nothing to do.")
        return

    if not quiet:

        def format_bytes(size: int) -> str:
            size_float = float(size)
            for unit in ["B", "KB", "MB", "GB", "TB"]:
                if size_float < 1024.0:
                    return f"{size_float:.2f} {unit}"
                size_float /= 1024.0
            return f"{size_float:.2f} PB"

        click.echo("")
        click.echo(f"Files to migrate: {len(files_to_migrate)}")
        click.echo(f"Total size: {format_bytes(total_size)}")
        if len(dest_keys) > 0:
            click.echo(f"Already migrated: {len(dest_keys)} files (will be skipped)")

    # Handle dry run mode early (before confirmation prompt)
    if dry_run:
        if not quiet:
            click.echo("\n--- DRY RUN MODE ---")
            for _obj, rel_key in files_to_migrate[:10]:  # Show first 10 files
                click.echo(f"  Would migrate: {rel_key}")
            if len(files_to_migrate) > 10:
                click.echo(f"  ... and {len(files_to_migrate) - 10} more files")
        return

    # Ask for confirmation before proceeding with actual migration
    if not quiet and not skip_confirm:
        click.echo("")
        if not click.confirm("Do you want to proceed with the migration?"):
            click.echo("Migration cancelled.")
            return

    # Perform migration
    if not quiet:
        click.echo(f"\nStarting migration of {len(files_to_migrate)} files...")

    successful = 0
    failed = 0
    failed_files = []

    for i, (source_obj, rel_key) in enumerate(files_to_migrate, 1):
        source_s3_url = build_s3_url(source_bucket, source_obj.key)

        # Construct destination URL using effective prefix
        if effective_dest_prefix:
            dest_key = effective_dest_prefix + rel_key
        else:
            dest_key = rel_key
        dest_s3_url = build_s3_url(dest_bucket, dest_key)

        try:
            if not quiet:
                progress = f"[{i}/{len(files_to_migrate)}]"
                click.echo(f"{progress} Migrating {rel_key}...", nl=False)

            # Copy with delta compression
            copy_s3_to_s3(
                service,
                source_s3_url,
                dest_s3_url,
                quiet=True,
                max_ratio=max_ratio,
                no_delta=no_delta,
            )

            successful += 1
            if not quiet:
                click.echo(" ✓")

        except Exception as e:
            failed += 1
            failed_files.append((rel_key, str(e)))
            if not quiet:
                click.echo(f" ✗ ({e})")

    # Show final summary
    if not quiet:
        click.echo("")
        click.echo("Migration Summary:")
        click.echo(f"  Successfully migrated: {successful} files")
        if failed > 0:
            click.echo(f"  Failed: {failed} files")
            click.echo("\nFailed files:")
            for file, error in failed_files[:10]:  # Show first 10 failures
                click.echo(f"  {file}: {error}")
            if len(failed_files) > 10:
                click.echo(f"  ... and {len(failed_files) - 10} more failures")

        # Show compression statistics from cache if available (no bucket scan)
        if successful > 0 and not no_delta:
            try:
                from ...client import DeltaGliderClient

                client = DeltaGliderClient(service)
                # Use cached stats only - don't scan bucket (prevents blocking)
                cached_stats = client._get_cached_bucket_stats(dest_bucket, "quick")
                if cached_stats and cached_stats.delta_objects > 0:
                    click.echo(
                        f"\nCompression achieved: {cached_stats.average_compression_ratio:.1%}"
                    )
                    click.echo(f"Space saved: {format_bytes(cached_stats.space_saved)}")
            except Exception:
                pass  # Ignore stats errors


def handle_recursive(
@@ -228,10 +601,7 @@ def handle_recursive(
        dest_path = Path(dest)
        dest_path.mkdir(parents=True, exist_ok=True)

        # List all objects with prefix
        # Note: S3StorageAdapter.list() expects "bucket/prefix" format
        list_prefix = f"{bucket}/{prefix}" if prefix else bucket
        objects = list(service.storage.list(list_prefix))
        objects = fetch_s3_object_heads(service, bucket, prefix)

        if not quiet:
            click.echo(f"Downloading {len(objects)} files...")
@@ -261,9 +631,22 @@
            local_path.parent.mkdir(parents=True, exist_ok=True)

            # Download file
            s3_url = f"s3://{bucket}/{obj.key}"
            s3_url = build_s3_url(bucket, obj.key)
            download_file(service, s3_url, local_path, quiet)

    else:
        click.echo("S3-to-S3 recursive copy not yet implemented", err=True)
        sys.exit(1)
    elif operation == "copy":
        # S3-to-S3 recursive copy with migration support
        migrate_s3_to_s3(
            service,
            source,
            dest,
            exclude=exclude,
            include=include,
            quiet=quiet,
            no_delta=no_delta,
            max_ratio=max_ratio,
            dry_run=False,
            skip_confirm=True,  # Don't prompt for cp command
            preserve_prefix=True,  # Always preserve prefix for cp -r
            region_override=False,  # cp command doesn't track region override explicitly
        )

@@ -6,10 +6,13 @@ import os
import shutil
import sys
import tempfile
from datetime import UTC
from pathlib import Path
from typing import Any

import click

from ... import __version__
from ...adapters import (
    NoopMetricsAdapter,
    S3StorageAdapter,
@@ -19,6 +22,7 @@ from ...adapters import (
    XdeltaAdapter,
)
from ...core import DeltaService, ObjectKey
from ...core.config import DeltaGliderConfig
from ...ports import MetricsPort
from ...ports.cache import CachePort
from .aws_compat import (
@@ -38,69 +42,87 @@ def create_service(
    endpoint_url: str | None = None,
    region: str | None = None,
    profile: str | None = None,
    *,
    config: DeltaGliderConfig | None = None,
) -> DeltaService:
    """Create service with wired adapters."""
    # Get config from environment
    max_ratio = float(os.environ.get("DG_MAX_RATIO", "0.5"))
    metrics_type = os.environ.get("DG_METRICS", "logging")  # Options: noop, logging, cloudwatch
    """Create service with wired adapters.

    Args:
        log_level: Logging level (overridden by config.log_level if config provided).
        endpoint_url: S3 endpoint URL (overridden by config if provided).
        region: AWS region (overridden by config if provided).
        profile: AWS profile (overridden by config if provided).
        config: Optional pre-built config. If None, built from env vars + explicit params.
    """
    if config is None:
        config = DeltaGliderConfig.from_env(
            log_level=log_level,
            endpoint_url=endpoint_url,
            region=region,
            profile=profile,
        )

    # SECURITY: Always use ephemeral process-isolated cache
    cache_dir = Path(tempfile.mkdtemp(prefix="deltaglider-", dir="/tmp"))
    # Register cleanup handler to remove cache on exit
    atexit.register(lambda: shutil.rmtree(cache_dir, ignore_errors=True))

    # Set AWS environment variables if provided
    if endpoint_url:
        os.environ["AWS_ENDPOINT_URL"] = endpoint_url
    if region:
        os.environ["AWS_DEFAULT_REGION"] = region
    if profile:
        os.environ["AWS_PROFILE"] = profile
    # Set AWS environment variables if provided (for compatibility with other AWS tools)
    if config.endpoint_url:
        os.environ["AWS_ENDPOINT_URL"] = config.endpoint_url
    if config.region:
        os.environ["AWS_DEFAULT_REGION"] = config.region
    if config.profile:
        os.environ["AWS_PROFILE"] = config.profile

    # Build boto3_kwargs for explicit parameter passing (preferred over env vars)
    boto3_kwargs: dict[str, Any] = {}
    if config.region:
        boto3_kwargs["region_name"] = config.region

    # Create adapters
    hasher = Sha256Adapter()
    storage = S3StorageAdapter(endpoint_url=endpoint_url)
    storage = S3StorageAdapter(endpoint_url=config.endpoint_url, boto3_kwargs=boto3_kwargs)
    diff = XdeltaAdapter()

    # SECURITY: Configurable cache with encryption and backend selection
    from deltaglider.adapters import ContentAddressedCache, EncryptedCache, MemoryCache

    # Select backend: memory or filesystem
    cache_backend = os.environ.get("DG_CACHE_BACKEND", "filesystem")  # Options: filesystem, memory
    base_cache: CachePort
    if cache_backend == "memory":
        max_size_mb = int(os.environ.get("DG_CACHE_MEMORY_SIZE_MB", "100"))
        base_cache = MemoryCache(hasher, max_size_mb=max_size_mb, temp_dir=cache_dir)
    if config.cache_backend == "memory":
        base_cache = MemoryCache(
            hasher, max_size_mb=config.cache_memory_size_mb, temp_dir=cache_dir
        )
    else:
        # Filesystem-backed with Content-Addressed Storage
        base_cache = ContentAddressedCache(cache_dir, hasher)

    # Always apply encryption with ephemeral keys (security hardening)
    # Encryption key is optional via DG_CACHE_ENCRYPTION_KEY (ephemeral if not set)
    cache: CachePort = EncryptedCache.from_env(base_cache)

    clock = UtcClockAdapter()
    logger = StdLoggerAdapter(level=log_level)
    logger = StdLoggerAdapter(level=config.log_level)

    # Create metrics adapter based on configuration
    metrics: MetricsPort
    if metrics_type == "cloudwatch":
        # Import here to avoid dependency if not used
    if config.metrics_type == "cloudwatch":
        from ...adapters.metrics_cloudwatch import CloudWatchMetricsAdapter

        metrics = CloudWatchMetricsAdapter(
            namespace=os.environ.get("DG_METRICS_NAMESPACE", "DeltaGlider"),
            region=region,
            endpoint_url=endpoint_url if endpoint_url and "localhost" in endpoint_url else None,
            namespace=config.metrics_namespace,
            region=config.region,
            endpoint_url=(
                config.endpoint_url
                if config.endpoint_url and "localhost" in config.endpoint_url
                else None
            ),
        )
    elif metrics_type == "logging":
    elif config.metrics_type == "logging":
        from ...adapters.metrics_cloudwatch import LoggingMetricsAdapter

        metrics = LoggingMetricsAdapter(log_level=log_level)
        metrics = LoggingMetricsAdapter(log_level=config.log_level)
    else:
        metrics = NoopMetricsAdapter()

    # Create service
    return DeltaService(
        storage=storage,
        diff=diff,
@@ -109,12 +131,27 @@ def create_service(
        clock=clock,
        logger=logger,
        metrics=metrics,
        max_ratio=max_ratio,
        max_ratio=config.max_ratio,
    )


def _version_callback(ctx: click.Context, param: click.Parameter, value: bool) -> None:
    """Callback for --version option."""
    if value:
        click.echo(f"deltaglider {__version__}")
        ctx.exit(0)


@click.group()
@click.option("--debug", is_flag=True, help="Enable debug logging")
@click.option(
    "--version",
    is_flag=True,
    is_eager=True,
    expose_value=False,
    callback=_version_callback,
    help="Show version and exit",
)
@click.pass_context
def cli(ctx: click.Context, debug: bool) -> None:
    """DeltaGlider - Delta-aware S3 file storage wrapper."""
@@ -172,9 +209,6 @@ def cp(

    # Handle recursive operations for directories
    if recursive:
        if operation == "copy":
            click.echo("S3-to-S3 recursive copy not yet implemented", err=True)
            sys.exit(1)
        handle_recursive(
            service, source, dest, recursive, exclude, include, quiet, no_delta, max_ratio
        )
@@ -196,7 +230,7 @@ def cp(
            download_file(service, source, local_path, quiet)

        elif operation == "copy":
            copy_s3_to_s3(service, source, dest, quiet)
            copy_s3_to_s3(service, source, dest, quiet, max_ratio, no_delta)

    except ValueError as e:
        click.echo(f"Error: {e}", err=True)
@@ -469,24 +503,24 @@ def rm(

        # Report the results
        if not quiet:
            if result["deleted_count"] == 0:
            if result.deleted_count == 0:
                click.echo(f"delete: No objects found with prefix: s3://{bucket}/{prefix}")
            else:
                click.echo(f"Deleted {result['deleted_count']} object(s)")
                click.echo(f"Deleted {result.deleted_count} object(s)")

                # Show warnings if any references were kept
                for warning in result.get("warnings", []):
                for warning in result.warnings:
                    if "Kept reference" in warning:
                        click.echo(
                            f"Keeping reference file (still in use): s3://{bucket}/{warning.split()[2]}"
                        )

        # Report any errors
        if result["failed_count"] > 0:
            for error in result.get("errors", []):
        if result.failed_count > 0:
            for error in result.errors:
                click.echo(f"Error: {error}", err=True)

        if result["failed_count"] > 0:
        if result.failed_count > 0:
            sys.exit(1)

    except Exception as e:
@@ -604,20 +638,14 @@ def sync(
@click.pass_obj
def verify(service: DeltaService, s3_url: str) -> None:
    """Verify integrity of delta file."""
    # Parse S3 URL
    if not s3_url.startswith("s3://"):
    try:
        bucket, key = parse_s3_url(s3_url)
        if not key:
            raise ValueError("Missing key")
    except ValueError:
        click.echo(f"Error: Invalid S3 URL: {s3_url}", err=True)
        sys.exit(1)

    s3_path = s3_url[5:]
    parts = s3_path.split("/", 1)
    if len(parts) != 2:
        click.echo(f"Error: Invalid S3 URL: {s3_url}", err=True)
        sys.exit(1)

    bucket = parts[0]
    key = parts[1]

    obj_key = ObjectKey(bucket=bucket, key=key)

    try:
@@ -641,37 +669,196 @@ def verify(service: DeltaService, s3_url: str) -> None:


@cli.command()
@click.argument("source")
@click.argument("dest")
@click.option("--exclude", help="Exclude files matching pattern")
@click.option("--include", help="Include only files matching pattern")
@click.option("--quiet", "-q", is_flag=True, help="Suppress output")
@click.option("--no-delta", is_flag=True, help="Disable delta compression")
@click.option("--max-ratio", type=float, help="Max delta/file ratio (default: 0.5)")
@click.option("--dry-run", is_flag=True, help="Show what would be migrated without migrating")
@click.option("--yes", "-y", is_flag=True, help="Skip confirmation prompt")
@click.option(
    "--no-preserve-prefix", is_flag=True, help="Don't preserve source prefix in destination"
)
@click.option("--endpoint-url", help="Override S3 endpoint URL")
@click.option("--region", help="AWS region")
@click.option("--profile", help="AWS profile to use")
@click.pass_obj
def migrate(
    service: DeltaService,
    source: str,
    dest: str,
    exclude: str | None,
    include: str | None,
    quiet: bool,
    no_delta: bool,
    max_ratio: float | None,
    dry_run: bool,
    yes: bool,
    no_preserve_prefix: bool,
    endpoint_url: str | None,
    region: str | None,
    profile: str | None,
) -> None:
    """Migrate S3 bucket/prefix to DeltaGlider-compressed storage.

    This command facilitates the migration of existing S3 objects to another bucket
    with DeltaGlider compression. It supports:
    - Resume capability: Only copies files that don't exist in destination
    - Progress tracking: Shows migration progress
    - Confirmation prompt: Shows file count before starting (use --yes to skip)
    - Prefix preservation: By default, source prefix is preserved in destination

    When migrating a prefix, the source prefix name is preserved by default:
        s3://src/prefix1/ → s3://dest/ creates s3://dest/prefix1/
        s3://src/a/b/c/ → s3://dest/x/ creates s3://dest/x/c/

    Use --no-preserve-prefix to disable this behavior:
        s3://src/prefix1/ → s3://dest/ creates s3://dest/ (files at root)

    Examples:
        deltaglider migrate s3://old-bucket/ s3://new-bucket/
        deltaglider migrate s3://old-bucket/data/ s3://new-bucket/
        deltaglider migrate --no-preserve-prefix s3://src/v1/ s3://dest/
        deltaglider migrate --dry-run s3://old-bucket/ s3://new-bucket/
        deltaglider migrate --yes --quiet s3://old-bucket/ s3://new-bucket/
    """
    from .aws_compat import is_s3_path, migrate_s3_to_s3

    # Recreate service with AWS parameters if provided
    if endpoint_url or region or profile:
        service = create_service(
            log_level=os.environ.get("DG_LOG_LEVEL", "INFO"),
            endpoint_url=endpoint_url,
            region=region,
            profile=profile,
        )

    try:
        # Validate both paths are S3
        if not is_s3_path(source) or not is_s3_path(dest):
            click.echo("Error: Both source and destination must be S3 paths", err=True)
            sys.exit(1)

        # Perform migration
        migrate_s3_to_s3(
            service,
            source,
            dest,
            exclude=exclude,
            include=include,
            quiet=quiet,
            no_delta=no_delta,
            max_ratio=max_ratio,
            dry_run=dry_run,
            skip_confirm=yes,
            preserve_prefix=not no_preserve_prefix,
            region_override=region is not None,  # True if user explicitly specified --region
        )

    except Exception as e:
        click.echo(f"Migration failed: {e}", err=True)
        sys.exit(1)


@cli.command(short_help="Get bucket statistics and compression metrics")
|
||||
@click.argument("bucket")
|
||||
@click.option("--detailed", is_flag=True, help="Fetch detailed compression metrics (slower)")
|
||||
@click.option("--sampled", is_flag=True, help="Balanced mode: one sample per deltaspace (~5-15s)")
|
||||
@click.option(
|
||||
"--detailed", is_flag=True, help="Most accurate: HEAD for all deltas (slowest, ~1min+)"
|
||||
)
|
||||
@click.option("--refresh", is_flag=True, help="Force cache refresh even if valid")
|
||||
@click.option("--no-cache", is_flag=True, help="Skip caching entirely (both read and write)")
|
||||
@click.option("--json", "output_json", is_flag=True, help="Output in JSON format")
|
||||
@click.pass_obj
|
||||
def stats(service: DeltaService, bucket: str, detailed: bool, output_json: bool) -> None:
|
||||
"""Get bucket statistics and compression metrics.
|
||||
def stats(
|
||||
service: DeltaService,
|
||||
bucket: str,
|
||||
sampled: bool,
|
||||
detailed: bool,
|
||||
refresh: bool,
|
||||
no_cache: bool,
|
||||
output_json: bool,
|
||||
) -> None:
|
||||
"""Get bucket statistics and compression metrics with intelligent S3-based caching.
|
||||
|
||||
BUCKET can be specified as:
|
||||
- s3://bucket-name/
|
||||
- s3://bucket-name
|
||||
- bucket-name
|
||||
|
||||
Modes (mutually exclusive):
|
||||
- quick (default): Fast listing-only stats (~0.5s), approximate compression metrics
|
||||
- --sampled: Balanced mode - one HEAD per deltaspace (~5-15s for typical buckets)
|
||||
- --detailed: Most accurate - HEAD for every delta file (slowest, ~1min+ for large buckets)
|
||||
|
||||
Caching (NEW - massive performance improvement!):
|
||||
Stats are cached in S3 at .deltaglider/stats_{mode}.json (one per mode).
|
||||
Cache is automatically validated on every call using object count + size.
|
||||
If bucket changed, stats are recomputed automatically.
|
||||
|
||||
Performance with cache:
|
||||
- Cache hit: ~0.1s (200x faster than recomputation!)
|
||||
- Cache miss: Full computation time (creates cache for next time)
|
||||
- Cache invalid: Auto-recomputes when bucket changes
|
||||
|
||||
Options:
|
||||
--refresh: Force cache refresh even if valid (use when you need fresh data now)
|
||||
--no-cache: Skip caching entirely - always recompute (useful for testing/debugging)
|
||||
--json: Output in JSON format for automation/scripting
|
||||
|
||||
Examples:
|
||||
deltaglider stats mybucket # Fast (~0.1s with cache, ~0.5s without)
|
||||
deltaglider stats mybucket --sampled # Balanced accuracy/speed (~5-15s first run)
|
||||
deltaglider stats mybucket --detailed # Most accurate (~1-10min first run, ~0.1s cached)
|
||||
deltaglider stats mybucket --refresh # Force recomputation even if cached
|
||||
deltaglider stats mybucket --no-cache # Always compute fresh (skip cache)
|
||||
deltaglider stats mybucket --json # JSON output for scripts
|
||||
deltaglider stats s3://mybucket/ # Also accepts s3:// URLs
|
||||
|
||||
Timing Logs:
|
||||
Set DG_LOG_LEVEL=INFO to see detailed phase timing with timestamps:
|
||||
[HH:MM:SS.mmm] Phase 1: LIST completed in 0.52s - Found 1523 objects
|
||||
[HH:MM:SS.mmm] Phase 2: Cache HIT in 0.06s - Using cached stats
|
||||
[HH:MM:SS.mmm] COMPLETE: Total time 0.58s
|
||||
|
||||
See docs/STATS_CACHING.md for complete documentation.
|
||||
"""
|
||||
from ...client import DeltaGliderClient
|
||||
from ...client_operations.stats import StatsMode
|
||||
|
||||
try:
|
||||
# Parse bucket from S3 URL if needed
|
||||
if bucket.startswith("s3://"):
|
||||
# Remove s3:// prefix and any trailing slashes
|
||||
bucket = bucket[5:].rstrip("/")
|
||||
# Extract just the bucket name (first path component)
|
||||
bucket = bucket.split("/")[0] if "/" in bucket else bucket
|
||||
if is_s3_path(bucket):
|
||||
bucket, _prefix = parse_s3_url(bucket)
|
||||
|
||||
if not bucket:
|
||||
click.echo("Error: Invalid bucket name", err=True)
|
||||
sys.exit(1)
|
||||
|
||||
if sampled and detailed:
|
||||
click.echo("Error: --sampled and --detailed cannot be used together", err=True)
|
||||
sys.exit(1)
|
||||
|
||||
if refresh and no_cache:
|
||||
click.echo("Error: --refresh and --no-cache cannot be used together", err=True)
|
||||
sys.exit(1)
|
||||
|
||||
mode: StatsMode = "quick"
|
||||
if sampled:
|
||||
mode = "sampled"
|
||||
if detailed:
|
||||
mode = "detailed"
|
||||
|
||||
# Create client from service
|
||||
client = DeltaGliderClient(service=service)
|
||||
|
||||
# Get bucket stats
|
||||
bucket_stats = client.get_bucket_stats(bucket, detailed_stats=detailed)
|
||||
# Get bucket stats with caching control
|
||||
use_cache = not no_cache
|
||||
bucket_stats = client.get_bucket_stats(
|
||||
bucket, mode=mode, use_cache=use_cache, refresh_cache=refresh
|
||||
)
|
||||
|
||||
if output_json:
|
||||
# JSON output
|
||||
@@ -718,6 +905,301 @@ def stats(service: DeltaService, bucket: str, detailed: bool, output_json: bool)
        sys.exit(1)


@cli.command()
@click.argument("bucket")
@click.option("--dry-run", is_flag=True, help="Show what would be deleted without deleting")
@click.option("--json", "output_json", is_flag=True, help="Output in JSON format")
@click.option("--endpoint-url", help="Override S3 endpoint URL")
@click.option("--region", help="AWS region")
@click.option("--profile", help="AWS profile to use")
@click.pass_obj
def purge(
    service: DeltaService,
    bucket: str,
    dry_run: bool,
    output_json: bool,
    endpoint_url: str | None,
    region: str | None,
    profile: str | None,
) -> None:
    """Purge expired temporary files from .deltaglider/tmp/.

    This command scans the .deltaglider/tmp/ prefix in the specified bucket
    and deletes any files whose dg-expires-at metadata indicates they have expired.

    These temporary files are created by the rehydration process when deltaglider-compressed
    files need to be made available for direct download (e.g., via presigned URLs).

    BUCKET can be specified as:
    - s3://bucket-name/
    - s3://bucket-name
    - bucket-name

    Examples:
        deltaglider purge mybucket            # Purge expired files
        deltaglider purge mybucket --dry-run  # Preview what would be deleted
        deltaglider purge mybucket --json     # JSON output for automation
        deltaglider purge s3://mybucket/      # Also accepts s3:// URLs
    """
    # Recreate service with AWS parameters if provided
    if endpoint_url or region or profile:
        service = create_service(
            log_level=os.environ.get("DG_LOG_LEVEL", "INFO"),
            endpoint_url=endpoint_url,
            region=region,
            profile=profile,
        )

    try:
        # Parse bucket from S3 URL if needed
        if is_s3_path(bucket):
            bucket, _prefix = parse_s3_url(bucket)

        if not bucket:
            click.echo("Error: Invalid bucket name", err=True)
            sys.exit(1)

        # Perform the purge (or dry run simulation)
        if dry_run:
            # For dry run, we need to simulate what would be deleted
            prefix = ".deltaglider/tmp/"
            expired_files = []
            total_size = 0

            # List all objects in temp directory
            from datetime import datetime

            import boto3

            s3_client = boto3.client(
                "s3",
                endpoint_url=endpoint_url or os.environ.get("AWS_ENDPOINT_URL"),
                region_name=region,
            )

            paginator = s3_client.get_paginator("list_objects_v2")
            page_iterator = paginator.paginate(Bucket=bucket, Prefix=prefix)

            for page in page_iterator:
                for obj in page.get("Contents", []):
                    # Get object metadata
                    head_response = s3_client.head_object(Bucket=bucket, Key=obj["Key"])
                    metadata = head_response.get("Metadata", {})

                    expires_at_str = metadata.get("dg-expires-at")
                    if expires_at_str:
                        try:
                            expires_at = datetime.fromisoformat(
                                expires_at_str.replace("Z", "+00:00")
                            )
                            if expires_at.tzinfo is None:
                                expires_at = expires_at.replace(tzinfo=UTC)

                            if datetime.now(UTC) >= expires_at:
                                expired_files.append(
                                    {
                                        "key": obj["Key"],
                                        "size": obj["Size"],
                                        "expires_at": expires_at_str,
                                    }
                                )
                                total_size += obj["Size"]
                        except ValueError:
                            # Skip objects whose expiry timestamp is malformed
                            pass

            if output_json:
                output = {
                    "bucket": bucket,
                    "prefix": prefix,
                    "dry_run": True,
                    "would_delete_count": len(expired_files),
                    "total_size_to_free": total_size,
                    "expired_files": expired_files[:10],  # Show first 10
                }
                click.echo(json.dumps(output, indent=2))
            else:
                click.echo(f"Dry run: Would delete {len(expired_files)} expired file(s)")
                click.echo(f"Total space to free: {total_size:,} bytes")
                if expired_files:
                    click.echo("\nFiles that would be deleted (first 10):")
                    for file_info in expired_files[:10]:
                        click.echo(f"  {file_info['key']} (expires: {file_info['expires_at']})")
                    if len(expired_files) > 10:
                        click.echo(f"  ... and {len(expired_files) - 10} more")
        else:
            # Perform actual purge using the service method
            result = service.purge_temp_files(bucket)

            if output_json:
                # JSON output
                click.echo(json.dumps(result, indent=2))
            else:
                # Human-readable output
                click.echo(f"Purge Statistics for bucket: {bucket}")
                click.echo(f"{'=' * 60}")
                click.echo(f"Expired files found: {result['expired_count']}")
                click.echo(f"Files deleted: {result['deleted_count']}")
                click.echo(f"Errors: {result['error_count']}")
                click.echo(f"Space freed: {result['total_size_freed']:,} bytes")
                click.echo(f"Duration: {result['duration_seconds']:.2f} seconds")

                if result["errors"]:
                    click.echo("\nErrors encountered:")
                    for error in result["errors"][:5]:
                        click.echo(f"  - {error}")
                    if len(result["errors"]) > 5:
                        click.echo(f"  ... and {len(result['errors']) - 5} more errors")

    except Exception as e:
        click.echo(f"Error: {e}", err=True)
        sys.exit(1)


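A minimal sketch of how a purge-compatible expiry timestamp could be produced (the "dg-expires-at" key and ISO-8601 format match what the dry-run parser above expects; the put_object call itself is an assumed illustration, not the library's actual upload path):

    from datetime import UTC, datetime, timedelta

    expires_at = (datetime.now(UTC) + timedelta(hours=1)).isoformat()
    # e.g. "2025-01-01T13:00:00+00:00" -- round-trips through datetime.fromisoformat()
    s3_client.put_object(
        Bucket="my-bucket",
        Key=".deltaglider/tmp/example.zip",
        Body=b"payload",
        Metadata={"dg-expires-at": expires_at},
    )
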
@cli.command("put-bucket-acl")
|
||||
@click.argument("bucket")
|
||||
@click.option(
|
||||
"--acl",
|
||||
type=click.Choice(["private", "public-read", "public-read-write", "authenticated-read"]),
|
||||
help="Canned ACL to apply",
|
||||
)
|
||||
@click.option("--grant-full-control", help="Grants full control (e.g., id=account-id)")
|
||||
@click.option("--grant-read", help="Allows grantee to list objects (e.g., id=account-id)")
|
||||
@click.option("--grant-read-acp", help="Allows grantee to read the bucket ACL")
|
||||
@click.option("--grant-write", help="Allows grantee to create objects in the bucket")
|
||||
@click.option("--grant-write-acp", help="Allows grantee to write the ACL for the bucket")
|
||||
@click.option("--access-control-policy", help="Full ACL policy as JSON string")
|
||||
@click.option("--endpoint-url", help="Override S3 endpoint URL")
|
||||
@click.option("--region", help="AWS region")
|
||||
@click.option("--profile", help="AWS profile to use")
|
||||
@click.pass_obj
|
||||
def put_bucket_acl(
|
||||
service: DeltaService,
|
||||
bucket: str,
|
||||
acl: str | None,
|
||||
grant_full_control: str | None,
|
||||
grant_read: str | None,
|
||||
grant_read_acp: str | None,
|
||||
grant_write: str | None,
|
||||
grant_write_acp: str | None,
|
||||
access_control_policy: str | None,
|
||||
endpoint_url: str | None,
|
||||
region: str | None,
|
||||
profile: str | None,
|
||||
) -> None:
|
||||
"""Set the access control list (ACL) for an S3 bucket.
|
||||
|
||||
BUCKET can be specified as:
|
||||
- s3://bucket-name
|
||||
- bucket-name
|
||||
|
||||
Examples:
|
||||
deltaglider put-bucket-acl my-bucket --acl private
|
||||
deltaglider put-bucket-acl my-bucket --acl public-read
|
||||
deltaglider put-bucket-acl my-bucket --grant-read id=12345
|
||||
"""
|
||||
from ...client import DeltaGliderClient
|
||||
|
||||
# Recreate service with AWS parameters if provided
|
||||
if endpoint_url or region or profile:
|
||||
service = create_service(
|
||||
log_level=os.environ.get("DG_LOG_LEVEL", "INFO"),
|
||||
endpoint_url=endpoint_url,
|
||||
region=region,
|
||||
profile=profile,
|
||||
)
|
||||
|
||||
try:
|
||||
# Parse bucket from S3 URL if needed
|
||||
if is_s3_path(bucket):
|
||||
bucket, _prefix = parse_s3_url(bucket)
|
||||
|
||||
if not bucket:
|
||||
click.echo("Error: Invalid bucket name", err=True)
|
||||
sys.exit(1)
|
||||
|
||||
client = DeltaGliderClient(service=service)
|
||||
|
||||
kwargs: dict[str, Any] = {}
|
||||
if acl is not None:
|
||||
kwargs["ACL"] = acl
|
||||
if grant_full_control is not None:
|
||||
kwargs["GrantFullControl"] = grant_full_control
|
||||
if grant_read is not None:
|
||||
kwargs["GrantRead"] = grant_read
|
||||
if grant_read_acp is not None:
|
||||
kwargs["GrantReadACP"] = grant_read_acp
|
||||
if grant_write is not None:
|
||||
kwargs["GrantWrite"] = grant_write
|
||||
if grant_write_acp is not None:
|
||||
kwargs["GrantWriteACP"] = grant_write_acp
|
||||
if access_control_policy is not None:
|
||||
kwargs["AccessControlPolicy"] = json.loads(access_control_policy)
|
||||
|
||||
client.put_bucket_acl(Bucket=bucket, **kwargs)
|
||||
click.echo(f"ACL updated for bucket: {bucket}")
|
||||
|
||||
except json.JSONDecodeError as e:
|
||||
click.echo(f"Error: Invalid JSON for --access-control-policy: {e}", err=True)
|
||||
sys.exit(1)
|
||||
except Exception as e:
|
||||
click.echo(f"Error: {e}", err=True)
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
@cli.command("get-bucket-acl")
|
||||
@click.argument("bucket")
|
||||
@click.option("--endpoint-url", help="Override S3 endpoint URL")
|
||||
@click.option("--region", help="AWS region")
|
||||
@click.option("--profile", help="AWS profile to use")
|
||||
@click.pass_obj
|
||||
def get_bucket_acl(
|
||||
service: DeltaService,
|
||||
bucket: str,
|
||||
endpoint_url: str | None,
|
||||
region: str | None,
|
||||
profile: str | None,
|
||||
) -> None:
|
||||
"""Get the access control list (ACL) for an S3 bucket.
|
||||
|
||||
BUCKET can be specified as:
|
||||
- s3://bucket-name
|
||||
- bucket-name
|
||||
|
||||
Examples:
|
||||
deltaglider get-bucket-acl my-bucket
|
||||
deltaglider get-bucket-acl s3://my-bucket
|
||||
"""
|
||||
from ...client import DeltaGliderClient
|
||||
|
||||
# Recreate service with AWS parameters if provided
|
||||
if endpoint_url or region or profile:
|
||||
service = create_service(
|
||||
log_level=os.environ.get("DG_LOG_LEVEL", "INFO"),
|
||||
endpoint_url=endpoint_url,
|
||||
region=region,
|
||||
profile=profile,
|
||||
)
|
||||
|
||||
try:
|
||||
# Parse bucket from S3 URL if needed
|
||||
if is_s3_path(bucket):
|
||||
bucket, _prefix = parse_s3_url(bucket)
|
||||
|
||||
if not bucket:
|
||||
click.echo("Error: Invalid bucket name", err=True)
|
||||
sys.exit(1)
|
||||
|
||||
client = DeltaGliderClient(service=service)
|
||||
response = client.get_bucket_acl(Bucket=bucket)
|
||||
|
||||
# Output as JSON like aws s3api get-bucket-acl
|
||||
click.echo(json.dumps(response, indent=2, default=str))
|
||||
|
||||
except Exception as e:
|
||||
click.echo(f"Error: {e}", err=True)
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
def main() -> None:
|
||||
"""Main entry point."""
|
||||
cli()
|
||||
|
||||
@@ -5,9 +5,27 @@ from pathlib import Path
import click

from ...core import DeltaService
from ...core.object_listing import list_all_objects, object_dict_to_head
from ...ports import ObjectHead


def fetch_s3_object_heads(service: DeltaService, bucket: str, prefix: str) -> list[ObjectHead]:
    """Retrieve all objects for a prefix, falling back to the storage iterator when needed."""
    try:
        listing = list_all_objects(
            service.storage,
            bucket=bucket,
            prefix=prefix,
            max_keys=1000,
            logger=getattr(service, "logger", None),
        )
    except (RuntimeError, NotImplementedError):
        list_prefix = f"{bucket}/{prefix}" if prefix else bucket
        return list(service.storage.list(list_prefix))

    return [object_dict_to_head(obj) for obj in listing.objects]


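A hedged usage sketch for the helper above (the service value is assumed to come from create_service as elsewhere in this module; bucket and prefix are placeholders). If list_all_objects raises RuntimeError or NotImplementedError, the same call transparently falls back to the raw storage iterator:

    heads = fetch_s3_object_heads(service, bucket="releases", prefix="v1.0/")
    print(f"{len(heads)} object(s) under v1.0/")
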
def get_local_files(
    local_dir: Path, exclude: str | None = None, include: str | None = None
) -> dict[str, tuple[Path, int]]:
@@ -42,8 +60,7 @@ def get_s3_files(
    import fnmatch

    files = {}
    list_prefix = f"{bucket}/{prefix}" if prefix else bucket
    objects = service.storage.list(list_prefix)
    objects = fetch_s3_object_heads(service, bucket, prefix)

    for obj in objects:
        # Skip reference.bin files (internal)

@@ -9,6 +9,7 @@ from collections.abc import Callable
from pathlib import Path
from typing import Any, cast

from . import __version__
from .adapters.storage_s3 import S3StorageAdapter
from .client_delete_helpers import delete_with_delta_suffix
from .client_models import (
@@ -27,16 +28,23 @@ from .client_operations import (
    find_similar_files as _find_similar_files,
    generate_presigned_post as _generate_presigned_post,
    generate_presigned_url as _generate_presigned_url,
    get_bucket_acl as _get_bucket_acl,
    get_bucket_stats as _get_bucket_stats,
    get_object_info as _get_object_info,
    list_buckets as _list_buckets,
    put_bucket_acl as _put_bucket_acl,
    upload_batch as _upload_batch,
    upload_chunked as _upload_chunked,
)

# fmt: on
from .client_operations.stats import StatsMode

from .core import DeltaService, DeltaSpace, ObjectKey
from .core.errors import NotFoundError
from .core.models import DeleteResult
from .core.object_listing import ObjectListing, list_objects_page
from .core.s3_uri import parse_s3_url
from .response_builders import (
    build_delete_response,
    build_get_response,
@@ -62,9 +70,8 @@ class DeltaGliderClient:
        """Initialize client with service."""
        self.service = service
        self.endpoint_url = endpoint_url
        self._multipart_uploads: dict[str, Any] = {}  # Track multipart uploads
        # Session-scoped bucket statistics cache (cleared with the client lifecycle)
        self._bucket_stats_cache: dict[str, dict[bool, BucketStats]] = {}
        self._bucket_stats_cache: dict[str, dict[str, BucketStats]] = {}

    # -------------------------------------------------------------------------
    # Internal helpers
@@ -80,35 +87,45 @@ class DeltaGliderClient:
    def _store_bucket_stats_cache(
        self,
        bucket: str,
        detailed_stats: bool,
        mode: StatsMode,
        stats: BucketStats,
    ) -> None:
        """Store bucket statistics in the session cache."""
        bucket_cache = self._bucket_stats_cache.setdefault(bucket, {})
        bucket_cache[detailed_stats] = stats
        # Detailed stats are a superset of quick stats; reuse them for quick calls.
        if detailed_stats:
            bucket_cache[False] = stats
        bucket_cache[mode] = stats
        if mode == "detailed":
            bucket_cache["sampled"] = stats
            bucket_cache["quick"] = stats
        elif mode == "sampled":
            bucket_cache.setdefault("quick", stats)

    def _get_cached_bucket_stats(self, bucket: str, detailed_stats: bool) -> BucketStats | None:
        """Retrieve cached stats for a bucket, preferring detailed metrics when available."""
    def _get_cached_bucket_stats(self, bucket: str, mode: StatsMode) -> BucketStats | None:
        """Retrieve cached stats for a bucket, preferring more detailed metrics when available."""
        bucket_cache = self._bucket_stats_cache.get(bucket)
        if not bucket_cache:
            return None
        if detailed_stats:
            return bucket_cache.get(True)
        return bucket_cache.get(False) or bucket_cache.get(True)
        if mode == "detailed":
            return bucket_cache.get("detailed")
        if mode == "sampled":
            return bucket_cache.get("sampled") or bucket_cache.get("detailed")
        return (
            bucket_cache.get("quick") or bucket_cache.get("sampled") or bucket_cache.get("detailed")
        )

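An illustration of the fallback order these two cache helpers implement (assumed call pattern; stats stands for any BucketStats instance):

    client._store_bucket_stats_cache("releases", "detailed", stats)
    # "detailed" also seeds the cheaper modes, so both of these hit the cache:
    assert client._get_cached_bucket_stats("releases", "sampled") is stats
    assert client._get_cached_bucket_stats("releases", "quick") is stats
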
    def _get_cached_bucket_stats_for_listing(self, bucket: str) -> tuple[BucketStats | None, bool]:
    def _get_cached_bucket_stats_for_listing(
        self, bucket: str
    ) -> tuple[BucketStats | None, StatsMode | None]:
        """Return best cached stats for bucket listings."""
        bucket_cache = self._bucket_stats_cache.get(bucket)
        if not bucket_cache:
            return (None, False)
        if True in bucket_cache:
            return (bucket_cache[True], True)
        if False in bucket_cache:
            return (bucket_cache[False], False)
        return (None, False)
            return (None, None)
        if "detailed" in bucket_cache:
            return (bucket_cache["detailed"], "detailed")
        if "sampled" in bucket_cache:
            return (bucket_cache["sampled"], "sampled")
        if "quick" in bucket_cache:
            return (bucket_cache["quick"], "quick")
        return (None, None)

    # ============================================================================
    # Boto3-compatible APIs (matches S3 client interface)
@@ -328,34 +345,32 @@ class DeltaGliderClient:
            FetchMetadata=True  # Only fetches for delta files
        )
        """
        # Use storage adapter's list_objects method
        if hasattr(self.service.storage, "list_objects"):
            result = self.service.storage.list_objects(
        start_after = StartAfter or ContinuationToken
        try:
            listing = list_objects_page(
                self.service.storage,
                bucket=Bucket,
                prefix=Prefix,
                delimiter=Delimiter,
                max_keys=MaxKeys,
                start_after=StartAfter or ContinuationToken,  # Support both pagination methods
                start_after=start_after,
            )
        elif isinstance(self.service.storage, S3StorageAdapter):
            result = self.service.storage.list_objects(
                bucket=Bucket,
                prefix=Prefix,
                delimiter=Delimiter,
                max_keys=MaxKeys,
                start_after=StartAfter or ContinuationToken,
            )
        else:
            # Fallback
            result = {
                "objects": [],
                "common_prefixes": [],
                "is_truncated": False,
            }
        except NotImplementedError:
            if isinstance(self.service.storage, S3StorageAdapter):
                listing = list_objects_page(
                    self.service.storage,
                    bucket=Bucket,
                    prefix=Prefix,
                    delimiter=Delimiter,
                    max_keys=MaxKeys,
                    start_after=start_after,
                )
            else:
                listing = ObjectListing()

        # Convert to boto3-compatible S3Object TypedDicts (type-safe!)
        contents: list[S3Object] = []
        for obj in result.get("objects", []):
        for obj in listing.objects:
            # Skip reference.bin files (internal files, never exposed to users)
            if obj["key"].endswith("/reference.bin") or obj["key"] == "reference.bin":
                continue
@@ -403,14 +418,14 @@
                "Key": display_key,  # Use cleaned key without .delta
                "Size": obj["size"],
                "LastModified": obj.get("last_modified", ""),
                "ETag": obj.get("etag"),
                "ETag": str(obj.get("etag", "")),
                "StorageClass": obj.get("storage_class", "STANDARD"),
                "Metadata": deltaglider_metadata,
            }
            contents.append(s3_obj)

        # Build type-safe boto3-compatible CommonPrefix TypedDicts
        common_prefixes = result.get("common_prefixes", [])
        common_prefixes = listing.common_prefixes
        common_prefix_dicts: list[CommonPrefix] | None = (
            [CommonPrefix(Prefix=p) for p in common_prefixes] if common_prefixes else None
        )
@@ -425,8 +440,8 @@
                max_keys=MaxKeys,
                contents=contents,
                common_prefixes=common_prefix_dicts,
                is_truncated=result.get("is_truncated", False),
                next_continuation_token=result.get("next_continuation_token"),
                is_truncated=listing.is_truncated,
                next_continuation_token=listing.next_continuation_token,
                continuation_token=ContinuationToken,
            ),
        )
@@ -451,19 +466,17 @@

        # Build DeltaGlider-specific info
        deltaglider_info: dict[str, Any] = {
            "Type": delete_result.get("type"),
            "Deleted": delete_result.get("deleted", False),
            "Type": delete_result.type,
            "Deleted": delete_result.deleted,
        }

        # Add warnings if any
        warnings = delete_result.get("warnings")
        if warnings:
            deltaglider_info["Warnings"] = warnings
        if delete_result.warnings:
            deltaglider_info["Warnings"] = delete_result.warnings

        # Add dependent delta count for references
        dependent_deltas = delete_result.get("dependent_deltas")
        if dependent_deltas:
            deltaglider_info["DependentDeltas"] = dependent_deltas
        if delete_result.dependent_deltas:
            deltaglider_info["DependentDeltas"] = delete_result.dependent_deltas

        # Return as dict[str, Any] for public API (TypedDict is a dict at runtime!)
        response = cast(
@@ -505,21 +518,21 @@ class DeltaGliderClient:
            deleted_item = {"Key": key}
            if actual_key != key:
                deleted_item["StoredKey"] = actual_key
            if delete_result.get("type"):
                deleted_item["Type"] = delete_result["type"]
            if delete_result.get("warnings"):
                deleted_item["Warnings"] = delete_result["warnings"]
            if delete_result.type:
                deleted_item["Type"] = delete_result.type
            if delete_result.warnings:
                deleted_item["Warnings"] = delete_result.warnings

            deleted.append(deleted_item)

            # Track delta-specific info
            if delete_result.get("type") in ["delta", "reference"]:
            if delete_result.type in ("delta", "reference"):
                delta_info.append(
                    {
                        "Key": key,
                        "StoredKey": actual_key,
                        "Type": delete_result["type"],
                        "DependentDeltas": delete_result.get("dependent_deltas", 0),
                        "Type": delete_result.type,
                        "DependentDeltas": delete_result.dependent_deltas,
                    }
                )
@@ -591,22 +604,22 @@
                    continue

                try:
                    actual_key, delete_result = delete_with_delta_suffix(
                    actual_key, single_del = delete_with_delta_suffix(
                        self.service, Bucket, candidate
                    )
                    if delete_result.get("deleted"):
                    if single_del.deleted:
                        single_results.append(
                            {
                                "requested_key": candidate,
                                "actual_key": actual_key,
                                "result": delete_result,
                                "result": single_del,
                            }
                        )
                except Exception as e:
                    single_errors.append(f"Failed to delete {candidate}: {e}")

            # Use core service's delta-aware recursive delete for remaining objects
            delete_result = self.service.delete_recursive(Bucket, Prefix)
            recursive_result = self.service.delete_recursive(Bucket, Prefix)

            # Aggregate results
            single_deleted_count = len(single_results)
@@ -615,37 +628,32 @@
            single_warnings: list[str] = []

            for item in single_results:
                result = item["result"]
                dr: DeleteResult = item["result"]
                requested_key = item["requested_key"]
                actual_key = item["actual_key"]
                result_type = result.get("type", "other")
                if result_type not in single_counts:
                    result_type = "other"
                result_type = dr.type if dr.type in single_counts else "other"
                single_counts[result_type] += 1
                detail = {
                detail: dict[str, Any] = {
                    "Key": requested_key,
                    "Type": result.get("type"),
                    "DependentDeltas": result.get("dependent_deltas", 0),
                    "Warnings": result.get("warnings", []),
                    "Type": dr.type,
                    "DependentDeltas": dr.dependent_deltas,
                    "Warnings": dr.warnings,
                }
                if actual_key != requested_key:
                    detail["StoredKey"] = actual_key
                single_details.append(detail)
                warnings = result.get("warnings")
                if warnings:
                    single_warnings.extend(warnings)
                if dr.warnings:
                    single_warnings.extend(dr.warnings)

            deleted_count = cast(int, delete_result.get("deleted_count", 0)) + single_deleted_count
            failed_count = cast(int, delete_result.get("failed_count", 0)) + len(single_errors)
            deleted_count = recursive_result.deleted_count + single_deleted_count
            failed_count = recursive_result.failed_count + len(single_errors)

            deltas_deleted = cast(int, delete_result.get("deltas_deleted", 0)) + single_counts["delta"]
            references_deleted = (
                cast(int, delete_result.get("references_deleted", 0)) + single_counts["reference"]
            )
            direct_deleted = cast(int, delete_result.get("direct_deleted", 0)) + single_counts["direct"]
            other_deleted = cast(int, delete_result.get("other_deleted", 0)) + single_counts["other"]
            deltas_deleted = recursive_result.deltas_deleted + single_counts["delta"]
            references_deleted = recursive_result.references_deleted + single_counts["reference"]
            direct_deleted = recursive_result.direct_deleted + single_counts["direct"]
            other_deleted = recursive_result.other_deleted + single_counts["other"]

            response = {
            response: dict[str, Any] = {
                "ResponseMetadata": {
                    "HTTPStatusCode": 200,
                },
@@ -659,13 +667,11 @@ class DeltaGliderClient:
                },
            }

            errors = delete_result.get("errors")
            if errors:
                response["Errors"] = cast(list[str], errors)
            if recursive_result.errors:
                response["Errors"] = recursive_result.errors

            warnings = delete_result.get("warnings")
            if warnings:
                response["Warnings"] = cast(list[str], warnings)
            if recursive_result.warnings:
                response["Warnings"] = recursive_result.warnings

            if single_errors:
                errors_list = cast(list[str], response.setdefault("Errors", []))
@@ -736,14 +742,9 @@ class DeltaGliderClient:
        """
        file_path = Path(file_path)

        # Parse S3 URL
        if not s3_url.startswith("s3://"):
            raise ValueError(f"Invalid S3 URL: {s3_url}")

        s3_path = s3_url[5:].rstrip("/")
        parts = s3_path.split("/", 1)
        bucket = parts[0]
        prefix = parts[1] if len(parts) > 1 else ""
        address = parse_s3_url(s3_url, strip_trailing_slash=True)
        bucket = address.bucket
        prefix = address.key

        # Create delta space and upload
        delta_space = DeltaSpace(bucket=bucket, prefix=prefix)
@@ -776,17 +777,9 @@ class DeltaGliderClient:
        """
        output_path = Path(output_path)

        # Parse S3 URL
        if not s3_url.startswith("s3://"):
            raise ValueError(f"Invalid S3 URL: {s3_url}")

        s3_path = s3_url[5:]
        parts = s3_path.split("/", 1)
        if len(parts) < 2:
            raise ValueError(f"S3 URL must include key: {s3_url}")

        bucket = parts[0]
        key = parts[1]
        address = parse_s3_url(s3_url, allow_empty_key=False)
        bucket = address.bucket
        key = address.key

        # Auto-append .delta if the file doesn't exist without it
        # This allows users to specify the original name and we'll find the delta
@@ -812,17 +805,9 @@ class DeltaGliderClient:
        Returns:
            True if verification passed, False otherwise
        """
        # Parse S3 URL
        if not s3_url.startswith("s3://"):
            raise ValueError(f"Invalid S3 URL: {s3_url}")

        s3_path = s3_url[5:]
        parts = s3_path.split("/", 1)
        if len(parts) < 2:
            raise ValueError(f"S3 URL must include key: {s3_url}")

        bucket = parts[0]
        key = parts[1]
        address = parse_s3_url(s3_url, allow_empty_key=False)
        bucket = address.bucket
        key = address.key

        obj_key = ObjectKey(bucket=bucket, key=key)
        result = self.service.verify(obj_key)
@@ -965,39 +950,62 @@ class DeltaGliderClient:
        result: ObjectInfo = _get_object_info(self, s3_url)
        return result

    def get_bucket_stats(self, bucket: str, detailed_stats: bool = False) -> BucketStats:
        """Get statistics for a bucket with optional detailed compression metrics.
    def get_bucket_stats(
        self,
        bucket: str,
        mode: StatsMode = "quick",
        use_cache: bool = True,
        refresh_cache: bool = False,
    ) -> BucketStats:
        """Get statistics for a bucket with selectable accuracy modes and S3-based caching.

        This method provides two modes:
        - Quick stats (default): Fast overview using LIST only (~50ms)
        - Detailed stats: Accurate compression metrics with HEAD requests (slower)
        Modes:
        - ``quick``: Fast listing-only stats (delta compression approximated).
        - ``sampled``: Fetch one delta HEAD per delta-space and reuse the ratio.
        - ``detailed``: Fetch metadata for every delta object (slowest, most accurate).

        Caching:
        - Stats are cached in S3 at ``.deltaglider/stats_{mode}.json``
        - Cache is automatically validated on every call (uses LIST operation)
        - If the bucket changed, the cache is recomputed automatically
        - Use ``refresh_cache=True`` to force recomputation
        - Use ``use_cache=False`` to skip caching entirely

        Args:
            bucket: S3 bucket name
            detailed_stats: If True, fetch accurate compression ratios for delta files (default: False)
            mode: Stats mode ("quick", "sampled", or "detailed")
            use_cache: If True, use S3-cached stats when available (default: True)
            refresh_cache: If True, force cache recomputation even if valid (default: False)

        Returns:
            BucketStats with compression and space savings info

        Performance:
        - With detailed_stats=False: ~50ms for any bucket size (1 LIST call per 1000 objects)
        - With detailed_stats=True: ~2-3s per 1000 objects (adds HEAD calls for delta files only)
        - With cache hit: ~50-100ms (LIST + cache read + validation)
        - quick (no cache): ~50ms per 1000 objects (LIST only)
        - sampled (no cache): one HEAD call per delta-space (e.g., ~60 HEADs for 60 delta-spaces) plus LIST
        - detailed (no cache): ~2-3s per 1000 delta objects (LIST + HEAD per delta)

        Example:
            # Quick stats for dashboard display
            # Quick stats with caching (fast, ~100ms)
            stats = client.get_bucket_stats('releases')
            print(f"Objects: {stats.object_count}, Size: {stats.total_size}")

            # Detailed stats for analytics (slower but accurate)
            stats = client.get_bucket_stats('releases', detailed_stats=True)
            print(f"Compression ratio: {stats.average_compression_ratio:.1%}")
            # Force refresh (slow, recomputes everything)
            stats = client.get_bucket_stats('releases', refresh_cache=True)

            # Skip cache entirely
            stats = client.get_bucket_stats('releases', use_cache=False)

            # Detailed stats with caching
            stats = client.get_bucket_stats('releases', mode='detailed')
        """
        cached = self._get_cached_bucket_stats(bucket, detailed_stats)
        if cached:
            return cached
        if mode not in {"quick", "sampled", "detailed"}:
            raise ValueError(f"Unknown stats mode: {mode}")

        result: BucketStats = _get_bucket_stats(self, bucket, detailed_stats)
        self._store_bucket_stats_cache(bucket, detailed_stats, result)
        # Use S3-based caching from stats.py (replaces the old in-memory cache)
        result: BucketStats = _get_bucket_stats(
            self, bucket, mode=mode, use_cache=use_cache, refresh_cache=refresh_cache
        )
        return result

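A small illustration of the mode guard above (client construction elided; the behavior follows directly from the code):

    try:
        client.get_bucket_stats("releases", mode="fast")  # not a valid StatsMode
    except ValueError as e:
        print(e)  # Unknown stats mode: fast
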
    def generate_presigned_url(
@@ -1114,6 +1122,63 @@ class DeltaGliderClient:
        """
        return _list_buckets(self, **kwargs)

    def put_bucket_acl(
        self,
        Bucket: str,
        ACL: str | None = None,
        AccessControlPolicy: dict[str, Any] | None = None,
        GrantFullControl: str | None = None,
        GrantRead: str | None = None,
        GrantReadACP: str | None = None,
        GrantWrite: str | None = None,
        GrantWriteACP: str | None = None,
        **kwargs: Any,
    ) -> dict[str, Any]:
        """Set the ACL for an S3 bucket (boto3-compatible passthrough).

        Args:
            Bucket: Bucket name
            ACL: Canned ACL (private, public-read, public-read-write, authenticated-read)
            AccessControlPolicy: Full ACL policy dict
            GrantFullControl: Grants full control to the grantee
            GrantRead: Allows grantee to list objects in the bucket
            GrantReadACP: Allows grantee to read the bucket ACL
            GrantWrite: Allows grantee to create objects in the bucket
            GrantWriteACP: Allows grantee to write the ACL for the bucket
            **kwargs: Additional S3 parameters (for compatibility)

        Returns:
            Response dict with status
        """
        return _put_bucket_acl(
            self,
            Bucket,
            ACL=ACL,
            AccessControlPolicy=AccessControlPolicy,
            GrantFullControl=GrantFullControl,
            GrantRead=GrantRead,
            GrantReadACP=GrantReadACP,
            GrantWrite=GrantWrite,
            GrantWriteACP=GrantWriteACP,
            **kwargs,
        )

    def get_bucket_acl(
        self,
        Bucket: str,
        **kwargs: Any,
    ) -> dict[str, Any]:
        """Get the ACL for an S3 bucket (boto3-compatible passthrough).

        Args:
            Bucket: Bucket name
            **kwargs: Additional S3 parameters (for compatibility)

        Returns:
            Response dict with Owner and Grants
        """
        return _get_bucket_acl(self, Bucket, **kwargs)

    def _parse_tagging(self, tagging: str) -> dict[str, str]:
        """Parse URL-encoded tagging string to dict."""
        tags = {}
@@ -1205,6 +1270,116 @@ class DeltaGliderClient:
        self._invalidate_bucket_stats_cache()
        self.service.cache.clear()

    def rehydrate_for_download(self, Bucket: str, Key: str, ExpiresIn: int = 3600) -> str | None:
        """Rehydrate a deltaglider-compressed file for direct download.

        If the file is deltaglider-compressed, this will:
        1. Download and decompress the file
        2. Re-upload to .deltaglider/tmp/ with expiration metadata
        3. Return the new temporary file key

        If the file is not deltaglider-compressed, returns None.

        Args:
            Bucket: S3 bucket name
            Key: Object key
            ExpiresIn: How long the temporary file should exist (seconds)

        Returns:
            New key for temporary file, or None if not deltaglider-compressed

        Example:
            >>> client = create_client()
            >>> temp_key = client.rehydrate_for_download(
            ...     Bucket='my-bucket',
            ...     Key='large-file.zip.delta',
            ...     ExpiresIn=3600  # 1 hour
            ... )
            >>> if temp_key:
            ...     # Generate presigned URL for the temporary file
            ...     url = client.generate_presigned_url(
            ...         'get_object',
            ...         Params={'Bucket': 'my-bucket', 'Key': temp_key},
            ...         ExpiresIn=3600
            ...     )
        """
        return self.service.rehydrate_for_download(Bucket, Key, ExpiresIn)

    def generate_presigned_url_with_rehydration(
        self,
        Bucket: str,
        Key: str,
        ExpiresIn: int = 3600,
    ) -> str:
        """Generate a presigned URL with automatic rehydration for deltaglider files.

        This method handles both regular and deltaglider-compressed files:
        - For regular files: Returns a standard presigned URL
        - For deltaglider files: Rehydrates to temporary location and returns presigned URL

        Args:
            Bucket: S3 bucket name
            Key: Object key
            ExpiresIn: URL expiration time in seconds

        Returns:
            Presigned URL for direct download

        Example:
            >>> client = create_client()
            >>> # Works for both regular and deltaglider files
            >>> url = client.generate_presigned_url_with_rehydration(
            ...     Bucket='my-bucket',
            ...     Key='any-file.zip',  # or 'any-file.zip.delta'
            ...     ExpiresIn=3600
            ... )
            >>> print(f"Download URL: {url}")
        """
        # Try to rehydrate if it's a deltaglider file
        temp_key = self.rehydrate_for_download(Bucket, Key, ExpiresIn)

        # Use the temporary key if rehydration occurred, otherwise use original
        download_key = temp_key if temp_key else Key

        # Extract the original filename for Content-Disposition header
        original_filename = Key.removesuffix(".delta") if Key.endswith(".delta") else Key
        if "/" in original_filename:
            original_filename = original_filename.split("/")[-1]

        # Generate presigned URL with Content-Disposition to force correct filename
        params = {"Bucket": Bucket, "Key": download_key}
        if temp_key:
            # For rehydrated files, set Content-Disposition to use original filename
            params["ResponseContentDisposition"] = f'attachment; filename="{original_filename}"'

        return self.generate_presigned_url("get_object", Params=params, ExpiresIn=ExpiresIn)

    def purge_temp_files(self, Bucket: str) -> dict[str, Any]:
        """Purge expired temporary files from .deltaglider/tmp/.

        Scans the .deltaglider/tmp/ prefix and deletes any files
        whose dg-expires-at metadata indicates they have expired.

        Args:
            Bucket: S3 bucket to purge temp files from

        Returns:
            dict with purge statistics including:
            - deleted_count: Number of files deleted
            - expired_count: Number of expired files found
            - error_count: Number of errors encountered
            - total_size_freed: Total bytes freed
            - duration_seconds: Operation duration
            - errors: List of error messages

        Example:
            >>> client = create_client()
            >>> result = client.purge_temp_files(Bucket='my-bucket')
            >>> print(f"Deleted {result['deleted_count']} expired files")
            >>> print(f"Freed {result['total_size_freed']} bytes")
        """
        return self.service.purge_temp_files(Bucket)


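A hedged end-to-end sketch of the temp-file lifecycle these methods implement (bucket and key names are placeholders):

    client = create_client()
    # Rehydrates delta files transparently; a no-op for regular objects.
    url = client.generate_presigned_url_with_rehydration(
        Bucket="my-bucket", Key="release.zip.delta", ExpiresIn=3600
    )
    # Later (e.g. from a scheduled job), reclaim expired temp copies.
    result = client.purge_temp_files(Bucket="my-bucket")
    print(f"removed {result['deleted_count']} expired temp file(s)")
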
def create_client(
    endpoint_url: str | None = None,
@@ -1271,7 +1446,7 @@ def create_client(
    )

    # SECURITY: Always use ephemeral process-isolated cache
    cache_dir = Path(tempfile.mkdtemp(prefix="deltaglider-", dir="/tmp"))
    cache_dir = Path(tempfile.mkdtemp(prefix="deltaglider-"))
    # Register cleanup handler to remove cache on exit
    atexit.register(lambda: shutil.rmtree(cache_dir, ignore_errors=True))
@@ -1311,8 +1486,8 @@
    logger = StdLoggerAdapter(level=log_level)
    metrics = NoopMetricsAdapter()

    # Get default values
    tool_version = kwargs.pop("tool_version", "deltaglider/5.0.0")
    # Get default values (use real package version)
    tool_version = kwargs.pop("tool_version", f"deltaglider/{__version__}")
    max_ratio = kwargs.pop("max_ratio", 0.5)

    # Create service

@@ -2,11 +2,12 @@

from .core import DeltaService, ObjectKey
from .core.errors import NotFoundError
from .core.models import DeleteResult


def delete_with_delta_suffix(
    service: DeltaService, bucket: str, key: str
) -> tuple[str, dict[str, object]]:
) -> tuple[str, DeleteResult]:
    """Delete an object, retrying with '.delta' suffix when needed.

    Args:
@@ -15,7 +16,7 @@ def delete_with_delta_suffix(
        key: Requested key (without forcing .delta suffix).

    Returns:
        Tuple containing the actual key deleted in storage and the delete result dict.
        Tuple containing the actual key deleted in storage and the DeleteResult.

    Raises:
        NotFoundError: Propagated when both the direct and '.delta' keys are missing.

@@ -97,3 +97,4 @@ class BucketStats:
    average_compression_ratio: float
    delta_objects: int
    direct_objects: int
    object_limit_reached: bool = False

@@ -8,7 +8,7 @@ This package contains modular operation implementations:
"""

from .batch import download_batch, upload_batch, upload_chunked
from .bucket import create_bucket, delete_bucket, list_buckets
from .bucket import create_bucket, delete_bucket, get_bucket_acl, list_buckets, put_bucket_acl
from .presigned import generate_presigned_post, generate_presigned_url
from .stats import (
    estimate_compression,
@@ -21,7 +21,9 @@ __all__ = [
    # Bucket operations
    "create_bucket",
    "delete_bucket",
    "get_bucket_acl",
    "list_buckets",
    "put_bucket_acl",
    # Presigned operations
    "generate_presigned_url",
    "generate_presigned_post",
@@ -4,6 +4,8 @@ This module contains boto3-compatible bucket operations:
- create_bucket
- delete_bucket
- list_buckets
- put_bucket_acl
- get_bucket_acl
"""

from typing import Any
@@ -145,11 +147,12 @@ def list_buckets(
        bucket_data = dict(bucket_entry)
        name = bucket_data.get("Name")
        if isinstance(name, str) and name:
            cached_stats, detailed = client._get_cached_bucket_stats_for_listing(name)
            if cached_stats is not None:
            cached_stats, cached_mode = client._get_cached_bucket_stats_for_listing(name)
            if cached_stats is not None and cached_mode is not None:
                bucket_data["DeltaGliderStats"] = {
                    "Cached": True,
                    "Detailed": detailed,
                    "Mode": cached_mode,
                    "Detailed": cached_mode == "detailed",
                    "ObjectCount": cached_stats.object_count,
                    "TotalSize": cached_stats.total_size,
                    "CompressedSize": cached_stats.compressed_size,
@@ -172,3 +175,101 @@ def list_buckets(
            raise RuntimeError(f"Failed to list buckets: {e}") from e
    else:
        raise NotImplementedError("Storage adapter does not support bucket listing")


def put_bucket_acl(
    client: Any,  # DeltaGliderClient (avoiding circular import)
    Bucket: str,
    ACL: str | None = None,
    AccessControlPolicy: dict[str, Any] | None = None,
    GrantFullControl: str | None = None,
    GrantRead: str | None = None,
    GrantReadACP: str | None = None,
    GrantWrite: str | None = None,
    GrantWriteACP: str | None = None,
    **kwargs: Any,
) -> dict[str, Any]:
    """Set the ACL for an S3 bucket (boto3-compatible passthrough).

    Args:
        client: DeltaGliderClient instance
        Bucket: Bucket name
        ACL: Canned ACL (private, public-read, public-read-write, authenticated-read)
        AccessControlPolicy: Full ACL policy dict
        GrantFullControl: Grants full control to the grantee
        GrantRead: Allows grantee to list objects in the bucket
        GrantReadACP: Allows grantee to read the bucket ACL
        GrantWrite: Allows grantee to create objects in the bucket
        GrantWriteACP: Allows grantee to write the ACL for the bucket
        **kwargs: Additional S3 parameters (for compatibility)

    Returns:
        Response dict with status

    Example:
        >>> client = create_client()
        >>> client.put_bucket_acl(Bucket='my-bucket', ACL='public-read')
    """
    storage_adapter = client.service.storage

    if hasattr(storage_adapter, "client"):
        try:
            params: dict[str, Any] = {"Bucket": Bucket}
            if ACL is not None:
                params["ACL"] = ACL
            if AccessControlPolicy is not None:
                params["AccessControlPolicy"] = AccessControlPolicy
            if GrantFullControl is not None:
                params["GrantFullControl"] = GrantFullControl
            if GrantRead is not None:
                params["GrantRead"] = GrantRead
            if GrantReadACP is not None:
                params["GrantReadACP"] = GrantReadACP
            if GrantWrite is not None:
                params["GrantWrite"] = GrantWrite
            if GrantWriteACP is not None:
                params["GrantWriteACP"] = GrantWriteACP

            storage_adapter.client.put_bucket_acl(**params)
            return {
                "ResponseMetadata": {
                    "HTTPStatusCode": 200,
                },
            }
        except Exception as e:
            raise RuntimeError(f"Failed to set bucket ACL: {e}") from e
    else:
        raise NotImplementedError("Storage adapter does not support bucket ACL operations")


def get_bucket_acl(
    client: Any,  # DeltaGliderClient (avoiding circular import)
    Bucket: str,
    **kwargs: Any,
) -> dict[str, Any]:
    """Get the ACL for an S3 bucket (boto3-compatible passthrough).

    Args:
        client: DeltaGliderClient instance
        Bucket: Bucket name
        **kwargs: Additional S3 parameters (for compatibility)

    Returns:
        Response dict with Owner and Grants

    Example:
        >>> client = create_client()
        >>> response = client.get_bucket_acl(Bucket='my-bucket')
        >>> print(response['Owner'])
        >>> print(response['Grants'])
    """
    storage_adapter = client.service.storage

    if hasattr(storage_adapter, "client"):
        try:
            response: dict[str, Any] = storage_adapter.client.get_bucket_acl(Bucket=Bucket)
            return response
        except Exception as e:
            raise RuntimeError(f"Failed to get bucket ACL: {e}") from e
    else:
        raise NotImplementedError("Storage adapter does not support bucket ACL operations")

@@ -7,11 +7,521 @@ This module contains DeltaGlider-specific statistics operations:
- find_similar_files
"""

import concurrent.futures
import json
import re
from dataclasses import asdict
from datetime import UTC, datetime
from pathlib import Path
from typing import Any
from typing import Any, Literal

from ..client_models import BucketStats, CompressionEstimate, ObjectInfo
from ..core.delta_extensions import is_delta_candidate
from ..core.object_listing import list_all_objects
from ..core.s3_uri import parse_s3_url

StatsMode = Literal["quick", "sampled", "detailed"]

# Cache configuration
CACHE_VERSION = "1.0"
CACHE_PREFIX = ".deltaglider"

# Listing limits (prevent runaway scans on gigantic buckets)
QUICK_LIST_LIMIT = 60_000
SAMPLED_LIST_LIMIT = 30_000

# ============================================================================
# Internal Helper Functions
# ============================================================================


def _first_metadata_value(metadata: dict[str, Any], *keys: str) -> str | None:
    """Return the first non-empty metadata value matching the provided keys."""
    for key in keys:
        value = metadata.get(key)
        if value not in (None, ""):
            return value
    return None


def _fetch_delta_metadata(
    client: Any,
    bucket: str,
    delta_keys: list[str],
    max_timeout: int = 600,
) -> dict[str, dict[str, Any]]:
    """Fetch metadata for delta files in parallel with timeout.

    Args:
        client: DeltaGliderClient instance
        bucket: S3 bucket name
        delta_keys: List of delta file keys
        max_timeout: Maximum total timeout in seconds (default: 600 = 10 min)

    Returns:
        Dict mapping delta key -> metadata dict
    """
    metadata_map: dict[str, dict[str, Any]] = {}

    if not delta_keys:
        return metadata_map

    client.service.logger.info(
        f"Fetching metadata for {len(delta_keys)} delta files in parallel..."
    )

    def fetch_single_metadata(key: str) -> tuple[str, dict[str, Any] | None]:
        try:
            obj_head = client.service.storage.head(f"{bucket}/{key}")
            if obj_head and obj_head.metadata:
                return key, obj_head.metadata
        except Exception as e:
            client.service.logger.debug(f"Failed to fetch metadata for {key}: {e}")
        return key, None

    with concurrent.futures.ThreadPoolExecutor(max_workers=min(10, len(delta_keys))) as executor:
        futures = [executor.submit(fetch_single_metadata, key) for key in delta_keys]

        # Calculate timeout: 60s per file, capped at max_timeout
        timeout_per_file = 60
        total_timeout = min(len(delta_keys) * timeout_per_file, max_timeout)

        try:
            for future in concurrent.futures.as_completed(futures, timeout=total_timeout):
                try:
                    key, metadata = future.result(timeout=5)  # 5s per result
                    if metadata:
                        metadata_map[key] = metadata
                except concurrent.futures.TimeoutError:
                    client.service.logger.warning("Timeout fetching metadata for a delta file")
                    continue
        except concurrent.futures.TimeoutError:
            client.service.logger.warning(
                f"_fetch_delta_metadata: Timeout after {total_timeout}s. "
                f"Fetched {len(metadata_map)}/{len(delta_keys)} metadata entries. "
                f"Continuing with partial metadata..."
            )
            # Cancel remaining futures
            for future in futures:
                future.cancel()

    return metadata_map


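To make the timeout math above concrete (worked arithmetic only, from the constants in the code):

    # 20 delta keys -> min(20 * 60, 600) = 600 s (the max_timeout cap applies)
    #  5 delta keys -> min( 5 * 60, 600) = 300 s (one minute per file)
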
def _extract_deltaspace(key: str) -> str:
    """Return the delta space (prefix) for a given object key."""
    if "/" in key:
        return key.rsplit("/", 1)[0]
    return ""


def _get_cache_key(mode: StatsMode) -> str:
    """Get the S3 key for a cache file based on mode.

    Args:
        mode: Stats mode (quick, sampled, or detailed)

    Returns:
        S3 key like ".deltaglider/stats_quick.json"
    """
    return f"{CACHE_PREFIX}/stats_{mode}.json"


def _read_stats_cache(
    client: Any,
    bucket: str,
    mode: StatsMode,
) -> tuple[BucketStats | None, dict[str, Any] | None]:
    """Read cached stats from S3 if available.

    Args:
        client: DeltaGliderClient instance
        bucket: S3 bucket name
        mode: Stats mode to read cache for

    Returns:
        Tuple of (BucketStats | None, validation_data | None)
        Returns (None, None) if cache doesn't exist or is invalid
    """
    cache_key = _get_cache_key(mode)

    try:
        # Try to read cache file from S3
        obj = client.service.storage.get(f"{bucket}/{cache_key}")
        if not obj or not obj.data:
            return None, None

        # Parse JSON
        cache_data = json.loads(obj.data.decode("utf-8"))

        # Validate version
        if cache_data.get("version") != CACHE_VERSION:
            client.service.logger.warning(
                f"Cache version mismatch: expected {CACHE_VERSION}, got {cache_data.get('version')}"
            )
            return None, None

        # Validate mode
        if cache_data.get("mode") != mode:
            client.service.logger.warning(
                f"Cache mode mismatch: expected {mode}, got {cache_data.get('mode')}"
            )
            return None, None

        # Extract stats and validation data
        stats_dict = cache_data.get("stats")
        validation_data = cache_data.get("validation")

        if not stats_dict or not validation_data:
            client.service.logger.warning("Cache missing stats or validation data")
            return None, None

        # Reconstruct BucketStats from dict
        stats = BucketStats(**stats_dict)

        client.service.logger.debug(
            f"Successfully read cache for {bucket} (mode={mode}, "
            f"computed_at={cache_data.get('computed_at')})"
        )

        return stats, validation_data

    except FileNotFoundError:
        # Cache doesn't exist yet - this is normal
        client.service.logger.debug(f"No cache found for {bucket} (mode={mode})")
        return None, None
    except json.JSONDecodeError as e:
        client.service.logger.warning(f"Invalid JSON in cache file: {e}")
        return None, None
    except Exception as e:
        client.service.logger.warning(f"Error reading cache: {e}")
        return None, None


def _write_stats_cache(
    client: Any,
    bucket: str,
    mode: StatsMode,
    stats: BucketStats,
    object_count: int,
    compressed_size: int,
) -> None:
    """Write computed stats to S3 cache.

    Args:
        client: DeltaGliderClient instance
        bucket: S3 bucket name
        mode: Stats mode being cached
        stats: Computed BucketStats to cache
        object_count: Current object count (for validation)
        compressed_size: Current compressed size (for validation)
    """
    cache_key = _get_cache_key(mode)

    try:
        # Build cache structure
        cache_data = {
            "version": CACHE_VERSION,
            "mode": mode,
            "computed_at": datetime.now(UTC).isoformat(),
            "validation": {
                "object_count": object_count,
                "compressed_size": compressed_size,
            },
            "stats": asdict(stats),
        }

        # Serialize to JSON
        cache_json = json.dumps(cache_data, indent=2)

        # Write to S3
        client.service.storage.put(
            address=f"{bucket}/{cache_key}",
            data=cache_json.encode("utf-8"),
            metadata={
                "content-type": "application/json",
                "x-deltaglider-cache": "true",
            },
        )

        client.service.logger.info(
            f"Wrote cache for {bucket} (mode={mode}, {len(cache_json)} bytes)"
        )

    except Exception as e:
        # Log warning but don't fail - caching is optional
        client.service.logger.warning(f"Failed to write cache (non-fatal): {e}")


def _is_cache_valid(
    cached_validation: dict[str, Any],
    current_object_count: int,
    current_compressed_size: int,
) -> bool:
    """Check if cached stats are still valid based on bucket state.

    Validation strategy: Compare object count and total compressed size.
    If either changed, the cache is stale.

    Args:
        cached_validation: Validation data from cache
        current_object_count: Current object count from LIST
        current_compressed_size: Current compressed size from LIST

    Returns:
        True if cache is still valid, False if stale
    """
    cached_count = cached_validation.get("object_count")
    cached_size = cached_validation.get("compressed_size")

    if cached_count != current_object_count:
        return False

    if cached_size != current_compressed_size:
        return False

    return True


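For reference, a sketch of the cache document _write_stats_cache produces and _is_cache_valid consumes (field values are illustrative; the "stats" payload is the asdict(BucketStats(...)) fields, omitted here):

    example_cache = {
        "version": "1.0",
        "mode": "quick",
        "computed_at": "2025-01-01T12:00:00+00:00",
        "validation": {"object_count": 1523, "compressed_size": 104_857_600},
        "stats": {},  # asdict(BucketStats(...)) omitted for brevity
    }
    # A fresh LIST returning the same count and size keeps the cache:
    assert _is_cache_valid(example_cache["validation"], 1523, 104_857_600)
    # Any drift in either value invalidates it:
    assert not _is_cache_valid(example_cache["validation"], 1524, 104_857_600)
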
def _build_object_info_list(
|
||||
raw_objects: list[dict[str, Any]],
|
||||
metadata_map: dict[str, dict[str, Any]],
|
||||
logger: Any,
|
||||
sampled_space_metadata: dict[str, dict[str, Any]] | None = None,
|
||||
) -> list[ObjectInfo]:
|
||||
"""Build ObjectInfo list from raw objects and metadata.
|
||||
|
||||
Args:
|
||||
raw_objects: List of raw object dicts from S3 LIST
|
||||
metadata_map: Dict of key -> metadata for delta files
|
||||
logger: Logger instance
|
||||
|
||||
Returns:
|
||||
List of ObjectInfo objects
|
||||
"""
|
||||
all_objects = []
|
||||
|
||||
for obj_dict in raw_objects:
|
||||
key = obj_dict["key"]
|
||||
size = obj_dict["size"]
|
||||
is_delta = key.endswith(".delta")
|
||||
|
||||
deltaspace = _extract_deltaspace(key)
|
||||
|
||||
# Get metadata from map (empty dict if not present)
|
||||
metadata = metadata_map.get(key)
|
||||
if metadata is None and sampled_space_metadata and deltaspace in sampled_space_metadata:
|
||||
metadata = sampled_space_metadata[deltaspace]
|
||||
if metadata is None:
|
||||
metadata = {}
|
||||
|
||||
# Parse compression ratio and original size
|
||||
compression_ratio = 0.0
|
||||
# For delta files without metadata, set original_size to None to indicate unknown
|
||||
# This prevents nonsensical stats like "693 bytes compressed to 82MB"
|
||||
original_size = None if is_delta else size
|
||||
|
||||
if is_delta and metadata:
|
||||
try:
|
||||
ratio_str = metadata.get("compression_ratio", "0.0")
|
||||
compression_ratio = float(ratio_str) if ratio_str != "unknown" else 0.0
|
||||
except (ValueError, TypeError):
|
||||
compression_ratio = 0.0
|
||||
|
||||
try:
|
||||
original_size_raw = _first_metadata_value(
|
||||
metadata,
|
||||
"dg-file-size",
|
||||
"dg_file_size",
|
||||
"file_size",
|
||||
"file-size",
|
||||
"deltaglider-original-size",
|
||||
)
|
||||
if original_size_raw is not None:
|
||||
original_size = int(original_size_raw)
|
||||
logger.debug(f"Delta {key}: using original_size={original_size} from metadata")
|
||||
else:
|
||||
logger.warning(
|
||||
f"Delta {key}: metadata missing file size. Available keys: {list(metadata.keys())}. Using None as original_size (unknown)"
|
||||
)
|
||||
original_size = None
|
||||
except (ValueError, TypeError) as e:
|
||||
logger.warning(
|
||||
f"Delta {key}: failed to parse file size from metadata: {e}. Using None as original_size (unknown)"
|
||||
)
|
||||
original_size = None
|
||||
|
||||
all_objects.append(
|
||||
ObjectInfo(
|
||||
key=key,
|
||||
size=size,
|
||||
last_modified=obj_dict.get("last_modified", ""),
|
||||
etag=obj_dict.get("etag"),
|
||||
storage_class=obj_dict.get("storage_class", "STANDARD"),
|
||||
original_size=original_size,
|
||||
compressed_size=size,
|
||||
is_delta=is_delta,
|
||||
compression_ratio=compression_ratio,
|
||||
reference_key=_first_metadata_value(
|
||||
metadata,
|
||||
"dg-ref-key",
|
||||
"dg_ref_key",
|
||||
"ref_key",
|
||||
"ref-key",
|
||||
),
|
||||
)
|
||||
)
|
||||
|
||||
return all_objects


def _calculate_bucket_statistics(
    all_objects: list[ObjectInfo],
    bucket: str,
    logger: Any,
    mode: StatsMode = "quick",
) -> BucketStats:
    """Calculate statistics from ObjectInfo list.

    Args:
        all_objects: List of ObjectInfo objects
        bucket: Bucket name for stats
        logger: Logger instance
        mode: Stats mode (quick, sampled, or detailed) - controls warning behavior

    Returns:
        BucketStats object
    """
    total_original_size = 0
    total_compressed_size = 0
    delta_count = 0
    direct_count = 0
    reference_files = {}  # deltaspace -> size

    # First pass: identify object types and reference files
    for obj in all_objects:
        if obj.key.endswith("/reference.bin") or obj.key == "reference.bin":
            deltaspace = obj.key.rsplit("/reference.bin", 1)[0] if "/" in obj.key else ""
            reference_files[deltaspace] = obj.size
        elif obj.is_delta:
            delta_count += 1
        else:
            direct_count += 1

    # Second pass: calculate sizes
    for obj in all_objects:
        # Skip reference.bin (handled separately)
        if obj.key.endswith("/reference.bin") or obj.key == "reference.bin":
            continue

        if obj.is_delta:
            # Delta: use original_size if available
            if obj.original_size is not None:
                logger.debug(f"Delta {obj.key}: using original_size={obj.original_size}")
                total_original_size += obj.original_size
            else:
                # original_size is None - metadata not available
                # In quick mode, this is expected (no HEAD requests)
                # In sampled/detailed mode, this means metadata is genuinely missing
                if mode != "quick":
                    logger.warning(
                        f"Delta {obj.key}: no original_size metadata available. "
                        f"Cannot calculate original size without metadata. "
                        f"Use --detailed mode for accurate stats."
                    )
                # Don't add anything to total_original_size for deltas without metadata
                # This prevents nonsensical stats
            total_compressed_size += obj.size
        else:
            # Direct files: original = compressed
            total_original_size += obj.size
            total_compressed_size += obj.size

    # Handle reference.bin files
    total_reference_size = sum(reference_files.values())

    if delta_count > 0 and total_reference_size > 0:
        total_compressed_size += total_reference_size
        logger.info(
            f"Including {len(reference_files)} reference.bin file(s) "
            f"({total_reference_size:,} bytes) in compressed size"
        )
    elif delta_count == 0 and total_reference_size > 0:
        _log_orphaned_references(bucket, reference_files, total_reference_size, logger)

    # Calculate final metrics
    # If we couldn't calculate original size (quick mode with deltas), set space_saved to 0
    # to avoid nonsensical negative numbers
    if total_original_size == 0 and total_compressed_size > 0:
        space_saved = 0
        avg_ratio = 0.0
    else:
        raw_space_saved = total_original_size - total_compressed_size
        space_saved = raw_space_saved if raw_space_saved > 0 else 0
        avg_ratio = (space_saved / total_original_size) if total_original_size > 0 else 0.0
        if avg_ratio < 0:
            avg_ratio = 0.0
        elif avg_ratio > 1:
            avg_ratio = 1.0

    # Warn if quick mode with delta files (stats will be incomplete)
    if mode == "quick" and delta_count > 0 and total_original_size == 0:
        logger.warning(
            f"Quick mode cannot calculate original size for delta files (no metadata fetched). "
            f"Stats show {delta_count} delta file(s) with unknown original size. "
            f"Use --detailed for accurate compression metrics."
        )

    return BucketStats(
        bucket=bucket,
        object_count=delta_count + direct_count,
        total_size=total_original_size,
        compressed_size=total_compressed_size,
        space_saved=space_saved,
        average_compression_ratio=avg_ratio,
        delta_objects=delta_count,
        direct_objects=direct_count,
    )
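
The clamping above is easiest to see with concrete numbers; a minimal sketch with made-up sizes (not from any real bucket):

# Hypothetical bucket: one 100 MB direct file plus deltas whose originals
# were 200 MB total, stored as 6 MB of deltas + a 90 MB reference.bin.
total_original_size = 100 + 200        # MB: deltas counted at original size
total_compressed_size = 100 + 6 + 90   # MB: reference.bin counted once

space_saved = max(total_original_size - total_compressed_size, 0)   # 104 MB
avg_ratio = min(max(space_saved / total_original_size, 0.0), 1.0)   # ~0.35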


def _log_orphaned_references(
    bucket: str,
    reference_files: dict[str, int],
    total_reference_size: int,
    logger: Any,
) -> None:
    """Log warning about orphaned reference.bin files.

    Args:
        bucket: Bucket name
        reference_files: Dict of deltaspace -> size
        total_reference_size: Total size of all reference files
        logger: Logger instance
    """
    waste_mb = total_reference_size / 1024 / 1024
    logger.warning(
        f"\n{'=' * 60}\n"
        f"WARNING: ORPHANED REFERENCE FILE(S) DETECTED!\n"
        f"{'=' * 60}\n"
        f"Found {len(reference_files)} reference.bin file(s) totaling "
        f"{total_reference_size:,} bytes ({waste_mb:.2f} MB)\n"
        f"but NO delta files are using them.\n"
        f"\n"
        f"This wastes {waste_mb:.2f} MB of storage!\n"
        f"\n"
        f"Orphaned reference files:\n"
    )

    for deltaspace, size in reference_files.items():
        path = f"{deltaspace}/reference.bin" if deltaspace else "reference.bin"
        logger.warning(f"  - s3://{bucket}/{path} ({size:,} bytes)")

    logger.warning("\nConsider removing these orphaned files:\n")
    for deltaspace in reference_files:
        path = f"{deltaspace}/reference.bin" if deltaspace else "reference.bin"
        logger.warning(f"  aws s3 rm s3://{bucket}/{path}")

    logger.warning(f"{'=' * 60}")


def get_object_info(
@@ -27,14 +537,9 @@ def get_object_info(
    Returns:
        ObjectInfo with detailed metadata
    """
    # Parse URL
    if not s3_url.startswith("s3://"):
        raise ValueError(f"Invalid S3 URL: {s3_url}")

    s3_path = s3_url[5:]
    parts = s3_path.split("/", 1)
    bucket = parts[0]
    key = parts[1] if len(parts) > 1 else ""
    address = parse_s3_url(s3_url, allow_empty_key=False)
    bucket = address.bucket
    key = address.key

    # Get object metadata
    obj_head = client.service.storage.head(f"{bucket}/{key}")
@@ -60,116 +565,290 @@ def get_object_info(
def get_bucket_stats(
    client: Any,  # DeltaGliderClient
    bucket: str,
    detailed_stats: bool = False,
    mode: StatsMode = "quick",
    use_cache: bool = True,
    refresh_cache: bool = False,
) -> BucketStats:
    """Get statistics for a bucket with optional detailed compression metrics.
    """Get statistics for a bucket with configurable metadata strategies and caching.

    This method provides two modes:
    - Quick stats (default): Fast overview using LIST only (~50ms)
    - Detailed stats: Accurate compression metrics with HEAD requests (slower)
    Modes:
    - ``quick`` (default): Stream LIST results only. Compression metrics for delta files are
      approximate (falls back to delta size when metadata is unavailable).
    - ``sampled``: Fetch HEAD metadata for a single delta per delta-space and reuse the ratios for
      other deltas in the same space. Balances accuracy and speed.
    - ``detailed``: Fetch HEAD metadata for every delta object for the most accurate statistics.

    Caching:
    - Stats are cached per mode in ``.deltaglider/stats_{mode}.json``
    - Cache is validated using object count and compressed size from LIST
    - If bucket changed, cache is recomputed automatically
    - Use ``refresh_cache=True`` to force recomputation
    - Use ``use_cache=False`` to skip caching entirely

    **Robustness**: This function is designed to always return valid stats:
    - Returns partial stats if timeouts or pagination issues occur
    - Returns empty stats (zeros) if bucket listing completely fails
    - Never hangs indefinitely (max 10 min timeout, 10M object limit)

    Args:
        client: DeltaGliderClient instance
        bucket: S3 bucket name
        detailed_stats: If True, fetch accurate compression ratios for delta files (default: False)
        mode: Stats mode ("quick", "sampled", or "detailed")
        use_cache: If True, use cached stats when available (default: True)
        refresh_cache: If True, force cache recomputation even if valid (default: False)

    Returns:
        BucketStats with compression and space savings info
        BucketStats with compression and space savings info. Always returns a valid BucketStats
        object, even if errors occur (will return empty/partial stats with warnings logged).

    Raises:
        RuntimeError: Only if bucket listing fails immediately with no objects collected.
            All other errors result in partial/empty stats being returned.

    Performance:
        - With detailed_stats=False: ~50ms for any bucket size (1 LIST call per 1000 objects)
        - With detailed_stats=True: ~2-3s per 1000 objects (adds HEAD calls for delta files only)
        - With cache hit: ~50-100ms (LIST + cache read + validation)
        - quick (no cache): ~50ms for any bucket size (LIST calls only)
        - sampled (no cache): LIST + one HEAD per delta-space
        - detailed (no cache): LIST + HEAD for every delta (slowest but accurate)
        - Max timeout: 10 minutes (prevents indefinite hangs)
        - Max objects: 10M (prevents infinite loops)

    Example:
        # Quick stats for dashboard display
        # Use cached stats (fast, ~100ms)
        stats = client.get_bucket_stats('releases')
        print(f"Objects: {stats.object_count}, Size: {stats.total_size}")

        # Detailed stats for analytics (slower but accurate)
        stats = client.get_bucket_stats('releases', detailed_stats=True)
        print(f"Compression ratio: {stats.average_compression_ratio:.1%}")
        # Force refresh (slow, recomputes everything)
        stats = client.get_bucket_stats('releases', refresh_cache=True)

        # Skip cache entirely
        stats = client.get_bucket_stats('releases', use_cache=False)

        # Different modes with caching
        stats_sampled = client.get_bucket_stats('releases', mode='sampled')
        stats_detailed = client.get_bucket_stats('releases', mode='detailed')
    """
    # List all objects with smart metadata fetching
    all_objects = []
    continuation_token = None
    try:
        if mode not in {"quick", "sampled", "detailed"}:
            raise ValueError(f"Unknown stats mode: {mode}")

        while True:
            response = client.list_objects(
                Bucket=bucket,
                MaxKeys=1000,
                ContinuationToken=continuation_token,
                FetchMetadata=detailed_stats,  # Only fetch metadata if detailed stats requested
        # Phase 1: Always do a quick LIST to get current state (needed for validation)
        import time

        phase1_start = time.time()
        client.service.logger.info(
            f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 1: Starting LIST operation for bucket '{bucket}'"
        )

            # Extract S3Objects from response (with Metadata containing DeltaGlider info)
            for obj_dict in response["Contents"]:
                # Convert dict back to ObjectInfo for backward compatibility with stats calculation
                metadata = obj_dict.get("Metadata", {})
                # Parse compression ratio safely (handle "unknown" value)
                compression_ratio_str = metadata.get("deltaglider-compression-ratio", "0.0")
                try:
                    compression_ratio = (
                        float(compression_ratio_str) if compression_ratio_str != "unknown" else 0.0
                    )
                except ValueError:
                    compression_ratio = 0.0
        list_cap = QUICK_LIST_LIMIT if mode == "quick" else SAMPLED_LIST_LIMIT
        listing = list_all_objects(
            client.service.storage,
            bucket=bucket,
            max_keys=1000,
            logger=client.service.logger,
            max_objects=list_cap,
        )
        raw_objects = listing.objects

                all_objects.append(
                    ObjectInfo(
                        key=obj_dict["Key"],
                        size=obj_dict["Size"],
                        last_modified=obj_dict.get("LastModified", ""),
                        etag=obj_dict.get("ETag"),
                        storage_class=obj_dict.get("StorageClass", "STANDARD"),
                        original_size=int(metadata.get("deltaglider-original-size", obj_dict["Size"])),
                        compressed_size=obj_dict["Size"],
                        is_delta=metadata.get("deltaglider-is-delta", "false") == "true",
                        compression_ratio=compression_ratio,
                        reference_key=metadata.get("deltaglider-reference-key"),
                    )
        # Calculate validation metrics from LIST
        current_object_count = len(raw_objects)
        current_compressed_size = sum(obj["size"] for obj in raw_objects)
        limit_reached = listing.limit_reached or listing.is_truncated
        if limit_reached:
            client.service.logger.info(
                f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 1: Listing capped at {list_cap} objects (bucket likely larger)."
            )

            if not response.get("IsTruncated"):
                break
        phase1_duration = time.time() - phase1_start
        client.service.logger.info(
            f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 1: LIST completed in {phase1_duration:.2f}s - "
            f"Found {current_object_count} objects, {current_compressed_size:,} bytes total"
        )

            continuation_token = response.get("NextContinuationToken")
        # Phase 2: Try to use cache if enabled and not forcing refresh
        phase2_start = time.time()
        if use_cache and not refresh_cache:
            client.service.logger.info(
                f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 2: Checking cache for mode '{mode}'"
            )
            cached_stats, cached_validation = _read_stats_cache(client, bucket, mode)

    # Calculate statistics
    total_size = 0
    compressed_size = 0
    delta_count = 0
    direct_count = 0

    for obj in all_objects:
        # Skip reference.bin files - they are internal implementation details
        # and their size is already accounted for in delta metadata
        if obj.key.endswith("/reference.bin") or obj.key == "reference.bin":
            continue

        compressed_size += obj.size

        if obj.is_delta:
            delta_count += 1
            # Use actual original size if we have it, otherwise estimate
            total_size += obj.original_size or obj.size
            if cached_stats and cached_validation:
                # Validate cache against current bucket state
                if _is_cache_valid(
                    cached_validation, current_object_count, current_compressed_size
                ):
                    phase2_duration = time.time() - phase2_start
                    client.service.logger.info(
                        f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 2: Cache HIT in {phase2_duration:.2f}s - "
                        f"Using cached stats for {bucket} (mode={mode}, bucket unchanged)"
                    )
                    return cached_stats
                else:
                    phase2_duration = time.time() - phase2_start
                    client.service.logger.info(
                        f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 2: Cache INVALID in {phase2_duration:.2f}s - "
                        f"Bucket changed: count {cached_validation.get('object_count')} → {current_object_count}, "
                        f"size {cached_validation.get('compressed_size')} → {current_compressed_size}"
                    )
            else:
                phase2_duration = time.time() - phase2_start
                client.service.logger.info(
                    f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 2: Cache MISS in {phase2_duration:.2f}s - "
                    f"No valid cache found"
                )
        else:
            direct_count += 1
            # For non-delta files, original equals compressed
            total_size += obj.size
        if refresh_cache:
            client.service.logger.info(
                f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 2: Cache SKIPPED (refresh requested)"
            )
        elif not use_cache:
            client.service.logger.info(
                f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 2: Cache DISABLED"
            )

    space_saved = total_size - compressed_size
    avg_ratio = (space_saved / total_size) if total_size > 0 else 0.0
        # Phase 3: Cache miss or invalid - compute stats from scratch
        client.service.logger.info(
            f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 3: Computing stats (mode={mode})"
        )

    return BucketStats(
        bucket=bucket,
        object_count=len(all_objects),
        total_size=total_size,
        compressed_size=compressed_size,
        space_saved=space_saved,
        average_compression_ratio=avg_ratio,
        delta_objects=delta_count,
        direct_objects=direct_count,
    )
        # Phase 4: Extract delta keys for metadata fetching
        phase4_start = time.time()
        delta_keys = [obj["key"] for obj in raw_objects if obj["key"].endswith(".delta")]
        phase4_duration = time.time() - phase4_start

        client.service.logger.info(
            f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 4: Delta extraction completed in {phase4_duration:.3f}s - "
            f"Found {len(delta_keys)} delta files"
        )

        # Phase 5: Fetch metadata for delta files based on mode
        phase5_start = time.time()
        metadata_map: dict[str, dict[str, Any]] = {}
        sampled_space_metadata: dict[str, dict[str, Any]] | None = None

        if delta_keys:
            if mode == "detailed":
                client.service.logger.info(
                    f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 5: Fetching metadata for ALL {len(delta_keys)} delta files"
                )
                metadata_map = _fetch_delta_metadata(client, bucket, delta_keys)

            elif mode == "sampled":
                # Sample one delta per deltaspace
                seen_spaces: set[str] = set()
                sampled_keys: list[str] = []
                for key in delta_keys:
                    space = _extract_deltaspace(key)
                    if space not in seen_spaces:
                        seen_spaces.add(space)
                        sampled_keys.append(key)

                client.service.logger.info(
                    f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 5: Sampling {len(sampled_keys)} delta files "
                    f"(one per deltaspace) out of {len(delta_keys)} total delta files"
                )

                # Log which files are being sampled
                if sampled_keys:
                    for idx, key in enumerate(sampled_keys[:10], 1):  # Show first 10
                        space = _extract_deltaspace(key)
                        client.service.logger.info(
                            f"  [{idx}] Sampling: {key} (deltaspace: '{space or '(root)'}')"
                        )
                    if len(sampled_keys) > 10:
                        client.service.logger.info(f"  ... and {len(sampled_keys) - 10} more")

                if sampled_keys:
                    metadata_map = _fetch_delta_metadata(client, bucket, sampled_keys)
                    sampled_space_metadata = {
                        _extract_deltaspace(k): metadata for k, metadata in metadata_map.items()
                    }

        phase5_duration = time.time() - phase5_start
        if mode == "quick":
            client.service.logger.info(
                f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 5: Skipped metadata fetching (quick mode) in {phase5_duration:.3f}s"
            )
        else:
            client.service.logger.info(
                f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 5: Metadata fetching completed in {phase5_duration:.2f}s - "
                f"Fetched {len(metadata_map)} metadata records"
            )

        # Phase 6: Build ObjectInfo list
        phase6_start = time.time()
        all_objects = _build_object_info_list(
            raw_objects,
            metadata_map,
            client.service.logger,
            sampled_space_metadata,
        )
        phase6_duration = time.time() - phase6_start
        client.service.logger.info(
            f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 6: ObjectInfo list built in {phase6_duration:.3f}s - "
            f"{len(all_objects)} objects processed"
        )

        # Phase 7: Calculate final statistics
        phase7_start = time.time()
        stats = _calculate_bucket_statistics(all_objects, bucket, client.service.logger, mode)
        phase7_duration = time.time() - phase7_start
        client.service.logger.info(
            f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 7: Statistics calculated in {phase7_duration:.3f}s - "
            f"{stats.delta_objects} delta, {stats.direct_objects} direct objects"
        )

        # Phase 8: Write cache if enabled
        phase8_start = time.time()
        if use_cache:
            _write_stats_cache(
                client=client,
                bucket=bucket,
                mode=mode,
                stats=stats,
                object_count=current_object_count,
                compressed_size=current_compressed_size,
            )
            phase8_duration = time.time() - phase8_start
            client.service.logger.info(
                f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 8: Cache written in {phase8_duration:.3f}s"
            )
        else:
            client.service.logger.info(
                f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] Phase 8: Cache write skipped (caching disabled)"
            )

        # Summary
        total_duration = time.time() - phase1_start
        client.service.logger.info(
            f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] COMPLETE: Total time {total_duration:.2f}s for bucket '{bucket}' (mode={mode})"
        )

        stats.object_limit_reached = limit_reached
        return stats

    except Exception as e:
        # Last resort: return empty stats with error indication
        client.service.logger.error(
            f"get_bucket_stats: Failed to build statistics for '{bucket}': {e}. "
            f"Returning empty stats."
        )
        return BucketStats(
            bucket=bucket,
            object_count=0,
            total_size=0,
            compressed_size=0,
            space_saved=0,
            average_compression_ratio=0.0,
            delta_objects=0,
            direct_objects=0,
            object_limit_reached=False,
        )
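
_read_stats_cache, _is_cache_valid, and _write_stats_cache fall outside this hunk. A minimal sketch of what the validation predicate likely checks, based only on how it is called above and on the docstring's "validated using object count and compressed size" rule (hypothetical, not the repository's actual helper):

from typing import Any

def _is_cache_valid(
    cached_validation: dict[str, Any],
    current_object_count: int,
    current_compressed_size: int,
) -> bool:
    # The cache is reusable only if the bucket looks unchanged since it was
    # written: same object count and same total compressed size as the fresh LIST.
    return (
        cached_validation.get("object_count") == current_object_count
        and cached_validation.get("compressed_size") == current_compressed_size
    )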


# ============================================================================
# Public API Functions
# ============================================================================


def estimate_compression(
@@ -194,30 +873,8 @@ def estimate_compression(
    file_path = Path(file_path)
    file_size = file_path.stat().st_size

    # Check file extension
    filename = file_path.name
    ext = file_path.suffix.lower()
    delta_extensions = {
        ".zip",
        ".tar",
        ".gz",
        ".tar.gz",
        ".tgz",
        ".bz2",
        ".tar.bz2",
        ".xz",
        ".tar.xz",
        ".7z",
        ".rar",
        ".dmg",
        ".iso",
        ".pkg",
        ".deb",
        ".rpm",
        ".apk",
        ".jar",
        ".war",
        ".ear",
    }

    # Already compressed formats that won't benefit from delta
    incompressible = {".jpg", ".jpeg", ".png", ".mp4", ".mp3", ".avi", ".mov"}
@@ -231,7 +888,7 @@ def estimate_compression(
            should_use_delta=False,
        )

    if ext not in delta_extensions:
    if not is_delta_candidate(filename):
        # Unknown type, conservative estimate
        return CompressionEstimate(
            original_size=file_size,

@@ -1,5 +1,10 @@
"""Core domain for DeltaGlider."""

from .delta_extensions import (
    DEFAULT_COMPOUND_DELTA_EXTENSIONS,
    DEFAULT_DELTA_EXTENSIONS,
    is_delta_candidate,
)
from .errors import (
    DeltaGliderError,
    DiffDecodeError,
@@ -11,14 +16,17 @@ from .errors import (
    StorageIOError,
)
from .models import (
    DeleteResult,
    DeltaMeta,
    DeltaSpace,
    ObjectKey,
    PutSummary,
    RecursiveDeleteResult,
    ReferenceMeta,
    Sha256,
    VerifyResult,
)
from .s3_uri import S3Url, build_s3_url, is_s3_url, parse_s3_url
from .service import DeltaService

__all__ = [
@@ -30,12 +38,21 @@ __all__ = [
    "DiffDecodeError",
    "StorageIOError",
    "PolicyViolationWarning",
    "DeleteResult",
    "DeltaSpace",
    "ObjectKey",
    "RecursiveDeleteResult",
    "Sha256",
    "DeltaMeta",
    "ReferenceMeta",
    "PutSummary",
    "VerifyResult",
    "DeltaService",
    "DEFAULT_DELTA_EXTENSIONS",
    "DEFAULT_COMPOUND_DELTA_EXTENSIONS",
    "is_delta_candidate",
    "S3Url",
    "build_s3_url",
    "is_s3_url",
    "parse_s3_url",
]

53
src/deltaglider/core/config.py
Normal file
@@ -0,0 +1,53 @@
"""Centralized configuration for DeltaGlider."""

import os
from dataclasses import dataclass, field


@dataclass(slots=True)
class DeltaGliderConfig:
    """All DeltaGlider configuration in one place.

    Environment variables (all optional):
        DG_MAX_RATIO: Max delta/file ratio before falling back to direct storage.
            Range 0.0-1.0, default 0.5.
        DG_LOG_LEVEL: Logging level. Default "INFO".
        DG_CACHE_BACKEND: "filesystem" (default) or "memory".
        DG_CACHE_MEMORY_SIZE_MB: Memory cache size in MB. Default 100.
        DG_METRICS: Metrics backend: "noop", "logging" (default), "cloudwatch".
        DG_METRICS_NAMESPACE: CloudWatch namespace. Default "DeltaGlider".
    """

    max_ratio: float = 0.5
    log_level: str = "INFO"
    cache_backend: str = "filesystem"
    cache_memory_size_mb: int = 100
    metrics_type: str = "logging"
    metrics_namespace: str = "DeltaGlider"

    # Connection params (typically passed by CLI, not env vars)
    endpoint_url: str | None = field(default=None, repr=False)
    region: str | None = None
    profile: str | None = None

    @classmethod
    def from_env(
        cls,
        *,
        log_level: str = "INFO",
        endpoint_url: str | None = None,
        region: str | None = None,
        profile: str | None = None,
    ) -> "DeltaGliderConfig":
        """Build config from environment variables + explicit overrides."""
        return cls(
            max_ratio=float(os.environ.get("DG_MAX_RATIO", "0.5")),
            log_level=os.environ.get("DG_LOG_LEVEL", log_level),
            cache_backend=os.environ.get("DG_CACHE_BACKEND", "filesystem"),
            cache_memory_size_mb=int(os.environ.get("DG_CACHE_MEMORY_SIZE_MB", "100")),
            metrics_type=os.environ.get("DG_METRICS", "logging"),
            metrics_namespace=os.environ.get("DG_METRICS_NAMESPACE", "DeltaGlider"),
            endpoint_url=endpoint_url,
            region=region,
            profile=profile,
        )
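
Note the precedence implied by from_env: an environment variable always wins over the corresponding keyword argument, which only supplies the fallback. A small usage sketch (the environment values here are made up):

import os

os.environ["DG_MAX_RATIO"] = "0.3"        # prefer direct storage sooner
os.environ["DG_CACHE_BACKEND"] = "memory"

config = DeltaGliderConfig.from_env(log_level="DEBUG", region="eu-west-1")
assert config.max_ratio == 0.3
assert config.cache_backend == "memory"
# DG_LOG_LEVEL is unset, so the explicit fallback "DEBUG" is used.
assert config.log_level == "DEBUG"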

56
src/deltaglider/core/delta_extensions.py
Normal file
@@ -0,0 +1,56 @@
"""Shared delta compression extension policy."""

from __future__ import annotations

from collections.abc import Collection, Iterable

# Compound extensions must be checked before simple suffix matching so that
# multi-part archives like ".tar.gz" are handled correctly.
DEFAULT_COMPOUND_DELTA_EXTENSIONS: tuple[str, ...] = (".tar.gz", ".tar.bz2", ".tar.xz")

# Simple extensions that benefit from delta compression. Keep this structure
# immutable so it can be safely reused across modules.
DEFAULT_DELTA_EXTENSIONS: frozenset[str] = frozenset(
    {
        ".zip",
        ".tar",
        ".gz",
        ".tgz",
        ".bz2",
        ".xz",
        ".7z",
        ".rar",
        ".dmg",
        ".iso",
        ".pkg",
        ".deb",
        ".rpm",
        ".apk",
        ".jar",
        ".war",
        ".ear",
    }
)


def is_delta_candidate(
    filename: str,
    *,
    simple_extensions: Collection[str] = DEFAULT_DELTA_EXTENSIONS,
    compound_extensions: Iterable[str] = DEFAULT_COMPOUND_DELTA_EXTENSIONS,
) -> bool:
    """Check if a filename should use delta compression based on extension."""
    name_lower = filename.lower()

    for ext in compound_extensions:
        if name_lower.endswith(ext):
            return True

    return any(name_lower.endswith(ext) for ext in simple_extensions)


__all__ = [
    "DEFAULT_COMPOUND_DELTA_EXTENSIONS",
    "DEFAULT_DELTA_EXTENSIONS",
    "is_delta_candidate",
]
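
A quick check of the suffix logic above (plain calls, nothing assumed beyond this module):

assert is_delta_candidate("app-v2.zip")
assert is_delta_candidate("Release.TAR.GZ")   # case-insensitive, compound suffix
assert not is_delta_candidate("photo.png")    # in neither extension set
# Narrowing the policy per call:
assert not is_delta_candidate("app.jar", simple_extensions={".zip"}, compound_extensions=())
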
@@ -1,8 +1,80 @@
"""Core domain models."""

from dataclasses import dataclass
import logging
from dataclasses import dataclass, field
from datetime import datetime

# Metadata key prefix for DeltaGlider
# AWS S3 automatically adds 'x-amz-meta-' prefix, so our keys become 'x-amz-meta-dg-*'
METADATA_PREFIX = "dg-"

# Canonical metadata key aliases.
# Each field maps to all known key formats (current prefixed, legacy underscore, legacy bare,
# legacy hyphenated). Order matters: first match wins during lookup.
# Both DeltaMeta.from_dict() and service-layer _meta_value() MUST use these to stay in sync.
METADATA_KEY_ALIASES: dict[str, tuple[str, ...]] = {
    "tool": (f"{METADATA_PREFIX}tool", "dg_tool", "tool"),
    "original_name": (
        f"{METADATA_PREFIX}original-name",
        "dg_original_name",
        "original_name",
        "original-name",
    ),
    "file_sha256": (
        f"{METADATA_PREFIX}file-sha256",
        "dg_file_sha256",
        "file_sha256",
        "file-sha256",
    ),
    "file_size": (
        f"{METADATA_PREFIX}file-size",
        "dg_file_size",
        "file_size",
        "file-size",
    ),
    "created_at": (
        f"{METADATA_PREFIX}created-at",
        "dg_created_at",
        "created_at",
        "created-at",
    ),
    "ref_key": (f"{METADATA_PREFIX}ref-key", "dg_ref_key", "ref_key", "ref-key"),
    "ref_sha256": (
        f"{METADATA_PREFIX}ref-sha256",
        "dg_ref_sha256",
        "ref_sha256",
        "ref-sha256",
    ),
    "delta_size": (
        f"{METADATA_PREFIX}delta-size",
        "dg_delta_size",
        "delta_size",
        "delta-size",
    ),
    "delta_cmd": (
        f"{METADATA_PREFIX}delta-cmd",
        "dg_delta_cmd",
        "delta_cmd",
        "delta-cmd",
    ),
    "note": (f"{METADATA_PREFIX}note", "dg_note", "note"),
}


def resolve_metadata(metadata: dict[str, str], field: str) -> str | None:
    """Look up a metadata field using all known key aliases.

    Returns the first non-empty match, or None if not found.
    """
    for key in METADATA_KEY_ALIASES[field]:
        value = metadata.get(key)
        if value not in (None, ""):
            return value
    return None
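
A short illustration of the alias fallback, with a made-up metadata dict mixing current and legacy key styles:

legacy_meta = {"dg_file_size": "1048576", "ref-key": "builds/reference.bin"}

assert resolve_metadata(legacy_meta, "file_size") == "1048576"   # legacy underscore form
assert resolve_metadata(legacy_meta, "ref_key") == "builds/reference.bin"
assert resolve_metadata(legacy_meta, "note") is None             # absent under every alias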


logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class DeltaSpace:
@@ -23,6 +95,11 @@ class ObjectKey:
    bucket: str
    key: str

    @property
    def full_key(self) -> str:
        """Full S3 path: bucket/key."""
        return f"{self.bucket}/{self.key}"


@dataclass(frozen=True)
class Sha256:
@@ -47,13 +124,13 @@ class ReferenceMeta:
    note: str = "reference"

    def to_dict(self) -> dict[str, str]:
        """Convert to S3 metadata dict."""
        """Convert to S3 metadata dict with DeltaGlider namespace prefix."""
        return {
            "tool": self.tool,
            "source_name": self.source_name,
            "file_sha256": self.file_sha256,
            "created_at": self.created_at.isoformat() + "Z",
            "note": self.note,
            f"{METADATA_PREFIX}tool": self.tool,
            f"{METADATA_PREFIX}source-name": self.source_name,
            f"{METADATA_PREFIX}file-sha256": self.file_sha256,
            f"{METADATA_PREFIX}created-at": self.created_at.isoformat() + "Z",
            f"{METADATA_PREFIX}note": self.note,
        }


@@ -73,36 +150,79 @@ class DeltaMeta:
    note: str | None = None

    def to_dict(self) -> dict[str, str]:
        """Convert to S3 metadata dict."""
        """Convert to S3 metadata dict with DeltaGlider namespace prefix."""
        meta = {
            "tool": self.tool,
            "original_name": self.original_name,
            "file_sha256": self.file_sha256,
            "file_size": str(self.file_size),
            "created_at": self.created_at.isoformat() + "Z",
            "ref_key": self.ref_key,
            "ref_sha256": self.ref_sha256,
            "delta_size": str(self.delta_size),
            "delta_cmd": self.delta_cmd,
            f"{METADATA_PREFIX}tool": self.tool,
            f"{METADATA_PREFIX}original-name": self.original_name,
            f"{METADATA_PREFIX}file-sha256": self.file_sha256,
            f"{METADATA_PREFIX}file-size": str(self.file_size),
            f"{METADATA_PREFIX}created-at": self.created_at.isoformat() + "Z",
            f"{METADATA_PREFIX}ref-key": self.ref_key,
            f"{METADATA_PREFIX}ref-sha256": self.ref_sha256,
            f"{METADATA_PREFIX}delta-size": str(self.delta_size),
            f"{METADATA_PREFIX}delta-cmd": self.delta_cmd,
        }
        if self.note:
            meta["note"] = self.note
            meta[f"{METADATA_PREFIX}note"] = self.note
        return meta

    @classmethod
    def from_dict(cls, data: dict[str, str]) -> "DeltaMeta":
        """Create from S3 metadata dict."""
        """Create from S3 metadata dict with DeltaGlider namespace prefix."""

        def _require(field: str) -> str:
            value = resolve_metadata(data, field)
            if value is None:
                raise KeyError(METADATA_KEY_ALIASES[field][0])
            return value

        tool = _require("tool")
        original_name = _require("original_name")
        file_sha = _require("file_sha256")
        file_size_raw = _require("file_size")
        created_at_raw = _require("created_at")
        ref_key = _require("ref_key")
        ref_sha = _require("ref_sha256")
        delta_size_raw = _require("delta_size")
        delta_cmd_value = resolve_metadata(data, "delta_cmd") or ""
        note_value = resolve_metadata(data, "note") or ""

        try:
            file_size = int(file_size_raw)
        except (TypeError, ValueError):
            raise ValueError(f"Invalid file size metadata: {file_size_raw}") from None

        try:
            delta_size = int(delta_size_raw)
        except (TypeError, ValueError):
            raise ValueError(f"Invalid delta size metadata: {delta_size_raw}") from None

        created_at_text = created_at_raw.rstrip("Z")
        try:
            created_at = datetime.fromisoformat(created_at_text)
        except ValueError as exc:
            raise ValueError(f"Invalid created_at metadata: {created_at_raw}") from exc

        if not delta_cmd_value:
            object_name = original_name or "<unknown>"
            logger.warning(
                "Delta metadata missing %s for %s; using empty command",
                f"{METADATA_PREFIX}delta-cmd",
                object_name,
            )
            delta_cmd_value = ""

        return cls(
            tool=data["tool"],
            original_name=data["original_name"],
            file_sha256=data["file_sha256"],
            file_size=int(data["file_size"]),
            created_at=datetime.fromisoformat(data["created_at"].rstrip("Z")),
            ref_key=data["ref_key"],
            ref_sha256=data["ref_sha256"],
            delta_size=int(data["delta_size"]),
            delta_cmd=data["delta_cmd"],
            note=data.get("note"),
            tool=tool,
            original_name=original_name,
            file_sha256=file_sha,
            file_size=file_size,
            created_at=created_at,
            ref_key=ref_key,
            ref_sha256=ref_sha,
            delta_size=delta_size,
            delta_cmd=delta_cmd_value,
            note=note_value or None,
        )
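
A small round-trip sketch of the serialization above. Values are made up, and DeltaMeta's constructor signature is assumed from the fields shown in to_dict/from_dict:

from datetime import datetime

meta = DeltaMeta(
    tool="deltaglider/5.1.0",
    original_name="app-v2.zip",
    file_sha256="ab" * 32,
    file_size=1048576,
    created_at=datetime(2025, 1, 1, 12, 0, 0),
    ref_key="builds/reference.bin",
    ref_sha256="cd" * 32,
    delta_size=20480,
    delta_cmd="xdelta3 -e -s reference.bin",
)

stored = meta.to_dict()                    # keys carry the "dg-" prefix
assert stored["dg-file-size"] == "1048576"
restored = DeltaMeta.from_dict(stored)     # aliases also accept legacy keys
assert restored.file_size == meta.file_size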


@@ -131,3 +251,33 @@ class VerifyResult:
    expected_sha256: str
    actual_sha256: str
    message: str


@dataclass
class DeleteResult:
    """Result of a single delete operation."""

    key: str
    bucket: str
    deleted: bool = False
    type: str = "unknown"
    warnings: list[str] = field(default_factory=list)
    original_name: str | None = None
    dependent_deltas: int = 0
    cleaned_reference: str | None = None


@dataclass
class RecursiveDeleteResult:
    """Result of a recursive delete operation."""

    bucket: str
    prefix: str
    deleted_count: int = 0
    failed_count: int = 0
    deltas_deleted: int = 0
    references_deleted: int = 0
    direct_deleted: int = 0
    other_deleted: int = 0
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)

222
src/deltaglider/core/object_listing.py
Normal file
@@ -0,0 +1,222 @@
"""Shared helpers for listing bucket objects with pagination support."""

from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

from ..ports.storage import ObjectHead


@dataclass(slots=True)
class ObjectListing:
    """All objects and prefixes returned from a bucket listing."""

    objects: list[dict[str, Any]] = field(default_factory=list)
    common_prefixes: list[str] = field(default_factory=list)
    key_count: int = 0
    is_truncated: bool = False
    next_continuation_token: str | None = None
    limit_reached: bool = False


def list_objects_page(
    storage: Any,
    *,
    bucket: str,
    prefix: str = "",
    delimiter: str = "",
    max_keys: int = 1000,
    start_after: str | None = None,
    continuation_token: str | None = None,
) -> ObjectListing:
    """Perform a single list_objects call using the storage adapter."""
    if not hasattr(storage, "list_objects"):
        raise NotImplementedError("Storage adapter does not support list_objects")

    response = storage.list_objects(
        bucket=bucket,
        prefix=prefix,
        delimiter=delimiter,
        max_keys=max_keys,
        start_after=start_after,
        continuation_token=continuation_token,
    )

    return ObjectListing(
        objects=list(response.get("objects", [])),
        common_prefixes=list(response.get("common_prefixes", [])),
        key_count=response.get("key_count", len(response.get("objects", []))),
        is_truncated=bool(response.get("is_truncated", False)),
        next_continuation_token=response.get("next_continuation_token"),
    )


def list_all_objects(
    storage: Any,
    *,
    bucket: str,
    prefix: str = "",
    delimiter: str = "",
    max_keys: int = 1000,
    logger: Any | None = None,
    max_iterations: int = 10_000,
    max_objects: int | None = None,
) -> ObjectListing:
    """Fetch all objects under the given bucket/prefix with pagination safety."""
    import time
    from datetime import UTC, datetime

    aggregated = ObjectListing()
    continuation_token: str | None = None
    iteration_count = 0
    list_start_time = time.time()
    limit_reached = False

    while True:
        iteration_count += 1
        if iteration_count > max_iterations:
            if logger:
                logger.warning(
                    "list_all_objects: reached max iterations (%s). Returning partial results.",
                    max_iterations,
                )
            aggregated.is_truncated = True
            aggregated.next_continuation_token = continuation_token
            break

        # Log progress every 10 pages or on first page
        if logger and (iteration_count == 1 or iteration_count % 10 == 0):
            elapsed = time.time() - list_start_time
            objects_per_sec = len(aggregated.objects) / elapsed if elapsed > 0 else 0
            token_info = f", token={continuation_token[:20]}..." if continuation_token else ""
            logger.info(
                f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] LIST pagination: "
                f"page {iteration_count}, {len(aggregated.objects)} objects so far "
                f"({objects_per_sec:.0f} obj/s, {elapsed:.1f}s elapsed{token_info})"
            )

            # Warn if taking very long (>60s)
            if elapsed > 60 and iteration_count % 50 == 0:
                estimated_total = (len(aggregated.objects) / iteration_count) * max_iterations
                logger.warning(
                    f"LIST operation is slow ({elapsed:.0f}s elapsed). "
                    f"This bucket has MANY objects ({len(aggregated.objects)} so far). "
                    f"Consider using a smaller prefix or enabling caching. "
                    f"Estimated remaining: {estimated_total - len(aggregated.objects):.0f} objects"
                )

        try:
            page = list_objects_page(
                storage,
                bucket=bucket,
                prefix=prefix,
                delimiter=delimiter,
                max_keys=max_keys,
                continuation_token=continuation_token,
            )
        except Exception as exc:
            if not aggregated.objects:
                raise RuntimeError(f"Failed to list objects for bucket '{bucket}': {exc}") from exc
            if logger:
                logger.warning(
                    "list_all_objects: pagination error after %s objects: %s. Returning partial results.",
                    len(aggregated.objects),
                    exc,
                )
            aggregated.is_truncated = True
            aggregated.next_continuation_token = continuation_token
            break

        aggregated.objects.extend(page.objects)
        aggregated.common_prefixes.extend(page.common_prefixes)
        aggregated.key_count += page.key_count

        if max_objects is not None and len(aggregated.objects) >= max_objects:
            if logger:
                logger.info(
                    f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] LIST capped at {max_objects} objects."
                )
            aggregated.objects = aggregated.objects[:max_objects]
            aggregated.key_count = len(aggregated.objects)
            aggregated.is_truncated = True
            aggregated.next_continuation_token = page.next_continuation_token
            limit_reached = True
            break

        if not page.is_truncated:
            aggregated.is_truncated = False
            aggregated.next_continuation_token = None
            if logger:
                elapsed = time.time() - list_start_time
                logger.info(
                    f"[{datetime.now(UTC).strftime('%H:%M:%S.%f')[:-3]}] LIST complete: "
                    f"{iteration_count} pages, {len(aggregated.objects)} objects total in {elapsed:.2f}s"
                )
            break

        continuation_token = page.next_continuation_token
        if not continuation_token:
            if logger:
                logger.warning(
                    "list_all_objects: truncated response without continuation token after %s objects.",
                    len(aggregated.objects),
                )
            aggregated.is_truncated = True
            aggregated.next_continuation_token = None
            break

    if aggregated.common_prefixes:
        seen: set[str] = set()
        unique_prefixes: list[str] = []
        for prefix in aggregated.common_prefixes:
            if prefix not in seen:
                seen.add(prefix)
                unique_prefixes.append(prefix)
        aggregated.common_prefixes = unique_prefixes
    aggregated.key_count = len(aggregated.objects)
    aggregated.limit_reached = limit_reached
    return aggregated
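
A minimal sketch of driving list_all_objects with a stub storage adapter. The adapter and its two canned pages are invented for illustration; any object exposing a compatible list_objects method works:

class StubStorage:
    def __init__(self) -> None:
        self.pages = [
            {"objects": [{"key": "a.zip", "size": 10}], "is_truncated": True,
             "next_continuation_token": "t1"},
            {"objects": [{"key": "b.zip.delta", "size": 2}], "is_truncated": False},
        ]

    def list_objects(self, **kwargs):
        # Ignore the pagination kwargs and just pop the next canned page.
        return self.pages.pop(0)

listing = list_all_objects(StubStorage(), bucket="demo", max_objects=100)
assert listing.key_count == 2 and not listing.limit_reached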


def _parse_last_modified(value: Any) -> datetime:
    if isinstance(value, datetime):
        dt = value
    elif value:
        text = str(value)
        if text.endswith("Z"):
            text = text[:-1] + "+00:00"
        try:
            dt = datetime.fromisoformat(text)
        except ValueError:
            dt = datetime.fromtimestamp(0, tz=timezone.utc)  # noqa: UP017
    else:
        dt = datetime.fromtimestamp(0, tz=timezone.utc)  # noqa: UP017

    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # noqa: UP017
    return dt


def object_dict_to_head(obj: dict[str, Any]) -> ObjectHead:
    """Convert a list_objects entry into ObjectHead for compatibility uses."""
    metadata = obj.get("metadata")
    if metadata is None or not isinstance(metadata, dict):
        metadata = {}

    return ObjectHead(
        key=obj["key"],
        size=int(obj.get("size", 0)),
        etag=str(obj.get("etag", "")),
        last_modified=_parse_last_modified(obj.get("last_modified")),
        metadata=metadata,
    )


__all__ = [
    "ObjectListing",
    "list_objects_page",
    "list_all_objects",
    "object_dict_to_head",
]

85
src/deltaglider/core/s3_uri.py
Normal file
@@ -0,0 +1,85 @@
"""Utilities for working with S3-style URLs and keys."""

from __future__ import annotations

from typing import NamedTuple

S3_SCHEME = "s3://"


class S3Url(NamedTuple):
    """Normalized representation of an S3 URL."""

    bucket: str
    key: str = ""

    def to_url(self) -> str:
        """Return the canonical string form."""
        if self.key:
            return f"{S3_SCHEME}{self.bucket}/{self.key}"
        return f"{S3_SCHEME}{self.bucket}"

    def with_key(self, key: str) -> S3Url:
        """Return a new S3Url with a different key."""
        return S3Url(self.bucket, key.lstrip("/"))

    def join_key(self, suffix: str) -> S3Url:
        """Append a suffix to the key using '/' semantics."""
        suffix = suffix.lstrip("/")
        if not self.key:
            return self.with_key(suffix)
        if not suffix:
            return self
        return self.with_key(f"{self.key.rstrip('/')}/{suffix}")


def is_s3_url(value: str) -> bool:
    """Check if a string is an S3 URL."""
    return value.startswith(S3_SCHEME)


def parse_s3_url(
    url: str,
    *,
    allow_empty_key: bool = True,
    strip_trailing_slash: bool = False,
) -> S3Url:
    """Parse an S3 URL into bucket and key components."""
    if not is_s3_url(url):
        raise ValueError(f"Invalid S3 URL: {url}")

    path = url[len(S3_SCHEME) :]
    if strip_trailing_slash:
        path = path.rstrip("/")

    bucket, sep, key = path.partition("/")
    if not bucket:
        raise ValueError(f"S3 URL missing bucket: {url}")

    if not sep:
        key = ""

    key = key.lstrip("/")
    if not key and not allow_empty_key:
        raise ValueError(f"S3 URL must include a key: {url}")

    return S3Url(bucket=bucket, key=key)


def build_s3_url(bucket: str, key: str | None = None) -> str:
    """Build an S3 URL from components."""
    if not bucket:
        raise ValueError("Bucket name cannot be empty")

    if key:
        key = key.lstrip("/")
        return f"{S3_SCHEME}{bucket}/{key}"
    return f"{S3_SCHEME}{bucket}"


__all__ = [
    "S3Url",
    "build_s3_url",
    "is_s3_url",
    "parse_s3_url",
]
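
A few calls showing the parse/build behavior above:

url = parse_s3_url("s3://releases/v2/app.zip")
assert url.bucket == "releases" and url.key == "v2/app.zip"
assert url.join_key("notes.txt").to_url() == "s3://releases/v2/app.zip/notes.txt"

assert parse_s3_url("s3://releases").key == ""                    # empty key allowed by default
assert build_s3_url("releases", "/v2/app.zip") == "s3://releases/v2/app.zip"
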
@@ -2,9 +2,11 @@

import tempfile
import warnings
from datetime import UTC, timedelta
from pathlib import Path
from typing import Any, BinaryIO

from .. import __version__
from ..ports import (
    CachePort,
    ClockPort,
@@ -15,6 +17,11 @@ from ..ports import (
    StoragePort,
)
from ..ports.storage import ObjectHead
from .delta_extensions import (
    DEFAULT_COMPOUND_DELTA_EXTENSIONS,
    DEFAULT_DELTA_EXTENSIONS,
    is_delta_candidate,
)
from .errors import (
    DiffDecodeError,
    DiffEncodeError,
@@ -23,12 +30,15 @@ from .errors import (
    PolicyViolationWarning,
)
from .models import (
    DeleteResult,
    DeltaMeta,
    DeltaSpace,
    ObjectKey,
    PutSummary,
    RecursiveDeleteResult,
    ReferenceMeta,
    VerifyResult,
    resolve_metadata,
)


@@ -44,10 +54,17 @@ class DeltaService:
        clock: ClockPort,
        logger: LoggerPort,
        metrics: MetricsPort,
        tool_version: str = "deltaglider/0.1.0",
        tool_version: str | None = None,
        max_ratio: float = 0.5,
    ):
        """Initialize service with ports."""
        """Initialize service with ports.

        Args:
            tool_version: Version string for metadata. If None, uses package __version__.
        """
        # Use real package version if not explicitly provided
        if tool_version is None:
            tool_version = f"deltaglider/{__version__}"
        self.storage = storage
        self.diff = diff
        self.hasher = hasher
@@ -58,51 +75,41 @@ class DeltaService:
        self.tool_version = tool_version
        self.max_ratio = max_ratio

        # File extensions that should use delta compression
        self.delta_extensions = {
            ".zip",
            ".tar",
            ".gz",
            ".tar.gz",
            ".tgz",
            ".bz2",
            ".tar.bz2",
            ".xz",
            ".tar.xz",
            ".7z",
            ".rar",
            ".dmg",
            ".iso",
            ".pkg",
            ".deb",
            ".rpm",
            ".apk",
            ".jar",
            ".war",
            ".ear",
        }
        # File extensions that should use delta compression. Keep mutable copies
        # so advanced callers can customize the policy if needed.
        self.delta_extensions = set(DEFAULT_DELTA_EXTENSIONS)
        self.compound_delta_extensions = DEFAULT_COMPOUND_DELTA_EXTENSIONS

    def should_use_delta(self, filename: str) -> bool:
        """Check if file should use delta compression based on extension."""
        name_lower = filename.lower()
        # Check compound extensions first
        for ext in [".tar.gz", ".tar.bz2", ".tar.xz"]:
            if name_lower.endswith(ext):
                return True
        # Check simple extensions
        return any(name_lower.endswith(ext) for ext in self.delta_extensions)
        return is_delta_candidate(
            filename,
            simple_extensions=self.delta_extensions,
            compound_extensions=self.compound_delta_extensions,
        )
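
Because the new code copies DEFAULT_DELTA_EXTENSIONS into a plain set, the customization mentioned in the comment is per-instance. A sketch (the service construction itself is elided; ".firmware" is an invented extension):

# Hypothetical: opt a custom archive format into delta compression on one service
service.delta_extensions.add(".firmware")
assert service.should_use_delta("device-v7.FIRMWARE")   # suffix match is case-insensitive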

    def put(
        self, local_file: Path, delta_space: DeltaSpace, max_ratio: float | None = None
        self,
        local_file: Path,
        delta_space: DeltaSpace,
        max_ratio: float | None = None,
        override_name: str | None = None,
    ) -> PutSummary:
        """Upload file as reference or delta (for archive files) or directly (for other files)."""
        """Upload file as reference or delta (for archive files) or directly (for other files).

        Args:
            local_file: Path to the local file to upload
            delta_space: DeltaSpace (bucket + prefix) for the upload
            max_ratio: Maximum acceptable delta/file ratio (default: service max_ratio)
            override_name: Optional name to use instead of local_file.name (useful for S3-to-S3 copies)
        """
        if max_ratio is None:
            max_ratio = self.max_ratio

        start_time = self.clock.now()
        file_size = local_file.stat().st_size
        file_sha256 = self.hasher.sha256(local_file)
        original_name = local_file.name
        original_name = override_name if override_name else local_file.name

        self.logger.info(
            "Starting put operation",
@@ -166,13 +173,13 @@ class DeltaService:
        self.logger.info("Starting get operation", key=object_key.key)

        # Get object metadata
        obj_head = self.storage.head(f"{object_key.bucket}/{object_key.key}")
        obj_head = self.storage.head(object_key.full_key)
        if obj_head is None:
            raise NotFoundError(f"Object not found: {object_key.key}")

        # Check if this is a regular S3 object (not uploaded via DeltaGlider)
        # Regular S3 objects won't have DeltaGlider metadata
        if "file_sha256" not in obj_head.metadata:
        # Regular S3 objects won't have DeltaGlider metadata (dg-file-sha256 key)
        if "dg-file-sha256" not in obj_head.metadata:
            # This is a regular S3 object, download it directly
            self.logger.info(
                "Downloading regular S3 object (no DeltaGlider metadata)",
@@ -196,11 +203,13 @@ class DeltaService:
            # Direct download without delta processing
            self._get_direct(object_key, obj_head, out)
            duration = (self.clock.now() - start_time).total_seconds()
            file_size_meta = resolve_metadata(obj_head.metadata, "file_size")
            file_size_value = int(file_size_meta) if file_size_meta else obj_head.size
            self.logger.log_operation(
                op="get",
                key=object_key.key,
                deltaspace=f"{object_key.bucket}",
                sizes={"file": int(obj_head.metadata.get("file_size", 0))},
                sizes={"file": file_size_value},
                durations={"total": duration},
                cache_hit=False,
            )
@@ -238,7 +247,7 @@ class DeltaService:

        # Download delta
        with open(delta_path, "wb") as f:
            delta_stream = self.storage.get(f"{object_key.bucket}/{object_key.key}")
            delta_stream = self.storage.get(object_key.full_key)
            for chunk in iter(lambda: delta_stream.read(8192), b""):
                f.write(chunk)

@@ -338,10 +347,13 @@ class DeltaService:

            # Re-check for race condition
            ref_head = self.storage.head(full_ref_key)
            if ref_head and ref_head.metadata.get("file_sha256") != file_sha256:
            existing_sha = None
            if ref_head:
                existing_sha = resolve_metadata(ref_head.metadata, "file_sha256")
            if ref_head and existing_sha and existing_sha != file_sha256:
                self.logger.warning("Reference creation race detected, using existing")
                # Proceed with existing reference
                ref_sha256 = ref_head.metadata["file_sha256"]
                ref_sha256 = existing_sha
            else:
                ref_sha256 = file_sha256

@@ -404,7 +416,9 @@ class DeltaService:
    ) -> PutSummary:
        """Create delta file."""
        ref_key = delta_space.reference_key()
        ref_sha256 = ref_head.metadata["file_sha256"]
        ref_sha256 = resolve_metadata(ref_head.metadata, "file_sha256")
        if not ref_sha256:
            raise ValueError("Reference metadata missing file SHA256")

        # Ensure reference is cached
        cache_hit = self.cache.has_ref(delta_space.bucket, delta_space.prefix, ref_sha256)
@@ -524,7 +538,7 @@ class DeltaService:
    ) -> None:
        """Download file directly from S3 without delta processing."""
        # Download the file directly
        file_stream = self.storage.get(f"{object_key.bucket}/{object_key.key}")
        file_stream = self.storage.get(object_key.full_key)

        if isinstance(out, Path):
            # Write to file path
@@ -537,7 +551,7 @@ class DeltaService:
                out.write(chunk)

        # Verify integrity if SHA256 is present
        expected_sha = obj_head.metadata.get("file_sha256")
        expected_sha = resolve_metadata(obj_head.metadata, "file_sha256")
        if expected_sha:
            if isinstance(out, Path):
                actual_sha = self.hasher.sha256(out)
@@ -558,7 +572,7 @@ class DeltaService:
        self.logger.info(
            "Direct download complete",
            key=object_key.key,
            size=obj_head.metadata.get("file_size"),
            size=resolve_metadata(obj_head.metadata, "file_size"),
        )

def _upload_direct(
|
||||
@@ -606,128 +620,37 @@ class DeltaService:
|
||||
file_sha256=file_sha256,
|
||||
)
|
||||
|
||||
def delete(self, object_key: ObjectKey) -> dict[str, Any]:
|
||||
def delete(self, object_key: ObjectKey) -> DeleteResult:
|
||||
"""Delete an object (delta-aware).
|
||||
|
||||
For delta files, just deletes the delta.
|
||||
For reference files, checks if any deltas depend on it first.
|
||||
For direct uploads, simply deletes the file.
|
||||
|
||||
Returns:
|
||||
dict with deletion details including type and any warnings
|
||||
"""
|
||||
start_time = self.clock.now()
|
||||
full_key = f"{object_key.bucket}/{object_key.key}"
|
||||
full_key = object_key.full_key
|
||||
|
||||
self.logger.info("Starting delete operation", key=object_key.key)
|
||||
|
||||
# Check if object exists
|
||||
obj_head = self.storage.head(full_key)
|
||||
if obj_head is None:
|
||||
raise NotFoundError(f"Object not found: {object_key.key}")
|
||||
|
||||
# Determine object type
|
||||
is_reference = object_key.key.endswith("/reference.bin")
|
||||
is_delta = object_key.key.endswith(".delta")
|
||||
is_direct = obj_head.metadata.get("compression") == "none"
|
||||
result = DeleteResult(key=object_key.key, bucket=object_key.bucket)
|
||||
|
||||
result: dict[str, Any] = {
|
||||
"key": object_key.key,
|
||||
"bucket": object_key.bucket,
|
||||
"deleted": False,
|
||||
"type": "unknown",
|
||||
"warnings": [],
|
||||
}
|
        if is_reference:
            # Check if any deltas depend on this reference
            prefix = object_key.key.rsplit("/", 1)[0] if "/" in object_key.key else ""
            dependent_deltas = []

            for obj in self.storage.list(f"{object_key.bucket}/{prefix}"):
                if obj.key.endswith(".delta") and obj.key != object_key.key:
                    # Check if this delta references our reference
                    delta_head = self.storage.head(f"{object_key.bucket}/{obj.key}")
                    if delta_head and delta_head.metadata.get("ref_key") == object_key.key:
                        dependent_deltas.append(obj.key)

            if dependent_deltas:
                warnings_list = result["warnings"]
                assert isinstance(warnings_list, list)
                warnings_list.append(
                    f"Reference has {len(dependent_deltas)} dependent delta(s). "
                    "Deleting this will make those deltas unrecoverable."
                )
                self.logger.warning(
                    "Reference has dependent deltas",
                    ref_key=object_key.key,
                    delta_count=len(dependent_deltas),
                    deltas=dependent_deltas[:5],  # Log first 5
                )

            # Delete the reference
        if object_key.key.endswith("/reference.bin"):
            self._delete_reference(object_key, full_key, result)
        elif object_key.key.endswith(".delta"):
            self._delete_delta(object_key, full_key, obj_head, result)
        elif obj_head.metadata.get("compression") == "none":
            self.storage.delete(full_key)
            result["deleted"] = True
            result["type"] = "reference"
            result["dependent_deltas"] = len(dependent_deltas)

            # Clear from cache if present
            if "/" in object_key.key:
                deltaspace_prefix = object_key.key.rsplit("/", 1)[0]
                try:
                    self.cache.evict(object_key.bucket, deltaspace_prefix)
                except Exception as e:
                    self.logger.debug(f"Could not clear cache for {object_key.key}: {e}")

        elif is_delta:
            # Delete the delta file
            self.storage.delete(full_key)
            result["deleted"] = True
            result["type"] = "delta"
            result["original_name"] = obj_head.metadata.get("original_name", "unknown")

            # Check if this was the last delta in the DeltaSpace - if so, clean up reference.bin
            if "/" in object_key.key:
                deltaspace_prefix = "/".join(object_key.key.split("/")[:-1])
                ref_key = f"{deltaspace_prefix}/reference.bin"

                # Check if any other delta files exist in this DeltaSpace
                remaining_deltas = []
                for obj in self.storage.list(f"{object_key.bucket}/{deltaspace_prefix}"):
                    if obj.key.endswith(".delta") and obj.key != object_key.key:
                        remaining_deltas.append(obj.key)

                if not remaining_deltas:
                    # No more deltas - clean up the orphaned reference.bin
                    ref_full_key = f"{object_key.bucket}/{ref_key}"
                    ref_head = self.storage.head(ref_full_key)
                    if ref_head:
                        self.storage.delete(ref_full_key)
                        self.logger.info(
                            "Cleaned up orphaned reference.bin",
                            ref_key=ref_key,
                            reason="no remaining deltas",
                        )
                        result["cleaned_reference"] = ref_key

                # Clear from cache
                try:
                    self.cache.evict(object_key.bucket, deltaspace_prefix)
                except Exception as e:
                    self.logger.debug(f"Could not clear cache for {deltaspace_prefix}: {e}")

        elif is_direct:
            # Simply delete the direct upload
            self.storage.delete(full_key)
            result["deleted"] = True
            result["type"] = "direct"
            result["original_name"] = obj_head.metadata.get("original_name", object_key.key)

            result.deleted = True
            result.type = "direct"
            result.original_name = obj_head.metadata.get("original_name", object_key.key)
        else:
            # Unknown file type, delete anyway
            self.storage.delete(full_key)
            result["deleted"] = True
            result["type"] = "unknown"
            result.deleted = True
            result.type = "unknown"

        duration = (self.clock.now() - start_time).total_seconds()
        self.logger.log_operation(
@@ -739,169 +662,139 @@ class DeltaService:
            cache_hit=False,
        )
        self.metrics.timing("deltaglider.delete.duration", duration)
        self.metrics.increment(f"deltaglider.delete.{result['type']}")
        self.metrics.increment(f"deltaglider.delete.{result.type}")

        return result
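
Callers now get attribute access instead of dict lookups. A small usage sketch (the bucket and key are illustrative):

result = service.delete(ObjectKey(bucket="releases", key="1.0/app.zip.delta"))
if result.deleted and result.cleaned_reference:
    # The last delta in the DeltaSpace also took the orphaned reference.bin with it.
    print(f"Removed {result.key}; cleaned up {result.cleaned_reference}")
for warning in result.warnings:
    print(f"warning: {warning}")
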

    def delete_recursive(self, bucket: str, prefix: str) -> dict[str, Any]:
    def _delete_reference(self, object_key: ObjectKey, full_key: str, result: DeleteResult) -> None:
        """Handle deletion of a reference.bin file."""
        prefix = object_key.key.rsplit("/", 1)[0] if "/" in object_key.key else ""
        dependent_deltas = []

        for obj in self.storage.list(f"{object_key.bucket}/{prefix}"):
            if obj.key.endswith(".delta") and obj.key != object_key.key:
                delta_head = self.storage.head(f"{object_key.bucket}/{obj.key}")
                if delta_head and delta_head.metadata.get("ref_key") == object_key.key:
                    dependent_deltas.append(obj.key)

        if dependent_deltas:
            result.warnings.append(
                f"Reference has {len(dependent_deltas)} dependent delta(s). "
                "Deleting this will make those deltas unrecoverable."
            )
            self.logger.warning(
                "Reference has dependent deltas",
                ref_key=object_key.key,
                delta_count=len(dependent_deltas),
                deltas=dependent_deltas[:5],
            )

        self.storage.delete(full_key)
        result.deleted = True
        result.type = "reference"
        result.dependent_deltas = len(dependent_deltas)

        if "/" in object_key.key:
            deltaspace_prefix = object_key.key.rsplit("/", 1)[0]
            try:
                self.cache.evict(object_key.bucket, deltaspace_prefix)
            except Exception as e:
                self.logger.debug(f"Could not clear cache for {object_key.key}: {e}")

    def _delete_delta(
        self,
        object_key: ObjectKey,
        full_key: str,
        obj_head: ObjectHead,
        result: DeleteResult,
    ) -> None:
        """Handle deletion of a delta file, cleaning up orphaned references."""
        self.storage.delete(full_key)
        result.deleted = True
        result.type = "delta"
        result.original_name = obj_head.metadata.get("original_name", "unknown")

        if "/" not in object_key.key:
            return

        deltaspace_prefix = "/".join(object_key.key.split("/")[:-1])
        ref_key = f"{deltaspace_prefix}/reference.bin"

        remaining_deltas = [
            obj.key
            for obj in self.storage.list(f"{object_key.bucket}/{deltaspace_prefix}")
            if obj.key.endswith(".delta") and obj.key != object_key.key
        ]

        if not remaining_deltas:
            ref_full_key = f"{object_key.bucket}/{ref_key}"
            ref_head = self.storage.head(ref_full_key)
            if ref_head:
                self.storage.delete(ref_full_key)
                self.logger.info(
                    "Cleaned up orphaned reference.bin",
                    ref_key=ref_key,
                    reason="no remaining deltas",
                )
                result.cleaned_reference = ref_key

        try:
            self.cache.evict(object_key.bucket, deltaspace_prefix)
        except Exception as e:
            self.logger.debug(f"Could not clear cache for {deltaspace_prefix}: {e}")

    def delete_recursive(self, bucket: str, prefix: str) -> RecursiveDeleteResult:
        """Recursively delete all objects under a prefix (delta-aware).

        Handles delta relationships intelligently:
        - Deletes deltas before references
        - Warns about orphaned deltas
        - Handles direct uploads

        Args:
            bucket: S3 bucket name
            prefix: Prefix to delete recursively

        Returns:
            RecursiveDeleteResult with deletion statistics and any warnings.
        """
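
As with DeleteResult, the fields referenced below imply a dataclass roughly like this sketch (inferred from the constructor calls and assertions in this diff, not copied from deltaglider.core.models):

from dataclasses import dataclass, field

@dataclass
class RecursiveDeleteResult:
    # Field set inferred from usage; defaults allow the minimal constructions
    # seen in the CLI test later in this changeset.
    bucket: str
    prefix: str
    deleted_count: int = 0
    failed_count: int = 0
    deltas_deleted: int = 0
    references_deleted: int = 0
    direct_deleted: int = 0
    other_deleted: int = 0
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
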
        start_time = self.clock.now()
        self.logger.info("Starting recursive delete", bucket=bucket, prefix=prefix)

        # Ensure prefix ends with / for proper directory deletion
        if prefix and not prefix.endswith("/"):
            prefix = f"{prefix}/"

        # Collect all objects under prefix
        objects_to_delete = []
        references = []
        deltas = []
        direct_uploads = []
        affected_deltaspaces = set()
        # Phase 1: classify objects by type
        references, deltas, direct_uploads, other_objects, affected_deltaspaces = (
            self._classify_objects_for_deletion(bucket, prefix)
        )

        for obj in self.storage.list(f"{bucket}/{prefix}" if prefix else bucket):
            if not obj.key.startswith(prefix) and prefix:
                continue

            if obj.key.endswith("/reference.bin"):
                references.append(obj.key)
            elif obj.key.endswith(".delta"):
                deltas.append(obj.key)
                # Track which deltaspaces are affected by this deletion
                if "/" in obj.key:
                    deltaspace_prefix = "/".join(obj.key.split("/")[:-1])
                    affected_deltaspaces.add(deltaspace_prefix)
            else:
                # Check if it's a direct upload
                obj_head = self.storage.head(f"{bucket}/{obj.key}")
                if obj_head and obj_head.metadata.get("compression") == "none":
                    direct_uploads.append(obj.key)
                else:
                    objects_to_delete.append(obj.key)

        # Also check for references in parent directories that might be affected
        # by the deletion of delta files in affected deltaspaces
        for deltaspace_prefix in affected_deltaspaces:
            ref_key = f"{deltaspace_prefix}/reference.bin"
        # Also check for references in parent deltaspaces affected by delta deletion
        for ds_prefix in affected_deltaspaces:
            ref_key = f"{ds_prefix}/reference.bin"
            if ref_key not in references:
                # Check if this reference exists
                ref_head = self.storage.head(f"{bucket}/{ref_key}")
                if ref_head:
                    references.append(ref_key)

        result: dict[str, Any] = {
            "bucket": bucket,
            "prefix": prefix,
            "deleted_count": 0,
            "failed_count": 0,
            "deltas_deleted": len(deltas),
            "references_deleted": len(references),
            "direct_deleted": len(direct_uploads),
            "other_deleted": len(objects_to_delete),
            "errors": [],
            "warnings": [],
        }
        result = RecursiveDeleteResult(
            bucket=bucket,
            prefix=prefix,
            deltas_deleted=len(deltas),
            references_deleted=len(references),
            direct_deleted=len(direct_uploads),
            other_deleted=len(other_objects),
        )

        # Delete in order: other files -> direct uploads -> deltas -> references (with checks)
        # This ensures we don't delete references that deltas depend on prematurely
        regular_files = objects_to_delete + direct_uploads + deltas

        # Delete regular files first
        for key in regular_files:
        # Phase 2: delete non-reference files first (dependency order)
        for key in other_objects + direct_uploads + deltas:
            try:
                self.storage.delete(f"{bucket}/{key}")
                deleted_count = result["deleted_count"]
                assert isinstance(deleted_count, int)
                result["deleted_count"] = deleted_count + 1
                result.deleted_count += 1
                self.logger.debug(f"Deleted {key}")
            except Exception as e:
                failed_count = result["failed_count"]
                assert isinstance(failed_count, int)
                result["failed_count"] = failed_count + 1
                errors_list = result["errors"]
                assert isinstance(errors_list, list)
                errors_list.append(f"Failed to delete {key}: {str(e)}")
                result.failed_count += 1
                result.errors.append(f"Failed to delete {key}: {str(e)}")
                self.logger.error(f"Failed to delete {key}: {e}")

        # Handle references intelligently - only delete if no files outside deletion scope depend on them
        references_kept = 0
        for ref_key in references:
            try:
                # Extract deltaspace prefix from reference.bin path
                if ref_key.endswith("/reference.bin"):
                    deltaspace_prefix = ref_key[:-14]  # Remove "/reference.bin"
                else:
                    deltaspace_prefix = ""
        # Phase 3: delete references only if safe
        references_kept = self._delete_references_if_safe(bucket, prefix, references, result)
        result.references_deleted -= references_kept

                # Check if there are any remaining files in this deltaspace
                # (outside of the deletion prefix)
                deltaspace_list_prefix = (
                    f"{bucket}/{deltaspace_prefix}" if deltaspace_prefix else bucket
                )
                remaining_objects = list(self.storage.list(deltaspace_list_prefix))

                # Filter out objects that are being deleted (within our deletion scope)
                # and the reference.bin file itself
                deletion_prefix_full = f"{bucket}/{prefix}" if prefix else bucket
                has_remaining_files = False

                for remaining_obj in remaining_objects:
                    obj_full_path = f"{bucket}/{remaining_obj.key}"
                    # Skip if this object is within our deletion scope
                    if prefix and obj_full_path.startswith(deletion_prefix_full):
                        continue
                    # Skip if this is the reference.bin file itself
                    if remaining_obj.key == ref_key:
                        continue
                    # If we find any other file, the reference is still needed
                    has_remaining_files = True
                    break

                if not has_remaining_files:
                    # Safe to delete this reference.bin
                    self.storage.delete(f"{bucket}/{ref_key}")
                    deleted_count = result["deleted_count"]
                    assert isinstance(deleted_count, int)
                    result["deleted_count"] = deleted_count + 1
                    self.logger.debug(f"Deleted reference {ref_key}")
                else:
                    # Keep the reference as it's still needed
                    references_kept += 1
                    warnings_list = result["warnings"]
                    assert isinstance(warnings_list, list)
                    warnings_list.append(f"Kept reference {ref_key} (still in use)")
                    self.logger.info(
                        f"Kept reference {ref_key} - still in use outside deletion scope"
                    )

            except Exception as e:
                failed_count = result["failed_count"]
                assert isinstance(failed_count, int)
                result["failed_count"] = failed_count + 1
                errors_list = result["errors"]
                assert isinstance(errors_list, list)
                errors_list.append(f"Failed to delete reference {ref_key}: {str(e)}")
                self.logger.error(f"Failed to delete reference {ref_key}: {e}")

        # Update reference deletion count
        references_deleted = result["references_deleted"]
        assert isinstance(references_deleted, int)
        result["references_deleted"] = references_deleted - references_kept

        # Clear any cached references for this prefix
        # Clear cached references
        if references:
            try:
                self.cache.evict(bucket, prefix.rstrip("/") if prefix else "")
@@ -913,11 +806,291 @@ class DeltaService:
            "Recursive delete complete",
            bucket=bucket,
            prefix=prefix,
            deleted=result["deleted_count"],
            failed=result["failed_count"],
            deleted=result.deleted_count,
            failed=result.failed_count,
            duration=duration,
        )
        self.metrics.timing("deltaglider.delete_recursive.duration", duration)
        self.metrics.increment("deltaglider.delete_recursive.completed")

        return result

    def _classify_objects_for_deletion(
        self, bucket: str, prefix: str
    ) -> tuple[list[str], list[str], list[str], list[str], set[str]]:
        """Classify objects under a prefix into references, deltas, direct uploads, and other.

        Returns:
            (references, deltas, direct_uploads, other_objects, affected_deltaspaces)
        """
        references: list[str] = []
        deltas: list[str] = []
        direct_uploads: list[str] = []
        other_objects: list[str] = []
        affected_deltaspaces: set[str] = set()

        for obj in self.storage.list(f"{bucket}/{prefix}" if prefix else bucket):
            if prefix and not obj.key.startswith(prefix):
                continue

            if obj.key.endswith("/reference.bin"):
                references.append(obj.key)
            elif obj.key.endswith(".delta"):
                deltas.append(obj.key)
                if "/" in obj.key:
                    affected_deltaspaces.add("/".join(obj.key.split("/")[:-1]))
            else:
                obj_head = self.storage.head(f"{bucket}/{obj.key}")
                if obj_head and obj_head.metadata.get("compression") == "none":
                    direct_uploads.append(obj.key)
                else:
                    other_objects.append(obj.key)

        return references, deltas, direct_uploads, other_objects, affected_deltaspaces
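
Returning a plain tuple keeps this helper free of extra result types; delete_recursive unpacks it positionally in Phase 1. An illustrative unpacking (the bucket and prefix are made up):

# Illustrative only; mirrors the Phase 1 call site in delete_recursive above.
refs, deltas, directs, others, touched_spaces = self._classify_objects_for_deletion(
    "release-bucket", "builds/2024/"
)
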

    def _delete_references_if_safe(
        self,
        bucket: str,
        prefix: str,
        references: list[str],
        result: RecursiveDeleteResult,
    ) -> int:
        """Delete references only if no files outside the deletion scope depend on them.

        Returns the number of references kept (not deleted).
        """
        references_kept = 0
        deletion_prefix_full = f"{bucket}/{prefix}" if prefix else bucket

        for ref_key in references:
            try:
                if ref_key.endswith("/reference.bin"):
                    deltaspace_prefix = ref_key[:-14]  # Remove "/reference.bin"
                else:
                    deltaspace_prefix = ""

                ds_list_prefix = f"{bucket}/{deltaspace_prefix}" if deltaspace_prefix else bucket
                has_remaining_files = any(
                    not (prefix and f"{bucket}/{obj.key}".startswith(deletion_prefix_full))
                    and obj.key != ref_key
                    for obj in self.storage.list(ds_list_prefix)
                )

                if not has_remaining_files:
                    self.storage.delete(f"{bucket}/{ref_key}")
                    result.deleted_count += 1
                    self.logger.debug(f"Deleted reference {ref_key}")
                else:
                    references_kept += 1
                    result.warnings.append(f"Kept reference {ref_key} (still in use)")
                    self.logger.info(
                        f"Kept reference {ref_key} - still in use outside deletion scope"
                    )

            except Exception as e:
                result.failed_count += 1
                result.errors.append(f"Failed to delete reference {ref_key}: {str(e)}")
                self.logger.error(f"Failed to delete reference {ref_key}: {e}")

        return references_kept

    def rehydrate_for_download(
        self,
        bucket: str,
        key: str,
        expires_in_seconds: int = 3600,
    ) -> str | None:
        """Rehydrate a deltaglider-compressed file for direct download.

        If the file is deltaglider-compressed, this will:
        1. Download and decompress the file
        2. Re-upload to .deltaglider/tmp/ with expiration metadata
        3. Return the new temporary file key

        If the file is not deltaglider-compressed, returns None.

        Args:
            bucket: S3 bucket name
            key: Object key
            expires_in_seconds: How long the temporary file should exist

        Returns:
            New key for temporary file, or None if not deltaglider-compressed
        """
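
A sketch of a caller deciding which key to hand to a downloader; the bucket, key, and expiry are illustrative:

temp_key = service.rehydrate_for_download("my-bucket", "releases/app.zip", expires_in_seconds=900)
if temp_key is None:
    # Not deltaglider-compressed: the original object can be served as-is.
    download_key = "releases/app.zip"
else:
    # Serve the fully rehydrated copy from .deltaglider/tmp/ until it expires.
    download_key = temp_key
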
        start_time = self.clock.now()

        # Check if object exists and is deltaglider-compressed
        obj_head = self.storage.head(f"{bucket}/{key}")

        # If not found directly, try with .delta extension
        if obj_head is None and not key.endswith(".delta"):
            obj_head = self.storage.head(f"{bucket}/{key}.delta")
            if obj_head is not None:
                # Found the delta version, update the key
                key = f"{key}.delta"

        if obj_head is None:
            raise NotFoundError(f"Object not found: {key}")

        # Check if this is a deltaglider file
        is_delta = key.endswith(".delta")
        has_dg_metadata = "dg-file-sha256" in obj_head.metadata

        if not is_delta and not has_dg_metadata:
            # Not a deltaglider file, return None
            self.logger.debug(f"File {key} is not deltaglider-compressed")
            return None

        # Generate temporary file path
        import uuid

        # Use the original filename without .delta extension for the temp file
        original_name = key.removesuffix(".delta") if key.endswith(".delta") else key
        temp_filename = f"{uuid.uuid4().hex}_{Path(original_name).name}"
        temp_key = f".deltaglider/tmp/{temp_filename}"

        # Download and decompress the file
        with tempfile.TemporaryDirectory() as tmpdir:
            tmp_path = Path(tmpdir)
            decompressed_path = tmp_path / "decompressed"

            # Use the existing get method to decompress
            object_key = ObjectKey(bucket=bucket, key=key)
            self.get(object_key, decompressed_path)

            # Calculate expiration time
            expires_at = self.clock.now() + timedelta(seconds=expires_in_seconds)

            # Create metadata for temporary file
            metadata = {
                "dg-expires-at": expires_at.isoformat(),
                "dg-original-key": key,
                "dg-original-filename": Path(original_name).name,
                "dg-rehydrated": "true",
                "dg-created-at": self.clock.now().isoformat(),
            }

            # Upload the decompressed file
            self.logger.info(
                "Uploading rehydrated file",
                original_key=key,
                temp_key=temp_key,
                expires_at=expires_at.isoformat(),
            )

            self.storage.put(
                f"{bucket}/{temp_key}",
                decompressed_path,
                metadata,
            )

        duration = (self.clock.now() - start_time).total_seconds()
        self.logger.info(
            "Rehydration complete",
            original_key=key,
            temp_key=temp_key,
            duration=duration,
        )
        self.metrics.timing("deltaglider.rehydrate.duration", duration)
        self.metrics.increment("deltaglider.rehydrate.completed")

        return temp_key

    def purge_temp_files(self, bucket: str) -> dict[str, Any]:
        """Purge expired temporary files from .deltaglider/tmp/.

        Scans the .deltaglider/tmp/ prefix and deletes any files
        whose dg-expires-at metadata indicates they have expired.

        Args:
            bucket: S3 bucket to purge temp files from

        Returns:
            dict with purge statistics
        """
        start_time = self.clock.now()
        prefix = ".deltaglider/tmp/"

        self.logger.info("Starting temp file purge", bucket=bucket, prefix=prefix)

        deleted_count = 0
        expired_count = 0
        error_count = 0
        total_size_freed = 0
        errors = []

        # List all objects in temp directory
        for obj in self.storage.list(f"{bucket}/{prefix}"):
            if not obj.key.startswith(prefix):
                continue

            try:
                # Get object metadata
                obj_head = self.storage.head(f"{bucket}/{obj.key}")
                if obj_head is None:
                    continue

                # Check expiration
                expires_at_str = obj_head.metadata.get("dg-expires-at")
                if not expires_at_str:
                    # No expiration metadata, skip
                    self.logger.debug(f"No expiration metadata for {obj.key}")
                    continue

                # Parse expiration time
                from datetime import datetime

                try:
                    expires_at = datetime.fromisoformat(expires_at_str.replace("Z", "+00:00"))
                    if expires_at.tzinfo is None:
                        expires_at = expires_at.replace(tzinfo=UTC)
                except ValueError:
                    self.logger.warning(
                        f"Invalid expiration format for {obj.key}: {expires_at_str}"
                    )
                    continue

                # Check if expired
                if self.clock.now() >= expires_at:
                    expired_count += 1
                    # Delete the file
                    self.storage.delete(f"{bucket}/{obj.key}")
                    deleted_count += 1
                    total_size_freed += obj.size
                    self.logger.debug(
                        f"Deleted expired temp file {obj.key}",
                        expired_at=expires_at_str,
                        size=obj.size,
                    )

            except Exception as e:
                error_count += 1
                errors.append(f"Error processing {obj.key}: {str(e)}")
                self.logger.error(f"Failed to process temp file {obj.key}: {e}")

        duration = (self.clock.now() - start_time).total_seconds()

        result = {
            "bucket": bucket,
            "prefix": prefix,
            "deleted_count": deleted_count,
            "expired_count": expired_count,
            "error_count": error_count,
            "total_size_freed": total_size_freed,
            "duration_seconds": duration,
            "errors": errors,
        }

        self.logger.info(
            "Temp file purge complete",
            bucket=bucket,
            deleted=deleted_count,
            size_freed=total_size_freed,
            duration=duration,
        )

        self.metrics.timing("deltaglider.purge.duration", duration)
        self.metrics.gauge("deltaglider.purge.deleted_count", deleted_count)
        self.metrics.gauge("deltaglider.purge.size_freed", total_size_freed)

        return result
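
Expiry is only enforced when this method runs, so something has to invoke it periodically. A hedged sketch of a trivial scheduler loop; the interval, bucket name, and the loop itself are illustrative, not part of this changeset:

import time

def purge_loop(service, bucket: str, interval_seconds: int = 3600) -> None:
    """Run purge_temp_files on a fixed interval; illustrative only."""
    while True:
        stats = service.purge_temp_files(bucket)
        # Keys match the result dict built by purge_temp_files above.
        print(f"purged {stats['deleted_count']} files, freed {stats['total_size_freed']} bytes")
        time.sleep(interval_seconds)
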
@@ -130,17 +130,26 @@ class TestSyncCommand:

        # Mock service methods
        mock_service.storage.list.return_value = []  # No existing files
        mock_service.put.return_value = PutSummary(
            operation="create_reference",
            bucket="test-bucket",
            key="backup/file.zip.delta",
            original_name="file.zip",
            file_size=8,
            file_sha256="ghi789",
            delta_size=None,
            delta_ratio=None,
            ref_key=None,
        )
        # Mock list_objects to raise NotImplementedError so it falls back to list()
        mock_service.storage.list_objects.side_effect = NotImplementedError()

        # Mock service.put to avoid actual execution
        def mock_put(local_path, delta_space, max_ratio=None):
            return PutSummary(
                operation="create_reference",
                bucket="test-bucket",
                key=f"{delta_space.prefix}/{local_path.name}.delta"
                if delta_space.prefix
                else f"{local_path.name}.delta",
                original_name=local_path.name,
                file_size=local_path.stat().st_size,
                file_sha256="ghi789",
                delta_size=None,
                delta_ratio=None,
                ref_key=None,
            )

        mock_service.put.side_effect = mock_put

        with patch("deltaglider.app.cli.main.create_service", return_value=mock_service):
            result = runner.invoke(cli, ["sync", str(test_dir), "s3://test-bucket/backup/"])
@@ -175,6 +184,8 @@ class TestSyncCommand:
                metadata={},
            ),
        ]
        # Mock list_objects to raise NotImplementedError so it falls back to list()
        mock_service.storage.list_objects.side_effect = NotImplementedError()
        mock_service.storage.head.side_effect = [
            None,  # file1.zip doesn't exist
            Mock(),  # file1.zip.delta exists

@@ -153,13 +153,14 @@ class TestBucketManagement:
            delta_objects=6,
            direct_objects=4,
        )
        client._store_bucket_stats_cache("bucket1", detailed_stats=True, stats=cached_stats)
        client._store_bucket_stats_cache("bucket1", mode="detailed", stats=cached_stats)

        response = client.list_buckets()

        bucket1 = next(bucket for bucket in response["Buckets"] if bucket["Name"] == "bucket1")
        assert bucket1["DeltaGliderStats"]["Cached"] is True
        assert bucket1["DeltaGliderStats"]["Detailed"] is True
        assert bucket1["DeltaGliderStats"]["Mode"] == "detailed"
        assert bucket1["DeltaGliderStats"]["ObjectCount"] == cached_stats.object_count
        assert bucket1["DeltaGliderStats"]["TotalSize"] == cached_stats.total_size

@@ -254,10 +255,16 @@ class TestBucketManagement:

        call_count = {"value": 0}

        def fake_get_bucket_stats(_: Any, bucket: str, detailed_stats_flag: bool) -> BucketStats:
        def fake_get_bucket_stats(
            _: Any, bucket: str, mode: str, use_cache: bool = True, refresh_cache: bool = False
        ) -> BucketStats:
            call_count["value"] += 1
            assert bucket == "bucket1"
            return detailed_stats if detailed_stats_flag else quick_stats
            if mode == "detailed":
                return detailed_stats
            if mode == "sampled":
                return detailed_stats  # sampled treated as detailed for cache propagation
            return quick_stats

        monkeypatch.setattr("deltaglider.client._get_bucket_stats", fake_get_bucket_stats)

@@ -266,24 +273,20 @@ class TestBucketManagement:
        assert result_quick is quick_stats
        assert call_count["value"] == 1

        # Second quick call should hit cache
        # Second quick call - caching is now done in _get_bucket_stats (S3-based)
        # So each call goes through _get_bucket_stats (which handles caching internally)
        assert client.get_bucket_stats("bucket1") is quick_stats
        assert call_count["value"] == 1
        assert call_count["value"] == 2

        # Detailed call triggers new computation
        result_detailed = client.get_bucket_stats("bucket1", detailed_stats=True)
        result_detailed = client.get_bucket_stats("bucket1", mode="detailed")
        assert result_detailed is detailed_stats
        assert call_count["value"] == 2

        # Quick call after detailed uses detailed cached value (more accurate)
        assert client.get_bucket_stats("bucket1") is detailed_stats
        assert call_count["value"] == 2

        # Clearing the cache should force recomputation
        client.clear_cache()
        assert client.get_bucket_stats("bucket1") is quick_stats
        assert call_count["value"] == 3

        # Quick call - each mode has its own cache in _get_bucket_stats
        assert client.get_bucket_stats("bucket1") is quick_stats
        assert call_count["value"] == 4
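
The test above captures the API change: the boolean detailed_stats flag gives way to a string mode, and caching moves into _get_bucket_stats itself. A short summary of the calling conventions exercised here (the semantics in the comments follow the test, not a full spec):

quick = client.get_bucket_stats("bucket1")                      # quick stats (LIST only)
detailed = client.get_bucket_stats("bucket1", mode="detailed")  # detailed mode
sampled = client.get_bucket_stats("bucket1", mode="sampled")    # sampled mode; the fake above treats it like detailed
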
    def test_bucket_methods_without_boto3_client(self):
        """Test that bucket methods raise NotImplementedError when storage doesn't support it."""
        service = create_service()
@@ -305,6 +308,148 @@ class TestBucketManagement:
        with pytest.raises(NotImplementedError):
            client.list_buckets()

    def test_put_bucket_acl_with_canned_acl(self):
        """Test setting a canned ACL on a bucket."""
        service = create_service()
        mock_storage = Mock()
        service.storage = mock_storage

        mock_boto3_client = Mock()
        mock_boto3_client.put_bucket_acl.return_value = None
        mock_storage.client = mock_boto3_client

        client = DeltaGliderClient(service)
        response = client.put_bucket_acl(Bucket="test-bucket", ACL="public-read")

        assert response["ResponseMetadata"]["HTTPStatusCode"] == 200
        mock_boto3_client.put_bucket_acl.assert_called_once_with(
            Bucket="test-bucket", ACL="public-read"
        )

    def test_put_bucket_acl_with_grants(self):
        """Test setting ACL with grant parameters."""
        service = create_service()
        mock_storage = Mock()
        service.storage = mock_storage

        mock_boto3_client = Mock()
        mock_boto3_client.put_bucket_acl.return_value = None
        mock_storage.client = mock_boto3_client

        client = DeltaGliderClient(service)
        response = client.put_bucket_acl(
            Bucket="test-bucket",
            GrantRead="id=12345",
            GrantWrite="id=67890",
        )

        assert response["ResponseMetadata"]["HTTPStatusCode"] == 200
        mock_boto3_client.put_bucket_acl.assert_called_once_with(
            Bucket="test-bucket", GrantRead="id=12345", GrantWrite="id=67890"
        )

    def test_put_bucket_acl_with_access_control_policy(self):
        """Test setting ACL with a full AccessControlPolicy dict."""
        service = create_service()
        mock_storage = Mock()
        service.storage = mock_storage

        mock_boto3_client = Mock()
        mock_boto3_client.put_bucket_acl.return_value = None
        mock_storage.client = mock_boto3_client

        policy = {
            "Grants": [
                {
                    "Grantee": {"Type": "CanonicalUser", "ID": "abc123"},
                    "Permission": "FULL_CONTROL",
                }
            ],
            "Owner": {"ID": "abc123"},
        }

        client = DeltaGliderClient(service)
        response = client.put_bucket_acl(Bucket="test-bucket", AccessControlPolicy=policy)

        assert response["ResponseMetadata"]["HTTPStatusCode"] == 200
        mock_boto3_client.put_bucket_acl.assert_called_once_with(
            Bucket="test-bucket", AccessControlPolicy=policy
        )

    def test_put_bucket_acl_failure(self):
        """Test that put_bucket_acl raises RuntimeError on boto3 failure."""
        service = create_service()
        mock_storage = Mock()
        service.storage = mock_storage

        mock_boto3_client = Mock()
        mock_boto3_client.put_bucket_acl.side_effect = Exception("AccessDenied")
        mock_storage.client = mock_boto3_client

        client = DeltaGliderClient(service)

        with pytest.raises(RuntimeError, match="Failed to set bucket ACL"):
            client.put_bucket_acl(Bucket="test-bucket", ACL="public-read")
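
Taken together, these tests document three ways to call put_bucket_acl, each passed straight through to the underlying boto3 client:

client.put_bucket_acl(Bucket="test-bucket", ACL="public-read")           # canned ACL
client.put_bucket_acl(Bucket="test-bucket", GrantRead="id=12345")        # explicit grants
client.put_bucket_acl(Bucket="test-bucket", AccessControlPolicy=policy)  # full policy dict
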
||||
|
||||
def test_put_bucket_acl_no_boto3_client(self):
|
||||
"""Test that put_bucket_acl raises NotImplementedError without boto3 client."""
|
||||
service = create_service()
|
||||
mock_storage = Mock()
|
||||
service.storage = mock_storage
|
||||
delattr(mock_storage, "client")
|
||||
|
||||
client = DeltaGliderClient(service)
|
||||
|
||||
with pytest.raises(NotImplementedError):
|
||||
client.put_bucket_acl(Bucket="test-bucket", ACL="private")
|
||||
|
||||
def test_get_bucket_acl_success(self):
|
||||
"""Test getting bucket ACL successfully."""
|
||||
service = create_service()
|
||||
mock_storage = Mock()
|
||||
service.storage = mock_storage
|
||||
|
||||
acl_response = {
|
||||
"Owner": {"DisplayName": "test-user", "ID": "abc123"},
|
||||
"Grants": [
|
||||
{
|
||||
"Grantee": {
|
||||
"Type": "CanonicalUser",
|
||||
"DisplayName": "test-user",
|
||||
"ID": "abc123",
|
||||
},
|
||||
"Permission": "FULL_CONTROL",
|
||||
}
|
||||
],
|
||||
}
|
||||
|
||||
mock_boto3_client = Mock()
|
||||
mock_boto3_client.get_bucket_acl.return_value = acl_response
|
||||
mock_storage.client = mock_boto3_client
|
||||
|
||||
client = DeltaGliderClient(service)
|
||||
response = client.get_bucket_acl(Bucket="test-bucket")
|
||||
|
||||
assert response["Owner"]["DisplayName"] == "test-user"
|
||||
assert len(response["Grants"]) == 1
|
||||
assert response["Grants"][0]["Permission"] == "FULL_CONTROL"
|
||||
mock_boto3_client.get_bucket_acl.assert_called_once_with(Bucket="test-bucket")
|
||||
|
||||
def test_get_bucket_acl_failure(self):
|
||||
"""Test that get_bucket_acl raises RuntimeError on boto3 failure."""
|
||||
service = create_service()
|
||||
mock_storage = Mock()
|
||||
service.storage = mock_storage
|
||||
|
||||
mock_boto3_client = Mock()
|
||||
mock_boto3_client.get_bucket_acl.side_effect = Exception("NoSuchBucket")
|
||||
mock_storage.client = mock_boto3_client
|
||||
|
||||
client = DeltaGliderClient(service)
|
||||
|
||||
with pytest.raises(RuntimeError, match="Failed to get bucket ACL"):
|
||||
client.get_bucket_acl(Bucket="nonexistent-bucket")
|
||||
|
||||
def test_complete_bucket_lifecycle(self):
|
||||
"""Test complete bucket lifecycle: create, use, delete."""
|
||||
service = create_service()
|
||||
|
||||
@@ -43,7 +43,15 @@ class MockStorage:
|
||||
if obj_head is not None:
|
||||
yield obj_head
|
||||
|
||||
def list_objects(self, bucket, prefix="", delimiter="", max_keys=1000, start_after=None):
|
||||
def list_objects(
|
||||
self,
|
||||
bucket,
|
||||
prefix="",
|
||||
delimiter="",
|
||||
max_keys=1000,
|
||||
start_after=None,
|
||||
continuation_token=None,
|
||||
):
|
||||
"""Mock list_objects operation for S3 features."""
|
||||
objects = []
|
||||
common_prefixes = set()
|
||||
@@ -434,7 +442,7 @@ class TestDeltaGliderFeatures:
|
||||
|
||||
def test_get_bucket_stats(self, client):
|
||||
"""Test getting bucket statistics."""
|
||||
# Test quick stats (default: detailed_stats=False)
|
||||
# Test quick stats (LIST only)
|
||||
stats = client.get_bucket_stats("test-bucket")
|
||||
|
||||
assert isinstance(stats, BucketStats)
|
||||
@@ -442,8 +450,8 @@ class TestDeltaGliderFeatures:
|
||||
assert stats.total_size > 0
|
||||
assert stats.delta_objects >= 1 # We have archive.zip.delta
|
||||
|
||||
# Test with detailed_stats=True
|
||||
detailed_stats = client.get_bucket_stats("test-bucket", detailed_stats=True)
|
||||
# Test with detailed mode
|
||||
detailed_stats = client.get_bucket_stats("test-bucket", mode="detailed")
|
||||
assert isinstance(detailed_stats, BucketStats)
|
||||
assert detailed_stats.object_count == stats.object_count
|
||||
|
||||
|
||||
@@ -6,6 +6,7 @@ from unittest.mock import Mock, patch
|
||||
import pytest
|
||||
|
||||
from deltaglider import create_client
|
||||
from deltaglider.core.models import DeleteResult, RecursiveDeleteResult
|
||||
|
||||
|
||||
class MockStorage:
|
||||
@@ -177,14 +178,16 @@ class TestDeleteObjectsRecursiveStatisticsAggregation:
|
||||
def test_aggregates_deleted_count_from_service_and_single_deletes(self, client):
|
||||
"""Test that deleted counts are aggregated correctly."""
|
||||
# Setup: Mock service.delete_recursive to return specific counts
|
||||
mock_result = {
|
||||
"deleted_count": 5,
|
||||
"failed_count": 0,
|
||||
"deltas_deleted": 2,
|
||||
"references_deleted": 1,
|
||||
"direct_deleted": 2,
|
||||
"other_deleted": 0,
|
||||
}
|
||||
mock_result = RecursiveDeleteResult(
|
||||
bucket="test-bucket",
|
||||
prefix="test/",
|
||||
deleted_count=5,
|
||||
failed_count=0,
|
||||
deltas_deleted=2,
|
||||
references_deleted=1,
|
||||
direct_deleted=2,
|
||||
other_deleted=0,
|
||||
)
|
||||
client.service.delete_recursive = Mock(return_value=mock_result)
|
||||
|
||||
# Execute
|
||||
@@ -204,14 +207,16 @@ class TestDeleteObjectsRecursiveStatisticsAggregation:
|
||||
client.service.storage.objects["test-bucket/file.txt"] = {"size": 100}
|
||||
|
||||
# Mock service.delete_recursive to return additional counts
|
||||
mock_result = {
|
||||
"deleted_count": 3,
|
||||
"failed_count": 0,
|
||||
"deltas_deleted": 1,
|
||||
"references_deleted": 0,
|
||||
"direct_deleted": 2,
|
||||
"other_deleted": 0,
|
||||
}
|
||||
mock_result = RecursiveDeleteResult(
|
||||
bucket="test-bucket",
|
||||
prefix="file.txt",
|
||||
deleted_count=3,
|
||||
failed_count=0,
|
||||
deltas_deleted=1,
|
||||
references_deleted=0,
|
||||
direct_deleted=2,
|
||||
other_deleted=0,
|
||||
)
|
||||
client.service.delete_recursive = Mock(return_value=mock_result)
|
||||
|
||||
# Execute
|
||||
@@ -245,15 +250,17 @@ class TestDeleteObjectsRecursiveErrorHandling:
|
||||
def test_service_errors_propagated_in_response(self, client):
|
||||
"""Test that errors from service.delete_recursive are propagated."""
|
||||
# Mock service to return errors
|
||||
mock_result = {
|
||||
"deleted_count": 2,
|
||||
"failed_count": 1,
|
||||
"deltas_deleted": 2,
|
||||
"references_deleted": 0,
|
||||
"direct_deleted": 0,
|
||||
"other_deleted": 0,
|
||||
"errors": ["Error deleting object1", "Error deleting object2"],
|
||||
}
|
||||
mock_result = RecursiveDeleteResult(
|
||||
bucket="test-bucket",
|
||||
prefix="test/",
|
||||
deleted_count=2,
|
||||
failed_count=1,
|
||||
deltas_deleted=2,
|
||||
references_deleted=0,
|
||||
direct_deleted=0,
|
||||
other_deleted=0,
|
||||
errors=["Error deleting object1", "Error deleting object2"],
|
||||
)
|
||||
client.service.delete_recursive = Mock(return_value=mock_result)
|
||||
|
||||
# Execute
|
||||
@@ -271,15 +278,17 @@ class TestDeleteObjectsRecursiveErrorHandling:
|
||||
client.service.storage.objects["test-bucket/file.txt"] = {"size": 100}
|
||||
|
||||
# Mock service to also return errors
|
||||
mock_result = {
|
||||
"deleted_count": 1,
|
||||
"failed_count": 1,
|
||||
"deltas_deleted": 0,
|
||||
"references_deleted": 0,
|
||||
"direct_deleted": 0,
|
||||
"other_deleted": 0,
|
||||
"errors": ["Service delete error"],
|
||||
}
|
||||
mock_result = RecursiveDeleteResult(
|
||||
bucket="test-bucket",
|
||||
prefix="file.txt",
|
||||
deleted_count=1,
|
||||
failed_count=1,
|
||||
deltas_deleted=0,
|
||||
references_deleted=0,
|
||||
direct_deleted=0,
|
||||
other_deleted=0,
|
||||
errors=["Service delete error"],
|
||||
)
|
||||
client.service.delete_recursive = Mock(return_value=mock_result)
|
||||
|
||||
# Mock delete_with_delta_suffix to raise exception
|
||||
@@ -302,15 +311,17 @@ class TestDeleteObjectsRecursiveWarningsHandling:
|
||||
def test_service_warnings_propagated_in_response(self, client):
|
||||
"""Test that warnings from service.delete_recursive are propagated."""
|
||||
# Mock service to return warnings
|
||||
mock_result = {
|
||||
"deleted_count": 3,
|
||||
"failed_count": 0,
|
||||
"deltas_deleted": 2,
|
||||
"references_deleted": 1,
|
||||
"direct_deleted": 0,
|
||||
"other_deleted": 0,
|
||||
"warnings": ["Reference deleted, 2 dependent deltas invalidated"],
|
||||
}
|
||||
mock_result = RecursiveDeleteResult(
|
||||
bucket="test-bucket",
|
||||
prefix="test/",
|
||||
deleted_count=3,
|
||||
failed_count=0,
|
||||
deltas_deleted=2,
|
||||
references_deleted=1,
|
||||
direct_deleted=0,
|
||||
other_deleted=0,
|
||||
warnings=["Reference deleted, 2 dependent deltas invalidated"],
|
||||
)
|
||||
client.service.delete_recursive = Mock(return_value=mock_result)
|
||||
|
||||
# Execute
|
||||
@@ -326,25 +337,29 @@ class TestDeleteObjectsRecursiveWarningsHandling:
|
||||
client.service.storage.objects["test-bucket/ref.bin"] = {"size": 100}
|
||||
|
||||
# Mock service
|
||||
mock_result = {
|
||||
"deleted_count": 0,
|
||||
"failed_count": 0,
|
||||
"deltas_deleted": 0,
|
||||
"references_deleted": 0,
|
||||
"direct_deleted": 0,
|
||||
"other_deleted": 0,
|
||||
}
|
||||
mock_result = RecursiveDeleteResult(
|
||||
bucket="test-bucket",
|
||||
prefix="ref.bin",
|
||||
deleted_count=0,
|
||||
failed_count=0,
|
||||
deltas_deleted=0,
|
||||
references_deleted=0,
|
||||
direct_deleted=0,
|
||||
other_deleted=0,
|
||||
)
|
||||
client.service.delete_recursive = Mock(return_value=mock_result)
|
||||
|
||||
# Mock delete_with_delta_suffix to return warnings
|
||||
with patch("deltaglider.client.delete_with_delta_suffix") as mock_delete:
|
||||
mock_delete.return_value = (
|
||||
"ref.bin",
|
||||
{
|
||||
"deleted": True,
|
||||
"type": "reference",
|
||||
"warnings": ["Warning from single delete"],
|
||||
},
|
||||
DeleteResult(
|
||||
key="ref.bin",
|
||||
bucket="test-bucket",
|
||||
deleted=True,
|
||||
type="reference",
|
||||
warnings=["Warning from single delete"],
|
||||
),
|
||||
)
|
||||
|
||||
# Execute
|
||||
@@ -364,26 +379,29 @@ class TestDeleteObjectsRecursiveSingleDeleteDetails:
|
||||
client.service.storage.objects["test-bucket/file.txt"] = {"size": 100}
|
||||
|
||||
# Mock service
|
||||
mock_result = {
|
||||
"deleted_count": 0,
|
||||
"failed_count": 0,
|
||||
"deltas_deleted": 0,
|
||||
"references_deleted": 0,
|
||||
"direct_deleted": 0,
|
||||
"other_deleted": 0,
|
||||
}
|
||||
mock_result = RecursiveDeleteResult(
|
||||
bucket="test-bucket",
|
||||
prefix="file.txt",
|
||||
deleted_count=0,
|
||||
failed_count=0,
|
||||
deltas_deleted=0,
|
||||
references_deleted=0,
|
||||
direct_deleted=0,
|
||||
other_deleted=0,
|
||||
)
|
||||
client.service.delete_recursive = Mock(return_value=mock_result)
|
||||
|
||||
# Mock delete_with_delta_suffix
|
||||
with patch("deltaglider.client.delete_with_delta_suffix") as mock_delete:
|
||||
mock_delete.return_value = (
|
||||
"file.txt",
|
||||
{
|
||||
"deleted": True,
|
||||
"type": "direct",
|
||||
"dependent_deltas": 0,
|
||||
"warnings": [],
|
||||
},
|
||||
DeleteResult(
|
||||
key="file.txt",
|
||||
bucket="test-bucket",
|
||||
deleted=True,
|
||||
type="direct",
|
||||
dependent_deltas=0,
|
||||
),
|
||||
)
|
||||
|
||||
# Execute
|
||||
@@ -412,25 +430,28 @@ class TestDeleteObjectsRecursiveSingleDeleteDetails:
|
||||
actual_key = "file.zip.delta" if key == "file.zip" else key
|
||||
return (
|
||||
actual_key,
|
||||
{
|
||||
"deleted": True,
|
||||
"type": "delta",
|
||||
"dependent_deltas": 0,
|
||||
"warnings": [],
|
||||
},
|
||||
DeleteResult(
|
||||
key=actual_key,
|
||||
bucket=bucket,
|
||||
deleted=True,
|
||||
type="delta",
|
||||
dependent_deltas=0,
|
||||
),
|
||||
)
|
||||
|
||||
client_delete_helpers.delete_with_delta_suffix = mock_delete
|
||||
|
||||
# Mock service
|
||||
mock_result = {
|
||||
"deleted_count": 0,
|
||||
"failed_count": 0,
|
||||
"deltas_deleted": 0,
|
||||
"references_deleted": 0,
|
||||
"direct_deleted": 0,
|
||||
"other_deleted": 0,
|
||||
}
|
||||
mock_result = RecursiveDeleteResult(
|
||||
bucket="test-bucket",
|
||||
prefix="file.zip",
|
||||
deleted_count=0,
|
||||
failed_count=0,
|
||||
deltas_deleted=0,
|
||||
references_deleted=0,
|
||||
direct_deleted=0,
|
||||
other_deleted=0,
|
||||
)
|
||||
client.service.delete_recursive = Mock(return_value=mock_result)
|
||||
|
||||
try:
|
||||
@@ -479,26 +500,29 @@ class TestDeleteObjectsRecursiveEdgeCases:
|
||||
client.service.storage.objects["test-bucket/file.txt"] = {"size": 100}
|
||||
|
||||
# Mock service
|
||||
mock_result = {
|
||||
"deleted_count": 0,
|
||||
"failed_count": 0,
|
||||
"deltas_deleted": 0,
|
||||
"references_deleted": 0,
|
||||
"direct_deleted": 0,
|
||||
"other_deleted": 0,
|
||||
}
|
||||
mock_result = RecursiveDeleteResult(
|
||||
bucket="test-bucket",
|
||||
prefix="file.txt",
|
||||
deleted_count=0,
|
||||
failed_count=0,
|
||||
deltas_deleted=0,
|
||||
references_deleted=0,
|
||||
direct_deleted=0,
|
||||
other_deleted=0,
|
||||
)
|
||||
client.service.delete_recursive = Mock(return_value=mock_result)
|
||||
|
||||
# Mock delete_with_delta_suffix to return unknown type
|
||||
with patch("deltaglider.client.delete_with_delta_suffix") as mock_delete:
|
||||
mock_delete.return_value = (
|
||||
"file.txt",
|
||||
{
|
||||
"deleted": True,
|
||||
"type": "unknown_type", # Not in single_counts keys
|
||||
"dependent_deltas": 0,
|
||||
"warnings": [],
|
||||
},
|
||||
DeleteResult(
|
||||
key="file.txt",
|
||||
bucket="test-bucket",
|
||||
deleted=True,
|
||||
type="unknown_type", # Not in single_counts keys
|
||||
dependent_deltas=0,
|
||||
),
|
||||
)
|
||||
|
||||
# Execute
|
||||
|
||||
@@ -243,12 +243,12 @@ class TestSingleDeleteCleanup:
|
||||
result = service.delete(ObjectKey(bucket="test-bucket", key="releases/app.zip.delta"))
|
||||
|
||||
# Verify delta was deleted
|
||||
assert result["deleted"] is True
|
||||
assert result["type"] == "delta"
|
||||
assert result.deleted is True
|
||||
assert result.type == "delta"
|
||||
|
||||
# Verify reference.bin cleanup was triggered
|
||||
assert "cleaned_reference" in result
|
||||
assert result["cleaned_reference"] == "releases/reference.bin"
|
||||
assert result.cleaned_reference is not None
|
||||
assert result.cleaned_reference == "releases/reference.bin"
|
||||
|
||||
# Verify both files were deleted
|
||||
assert mock_storage.delete.call_count == 2
|
||||
@@ -295,11 +295,11 @@ class TestSingleDeleteCleanup:
|
||||
result = service.delete(ObjectKey(bucket="test-bucket", key="releases/app-v1.zip.delta"))
|
||||
|
||||
# Verify delta was deleted
|
||||
assert result["deleted"] is True
|
||||
assert result["type"] == "delta"
|
||||
assert result.deleted is True
|
||||
assert result.type == "delta"
|
||||
|
||||
# Verify reference.bin was NOT cleaned up
|
||||
assert "cleaned_reference" not in result
|
||||
assert result.cleaned_reference is None
|
||||
|
||||
# Verify only the delta was deleted, not reference.bin
|
||||
assert mock_storage.delete.call_count == 1
|
||||
@@ -342,11 +342,11 @@ class TestSingleDeleteCleanup:
|
||||
result = service.delete(ObjectKey(bucket="test-bucket", key="releases/app.zip.delta"))
|
||||
|
||||
# Verify delta was deleted
|
||||
assert result["deleted"] is True
|
||||
assert result["type"] == "delta"
|
||||
assert result.deleted is True
|
||||
assert result.type == "delta"
|
||||
|
||||
# Verify no reference cleanup (since it didn't exist)
|
||||
assert "cleaned_reference" not in result
|
||||
assert result.cleaned_reference is None
|
||||
|
||||
# Only delta should be deleted
|
||||
assert mock_storage.delete.call_count == 1
|
||||
@@ -395,7 +395,7 @@ class TestSingleDeleteCleanup:
|
||||
result = service.delete(ObjectKey(bucket="test-bucket", key="releases/1.0/app.zip.delta"))
|
||||
|
||||
# Should clean up only 1.0/reference.bin
|
||||
assert result["cleaned_reference"] == "releases/1.0/reference.bin"
|
||||
assert result.cleaned_reference == "releases/1.0/reference.bin"
|
||||
|
||||
# Verify correct files deleted
|
||||
delete_calls = [call[0][0] for call in mock_storage.delete.call_args_list]
|
||||
@@ -436,9 +436,9 @@ class TestRecursiveDeleteCleanup:
|
||||
result = service.delete_recursive("test-bucket", "data/")
|
||||
|
||||
# Should delete both delta and reference
|
||||
assert result["deleted_count"] == 2
|
||||
assert result["deltas_deleted"] == 1
|
||||
assert result["references_deleted"] == 1
|
||||
assert result.deleted_count == 2
|
||||
assert result.deltas_deleted == 1
|
||||
assert result.references_deleted == 1
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
||||
@@ -5,6 +5,7 @@ from unittest.mock import Mock, patch
|
||||
import pytest
|
||||
|
||||
from deltaglider.app.cli.main import create_service
|
||||
from deltaglider.core.models import RecursiveDeleteResult
|
||||
from deltaglider.ports.storage import ObjectHead
|
||||
|
||||
|
||||
@@ -28,10 +29,10 @@ class TestRecursiveDeleteReferenceCleanup:
|
||||
|
||||
result = service.delete_recursive("test-bucket", "nonexistent/")
|
||||
|
||||
assert result["deleted_count"] == 0
|
||||
assert result["failed_count"] == 0
|
||||
assert isinstance(result["errors"], list)
|
||||
assert isinstance(result["warnings"], list)
|
||||
assert result.deleted_count == 0
|
||||
assert result.failed_count == 0
|
||||
assert isinstance(result.errors, list)
|
||||
assert isinstance(result.warnings, list)
|
||||
|
||||
def test_delete_recursive_returns_structured_result(self):
|
||||
"""Test that delete_recursive returns a properly structured result."""
|
||||
@@ -57,26 +58,22 @@ class TestRecursiveDeleteReferenceCleanup:
|
||||
|
||||
result = service.delete_recursive("test-bucket", "test/")
|
||||
|
||||
# Verify structure
|
||||
required_keys = [
|
||||
"bucket",
|
||||
"prefix",
|
||||
"deleted_count",
|
||||
"failed_count",
|
||||
"deltas_deleted",
|
||||
"references_deleted",
|
||||
"direct_deleted",
|
||||
"other_deleted",
|
||||
"errors",
|
||||
"warnings",
|
||||
]
|
||||
for key in required_keys:
|
||||
assert key in result, f"Missing key: {key}"
|
||||
# Verify structure - result is a RecursiveDeleteResult dataclass
|
||||
assert hasattr(result, "bucket")
|
||||
assert hasattr(result, "prefix")
|
||||
assert hasattr(result, "deleted_count")
|
||||
assert hasattr(result, "failed_count")
|
||||
assert hasattr(result, "deltas_deleted")
|
||||
assert hasattr(result, "references_deleted")
|
||||
assert hasattr(result, "direct_deleted")
|
||||
assert hasattr(result, "other_deleted")
|
||||
assert hasattr(result, "errors")
|
||||
assert hasattr(result, "warnings")
|
||||
|
||||
assert isinstance(result["deleted_count"], int)
|
||||
assert isinstance(result["failed_count"], int)
|
||||
assert isinstance(result["errors"], list)
|
||||
assert isinstance(result["warnings"], list)
|
||||
assert isinstance(result.deleted_count, int)
|
||||
assert isinstance(result.failed_count, int)
|
||||
assert isinstance(result.errors, list)
|
||||
assert isinstance(result.warnings, list)
|
||||
|
||||
def test_delete_recursive_categorizes_objects_correctly(self):
|
||||
"""Test that delete_recursive correctly categorizes different object types."""
|
||||
@@ -117,12 +114,12 @@ class TestRecursiveDeleteReferenceCleanup:
|
||||
result = service.delete_recursive("test-bucket", "test/")
|
||||
|
||||
# Should categorize correctly - the exact categorization depends on implementation
|
||||
assert result["deltas_deleted"] == 1 # app.zip.delta
|
||||
assert result["references_deleted"] == 1 # reference.bin
|
||||
assert result.deltas_deleted == 1 # app.zip.delta
|
||||
assert result.references_deleted == 1 # reference.bin
|
||||
# Direct and other files may be categorized differently based on metadata detection
|
||||
assert result["direct_deleted"] + result["other_deleted"] == 2 # readme.txt + config.json
|
||||
assert result["deleted_count"] == 4 # total
|
||||
assert result["failed_count"] == 0
|
||||
assert result.direct_deleted + result.other_deleted == 2 # readme.txt + config.json
|
||||
assert result.deleted_count == 4 # total
|
||||
assert result.failed_count == 0
|
||||
|
||||
def test_delete_recursive_handles_storage_errors_gracefully(self):
|
||||
"""Test that delete_recursive handles individual storage errors gracefully."""
|
||||
@@ -151,10 +148,10 @@ class TestRecursiveDeleteReferenceCleanup:
|
||||
result = service.delete_recursive("test-bucket", "test/")
|
||||
|
||||
# Should handle partial failure
|
||||
assert result["deleted_count"] == 1 # good.zip.delta succeeded
|
||||
assert result["failed_count"] == 1 # bad.zip.delta failed
|
||||
assert len(result["errors"]) == 1
|
||||
assert "bad" in result["errors"][0]
|
||||
assert result.deleted_count == 1 # good.zip.delta succeeded
|
||||
assert result.failed_count == 1 # bad.zip.delta failed
|
||||
assert len(result.errors) == 1
|
||||
assert "bad" in result.errors[0]
|
||||
|
||||
def test_affected_deltaspaces_discovery(self):
|
||||
"""Test that the system discovers affected deltaspaces when deleting deltas."""
|
||||
@@ -206,8 +203,8 @@ class TestRecursiveDeleteReferenceCleanup:
|
||||
result = service.delete_recursive("test-bucket", "project/team-a/v1/")
|
||||
|
||||
# Should have discovered and evaluated the parent reference
|
||||
assert result["deleted_count"] >= 1 # At least the delta file
|
||||
assert result["failed_count"] == 0
|
||||
assert result.deleted_count >= 1 # At least the delta file
|
||||
assert result.failed_count == 0
|
||||
|
||||
def test_cli_uses_core_service_method(self):
|
||||
"""Test that CLI rm -r command uses the core service delete_recursive method."""
|
||||
@@ -222,14 +219,12 @@ class TestRecursiveDeleteReferenceCleanup:
|
||||
mock_create_service.return_value = mock_service
|
||||
|
||||
# Mock successful deletion
|
||||
mock_service.delete_recursive.return_value = {
|
||||
"bucket": "test-bucket",
|
||||
"prefix": "test/",
|
||||
"deleted_count": 2,
|
||||
"failed_count": 0,
|
||||
"warnings": [],
|
||||
"errors": [],
|
||||
}
|
||||
mock_service.delete_recursive.return_value = RecursiveDeleteResult(
|
||||
bucket="test-bucket",
|
||||
prefix="test/",
|
||||
deleted_count=2,
|
||||
failed_count=0,
|
||||
)
|
||||
|
||||
result = runner.invoke(cli, ["rm", "-r", "s3://test-bucket/test/"])
|
||||
|
||||
@@ -294,8 +289,8 @@ class TestRecursiveDeleteReferenceCleanup:
|
||||
|
||||
result = service.delete(ObjectKey(bucket="test-bucket", key="test/file.zip.delta"))
|
||||
|
||||
assert result["deleted"]
|
||||
assert result["type"] == "delta"
|
||||
assert result.deleted
|
||||
assert result.type == "delta"
|
||||
|
||||
def test_reference_cleanup_intelligence_basic(self):
|
||||
"""Basic test to verify reference cleanup intelligence is working."""
|
||||
@@ -328,10 +323,10 @@ class TestRecursiveDeleteReferenceCleanup:
|
||||
result = service.delete_recursive("test-bucket", "simple/")
|
||||
|
||||
# Should delete both delta and reference since there are no other dependencies
|
||||
assert result["deleted_count"] == 2
|
||||
assert result["deltas_deleted"] == 1
|
||||
assert result["references_deleted"] == 1
|
||||
assert result["failed_count"] == 0
|
||||
assert result.deleted_count == 2
|
||||
assert result.deltas_deleted == 1
|
||||
assert result.references_deleted == 1
|
||||
assert result.failed_count == 0
|
||||
|
||||
def test_comprehensive_result_validation(self):
|
||||
"""Test that all result fields are properly populated."""
|
||||
@@ -366,31 +361,31 @@ class TestRecursiveDeleteReferenceCleanup:
|
||||
result = service.delete_recursive("test-bucket", "mixed/")
|
||||
|
||||
# Validate all expected fields are present and have correct types
|
||||
assert isinstance(result["bucket"], str)
|
||||
assert isinstance(result["prefix"], str)
|
||||
assert isinstance(result["deleted_count"], int)
|
||||
assert isinstance(result["failed_count"], int)
|
||||
assert isinstance(result["deltas_deleted"], int)
|
||||
assert isinstance(result["references_deleted"], int)
|
||||
assert isinstance(result["direct_deleted"], int)
|
||||
assert isinstance(result["other_deleted"], int)
|
||||
assert isinstance(result["errors"], list)
|
||||
assert isinstance(result["warnings"], list)
|
||||
assert isinstance(result.bucket, str)
|
||||
assert isinstance(result.prefix, str)
|
||||
assert isinstance(result.deleted_count, int)
|
||||
assert isinstance(result.failed_count, int)
|
||||
assert isinstance(result.deltas_deleted, int)
|
||||
assert isinstance(result.references_deleted, int)
|
||||
assert isinstance(result.direct_deleted, int)
|
||||
assert isinstance(result.other_deleted, int)
|
||||
assert isinstance(result.errors, list)
|
||||
assert isinstance(result.warnings, list)
|
||||
|
||||
# Validate counts add up
|
||||
total_by_type = (
|
||||
result["deltas_deleted"]
|
||||
+ result["references_deleted"]
|
||||
+ result["direct_deleted"]
|
||||
+ result["other_deleted"]
|
||||
result.deltas_deleted
|
||||
+ result.references_deleted
|
||||
+ result.direct_deleted
|
||||
+ result.other_deleted
|
||||
)
|
||||
assert result["deleted_count"] == total_by_type
|
||||
assert result.deleted_count == total_by_type
|
||||
|
||||
# Validate specific counts for this scenario
|
||||
assert result["deltas_deleted"] == 1
|
||||
assert result["references_deleted"] == 1
|
||||
assert result.deltas_deleted == 1
|
||||
assert result.references_deleted == 1
|
||||
# Direct and other files may be categorized differently
|
||||
assert result["direct_deleted"] + result["other_deleted"] == 2
|
||||
assert result.direct_deleted + result.other_deleted == 2
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
||||
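The hunks above move the CLI tests from dict lookups to attribute access on a typed result. As a rough sketch of the shape that attribute access implies (the field names come straight from the assertions; the dataclass layout itself is an assumption, not the project's actual definition):

# Hypothetical sketch of a result type matching the attributes asserted
# above; the real RecursiveDeleteResult in deltaglider may differ.
from dataclasses import dataclass, field

@dataclass
class RecursiveDeleteResult:
    bucket: str
    prefix: str
    deleted_count: int = 0
    failed_count: int = 0
    deltas_deleted: int = 0
    references_deleted: int = 0
    direct_deleted: int = 0
    other_deleted: int = 0
    warnings: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)

A typed result like this also explains why the tests can check result.deleted_count == total_by_type without the key-typo risk the old dict access carried.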
271
tests/integration/test_s3_migration.py
Normal file
@@ -0,0 +1,271 @@
"""Test S3-to-S3 migration functionality."""

from unittest.mock import MagicMock, patch

import pytest

from deltaglider.app.cli.aws_compat import migrate_s3_to_s3
from deltaglider.core import DeltaService
from deltaglider.ports import ObjectHead


@pytest.fixture
def mock_service():
    """Create a mock DeltaService."""
    service = MagicMock(spec=DeltaService)
    service.storage = MagicMock()
    return service


def test_migrate_s3_to_s3_with_resume(mock_service):
    """Test migration with resume support (skips existing files)."""
    # Setup mock storage with source files
    source_objects = [
        ObjectHead(
            key="file1.zip",
            size=1024,
            etag="abc123",
            last_modified="2024-01-01T00:00:00Z",
            metadata={},
        ),
        ObjectHead(
            key="file2.zip",
            size=2048,
            etag="def456",
            last_modified="2024-01-01T00:00:00Z",
            metadata={},
        ),
        ObjectHead(
            key="subdir/file3.zip",
            size=512,
            etag="ghi789",
            last_modified="2024-01-01T00:00:00Z",
            metadata={},
        ),
    ]

    # Destination already has file1.zip (as .delta)
    dest_objects = [
        ObjectHead(
            key="file1.zip.delta",
            size=100,
            last_modified="2024-01-02T00:00:00Z",
            etag="delta123",
            metadata={},
        ),
    ]

    # Configure mock to return appropriate objects
    def list_side_effect(prefix):
        if "source-bucket" in prefix:
            return iter(source_objects)
        elif "dest-bucket" in prefix:
            return iter(dest_objects)
        return iter([])

    mock_service.storage.list.side_effect = list_side_effect

    # Mock the copy operation and click functions
    # Use quiet=True to skip EC2 detection logging
    with patch("deltaglider.app.cli.aws_compat.copy_s3_to_s3") as mock_copy:
        with patch("deltaglider.app.cli.aws_compat.click.confirm", return_value=True):
            migrate_s3_to_s3(
                mock_service,
                "s3://source-bucket/",
                "s3://dest-bucket/",
                exclude=None,
                include=None,
                quiet=True,  # Skip EC2 detection and logging
                no_delta=False,
                max_ratio=None,
                dry_run=False,
                skip_confirm=False,
            )

    # Should copy only file2.zip and subdir/file3.zip (file1 already exists)
    assert mock_copy.call_count == 2

    # Verify the files being migrated
    call_args = [call[0] for call in mock_copy.call_args_list]
    migrated_files = [(args[1], args[2]) for args in call_args]

    assert ("s3://source-bucket/file2.zip", "s3://dest-bucket/file2.zip") in migrated_files
    assert (
        "s3://source-bucket/subdir/file3.zip",
        "s3://dest-bucket/subdir/file3.zip",
    ) in migrated_files


def test_migrate_s3_to_s3_dry_run(mock_service):
    """Test dry run mode shows what would be migrated without actually migrating."""
    source_objects = [
        ObjectHead(
            key="file1.zip",
            size=1024,
            last_modified="2024-01-01T00:00:00Z",
            etag="abc123",
            metadata={},
        ),
    ]

    mock_service.storage.list.return_value = iter(source_objects)

    # Mock the copy operation and EC2 detection
    with patch("deltaglider.app.cli.aws_compat.copy_s3_to_s3") as mock_copy:
        with patch("deltaglider.app.cli.aws_compat.click.echo") as mock_echo:
            with patch("deltaglider.app.cli.aws_compat.log_aws_region"):
                migrate_s3_to_s3(
                    mock_service,
                    "s3://source-bucket/",
                    "s3://dest-bucket/",
                    exclude=None,
                    include=None,
                    quiet=False,  # Allow output to test dry run messages
                    no_delta=False,
                    max_ratio=None,
                    dry_run=True,
                    skip_confirm=False,
                )

    # Should not actually copy anything in dry run mode
    mock_copy.assert_not_called()

    # Should show dry run message
    echo_calls = [str(call[0][0]) for call in mock_echo.call_args_list if call[0]]
    assert any("DRY RUN MODE" in msg for msg in echo_calls)


def test_migrate_s3_to_s3_with_filters(mock_service):
    """Test migration with include/exclude filters."""
    source_objects = [
        ObjectHead(
            key="file1.zip",
            size=1024,
            last_modified="2024-01-01T00:00:00Z",
            etag="abc123",
            metadata={},
        ),
        ObjectHead(
            key="file2.log",
            size=256,
            last_modified="2024-01-01T00:00:00Z",
            etag="def456",
            metadata={},
        ),
        ObjectHead(
            key="file3.tar",
            size=512,
            last_modified="2024-01-01T00:00:00Z",
            etag="ghi789",
            metadata={},
        ),
    ]

    mock_service.storage.list.return_value = iter(source_objects)

    # Mock the copy operation
    with patch("deltaglider.app.cli.aws_compat.copy_s3_to_s3") as mock_copy:
        with patch("click.echo"):
            with patch("deltaglider.app.cli.aws_compat.click.confirm", return_value=True):
                # Exclude .log files
                migrate_s3_to_s3(
                    mock_service,
                    "s3://source-bucket/",
                    "s3://dest-bucket/",
                    exclude="*.log",
                    include=None,
                    quiet=True,  # Skip EC2 detection
                    no_delta=False,
                    max_ratio=None,
                    dry_run=False,
                    skip_confirm=False,
                )

    # Should copy file1.zip and file3.tar, but not file2.log
    assert mock_copy.call_count == 2

    call_args = [call[0] for call in mock_copy.call_args_list]
    migrated_sources = [args[1] for args in call_args]

    assert "s3://source-bucket/file1.zip" in migrated_sources
    assert "s3://source-bucket/file3.tar" in migrated_sources
    assert "s3://source-bucket/file2.log" not in migrated_sources


def test_migrate_s3_to_s3_skip_confirm(mock_service):
    """Test skipping confirmation prompt with skip_confirm=True."""
    source_objects = [
        ObjectHead(
            key="file1.zip",
            size=1024,
            last_modified="2024-01-01T00:00:00Z",
            etag="abc123",
            metadata={},
        ),
    ]

    mock_service.storage.list.return_value = iter(source_objects)

    with patch("deltaglider.app.cli.aws_compat.copy_s3_to_s3") as mock_copy:
        with patch("click.echo"):
            with patch("deltaglider.app.cli.aws_compat.click.confirm") as mock_confirm:
                migrate_s3_to_s3(
                    mock_service,
                    "s3://source-bucket/",
                    "s3://dest-bucket/",
                    exclude=None,
                    include=None,
                    quiet=True,  # Skip EC2 detection
                    no_delta=False,
                    max_ratio=None,
                    dry_run=False,
                    skip_confirm=True,  # Skip confirmation
                )

    # Should not ask for confirmation
    mock_confirm.assert_not_called()

    # Should still perform the copy
    mock_copy.assert_called_once()


def test_migrate_s3_to_s3_with_prefix(mock_service):
    """Test migration with source and destination prefixes."""
    source_objects = [
        ObjectHead(
            key="data/file1.zip",
            size=1024,
            last_modified="2024-01-01T00:00:00Z",
            etag="abc123",
            metadata={},
        ),
    ]

    def list_side_effect(prefix):
        if "source-bucket/data" in prefix:
            return iter(source_objects)
        return iter([])

    mock_service.storage.list.side_effect = list_side_effect

    with patch("deltaglider.app.cli.aws_compat.copy_s3_to_s3") as mock_copy:
        with patch("click.echo"):
            with patch("deltaglider.app.cli.aws_compat.click.confirm", return_value=True):
                migrate_s3_to_s3(
                    mock_service,
                    "s3://source-bucket/data/",
                    "s3://dest-bucket/archive/",
                    exclude=None,
                    include=None,
                    quiet=True,  # Skip EC2 detection
                    no_delta=False,
                    max_ratio=None,
                    dry_run=False,
                    skip_confirm=False,
                )

    # Verify the correct destination path is used
    mock_copy.assert_called_once()
    call_args = mock_copy.call_args[0]
    assert call_args[1] == "s3://source-bucket/data/file1.zip"
    assert call_args[2] == "s3://dest-bucket/archive/file1.zip"
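The resume test above relies on destination keys being stored with a .delta suffix while source keys are plain. A minimal sketch of that skip check, assuming suffix-stripping is how existing files are matched (the helper name is illustrative, not deltaglider's API):

# Illustrative resume filter; keys_to_migrate is a hypothetical helper.
def keys_to_migrate(source_keys: list[str], dest_keys: list[str]) -> list[str]:
    # Normalize destination keys: "file1.zip.delta" counts as "file1.zip".
    existing = {k.removesuffix(".delta") for k in dest_keys}
    return [k for k in source_keys if k not in existing]

# keys_to_migrate(["file1.zip", "file2.zip"], ["file1.zip.delta"])
# -> ["file2.zip"], matching the two-copy expectation in the test above.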
@@ -50,7 +50,7 @@ class TestStatsCommand:
 
         # Verify client was called correctly
         mock_client.get_bucket_stats.assert_called_once_with(
-            "test-bucket", detailed_stats=False
+            "test-bucket", mode="quick", use_cache=True, refresh_cache=False
         )
 
     def test_stats_json_output_detailed(self):
@@ -79,7 +79,48 @@ class TestStatsCommand:
         assert output["average_compression_ratio"] == 0.95
 
         # Verify detailed flag was passed
-        mock_client.get_bucket_stats.assert_called_once_with("test-bucket", detailed_stats=True)
+        mock_client.get_bucket_stats.assert_called_once_with(
+            "test-bucket", mode="detailed", use_cache=True, refresh_cache=False
+        )
 
+    def test_stats_json_output_sampled(self):
+        """Test stats command with sampled JSON output."""
+        mock_stats = BucketStats(
+            bucket="test-bucket",
+            object_count=5,
+            total_size=2000000,
+            compressed_size=100000,
+            space_saved=1900000,
+            average_compression_ratio=0.95,
+            delta_objects=5,
+            direct_objects=0,
+        )
+
+        with patch("deltaglider.client.DeltaGliderClient") as mock_client_class:
+            mock_client = Mock()
+            mock_client.get_bucket_stats.return_value = mock_stats
+            mock_client_class.return_value = mock_client
+
+            runner = CliRunner()
+            result = runner.invoke(cli, ["stats", "test-bucket", "--sampled", "--json"])
+
+            assert result.exit_code == 0
+            mock_client.get_bucket_stats.assert_called_once_with(
+                "test-bucket", mode="sampled", use_cache=True, refresh_cache=False
+            )
+
+    def test_stats_sampled_and_detailed_conflict(self):
+        """--sampled and --detailed flags must be mutually exclusive."""
+
+        with patch("deltaglider.client.DeltaGliderClient") as mock_client_class:
+            mock_client = Mock()
+            mock_client_class.return_value = mock_client
+
+            runner = CliRunner()
+            result = runner.invoke(cli, ["stats", "test-bucket", "--sampled", "--detailed"])
+
+            assert result.exit_code == 1
+            assert "cannot be used together" in result.output
+
     def test_stats_human_readable_output(self):
         """Test stats command with human-readable output."""
@@ -156,7 +197,7 @@ class TestStatsCommand:
         assert result.exit_code == 0
         # Verify bucket name was parsed correctly from S3 URL
         mock_client.get_bucket_stats.assert_called_once_with(
-            "test-bucket", detailed_stats=False
+            "test-bucket", mode="quick", use_cache=True, refresh_cache=False
         )
 
     def test_stats_with_s3_url_trailing_slash(self):
@@ -183,7 +224,7 @@ class TestStatsCommand:
         assert result.exit_code == 0
         # Verify bucket name was parsed correctly from S3 URL with trailing slash
         mock_client.get_bucket_stats.assert_called_once_with(
-            "test-bucket", detailed_stats=False
+            "test-bucket", mode="quick", use_cache=True, refresh_cache=False
         )
 
     def test_stats_with_s3_url_with_prefix(self):
@@ -210,5 +251,5 @@ class TestStatsCommand:
         assert result.exit_code == 0
         # Verify only bucket name was extracted, prefix ignored
         mock_client.get_bucket_stats.assert_called_once_with(
-            "test-bucket", detailed_stats=False
+            "test-bucket", mode="quick", use_cache=True, refresh_cache=False
         )
 
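These hunks replace the boolean detailed_stats flag with a three-valued mode plus cache controls, and the new conflict test pins --sampled and --detailed as mutually exclusive. A sketch of the flag-to-mode resolution the tests imply (how the real command wires this up may differ):

# Hedged sketch of the resolution order implied by the tests above.
def resolve_stats_mode(sampled: bool, detailed: bool) -> str:
    if sampled and detailed:
        raise SystemExit("--sampled and --detailed cannot be used together")
    if detailed:
        return "detailed"
    if sampled:
        return "sampled"
    return "quick"  # default when neither flag is given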
@@ -50,10 +50,10 @@ class TestDeltaServicePut:
         ref_sha = service.hasher.sha256(io.BytesIO(ref_content))
 
         ref_metadata = {
-            "tool": "deltaglider/0.1.0",
-            "source_name": "original.zip",
-            "file_sha256": ref_sha,
-            "created_at": "2025-01-01T00:00:00Z",
+            "dg-tool": "deltaglider/0.1.0",
+            "dg-source-name": "original.zip",
+            "dg-file-sha256": ref_sha,
+            "dg-created-at": "2025-01-01T00:00:00Z",
         }
         mock_storage.head.return_value = ObjectHead(
             key="test/prefix/reference.bin",
@@ -98,7 +98,7 @@ class TestDeltaServicePut:
         ref_sha = service.hasher.sha256(io.BytesIO(ref_content))
 
         ref_metadata = {
-            "file_sha256": ref_sha,
+            "dg-file-sha256": ref_sha,
         }
         mock_storage.head.return_value = ObjectHead(
             key="test/prefix/reference.bin",
@@ -200,15 +200,15 @@ class TestDeltaServiceVerify:
         ref_sha = service.hasher.sha256(io.BytesIO(ref_content))
 
         delta_metadata = {
-            "tool": "deltaglider/0.1.0",
-            "original_name": "file.zip",
-            "file_sha256": test_sha,
-            "file_size": str(len(test_content)),
-            "created_at": "2025-01-01T00:00:00Z",
-            "ref_key": "test/reference.bin",
-            "ref_sha256": ref_sha,
-            "delta_size": "100",
-            "delta_cmd": "xdelta3 -e -9 -s reference.bin file.zip file.zip.delta",
+            "dg-tool": "deltaglider/0.1.0",
+            "dg-original-name": "file.zip",
+            "dg-file-sha256": test_sha,
+            "dg-file-size": str(len(test_content)),
+            "dg-created-at": "2025-01-01T00:00:00Z",
+            "dg-ref-key": "test/reference.bin",
+            "dg-ref-sha256": ref_sha,
+            "dg-delta-size": "100",
+            "dg-delta-cmd": "xdelta3 -e -9 -s reference.bin file.zip file.zip.delta",
         }
         mock_storage.head.return_value = ObjectHead(
             key="test/file.zip.delta",
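The metadata hunks namespace every DeltaGlider key with a dg- prefix (and hyphenate the old snake_case names), which keeps tool metadata distinguishable from arbitrary user metadata on the same object. A hypothetical helper showing the renaming pattern, not the project's actual migration code:

# Illustrative only: maps legacy keys like "file_sha256" to "dg-file-sha256".
def namespace_metadata(metadata: dict[str, str]) -> dict[str, str]:
    return {
        (key if key.startswith("dg-") else "dg-" + key.replace("_", "-")): value
        for key, value in metadata.items()
    }

# namespace_metadata({"ref_key": "test/reference.bin"})
# -> {"dg-ref-key": "test/reference.bin"}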
25
tests/unit/test_delta_extensions.py
Normal file
@@ -0,0 +1,25 @@
"""Tests for shared delta extension policy."""

from deltaglider.core.delta_extensions import (
    DEFAULT_COMPOUND_DELTA_EXTENSIONS,
    DEFAULT_DELTA_EXTENSIONS,
    is_delta_candidate,
)


def test_is_delta_candidate_matches_default_extensions():
    """All default extensions should be detected as delta candidates."""
    for ext in DEFAULT_DELTA_EXTENSIONS:
        assert is_delta_candidate(f"file{ext}")


def test_is_delta_candidate_matches_compound_extensions():
    """Compound extensions should be handled even with multiple suffixes."""
    for ext in DEFAULT_COMPOUND_DELTA_EXTENSIONS:
        assert is_delta_candidate(f"file{ext}")


def test_is_delta_candidate_rejects_other_extensions():
    """Non delta-friendly extensions should return False."""
    assert not is_delta_candidate("document.txt")
    assert not is_delta_candidate("image.jpeg")
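One way such a policy can be implemented is a case-insensitive suffix match over both extension sets; compound suffixes like .tar.gz work because str.endswith sees the whole tail. The sets below are illustrative stand-ins, not the project's real defaults:

# Sketch under assumed extension sets; deltaglider's defaults may differ.
SIMPLE_EXTENSIONS = {".zip", ".jar", ".tar"}
COMPOUND_EXTENSIONS = {".tar.gz", ".tar.bz2"}

def is_delta_candidate_sketch(name: str) -> bool:
    lowered = name.lower()
    return any(
        lowered.endswith(ext)
        for ext in SIMPLE_EXTENSIONS | COMPOUND_EXTENSIONS
    )

assert is_delta_candidate_sketch("release.TAR.GZ")
assert not is_delta_candidate_sketch("image.jpeg")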
112
tests/unit/test_object_listing.py
Normal file
@@ -0,0 +1,112 @@
"""Unit tests for object_listing pagination."""

from unittest.mock import Mock

from deltaglider.core.object_listing import list_all_objects, list_objects_page


def test_list_objects_page_passes_continuation_token():
    """Test that list_objects_page passes continuation_token to storage."""
    storage = Mock()
    storage.list_objects.return_value = {
        "objects": [],
        "common_prefixes": [],
        "is_truncated": False,
        "next_continuation_token": None,
        "key_count": 0,
    }

    list_objects_page(
        storage,
        bucket="test-bucket",
        continuation_token="test-token",
    )

    # Verify continuation_token was passed
    storage.list_objects.assert_called_once()
    call_kwargs = storage.list_objects.call_args.kwargs
    assert call_kwargs["continuation_token"] == "test-token"


def test_list_all_objects_uses_continuation_token_for_pagination():
    """Test that list_all_objects uses continuation_token (not start_after) for pagination."""
    storage = Mock()

    # Mock 3 pages of results
    responses = [
        {
            "objects": [{"key": f"obj{i}"} for i in range(1000)],
            "common_prefixes": [],
            "is_truncated": True,
            "next_continuation_token": "token1",
            "key_count": 1000,
        },
        {
            "objects": [{"key": f"obj{i}"} for i in range(1000, 2000)],
            "common_prefixes": [],
            "is_truncated": True,
            "next_continuation_token": "token2",
            "key_count": 1000,
        },
        {
            "objects": [{"key": f"obj{i}"} for i in range(2000, 2500)],
            "common_prefixes": [],
            "is_truncated": False,
            "next_continuation_token": None,
            "key_count": 500,
        },
    ]

    storage.list_objects.side_effect = responses

    result = list_all_objects(
        storage,
        bucket="test-bucket",
        prefix="",
    )

    # Should have made 3 calls
    assert storage.list_objects.call_count == 3

    # Should have collected all objects
    assert len(result.objects) == 2500

    # Should not be truncated
    assert not result.is_truncated

    # Verify the calls used continuation_token correctly
    calls = storage.list_objects.call_args_list
    assert len(calls) == 3

    # First call should have no continuation_token
    assert calls[0].kwargs.get("continuation_token") is None

    # Second call should use token1
    assert calls[1].kwargs.get("continuation_token") == "token1"

    # Third call should use token2
    assert calls[2].kwargs.get("continuation_token") == "token2"


def test_list_all_objects_prevents_infinite_loop():
    """Test that list_all_objects has max_iterations protection."""
    storage = Mock()

    # Mock infinite pagination (always returns more)
    storage.list_objects.return_value = {
        "objects": [{"key": "obj"}],
        "common_prefixes": [],
        "is_truncated": True,
        "next_continuation_token": "token",
        "key_count": 1,
    }

    result = list_all_objects(
        storage,
        bucket="test-bucket",
        max_iterations=10,  # Low limit for testing
    )

    # Should stop at max_iterations
    assert storage.list_objects.call_count == 10
    assert result.is_truncated
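The three tests pin the pagination contract: thread next_continuation_token back into the next call, and cap the loop so a misbehaving backend cannot spin forever. A sketch of that loop (the page dict shape is taken from the mocks above; the function name and return shape are illustrative):

# Hedged sketch of continuation-token pagination with an iteration cap.
def collect_all_objects(storage, bucket, prefix="", max_iterations=1000):
    objects, token, truncated = [], None, False
    for _ in range(max_iterations):
        page = storage.list_objects(
            bucket=bucket, prefix=prefix, continuation_token=token
        )
        objects.extend(page["objects"])
        truncated = page["is_truncated"]
        if not truncated:
            break
        token = page["next_continuation_token"]
    return objects, truncated  # truncated stays True if the cap was hit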
44
tests/unit/test_s3_uri.py
Normal file
@@ -0,0 +1,44 @@
"""Tests for S3 URI helpers."""

import pytest

from deltaglider.core.s3_uri import build_s3_url, is_s3_url, parse_s3_url


def test_is_s3_url_detects_scheme() -> None:
    """is_s3_url should only match the S3 scheme."""
    assert is_s3_url("s3://bucket/path")
    assert not is_s3_url("https://example.com/object")


def test_parse_s3_url_returns_bucket_and_key() -> None:
    """Parsing should split bucket and key correctly."""
    parsed = parse_s3_url("s3://my-bucket/path/to/object.txt")
    assert parsed.bucket == "my-bucket"
    assert parsed.key == "path/to/object.txt"


def test_parse_strips_trailing_slash_when_requested() -> None:
    """strip_trailing_slash should normalise directory-style URLs."""
    parsed = parse_s3_url("s3://my-bucket/path/to/", strip_trailing_slash=True)
    assert parsed.bucket == "my-bucket"
    assert parsed.key == "path/to"


def test_parse_requires_key_when_configured() -> None:
    """allow_empty_key=False should reject bucket-only URLs."""
    with pytest.raises(ValueError):
        parse_s3_url("s3://bucket-only", allow_empty_key=False)


def test_build_s3_url_round_trip() -> None:
    """build_s3_url should round-trip with parse_s3_url."""
    url = build_s3_url("bucket", "dir/file.tar")
    parsed = parse_s3_url(url)
    assert parsed.bucket == "bucket"
    assert parsed.key == "dir/file.tar"


def test_build_s3_url_for_bucket_root() -> None:
    """When key is missing, build_s3_url should omit the trailing slash."""
    assert build_s3_url("root-bucket") == "s3://root-bucket"
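A compact parser satisfying these tests can be built on str.partition; the class and function below are illustrative sketches, not the real deltaglider.core.s3_uri implementation:

from typing import NamedTuple

class ParsedS3UrlSketch(NamedTuple):
    bucket: str
    key: str

def parse_s3_url_sketch(url, *, strip_trailing_slash=False, allow_empty_key=True):
    # "s3://my-bucket/path/to/object.txt" -> ("my-bucket", "path/to/object.txt")
    if not url.startswith("s3://"):
        raise ValueError(f"not an S3 URL: {url}")
    bucket, _, key = url[len("s3://"):].partition("/")
    if strip_trailing_slash:
        key = key.rstrip("/")
    if not key and not allow_empty_key:
        raise ValueError("S3 URL must include an object key")
    return ParsedS3UrlSketch(bucket, key)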
479
tests/unit/test_stats_algorithm.py
Normal file
@@ -0,0 +1,479 @@
"""Exhaustive tests for the bucket statistics algorithm."""

from unittest.mock import MagicMock, Mock, patch

import pytest

from deltaglider.client_operations.stats import get_bucket_stats


class TestBucketStatsAlgorithm:
    """Test suite for get_bucket_stats algorithm."""

    @pytest.fixture
    def mock_client(self):
        """Create a mock DeltaGliderClient."""
        client = Mock()
        client.service = Mock()
        client.service.storage = Mock()
        client.service.logger = Mock()
        return client

    def test_empty_bucket(self, mock_client):
        """Test statistics for an empty bucket."""
        # Setup: Empty bucket
        mock_client.service.storage.list_objects.return_value = {
            "objects": [],
            "is_truncated": False,
        }

        # Execute
        stats = get_bucket_stats(mock_client, "empty-bucket")

        # Verify
        assert stats.bucket == "empty-bucket"
        assert stats.object_count == 0
        assert stats.total_size == 0
        assert stats.compressed_size == 0
        assert stats.space_saved == 0
        assert stats.average_compression_ratio == 0.0
        assert stats.delta_objects == 0
        assert stats.direct_objects == 0

    def test_bucket_with_only_direct_files(self, mock_client):
        """Test bucket with only direct files (no compression)."""
        # Setup: Bucket with 3 direct files
        mock_client.service.storage.list_objects.return_value = {
            "objects": [
                {"key": "file1.pdf", "size": 1000000, "last_modified": "2024-01-01"},
                {"key": "file2.html", "size": 500000, "last_modified": "2024-01-02"},
                {"key": "file3.txt", "size": 250000, "last_modified": "2024-01-03"},
            ],
            "is_truncated": False,
        }
        mock_client.service.storage.head.return_value = None

        # Execute
        stats = get_bucket_stats(mock_client, "direct-only-bucket")

        # Verify
        assert stats.object_count == 3
        assert stats.total_size == 1750000  # Sum of all files
        assert stats.compressed_size == 1750000  # Same as total (no compression)
        assert stats.space_saved == 0
        assert stats.average_compression_ratio == 0.0
        assert stats.delta_objects == 0
        assert stats.direct_objects == 3

    def test_bucket_with_delta_compression(self, mock_client):
        """Test bucket with delta-compressed files."""
        # Setup: Bucket with reference.bin and 2 delta files
        mock_client.service.storage.list_objects.return_value = {
            "objects": [
                {"key": "reference.bin", "size": 20000000, "last_modified": "2024-01-01"},
                {"key": "file1.zip.delta", "size": 50000, "last_modified": "2024-01-02"},
                {"key": "file2.zip.delta", "size": 60000, "last_modified": "2024-01-03"},
            ],
            "is_truncated": False,
        }

        # Mock metadata for delta files
        def mock_head(path):
            if "file1.zip.delta" in path:
                head = Mock()
                head.metadata = {"dg-file-size": "19500000", "compression_ratio": "0.997"}
                return head
            elif "file2.zip.delta" in path:
                head = Mock()
                head.metadata = {"dg-file-size": "19600000", "compression_ratio": "0.997"}
                return head
            return None

        mock_client.service.storage.head.side_effect = mock_head

        # Execute
        stats = get_bucket_stats(mock_client, "compressed-bucket", mode="detailed")

        # Verify
        assert stats.object_count == 2  # Only delta files counted (not reference.bin)
        assert stats.total_size == 39100000  # 19.5M + 19.6M
        assert stats.compressed_size == 20110000  # reference (20M) + deltas (50K + 60K)
        assert stats.space_saved == 18990000  # ~19MB saved
        assert stats.average_compression_ratio > 0.48  # ~48.6% compression
        assert stats.delta_objects == 2
        assert stats.direct_objects == 0

    def test_orphaned_reference_bin_detection(self, mock_client):
        """Test detection of orphaned reference.bin files."""
        # Setup: Bucket with reference.bin but no delta files
        mock_client.service.storage.list_objects.return_value = {
            "objects": [
                {"key": "reference.bin", "size": 20000000, "last_modified": "2024-01-01"},
                {"key": "regular.pdf", "size": 1000000, "last_modified": "2024-01-02"},
            ],
            "is_truncated": False,
        }
        mock_client.service.storage.head.return_value = None

        # Execute
        stats = get_bucket_stats(mock_client, "orphaned-ref-bucket")

        # Verify stats
        assert stats.object_count == 1  # Only regular.pdf
        assert stats.total_size == 1000000  # Only regular.pdf size
        assert stats.compressed_size == 1000000  # reference.bin NOT included
        assert stats.space_saved == 0
        assert stats.delta_objects == 0
        assert stats.direct_objects == 1

        # Verify warning was logged
        warning_calls = mock_client.service.logger.warning.call_args_list
        assert any("ORPHANED REFERENCE FILE" in str(call) for call in warning_calls)
        assert any("20,000,000 bytes" in str(call) for call in warning_calls)
        assert any(
            "aws s3 rm s3://orphaned-ref-bucket/reference.bin" in str(call)
            for call in warning_calls
        )

    def test_mixed_bucket(self, mock_client):
        """Test bucket with both delta and direct files."""
        # Setup: Mixed bucket
        mock_client.service.storage.list_objects.return_value = {
            "objects": [
                {"key": "pro/reference.bin", "size": 20000000, "last_modified": "2024-01-01"},
                {"key": "pro/v1.zip.delta", "size": 50000, "last_modified": "2024-01-02"},
                {"key": "pro/v2.zip.delta", "size": 60000, "last_modified": "2024-01-03"},
                {"key": "docs/readme.pdf", "size": 500000, "last_modified": "2024-01-04"},
                {"key": "docs/manual.html", "size": 300000, "last_modified": "2024-01-05"},
            ],
            "is_truncated": False,
        }

        # Mock metadata for delta files
        def mock_head(path):
            if "v1.zip.delta" in path:
                head = Mock()
                head.metadata = {"dg-file-size": "19500000"}
                return head
            elif "v2.zip.delta" in path:
                head = Mock()
                head.metadata = {"dg-file-size": "19600000"}
                return head
            return None

        mock_client.service.storage.head.side_effect = mock_head

        # Execute
        stats = get_bucket_stats(mock_client, "mixed-bucket", mode="detailed")

        # Verify
        assert stats.object_count == 4  # 2 delta + 2 direct files
        assert stats.total_size == 39900000  # 19.5M + 19.6M + 0.5M + 0.3M
        assert stats.compressed_size == 20910000  # ref (20M) + deltas (110K) + direct (800K)
        assert stats.space_saved == 18990000
        assert stats.delta_objects == 2
        assert stats.direct_objects == 2

    def test_sha1_files_included(self, mock_client):
        """Test that .sha1 checksum files are counted properly."""
        # Setup: Bucket with .sha1 files
        mock_client.service.storage.list_objects.return_value = {
            "objects": [
                {"key": "file1.zip", "size": 1000000, "last_modified": "2024-01-01"},
                {"key": "file1.zip.sha1", "size": 41, "last_modified": "2024-01-01"},
                {"key": "file2.tar", "size": 2000000, "last_modified": "2024-01-02"},
                {"key": "file2.tar.sha1", "size": 41, "last_modified": "2024-01-02"},
            ],
            "is_truncated": False,
        }
        mock_client.service.storage.head.return_value = None

        # Execute
        stats = get_bucket_stats(mock_client, "sha1-bucket")

        # Verify - .sha1 files ARE counted
        assert stats.object_count == 4
        assert stats.total_size == 3000082  # All files including .sha1
        assert stats.compressed_size == 3000082
        assert stats.direct_objects == 4

    def test_multiple_deltaspaces(self, mock_client):
        """Test bucket with multiple deltaspaces (different prefixes)."""
        # Setup: Multiple deltaspaces
        mock_client.service.storage.list_objects.return_value = {
            "objects": [
                {"key": "pro/reference.bin", "size": 20000000, "last_modified": "2024-01-01"},
                {"key": "pro/v1.zip.delta", "size": 50000, "last_modified": "2024-01-02"},
                {
                    "key": "enterprise/reference.bin",
                    "size": 25000000,
                    "last_modified": "2024-01-03",
                },
                {"key": "enterprise/v1.zip.delta", "size": 70000, "last_modified": "2024-01-04"},
            ],
            "is_truncated": False,
        }

        # Mock metadata
        def mock_head(path):
            if "pro/v1.zip.delta" in path:
                head = Mock()
                head.metadata = {"dg-file-size": "19500000"}
                return head
            elif "enterprise/v1.zip.delta" in path:
                head = Mock()
                head.metadata = {"dg-file-size": "24500000"}
                return head
            return None

        mock_client.service.storage.head.side_effect = mock_head

        # Execute
        stats = get_bucket_stats(mock_client, "multi-deltaspace-bucket", mode="detailed")

        # Verify
        assert stats.object_count == 2  # Only delta files
        assert stats.total_size == 44000000  # 19.5M + 24.5M
        assert stats.compressed_size == 45120000  # Both references + both deltas
        assert stats.delta_objects == 2
        assert stats.direct_objects == 0

    def test_pagination_handling(self, mock_client):
        """Test handling of paginated results."""
        # Setup: Paginated responses
        mock_client.service.storage.list_objects.side_effect = [
            {
                "objects": [
                    {"key": f"file{i}.txt", "size": 1000, "last_modified": "2024-01-01"}
                    for i in range(1000)
                ],
                "is_truncated": True,
                "next_continuation_token": "token1",
            },
            {
                "objects": [
                    {"key": f"file{i}.txt", "size": 1000, "last_modified": "2024-01-01"}
                    for i in range(1000, 1500)
                ],
                "is_truncated": False,
            },
        ]
        mock_client.service.storage.head.return_value = None

        # Execute
        stats = get_bucket_stats(mock_client, "paginated-bucket")

        # Verify
        assert stats.object_count == 1500
        assert stats.total_size == 1500000
        assert stats.compressed_size == 1500000
        assert stats.direct_objects == 1500

        # Verify pagination was handled
        assert mock_client.service.storage.list_objects.call_count == 2

    def test_delta_file_without_metadata(self, mock_client):
        """Test handling of delta files with missing metadata in quick mode."""
        # Setup: Delta file without metadata
        mock_client.service.storage.list_objects.return_value = {
            "objects": [
                {"key": "reference.bin", "size": 20000000, "last_modified": "2024-01-01"},
                {"key": "file.zip.delta", "size": 50000, "last_modified": "2024-01-02"},
            ],
            "is_truncated": False,
        }

        # No metadata available (quick mode doesn't fetch metadata)
        mock_client.service.storage.head.return_value = None

        # Execute in quick mode (default)
        stats = get_bucket_stats(mock_client, "no-metadata-bucket", mode="quick")

        # Verify - without metadata, original size cannot be calculated
        assert stats.object_count == 1
        assert stats.total_size == 0  # Cannot calculate without metadata
        assert stats.compressed_size == 20050000  # reference + delta
        assert stats.space_saved == 0  # Cannot calculate without metadata
        assert stats.delta_objects == 1

        # Verify warning was logged about incomplete stats in quick mode
        warning_calls = mock_client.service.logger.warning.call_args_list
        assert any("Quick mode cannot calculate" in str(call) for call in warning_calls)

    def test_parallel_metadata_fetching(self, mock_client):
        """Test that metadata is fetched in parallel for performance."""
        # Setup: Many delta files
        num_deltas = 50
        objects = [{"key": "reference.bin", "size": 20000000, "last_modified": "2024-01-01"}]
        objects.extend(
            [
                {
                    "key": f"file{i}.zip.delta",
                    "size": 50000 + i,
                    "last_modified": f"2024-01-{i + 2:02d}",
                }
                for i in range(num_deltas)
            ]
        )

        mock_client.service.storage.list_objects.return_value = {
            "objects": objects,
            "is_truncated": False,
        }

        # Mock metadata
        def mock_head(path):
            head = Mock()
            head.metadata = {"dg-file-size": "19500000"}
            return head

        mock_client.service.storage.head.side_effect = mock_head

        # Execute with mocked ThreadPoolExecutor
        with patch("concurrent.futures.ThreadPoolExecutor") as mock_executor:
            mock_pool = MagicMock()
            mock_executor.return_value.__enter__.return_value = mock_pool

            # Simulate parallel execution
            futures = []
            for i in range(num_deltas):
                future = Mock()
                future.result.return_value = (f"file{i}.zip.delta", {"dg-file-size": "19500000"})
                futures.append(future)

            mock_pool.submit.side_effect = futures
            patch_as_completed = patch(
                "concurrent.futures.as_completed",
                return_value=futures,
            )

            with patch_as_completed:
                _ = get_bucket_stats(mock_client, "parallel-bucket", mode="detailed")

            # Verify ThreadPoolExecutor was used with correct max_workers
            mock_executor.assert_called_once_with(max_workers=10)  # min(10, 50) = 10

    def test_stats_modes_control_metadata_fetch(self, mock_client):
        """Metadata fetching should depend on the selected stats mode."""
        mock_client.service.storage.list_objects.return_value = {
            "objects": [
                {"key": "alpha/reference.bin", "size": 100, "last_modified": "2024-01-01"},
                {"key": "alpha/file1.zip.delta", "size": 10, "last_modified": "2024-01-02"},
                {"key": "alpha/file2.zip.delta", "size": 12, "last_modified": "2024-01-03"},
                {"key": "beta/reference.bin", "size": 200, "last_modified": "2024-01-04"},
                {"key": "beta/file1.zip.delta", "size": 20, "last_modified": "2024-01-05"},
            ],
            "is_truncated": False,
        }

        metadata_by_key = {
            "alpha/file1.zip.delta": {"dg-file-size": "100", "compression_ratio": "0.9"},
            "alpha/file2.zip.delta": {"dg-file-size": "120", "compression_ratio": "0.88"},
            "beta/file1.zip.delta": {"dg-file-size": "210", "compression_ratio": "0.9"},
        }

        def mock_head(path: str):
            for key, metadata in metadata_by_key.items():
                if key in path:
                    head = Mock()
                    head.metadata = metadata
                    return head
            return None

        mock_client.service.storage.head.side_effect = mock_head

        # Quick mode: no metadata fetch
        _ = get_bucket_stats(mock_client, "mode-test", mode="quick")
        assert mock_client.service.storage.head.call_count == 0

        # Sampled mode: one HEAD per delta-space (alpha, beta)
        mock_client.service.storage.head.reset_mock()
        stats_sampled = get_bucket_stats(mock_client, "mode-test", mode="sampled")
        assert mock_client.service.storage.head.call_count == 2

        # Detailed mode: HEAD for every delta (3 total)
        mock_client.service.storage.head.reset_mock()
        stats_detailed = get_bucket_stats(mock_client, "mode-test", mode="detailed")
        assert mock_client.service.storage.head.call_count == 3

        # Sampled totals should be close to detailed but not identical
        assert stats_detailed.total_size == 100 + 120 + 210
        assert stats_sampled.total_size == 100 + 100 + 210

    def test_error_handling_in_metadata_fetch(self, mock_client):
        """Test graceful handling of errors during metadata fetch."""
        # Setup
        mock_client.service.storage.list_objects.return_value = {
            "objects": [
                {"key": "reference.bin", "size": 20000000, "last_modified": "2024-01-01"},
                {"key": "file1.zip.delta", "size": 50000, "last_modified": "2024-01-02"},
                {"key": "file2.zip.delta", "size": 60000, "last_modified": "2024-01-03"},
            ],
            "is_truncated": False,
        }

        # Mock metadata fetch to fail for one file
        def mock_head(path):
            if "file1.zip.delta" in path:
                raise Exception("S3 error")
            elif "file2.zip.delta" in path:
                head = Mock()
                head.metadata = {"dg-file-size": "19600000"}
                return head
            return None

        mock_client.service.storage.head.side_effect = mock_head

        # Execute - should handle error gracefully
        stats = get_bucket_stats(mock_client, "error-bucket", mode="detailed")

        # Verify - file1 has no metadata (error), file2 uses metadata
        assert stats.object_count == 2
        assert stats.delta_objects == 2
        # file1 has no metadata so not counted in original size, file2 uses metadata (19600000)
        assert stats.total_size == 19600000

        # Verify warning was logged for file1
        warning_calls = mock_client.service.logger.warning.call_args_list
        assert any(
            "file1.zip.delta" in str(call) and "no original_size metadata" in str(call)
            for call in warning_calls
        )

    def test_multiple_orphaned_references(self, mock_client):
        """Test detection of multiple orphaned reference.bin files."""
        # Setup: Multiple orphaned references
        mock_client.service.storage.list_objects.return_value = {
            "objects": [
                {"key": "pro/reference.bin", "size": 20000000, "last_modified": "2024-01-01"},
                {
                    "key": "enterprise/reference.bin",
                    "size": 25000000,
                    "last_modified": "2024-01-02",
                },
                {"key": "community/reference.bin", "size": 15000000, "last_modified": "2024-01-03"},
                {"key": "regular.pdf", "size": 1000000, "last_modified": "2024-01-04"},
            ],
            "is_truncated": False,
        }
        mock_client.service.storage.head.return_value = None

        # Execute
        stats = get_bucket_stats(mock_client, "multi-orphaned-bucket")

        # Verify stats
        assert stats.object_count == 1  # Only regular.pdf
        assert stats.total_size == 1000000
        assert stats.compressed_size == 1000000  # No references included
        assert stats.space_saved == 0

        # Verify warnings for all orphaned references
        warning_calls = [str(call) for call in mock_client.service.logger.warning.call_args_list]
        warning_text = " ".join(warning_calls)

        assert "ORPHANED REFERENCE FILE" in warning_text
        assert "3 reference.bin file(s)" in warning_text
        assert "60,000,000 bytes" in warning_text  # Total of all references
        assert "s3://multi-orphaned-bucket/pro/reference.bin" in warning_text
        assert "s3://multi-orphaned-bucket/enterprise/reference.bin" in warning_text
        assert "s3://multi-orphaned-bucket/community/reference.bin" in warning_text
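The arithmetic behind the compressed-bucket case is worth spelling out, since it is the core of the algorithm: original sizes come from each delta's dg-file-size metadata, while physical cost is the shared reference.bin plus the deltas themselves.

# Worked check of the numbers asserted in test_bucket_with_delta_compression.
total_size = 19_500_000 + 19_600_000       # originals, from dg-file-size
compressed = 20_000_000 + 50_000 + 60_000  # reference.bin + both deltas
space_saved = total_size - compressed
ratio = space_saved / total_size
assert (total_size, compressed, space_saved) == (39_100_000, 20_110_000, 18_990_000)
assert ratio > 0.48  # ~0.486, matching the test's lower bound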
284
tests/unit/test_stats_caching.py
Normal file
@@ -0,0 +1,284 @@
"""Unit tests for bucket stats caching functionality."""

import json
from unittest.mock import MagicMock

from deltaglider.client_models import BucketStats
from deltaglider.client_operations.stats import (
    _get_cache_key,
    _is_cache_valid,
    _read_stats_cache,
    _write_stats_cache,
)


def test_get_cache_key():
    """Test cache key generation for different modes."""
    assert _get_cache_key("quick") == ".deltaglider/stats_quick.json"
    assert _get_cache_key("sampled") == ".deltaglider/stats_sampled.json"
    assert _get_cache_key("detailed") == ".deltaglider/stats_detailed.json"


def test_is_cache_valid_when_unchanged():
    """Test cache validation when bucket hasn't changed."""
    cached_validation = {
        "object_count": 100,
        "compressed_size": 50000,
    }

    assert _is_cache_valid(cached_validation, 100, 50000) is True


def test_is_cache_valid_when_count_changed():
    """Test cache validation when object count changed."""
    cached_validation = {
        "object_count": 100,
        "compressed_size": 50000,
    }

    # Object count changed
    assert _is_cache_valid(cached_validation, 101, 50000) is False


def test_is_cache_valid_when_size_changed():
    """Test cache validation when compressed size changed."""
    cached_validation = {
        "object_count": 100,
        "compressed_size": 50000,
    }

    # Compressed size changed
    assert _is_cache_valid(cached_validation, 100, 60000) is False


def test_write_and_read_cache_roundtrip():
    """Test writing and reading cache with valid data."""
    # Create mock client and storage
    mock_storage = MagicMock()
    mock_logger = MagicMock()
    mock_service = MagicMock()
    mock_service.storage = mock_storage
    mock_service.logger = mock_logger
    mock_client = MagicMock()
    mock_client.service = mock_service

    # Create test stats
    test_stats = BucketStats(
        bucket="test-bucket",
        object_count=150,
        total_size=1000000,
        compressed_size=50000,
        space_saved=950000,
        average_compression_ratio=0.95,
        delta_objects=140,
        direct_objects=10,
    )

    # Capture what was written to storage
    written_data = None

    def capture_put(address, data, metadata):
        nonlocal written_data
        written_data = data

    mock_storage.put = capture_put

    # Write cache
    _write_stats_cache(
        client=mock_client,
        bucket="test-bucket",
        mode="quick",
        stats=test_stats,
        object_count=150,
        compressed_size=50000,
    )

    # Verify something was written
    assert written_data is not None

    # Parse written data
    cache_data = json.loads(written_data.decode("utf-8"))

    # Verify structure
    assert cache_data["version"] == "1.0"
    assert cache_data["mode"] == "quick"
    assert "computed_at" in cache_data
    assert cache_data["validation"]["object_count"] == 150
    assert cache_data["validation"]["compressed_size"] == 50000
    assert cache_data["stats"]["bucket"] == "test-bucket"
    assert cache_data["stats"]["object_count"] == 150
    assert cache_data["stats"]["delta_objects"] == 140

    # Now test reading it back
    mock_obj = MagicMock()
    mock_obj.data = written_data
    mock_storage.get = MagicMock(return_value=mock_obj)

    stats, validation = _read_stats_cache(mock_client, "test-bucket", "quick")

    # Verify read stats match original
    assert stats is not None
    assert validation is not None
    assert stats.bucket == "test-bucket"
    assert stats.object_count == 150
    assert stats.delta_objects == 140
    assert stats.average_compression_ratio == 0.95
    assert validation["object_count"] == 150
    assert validation["compressed_size"] == 50000


def test_read_cache_missing_file():
    """Test reading cache when file doesn't exist."""
    mock_storage = MagicMock()
    mock_logger = MagicMock()
    mock_service = MagicMock()
    mock_service.storage = mock_storage
    mock_service.logger = mock_logger
    mock_client = MagicMock()
    mock_client.service = mock_service

    # Simulate FileNotFoundError
    mock_storage.get.side_effect = FileNotFoundError("No such key")

    stats, validation = _read_stats_cache(mock_client, "test-bucket", "quick")

    assert stats is None
    assert validation is None


def test_read_cache_invalid_json():
    """Test reading cache with corrupted JSON."""
    mock_storage = MagicMock()
    mock_logger = MagicMock()
    mock_service = MagicMock()
    mock_service.storage = mock_storage
    mock_service.logger = mock_logger
    mock_client = MagicMock()
    mock_client.service = mock_service

    # Return invalid JSON
    mock_obj = MagicMock()
    mock_obj.data = b"not valid json {]["
    mock_storage.get = MagicMock(return_value=mock_obj)

    stats, validation = _read_stats_cache(mock_client, "test-bucket", "quick")

    assert stats is None
    assert validation is None
    mock_logger.warning.assert_called_once()


def test_read_cache_version_mismatch():
    """Test reading cache with wrong version."""
    mock_storage = MagicMock()
    mock_logger = MagicMock()
    mock_service = MagicMock()
    mock_service.storage = mock_storage
    mock_service.logger = mock_logger
    mock_client = MagicMock()
    mock_client.service = mock_service

    # Cache with wrong version
    cache_data = {
        "version": "2.0",  # Wrong version
        "mode": "quick",
        "validation": {"object_count": 100, "compressed_size": 50000},
        "stats": {
            "bucket": "test",
            "object_count": 100,
            "total_size": 1000,
            "compressed_size": 500,
            "space_saved": 500,
            "average_compression_ratio": 0.5,
            "delta_objects": 90,
            "direct_objects": 10,
        },
    }

    mock_obj = MagicMock()
    mock_obj.data = json.dumps(cache_data).encode("utf-8")
    mock_storage.get = MagicMock(return_value=mock_obj)

    stats, validation = _read_stats_cache(mock_client, "test-bucket", "quick")

    assert stats is None
    assert validation is None
    mock_logger.warning.assert_called_once()


def test_read_cache_mode_mismatch():
    """Test reading cache with wrong mode."""
    mock_storage = MagicMock()
    mock_logger = MagicMock()
    mock_service = MagicMock()
    mock_service.storage = mock_storage
    mock_service.logger = mock_logger
    mock_client = MagicMock()
    mock_client.service = mock_service

    # Cache with mismatched mode
    cache_data = {
        "version": "1.0",
        "mode": "detailed",  # Wrong mode
        "validation": {"object_count": 100, "compressed_size": 50000},
        "stats": {
            "bucket": "test",
            "object_count": 100,
            "total_size": 1000,
            "compressed_size": 500,
            "space_saved": 500,
            "average_compression_ratio": 0.5,
            "delta_objects": 90,
            "direct_objects": 10,
        },
    }

    mock_obj = MagicMock()
    mock_obj.data = json.dumps(cache_data).encode("utf-8")
    mock_storage.get = MagicMock(return_value=mock_obj)

    # Request "quick" mode but cache has "detailed"
    stats, validation = _read_stats_cache(mock_client, "test-bucket", "quick")

    assert stats is None
    assert validation is None
    mock_logger.warning.assert_called_once()


def test_write_cache_handles_errors_gracefully():
    """Test that cache write failures don't crash the program."""
    mock_storage = MagicMock()
    mock_logger = MagicMock()
    mock_service = MagicMock()
    mock_service.storage = mock_storage
    mock_service.logger = mock_logger
    mock_client = MagicMock()
    mock_client.service = mock_service

    # Simulate S3 permission error
    mock_storage.put.side_effect = PermissionError("Access denied")

    test_stats = BucketStats(
        bucket="test-bucket",
        object_count=150,
        total_size=1000000,
        compressed_size=50000,
        space_saved=950000,
        average_compression_ratio=0.95,
        delta_objects=140,
        direct_objects=10,
    )

    # Should not raise exception
    _write_stats_cache(
        client=mock_client,
        bucket="test-bucket",
        mode="quick",
        stats=test_stats,
        object_count=150,
        compressed_size=50000,
    )

    # Should log warning
    mock_logger.warning.assert_called_once()
    assert "Failed to write cache" in str(mock_logger.warning.call_args)
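Taken together, the round-trip and mismatch tests describe the cache document these helpers exchange: a versioned, mode-tagged JSON blob stored under .deltaglider/stats_<mode>.json, guarded by a cheap fingerprint (object count plus compressed size). Roughly the payload the round-trip test decodes, with illustrative values:

import json

cache_doc = {
    "version": "1.0",           # reads fail closed on any other version
    "mode": "quick",            # must match the requested mode
    "computed_at": "2024-01-01T00:00:00Z",  # illustrative timestamp
    "validation": {"object_count": 150, "compressed_size": 50000},
    "stats": {
        "bucket": "test-bucket",
        "object_count": 150,
        "total_size": 1000000,
        "compressed_size": 50000,
        "space_saved": 950000,
        "average_compression_ratio": 0.95,
        "delta_objects": 140,
        "direct_objects": 10,
    },
}
payload = json.dumps(cache_doc).encode("utf-8")  # what storage.put receives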