# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

DeltaGlider is a drop-in S3 replacement that achieves 99.9% compression for versioned artifacts through intelligent binary delta compression using xdelta3. It's designed to store 4TB of similar files in 5GB by storing only the differences between versions.

## Essential Commands

### Development Setup

```bash
# Install with development dependencies using uv (preferred)
uv pip install -e ".[dev]"

# Or using pip
pip install -e ".[dev]"
```

### Testing

```bash
# Run all tests
uv run pytest

# Run unit tests only
uv run pytest tests/unit

# Run integration tests only
uv run pytest tests/integration

# Run a specific test file
uv run pytest tests/integration/test_full_workflow.py

# Run a specific test
uv run pytest tests/integration/test_full_workflow.py::test_full_put_get_workflow

# Run with verbose output
uv run pytest -v

# Run with coverage
uv run pytest --cov=deltaglider
```

### Code Quality

```bash
# Run linter (ruff)
uv run ruff check src/

# Fix linting issues automatically
uv run ruff check --fix src/

# Format code
uv run ruff format src/

# Type checking with mypy
uv run mypy src/

# Run all checks (linting + type checking)
uv run ruff check src/ && uv run mypy src/
```

### Local Testing with MinIO

```bash
# Start MinIO for local S3 testing
docker run -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  minio/minio server /data --console-address ":9001"

# Test with local MinIO
export AWS_ENDPOINT_URL=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin

# Now you can use deltaglider commands
deltaglider cp test.zip s3://test-bucket/
```

## Architecture

### Hexagonal Architecture Pattern

The codebase follows a clean hexagonal (ports and adapters) architecture:

```
src/deltaglider/
├── core/                  # Domain logic (pure Python, no external dependencies)
│   ├── service.py         # Main DeltaService orchestration
│   ├── models.py          # Data models (DeltaSpace, ObjectKey, PutSummary, etc.)
│   └── errors.py          # Domain-specific exceptions
├── ports/                 # Abstract interfaces (protocols)
│   ├── storage.py         # StoragePort protocol for S3-like operations
│   ├── diff.py            # DiffPort protocol for delta operations
│   ├── hash.py            # HashPort protocol for integrity checks
│   ├── cache.py           # CachePort protocol for local references
│   ├── clock.py           # ClockPort protocol for time operations
│   ├── logger.py          # LoggerPort protocol for logging
│   └── metrics.py         # MetricsPort protocol for observability
├── adapters/              # Concrete implementations
│   ├── storage_s3.py      # S3StorageAdapter using boto3
│   ├── diff_xdelta.py     # XdeltaAdapter using xdelta3 binary
│   ├── hash_sha256.py     # Sha256Adapter for checksums
│   ├── cache_fs.py        # FsCacheAdapter for file system cache
│   ├── clock_utc.py       # UtcClockAdapter for UTC timestamps
│   ├── logger_std.py      # StdLoggerAdapter for console output
│   └── metrics_noop.py    # NoopMetricsAdapter (placeholder)
└── app/
    └── cli/               # Click-based CLI application
        ├── main.py        # Main CLI entry point with AWS S3 commands
        ├── aws_compat.py  # AWS S3 compatibility helpers
        └── sync.py        # Sync command implementation
```

### Core Concepts

1. **DeltaSpace**: A prefix in S3 where related files are stored for delta compression. Contains a `reference.bin` file that serves as the base for delta compression.

2. **Delta Compression Flow**:
   - First file uploaded to a DeltaSpace becomes the reference (stored as `reference.bin`)
   - Subsequent files are compared against the reference using xdelta3
   - Only the differences (delta) are stored with `.delta` suffix
   - Metadata in S3 tags preserves original file info and delta relationships

3. **File Type Intelligence**:
   - Archive files (`.zip`, `.tar`, `.gz`, `.jar`, etc.) use delta compression
   - Text files, small files, and already-compressed unique files bypass delta
   - Decision made by `should_use_delta()` in `core/service.py`

4. **AWS S3 CLI Compatibility**:
   - Commands (`cp`, `ls`, `rm`, `sync`) mirror AWS CLI syntax exactly
   - Located in `app/cli/main.py` with helpers in `aws_compat.py`

### Key Algorithms

1. **Delta Ratio Check** (`core/service.py`):
   - After creating a delta, checks if `delta_size / file_size > max_ratio` (default 0.5)
   - If delta is too large (>50% of original), stores file directly instead
   - Prevents inefficient compression for dissimilar files

2. **Reference Management** (`core/service.py`):
   - Reference stored at `{deltaspace.prefix}/reference.bin`
   - SHA256 verification on every read/write
   - Local cache in `/tmp/.deltaglider/reference_cache` for performance

3. **Sync Algorithm** (`app/cli/sync.py`):
   - Compares local vs S3 using size and modification time
   - For delta files, uses timestamp comparison with 1-second tolerance
   - Supports `--delete` flag for true mirroring

## Testing Strategy

- **Unit Tests** (`tests/unit/`): Test individual adapters and core logic with mocks
- **Integration Tests** (`tests/integration/`): Test CLI commands and workflows
- **E2E Tests** (`tests/e2e/`): Require LocalStack for full S3 simulation

Key test files:

- `test_full_workflow.py`: Complete put/get cycle testing
- `test_aws_cli_commands_v2.py`: AWS S3 CLI compatibility tests
- `test_xdelta.py`: Binary diff engine integration tests

## Common Development Tasks

### Adding a New CLI Command

1. Add command function to `src/deltaglider/app/cli/main.py`
2. Use `@cli.command()` decorator and `@click.pass_obj` for service access
3. Follow AWS S3 CLI conventions for flags and arguments
4. Add tests to `tests/integration/test_aws_cli_commands_v2.py`

### Adding a New Port/Adapter Pair

1. Define protocol in `src/deltaglider/ports/`
2. Implement adapter in `src/deltaglider/adapters/`
3. Wire adapter in `create_service()` in `app/cli/main.py`
4. Add unit tests in `tests/unit/test_adapters.py`

### Modifying Delta Logic

Core delta logic is in `src/deltaglider/core/service.py`:

- `put()`: Handles upload with delta compression
- `get()`: Handles download with delta reconstruction
- `should_use_delta()`: File type discrimination logic

## Environment Variables

- `DG_LOG_LEVEL`: Logging level (default: "INFO")
- `DG_CACHE_DIR`: Local reference cache directory (default: "/tmp/.deltaglider/reference_cache")
- `DG_MAX_RATIO`: Maximum acceptable delta/file ratio (default: "0.5")
- `AWS_ENDPOINT_URL`: Override S3 endpoint for MinIO/LocalStack
- `AWS_ACCESS_KEY_ID`: AWS credentials
- `AWS_SECRET_ACCESS_KEY`: AWS credentials
- `AWS_DEFAULT_REGION`: AWS region

## Important Implementation Details

1. **xdelta3 Binary Dependency**: The system requires xdelta3 binary installed on the system. The `XdeltaAdapter` uses subprocess to call it.

2. **Metadata Storage**: File metadata is stored in S3 object metadata/tags, not in a separate database. This keeps the system simple and stateless.

3. **SHA256 Verification**: Every read and write operation includes SHA256 verification for data integrity.

4. **Atomic Operations**: All S3 operations are atomic - no partial states are left if operations fail.

5. **Reference File Updates**: Currently, the first file uploaded to a DeltaSpace becomes the permanent reference. Future versions may implement reference rotation.

## Performance Considerations

- Local reference caching dramatically improves performance for repeated operations
- Delta compression is CPU-intensive; consider parallelization for bulk uploads
- The default max_ratio of 0.5 prevents storing inefficient deltas
- For files <1MB, delta overhead may exceed benefits

## Security Notes

- Never store AWS credentials in code
- Use IAM roles when possible
- All S3 operations respect bucket policies and encryption settings
- SHA256 checksums prevent tampering and corruption
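
As an illustration of the Delta Ratio Check described under Key Algorithms, here is a minimal sketch of the decision and of reading `DG_MAX_RATIO`. The helper names `load_max_ratio` and `should_store_delta` are hypothetical and are not taken from `core/service.py`; the handling of empty files is also an assumption.

```python
import os

DEFAULT_MAX_RATIO = 0.5  # documented default for DG_MAX_RATIO


def load_max_ratio(env=None):
    """Read DG_MAX_RATIO from a mapping (os.environ by default).

    Hypothetical helper: falls back to the documented default of 0.5.
    """
    env = os.environ if env is None else env
    return float(env.get("DG_MAX_RATIO", DEFAULT_MAX_RATIO))


def should_store_delta(delta_size, file_size, max_ratio=DEFAULT_MAX_RATIO):
    """Return True when the delta is small enough to be worth storing.

    Mirrors the documented rule: if delta_size / file_size > max_ratio,
    the file is stored directly instead of as a delta.
    """
    if file_size <= 0:
        # Assumption: an empty file has no meaningful ratio, so store it directly.
        return False
    return delta_size / file_size <= max_ratio
```

With the default ratio, a 10 MB delta for a 100 MB file would be kept as a delta, while a 60 MB delta for the same file would cause the file to be stored directly.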