CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
DeltaGlider is a drop-in S3 replacement that achieves up to 99.9% compression for versioned artifacts through binary delta compression using xdelta3. It is designed to store roughly 4TB of similar files in about 5GB by keeping only the differences between versions.
Essential Commands
Development Setup
# Install with development dependencies using uv (preferred)
uv pip install -e ".[dev]"
# Or using pip
pip install -e ".[dev]"
Testing
# Run all tests
uv run pytest
# Run unit tests only
uv run pytest tests/unit
# Run integration tests only
uv run pytest tests/integration
# Run a specific test file
uv run pytest tests/integration/test_full_workflow.py
# Run a specific test
uv run pytest tests/integration/test_full_workflow.py::test_full_put_get_workflow
# Run with verbose output
uv run pytest -v
# Run with coverage
uv run pytest --cov=deltaglider
Code Quality
# Run linter (ruff)
uv run ruff check src/
# Fix linting issues automatically
uv run ruff check --fix src/
# Format code
uv run ruff format src/
# Type checking with mypy
uv run mypy src/
# Run all checks (linting + type checking)
uv run ruff check src/ && uv run mypy src/
Local Testing with MinIO
# Start MinIO for local S3 testing
docker run -p 9000:9000 -p 9001:9001 \
-e MINIO_ROOT_USER=minioadmin \
-e MINIO_ROOT_PASSWORD=minioadmin \
minio/minio server /data --console-address ":9001"
# Test with local MinIO
export AWS_ENDPOINT_URL=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
# Now you can use deltaglider commands
deltaglider cp test.zip s3://test-bucket/
Architecture
Hexagonal Architecture Pattern
The codebase follows a clean hexagonal (ports and adapters) architecture:
src/deltaglider/
├── core/                   # Domain logic (pure Python, no external dependencies)
│   ├── service.py          # Main DeltaService orchestration
│   ├── models.py           # Data models (DeltaSpace, ObjectKey, PutSummary, etc.)
│   └── errors.py           # Domain-specific exceptions
├── ports/                  # Abstract interfaces (protocols)
│   ├── storage.py          # StoragePort protocol for S3-like operations
│   ├── diff.py             # DiffPort protocol for delta operations
│   ├── hash.py             # HashPort protocol for integrity checks
│   ├── cache.py            # CachePort protocol for local references
│   ├── clock.py            # ClockPort protocol for time operations
│   ├── logger.py           # LoggerPort protocol for logging
│   └── metrics.py          # MetricsPort protocol for observability
├── adapters/               # Concrete implementations
│   ├── storage_s3.py       # S3StorageAdapter using boto3
│   ├── diff_xdelta.py      # XdeltaAdapter using xdelta3 binary
│   ├── hash_sha256.py      # Sha256Adapter for checksums
│   ├── cache_fs.py         # FsCacheAdapter for file system cache
│   ├── clock_utc.py        # UtcClockAdapter for UTC timestamps
│   ├── logger_std.py       # StdLoggerAdapter for console output
│   └── metrics_noop.py     # NoopMetricsAdapter (placeholder)
└── app/
    └── cli/                # Click-based CLI application
        ├── main.py         # Main CLI entry point with AWS S3 commands
        ├── aws_compat.py   # AWS S3 compatibility helpers
        └── sync.py         # Sync command implementation
Core Concepts
- DeltaSpace: A prefix in S3 where related files are stored for delta compression. Contains a reference.bin file that serves as the base for delta compression.
- Delta Compression Flow:
  - First file uploaded to a DeltaSpace becomes the reference (stored as reference.bin)
  - Subsequent files are compared against the reference using xdelta3
  - Only the differences (delta) are stored with .delta suffix
  - Metadata in S3 tags preserves original file info and delta relationships
- File Type Intelligence:
  - Archive files (.zip, .tar, .gz, .jar, etc.) use delta compression
  - Text files, small files, and already-compressed unique files bypass delta
  - Decision made by should_use_delta() in core/service.py
- AWS S3 CLI Compatibility:
  - Commands (cp, ls, rm, sync) mirror AWS CLI syntax exactly
  - Located in app/cli/main.py with helpers in aws_compat.py
  - Maintains backward compatibility with original put/get commands
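As a rough illustration of the file-type decision described under File Type Intelligence, the sketch below chooses delta compression purely by extension. The function name, its signature, and the extension set (limited to the four extensions named in this file) are assumptions for illustration; the real should_use_delta() in core/service.py may apply further rules, such as a size threshold.

```python
from pathlib import PurePosixPath

# Only the extensions named in this document; the real list may be longer.
DELTA_EXTENSIONS = {".zip", ".tar", ".gz", ".jar"}


def should_use_delta_sketch(key: str) -> bool:
    """Return True when the key's final extension is an archive type listed above."""
    suffix = PurePosixPath(key).suffix.lower()
    return suffix in DELTA_EXTENSIONS


print(should_use_delta_sketch("releases/app-1.2.0.zip"))
print(should_use_delta_sketch("docs/README.txt"))
```

Only the final suffix is inspected, so a key such as backup.tar.gz is matched through .gz.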
Key Algorithms
- Delta Ratio Check (core/service.py):
  - After creating a delta, checks if delta_size / file_size > max_ratio (default 0.5)
  - If delta is too large (>50% of original), stores file directly instead
  - Prevents inefficient compression for dissimilar files
- Reference Management (core/service.py):
  - Reference stored at {deltaspace.prefix}/reference.bin
  - SHA256 verification on every read/write
  - Local cache in /tmp/.deltaglider/reference_cache for performance
- Sync Algorithm (app/cli/sync.py):
  - Compares local vs S3 using size and modification time
  - For delta files, uses timestamp comparison with 1-second tolerance
  - Supports --delete flag for true mirroring
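The ratio check and the sync comparison can be sketched as two small pure functions. Both function names and signatures are hypothetical. The handling of empty files and the "local is newer" direction of the timestamp test are assumptions not stated in this file; only the 0.5 default ratio and the 1-second tolerance for delta files come from the text.

```python
def should_store_delta(delta_size: int, file_size: int, max_ratio: float = 0.5) -> bool:
    """Keep the delta unless delta_size / file_size exceeds max_ratio."""
    if file_size <= 0:
        # Assumption: nothing to diff for an empty file, so store it directly.
        return False
    return delta_size / file_size <= max_ratio


def needs_upload(
    local_size: int,
    local_mtime: float,
    remote_size: int,
    remote_mtime: float,
    remote_is_delta: bool,
    tolerance: float = 1.0,
) -> bool:
    """Decide whether a local file should be re-uploaded during sync."""
    if remote_is_delta:
        # The stored object is a delta, so its size is not comparable;
        # fall back to timestamps with a tolerance (assumed: local must be newer).
        return local_mtime - remote_mtime > tolerance
    return local_size != remote_size or local_mtime > remote_mtime


print(should_store_delta(delta_size=40, file_size=100))
print(should_store_delta(delta_size=60, file_size=100))
```

Keeping these decisions free of I/O makes them easy to unit test without S3 or xdelta3.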
Testing Strategy
- Unit Tests (tests/unit/): Test individual adapters and core logic with mocks
- Integration Tests (tests/integration/): Test CLI commands and workflows
- E2E Tests (tests/e2e/): Require LocalStack for full S3 simulation
Key test files:
- test_full_workflow.py: Complete put/get cycle testing
- test_aws_cli_commands_v2.py: AWS S3 CLI compatibility tests
- test_xdelta.py: Binary diff engine integration tests
Common Development Tasks
Adding a New CLI Command
- Add command function to src/deltaglider/app/cli/main.py
- Use @cli.command() decorator and @click.pass_obj for service access
- Follow AWS S3 CLI conventions for flags and arguments
- Add tests to tests/integration/test_aws_cli_commands_v2.py
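The steps above might look roughly like the following self-contained Click sketch. FakeService and the info command are hypothetical stand-ins, not part of the documented CLI, and the way the real cli group attaches the service to the context is assumed here rather than taken from main.py.

```python
import click


class FakeService:
    """Hypothetical stand-in for the real DeltaService."""

    def describe(self, s3_url: str) -> str:
        return f"would inspect {s3_url}"


@click.group()
@click.pass_context
def cli(ctx: click.Context) -> None:
    # Attach the service to the context so subcommands can receive it via pass_obj.
    ctx.obj = FakeService()


@cli.command()
@click.argument("s3_url")
@click.pass_obj
def info(service: FakeService, s3_url: str) -> None:
    """Hypothetical command: print what would be inspected."""
    click.echo(service.describe(s3_url))
```

A command written this way can be exercised in integration tests with click.testing.CliRunner, for example CliRunner().invoke(cli, ["info", "s3://test-bucket/key"]).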
Adding a New Port/Adapter Pair
- Define protocol in src/deltaglider/ports/
- Implement adapter in src/deltaglider/adapters/
- Wire adapter in create_service() in app/cli/main.py
- Add unit tests in tests/unit/test_adapters.py
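A minimal sketch of the port/adapter pattern using typing.Protocol follows. ChecksumPort, Sha256ChecksumAdapter, ServiceSketch, and create_service_sketch are hypothetical names chosen for illustration; the real ports and create_service() may have different methods and signatures.

```python
import hashlib
from typing import Protocol


class ChecksumPort(Protocol):
    """Hypothetical port: the interface the core depends on."""

    def checksum(self, data: bytes) -> str: ...


class Sha256ChecksumAdapter:
    """Hypothetical adapter: satisfies ChecksumPort structurally, no inheritance needed."""

    def checksum(self, data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()


class ServiceSketch:
    """Core logic sees only the port, never the concrete adapter."""

    def __init__(self, hasher: ChecksumPort) -> None:
        self._hasher = hasher

    def fingerprint(self, data: bytes) -> str:
        return self._hasher.checksum(data)


def create_service_sketch() -> ServiceSketch:
    # Wiring happens in one place, mirroring the role of create_service().
    return ServiceSketch(hasher=Sha256ChecksumAdapter())


print(create_service_sketch().fingerprint(b"hello"))
```

Because the core depends only on the protocol, unit tests can pass in a fake object with the same method instead of the real adapter.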
Modifying Delta Logic
Core delta logic is in src/deltaglider/core/service.py:
- put(): Handles upload with delta compression
- get(): Handles download with delta reconstruction
- should_use_delta(): File type discrimination logic
Environment Variables
- DG_LOG_LEVEL: Logging level (default: "INFO")
- DG_CACHE_DIR: Local reference cache directory (default: "/tmp/.deltaglider/reference_cache")
- DG_MAX_RATIO: Maximum acceptable delta/file ratio (default: "0.5")
- AWS_ENDPOINT_URL: Override S3 endpoint for MinIO/LocalStack
- AWS_ACCESS_KEY_ID: AWS credentials
- AWS_SECRET_ACCESS_KEY: AWS credentials
- AWS_DEFAULT_REGION: AWS region
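One way to read the DG_* variables with their documented defaults is sketched below. The Settings dataclass and load_settings() are hypothetical helpers, not names from the codebase; only the variable names and default values come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    log_level: str
    cache_dir: str
    max_ratio: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build settings from an environment mapping, falling back to documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        log_level=env.get("DG_LOG_LEVEL", "INFO"),
        cache_dir=env.get("DG_CACHE_DIR", "/tmp/.deltaglider/reference_cache"),
        # Environment values are strings, so the ratio is parsed explicitly.
        max_ratio=float(env.get("DG_MAX_RATIO", "0.5")),
    )


print(load_settings({"DG_MAX_RATIO": "0.3"}))
```

Accepting the mapping as a parameter keeps the function testable without mutating the process environment.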
Important Implementation Details
- xdelta3 Binary Dependency: The system requires the xdelta3 binary to be installed on the host. The XdeltaAdapter uses subprocess to call it.
- Metadata Storage: File metadata is stored in S3 object metadata/tags, not in a separate database. This keeps the system simple and stateless.
- SHA256 Verification: Every read and write operation includes SHA256 verification for data integrity.
- Atomic Operations: All S3 operations are atomic; no partial states are left if operations fail.
- Reference File Updates: Currently, the first file uploaded to a DeltaSpace becomes the permanent reference. Future versions may implement reference rotation.
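To make the xdelta3 and SHA256 points concrete, the sketch below builds xdelta3 command lines and hashes a file in chunks. All function names are hypothetical, and the exact flags the real XdeltaAdapter passes are an assumption; -e (encode), -d (decode), -s (source file), and -f (overwrite output) are standard xdelta3 options.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import List


def sha256_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large artifacts are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def encode_cmd(reference: Path, target: Path, delta: Path) -> List[str]:
    """Command that writes `delta` as the difference between reference and target."""
    return ["xdelta3", "-e", "-f", "-s", str(reference), str(target), str(delta)]


def decode_cmd(reference: Path, delta: Path, output: Path) -> List[str]:
    """Command that rebuilds the original file from reference plus delta."""
    return ["xdelta3", "-d", "-f", "-s", str(reference), str(delta), str(output)]


def run_xdelta(cmd: List[str]) -> None:
    """Run xdelta3; raises CalledProcessError on a non-zero exit (binary must be on PATH)."""
    subprocess.run(cmd, check=True, capture_output=True)
```

After decoding, comparing sha256_file(output) against the checksum stored with the object is one way to implement the verification described above.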
Performance Considerations
- Local reference caching dramatically improves performance for repeated operations
- Delta compression is CPU-intensive; consider parallelization for bulk uploads
- The default max_ratio of 0.5 prevents storing inefficient deltas
- For files <1MB, delta overhead may exceed benefits
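For the bulk-upload suggestion, a thread pool is one possible approach. Since the delta work runs in an external xdelta3 process, threads can overlap it without being blocked by the interpreter lock. upload_all() and the upload_one callable are hypothetical; nothing here reflects an existing DeltaGlider API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional


def upload_all(
    paths: Iterable[str],
    upload_one: Callable[[str], None],
    max_workers: int = 4,
) -> Dict[str, Optional[BaseException]]:
    """Upload paths concurrently; map each path to None on success or its exception."""
    results: Dict[str, Optional[BaseException]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upload_one, path): path for path in paths}
        for future in as_completed(futures):
            # exception() returns None when the call finished without raising.
            results[futures[future]] = future.exception()
    return results
```

Collecting exceptions per path, rather than failing on the first error, lets a caller retry only the uploads that failed.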
Security Notes
- Never store AWS credentials in code
- Use IAM roles when possible
- All S3 operations respect bucket policies and encryption settings
- SHA256 checksums prevent tampering and corruption