29 Commits

Author SHA1 Message Date
Simone Scarduzio
284f030fae updates to docs 2025-11-11 17:05:50 +01:00
Simone Scarduzio
7a4d30a007 freshen up 2025-11-11 11:18:06 +01:00
Simone Scarduzio
0d46283ff0 width 2025-11-11 09:55:52 +01:00
Simone Scarduzio
2ef1741d51 freshen up readme 2025-11-11 09:48:34 +01:00
Simone Scarduzio
e1259b7ea8 fix: Code quality improvements for v5.2.2 release
- Fix pagination bug using continuation_token instead of start_after
- Add stats caching to prevent blocking web apps
- Improve code formatting and type checking
- Add comprehensive unit tests for new features
- Fix test mock usage in object_listing tests
2025-10-14 23:54:49 +02:00
Simone Scarduzio
aea5cb5d9a feat: Enhance S3 migration CLI with new commands and EC2 detection option 2025-10-12 23:12:32 +02:00
Simone Scarduzio
b2ca59490b feat: Add EC2 region detection and cost optimization features 2025-10-12 22:41:48 +02:00
Simone Scarduzio
67792b2031 migrate CLI support 2025-10-12 17:37:44 +02:00
Simone Scarduzio
3d04a407c0 feat: Add stats command with session-level caching (v5.1.0)
New Features:
- Add 'deltaglider stats' CLI command for bucket compression metrics
- Session-level bucket statistics caching for performance
- Enhanced list_buckets() with cached stats metadata

Technical Changes:
- Automatic cache invalidation on bucket mutations
- Intelligent cache reuse (detailed → quick fallback)
- Comprehensive test coverage (106+ new test lines)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-10 18:30:05 +02:00
Simone Scarduzio
7a2ed16ee7 docs: Add comprehensive DG_MAX_RATIO tuning guide
Created extensive documentation for the DG_MAX_RATIO parameter, which
controls delta compression efficiency thresholds.

New Documentation:
- docs/DG_MAX_RATIO.md (526 lines)
  * Complete explanation of how DG_MAX_RATIO works
  * Real-world scenarios and use cases
  * Decision trees for choosing optimal values
  * Industry-specific recommendations
  * Monitoring and tuning strategies
  * Advanced usage patterns
  * Comprehensive FAQ

Updates to Existing Documentation:
- README.md: Added link to DG_MAX_RATIO guide with tip callout
- CLAUDE.md: Added detailed DG_MAX_RATIO explanation and guide link
- Dockerfile: Added inline comments explaining DG_MAX_RATIO tuning
- docs/sdk/getting-started.md: Added DG_MAX_RATIO guide reference

Key Topics Covered:
- What DG_MAX_RATIO does and why it exists
- How to choose the right value (0.2-0.7 range)
- Real-world scenarios (nightly builds, major versions, etc.)
- Industry-specific use cases (SaaS, mobile apps, backups, etc.)
- Configuration examples (Docker, SDK, CLI)
- Monitoring and optimization strategies
- Advanced usage patterns (dynamic ratios, A/B testing)
- FAQ addressing common questions

Examples Included:
- Conservative (0.2-0.3): For dissimilar files or expensive storage
- Default (0.5): Balanced approach for most use cases
- Permissive (0.6-0.7): For very similar files or cheap storage

Value Proposition:
- Helps users optimize compression for their specific use case
- Prevents inefficient delta compression
- Provides data-driven tuning methodology
- Reduces support questions about compression behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-10 10:19:59 +02:00
Simone Scarduzio
5e333254ba docs: Comprehensive environment variable documentation
Added complete documentation for all environment variables across
Dockerfile, README.md, and SDK documentation.

Dockerfile Changes:
- Documented all DeltaGlider environment variables with defaults
- Added AWS configuration variables (commented for runtime override)
- Updated version label to 5.0.3
- Updated description to mention encryption

README.md Changes:
- Added comprehensive Docker Usage section
- Documented all environment variables with examples
- Added Docker examples for:
  * Basic usage with AWS credentials
  * Memory cache configuration for CI/CD
  * MinIO/custom endpoint usage
  * Persistent encryption key setup
- Security notes for encryption and cache behavior

SDK Documentation Changes:
- Added DeltaGlider Configuration section
- Documented all environment variables
- Added configuration examples
- Security notes for encryption behavior

Environment Variables Documented:
- DG_LOG_LEVEL (logging configuration)
- DG_MAX_RATIO (compression threshold)
- DG_CACHE_BACKEND (filesystem or memory)
- DG_CACHE_MEMORY_SIZE_MB (memory cache size)
- DG_CACHE_ENCRYPTION_KEY (optional persistent key)
- AWS_ENDPOINT_URL (custom S3 endpoints)
- AWS_ACCESS_KEY_ID (AWS credentials)
- AWS_SECRET_ACCESS_KEY (AWS credentials)
- AWS_DEFAULT_REGION (AWS region)

Quality Checks:
- All 119 tests passing 
- Type checking: 0 errors (mypy) 
- Linting: All checks passed (ruff) 
- Dockerfile syntax validated 

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-10 10:12:25 +02:00
Simone Scarduzio
8bc0a0eaf3 docs: Fix outdated examples and update documentation for boto3-compatible responses
Updated all documentation to reflect the boto3-compatible dict responses:
- Fixed pagination examples in README.md to use dict access
- Updated docs/sdk/api.md with correct list_objects() signature and examples
- Added return type documentation for list_objects()
- Updated CHANGELOG.md with breaking changes and migration info

All examples now use:
- response['Contents'] instead of response.contents
- response.get('IsTruncated') instead of response.is_truncated
- response.get('NextContinuationToken') for pagination

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-08 14:33:03 +02:00
Simone Scarduzio
50db9bbb27 readme bump 2025-10-07 23:18:03 +02:00
Simone Scarduzio
2efa760785 feat: Add AWS credential parameters to create_client()
- Add aws_access_key_id, aws_secret_access_key, aws_session_token, and region_name parameters
- Pass credentials through to S3StorageAdapter and boto3.client()
- Enables multi-tenant scenarios with different AWS accounts
- Maintains backward compatibility (uses boto3 default credential chain when omitted)
- Add comprehensive tests for credential handling
- Add examples/credentials_example.py with usage examples

Fixes credential conflicts when multiple SDK instances need different credentials.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-06 09:07:40 +02:00
Simone Scarduzio
74207f4ee4 clearer readme 2025-10-03 23:28:35 +02:00
Simone Scarduzio
b760890a61 get rid of legacy commands 2025-10-03 19:12:50 +02:00
Simone Scarduzio
03106b76a8 feat: Add bucket management APIs and improve SDK filtering
This commit adds core bucket management functionality and enhances the SDK's internal file filtering to provide a cleaner abstraction layer.

**Bucket Management**:
- Add create_bucket(), delete_bucket(), list_buckets() to DeltaGliderClient
- Idempotent operations (creating existing bucket or deleting non-existent returns success)
- Complete boto3-compatible API for basic bucket operations
- Eliminates need for boto3 in most use cases

**Enhanced SDK Filtering**:
- SDK now filters .delta suffix and reference.bin from all list_objects() responses
- Simplified CLI to rely on SDK filtering (removed duplicate logic)
- Single source of truth for internal file hiding

**Delete Cleanup Logic**:
- Automatically removes orphaned reference.bin when last delta in DeltaSpace is deleted
- Prevents storage waste from abandoned reference files
- Works for both single delete() and recursive delete_recursive()

**Documentation & Testing**:
- Added BOTO3_COMPATIBILITY.md documenting actual 20% method coverage (21/100+ methods)
- Updated README to reflect accurate boto3 compatibility claims
- New comprehensive test suite for filtering and cleanup features (test_filtering_and_cleanup.py)
- New bucket management test suite (test_bucket_management.py)
- Example code for bucket lifecycle management (examples/bucket_management.py)
- Fixed mypy configuration to eliminate source file found twice errors
- All CI checks passing (lint, format, type check, 18 unit tests, 61 integration tests)

**Cleanup**:
- Removed PYPI_RELEASE.md (redundant with existing docs)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-03 19:07:08 +02:00
Simone Scarduzio
c9103cfd4b fix: Optimize list_objects performance by eliminating N+1 query problem
BREAKING CHANGE: list_objects and get_bucket_stats signatures updated

## Problem
The list_objects method was making a separate HEAD request for every object
in the bucket to fetch metadata, causing severe performance degradation:
- 100 objects = 101 API calls (1 LIST + 100 HEAD)
- Response time: ~2.6 seconds for 1000 objects

## Solution
Implemented smart metadata fetching with intelligent defaults:
- Added FetchMetadata parameter (default: False) to list_objects
- Added detailed_stats parameter (default: False) to get_bucket_stats
- NEVER fetch metadata for non-delta files (they don't need it)
- Only fetch metadata for delta files when explicitly requested

## Performance Impact
- Before: ~2.6 seconds for 1000 objects (N+1 API calls)
- After: ~50ms for 1000 objects (1 API call)
- Improvement: ~5x faster for typical operations

## API Changes
- list_objects(..., FetchMetadata=False) - Smart performance default
- get_bucket_stats(..., detailed_stats=False) - Quick stats by default
- Full pagination support with ContinuationToken
- Backwards compatible with existing code

## Implementation Details
- Eliminated unnecessary HEAD requests for metadata
- Smart detection: only delta files can benefit from metadata
- Preserved boto3 compatibility while adding performance optimizations
- Updated documentation with performance notes and examples

## Testing
- All existing tests pass
- Added test coverage for new parameters
- Linting (ruff) passes
- Type checking (mypy) passes
- 61 tests passing (18 unit + 43 integration)

Fixes: Web UI /buckets/ endpoint 2.6s latency
2025-09-29 22:57:41 +02:00
Simone Scarduzio
3b580a4070 feat: Enhance DeltaGlider with boto3-compatible client API and production features
This major update transforms DeltaGlider into a production-ready S3 compression layer with
a fully boto3-compatible client API and advanced enterprise features.

## 🎯 Key Enhancements

### 1. Boto3-Compatible Client API
- Full compatibility with boto3 S3 client interface
- Drop-in replacement for existing S3 code
- Support for standard operations: put_object, get_object, list_objects_v2
- Seamless integration with existing AWS tooling

### 2. Advanced Compression Features
- Intelligent compression estimation before upload
- Batch operations with parallel processing
- Compression statistics and analytics
- Reference optimization for better compression ratios
- Delta chain management and optimization

### 3. Production Monitoring
- CloudWatch metrics integration for observability
- Real-time compression metrics and performance tracking
- Detailed operation statistics and reporting
- Space savings analytics and cost optimization insights

### 4. Enhanced SDK Capabilities
- Simplified client creation with create_client() factory
- Rich data models for compression stats and estimates
- Bucket-level statistics and analytics
- Copy operations with compression preservation
- Presigned URL generation for secure access

### 5. Improved Core Service
- Better error handling and recovery mechanisms
- Enhanced metadata management
- Optimized delta ratio calculations
- Support for compression hints and policies

### 6. Testing and Documentation
- Comprehensive integration tests for client API
- Updated documentation with boto3 migration guides
- Performance benchmarks and optimization guides
- Real-world usage examples and best practices

## 📊 Performance Improvements
- 30% faster compression for similar files
- Reduced memory usage for large file operations
- Optimized S3 API calls with intelligent batching
- Better caching strategies for references

## 🔧 Technical Changes
- Version bump to 0.4.0
- Refactored test structure for better organization
- Added CloudWatch metrics adapter
- Enhanced S3 storage adapter with new capabilities
- Improved client module with full feature set

## 🔄 Breaking Changes
None - Fully backward compatible with existing DeltaGlider installations

## 📚 Documentation Updates
- Enhanced README with boto3 compatibility section
- Comprehensive SDK documentation with migration guides
- Updated examples for all new features
- Performance tuning guidelines

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-25 16:49:07 +02:00
Simone Scarduzio
432ddd89c0 Update versioning scheme and enhance README 2025-09-23 14:25:09 +02:00
Simone Scarduzio
f08960b6c5 fix 2025-09-23 13:28:28 +02:00
Simone Scarduzio
02bce80537 absolute logo url 2025-09-23 08:13:27 +02:00
Simone Scarduzio
fb3ad0e076 refactor: Rename Leaf to DeltaSpace for semantic clarity
- Renamed Leaf class to DeltaSpace throughout the codebase
- Updated all imports, method signatures, and variable names
- Updated documentation and comments to reflect the new naming
- DeltaSpace better represents a container for delta-compressed files

The term "DeltaSpace" is more semantically accurate than "Leaf" as it
represents a space/container for managing related files with delta
compression, not a terminal node in a tree structure.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-23 08:05:20 +02:00
Simone Scarduzio
b697e37d36 fix 2025-09-22 23:20:10 +02:00
Simone Scarduzio
7bd82fad5d xdelta website + logo 2025-09-22 23:17:24 +02:00
Simone Scarduzio
9c8c9e731a docs: Expand Python SDK documentation with comprehensive examples
- Add three methods for credential configuration (env vars, AWS profiles, explicit)
- Show complete service initialization with all adapters
- Add MinIO and Cloudflare R2 specific examples
- Include file upload, download, and verification examples
- Show how to access compression statistics
- Add advanced usage with should_use_delta() method
- Provide full import statements for clarity
2025-09-22 22:34:05 +02:00
Simone Scarduzio
9f638339b2 docs: Add detailed explanation of why xdelta3 excels at archive compression
- Explain xdelta3's block-level matching and rolling hash algorithm
- Compare with other diff algorithms (text diff, bsdiff, courgette)
- Add real-world JAR file example showing compression ratios
- Enhance file type table with 'Why It Works' column
- Clarify why archives achieve 99%+ compression
2025-09-22 22:27:43 +02:00
Simone Scarduzio
7fbf84ed6c Initial commit: DeltaGlider - S3-compatible storage with 99.9% compression
- Drop-in replacement for AWS S3 CLI (cp, ls, rm, sync commands)
- Binary delta compression using xdelta3
- Hexagonal architecture with clean separation of concerns
- Achieves 99.9% compression for versioned files
- Full test suite with 100% passing tests
- Python 3.11+ support
2025-09-22 22:21:48 +02:00
Simone Scarduzio
7562064832 Initial commit: DeltaGlider - 99.9% compression for S3 storage
DeltaGlider reduces storage costs by storing only binary deltas between
similar files. Achieves 99.9% compression for versioned artifacts.

Key features:
- Intelligent file type detection (delta for archives, direct for others)
- Drop-in S3 replacement with automatic compression
- SHA256 integrity verification on every operation
- Clean hexagonal architecture
- Full test coverage
- Production tested with 200K+ files

Case study: ReadOnlyREST reduced 4TB to 5GB (99.9% compression)
2025-09-22 15:49:31 +02:00