deltaglider

mirror of https://github.com/beshu-tech/deltaglider.git synced 2026-02-25 11:54:52 +01:00

Author	SHA1	Message	Date
Simone Scarduzio	9bfe121f44	style: format files for ruff format --check compliance Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> v6.1.0	2026-02-07 16:02:40 +01:00
Simone Scarduzio	6cab3de9a0	fix: disable sha tag on tag pushes to avoid invalid Docker tag The sha tag template `prefix={{branch}}-` produces `:-hash` on tag pushes because {{branch}} is empty, resulting in an invalid Docker tag like `beshultd/deltaglider:-482f45f`. Only emit sha tags on branch pushes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-07 15:57:37 +01:00
Simone Scarduzio	482f45fc02	docs: update CHANGELOG for v6.1.0 release Add v6.1.0 section with bucket ACL support, Docker publishing, config/model refactoring. Backfill v6.0.0 section from previously unreleased entries. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-07 15:55:50 +01:00
Simone Scarduzio	6b3245266e	feat: add put_bucket_acl and get_bucket_acl support Add boto3-compatible bucket ACL operations as pure S3 passthroughs, following the existing create_bucket/delete_bucket pattern. Includes CLI commands (put-bucket-acl, get-bucket-acl), 7 integration tests, and documentation updates (method count 21→23). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-07 15:53:33 +01:00
Simone Scarduzio	20053acb5f	fix: remove unused imports flagged by ruff Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-07 08:48:22 +01:00
Simone Scarduzio	87f425734f	refactor: typed result dataclasses, centralized metadata aliases, config extraction - Replace dict[str,Any] returns in delete/delete_recursive with DeleteResult and RecursiveDeleteResult dataclasses for type safety - Extract _delete_reference/_delete_delta/_classify_objects_for_deletion helper methods from oversized delete methods in service.py - Centralize metadata key aliases in METADATA_KEY_ALIASES dict with resolve_metadata() replacing duplicated _meta_value() lookups - Add DeltaGliderConfig dataclass with from_env() for centralized config - Add ObjectKey.full_key property, remove dead _multipart_uploads dict - Update all consumers (client, CLI, tests) for dataclass access patterns Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 23:16:57 +01:00
Simone Scarduzio	012662c377	updates	2025-11-11 17:20:43 +01:00
Simone Scarduzio	284f030fae	updates to docs	2025-11-11 17:05:50 +01:00
Simone Scarduzio	7a4d30a007	freshen up	2025-11-11 11:18:06 +01:00
Simone Scarduzio	0d46283ff0	width	2025-11-11 09:55:52 +01:00
Simone Scarduzio	805e2967bc	dark mode	2025-11-11 09:53:54 +01:00
Simone Scarduzio	2ef1741d51	freshen up readme	2025-11-11 09:48:34 +01:00
Simone Scarduzio	2c1d756e7b	tweak readme	2025-11-06 16:14:29 +01:00
Simone Scarduzio	c6cee7ae26	docker	2025-11-06 15:56:15 +01:00
Simone Scarduzio	cee9a9fd2d	higher limits why not v6.0.2	2025-10-17 18:43:46 +02:00
Simone Scarduzio	0507e6ebcd	format	2025-10-16 17:14:37 +02:00
Simone Scarduzio	fa9c4fa42d	feat: Implement rehydration and purge functionality for deltaglider files - Added `rehydrate_for_download` method to download and decompress deltaglider-compressed files, re-uploading them with expiration metadata. - Introduced `generate_presigned_url_with_rehydration` method to generate presigned URLs that automatically handle rehydration for both regular and deltaglider files. - Implemented `purge_temp_files` command in CLI to delete expired temporary files from the .deltaglider/tmp/ directory, with options for dry run and JSON output. - Enhanced service methods to support the new rehydration and purging features, including detailed logging and metrics tracking.	2025-10-16 17:02:00 +02:00
Simone Scarduzio	934d83975c	fix: format models.py v6.0.1	2025-10-16 11:21:33 +02:00
Simone Scarduzio	c32d5265d9	feat: Enhance metadata handling and bucket statistics - Added object_limit_reached attribute to BucketStats for tracking limits. - Introduced QUICK_LIST_LIMIT and SAMPLED_LIST_LIMIT constants to manage listing limits. - Implemented _first_metadata_value helper function for improved metadata retrieval. - Updated get_bucket_stats to log when listing is capped due to limits. - Refactored DeltaMeta to streamline metadata extraction with error handling. - Enhanced object listing to support max_objects parameter and limit tracking.	2025-10-16 11:17:13 +02:00
Simone Scarduzio	1cf7e3ad21	import	2025-10-15 18:52:56 +02:00
Simone Scarduzio	9b36087438	not mandatory to have the command metadata field set	2025-10-15 18:16:43 +02:00
Simone Scarduzio	60877966f2	docs: Remove outdated METADATA_ISSUE_DIAGNOSIS.md This document describes the old metadata format without dg- prefix. Since v6.0.0 uses the new dg- prefixed format and requires all files to be re-uploaded (greenfield approach), this diagnosis doc is no longer relevant.	2025-10-15 11:45:52 +02:00
Simone Scarduzio	fbd44ea3c3	style: Format integration test files with ruff v6.0.0	2025-10-15 11:38:17 +02:00
Simone Scarduzio	3f689fc601	fix: Update integration tests for new metadata format and caching behavior - Fix sync tests: Add list_objects.side_effect = NotImplementedError() to mock - Fix sync tests: Add side_effect for put() to avoid hanging - Fix MockStorage: Add continuation_token parameter to list_objects() - Fix stats tests: Update assertions to include use_cache and refresh_cache params - Fix bucket management test: Update caching expectations for S3-based cache All 97 integration tests now pass.	2025-10-15 11:34:43 +02:00
Simone Scarduzio	3753212f96	style: Format test file with ruff 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-15 11:22:00 +02:00
Simone Scarduzio	db7d14f8a8	feat: Add metadata namespace and fix stats calculation This is a major release with breaking changes to metadata format. BREAKING CHANGES: - All metadata keys now use 'dg-' namespace prefix (becomes 'x-amz-meta-dg-*' in S3) - Old metadata format is not supported - all files must be re-uploaded - Stats behavior changed: quick mode no longer shows misleading warnings Features: - Metadata now uses real package version (dg-tool: deltaglider/VERSION) - All metadata keys properly namespaced with 'dg-' prefix - Clean stats output in quick mode (no per-file warning spam) - Fixed nonsensical negative compression ratios in quick mode Fixes: - Stats now correctly handles delta files without metadata - Space saved shows 0 instead of negative numbers when metadata unavailable - Removed misleading warnings in quick mode (metadata not fetched is expected) - Fixed metadata keys to use hyphens instead of underscores Documentation: - Added comprehensive metadata documentation - Added stats calculation behavior guide - Added real version tracking documentation Tests: - Updated all tests to use new dg- prefixed metadata keys - All 73 unit tests passing - All quality checks passing (ruff, mypy) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-15 11:19:10 +02:00
Simone Scarduzio	e1259b7ea8	fix: Code quality improvements for v5.2.2 release - Fix pagination bug using continuation_token instead of start_after - Add stats caching to prevent blocking web apps - Improve code formatting and type checking - Add comprehensive unit tests for new features - Fix test mock usage in object_listing tests v5.2.2	2025-10-14 23:54:49 +02:00
Simone Scarduzio	ff05e77c24	fix: Prevent get_bucket_stats from blocking web apps indefinitely Performance Issues Fixed: 1. aws_compat.py: Changed to use cached stats only (no bucket scans after uploads) 2. stats.py: Added safety mechanisms to prevent infinite hangs - Max 10k iterations (10M object limit) - 10 min timeout on metadata fetching - Missing pagination token detection - Graceful error recovery with partial stats Refactoring: - Reduced nesting in get_bucket_stats from 5 levels to 2 levels - Extracted 5 helper functions for better maintainability - Main function reduced from 300+ lines to 33 lines - 100% backward compatible - no API changes Benefits: - Web apps no longer hang on upload/delete operations - Explicit get_bucket_stats() calls complete within bounded time - Better error handling and logging - Easier to test and maintain 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> v5.2.1	2025-10-14 14:47:39 +02:00
Simone Scarduzio	c3d385bf18	fix tests v5.2.0	2025-10-13 17:26:35 +02:00
Simone Scarduzio	aea5cb5d9a	feat: Enhance S3 migration CLI with new commands and EC2 detection option	2025-10-12 23:12:32 +02:00
Simone Scarduzio	b2ca59490b	feat: Add EC2 region detection and cost optimization features	2025-10-12 22:41:48 +02:00
Simone Scarduzio	4f56c4b600	fix: Preserve original filenames during S3-to-S3 migration	2025-10-12 18:10:04 +02:00
Simone Scarduzio	14c6af0f35	handle version in cli	2025-10-12 17:47:05 +02:00
Simone Scarduzio	67792b2031	migrate CLI support	2025-10-12 17:37:44 +02:00
Simone Scarduzio	a9a1396e6e	style: Format test_stats_algorithm.py with ruff v5.1.1	2025-10-11 14:17:49 +02:00
Simone Scarduzio	52eb5bba21	fix: Fix unit test import issues for concurrent.futures - Remove unnecessary concurrent.futures patches in tests - Update test_detailed_stats_flag to match current implementation behavior - Tests now properly handle parallel metadata fetching without mocking	2025-10-11 14:13:40 +02:00
Simone Scarduzio	f75db142e8	fix: Correct logging message formatting in get_bucket_stats and update test assertions for clarity.	2025-10-11 14:05:54 +02:00
Simone Scarduzio	35d34d4862	chore: Update CHANGELOG for v5.1.1 release - Document stats command fixes - Document performance improvements	2025-10-10 19:57:11 +02:00
Simone Scarduzio	9230cbd762	test	2025-10-10 19:52:15 +02:00
Simone Scarduzio	2eba6e8d38	optimisation	2025-10-10 19:50:33 +02:00
Simone Scarduzio	656726b57b	algorithm correctness	2025-10-10 19:46:39 +02:00
Simone Scarduzio	85dd315424	ruff v5.1.0 v5.0.4	2025-10-10 18:44:46 +02:00
Simone Scarduzio	dbd2632cae	docs: Update SDK documentation for v5.1.0 features - Add session-level caching documentation to API reference - Document clear_cache() and evict_cache() methods - Add comprehensive bucket statistics examples - Update list_buckets() with DeltaGliderStats metadata - Add cache management patterns and best practices - Update CHANGELOG comparison links 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-10 18:34:44 +02:00
Simone Scarduzio	3d04a407c0	feat: Add stats command with session-level caching (v5.1.0) New Features: - Add 'deltaglider stats' CLI command for bucket compression metrics - Session-level bucket statistics caching for performance - Enhanced list_buckets() with cached stats metadata Technical Changes: - Automatic cache invalidation on bucket mutations - Intelligent cache reuse (detailed → quick fallback) - Comprehensive test coverage (106+ new test lines) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-10 18:30:05 +02:00
Simone Scarduzio	47f022fffe	feat: Add programmatic cache management for long-running applications Implements cache clearing functionality for SDK users who need manual cache management in long-running applications where automatic cleanup on process exit is not sufficient. New Features: - Added `clear()` method to CachePort protocol - Implemented `clear()` in all cache adapters: * ContentAddressedCache: Clears files and SHA mappings * EncryptedCache: Clears encryption mappings and delegates to backend * MemoryCache: Already had clear() method - Added `clear_cache()` method to DeltaGliderClient for public API Cache Management API:
```python
from deltaglider import create_client

client = create_client()

# Upload files
client.put_object(Bucket='bucket', Key='file.zip', Body=data)

# Clear cache manually (important for long-running apps!)
client.clear_cache()
```
New Documentation: - docs/CACHE_MANAGEMENT.md (684 lines) * Comprehensive guide for programmatic cache management * Long-running application strategies (web apps, services, batch jobs) * Encryption key management (ephemeral vs. persistent) * Key rotation procedures * Memory vs. filesystem cache trade-offs * Best practices by application type * Monitoring and troubleshooting Key Topics Covered: - Why SDK requires manual cache management (vs. CLI auto-cleanup) - When to clear cache (periodic, config changes, tests, etc.) - Cache strategies for 5 application types: * Long-running background services * Periodic batch jobs * Web applications / API servers * Testing / CI/CD * AWS Lambda / Serverless - Encryption key management: * Ephemeral keys (default, maximum security) * Persistent keys (shared cache scenarios) * Key rotation procedures * Secure key storage (Secrets Manager) - Memory vs. filesystem cache selection - Monitoring cache health - Troubleshooting common issues Use Cases: - Long-running services: Periodic cache clearing to prevent growth - Batch jobs: Clear cache in finally block - Tests: Clear cache after each test for clean state - Multi-process: Shared cache with persistent encryption keys - High performance: Memory cache with automatic LRU eviction Security Enhancements: - Documented encryption key lifecycle management - Key rotation procedures - Secure key storage best practices - Ephemeral vs. persistent key trade-offs Testing: - All 119 tests passing ✅ - Type checking: 0 errors (mypy) ✅ - Linting: All checks passed (ruff) ✅ Breaking Changes: None (new API only) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-10 10:34:02 +02:00
Simone Scarduzio	7a2ed16ee7	docs: Add comprehensive DG_MAX_RATIO tuning guide Created extensive documentation for the DG_MAX_RATIO parameter, which controls delta compression efficiency thresholds. New Documentation: - docs/DG_MAX_RATIO.md (526 lines) * Complete explanation of how DG_MAX_RATIO works * Real-world scenarios and use cases * Decision trees for choosing optimal values * Industry-specific recommendations * Monitoring and tuning strategies * Advanced usage patterns * Comprehensive FAQ Updates to Existing Documentation: - README.md: Added link to DG_MAX_RATIO guide with tip callout - CLAUDE.md: Added detailed DG_MAX_RATIO explanation and guide link - Dockerfile: Added inline comments explaining DG_MAX_RATIO tuning - docs/sdk/getting-started.md: Added DG_MAX_RATIO guide reference Key Topics Covered: - What DG_MAX_RATIO does and why it exists - How to choose the right value (0.2-0.7 range) - Real-world scenarios (nightly builds, major versions, etc.) - Industry-specific use cases (SaaS, mobile apps, backups, etc.) - Configuration examples (Docker, SDK, CLI) - Monitoring and optimization strategies - Advanced usage patterns (dynamic ratios, A/B testing) - FAQ addressing common questions Examples Included: - Conservative (0.2-0.3): For dissimilar files or expensive storage - Default (0.5): Balanced approach for most use cases - Permissive (0.6-0.7): For very similar files or cheap storage Value Proposition: - Helps users optimize compression for their specific use case - Prevents inefficient delta compression - Provides data-driven tuning methodology - Reduces support questions about compression behavior 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-10 10:19:59 +02:00
Simone Scarduzio	5e333254ba	docs: Comprehensive environment variable documentation Added complete documentation for all environment variables across Dockerfile, README.md, and SDK documentation. Dockerfile Changes: - Documented all DeltaGlider environment variables with defaults - Added AWS configuration variables (commented for runtime override) - Updated version label to 5.0.3 - Updated description to mention encryption README.md Changes: - Added comprehensive Docker Usage section - Documented all environment variables with examples - Added Docker examples for: * Basic usage with AWS credentials * Memory cache configuration for CI/CD * MinIO/custom endpoint usage * Persistent encryption key setup - Security notes for encryption and cache behavior SDK Documentation Changes: - Added DeltaGlider Configuration section - Documented all environment variables - Added configuration examples - Security notes for encryption behavior Environment Variables Documented: - DG_LOG_LEVEL (logging configuration) - DG_MAX_RATIO (compression threshold) - DG_CACHE_BACKEND (filesystem or memory) - DG_CACHE_MEMORY_SIZE_MB (memory cache size) - DG_CACHE_ENCRYPTION_KEY (optional persistent key) - AWS_ENDPOINT_URL (custom S3 endpoints) - AWS_ACCESS_KEY_ID (AWS credentials) - AWS_SECRET_ACCESS_KEY (AWS credentials) - AWS_DEFAULT_REGION (AWS region) Quality Checks: - All 119 tests passing ✅ - Type checking: 0 errors (mypy) ✅ - Linting: All checks passed (ruff) ✅ - Dockerfile syntax validated ✅ 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> v5.0.3	2025-10-10 10:12:25 +02:00
Simone Scarduzio	04cc984d4a	ruff	2025-10-10 10:09:11 +02:00
Simone Scarduzio	ac7d4e067f	security: Make encryption always-on with auto-cleanup BREAKING CHANGES: - Encryption is now ALWAYS enabled (cannot be disabled) - Removed DG_CACHE_ENCRYPTION environment variable Security Enhancements: - Encryption is mandatory for all cache operations - Ephemeral encryption keys per process (forward secrecy) - Automatic deletion of corrupted cache files on decryption failures - Auto-cleanup on both decryption failures and SHA mismatches Changes: - Removed DG_CACHE_ENCRYPTION toggle from CLI and SDK - Updated EncryptedCache to auto-delete corrupted files - Simplified cache initialization (always wrapped with encryption) - DG_CACHE_ENCRYPTION_KEY remains optional for persistent keys Documentation: - Updated CLAUDE.md with encryption always-on behavior - Updated CHANGELOG.md with breaking changes - Clarified security model and auto-cleanup behavior Testing: - All 119 tests passing with encryption always-on - Type checking: 0 errors (mypy) - Linting: All checks passed (ruff) Rationale: - Zero-trust cache architecture requires encryption - Corrupted cache is security risk - auto-deletion prevents exploitation - Ephemeral keys provide maximum security by default - Users who need cross-process sharing can opt-in with persistent keys 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-10 09:51:29 +02:00
Simone Scarduzio	e8fb926fd6	docs: Update SECURITY_FIX_ROADMAP.md - mark encryption complete	2025-10-10 09:40:02 +02:00
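
Commit 6cab3de9a0 explains why the `prefix={{branch}}-` sha tag breaks on tag pushes: `{{branch}}` renders empty, so the tag starts with `-`, and Docker requires a tag to begin with a letter, digit, or underscore. The sketch below reproduces that rule in plain Python. `render_sha_tag` and `is_valid_docker_tag` are hypothetical helpers for illustration, not code from the repository or its CI workflow, and replacing `/` in branch names is an added assumption.

```python
import re
from typing import Optional

# Docker tag grammar: a word character first, then up to 127 word
# characters, periods, or dashes (128 characters in total).
DOCKER_TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def is_valid_docker_tag(tag: str) -> bool:
    return DOCKER_TAG_RE.fullmatch(tag) is not None


def render_sha_tag(branch: str, short_sha: str) -> Optional[str]:
    """Render a `<branch>-<sha>` tag, or None when there is no branch.

    On a tag push the branch is empty, and naively rendering the
    `{{branch}}-` prefix would yield "-<sha>", which Docker rejects.
    """
    if not branch:
        return None  # tag push: emit no sha tag at all
    # Assumption: replace characters Docker forbids (e.g. "/" in feature/x).
    safe_branch = re.sub(r"[^A-Za-z0-9_.-]", "-", branch)
    return f"{safe_branch}-{short_sha}"
```

Returning `None` instead of a fallback string mirrors the commit's choice to emit sha tags only on branch pushes.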
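
Commits db7d14f8a8 and 87f425734f describe the metadata changes: keys carry a `dg-` prefix (stored in S3 as `x-amz-meta-dg-*`), and alias lookups are centralized in a `METADATA_KEY_ALIASES` dict read by `resolve_metadata()`. Below is a minimal sketch of such a lookup. The alias entries and the function signature are assumptions, since the log does not show the real ones. It relies on boto3 returning user metadata without the `x-amz-meta-` prefix and on S3 storing those keys in lowercase.

```python
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical alias table: canonical field -> metadata keys to try, in order.
# Keys are written as boto3 returns them: no "x-amz-meta-" prefix, lowercase.
METADATA_KEY_ALIASES: Dict[str, Tuple[str, ...]] = {
    "file_sha256": ("dg-file-sha256", "dg-sha256"),
    "ref_key": ("dg-ref-key",),
}


def resolve_metadata(metadata: Mapping[str, str], field: str) -> Optional[str]:
    """Return the value of the first alias present for `field`, or None."""
    # S3 stores user-metadata keys in lowercase; normalize defensively anyway.
    lowered = {key.lower(): value for key, value in metadata.items()}
    # Unknown fields fall back to a direct lookup under their own name.
    for key in METADATA_KEY_ALIASES.get(field, (field,)):
        if key in lowered:
            return lowered[key]
    return None
```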
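
Commit fa9c4fa42d adds a `purge_temp_files` CLI command that deletes expired temporary files from `.deltaglider/tmp/` and supports a dry run. The sketch below shows only the selection logic, applied to an in-memory map from key to expiry time. The `purge_expired` name, the map shape, and the idea that expiry is read from object metadata are assumptions.

```python
from datetime import datetime, timezone
from typing import Dict, List

TMP_PREFIX = ".deltaglider/tmp/"


def purge_expired(
    objects: Dict[str, datetime], now: datetime, dry_run: bool = False
) -> List[str]:
    """Return expired temp keys, deleting them from `objects` unless dry_run.

    Hypothetical: `objects` maps key -> expiry time, standing in for the
    expiration metadata written when a file is rehydrated.
    """
    expired = sorted(
        key
        for key, expires_at in objects.items()
        if key.startswith(TMP_PREFIX) and expires_at <= now
    )
    if not dry_run:
        for key in expired:
            del objects[key]
    return expired


now = datetime(2025, 10, 16, 12, tzinfo=timezone.utc)
```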
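
Commits ff05e77c24, e1259b7ea8 and c32d5265d9 describe how bucket listing was bounded. Pagination uses `continuation_token` rather than `start_after`, iterations and objects are capped, the loop stops when a truncated page carries no token, and hitting a cap is recorded (`object_limit_reached`). The loop below sketches those safeguards against a generic page-fetching callable. `Page`, `ListingResult` and `list_bounded` are hypothetical names. The iteration default follows the "Max 10k iterations" figure in ff05e77c24, while the object default is a placeholder, not the project's `QUICK_LIST_LIMIT` or `SAMPLED_LIST_LIMIT`.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, List, Optional

log = logging.getLogger(__name__)


@dataclass
class Page:
    keys: List[str]
    is_truncated: bool
    next_token: Optional[str] = None  # continuation token for the next request


@dataclass
class ListingResult:
    keys: List[str] = field(default_factory=list)
    limit_reached: bool = False  # stopped because max_objects was hit
    stopped_early: bool = False  # stopped on a missing token or the iteration cap


def list_bounded(
    fetch_page: Callable[[Optional[str]], Page],
    max_objects: int = 10_000,
    max_iterations: int = 10_000,
) -> ListingResult:
    result = ListingResult()
    token: Optional[str] = None
    for _ in range(max_iterations):
        page = fetch_page(token)
        remaining = max_objects - len(result.keys)
        result.keys.extend(page.keys[:remaining])
        more_available = page.is_truncated or len(page.keys) > remaining
        if len(result.keys) >= max_objects and more_available:
            result.limit_reached = True
            break
        if not page.is_truncated:
            break  # normal end of listing
        if not page.next_token:
            log.warning("Truncated page without continuation token; stopping")
            result.stopped_early = True
            break
        token = page.next_token
    else:
        # The loop ran out of iterations without reaching the last page.
        log.warning("Iteration cap of %d reached; listing is partial", max_iterations)
        result.stopped_early = True
    return result
```

Returning partial results with flags, instead of raising, matches the "graceful error recovery with partial stats" goal in ff05e77c24.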
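
Commit 3d04a407c0 mentions session-level bucket stats caching, automatic invalidation on bucket mutations, and "detailed → quick fallback" reuse. Reading that as "a cached detailed result can serve a quick request" is an assumption. One way to structure it is a dict keyed by `(bucket, detailed)`. `SessionStatsCache` and `Stats` below are illustrative stand-ins, not the classes in the codebase.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Stats:
    object_count: int
    total_size: int
    detailed: bool  # True if computed with per-object metadata


class SessionStatsCache:
    """Illustrative per-session cache keyed by (bucket, detailed)."""

    def __init__(self, compute: Callable[[str, bool], Stats]) -> None:
        self._compute = compute
        self._entries: Dict[Tuple[str, bool], Stats] = {}

    def get(self, bucket: str, detailed: bool = False) -> Stats:
        # A quick request may reuse a cached detailed result, not vice versa.
        if detailed:
            candidates = [(bucket, True)]
        else:
            candidates = [(bucket, False), (bucket, True)]
        for key in candidates:
            if key in self._entries:
                return self._entries[key]
        stats = self._compute(bucket, detailed)
        self._entries[(bucket, detailed)] = stats
        return stats

    def invalidate(self, bucket: str) -> None:
        """Drop both modes; call after any upload or delete in the bucket."""
        self._entries.pop((bucket, False), None)
        self._entries.pop((bucket, True), None)
```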
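
Commit 7a2ed16ee7 describes `DG_MAX_RATIO` as the threshold that decides whether delta compression is worth keeping, with 0.5 as the default and 0.2 to 0.7 as the suggested range. The log does not spell out the formula. Assuming the ratio compared is delta size divided by original file size, the decision reduces to one comparison. The helper names and the range check on the variable are hypothetical.

```python
import os

DEFAULT_MAX_RATIO = 0.5  # default stated in the DG_MAX_RATIO guide commit


def max_ratio_from_env() -> float:
    """Read DG_MAX_RATIO, falling back to the default when unset."""
    raw = os.environ.get("DG_MAX_RATIO")
    if raw is None:
        return DEFAULT_MAX_RATIO
    value = float(raw)
    # Assumption: a ratio outside (0, 1] is treated as a configuration error.
    if not 0.0 < value <= 1.0:
        raise ValueError(f"DG_MAX_RATIO must be in (0, 1], got {value}")
    return value


def should_store_delta(delta_size: int, file_size: int, max_ratio: float) -> bool:
    """Keep the delta only if it is at most max_ratio of the full file size."""
    if file_size <= 0:
        return False  # nothing to compare against; store the file as-is
    return delta_size / file_size <= max_ratio
```

Under this assumption, a 2 MB delta against a 100 MB file (ratio 0.02) is kept as a delta, while a 70 MB delta (ratio 0.7) exceeds the 0.5 default and the file would be stored in full.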


131 Commits