security: Make encryption always-on with auto-cleanup

BREAKING CHANGES:
- Encryption is now ALWAYS enabled (cannot be disabled)
- Removed DG_CACHE_ENCRYPTION environment variable

Security Enhancements:
- Encryption is mandatory for all cache operations
- Ephemeral encryption keys per process (forward secrecy)
- Automatic deletion of corrupted cache files on decryption failures
- Auto-cleanup on both decryption failures and SHA mismatches

Changes:
- Removed DG_CACHE_ENCRYPTION toggle from CLI and SDK
- Updated EncryptedCache to auto-delete corrupted files
- Simplified cache initialization (always wrapped with encryption)
- DG_CACHE_ENCRYPTION_KEY remains optional for persistent keys

Documentation:
- Updated CLAUDE.md with encryption always-on behavior
- Updated CHANGELOG.md with breaking changes
- Clarified security model and auto-cleanup behavior

Testing:
- All 119 tests passing with encryption always-on
- Type checking: 0 errors (mypy)
- Linting: All checks passed (ruff)

Rationale:
- Zero-trust cache architecture requires encryption
- Corrupted cache is security risk - auto-deletion prevents exploitation
- Ephemeral keys provide maximum security by default
- Users who need cross-process sharing can opt-in with persistent keys

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
Simone Scarduzio
2025-10-10 09:51:29 +02:00
parent e8fb926fd6
commit ac7d4e067f
5 changed files with 51 additions and 25 deletions

View File

@@ -11,12 +11,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Security
- **BREAKING**: Removed all legacy shared cache code for security
- **BREAKING**: Encryption is now ALWAYS ON (cannot be disabled)
- Ephemeral process-isolated cache is now the ONLY mode (no opt-out)
- **Content-Addressed Storage (CAS)**: Implemented SHA256-based cache storage
- Zero collision risk (SHA256 namespace guarantees uniqueness)
- Automatic deduplication (same content = same filename)
- Tampering protection (changing content changes SHA, breaks lookup)
- Two-level directory structure for filesystem optimization
- **Encrypted Cache**: All cache data encrypted at rest using Fernet (AES-128-CBC + HMAC)
- Ephemeral encryption keys per process (forward secrecy)
- Optional persistent keys via `DG_CACHE_ENCRYPTION_KEY` for shared filesystems
- Automatic cleanup of corrupted cache files on decryption failures
- Fixed TOCTOU vulnerabilities with atomic SHA validation at use-time
- Added `get_validated_ref()` method to prevent cache poisoning
- Eliminated multi-user data exposure through mandatory cache isolation
@@ -24,6 +29,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Removed
- **BREAKING**: Removed `DG_UNSAFE_SHARED_CACHE` environment variable
- **BREAKING**: Removed `DG_CACHE_DIR` environment variable
- **BREAKING**: Removed `DG_CACHE_ENCRYPTION` environment variable (encryption always on)
- **BREAKING**: Removed `cache_dir` parameter from `create_client()`
### Changed
@@ -33,14 +39,19 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- New `ContentAddressedCache` adapter in `adapters/cache_cas.py`
- New `EncryptedCache` wrapper in `adapters/cache_encrypted.py`
- New `MemoryCache` adapter in `adapters/cache_memory.py` with LRU eviction
- Self-describing cache structure with SHA256-based filenames
- Configurable cache backends via `DG_CACHE_BACKEND` (filesystem or memory)
- Memory cache size limit via `DG_CACHE_MEMORY_SIZE_MB` (default: 100MB)
### Internal
- Updated all tests to use Content-Addressed Storage
- All 99 tests passing with zero errors
- Updated all tests to use Content-Addressed Storage and encryption
- All 119 tests passing with zero errors (99 original + 20 new cache tests)
- Type checking: 0 errors (mypy)
- Linting: All checks passed (ruff)
- Completed Phase 1 & Phase 2 of SECURITY_FIX_ROADMAP.md
- Completed Phase 1, 2, and 7 of SECURITY_FIX_ROADMAP.md
- Added comprehensive test suites for encryption (13 tests) and memory cache (10 tests)
## [5.0.1] - 2025-01-10

View File

@@ -192,14 +192,18 @@ Core delta logic is in `src/deltaglider/core/service.py`:
- `DG_MAX_RATIO`: Maximum acceptable delta/file ratio (default: "0.5")
- `DG_CACHE_BACKEND`: Cache backend type - "filesystem" (default) or "memory"
- `DG_CACHE_MEMORY_SIZE_MB`: Memory cache size limit in MB (default: "100")
- `DG_CACHE_ENCRYPTION`: Enable cache encryption - "true" (default) or "false"
- `DG_CACHE_ENCRYPTION_KEY`: Optional base64-encoded Fernet key for persistent encryption (ephemeral by default)
- `AWS_ENDPOINT_URL`: Override S3 endpoint for MinIO/LocalStack
- `AWS_ACCESS_KEY_ID`: AWS credentials
- `AWS_SECRET_ACCESS_KEY`: AWS credentials
- `AWS_DEFAULT_REGION`: AWS region
**Note**: DeltaGlider uses ephemeral, process-isolated cache for security. Cache is automatically created in `/tmp/deltaglider-*` and cleaned up on exit. Encryption is enabled by default with ephemeral keys for forward secrecy.
**Security Notes**:
- **Encryption Always On**: Cache data is ALWAYS encrypted (cannot be disabled)
- **Ephemeral Keys**: Encryption keys auto-generated per process for maximum security
- **Auto-Cleanup**: Corrupted cache files automatically deleted on decryption failures
- **Process Isolation**: Each process gets isolated cache in `/tmp/deltaglider-*`, cleaned up on exit
- **Persistent Keys**: Set `DG_CACHE_ENCRYPTION_KEY` only if you need cross-process cache sharing (e.g., shared filesystems)
## Important Implementation Details
@@ -230,7 +234,9 @@ Core delta logic is in `src/deltaglider/core/service.py`:
- Use IAM roles when possible
- All S3 operations respect bucket policies and encryption settings
- SHA256 checksums prevent tampering and corruption
- **Encryption at Rest**: Cache data encrypted by default using Fernet (AES-128-CBC + HMAC)
- **Ephemeral Keys**: Encryption keys auto-generated per process for forward secrecy
- **Persistent Keys**: Set `DG_CACHE_ENCRYPTION_KEY` for cross-process cache sharing (use secrets management)
- **Content-Addressed Storage**: SHA256-based filenames prevent collision attacks
- **Encryption Always On**: Cache data is ALWAYS encrypted using Fernet (AES-128-CBC + HMAC) - cannot be disabled
- **Ephemeral Keys**: Encryption keys auto-generated per process for forward secrecy and process isolation
- **Auto-Cleanup**: Corrupted or tampered cache files automatically deleted on decryption failures
- **Persistent Keys**: Set `DG_CACHE_ENCRYPTION_KEY` only for cross-process cache sharing (use secrets management)
- **Content-Addressed Storage**: SHA256-based filenames prevent collision attacks
- **Zero-Trust Cache**: All cache operations include cryptographic validation

View File

@@ -164,9 +164,17 @@ class EncryptedCache(CachePort):
decrypted_data = self._cipher.decrypt(encrypted_data)
except Exception as e:
# Fernet raises InvalidToken for tampering/wrong key
# SECURITY: Auto-delete corrupted cache files
try:
encrypted_path.unlink(missing_ok=True)
# Clean up mapping
if key in self._plaintext_sha_map:
del self._plaintext_sha_map[key]
except Exception:
pass # Best effort cleanup
raise CacheCorruptionError(
f"Decryption failed for {bucket}/{prefix}: {e}. "
f"Cache may be corrupted or key mismatch."
f"Corrupted cache deleted automatically."
) from e
# Validate SHA of decrypted content
@@ -174,9 +182,18 @@ class EncryptedCache(CachePort):
actual_sha = hashlib.sha256(decrypted_data).hexdigest()
if actual_sha != expected_sha:
# SECURITY: Auto-delete corrupted cache files
try:
encrypted_path.unlink(missing_ok=True)
# Clean up mapping
if key in self._plaintext_sha_map:
del self._plaintext_sha_map[key]
except Exception:
pass # Best effort cleanup
raise CacheCorruptionError(
f"Decrypted content SHA mismatch for {bucket}/{prefix}: "
f"expected {expected_sha}, got {actual_sha}"
f"expected {expected_sha}, got {actual_sha}. "
f"Corrupted cache deleted automatically."
)
# Write decrypted content to temporary file

View File

@@ -75,13 +75,9 @@ def create_service(
# Filesystem-backed with Content-Addressed Storage
base_cache = ContentAddressedCache(cache_dir, hasher)
# Apply encryption if enabled
enable_encryption = os.environ.get("DG_CACHE_ENCRYPTION", "true").lower() == "true"
cache: CachePort
if enable_encryption:
cache = EncryptedCache.from_env(base_cache)
else:
cache = base_cache
# Always apply encryption with ephemeral keys (security hardening)
# Encryption key is optional via DG_CACHE_ENCRYPTION_KEY (ephemeral if not set)
cache: CachePort = EncryptedCache.from_env(base_cache)
clock = UtcClockAdapter()
logger = StdLoggerAdapter(level=log_level)

View File

@@ -1159,13 +1159,9 @@ def create_client(
# Filesystem-backed with Content-Addressed Storage
base_cache = ContentAddressedCache(cache_dir, hasher)
# Apply encryption if enabled (default: true)
enable_encryption = os.environ.get("DG_CACHE_ENCRYPTION", "true").lower() == "true"
cache: CachePort
if enable_encryption:
cache = EncryptedCache.from_env(base_cache)
else:
cache = base_cache
# Always apply encryption with ephemeral keys (security hardening)
# Encryption key is optional via DG_CACHE_ENCRYPTION_KEY (ephemeral if not set)
cache: CachePort = EncryptedCache.from_env(base_cache)
clock = UtcClockAdapter()
logger = StdLoggerAdapter(level=log_level)