feat: Implement Content-Addressed Storage (CAS) cache

Implemented SHA256-based Content-Addressed Storage to eliminate
cache collisions and enable automatic deduplication.

Key Features:
- Zero collision risk: SHA256 namespace guarantees uniqueness
- Automatic deduplication: same content = same filename
- Tampering protection: changing content changes SHA, breaks lookup
- Two-level directory structure (ab/cd/abcdef...) for filesystem optimization

Changes:
- Added ContentAddressedCache adapter in adapters/cache_cas.py
- Updated CLI and SDK to use CAS instead of FsCacheAdapter
- Updated all tests to use ContentAddressedCache
- Documented CAS architecture in CLAUDE.md and SECURITY_FIX_ROADMAP.md

Security Benefits:
- Eliminates cross-endpoint collision vulnerabilities
- Self-describing cache (filename IS the checksum)
- Natural cache validation without external metadata

All quality checks passing:
- 99 tests passing (0 failures)
- Type checking: 0 errors (mypy)
- Linting: All checks passed (ruff)

Completed Phase 2 of SECURITY_FIX_ROADMAP.md
This commit is contained in:
Simone Scarduzio
2025-10-10 09:06:29 +02:00
parent f9f2b036e3
commit 90a342dc33
8 changed files with 291 additions and 17 deletions

View File

@@ -86,10 +86,10 @@ ref_path = self.cache.get_validated_ref(
---
### **DAY 3-5: Quick Wins** (v5.1.0)
### **DAY 3-5: Quick Wins** (v5.0.3) ✅ COMPLETED
*Low-risk improvements with high security impact*
#### 4. **Implement Content-Addressed Storage** (4 hours)
#### 4. **Implement Content-Addressed Storage** (4 hours) ✅ COMPLETED
```python
# src/deltaglider/adapters/cache_cas.py
class ContentAddressedCache(CachePort):
@@ -122,11 +122,17 @@ class ContentAddressedCache(CachePort):
return path
```
**Benefits**:
- Same file cached once regardless of bucket/prefix
- Automatic deduplication
- No collision possible (SHA256 uniqueness)
**Benefits**: ✅ ACHIEVED
- Same file cached once regardless of bucket/prefix (automatic deduplication)
- No collision possible (SHA256 uniqueness guarantees)
- Natural cache validation (filename IS the checksum)
- Two-level directory structure (ab/cd/abcdef...) for filesystem optimization
**Implementation**: Complete in `src/deltaglider/adapters/cache_cas.py` with:
- `_cas_path()` method for SHA256-based path computation
- `get_validated_ref()` with atomic validation and locking
- `write_ref()` with atomic temp-file + rename pattern
- Ephemeral deltaspace-to-SHA mapping for compatibility
#### 5. **Add Secure Directory Creation** (2 hours)
```python