Implemented SHA256-based Content-Addressed Storage to eliminate cache collisions and enable automatic deduplication. Key Features: - Zero collision risk: SHA256 namespace guarantees uniqueness - Automatic deduplication: same content = same filename - Tampering protection: changing content changes SHA, breaks lookup - Two-level directory structure (ab/cd/abcdef...) for filesystem optimization Changes: - Added ContentAddressedCache adapter in adapters/cache_cas.py - Updated CLI and SDK to use CAS instead of FsCacheAdapter - Updated all tests to use ContentAddressedCache - Documented CAS architecture in CLAUDE.md and SECURITY_FIX_ROADMAP.md Security Benefits: - Eliminates cross-endpoint collision vulnerabilities - Self-describing cache (filename IS the checksum) - Natural cache validation without external metadata All quality checks passing: - 99 tests passing (0 failures) - Type checking: 0 errors (mypy) - Linting: All checks passed (ruff) Completed Phase 2 of SECURITY_FIX_ROADMAP.md
18 KiB
🛡️ DeltaGlider Security Fix Roadmap
Executive Summary
Critical security vulnerabilities have been identified in DeltaGlider's cache system that enable multi-user attacks, data exposure, and cache poisoning. This document provides a chronological, actionable roadmap to eliminate these threats through bold architectural changes.
Key Innovation: Instead of patching individual issues, we propose a "Zero-Trust Cache Architecture" that eliminates entire classes of vulnerabilities.
🚀 The Bold Solution: Ephemeral Signed Cache
Core Concept
Replace filesystem cache with ephemeral, cryptographically-signed, user-isolated cache that eliminates:
- TOCTOU vulnerabilities (no shared filesystem)
- Multi-user interference (process isolation)
- Cache poisoning (cryptographic signatures)
- Information disclosure (encrypted metadata)
- Cross-endpoint collision (content-addressed storage)
Note: DeltaGlider is designed as a standalone CLI/SDK application. All solutions maintain this architecture without requiring external services.
📋 Implementation Roadmap
DAY 1-2: Emergency Hotfix (v5.0.3) ✅ COMPLETED
Stop the bleeding - minimal changes for immediate deployment
1. Ephemeral Process-Isolated Cache (2 hours) ✅ COMPLETED
# src/deltaglider/app/cli/main.py
import tempfile
import atexit
# SECURITY: Always use ephemeral process-isolated cache
cache_dir = Path(tempfile.mkdtemp(prefix="deltaglider-", dir="/tmp"))
atexit.register(lambda: shutil.rmtree(cache_dir, ignore_errors=True))
Impact: Each process gets isolated cache, auto-cleaned on exit. Eliminates multi-user attacks. Implementation: All legacy shared cache code removed. Ephemeral cache is now the ONLY mode.
2. Add SHA Validation at Use-Time (2 hours) ✅ COMPLETED
# src/deltaglider/ports/cache.py
class CachePort(Protocol):
def get_validated_ref(self, bucket: str, prefix: str, expected_sha: str) -> Path:
"""Get reference with atomic SHA validation - MUST use this for all operations."""
...
# src/deltaglider/adapters/cache_fs.py
def get_validated_ref(self, bucket: str, prefix: str, expected_sha: str) -> Path:
path = self.ref_path(bucket, prefix)
if not path.exists():
raise CacheMissError(f"Cache miss for {bucket}/{prefix}")
# Lock file for atomic read (Unix only)
with open(path, 'rb') as f:
if sys.platform != "win32":
fcntl.flock(f.fileno(), fcntl.LOCK_SH)
content = f.read()
actual_sha = hashlib.sha256(content).hexdigest()
if actual_sha != expected_sha:
path.unlink() # Remove corrupted cache
raise CacheCorruptionError(f"SHA mismatch: cache corrupted")
return path
3. Update All Usage Points (1 hour) ✅ COMPLETED
# src/deltaglider/core/service.py
# Replaced ALL instances in two locations:
# - Line 234 (get method for decoding)
# - Line 415 (_create_delta method for encoding)
ref_path = self.cache.get_validated_ref(
delta_space.bucket,
delta_space.prefix,
ref_sha256 # Pass expected SHA
)
Test & Deploy: ✅ All 99 tests passing + ready for release
DAY 3-5: Quick Wins (v5.0.3) ✅ COMPLETED
Low-risk improvements with high security impact
4. Implement Content-Addressed Storage (4 hours) ✅ COMPLETED
# src/deltaglider/adapters/cache_cas.py
class ContentAddressedCache(CachePort):
"""Cache using SHA as filename - eliminates collisions"""
def ref_path(self, bucket: str, prefix: str, sha256: str) -> Path:
# Use SHA as filename - guaranteed unique
return self.base_dir / sha256[:2] / sha256[2:4] / sha256
def write_ref(self, bucket: str, prefix: str, src: Path, sha256: str) -> Path:
path = self.ref_path(bucket, prefix, sha256)
# If file with this SHA exists, we're done (deduplication!)
if path.exists():
return path
# Atomic write
path.parent.mkdir(parents=True, mode=0o700, exist_ok=True)
tmp = path.with_suffix('.tmp')
shutil.copy2(src, tmp)
os.chmod(tmp, 0o600)
# Verify content before committing
actual_sha = self.hasher.sha256(tmp)
if actual_sha != sha256:
tmp.unlink()
raise ValueError("File corruption during cache write")
os.replace(tmp, path) # Atomic
return path
Benefits: ✅ ACHIEVED
- Same file cached once regardless of bucket/prefix (automatic deduplication)
- No collision possible (SHA256 uniqueness guarantees)
- Natural cache validation (filename IS the checksum)
- Two-level directory structure (ab/cd/abcdef...) for filesystem optimization
Implementation: Complete in src/deltaglider/adapters/cache_cas.py with:
_cas_path()method for SHA256-based path computationget_validated_ref()with atomic validation and lockingwrite_ref()with atomic temp-file + rename pattern- Ephemeral deltaspace-to-SHA mapping for compatibility
5. Add Secure Directory Creation (2 hours)
# src/deltaglider/utils/secure_fs.py
import os
import stat
def secure_makedirs(path: Path, mode: int = 0o700) -> None:
"""Create directory with secure permissions atomically."""
try:
path.mkdir(parents=True, mode=mode, exist_ok=False)
except FileExistsError:
# Verify it's ours and has correct permissions
st = path.stat()
if st.st_uid != os.getuid():
raise SecurityError(f"Directory {path} owned by different user")
if stat.S_IMODE(st.st_mode) != mode:
os.chmod(path, mode) # Fix permissions
6. Unify Cache Configuration (1 hour)
# src/deltaglider/config.py
import os
from pathlib import Path
def get_cache_dir() -> Path:
"""Single source of truth for cache directory."""
if os.environ.get("DG_NO_CACHE") == "true":
return None # Feature flag to disable cache
if os.environ.get("DG_EPHEMERAL_CACHE") == "true":
return Path(tempfile.mkdtemp(prefix="dg-cache-"))
# User-specific cache by default
cache_base = os.environ.get("DG_CACHE_DIR",
os.path.expanduser("~/.cache/deltaglider"))
return Path(cache_base) / "v2" # Version cache format
DAY 6-10: Architecture Redesign (v5.2.0)
The bold solution that eliminates entire vulnerability classes
7. Implement Memory Cache with Encryption (8 hours)
# src/deltaglider/adapters/cache_memory.py
import mmap
import pickle
from cryptography.fernet import Fernet
class MemoryCache(CachePort):
"""In-memory cache with optional disk backing via mmap."""
def __init__(self, max_size_mb: int = 100):
self.cache = {} # SHA -> (data, metadata)
self.max_size = max_size_mb * 1024 * 1024
self.current_size = 0
# Per-process encryption key (never persisted)
self.cipher = Fernet(Fernet.generate_key())
# Optional mmap for larger files
self.mmap_file = tempfile.NamedTemporaryFile(prefix="dg-mmap-")
self.mmap = mmap.mmap(self.mmap_file.fileno(), self.max_size)
def write_ref(self, bucket: str, prefix: str, src: Path, sha256: str) -> bytes:
# Read and encrypt data
with open(src, 'rb') as f:
data = f.read()
encrypted = self.cipher.encrypt(data)
# Store in memory or mmap based on size
if len(encrypted) + self.current_size <= self.max_size:
self.cache[sha256] = encrypted
self.current_size += len(encrypted)
else:
# Evict LRU or use mmap
self._evict_lru()
self.cache[sha256] = encrypted
return encrypted
def get_validated_ref(self, bucket: str, prefix: str, sha256: str) -> bytes:
if sha256 not in self.cache:
raise CacheMissError()
encrypted = self.cache[sha256]
data = self.cipher.decrypt(encrypted)
# Always validate
actual_sha = hashlib.sha256(data).hexdigest()
if actual_sha != sha256:
del self.cache[sha256]
raise CacheCorruptionError()
return data
Benefits:
- No filesystem access = no permission issues
- Encrypted in memory = secure even in core dumps
- Per-process isolation = no multi-user issues
- Zero TOCTOU window = memory is atomic
8. Implement Signed Cache Entries (6 hours)
# src/deltaglider/adapters/cache_signed.py
import hmac
import json
from datetime import datetime, timedelta
class SignedCache(CachePort):
"""Cache with cryptographic signatures and expiry."""
def __init__(self, base_dir: Path, secret_key: bytes = None):
self.base_dir = base_dir
# Per-session key if not provided
self.secret = secret_key or os.urandom(32)
def _sign_metadata(self, metadata: dict) -> str:
"""Create HMAC signature for metadata."""
json_meta = json.dumps(metadata, sort_keys=True)
signature = hmac.new(
self.secret,
json_meta.encode(),
hashlib.sha256
).hexdigest()
return signature
def write_ref(self, bucket: str, prefix: str, src: Path, sha256: str) -> Path:
# Create signed metadata
metadata = {
"sha256": sha256,
"bucket": bucket,
"prefix": prefix,
"timestamp": datetime.utcnow().isoformat(),
"expires": (datetime.utcnow() + timedelta(hours=24)).isoformat(),
"pid": os.getpid(),
"uid": os.getuid(),
}
signature = self._sign_metadata(metadata)
# Store data + metadata
cache_dir = self.base_dir / signature[:8] # Use signature prefix as namespace
cache_dir.mkdir(parents=True, mode=0o700, exist_ok=True)
data_path = cache_dir / f"{sha256}.bin"
meta_path = cache_dir / f"{sha256}.meta"
# Atomic writes
shutil.copy2(src, data_path)
os.chmod(data_path, 0o600)
with open(meta_path, 'w') as f:
json.dump({"metadata": metadata, "signature": signature}, f)
os.chmod(meta_path, 0o600)
return data_path
def get_validated_ref(self, bucket: str, prefix: str, sha256: str) -> Path:
# Find and validate signed entry
pattern = self.base_dir / "*" / f"{sha256}.meta"
matches = list(Path(self.base_dir).glob(f"*/{sha256}.meta"))
for meta_path in matches:
with open(meta_path) as f:
entry = json.load(f)
# Verify signature
expected_sig = self._sign_metadata(entry["metadata"])
if not hmac.compare_digest(entry["signature"], expected_sig):
meta_path.unlink() # Remove tampered entry
continue
# Check expiry
expires = datetime.fromisoformat(entry["metadata"]["expires"])
if datetime.utcnow() > expires:
meta_path.unlink()
continue
# Validate data integrity
data_path = meta_path.with_suffix('.bin')
actual_sha = self.hasher.sha256(data_path)
if actual_sha != sha256:
data_path.unlink()
meta_path.unlink()
continue
return data_path
raise CacheMissError(f"No valid cache entry for {sha256}")
DAY 11-15: Advanced Security (v6.0.0)
Next-generation features for standalone security
9. Add Integrity Monitoring (4 hours)
# src/deltaglider/security/monitor.py
import inotify
import logging
class CacheIntegrityMonitor:
"""Detect and alert on cache tampering attempts."""
def __init__(self, cache_dir: Path):
self.cache_dir = cache_dir
self.notifier = inotify.INotify()
self.watch_desc = self.notifier.add_watch(
str(cache_dir),
inotify.IN_MODIFY | inotify.IN_DELETE | inotify.IN_ATTRIB
)
self.logger = logging.getLogger("security")
async def monitor(self):
"""Monitor for unauthorized cache modifications."""
async for event in self.notifier:
if event.mask & inotify.IN_MODIFY:
# File modified - verify it was by our process
if not self._is_our_modification(event):
self.logger.critical(
f"SECURITY: Unauthorized cache modification detected: {event.path}"
)
# Immediately invalidate affected cache
Path(event.path).unlink(missing_ok=True)
elif event.mask & inotify.IN_ATTRIB:
# Permission change - always suspicious
self.logger.warning(
f"SECURITY: Cache permission change: {event.path}"
)
DAY 16-20: Testing & Rollout (v6.0.0 release)
10. Security Test Suite (8 hours)
# tests/security/test_cache_attacks.py
import pytest
import os
import threading
import time
class TestCacheSecurity:
"""Test all known attack vectors."""
def test_toctou_attack_prevented(self, cache):
"""Verify TOCTOU window is eliminated."""
sha = "abc123"
cache.write_ref("bucket", "prefix", test_file, sha)
# Attacker thread tries to replace file during read
def attacker():
time.sleep(0.0001) # Try to hit the TOCTOU window
cache_path = cache.ref_path("bucket", "prefix", sha)
cache_path.write_bytes(b"malicious")
thread = threading.Thread(target=attacker)
thread.start()
# Should detect tampering
with pytest.raises(CacheCorruptionError):
cache.get_validated_ref("bucket", "prefix", sha)
def test_multi_user_isolation(self, cache):
"""Verify users can't access each other's cache."""
# Create cache as user A
cache_a = SignedCache(Path("/tmp/cache"), secret=b"key_a")
cache_a.write_ref("bucket", "prefix", test_file, "sha_a")
# Try to read as user B with different key
cache_b = SignedCache(Path("/tmp/cache"), secret=b"key_b")
with pytest.raises(CacheMissError):
cache_b.get_validated_ref("bucket", "prefix", "sha_a")
def test_cache_poisoning_prevented(self, cache):
"""Verify corrupted cache is detected."""
sha = "abc123"
cache.write_ref("bucket", "prefix", test_file, sha)
# Corrupt the cache file
cache_path = cache.ref_path("bucket", "prefix", sha)
with open(cache_path, 'ab') as f:
f.write(b"corrupted")
# Should detect corruption
with pytest.raises(CacheCorruptionError):
cache.get_validated_ref("bucket", "prefix", sha)
11. Migration Guide (4 hours)
# src/deltaglider/migration/v5_to_v6.py
def migrate_cache():
"""Migrate from v5 shared cache to v6 secure cache."""
old_cache = Path("/tmp/.deltaglider/cache")
if old_cache.exists():
print("WARNING: Old insecure cache detected at", old_cache)
print("This cache had security vulnerabilities and will not be migrated.")
response = input("Delete old cache? [y/N]: ")
if response.lower() == 'y':
shutil.rmtree(old_cache)
print("Old cache deleted. New secure cache will be created on demand.")
else:
print("Old cache retained at", old_cache)
print("Set DG_CACHE_DIR to use a different location.")
12. Performance Benchmarks (4 hours)
# benchmarks/cache_performance.py
def benchmark_cache_implementations():
"""Compare performance of cache implementations."""
implementations = [
("Filesystem (v5)", FsCacheAdapter),
("Content-Addressed", ContentAddressedCache),
("Memory", MemoryCache),
("Signed", SignedCache),
]
for name, cache_class in implementations:
cache = cache_class(test_dir)
# Measure write performance
start = time.perf_counter()
for i in range(1000):
cache.write_ref("bucket", f"prefix{i}", test_file, f"sha{i}")
write_time = time.perf_counter() - start
# Measure read performance
start = time.perf_counter()
for i in range(1000):
cache.get_validated_ref("bucket", f"prefix{i}", f"sha{i}")
read_time = time.perf_counter() - start
print(f"{name}: Write={write_time:.3f}s Read={read_time:.3f}s")
📊 Decision Matrix
| Solution | Security | Performance | Complexity | Breaking Change |
|---|---|---|---|---|
| Hotfix (Day 1-2) | ⭐⭐⭐ | ⭐⭐ | ⭐ | No |
| Content-Addressed | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | No |
| Memory Cache | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | No |
| Signed Cache | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | No |
🎯 Recommended Approach
For Immediate Production (Next 48 hours)
Deploy Hotfix v5.0.3 with ephemeral cache + SHA validation
For Next Release (1 week)
Implement Content-Addressed Storage (v5.1.0) - best balance of security and simplicity
For Enterprise (1 month)
Deploy Signed Cache (v6.0.0) for maximum security with built-in TTL and integrity
🚦 Success Metrics
After implementation, verify:
- Security Tests Pass: All attack vectors prevented
- Performance Maintained: <10% degradation vs v5
- Zero CVEs: No security vulnerabilities in cache
- User Isolation: Multi-user systems work safely
- Backward Compatible: Existing workflows unaffected
📞 Support
For questions or security concerns:
- Security Team: security@deltaglider.io
- Lead Developer: @architect
- Immediate Issues: Create SECURITY labeled issue
⚠️ Disclosure Timeline
- Day 0: Vulnerabilities discovered
- Day 1: Hotfix released (v5.0.3)
- Day 7: Improved version released (v5.1.0)
- Day 30: Full disclosure published
- Day 45: v6.0.0 with complete redesign
Document Version: 1.0 Classification: SENSITIVE - INTERNAL USE ONLY Last Updated: 2024-10-09