deltaglider-beshu-tech/SECURITY_FIX_ROADMAP.md
Commit 90a342dc33 by Simone Scarduzio: feat: Implement Content-Addressed Storage (CAS) cache
Implemented SHA256-based Content-Addressed Storage to eliminate
cache collisions and enable automatic deduplication.

Key Features:
- Zero collision risk: SHA256 namespace guarantees uniqueness
- Automatic deduplication: same content = same filename
- Tampering protection: changing content changes SHA, breaks lookup
- Two-level directory structure (ab/cd/abcdef...) for filesystem optimization

Changes:
- Added ContentAddressedCache adapter in adapters/cache_cas.py
- Updated CLI and SDK to use CAS instead of FsCacheAdapter
- Updated all tests to use ContentAddressedCache
- Documented CAS architecture in CLAUDE.md and SECURITY_FIX_ROADMAP.md

Security Benefits:
- Eliminates cross-endpoint collision vulnerabilities
- Self-describing cache (filename IS the checksum)
- Natural cache validation without external metadata

All quality checks passing:
- 99 tests passing (0 failures)
- Type checking: 0 errors (mypy)
- Linting: All checks passed (ruff)

Completed Phase 2 of SECURITY_FIX_ROADMAP.md
2025-10-10 09:06:29 +02:00


🛡️ DeltaGlider Security Fix Roadmap

Executive Summary

Critical security vulnerabilities have been identified in DeltaGlider's cache system that enable multi-user attacks, data exposure, and cache poisoning. This document provides a chronological, actionable roadmap to eliminate these threats through bold architectural changes.

Key Innovation: Instead of patching individual issues, we propose a "Zero-Trust Cache Architecture" that eliminates entire classes of vulnerabilities.


🚀 The Bold Solution: Ephemeral Signed Cache

Core Concept

Replace filesystem cache with ephemeral, cryptographically-signed, user-isolated cache that eliminates:

  • TOCTOU vulnerabilities (no shared filesystem)
  • Multi-user interference (process isolation)
  • Cache poisoning (cryptographic signatures)
  • Information disclosure (encrypted metadata)
  • Cross-endpoint collision (content-addressed storage)

Note: DeltaGlider is designed as a standalone CLI/SDK application. All solutions maintain this architecture without requiring external services.
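The cross-endpoint collision point above can be illustrated with a minimal sketch (not DeltaGlider code): a path-keyed cache derives its key from the storage location, so two endpoints serving different content for the same bucket/prefix share one slot, while a content-addressed key can never collide for different content.

```python
import hashlib

def path_key(bucket: str, prefix: str) -> str:
    # Path-keyed cache: key depends only on location, so two endpoints
    # serving different content for the same bucket/prefix collide.
    return f"{bucket}/{prefix}/reference.bin"

def cas_key(content: bytes) -> str:
    # Content-addressed cache: key is the SHA-256 of the content itself,
    # so different content can never share a cache slot.
    return hashlib.sha256(content).hexdigest()

endpoint_a = b"release-v1 reference data"
endpoint_b = b"poisoned reference data"

# Same location, different content: path keys collide, CAS keys do not.
assert path_key("releases", "v1") == path_key("releases", "v1")
assert cas_key(endpoint_a) != cas_key(endpoint_b)
```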


📋 Implementation Roadmap

DAY 1-2: Emergency Hotfix (v5.0.3) COMPLETED

Stop the bleeding - minimal changes for immediate deployment

1. Ephemeral Process-Isolated Cache (2 hours) COMPLETED

# src/deltaglider/app/cli/main.py
import atexit
import shutil
import tempfile
from pathlib import Path

# SECURITY: Always use ephemeral process-isolated cache
cache_dir = Path(tempfile.mkdtemp(prefix="deltaglider-", dir="/tmp"))
atexit.register(lambda: shutil.rmtree(cache_dir, ignore_errors=True))

Impact: Each process gets an isolated cache that is auto-cleaned on exit, eliminating multi-user attacks. Implementation: all legacy shared-cache code removed; the ephemeral cache is now the ONLY mode.
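The isolation guarantee rests on `tempfile.mkdtemp` semantics, which creates the directory with mode 0o700 regardless of umask. A small standalone sketch verifying that property and the cleanup path:

```python
import atexit
import os
import shutil
import stat
import tempfile
from pathlib import Path

# Each process gets its own private cache directory; mkdtemp creates it
# with mode 0o700 regardless of umask, so other users cannot read it.
cache_dir = Path(tempfile.mkdtemp(prefix="deltaglider-"))
atexit.register(lambda: shutil.rmtree(cache_dir, ignore_errors=True))

mode = stat.S_IMODE(os.stat(cache_dir).st_mode)
assert mode == 0o700  # owner-only access

# Cleanup also works eagerly, not just at exit:
shutil.rmtree(cache_dir, ignore_errors=True)
assert not cache_dir.exists()
```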

2. Add SHA Validation at Use-Time (2 hours) COMPLETED

# src/deltaglider/ports/cache.py
class CachePort(Protocol):
    def get_validated_ref(self, bucket: str, prefix: str, expected_sha: str) -> Path:
        """Get reference with atomic SHA validation - MUST use this for all operations."""
        ...

# src/deltaglider/adapters/cache_fs.py
import hashlib
import sys

if sys.platform != "win32":
    import fcntl  # Unix-only advisory locking

def get_validated_ref(self, bucket: str, prefix: str, expected_sha: str) -> Path:
    path = self.ref_path(bucket, prefix)
    if not path.exists():
        raise CacheMissError(f"Cache miss for {bucket}/{prefix}")

    # Hold a shared lock while reading so the hash covers a consistent snapshot (Unix only)
    with open(path, 'rb') as f:
        if sys.platform != "win32":
            fcntl.flock(f.fileno(), fcntl.LOCK_SH)
        content = f.read()
        actual_sha = hashlib.sha256(content).hexdigest()

    if actual_sha != expected_sha:
        path.unlink()  # Remove corrupted cache
        raise CacheCorruptionError(f"SHA mismatch for {bucket}/{prefix}: cache corrupted")

    return path
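The essential pattern here, stripped of the port plumbing, is validate-at-use: hash the exact bytes you are about to consume and compare against the expected digest, so a file swapped in after caching is caught. A self-contained sketch:

```python
import hashlib
import tempfile
from pathlib import Path

def read_validated(path: Path, expected_sha: str) -> bytes:
    # Hash the exact bytes we are about to use; a file swapped in
    # after caching no longer matches the expected digest.
    content = path.read_bytes()
    if hashlib.sha256(content).hexdigest() != expected_sha:
        raise ValueError("SHA mismatch: cache corrupted")
    return content

with tempfile.TemporaryDirectory() as d:
    ref = Path(d) / "ref.bin"
    ref.write_bytes(b"trusted reference")
    sha = hashlib.sha256(b"trusted reference").hexdigest()
    assert read_validated(ref, sha) == b"trusted reference"

    ref.write_bytes(b"attacker payload")  # simulate tampering
    try:
        read_validated(ref, sha)
        raise AssertionError("tampering went undetected")
    except ValueError:
        pass  # corruption detected, as intended
```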

3. Update All Usage Points (1 hour) COMPLETED

# src/deltaglider/core/service.py
# Replaced ALL instances in two locations:
# - Line 234 (get method for decoding)
# - Line 415 (_create_delta method for encoding)

ref_path = self.cache.get_validated_ref(
    delta_space.bucket,
    delta_space.prefix,
    ref_sha256  # Pass expected SHA
)

Test & Deploy: All 99 tests passing + ready for release


DAY 3-5: Quick Wins (v5.1.0) COMPLETED

Low-risk improvements with high security impact

4. Implement Content-Addressed Storage (4 hours) COMPLETED

# src/deltaglider/adapters/cache_cas.py
import os
import shutil

class ContentAddressedCache(CachePort):
    """Cache using SHA as filename - eliminates collisions."""

    def ref_path(self, bucket: str, prefix: str, sha256: str) -> Path:
        # Use SHA as filename - guaranteed unique
        return self.base_dir / sha256[:2] / sha256[2:4] / sha256

    def write_ref(self, bucket: str, prefix: str, src: Path, sha256: str) -> Path:
        path = self.ref_path(bucket, prefix, sha256)

        # If file with this SHA exists, we're done (deduplication!)
        if path.exists():
            return path

        # Atomic write
        path.parent.mkdir(parents=True, mode=0o700, exist_ok=True)
        tmp = path.with_suffix('.tmp')
        shutil.copy2(src, tmp)
        os.chmod(tmp, 0o600)

        # Verify content before committing
        actual_sha = self.hasher.sha256(tmp)
        if actual_sha != sha256:
            tmp.unlink()
            raise ValueError("File corruption during cache write")

        os.replace(tmp, path)  # Atomic
        return path

Benefits: ACHIEVED

  • Same file cached once regardless of bucket/prefix (automatic deduplication)
  • No collision possible (SHA256 uniqueness guarantees)
  • Natural cache validation (filename IS the checksum)
  • Two-level directory structure (ab/cd/abcdef...) for filesystem optimization

Implementation: Complete in src/deltaglider/adapters/cache_cas.py with:

  • _cas_path() method for SHA256-based path computation
  • get_validated_ref() with atomic validation and locking
  • write_ref() with atomic temp-file + rename pattern
  • Ephemeral deltaspace-to-SHA mapping for compatibility
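The deduplication benefit can be demonstrated with a standalone sketch (simplified from the adapter above, with verification omitted): writing identical content twice resolves to the same two-level path and leaves exactly one file on disk.

```python
import hashlib
import os
import tempfile
from pathlib import Path

def cas_write(base: Path, content: bytes) -> Path:
    sha = hashlib.sha256(content).hexdigest()
    path = base / sha[:2] / sha[2:4] / sha  # two-level fan-out (ab/cd/abcdef...)
    if path.exists():
        return path  # deduplicated: same content, same file
    path.parent.mkdir(parents=True, mode=0o700, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(content)
    os.replace(tmp, path)  # atomic commit
    return path

with tempfile.TemporaryDirectory() as d:
    base = Path(d)
    p1 = cas_write(base, b"same payload")
    p2 = cas_write(base, b"same payload")
    assert p1 == p2  # identical content deduplicates to one path
    files = [f for f in base.rglob("*") if f.is_file()]
    assert len(files) == 1
```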

5. Add Secure Directory Creation (2 hours)

# src/deltaglider/utils/secure_fs.py
import os
import stat
from pathlib import Path

class SecurityError(Exception):
    """Raised when a filesystem object fails a security check."""

def secure_makedirs(path: Path, mode: int = 0o700) -> None:
    """Create directory with secure permissions atomically."""
    try:
        path.mkdir(parents=True, mode=mode, exist_ok=False)
    except FileExistsError:
        # Verify it's ours and has correct permissions (os.getuid is Unix-only)
        st = path.stat()
        if st.st_uid != os.getuid():
            raise SecurityError(f"Directory {path} owned by different user")
        if stat.S_IMODE(st.st_mode) != mode:
            os.chmod(path, mode)  # Fix permissions

6. Unify Cache Configuration (1 hour)

# src/deltaglider/config.py
import os
import tempfile
from pathlib import Path

def get_cache_dir() -> Path | None:
    """Single source of truth for cache directory."""
    if os.environ.get("DG_NO_CACHE") == "true":
        return None  # Feature flag to disable cache

    if os.environ.get("DG_EPHEMERAL_CACHE") == "true":
        return Path(tempfile.mkdtemp(prefix="dg-cache-"))

    # User-specific cache by default
    cache_base = os.environ.get("DG_CACHE_DIR",
                                os.path.expanduser("~/.cache/deltaglider"))
    return Path(cache_base) / "v2"  # Versioned cache format

DAY 6-10: Architecture Redesign (v5.2.0)

The bold solution that eliminates entire vulnerability classes

7. Implement Memory Cache with Encryption (8 hours)

# src/deltaglider/adapters/cache_memory.py
import hashlib
import mmap
import tempfile
from cryptography.fernet import Fernet

class MemoryCache(CachePort):
    """In-memory cache with optional disk backing via mmap."""

    def __init__(self, max_size_mb: int = 100):
        self.cache = {}  # SHA -> encrypted data
        self.max_size = max_size_mb * 1024 * 1024
        self.current_size = 0

        # Per-process encryption key (never persisted)
        self.cipher = Fernet(Fernet.generate_key())

        # Optional mmap for larger files (the file must be sized before mapping)
        self.mmap_file = tempfile.NamedTemporaryFile(prefix="dg-mmap-")
        self.mmap_file.truncate(self.max_size)
        self.mmap = mmap.mmap(self.mmap_file.fileno(), self.max_size)

    def write_ref(self, bucket: str, prefix: str, src: Path, sha256: str) -> bytes:
        # Read and encrypt data
        with open(src, 'rb') as f:
            data = f.read()

        encrypted = self.cipher.encrypt(data)

        # Store in memory or mmap based on size
        if len(encrypted) + self.current_size <= self.max_size:
            self.cache[sha256] = encrypted
            self.current_size += len(encrypted)
        else:
            # Evict LRU or use mmap
            self._evict_lru()
            self.cache[sha256] = encrypted

        return encrypted

    def get_validated_ref(self, bucket: str, prefix: str, sha256: str) -> bytes:
        if sha256 not in self.cache:
            raise CacheMissError()

        encrypted = self.cache[sha256]
        data = self.cipher.decrypt(encrypted)

        # Always validate
        actual_sha = hashlib.sha256(data).hexdigest()
        if actual_sha != sha256:
            del self.cache[sha256]
            raise CacheCorruptionError()

        return data

Benefits:

  • No filesystem access = no permission issues
  • Encrypted in memory = secure even in core dumps
  • Per-process isolation = no multi-user issues
  • Zero TOCTOU window = memory is atomic
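`_evict_lru()` is referenced above but not shown; a minimal sketch of the idea using `collections.OrderedDict` (this helper class and its names are illustrative, not the project's API), evicting least-recently-used entries until the new item fits the byte budget:

```python
from collections import OrderedDict

class LruStore:
    """Byte-budgeted store that evicts least-recently-used entries."""

    def __init__(self, max_size: int):
        self.entries: OrderedDict[str, bytes] = OrderedDict()
        self.max_size = max_size
        self.current_size = 0

    def put(self, sha: str, data: bytes) -> None:
        # Evict oldest entries until the new item fits the budget.
        while self.entries and self.current_size + len(data) > self.max_size:
            _, evicted = self.entries.popitem(last=False)
            self.current_size -= len(evicted)
        self.entries[sha] = data
        self.current_size += len(data)

    def get(self, sha: str) -> bytes:
        self.entries.move_to_end(sha)  # mark as recently used
        return self.entries[sha]

store = LruStore(max_size=10)
store.put("a", b"12345")
store.put("b", b"12345")
store.put("c", b"12345")  # budget exceeded: "a" is evicted first
assert "a" not in store.entries and "b" in store.entries and "c" in store.entries
```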

8. Implement Signed Cache Entries (6 hours)

# src/deltaglider/adapters/cache_signed.py
import hashlib
import hmac
import json
import os
import shutil
from datetime import datetime, timedelta
from pathlib import Path

class SignedCache(CachePort):
    """Cache with cryptographic signatures and expiry."""

    def __init__(self, base_dir: Path, secret_key: bytes | None = None):
        self.base_dir = base_dir
        # Per-session key if not provided
        self.secret = secret_key or os.urandom(32)

    def _sign_metadata(self, metadata: dict) -> str:
        """Create HMAC signature for metadata."""
        json_meta = json.dumps(metadata, sort_keys=True)
        signature = hmac.new(
            self.secret,
            json_meta.encode(),
            hashlib.sha256
        ).hexdigest()
        return signature

    def write_ref(self, bucket: str, prefix: str, src: Path, sha256: str) -> Path:
        # Create signed metadata
        metadata = {
            "sha256": sha256,
            "bucket": bucket,
            "prefix": prefix,
            "timestamp": datetime.utcnow().isoformat(),
            "expires": (datetime.utcnow() + timedelta(hours=24)).isoformat(),
            "pid": os.getpid(),
            "uid": os.getuid(),
        }
        signature = self._sign_metadata(metadata)

        # Store data + metadata
        cache_dir = self.base_dir / signature[:8]  # Use signature prefix as namespace
        cache_dir.mkdir(parents=True, mode=0o700, exist_ok=True)

        data_path = cache_dir / f"{sha256}.bin"
        meta_path = cache_dir / f"{sha256}.meta"

        # Atomic writes
        shutil.copy2(src, data_path)
        os.chmod(data_path, 0o600)

        with open(meta_path, 'w') as f:
            json.dump({"metadata": metadata, "signature": signature}, f)
        os.chmod(meta_path, 0o600)

        return data_path

    def get_validated_ref(self, bucket: str, prefix: str, sha256: str) -> Path:
        # Find and validate signed entries under any signature namespace
        matches = list(self.base_dir.glob(f"*/{sha256}.meta"))

        for meta_path in matches:
            with open(meta_path) as f:
                entry = json.load(f)

            # Verify signature
            expected_sig = self._sign_metadata(entry["metadata"])
            if not hmac.compare_digest(entry["signature"], expected_sig):
                meta_path.unlink()  # Remove tampered entry
                continue

            # Check expiry
            expires = datetime.fromisoformat(entry["metadata"]["expires"])
            if datetime.utcnow() > expires:
                meta_path.unlink()
                continue

            # Validate data integrity
            data_path = meta_path.with_suffix('.bin')
            actual_sha = hashlib.sha256(data_path.read_bytes()).hexdigest()
            if actual_sha != sha256:
                data_path.unlink()
                meta_path.unlink()
                continue

            return data_path

        raise CacheMissError(f"No valid cache entry for {sha256}")
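The sign-then-verify core of this design, stripped of the file layout, is a short round trip: any change to the metadata breaks the HMAC, and `hmac.compare_digest` performs the comparison in constant time. A self-contained sketch:

```python
import hashlib
import hmac
import json
import os

secret = os.urandom(32)  # per-session key, never persisted

def sign(metadata: dict) -> str:
    payload = json.dumps(metadata, sort_keys=True).encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify(metadata: dict, signature: str) -> bool:
    # compare_digest runs in constant time to avoid timing side channels
    return hmac.compare_digest(sign(metadata), signature)

meta = {"sha256": "abc123", "bucket": "releases", "prefix": "v1"}
sig = sign(meta)
assert verify(meta, sig)

meta["sha256"] = "def456"  # attacker rewrites metadata
assert not verify(meta, sig)  # signature no longer matches
```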

DAY 11-15: Advanced Security (v6.0.0)

Next-generation features for standalone security

9. Add Integrity Monitoring (4 hours)

# src/deltaglider/security/monitor.py
# Linux-only sketch assuming the inotify_simple package (pip install inotify-simple)
import logging
from pathlib import Path

from inotify_simple import INotify, flags

class CacheIntegrityMonitor:
    """Detect and alert on cache tampering attempts."""

    def __init__(self, cache_dir: Path):
        self.cache_dir = cache_dir
        self.notifier = INotify()
        self.watch_desc = self.notifier.add_watch(
            str(cache_dir),
            flags.MODIFY | flags.DELETE | flags.ATTRIB,
        )
        self.logger = logging.getLogger("security")

    def monitor(self) -> None:
        """Monitor for unauthorized cache modifications (blocking loop)."""
        while True:
            for event in self.notifier.read():
                path = self.cache_dir / event.name
                if event.mask & flags.MODIFY:
                    # File modified - verify it was by our process
                    if not self._is_our_modification(event):
                        self.logger.critical(
                            f"SECURITY: Unauthorized cache modification detected: {path}"
                        )
                        # Immediately invalidate affected cache
                        path.unlink(missing_ok=True)
                elif event.mask & flags.ATTRIB:
                    # Permission change - always suspicious
                    self.logger.warning(
                        f"SECURITY: Cache permission change: {path}"
                    )

DAY 16-20: Testing & Rollout (v6.0.0 release)

10. Security Test Suite (8 hours)

# tests/security/test_cache_attacks.py
import pytest
import os
import threading
import time

class TestCacheSecurity:
    """Test all known attack vectors."""

    def test_toctou_attack_prevented(self, cache):
        """Verify TOCTOU window is eliminated."""
        sha = "abc123"
        cache.write_ref("bucket", "prefix", test_file, sha)

        # Attacker thread tries to replace file during read
        def attacker():
            time.sleep(0.0001)  # Try to hit the TOCTOU window
            cache_path = cache.ref_path("bucket", "prefix", sha)
            cache_path.write_bytes(b"malicious")

        thread = threading.Thread(target=attacker)
        thread.start()
        thread.join()  # Ensure the malicious write has landed before validation

        # Should detect tampering
        with pytest.raises(CacheCorruptionError):
            cache.get_validated_ref("bucket", "prefix", sha)

    def test_multi_user_isolation(self, cache):
        """Verify users can't access each other's cache."""
        # Create cache as user A
        cache_a = SignedCache(Path("/tmp/cache"), secret_key=b"key_a")
        cache_a.write_ref("bucket", "prefix", test_file, "sha_a")

        # Try to read as user B with different key
        cache_b = SignedCache(Path("/tmp/cache"), secret_key=b"key_b")

        with pytest.raises(CacheMissError):
            cache_b.get_validated_ref("bucket", "prefix", "sha_a")

    def test_cache_poisoning_prevented(self, cache):
        """Verify corrupted cache is detected."""
        sha = "abc123"
        cache.write_ref("bucket", "prefix", test_file, sha)

        # Corrupt the cache file
        cache_path = cache.ref_path("bucket", "prefix", sha)
        with open(cache_path, 'ab') as f:
            f.write(b"corrupted")

        # Should detect corruption
        with pytest.raises(CacheCorruptionError):
            cache.get_validated_ref("bucket", "prefix", sha)

11. Migration Guide (4 hours)

# src/deltaglider/migration/v5_to_v6.py
import shutil
from pathlib import Path

def migrate_cache():
    """Migrate from v5 shared cache to v6 secure cache."""
    old_cache = Path("/tmp/.deltaglider/cache")

    if old_cache.exists():
        print("WARNING: Old insecure cache detected at", old_cache)
        print("This cache had security vulnerabilities and will not be migrated.")

        response = input("Delete old cache? [y/N]: ")
        if response.lower() == 'y':
            shutil.rmtree(old_cache)
            print("Old cache deleted. New secure cache will be created on demand.")
        else:
            print("Old cache retained at", old_cache)
            print("Set DG_CACHE_DIR to use a different location.")

12. Performance Benchmarks (4 hours)

# benchmarks/cache_performance.py
import time

def benchmark_cache_implementations():
    """Compare performance of cache implementations."""

    implementations = [
        ("Filesystem (v5)", FsCacheAdapter),
        ("Content-Addressed", ContentAddressedCache),
        ("Memory", MemoryCache),
        ("Signed", SignedCache),
    ]

    for name, cache_class in implementations:
        cache = cache_class(test_dir)

        # Measure write performance
        start = time.perf_counter()
        for i in range(1000):
            cache.write_ref("bucket", f"prefix{i}", test_file, f"sha{i}")
        write_time = time.perf_counter() - start

        # Measure read performance
        start = time.perf_counter()
        for i in range(1000):
            cache.get_validated_ref("bucket", f"prefix{i}", f"sha{i}")
        read_time = time.perf_counter() - start

        print(f"{name}: Write={write_time:.3f}s Read={read_time:.3f}s")

📊 Decision Matrix

| Solution | Security | Performance | Complexity | Breaking Change |
| --- | --- | --- | --- | --- |
| Hotfix (Day 1-2) | | | | No |
| Content-Addressed | | | | No |
| Memory Cache | | | | No |
| Signed Cache | | | | No |

For Immediate Production (Next 48 hours)

Deploy Hotfix v5.0.3 with ephemeral cache + SHA validation

For Next Release (1 week)

Implement Content-Addressed Storage (v5.1.0) - best balance of security and simplicity

For Enterprise (1 month)

Deploy Signed Cache (v6.0.0) for maximum security with built-in TTL and integrity


🚦 Success Metrics

After implementation, verify:

  1. Security Tests Pass: All attack vectors prevented
  2. Performance Maintained: <10% degradation vs v5
  3. Zero CVEs: No security vulnerabilities in cache
  4. User Isolation: Multi-user systems work safely
  5. Backward Compatible: Existing workflows unaffected
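The "<10% degradation" gate in metric 2 can be checked mechanically from benchmark timings; a tiny helper (names are illustrative), assuming wall-clock seconds for the same workload on both versions:

```python
def degradation_pct(baseline_s: float, candidate_s: float) -> float:
    # Positive result means the candidate is slower than the baseline.
    return (candidate_s - baseline_s) / baseline_s * 100.0

def within_budget(baseline_s: float, candidate_s: float, budget_pct: float = 10.0) -> bool:
    return degradation_pct(baseline_s, candidate_s) < budget_pct

assert within_budget(1.00, 1.05)      # 5% slower: acceptable
assert not within_budget(1.00, 1.20)  # 20% slower: fails the gate
```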

📞 Support

For questions or security concerns:

  • Security Team: security@deltaglider.io
  • Lead Developer: @architect
  • Immediate Issues: Create SECURITY labeled issue

⚠️ Disclosure Timeline

  • Day 0: Vulnerabilities discovered
  • Day 1: Hotfix released (v5.0.3)
  • Day 7: Improved version released (v5.1.0)
  • Day 30: Full disclosure published
  • Day 45: v6.0.0 with complete redesign

Document Version: 1.0
Classification: SENSITIVE - INTERNAL USE ONLY
Last Updated: 2024-10-09