minne/docs/architecture.md

# Architecture

## Tech Stack

| Layer | Technology |
|-------|------------|
| Backend | Rust with Axum (SSR) |
| Frontend | HTML + HTMX + minimal JS |
| Database | SurrealDB (graph, document, vector) |
| AI | OpenAI-compatible API |
| Web Processing | Headless Chromium |

## Crate Structure

```
minne/
├── main/                 # Combined server + worker binary
├── api-router/           # REST API routes
├── html-router/          # SSR web interface
├── ingestion-pipeline/   # Content processing pipeline
├── retrieval-pipeline/   # Search and retrieval logic
├── common/               # Shared types, storage, utilities
├── evaluations/          # Benchmarking framework
└── json-stream-parser/   # Streaming JSON utilities
```

## Process Modes

| Binary | Purpose |
|--------|---------|
| `main` | All-in-one: serves UI and processes content |
| `server` | UI and API only (no background processing) |
| `worker` | Background processing only (no UI) |

Split deployment is useful for scaling or resource isolation.

## Data Flow

```
Content In → Ingestion Pipeline → SurrealDB
                    ↓
            Entity Extraction
                    ↓
            Embedding Generation
                    ↓
            Graph Relationships

Query → Retrieval Pipeline → Results
              ↓
       Vector Search + FTS
              ↓
       RRF Fusion → (Optional Rerank) → Response
```

## Database Schema

SurrealDB stores:

- **TextContent** — Raw ingested content
- **TextChunk** — Chunked content with embeddings
- **KnowledgeEntity** — Extracted entities (people, concepts, etc.)
- **KnowledgeRelationship** — Connections between entities
- **User** — Authentication and preferences
- **SystemSettings** — Model configuration

Embeddings are stored in dedicated tables with HNSW indexes for fast vector search.

## Retrieval Strategy

1. **Collect candidates** — Vector similarity + full-text search
2. **Merge ranks** — Reciprocal Rank Fusion (RRF)
3. **Attach context** — Link chunks to parent entities
4. **Rerank** (optional) — Cross-encoder reranking
5. **Return** — Top-k results with metadata