# Architecture
## Tech Stack
| Layer | Technology |
|-------|------------|
| Backend | Rust with Axum (SSR) |
| Frontend | HTML + HTMX + minimal JS |
| Database | SurrealDB (graph, document, vector) |
| AI | OpenAI-compatible API |
| Web Processing | Headless Chromium |
## Crate Structure
```
minne/
├── main/ # Combined server + worker binary
├── api-router/ # REST API routes
├── html-router/ # SSR web interface
├── ingestion-pipeline/ # Content processing pipeline
├── retrieval-pipeline/ # Search and retrieval logic
├── common/ # Shared types, storage, utilities
├── evaluations/ # Benchmarking framework
└── json-stream-parser/ # Streaming JSON utilities
```
## Process Modes
| Binary | Purpose |
|--------|---------|
| `main` | All-in-one: serves UI and processes content |
| `server` | UI and API only (no background processing) |
| `worker` | Background processing only (no UI) |
Split deployment is useful for scaling or resource isolation.
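
As a rough illustration of how the modes relate, the sketch below shows one way the same building blocks could be composed. The function names are hypothetical; the real binaries are separate targets built on shared crates with async runtimes.

```rust
// Hypothetical composition of the process modes; not the actual minne entry points.

fn run_http() {
    // Serve the SSR web UI and the REST API.
}

fn run_worker() {
    // Pull queued content and push it through the ingestion pipeline.
}

fn main() {
    // `main` binary: all-in-one. A `server` binary would call only run_http(),
    // and a `worker` binary only run_worker(). In practice the two roles run
    // concurrently rather than back to back.
    run_worker();
    run_http();
}
```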
## Data Flow
```
Content In → Ingestion Pipeline → SurrealDB
                  ├─ Entity Extraction
                  ├─ Embedding Generation
                  └─ Graph Relationships

Query → Retrieval Pipeline → Results
                  ├─ Vector Search + FTS
                  └─ RRF Fusion → (Optional Rerank) → Response
```
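
In code terms, the ingestion half of this flow might look roughly like the following. The stage functions are assumed names and synchronous stubs standing in for the real async LLM and SurrealDB calls.

```rust
// Sketch of the ingestion stages; placeholder stubs stand in for the real steps.
fn ingest(raw: &str) {
    // 1. Persist the raw content as a TextContent record.
    let content_id = store_text_content(raw);

    // 2. Chunk the text and attach an embedding to each TextChunk.
    for chunk in chunk_text(raw) {
        let embedding = embed(&chunk);
        store_chunk(&content_id, &chunk, &embedding);
    }

    // 3. Extract entities and link them into the knowledge graph.
    let entities = extract_entities(raw);
    let relationships = relate(&entities);
    store_graph(&entities, &relationships);
}

// Stubs so the sketch compiles; the real pipeline calls the LLM API and SurrealDB.
fn store_text_content(_raw: &str) -> String { "text_content:1".to_string() }
fn chunk_text(raw: &str) -> Vec<String> { raw.split("\n\n").map(|s| s.to_string()).collect() }
fn embed(_chunk: &str) -> Vec<f32> { vec![0.0; 8] }
fn store_chunk(_content_id: &str, _chunk: &str, _embedding: &[f32]) {}
fn extract_entities(_raw: &str) -> Vec<String> { Vec::new() }
fn relate(_entities: &[String]) -> Vec<(String, String)> { Vec::new() }
fn store_graph(_entities: &[String], _relationships: &[(String, String)]) {}
```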
## Database Schema
SurrealDB stores:
- **TextContent** — Raw ingested content
- **TextChunk** — Chunked content with embeddings
- **KnowledgeEntity** — Extracted entities (people, concepts, etc.)
- **KnowledgeRelationship** — Connections between entities
- **User** — Authentication and preferences
- **SystemSettings** — Model configuration
Embeddings are stored in dedicated tables with HNSW indexes for fast vector search.
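
Read from the Rust side, these records might map to types roughly like the ones below. The field names are illustrative assumptions, not the actual schema.

```rust
// Illustrative record types; real field names and schema definitions may differ.

struct TextContent {
    id: String,
    source: String, // e.g. a URL or upload name
    body: String,   // raw ingested text
}

struct TextChunk {
    id: String,
    parent: String,      // id of the originating TextContent
    text: String,        // chunked text
    embedding: Vec<f32>, // stored in an HNSW-indexed table for vector search
}

struct KnowledgeEntity {
    id: String,
    name: String,
    kind: String, // person, concept, ...
}

struct KnowledgeRelationship {
    id: String,
    from: String,  // KnowledgeEntity id
    to: String,    // KnowledgeEntity id
    label: String, // nature of the connection
}
```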
## Retrieval Strategy
1. **Collect candidates** — Vector similarity + full-text search
2. **Merge ranks** — Reciprocal Rank Fusion (RRF); see the sketch after this list
3. **Attach context** — Link chunks to parent entities
4. **Rerank** (optional) — Cross-encoder reranking
5. **Return** — Top-k results with metadata
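
A minimal sketch of the RRF merge in step 2: each candidate earns `1 / (k + rank)` from every ranked list it appears in, and the summed score decides the fused order. The constant `k = 60` is a common default; minne's actual parameters and scoring code may differ.

```rust
use std::collections::HashMap;

// Reciprocal Rank Fusion over several ranked candidate lists.
// Each appearance contributes 1 / (k + rank), with rank starting at 1.
fn rrf(ranked_lists: &[Vec<&str>], k: f64, top_n: usize) -> Vec<String> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in ranked_lists {
        for (rank, id) in list.iter().enumerate() {
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    // Sort by fused score, highest first, and keep the top-n ids.
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused.into_iter().take(top_n).map(|(id, _)| id).collect()
}

fn main() {
    let vector_hits = vec!["chunk:a", "chunk:b", "chunk:c"];
    let fts_hits = vec!["chunk:c", "chunk:a", "chunk:d"];
    // Fuse the two candidate lists and keep the top 3.
    println!("{:?}", rrf(&[vector_hits, fts_hits], 60.0, 3));
}
```

Because only ranks matter, chunks that appear near the top of both the vector and full-text lists accumulate the highest fused score, which lets the two result sets be merged without comparable raw scores.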