docs: evaluations instructions and readme refactoring

2026-07-08 05:45:14 +02:00 · 2025-12-22 18:32:59 +01:00
parent 30b8a65377
commit f9f48d1046
7 changed files with 573 additions and 230 deletions
@@ -0,0 +1,74 @@
+# Architecture
+
+## Tech Stack
+
+| Layer | Technology |
+|-------|------------|
+| Backend | Rust with Axum (SSR) |
+| Frontend | HTML + HTMX + minimal JS |
+| Database | SurrealDB (graph, document, vector) |
+| AI | OpenAI-compatible API |
+| Web Processing | Headless Chromium |
+
+## Crate Structure
+
+```
+minne/
+├── main/                 # Combined server + worker binary
+├── api-router/           # REST API routes
+├── html-router/          # SSR web interface
+├── ingestion-pipeline/   # Content processing pipeline
+├── retrieval-pipeline/   # Search and retrieval logic
+├── common/               # Shared types, storage, utilities
+├── evaluations/          # Benchmarking framework
+└── json-stream-parser/   # Streaming JSON utilities
+```
+
+## Process Modes
+
+| Binary | Purpose |
+|--------|---------|
+| `main` | All-in-one: serves UI and processes content |
+| `server` | UI and API only (no background processing) |
+| `worker` | Background processing only (no UI) |
+
+Running `server` and `worker` as separate processes lets the UI/API and background processing scale independently and keeps their resource usage isolated.
+
+## Data Flow
+
+```
+Content In → Ingestion Pipeline → SurrealDB
+                    ↓
+            Entity Extraction
+                    ↓
+            Embedding Generation
+                    ↓
+            Graph Relationships
+
+Query → Retrieval Pipeline → Results
+              ↓
+       Vector Search + FTS + Graph
+              ↓
+       RRF Fusion → (Optional Rerank) → Response
+```
+
+## Database Schema
+
+SurrealDB stores:
+
+- **TextContent** — Raw ingested content
+- **TextChunk** — Chunked content with embeddings
+- **KnowledgeEntity** — Extracted entities (people, concepts, etc.)
+- **KnowledgeRelationship** — Connections between entities
+- **User** — Authentication and preferences
+- **SystemSettings** — Model configuration
+
+Embeddings are stored in dedicated tables with HNSW indexes for fast vector search.
+
+## Retrieval Strategy
+
+1. **Collect candidates** — Vector similarity + full-text search
+2. **Merge ranks** — Reciprocal Rank Fusion (RRF)
+3. **Attach context** — Link chunks to parent entities
+4. **Rerank** (optional) — Cross-encoder rescoring
+5. **Return** — Top-k results with metadata
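
The "Merge ranks" step in the retrieval strategy above can be illustrated with a minimal sketch of Reciprocal Rank Fusion. This is not taken from the `retrieval-pipeline` crate: the function name `rrf_fuse`, the chunk ids, and the smoothing constant `k = 60` (a commonly used default) are assumptions for illustration only. Each candidate receives `1 / (k + rank)` from every ranked list it appears in, and the contributions are summed.

```rust
use std::collections::HashMap;

/// Hypothetical helper: fuse several ranked id lists with Reciprocal Rank Fusion.
/// Each list is ordered best-first; ranks are 1-based.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64;
            // Every list a candidate appears in contributes 1 / (k + rank).
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Assumed example inputs: one list from vector search, one from full-text search.
    let vector_hits = vec!["chunk_a", "chunk_b", "chunk_c"];
    let fts_hits = vec!["chunk_b", "chunk_d", "chunk_a"];

    let fused = rrf_fuse(&[vector_hits, fts_hits], 60.0);
    for (id, score) in fused.iter().take(3) {
        println!("{id}: {score:.5}");
    }
}
```

Because RRF only looks at rank positions, the vector and full-text scores never need to be normalized against each other; a candidate that ranks well in both lists (here `chunk_b`) rises above one that tops only a single list.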