Files
minne/docs/architecture.md
2025-12-24 23:36:58 +01:00

2.2 KiB

Architecture

Tech Stack

Layer Technology
Backend Rust with Axum (SSR)
Frontend HTML + HTMX + minimal JS
Database SurrealDB (graph, document, vector)
AI OpenAI-compatible API
Web Processing Headless Chromium

Crate Structure

minne/
├── main/                 # Combined server + worker binary
├── api-router/           # REST API routes
├── html-router/          # SSR web interface
├── ingestion-pipeline/   # Content processing pipeline
├── retrieval-pipeline/   # Search and retrieval logic
├── common/               # Shared types, storage, utilities
├── evaluations/          # Benchmarking framework
└── json-stream-parser/   # Streaming JSON utilities

Process Modes

Binary Purpose
main All-in-one: serves UI and processes content
server UI and API only (no background processing)
worker Background processing only (no UI)

Split deployment is useful for scaling or resource isolation.

Data Flow

Content In → Ingestion Pipeline → SurrealDB
                    ↓
            Entity Extraction
                    ↓
            Embedding Generation
                    ↓
            Graph Relationships

Query → Retrieval Pipeline → Results
              ↓
       Vector Search + FTS
              ↓
       RRF Fusion → (Optional Rerank) → Response

Database Schema

SurrealDB stores:

  • TextContent — Raw ingested content
  • TextChunk — Chunked content with embeddings
  • KnowledgeEntity — Extracted entities (people, concepts, etc.)
  • KnowledgeRelationship — Connections between entities
  • User — Authentication and preferences
  • SystemSettings — Model configuration

Embeddings are stored in dedicated tables with HNSW indexes for fast vector search.

Retrieval Strategy

  1. Collect candidates — Vector similarity + full-text search
  2. Merge ranks — Reciprocal Rank Fusion (RRF)
  3. Attach context — Link chunks to parent entities
  4. Rerank (optional) — Cross-encoder reranking
  5. Return — Top-k results with metadata