# Architecture

## Tech Stack

| Layer | Technology |
|-------|------------|
| Backend | Rust with Axum (SSR) |
| Frontend | HTML + HTMX + minimal JS |
| Database | SurrealDB (graph, document, vector) |
| AI | OpenAI-compatible API |
| Web Processing | Headless Chromium |

## Crate Structure

```
minne/
├── main/                 # Combined server + worker binary
├── api-router/           # REST API routes
├── html-router/          # SSR web interface
├── ingestion-pipeline/   # Content processing pipeline
├── retrieval-pipeline/   # Search and retrieval logic
├── common/               # Shared types, storage, utilities
├── evaluations/          # Benchmarking framework
└── json-stream-parser/   # Streaming JSON utilities
```

## Process Modes

| Binary | Purpose |
|--------|---------|
| `main` | All-in-one: serves UI and processes content |
| `server` | UI and API only (no background processing) |
| `worker` | Background processing only (no UI) |

Running `server` and `worker` as separate processes is useful when you want to scale them independently or isolate resource-heavy background processing from the UI.
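
As a minimal sketch of how the three modes relate, the enum below maps each binary name from the table to the responsibilities it enables. `ProcessMode`, `from_binary_name`, `serves_http`, and `runs_background_processing` are hypothetical names chosen for illustration; they are not taken from the Minne codebase, which may wire this up differently.

```rust
/// Hypothetical model of the three process modes; not Minne's actual types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProcessMode {
    Main,
    Server,
    Worker,
}

impl ProcessMode {
    /// Map a binary name from the table above to a mode.
    pub fn from_binary_name(name: &str) -> Option<Self> {
        match name {
            "main" => Some(Self::Main),
            "server" => Some(Self::Server),
            "worker" => Some(Self::Worker),
            _ => None,
        }
    }

    /// True for modes that serve the UI and API.
    pub fn serves_http(self) -> bool {
        matches!(self, Self::Main | Self::Server)
    }

    /// True for modes that run background content processing.
    pub fn runs_background_processing(self) -> bool {
        matches!(self, Self::Main | Self::Worker)
    }
}
```

In a split deployment under this model, one `server` process and one or more `worker` processes together cover the same responsibilities as a single `main` process.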

## Data Flow

```
Content In → Ingestion Pipeline → SurrealDB
                    ↓
            Entity Extraction
                    ↓
           Embedding Generation
                    ↓
           Graph Relationships

Query → Retrieval Pipeline → Results
               ↓
       Vector Search + FTS
               ↓
       RRF Fusion → (Optional Rerank) → Response
```

## Database Schema

SurrealDB stores:

- **TextContent**: Raw ingested content
- **TextChunk**: Chunked content with embeddings
- **KnowledgeEntity**: Extracted entities (people, concepts, etc.)
- **KnowledgeRelationship**: Connections between entities
- **User**: Authentication and preferences
- **SystemSettings**: Model configuration

Embeddings are stored in dedicated tables with HNSW indexes for fast vector search.
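
To make concrete what the vector indexes speed up, here is a minimal sketch of exhaustive cosine-similarity search over chunk embeddings. It is not HNSW and not how SurrealDB works internally: an HNSW index approximates this kind of ranking without scanning every row. The cosine metric, the function names, and the `(id, embedding)` tuple layout are assumptions made for illustration.

```rust
/// Cosine similarity between two vectors.
/// Returns 0.0 when the lengths differ or either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Brute-force top-k search: score every chunk, sort by similarity, keep k.
/// `chunks` is a hypothetical in-memory stand-in for an embeddings table.
pub fn top_k<'a>(
    query: &[f32],
    chunks: &'a [(&'a str, Vec<f32>)],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = chunks
        .iter()
        .map(|(id, embedding)| (*id, cosine_similarity(query, embedding)))
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

The exhaustive scan costs time linear in the number of chunks per query, which is the cost an approximate index such as HNSW is meant to avoid.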

## Retrieval Strategy

1. **Collect candidates**: Vector similarity + full-text search
2. **Merge ranks**: Reciprocal Rank Fusion (RRF)
3. **Attach context**: Link chunks to parent entities
4. **Rerank** (optional): Cross-encoder reranking
5. **Return**: Top-k results with metadata
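
The merge step can be sketched with the standard RRF formula, where each document's score is the sum of `1 / (k + rank)` over every list it appears in, using 1-based ranks. This is a minimal illustration, not Minne's implementation: `rrf_fuse` is a hypothetical name, and `k = 60` is only a commonly used default; the constant Minne uses is not stated here.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of document ids with Reciprocal Rank Fusion.
/// Each list is ordered best-first. Returns (id, score) pairs, best-first.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (index, id) in ranking.iter().enumerate() {
            // Ranks are 1-based: the top hit contributes 1 / (k + 1).
            let rank = (index + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Sort by score descending; break ties by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

For example, fusing a vector-search list `["a", "b", "c"]` with a full-text list `["b", "c", "d"]` at `k = 60.0` yields the order `b`, `c`, `a`, `d`: documents found by both searches outrank those found by only one. RRF uses only rank positions, so the two searches do not need comparable score scales.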