docs: more complete and correct

Per Stark
2025-12-24 23:36:58 +01:00
parent 84695fa0cc
commit 8e8370b080
2 changed files with 16 additions and 4 deletions

View File

@@ -47,7 +47,7 @@ Content In → Ingestion Pipeline → SurrealDB
Query → Retrieval Pipeline → Results
-Vector Search + FTS + Graph
+Vector Search + FTS
RRF Fusion → (Optional Rerank) → Response
```
@@ -70,5 +70,5 @@ Embeddings are stored in dedicated tables with HNSW indexes for fast vector sear
1. **Collect candidates** — Vector similarity + full-text search
2. **Merge ranks** — Reciprocal Rank Fusion (RRF)
3. **Attach context** — Link chunks to parent entities
-4. **Rerank** (optional) — Cross-encoder rescoring
+4. **Rerank** (optional) — Cross-encoder reranking
5. **Return** — Top-k results with metadata
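
Steps 1 and 2 of the list above can be sketched as follows. This is a minimal illustration of Reciprocal Rank Fusion over two candidate lists, not Minne's actual implementation: the function name `rrf_fuse`, the chunk ids, and the constant `k = 60` are assumptions chosen for the example.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of ids with Reciprocal Rank Fusion (RRF).
/// Each list is ordered best-first. `k` dampens the weight of top ranks;
/// 60.0 is a commonly used value and is only an assumption here.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
            let rank = (idx + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties are broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Calling `rrf_fuse(&[vec!["chunk_a", "chunk_b", "chunk_c"], vec!["chunk_b", "chunk_d", "chunk_a"]], 60.0)` ranks `chunk_b` first and `chunk_a` second, because both appear in the vector list and the full-text list, while ids found by only one search fall behind them.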

View File

@@ -13,6 +13,7 @@ Minne can be configured via environment variables or a `config.yaml` file. Envir
| `SURREALDB_DATABASE` | Database name | `minne_db` |
| `SURREALDB_NAMESPACE` | Namespace | `minne_ns` |
## Optional Settings
| Variable | Description | Default |
@@ -21,14 +22,20 @@ Minne can be configured via environment variables or a `config.yaml` file. Envir
| `DATA_DIR` | Local data directory | `./data` |
| `OPENAI_BASE_URL` | Custom AI provider URL | OpenAI default |
| `RUST_LOG` | Logging level | `info` |
+| `STORAGE` | Storage backend (`local`, `memory`) | `local` |
+| `PDF_INGEST_MODE` | PDF ingestion strategy (`classic`, `llm-first`) | `llm-first` |
+| `RETRIEVAL_STRATEGY` | Default retrieval strategy | - |
+| `EMBEDDING_BACKEND` | Embedding provider (`openai`, `fastembed`, `hashed`) | `fastembed` |
+| `FASTEMBED_CACHE_DIR` | Model cache directory | `<data_dir>/fastembed` |
+| `FASTEMBED_SHOW_DOWNLOAD_PROGRESS` | Show progress bar for model downloads | `false` |
+| `FASTEMBED_MAX_LENGTH` | Max sequence length for FastEmbed models | - |
### Reranking (Optional)
| Variable | Description | Default |
|----------|-------------|---------|
| `RERANKING_ENABLED` | Enable FastEmbed reranking | `false` |
-| `RERANKING_POOL_SIZE` | Concurrent reranker workers | `2` |
-| `FASTEMBED_CACHE_DIR` | Model cache directory | `<data_dir>/fastembed/reranker` |
+| `RERANKING_POOL_SIZE` | Concurrent reranker workers | - |
> [!NOTE]
> Enabling reranking downloads ~1.1 GB of model data on first startup.
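
As a rough illustration of how the defaults in the tables above might be applied, the sketch below resolves a few of the documented variables from a map and falls back to the documented default when one is unset. The struct `OptionalSettings` and the functions `pick` and `resolve_settings` are hypothetical names for this example, and accepting only `true`/`false` for `RERANKING_ENABLED` is an assumption; none of this is taken from Minne's source.

```rust
use std::collections::HashMap;

/// Hypothetical settings struct; fields mirror the documented variables,
/// not Minne's internal types.
#[derive(Debug, PartialEq)]
struct OptionalSettings {
    storage: String,
    pdf_ingest_mode: String,
    embedding_backend: String,
    reranking_enabled: bool,
}

/// Look up `name`, fall back to `default`, and reject values outside `allowed`.
fn pick(
    vars: &HashMap<String, String>,
    name: &str,
    default: &str,
    allowed: &[&str],
) -> Result<String, String> {
    let value = vars.get(name).map(String::as_str).unwrap_or(default);
    if allowed.iter().any(|a| *a == value) {
        Ok(value.to_string())
    } else {
        Err(format!("{name} must be one of {allowed:?}, got {value:?}"))
    }
}

/// Resolve the settings using the defaults listed in the documentation tables.
fn resolve_settings(vars: &HashMap<String, String>) -> Result<OptionalSettings, String> {
    // Assumption: the flag is written as the literal `true` or `false`.
    let reranking_enabled = match vars.get("RERANKING_ENABLED") {
        Some(raw) => raw
            .parse::<bool>()
            .map_err(|e| format!("RERANKING_ENABLED: {e}"))?,
        None => false,
    };
    Ok(OptionalSettings {
        storage: pick(vars, "STORAGE", "local", &["local", "memory"])?,
        pdf_ingest_mode: pick(vars, "PDF_INGEST_MODE", "llm-first", &["classic", "llm-first"])?,
        embedding_backend: pick(
            vars,
            "EMBEDDING_BACKEND",
            "fastembed",
            &["openai", "fastembed", "hashed"],
        )?,
        reranking_enabled,
    })
}
```

To try it against the real process environment, the map can be built with `std::env::vars().collect::<HashMap<String, String>>()`. Taking a map as input rather than reading the environment directly keeps the function easy to test.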
@@ -45,6 +52,11 @@ openai_api_key: "sk-your-key-here"
data_dir: "./minne_data"
http_port: 3000
+# New settings
+storage: "local"
+pdf_ingest_mode: "llm-first"
+embedding_backend: "fastembed"
# Optional reranking
reranking_enabled: true
reranking_pool_size: 2