Configuration
Minne can be configured via environment variables or a config.yaml file. Environment variables take precedence.
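The precedence rule can be pictured as a two-layer lookup. The snippet is a minimal Python sketch, not Minne's actual loader. The function name `resolve_settings`, the `KNOWN_KEYS` subset, and the assumption that each lowercase config.yaml key corresponds to the uppercase environment variable of the same name are illustrative, inferred from the tables and example file in this document.

```python
# Illustrative subset of setting names; the full list is in the tables.
KNOWN_KEYS = ["http_port", "data_dir", "storage"]


def resolve_settings(file_values, env, keys=KNOWN_KEYS):
    """Return effective settings, letting environment variables win.

    file_values: dict of values as they would be read from config.yaml.
    env: mapping of environment variables (for example os.environ).
    """
    resolved = {}
    for key in keys:
        env_name = key.upper()  # assumed mapping: http_port -> HTTP_PORT
        if env_name in env:
            resolved[key] = env[env_name]  # environment takes precedence
        elif key in file_values:
            resolved[key] = file_values[key]
    return resolved


file_values = {"http_port": 3000, "storage": "local"}
env = {"HTTP_PORT": "8080"}
settings = resolve_settings(file_values, env)
print(settings)
```

Here `http_port` resolves to the environment value and `storage` falls back to the file value. Environment values arrive as strings, so a real loader would also need to parse them into the expected type.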
Required Settings
| Variable | Description | Example |
|---|---|---|
| `OPENAI_API_KEY` | API key for OpenAI-compatible endpoint | sk-... |
| `SURREALDB_ADDRESS` | WebSocket address of SurrealDB | ws://127.0.0.1:8000 |
| `SURREALDB_USERNAME` | SurrealDB username | root_user |
| `SURREALDB_PASSWORD` | SurrealDB password | root_password |
| `SURREALDB_DATABASE` | Database name | minne_db |
| `SURREALDB_NAMESPACE` | Namespace | minne_ns |
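A pre-flight check for the required settings could look like the following Python sketch. It only inspects environment variables, so it applies to deployments that do not use config.yaml; the helper name `missing_required` is hypothetical and not part of Minne.

```python
import os

# Names taken from the Required Settings table.
REQUIRED = [
    "OPENAI_API_KEY",
    "SURREALDB_ADDRESS",
    "SURREALDB_USERNAME",
    "SURREALDB_PASSWORD",
    "SURREALDB_DATABASE",
    "SURREALDB_NAMESPACE",
]


def missing_required(env):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED if not env.get(name)]


# Example with an explicit mapping; pass os.environ to check the real environment.
example_env = {
    "OPENAI_API_KEY": "sk-...",
    "SURREALDB_ADDRESS": "ws://127.0.0.1:8000",
}
print(missing_required(example_env))
```

Calling `missing_required(os.environ)` before starting the server gives a quick list of what still has to be exported.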
Optional Settings
| Variable | Description | Default |
|---|---|---|
| `HTTP_PORT` | Server port | 3000 |
| `DATA_DIR` | Local data directory | ./data |
| `OPENAI_BASE_URL` | Custom AI provider URL | OpenAI default |
| `RUST_LOG` | Logging level | info |
| `STORAGE` | Storage backend (local, memory, s3) | `local` |
| `PDF_INGEST_MODE` | PDF ingestion strategy (classic, llm-first) | `llm-first` |
| `EMBEDDING_BACKEND` | Embedding provider (openai, fastembed) | `fastembed` |
| `FASTEMBED_CACHE_DIR` | Model cache directory | `<data_dir>/fastembed` |
| `FASTEMBED_SHOW_DOWNLOAD_PROGRESS` | Show progress bar for model downloads | false |
| `FASTEMBED_MAX_LENGTH` | Max sequence length for FastEmbed models | - |
| `INGEST_MAX_BODY_BYTES` | Max request body size for ingest endpoints | 20000000 |
| `INGEST_MAX_FILES` | Max files allowed per ingest request | 5 |
| `INGEST_MAX_CONTENT_BYTES` | Max content field size for ingest requests | `262144` |
| `INGEST_MAX_CONTEXT_BYTES` | Max context field size for ingest requests | `16384` |
| `INGEST_MAX_CATEGORY_BYTES` | Max category field size for ingest requests | `128` |
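To get a feel for the ingest limits, the following Python sketch checks a hypothetical request against the documented defaults. How Minne enforces these limits internally is not described here; measuring field sizes as UTF-8 bytes and the helper name `check_ingest_fields` are assumptions made for illustration.

```python
# Documented defaults from the Optional Settings table.
DEFAULT_LIMITS = {
    "INGEST_MAX_FILES": 5,
    "INGEST_MAX_CONTENT_BYTES": 262144,  # 256 KiB
    "INGEST_MAX_CONTEXT_BYTES": 16384,   # 16 KiB
    "INGEST_MAX_CATEGORY_BYTES": 128,
}


def check_ingest_fields(content, context, category, file_count,
                        limits=DEFAULT_LIMITS):
    """Return a list of human-readable limit violations (empty if none)."""
    # Assumption: sizes are counted in UTF-8 encoded bytes, not characters.
    sizes = {
        "INGEST_MAX_CONTENT_BYTES": len(content.encode("utf-8")),
        "INGEST_MAX_CONTEXT_BYTES": len(context.encode("utf-8")),
        "INGEST_MAX_CATEGORY_BYTES": len(category.encode("utf-8")),
        "INGEST_MAX_FILES": file_count,
    }
    return [
        f"{name}: {size} > {limits[name]}"
        for name, size in sizes.items()
        if size > limits[name]
    ]


problems = check_ingest_fields("hello", "notes", "x" * 200, file_count=2)
print(problems)
```

In this example only the category field exceeds its limit (200 bytes against a maximum of 128).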
S3 Storage (Optional)
Used when STORAGE is set to s3.
| Variable | Description | Default |
|---|---|---|
| `S3_BUCKET` | S3 bucket name | - |
| `S3_ENDPOINT` | Custom endpoint (e.g. MinIO) | AWS default |
| `S3_REGION` | AWS Region | us-east-1 |
| `AWS_ACCESS_KEY_ID` | Access key | - |
| `AWS_SECRET_ACCESS_KEY` | Secret key | - |
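The way the S3 defaults combine can be sketched in Python as follows. This is an illustration of the table above, not Minne's code; the function name `effective_s3_settings` and the returned dictionary shape are hypothetical.

```python
def effective_s3_settings(env):
    """Return the S3 settings implied by env, or None if S3 is not selected."""
    # STORAGE defaults to "local", so the S3 variables only matter for "s3".
    if env.get("STORAGE", "local") != "s3":
        return None
    return {
        "bucket": env.get("S3_BUCKET"),               # no documented default
        "endpoint": env.get("S3_ENDPOINT"),           # None means the AWS default
        "region": env.get("S3_REGION", "us-east-1"),  # documented default
    }


minio_env = {
    "STORAGE": "s3",
    "S3_BUCKET": "my-bucket",
    "S3_ENDPOINT": "http://localhost:9000",
}
print(effective_s3_settings(minio_env))
```

With the MinIO-style values above, the region falls back to `us-east-1` while the endpoint overrides the AWS default.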
Reranking (Optional)
| Variable | Description | Default |
|---|---|---|
| `RERANKING_ENABLED` | Enable FastEmbed reranking | false |
| `RERANKING_POOL_SIZE` | Concurrent reranker workers | - |
Note
Enabling reranking downloads ~1.1 GB of model data on first startup.
Example config.yaml
surrealdb_address: "ws://127.0.0.1:8000"
surrealdb_username: "root_user"
surrealdb_password: "root_password"
surrealdb_database: "minne_db"
surrealdb_namespace: "minne_ns"
openai_api_key: "sk-your-key-here"
data_dir: "./minne_data"
http_port: 3000
# New settings
storage: "local"
# storage: "s3"
# s3_bucket: "my-bucket"
# s3_endpoint: "http://localhost:9000" # Optional, for MinIO etc.
# s3_region: "us-east-1"
pdf_ingest_mode: "llm-first"
embedding_backend: "fastembed"
# Optional reranking
reranking_enabled: true
reranking_pool_size: 2
# Ingest safety limits
ingest_max_body_bytes: 20000000
ingest_max_files: 5
ingest_max_content_bytes: 262144
ingest_max_context_bytes: 16384
ingest_max_category_bytes: 128
AI Provider Setup
Minne works with any OpenAI-compatible API that supports structured outputs.
OpenAI (Default)
Set OPENAI_API_KEY only. The default base URL points to OpenAI.
Ollama
OPENAI_API_KEY="ollama"
OPENAI_BASE_URL="http://localhost:11434/v1"
Other Providers
Any provider exposing an OpenAI-compatible endpoint works. Set OPENAI_BASE_URL accordingly.
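One way to sanity-check a base URL before pointing Minne at it is to query the provider's model list. The Python sketch below uses only the standard library and assumes the provider implements the OpenAI-style `GET /models` route under the base URL, which many but not all compatible servers do. The helper names `models_url` and `list_models` are hypothetical.

```python
import json
import urllib.request

OPENAI_DEFAULT_BASE_URL = "https://api.openai.com/v1"


def models_url(base_url=OPENAI_DEFAULT_BASE_URL):
    """Build the model-listing URL for an OpenAI-compatible base URL."""
    return base_url.rstrip("/") + "/models"


def list_models(api_key, base_url=OPENAI_DEFAULT_BASE_URL):
    """Return the model IDs reported by the provider (performs a network call)."""
    request = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    # OpenAI-style responses wrap the models in a top-level "data" list.
    return [model["id"] for model in payload.get("data", [])]


print(models_url("http://localhost:11434/v1"))
```

For the Ollama values shown earlier, `list_models("ollama", "http://localhost:11434/v1")` would contact the local server; a connection error or an HTTP error at this stage usually means `OPENAI_BASE_URL` or the key needs adjusting.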
Model Selection
- Access `/admin` in your Minne instance
- Select models for content processing and chat
- Content Processing: Must support structured outputs
- Embedding Dimensions: Update when changing embedding models (e.g., 1536 for `text-embedding-3-small`)