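# Configuration

Minne can be configured via environment variables or a `config.yaml` file. The `config.yaml` keys shown in the example below are the lowercase forms of the corresponding environment variables (e.g., `HTTP_PORT` becomes `http_port`). When a setting is defined in both places, the environment variable takes precedence.

## Required Settings
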
| Variable | Description | Example |
|----------|-------------|---------|
| `OPENAI_API_KEY` | API key for the OpenAI-compatible endpoint | `sk-...` |
| `SURREALDB_ADDRESS` | WebSocket address of the SurrealDB server | `ws://127.0.0.1:8000` |
| `SURREALDB_USERNAME` | SurrealDB username | `root_user` |
| `SURREALDB_PASSWORD` | SurrealDB password | `root_password` |
| `SURREALDB_DATABASE` | SurrealDB database name | `minne_db` |
| `SURREALDB_NAMESPACE` | SurrealDB namespace | `minne_ns` |

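As a minimal sketch, the required settings can be exported in a shell before starting Minne. The values are the placeholder examples from the table, not working credentials. The loop at the end is a convenience check written for this example and is not part of Minne.

```bash
# Placeholder values taken from the table above; replace them with your own.
export OPENAI_API_KEY="sk-your-key-here"
export SURREALDB_ADDRESS="ws://127.0.0.1:8000"
export SURREALDB_USERNAME="root_user"
export SURREALDB_PASSWORD="root_password"
export SURREALDB_DATABASE="minne_db"
export SURREALDB_NAMESPACE="minne_ns"

# Convenience check (not part of Minne): count required variables that are unset or empty.
missing=0
for var in OPENAI_API_KEY SURREALDB_ADDRESS SURREALDB_USERNAME \
           SURREALDB_PASSWORD SURREALDB_DATABASE SURREALDB_NAMESPACE; do
  if [ -z "$(printenv "$var")" ]; then
    echo "missing required setting: $var" >&2
    missing=$((missing + 1))
  fi
done
echo "missing required settings: $missing"
```
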
## Optional Settings

| Variable | Description | Default |
|----------|-------------|---------|
| `HTTP_PORT` | HTTP server port | `3000` |
| `DATA_DIR` | Local data directory | `./data` |
| `OPENAI_BASE_URL` | Base URL of the OpenAI-compatible API | OpenAI's API |
| `RUST_LOG` | Logging level | `info` |
| `STORAGE` | Storage backend (`local`, `memory`, `s3`) | `local` |
| `PDF_INGEST_MODE` | PDF ingestion strategy (`classic`, `llm-first`) | `llm-first` |
| `EMBEDDING_BACKEND` | Embedding provider (`openai`, `fastembed`) | `fastembed` |
| `FASTEMBED_CACHE_DIR` | FastEmbed model cache directory | `<data_dir>/fastembed` |
| `FASTEMBED_SHOW_DOWNLOAD_PROGRESS` | Show a progress bar for model downloads | `false` |
| `FASTEMBED_MAX_LENGTH` | Maximum sequence length for FastEmbed models | - |
| `INGEST_MAX_BODY_BYTES` | Maximum request body size for ingest endpoints, in bytes | `20000000` (20 MB) |
| `INGEST_MAX_FILES` | Maximum number of files per ingest request | `5` |
| `INGEST_MAX_CONTENT_BYTES` | Maximum size of the `content` field in ingest requests, in bytes | `262144` (256 KiB) |
| `INGEST_MAX_CONTEXT_BYTES` | Maximum size of the `context` field in ingest requests, in bytes | `16384` (16 KiB) |
| `INGEST_MAX_CATEGORY_BYTES` | Maximum size of the `category` field in ingest requests, in bytes | `128` |

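A sketch of overriding a few optional settings from the shell. Because the ingest limits are plain byte counts, shell arithmetic keeps the intended sizes readable. The specific values are illustrative only, not recommendations.

```bash
# Illustrative overrides; environment variables take precedence over config.yaml.
export HTTP_PORT=8080
export RUST_LOG=debug
export PDF_INGEST_MODE=classic

# Ingest limits are byte counts, so compute them instead of hand-typing digits.
export INGEST_MAX_BODY_BYTES=$((50 * 1000 * 1000))   # 50 MB
export INGEST_MAX_CONTENT_BYTES=$((512 * 1024))      # 512 KiB
export INGEST_MAX_CONTEXT_BYTES=$((32 * 1024))       # 32 KiB

echo "body=$INGEST_MAX_BODY_BYTES content=$INGEST_MAX_CONTENT_BYTES context=$INGEST_MAX_CONTEXT_BYTES"
```
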
### S3 Storage (Optional)

These settings apply only when `STORAGE` is set to `s3`.

| Variable | Description | Default |
|----------|-------------|---------|
| `S3_BUCKET` | S3 bucket name | - |
| `S3_ENDPOINT` | Custom endpoint (e.g., for MinIO) | AWS default |
| `S3_REGION` | AWS region | `us-east-1` |
| `AWS_ACCESS_KEY_ID` | Access key ID | - |
| `AWS_SECRET_ACCESS_KEY` | Secret access key | - |

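For a local MinIO server, the S3 settings might look like the following sketch. `minioadmin` is MinIO's out-of-the-box root credential and `minne-files` is a hypothetical bucket name. The bucket is assumed to exist already, and real deployments should use dedicated access keys. The final check is an illustration, not something Minne provides.

```bash
# Placeholder MinIO setup; the bucket name and credentials are examples only.
export STORAGE="s3"
export S3_BUCKET="minne-files"
export S3_ENDPOINT="http://localhost:9000"
export S3_REGION="us-east-1"
export AWS_ACCESS_KEY_ID="minioadmin"
export AWS_SECRET_ACCESS_KEY="minioadmin"

# S3_BUCKET has no default, so make sure it is set when STORAGE=s3.
if [ "$STORAGE" = "s3" ] && [ -z "${S3_BUCKET:-}" ]; then
  echo "S3_BUCKET must be set when STORAGE=s3" >&2
else
  echo "s3 storage: bucket=$S3_BUCKET endpoint=$S3_ENDPOINT"
fi
```
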
### Reranking (Optional)

| Variable | Description | Default |
|----------|-------------|---------|
| `RERANKING_ENABLED` | Enable FastEmbed reranking | `false` |
| `RERANKING_POOL_SIZE` | Number of concurrent reranker workers | - |

> [!NOTE]
> Enabling reranking downloads ~1.1 GB of model data on first startup.

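A minimal sketch of enabling reranking through environment variables, using the same values as the example `config.yaml`. It assumes the reranker download honours `FASTEMBED_SHOW_DOWNLOAD_PROGRESS` like the embedding models do. This is not confirmed here, but if it holds, the first large download shows visible progress.

```bash
# Enable FastEmbed reranking; values mirror the example config.yaml.
export RERANKING_ENABLED=true
export RERANKING_POOL_SIZE=2

# Assumption: the reranker model download honours the same FastEmbed setting
# as the embedding models, so a progress bar is shown for the first ~1.1 GB fetch.
export FASTEMBED_SHOW_DOWNLOAD_PROGRESS=true

echo "reranking=$RERANKING_ENABLED workers=$RERANKING_POOL_SIZE"
```
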
## Example config.yaml

```yaml
surrealdb_address: "ws://127.0.0.1:8000"
surrealdb_username: "root_user"
surrealdb_password: "root_password"
surrealdb_database: "minne_db"
surrealdb_namespace: "minne_ns"
openai_api_key: "sk-your-key-here"
data_dir: "./minne_data"
http_port: 3000

# Storage, PDF ingestion and embeddings
storage: "local"
# storage: "s3"
# s3_bucket: "my-bucket"
# s3_endpoint: "http://localhost:9000" # Optional, for MinIO etc.
# s3_region: "us-east-1"
pdf_ingest_mode: "llm-first"
embedding_backend: "fastembed"

# Optional reranking
reranking_enabled: true
reranking_pool_size: 2

# Ingest safety limits
ingest_max_body_bytes: 20000000
ingest_max_files: 5
ingest_max_content_bytes: 262144
ingest_max_context_bytes: 16384
ingest_max_category_bytes: 128
```

## AI Provider Setup

Minne works with any OpenAI-compatible API that supports structured outputs.

### OpenAI (Default)

Only `OPENAI_API_KEY` needs to be set; the default base URL already points to OpenAI.

### Ollama

```bash
# Ollama does not check the key, but Minne requires OPENAI_API_KEY to be set
export OPENAI_API_KEY="ollama"
export OPENAI_BASE_URL="http://localhost:11434/v1"
```

### Other Providers

Any provider that exposes an OpenAI-compatible endpoint with structured output support works. Set `OPENAI_BASE_URL` to that provider's base URL.

## Model Selection

1. Open `/admin` in your Minne instance.
2. Select the models used for content processing and chat.
3. **Content Processing**: the selected model must support structured outputs.
4. **Embedding Dimensions**: update this value whenever you change the embedding model (e.g., 1536 for `text-embedding-3-small`).