1 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| Per Stark | 9a623cbc3f | docs: evaluations instructions and readme refactoring | 2025-12-22 18:32:59 +01:00 |
7 changed files with 570 additions and 232 deletions

README.md

@@ -1,265 +1,61 @@
# Minne
**A graph-powered personal knowledge base that remembers for you.**
**Minne (Swedish for "memory")** is a personal knowledge management system and save-for-later application for capturing, organizing, and accessing your information. Inspired by the Zettelkasten method, it uses a graph database to automatically create connections between your notes without manual linking overhead.
Capture content effortlessly, let AI discover connections, and explore your knowledge visually. Self-hosted and privacy-focused.
[![Release Status](https://github.com/perstarkse/minne/actions/workflows/release.yml/badge.svg)](https://github.com/perstarkse/minne/actions/workflows/release.yml)
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL_v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
[![Latest Release](https://img.shields.io/github/v/release/perstarkse/minne?sort=semver)](https://github.com/perstarkse/minne/releases/latest)
![Screenshot](screenshot-graph.webp)
## Demo deployment
To try _Minne_ out, visit [this read-only demo deployment](https://minne-demo.stark.pub) and explore its functionality.
## Noteworthy Features
- **Search & Chat Interface** - Find content or knowledge instantly with full-text search, or use the chat mode and conversational AI to find and reason about content
- **Manual and AI-assisted connections** - Build entities and relationships manually with full control, let AI create entities and relationships automatically, or blend both approaches with AI suggestions for manual approval
- **Hybrid Retrieval System** - Search combining vector similarity, full-text search, and graph traversal for highly relevant results
- **Scratchpad Feature** - Quickly capture thoughts and convert them to permanent content when ready
- **Visual Graph Explorer** - Interactive D3-based navigation of your knowledge entities and connections
- **Multi-Format Support** - Ingest text, URLs, PDFs, audio files, and images into your knowledge base
- **Performance Focus** - Built with Rust and server-side rendering for speed and efficiency
- **Self-Hosted & Privacy-Focused** - Full control over your data, and compatible with any OpenAI-compatible API that supports structured outputs
## The "Why" Behind Minne
For a while I've been fascinated by personal knowledge management systems. I wanted something that made it incredibly easy to capture content - snippets of text, URLs, and other media - while automatically discovering connections between ideas. But I also wanted to maintain control over my knowledge structure.
Traditional tools like Logseq and Obsidian are excellent, but the manual linking process often became a hindrance. Meanwhile, fully automated systems sometimes miss important context or create relationships I wouldn't have chosen myself.
So I built Minne to offer the best of both worlds: effortless content capture with AI-assisted relationship discovery, but with the flexibility to manually curate, edit, or override any connections. You can let AI handle the heavy lifting of extracting entities and finding relationships, take full control yourself, or use a hybrid approach where AI suggests connections that you can approve or modify.
While developing Minne, I discovered [KaraKeep](https://github.com/karakeep-app/karakeep) (formerly Hoarder), an excellent application in a similar space; you should check it out! However, if you're interested in a PKM that offers both intelligent automation and manual curation, with the ability to chat with your knowledge base, then Minne might be worth testing.
## Table of Contents
- [Quick Start](#quick-start)
- [Features in Detail](#features-in-detail)
- [Configuration](#configuration)
- [Tech Stack](#tech-stack)
- [Application Architecture](#application-architecture)
- [AI Configuration](#ai-configuration--model-selection)
- [Roadmap](#roadmap)
- [Development](#development)
- [Contributing](#contributing)
- [License](#license)
## Quick Start
The fastest way to get Minne running is with Docker Compose:
```bash
# Clone the repository
git clone https://github.com/perstarkse/minne.git
cd minne

# Set your OpenAI API key in docker-compose.yml, then start Minne and its database:
docker compose up -d

# Open http://localhost:3000
```
**Required Setup:**
- Replace `your_openai_api_key_here` in `docker-compose.yml` with your actual API key
- Configure `OPENAI_BASE_URL` if using a custom AI provider (like Ollama)
For detailed installation options, see [Configuration](#configuration).
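If you're unsure where the key lives, the relevant part of a compose file generally looks like the sketch below; the service name and exact layout here are illustrative, so check the repository's actual `docker-compose.yml`:

```yaml
# Hypothetical excerpt, not copied from Minne's docker-compose.yml
services:
  minne:                  # service name is illustrative
    environment:
      OPENAI_API_KEY: "your_openai_api_key_here"   # replace with your real key
      # Uncomment for custom providers such as Ollama:
      # OPENAI_BASE_URL: "http://host.docker.internal:11434/v1"
```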
## Features in Detail
### Search vs. Chat mode
**Search** - Use when you know roughly what you're looking for. Full-text search finds items quickly by matching your query terms.
**Chat Mode** - Use when you want to explore concepts, find connections, or reason about your knowledge. The AI analyzes your query and finds relevant context across your entire knowledge base.
### Content Processing
Minne automatically processes content you save:
1. **Web scraping** extracts readable text from URLs
2. **Text analysis** identifies key concepts and relationships
3. **Graph creation** builds connections between related content
4. **Embedding generation** enables semantic search capabilities
### Visual Knowledge Graph
Explore your knowledge as an interactive network with flexible curation options:
**Manual Curation** - Create knowledge entities and relationships yourself with full control over your graph structure
**AI Automation** - Let AI automatically extract entities and discover relationships from your content
**Hybrid Approach** - Get AI-suggested relationships and entities that you can manually review, edit, or approve
The graph visualization shows:
- Knowledge entities as nodes (manually created or AI-extracted)
- Relationships as connections (manually defined, AI-discovered, or suggested)
- Interactive navigation for discovery and editing
### Optional FastEmbed Reranking
Minne ships with an opt-in reranking stage powered by [fastembed-rs](https://github.com/Anush008/fastembed-rs). When enabled, the hybrid retrieval results are rescored with a lightweight cross-encoder before being returned to chat or ingestion flows. In practice this often means more relevant results, boosting answer quality and downstream enrichment.
⚠️ **Resource notes**
- Enabling reranking downloads and caches ~1.1GB of model data on first startup (cached under `<data_dir>/fastembed/reranker` by default).
- Initialization takes longer while warming the cache, and each query consumes extra CPU. The default pool size (2) is tuned for a single-user setup, but a pool size of 1 can also work.
- The feature is disabled by default. Set `reranking_enabled: true` (or `RERANKING_ENABLED=true`) if you're comfortable with the additional footprint.
Example configuration:
```yaml
reranking_enabled: true
reranking_pool_size: 2
fastembed_cache_dir: "/var/lib/minne/fastembed" # optional override; defaults to <data_dir>/fastembed/reranker
```
## Tech Stack
- **Backend:** Rust with Axum framework and Server-Side Rendering (SSR)
- **Frontend:** HTML with HTMX and minimal JavaScript for interactivity
- **Database:** SurrealDB (graph, document, and vector search)
- **AI Integration:** OpenAI-compatible API with structured outputs
- **Web Processing:** Headless Chrome for robust webpage content extraction
## Configuration
Minne can be configured using environment variables or a `config.yaml` file. Environment variables take precedence over `config.yaml`.
### Required Configuration
- `SURREALDB_ADDRESS`: WebSocket address of your SurrealDB instance (e.g., `ws://127.0.0.1:8000`)
- `SURREALDB_USERNAME`: Username for SurrealDB (e.g., `root_user`)
- `SURREALDB_PASSWORD`: Password for SurrealDB (e.g., `root_password`)
- `SURREALDB_DATABASE`: Database name in SurrealDB (e.g., `minne_db`)
- `SURREALDB_NAMESPACE`: Namespace in SurrealDB (e.g., `minne_ns`)
- `OPENAI_API_KEY`: Your API key for OpenAI compatible endpoint
- `HTTP_PORT`: Port for the Minne server (Default: `3000`)
### Optional Configuration
- `RUST_LOG`: Controls logging level (e.g., `minne=info,tower_http=debug`)
- `DATA_DIR`: Directory to store local data (e.g., `./data`)
- `OPENAI_BASE_URL`: Base URL for custom AI providers (like Ollama)
- `RERANKING_ENABLED` / `reranking_enabled`: Set to `true` to enable the FastEmbed reranking stage (default `false`)
- `RERANKING_POOL_SIZE` / `reranking_pool_size`: Maximum concurrent reranker workers (defaults to `2`)
- `FASTEMBED_CACHE_DIR` / `fastembed_cache_dir`: Directory for cached FastEmbed models (defaults to `<data_dir>/fastembed/reranker`)
- `FASTEMBED_SHOW_DOWNLOAD_PROGRESS` / `fastembed_show_download_progress`: Show model download progress when warming the cache (default `true`)
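The cache default chains off `DATA_DIR`. Here is a small shell sketch of how the documented fallbacks resolve (the resolution logic is illustrative; Minne performs this internally):

```shell
# Start from a clean slate so the documented defaults apply
unset DATA_DIR FASTEMBED_CACHE_DIR

# DATA_DIR falls back to ./data; the FastEmbed cache falls back to
# <data_dir>/fastembed/reranker, matching the defaults listed above
DATA_DIR="${DATA_DIR:-./data}"
FASTEMBED_CACHE_DIR="${FASTEMBED_CACHE_DIR:-$DATA_DIR/fastembed/reranker}"
echo "$FASTEMBED_CACHE_DIR"
```

With neither variable set, this prints `./data/fastembed/reranker`, the default noted above.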
### Example config.yaml
```yaml
surrealdb_address: "ws://127.0.0.1:8000"
surrealdb_username: "root_user"
surrealdb_password: "root_password"
surrealdb_database: "minne_db"
surrealdb_namespace: "minne_ns"
openai_api_key: "sk-YourActualOpenAIKeyGoesHere"
data_dir: "./minne_app_data"
http_port: 3000
# rust_log: "info"
```
## Installation Options
### 1. Docker Compose (Recommended)
```bash
# Clone and run
git clone https://github.com/perstarkse/minne.git
cd minne
docker compose up -d
```
The included `docker-compose.yml` handles SurrealDB and Chromium dependencies automatically.
### 2. Nix
Run Minne directly with Nix:
```bash
nix run 'github:perstarkse/minne#main'
```
This fetches Minne and all dependencies, including Chromium.
### 3. Pre-built Binaries
Download binaries for Windows, macOS, and Linux from the [GitHub Releases](https://github.com/perstarkse/minne/releases/latest).
**Requirements:** You'll need to provide SurrealDB and Chromium separately.
### 4. Build from Source
```bash
git clone https://github.com/perstarkse/minne.git
cd minne
cargo run --release --bin main
```
**Requirements:** SurrealDB and Chromium must be installed and accessible in your PATH.
## Features
- **Search & Chat** — Full-text search or conversational AI to find and reason about content
- **Knowledge Graph** — Visual exploration with automatic or manual relationship curation
- **Hybrid Retrieval** — Vector similarity + full-text + graph traversal for relevant results
- **Multi-Format** — Ingest text, URLs, PDFs, audio, and images
- **Self-Hosted** — Your data, your server, any OpenAI-compatible API
## Documentation
| Guide | Description |
|-------|-------------|
| [Installation](docs/installation.md) | Docker, Nix, binaries, source builds |
| [Configuration](docs/configuration.md) | Environment variables, config.yaml, AI setup |
| [Features](docs/features.md) | Search, Chat, Graph, Reranking, Ingestion |
| [Architecture](docs/architecture.md) | Tech stack, crate structure, data flow |
| [Vision](docs/vision.md) | Philosophy, roadmap, related projects |
## Application Architecture
Minne offers flexible deployment options:
- **`main`**: Combined server and worker in one process (recommended for most users)
- **`server`**: Web interface and API only
- **`worker`**: Background processing only (for resource optimization)
## Usage
Once Minne is running at `http://localhost:3000`:
1. **Web Interface**: Full-featured experience for desktop and mobile
2. **iOS Shortcut**: Use the [Minne iOS Shortcut](https://www.icloud.com/shortcuts/e433fbd7602f4e2eaa70dca162323477) for quick content capture
3. **Content Types**: Save notes, URLs, audio files, and more
4. **Knowledge Graph**: Explore automatic connections between your content
5. **Chat Interface**: Query your knowledge base conversationally
## AI Configuration & Model Selection
### Setting Up AI Providers
Minne uses OpenAI-compatible APIs. Configure via environment variables or `config.yaml`:
- `OPENAI_API_KEY` (required): Your API key
- `OPENAI_BASE_URL` (optional): Custom provider URL (e.g., Ollama: `http://localhost:11434/v1`)
### Model Selection
1. Access the `/admin` page in your Minne instance
2. Select models for content processing and chat from your configured provider
3. **Content Processing Requirements**: The model must support structured outputs
4. **Embedding Dimensions**: Update this setting when changing embedding models (e.g., 1536 for `text-embedding-3-small`, 768 for `nomic-embed-text`)
## Roadmap
Current development focus:
- TUI frontend with system editor integration
- Enhanced reranking for improved retrieval recall
- Additional content type support
Feature requests and contributions are welcome!
## Development
```bash
# Run tests
cargo test
# Development build
cargo build
# Comprehensive linting
cargo clippy --workspace --all-targets --all-features
```
The codebase includes extensive unit tests. Integration tests and additional contributions are welcome.
## Contributing
I've developed Minne primarily for my own use, but having been in the self-hosted space for a long time and benefited from others' efforts, I thought I'd share it with the community. Feature requests and contributions are welcome; see [Vision](docs/vision.md) for the roadmap.
## License
Minne is licensed under the **GNU Affero General Public License v3.0 (AGPL-3.0)**. See the [LICENSE](LICENSE) file for details.

docs/architecture.md Normal file

@@ -0,0 +1,74 @@
# Architecture
## Tech Stack
| Layer | Technology |
|-------|------------|
| Backend | Rust with Axum (SSR) |
| Frontend | HTML + HTMX + minimal JS |
| Database | SurrealDB (graph, document, vector) |
| AI | OpenAI-compatible API |
| Web Processing | Headless Chromium |
## Crate Structure
```
minne/
├── main/ # Combined server + worker binary
├── api-router/ # REST API routes
├── html-router/ # SSR web interface
├── ingestion-pipeline/ # Content processing pipeline
├── retrieval-pipeline/ # Search and retrieval logic
├── common/ # Shared types, storage, utilities
├── evaluations/ # Benchmarking framework
└── json-stream-parser/ # Streaming JSON utilities
```
## Process Modes
| Binary | Purpose |
|--------|---------|
| `main` | All-in-one: serves UI and processes content |
| `server` | UI and API only (no background processing) |
| `worker` | Background processing only (no UI) |
Split deployment is useful for scaling or resource isolation.
## Data Flow
```
Content In → Ingestion Pipeline → SurrealDB
                  ├── Entity Extraction
                  ├── Embedding Generation
                  └── Graph Relationships

Query → Retrieval Pipeline → Results
            ├── Vector Search + FTS + Graph
            └── RRF Fusion → (Optional Rerank) → Response
```
## Database Schema
SurrealDB stores:
- **TextContent** — Raw ingested content
- **TextChunk** — Chunked content with embeddings
- **KnowledgeEntity** — Extracted entities (people, concepts, etc.)
- **KnowledgeRelationship** — Connections between entities
- **User** — Authentication and preferences
- **SystemSettings** — Model configuration
Embeddings are stored in dedicated tables with HNSW indexes for fast vector search.
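As a rough illustration of what such an index looks like in SurrealQL (the table, field, and dimension below are assumptions rather than Minne's actual schema, and syntax details vary across SurrealDB versions):

```sql
-- Illustrative sketch only; not Minne's real schema
DEFINE INDEX chunk_embedding_idx ON TABLE text_chunk
    FIELDS embedding HNSW DIMENSION 1536 DIST COSINE;
```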
## Retrieval Strategy
1. **Collect candidates** — Vector similarity + full-text search
2. **Merge ranks** — Reciprocal Rank Fusion (RRF)
3. **Attach context** — Link chunks to parent entities
4. **Rerank** (optional) — Cross-encoder rescoring
5. **Return** — Top-k results with metadata
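Step 2's Reciprocal Rank Fusion is simple enough to sketch: each candidate scores the sum of 1/(k + rank) over every list it appears in, where k is a smoothing constant (60 is a common choice; the lists below are toy data, not Minne's internals):

```shell
# Toy Reciprocal Rank Fusion: score(doc) = sum over lists of 1/(k + rank)
best="$(awk 'BEGIN {
  n1 = split("a b c", l1, " ")    # e.g. ranks from vector search
  n2 = split("a c d", l2, " ")    # e.g. ranks from full-text search
  k = 60                          # RRF smoothing constant
  for (i = 1; i <= n1; i++) s[l1[i]] += 1 / (k + i)
  for (i = 1; i <= n2; i++) s[l2[i]] += 1 / (k + i)
  best = ""
  for (d in s) if (best == "" || s[d] > s[best]) best = d
  print best
}')"
echo "$best"
```

Here `a` ranks near the top of both toy lists, so it tops the fused ranking even though neither list alone is authoritative.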

docs/configuration.md Normal file

@@ -0,0 +1,77 @@
# Configuration
Minne can be configured via environment variables or a `config.yaml` file. Environment variables take precedence.
## Required Settings
| Variable | Description | Example |
|----------|-------------|---------|
| `OPENAI_API_KEY` | API key for OpenAI-compatible endpoint | `sk-...` |
| `SURREALDB_ADDRESS` | WebSocket address of SurrealDB | `ws://127.0.0.1:8000` |
| `SURREALDB_USERNAME` | SurrealDB username | `root_user` |
| `SURREALDB_PASSWORD` | SurrealDB password | `root_password` |
| `SURREALDB_DATABASE` | Database name | `minne_db` |
| `SURREALDB_NAMESPACE` | Namespace | `minne_ns` |
## Optional Settings
| Variable | Description | Default |
|----------|-------------|---------|
| `HTTP_PORT` | Server port | `3000` |
| `DATA_DIR` | Local data directory | `./data` |
| `OPENAI_BASE_URL` | Custom AI provider URL | OpenAI default |
| `RUST_LOG` | Logging level | `info` |
### Reranking (Optional)
| Variable | Description | Default |
|----------|-------------|---------|
| `RERANKING_ENABLED` | Enable FastEmbed reranking | `false` |
| `RERANKING_POOL_SIZE` | Concurrent reranker workers | `2` |
| `FASTEMBED_CACHE_DIR` | Model cache directory | `<data_dir>/fastembed/reranker` |
> [!NOTE]
> Enabling reranking downloads ~1.1 GB of model data on first startup.
## Example config.yaml
```yaml
surrealdb_address: "ws://127.0.0.1:8000"
surrealdb_username: "root_user"
surrealdb_password: "root_password"
surrealdb_database: "minne_db"
surrealdb_namespace: "minne_ns"
openai_api_key: "sk-your-key-here"
data_dir: "./minne_data"
http_port: 3000
# Optional reranking
reranking_enabled: true
reranking_pool_size: 2
```
## AI Provider Setup
Minne works with any OpenAI-compatible API that supports structured outputs.
### OpenAI (Default)
Set `OPENAI_API_KEY` only. The default base URL points to OpenAI.
### Ollama
```bash
OPENAI_API_KEY="ollama"
OPENAI_BASE_URL="http://localhost:11434/v1"
```
### Other Providers
Any provider exposing an OpenAI-compatible endpoint works. Set `OPENAI_BASE_URL` accordingly.
## Model Selection
1. Access `/admin` in your Minne instance
2. Select models for content processing and chat
3. **Content Processing**: Must support structured outputs
4. **Embedding Dimensions**: Update when changing embedding models (e.g., 1536 for `text-embedding-3-small`)

docs/features.md Normal file

@@ -0,0 +1,64 @@
# Features
## Search vs Chat
**Search** — Use when you know what you're looking for. Full-text search matches query terms across your content.
**Chat** — Use when exploring concepts or reasoning about your knowledge. The AI analyzes your query and retrieves relevant context from your entire knowledge base.
## Content Processing
Minne automatically processes saved content:
1. **Web scraping** extracts readable text from URLs (via headless Chrome)
2. **Text analysis** identifies key concepts and relationships
3. **Graph creation** builds connections between related content
4. **Embedding generation** enables semantic search
## Knowledge Graph
Explore your knowledge as an interactive network:
- **Manual curation** — Create entities and relationships yourself
- **AI automation** — Let AI extract entities and discover relationships
- **Hybrid approach** — AI suggests connections for your approval
The D3-based graph visualization shows entities as nodes and relationships as edges.
## Hybrid Retrieval
Minne combines multiple retrieval strategies:
- **Vector similarity** — Semantic matching via embeddings
- **Full-text search** — Keyword matching with BM25
- **Graph traversal** — Following relationships between entities
Results are merged using Reciprocal Rank Fusion (RRF) for optimal relevance.
## Reranking (Optional)
When enabled, retrieval results are rescored with a cross-encoder model for improved relevance. Powered by [fastembed-rs](https://github.com/Anush008/fastembed-rs).
**Trade-offs:**
- Downloads ~1.1 GB of model data
- Adds latency per query
- Potentially improves answer quality; see the [blog post](https://blog.stark.pub/posts/eval-retrieval-refactor/)
Enable via `RERANKING_ENABLED=true`. See [Configuration](./configuration.md).
## Multi-Format Ingestion
Supported content types:
- Plain text and notes
- URLs (web pages)
- PDF documents
- Audio files
- Images
## Scratchpad
Quickly capture content without committing to permanent storage. Convert to full content when ready.
## iOS Shortcut
Use the [Minne iOS Shortcut](https://www.icloud.com/shortcuts/e433fbd7602f4e2eaa70dca162323477) for quick content capture from your phone.

docs/installation.md Normal file

@@ -0,0 +1,67 @@
# Installation
Minne can be installed through several methods. Choose the one that best fits your setup.
## Docker Compose (Recommended)
The fastest way to get Minne running with all dependencies:
```bash
git clone https://github.com/perstarkse/minne.git
cd minne
docker compose up -d
```
The included `docker-compose.yml` handles SurrealDB and Chromium automatically.
**Required:** Set your `OPENAI_API_KEY` in `docker-compose.yml` before starting.
## Nix
Run Minne directly with Nix (includes Chromium):
```bash
nix run 'github:perstarkse/minne#main'
```
Configure via environment variables or a `config.yaml` file. See [Configuration](./configuration.md).
## Pre-built Binaries
Download binaries for Windows, macOS, and Linux from [GitHub Releases](https://github.com/perstarkse/minne/releases/latest).
**Requirements:**
- SurrealDB instance (local or remote)
- Chromium (for web scraping)
## Build from Source
```bash
git clone https://github.com/perstarkse/minne.git
cd minne
cargo build --release --bin main
```
The binary will be at `target/release/main`.
**Requirements:**
- Rust toolchain
- SurrealDB accessible at configured address
- Chromium in PATH
## Process Modes
Minne offers flexible deployment:
| Binary | Description |
|--------|-------------|
| `main` | Combined server + worker (recommended) |
| `server` | Web interface and API only |
| `worker` | Background processing only |
For most users, `main` is the right choice. Split deployments are useful for resource optimization or scaling.
## Next Steps
- [Configuration](./configuration.md) — Environment variables and config.yaml
- [Features](./features.md) — What Minne can do

docs/vision.md Normal file

@@ -0,0 +1,48 @@
# Vision
## The "Why" Behind Minne
Personal knowledge management has always fascinated me. I wanted something that made it incredibly easy to capture content—snippets of text, URLs, media—while automatically discovering connections between ideas. But I also wanted control over my knowledge structure.
Traditional tools like Logseq and Obsidian are excellent, but manual linking often becomes a hindrance. Fully automated systems sometimes miss important context or create relationships I wouldn't have chosen.
Minne offers the best of both worlds: effortless capture with AI-assisted relationship discovery, but with flexibility to manually curate, edit, or override connections. Let AI handle the heavy lifting, take full control yourself, or use a hybrid approach where AI suggests and you approve.
## Design Principles
- **Capture should be instant** — No friction between thought and storage
- **Connections should emerge** — AI finds relationships you might miss
- **Control should be optional** — Automate by default, curate when it matters
- **Privacy should be default** — Self-hosted, your data stays yours
## Roadmap
### Near-term
- [ ] TUI frontend with system editor integration
- [ ] Enhanced retrieval recall via improved reranking
- [ ] Additional content type support (e-books, research papers)
### Medium-term
- [ ] Embedded SurrealDB option (zero-config `nix run` with just `OPENAI_API_KEY`)
- [ ] Browser extension for seamless capture
- [ ] Mobile-native apps
### Long-term
- [ ] Federated knowledge sharing (opt-in)
- [ ] Local LLM integration (fully offline operation)
- [ ] Plugin system for custom entity extractors
## Related Projects
If Minne isn't quite right for you, check out:
- [Karakeep](https://github.com/karakeep-app/karakeep) (formerly Hoarder) — Excellent bookmark/read-later with AI tagging
- [Logseq](https://logseq.com/) — Outliner-based PKM with manual linking
- [Obsidian](https://obsidian.md/) — Markdown-based PKM with plugin ecosystem
## Contributing
Feature requests and contributions are welcome. Minne was built for personal use first, but the self-hosted community benefits when we share.

evaluations/README.md Normal file

@@ -0,0 +1,212 @@
# Evaluations
The `evaluations` crate provides a retrieval evaluation framework for benchmarking Minne's information retrieval pipeline against standard datasets.
## Quick Start
```bash
# Run SQuAD v2.0 evaluation (vector-only, recommended)
cargo run --package evaluations -- --ingest-chunks-only
# Run a specific dataset
cargo run --package evaluations -- --dataset fiqa --ingest-chunks-only
# Convert dataset only (no evaluation)
cargo run --package evaluations -- --convert-only
```
## Prerequisites
### 1. SurrealDB
Start a SurrealDB instance before running evaluations:
```bash
docker-compose up -d surrealdb
```
Or start SurrealDB directly with the default credentials:
```bash
surreal start --user root_user --pass root_password
```
### 2. Download Raw Datasets
Raw datasets must be downloaded manually and placed in `evaluations/data/raw/`. See [Dataset Sources](#dataset-sources) below for links and formats.
## Directory Structure
```
evaluations/
├── data/
│ ├── raw/ # Downloaded raw datasets (manual)
│ │ ├── squad/ # SQuAD v2.0
│ │ ├── nq-dev/ # Natural Questions
│ │ ├── fiqa/ # BEIR: FiQA-2018
│ │ ├── fever/ # BEIR: FEVER
│ │ ├── hotpotqa/ # BEIR: HotpotQA
│ │ └── ... # Other BEIR subsets
│ └── converted/ # Auto-generated (Minne JSON format)
├── cache/ # Ingestion and embedding caches
├── reports/ # Evaluation output (JSON + Markdown)
├── manifest.yaml # Dataset and slice definitions
└── src/ # Evaluation source code
```
## Dataset Sources
### SQuAD v2.0
Download and place at `data/raw/squad/dev-v2.0.json`:
```bash
mkdir -p evaluations/data/raw/squad
curl -L https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json \
-o evaluations/data/raw/squad/dev-v2.0.json
```
### Natural Questions (NQ)
Download and place at `data/raw/nq-dev/dev-all.jsonl`:
```bash
mkdir -p evaluations/data/raw/nq-dev
# Download from Google's Natural Questions page or HuggingFace
# File: dev-all.jsonl (simplified JSONL format)
```
Source: [Google Natural Questions](https://ai.google.com/research/NaturalQuestions)
### BEIR Datasets
All BEIR datasets share the same directory structure:
```
data/raw/<dataset>/
├── corpus.jsonl # Document corpus
├── queries.jsonl # Query set
└── qrels/
└── test.tsv # Relevance judgments (or dev.tsv)
```
Download datasets from the [BEIR Benchmark repository](https://github.com/beir-cellar/beir). Each dataset zip extracts to the required directory structure.
| Dataset | Directory |
|------------|---------------|
| FEVER | `fever/` |
| FiQA-2018 | `fiqa/` |
| HotpotQA | `hotpotqa/` |
| NFCorpus | `nfcorpus/` |
| Quora | `quora/` |
| TREC-COVID | `trec-covid/` |
| SciFact | `scifact/` |
| NQ (BEIR) | `nq/` |
Example download:
```bash
cd evaluations/data/raw
curl -L https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/fiqa.zip -o fiqa.zip
unzip fiqa.zip && rm fiqa.zip
```
## Dataset Conversion
Raw datasets are automatically converted to Minne's internal JSON format on first run. To force reconversion:
```bash
cargo run --package evaluations -- --force-convert
```
Converted files are saved to `data/converted/` and cached for subsequent runs.
## CLI Reference
### Common Options
| Flag | Description | Default |
|------|-------------|---------|
| `--dataset <NAME>` | Dataset to evaluate | `squad-v2` |
| `--limit <N>` | Max questions to evaluate (0 = all) | `200` |
| `--k <N>` | Precision@k cutoff | `5` |
| `--slice <ID>` | Use a predefined slice from manifest | — |
| `--rerank` | Enable FastEmbed reranking stage | disabled |
| `--embedding-backend <BE>` | `fastembed` or `hashed` | `fastembed` |
| `--ingest-chunks-only` | Skip entity extraction, ingest only text chunks | disabled |
> [!TIP]
> Use `--ingest-chunks-only` when evaluating vector-only retrieval strategies. This skips the LLM-based entity extraction and graph generation, significantly speeding up ingestion while focusing on pure chunk-based vector search.
### Available Datasets
```
squad-v2, natural-questions, beir, fever, fiqa, hotpotqa,
nfcorpus, quora, trec-covid, scifact, nq-beir
```
### Database Configuration
| Flag | Environment | Default |
|------|-------------|---------|
| `--db-endpoint` | `EVAL_DB_ENDPOINT` | `ws://127.0.0.1:8000` |
| `--db-username` | `EVAL_DB_USERNAME` | `root_user` |
| `--db-password` | `EVAL_DB_PASSWORD` | `root_password` |
| `--db-namespace` | `EVAL_DB_NAMESPACE` | auto-generated |
| `--db-database` | `EVAL_DB_DATABASE` | auto-generated |
### Example Runs
```bash
# Vector-only evaluation (recommended for benchmarking)
cargo run --package evaluations -- \
--dataset fiqa \
--ingest-chunks-only \
--limit 200
# Full FiQA evaluation with reranking
cargo run --package evaluations -- \
--dataset fiqa \
--ingest-chunks-only \
--limit 500 \
--rerank \
--k 10
# Use a predefined slice for reproducibility
cargo run --package evaluations -- --slice fiqa-test-200 --ingest-chunks-only
# Run the mixed BEIR benchmark
cargo run --package evaluations -- --dataset beir --slice beir-mix-600 --ingest-chunks-only
```
## Slices
Slices are predefined, reproducible subsets defined in `manifest.yaml`. Each slice specifies:
- **limit**: Number of questions
- **corpus_limit**: Maximum corpus size
- **seed**: Fixed RNG seed for reproducibility
View available slices in [manifest.yaml](./manifest.yaml).
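Putting those fields together, a slice entry plausibly looks something like the sketch below; everything beyond `limit`, `corpus_limit`, and `seed` is guessed, so treat `manifest.yaml` itself as the source of truth:

```yaml
# Hypothetical shape, not copied from manifest.yaml
slices:
  fiqa-test-200:        # slice id referenced via --slice
    dataset: fiqa
    limit: 200          # number of questions
    corpus_limit: 5000  # maximum corpus size
    seed: 42            # fixed RNG seed for reproducibility
```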
## Reports
Evaluations generate reports in `reports/`:
- **JSON**: Full structured results (`*-report.json`)
- **Markdown**: Human-readable summary with sample mismatches (`*-report.md`)
- **History**: Timestamped run history (`history/`)
## Performance Tuning
```bash
# Log per-stage performance timings
cargo run --package evaluations -- --perf-log-console
# Save telemetry to file
cargo run --package evaluations -- --perf-log-json ./perf.json
```
## License
See [../LICENSE](../LICENSE).