mirror of
https://github.com/perstarkse/minne.git
synced 2026-03-26 19:31:32 +01:00
docs: evaluations instructions and readme refactoring
74
docs/architecture.md
Normal file
@@ -0,0 +1,74 @@
# Architecture

## Tech Stack

| Layer | Technology |
|-------|------------|
| Backend | Rust with Axum (SSR) |
| Frontend | HTML + HTMX + minimal JS |
| Database | SurrealDB (graph, document, vector) |
| AI | OpenAI-compatible API |
| Web Processing | Headless Chromium |

## Crate Structure

```
minne/
├── main/                # Combined server + worker binary
├── api-router/          # REST API routes
├── html-router/         # SSR web interface
├── ingestion-pipeline/  # Content processing pipeline
├── retrieval-pipeline/  # Search and retrieval logic
├── common/              # Shared types, storage, utilities
├── evaluations/         # Benchmarking framework
└── json-stream-parser/  # Streaming JSON utilities
```

## Process Modes

| Binary | Purpose |
|--------|---------|
| `main` | All-in-one: serves UI and processes content |
| `server` | UI and API only (no background processing) |
| `worker` | Background processing only (no UI) |

Split deployment is useful for scaling or resource isolation.
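
The split between the three binaries can be pictured as two independent responsibilities. The following is a minimal sketch of that idea only; the `ProcessMode` enum and its method names are hypothetical and are not taken from Minne's source.

```rust
/// Hypothetical model of the three binaries listed in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProcessMode {
    Main,
    Server,
    Worker,
}

impl ProcessMode {
    /// Map a binary name to a mode; unknown names yield None.
    fn from_binary_name(name: &str) -> Option<Self> {
        match name {
            "main" => Some(ProcessMode::Main),
            "server" => Some(ProcessMode::Server),
            "worker" => Some(ProcessMode::Worker),
            _ => None,
        }
    }

    /// True when the mode serves the UI and API.
    fn serves_http(self) -> bool {
        matches!(self, ProcessMode::Main | ProcessMode::Server)
    }

    /// True when the mode runs background content processing.
    fn processes_content(self) -> bool {
        matches!(self, ProcessMode::Main | ProcessMode::Worker)
    }
}
```

Under this reading, a split deployment runs one `server` and one or more `worker` processes, which together cover the same two responsibilities as a single `main`.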

## Data Flow

```
Content In → Ingestion Pipeline → SurrealDB
                    ↓
            Entity Extraction
                    ↓
           Embedding Generation
                    ↓
           Graph Relationships

Query → Retrieval Pipeline → Results
               ↓
   Vector Search + FTS + Graph
               ↓
   RRF Fusion → (Optional Rerank) → Response
```

## Database Schema

SurrealDB stores:

- **TextContent** — Raw ingested content
- **TextChunk** — Chunked content with embeddings
- **KnowledgeEntity** — Extracted entities (people, concepts, etc.)
- **KnowledgeRelationship** — Connections between entities
- **User** — Authentication and preferences
- **SystemSettings** — Model configuration

Embeddings are stored in dedicated tables with HNSW indexes for fast vector search.

## Retrieval Strategy

1. **Collect candidates** — Vector similarity + full-text search
2. **Merge ranks** — Reciprocal Rank Fusion (RRF)
3. **Attach context** — Link chunks to parent entities
4. **Rerank** (optional) — Cross-encoder rescoring
5. **Return** — Top-k results with metadata
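
The "Merge ranks" step can be illustrated with the standard RRF formula, in which each item scores the sum of `1 / (k + rank)` over every list it appears in. This is a sketch of the general technique, not Minne's actual implementation; the function name `rrf_fuse` is hypothetical, and the constant `k` (commonly 60) is an assumption since the docs do not state the value used.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of ids with Reciprocal Rank Fusion.
/// Each id scores the sum of 1 / (k + rank) over the lists that contain it,
/// where rank starts at 1 for the best hit in a list.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (index, id) in ranking.iter().enumerate() {
            let rank = (index + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties are broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Calling `rrf_fuse(&[vector_hits, fts_hits], 60.0)` with two candidate lists returns the ids ordered by fused score; an id that appears in both lists outranks one that appears in only one list at a similar position.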
77
docs/configuration.md
Normal file
@@ -0,0 +1,77 @@
# Configuration

Minne can be configured via environment variables or a `config.yaml` file. Environment variables take precedence.
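
The precedence rule can be sketched as a three-level lookup: environment variable, then `config.yaml` value, then default. The helper `resolve_setting` below is hypothetical and only illustrates the ordering. It assumes the environment and the parsed file are already available as string maps (for example, the first could be filled from `std::env::vars()`), and that file keys are the lowercase form of the variable names, as in the examples on this page.

```rust
use std::collections::HashMap;

/// Resolve one setting: environment first, then config file, then default.
/// `env` is keyed by upper-case names (HTTP_PORT), `file` by lower-case (http_port).
fn resolve_setting(
    key: &str,
    env: &HashMap<String, String>,
    file: &HashMap<String, String>,
    default: Option<&str>,
) -> Option<String> {
    env.get(&key.to_uppercase())
        .or_else(|| file.get(&key.to_lowercase()))
        .cloned()
        .or_else(|| default.map(str::to_string))
}
```

A required setting would be called with `default` set to `None`, so a missing value surfaces as `None` instead of being silently filled in.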

## Required Settings

| Variable | Description | Example |
|----------|-------------|---------|
| `OPENAI_API_KEY` | API key for OpenAI-compatible endpoint | `sk-...` |
| `SURREALDB_ADDRESS` | WebSocket address of SurrealDB | `ws://127.0.0.1:8000` |
| `SURREALDB_USERNAME` | SurrealDB username | `root_user` |
| `SURREALDB_PASSWORD` | SurrealDB password | `root_password` |
| `SURREALDB_DATABASE` | Database name | `minne_db` |
| `SURREALDB_NAMESPACE` | Namespace | `minne_ns` |

## Optional Settings

| Variable | Description | Default |
|----------|-------------|---------|
| `HTTP_PORT` | Server port | `3000` |
| `DATA_DIR` | Local data directory | `./data` |
| `OPENAI_BASE_URL` | Custom AI provider URL | OpenAI default |
| `RUST_LOG` | Logging level | `info` |

### Reranking (Optional)

| Variable | Description | Default |
|----------|-------------|---------|
| `RERANKING_ENABLED` | Enable FastEmbed reranking | `false` |
| `RERANKING_POOL_SIZE` | Concurrent reranker workers | `2` |
| `FASTEMBED_CACHE_DIR` | Model cache directory | `<data_dir>/fastembed/reranker` |

> [!NOTE]
> Enabling reranking downloads ~1.1 GB of model data on first startup.

## Example config.yaml

```yaml
surrealdb_address: "ws://127.0.0.1:8000"
surrealdb_username: "root_user"
surrealdb_password: "root_password"
surrealdb_database: "minne_db"
surrealdb_namespace: "minne_ns"
openai_api_key: "sk-your-key-here"
data_dir: "./minne_data"
http_port: 3000

# Optional reranking
reranking_enabled: true
reranking_pool_size: 2
```

## AI Provider Setup

Minne works with any OpenAI-compatible API that supports structured outputs.

### OpenAI (Default)

Set `OPENAI_API_KEY` only. The default base URL points to OpenAI.

### Ollama

```bash
OPENAI_API_KEY="ollama"
OPENAI_BASE_URL="http://localhost:11434/v1"
```

### Other Providers

Any provider exposing an OpenAI-compatible endpoint works. Set `OPENAI_BASE_URL` accordingly.

## Model Selection

1. Access `/admin` in your Minne instance
2. Select models for content processing and chat
3. **Content Processing**: Must support structured outputs
4. **Embedding Dimensions**: Update when changing embedding models (e.g., 1536 for `text-embedding-3-small`)
64
docs/features.md
Normal file
@@ -0,0 +1,64 @@
# Features

## Search vs Chat

**Search** — Use when you know what you're looking for. Full-text search matches query terms across your content.

**Chat** — Use when exploring concepts or reasoning about your knowledge. The AI analyzes your query and retrieves relevant context from your entire knowledge base.

## Content Processing

Minne automatically processes saved content:

1. **Web scraping** extracts readable text from URLs (via headless Chromium)
2. **Text analysis** identifies key concepts and relationships
3. **Graph creation** builds connections between related content
4. **Embedding generation** enables semantic search

## Knowledge Graph

Explore your knowledge as an interactive network:

- **Manual curation** — Create entities and relationships yourself
- **AI automation** — Let AI extract entities and discover relationships
- **Hybrid approach** — AI suggests connections for your approval

The D3-based graph visualization shows entities as nodes and relationships as edges.

## Hybrid Retrieval

Minne combines multiple retrieval strategies:

- **Vector similarity** — Semantic matching via embeddings
- **Full-text search** — Keyword matching with BM25
- **Graph traversal** — Following relationships between entities

Results are merged using Reciprocal Rank Fusion (RRF), which favors items that rank well across several strategies.
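
To make "semantic matching via embeddings" concrete, here is a minimal sketch of comparing two embedding vectors. It assumes cosine similarity as the metric, which these docs do not confirm; in Minne the comparison happens inside SurrealDB's vector index, and the function name `cosine_similarity` is only illustrative.

```rust
/// Cosine similarity between two embedding vectors.
/// Returns None when the lengths differ, the vectors are empty,
/// or either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

The length check also shows why the embedding dimension has to match the model: vectors produced by models with different dimensions cannot be compared.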

## Reranking (Optional)

When enabled, retrieval results are rescored with a cross-encoder model for improved relevance. Powered by [fastembed-rs](https://github.com/Anush008/fastembed-rs).

**Trade-offs:**
- Downloads ~1.1 GB of model data
- Adds latency per query
- Potentially improves answer quality; see the [blog post](https://blog.stark.pub/posts/eval-retrieval-refactor/)

Enable via `RERANKING_ENABLED=true`. See [Configuration](./configuration.md).
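
The rescoring step can be sketched independently of any model: score each candidate against the query, sort by that score, and keep the best few. The code below is an illustration under stated assumptions, not Minne's code and not the fastembed-rs API. The `rerank` function is hypothetical, and `word_overlap` is a toy stand-in for a real cross-encoder.

```rust
/// Rescore candidates with a caller-supplied scoring function and keep the top_k.
fn rerank<F>(query: &str, candidates: Vec<String>, top_k: usize, score: F) -> Vec<(String, f32)>
where
    F: Fn(&str, &str) -> f32,
{
    let mut scored: Vec<(String, f32)> = candidates
        .into_iter()
        .map(|text| {
            let s = score(query, &text);
            (text, s)
        })
        .collect();
    // Highest score first; the sort is stable, so ties keep their input order.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(top_k);
    scored
}

/// Toy stand-in for a cross-encoder: fraction of query words found in the text.
fn word_overlap(query: &str, text: &str) -> f32 {
    let words: Vec<&str> = query.split_whitespace().collect();
    if words.is_empty() {
        return 0.0;
    }
    let hits = words.iter().filter(|w| text.contains(**w)).count();
    hits as f32 / words.len() as f32
}
```

Because every candidate is scored together with the query, the cost grows with the number of candidates, which is where the per-query latency listed above comes from.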

## Multi-Format Ingestion

Supported content types:
- Plain text and notes
- URLs (web pages)
- PDF documents
- Audio files
- Images

## Scratchpad

Quickly capture content without committing to permanent storage. Convert to full content when ready.

## iOS Shortcut

Use the [Minne iOS Shortcut](https://www.icloud.com/shortcuts/e433fbd7602f4e2eaa70dca162323477) for quick content capture from your phone.
67
docs/installation.md
Normal file
@@ -0,0 +1,67 @@
# Installation

Minne can be installed through several methods. Choose the one that best fits your setup.

## Docker Compose (Recommended)

The fastest way to get Minne running with all dependencies:

```bash
git clone https://github.com/perstarkse/minne.git
cd minne
docker compose up -d
```

The included `docker-compose.yml` handles SurrealDB and Chromium automatically.

**Required:** Set your `OPENAI_API_KEY` in `docker-compose.yml` before starting.

## Nix

Run Minne directly with Nix (includes Chromium):

```bash
nix run 'github:perstarkse/minne#main'
```

Configure via environment variables or a `config.yaml` file. See [Configuration](./configuration.md).

## Pre-built Binaries

Download binaries for Windows, macOS, and Linux from [GitHub Releases](https://github.com/perstarkse/minne/releases/latest).

**Requirements:**
- SurrealDB instance (local or remote)
- Chromium (for web scraping)

## Build from Source

```bash
git clone https://github.com/perstarkse/minne.git
cd minne
cargo build --release --bin main
```

The binary will be at `target/release/main`.

**Requirements:**
- Rust toolchain
- SurrealDB accessible at configured address
- Chromium in PATH

## Process Modes

Minne offers flexible deployment:

| Binary | Description |
|--------|-------------|
| `main` | Combined server + worker (recommended) |
| `server` | Web interface and API only |
| `worker` | Background processing only |

For most users, `main` is the right choice. Split deployments are useful for resource optimization or scaling.

## Next Steps

- [Configuration](./configuration.md) — Environment variables and config.yaml
- [Features](./features.md) — What Minne can do
48
docs/vision.md
Normal file
@@ -0,0 +1,48 @@
# Vision

## The "Why" Behind Minne

Personal knowledge management has always fascinated me. I wanted something that made it incredibly easy to capture content—snippets of text, URLs, media—while automatically discovering connections between ideas. But I also wanted control over my knowledge structure.

Traditional tools like Logseq and Obsidian are excellent, but manual linking often becomes a hindrance. Fully automated systems sometimes miss important context or create relationships I wouldn't have chosen.

Minne offers the best of both worlds: effortless capture with AI-assisted relationship discovery, but with flexibility to manually curate, edit, or override connections. Let AI handle the heavy lifting, take full control yourself, or use a hybrid approach where AI suggests and you approve.

## Design Principles

- **Capture should be instant** — No friction between thought and storage
- **Connections should emerge** — AI finds relationships you might miss
- **Control should be optional** — Automate by default, curate when it matters
- **Privacy should be default** — Self-hosted, your data stays yours

## Roadmap

### Near-term

- [ ] TUI frontend with system editor integration
- [ ] Enhanced retrieval recall via improved reranking
- [ ] Additional content type support (e-books, research papers)

### Medium-term

- [ ] Embedded SurrealDB option (zero-config `nix run` with just `OPENAI_API_KEY`)
- [ ] Browser extension for seamless capture
- [ ] Mobile-native apps

### Long-term

- [ ] Federated knowledge sharing (opt-in)
- [ ] Local LLM integration (fully offline operation)
- [ ] Plugin system for custom entity extractors

## Related Projects

If Minne isn't quite right for you, check out:

- [Karakeep](https://github.com/karakeep-app/karakeep) (formerly Hoarder) — Excellent bookmark/read-later with AI tagging
- [Logseq](https://logseq.com/) — Outliner-based PKM with manual linking
- [Obsidian](https://obsidian.md/) — Markdown-based PKM with plugin ecosystem

## Contributing

Feature requests and contributions are welcome. Minne was built for personal use first, but the self-hosted community benefits when we share.