mirror of https://github.com/perstarkse/minne.git synced 2026-05-31 03:40:38 +02:00


Per Stark 5c2d2e24d3 chore: refactor retrieval pipeline to chunk-first RRF with derived entities and slimmer eval surface.

Collapse the multi-strategy entity engine into one benchmarked chunk retrieval path, derive entities from retrieved chunks, and update consumers, docs, and clippy fixes across the workspace.

2026-05-30 22:19:08 +02:00


Features

Search vs Chat

Search — Use when you know what you're looking for. Full-text search matches query terms across your content.

Chat — Use when exploring concepts or reasoning about your knowledge. The AI analyzes your query and retrieves relevant context from your entire knowledge base.

Content Processing

Minne automatically processes saved content:

Web scraping extracts readable text from URLs (via headless Chrome)
Text analysis identifies key concepts and relationships
Graph creation builds connections between related content
Embedding generation enables semantic search

Knowledge Graph

Explore your knowledge as an interactive network:

Manual curation — Create entities and relationships yourself
AI automation — Let AI extract entities and discover relationships
Hybrid approach — AI suggests connections for your approval

The D3-based graph visualization shows entities as nodes and relationships as edges.

Hybrid Retrieval

Minne uses chunk-first hybrid retrieval over the knowledge base:

Vector similarity — Semantic matching via embeddings over text chunks
Full-text search — Keyword matching with BM25 over the same chunk index

The two ranked candidate lists are merged with Reciprocal Rank Fusion (RRF). When a caller needs knowledge entities (search, ingestion linking, relationship suggestion), entities are derived from the top retrieved chunks grouped by source_id.
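The fusion and grouping steps described above can be sketched as follows. This is a minimal illustration, not Minne's actual implementation: the `ChunkHit` struct, the `rrf_fuse` and `group_by_source` functions, and the `top_n` cutoff are hypothetical names introduced here, and the constant `k = 60` used in the usage note is the value commonly used for RRF rather than a value confirmed by this document. Each chunk scores the sum of `1 / (k + rank)` over the lists it appears in, with ranks starting at 1.

```rust
use std::collections::HashMap;

/// Hypothetical chunk hit carrying only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct ChunkHit {
    pub chunk_id: String,
    pub source_id: String,
}

/// Fuse ranked lists with Reciprocal Rank Fusion.
/// Assumes each chunk appears at most once per input list.
pub fn rrf_fuse(lists: &[Vec<ChunkHit>], k: f64) -> Vec<(ChunkHit, f64)> {
    let mut scores: HashMap<String, (ChunkHit, f64)> = HashMap::new();
    for list in lists {
        for (index, hit) in list.iter().enumerate() {
            // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
            let contribution = 1.0 / (k + index as f64 + 1.0);
            scores
                .entry(hit.chunk_id.clone())
                .and_modify(|entry| entry.1 += contribution)
                .or_insert_with(|| (hit.clone(), contribution));
        }
    }
    let mut fused: Vec<(ChunkHit, f64)> = scores.into_values().collect();
    // Highest score first; ties broken by chunk_id so the order is deterministic.
    fused.sort_by(|a, b| {
        b.1.total_cmp(&a.1)
            .then_with(|| a.0.chunk_id.cmp(&b.0.chunk_id))
    });
    fused
}

/// Group the top `top_n` fused chunks by `source_id`, preserving the order in
/// which each source first appears (that is, by its best-ranked chunk).
pub fn group_by_source(fused: &[(ChunkHit, f64)], top_n: usize) -> Vec<(String, Vec<String>)> {
    let mut order: Vec<String> = Vec::new();
    let mut groups: HashMap<String, Vec<String>> = HashMap::new();
    for (hit, _score) in fused.iter().take(top_n) {
        if !groups.contains_key(&hit.source_id) {
            order.push(hit.source_id.clone());
        }
        groups
            .entry(hit.source_id.clone())
            .or_default()
            .push(hit.chunk_id.clone());
    }
    order
        .into_iter()
        .map(|source| {
            let chunks = groups.remove(&source).unwrap_or_default();
            (source, chunks)
        })
        .collect()
}
```

Calling `rrf_fuse(&[vector_hits, bm25_hits], 60.0)` and passing the result to `group_by_source` yields one group per `source_id`. How those groups are then resolved into knowledge entities is not specified here and is left out of the sketch.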

Optional reranking can rescore the fused chunk list with a cross-encoder model; see below.

Reranking (Optional)

When enabled, retrieval results are rescored with a cross-encoder model for improved relevance. Powered by fastembed-rs.

Trade-offs:

Downloads ~1.1 GB of model data
Adds latency per query
May improve answer quality (see the blog post)

Enable via RERANKING_ENABLED=true. See Configuration.
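As a rough sketch of how such a flag might be read, the snippet below parses `RERANKING_ENABLED` from the environment. The helper names `parse_flag` and `reranking_enabled` are hypothetical. Treating an unset variable as disabled and matching `true` case-insensitively are assumptions of this example, and Minne's own configuration loader may behave differently.

```rust
use std::env;

/// Interpret a raw environment value as a boolean flag.
/// Assumption: only "true" (any case, surrounding whitespace ignored) enables it.
pub fn parse_flag(raw: Option<&str>) -> bool {
    match raw {
        Some(value) => value.trim().eq_ignore_ascii_case("true"),
        // Assumption: an unset variable means the feature stays off.
        None => false,
    }
}

/// Read the reranking toggle from the process environment.
pub fn reranking_enabled() -> bool {
    parse_flag(env::var("RERANKING_ENABLED").ok().as_deref())
}
```

Keeping the parsing in a pure function makes it possible to check the accepted values without touching the process environment.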

Multi-Format Ingestion

Supported content types:

Plain text and notes
URLs (web pages)
PDF documents
Audio files
Images

Scratchpad

Quickly capture content without committing to permanent storage. Convert to full content when ready.

iOS Shortcut

Use the Minne iOS Shortcut for quick content capture from your phone.
