chore: refactor retrieval pipeline to chunk-first RRF with derived entities and slimmer eval surface.

Collapse the multi-strategy entity engine into one benchmarked chunk retrieval path, derive entities from retrieved chunks, and update consumers, docs, and clippy fixes across the workspace.
Per Stark
2026-05-30 22:19:08 +02:00
parent c70141de35
commit 5c2d2e24d3
38 changed files with 1049 additions and 2614 deletions
+6 -5
@@ -27,13 +27,14 @@ The D3-based graph visualization shows entities as nodes and relationships as ed
## Hybrid Retrieval
-Minne combines multiple retrieval strategies:
+Minne uses chunk-first hybrid retrieval over the knowledge base:
-- **Vector similarity** — Semantic matching via embeddings
-- **Full-text search** — Keyword matching with BM25
-- **Graph traversal** — Following relationships between entities
+- **Vector similarity** — Semantic matching via embeddings over text chunks
+- **Full-text search** — Keyword matching with BM25 over the same chunk index
-Results are merged using Reciprocal Rank Fusion (RRF) for optimal relevance.
+The two ranked candidate lists are merged with Reciprocal Rank Fusion (RRF). When a caller needs knowledge entities (search, ingestion linking, relationship suggestion), entities are derived from the top retrieved chunks grouped by `source_id`.
+Optional **reranking** can rescore the fused chunk list with a cross-encoder model; see below.
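The fusion and grouping steps described in the new text can be sketched roughly as follows. This is an illustrative sketch, not Minne's actual code: the function names `rrf_fuse` and `group_by_source`, the `k` constant, the `top_n` cutoff, and the `source_of` lookup map are all hypothetical. Only RRF itself and grouping by `source_id` come from the text above. RRF scores each chunk as the sum of `1 / (k + rank)` over every list that contains it, with ranks starting at 1.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over several ranked lists of chunk ids.
/// Each list contributes 1 / (k + rank) to a chunk's score (rank is 1-based).
/// Hypothetical helper; the name and signature are assumptions.
pub fn rrf_fuse(lists: &[Vec<String>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (index, chunk_id) in list.iter().enumerate() {
            let rank = index as f64 + 1.0;
            *scores.entry(chunk_id.clone()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties are broken by chunk id so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}

/// Group the top `top_n` fused chunks by their `source_id`.
/// Groups are ordered by their best-ranked chunk (an assumption about ordering);
/// chunks missing from `source_of` are skipped.
pub fn group_by_source(
    fused: &[(String, f64)],
    source_of: &HashMap<String, String>,
    top_n: usize,
) -> Vec<(String, Vec<String>)> {
    let mut groups: Vec<(String, Vec<String>)> = Vec::new();
    for (chunk_id, _score) in fused.iter().take(top_n) {
        if let Some(source_id) = source_of.get(chunk_id) {
            let existing = groups.iter().position(|(s, _)| s == source_id);
            match existing {
                Some(i) => groups[i].1.push(chunk_id.clone()),
                None => groups.push((source_id.clone(), vec![chunk_id.clone()])),
            }
        }
    }
    groups
}
```

A caller would pass the vector-similarity ranking and the BM25 ranking as the two lists, then resolve each resulting `source_id` group to an entity. The value `k = 60` is a commonly used default for RRF, not something the diff specifies.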
## Reranking (Optional)