mirror of
https://github.com/perstarkse/minne.git
synced 2026-06-24 10:56:29 +02:00
3.9 KiB
3.9 KiB
Evaluations crate refactor plan
This document records the architecture review and the simplification work applied to the
evaluations crate. No backwards compatibility is maintained for converted JSON layouts,
legacy report history, or old cache artifact formats.
Goals
- Smaller, linear pipeline (no state machine ceremony)
- Sharded converted store for all datasets (memory-efficient partial loading)
- Slice-first loading when a catalog slice is selected
- In-memory SurrealDB for ingestion (no ephemeral server namespaces)
- Single DB lifecycle module (
db/) - CLI helpers under
cli/
Primary workflow
# One-time prep (converts raw data if needed, builds slice ledger, corpus cache, DB seed)
cargo eval --warm --dataset beir --slice beir-mix-600
# Check readiness
cargo eval --status --dataset beir --slice beir-mix-600
# Steady-state benchmark
cargo eval --dataset beir --slice beir-mix-600 --require-ready
Default dataset is beir. Chunk-only ingestion is the default; pass --include-entities to
opt into entity extraction (requires OPENAI_API_KEY). Slice tuning such as
negative_multiplier lives in manifest.yaml (e.g. beir-mix-600 uses 9.0).
Cache layers (after refactor)
| Layer | Location | Purpose |
|---|---|---|
| Converted store | data/converted/<name>/ |
Sharded paragraphs + question catalog |
| Slice ledger | cache/slices/<dataset>/<slice-id>.json |
Deterministic questions + paragraph set |
| Corpus cache | cache/ingested/<dataset>/<slice-id>/ |
Ingestion paragraph shards, manifest, and namespace reuse seed |
Namespace reuse state lives in the corpus manifest (metadata.namespace_seed), not a separate
snapshots/ tree. After upgrading, delete old *-minne.json monolithic files, any
cache/snapshots/ directories, and re-run --warm.
Phases applied
Phase 0 — dead code
- Removed unused
criteriondependency - Removed unused
EmbeddingCache - Updated README for current CLI
Phase 1 — structure
- Flattened pipeline to linear
async fnstages - Removed
eval.rshub; imports go to owning modules - Merged
namespace.rs,db_helpers.rs→db/; dropped standalonesnapshot.rs - Moved
status.rs→cli/status.rs - Fixed catalog slice bootstrap (build ledger when explicit slice manifest is missing)
Phase 2 — no legacy paths
- All datasets use sharded converted store only
- Removed legacy JSON layout and migration
- Removed legacy report history format
- Auto-apply first catalog slice when
--sliceomitted - Namespace seed folded into corpus manifest (removed
cache/snapshots/)
Phase 3 — performance
- Ingestion always uses in-memory SurrealDB
- Slice-first partial load when ledger is complete
- Default catalog slice for dataset when
--slicenot passed - Split
slice/intomod.rs,build.rs, andbeir.rs
Phase 4 — BEIR mix slice-first
beiris a virtual mix: slice ledger references prefixed ids (fever-…,fiqa-…, …)- Conversion is qrels-closed per subset (only documents appearing in qrels, not full corpus)
- Slice ledger is resolved for the requested
--slice(catalog preset or custom id +--limit) - Only ledger paragraph ids are materialized into per-subset stores (
fever-minne/,fiqa-minne/, …) - No monolithic
beir-minne/merged store - Raw BEIR data lives in per-subset dirs under
data/raw/;data/raw/beiris a catalog placeholder
Do not re-introduce
- Monolithic
*-minne.jsonconverted files - Monolithic
beir-minne/merged converted store (use per-subset stores + virtual mix loader) state-machinespipeline for this linear floweval.rsre-export hub- Legacy history migration in reports
- Ephemeral
ingest_eval_*namespaces on the shared SurrealDB server - Separate
cache/snapshots/namespace state files
Open follow-ups
- Generate
DatasetKindfrommanifest.yamlat build time - Split
report.rswhen touching reporting again