mirror of
https://github.com/perstarkse/minne.git
synced 2026-06-24 19:06:30 +02:00
99 lines
3.9 KiB
Markdown
99 lines
3.9 KiB
Markdown
# Evaluations crate refactor plan
|
|
|
|
This document records the architecture review and the simplification work applied to the
|
|
`evaluations` crate. **No backwards compatibility** is maintained for converted JSON layouts,
|
|
legacy report history, or old cache artifact formats.
|
|
|
|
## Goals
|
|
|
|
- Smaller, linear pipeline (no state machine ceremony)
|
|
- Sharded converted store for **all** datasets (memory-efficient partial loading)
|
|
- Slice-first loading when a catalog slice is selected
|
|
- In-memory SurrealDB for ingestion (no ephemeral server namespaces)
|
|
- Single DB lifecycle module (`db/`)
|
|
- CLI helpers under `cli/`
|
|
|
|
## Primary workflow
|
|
|
|
```bash
|
|
# One-time prep (converts raw data if needed, builds slice ledger, corpus cache, DB seed)
|
|
cargo eval --warm --dataset beir --slice beir-mix-600
|
|
|
|
# Check readiness
|
|
cargo eval --status --dataset beir --slice beir-mix-600
|
|
|
|
# Steady-state benchmark
|
|
cargo eval --dataset beir --slice beir-mix-600 --require-ready
|
|
```
|
|
|
|
Default dataset is `beir`. Chunk-only ingestion is the default; pass `--include-entities` to
|
|
opt into entity extraction (requires `OPENAI_API_KEY`). Slice tuning such as
|
|
`negative_multiplier` lives in `manifest.yaml` (e.g. `beir-mix-600` uses `9.0`).
|
|
|
|
## Cache layers (after refactor)
|
|
|
|
| Layer | Location | Purpose |
|
|
|-------|----------|---------|
|
|
| Converted store | `data/converted/<name>/` | Sharded paragraphs + question catalog |
|
|
| Slice ledger | `cache/slices/<dataset>/<slice-id>.json` | Deterministic questions + paragraph set |
|
|
| Corpus cache | `cache/ingested/<dataset>/<slice-id>/` | Ingestion paragraph shards, manifest, and namespace reuse seed |
|
|
|
|
Namespace reuse state lives in the corpus manifest (`metadata.namespace_seed`), not a separate
|
|
`snapshots/` tree. After upgrading, delete old `*-minne.json` monolithic files, any
|
|
`cache/snapshots/` directories, and re-run `--warm`.
|
|
|
|
## Phases applied
|
|
|
|
### Phase 0 — dead code
|
|
|
|
- Removed unused `criterion` dependency
|
|
- Removed unused `EmbeddingCache`
|
|
- Updated README for current CLI
|
|
|
|
### Phase 1 — structure
|
|
|
|
- Flattened pipeline to linear `async fn` stages
|
|
- Removed `eval.rs` hub; imports go to owning modules
|
|
- Merged `namespace.rs`, `db_helpers.rs` → `db/`; dropped standalone `snapshot.rs`
|
|
- Moved `status.rs` → `cli/status.rs`
|
|
- Fixed catalog slice bootstrap (build ledger when explicit slice manifest is missing)
|
|
|
|
### Phase 2 — no legacy paths
|
|
|
|
- All datasets use sharded converted store only
|
|
- Removed legacy JSON layout and migration
|
|
- Removed legacy report history format
|
|
- Auto-apply first catalog slice when `--slice` omitted
|
|
- Namespace seed folded into corpus manifest (removed `cache/snapshots/`)
|
|
|
|
### Phase 3 — performance
|
|
|
|
- Ingestion always uses in-memory SurrealDB
|
|
- Slice-first partial load when ledger is complete
|
|
- Default catalog slice for dataset when `--slice` not passed
|
|
- Split `slice/` into `mod.rs`, `build.rs`, and `beir.rs`
|
|
|
|
### Phase 4 — BEIR mix slice-first
|
|
|
|
- `beir` is a virtual mix: slice ledger references prefixed ids (`fever-…`, `fiqa-…`, …)
|
|
- Conversion is **qrels-closed** per subset (only documents appearing in qrels, not full corpus)
|
|
- Slice ledger is resolved for the requested `--slice` (catalog preset or custom id + `--limit`)
|
|
- Only ledger paragraph ids are materialized into per-subset stores (`fever-minne/`, `fiqa-minne/`, …)
|
|
- No monolithic `beir-minne/` merged store
|
|
- Raw BEIR data lives in per-subset dirs under `data/raw/`; `data/raw/beir` is a catalog placeholder
|
|
|
|
## Do not re-introduce
|
|
|
|
- Monolithic `*-minne.json` converted files
|
|
- Monolithic `beir-minne/` merged converted store (use per-subset stores + virtual mix loader)
|
|
- `state-machines` pipeline for this linear flow
|
|
- `eval.rs` re-export hub
|
|
- Legacy history migration in reports
|
|
- Ephemeral `ingest_eval_*` namespaces on the shared SurrealDB server
|
|
- Separate `cache/snapshots/` namespace state files
|
|
|
|
## Open follow-ups
|
|
|
|
- Generate `DatasetKind` from `manifest.yaml` at build time
|
|
- Split `report.rs` when touching reporting again
|