refactor: replace headless_chrome with lighter alternatives

Per Stark
2026-06-21 18:15:54 +02:00
parent 87e6fa14b2
commit 588e616baf
19 changed files with 6440 additions and 639 deletions
+1 -1
@@ -8,7 +8,7 @@
| Frontend | HTML + HTMX + minimal JS |
| Database | SurrealDB (graph, document, vector) |
| AI | OpenAI-compatible API |
-| Web Processing | Headless Chromium |
+| Web Processing | Servo engine (servo-fetch) + PDFium |
## Crate Structure
+3 -1
@@ -10,7 +10,7 @@
Minne automatically processes saved content:
-1. **Web scraping** extracts readable text from URLs (via headless Chrome)
+1. **Web scraping** extracts readable text from URLs (via embedded Servo engine)
2. **Text analysis** identifies key concepts and relationships
3. **Graph creation** builds connections between related content
4. **Embedding generation** enables semantic search
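The four processing steps can be pictured as a chain of plain functions. The Rust sketch below is illustrative only: `Processed`, `extract_text`, `extract_concepts`, `link_concepts`, `embed`, and `process` are hypothetical names, and every stage is a toy stand-in (tag stripping instead of the Servo engine, capitalized words instead of real text analysis, a hashed bag of words instead of an embedding model).

```rust
use std::collections::BTreeSet;

// Toy embedding width (assumption; real embedding models use hundreds of dimensions).
const DIM: usize = 8;

/// Hypothetical result of running one document through the pipeline.
#[derive(Debug)]
struct Processed {
    text: String,
    concepts: BTreeSet<String>,
    edges: BTreeSet<(String, String)>,
    embedding: Vec<f32>,
}

/// 1. Web scraping placeholder: drop HTML tags and collapse whitespace.
fn extract_text(html: &str) -> String {
    let mut out = String::new();
    let mut in_tag = false;
    for c in html.chars() {
        match c {
            '<' => in_tag = true,
            '>' => {
                in_tag = false;
                out.push(' '); // keep words from adjacent elements apart
            }
            _ if !in_tag => out.push(c),
            _ => {}
        }
    }
    out.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// 2. Text analysis placeholder: treat capitalized words as "concepts".
fn extract_concepts(text: &str) -> BTreeSet<String> {
    text.split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()))
        .filter(|w| w.chars().next().map_or(false, char::is_uppercase))
        .map(str::to_string)
        .collect()
}

/// 3. Graph creation: link every pair of concepts that co-occur in the document.
fn link_concepts(concepts: &BTreeSet<String>) -> BTreeSet<(String, String)> {
    let items: Vec<&str> = concepts.iter().map(String::as_str).collect();
    let mut edges = BTreeSet::new();
    for (i, a) in items.iter().enumerate() {
        for b in &items[i + 1..] {
            edges.insert((a.to_string(), b.to_string()));
        }
    }
    edges
}

/// 4. Embedding placeholder: hashed bag of words, L2-normalized.
fn embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0_f32; DIM];
    for word in text.split_whitespace() {
        let bucket = word.bytes().map(usize::from).sum::<usize>() % DIM;
        v[bucket] += 1.0;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        v.iter_mut().for_each(|x| *x /= norm);
    }
    v
}

/// Run the stages in order; each consumes the previous stage's output.
fn process(html: &str) -> Processed {
    let text = extract_text(html);
    let concepts = extract_concepts(&text);
    let edges = link_concepts(&concepts);
    let embedding = embed(&text);
    Processed { text, concepts, edges, embedding }
}

fn main() {
    let p = process("<p>Minne links SurrealDB and Servo.</p>");
    println!("{:#?}", p);
}
```

Each stage only depends on the output of the one before it, so any placeholder can be swapped for a real implementation without touching the others.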
@@ -43,6 +43,7 @@ Optional **reranking** can rescore fused chunk lists with a cross-encoder model;
When enabled, retrieval results are rescored with a cross-encoder model for improved relevance. Powered by [fastembed-rs](https://github.com/Anush008/fastembed-rs).
**Trade-offs:**
- Downloads ~1.1 GB of model data
- Adds latency per query
- Potentially improves answer quality, see [blog post](https://blog.stark.pub/posts/eval-retrieval-refactor/)
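Reranking boils down to scoring each (query, chunk) pair jointly and re-sorting the fused list by that score. A minimal sketch, assuming a hypothetical `Chunk` type and a scorer passed as a closure; `overlap_score` is a toy word-overlap placeholder, not a cross-encoder, and the sketch does not use fastembed-rs.

```rust
/// A retrieved chunk after fusion (hypothetical type, not Minne's real struct).
#[derive(Debug, Clone)]
struct Chunk {
    id: u32,
    text: String,
    fused_score: f32,
}

/// Rescore every (query, chunk) pair with `score` and keep the best `top_k`.
/// `score` stands in for a cross-encoder, which reads query and chunk together
/// instead of embedding them separately.
fn rerank<F>(query: &str, chunks: Vec<Chunk>, top_k: usize, score: F) -> Vec<(Chunk, f32)>
where
    F: Fn(&str, &str) -> f32,
{
    let mut scored: Vec<(Chunk, f32)> = chunks
        .into_iter()
        .map(|c| {
            let s = score(query, &c.text);
            (c, s)
        })
        .collect();
    // Highest score first; total_cmp gives a total order even if a score is NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(top_k);
    scored
}

/// Toy scorer (placeholder only): fraction of query words found in the chunk.
fn overlap_score(query: &str, text: &str) -> f32 {
    let text = text.to_lowercase();
    let words: Vec<String> = query.split_whitespace().map(str::to_lowercase).collect();
    if words.is_empty() {
        return 0.0;
    }
    let hits = words.iter().filter(|w| text.contains(w.as_str())).count();
    hits as f32 / words.len() as f32
}

fn main() {
    let chunks = vec![
        Chunk { id: 1, text: "SurrealDB stores the graph".into(), fused_score: 0.9 },
        Chunk { id: 2, text: "Servo renders web pages for scraping".into(), fused_score: 0.7 },
    ];
    for (c, s) in rerank("servo scraping", chunks, 2, overlap_score) {
        println!("chunk {} rerank {:.2} (fused {:.2})", c.id, s, c.fused_score);
    }
}
```

Keeping the scorer as a parameter separates the ranking logic from the model; the extra latency noted above comes from running that scorer once per candidate chunk on every query.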
@@ -52,6 +53,7 @@ Enable via `RERANKING_ENABLED=true`. See [Configuration](./configuration.md).
## Multi-Format Ingestion
Supported content types:
- Plain text and notes
- URLs (web pages)
- PDF documents
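Supporting several input types implies routing each submission to a matching extractor. A minimal sketch, assuming routing by content: bytes starting with the `%PDF-` header take the PDF path (PDFium in this commit), a lone `http://` or `https://` string takes the web path (servo-fetch), and other UTF-8 is kept as plain text. The `Route` enum and `classify` function are hypothetical; Minne may instead dispatch on the request type.

```rust
/// Hypothetical routing targets; labels only, not Minne or servo-fetch APIs.
#[derive(Debug, PartialEq)]
enum Route {
    PlainText,   // stored as-is
    WebPage,     // fetched and rendered by the embedded Servo engine
    PdfDocument, // text extracted with PDFium
    Unsupported, // neither a PDF nor valid UTF-8
}

/// Classify a raw submission by its content.
fn classify(bytes: &[u8]) -> Route {
    // PDF files start with the `%PDF-` header.
    if bytes.starts_with(b"%PDF-") {
        return Route::PdfDocument;
    }
    match std::str::from_utf8(bytes) {
        Ok(s) => {
            let s = s.trim();
            // Treat the input as a URL only if it is a single http(s) token.
            let is_url = (s.starts_with("http://") || s.starts_with("https://"))
                && !s.contains(char::is_whitespace);
            if is_url {
                Route::WebPage
            } else {
                Route::PlainText
            }
        }
        Err(_) => Route::Unsupported,
    }
}

fn main() {
    let inputs: [&[u8]; 3] = [
        b"%PDF-1.7",
        b"https://github.com/perstarkse/minne",
        b"a quick note",
    ];
    for input in inputs {
        println!("{:?}", classify(input));
    }
}
```

A note that merely mentions a URL stays plain text here; whether such links should also be scraped is a policy choice this sketch does not make.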
+6 -4
@@ -12,13 +12,13 @@ cd minne
docker compose up -d
```
-The included `docker-compose.yml` handles SurrealDB and Chromium automatically.
+The included `docker-compose.yml` handles SurrealDB automatically.
**Required:** Set your `OPENAI_API_KEY` in `docker-compose.yml` before starting.
## Nix
-Run Minne directly with Nix (includes Chromium):
+Run Minne directly with Nix:
```bash
nix run 'github:perstarkse/minne#main'
@@ -31,8 +31,9 @@ Configure via environment variables or a `config.yaml` file. See [Configuration]
Download binaries for Windows, macOS, and Linux from [GitHub Releases](https://github.com/perstarkse/minne/releases/latest).
**Requirements:**
- SurrealDB instance (local or remote)
-- Chromium (for web scraping)
+- `libEGL` + `libfontconfig` (for servo-fetch web scraping)
## Build from Source
@@ -45,9 +46,10 @@ cargo build --release --bin main
The binary will be at `target/release/main`.
**Requirements:**
- Rust toolchain
- SurrealDB accessible at configured address
-- Chromium in PATH
+- `libEGL` + `libfontconfig` for servo-fetch (web scraping) — bundled in Nix and Docker images
## Process Modes