Mirror of https://github.com/perstarkse/minne.git (synced 2026-06-24 10:56:29 +02:00)
refactor: replace headless_chrome with lighter alternatives
@@ -8,7 +8,7 @@
 | Frontend | HTML + HTMX + minimal JS |
 | Database | SurrealDB (graph, document, vector) |
 | AI | OpenAI-compatible API |
-| Web Processing | Headless Chromium |
+| Web Processing | Servo engine (servo-fetch) + PDFium |

 ## Crate Structure

@@ -10,7 +10,7 @@

 Minne automatically processes saved content:

-1. **Web scraping** extracts readable text from URLs (via headless Chrome)
+1. **Web scraping** extracts readable text from URLs (via embedded Servo engine)
 2. **Text analysis** identifies key concepts and relationships
 3. **Graph creation** builds connections between related content
 4. **Embedding generation** enables semantic search
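The four processing stages listed above can be illustrated with a toy Rust pipeline. This is a hypothetical sketch, not Minne's code: the real stages use servo-fetch for scraping and presumably the OpenAI-compatible API from the stack table for analysis and embeddings. Here, tag stripping, a capitalized-word heuristic, co-occurrence edges, and a hashed bag-of-words vector merely stand in for those steps, and every name (`Processed`, `process`, `DIM`, and so on) is made up for illustration.

```rust
use std::collections::BTreeSet;

// Toy embedding width; real embedding models produce far larger vectors.
const DIM: usize = 8;

// Hypothetical record passed between stages; not Minne's actual type.
struct Processed {
    text: String,
    concepts: BTreeSet<String>,
    edges: Vec<(String, String)>,
    embedding: Vec<f32>,
}

// Stage 1, stand-in for web scraping: drop markup, keep text, collapse whitespace.
fn extract_text(html: &str) -> String {
    let mut out = String::new();
    let mut in_tag = false;
    for c in html.chars() {
        match c {
            '<' => in_tag = true,
            '>' => {
                in_tag = false;
                out.push(' '); // keep words from adjacent elements apart
            }
            _ if !in_tag => out.push(c),
            _ => {}
        }
    }
    out.split_whitespace().collect::<Vec<_>>().join(" ")
}

// Stage 2, stand-in for text analysis: treat capitalized words as "concepts".
fn find_concepts(text: &str) -> BTreeSet<String> {
    text.split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()))
        .filter(|w| w.chars().next().map_or(false, char::is_uppercase))
        .map(str::to_string)
        .collect()
}

// Stage 3, graph creation: link every pair of concepts that co-occur in one item.
fn build_edges(concepts: &BTreeSet<String>) -> Vec<(String, String)> {
    let list: Vec<&String> = concepts.iter().collect();
    let mut edges = Vec::new();
    for i in 0..list.len() {
        for j in (i + 1)..list.len() {
            edges.push((list[i].clone(), list[j].clone()));
        }
    }
    edges
}

// Stage 4, stand-in for embedding generation: hashed bag of words, L2-normalized.
fn embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; DIM];
    for word in text.split_whitespace() {
        let bucket = word.to_lowercase().bytes().map(usize::from).sum::<usize>() % DIM;
        v[bucket] += 1.0;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        v.iter_mut().for_each(|x| *x /= norm);
    }
    v
}

// Run the stages in the documented order.
fn process(html: &str) -> Processed {
    let text = extract_text(html);
    let concepts = find_concepts(&text);
    let edges = build_edges(&concepts);
    let embedding = embed(&text);
    Processed { text, concepts, edges, embedding }
}

fn main() {
    let p = process("<p>Minne stores notes in SurrealDB.</p>");
    println!("text: {}", p.text);
    println!("concepts: {:?}", p.concepts);
    println!("edges: {:?}", p.edges);
    println!("embedding: {:?}", p.embedding);
}
```

The ordering matters: graph edges are derived from the analysis output, while the embedding is computed from the extracted text, so both depend on scraping succeeding first.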
@@ -43,6 +43,7 @@ Optional **reranking** can rescore fused chunk lists with a cross-encoder model;
 When enabled, retrieval results are rescored with a cross-encoder model for improved relevance. Powered by [fastembed-rs](https://github.com/Anush008/fastembed-rs).

 **Trade-offs:**

 - Downloads ~1.1 GB of model data
 - Adds latency per query
 - Potentially improves answer quality, see [blog post](https://blog.stark.pub/posts/eval-retrieval-refactor/)
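The reranking step described above can be sketched as follows, under stated assumptions. The `CrossEncoder` trait, `OverlapScorer`, and `rerank` are hypothetical names, and the term-overlap scorer only stands in for the neural cross-encoder that fastembed-rs provides; this sketch does not reproduce the fastembed-rs API. What it shows is the shape of the step: the fused chunk list is taken as given, each chunk is scored jointly with the query, and the list is re-sorted. How Minne actually parses `RERANKING_ENABLED` is not shown in this diff; accepting only `true` (any case) is an assumption.

```rust
// Hypothetical interface: a cross-encoder scores a (query, passage) pair jointly.
trait CrossEncoder {
    fn score(&self, query: &str, passage: &str) -> f32;
}

// Toy stand-in for a real model: fraction of query terms found in the passage.
struct OverlapScorer;

impl CrossEncoder for OverlapScorer {
    fn score(&self, query: &str, passage: &str) -> f32 {
        let passage = passage.to_lowercase();
        let terms: Vec<String> = query.split_whitespace().map(str::to_lowercase).collect();
        if terms.is_empty() {
            return 0.0;
        }
        let hits = terms.iter().filter(|t| passage.contains(t.as_str())).count();
        hits as f32 / terms.len() as f32
    }
}

// Rescore an already-fused chunk list; when disabled, keep the fused order untouched.
fn rerank(model: &dyn CrossEncoder, query: &str, fused: Vec<String>, enabled: bool) -> Vec<String> {
    if !enabled {
        return fused;
    }
    let mut scored: Vec<(f32, String)> = fused
        .into_iter()
        .map(|chunk| (model.score(query, &chunk), chunk))
        .collect();
    // sort_by is stable, so chunks with equal scores keep their fused order.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().map(|(_, chunk)| chunk).collect()
}

// Assumed flag parsing: only "true" (case-insensitive) enables reranking.
fn reranking_enabled() -> bool {
    std::env::var("RERANKING_ENABLED")
        .map(|v| v.eq_ignore_ascii_case("true"))
        .unwrap_or(false)
}

fn main() {
    let fused = vec![
        "SurrealDB stores the graph".to_string(),
        "Servo renders web pages".to_string(),
        "Cross-encoders rescore retrieved chunks".to_string(),
    ];
    let out = rerank(&OverlapScorer, "rescore chunks", fused, reranking_enabled());
    println!("{:?}", out);
}
```

Keeping the sort stable means that when the scorer cannot distinguish two chunks, the original retrieval ranking still decides their order. The per-query latency noted in the trade-offs comes from scoring every (query, chunk) pair, which this loop makes explicit.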
@@ -52,6 +53,7 @@ Enable via `RERANKING_ENABLED=true`. See [Configuration](./configuration.md).
 ## Multi-Format Ingestion

 Supported content types:

 - Plain text and notes
 - URLs (web pages)
 - PDF documents

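Routing the three supported content types to an extractor can be sketched as a small dispatch. The `Input` enum, `classify`, and `extractor` are hypothetical; the mapping of URLs to servo-fetch and PDFs to PDFium is inferred from the "Web Processing" row of the stack table, not confirmed by this diff. The PDF check relies on the standard `%PDF-` file header; sniffing bytes is only one option, and a declared MIME type from an upload would work equally well.

```rust
// Hypothetical ingestion input; Minne's real types may differ.
#[derive(Debug, PartialEq)]
enum Input {
    Text(String),
    Url(String),
    Pdf(Vec<u8>),
}

// Classify raw bytes: PDF by its "%PDF-" header, URL by scheme, otherwise plain text.
fn classify(raw: &[u8]) -> Input {
    if raw.starts_with(b"%PDF-") {
        return Input::Pdf(raw.to_vec());
    }
    let text = String::from_utf8_lossy(raw).trim().to_string();
    if text.starts_with("http://") || text.starts_with("https://") {
        Input::Url(text)
    } else {
        Input::Text(text)
    }
}

// Assumed mapping from content type to the component that extracts its text.
fn extractor(input: &Input) -> &'static str {
    match input {
        Input::Text(_) => "none (stored as-is)",
        Input::Url(_) => "servo-fetch",
        Input::Pdf(_) => "PDFium",
    }
}

fn main() {
    let samples: [&[u8]; 3] = [
        b"Remember to water the plants",
        b"https://example.com/article",
        b"%PDF-1.7 (binary body omitted)",
    ];
    for raw in samples {
        let input = classify(raw);
        let size = match &input {
            Input::Text(s) | Input::Url(s) => s.len(),
            Input::Pdf(bytes) => bytes.len(),
        };
        println!("{} bytes -> {}", size, extractor(&input));
    }
}
```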
@@ -12,13 +12,13 @@ cd minne
 docker compose up -d
 ```

-The included `docker-compose.yml` handles SurrealDB and Chromium automatically.
+The included `docker-compose.yml` handles SurrealDB automatically.

 **Required:** Set your `OPENAI_API_KEY` in `docker-compose.yml` before starting.

 ## Nix

-Run Minne directly with Nix (includes Chromium):
+Run Minne directly with Nix:

 ```bash
 nix run 'github:perstarkse/minne#main'
@@ -31,8 +31,9 @@ Configure via environment variables or a `config.yaml` file. See [Configuration]
 Download binaries for Windows, macOS, and Linux from [GitHub Releases](https://github.com/perstarkse/minne/releases/latest).

 **Requirements:**

 - SurrealDB instance (local or remote)
-- Chromium (for web scraping)
+- `libEGL` + `libfontconfig` (for servo-fetch web scraping)

 ## Build from Source

@@ -45,9 +46,10 @@ cargo build --release --bin main
 The binary will be at `target/release/main`.

 **Requirements:**

 - Rust toolchain
 - SurrealDB accessible at configured address
-- Chromium in PATH
+- `libEGL` + `libfontconfig` for servo-fetch (web scraping) — bundled in Nix and Docker images

 ## Process Modes
