49 Commits

Author SHA1 Message Date
Per Stark 4559ee0aa8 fix: arc-share retrieved chunks, centralize entity embeddings, and trim hot-path clones. 2026-06-06 23:05:53 +02:00
Per Stark 676fdbc132 fix: replaced several instances of cloning, reduced allocations 2026-06-06 19:45:18 +02:00
Per Stark 20de557294 feat: configure FastEmbed model in config and admin, with restart to apply
Expose fastembed_model in config and a model dropdown on Admin → Models.
Persist dimension from the chosen model, require restart to load it, and
align legacy OpenAI default settings so fresh local-embedding installs
start cleanly.
2026-06-04 21:51:57 +02:00
Per Stark c3b68e8bd3 feat: pool fastembed, batch embeddings, and reconcile embedding config on startup 2026-06-04 21:51:57 +02:00
Per Stark 6c3475ca0e chore: ingestion-pipeline refactor, sort technical debt, rustfmt 2026-05-31 19:48:41 +02:00
Per Stark e9d8654324 chore: refactor retrieval pipeline to chunk-first RRF with derived entities and slimmer eval surface.
Collapse the multi-strategy entity engine into one benchmarked chunk retrieval path, derive entities from retrieved chunks, and update consumers, docs, and clippy fixes across the workspace.
2026-05-30 22:19:08 +02:00
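The "chunk-first RRF" named in the commit above presumably refers to reciprocal rank fusion. The following is a minimal sketch of that general technique, not the repository's code: the function name `rrf_fuse`, the constant `k`, and the use of plain string chunk ids are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of chunk ids with reciprocal rank fusion.
/// Each id scores the sum over lists of 1 / (k + rank), with rank starting at 1.
/// (Hypothetical helper; the name and signature are not taken from the repository.)
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Highest score first; ties are broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Calling `rrf_fuse(&[vec!["a", "b", "c"], vec!["b", "d", "a"]], 60.0)` ranks chunks that appear near the top of both lists ahead of chunks that appear in only one.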
Per Stark ec80a4e540 chore: improve html-router auth, caching, and analytics while centralizing search labels in common.
small fix
2026-05-29 15:03:55 +02:00
Per Stark 920d7b5efb chore: centralize embedding errors, retrieval strategy, and test DB helpers.
Replace anyhow in embedding production code with EmbeddingError, move
RetrievalStrategy into common config, and deduplicate Surreal test setup
via common::test_utils.
2026-05-29 14:44:23 +02:00
Per Stark d90319f3b0 chore: harden common storage bootstrap and slim embedded db assets
Unify embedding config, build providers from system settings, and fail
startup when index builds error or time out. Move Surreal assets under
common/db so embeds exclude crate source, and read storage via streams.
2026-05-29 14:44:23 +02:00
Per Stark 964d57ec97 test: cover system settings sync, validation, and ingestion prompts
Add tests for embedding provider sync, patch isolation, typed backend
serde, and DB-backed ingestion prompts.
2026-05-29 14:44:23 +02:00
Per Stark 544a790e34 chore: harden system settings and unify prompt usage
Validate settings updates, use typed embedding backends, and route
ingestion through DB-stored prompts so admin edits take effect.
2026-05-29 14:44:23 +02:00
Per Stark f625a7e0a9 chore: move serde helpers to common utils
Relocate SurrealDB serde helpers out of storage types so they can be
reused broadly, and align retrieval-pipeline test setup with configured
embedding dimensions.
2026-05-29 14:44:23 +02:00
Per Stark 1e0dba72c8 chore: harden common errors, fastembed blocking, and ingest ownership
Run FastEmbed inference on spawn_blocking, propagate Surreal take failures,
add AppError::internal and typed ingest/embedding parse errors, and take
owned file lists in ingestion payload construction.
2026-05-29 14:44:23 +02:00
Per Stark 4f02fcb853 chore: rename get_id to id, add doc comments, pre-allocate format_history 2026-05-27 18:06:16 +02:00
Per Stark 9ccf8dde25 chore: lowercase all error messages and add # Errors doc sections
- Fix err-lowercase-msg: normalize all #[error(...)] display strings to
  lowercase (AppError, FileError, ApiErr) and update affected tests
- Fix err-doc-errors: add # Errors sections to 25+ fallible public
  functions across db.rs, store.rs, embedding.rs, indexes.rs,
  ingestion_task.rs, and ingest_limits.rs
2026-05-27 14:59:48 +02:00
Per Stark 81624850c0 chore: add must_use to 27 non-Result public functions
- constructors: KnowledgeEntity, TextChunk, Scratchpad, IngestionTask,
  Conversation, KnowledgeRelationship, Message, TextContent,
  KnowledgeEntityEmbedding, TextChunkEmbedding
- accessors: Theme::as_str, Theme::initial_theme, TaskState::as_str,
  TaskState::display_label, StorageManager::backend_kind,
  StorageManager::local_base_path, EmbeddingProvider::backend_label,
  EmbeddingProvider::dimension, EmbeddingProvider::model_code
- queries: TaskState::is_terminal, IngestionTask::can_retry,
  KnowledgeEntityType::variants, StorageManager::resolve_local_path,
  resolve_base_dir, IngestionTask::lease_duration
- helpers: Message::format_history
- builders: StorageManager::with_backend
2026-05-27 14:23:56 +02:00
Per Stark e2284b1e69 chore: removed anyhow from apperror for improved error handling 2026-05-27 13:33:02 +02:00
Per Stark 76fcdcd6ce chore: index slicing and lowercase errors 2026-05-27 12:41:26 +02:00
Per Stark 056f116885 perf: avoid small own clones and intermediate Vec allocations
- Derive Copy on 6 small enums (MessageRole, TaskState, StorageKind, EmbeddingBackend, PdfIngestMode, KnowledgeEntityType)
- Change create_ingestion_payload files param from Vec<FileInfo> to &[FileInfo]
- Remove 5 intermediate Vec allocations (4 embedding serialization + 1 format_history) using write! loop
- Remove 7 unnecessary .clone() calls exposed by Copy derive
2026-05-27 10:28:08 +02:00
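The changes listed in the commit above (deriving Copy on small enums, borrowing a slice, and a write! loop instead of an intermediate Vec) can be sketched roughly as follows. Only the names `MessageRole` and `format_history` come from the log; the enum variants, the `Message` fields, the free-function form, and the output format are assumptions for illustration.

```rust
use std::fmt::Write;

/// A small enum that derives Copy, so it is passed by value without `.clone()`.
/// (The variants are assumed; the log only names the type.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MessageRole {
    User,
    Assistant,
}

impl MessageRole {
    fn as_str(self) -> &'static str {
        match self {
            MessageRole::User => "user",
            MessageRole::Assistant => "assistant",
        }
    }
}

/// Hypothetical message type; the field names are assumptions.
struct Message {
    role: MessageRole,
    content: String,
}

/// Build the history text in one pre-allocated String, writing each line
/// directly instead of collecting a Vec<String> and joining it afterwards.
fn format_history(messages: &[Message]) -> String {
    // Rough capacity estimate: content length plus room for the role label.
    let capacity: usize = messages.iter().map(|m| m.content.len() + 16).sum();
    let mut out = String::with_capacity(capacity);
    for message in messages {
        // Writing to a String cannot fail, so the Result is ignored.
        let _ = writeln!(out, "{}: {}", message.role.as_str(), message.content);
    }
    out
}
```

Taking `&[Message]` rather than `Vec<Message>` lets callers keep ownership of their messages, which mirrors the `Vec<FileInfo>` to `&[FileInfo]` change described in the same commit.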
Per Stark 5ce7a76c75 clippy: adhere to pedantic clippy, uniform test error handling 2026-05-26 20:21:13 +02:00
Per Stark d1b3e9b23a fix: name harmonization of endpoints & ingestion security hardening 2026-02-13 22:36:00 +01:00
Per Stark a9a37b2468 feat: s3 storage backend 2026-01-16 23:38:47 +01:00
Per Stark 534d0f8c31 fix: allow for multiple templates directories 2026-01-12 21:25:12 +01:00
Per Stark 8664abdf01 release: 1.0.0
fix: cargo dist
2026-01-11 20:35:01 +01:00
Per Stark 86270de873 tidying stuff up, dto for search 2025-12-20 22:30:31 +01:00
Per Stark 90bac299a3 passed wide smoke check 2025-12-10 13:54:08 +01:00
Per Stark 8121e04125 retrieval simplified 2025-12-09 20:35:42 +01:00
Per Stark 05bdaac672 evals: v3, embeddings on the side
additional indexes
2025-11-26 15:15:10 +01:00
Per Stark 97d35a8982 retrieval-pipeline: v0 2025-11-18 22:46:35 +01:00
Per Stark 7f30c8ff6e Merge branch 'main' into development 2025-11-03 12:48:04 +01:00
Per Stark 3196e65172 fix: improved storage manager, prep for s3 2025-11-03 12:39:15 +01:00
Per Stark b0deabaf3f release: 0.2.6 2025-10-31 13:38:11 +01:00
Per Stark 1b7c24747a fix: in memory object store handler for testing 2025-10-27 17:03:03 +01:00
Per Stark 72578296db feat: reranking with fastembed added 2025-10-27 13:05:10 +01:00
Per Stark 199186e5a3 fix: variable name 2025-10-16 11:24:07 +02:00
Per Stark c3a7e8dc59 chore: clippy performance improvements 2025-10-15 22:24:59 +02:00
Per Stark 5cb15dab45 feat: pdf support 2025-09-28 20:53:51 +02:00
Per Stark 62d909bb7e refactor: merge new storage backend into main
This is in preparation for s3 storage support
2025-09-14 12:22:03 +02:00
Per Stark 69954cf78e chore: clippy helps out 2025-09-06 21:00:39 +02:00
Per Stark 37584ed9fd Merge branch 'custom_llm_base'
fix: updated readme and corrected server and worker to updates

added migration

fix: openai url typo & displaying models

chore: tidying up
2025-06-08 08:28:14 +02:00
Per Stark a363c6cc05 feat: support for other providers of ai models 2025-06-06 23:16:41 +02:00
Per Stark d2772bd09c feat: port selection 2025-05-30 07:44:26 +02:00
Per Stark b93e7b5299 feat: full text search 2025-05-15 14:40:00 +02:00
Per Stark 850878d5c3 feat: customizable data storage path 2025-05-09 23:28:36 +02:00
Per Stark ce006f6ecc refactor: separation of json-stream-parser to own crate 2025-04-22 16:44:37 +02:00
Per Stark 233df1b79a fix: own implementation of stream parser 2025-04-10 08:23:55 +02:00
Per Stark 435547de66 fix: working stream parser 2025-04-10 08:06:30 +02:00
Per Stark 804461ac01 feat: improved configuration
configuration now works with both env variables and config file
2025-04-09 11:32:23 +02:00
Per Stark 5bc48fb30b refactor: better separation of dependencies to crates
node stuff to html crate only
2025-04-04 12:50:38 +02:00