A low-latency RAG (Retrieval-Augmented Generation) service that lets you upload documents and perform semantic search using OpenAI embeddings with local vector storage. Includes both direct retrieval and LLM-powered summary modes.

Local (stdio)

What it does

  • Upload and index documents with vector embeddings
  • Perform semantic search with cosine similarity
  • Generate AI summaries of retrieved content
  • Filter documents by metadata
  • Configure multiple embedding providers
  • Manage documents through a web interface
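
The semantic search and metadata filtering listed above can be sketched roughly as follows. This is a minimal illustration, not the service's actual implementation: the in-memory STORE, the search function, and the "lang" metadata key are hypothetical, and the tiny hand-written vectors stand in for real embeddings returned by a provider.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query_vec: Vector,
    store: List[Dict],
    top_k: int = 3,
    where: Optional[Dict[str, str]] = None,
) -> List[Tuple[float, str]]:
    """Return up to top_k (score, doc_id) pairs, best match first.

    `where` is a hypothetical exact-match metadata filter: a document is
    skipped unless every key/value pair matches its metadata.
    """
    where = where or {}
    scored = []
    for doc in store:
        meta = doc["metadata"]
        if any(meta.get(key) != value for key, value in where.items()):
            continue
        scored.append((cosine_similarity(query_vec, doc["embedding"]), doc["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy store: in practice the embeddings would come from an embedding provider.
STORE = [
    {"id": "a", "embedding": [1.0, 0.0, 0.0], "metadata": {"lang": "en"}},
    {"id": "b", "embedding": [0.9, 0.1, 0.0], "metadata": {"lang": "de"}},
    {"id": "c", "embedding": [0.0, 1.0, 0.0], "metadata": {"lang": "en"}},
]

results = search([1.0, 0.0, 0.0], STORE, top_k=2, where={"lang": "en"})
for score, doc_id in results:
    print(f"{doc_id}: {score:.3f}")
```

A linear scan like this is only reasonable for small local collections; a larger store would typically use an approximate nearest-neighbor index instead.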

Best for

  • Building RAG applications with document Q&A
  • Local knowledge base search and retrieval
  • Document analysis with AI summarization
  • Prototyping semantic search features

  • Sub-100ms local retrieval
  • Dual modes: raw retrieval and AI summary
  • Web UI for document management

Alternatives