RAG Documentation Search

Name: RAG Documentation Search
Rating: 4.9 (42 reviews)
Author: sanderkooger

Enables semantic search through documentation using vector embeddings, allowing AI assistants to retrieve and cite relevant documentation context for user queries.

Provides semantic document search and retrieval through vector embeddings, enabling context-aware responses backed by specific documentation sources.


ai ml developer tools


What it does

Search documentation using semantic vector embeddings
Retrieve relevant context from multiple documentation sources
Generate embeddings locally with Ollama or via OpenAI
Process and index documentation automatically
Augment AI responses with cited documentation sources

Best for

Building documentation-aware AI assistants
Developers needing context-aware tooling
Teams wanting to search knowledge bases semantically

Local embeddings with Ollama support
Multiple documentation sources
Real-time context augmentation

About RAG Documentation Search

RAG Documentation Search is a community-built MCP server published by sanderkooger that gives AI assistants documentation search tools via the Model Context Protocol. It combines retrieval-augmented generation with a Qdrant vector database for context-aware search and retrieval over your documentation. It is categorized under ai ml and developer tools.

How to install

You can install RAG Documentation Search in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.

License

RAG Documentation Search is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

MCP-server-ragdocs


An MCP server implementation that provides tools for retrieving and processing documentation through vector search, enabling AI assistants to augment their responses with relevant documentation context.

Usage
Features
Configuration
Deployment
- Local Development
- Cloud Deployment
Playwright Integration
Tools
Project Structure
Using Ollama Embeddings
License
Development Workflow
Contributing
Forkception Acknowledgments

Usage

The RAG Documentation tool is designed for:

Enhancing AI responses with relevant documentation
Building documentation-aware AI assistants
Creating context-aware tooling for developers
Implementing semantic documentation search
Augmenting existing knowledge bases

Features

Vector-based documentation search and retrieval
Support for multiple documentation sources
Embedding generation either locally with Ollama or via OpenAI
Semantic search capabilities
Automated documentation processing
Real-time context augmentation for LLMs
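
To make "vector-based search" a little more concrete, here is a minimal sketch of ranking stored chunks by cosine similarity to a query embedding. It is not taken from this project's source: the `Chunk` type, the `rank` function, and the in-memory list are hypothetical, and in a real deployment the similarity search would presumably be handled by the Qdrant instance configured below rather than by application code.

```typescript
// Hypothetical shape of an indexed documentation chunk.
interface Chunk {
  url: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the `limit` chunks most similar to the query embedding.
function rank(query: number[], chunks: Chunk[], limit = 5): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
    .map((entry) => entry.chunk);
}
```

Real embeddings have hundreds of dimensions, but the ranking logic is the same as with short toy vectors.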

Configuration

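{
  "mcpServers": {
    "rag-docs": {
      "command": "npx",
      "args": ["-y", "@sanderkooger/mcp-server-ragdocs"],
      "env": {
        "EMBEDDINGS_PROVIDER": "ollama",
        "QDRANT_URL": "your-qdrant-url",
        "QDRANT_API_KEY": "your-qdrant-key"
      }
    }
  }
}

JSON does not allow comments, so note here instead: QDRANT_API_KEY is only needed if your Qdrant instance requires one (for example, Qdrant Cloud).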

Usage with Claude Desktop

Add this to your claude_desktop_config.json:

OpenAI Configuration

{
  "mcpServers": {
    "rag-docs-openai": {
      "command": "npx",
      "args": ["-y", "@sanderkooger/mcp-server-ragdocs"],
      "env": {
        "EMBEDDINGS_PROVIDER": "openai",
        "OPENAI_API_KEY": "your-openai-key-here",
        "QDRANT_URL": "your-qdrant-url",
        "QDRANT_API_KEY": "your-qdrant-key"
      }
    }
  }
}

Ollama Configuration

{
  "mcpServers": {
    "rag-docs-ollama": {
      "command": "npx",
      "args": ["-y", "@sanderkooger/mcp-server-ragdocs"],
      "env": {
        "EMBEDDINGS_PROVIDER": "ollama",
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "QDRANT_URL": "your-qdrant-url",
        "QDRANT_API_KEY": "your-qdrant-key"
      }
    }
  }
}

Ollama run from this codebase

"ragdocs-mcp": {
  "command": "node",
  "args": [
    "/home/sander/code/mcp-server-ragdocs/build/index.js"
  ],
  "env": {
    "QDRANT_URL": "http://127.0.0.1:6333",
    "EMBEDDINGS_PROVIDER": "ollama",
    "OLLAMA_BASE_URL": "http://localhost:11434"
  },
  "alwaysAllow": [
    "run_queue",
    "list_queue",
    "list_sources",
    "search_documentation",
    "clear_queue",
    "remove_documentation",
    "extract_urls"
  ],
  "timeout": 3600
}

Environment Variables Reference

Variable	Required For	Default	Remarks
`EMBEDDINGS_PROVIDER`	All	`ollama`	"openai" or "ollama"
`OPENAI_API_KEY`	OpenAI	-	Obtain from OpenAI dashboard
`OLLAMA_BASE_URL`	Ollama	`http://localhost:11434`	Local Ollama server URL
`QDRANT_URL`	All	`http://localhost:6333`	Qdrant endpoint URL
`QDRANT_API_KEY`	Cloud Qdrant	-	From Qdrant Cloud console
`PLAYWRIGHT_WS_ENDPOINT`	Playwright Remote	-	WebSocket endpoint for remote Playwright server (e.g., `ws://localhost:3000/`)
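
The table above can be read as a small resolution routine: take each variable from the environment, fall back to the listed default, and fail early when something required is missing. The sketch below illustrates that reading. The `RagDocsConfig` type and the `resolveConfig` and `isProvider` functions are hypothetical names, and the validation rules (rejecting unknown providers, requiring `OPENAI_API_KEY` for OpenAI) are assumptions drawn from the table, not a copy of the server's actual startup code.

```typescript
type Provider = "openai" | "ollama";

// Hypothetical config shape mirroring the variables in the table.
interface RagDocsConfig {
  provider: Provider;
  openaiApiKey?: string;
  ollamaBaseUrl: string;
  qdrantUrl: string;
  qdrantApiKey?: string;
  playwrightWsEndpoint?: string;
}

function isProvider(value: string): value is Provider {
  return value === "openai" || value === "ollama";
}

// Resolve settings from an environment-like object, applying the documented defaults.
function resolveConfig(env: Record<string, string | undefined>): RagDocsConfig {
  const provider = env.EMBEDDINGS_PROVIDER ?? "ollama";
  if (!isProvider(provider)) {
    throw new Error(`EMBEDDINGS_PROVIDER must be "openai" or "ollama", got "${provider}"`);
  }
  // Assumption: an OpenAI key is mandatory only when OpenAI is the provider.
  if (provider === "openai" && !env.OPENAI_API_KEY) {
    throw new Error("OPENAI_API_KEY is required when EMBEDDINGS_PROVIDER is openai");
  }
  return {
    provider,
    openaiApiKey: env.OPENAI_API_KEY,
    ollamaBaseUrl: env.OLLAMA_BASE_URL ?? "http://localhost:11434",
    qdrantUrl: env.QDRANT_URL ?? "http://localhost:6333",
    qdrantApiKey: env.QDRANT_API_KEY,
    playwrightWsEndpoint: env.PLAYWRIGHT_WS_ENDPOINT,
  };
}
```

In a Node.js script this could be called as `resolveConfig(process.env)` to check a configuration before handing it to an MCP client.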

Local Deployment

The repository includes Docker Compose configuration for local development:

Docker Compose Download

docker compose up -d

This starts:

Qdrant vector database on port 6333
Ollama LLM service on port 11434

Access endpoints:

Qdrant: http://localhost:6333
Ollama: http://localhost:11434

Cloud Deployment

For production deployments:

Use hosted Qdrant Cloud service
Set these environment variables:

QDRANT_URL=your-cloud-cluster-url
QDRANT_API_KEY=your-cloud-api-key

Playwright Integration

This project supports running Playwright either locally or via a Docker container. This provides flexibility for environments where Playwright's dependencies might be challenging to install directly.

How it Works

The src/api-client.ts file automatically detects the presence of the PLAYWRIGHT_WS_ENDPOINT environment variable:

If PLAYWRIGHT_WS_ENDPOINT is set: The application will attempt to connect to a remote Playwright server at the specified WebSocket endpoint using chromium.connect(). This is ideal for using a containerized Playwright instance.
If PLAYWRIGHT_WS_ENDPOINT is not set: The application will launch a local Playwright browser instance using chromium.launch().
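
The detection logic described above might look roughly like the following. This is a sketch of the pattern, not the contents of src/api-client.ts: the `BrowserLauncher` interface and `openBrowser` function are hypothetical, and the launcher is passed in as a parameter so the example stays self-contained. Playwright's `chromium` object exposes both `connect(wsEndpoint)` and `launch()`, so it should be usable as the launcher.

```typescript
// Minimal structural type covering the two Playwright calls mentioned in the text.
interface BrowserLauncher<B> {
  connect(wsEndpoint: string): Promise<B>;
  launch(): Promise<B>;
}

// Connect to a remote Playwright server when an endpoint is configured,
// otherwise start a local browser.
async function openBrowser<B>(
  launcher: BrowserLauncher<B>,
  env: Record<string, string | undefined>,
): Promise<B> {
  const wsEndpoint = env.PLAYWRIGHT_WS_ENDPOINT;
  if (wsEndpoint) {
    return launcher.connect(wsEndpoint);
  }
  return launcher.launch();
}
```

With Playwright installed, a call such as `openBrowser(chromium, process.env)` would follow the same branching as the two cases listed above.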

Running Playwright in Docker

A playwright service has been added to the docker-compose.yml file to facilitate running Playwright in a Docker container.

To start the Playwright server in Docker:

docker compose up playwright

This command will pull the mcr.microsoft.com/playwright:v1.53.0-noble image and start a Playwright server accessible on port 3000 of your host machine.

To configure your application to use this containerized Playwright instance, set the following environment variable:

PLAYWRIGHT_WS_ENDPOINT=ws://localhost:3000/

Tools

search_documentation

Search through stored documentation using natural language queries. Returns matching excerpts with context, ranked by relevance.

Inputs:

query (string): The text to search for in the documentation. Can be a natural language query, specific terms, or code snippets.
limit (number, optional): Maximum number of results to return (1-20, default: 5). Higher limits provide more comprehensive results but may take longer to process.
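
For readers calling the tool from their own MCP client, the inputs above map onto a `tools/call` JSON-RPC request. The sketch below builds such a request. The `buildSearchRequest` helper and `ToolCallRequest` type are hypothetical, and the client-side range check is an assumption based on the documented 1-20 range; how the server itself reacts to out-of-range values is not stated in this README.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a search_documentation call, applying the documented default limit of 5.
function buildSearchRequest(id: number, query: string, limit = 5): ToolCallRequest {
  if (query.trim() === "") {
    throw new Error("query must not be empty");
  }
  if (!Number.isInteger(limit) || limit < 1 || limit > 20) {
    throw new RangeError("limit must be an integer between 1 and 20");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "search_documentation",
      arguments: { query, limit },
    },
  };
}
```

Serializing the result with `JSON.stringify` gives the message a stdio client would write to the server's standard input.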

list_sources

List all documentation sources currently stored in the system. Returns a comprehensive list of all indexed documentation including source URLs, titles, and last update times. Use this to understand what documentation is available for searching or to verify if specific sources have been indexed.

extract_urls

Extract and analyze all URLs from a given web page. This tool crawls the specified webpage, identifies all hyperlinks, and optionally adds them to the processing queue.

Inputs:

url (string): The complete URL of the webpage to analyze (must include protocol, e.g., https://). The page must be publicly accessible.
add_to_queue (boolean, optional): If true, automatically add extracted URLs to the processing queue for later indexing. Use with caution on large sites to avoid excessive queuing.
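
Because the url input must be a complete URL including its protocol, it can be worth checking that on the client side before calling the tool. The following sketch does so with the standard `URL` class. The `buildExtractUrlsArgs` helper and `ExtractUrlsArgs` type are hypothetical, and restricting the protocol to http and https is an assumption, since the README only gives https:// as an example.

```typescript
// Arguments for the extract_urls tool, as described in the inputs above.
interface ExtractUrlsArgs {
  url: string;
  add_to_queue?: boolean;
}

// Validate the URL and assemble the tool arguments.
function buildExtractUrlsArgs(rawUrl: string, addToQueue?: boolean): ExtractUrlsArgs {
  let parsed: URL;
  try {
    // The URL constructor throws when the string has no protocol.
    parsed = new URL(rawUrl);
  } catch {
    throw new Error(`not a complete URL (missing protocol?): ${rawUrl}`);
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${parsed.protocol}`);
  }
  // Pass the original string through unchanged so later exact-match lookups still work.
  const args: ExtractUrlsArgs = { url: rawUrl };
  if (addToQueue !== undefined) {
    args.add_to_queue = addToQueue;
  }
  return args;
}
```

The original string is kept rather than the normalized form because remove_documentation requires URLs to match exactly what was used when the documentation was added.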

remove_documentation

Remove specific documentation sources from the system by their URLs. The removal is permanent and will affect future search results.

Inputs:

urls (string[]): Array of URLs to remove from the database. Each URL must exactly match the URL used when the documentation was added.

list_queue

List all URLs currently waiting in the documentation processing queue. Shows pending documentation sources that will be processed when run_queue is called. Use this to monitor queue status, verify URLs were added correctly, or check processing backlog.

run_queue

Process and index all URLs currently in the documentation queue. Each URL is processed sequentially, with proper error handling and retry logic. Progress updates are provided as processing occurs. Long-running operations will process until the queue is empty or an unrecoverable error occurs.
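
Sequential processing with retries, as described above, can be sketched in a few lines. This illustrates the general pattern only and is not the server's implementation: `runQueue`, `QueueResult`, and the `maxAttempts` parameter are hypothetical, and the choice to record a failed URL and move on (rather than stop at the first unrecoverable error) is an assumption made to keep the example short.

```typescript
interface QueueResult {
  processed: string[];
  failed: string[];
}

// Drain the queue one URL at a time, retrying each URL up to maxAttempts times.
async function runQueue(
  queue: string[],
  processUrl: (url: string) => Promise<void>,
  maxAttempts = 3,
): Promise<QueueResult> {
  const result: QueueResult = { processed: [], failed: [] };
  while (queue.length > 0) {
    const url = queue.shift() as string;
    let succeeded = false;
    for (let attempt = 1; attempt <= maxAttempts && !succeeded; attempt++) {
      try {
        await processUrl(url);
        succeeded = true;
      } catch {
        // Swallow the error and let the loop retry until attempts run out.
      }
    }
    (succeeded ? result.processed : result.failed).push(url);
  }
  return result;
}
```

A production version would likely add a delay between attempts and distinguish transient failures (timeouts) from permanent ones (a 404).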

clear_queue

Remove all pending URLs from the documentation processing queue. Use this to reset the queue when you want to start fresh, remove unwanted URLs, or cancel pending processing. This operation is immediate and permanent; URLs will need to be re-added if you want to process them later.

Project Structure

The package follows a modular architecture with clear separation between core components and MCP protocol handlers. See ARCHITECTURE.md for detailed structural documentation and design decisions.

Using Ollama Embeddings without Docker

Install Ollama:

curl -fsSL https://ollama.com/install.sh | sh

Download the nomic-embed-text model:

ollama pull nomic-embed-text

Verify installation:

ollama list
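
Beyond `ollama list`, one way to confirm the model actually produces embeddings is to send a request to Ollama's embedding endpoint. The sketch below only builds that request, so it can be inspected without a running server. It assumes Ollama's `POST /api/embed` endpoint with a `model` and `input` body; the `buildEmbedRequest` helper and `EmbedRequest` type are hypothetical, and whether this project calls the same endpoint internally is not stated in the README.

```typescript
// Everything needed to issue the HTTP call with fetch.
interface EmbedRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

// Build an embedding request for a local Ollama server.
function buildEmbedRequest(
  text: string,
  baseUrl = "http://localhost:11434",
  model = "nomic-embed-text",
): EmbedRequest {
  // Drop trailing slashes so the path is not doubled up.
  const url = `${baseUrl.replace(/\/+$/, "")}/api/embed`;
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, input: text }),
    },
  };
}
```

Passing `url` and `init` to `fetch` and reading the JSON response should return an `embeddings` array when the model is installed and the server is running.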

License

This MCP server is licensed under the MIT License. This means you are free to use, modify, and distribute the software, subject to the terms and conditions of the MIT License. For more details, please see the LICENSE file in the project repository.

Contributing

We welcome contributions! Please see our CONTRIBUTING.md for detailed guidelines.


