Install
mkdir -p .claude/skills/rag-architect && curl -L -o skill.zip "https://mcp.directory/api/skills/download/2127" && unzip -o skill.zip -d .claude/skills/rag-architect && rm skill.zip

Installs to .claude/skills/rag-architect
About this skill
RAG Architect
Overview
The RAG (Retrieval-Augmented Generation) Architect skill provides comprehensive tools and knowledge for designing, implementing, and optimizing production-grade RAG pipelines. This skill covers the entire RAG ecosystem from document chunking strategies to evaluation frameworks, enabling you to build scalable, efficient, and accurate retrieval systems.
Core Competencies
1. Document Processing & Chunking Strategies
Fixed-Size Chunking
- Character-based chunking: Simple splitting by character count (e.g., 512, 1024, 2048 chars)
- Token-based chunking: Splitting by token count to respect model limits
- Overlap strategies: 10-20% overlap to maintain context continuity
- Pros: Predictable chunk sizes, simple implementation, consistent processing time
- Cons: May break semantic units, context boundaries ignored
- Best for: Uniform documents, when consistent chunk sizes are critical
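A minimal character-based chunker with configurable overlap is a short sketch (function and parameter names here are illustrative, not from any particular library):

```python
def chunk_fixed(text, size=512, overlap=64):
    """Split text into fixed-size character chunks with overlapping edges."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    step = size - overlap  # each chunk starts `step` chars after the last
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
        if start + size >= len(text):
            break
    return chunks
```

Token-based chunking follows the same shape, with a tokenizer's token list in place of the raw character string.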
Sentence-Based Chunking
- Sentence boundary detection: Using NLTK, spaCy, or regex patterns
- Sentence grouping: Combining sentences until size threshold is reached
- Paragraph preservation: Avoiding mid-paragraph splits when possible
- Pros: Preserves natural language boundaries, better readability
- Cons: Variable chunk sizes, potential for very short/long chunks
- Best for: Narrative text, articles, books
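A self-contained sketch of sentence grouping, using a naive regex boundary (NLTK or spaCy handle abbreviations and decimals far better; the size threshold is illustrative):

```python
import re

def chunk_sentences(text, max_chars=500):
    """Group sentences into chunks up to max_chars, never splitting a sentence."""
    # Naive boundary: split after ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) + 1 > max_chars:
            chunks.append(current)   # flush the full group, start a new one
            current = sent
        else:
            current = f"{current} {sent}".strip() if current else sent
    if current:
        chunks.append(current)
    return chunks
```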
Paragraph-Based Chunking
- Paragraph detection: Double newlines, HTML tags, markdown formatting
- Hierarchical splitting: Respecting document structure (sections, subsections)
- Size balancing: Merging small paragraphs, splitting large ones
- Pros: Preserves logical document structure, maintains topic coherence
- Cons: Highly variable sizes, may create very large chunks
- Best for: Structured documents, technical documentation
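The merge-small / split-large balancing step can be sketched as follows (the hard split at max_chars is a fallback; a real implementation would recurse into sentences instead):

```python
def chunk_paragraphs(text, min_chars=200, max_chars=1000):
    """Split on blank lines; merge short paragraphs, hard-split long ones."""
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, buf = [], ""
    for p in paras:
        if len(p) > max_chars:
            if buf:                        # flush any pending merge first
                chunks.append(buf)
                buf = ""
            while len(p) > max_chars:      # fallback: slice oversized paragraph
                chunks.append(p[:max_chars])
                p = p[max_chars:]
        buf = f"{buf}\n\n{p}".strip() if buf else p
        if len(buf) >= min_chars:
            chunks.append(buf)
            buf = ""
    if buf:
        chunks.append(buf)
    return chunks
```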
Semantic Chunking
- Topic modeling: Using TF-IDF, embeddings similarity for topic detection
- Heading-aware splitting: Respecting document hierarchy (H1, H2, H3)
- Content-based boundaries: Detecting topic shifts using semantic similarity
- Pros: Maintains semantic coherence, respects document structure
- Cons: Complex implementation, computationally expensive
- Best for: Long-form content, technical manuals, research papers
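The boundary-detection idea can be sketched without any model: score adjacent sentences and cut where similarity drops. Word-overlap (Jaccard) similarity below is a stand-in so the example stays self-contained; production systems use embedding cosine similarity instead.

```python
import re

def jaccard(a, b):
    """Word-overlap similarity; a stand-in for embedding cosine similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def chunk_semantic(text, threshold=0.2):
    """Start a new chunk wherever adjacent-sentence similarity drops below threshold."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if jaccard(prev, sent) < threshold:  # topic shift detected
            chunks.append(" ".join(current))
            current = [sent]
        else:
            current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```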
Recursive Chunking
- Hierarchical approach: Try larger chunks first, recursively split if needed
- Multi-level splitting: Different strategies at different levels
- Size optimization: Minimize number of chunks while respecting size limits
- Pros: Optimal chunk utilization, preserves context when possible
- Cons: Complex logic, potential performance overhead
- Best for: Mixed content types, when chunk count optimization is important
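The recursive strategy can be sketched as: keep the text whole if it fits, otherwise split on the coarsest separator present and recurse into each piece. This simplified version drops separators and does not re-merge small pieces, which real implementations (e.g., LangChain's RecursiveCharacterTextSplitter) do:

```python
def chunk_recursive(text, max_chars=400, separators=("\n\n", "\n", ". ", " ")):
    """Recursively split on the coarsest separator until chunks fit."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    for sep in separators:
        if sep in text:
            chunks = []
            for piece in text.split(sep):
                chunks.extend(chunk_recursive(piece, max_chars, separators))
            return chunks
    # No separator left: hard-split as a last resort.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```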
Document-Aware Chunking
- File type detection: PDF pages, Word sections, HTML elements
- Metadata preservation: Headers, footers, page numbers, sections
- Table and image handling: Special processing for non-text elements
- Pros: Preserves document structure and metadata
- Cons: Format-specific implementation required
- Best for: Multi-format document collections, when metadata is important
2. Embedding Model Selection
Dimension Considerations
- 128-256 dimensions: Fast retrieval, lower memory usage, suitable for simple domains
- 512-768 dimensions: Balanced performance, good for most applications
- 1024-1536 dimensions: High quality, better for complex domains, higher cost
- 2048+ dimensions: Maximum quality, specialized use cases, significant resources
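The memory side of this tradeoff is simple arithmetic: raw vector storage scales linearly with dimension count, so quadrupling dimensions quadruples both memory and per-comparison compute. A back-of-envelope helper (before index overhead; HNSW graphs typically add a further 1.5-2x):

```python
def index_memory_gb(num_vectors, dims, bytes_per_float=4):
    """Raw float32 storage for a vector index, in GiB (excludes index overhead)."""
    return num_vectors * dims * bytes_per_float / 1024**3
```

For example, 1M vectors at 1536 dimensions need about 5.7 GiB of raw float32 storage, versus about 1.4 GiB at 384 dimensions.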
Speed vs Quality Tradeoffs
- Fast models: sentence-transformers/all-MiniLM-L6-v2 (384 dim, ~14k tokens/sec)
- Balanced models: sentence-transformers/all-mpnet-base-v2 (768 dim, ~2.8k tokens/sec)
- Quality models: text-embedding-ada-002 (1536 dim, OpenAI API)
- Specialized models: Domain-specific fine-tuned models
Model Categories
- General purpose: all-MiniLM, all-mpnet, Universal Sentence Encoder
- Code embeddings: CodeBERT, GraphCodeBERT, CodeT5
- Scientific text: SciBERT, BioBERT, ClinicalBERT
- Multilingual: LaBSE, multilingual-e5, paraphrase-multilingual
3. Vector Database Selection
Pinecone
- Managed service: Fully hosted, auto-scaling
- Features: Metadata filtering, hybrid search, real-time updates
- Pricing: $70/month for 1M vectors (1536 dim), pay-per-use scaling
- Best for: Production applications, when managed service is preferred
- Cons: Vendor lock-in, costs can scale quickly
Weaviate
- Open source: Self-hosted or cloud options available
- Features: GraphQL API, multi-modal search, automatic vectorization
- Scaling: Horizontal scaling, HNSW indexing
- Best for: Complex data types, when GraphQL API is preferred
- Cons: Learning curve, requires infrastructure management
Qdrant
- Rust-based: High performance, low memory footprint
- Features: Payload filtering, clustering, distributed deployment
- API: REST and gRPC interfaces
- Best for: High-performance requirements, resource-constrained environments
- Cons: Smaller community, fewer integrations
Chroma
- Embedded database: SQLite-based, easy local development
- Features: Collections, metadata filtering, persistence
- Scaling: Limited, suitable for prototyping and small deployments
- Best for: Development, testing, small-scale applications
- Cons: Not suitable for production scale
pgvector (PostgreSQL)
- SQL integration: Leverage existing PostgreSQL infrastructure
- Features: ACID compliance, joins with relational data, mature ecosystem
- Performance: ivfflat and HNSW indexing, parallel query processing
- Best for: When you already use PostgreSQL, need ACID compliance
- Cons: Requires PostgreSQL expertise, less specialized than purpose-built DBs
4. Retrieval Strategies
Dense Retrieval
- Semantic similarity: Using embedding cosine similarity
- Advantages: Captures semantic meaning, handles paraphrasing well
- Limitations: May miss exact keyword matches, requires good embeddings
- Implementation: Vector similarity search with k-NN or ANN algorithms
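Exact dense retrieval reduces to a cosine-similarity scan; ANN indexes (HNSW, IVF) approximate exactly this for large corpora. A stdlib-only sketch:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def dense_retrieve(query_vec, doc_vecs, k=3):
    """Exact k-NN: score every document vector, return the top-k doc ids."""
    scored = sorted(
        ((cosine(query_vec, v), doc_id) for doc_id, v in doc_vecs.items()),
        reverse=True,
    )
    return [doc_id for score, doc_id in scored[:k]]
```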
Sparse Retrieval
- Keyword-based: TF-IDF, BM25, Elasticsearch
- Advantages: Exact keyword matching, interpretable results
- Limitations: Misses semantic similarity, vulnerable to vocabulary mismatch
- Implementation: Inverted indexes, term frequency analysis
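BM25 itself is compact enough to sketch directly; the version below scans all documents rather than using an inverted index, which is fine for illustration but not at scale:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against a query with Okapi BM25."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    N = len(docs)
    avgdl = sum(len(t) for t in tokenized.values()) / N
    df = Counter()                       # document frequency per term
    for toks in tokenized.values():
        df.update(set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            )
        scores[doc_id] = score
    return scores
```

Note the vocabulary-mismatch limitation in action: without stemming, "cat" does not match a document containing only "cats".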
Hybrid Retrieval
- Combination approach: Dense + sparse retrieval with score fusion
- Fusion strategies: Reciprocal Rank Fusion (RRF), weighted combination
- Benefits: Combines semantic understanding with exact matching
- Complexity: Requires tuning fusion weights, more complex infrastructure
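Reciprocal Rank Fusion needs no score normalization, which is why it is a popular default: each document earns 1/(k + rank) from every list it appears in. A sketch (k=60 is the commonly used constant):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: score = sum of 1/(k + rank) across lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```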
Reranking
- Two-stage approach: Initial retrieval followed by reranking
- Reranking models: Cross-encoders, specialized reranking transformers
- Benefits: Higher precision, can use more sophisticated models for final ranking
- Tradeoff: Additional latency, computational cost
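The two-stage shape can be sketched independently of any model. Term overlap below is only a stand-in scorer so the example runs anywhere; a real reranker would score each (query, passage) pair jointly with a cross-encoder transformer:

```python
def rerank(query, candidates, k=3):
    """Second stage: rescore first-stage candidate passages, return top-k."""
    q_terms = set(query.lower().split())

    def overlap(passage):
        # Stand-in for a cross-encoder score on the (query, passage) pair.
        return len(q_terms & set(passage.lower().split())) / max(len(q_terms), 1)

    return sorted(candidates, key=overlap, reverse=True)[:k]
```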
5. Query Transformation Techniques
HyDE (Hypothetical Document Embeddings)
- Approach: Generate hypothetical answer, embed answer instead of query
- Benefits: Improves retrieval by matching document style rather than query style
- Implementation: Use LLM to generate hypothetical document, embed that
- Use cases: When queries and documents have different styles
Multi-Query Generation
- Approach: Generate multiple query variations, retrieve for each, merge results
- Benefits: Increases recall, handles query ambiguity
- Implementation: LLM generates 3-5 query variations, deduplicate results
- Considerations: Higher cost and latency due to multiple retrievals
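The merge-and-deduplicate step looks like this (the `retrieve` callable and the variant strings are placeholders; in practice the variants come from an LLM prompt such as "rewrite this question three ways"):

```python
def multi_query_retrieve(variants, retrieve, k=5):
    """Retrieve for each query variant; merge results, deduplicating by doc id
    while preserving first-seen order."""
    seen, merged = set(), []
    for query in variants:
        for doc_id in retrieve(query):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged[:k]
```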
Step-Back Prompting
- Approach: Generate broader, more general version of specific query
- Benefits: Retrieves more general context that helps answer specific questions
- Implementation: Transform "What is the capital of France?" to "What are European capitals?"
- Use cases: When specific questions need general context
6. Context Window Optimization
Dynamic Context Assembly
- Relevance-based ordering: Most relevant chunks first
- Diversity optimization: Avoid redundant information
- Token budget management: Fit within model context limits
- Hierarchical inclusion: Include summaries before detailed chunks
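Token budget management is typically a greedy pack over relevance-ranked chunks. A sketch using the rough chars/4 token heuristic (use the model's real tokenizer in practice; the chunk dict shape is illustrative):

```python
def assemble_context(chunks, budget_tokens=3000):
    """Greedily pack chunks in relevance order until the token budget is spent."""
    ranked = sorted(chunks, key=lambda c: c["score"], reverse=True)
    picked, used = [], 0
    for chunk in ranked:
        cost = len(chunk["text"]) // 4   # rough chars-per-token heuristic
        if used + cost > budget_tokens:
            continue                     # skip, but keep trying smaller chunks
        picked.append(chunk["text"])
        used += cost
    return "\n\n".join(picked)
```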
Context Compression
- Summarization: Compress less relevant chunks while preserving key information
- Key information extraction: Extract only relevant facts/entities
- Template-based compression: Use structured formats to reduce token usage
- Selective inclusion: Include only chunks above relevance threshold
7. Evaluation Frameworks
Faithfulness Metrics
- Definition: How well generated answers are grounded in retrieved context
- Measurement: Fact verification against source documents
- Implementation: NLI models to check entailment between answer and context
- Threshold: >90% for production systems
Relevance Metrics
- Context relevance: How relevant retrieved chunks are to the query
- Answer relevance: How well the answer addresses the original question
- Measurement: Embedding similarity, human evaluation, LLM-as-judge
- Targets: Context relevance >0.8, Answer relevance >0.85
Context Precision & Recall
- Precision@K: Percentage of top-K results that are relevant
- Recall@K: Percentage of relevant documents found in top-K results
- Mean Reciprocal Rank (MRR): Average of reciprocal ranks of first relevant result
- NDCG@K: Normalized Discounted Cumulative Gain at K
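These four metrics are short enough to implement directly (binary relevance assumed; `retrieved` is a ranked list of doc ids, `relevant` a set):

```python
import math

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents found in the top-k."""
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

def mrr(all_retrieved, all_relevant):
    """Mean of 1/rank of the first relevant result per query (0 if none)."""
    total = 0.0
    for retrieved, relevant in zip(all_retrieved, all_relevant):
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(all_retrieved)

def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance NDCG@K: discounted gain over the ideal ordering."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, d in enumerate(retrieved[:k]) if d in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0
```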
End-to-End Metrics
- RAGAS: Comprehensive RAG evaluation framework
- Correctness: Factual accuracy of generated answers
- Completeness: Coverage of all relevant aspects
Content truncated.