RAG Architect - POWERFUL

Install

mkdir -p .claude/skills/rag-architect && curl -L -o skill.zip "https://mcp.directory/api/skills/download/2127" && unzip -o skill.zip -d .claude/skills/rag-architect && rm skill.zip

Installs to .claude/skills/rag-architect

About this skill

Overview

The RAG (Retrieval-Augmented Generation) Architect skill provides comprehensive tools and knowledge for designing, implementing, and optimizing production-grade RAG pipelines. This skill covers the entire RAG ecosystem from document chunking strategies to evaluation frameworks, enabling you to build scalable, efficient, and accurate retrieval systems.

Core Competencies

1. Document Processing & Chunking Strategies

Fixed-Size Chunking

  • Character-based chunking: Simple splitting by character count (e.g., 512, 1024, 2048 chars)
  • Token-based chunking: Splitting by token count to respect model limits
  • Overlap strategies: 10-20% overlap to maintain context continuity
  • Pros: Predictable chunk sizes, simple implementation, consistent processing time
  • Cons: May break semantic units, context boundaries ignored
  • Best for: Uniform documents, when consistent chunk sizes are critical
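A minimal sketch of character-based fixed-size chunking with fractional overlap (the 512-character size and 15% overlap are arbitrary example values; a token-based variant would count tokens with a tokenizer such as tiktoken instead of characters):

```python
def chunk_fixed(text: str, size: int = 512, overlap: float = 0.15) -> list[str]:
    """Split text into fixed-size character chunks with fractional overlap."""
    step = max(1, int(size * (1 - overlap)))  # how far the window advances each chunk
    return [text[i:i + size] for i in range(0, len(text), step) if text[i:i + size]]
```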

Sentence-Based Chunking

  • Sentence boundary detection: Using NLTK, spaCy, or regex patterns
  • Sentence grouping: Combining sentences until size threshold is reached
  • Paragraph preservation: Avoiding mid-paragraph splits when possible
  • Pros: Preserves natural language boundaries, better readability
  • Cons: Variable chunk sizes, potential for very short/long chunks
  • Best for: Narrative text, articles, books
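The grouping step can be sketched with a naive regex splitter (an illustration only; production systems would use NLTK's punkt or spaCy for sentence boundary detection, as noted above):

```python
import re

def chunk_sentences(text: str, max_chars: int = 300) -> list[str]:
    """Group whole sentences into chunks no larger than max_chars."""
    # Naive boundary rule: split after ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)   # current group is full; start a new one
            current = s
        else:
            current = f"{current} {s}".strip() if current else s
    if current:
        chunks.append(current)
    return chunks
```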

Paragraph-Based Chunking

  • Paragraph detection: Double newlines, HTML tags, markdown formatting
  • Hierarchical splitting: Respecting document structure (sections, subsections)
  • Size balancing: Merging small paragraphs, splitting large ones
  • Pros: Preserves logical document structure, maintains topic coherence
  • Cons: Highly variable sizes, may create very large chunks
  • Best for: Structured documents, technical documentation

Semantic Chunking

  • Topic modeling: Using TF-IDF or embedding similarity for topic detection
  • Heading-aware splitting: Respecting document hierarchy (H1, H2, H3)
  • Content-based boundaries: Detecting topic shifts using semantic similarity
  • Pros: Maintains semantic coherence, respects document structure
  • Cons: Complex implementation, computationally expensive
  • Best for: Long-form content, technical manuals, research papers
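One way to detect content-based boundaries is to start a new chunk whenever the cosine similarity between consecutive sentence embeddings drops below a threshold. In this sketch the embedder is injected as a callable (in practice a sentence-transformers model); the 0.6 threshold is an arbitrary example value:

```python
import math

def semantic_chunks(sentences, embed, threshold=0.6):
    """Group sentences; a similarity drop between neighbors starts a new chunk."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    chunks, current = [], [sentences[0]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cos(embed(prev), embed(cur)) < threshold:  # topic shift detected
            chunks.append(" ".join(current))
            current = [cur]
        else:
            current.append(cur)
    chunks.append(" ".join(current))
    return chunks
```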

Recursive Chunking

  • Hierarchical approach: Try larger chunks first, recursively split if needed
  • Multi-level splitting: Different strategies at different levels
  • Size optimization: Minimize number of chunks while respecting size limits
  • Pros: Optimal chunk utilization, preserves context when possible
  • Cons: Complex logic, potential performance overhead
  • Best for: Mixed content types, when chunk count optimization is important
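The hierarchical idea can be sketched as follows: try the coarsest separator first, fall back to finer ones for oversized pieces, and hard-split only as a last resort (a simplified take on the recursive character splitters found in libraries like LangChain; separators and size are example values):

```python
def chunk_recursive(text, max_size=200, seps=("\n\n", "\n", ". ", " ")):
    """Recursively split on coarser separators first, finer ones if needed."""
    if len(text) <= max_size:
        return [text] if text.strip() else []
    for sep in seps:
        if sep not in text:
            continue
        out, buf = [], ""
        for piece in text.split(sep):
            candidate = f"{buf}{sep}{piece}" if buf else piece
            if len(candidate) <= max_size:
                buf = candidate               # merge piece into current chunk
            else:
                if buf:
                    out.append(buf)
                if len(piece) <= max_size:
                    buf = piece
                else:                         # piece itself too big: recurse finer
                    buf = ""
                    out.extend(chunk_recursive(piece, max_size, seps))
        if buf:
            out.append(buf)
        return out
    # no separator applies: hard character split
    return [text[i:i + max_size] for i in range(0, len(text), max_size)]
```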

Document-Aware Chunking

  • File type detection: PDF pages, Word sections, HTML elements
  • Metadata preservation: Headers, footers, page numbers, sections
  • Table and image handling: Special processing for non-text elements
  • Pros: Preserves document structure and metadata
  • Cons: Format-specific implementation required
  • Best for: Multi-format document collections, when metadata is important

2. Embedding Model Selection

Dimension Considerations

  • 128-256 dimensions: Fast retrieval, lower memory usage, suitable for simple domains
  • 512-768 dimensions: Balanced performance, good for most applications
  • 1024-1536 dimensions: High quality, better for complex domains, higher cost
  • 2048+ dimensions: Maximum quality, specialized use cases, significant resources

Speed vs Quality Tradeoffs

  • Fast models: sentence-transformers/all-MiniLM-L6-v2 (384 dim, ~14k tokens/sec)
  • Balanced models: sentence-transformers/all-mpnet-base-v2 (768 dim, ~2.8k tokens/sec)
  • Quality models: text-embedding-ada-002 (1536 dim, OpenAI API)
  • Specialized models: Domain-specific fine-tuned models

Model Categories

  • General purpose: all-MiniLM, all-mpnet, Universal Sentence Encoder
  • Code embeddings: CodeBERT, GraphCodeBERT, CodeT5
  • Scientific text: SciBERT, BioBERT, ClinicalBERT
  • Multilingual: LaBSE, multilingual-e5, paraphrase-multilingual

3. Vector Database Selection

Pinecone

  • Managed service: Fully hosted, auto-scaling
  • Features: Metadata filtering, hybrid search, real-time updates
  • Pricing: $70/month for 1M vectors (1536 dim), pay-per-use scaling
  • Best for: Production applications, when managed service is preferred
  • Cons: Vendor lock-in, costs can scale quickly

Weaviate

  • Open source: Self-hosted or cloud options available
  • Features: GraphQL API, multi-modal search, automatic vectorization
  • Scaling: Horizontal scaling, HNSW indexing
  • Best for: Complex data types, when GraphQL API is preferred
  • Cons: Learning curve, requires infrastructure management

Qdrant

  • Rust-based: High performance, low memory footprint
  • Features: Payload filtering, clustering, distributed deployment
  • API: REST and gRPC interfaces
  • Best for: High-performance requirements, resource-constrained environments
  • Cons: Smaller community, fewer integrations

Chroma

  • Embedded database: SQLite-based, easy local development
  • Features: Collections, metadata filtering, persistence
  • Scaling: Limited, suitable for prototyping and small deployments
  • Best for: Development, testing, small-scale applications
  • Cons: Not suitable for production scale

pgvector (PostgreSQL)

  • SQL integration: Leverage existing PostgreSQL infrastructure
  • Features: ACID compliance, joins with relational data, mature ecosystem
  • Performance: ivfflat and HNSW indexing, parallel query processing
  • Best for: When you already use PostgreSQL, need ACID compliance
  • Cons: Requires PostgreSQL expertise, less specialized than purpose-built DBs

4. Retrieval Strategies

Dense Retrieval

  • Semantic similarity: Using embedding cosine similarity
  • Advantages: Captures semantic meaning, handles paraphrasing well
  • Limitations: May miss exact keyword matches, requires good embeddings
  • Implementation: Vector similarity search with k-NN or ANN algorithms
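At its core, dense retrieval is a top-k scan by cosine similarity. This brute-force version is only a teaching sketch; at scale it is replaced by an ANN index (HNSW, IVF) inside a vector database:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def dense_retrieve(query_vec, doc_vecs, k=2):
    """Exact k-NN over a dict of {doc_id: embedding} by cosine similarity."""
    scored = sorted(
        ((cosine(query_vec, v), doc_id) for doc_id, v in doc_vecs.items()),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored[:k]]
```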

Sparse Retrieval

  • Keyword-based: TF-IDF, BM25, Elasticsearch
  • Advantages: Exact keyword matching, interpretable results
  • Limitations: Misses semantic similarity, vulnerable to vocabulary mismatch
  • Implementation: Inverted indexes, term frequency analysis
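A self-contained Okapi BM25 scorer over pre-tokenized documents makes the term-frequency mechanics concrete (a teaching sketch with standard k1/b defaults; Elasticsearch or a library like rank_bm25 would be used in practice):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc (a list of tokens) against query tokens with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))   # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            # term frequency saturation (k1) and length normalization (b)
            score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores
```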

Hybrid Retrieval

  • Combination approach: Dense + sparse retrieval with score fusion
  • Fusion strategies: Reciprocal Rank Fusion (RRF), weighted combination
  • Benefits: Combines semantic understanding with exact matching
  • Complexity: Requires tuning fusion weights, more complex infrastructure
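Reciprocal Rank Fusion needs no score calibration between the dense and sparse lists, which is why it is a common default. A minimal implementation:

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc ids with Reciprocal Rank Fusion.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in;
    k=60 is the constant suggested in the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Docs ranked highly by both retrievers rise to the top even when raw scores are on incompatible scales.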

Reranking

  • Two-stage approach: Initial retrieval followed by reranking
  • Reranking models: Cross-encoders, specialized reranking transformers
  • Benefits: Higher precision, can use more sophisticated models for final ranking
  • Tradeoff: Additional latency, computational cost
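The two-stage pattern separates a cheap first-stage scorer from an expensive reranker. Both scorers are injected here so the control flow stays visible; in practice the reranker would be a cross-encoder (e.g. a sentence-transformers CrossEncoder), not the toy callables used in this sketch:

```python
def retrieve_then_rerank(query, candidates, first_stage_score, rerank_score,
                         fetch_k=20, final_k=5):
    """Two-stage retrieval: cheap scorer shortlists, expensive scorer reranks."""
    shortlist = sorted(candidates, key=lambda d: first_stage_score(query, d),
                       reverse=True)[:fetch_k]
    return sorted(shortlist, key=lambda d: rerank_score(query, d),
                  reverse=True)[:final_k]
```

Fetching more candidates than you return (fetch_k > final_k) is what lets the reranker recover relevant docs the first stage ranked poorly.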

5. Query Transformation Techniques

HyDE (Hypothetical Document Embeddings)

  • Approach: Generate hypothetical answer, embed answer instead of query
  • Benefits: Improves retrieval by matching document style rather than query style
  • Implementation: Use LLM to generate hypothetical document, embed that
  • Use cases: When queries and documents have different styles

Multi-Query Generation

  • Approach: Generate multiple query variations, retrieve for each, merge results
  • Benefits: Increases recall, handles query ambiguity
  • Implementation: LLM generates 3-5 query variations, deduplicate results
  • Considerations: Higher cost and latency due to multiple retrievals
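The retrieve-and-merge step reduces to order-preserving deduplication. The variant generator is injected as a callable (it would normally call an LLM) so the merging logic itself stays testable:

```python
def multi_query_retrieve(query, generate_variants, retrieve, n_variants=3, k=5):
    """Retrieve once per query variant, then deduplicate preserving order."""
    seen, merged = set(), []
    for q in [query] + generate_variants(query, n_variants):
        for doc_id in retrieve(q, k):
            if doc_id not in seen:       # first occurrence wins
                seen.add(doc_id)
                merged.append(doc_id)
    return merged
```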

Step-Back Prompting

  • Approach: Generate broader, more general version of specific query
  • Benefits: Retrieves more general context that helps answer specific questions
  • Implementation: Transform "What is the capital of France?" to "What are European capitals?"
  • Use cases: When specific questions need general context

6. Context Window Optimization

Dynamic Context Assembly

  • Relevance-based ordering: Most relevant chunks first
  • Diversity optimization: Avoid redundant information
  • Token budget management: Fit within model context limits
  • Hierarchical inclusion: Include summaries before detailed chunks
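Token budget management is typically a greedy pack over the relevance-ordered chunks. The default token counter below is a crude whitespace count; a real tokenizer (e.g. tiktoken) would be swapped in for accurate budgets:

```python
def assemble_context(ranked_chunks, budget_tokens,
                     count_tokens=lambda s: len(s.split())):
    """Take chunks in relevance order until the token budget is spent."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > budget_tokens:
            continue   # this chunk doesn't fit; a smaller later one still might
        selected.append(chunk)
        used += cost
    return selected
```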

Context Compression

  • Summarization: Compress less relevant chunks while preserving key information
  • Key information extraction: Extract only relevant facts/entities
  • Template-based compression: Use structured formats to reduce token usage
  • Selective inclusion: Include only chunks above relevance threshold

7. Evaluation Frameworks

Faithfulness Metrics

  • Definition: How well generated answers are grounded in retrieved context
  • Measurement: Fact verification against source documents
  • Implementation: NLI models to check entailment between answer and context
  • Threshold: >90% for production systems

Relevance Metrics

  • Context relevance: How relevant retrieved chunks are to the query
  • Answer relevance: How well the answer addresses the original question
  • Measurement: Embedding similarity, human evaluation, LLM-as-judge
  • Targets: Context relevance >0.8, Answer relevance >0.85

Context Precision & Recall

  • Precision@K: Percentage of top-K results that are relevant
  • Recall@K: Percentage of relevant documents found in top-K results
  • Mean Reciprocal Rank (MRR): Average, over queries, of the reciprocal rank of the first relevant result
  • NDCG@K: Normalized Discounted Cumulative Gain at K
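The first three of these metrics are small enough to define inline (retrieved is an ordered list of doc ids, relevant a set of gold ids):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved docs that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant docs found in the top-k."""
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

def mrr(all_retrieved, all_relevant):
    """Mean Reciprocal Rank over a batch of queries."""
    total = 0.0
    for retrieved, relevant in zip(all_retrieved, all_relevant):
        for rank, d in enumerate(retrieved, start=1):
            if d in relevant:
                total += 1.0 / rank   # only the first relevant hit counts
                break
    return total / len(all_retrieved)
```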

End-to-End Metrics

  • RAGAS: Comprehensive RAG evaluation framework
  • Correctness: Factual accuracy of generated answers
  • Completeness: Coverage of all relevant aspects

Content truncated.
