semtools
This skill provides semantic search capabilities using embedding-based similarity matching for code and text. Enables meaning-based search beyond keyword matching, with optional document parsing (PDF, DOCX, PPTX) support.
Install
mkdir -p .claude/skills/semtools && curl -L -o skill.zip "https://mcp.directory/api/skills/download/8209" && unzip -o skill.zip -d .claude/skills/semtools && rm skill.zip
Installs to .claude/skills/semtools
About this skill
Semtools: Semantic Search
Perform semantic (meaning-based) search across code and documents using embedding-based similarity matching.
Purpose
The semtools skill provides access to Semtools, a high-performance Rust-based CLI for semantic search and document processing. Unlike traditional text search (ripgrep), which matches literal strings and regex patterns, or structural search (ast-grep), which matches syntax patterns, semtools matches on semantic meaning through embeddings.
Key capabilities:
- Semantic Search: Find code/text by meaning, not just keywords
- Workspace Management: Index large codebases for fast repeated searches
- Document Parsing: Convert PDFs, DOCX, PPTX to searchable text (requires API key)
Semtools excels at discovery - finding relevant code when you don't know the exact keywords, function names, or syntax patterns.
When to Use This Skill
Use the semtools skill when you need meaning-based search:
Semantic Code Discovery:
- Finding code that implements a concept ("error handling", "data validation")
- Discovering similar functionality across different modules
- Locating examples of a pattern when you don't know exact names
- Understanding what code does without reading everything
Documentation & Knowledge:
- Searching documentation by concept, not keywords
- Finding related discussions in comments or docs
- Discovering similar issues or solutions
- Analyzing technical documents (PDFs, reports)
Use Cases:
- "Find all authentication-related code" (without knowing function names)
- "Show me error handling patterns" (regardless of specific error types)
- "Find code similar to this implementation" (semantic similarity)
- "Search research papers for 'distributed consensus'" (document search)
Choose semtools over file-search (ripgrep/ast-grep) when:
- You know the concept but not the keywords
- Exact string matching misses relevant results
- You want semantically similar code, not exact matches
- Searching across languages or mixed content
Still use file-search when:
- You know exact keywords, function names, or patterns
- You need structural code matching (ast-grep)
- Speed is critical (ripgrep is faster for exact matches)
- You're searching for specific symbols or references
Available Commands
Semtools provides three CLI commands you can use via execute_command:
- search: Semantic search across code and text files
- workspace: Manage workspaces for caching embeddings
- parse: Convert documents (PDF, DOCX, PPTX) to searchable text
All commands work out-of-the-box in your execution environment. Document parsing requires the LLAMA_CLOUD_API_KEY environment variable to be set.
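Before relying on these commands in a script, it can help to confirm what is actually available. The sketch below is one way to do that; `semtools_preflight` is a hypothetical helper name, not part of Semtools, and it only checks the environment variable named above.

```shell
# Hypothetical helper: report which of the three commands are on PATH
# and whether the parse API key is present in the environment.
semtools_preflight() (
  for cmd in search workspace parse; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd: available"
    else
      echo "$cmd: missing"
    fi
  done
  if [ -n "${LLAMA_CLOUD_API_KEY:-}" ]; then
    echo "LLAMA_CLOUD_API_KEY: set"
  else
    echo "LLAMA_CLOUD_API_KEY: not set (only parse needs it)"
  fi
)

# Usage:
#   semtools_preflight
```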
Core Operations
1. Semantic Search (search)
Find files and code sections by semantic meaning:
# Basic semantic search
search "authentication logic" src/
# Search with more context (5 lines before/after)
search "error handling" --n-lines 5 src/
# Get more results (default: 3)
search "database queries" --top-k 10 src/
# Control the distance threshold (0.0-1.0, lower = stricter)
search "API endpoints" --max-distance 0.4 src/
Parameters:
- --n-lines N: Show N lines of context around matches (default: 3)
- --top-k K: Return top K most similar matches (default: 3)
- --max-distance D: Maximum embedding distance (0.0-1.0, default: 0.3)
- -i: Case-insensitive matching
Output format:
Match 1 (similarity: 0.12)
File: src/auth/handlers.py
Lines: 42-47
----
def authenticate_user(username: str, password: str) -> Optional[User]:
"""Authenticate user credentials against database."""
user = get_user_by_username(username)
if user and verify_password(password, user.password_hash):
return user
return None
----
Match 2 (similarity: 0.18)
File: src/middleware/auth.py
...
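If the output follows the layout shown above, the matched file paths can be pulled out for follow-up commands. This is a minimal sketch that assumes every match carries a line beginning with `File: `; `matched_files` is a hypothetical helper, and the real output of your Semtools version may differ.

```shell
# Hypothetical helper: read search output on stdin and print each
# matched file path once, in order of first appearance.
# Assumes the "File: <path>" line format from the sample output.
matched_files() {
  awk '/^File: / { path = substr($0, 7); if (!seen[path]++) print path }'
}

# Usage:
#   search "authentication logic" src/ | matched_files
```

Deduplicating keeps the list short when several matches land in the same file.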
2. Workspace Management (workspace)
For large codebases, create workspaces to cache embeddings and enable fast repeated searches:
# Create/activate workspace
workspace use my-project
# Set workspace via environment variable
export SEMTOOLS_WORKSPACE=my-project
# Index files in workspace (workspace auto-detected from env var)
search "query" src/
# Check workspace status
workspace status
# Clean up old workspaces
workspace prune
Benefits:
- Fast repeated searches: Embeddings cached, no re-computation
- Large codebases: IVF_PQ indexing for scalability
- Session persistence: Maintain context across multiple searches
When to use workspaces:
- Searching the same codebase multiple times
- Very large projects (1000+ files)
- Interactive exploration sessions
- CI/CD pipelines with repeated searches
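One way to keep workspaces separate per project is to derive the name from the project directory. The sketch below assumes that convention; `workspace_name_for` is a hypothetical helper, and replacing unusual characters with underscores is a precaution rather than a documented Semtools requirement.

```shell
# Hypothetical helper: turn a project directory into a workspace name.
# Characters outside letters, digits, "_" and "-" become "_".
workspace_name_for() {
  basename "$(cd "$1" && pwd)" | tr -c 'A-Za-z0-9_\n-' '_'
}

# Usage:
#   export SEMTOOLS_WORKSPACE="$(workspace_name_for .)"
#   workspace use "$SEMTOOLS_WORKSPACE"
```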
3. Document Parsing (parse) ⚠️ Requires API Key
Convert documents to searchable markdown (requires LlamaParse API key):
# Parse PDFs to markdown
parse research_papers/*.pdf
# Parse Word documents
parse reports/*.docx
# Parse presentations
parse slides/*.pptx
# Parse and pipe to search
parse docs/*.pdf | xargs search "neural networks"
Supported formats:
- PDF (.pdf)
- Word (.docx)
- PowerPoint (.pptx)
Configuration:
# Via environment variable
export LLAMA_CLOUD_API_KEY="llx-..."
# Via config file
cat > ~/.parse_config.json << EOF
{
"api_key": "llx-...",
"max_concurrent_requests": 10,
"timeout_seconds": 3600
}
EOF
Important: Document parsing is optional. Semantic search works without it.
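Because the config file holds an API key, it may be worth creating it with owner-only permissions. The sketch below writes the same three fields shown above; `write_parse_config` is a hypothetical helper, and the default values are simply copied from the example.

```shell
# Hypothetical helper: write the parse config with owner-only permissions.
# Usage: write_parse_config API_KEY [CONFIG_PATH]
write_parse_config() (
  config_path="${2:-$HOME/.parse_config.json}"
  umask 077   # new files are created readable by the owner only
  cat > "$config_path" <<EOF
{
  "api_key": "$1",
  "max_concurrent_requests": 10,
  "timeout_seconds": 3600
}
EOF
  chmod 600 "$config_path"   # also tighten a file that already existed
)

# Usage:
#   write_parse_config "llx-..."
```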
Workflow Patterns
Pattern 1: Concept Discovery
When you know what you're looking for conceptually but not by name:
# Step 1: Broad semantic search
search "rate limiting implementation" src/
# Step 2: Review results, refine query
search "throttle requests per user" src/ --top-k 10
# Step 3: Use ripgrep for exact follow-up
rg "RateLimiter" --type py src/
Pattern 2: Similar Code Finder
When you want to find code similar to a reference implementation:
# Step 1: Extract key concepts from reference code
# [Read example_auth.py and identify key concepts]
# Step 2: Search for similar implementations
search "user authentication with JWT tokens" src/
# Step 3: Compare implementations
# [Review semantic matches to find similar approaches]
Pattern 3: Documentation Search
When researching concepts in documentation or comments:
# Search code comments semantically
search "thread safety guarantees" src/ --n-lines 10
# Search markdown documentation
search "deployment best practices" docs/
# Combined search across code and docs
search "performance optimization" --top-k 20 src/ docs/
Pattern 4: Cross-Language Search
When searching for concepts across different languages:
# Semantic search works across languages
search "connection pooling" src/
# May find:
# - Java: "ConnectionPool manager"
# - Python: "database connection reuse"
# - Go: "pool of persistent connections"
# All semantically related despite different terminology
Pattern 5: Document Analysis (with API key)
When analyzing PDFs or documents:
# Step 1: Parse documents and collect the converted markdown
parse research/*.pdf | xargs cat > papers.md
# Step 2: Search converted content
search "transformer architecture" papers.md
# Step 3: Combine with code search
search "attention mechanism implementation" src/
Integration with file-search
Semtools and file-search (ripgrep/ast-grep) are complementary tools. Use them together for comprehensive search:
Search Strategy Matrix
| You Know | Use First | Then Use | Why |
|---|---|---|---|
| Exact keywords | ripgrep | search | Fast exact match, then find similar |
| Concept only | search | ripgrep | Find relevant code, then search specifics |
| Function name | ripgrep | search | Find definition, then find similar usage |
| Code pattern | ast-grep | search | Find structure, then find similar logic |
| Approximate idea | search | ripgrep + ast-grep | Discover, then drill down |
Layered Search Approach
# Layer 1: Semantic discovery (what's related?)
search "user session management" --top-k 10
# Layer 2: Exact text search (what's the implementation?)
rg "SessionManager|session_store" --type py
# Layer 3: Structural search (how is it used?)
sg --pattern 'session.$METHOD($$$)' --lang python
# Layer 4: Reference tracking (where is it called?)
# [Use serena skill for symbol-level tracking]
Best Practices
1. Start Broad, Then Narrow
Use semantic search for discovery, then narrow with exact search:
# GOOD: Broad semantic discovery first
search "authentication" src/ --top-k 10
# [Review results to learn terminology]
rg "authenticate|verify_credentials" --type py src/
# AVOID: Starting too narrow and missing variations
rg "authenticate" --type py # Misses "verify_credentials", "check_auth", etc.
2. Adjust Similarity Threshold
Tune --max-distance based on results:
# Too many irrelevant results? Decrease distance (more strict)
search "query" --max-distance 0.2
# Missing relevant results? Increase distance (more lenient)
search "query" --max-distance 0.5
# Default (0.3) works well for most cases
search "query"
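The tuning loop above can be automated by starting strict and loosening the threshold until something is returned. This is a sketch under one assumption: that `search` prints nothing when no match falls within the threshold. `search_widening` and the list of thresholds are illustrative choices, not part of Semtools.

```shell
# Hypothetical helper: try progressively looser thresholds and stop at
# the first one that produces output.
# Usage: search_widening QUERY PATH...
search_widening() (
  query="$1"; shift
  for distance in 0.2 0.3 0.4 0.5; do
    results="$(search "$query" --max-distance "$distance" "$@")"
    if [ -n "$results" ]; then
      echo "# matched at --max-distance $distance"
      printf '%s\n' "$results"
      exit 0   # leaves only this function's subshell
    fi
  done
  echo "# no matches up to --max-distance 0.5" >&2
  exit 1
)

# Usage:
#   search_widening "retry logic" src/
```

Starting strict means the first results shown are the closest ones, and the header line records which threshold was needed.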
3. Use Workspaces for Repeated Searches
For interactive exploration, always use workspaces:
# GOOD: Create workspace once, search many times
export SEMTOOLS_WORKSPACE=my-analysis
search "concept1" src/
search "concept2" src/
search "concept3" src/
# INEFFICIENT: Re-compute embeddings every time
search "concept1" src/
search "concept2" src/
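For a longer list of concepts, the same idea can be wrapped in a loop. The sketch below runs every query from standard input inside one workspace; `search_batch` is a hypothetical helper, and it relies only on the `SEMTOOLS_WORKSPACE` variable shown above.

```shell
# Hypothetical helper: run one search per input line in a single workspace.
# Usage: search_batch WORKSPACE PATH... < queries.txt
search_batch() (
  SEMTOOLS_WORKSPACE="$1"; export SEMTOOLS_WORKSPACE; shift
  while IFS= read -r query; do
    [ -n "$query" ] || continue        # skip blank lines
    echo "== $query =="
    search "$query" "$@" </dev/null    # keep search from consuming the query list
  done
)

# Usage:
#   printf '%s\n' "concept1" "concept2" "concept3" | search_batch my-analysis src/
```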
4. Combine with Context Tools
Get more context around semantic matches:
# Find semantically similar code
search "retry logic" src/ --n-lines 2
# Get more context with ripgrep
rg -C 10 "retry" src/specific_file.py
# Or read the full file
cat src/specific_file.py
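When a match reports a line range (such as `Lines: 42-47` in the sample output), a middle ground between a few context lines and the whole file is to print that range with padding. `show_context` below is a hypothetical helper built on `sed`, not a Semtools command.

```shell
# Hypothetical helper: print lines START..END of FILE with PAD extra
# lines on each side (default 10), clamped at the top of the file.
# Usage: show_context FILE START END [PAD]
show_context() (
  file="$1"; start="$2"; end="$3"; pad="${4:-10}"
  first=$((start - pad))
  if [ "$first" -lt 1 ]; then
    first=1
  fi
  sed -n "${first},$((end + pad))p" "$file"
)

# Usage:
#   show_context src/auth/handlers.py 42 47
```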
5. Phrase Queries Conceptually
Write queries as concepts, not exact keywords:
# GOOD: Conceptual queries
search "handling network timeouts"
search "user input validation"
search
---
*Content truncated.*