semtools

This skill provides semantic search over code and text using embedding-based similarity matching. It enables meaning-based search beyond keyword matching, with optional document parsing support (PDF, DOCX, PPTX).

Install

mkdir -p .claude/skills/semtools && curl -L -o skill.zip "https://mcp.directory/api/skills/download/8209" && unzip -o skill.zip -d .claude/skills/semtools && rm skill.zip

Installs to .claude/skills/semtools

About this skill

Semtools: Semantic Search

Perform semantic (meaning-based) search across code and documents using embedding-based similarity matching.

Purpose

The semtools skill provides access to Semtools, a high-performance Rust-based CLI for semantic search and document processing. Traditional text search (ripgrep) matches literal strings or regular expressions. Structural search (ast-grep) matches syntax patterns. Semtools, by contrast, matches by semantic meaning using embeddings.
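The idea behind embedding search can be sketched in a few lines. This is not semtools' implementation, because its embedding model and distance metric are not described here. The sketch assumes cosine distance and uses hand-written three-dimensional toy vectors in place of real model output.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0.0 means same direction, larger means less related."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy "embeddings" with made-up values; a real model would produce these.
chunks = {
    "def authenticate_user(...)": [0.9, 0.1, 0.0],
    "def verify_credentials(...)": [0.8, 0.2, 0.1],
    "def render_chart(...)": [0.0, 0.1, 0.9],
}
query = [1.0, 0.1, 0.0]  # stands in for the query "authentication logic"

# Rank chunks by distance to the query, closest (most related) first.
ranked = sorted(chunks, key=lambda name: cosine_distance(query, chunks[name]))
for name in ranked:
    print(f"{cosine_distance(query, chunks[name]):.3f}  {name}")
```

Here verify_credentials ranks next to authenticate_user even though it shares no keyword with the query. Its (made-up) vector simply points in a similar direction, which is the property keyword search cannot capture.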

Key capabilities:

  1. Semantic Search: Find code/text by meaning, not just keywords
  2. Workspace Management: Index large codebases for fast repeated searches
  3. Document Parsing: Convert PDFs, DOCX, PPTX to searchable text (requires API key)

Semtools excels at discovery: finding relevant code when you don't know the exact keywords, function names, or syntax patterns.

When to Use This Skill

Use the semtools skill when you need meaning-based search:

Semantic Code Discovery:

  • Finding code that implements a concept ("error handling", "data validation")
  • Discovering similar functionality across different modules
  • Locating examples of a pattern when you don't know exact names
  • Understanding what code does without reading everything

Documentation & Knowledge:

  • Searching documentation by concept, not keywords
  • Finding related discussions in comments or docs
  • Discovering similar issues or solutions
  • Analyzing technical documents (PDFs, reports)

Use Cases:

  • "Find all authentication-related code" (without knowing function names)
  • "Show me error handling patterns" (regardless of specific error types)
  • "Find code similar to this implementation" (semantic similarity)
  • "Search research papers for 'distributed consensus'" (document search)

Choose semtools over file-search (ripgrep/ast-grep) when:

  • You know the concept but not the keywords
  • Exact string matching misses relevant results
  • You want semantically similar code, not exact matches
  • Searching across languages or mixed content

Still use file-search when:

  • You know exact keywords, function names, or patterns
  • You need structural code matching (ast-grep)
  • Speed is critical (ripgrep is faster for exact matches)
  • You're searching for specific symbols or references

Available Commands

Semtools provides three CLI commands you can use via execute_command:

  • search - Semantic search across code and text files
  • workspace - Manage workspaces for caching embeddings
  • parse - Convert documents (PDF, DOCX, PPTX) to searchable text

All commands work out-of-the-box in your execution environment. Document parsing requires the LLAMA_CLOUD_API_KEY environment variable to be set.

Core Operations

1. Semantic Search (search)

Find files and code sections by semantic meaning:

# Basic semantic search
search "authentication logic" src/

# Search with more context (5 lines before/after)
search "error handling" --n-lines 5 src/

# Get more results (default: 3)
search "database queries" --top-k 10 src/

# Return only matches within a distance threshold (0.0-1.0; lower = stricter, higher = more lenient)
search "API endpoints" --max-distance 0.4 src/

Parameters:

  • --n-lines N: Show N lines of context around matches (default: 3)
  • --top-k K: Return top K most similar matches (default: 3)
  • --max-distance D: Only return matches whose embedding distance is at most D; lower values are stricter (0.0-1.0, default: 0.3)
  • -i: Case-insensitive matching
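The text above does not say how --top-k and --max-distance interact inside semtools. This sketch therefore treats them as two independent filters over hypothetical (file, distance) pairs, where a smaller distance means a closer match. The helper names top_k and within_distance are illustrative only.

```python
# Hypothetical (file, distance) pairs returned for one query.
matches = [
    ("src/auth/handlers.py", 0.12),
    ("src/middleware/auth.py", 0.18),
    ("src/utils/cache.py", 0.35),
    ("src/api/routes.py", 0.27),
]


def top_k(matches: list[tuple[str, float]], k: int) -> list[tuple[str, float]]:
    """Keep the k matches with the smallest distance (closest in meaning)."""
    return sorted(matches, key=lambda m: m[1])[:k]


def within_distance(matches: list[tuple[str, float]], max_distance: float) -> list[tuple[str, float]]:
    """Keep every match at or below the distance threshold, closest first."""
    return sorted((m for m in matches if m[1] <= max_distance), key=lambda m: m[1])


print(top_k(matches, 2))
print(within_distance(matches, 0.3))
```

A top-k limit fixes the number of results regardless of how good they are. A threshold lets the count vary with how many matches are genuinely close.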

Output format:

Match 1 (distance: 0.12)
File: src/auth/handlers.py
Lines: 42-47
----
def authenticate_user(username: str, password: str) -> Optional[User]:
    """Authenticate user credentials against database."""
    user = get_user_by_username(username)
    if user and verify_password(password, user.password_hash):
        return user
    return None
----

Match 2 (distance: 0.18)
File: src/middleware/auth.py
...
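The Lines: range in each match combines the matched text with its surrounding context. This sketch assumes --n-lines widens a 1-based, inclusive match span by N lines on each side and stops at the file boundaries. That is an assumption, since the exact rule is not stated here, and context_window is a hypothetical helper.

```python
def context_window(start: int, end: int, n_lines: int, total_lines: int) -> tuple[int, int]:
    """Expand a 1-based inclusive match span by n_lines on each side, clamped to the file."""
    return max(1, start - n_lines), min(total_lines, end + n_lines)


print(context_window(44, 45, 3, 200))    # match in the middle of a file
print(context_window(2, 2, 3, 200))      # clamped at the first line
print(context_window(198, 199, 3, 200))  # clamped at the last line
```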

2. Workspace Management (workspace)

For large codebases, create workspaces to cache embeddings and enable fast repeated searches:

# Create/activate workspace
workspace use my-project

# Set workspace via environment variable
export SEMTOOLS_WORKSPACE=my-project

# Search as usual; embeddings are cached in the active workspace (auto-detected from the env var)
search "query" src/

# Check workspace status
workspace status

# Clean up old workspaces
workspace prune

Benefits:

  • Fast repeated searches: Embeddings cached, no re-computation
  • Large codebases: IVF_PQ indexing for scalability
  • Session persistence: Maintain context across multiple searches
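Workspaces cache embeddings so that repeated searches skip re-computation. How semtools actually stores them beyond the IVF_PQ index mentioned above is not shown here. A minimal in-memory sketch of the caching idea follows. The EmbeddingCache class and fake_embed function are hypothetical, and the cache is keyed by a SHA-256 hash of each text.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """Hypothetical cache: embed each distinct text once, then reuse the vector."""

    def __init__(self, embed: Callable[[str], list[float]]):
        self._embed = embed
        self._store: dict[str, list[float]] = {}

    def get(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:  # compute only on a cache miss
            self._store[key] = self._embed(text)
        return self._store[key]


calls: list[str] = []


def fake_embed(text: str) -> list[float]:
    """Stand-in for a real embedding model; records every call it receives."""
    calls.append(text)
    return [float(len(text)), float(text.count(" "))]


cache = EmbeddingCache(fake_embed)
files = ["def login(user): ...", "def retry(fn): ..."]
for _query in ["concept1", "concept2", "concept3"]:
    # Each search needs the file embeddings; only the first pass computes them.
    vectors = [cache.get(text) for text in files]
print(len(calls))
```

Three searches over two files trigger only two embedding calls. Without the cache there would be six.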

When to use workspaces:

  • Searching the same codebase multiple times
  • Very large projects (1000+ files)
  • Interactive exploration sessions
  • CI/CD pipelines with repeated searches

3. Document Parsing (parse) ⚠️ Requires API Key

Convert documents to searchable markdown (requires LlamaParse API key):

# Parse PDFs to markdown
parse research_papers/*.pdf

# Parse Word documents
parse reports/*.docx

# Parse presentations
parse slides/*.pptx

# Parse and pipe to search
parse docs/*.pdf | xargs search "neural networks"
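The xargs pipe above only works if parse writes the paths of the converted files to standard output. xargs then appends those paths as extra arguments to search. This sketch builds the command line the pipe would run, using a hypothetical search_argv helper and made-up paths.

```python
import shlex


def search_argv(query: str, parse_stdout: str) -> list[str]:
    """Mimic `parse ... | xargs search QUERY`: append each whitespace-separated path."""
    return ["search", query, *parse_stdout.split()]


# Hypothetical parse output: one converted-markdown path per line.
parse_stdout = "out/paper1.md\nout/paper2.md\n"
argv = search_argv("neural networks", parse_stdout)
print(shlex.join(argv))
```

Real xargs also honors quotes and may split a very long path list across several search invocations. Plain str.split() ignores both behaviors.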

Supported formats:

  • PDF (.pdf)
  • Word (.docx)
  • PowerPoint (.pptx)

Configuration:

# Via environment variable
export LLAMA_CLOUD_API_KEY="llx-..."

# Via config file
cat > ~/.parse_config.json << EOF
{
  "api_key": "llx-...",
  "max_concurrent_requests": 10,
  "timeout_seconds": 3600
}
EOF
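Before running parse, it can help to confirm that a key is configured through either route shown above. The following is a minimal sketch with a hypothetical find_api_key helper. The text does not say which source semtools prefers when both are set, so the environment-variable-first order here is an assumption.

```python
import json
import os
from pathlib import Path
from typing import Optional


def find_api_key(config_path: Path) -> Optional[str]:
    """Return a key from LLAMA_CLOUD_API_KEY or the config file's "api_key", else None.

    Checking the environment variable first is an assumption of this sketch,
    not documented semtools behavior.
    """
    key = os.environ.get("LLAMA_CLOUD_API_KEY")
    if key:
        return key
    if config_path.is_file():
        return json.loads(config_path.read_text()).get("api_key")
    return None
```

For example, find_api_key(Path.home() / ".parse_config.json") returns None when neither source provides a key.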

Important: Document parsing is optional. Semantic search works without it.

Workflow Patterns

Pattern 1: Concept Discovery

When you know what you're looking for conceptually but not by name:

# Step 1: Broad semantic search
search "rate limiting implementation" src/

# Step 2: Review results, refine query
search "throttle requests per user" src/ --top-k 10

# Step 3: Use ripgrep for exact follow-up
rg "RateLimiter" --type py src/

Pattern 2: Similar Code Finder

When you want to find code similar to a reference implementation:

# Step 1: Extract key concepts from reference code
# [Read example_auth.py and identify key concepts]

# Step 2: Search for similar implementations
search "user authentication with JWT tokens" src/

# Step 3: Compare implementations
# [Review semantic matches to find similar approaches]

Pattern 3: Documentation Search

When researching concepts in documentation or comments:

# Search code comments semantically
search "thread safety guarantees" src/ --n-lines 10

# Search markdown documentation
search "deployment best practices" docs/

# Combined search over code and docs
search "performance optimization" src/ docs/ --top-k 20

Pattern 4: Cross-Language Search

When searching for concepts across different languages:

# Semantic search works across languages
search "connection pooling" src/

# May find:
# - Java: "ConnectionPool manager"
# - Python: "database connection reuse"
# - Go: "pool of persistent connections"
# All semantically related despite different terminology

Pattern 5: Document Analysis (with API key)

When analyzing PDFs or documents:

# Step 1: Parse documents to markdown (parse prints the paths of the converted files)
parse research/*.pdf > parsed_paths.txt

# Step 2: Search the converted content
xargs search "transformer architecture" < parsed_paths.txt

# Step 3: Combine with code search
search "attention mechanism implementation" src/

Integration with file-search

Semtools and file-search (ripgrep/ast-grep) are complementary tools. Use them together for comprehensive search:

Search Strategy Matrix

| You know | Use first | Then use | Why |
|---|---|---|---|
| Exact keywords | ripgrep | search | Fast exact match, then find similar |
| Concept only | search | ripgrep | Find relevant code, then search specifics |
| Function name | ripgrep | search | Find definition, then find similar usage |
| Code pattern | ast-grep | search | Find structure, then find similar logic |
| Approximate idea | search | ripgrep + ast-grep | Discover, then drill down |

Layered Search Approach

# Layer 1: Semantic discovery (what's related?)
search "user session management" --top-k 10

# Layer 2: Exact text search (what's the implementation?)
rg "SessionManager|session_store" --type py

# Layer 3: Structural search (how is it used?)
sg --pattern 'session.$METHOD($$$)' --lang python

# Layer 4: Reference tracking (where is it called?)
# [Use serena skill for symbol-level tracking]
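Layers 1 and 2 can be mimicked in miniature with made-up snippets. The semantic results reveal the vocabulary (SessionManager, session_store), and an exact pattern built from that vocabulary then pinpoints the implementation. For brevity the sketch applies the pattern only to the semantic hits; rg would scan the whole tree.

```python
import re

# Hypothetical Layer 1 output: (file, snippet) pairs from a semantic search.
semantic_hits = [
    ("src/session/manager.py", "class SessionManager:\n    def expire(self): ..."),
    ("src/web/app.py", "app.session_store = RedisStore()"),
    ("src/auth/tokens.py", "def refresh_token(user): ..."),
]

# Layer 2: an exact pattern written after reading the Layer 1 results.
exact = re.compile(r"SessionManager|session_store")

followups = [path for path, snippet in semantic_hits if exact.search(snippet)]
print(followups)
```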

Best Practices

1. Start Broad, Then Narrow

Use semantic search for discovery, then narrow with exact search:

# GOOD: Broad semantic discovery first
search "authentication" src/ --top-k 10
# [Review results to learn terminology]
rg "authenticate|verify_credentials" --type py src/

# AVOID: Starting too narrow and missing variations
rg "authenticate" --type py  # Misses "verify_credentials", "check_auth", etc.
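The gap between the two rg commands is easy to see on a hypothetical list of function names. Python's re module stands in for rg here.

```python
import re

# Hypothetical function names found in a codebase.
names = ["authenticate_user", "verify_credentials", "check_auth", "render_chart"]

narrow = [n for n in names if re.search(r"authenticate", n)]
broad = [n for n in names if re.search(r"authenticate|verify_credentials|check_auth", n)]
print(narrow)  # ['authenticate_user']
print(broad)   # ['authenticate_user', 'verify_credentials', 'check_auth']
```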

2. Adjust Similarity Threshold

Tune --max-distance based on results:

# Too many irrelevant results? Decrease distance (more strict)
search "query" --max-distance 0.2

# Missing relevant results? Increase distance (more lenient)
search "query" --max-distance 0.5

# Default (0.3) works well for most cases
search "query"
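The effect of the threshold shows up when you count how many hypothetical candidate distances survive each of the three settings above:

```python
# Hypothetical distances for one query's candidate matches (lower = closer).
distances = [0.12, 0.18, 0.27, 0.35, 0.48, 0.61]

# Count the candidates each --max-distance setting would keep.
counts = {t: sum(d <= t for d in distances) for t in (0.2, 0.3, 0.5)}
for threshold, kept in counts.items():
    print(f"--max-distance {threshold}: {kept} matches")
```

Each step up admits more matches, including weaker ones. This is the precision/recall trade-off being tuned.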

3. Use Workspaces for Repeated Searches

For interactive exploration, always use workspaces:

# GOOD: Create workspace once, search many times
export SEMTOOLS_WORKSPACE=my-analysis
search "concept1" src/
search "concept2" src/
search "concept3" src/

# INEFFICIENT: Re-compute embeddings every time
search "concept1" src/
search "concept2" src/

4. Combine with Context Tools

Get more context around semantic matches:

# Find semantically similar code
search "retry logic" src/ --n-lines 2

# Get more context with ripgrep
rg -C 10 "retry" src/specific_file.py

# Or read the full file
cat src/specific_file.py

5. Phrase Queries Conceptually

Write queries as concepts, not exact keywords:

# GOOD: Conceptual queries
search "handling network timeouts"
search "user input validation"
search

---

*Content truncated.*

serena

massgen

This skill provides symbol-level code understanding and navigation using Language Server Protocol (LSP). Enables IDE-like capabilities for finding symbols, tracking references, and making precise code edits at the symbol level.

evolving-skill-creator

massgen

Guide for creating evolving skills - detailed workflow plans that capture what you'll do, what tools you'll create, and learnings from execution. Use this when starting a new task that could benefit from a reusable workflow.

massgen-develops-massgen

massgen

Guide for using MassGen to develop and improve itself. This skill should be used when agents need to run MassGen experiments programmatically (using automation mode) OR analyze terminal UI/UX quality (using visual evaluation tools). These are mutually exclusive workflows for different improvement goals.

file-search

massgen

This skill should be used when agents need to search codebases for text patterns or structural code patterns. Provides fast search using ripgrep for text and ast-grep for syntax-aware code search.

massgen-log-analyzer

massgen

Run MassGen experiments and analyze logs using automation mode, logfire tracing, and SQL queries. Use this skill for performance analysis, debugging agent behavior, evaluating coordination patterns, and improving the logging structure, or whenever an ANALYSIS_REPORT.md is needed in a log directory.

massgen-release-documenter

massgen

Guide for following MassGen's release documentation workflow. This skill should be used when preparing release documentation, updating changelogs, writing case studies, or maintaining project documentation across releases.

You might also like

flutter-development

aj-geddes

Build beautiful cross-platform mobile apps with Flutter and Dart. Covers widgets, state management with Provider/BLoC, navigation, API integration, and material design.

drawio-diagrams-enhanced

jgtolentino

Create professional draw.io (diagrams.net) diagrams in XML format (.drawio files) with integrated PMP/PMBOK methodologies, extensive visual asset libraries, and industry-standard professional templates. Use this skill when users ask to create flowcharts, swimlane diagrams, cross-functional flowcharts, org charts, network diagrams, UML diagrams, BPMN, project management diagrams (WBS, Gantt, PERT, RACI), risk matrices, stakeholder maps, or any other visual diagram in draw.io format. This skill includes access to custom shape libraries for icons, clipart, and professional symbols.

ui-ux-pro-max

nextlevelbuilder

"UI/UX design intelligence. 50 styles, 21 palettes, 50 font pairings, 20 charts, 8 stacks (React, Next.js, Vue, Svelte, SwiftUI, React Native, Flutter, Tailwind). Actions: plan, build, create, design, implement, review, fix, improve, optimize, enhance, refactor, check UI/UX code. Projects: website, landing page, dashboard, admin panel, e-commerce, SaaS, portfolio, blog, mobile app, .html, .tsx, .vue, .svelte. Elements: button, modal, navbar, sidebar, card, table, form, chart. Styles: glassmorphism, claymorphism, minimalism, brutalism, neumorphism, bento grid, dark mode, responsive, skeuomorphism, flat design. Topics: color palette, accessibility, animation, layout, typography, font pairing, spacing, hover, shadow, gradient."

godot

bfollington

This skill should be used when working on Godot Engine projects. It provides specialized knowledge of Godot's file formats (.gd, .tscn, .tres), architecture patterns (component-based, signal-driven, resource-based), common pitfalls, validation tools, code templates, and CLI workflows. The `godot` command is available for running the game, validating scripts, importing resources, and exporting builds. Use this skill for tasks involving Godot game development, debugging scene/resource files, implementing game systems, or creating new Godot components.

nano-banana-pro

garg-aayush

Generate and edit images using Google's Nano Banana Pro (Gemini 3 Pro Image) API. Use when the user asks to generate, create, edit, modify, change, alter, or update images. Also use when user references an existing image file and asks to modify it in any way (e.g., "modify this image", "change the background", "replace X with Y"). Supports both text-to-image generation and image-to-image editing with configurable resolution (1K default, 2K, or 4K for high resolution). DO NOT read the image file first - use this skill directly with the --input-image parameter.

fastapi-templates

wshobson

Create production-ready FastAPI projects with async patterns, dependency injection, and comprehensive error handling. Use when building new FastAPI applications or setting up backend API projects.
