semtools

Name: semtools
Author: massgen

This skill provides semantic search capabilities using embedding-based similarity matching for code and text. Enables meaning-based search beyond keyword matching, with optional document parsing (PDF, DOCX, PPTX) support.

Install

mkdir -p .claude/skills/semtools && curl -L -o skill.zip "https://mcp.directory/api/skills/download/8209" && unzip -o skill.zip -d .claude/skills/semtools && rm skill.zip

Installs to .claude/skills/semtools

About this skill

Semtools: Semantic Search

Perform semantic (meaning-based) search across code and documents using embedding-based similarity matching.

Purpose

The semtools skill provides access to Semtools, a high-performance Rust-based CLI for semantic search and document processing. Text search (ripgrep) matches literal strings and regular expressions, and structural search (ast-grep) matches syntax patterns; semtools instead matches on semantic meaning through embeddings.

Key capabilities:

Semantic Search: Find code/text by meaning, not just keywords
Workspace Management: Index large codebases for fast repeated searches
Document Parsing: Convert PDFs, DOCX, PPTX to searchable text (requires API key)

Semtools excels at discovery - finding relevant code when you don't know the exact keywords, function names, or syntax patterns.
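As a rough illustration of what meaning-based matching implies, the sketch below compares hand-made toy vectors with a cosine distance. The vectors, their three dimensions, and the choice of cosine distance are assumptions made for this example; Semtools computes real embeddings with a model, and this is not its implementation.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings" (real models use many more dimensions).
query = [0.9, 0.1, 0.0]               # stands in for "authentication logic"
verify_credentials = [0.8, 0.2, 0.1]  # related meaning, no shared keyword
render_chart = [0.0, 0.2, 0.9]        # unrelated code

near = cosine_distance(query, verify_credentials)
far = cosine_distance(query, render_chart)
print(f"related: {near:.3f}, unrelated: {far:.3f}")
```

The related snippet ends up at a much smaller distance than the unrelated one even though neither shares a keyword with the query, which is the property a keyword search cannot offer.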

When to Use This Skill

Use the semtools skill when you need meaning-based search:

Semantic Code Discovery:

Finding code that implements a concept ("error handling", "data validation")
Discovering similar functionality across different modules
Locating examples of a pattern when you don't know exact names
Understanding what code does without reading everything

Documentation & Knowledge:

Searching documentation by concept, not keywords
Finding related discussions in comments or docs
Discovering similar issues or solutions
Analyzing technical documents (PDFs, reports)

Use Cases:

"Find all authentication-related code" (without knowing function names)
"Show me error handling patterns" (regardless of specific error types)
"Find code similar to this implementation" (semantic similarity)
"Search research papers for 'distributed consensus'" (document search)

Choose semtools over file-search (ripgrep/ast-grep) when:

You know the concept but not the keywords
Exact string matching misses relevant results
You want semantically similar code, not exact matches
Searching across languages or mixed content

Still use file-search when:

You know exact keywords, function names, or patterns
You need structural code matching (ast-grep)
Speed is critical (ripgrep is faster for exact matches)
You're searching for specific symbols or references

Available Commands

Semtools provides three CLI commands you can use via execute_command:

search - Semantic search across code and text files
workspace - Manage workspaces for caching embeddings
parse - Convert documents (PDF, DOCX, PPTX) to searchable text

All commands work out-of-the-box in your execution environment. Document parsing requires the LLAMA_CLOUD_API_KEY environment variable to be set.

Core Operations

1. Semantic Search (`search`)

Find files and code sections by semantic meaning:

# Basic semantic search
search "authentication logic" src/

# Search with more context (5 lines before/after)
search "error handling" --n-lines 5 src/

# Get more results (default: 3)
search "database queries" --top-k 10 src/

# Control similarity threshold (0.0-1.0, lower = stricter, higher = more lenient)
search "API endpoints" --max-distance 0.4 src/

Parameters:

--n-lines N: Show N lines of context around matches (default: 3)
--top-k K: Return top K most similar matches (default: 3)
--max-distance D: Maximum embedding distance (0.0-1.0, default: 0.3)
-i: Case-insensitive matching
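To see how --top-k and --max-distance shape a result list, here is a minimal sketch over made-up distances. Applying both limits together, the function name select_matches, and the file labels are assumptions of this example, not documented Semtools behaviour.

```python
def select_matches(scored, top_k=3, max_distance=0.3):
    """Keep matches within max_distance, closest first, at most top_k of them.

    `scored` is a list of (distance, label) pairs. Combining the two limits
    this way is an assumption of the sketch.
    """
    within = [match for match in scored if match[0] <= max_distance]
    return sorted(within)[:top_k]


# Hypothetical distances for five candidate chunks.
scored = [
    (0.18, "middleware/auth.py"),
    (0.45, "utils/strings.py"),
    (0.12, "auth/handlers.py"),
    (0.29, "api/login.py"),
    (0.25, "auth/tokens.py"),
]

default = select_matches(scored)                              # defaults from the list above
strict = select_matches(scored, max_distance=0.2)             # fewer, closer matches
lenient = select_matches(scored, top_k=10, max_distance=0.5)  # everything in this toy set
```

With the defaults, four candidates pass the distance filter and the closest three are kept; tightening the distance drops the borderline ones, and loosening both limits returns all five.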

Output format:

Match 1 (similarity: 0.12)
File: src/auth/handlers.py
Lines: 42-47
----
def authenticate_user(username: str, password: str) -> Optional[User]:
    """Authenticate user credentials against database."""
    user = get_user_by_username(username)
    if user and verify_password(password, user.password_hash):
        return user
    return None
----

Match 2 (similarity: 0.18)
File: src/middleware/auth.py
...
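If you need to post-process results, the layout above can be read into a small structure. This sketch assumes the output looks exactly like the sample shown here (header line, File line, Lines line); the parse_matches helper is hypothetical, and the real output may differ between versions.

```python
import re

HEADER = re.compile(r"^Match (\d+) \(similarity: ([0-9.]+)\)$")


def parse_matches(text):
    """Parse the sample layout into dicts with rank, score, file and lines."""
    matches = []
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = {
                "rank": int(header.group(1)),
                "score": float(header.group(2)),
                "file": None,
                "lines": None,
            }
            matches.append(current)
        elif current is not None and line.startswith("File: "):
            current["file"] = line[len("File: "):]
        elif current is not None and line.startswith("Lines: "):
            start, end = line[len("Lines: "):].split("-")
            current["lines"] = (int(start), int(end))
    return matches


sample = """Match 1 (similarity: 0.12)
File: src/auth/handlers.py
Lines: 42-47
----
def authenticate_user(username, password):
----

Match 2 (similarity: 0.18)
File: src/middleware/auth.py
"""
parsed = parse_matches(sample)
```

The parser keeps the header number as-is under "score" and leaves "lines" as None when a match has no Lines entry, as in the abbreviated second match.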

2. Workspace Management (`workspace`)

For large codebases, create workspaces to cache embeddings and enable fast repeated searches:

# Create/activate workspace
workspace use my-project

# Set workspace via environment variable
export SEMTOOLS_WORKSPACE=my-project

# Index files in workspace (workspace auto-detected from env var)
search "query" src/

# Check workspace status
workspace status

# Clean up old workspaces
workspace prune

Benefits:

Fast repeated searches: Embeddings cached, no re-computation
Large codebases: IVF_PQ indexing for scalability
Session persistence: Maintain context across multiple searches

When to use workspaces:

Searching the same codebase multiple times
Very large projects (1000+ files)
Interactive exploration sessions
CI/CD pipelines with repeated searches
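The benefit of caching embeddings can be sketched with a toy cache keyed by content hash. This only illustrates the idea of not re-computing embeddings for unchanged text; the EmbeddingCache class and fake_embed function are hypothetical and say nothing about how Semtools stores or indexes its workspace data.

```python
import hashlib


class EmbeddingCache:
    """Toy cache: embed each distinct text once, keyed by its content hash."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._store = {}
        self.computed = 0  # how many times embed_fn actually ran

    def get(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._embed_fn(text)
            self.computed += 1
        return self._store[key]


def fake_embed(text):
    # Stand-in for a real embedding model: vowel counts as a 5-d vector.
    return [text.count(vowel) for vowel in "aeiou"]


cache = EmbeddingCache(fake_embed)
files = {"a.py": "def login(): pass", "b.py": "def logout(): pass"}

# Three "searches" over the same two files.
for _ in range(3):
    vectors = [cache.get(body) for body in files.values()]

print(cache.computed)
```

Only the first pass pays the embedding cost; the later passes are lookups, which is why repeated searches over an unchanged codebase are where a workspace helps most.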

3. Document Parsing (`parse`) ⚠️ Requires API Key

Convert documents to searchable markdown (requires LlamaParse API key):

# Parse PDFs to markdown
parse research_papers/*.pdf

# Parse Word documents
parse reports/*.docx

# Parse presentations
parse slides/*.pptx

# Parse and pipe to search
parse docs/*.pdf | xargs search "neural networks"

Supported formats:

PDF (.pdf)
Word (.docx)
PowerPoint (.pptx)

Configuration:

# Via environment variable
export LLAMA_CLOUD_API_KEY="llx-..."

# Via config file
cat > ~/.parse_config.json << EOF
{
  "api_key": "llx-...",
  "max_concurrent_requests": 10,
  "timeout_seconds": 3600
}
EOF

Important: Document parsing is optional. Semantic search works without it.
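Before attempting document parsing in a script, it can help to check that a key is actually configured. The sketch below looks at the environment variable first and the config file second; that precedence and the find_parse_api_key helper are assumptions of this example, and the source does not say which order the parse command itself uses.

```python
import json
import os
from pathlib import Path


def find_parse_api_key(env=None, config_path=None):
    """Return the API key from the environment or config file, else None.

    Checking the environment before the file is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    key = env.get("LLAMA_CLOUD_API_KEY")
    if key:
        return key
    path = Path(config_path) if config_path else Path.home() / ".parse_config.json"
    if path.is_file():
        return json.loads(path.read_text()).get("api_key") or None
    return None


if find_parse_api_key() is None:
    print("No LlamaParse key found: skip parse and use search on text files only.")
```

Passing env and config_path explicitly keeps the check easy to exercise without touching the real environment or home directory.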

Workflow Patterns

Pattern 1: Concept Discovery

When you know what you're looking for conceptually but not by name:

# Step 1: Broad semantic search
search "rate limiting implementation" src/

# Step 2: Review results, refine query
search "throttle requests per user" src/ --top-k 10

# Step 3: Use ripgrep for exact follow-up
rg "RateLimiter" --type py src/

Pattern 2: Similar Code Finder

When you want to find code similar to a reference implementation:

# Step 1: Extract key concepts from reference code
# [Read example_auth.py and identify key concepts]

# Step 2: Search for similar implementations
search "user authentication with JWT tokens" src/

# Step 3: Compare implementations
# [Review semantic matches to find similar approaches]

Pattern 3: Documentation Search

When researching concepts in documentation or comments:

# Search code comments semantically
search "thread safety guarantees" src/ --n-lines 10

# Search markdown documentation
search "deployment best practices" docs/

# Combined search
search "performance optimization" src/ docs/ --top-k 20

Pattern 4: Cross-Language Search

When searching for concepts across different languages:

# Semantic search works across languages
search "connection pooling" src/

# May find:
# - Java: "ConnectionPool manager"
# - Python: "database connection reuse"
# - Go: "pool of persistent connections"
# All semantically related despite different terminology

Pattern 5: Document Analysis (with API key)

When analyzing PDFs or documents:

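# Step 1: Parse documents to markdown
# (parse prints the paths of the converted files, which is what the
# xargs example in the parse section relies on; cat gathers their contents)
parse research/*.pdf | xargs cat > papers.md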

# Step 2: Search converted content
search "transformer architecture" papers.md

# Step 3: Combine with code search
search "attention mechanism implementation" src/

Integration with file-search

Semtools and file-search (ripgrep/ast-grep) are complementary tools. Use them together for comprehensive search:

Search Strategy Matrix

| You Know | Use First | Then Use | Why |
|---|---|---|---|
| Exact keywords | ripgrep | search | Fast exact match, then find similar |
| Concept only | search | ripgrep | Find relevant code, then search specifics |
| Function name | ripgrep | search | Find definition, then find similar usage |
| Code pattern | ast-grep | search | Find structure, then find similar logic |
| Approximate idea | search | ripgrep + ast-grep | Discover, then drill down |

Layered Search Approach

# Layer 1: Semantic discovery (what's related?)
search "user session management" --top-k 10

# Layer 2: Exact text search (what's the implementation?)
rg "SessionManager|session_store" --type py

# Layer 3: Structural search (how is it used?)
sg --pattern 'session.$METHOD($$$)' --lang python

# Layer 4: Reference tracking (where is it called?)
# [Use serena skill for symbol-level tracking]

Best Practices

1. Start Broad, Then Narrow

Use semantic search for discovery, then narrow with exact search:

# GOOD: Broad semantic discovery first
search "authentication" src/ --top-k 10
# [Review results to learn terminology]
rg "authenticate|verify_credentials" --type py src/

# AVOID: Starting too narrow and missing variations
rg "authenticate" --type py  # Misses "verify_credentials", "check_auth", etc.
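The "review results to learn terminology" step can be partly mechanised. As a minimal sketch, the helper below pulls function and class names out of snippets returned by a semantic search and joins them into an alternation suitable for rg. The identifiers_to_rg_pattern name, the Python-only def/class regex, and the minimum-length filter are all assumptions of this example.

```python
import re


def identifiers_to_rg_pattern(snippets, min_length=6):
    """Collect def/class names from snippets and join them with '|' for rg."""
    names = []
    for snippet in snippets:
        for name in re.findall(r"\b(?:def|class)\s+([A-Za-z_]\w*)", snippet):
            # Skip short, generic names and keep first-seen order without duplicates.
            if len(name) >= min_length and name not in names:
                names.append(name)
    return "|".join(re.escape(name) for name in names)


# Hypothetical snippets, as if copied from semantic search results.
snippets = [
    "def authenticate(user):\n    return verify_credentials(user)",
    "def verify_credentials(user):\n    ...",
    "class RateLimiter:\n    def allow(self):\n        ...",
]

pattern = identifiers_to_rg_pattern(snippets)
print(f'rg "{pattern}" --type py src/')
```

The printed command is only a starting point; review the collected names before running it, since a semantic hit can contain definitions unrelated to the concept you searched for.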

2. Adjust Similarity Threshold

Tune --max-distance based on results:

# Too many irrelevant results? Decrease distance (more strict)
search "query" --max-distance 0.2

# Missing relevant results? Increase distance (more lenient)
search "query" --max-distance 0.5

# Default (0.3) works well for most cases
search "query"
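The tuning advice above can be turned into a simple retry loop: start strict and widen the distance only when nothing matches. In this sketch, run_search is a hypothetical callable you would wire to the CLI yourself, and the start, step, and limit values are illustrative rather than recommended settings.

```python
def search_with_widening(run_search, query, start=0.2, step=0.1, limit=0.5):
    """Retry with a larger max distance until something matches.

    `run_search(query, max_distance)` is a hypothetical callable that returns
    a list of matches. Returns (distance_used, matches), or (None, []) if
    nothing matched up to `limit`.
    """
    distance = start
    while distance <= limit + 1e-9:       # small tolerance for float accumulation
        rounded = round(distance, 2)
        results = run_search(query, rounded)
        if results:
            return rounded, results
        distance += step
    return None, []


def fake_search(query, max_distance):
    # Stand-in for a real search call: only matches once the threshold is lenient enough.
    return ["src/retry.py"] if max_distance >= 0.4 else []


used, hits = search_with_widening(fake_search, "retry logic")
print(used, hits)
```

Starting strict keeps the first results relevant; the loop only trades precision for recall when the stricter setting returns nothing.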

3. Use Workspaces for Repeated Searches

For interactive exploration, always use workspaces:

# GOOD: Create workspace once, search many times
export SEMTOOLS_WORKSPACE=my-analysis
search "concept1" src/
search "concept2" src/
search "concept3" src/

# INEFFICIENT: No workspace set, so embeddings are re-computed on every search
search "concept1" src/
search "concept2" src/

4. Combine with Context Tools

Get more context around semantic matches:

# Find semantically similar code
search "retry logic" src/ --n-lines 2

# Get more context with ripgrep
rg -C 10 "retry" src/specific_file.py

# Or read the full file
cat src/specific_file.py

5. Phrase Queries Conceptually

Write queries as concepts, not exact keywords:

# GOOD: Conceptual queries
search "handling network timeouts"
search "user input validation"
search

---

*Content truncated.*
