exa-performance-tuning
Optimize Exa API performance with caching, batching, and connection pooling. Use when experiencing slow API responses, implementing caching strategies, or optimizing request throughput for Exa integrations. Trigger with phrases like "exa performance", "optimize exa", "exa latency", "exa caching", "exa slow", "exa batch".
Install
mkdir -p .claude/skills/exa-performance-tuning && curl -L -o skill.zip "https://mcp.directory/api/skills/download/9311" && unzip -o skill.zip -d .claude/skills/exa-performance-tuning && rm skill.zip
Installs to .claude/skills/exa-performance-tuning
About this skill
Exa Performance Tuning
Overview
Optimize Exa search API response times for production workloads. Key levers: search type selection (in order of increasing latency: instant < fast < auto < neural < deep), result count reduction, content scope control, result caching, and parallel query execution.
Latency by Search Type
| Type | Typical Latency | Use Case |
|---|---|---|
| instant | < 150ms | Real-time autocomplete, typeahead |
| fast | p50 < 425ms | Speed-critical user-facing search |
| auto | 300-1500ms | General purpose (default) |
| neural | 500-2000ms | Best semantic quality |
| deep | 2-5s | Maximum coverage, light deep search |
| deep-reasoning | 5-15s | Complex research questions |
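The figures in this table are typical values, so it is worth measuring latency in your own environment before committing to a search type. A minimal sketch, assuming you record per-request durations yourself; the `percentile` helper and the sample numbers are hypothetical and not part of the Exa SDK:

```typescript
// Hypothetical helper: nearest-rank percentile over recorded request durations.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples recorded");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Nearest-rank method: smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Illustrative durations in milliseconds (made-up numbers, not measurements).
const durationsMs = [310, 290, 420, 380, 1250, 340, 360, 305, 330, 400];
const p50 = percentile(durationsMs, 50); // 340
const p95 = percentile(durationsMs, 95); // 1250
```

Comparing p50 with p95 shows how much a single slow outlier matters: a type whose median fits the budget can still miss it in the tail.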
Instructions
Step 1: Match Search Type to Latency Budget
import Exa from "exa-js";

const exa = new Exa(process.env.EXA_API_KEY);

// Pick the highest-quality search type whose typical latency fits the budget.
function selectSearchType(latencyBudgetMs: number) {
  if (latencyBudgetMs < 200) return "instant";
  if (latencyBudgetMs < 500) return "fast";
  if (latencyBudgetMs < 1500) return "auto";
  if (latencyBudgetMs < 3000) return "neural";
  return "deep";
}

async function optimizedSearch(query: string, latencyBudgetMs: number) {
  const type = selectSearchType(latencyBudgetMs);
  // Tighter budgets also request fewer results.
  const numResults = latencyBudgetMs < 500 ? 3 : latencyBudgetMs < 2000 ? 5 : 10;
  return exa.search(query, { type, numResults });
}
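Choosing a search type by budget does not guarantee that a call finishes within that budget. One way to enforce a hard deadline is to race the request against a timer. This is a sketch only: `withTimeout` is a hypothetical helper, and it stops waiting without cancelling the underlying HTTP request.

```typescript
// Hypothetical helper: reject if `work` has not settled within `ms` milliseconds.
// This only stops waiting; it does not cancel the underlying request.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${ms}ms`)),
      ms,
    );
    // Whichever settles first wins; clear the timer once the work is done.
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

With the functions from Step 1, a call such as `withTimeout(optimizedSearch(query, 500), 500)` would reject once the 500ms budget is spent, letting the caller fall back to cached or partial results.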
Step 2: Minimize Content Retrieval
// Each content option adds latency. Only request what you need.

// Fastest: metadata only (no content retrieval)
const metadataOnly = await exa.search("query", { numResults: 5 });

// Medium: highlights only (much smaller than full text)
const highlightsOnly = await exa.searchAndContents("query", {
  numResults: 5,
  highlights: { maxCharacters: 300 },
  // No text or summary: saves content retrieval time
});

// Slower: full text (use maxCharacters to limit)
const withText = await exa.searchAndContents("query", {
  numResults: 3, // fewer results = faster
  text: { maxCharacters: 1000 }, // limit content size
});
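To keep call sites from requesting more content than they need, the three tiers above can be centralized in one place. The following is a sketch under assumptions: `DetailLevel` and `contentOptionsFor` are hypothetical names, and the option shapes simply mirror the snippets above rather than a verified SDK type.

```typescript
// Hypothetical "detail level" abstraction; option shapes mirror the snippets above.
type DetailLevel = "metadata" | "highlights" | "text";

interface ContentOptions {
  highlights?: { maxCharacters: number };
  text?: { maxCharacters: number };
}

function contentOptionsFor(level: DetailLevel): ContentOptions {
  switch (level) {
    case "metadata":
      return {}; // no content fields requested
    case "highlights":
      return { highlights: { maxCharacters: 300 } };
    case "text":
      return { text: { maxCharacters: 1000 } };
  }
}
```

The returned object can be spread into the options of a search call, so that raising the detail level becomes a deliberate, reviewable change.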
Step 3: Cache Search Results
import { LRUCache } from "lru-cache";

const searchCache = new LRUCache<string, any>({
  max: 5000,
  ttl: 2 * 3600 * 1000, // 2-hour TTL
});

async function cachedSearch(query: string, opts: any = {}) {
  // Key on the full options object so that searches differing in any option
  // (not only type and numResults) never share a cache entry.
  const key = `${query}:${JSON.stringify(opts)}`;
  const cached = searchCache.get(key);
  if (cached) return cached; // Cache hit: no network round trip (vs 500-2000ms)
  const results = await exa.search(query, opts);
  searchCache.set(key, results);
  return results;
}
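A result cache only helps once the first response has been stored. If the same query arrives several times while the first request is still in flight, each call misses the cache and hits the API. A minimal sketch of request coalescing that closes this gap; `coalesce` and `fetcher` are hypothetical names, and `fetcher` stands in for any search call keyed by a string:

```typescript
// Hypothetical wrapper: concurrent calls with the same key share one pending
// promise instead of each triggering a separate request.
function coalesce<T>(fetcher: (key: string) => Promise<T>) {
  const inFlight = new Map<string, Promise<T>>();
  return (key: string): Promise<T> => {
    const pending = inFlight.get(key);
    if (pending) return pending; // join the request that is already running
    const cleanup = () => {
      inFlight.delete(key);
    };
    const request = fetcher(key);
    // Remove the entry once settled so later calls start a fresh request.
    request.then(cleanup, cleanup);
    inFlight.set(key, request);
    return request;
  };
}
```

Coalescing complements the TTL cache rather than replacing it: the cache covers repeated queries over time, while the in-flight map covers bursts of identical queries within a single round trip.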
Step 4: Parallelize Independent Searches
// Run independent queries concurrently instead of sequentially.
// 3 parallel searches: ~600ms total (limited by slowest)
// 3 sequential searches: ~1800ms total
async function parallelSearch(queries: string[]) {
  const searches = queries.map(q =>
    cachedSearch(q, { type: "auto", numResults: 3 })
  );
  return Promise.all(searches);
}
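`Promise.all` starts every search at once, which can trip rate limits when the query list is long. A bounded variant keeps only a fixed number of requests in flight. This is a sketch, not a library API: `mapWithConcurrency` is a hypothetical helper and the right limit depends on your plan's rate limits.

```typescript
// Hypothetical helper: run `fn` over `items` with at most `limit` calls in flight,
// returning results in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index]);
    }
  }
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

For example, `mapWithConcurrency(queries, 3, q => cachedSearch(q, { type: "auto", numResults: 3 }))` would behave like `parallelSearch` for three queries but never run more than three searches at a time for longer lists. As with `Promise.all`, the first rejection rejects the whole call.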
Step 5: Two-Phase Search Pattern
// Phase 1: Fast search for URLs only
// Phase 2: Selective content retrieval for top results only
// Saves content retrieval time for the 7 results you won't use
async function twoPhaseSearch(query: string) {
  // Phase 1: metadata only (fast)
  const results = await exa.search(query, { type: "auto", numResults: 10 });

  // Phase 2: get content only for top 3 results
  const topUrls = results.results.slice(0, 3).map(r => r.url);
  const contents = await exa.getContents(topUrls, {
    text: { maxCharacters: 2000 },
    highlights: { maxCharacters: 500, query },
  });
  return contents;
}
Step 6: Query Normalization for Cache Hits
function normalizeQuery(query: string): string {
  return query
    .toLowerCase()
    .trim()
    .replace(/\s+/g, " ") // collapse whitespace
    .replace(/[\s?.!,;:]+$/, ""); // strip trailing punctuation and any space before it
}

async function normalizedSearch(query: string, opts: any) {
  return cachedSearch(normalizeQuery(query), opts);
}
// Can increase cache hit rate by 20-40% for user-generated queries
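Normalizing the query text only pays off if the rest of the cache key is equally stable. If the key includes search options, two option objects with the same fields in a different order should map to the same entry. A sketch under that assumption; `stableStringify` and `cacheKey` are hypothetical helpers, not part of `lru-cache` or the Exa SDK:

```typescript
// Hypothetical helper: serialize a value with object keys sorted recursively,
// so { a: 1, b: 2 } and { b: 2, a: 1 } produce the same string.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => stableStringify(v)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const entries = Object.keys(record)
      .filter((k) => record[k] !== undefined) // skip undefined, as JSON.stringify does
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(record[k])}`);
    return `{${entries.join(",")}}`;
  }
  const json = JSON.stringify(value); // primitives
  return json === undefined ? "null" : json;
}

// Build a cache key from an already-normalized query and its options.
function cacheKey(
  normalizedQuery: string,
  opts: Record<string, unknown> = {},
): string {
  return `${normalizedQuery}|${stableStringify(opts)}`;
}
```

With this, `cacheKey("exa api", { type: "auto", numResults: 3 })` and `cacheKey("exa api", { numResults: 3, type: "auto" })` yield the same string, while any differing option yields a different one.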
Performance Comparison
| Strategy | Latency Savings | Implementation |
|---|---|---|
| instant type | 5-10x faster than neural | One-line change |
| Reduce numResults (10 -> 3) | ~200-500ms saved | One-line change |
| Highlights instead of text | ~100-300ms saved | Replace text with highlights |
| LRU cache | 100% for cache hits | ~20 lines |
| Parallel queries | 2-3x throughput | Promise.all wrapper |
| Two-phase search | ~30-50% for large result sets | ~15 lines |
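The cache and parallelism rows can be made concrete with a little arithmetic. The sketch below uses illustrative numbers only (they are not measurements), and `expectedLatencyMs` is a hypothetical helper:

```typescript
// Expected latency for a given cache hit rate: hits cost hitMs, misses cost missMs.
function expectedLatencyMs(hitRate: number, missMs: number, hitMs = 0): number {
  if (hitRate < 0 || hitRate > 1) {
    throw new Error("hitRate must be between 0 and 1");
  }
  return hitRate * hitMs + (1 - hitRate) * missMs;
}

// A 1000ms search with a 50% cache hit rate averages 500ms.
const blendedMs = expectedLatencyMs(0.5, 1000); // 500

// Three independent searches taking 600, 550, and 580ms:
const perQueryMs = [600, 550, 580];
const sequentialMs = perQueryMs.reduce((sum, ms) => sum + ms, 0); // 1730
const parallelMs = Math.max(...perQueryMs); // 600, bounded by the slowest query
```

The same reasoning explains why caching and parallelism combine well: caching lowers the average cost of each query, while parallelism makes the total depend on the slowest query rather than the sum.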
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Search taking 3s+ | Neural search on complex query | Switch to fast or auto type |
| Timeout on content | Large pages, slow sources | Set maxCharacters limit |
| Cache miss rate high | Unique queries each time | Normalize queries before caching |
| Rate limit (429) | Too many concurrent searches | Add request queue with concurrency limit |
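For the 429 case, a concurrency limit reduces how often the limit is hit, and a retry with exponential backoff handles the occasions when it still is. A minimal sketch, assuming the thrown error exposes the HTTP status as `status` or `statusCode`; that error shape, and the helper names, are assumptions to check against what your client actually throws:

```typescript
// Assumed error shape: an HTTP status code on `status` or `statusCode`.
function isRateLimited(err: unknown): boolean {
  const e = err as { status?: number; statusCode?: number } | null | undefined;
  return e?.status === 429 || e?.statusCode === 429;
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical wrapper: retry `run` only when it fails with a 429.
async function retryOnRateLimit<T>(
  run: () => Promise<T>,
  maxRetries = 3,
  baseMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await run();
    } catch (err) {
      // Give up on non-429 errors or once retries are exhausted.
      if (!isRateLimited(err) || attempt >= maxRetries) throw err;
      const delay = backoffDelayMs(attempt, baseMs);
      await new Promise<void>((resolve) => setTimeout(() => resolve(), delay));
    }
  }
}
```

Retrying only on 429 keeps genuine failures (bad requests, auth errors) from being masked by repeated attempts. Adding random jitter to the delay is a common refinement when many clients retry at once.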
Next Steps
For cost optimization, see exa-cost-tuning. For reliability, see exa-reliability-patterns.