langfuse-rate-limits
Implement Langfuse rate limiting, batching, and backoff patterns. Use when handling rate limit errors, optimizing trace ingestion, or managing high-volume LLM observability workloads. Trigger with phrases like "langfuse rate limit", "langfuse throttling", "langfuse 429", "langfuse batching", "langfuse high volume".
Install
mkdir -p .claude/skills/langfuse-rate-limits && curl -L -o skill.zip "https://mcp.directory/api/skills/download/8358" && unzip -o skill.zip -d .claude/skills/langfuse-rate-limits && rm skill.zip
Installs to .claude/skills/langfuse-rate-limits
About this skill
Langfuse Rate Limits
Overview
Handle Langfuse API rate limits with optimized SDK batching, exponential backoff with jitter, concurrent request limiting, and configurable sampling for ultra-high-volume workloads.
Prerequisites
- Langfuse SDK installed and configured
- High-volume trace workload (1,000+ events/minute)
Instructions
Step 1: Optimize SDK Batching Configuration
The Langfuse SDK batches events internally before sending. Tuning batch settings is the first defense against rate limits.
// v3 Legacy: Direct configuration
import { Langfuse } from "langfuse";

const langfuse = new Langfuse({
  flushAt: 50, // Events per batch (default: 15, max ~200)
  flushInterval: 10000, // Milliseconds between flushes (default: 10000)
  requestTimeout: 30000, // Timeout per batch request
});

// v4+: Configure via OTel span processor
import { LangfuseSpanProcessor } from "@langfuse/otel";
import { NodeSDK } from "@opentelemetry/sdk-node";

const processor = new LangfuseSpanProcessor({
  exportIntervalMillis: 10000, // Flush interval
  maxExportBatchSize: 50, // Events per batch
});
const sdk = new NodeSDK({ spanProcessors: [processor] });
sdk.start();
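To see why a larger batch size helps, a back-of-envelope model can estimate how many ingestion requests a steady event rate produces. This is a minimal sketch, not SDK code: `estimateRequestsPerMinute` is a hypothetical helper, and it assumes events arrive evenly and that a batch is sent as soon as either the batch size is reached or the flush interval elapses.

```typescript
// Hypothetical helper (not part of the Langfuse SDK): rough estimate of
// batch requests per minute for a steady event rate.
function estimateRequestsPerMinute(
  eventsPerMinute: number,
  flushAt: number,
  flushIntervalMs: number
): number {
  if (eventsPerMinute <= 0) return 0;
  // Size-triggered flushes: one request per full batch.
  const bySize = eventsPerMinute / flushAt;
  // Time-triggered flushes: one request per interval, batches partly full.
  const byTimer = 60_000 / flushIntervalMs;
  // Whichever trigger fires more often dominates, and there can never be
  // more requests than events.
  return Math.ceil(Math.min(eventsPerMinute, Math.max(bySize, byTimer)));
}

// 1,000 events/min with a batch size of 15 versus a batch size of 50
console.log(estimateRequestsPerMinute(1000, 15, 10_000)); // 67
console.log(estimateRequestsPerMinute(1000, 50, 10_000)); // 20
```

Under these assumptions, moving from 15 to 50 events per batch cuts the request count by roughly a factor of three at the same event volume.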
Step 2: Implement Retry with Exponential Backoff
For custom API calls (scores, datasets, prompts) that hit rate limits:
async function withRetry<T>(
  fn: () => Promise<T>,
  options: { maxRetries?: number; baseDelayMs?: number; maxDelayMs?: number } = {}
): Promise<T> {
  const { maxRetries = 5, baseDelayMs = 1000, maxDelayMs = 30000 } = options;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      const status = error?.status || error?.response?.status;
      // Only retry on rate limits (429) and server errors (5xx).
      // Errors without a status (e.g. network failures) are retried too.
      const retryable = !status || status === 429 || status >= 500;
      if (attempt === maxRetries || !retryable) {
        throw error;
      }
      // Honor Retry-After (in seconds) if present and numeric
      const retryAfterSec = Number.parseInt(error?.response?.headers?.["retry-after"], 10);
      let delay: number;
      if (Number.isFinite(retryAfterSec)) {
        delay = retryAfterSec * 1000;
      } else {
        // Exponential backoff with jitter
        delay = Math.min(baseDelayMs * Math.pow(2, attempt), maxDelayMs);
        delay += Math.random() * 500; // Jitter
      }
      console.warn(`Request failed (status ${status ?? "unknown"}). Retry ${attempt + 1}/${maxRetries} in ${Math.round(delay)}ms`);
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw new Error("Unreachable");
}
// Usage with Langfuse client operations
import { LangfuseClient } from "@langfuse/client";

const langfuse = new LangfuseClient();
await withRetry(() =>
  langfuse.score.create({
    traceId: "trace-123",
    name: "quality",
    value: 0.95,
    dataType: "NUMERIC",
  })
);
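The delay logic in the retry helper can be examined in isolation. The sketch below is illustrative only: `backoffDelayMs` reproduces the capped exponential schedule without jitter so the values are predictable, and `parseRetryAfterMs` is a hypothetical helper that accepts both forms the HTTP `Retry-After` header may take (a number of seconds or an HTTP-date). Neither function is part of the Langfuse SDK.

```typescript
// Deterministic part of the backoff: base * 2^attempt, capped at maxDelayMs.
function backoffDelayMs(attempt: number, baseDelayMs = 1000, maxDelayMs = 30000): number {
  return Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
}

// Hypothetical helper: Retry-After is either delta-seconds ("120") or an
// HTTP-date ("Wed, 21 Oct 2015 07:28:00 GMT"). Returns the wait in ms, or
// undefined when the header is missing or unparseable.
function parseRetryAfterMs(
  header: string | undefined,
  nowMs: number = Date.now()
): number | undefined {
  if (!header) return undefined;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  // A date in the past means no further waiting is needed.
  return Math.max(0, dateMs - nowMs);
}

const schedule = [0, 1, 2, 3, 4, 5].map((attempt) => backoffDelayMs(attempt));
console.log(schedule); // [ 1000, 2000, 4000, 8000, 16000, 30000 ]
console.log(parseRetryAfterMs("7")); // 7000
```

With the defaults, the sixth attempt is the first one clipped by the 30-second cap; the random jitter is added on top of these values.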
Step 3: Queue-Based Concurrency Limiting
Use p-queue to cap concurrent Langfuse API calls:
import PQueue from "p-queue";
import { LangfuseClient } from "@langfuse/client";

const langfuse = new LangfuseClient();

// Max 10 concurrent API calls, 50 per second
const queue = new PQueue({
  concurrency: 10,
  interval: 1000,
  intervalCap: 50,
});

// Queue score submissions
async function queueScore(params: {
  traceId: string;
  name: string;
  value: number;
}) {
  return queue.add(() =>
    langfuse.score.create({
      ...params,
      dataType: "NUMERIC",
    })
  );
}

// Queue dataset item creation
async function queueDatasetItem(datasetName: string, item: any) {
  return queue.add(() =>
    langfuse.api.datasetItems.create({
      datasetName,
      input: item.input,
      expectedOutput: item.expectedOutput,
    })
  );
}

// Monitor queue health
setInterval(() => {
  console.log(`Queue: ${queue.pending} pending, ${queue.size} queued`);
}, 10000);
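The `interval` and `intervalCap` options express "at most N task starts per time window". The sketch below is a simplified, dependency-free model of that idea, not p-queue's actual implementation. `FixedWindowLimiter` and `tryStart` are hypothetical names, and the clock is passed in explicitly so the behavior is easy to check.

```typescript
// Simplified model of `interval` + `intervalCap`: at most `cap` task starts
// per `intervalMs` window. Illustration only, not how p-queue is written.
class FixedWindowLimiter {
  private windowStart = 0;
  private startedInWindow = 0;
  private readonly intervalMs: number;
  private readonly cap: number;

  constructor(intervalMs: number, cap: number) {
    this.intervalMs = intervalMs;
    this.cap = cap;
  }

  // Returns true if a task may start at time `nowMs`, false if it must wait.
  tryStart(nowMs: number): boolean {
    if (nowMs - this.windowStart >= this.intervalMs) {
      this.windowStart = nowMs; // a new window begins
      this.startedInWindow = 0;
    }
    if (this.startedInWindow >= this.cap) return false;
    this.startedInWindow++;
    return true;
  }
}

// 60 start attempts in the same instant against a 50-per-second cap
const limiter = new FixedWindowLimiter(1000, 50);
let allowed = 0;
for (let i = 0; i < 60; i++) {
  if (limiter.tryStart(0)) allowed++;
}
console.log(allowed); // 50
console.log(limiter.tryStart(1000)); // true (next window)
```

In the queue configuration above, the ten rejected starts would not be lost; they would simply wait in the queue until the next window opens.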
Step 4: Configurable Sampling for Ultra-High Volume
When tracing volume exceeds rate limits, sample a controlled fraction of traces rather than letting the rate limiter reject them unpredictably:
import { updateActiveObservation, startActiveObservation } from "@langfuse/tracing";

class TraceSampler {
  private rate: number;
  private windowCounts: number[] = []; // Timestamps of sampled traces
  private windowMs = 60000; // 1 minute window
  private maxPerWindow: number;

  constructor(sampleRate: number, maxPerMinute: number) {
    this.rate = sampleRate;
    this.maxPerWindow = maxPerMinute;
  }

  shouldSample(tags?: string[]): boolean {
    // Always sample errors
    if (tags?.includes("error") || tags?.includes("critical")) {
      return true;
    }
    // Check window limit
    const now = Date.now();
    this.windowCounts = this.windowCounts.filter((t) => t > now - this.windowMs);
    if (this.windowCounts.length >= this.maxPerWindow) {
      return false;
    }
    // Probabilistic sampling
    if (Math.random() > this.rate) {
      return false;
    }
    this.windowCounts.push(now);
    return true;
  }
}

// 10% sampling, max 1000 traces/minute
const sampler = new TraceSampler(0.1, 1000);

async function sampledOperation(name: string, fn: () => Promise<any>) {
  if (!sampler.shouldSample()) {
    return fn(); // Run without tracing
  }
  return startActiveObservation(name, async () => {
    updateActiveObservation({ metadata: { sampled: true } });
    return fn();
  });
}
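One way to choose the sample rate is to work backward from a per-minute trace budget. The helper below is a hypothetical sketch (`sampleRateForBudget` is not part of any SDK) that assumes a roughly steady trace volume.

```typescript
// Hypothetical helper: choose a sample rate so that the expected number of
// kept traces per minute stays within `maxPerMinute`.
function sampleRateForBudget(tracesPerMinute: number, maxPerMinute: number): number {
  if (tracesPerMinute <= 0) return 1; // nothing to drop
  return Math.min(1, maxPerMinute / tracesPerMinute);
}

console.log(sampleRateForBudget(10_000, 1000)); // 0.1
console.log(sampleRateForBudget(400, 1000)); // 1 (volume is already under budget)
```

Under that assumption, a workload of 10,000 traces per minute with a budget of 1,000 yields the 10% rate used in the sampler above.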
Rate Limit Reference
| Tier | Traces/min | Batch Size | Strategy |
|---|---|---|---|
| Hobby | ~500 | 15 | Default settings |
| Pro | ~5,000 | 50 | Increase flushAt |
| Team | ~10,000 | 100 | + Queue-based limiting |
| Enterprise | Custom | Custom | + Sampling |
Error Handling
| Error | Response | Action |
|---|---|---|
| 429 Too Many Requests | Retry-After: N | Backoff for N seconds |
| 503 Service Unavailable | Server overloaded | Backoff 30s+ |
| Flush timeout | Large batch | Reduce flushAt, increase requestTimeout |
| Memory growth | Queue backup | Cap the PQueue backlog (e.g. check queue.size before adding) |
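The "Memory growth" row comes down to bounding how much work may wait while the API is throttling. The sketch below shows one dependency-free way to do that; `BoundedBacklog`, `offer`, and `take` are hypothetical names used for illustration and are not p-queue features.

```typescript
// Hypothetical bounded backlog: rejects new work once `maxSize` items are
// waiting, so memory stays flat during sustained throttling.
class BoundedBacklog<T> {
  private readonly items: T[] = [];
  private readonly maxSize: number;
  droppedCount = 0;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  // Returns false (and counts the drop) when the backlog is full.
  offer(item: T): boolean {
    if (this.items.length >= this.maxSize) {
      this.droppedCount++;
      return false;
    }
    this.items.push(item);
    return true;
  }

  // Removes and returns up to `n` of the oldest items for sending.
  take(n: number): T[] {
    return this.items.splice(0, n);
  }

  get size(): number {
    return this.items.length;
  }
}

const backlog = new BoundedBacklog<string>(3);
["a", "b", "c", "d"].forEach((id) => backlog.offer(id));
console.log(backlog.size, backlog.droppedCount); // 3 1
console.log(backlog.take(2)); // [ 'a', 'b' ]
```

Dropping the newest item is only one policy; evicting the oldest item or blocking the producer are equally valid choices depending on which data matters more.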