langfuse-cost-tuning
Monitor and optimize LLM costs using Langfuse analytics and dashboards. Use when tracking LLM spending, identifying cost anomalies, or implementing cost controls for AI applications. Trigger with phrases like "langfuse costs", "LLM spending", "track AI costs", "langfuse token usage", "optimize LLM budget".
Install
mkdir -p .claude/skills/langfuse-cost-tuning && curl -L -o skill.zip "https://mcp.directory/api/skills/download/6585" && unzip -o skill.zip -d .claude/skills/langfuse-cost-tuning && rm skill.zip
Installs to .claude/skills/langfuse-cost-tuning
About this skill
Langfuse Cost Tuning
Overview
Track, analyze, and optimize LLM costs using Langfuse's built-in token/cost tracking, the Metrics API for programmatic cost analysis, model routing for cost reduction, and automated budget alerts.
Prerequisites
- Langfuse tracing with token usage captured (via observeOpenAI or manual usage fields)
- For Metrics API: @langfuse/client installed
- Understanding of LLM pricing models
How Langfuse Tracks Costs
Langfuse automatically calculates costs for supported models (OpenAI, Anthropic, Google) when token usage is captured. For custom models, you can configure pricing in the Langfuse UI under Settings > Model Definitions.
Cost tracking works on observations of type generation and embedding. The observeOpenAI wrapper captures usage automatically; for manual tracing, include usage in your observation updates.
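As a rough illustration of the arithmetic behind a per-generation cost, the sketch below multiplies token counts by per-million-token prices. `ModelPrice` and `estimateCostUsd` are hypothetical names, not part of Langfuse, and the prices are the gpt-4o figures used in the routing table later in this guide. Langfuse's own calculation relies on its model definitions and may account for further usage types.

```typescript
// Hypothetical helper (not a Langfuse API): estimates the USD cost of one call
// from token counts and per-million-token prices.
interface ModelPrice {
  inputPer1M: number; // USD per 1M input tokens
  outputPer1M: number; // USD per 1M output tokens
}

function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  price: ModelPrice,
): number {
  return (
    (inputTokens / 1_000_000) * price.inputPer1M +
    (outputTokens / 1_000_000) * price.outputPer1M
  );
}

// Assumed prices, taken from the routing table in Step 3 of this guide.
const gpt4o: ModelPrice = { inputPer1M: 2.5, outputPer1M: 10.0 };

// 1,200 input tokens and 300 output tokens: 0.003 + 0.003 USD
const cost = estimateCostUsd(1200, 300, gpt4o);
console.log(cost.toFixed(4)); // prints "0.0060"
```

Comparing such an estimate with the cost shown on a generation in Langfuse is a quick way to spot a stale or mismatched model definition.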
Instructions
Step 1: Ensure Token Usage is Captured
// Automatic: observeOpenAI captures everything
import OpenAI from "openai";
import { observeOpenAI } from "@langfuse/openai";

const openai = observeOpenAI(new OpenAI());
// Tokens, model, latency, and cost are all auto-tracked

// Manual: include usage in generation observations
import { startActiveObservation, updateActiveObservation } from "@langfuse/tracing";

// Plain client for the manual path, so the call is not traced twice
const rawOpenai = new OpenAI();

async function tracedCompletion(prompt: string) {
  // Assumes the @langfuse/tracing v4 signature: (name, callback, options)
  return startActiveObservation(
    "llm-call",
    async () => {
      // Model required for cost calc
      updateActiveObservation({ model: "gpt-4o" }, { asType: "generation" });

      const response = await rawOpenai.chat.completions.create({
        model: "gpt-4o",
        messages: [{ role: "user", content: prompt }],
      });

      updateActiveObservation(
        {
          output: response.choices[0].message.content,
          usageDetails: {
            input: response.usage?.prompt_tokens ?? 0,
            output: response.usage?.completion_tokens ?? 0,
            total: response.usage?.total_tokens ?? 0,
          },
          // Optional: override inferred cost (in USD)
          // costDetails: { total: 0.0015 },
        },
        { asType: "generation" }
      );
      return response;
    },
    { asType: "generation" }
  );
}
Step 2: Query Costs via Metrics API
import { LangfuseClient } from "@langfuse/client";

const langfuse = new LangfuseClient();

// Aggregate cost per model from traces and their generation observations
async function getCostReport(days: number) {
  const fromTimestamp = new Date(Date.now() - days * 86400000).toISOString();

  // List traces in the period. Method names assume the v4 @langfuse/client API
  // (api.trace.list, api.observations.getMany). Only one page is read here.
  const traces = await langfuse.api.trace.list({
    fromTimestamp,
    limit: 100,
    orderBy: "timestamp.desc", // format: [field].[asc|desc]
  });

  const costByModel = new Map<string, { cost: number; tokens: number; count: number }>();

  // One observations request per trace: simple, but slow for large volumes
  for (const trace of traces.data) {
    const observations = await langfuse.api.observations.getMany({
      traceId: trace.id,
      type: "GENERATION",
    });
    for (const obs of observations.data) {
      const model = obs.model || "unknown";
      const existing = costByModel.get(model) || { cost: 0, tokens: 0, count: 0 };
      existing.cost += obs.calculatedTotalCost || 0;
      existing.tokens += obs.usageDetails?.total ?? 0;
      existing.count += 1;
      costByModel.set(model, existing);
    }
  }

  console.log("\n=== LLM Cost Report ===");
  console.log(`Period: Last ${days} days\n`);

  let totalCost = 0;
  for (const [model, data] of costByModel.entries()) {
    console.log(`${model}:`);
    console.log(`  Calls: ${data.count}`);
    console.log(`  Tokens: ${data.tokens.toLocaleString()}`);
    console.log(`  Cost: $${data.cost.toFixed(4)}`);
    totalCost += data.cost;
  }
  console.log(`\nTotal: $${totalCost.toFixed(4)}`);
}

getCostReport(7).catch(console.error);
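The report above issues one list call, so it only sees as many traces as a single response returns. One way to cover a longer period is to walk pages until the last one. The sketch below is deliberately generic: `Page`, `fetchAllPages`, and the `fetchPage` callback are hypothetical names, and the `meta.totalPages` field is an assumption about the list response shape that should be checked against the API reference for your SDK version.

```typescript
// Hypothetical page shape; `meta.totalPages` is an assumption about the response.
interface Page<T> {
  data: T[];
  meta: { page: number; totalPages: number };
}

// Walks pages 1..totalPages and concatenates the items.
// `fetchPage` is supplied by the caller, e.g. a thin wrapper around a list call.
async function fetchAllPages<T>(
  fetchPage: (page: number) => Promise<Page<T>>,
  maxPages = 50, // safety cap so a scheduled job cannot loop unbounded
): Promise<T[]> {
  const items: T[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const res = await fetchPage(page);
    items.push(...res.data);
    if (page >= res.meta.totalPages) break;
  }
  return items;
}

// Stand-in data source, used only to demonstrate the helper.
const fakeTraces = ["t1", "t2", "t3", "t4", "t5"];

async function fakeFetch(page: number): Promise<Page<string>> {
  const size = 2;
  const start = (page - 1) * size;
  return {
    data: fakeTraces.slice(start, start + size),
    meta: { page, totalPages: Math.ceil(fakeTraces.length / size) },
  };
}

fetchAllPages(fakeFetch).then((all) => console.log(all.length)); // prints 5
```

The `maxPages` cap is a design choice: it trades completeness for a bounded number of requests, which matters when the report runs on a schedule.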
Step 3: Implement Smart Model Routing
Route requests to cheaper models when appropriate:
import { observe, updateActiveObservation } from "@langfuse/tracing";

interface ModelConfig {
  model: string;
  costPer1MInput: number; // USD per 1M input tokens
  costPer1MOutput: number; // USD per 1M output tokens
  maxComplexity: "simple" | "moderate" | "complex";
}

const MODELS: ModelConfig[] = [
  { model: "gpt-4o-mini", costPer1MInput: 0.15, costPer1MOutput: 0.60, maxComplexity: "simple" },
  { model: "gpt-4o", costPer1MInput: 2.50, costPer1MOutput: 10.00, maxComplexity: "moderate" },
  { model: "claude-sonnet-4-20250514", costPer1MInput: 3.00, costPer1MOutput: 15.00, maxComplexity: "complex" },
];

// Placeholder, not part of Langfuse: replace with your own provider call
async function callModel(
  model: string,
  input: string
): Promise<{ content: string; usage: { input: number; output: number } }> {
  throw new Error(`callModel not implemented for ${model}`);
}

// inputLength is measured in characters, not tokens
function selectModel(task: string, inputLength: number): ModelConfig {
  const simpleTasks = ["classify", "extract", "summarize-short", "translate"];
  const isSimple = simpleTasks.some((t) => task.includes(t));
  const isShort = inputLength < 500;

  if (isSimple && isShort) return MODELS[0]; // gpt-4o-mini
  if (isSimple || inputLength < 2000) return MODELS[1]; // gpt-4o
  return MODELS[2]; // claude-sonnet-4
}

// Assumes the @langfuse/tracing v4 signature: observe(fn, options)
const costOptimizedLLM = observe(
  async (task: string, input: string) => {
    const config = selectModel(task, input.length);

    updateActiveObservation(
      {
        model: config.model,
        metadata: {
          task,
          selectedReason: `${config.maxComplexity} tier`,
          estimatedCostPer1M: config.costPer1MInput,
        },
      },
      { asType: "generation" }
    );

    const response = await callModel(config.model, input);

    updateActiveObservation(
      { output: response.content, usageDetails: response.usage },
      { asType: "generation" }
    );
    return response;
  },
  { name: "cost-optimized-llm", asType: "generation" }
);
Step 4: Budget Alerts
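// scripts/cost-alert.ts -- run as cron job
import { LangfuseClient } from "@langfuse/client";

const langfuse = new LangfuseClient();

const ALERT_THRESHOLDS = {
  dailyWarn: 50, // $50/day warning
  dailyCritical: 200, // $200/day critical
  perRequestWarn: 1, // $1/request warning
};

// Placeholder notifier, not part of Langfuse: swap in Slack, email, or a pager
async function sendAlert(level: "WARNING" | "CRITICAL", message: string) {
  console.error(`[${level}] ${message}`);
}

async function checkCostAlerts() {
  const since = new Date(Date.now() - 86400000).toISOString(); // Last 24h

  // Method names assume the v4 @langfuse/client API. Only one page is read,
  // so high-volume projects need pagination for an accurate daily total.
  const traces = await langfuse.api.trace.list({
    fromTimestamp: since,
    limit: 100,
  });

  let dailyCost = 0;
  let maxRequestCost = 0;

  for (const trace of traces.data) {
    const observations = await langfuse.api.observations.getMany({
      traceId: trace.id,
      type: "GENERATION",
    });
    const traceCost = observations.data.reduce(
      (sum, obs) => sum + (obs.calculatedTotalCost || 0), 0
    );
    dailyCost += traceCost;
    maxRequestCost = Math.max(maxRequestCost, traceCost);
  }

  console.log(`Daily cost: $${dailyCost.toFixed(2)}`);
  console.log(`Max request cost: $${maxRequestCost.toFixed(4)}`);

  if (dailyCost > ALERT_THRESHOLDS.dailyCritical) {
    await sendAlert("CRITICAL", `Daily LLM cost: $${dailyCost.toFixed(2)}`);
  } else if (dailyCost > ALERT_THRESHOLDS.dailyWarn) {
    await sendAlert("WARNING", `Daily LLM cost: $${dailyCost.toFixed(2)}`);
  }

  // The per-request threshold was defined but never checked
  if (maxRequestCost > ALERT_THRESHOLDS.perRequestWarn) {
    await sendAlert("WARNING", `Single request cost: $${maxRequestCost.toFixed(4)}`);
  }
}

checkCostAlerts().catch((err) => {
  console.error(err);
  process.exit(1); // non-zero exit so the cron runner can flag the failure
});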
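The threshold comparison in the alert script can be isolated into a pure function, which makes it possible to unit-test the alert levels without calling Langfuse. This is only a sketch: `classifyDailyCost`, `AlertLevel`, and `DailyThresholds` are hypothetical names, and the values mirror `ALERT_THRESHOLDS` above.

```typescript
type AlertLevel = "OK" | "WARNING" | "CRITICAL";

interface DailyThresholds {
  dailyWarn: number; // USD per day
  dailyCritical: number; // USD per day
}

// Maps a daily cost to an alert level; critical takes precedence over warning.
function classifyDailyCost(costUsd: number, t: DailyThresholds): AlertLevel {
  if (costUsd > t.dailyCritical) return "CRITICAL";
  if (costUsd > t.dailyWarn) return "WARNING";
  return "OK";
}

const thresholds: DailyThresholds = { dailyWarn: 50, dailyCritical: 200 };

console.log(classifyDailyCost(12.4, thresholds)); // prints "OK"
console.log(classifyDailyCost(75, thresholds)); // prints "WARNING"
console.log(classifyDailyCost(250, thresholds)); // prints "CRITICAL"
```

The comparisons are strict, so a day that lands exactly on a threshold does not alert, matching the `>` checks in the script.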
Langfuse Dashboard Features
Langfuse provides built-in cost analytics in the UI:
- Cost Dashboard: Tracks token usage and costs over time by model, user, and session
- Latency Dashboard: Response times across models and user segments
- Custom Dashboards: Build custom views with multi-level aggregations
- Pricing Tiers: Supports complex pricing (cached tokens, audio tokens, per-model tiers)
Cost Optimization Strategies
| Strategy | Savings | Effort | How |
|---|---|---|---|
| Model downgrade | 50-95% | Low | Route simple tasks to gpt-4o-mini |
| Prompt optimization | 10-30% | Low | Remove filler words, use structured prompts |
| Response caching | 20-80% | Medium | Cache identical prompts with TTL |
| Batch processing | 50% | Medium | Use OpenAI Batch API for offline tasks |
| Token limits | 10-40% | Low | Set max_tokens on all calls |
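
As one illustration of the response caching row, the sketch below keeps completions in an in-memory map keyed by model and prompt, with a time-to-live. `TtlCache` and `cacheKey` are hypothetical helpers, not Langfuse or OpenAI features. A production setup would more likely use a shared store, and the clock is injectable only so that expiry can be demonstrated deterministically.

```typescript
// Hypothetical in-memory cache with per-entry expiry (not a Langfuse feature).
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Key on everything that affects the answer: at least model and prompt.
function cacheKey(model: string, prompt: string): string {
  return JSON.stringify([model, prompt]);
}

let clock = 0; // fake clock for the demonstration
const cache = new TtlCache<string>(60_000, () => clock);
const key = cacheKey("gpt-4o-mini", "Classify: refund request");

cache.set(key, "billing");
console.log(cache.get(key)); // prints "billing"

clock = 60_001; // advance past the TTL
console.log(cache.get(key)); // prints undefined
```

A cache hit skips the model call entirely, so the savings depend on how often identical prompts recur within the TTL.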
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Missing cost data | No token usage on the generation | Ensure input and output token counts are attached to the generation observation |
| Wrong cost calculation | Model name mismatch | Use exact model ID (e.g., gpt-4o-2024-08-06) |
| Custom model no cost | No pricing configured | Add model pricing in Langfuse Settings > Model Definitions |
| Stale pricing | Model prices changed | Update model definitions periodically |