langsmith-observability
LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building systematic testing pipelines for AI applications.
LangSmith - LLM Observability Platform
Development platform for debugging, evaluating, and monitoring language models and AI applications.
When to use LangSmith
Use LangSmith when:
- Debugging LLM application issues (prompts, chains, agents)
- Evaluating model outputs systematically against datasets
- Monitoring production LLM systems
- Building regression testing for AI features
- Analyzing latency, token usage, and costs
- Collaborating on prompt engineering
Key features:
- Tracing: Capture inputs, outputs, and latency for every traced LLM call
- Evaluation: Systematic testing with built-in and custom evaluators
- Datasets: Create test sets from production traces or manually
- Monitoring: Track metrics, errors, and costs in production
- Integrations: Works with OpenAI, Anthropic, LangChain, LlamaIndex
Use alternatives instead:
- Weights & Biases: Deep learning experiment tracking, model training
- MLflow: General ML lifecycle, model registry focus
- Arize/WhyLabs: ML monitoring, data drift detection
Quick start
Installation
```bash
pip install langsmith

# Set environment variables
export LANGSMITH_API_KEY="your-api-key"
export LANGSMITH_TRACING=true
```
Basic tracing with @traceable
```python
from langsmith import traceable
from openai import OpenAI

client = OpenAI()

@traceable
def generate_response(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Automatically traced to LangSmith
result = generate_response("What is machine learning?")
```
OpenAI wrapper (automatic tracing)
```python
from langsmith.wrappers import wrap_openai
from openai import OpenAI

# Wrap client for automatic tracing
client = wrap_openai(OpenAI())

# All calls automatically traced
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
Core concepts
Runs and traces
A run is a single execution unit (LLM call, chain, tool). Runs form hierarchical traces showing the full execution flow.
```python
from langsmith import traceable

# `vector_store` and `llm` are placeholders for your own retriever and model client

@traceable(run_type="chain")
def process_query(query: str) -> str:
    # Parent run
    context = retrieve_context(query)  # Child run
    response = generate_answer(query, context)  # Child run
    return response

@traceable(run_type="retriever")
def retrieve_context(query: str) -> list:
    return vector_store.search(query)

@traceable(run_type="llm")
def generate_answer(query: str, context: list) -> str:
    return llm.invoke(f"Context: {context}\n\nQuestion: {query}")
```
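To make the parent/child structure concrete, here is a toy, in-memory stand-in for `@traceable`. It is not how LangSmith is implemented: the `toy_traceable` decorator, the `Run` dataclass, and the `ROOT_RUNS` list are hypothetical names used only to show how nested calls become a tree of runs, with Python's `contextvars` remembering which run is the current parent.

```python
import contextvars
import functools
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Run:
    """Hypothetical record of one execution unit (not LangSmith's Run schema)."""
    name: str
    run_type: str
    inputs: dict
    outputs: Any = None
    children: List["Run"] = field(default_factory=list)


# Root runs collected in memory; a real tracer would send these to a backend.
ROOT_RUNS: List[Run] = []

# The run currently executing in this thread / asyncio task.
_current_run = contextvars.ContextVar("current_run", default=None)


def toy_traceable(run_type: str = "chain") -> Callable:
    """Hypothetical decorator that records each call as a run nested under its caller."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            run = Run(name=fn.__name__, run_type=run_type,
                      inputs={"args": args, "kwargs": kwargs})
            parent: Optional[Run] = _current_run.get()
            if parent is None:
                ROOT_RUNS.append(run)        # top-level call starts a new trace
            else:
                parent.children.append(run)  # nested call becomes a child run
            token = _current_run.set(run)
            try:
                run.outputs = fn(*args, **kwargs)
                return run.outputs
            finally:
                _current_run.reset(token)    # restore the previous parent
        return wrapper
    return decorator


@toy_traceable(run_type="retriever")
def retrieve_context(query: str) -> list:
    return [f"doc about {query}"]


@toy_traceable(run_type="llm")
def generate_answer(query: str, context: list) -> str:
    return f"Answer to {query!r} using {len(context)} document(s)"


@toy_traceable(run_type="chain")
def process_query(query: str) -> str:
    context = retrieve_context(query)
    return generate_answer(query, context)


answer = process_query("What is RAG?")
root = ROOT_RUNS[-1]
print(root.name, [(c.name, c.run_type) for c in root.children])
# process_query [('retrieve_context', 'retriever'), ('generate_answer', 'llm')]
```

The single top-level call yields one root run whose children appear in call order, which is the shape a trace view displays.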
Projects
Projects organize related runs. Set via environment or code:
```python
import os

from langsmith import traceable

# Set globally via the environment
os.environ["LANGSMITH_PROJECT"] = "my-project"

# Or per function
@traceable(project_name="my-project")
def my_function():
    pass
```
Client API
```python
from langsmith import Client

client = Client()

# List runs
runs = list(client.list_runs(
    project_name="my-project",
    filter='eq(status, "success")',
    limit=100,
))

# Get run details
run = client.read_run(run_id="...")

# Create feedback
client.create_feedback(
    run_id="...",
    key="correctness",
    score=0.9,
    comment="Good answer",
)
```
Datasets and evaluation
Create dataset
```python
from langsmith import Client

client = Client()

# Create dataset
dataset = client.create_dataset("qa-test-set", description="QA evaluation")

# Add examples
client.create_examples(
    inputs=[
        {"question": "What is Python?"},
        {"question": "What is ML?"},
    ],
    outputs=[
        {"answer": "A programming language"},
        {"answer": "Machine learning"},
    ],
    dataset_id=dataset.id,
)
```
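`create_examples` pairs `inputs[i]` with `outputs[i]`, so the two lists must stay aligned. A small helper like the hypothetical `split_qa_pairs` below (plain Python, no LangSmith calls) builds both lists from a single list of (question, answer) tuples, which makes it harder to misalign them when a dataset grows.

```python
from typing import Dict, List, Tuple


def split_qa_pairs(
    pairs: List[Tuple[str, str]],
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Hypothetical helper: turn (question, answer) tuples into aligned
    inputs/outputs lists, where index i of each list describes one example."""
    inputs = [{"question": question} for question, _ in pairs]
    outputs = [{"answer": answer} for _, answer in pairs]
    return inputs, outputs


inputs, outputs = split_qa_pairs([
    ("What is Python?", "A programming language"),
    ("What is ML?", "Machine learning"),
])
print(inputs[1], outputs[1])
# These lists could then be passed as
# client.create_examples(inputs=inputs, outputs=outputs, dataset_id=dataset.id)
```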
Run evaluation
```python
from langsmith import evaluate

def my_model(inputs: dict) -> dict:
    # Your model logic (`generate_answer` is a placeholder for your own function)
    return {"answer": generate_answer(inputs["question"])}

def correctness_evaluator(run, example):
    prediction = run.outputs["answer"]
    reference = example.outputs["answer"]
    score = 1.0 if reference.lower() in prediction.lower() else 0.0
    return {"key": "correctness", "score": score}

results = evaluate(
    my_model,
    data="qa-test-set",
    evaluators=[correctness_evaluator],
    experiment_prefix="v1",
)

# Each result row holds the evaluator feedback for one example
scores = [
    res.score
    for row in results
    for res in row["evaluation_results"]["results"]
    if res.key == "correctness"
]
print(f"Average score: {sum(scores) / len(scores)}")
```
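Because `correctness_evaluator` only reads `run.outputs` and `example.outputs`, it can be checked offline before running a full experiment. The sketch below uses `types.SimpleNamespace` objects as stand-ins for LangSmith's run and example objects; that substitution is an assumption that holds only because this evaluator touches nothing but `.outputs` (the real objects carry many more fields).

```python
from types import SimpleNamespace


def correctness_evaluator(run, example):
    prediction = run.outputs["answer"]
    reference = example.outputs["answer"]
    # Case-insensitive substring match: pass if the reference appears in the prediction
    score = 1.0 if reference.lower() in prediction.lower() else 0.0
    return {"key": "correctness", "score": score}


# Stand-ins exposing only the attribute the evaluator reads
example = SimpleNamespace(outputs={"answer": "A programming language"})
good_run = SimpleNamespace(outputs={"answer": "Python is a programming language."})
bad_run = SimpleNamespace(outputs={"answer": "Python is a snake."})

print(correctness_evaluator(good_run, example))  # {'key': 'correctness', 'score': 1.0}
print(correctness_evaluator(bad_run, example))   # {'key': 'correctness', 'score': 0.0}
```

A substring check is a cheap heuristic: a correct but paraphrased answer scores 0.0, which is one reason to pair it with LLM-graded evaluators.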
Built-in evaluators
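```python
from langsmith import evaluate
from langsmith.evaluation import LangChainStringEvaluator

# Use LangChain evaluators (requires the `langchain` package; "qa" and "cot_qa" use an LLM as judge)
results = evaluate(
    my_model,
    data="qa-test-set",
    evaluators=[
        LangChainStringEvaluator("qa"),
        LangChainStringEvaluator("cot_qa"),
    ],
)
```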
Advanced tracing
Tracing context
```python
from langsmith import tracing_context

with tracing_context(
    project_name="experiment-1",
    tags=["production", "v2"],
    metadata={"version": "2.0"},
):
    # All traceable calls inside this block inherit the context
    result = my_function()
```
Manual runs
```python
from langsmith import trace

with trace(
    name="custom_operation",
    run_type="tool",
    inputs={"query": "test"},
) as run:
    result = do_something()  # placeholder for your own code
    run.end(outputs={"result": result})
```
Process inputs/outputs
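```python
from langsmith import traceable

def sanitize_inputs(inputs: dict) -> dict:
    # Return a masked copy instead of mutating the inputs dict
    if "password" in inputs:
        return {**inputs, "password": "***"}
    return inputs

@traceable(process_inputs=sanitize_inputs)
def login(username: str, password: str):
    return authenticate(username, password)  # `authenticate` is your own function
```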
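The `sanitize_inputs` hook above only checks one top-level key. If secrets can appear under several names or inside nested structures, a recursive redactor is one option. This is a plain-Python sketch: `SENSITIVE_KEYS` is a hypothetical set to adapt to your application, and `sanitize_inputs` would be passed as `process_inputs` in the same way.

```python
from typing import Any

# Hypothetical set of keys to mask; adjust for your application
SENSITIVE_KEYS = {"password", "api_key", "token"}


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive dict keys masked at any depth."""
    if isinstance(value, dict):
        return {
            key: "***" if key in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, tuple):
        return tuple(redact(item) for item in value)
    return value


def sanitize_inputs(inputs: dict) -> dict:
    return redact(inputs)


original = {"username": "ada", "password": "hunter2",
            "config": {"api_key": "sk-123", "retries": 3}}
print(sanitize_inputs(original))
# {'username': 'ada', 'password': '***', 'config': {'api_key': '***', 'retries': 3}}
```

Because `redact` builds new containers, the caller's arguments are left untouched; only the logged copy is masked.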
Sampling
```python
import os

os.environ["LANGSMITH_TRACING_SAMPLING_RATE"] = "0.1"  # 10% sampling
```
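Conceptually, a sampling rate of 0.1 means each trace is kept with probability 0.1, so roughly one in ten is sent. The sketch below simulates that decision with Python's `random` module; `should_trace` is a hypothetical helper that illustrates the arithmetic, not the SDK's own sampling code.

```python
import random


def should_trace(sampling_rate: float, rng: random.Random) -> bool:
    """Keep a trace with probability `sampling_rate` (an independent coin flip per trace)."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    # rng.random() is uniform on [0, 1), so this is True with probability sampling_rate
    return rng.random() < sampling_rate


rng = random.Random(42)  # seeded so the simulation is reproducible
trials = 10_000
kept = sum(should_trace(0.1, rng) for _ in range(trials))
print(f"kept {kept} of {trials} traces ({kept / trials:.1%})")
```

The trade-off is that about 90% of traces never reach LangSmith at this rate, so rare failures can go unrecorded; critical or low-traffic paths may warrant a higher rate.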
LangChain integration
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Tracing enabled automatically with LANGSMITH_TRACING=true
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}"),
])
chain = prompt | llm

# All chain runs traced automatically
response = chain.invoke({"input": "Hello!"})
```
Production monitoring
Hub prompts
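```python
from langsmith import Client

client = Client()

# Pull prompt from hub (returns a LangChain prompt object; requires langchain-core)
prompt = client.pull_prompt("my-org/qa-prompt")

# Invoking the prompt only formats it; pipe it into a model (prompt | llm) to get a response
prompt_value = prompt.invoke({"question": "What is AI?"})
```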
Async client
```python
import asyncio

from langsmith import AsyncClient

async def main():
    client = AsyncClient()
    runs = []
    async for run in client.list_runs(project_name="my-project"):
        runs.append(run)
    return runs

runs = asyncio.run(main())
```
Feedback collection
```python
from typing import Optional

from langsmith import Client

client = Client()

# Collect user feedback
def record_feedback(run_id: str, user_rating: int, comment: Optional[str] = None):
    client.create_feedback(
        run_id=run_id,
        key="user_rating",
        score=user_rating / 5.0,  # Map a 1-5 rating into 0-1 (1 -> 0.2, 5 -> 1.0)
        comment=comment,
    )

# In your application
record_feedback(run_id="...", user_rating=4, comment="Helpful response")
```
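Dividing by 5.0 maps a 1-5 rating onto 0.2-1.0, so even the lowest rating never scores 0. If a full 0-1 range is preferred, min-max scaling is a common alternative; `normalize_rating` below is a hypothetical helper that could replace `user_rating / 5.0` inside `record_feedback`.

```python
def normalize_rating(rating: int, low: int = 1, high: int = 5) -> float:
    """Min-max scale a rating on [low, high] onto [0, 1]: low -> 0.0, high -> 1.0."""
    if not low <= rating <= high:
        raise ValueError(f"rating must be between {low} and {high}, got {rating}")
    return (rating - low) / (high - low)


for rating in range(1, 6):
    print(rating, normalize_rating(rating))
# 1 0.0
# 2 0.25
# 3 0.5
# 4 0.75
# 5 1.0
```

Rejecting out-of-range ratings up front also keeps malformed client input from producing scores above 1.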
Testing integration
Pytest integration
```python
from langsmith import test

@test
def test_qa_accuracy():
    result = my_qa_function("What is Python?")  # placeholder for your own function
    assert "programming" in result.lower()
```
Evaluation in CI/CD
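```python
from langsmith import evaluate

def run_evaluation():
    results = evaluate(
        my_model,
        data="regression-test-set",
        evaluators=[accuracy_evaluator],  # your evaluator, returning key "accuracy"
    )
    scores = [
        res.score
        for row in results
        for res in row["evaluation_results"]["results"]
        if res.key == "accuracy"
    ]
    accuracy = sum(scores) / len(scores)
    # Fail CI if accuracy drops
    assert accuracy >= 0.9, f"Accuracy {accuracy} below threshold"
```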
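The threshold check reduces to averaging per-key scores and comparing against a minimum. Keeping that logic in a small pure function makes it testable without calling LangSmith. In the sketch below, `aggregate_scores` and `check_threshold` are hypothetical helpers; the `(key, score)` pairs are assumed to be collected from evaluation results (for example `(res.key, res.score)` for each evaluator result).

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def aggregate_scores(feedback: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average the scores recorded under each feedback key."""
    by_key: Dict[str, List[float]] = defaultdict(list)
    for key, score in feedback:
        by_key[key].append(score)
    return {key: mean(scores) for key, scores in by_key.items()}


def check_threshold(metrics: Dict[str, float], key: str, threshold: float) -> None:
    """Raise AssertionError (failing the CI job) if metrics[key] is missing or too low."""
    if key not in metrics:
        raise AssertionError(f"No scores recorded for {key!r}")
    if metrics[key] < threshold:
        raise AssertionError(f"{key} {metrics[key]:.3f} below threshold {threshold}")


metrics = aggregate_scores([("accuracy", 1.0), ("accuracy", 1.0),
                            ("accuracy", 0.0), ("latency_ok", 1.0)])
check_threshold(metrics, "latency_ok", 0.9)    # passes silently
try:
    check_threshold(metrics, "accuracy", 0.9)  # about 0.667 < 0.9
except AssertionError as err:
    print(f"CI would fail: {err}")
```

Treating a missing key as a failure guards against an evaluator that silently stopped producing scores.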
Best practices
- Structured naming - Use consistent project/run naming conventions
- Add metadata - Include version, environment, user info
- Sample in production - Use sampling rate to control volume
- Create datasets - Build test sets from interesting production cases
- Automate evaluation - Run evaluations in CI/CD pipelines
- Monitor costs - Track token usage and latency trends
Common issues
Traces not appearing:
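```python
import os

# Ensure tracing is enabled (set these before creating a client or calling traced code)
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "your-key"

# Verify connection: list_projects() is lazy, so fetch one item to force a request
from langsmith import Client

client = Client()
print(next(iter(client.list_projects()), None))  # errors here point to a bad key or endpoint
```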
High latency from tracing:
```python
import os
from langsmith import Client

# Enable background batching (the default)
client = Client(auto_batch_tracing=True)

# Or use sampling (set before the client is created)
os.environ["LANGSMITH_TRACING_SAMPLING_RATE"] = "0.1"
```
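Background batching keeps network I/O off the request path: the traced function only enqueues run data, and a worker thread sends it in groups. The toy `BackgroundBatcher` below illustrates that pattern with the standard library's `queue` and `threading`; it is a conceptual sketch rather than the SDK's implementation, and `send_batch` is a hypothetical callback standing in for whatever would upload a batch.

```python
import queue
import threading
import time
from typing import Callable, List


class BackgroundBatcher:
    """Toy illustration: callers enqueue events and return immediately,
    while a daemon thread delivers them in batches of at most `max_batch`."""

    def __init__(self, send_batch: Callable[[List[dict]], None],
                 max_batch: int = 10, idle_sleep: float = 0.01):
        self._queue: "queue.Queue[dict]" = queue.Queue()
        self._send_batch = send_batch
        self._max_batch = max_batch
        self._idle_sleep = idle_sleep
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, event: dict) -> None:
        # Only a queue put on the caller's thread; no network I/O here
        self._queue.put(event)

    def _drain(self) -> List[dict]:
        batch: List[dict] = []
        while len(batch) < self._max_batch:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch

    def _run(self) -> None:
        while not self._stop.is_set():
            batch = self._drain()
            if batch:
                self._send_batch(batch)
            else:
                time.sleep(self._idle_sleep)  # nothing queued; wait briefly
        # After close(): flush everything submitted before shutdown
        while batch := self._drain():
            self._send_batch(batch)

    def close(self) -> None:
        self._stop.set()
        self._worker.join()


sent: List[List[dict]] = []
batcher = BackgroundBatcher(sent.append, max_batch=4)
for i in range(10):
    batcher.submit({"run": i})
batcher.close()
print([len(batch) for batch in sent])  # batch sizes vary with thread timing
```

The caller pays only for a queue insert, while upload cost is amortized across each batch; the final drain in `_run` is what keeps events from being lost on shutdown.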
Large payloads:
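```python
from langsmith import traceable

# Hide sensitive/large fields from the recorded inputs
@traceable(
    process_inputs=lambda x: {k: v for k, v in x.items() if k != "large_field"}
)
def my_function(query: str, large_field: bytes):
    pass  # `large_field` still reaches the function; it is just not logged
```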
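Dropping a field hides it entirely. When part of a large value is still useful for debugging, truncating is an alternative. `truncate_large_values` below is a hypothetical plain-Python helper with an illustrative 1,000-character default; it returns a new dict and could be passed as `process_inputs` in the same way as the lambda above.

```python
from typing import Any, Dict


def truncate_large_values(inputs: Dict[str, Any], max_chars: int = 1000) -> Dict[str, Any]:
    """Return a copy of inputs where strings longer than max_chars are cut and marked."""
    result: Dict[str, Any] = {}
    for key, value in inputs.items():
        if isinstance(value, str) and len(value) > max_chars:
            dropped = len(value) - max_chars
            result[key] = f"{value[:max_chars]}... [truncated {dropped} chars]"
        else:
            result[key] = value
    return result


payload = {"query": "summarize", "document": "x" * 1500}
logged = truncate_large_values(payload)
print(logged["query"], logged["document"].endswith("[truncated 500 chars]"))  # summarize True
```

Recording how many characters were dropped makes it clear in the trace that the value was shortened rather than genuinely short.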
References
- Advanced Usage - Custom evaluators, distributed tracing, hub prompts
- Troubleshooting - Common issues, debugging, performance
Resources
- Documentation: https://docs.smith.langchain.com
- Python SDK: https://github.com/langchain-ai/langsmith-sdk
- Web App: https://smith.langchain.com
- Version: 0.2.0+
- License: MIT
Published by davila7