# openrouter-rate-limits

Handle OpenRouter rate limits with proper backoff strategies. Use when experiencing 429 errors or building high-throughput systems. Trigger with phrases like 'openrouter rate limit', 'openrouter 429', 'openrouter throttle', 'openrouter backoff'.

## Install

```shell
mkdir -p .claude/skills/openrouter-rate-limits && curl -L -o skill.zip "https://mcp.directory/api/skills/download/8782" && unzip -o skill.zip -d .claude/skills/openrouter-rate-limits && rm skill.zip
```

Installs to `.claude/skills/openrouter-rate-limits`.

# OpenRouter Rate Limits
## Overview

OpenRouter rate limits are applied per API key, not per account. Free-tier keys get lower limits; paid keys get higher limits that scale with credit balance. The OpenAI SDK has built-in retry with exponential backoff for 429 responses. Check your key's current limits via `GET /api/v1/auth/key`; rate limit headers are returned on every response.
## Check Your Rate Limits

```shell
# Query current rate limit configuration for your key
curl -s https://openrouter.ai/api/v1/auth/key \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" | jq '{
    label: .data.label,
    rate_limit: .data.rate_limit,
    is_free_tier: .data.is_free_tier,
    credits_used: .data.usage,
    credit_limit: .data.limit
  }'

# Example output:
# {
#   "label": "my-app-prod",
#   "rate_limit": {"requests": 200, "interval": "10s"},
#   "is_free_tier": false,
#   "credits_used": 12.34,
#   "credit_limit": 100
# }
```
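The same check can be done from Python. This is a minimal sketch, not an official client: the field names follow the example output above, and `summarize_key_info` / `fetch_key_info` are hypothetical helper names.

```python
import os

import requests


def summarize_key_info(body: dict) -> dict:
    """Extract the rate-limit fields from a GET /api/v1/auth/key response body."""
    key = body["data"]
    return {
        "label": key.get("label"),
        "rate_limit": key.get("rate_limit"),
        "is_free_tier": key.get("is_free_tier"),
        "credits_used": key.get("usage"),
        "credit_limit": key.get("limit"),
    }


def fetch_key_info() -> dict:
    """Fetch and summarize the current key's limits."""
    resp = requests.get(
        "https://openrouter.ai/api/v1/auth/key",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        timeout=10,
    )
    resp.raise_for_status()
    return summarize_key_info(resp.json())
```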
## Rate Limit Tiers

| Tier | Requests | Interval | Who |
|---|---|---|---|
| Free (no credits) | 20 | 10s | New accounts |
| Free (with credits) | 200 | 10s | Accounts with any credits |
| Paid | Higher | Varies | Based on credit balance |

Free models have separate daily limits: 50 requests/day for free users, 1000 requests/day for accounts with $10+ in credits.
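The table can be turned into a client-side request budget. A hypothetical helper, assuming you only know the published free-tier numbers (the 80% margin is an arbitrary safety factor; actual paid limits vary by balance):

```python
def requests_per_interval(is_free_tier: bool, has_credits: bool) -> tuple[int, float]:
    """Return (max_requests, interval_seconds) derived from the tier table,
    with a safety margin so the client stays under the server limit."""
    if is_free_tier and not has_credits:
        limit = 20    # free tier, no credits
    else:
        limit = 200   # free-with-credits; paid keys may be higher
    margin = int(limit * 0.8)  # run at 80% of the published limit
    return margin, 10.0
```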
## Read Rate Limit Headers

```python
import os

import requests

# The OpenAI SDK abstracts response headers away, so use requests for direct access
def check_rate_headers():
    """Make a minimal request and inspect the rate limit headers."""
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
            "HTTP-Referer": "https://my-app.com",
        },
        json={
            "model": "openai/gpt-4o-mini",
            "messages": [{"role": "user", "content": "hi"}],
            "max_tokens": 1,
        },
    )
    return {
        "status": resp.status_code,
        "x-ratelimit-limit": resp.headers.get("x-ratelimit-limit"),
        "x-ratelimit-remaining": resp.headers.get("x-ratelimit-remaining"),
        "x-ratelimit-reset": resp.headers.get("x-ratelimit-reset"),
        "retry-after": resp.headers.get("retry-after"),
    }
```
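Header values arrive as strings and may be absent on some responses. A small helper (hypothetical, not part of any SDK) can decide whether to slow down before the server starts returning 429:

```python
def should_back_off(headers: dict, threshold: float = 0.1) -> bool:
    """Return True when fewer than `threshold` of the allowed requests remain.

    Expects the lowercase header names used above. Missing or malformed
    values are treated as "no backoff needed".
    """
    limit = headers.get("x-ratelimit-limit")
    remaining = headers.get("x-ratelimit-remaining")
    if limit is None or remaining is None:
        return False
    try:
        return int(remaining) < int(limit) * threshold
    except ValueError:
        return False
```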
## Retry Strategy with OpenAI SDK

```python
import os

from openai import OpenAI

# The SDK handles 429 retries automatically with exponential backoff
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
    max_retries=5,   # Default is 2; increase for high-throughput workloads
    timeout=60.0,    # Per-request timeout
    default_headers={"HTTP-Referer": "https://my-app.com", "X-Title": "my-app"},
)

# On a 429 the SDK will:
# 1. Catch the response
# 2. Read the Retry-After header if present
# 3. Wait with exponential backoff (plus jitter)
# 4. Retry up to max_retries times
response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=200,
)
```
## Custom Rate Limiter (Client-Side)

```python
import threading
import time
from collections import deque

class SlidingWindowLimiter:
    """Client-side sliding-window rate limiter to avoid hitting server limits."""

    def __init__(self, rate: int = 200, interval: float = 10.0):
        self.rate = rate              # Max requests per interval
        self.interval = interval      # Window length in seconds
        self._timestamps = deque()
        self._lock = threading.Lock()

    def acquire(self, timeout: float = 30.0) -> bool:
        """Block until a request slot is available, or time out."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with self._lock:
                now = time.monotonic()
                # Drop timestamps that have left the window
                while self._timestamps and now - self._timestamps[0] > self.interval:
                    self._timestamps.popleft()
                if len(self._timestamps) < self.rate:
                    self._timestamps.append(now)
                    return True
            time.sleep(0.1)  # Wait and retry
        return False  # Timed out

limiter = SlidingWindowLimiter(rate=150, interval=10.0)  # Stay under the 200/10s limit

def rate_limited_completion(messages, **kwargs):
    """Completion with client-side rate limiting (uses `client` from above)."""
    if not limiter.acquire(timeout=30):
        raise TimeoutError("Rate limiter timeout")
    return client.chat.completions.create(messages=messages, **kwargs)
```
## Batch Processing with Rate Awareness

```python
import asyncio
import os

from openai import AsyncOpenAI

async def batch_with_rate_limit(prompts: list[str], model="openai/gpt-4o-mini",
                                max_concurrent=10, delay_between=0.05):
    """Process a batch of prompts with rate-aware concurrency."""
    semaphore = asyncio.Semaphore(max_concurrent)
    aclient = AsyncOpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
        max_retries=5,
        default_headers={"HTTP-Referer": "https://my-app.com", "X-Title": "my-app"},
    )

    async def process(prompt, idx):
        await asyncio.sleep(idx * delay_between)  # Stagger request start times
        async with semaphore:
            response = await aclient.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=200,
            )
            return response.choices[0].message.content

    return await asyncio.gather(*[process(p, i) for i, p in enumerate(prompts)])
```
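The stagger-plus-semaphore pattern can be exercised without the API by swapping in a stubbed call. Everything below is a local sketch; `batch_staggered` and `echo` are hypothetical names, not OpenRouter or SDK functions:

```python
import asyncio

async def batch_staggered(items, worker, max_concurrent=10, delay_between=0.05):
    """Run `worker(item)` over items with a concurrency cap and staggered starts."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run(item, idx):
        await asyncio.sleep(idx * delay_between)  # spread out request starts
        async with semaphore:
            return await worker(item)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*[run(x, i) for i, x in enumerate(items)])

async def echo(item):
    """Stub standing in for the real API call."""
    await asyncio.sleep(0.01)
    return item.upper()

results = asyncio.run(batch_staggered(["a", "b", "c"], echo, delay_between=0.01))
```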
## Error Handling

| Error | Cause | Fix |
|---|---|---|
| 429 Too Many Requests | Exceeded requests per interval | SDK auto-retries; increase `max_retries` |
| Retry storm | Multiple clients retrying simultaneously | Add random jitter (0-1s) to retry delays |
| Silent throttling | Responses slow down before a 429 | Monitor latency; proactively reduce request rate |
| Free tier limit hit | 50 requests/day on free models | Add credits ($10+) to raise the limit to 1000 requests/day |
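For the "silent throttling" row, a rolling latency tracker can flag slowdowns before a 429 ever arrives. This is a minimal sketch; `LatencyMonitor` is a hypothetical class, and the 2x slowdown factor is an arbitrary threshold you should tune:

```python
from collections import deque

class LatencyMonitor:
    """Rolling latency tracker to spot throttling before a 429 arrives."""

    def __init__(self, window: int = 50, slowdown_factor: float = 2.0):
        self.samples = deque(maxlen=window)
        self.baseline = None          # set once the first window fills
        self.slowdown_factor = slowdown_factor

    def record(self, seconds: float) -> bool:
        """Record one request latency; return True if we look throttled."""
        self.samples.append(seconds)
        mean = sum(self.samples) / len(self.samples)
        if self.baseline is None:
            if len(self.samples) == self.samples.maxlen:
                self.baseline = mean  # first full window becomes the baseline
            return False
        return mean > self.baseline * self.slowdown_factor
```

When `record()` returns True, reduce the client-side request rate rather than waiting for the server to start rejecting requests.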
## Enterprise Considerations

- Rate limits are per-key: use multiple keys to multiply effective throughput
- The OpenAI SDK handles 429 retries automatically; configure `max_retries` (default 2)
- Implement client-side rate limiting to stay under limits proactively (cheaper than retries)
- Free models have daily limits separate from the per-key rate limit
- Monitor the `x-ratelimit-remaining` header to detect approaching limits before hitting 429
- For batch workloads, use staggered concurrent requests rather than burst patterns
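Since limits are per-key, a simple round-robin key pool multiplies throughput. A sketch under stated assumptions: `KeyPool` is a hypothetical class, and the `OPENROUTER_API_KEYS` comma-separated environment variable is a convention for this example, not an OpenRouter feature:

```python
import itertools

class KeyPool:
    """Round-robin over several API keys to multiply per-key throughput."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("need at least one key")
        self._cycle = itertools.cycle(keys)

    def next_key(self) -> str:
        return next(self._cycle)

# Usage sketch (assumes the env var convention described above):
# keys = os.environ["OPENROUTER_API_KEYS"].split(",")
# pool = KeyPool(keys)
# client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key=pool.next_key())
```

Each key still carries its own limit, so pair the pool with a per-key client-side limiter rather than a single shared one.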