Updated May 2026 · Comparison · 19 min read

AI Gateway 2026: Vercel vs Portkey vs OpenRouter vs LiteLLM

Four gateways, one job: stand between your application and the LLM providers so swapping models is a config flip, not a code change. We picked the four that come up in every production conversation right now — Vercel’s platform-bundled offering, Portkey’s observability-heavy SaaS, OpenRouter’s pay-per-token unified API, and LiteLLM’s open-source proxy. Every fact in this post was pulled directly from each vendor’s docs and pricing pages.

[Illustration: four AI-gateway glyphs (Vercel triangle, Portkey key, OpenRouter route-fork, LiteLLM lightning-curve) connected by routing arrows.]
On this page · 14 sections
  1. TL;DR + decision tree
  2. What an AI gateway does
  3. Four axes that matter
  4. Side-by-side matrix
  5. Vercel AI Gateway
  6. Portkey
  7. OpenRouter
  8. LiteLLM
  9. Cost model deep dive
  10. Observability depth
  11. Pitfalls
  12. Community signal
  13. FAQ
  14. Sources

TL;DR + decision tree

  • Already on Vercel? Use Vercel AI Gateway. The integration is bundled with the platform, billing rolls into your Vercel invoice, and the AI SDK speaks to it natively. Zero new vendors.
  • Need production observability on day one? Portkey. It is the only one of the four where the dashboards — request traces, cost-per-route, guardrail events — are a first-party product, not an afterthought. Free up to 10k requests/month; $49/mo for 100k.
  • Want one key for 300+ models with no subscription? OpenRouter. Pay-per-token at the provider’s posted price plus a 5.5% platform fee on credit purchases. The simplest gateway to ship with; the most expensive at very high volume.
  • Care about lock-in and have DevOps capacity? LiteLLM. Self-host the proxy, bring your own provider keys, full ownership of logs, no vendor in the loop. 46k+ GitHub stars and the broadest provider coverage in the OSS category.

These four are not equivalent products. Vercel and Portkey are platforms; OpenRouter is a marketplace; LiteLLM is a library you operate yourself. Picking by the surface feature (‘they all do routing!’) misses the point — the right question is whose business model fits your operational posture. We dig into that in the cost section below, but the quick test is: if billing through a single SaaS makes you nervous, run LiteLLM; if running infrastructure makes you nervous, do not.

What an AI gateway actually does

Strip away the marketing and an AI gateway does six things, roughly in order of how often they earn their keep.

  1. Multi-provider routing. Your code calls one endpoint with one auth header; the gateway forwards to OpenAI, Anthropic, Google, Bedrock, Azure, or a dozen others based on the model string in the request. Swapping from openai/gpt-5 to anthropic/claude-opus-4.6 is a string change, not a refactor. Every gateway in this comparison does this; the differences live further down.
  2. Observability. Every model call is logged with its request, response, latency, token counts, model, cost, and any metadata you attached. This is the feature your CFO will eventually ask about. The difference between ‘we have logs’ and ‘we have queryable cost-per-route across 30 days’ is enormous, and the four gateways here sit at different points on that spectrum.
  3. Caching. Two flavours. Exact-match caching hashes the request body and serves the prior response for duplicates — saves tokens on a small set of true repeats. Semantic caching embeds the prompt and serves a cached response when the embedding is within a similarity threshold of a prior request — saves tokens on the large set of paraphrases that real users actually send (see the sketch after this list). Three of these gateways support semantic; LiteLLM gets it via Redis.
  4. Cost control. Budgets per virtual key, rate limits per route, dollar caps that hard-fail when hit, alerts when spend deviates from forecast. The reason this matters is not the average month — it is the bad month, the runaway loop, the leaked key. A gateway with enforced budgets is the difference between a $200 incident and a $20,000 one.
  5. Fallbacks and retries. When a provider 5xxs or rate-limits, the gateway retries against a fallback in the same family. This used to be a nice-to-have; in 2026 it is table stakes because every major provider has had at least one multi-hour incident this year, and your product cannot.
  6. Guardrails. PII redaction, jailbreak detection, output schema validation, prohibited-content filters. Portkey treats this as a first-class concern with deterministic and LLM-judged guardrails; LiteLLM has a pluggable hook system; Vercel and OpenRouter rely on upstream provider safety filtering plus whatever you wire in. If you ship to regulated industries, guardrails are not optional.
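
To make the semantic flavour concrete, here is a minimal TypeScript sketch of the lookup a semantic cache performs. It is illustrative, not any vendor’s implementation: embed and complete are hypothetical stand-ins for your embedding and completion calls, and a real gateway uses a vector index rather than a linear scan.

// Minimal semantic-cache sketch. `embed` and `complete` are stand-ins for
// your embedding and completion calls; production systems use a vector
// index instead of scanning every entry.
type CacheEntry = { vector: number[]; response: string };

const cache: CacheEntry[] = [];
const THRESHOLD = 0.95; // cosine similarity; tune against a labelled hold-out

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function cachedCompletion(
  prompt: string,
  embed: (text: string) => Promise<number[]>,
  complete: (text: string) => Promise<string>,
): Promise<string> {
  const vector = await embed(prompt);
  // Serve the closest prior response if it clears the threshold.
  let best: { score: number; entry: CacheEntry } | null = null;
  for (const entry of cache) {
    const score = cosine(vector, entry.vector);
    if (!best || score > best.score) best = { score, entry };
  }
  if (best && best.score >= THRESHOLD) return best.entry.response;

  const response = await complete(prompt); // cache miss: pay for tokens
  cache.push({ vector, response });
  return response;
}

The entire cost/correctness trade lives in THRESHOLD, which is why the pitfalls section below says to start at 0.95 and validate before loosening it.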

You can build all six in-house. The reason teams keep outsourcing it to a gateway is that the real engineering cost is not the gateway code — it is the operational drag of every provider migration, every new model integration, every billing reconciliation, and every incident you absorb during the year your in-house version does not exist yet. The gateway is, in effect, a pre-built version of the integration team you were about to hire.

Four axes that matter

Spec sheets list dozens of features. In practice, the decision compresses to four axes, and every gateway sits at a specific point on each.

1. Host model. Is the gateway a SaaS, a self-hosted binary, a platform-bundled service, or all three? Vercel AI Gateway is platform-bundled — you cannot run it outside Vercel. OpenRouter is SaaS-only. Portkey is both: open-source gateway you can self-host, with an optional managed dashboard. LiteLLM is open-source-first, with a hosted option for teams that do not want to operate the proxy. This single axis determines your lock-in posture more than any other.

2. Observability depth. Three tiers in the market. Tier one is ‘we log requests’: a CSV export and a basic usage chart. Tier two is ‘we expose events’: callbacks to your existing observability stack (Langfuse, Helicone, Datadog) and you build the dashboards. Tier three is ‘we are the dashboards’: built-in cost-per-route, latency percentiles, error decomposition, trace search, retention controls. Vercel and OpenRouter are tier two. LiteLLM is tier two with first-class integrations. Portkey is tier three.

3. Caching strategy. Exact-match caching is free and universal; nobody differentiates on it. Semantic caching is where the savings show up because user prompts vary in surface form. Portkey ships semantic caching as a managed feature with a tunable similarity threshold and metadata-scoped invalidation. OpenRouter exposes response caching with similar primitives. LiteLLM supports semantic caching via Redis but you operate the index. Vercel offers response caching but is less prescriptive about the semantic flavour. If your traffic is repetitive, this axis can swing your spend by 30 percent or more.

4. Provider count and freshness. The headline number every vendor cites — ‘100+’, ‘250+’, ‘hundreds’, ‘300+’ — is mostly noise. What matters is how fast a new model appears in the gateway after the provider launches it. OpenRouter and LiteLLM both ship same-day for the big launches because that is core to their value prop. Vercel ships when the AI SDK ships, which is usually quick but is a two-step dependency. Portkey lands within the week. If you track every new model and want to A/B-test the day it drops, pick OpenRouter or LiteLLM.

Side-by-side matrix

Cells below are sourced from each vendor’s public docs and pricing pages as of May 2026. Treat exact dollar amounts as a snapshot; the rest is structural.

| Dimension | Vercel AI Gateway | Portkey | OpenRouter | LiteLLM |
| --- | --- | --- | --- | --- |
| License | Closed (SaaS) | OSS gateway + paid SaaS | Closed (SaaS) | Open source (proxy) + commercial Enterprise |
| Host model | Platform-bundled with Vercel | SaaS or self-host | SaaS only | Self-host (Docker) or hosted Cloud |
| Models (vendor claim) | Hundreds | 250+ | 300+ | 100+ providers, all endpoint types |
| Routing + fallback | Yes (built-in) | Yes (config-driven) | Yes (provider preferences) | Yes (router with retries) |
| Observability | Vercel dashboards | First-party traces, cost, guardrails | Dashboard + 15+ integrations | Event callbacks to Langfuse / Helicone / etc. |
| Caching | Response caching | Exact + semantic | Exact + response cache | Redis-backed (incl. semantic) |
| Guardrails | Upstream provider | Deterministic + LLM + partner | Provider-side | Plugin hooks |
| Pricing model | Token pass-through, no markup, BYOK supported | Free 10k logs/mo; $49/mo for 100k; Enterprise custom | 5.5% credit-purchase fee, no markup on tokens | Free OSS; paid Enterprise / Cloud |
| Best for | Vercel-hosted apps using the AI SDK | Teams that need dashboards now | Multi-model prototyping; unified billing | Self-hosted production; no lock-in |

Three takeaways. First, Portkey is the only one of the four where you can run the gateway code yourself and pay separately for dashboards — that hybrid is rare and useful. Second, OpenRouter is the only one charging a platform fee on the dollars flowing through it (5.5 percent on credit purchases) — at low volume that fee buys convenience cheaply; at high volume it is the cost structure that pushes teams to LiteLLM. Third, Vercel AI Gateway’s pricing — token pass-through, no markup — is the most generous on paper, but you only get it if you are already paying Vercel for the platform.

Vercel AI Gateway

Vercel AI Gateway

Closed SaaS · platform-bundled

Hosted multi-provider LLM gateway bundled with the Vercel platform. Unified API across hundreds of models via the Vercel AI SDK, OpenAI Chat Completions / OpenAI Responses / Anthropic Messages compatible endpoints, automatic fallbacks, embeddings, spend monitoring, BYOK support, no markup on tokens.

vercel.com/ai-gateway · vercel.com/docs/ai-gateway

What it does best

Vercel AI Gateway’s headline strength is platform gravity. If your app already deploys to Vercel and uses the Vercel AI SDK (v5 or v6), turning on AI Gateway is one change: swap the provider import for the gateway-aware client and you get unified routing across hundreds of models, automatic retries to fallback providers, spend monitoring inside the same dashboard you use for function invocations, and a single line item on your Vercel invoice covering token pass-through with zero markup. BYOK is supported, which means you can route through Vercel for the observability and routing layer while paying providers directly out of your existing accounts.

Pick this if you...

  • Already host on Vercel and use the AI SDK in production
  • Want one invoice covering compute, edge, and LLM tokens
  • Need OpenAI Chat Completions, OpenAI Responses, and Anthropic Messages compatibility from a single endpoint
  • Prefer Bring-Your-Own-Key over a vendor reselling provider capacity

Recipe: swap models with one config flip

In a Next.js app using the AI SDK, the canonical usage pattern from the docs:

import { generateText } from 'ai';

const { text } = await generateText({
  model: 'anthropic/claude-opus-4.6',
  prompt: 'What is the capital of France?',
});

Change the model string to openai/gpt-5.4, xai/grok-4.1-fast-non-reasoning, or any other vendor identifier and the gateway routes accordingly. The same gateway URL also speaks Chat Completions via https://ai-gateway.vercel.sh/v1 for non-Vercel clients, so a Python script with the OpenAI SDK works unchanged. Provider preferences, fallback ordering, and routing rules live in the dashboard alongside spend caps.
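
For the non-Vercel path, here is a minimal sketch with the stock OpenAI SDK; the AI_GATEWAY_API_KEY variable name is ours, so use whatever holds your gateway key.

import OpenAI from "openai";

// Any OpenAI-compatible client can target the gateway's Chat Completions
// endpoint directly; no Vercel AI SDK required.
const gateway = new OpenAI({
  baseURL: "https://ai-gateway.vercel.sh/v1",
  apiKey: process.env.AI_GATEWAY_API_KEY, // gateway key, not a provider key
});

const res = await gateway.chat.completions.create({
  model: "anthropic/claude-opus-4.6",
  messages: [{ role: "user", content: "What is the capital of France?" }],
});
console.log(res.choices[0].message.content);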

Skip it if...

You do not host on Vercel. The integration story is the value; outside the Vercel platform the same gateway features are matched or exceeded by Portkey, OpenRouter, or a self-hosted LiteLLM, and you pay for the trip through Vercel’s edge for no clear gain.

Portkey

Portkey

OSS gateway + paid SaaS

Production AI gateway with 250+ model providers, first-party observability (request traces, cost-per-route, guardrail events), exact and semantic caching, virtual keys with budgets, and deterministic + LLM-judged guardrails. Open-source core; managed SaaS for the dashboards.

portkey.ai · portkey.ai/docs · portkey.ai/pricing

What it does best

Portkey is the only gateway in this comparison where production-grade observability is the headline product, not a bullet point. Every request is captured with prompt, completion, latency, tokens, cost, and any guardrail events. The dashboards group spend by virtual key, route, and metadata you set; retention runs 3 days on Developer, 30 days on Production, and configurable on Enterprise. The gateway itself is open-source and can run inside your VPC, which makes Portkey one of the few options here that satisfies a security review without forcing you to also build the dashboards yourself.

Pick this if you...

  • Need request-level traces and cost-per-route on day one without standing up Langfuse, Helicone, or a custom stack
  • Want semantic caching as a managed feature with a tunable similarity threshold
  • Ship to regulated industries and need deterministic + LLM-judged guardrails (PII, jailbreak, schema)
  • Have a security team that wants the gateway in your VPC but is fine paying for managed dashboards

Recipe: budget-capped virtual key

Portkey’s virtual-key abstraction lets you mint a key per environment, customer, or feature, attach a dollar budget, and revoke it without touching the underlying provider keys. The minimal pattern:

import OpenAI from "openai";

// The OpenAI SDK pointed at Portkey's gateway. The provider credential
// lives inside the virtual key, so the SDK-level apiKey is a placeholder.
const portkey = new OpenAI({
  baseURL: "https://api.portkey.ai/v1",
  apiKey: "not-used", // resolved by the virtual key on Portkey's side
  defaultHeaders: {
    "x-portkey-api-key": process.env.PORTKEY_API_KEY,
    "x-portkey-virtual-key": process.env.PORTKEY_VKEY_PROD,
  },
});

const res = await portkey.chat.completions.create({
  model: "gpt-5",
  messages: [{ role: "user", content: "Summarise this support ticket." }],
});

In the dashboard, set a monthly budget on the virtual key, attach a fallback config that drops to claude-opus-4.6 on 5xx from OpenAI, and enable semantic caching at 0.95 similarity. Three config changes and the same code now has budget enforcement, automatic failover, and 30-day request traces.
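
Fallback and caching can also travel with the request as a gateway config rather than dashboard state. A hedged sketch of the shape: the field names below (strategy, targets, cache.mode) follow our reading of Portkey’s config docs and the virtual-key env names are illustrative, so verify both against the current reference before relying on them.

// Hedged sketch of a per-request Portkey gateway config; field names are
// assumptions to verify against Portkey's config reference.
const portkeyConfig = {
  strategy: { mode: "fallback" }, // try targets in order on failure
  targets: [
    { virtual_key: process.env.PORTKEY_VKEY_OPENAI },    // primary
    { virtual_key: process.env.PORTKEY_VKEY_ANTHROPIC }, // fallback
  ],
  cache: { mode: "semantic" }, // semantic cache scoped to this route
};

const reply = await portkey.chat.completions.create(
  {
    model: "gpt-5",
    messages: [{ role: "user", content: "Summarise this support ticket." }],
  },
  { headers: { "x-portkey-config": JSON.stringify(portkeyConfig) } },
);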

Skip it if...

You already run a mature observability stack (Datadog, Langfuse, Helicone) and view a Portkey dashboard as duplication — LiteLLM’s callback model is a cleaner fit. And if your request volume will sit comfortably under 10,000/month forever, skip the paid tiers entirely; the free Developer plan covers exactly that cohort.

OpenRouter

OpenRouter

Closed SaaS · pay-per-token marketplace

Unified OpenAI-compatible API over 300+ models, provider preferences with automatic fallbacks, dashboard for usage and credit top-ups. Pay-per-token at the provider’s posted price plus a 5.5% fee on credit purchases; no subscription, no self-host option.

openrouter.ai · openrouter.ai/docs

What it does best

OpenRouter is the simplest gateway in this comparison to ship with. One API key, 300+ models behind a unified OpenAI-compatible endpoint, pay-per-token at the provider’s posted price plus a 5.5% platform fee on credit purchases. There is no monthly subscription, no infrastructure to operate, no contract to negotiate. The model catalog is the broadest and freshest of the four — new releases tend to appear same-day because that is core to what OpenRouter is selling. Built-in provider preferences let you express ‘prefer Anthropic, fall back to OpenAI, never use this one specific cheaper host’ in a single header.

Pick this if you...

  • Are prototyping or running variable traffic and do not want to predict spend
  • Need unified billing across many providers without negotiating contracts with each
  • A/B-test models the day they launch and want the gateway to keep up
  • Are okay with a SaaS-only gateway and have no compliance requirement to self-host

Recipe: drop-in OpenAI replacement

OpenRouter speaks the OpenAI Chat Completions schema, so any client built against the OpenAI SDK can switch by changing the base URL:

import OpenAI from "openai";

const openrouter = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const res = await openrouter.chat.completions.create({
  model: "anthropic/claude-opus-4.6",
  messages: [{ role: "user", content: "Refactor this regex." }],
  // Optional: provider preferences
  // @ts-expect-error - non-standard field forwarded to OpenRouter
  provider: { order: ["anthropic", "openai"], allow_fallbacks: true },
});

Swap the model string to openai/gpt-5, google/gemini-2.5-pro, xai/grok-4.1, or deepseek/v3.5 with no other code changes. Provider preferences and fallbacks live in request headers; the dashboard handles usage, analytics, and credit top-ups.

Skip it if...

Your monthly LLM spend is in five or six figures and the 5.5% platform fee starts to add up to real money. At that volume, direct provider contracts plus a self-hosted LiteLLM or Portkey gateway is structurally cheaper. Also skip if you have a compliance posture that requires the gateway inside your VPC — OpenRouter is SaaS-only.

LiteLLM

LiteLLM

Open source · self-host or Cloud

Open-source proxy and SDK that translates 100+ provider APIs into the OpenAI format. Routing with retries and fallbacks, virtual keys with budgets, cost tracking, callbacks to Langfuse / Helicone / MLflow / Datadog, and a plugin system for guardrails. Self-host or use BerriAI’s hosted Cloud / Enterprise tier. ~46.5k GitHub stars as of May 2026.

github.com/BerriAI/litellm · docs.litellm.ai · litellm.ai

What it does best

LiteLLM is the open-source escape hatch. The proxy server runs in a Docker container, takes a YAML config of provider keys and routes, and exposes an OpenAI-compatible endpoint that your application calls. Every model call is logged to whatever callback you wire in — Langfuse, Helicone, MLflow, Datadog, or a custom HTTP webhook. Virtual keys with budget limits, role-based admin UI, and per-team cost tracking all ship in the OSS release. The pitch is sovereignty: your provider keys never leave your network, your logs land in your existing observability stack, and there is no SaaS invoice tied to request volume.

Pick this if you...

  • Have DevOps capacity to run a Docker container + Postgres in production
  • Already run Langfuse / Helicone / Datadog and want the gateway to feed those rather than be its own dashboard
  • Have a compliance posture that requires the gateway in your VPC with your own keys
  • Run high enough volume that a 5.5% SaaS fee or per-log pricing crosses your sovereignty threshold

Recipe: minimal proxy config

A canonical config.yaml for the LiteLLM proxy with two routes and a fallback:

model_list:
  - model_name: chat
    litellm_params:
      model: openai/gpt-5
      api_key: os.environ/OPENAI_API_KEY
  - model_name: chat-fallback
    litellm_params:
      model: anthropic/claude-opus-4.6
      api_key: os.environ/ANTHROPIC_API_KEY

router_settings:
  routing_strategy: simple-shuffle
  # fallbacks map a model group to the group(s) to try when it fails
  fallbacks:
    - chat: ["chat-fallback"]

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]
  cache: true
  cache_params:
    type: redis # exact-match; the semantic flavour uses a Redis-backed index
    host: os.environ/REDIS_HOST
Run litellm --config config.yaml, point your application at http://localhost:4000/v1 with an OpenAI client, and every request now routes through your proxy with logging into Langfuse and Redis caching. Adding a third provider is a four-line YAML change.
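
On the application side nothing changes except the base URL. A minimal client sketch, assuming the proxy has issued you a virtual key:

import OpenAI from "openai";

// The proxy speaks the OpenAI schema, so the stock SDK works unchanged.
const litellm = new OpenAI({
  baseURL: "http://localhost:4000/v1",
  apiKey: process.env.LITELLM_VIRTUAL_KEY, // a proxy-issued virtual key
});

const res = await litellm.chat.completions.create({
  model: "chat", // the model group from config.yaml, not a provider string
  messages: [{ role: "user", content: "Refactor this regex." }],
});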

Skip it if...

You do not want to operate infrastructure. The OSS gateway is free but the operational cost is non-zero — Postgres schema migrations, Redis tuning, ingress, secrets, on-call. Below a few thousand requests per month the SaaS economics of Portkey or OpenRouter strictly beat the engineering hours you spend on LiteLLM. Above that, the cost line flips.

Cost model deep dive

The four gateways monetise differently and the right pick depends on which axis you optimise. Walk through it from the lowest spend to the highest.

Hobby / prototype (under $100/mo in LLM spend). OpenRouter is almost always right. The 5.5% credit-purchase fee is roughly the cost of a coffee per month at this volume, and you get one key for 300+ models with zero infrastructure. Portkey’s Developer free tier (10k logged requests/mo, 3-day log retention) covers most prototypes too if you want dashboards from day one. Vercel AI Gateway is the move if you happen to deploy on Vercel; the ‘no markup on tokens’ pricing is materially better at this volume than the OpenRouter fee.

Startup ($100 - $2,000/mo). The decision matrix opens up. OpenRouter remains the simplest, but at this tier the 5.5% fee runs $5.50-$110/mo, and at the top of that range it costs more than Portkey’s Production plan ($49/mo for 100k logs, plus $9 per additional 100k). If you already need dashboards, the Portkey switch pays for itself in tooling you do not have to build. LiteLLM at this scale tends to cost more in operational time than it saves in fees unless you already have a Kubernetes platform.
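
The crossover is worth sanity-checking with your own numbers; a back-of-envelope sketch using the prices quoted in this post:

// Back-of-envelope comparison using this post's pricing snapshot.
const OPENROUTER_FEE = 0.055;    // fee on credit purchases
const PORTKEY_BASE = 49;         // Production plan, first 100k logs/mo
const PORTKEY_EXTRA_PER_100K = 9;

function openrouterFee(monthlySpendUsd: number): number {
  return monthlySpendUsd * OPENROUTER_FEE;
}

function portkeyCost(monthlyRequests: number): number {
  const extraBlocks = Math.max(0, Math.ceil((monthlyRequests - 100_000) / 100_000));
  return PORTKEY_BASE + extraBlocks * PORTKEY_EXTRA_PER_100K;
}

// e.g. $1,500/mo spend at 80k requests: fee $82.50 vs Portkey $49.
console.log(openrouterFee(1_500), portkeyCost(80_000));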

Production ($2,000 - $20,000/mo). Three viable answers. OpenRouter still works but the 5.5% fee is now $110-$1,100/mo of overhead — at this point a direct relationship with each provider plus a self-hosted gateway starts to make financial sense. Portkey Production is a strong middle ground: dashboards stay first-party, you can self-host the gateway inside your VPC for compliance, and the SaaS price is flat relative to your token spend. LiteLLM saves money if you already operate infrastructure; if you do not, the hidden cost of running it crosses the Portkey savings line around $8,000-$10,000/mo of LLM spend.

Scale (over $20,000/mo). Self-hosted LiteLLM with direct provider contracts is the cost-optimal answer in almost every case, because at this volume every percentage point of SaaS fee or markup is thousands of dollars a year. The exception is teams whose regulatory posture requires a SOC2 Type 2 audit trail with 90+ days of request retention, in which case Portkey Enterprise (or a custom enterprise contract with a comparable SaaS) earns the premium. Vercel AI Gateway at this scale is a different conversation — token pass-through at no markup is generous, but you are committing your entire platform to Vercel and that lock-in has its own cost. There is no universal right answer at the top end; the math depends on the value of your DevOps hour versus a vendor’s percentage.

Observability depth

Every gateway logs requests; the difference is what you can ask of the logs after the fact. Concretely, the four shapes of observability question we end up asking in production:

  1. ‘Why did this specific user request fail?’ Needs request-level traces with full prompt, full completion, model, latency, error reason, and any guardrail events. Portkey ships this out of the box. OpenRouter has it via the activity log. Vercel surfaces it in the AI Gateway tab. LiteLLM emits it as a callback payload — you query it in whichever store you wired up.
  2. ‘Where did $4,000 go last week?’ Needs cost-per-route, cost-per-virtual-key, and cost-per-metadata-tag with daily granularity (see the tagging sketch after this list). Portkey is the strongest here — the dashboards default to this decomposition. OpenRouter gives you grouping by API key, model, and org member with CSV/PDF export. Vercel groups by project. LiteLLM gives you the raw spend events and trusts you to slice them.
  3. ‘Is p95 latency creeping up?’ Needs time-series latency metrics with percentile rollups by route and model. Portkey ships percentile dashboards. Vercel exposes the data in its observability surface. OpenRouter shows aggregate but is less prescriptive about percentile breakdowns. LiteLLM emits per-call timings; Datadog or your Prometheus scraper does the percentile math.
  4. ‘Is anyone leaking PII into prompts?’ Needs guardrail event logs with PII matches per request, searchable by virtual key or environment. Portkey treats guardrails as first-class and ships dashboards. LiteLLM has guardrail hooks; you build the dashboard. OpenRouter and Vercel defer to upstream provider safety filtering with no first-party guardrail surface — adequate for general-purpose apps, light for regulated ones.
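
The tagging half of question 2 only works if you attach metadata at call time. A sketch using the Portkey client from the recipe above; the x-portkey-metadata header follows Portkey’s docs, and LiteLLM accepts a similar metadata field in the request body (verify both against current docs):

// Tag every call with who/what/where so cost questions decompose later.
const tagged = await portkey.chat.completions.create(
  {
    model: "gpt-5",
    messages: [{ role: "user", content: "Draft a reply." }],
  },
  {
    headers: {
      "x-portkey-metadata": JSON.stringify({
        feature: "support-drafts",   // cost-per-feature
        customer_tier: "pro",        // cost-per-segment
        environment: "production",   // prod vs staging split
      }),
    },
  },
);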

The honest framing: if you want production-grade observability with no integration work, Portkey is the shortest path. If you already run a mature observability stack (Langfuse for traces, Datadog for metrics, an audit store for compliance), LiteLLM’s callback model is a cleaner fit because it feeds your existing tools rather than asking you to log in to a second dashboard. OpenRouter and Vercel sit in the middle — adequate for most teams, not the strongest pick for any specific observability requirement.

Pitfalls

Treating ‘more providers’ as the deciding feature

Every gateway covers OpenAI, Anthropic, Google, Bedrock, Azure, Mistral, xAI, DeepSeek, Cohere, and the major open-weights hosts. The marginal model nobody else has is rarely the one you ship with. Pick by operational fit (host model, observability depth, pricing structure), not by the headline provider count.

Caching too loose

Semantic caching is a knife. Set the similarity threshold too low and the gateway returns a cached answer to a prompt that meant something different — silent correctness regression. Start at 0.95 cosine, validate against a labelled hold-out, and never cache requests with sensitive PII even if the embeddings match. Portkey and OpenRouter both expose the threshold; pick a value and document it.

No budget caps

The single highest-ROI feature in any of these gateways is a hard dollar cap per virtual key. An agent in a tool-call loop or a leaked customer key can vaporise five figures in a weekend. Set the budget on every virtual key the day you mint it. Portkey, LiteLLM, and Vercel all expose this; OpenRouter caps via prepaid credits which is functionally similar.
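
On LiteLLM, for example, minting a budget-capped virtual key is one call to the proxy’s key endpoint. A sketch: max_budget is denominated in dollars per the LiteLLM docs, and the metadata tag is our own.

// Mint a virtual key with a hard dollar cap via the LiteLLM proxy.
const resp = await fetch("http://localhost:4000/key/generate", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.LITELLM_MASTER_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    max_budget: 100,                    // hard-fail this key past $100
    duration: "30d",                    // key expiry
    metadata: { team: "support-bot" },  // shows up in spend reports
  }),
});
const { key } = await resp.json(); // hand this key to the service, not a provider key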

Stacking two gateways

A surprising number of teams end up with LiteLLM pointing at OpenRouter, or Portkey pointing at OpenAI via Vercel. Double-gateway setups multiply latency, fragment observability, and produce confusing billing. Pick one. If you outgrow it, migrate, do not stack.

Skipping fallback configuration

All four support fallbacks; almost no team configures them on day one. Then a provider has a six-hour incident and your app goes down with it. Pick two providers in the same model family (Claude Opus fallback to GPT-5, for example), wire it in the gateway, and rehearse a failover by manually blackholing the primary in staging.

Logging full prompts in regulated contexts

Default gateway logging captures the prompt verbatim. For PHI / PCI / GDPR-sensitive workloads this is a compliance landmine. Portkey lets you disable prompt logging per virtual key; LiteLLM does the same via config; OpenRouter offers a no-logging endpoint and Vercel exposes a ‘disallow prompt training’ flag. Set this before your first production request, not after the audit.

Forgetting to measure added latency

The gateway hop costs 50-200ms in practice depending on geography. For chat UIs streaming first tokens, it is invisible. For low-latency tool calls inside an agent loop where you might fire 8 calls in series, that is 400-1600ms of dead time per turn. Measure it before shipping; consider self-hosted LiteLLM as a sidecar if the math is tight.
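
Measuring it is a five-minute job. A sketch that times the same prompt through a gateway and direct to the provider; one sample is noise, so loop it and take percentiles before deciding.

import OpenAI from "openai";

// Two clients, same prompt: one via a gateway, one direct to the provider.
const gateway = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1", // or whichever gateway you test
  apiKey: process.env.OPENROUTER_API_KEY,
});
const direct = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function timeIt(label: string, call: () => Promise<unknown>): Promise<void> {
  const start = performance.now();
  await call();
  console.log(`${label}: ${(performance.now() - start).toFixed(0)}ms`);
}

await timeIt("via gateway", () =>
  gateway.chat.completions.create({
    model: "openai/gpt-5",
    messages: [{ role: "user", content: "ping" }],
  }),
);
await timeIt("direct", () =>
  direct.chat.completions.create({
    model: "gpt-5",
    messages: [{ role: "user", content: "ping" }],
  }),
);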

Community signal

The verbatim quote market for AI gateway opinions is cluttered with affiliate-flavoured roundups, so we will not fabricate any. The structural signal across HN threads, vendor blogs, and the GitHub release pace in 2026 is consistent enough to summarise without invented quotes.

OpenRouter is the unambiguous winner for ‘ship this week’ — the discussion in the r/LocalLLaMA / r/ChatGPT / r/OpenAI subreddits consistently lands on ‘just use OpenRouter for prototyping’ whenever a builder asks which gateway to start with. LiteLLM is the unambiguous winner for ‘own your stack’ — the 46.5k+ star count, the weekly release cadence, and the volume of production reference deployments all point the same way. Portkey’s position has hardened around ‘observability-led adoption’: teams arrive because they got tired of building dashboards on top of LiteLLM, not because they were unhappy with another gateway. Vercel AI Gateway is best understood as a platform feature, not a stand-alone product — the people excited about it are already paying Vercel for everything else.

One inconvenient consensus: nobody is excited about gateways the way they are excited about models or agents. The category is plumbing. The right one is the one your team forgets is there until something goes wrong, at which point its dashboards or its callbacks save the incident.

Frequently asked questions

What is an AI gateway and why would I put one in front of my LLM calls?

An AI gateway is a thin layer between your application and one or more LLM providers. It standardises the request shape (most gateways speak the OpenAI Chat Completions schema), handles authentication for every provider behind a single key, retries when a provider 5xxs or rate-limits, caches identical or semantically similar prompts, and emits structured logs you can query. The reason to add one is operational: without it, swapping from GPT-5 to Claude Opus 4.6 or Gemini 2.5 means a code change in every codepath that hits the model. With a gateway, it is a config flip. Observability is the second reason — your CFO will eventually ask which prompts spent $4,000 last week, and the answer is in the gateway's logs, not in 12 different provider dashboards.

Is Vercel AI Gateway worth it if I do not already host on Vercel?

Probably not as your main gateway. The pitch is platform-native — AI Gateway is bundled with Vercel, the AI SDK speaks to it natively, billing rolls into your Vercel invoice, and the observability tab lives next to your function metrics. If you host on Fly, Render, Cloudflare Workers, or your own boxes, you give up that integration story and end up paying for what Portkey, OpenRouter, or a self-hosted LiteLLM does just as well. The one carve-out is if your team is already on the Vercel AI SDK and you want zero-config provider switching — in that case AI Gateway is the path of least resistance regardless of where the rest of your stack lives.

Does OpenRouter charge a markup on tokens?

No markup on the tokens themselves — OpenRouter says posted model prices match the provider's prices exactly. They charge a 5.5 percent platform fee on credit purchases (lower for crypto), which is the equivalent of a markup if you think about total cost per request. There is no monthly subscription. The trade for that 5.5 percent is a single API key for 300+ models, unified billing, automatic provider fallbacks when one is down, and only paying for successful runs. For teams testing many models or running variable traffic, that fee is usually cheaper than the engineering time to integrate each provider directly. At very high volume the math flips and a self-hosted LiteLLM with direct provider keys wins.

Can I self-host Portkey or do I have to pay the SaaS?

Both. Portkey's gateway is open-source — the core routing, virtual-key, and guardrail engine can run in your own Kubernetes cluster or on a single Docker host. The managed SaaS layer on top (the observability dashboards with 90-day retention, the hosted prompt-template store, SOC2 audit logs, and SSO) is what the paid plans unlock. The Developer free tier covers 10,000 logged requests per month with 3-day log retention; Production starts at $49/month for 100k logs and 30-day retention. Most teams adopt Portkey by running the open-source gateway and pointing it at the managed dashboards, which is roughly the inverse of how OpenRouter works — there, the SaaS is the gateway, not just the UI.

What providers and models does each gateway support?

All four cover the obvious set: OpenAI, Anthropic, Google (Gemini and Vertex), AWS Bedrock, Azure OpenAI, Mistral, xAI's Grok, DeepSeek, Cohere, Together, Groq, Fireworks, and the major open-weights hosts. The numbers cited in each vendor's own marketing are: Vercel AI Gateway 'hundreds of models', Portkey 250+ models, OpenRouter 300+ models, LiteLLM 100+ providers across all endpoint types (chat, embeddings, images, audio, video). The practical difference is not the count — it is how quickly a new model lands. OpenRouter and LiteLLM tend to ship same-day for big launches because that is core to their value prop; Portkey is usually within the week; Vercel ships when the AI SDK does.

Which gateway has the best observability for production?

Portkey has the deepest first-party observability — request-level traces, cost-per-route, latency percentiles, error decomposition, and guardrail-event timelines, with retention controls per environment. OpenRouter has a clean dashboard for cost and usage but is less prescriptive about debugging individual requests. Vercel AI Gateway hooks into Vercel's existing observability surface so it sits next to your function logs, which is convenient if you live in that UI. LiteLLM exposes events via callbacks and ships with first-class integrations to Langfuse, Helicone, MLflow, and Datadog — the model is 'bring your own observability stack,' which is a strength if you already have one and a tax if you do not. For a team starting from zero, Portkey is the fastest path to production-grade dashboards.

How does semantic caching actually save money?

Standard caching is exact-match: identical request → cached response, zero token spend. Semantic caching computes an embedding for the incoming prompt and serves a cached response when the embedding is within a similarity threshold of a prior request. The economic case is that user-facing chatbots ask the same intent in a thousand surface forms ('how do I cancel my subscription' / 'cancel sub' / 'how to unsub' / 'I want to cancel') and exact-match misses all of them. Portkey and OpenRouter both offer semantic caching; LiteLLM supports it via Redis. The tuning knob is the similarity threshold — too loose and you serve stale or wrong answers; too tight and the hit rate collapses to exact-match. Start at 0.95 cosine similarity and adjust based on a labelled hold-out set.

Does using an AI gateway hurt latency compared to calling the provider directly?

Yes, usually by 50-200ms per request depending on geography and provider. Vercel AI Gateway is the lowest-overhead of the four if you are already on Vercel because it terminates inside the same edge network. OpenRouter adds a hop and a 100-150ms tax in our experience. Portkey's SaaS adds a similar hop; the self-hosted Portkey gateway has near-zero overhead if you co-locate it with your app. LiteLLM running as a sidecar in the same pod is the lowest possible overhead but you eat the operational cost. For streaming responses the perceived latency is dominated by time-to-first-token, not gateway hop time, so for chat UIs the gateway cost is almost invisible. For low-latency tool calls inside an agent loop, it matters.

Can I use these gateways with MCP servers and agents?

Yes — none of them are MCP-aware in a special sense, but all four sit cleanly underneath an agent loop because they speak the OpenAI Chat Completions wire format that every modern agent framework defaults to. The pattern is: agent calls the gateway, gateway routes to the right model, MCP tools execute on the agent side and stream back into the same conversation. The gateway logs every model call, which gives you a clean audit trail when an agent burns 50,000 tokens debugging itself in a loop. Pair the gateway with our /blog/mcp-context-bloat-fix-2026-tool-search-code-mode-progressive-disclosure guide and you have full visibility into both sides of the agent — tools and tokens.

Which gateway should I pick if I just want to ship something this week?

If you are on Vercel, AI Gateway. The integration is one line of config and you do not need a second vendor invoice. If you are not on Vercel and just want a single key for many models with no decisions to make, OpenRouter — sign up, drop in credits, pick a model string, done. If you want production dashboards from day one, Portkey's Developer free tier gets you running in 10 minutes with real observability. If you have DevOps capacity and care about lock-in, run LiteLLM in Docker and point it at your own provider keys. There is no wrong default; there is only the default that matches your week-one constraints.

Sources

Vercel AI Gateway: vercel.com/ai-gateway · vercel.com/docs/ai-gateway

Portkey: portkey.ai · portkey.ai/docs · portkey.ai/pricing

OpenRouter: openrouter.ai · openrouter.ai/docs

LiteLLM: github.com/BerriAI/litellm · docs.litellm.ai · litellm.ai
