ElevenLabs MCP Server: Complete Guide

TL;DR + what you need

Three things get you running and keep you out of trouble:

Install command: uvx elevenlabs-mcp — run as a local subprocess by your MCP client. No global install, no server to host.
The one credential: ELEVENLABS_API_KEY in the client’s MCP env block. Grab a free key at elevenlabs.io.
The catch: the server is free and MIT-licensed, but every tool call spends ElevenLabs credits. The free tier ships 10,000 credits per month, which is generous for testing and thin for production voice work.

The rest of this guide explains the tools (there are roughly two dozen), walks through an outbound-calling voice agent end to end, and is candid about where the credit meter and real-time latency bite. Install the server using the steps in the smallest-install section below, then read on.

One-sentence definition

The ElevenLabs MCP server is the official Model Context Protocol server that exposes the ElevenLabs AI audio platform — text-to-speech, speech-to-text, voice cloning, sound effects, music, and conversational voice agents — as tools an AI agent can call from Claude Desktop, Cursor, Windsurf, or any MCP client.

If you are new to the protocol itself, our what-is-MCP explainer covers the wire format this server speaks. The short version: MCP is the standard that lets one AI client talk to any tool server.

Why it exists

Before MCP, wiring an LLM to ElevenLabs meant writing glue: a script that took the model’s output, hit the right REST endpoint, handled the audio bytes, and fed results back. Every app re-implemented the same plumbing, and the model never “saw” the audio tools as first-class actions it could choose.

ElevenLabs published this server so the audio platform shows up directly in the agent’s tool list. The company frames it as letting developers “build Conversational AI voice agents, perform outbound calls, transcribe speech, and generate audio — all with simple API calls,” orchestrated from your local machine. Instead of you calling ElevenLabs, the model does, when the task needs sound.

Mental model: the named pieces

Four parts to keep straight. Get these and the rest of the guide lands.

The MCP server (local)

A Python process started by uvx elevenlabs-mcp. It speaks MCP to your client over stdio and forwards each tool call to the ElevenLabs cloud API.

The API key

ELEVENLABS_API_KEY authenticates the server to your account. It is the only credential, and it also decides which credit balance gets charged.

The tools

Roughly two dozen functions — one per audio job. text_to_speech, voice_clone, speech_to_text, create_agent, and more. The agent picks which to call.

Credits + output files

Calls spend credits and (by default) write audio files to a base path on disk. Both are configurable through env vars.

The takeaway: the server is a thin local adapter, not a model. All the audio intelligence and all the billing live in the ElevenLabs cloud behind your API key.
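
To make the "thin adapter" idea concrete, here is a minimal sketch of the message an MCP client writes to the server's stdin when the agent picks a tool. The `tools/call` method and the `name`/`arguments` shape come from the MCP specification; the helper `build_tool_call` and the `text` argument name are our own illustrative assumptions, not taken from the server source.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per session


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a single line of JSON.

    Over stdio, the client sends newline-delimited JSON-RPC messages,
    so the serialized request must not contain embedded newlines.
    """
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# The "text" argument name is an assumption for illustration.
request = build_tool_call("text_to_speech", {"text": "Hello from MCP."})
print(request)
```

In practice your MCP client builds and sends these messages for you. The point of the sketch is that the local server only relays a named call plus arguments; everything that costs credits happens after the relay.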

Smallest end-to-end install

The README’s canonical path is Claude Desktop with uvx (the runner from uv, the fast Python package manager). Three steps.

1. Install uv so uvx is on your path:

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Add the server to your client config. In Claude Desktop: Settings → Developer → Edit Config → claude_desktop_config.json. The exact block from the official README:

{
  "mcpServers": {
    "ElevenLabs": {
      "command": "uvx",
      "args": ["elevenlabs-mcp"],
      "env": {
        "ELEVENLABS_API_KEY": "<insert-your-api-key-here>"
      }
    }
  }
}

3. Restart the client. Windows users must enable Developer Mode in Claude Desktop first, or the server won’t load. On macOS and Linux, restarting is enough.

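Prefer pip, or running on a client without a config UI? The README gives an alternative path: install the package, then run the module with --print to generate a configuration you can paste into your client’s MCP config: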

pip install elevenlabs-mcp
python -m elevenlabs_mcp --api-key=YOUR_API_KEY --print

The install panel above this section emits the exact config for Cursor, Windsurf, and the others — copy from there so you don’t hand-edit JSON. Two optional env vars are worth knowing now: ELEVENLABS_MCP_OUTPUT_MODE (default files) and ELEVENLABS_MCP_BASE_PATH (default ~/Desktop) control where generated audio lands. The takeaway: there is exactly one required field — the API key.
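
If you would rather script the edit than touch the file by hand, the merge is small. The sketch below is ours, not from the README: the function names are hypothetical, and you must supply the path to your own claude_desktop_config.json, since its location varies by operating system. It keeps any other servers already in the file.

```python
import json
from pathlib import Path
from typing import Optional


def add_elevenlabs_server(config: dict, api_key: str,
                          base_path: Optional[str] = None) -> dict:
    """Return a copy of an MCP client config with the ElevenLabs entry set.

    Other entries under "mcpServers" are left untouched.
    """
    env = {"ELEVENLABS_API_KEY": api_key}  # the one required field
    if base_path is not None:
        # Optional: where generated audio is written (default ~/Desktop).
        env["ELEVENLABS_MCP_BASE_PATH"] = base_path
    servers = dict(config.get("mcpServers", {}))
    servers["ElevenLabs"] = {
        "command": "uvx",
        "args": ["elevenlabs-mcp"],
        "env": env,
    }
    return {**config, "mcpServers": servers}


def update_config_file(path: Path, api_key: str,
                       base_path: Optional[str] = None) -> None:
    """Read the config at `path` (if present), merge, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    merged = add_elevenlabs_server(existing, api_key, base_path)
    path.write_text(json.dumps(merged, indent=2) + "\n")


if __name__ == "__main__":
    demo = add_elevenlabs_server({"mcpServers": {}}, "<insert-your-api-key-here>")
    print(json.dumps(demo, indent=2))
```

Keeping the merge as a pure function makes it easy to inspect the result before anything is written to disk.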

The tools, grouped by job

The server exposes its tools as one function per ElevenLabs capability. These names are taken from the server source, not invented. Grouped by what you’d actually ask for:

Speech

text_to_speech, speech_to_text, speech_to_speech, isolate_audio (strip background noise from a recording), play_audio.

Voices

voice_clone, text_to_voice (voice design from a description), create_voice_from_preview, search_voices, get_voice, search_voice_library, list_models.

Sound + music

text_to_sound_effects, compose_music, create_composition_plan, video_to_music, upload_music_for_inpainting.

Voice agents (Conversational AI)

create_agent, add_knowledge_base_to_agent, list_agents, get_agent, make_outbound_call, list_phone_numbers, get_conversation, list_conversations.

Account

check_subscription — read your plan, credit balance, and limits. Call it first when a tool fails; you may have run out of credits.

The opinionated takeaway: the surface is wide, but you will use four tools 90% of the time — text_to_speech, speech_to_text, text_to_sound_effects, and voice_clone. The rest are there when a job asks for them. ElevenLabs sits in the AI & ML category of our directory alongside the other model-backed servers.

API key setup, step by step

One credential, three minutes. The key both authenticates and decides which credit balance gets charged, so treat it like a payment method.

1. Sign in at elevenlabs.io and open Settings → API Keys. The free tier includes a key, and no card is needed to start.
2. Create a key and copy it once (it is shown in full only at creation).
3. Paste it into the env block of your MCP config as ELEVENLABS_API_KEY, exactly as in the install snippet above.
4. Restart the client. In a chat, ask the agent to “check my ElevenLabs subscription.” It calls check_subscription, which confirms the key works and shows your credit balance.

Enterprise accounts have one extra knob: ELEVENLABS_API_RESIDENCY (default us) routes calls to a regional endpoint. Everyone else can ignore it. Takeaway: if a tool returns an auth error, the key is wrong or unset — it is almost never the tool.

Build a voice agent that places a call

This is the demo that put the server on the map: an agent that orders a pizza by phone. The mechanics generalize to support callbacks, appointment reminders, and lead qualification. The flow uses three tools.

1. Create the agent. Ask Claude: “Create an ElevenLabs agent named ‘Order Bot’ that politely places a takeout order and confirms the total.” The model calls create_agent with a system prompt and a voice.
2. (Optional) Ground it. add_knowledge_base_to_agent attaches a menu or FAQ so the agent answers from your data, not its guesses.
3. Place the call. make_outbound_call dials a number using a phone number configured on your ElevenLabs account (check yours with list_phone_numbers). ElevenLabs runs the real-time voice loop; the model only kicked it off.

Afterward, get_conversation pulls the transcript so the agent can summarize what happened. ElevenLabs’ own examples go further — voice agents that “order food or book appointments,” or transcribing a meeting, identifying speakers, then re-voicing each with a distinct voice. The opinionated takeaway: the MCP server is the orchestration layer. Real-time conversation quality and phone provisioning are ElevenLabs platform concerns, configured on their side, not in the tool call.
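
The flow above can be sketched as an ordered sequence of tool calls with one guard: check for a phone number before dialing. This is a sketch under stated assumptions, not the server's API. The tool names come from the server; the argument names (`name`, `system_prompt`, `agent_id`, `to_number`, `conversation_id`), the return shapes, and the injected `call_tool` function are all hypothetical stand-ins for whatever your MCP client provides.

```python
from typing import Any, Callable, List

# Hypothetical signature: (tool_name, arguments) -> parsed result.
ToolCaller = Callable[[str, dict], Any]


def place_order_call(call_tool: ToolCaller, to_number: str) -> List[str]:
    """Run create -> verify number -> call -> transcript.

    Returns the names of the tools invoked, in order.
    """
    invoked: List[str] = []

    def step(name: str, arguments: dict) -> Any:
        invoked.append(name)
        return call_tool(name, arguments)

    agent = step("create_agent", {
        "name": "Order Bot",  # argument names are assumptions
        "system_prompt": "Politely place a takeout order and confirm the total.",
    })

    # The calling tool is listed even when no number is provisioned,
    # so check first instead of discovering it mid-demo.
    numbers = step("list_phone_numbers", {})
    if not numbers:
        return invoked

    call = step("make_outbound_call", {
        "agent_id": agent.get("agent_id"),
        "to_number": to_number,
    })
    step("get_conversation", {"conversation_id": call.get("conversation_id")})
    return invoked


def fake_call_tool(name: str, arguments: dict) -> Any:
    """Stand-in for a real MCP client: an account with no phone number."""
    if name == "list_phone_numbers":
        return []
    return {"agent_id": "agent_demo", "conversation_id": "conv_demo"}


print(place_order_call(fake_call_tool, "+15555550100"))
# prints ['create_agent', 'list_phone_numbers']
```

Injecting `call_tool` keeps the sequence testable without spending credits; swap in your client's real call function once the order of operations looks right.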

What we got wrong

Three assumptions that cost us time and credits.

We assumed “free server” meant “free to run.” The MCP package is MIT and costs nothing. But the README’s warning is blunt: “ElevenLabs credits are needed to use these tools.” A loop that regenerated speech on every edit chewed through a meaningful slice of the monthly free allotment in an afternoon. Watch check_subscription.

We blamed the MCP server for a timeout that wasn’t its fault. The README says it plainly: certain operations, like voice design and audio isolation, can take a long time to resolve, and the MCP inspector in dev mode may show a timeout even though the job finished. In a real client like Claude it completes fine. We spent an hour debugging a non-bug.

We expected outbound calling to work out of the box. make_outbound_call needs a phone number provisioned on the ElevenLabs account first. The tool is present in the list whether or not you have a number, so it looks ready when it isn’t. Run list_phone_numbers before promising a demo.

Right vs wrong patterns

Wrong

Re-running text_to_speech on the full script after every tiny prompt tweak. Each pass re-bills the entire text. You burn credits proving a one-line change.

Right

Iterate on a one-sentence sample until the voice and tone are right, then render the full script once. Same result, a fraction of the credits.

Wrong

Cloning a voice from a public figure or a friend without consent. ElevenLabs’ terms require permission; the tool will let you try, but the platform can still ban you.

Right

Clone your own voice, or a voice you have written consent for, and keep that consent on file. Use the voice library (search_voice_library) for everything else.

Common mistakes

Server doesn’t load on Windows. Root cause: Developer Mode is off in Claude Desktop. The README calls this out explicitly — enable it, then restart.
Auth errors on every tool. Root cause: the key is missing, mistyped, or pasted into the wrong field. It belongs in env, not args. Verify with check_subscription.
Generated files “disappear.” Root cause: by default, output is written to ~/Desktop. Look there first, or set ELEVENLABS_MCP_BASE_PATH to a folder you watch.
Timeout in the MCP inspector. Root cause: slow operations (voice design, audio isolation) outrun the inspector’s dev-mode timeout. Not a real failure; test in a real client.
Outbound call does nothing. Root cause: no phone number provisioned on the account. list_phone_numbers returns empty.
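
Several of these mistakes are visible in the config file before you ever restart the client. The sketch below is a hypothetical preflight check of our own, not part of the server: it assumes the entry is named "ElevenLabs" as in the README snippet, and its test for a key in args is a simple string heuristic.

```python
from typing import List


def diagnose_config(config: dict, server_name: str = "ElevenLabs") -> List[str]:
    """Return human-readable problems found in an MCP client config."""
    entry = config.get("mcpServers", {}).get(server_name)
    if entry is None:
        return [f"No '{server_name}' entry under mcpServers."]

    problems: List[str] = []

    key = str(entry.get("env", {}).get("ELEVENLABS_API_KEY", "")).strip()
    if not key:
        problems.append("ELEVENLABS_API_KEY is missing or empty in env.")
    elif key.startswith("<") and key.endswith(">"):
        problems.append("ELEVENLABS_API_KEY still holds the placeholder text.")

    # Heuristic: the key belongs in env, so any key-like arg is suspect.
    for arg in entry.get("args", []):
        lowered = str(arg).lower()
        if "api-key" in lowered or "elevenlabs_api_key" in lowered:
            problems.append("API key appears in args; it belongs in env.")
            break

    return problems


misconfigured = {
    "mcpServers": {
        "ElevenLabs": {
            "command": "uvx",
            "args": ["elevenlabs-mcp", "--api-key=abc123"],
            "env": {},
        }
    }
}
for problem in diagnose_config(misconfigured):
    print(problem)
```

A clean config returns an empty list. It cannot tell you whether the key is valid, so still confirm with check_subscription.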

Cost and latency, concretely

This is the section most launch tutorials skip. Two real constraints.

Credits, not requests. ElevenLabs bills in credits, and the free tier is 10,000 credits per month. Text-to-speech roughly tracks characters spoken; music, voice design, and conversational minutes consume more. The number to watch is credits per call, not the count of API calls: a single long text_to_speech can cost more than dozens of search_voices calls. Budget accordingly: render samples cheaply, render final audio once, and read current per-feature credit costs on the ElevenLabs pricing page before scaling.
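
A back-of-the-envelope comparison shows why the sample-first pattern matters. The rate below (one credit per character) is an assumption for illustration only, as are the script length and tweak count; check the ElevenLabs pricing page for the real per-feature costs on your plan.

```python
FREE_TIER_CREDITS = 10_000  # monthly free allotment cited in this guide


def iteration_costs(script_chars: int, sample_chars: int, tweaks: int,
                    credits_per_char: float = 1.0):
    """Compare two ways of iterating on a text-to-speech script.

    credits_per_char is an ASSUMED rate, not an official price.
    Returns (rerender_everything, sample_then_final) in credits.
    """
    # Wrong pattern: every tweak re-renders the full script.
    rerender_everything = tweaks * script_chars * credits_per_char
    # Right pattern: tweak on a short sample, then render the script once.
    sample_then_final = (tweaks * sample_chars + script_chars) * credits_per_char
    return rerender_everything, sample_then_final


wrong, right = iteration_costs(script_chars=3_000, sample_chars=80, tweaks=5)
print(f"full re-render each time:  {wrong:,.0f} credits")   # 15,000 credits
print(f"sample first, render once: {right:,.0f} credits")   # 3,400 credits
print(f"fits free tier: {wrong <= FREE_TIER_CREDITS} vs {right <= FREE_TIER_CREDITS}")
```

Under these assumed numbers, five full re-renders of a 3,000-character script overshoot the free tier, while the sample-first approach uses about a third of it.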

Latency for real-time. Batch jobs (generate a clip, transcribe a file) are fine over MCP. Live conversation is harder. The team behind April (a YC S25 voice assistant built on ElevenLabs TTS) put it well on Hacker News: “The most interesting part has been optimizing for lowest latency given we are a tool call heavy application.” If your agent makes many tool calls between turns, each round-trip adds delay the caller hears. For real-time voice, lean on ElevenLabs’ native Conversational AI loop and keep per-turn tool calls minimal.
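
The round-trip point is simple addition. The sketch below uses hypothetical figures, not measurements, and treats the result as a rough lower bound, since a real agent may also need extra model steps between tool calls.

```python
from typing import Sequence


def caller_wait_ms(model_ms: float, tool_round_trips_ms: Sequence[float],
                   speech_ms: float) -> float:
    """Rough lower bound on the pause a caller hears before a reply.

    Sums model thinking time, every tool round-trip made during the
    turn, and the time to start producing speech.
    """
    return model_ms + sum(tool_round_trips_ms) + speech_ms


# Hypothetical figures for illustration only.
tool_heavy = caller_wait_ms(400, [250, 250, 250, 250], 300)
tool_light = caller_wait_ms(400, [250], 300)
print(tool_heavy, tool_light)  # prints 1700 950
```

Cutting three tool calls from the turn removes their round-trips entirely, which is why keeping per-turn tool calls minimal matters more than shaving any single call.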

Who this is for — and who it isn’t

Use it if

You want an agent that produces or processes audio as part of a task.
You’re prototyping voice agents, narration, sound design, or transcription.
You already use ElevenLabs and want it in Claude, Cursor, or Windsurf.

Skip it if

You need a tightly latency-bound real-time phone product — use the native Agents platform directly.
You can’t budget credits, or your use is high-volume on a free plan.
You want fully offline, on-device TTS — this is a cloud API.

Community signal

We could not verify a canonical launch tweet URL, so we cite the primary sources directly rather than embed an unverified post. The server was announced on the official ElevenLabs blog (Introducing the ElevenLabs MCP server), pitched as giving Claude and Cursor “access to the full power of the ElevenLabs AI audio platform” from your local machine.

On Hacker News, the first Show HN demoed ordering pizza by voice through Claude — the meme that defined the launch. The more useful, contrarian signal is later: the April (YC S25) Launch HN thread, from a team shipping a real voice product on ElevenLabs TTS, where latency in a tool-call-heavy app is named as the hard part — not the audio quality. That matches our own testing: the audio is excellent, the round-trip budget is the constraint.

A separate HN thread on MCP transport security argues for keeping servers on stdio rather than exposing HTTP on localhost: “I control which applications are configured with the command/args/environment to run the MCP server.” The ElevenLabs server runs locally over stdio by default, which fits that preference: your API key sits in the server subprocess’s environment and is sent only to the ElevenLabs API, never exposed on a local network port.

The verdict

Our take

The ElevenLabs MCP server is the cleanest way to give an agent real voice and audio, and the tool coverage is unusually complete — speech, cloning, effects, music, and full conversational agents in one install. Use it if you’re building audio features or voice-agent prototypes and can budget credits. Skip it if you need fully offline TTS, or a latency-critical real-time phone product where the native Agents platform (not an MCP tool loop) is the right layer. The server is free and MIT; plan for the credits, and it pays off.

The bigger picture

ElevenLabs shipping an official, first-party MCP server is part of a broader shift: API companies are publishing their own MCP servers instead of leaving integration to third parties. That means the tools track the API closely, ship with the company’s auth and terms baked in, and get maintained alongside the product.

For audio specifically, it moves voice from a “send text, get a file” afterthought to a first-class agent capability. An agent that can hear, speak, clone, and call is a different kind of assistant than one that only writes. Expect the conversational-agent tools to be where this server keeps growing. Browse the rest of the audio and AI servers in the full directory.

Frequently asked questions

What is the ElevenLabs MCP server?

It is the official Model Context Protocol server from ElevenLabs. It exposes the ElevenLabs AI audio API — text-to-speech, speech-to-text, voice cloning, sound effects, music, and conversational voice agents — as MCP tools, so an agent in Claude Desktop, Cursor, or Windsurf can generate and process audio by calling them.

Is the ElevenLabs MCP free?

The MCP server code is free and MIT-licensed. Running the tools spends ElevenLabs credits, not server fees. ElevenLabs offers a free tier with 10,000 credits per month, enough to try every tool. Heavy text-to-speech, music, or voice-agent use will exhaust a free plan, so budget credits before production.

What tools does the ElevenLabs MCP server expose?

It exposes around two dozen tools, including text_to_speech, speech_to_text, text_to_sound_effects, voice_clone, text_to_voice (voice design), isolate_audio, speech_to_speech, compose_music, create_agent, make_outbound_call, search_voices, and check_subscription. The full list is in the server source. They map closely onto ElevenLabs API capabilities, one tool per job.

How do I set up the ElevenLabs MCP API key?

Create an API key at elevenlabs.io under Settings → API Keys (free tier included). Pass it as the ELEVENLABS_API_KEY environment variable in your client's MCP config — for example in the env block of the uvx elevenlabs-mcp entry in claude_desktop_config.json. The key is the only required credential.

Can Claude make a voice agent with the ElevenLabs MCP?

Yes. The create_agent tool builds an ElevenLabs Conversational AI agent from a prompt, add_knowledge_base_to_agent attaches documents, and make_outbound_call has the agent place a phone call (a phone number must be configured on your account). Claude orchestrates these tool calls; ElevenLabs runs the real-time voice loop.

Which clients support the ElevenLabs MCP server?

The README documents Claude Desktop, Cursor, Windsurf, and OpenAI Agents, and any client that speaks MCP over stdio works. The server runs locally as a subprocess via uvx or python -m elevenlabs_mcp, forwarding calls to the ElevenLabs cloud API. On Windows, enable Developer Mode in Claude Desktop first.

What is the ElevenLabs MCP server license?

MIT. The server in github.com/elevenlabs/elevenlabs-mcp is open source and free to fork, redistribute, and use commercially. The audio it generates is governed separately by your ElevenLabs plan and the ElevenLabs terms of service, including usage and voice-cloning consent rules.

Glossary

MCP: Model Context Protocol — the standard that lets an AI client call any compliant tool server.
Tool: A single named function the agent can invoke, like text_to_speech.
stdio transport: The server runs as a local subprocess and talks to the client over standard input/output — no network port exposed.
uvx: A runner from uv that executes a Python package without a permanent install.
Credit: ElevenLabs’ usage unit. Tools consume credits; the free tier ships 10,000 credits per month.
Voice design: Generating a new synthetic voice from a text description, via text_to_voice.
Voice cloning: Reproducing a specific real voice from audio samples, via voice_clone — consent required.
Audio isolation: Removing background noise from a recording, via isolate_audio.
Conversational AI agent: An ElevenLabs voice agent that holds a real-time spoken conversation; built with create_agent.
Outbound call: A phone call placed by a voice agent via make_outbound_call; needs a provisioned number.

Sources

Primary

Official repository & README: github.com/elevenlabs/elevenlabs-mcp (MIT) — install commands, env vars, tool list, troubleshooting, credit warning.
Official announcement: Introducing the ElevenLabs MCP server.

Community

Show HN: ElevenLabs MCP server — news.ycombinator.com/item?id=43621258.
Launch HN: April (YC S25), voice AI on ElevenLabs TTS, on latency — news.ycombinator.com/item?id=45015230.

Internal

ElevenLabs on MCP.Directory — catalog entry and install configs.
What is the Model Context Protocol?
AI & Machine Learning category

If something here is out of date — a renamed tool, a new install path, a changed credit rule — email [email protected] or read more on our about page.
