Voice MCP

mbailey

Enables voice conversations with Claude by converting speech to text and text back to speech. Works through local microphone or remote room connections with automatic fallback options.

Enables two-way voice conversations through multiple transport methods including local microphone recording and LiveKit room-based communication, with configurable STT/TTS services and automatic transport fallback for creating voice-enabled applications.

What it does

  • Record voice through local microphone
  • Convert speech to text with multiple STT services
  • Convert text to speech with configurable TTS services
  • Connect through LiveKit rooms for remote voice chat
  • Handle automatic transport fallback
  • Maintain continuous voice conversations

Best for

  • Developers who need hands-free coding assistance
  • Working while multitasking or away from the keyboard
  • Accessibility for users who prefer voice interaction

  • Multiple transport methods with fallback
  • Works with existing Claude setup
  • Local and cloud STT/TTS options

About Voice MCP

Voice MCP is a community-built MCP server published by mbailey that provides AI assistants with tools and capabilities via the Model Context Protocol. Voice MCP powers two-way voice apps with Google Cloud Speech to Text, Speech Recognition, and Text to Speech API for acc… It is categorized under productivity and communication.

How to install

You can install Voice MCP in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.

License

Voice MCP is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

VoiceMode

Natural voice conversations with Claude Code (and other MCP capable agents)

VoiceMode enables natural voice conversations with Claude Code. Voice isn't about replacing typing - it's about being available when typing isn't.

Perfect for:

  • Walking to your next meeting
  • Cooking while debugging
  • Giving your eyes a break after hours of screen time
  • Holding a coffee (or a dog)
  • Any moment when your hands or eyes are busy

Quick Start

Requirements: Computer with microphone and speakers

Option 1: Claude Code Plugin (Recommended)

The fastest way for Claude Code users to get started:

# Add the VoiceMode marketplace
claude plugin marketplace add mbailey/voicemode

# Install VoiceMode plugin
claude plugin install voicemode@voicemode

# Install dependencies (CLI, local voice services)
/voicemode:install

# Start talking!
/voicemode:converse

Option 2: Python installer package

Installs dependencies and the VoiceMode Python package.

# Install UV package manager (if needed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Run the installer (sets up dependencies and local voice services)
uvx voice-mode-install

# Add to Claude Code
claude mcp add --scope user voicemode -- uvx --refresh voice-mode

# Optional: Add OpenAI API key as fallback for local services
export OPENAI_API_KEY=your-openai-key

# Start a conversation
claude converse

For manual setup, see the Getting Started Guide.

Features

  • Natural conversations - speak naturally, hear responses immediately
  • Works offline - optional local voice services (Whisper STT, Kokoro TTS)
  • Low latency - fast enough to feel like a real conversation
  • Smart silence detection - stops recording when you stop speaking
  • Privacy options - run entirely locally or use cloud services

Compatibility

Platforms: Linux, macOS, Windows (WSL), NixOS
Python: 3.10-3.14
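
If you want to confirm that an interpreter falls inside the Python range listed above before installing, a small check like the following may help. This is a minimal sketch: the function name `is_supported_python` is made up for illustration and is not part of VoiceMode.

```python
import sys

# Supported range taken from the Compatibility line above (3.10 through 3.14).
MIN_SUPPORTED = (3, 10)
MAX_SUPPORTED = (3, 14)


def is_supported_python(version=None):
    """Return True if the (major, minor) version falls in the supported range."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    status = "supported" if is_supported_python() else "outside the supported range"
    print(f"Python {current} is {status}")
```

Only the major and minor numbers are compared, so patch releases such as 3.12.4 are treated the same as 3.12.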

Configuration

VoiceMode works out of the box. For customization:

# Set OpenAI API key (if using cloud services)
export OPENAI_API_KEY="your-key"

# Or configure via file
voicemode config edit

See the Configuration Guide for all options.

Permissions Setup (Optional)

To use VoiceMode without permission prompts, add to ~/.claude/settings.json:

{
  "permissions": {
    "allow": [
      "mcp__voicemode__converse",
      "mcp__voicemode__service"
    ]
  }
}

See the Permissions Guide for more options.
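
If ~/.claude/settings.json already contains other settings, you may prefer to merge the two entries rather than paste over the file. The sketch below shows one way to do that with the standard library. The helper names `add_voicemode_permissions` and `update_settings_file` are hypothetical, and the code assumes the file, if present, holds a JSON object.

```python
import json
from pathlib import Path

# The two tool names come from the permissions snippet above.
VOICEMODE_TOOLS = ["mcp__voicemode__converse", "mcp__voicemode__service"]


def add_voicemode_permissions(settings):
    """Return a copy of settings with the VoiceMode tools in permissions.allow."""
    merged = json.loads(json.dumps(settings))  # deep copy via JSON round-trip
    allow = merged.setdefault("permissions", {}).setdefault("allow", [])
    for tool in VOICEMODE_TOOLS:
        if tool not in allow:  # keep the merge idempotent
            allow.append(tool)
    return merged


def update_settings_file(path):
    """Read the settings file (if present), merge, and write it back."""
    path = Path(path).expanduser()
    settings = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_voicemode_permissions(settings), indent=2) + "\n")
```

Calling `update_settings_file("~/.claude/settings.json")` would apply the merge in place. Existing entries in the allow list are preserved, and running it twice leaves the file unchanged. Consider backing up the file first.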

Local Voice Services

For privacy or offline use, install local speech services:

  • Whisper.cpp - Local speech-to-text
  • Kokoro - Local text-to-speech with multiple voices

Both expose an OpenAI-compatible API, so VoiceMode can switch seamlessly between the local services and OpenAI.
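
To illustrate what a shared API means in practice, the sketch below builds the same text-to-speech request for two different base URLs without sending it. It is not VoiceMode's own code. The `/audio/speech` path and the `model`/`voice`/`input` fields follow OpenAI's public speech endpoint. The local address and port are assumptions for illustration only, so check your own service configuration.

```python
import json
import urllib.request

OPENAI_BASE_URL = "https://api.openai.com/v1"
# Assumed address of a local OpenAI-compatible TTS server (hypothetical).
LOCAL_TTS_BASE_URL = "http://127.0.0.1:8880/v1"


def build_speech_request(base_url, text, voice, model, api_key=None):
    """Build (but do not send) a POST to the OpenAI-style /audio/speech endpoint."""
    payload = {"model": model, "voice": voice, "input": text}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Cloud services need a key; a local server typically does not.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Only the base URL and the optional key differ between the two targets. The request body has the same shape, which is what lets a client move between a local and a cloud provider.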

Installation Details

System Dependencies by Platform

Ubuntu/Debian

sudo apt update
sudo apt install -y ffmpeg gcc libasound2-dev libasound2-plugins libportaudio2 portaudio19-dev pulseaudio pulseaudio-utils python3-dev

WSL2 users: The pulseaudio packages above are required for microphone access.

Fedora/RHEL

sudo dnf install alsa-lib-devel ffmpeg gcc portaudio portaudio-devel python3-devel

macOS

brew install ffmpeg node portaudio

NixOS

# Use development shell
nix develop github:mbailey/voicemode

# Or install system-wide
nix profile install github:mbailey/voicemode

Alternative Installation Methods

From source

git clone https://github.com/mbailey/voicemode.git
cd voicemode
uv tool install -e .

NixOS system-wide

# In /etc/nixos/configuration.nix
environment.systemPackages = [
  (builtins.getFlake "github:mbailey/voicemode").packages.${pkgs.system}.default
];

Troubleshooting

Problem               Solution
No microphone access  Check terminal/app permissions. WSL2 needs pulseaudio packages.
UV not found          Run curl -LsSf https://astral.sh/uv/install.sh | sh
OpenAI API error      Verify OPENAI_API_KEY is set correctly
No audio output       Check system audio settings and available devices
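
Some of the problems above can be detected before starting a conversation. The following is a rough preflight sketch, not a VoiceMode command. The function name `diagnose` is hypothetical, and it only checks for the `uv` and `ffmpeg` executables and the `OPENAI_API_KEY` variable. Microphone and audio output problems still need to be checked by hand.

```python
import os
import shutil


def diagnose(env=None, which=shutil.which):
    """Return a list of hints for problems that can be detected without audio."""
    env = os.environ if env is None else env
    hints = []
    if which("uv") is None:
        hints.append("UV not found: run curl -LsSf https://astral.sh/uv/install.sh | sh")
    if not env.get("OPENAI_API_KEY"):
        hints.append("OPENAI_API_KEY is not set (only needed for cloud services)")
    if which("ffmpeg") is None:
        hints.append("ffmpeg not found: install the system dependencies for your platform")
    return hints


if __name__ == "__main__":
    for hint in diagnose() or ["No problems detected by this check"]:
        print(hint)
```

The `env` and `which` parameters exist so the check can be exercised without touching the real environment.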

Save Audio for Debugging

export VOICEMODE_SAVE_AUDIO=true
# Files saved to ~/.voicemode/audio/YYYY/MM/
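
To locate recordings for a given month, the YYYY/MM layout noted above can be turned into a path. This sketch assumes only that directory layout. The helper names are hypothetical, and no assumption is made about file names or formats inside the folder.

```python
from datetime import date
from pathlib import Path

# Root directory taken from the comment above.
AUDIO_ROOT = Path.home() / ".voicemode" / "audio"


def audio_dir_for(day, root=AUDIO_ROOT):
    """Return the YYYY/MM directory where recordings from `day` would be saved."""
    return Path(root) / f"{day.year:04d}" / f"{day.month:02d}"


def list_saved_audio(day, root=AUDIO_ROOT):
    """List files in that month's directory, or an empty list if it does not exist."""
    folder = audio_dir_for(day, root)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.is_file())
```

For example, `list_saved_audio(date.today())` returns the files saved so far this month, sorted by name.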

Documentation

Full documentation: voice-mode.readthedocs.io

License

MIT - A Failmode Project


mcp-name: com.failmode/voicemode
