Voice MCP

Name: Voice MCP
Rating: 4.9 (492 reviews)
Author: mbailey

Enables voice conversations with Claude by converting speech to text and text back to speech. Works through local microphone or remote room connections with automatic fallback options.

It supports multiple transports, including local microphone recording and LiveKit room-based communication, with automatic fallback between them and configurable speech-to-text (STT) and text-to-speech (TTS) services, for building voice-enabled applications.

productivity communication

What it does

Record voice through local microphone
Convert speech to text with multiple STT services
Convert text to speech with configurable TTS services
Connect through LiveKit rooms for remote voice chat
Handle automatic transport fallback
Maintain continuous voice conversations

Best for

Developers who need hands-free coding assistance
Working while multitasking or away from keyboard
Accessibility for users who prefer voice interaction

Multiple transport methods with fallback
Works with existing Claude setup
Local and cloud STT/TTS options

About Voice MCP

Voice MCP is a community-built MCP server published by mbailey that provides AI assistants with tools and capabilities via the Model Context Protocol. Voice MCP powers two-way voice apps with Google Cloud Speech to Text, Speech Recognition, and Text to Speech API for acc[…] It is categorized under productivity and communication.

How to install

You can install Voice MCP in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.

License

Voice MCP is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

VoiceMode

Natural voice conversations with Claude Code (and other MCP capable agents)

VoiceMode enables natural voice conversations with Claude Code. Voice isn't about replacing typing - it's about being available when typing isn't.

Perfect for:

Walking to your next meeting
Cooking while debugging
Giving your eyes a break after hours of screen time
Holding a coffee (or a dog)
Any moment when your hands or eyes are busy

Quick Start

Requirements: Computer with microphone and speakers

Option 1: Claude Code Plugin (Recommended)

The fastest way for Claude Code users to get started:

# Add the VoiceMode marketplace
claude plugin marketplace add mbailey/voicemode

# Install VoiceMode plugin
claude plugin install voicemode@voicemode

# Install dependencies (CLI, Local Voice Services)
/voicemode:install

# Start talking!
/voicemode:converse

Option 2: Python installer package

Installs dependencies and the VoiceMode Python package.

# Install UV package manager (if needed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Run the installer (sets up dependencies and local voice services)
uvx voice-mode-install

# Add to Claude Code
claude mcp add --scope user voicemode -- uvx --refresh voice-mode

# Optional: Add OpenAI API key as fallback for local services
export OPENAI_API_KEY=your-openai-key

# Start a conversation
claude converse

For manual setup, see the Getting Started Guide.

Features

Natural conversations - speak naturally, hear responses immediately
Works offline - optional local voice services (Whisper STT, Kokoro TTS)
Low latency - fast enough to feel like a real conversation
Smart silence detection - stops recording when you stop speaking
Privacy options - run entirely locally or use cloud services
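
The source does not describe how VoiceMode detects silence. As a rough illustration of the general idea only, here is a minimal energy-based sketch: recording stops after several consecutive quiet frames. The function names, the threshold of 500, and the three-frame window are all assumptions chosen for the example, not VoiceMode's actual detector or defaults.

```python
import math
from typing import Iterable, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def frames_until_silence(
    frames: Iterable[Sequence[int]],
    threshold: float = 500.0,   # assumed loudness cutoff, for illustration
    max_quiet_frames: int = 3,  # assumed length of a "sustained" pause
) -> int:
    """Return the number of frames consumed before recording would stop.

    Recording stops once `max_quiet_frames` consecutive frames have an RMS
    below `threshold`; a loud frame resets the quiet counter.
    """
    quiet = 0
    count = 0
    for frame in frames:
        count += 1
        quiet = quiet + 1 if rms(frame) < threshold else 0
        if quiet >= max_quiet_frames:
            break
    return count
```

A short pause in the middle of a sentence resets the counter, so only a sustained run of quiet frames ends the recording.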

Compatibility

Platforms: Linux, macOS, Windows (WSL), NixOS
Python: 3.10-3.14
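
To check an interpreter against the supported range before installing, a small sketch like the following may help. The helper name `python_supported` is hypothetical, and it assumes the stated range is inclusive at both ends.

```python
import sys
from typing import Tuple


def python_supported(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if (major, minor) falls in the stated 3.10-3.14 range."""
    return (3, 10) <= tuple(version) <= (3, 14)
```

Calling `python_supported()` with no arguments checks the interpreter that is currently running.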

Configuration

VoiceMode works out of the box. For customization:

# Set OpenAI API key (if using cloud services)
export OPENAI_API_KEY="your-key"

# Or configure via file
voicemode config edit

See the Configuration Guide for all options.

Permissions Setup (Optional)

To use VoiceMode without permission prompts, add to ~/.claude/settings.json:

{
  "permissions": {
    "allow": [
      "mcp__voicemode__converse",
      "mcp__voicemode__service"
    ]
  }
}

See the Permissions Guide for more options.
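
If ~/.claude/settings.json already contains other settings, the two entries need to be merged into it rather than pasted over the file. The sketch below shows one way to do that in Python. The helper name `allow_voicemode_tools` is hypothetical (it is not part of VoiceMode or Claude Code), and it assumes the file, if present, holds a JSON object with the `permissions.allow` list shape shown above.

```python
import json
from pathlib import Path

# Tool names taken from the settings snippet above.
VOICEMODE_TOOLS = ["mcp__voicemode__converse", "mcp__voicemode__service"]


def allow_voicemode_tools(settings_path: Path) -> dict:
    """Add the VoiceMode tools to permissions.allow, keeping existing entries.

    Hypothetical helper for illustration only.
    """
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))

    allow = settings.setdefault("permissions", {}).setdefault("allow", [])
    for tool in VOICEMODE_TOOLS:
        if tool not in allow:  # avoid duplicate entries on repeated runs
            allow.append(tool)

    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Typical usage would be `allow_voicemode_tools(Path.home() / ".claude" / "settings.json")`. Back up the file first, since the sketch rewrites it in place.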

Local Voice Services

For privacy or offline use, install local speech services:

Whisper.cpp - Local speech-to-text
Kokoro - Local text-to-speech with multiple voices

Both expose the same API as OpenAI, so VoiceMode can switch between local and cloud services seamlessly.
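
Because local and cloud services share one API shape, choosing between them can come down to picking the first base URL that responds. The following is a sketch of that idea, not VoiceMode's implementation. The local address and port are placeholders, and `port_is_open` and `pick_endpoint` are hypothetical names.

```python
import socket
from typing import Callable, Iterable, Optional
from urllib.parse import urlparse

# Placeholder base URLs for illustration; real hosts and ports depend on your setup.
TTS_ENDPOINTS = [
    "http://127.0.0.1:8880/v1",   # assumed local OpenAI-compatible server
    "https://api.openai.com/v1",  # cloud fallback
]


def port_is_open(base_url: str, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(base_url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_endpoint(
    endpoints: Iterable[str],
    is_up: Callable[[str], bool] = port_is_open,
) -> Optional[str]:
    """Return the first endpoint whose probe succeeds, or None if all fail."""
    for url in endpoints:
        if is_up(url):
            return url
    return None
```

Listing the local server first means the cloud endpoint is only chosen when the local one is unreachable. The probe is passed in as a parameter so it can be swapped for a real health check.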

Installation Details

System Dependencies by Platform

Ubuntu/Debian

sudo apt update
sudo apt install -y ffmpeg gcc libasound2-dev libasound2-plugins libportaudio2 portaudio19-dev pulseaudio pulseaudio-utils python3-dev

WSL2 users: The pulseaudio packages above are required for microphone access.

Fedora/RHEL

sudo dnf install alsa-lib-devel ffmpeg gcc portaudio portaudio-devel python3-devel

macOS

brew install ffmpeg node portaudio

NixOS

# Use development shell
nix develop github:mbailey/voicemode

# Or install system-wide
nix profile install github:mbailey/voicemode

Alternative Installation Methods

From source

git clone https://github.com/mbailey/voicemode.git
cd voicemode
uv tool install -e .

NixOS system-wide

# In /etc/nixos/configuration.nix
environment.systemPackages = [
  (builtins.getFlake "github:mbailey/voicemode").packages.${pkgs.system}.default
];

Troubleshooting

Problem	Solution
No microphone access	Check terminal/app permissions. WSL2 needs pulseaudio packages.
UV not found	Run `curl -LsSf https://astral.sh/uv/install.sh | sh`
OpenAI API error	Verify `OPENAI_API_KEY` is set correctly
No audio output	Check system audio settings and available devices
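
Some of the checks in the table can be automated. The sketch below looks for `uv` and `ffmpeg` on PATH and checks whether `OPENAI_API_KEY` is set. The `preflight` helper is hypothetical, and it does not test microphone or audio output access.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def preflight(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return human-readable problems found; an empty list means none detected."""
    problems = []
    if which("uv") is None:
        problems.append("uv not found on PATH")
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if not env.get("OPENAI_API_KEY"):
        # Only relevant when cloud services are used.
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

Running `print(preflight())` reports problems for the current shell environment. A missing API key is not an error if you rely only on local voice services.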

Save Audio for Debugging

export VOICEMODE_SAVE_AUDIO=true
# Files saved to ~/.voicemode/audio/YYYY/MM/
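
To locate the recordings for a given month, the directory layout above can be built programmatically. This sketch assumes a four-digit year and a zero-padded two-digit month, as the YYYY/MM pattern suggests. The helper name `audio_dir_for` is hypothetical, and the file names inside the directory are not specified in this document.

```python
from datetime import date
from pathlib import Path

DEFAULT_AUDIO_BASE = Path.home() / ".voicemode" / "audio"


def audio_dir_for(day: date, base: Path = DEFAULT_AUDIO_BASE) -> Path:
    """Build the YYYY/MM directory described above for a given date."""
    return base / f"{day.year:04d}" / f"{day.month:02d}"
```

For example, `sorted(audio_dir_for(date.today()).glob("*"))` lists whatever files were saved this month.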

Documentation

Getting Started - Full setup guide
Configuration - All environment variables
Whisper Setup - Local speech-to-text
Kokoro Setup - Local text-to-speech
Development Setup - Contributing guide

Full documentation: voice-mode.readthedocs.io

License

MIT - A Failmode Project

mcp-name: com.failmode/voicemode
