voicemode
This skill provides voice interaction capabilities for AI assistants. Use it when users mention voice mode, want to have a voice conversation with Claude, need to check voice service status, or want to manage voice services such as Whisper and Kokoro.
Install
mkdir -p .claude/skills/voicemode && curl -L -o skill.zip "https://mcp.directory/api/skills/download/190" && unzip -o skill.zip -d .claude/skills/voicemode && rm skill.zip

Installs to .claude/skills/voicemode
About this skill
VoiceMode
Overview
This skill enables natural voice conversations between Claude and users by providing access to VoiceMode's speech-to-text (STT) and text-to-speech (TTS) capabilities. It integrates with both local and cloud-based voice services for flexible, high-quality voice interactions.
When to Use This Skill
Load this skill when:
- User mentions "voice mode" or "voicemode"
- User says "converse", "speak to me", "talk to me", or similar phrases
- User wants to start or continue a voice conversation
- User needs to check voice service status (Whisper, Kokoro, LiveKit)
- User wants to manage voice services (start, stop, restart)
- User needs voice configuration or troubleshooting help
- User mentions voice-related issues or preferences
Core Capabilities
1. Voice Conversations
Start natural voice conversations using the converse tool:
# Basic conversation
voicemode:converse(message="Hello! How can I help you today?")
# With specific settings
voicemode:converse(
    message="Let me help you with that",
    voice="nova",              # TTS voice selection
    wait_for_response=True,    # Listen for user response
    listen_duration_max=60     # Maximum listening time
)
Key Parameters:
- message: Text to speak
- wait_for_response: Whether to listen for a response (default: true)
- voice: TTS voice name (auto-selected if not specified)
- tts_provider: Provider selection ("openai" or "kokoro")
- listen_duration_max: Max listening time in seconds (default: 120)
- disable_silence_detection: Disable auto-stop on silence
2. Service Management
Manage voice services using the service tool:
# Check service status
voicemode:service(service_name="whisper", action="status")
voicemode:service(service_name="kokoro", action="status")
# Start/stop services
voicemode:service(service_name="whisper", action="start")
voicemode:service(service_name="kokoro", action="stop")
# View service logs
voicemode:service(service_name="whisper", action="logs", lines=100)
Supported Services:
- whisper: Local STT using Whisper.cpp
- kokoro: Local TTS with multiple voices
- livekit: Room-based real-time communication
Actions:
- status: Check if the service is running
- start: Start the service
- stop: Stop the service
- restart: Restart the service
- logs: View recent logs
- enable: Start at boot/login
- disable: Remove from startup
3. Voice Configuration
VoiceMode supports multiple configuration methods:
Environment Variables:
- VOICEMODE_TTS_VOICE: Default TTS voice
- VOICEMODE_TTS_PROVIDER: Default TTS provider
- VOICEMODE_STT_PROVIDER: Default STT provider
- VOICEMODE_AUDIO_FORMAT: Audio format (wav, mp3, etc.)
Voice Preferences:
- Project-level: .voicemodefile in project root
- User-level: ~/.voicemodefile in home directory
Configuration Files:
- Main config: ~/.voicemode/config/config.yaml
- Pronunciation: ~/.voicemode/config/pronunciation.yaml
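As a quick sketch of the environment-variable method (variable names are from the list above; the specific values shown are illustrative assumptions, not required settings):

```shell
# Set session-level defaults for subsequent voicemode commands
export VOICEMODE_TTS_VOICE=nova
export VOICEMODE_TTS_PROVIDER=kokoro
export VOICEMODE_AUDIO_FORMAT=wav

# Confirm what will be inherited
echo "voice=$VOICEMODE_TTS_VOICE provider=$VOICEMODE_TTS_PROVIDER format=$VOICEMODE_AUDIO_FORMAT"
```

For settings that should persist across sessions, prefer the ~/.voicemodefile or config.yaml locations listed above rather than shell exports.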
Voice Service Architecture
Provider System
VoiceMode uses OpenAI-compatible endpoints for all services:
- Automatic discovery of available services
- Health checking and failover support
- Transparent switching between providers
Available Providers
Cloud Services (require an API key):
- OpenAI API: High-quality TTS/STT
Local Services (no API key needed):
- Whisper.cpp: Fast local STT
- Kokoro: Local TTS with multiple voices
- LiveKit: WebRTC-based communication
Audio Processing
- Requires FFmpeg for audio format conversion
- Supports PCM, MP3, WAV, FLAC, AAC, Opus formats
- WebRTC VAD for voice activity detection
- Automatic format negotiation based on provider
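Because FFmpeg is required for format conversion, a quick pre-flight check (plain POSIX shell, not a VoiceMode command) can rule out a common failure mode before starting services:

```shell
# Check that ffmpeg is on PATH and report its version
if command -v ffmpeg >/dev/null 2>&1; then
  echo "ffmpeg found: $(ffmpeg -version 2>/dev/null | head -n 1)"
else
  echo "ffmpeg missing - install it first (e.g. apt install ffmpeg or brew install ffmpeg)" >&2
fi
```

`voicemode deps` (shown below under "Checking Voice Setup") performs a broader dependency check.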
Common Workflows
Token Efficiency Note
When using voicemode converse via CLI commands, redirect STDERR to /dev/null to save tokens by suppressing verbose diagnostic output. This prevents FFmpeg warnings and debug messages from consuming context:
voicemode converse -m "Hello" 2>/dev/null
Note: Omit the 2>/dev/null redirection when debugging issues or troubleshooting audio problems, as STDERR contains useful diagnostic information.
Starting a Voice Conversation
When using MCP tools:
# Simple start
voicemode:converse("Hello! What would you like to discuss today?")
# With specific voice
voicemode:converse(
    message="Let's begin our conversation",
    voice="echo",          # or "alloy", "nova", etc.
    tts_provider="openai"
)
When using CLI directly:
# Simple conversation (redirect STDERR to save tokens)
voicemode converse 2>/dev/null
# Speak without waiting
voicemode converse -m "Hello there!" --no-wait 2>/dev/null
# Continuous conversation mode
voicemode converse --continuous 2>/dev/null
# With specific voice
voicemode converse --voice nova 2>/dev/null
# Note: Omit 2>/dev/null for debugging or diagnostics
Checking Voice Setup
When using MCP tools:
# Check all services
voicemode:service("whisper", "status")
voicemode:service("kokoro", "status")
When using CLI directly:
# Check service status
voicemode whisper service status
voicemode kokoro status
voicemode livekit status
# Check dependencies
voicemode deps
# Diagnostic commands
voicemode diag info
voicemode diag devices
voicemode diag registry
Managing Services
# Whisper service management
voicemode whisper service start
voicemode whisper service stop
voicemode whisper service restart
voicemode whisper service logs
# Kokoro service management
voicemode kokoro start
voicemode kokoro stop
voicemode kokoro restart
voicemode kokoro logs
# LiveKit service management
voicemode livekit start
voicemode livekit stop
voicemode livekit restart
Configuration Management
# Edit configuration file
voicemode config edit
# View configuration
voicemode config list
voicemode config get VOICEMODE_TTS_VOICE
# Set configuration
voicemode config set VOICEMODE_TTS_VOICE nova
Installation and Setup
Quick Install
# Install VoiceMode package
curl -sL https://voicemode.ai/install.sh | bash
# Or with UV
uv tool install voice-mode-install
voice-mode-install
# Update to latest version
voicemode update
Service Installation
Using CLI commands:
# Install Whisper for local STT
voicemode whisper service install
# Install Kokoro for local TTS
voicemode kokoro install
# Install LiveKit for room-based communication
voicemode livekit install
# Services auto-start after installation
Documentation References
For detailed information, reference these docs:
- docs/reference/: API and parameter documentation
- docs/tutorials/: Step-by-step guides
- docs/services/: Service-specific documentation
- CLAUDE.md: Project-specific Claude guidance
- README.md: Installation and general usage
Logging and Debugging
VoiceMode maintains comprehensive logs in ~/.voicemode/:
- logs/conversations/: Daily conversation logs
- logs/events/: Detailed operational events
- audio/: Saved audio recordings
- config/: User configuration files
To enable debug logging, set VOICEMODE_DEBUG=true or pass the --debug flag.
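A minimal sketch for working with these logs (directory names are from the list above; individual log filenames are assumptions and will vary):

```shell
# Turn on debug output for voicemode commands run from this shell
export VOICEMODE_DEBUG=true

# List the newest conversation logs, if any exist yet
ls -t ~/.voicemode/logs/conversations/ 2>/dev/null | head -n 3

# Watch recent operational events while reproducing an issue
tail -n 50 ~/.voicemode/logs/events/* 2>/dev/null
```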
Communication Guidelines
Voice Mode Parallel Operations (DEFAULT BEHAVIOR)
When using voice mode, ALWAYS use the parallel pattern by default:
- Speak without waiting (wait_for_response=false) before performing other actions
- Narrate actions while performing them - this creates natural conversation flow
- Execute tools in parallel - speak and act simultaneously for better responsiveness
Example patterns:
When using MCP tools:
# ALWAYS do this - speak while acting
voicemode:converse("Let me search for that information", wait_for_response=False)
Grep(pattern="search_term", path="/path") # Runs while speaking
When using CLI commands with Bash tool:
# Run voice announcement and action in parallel (redirect STDERR to save tokens)
voicemode converse -m "Let me check the service status" --no-wait 2>/dev/null &
voicemode whisper service status
# Note: Omit 2>/dev/null for debugging or diagnostics
Only wait for response when:
- Asking questions that need answers
- Getting confirmation for important actions
- At natural conversation endpoints
Asking Questions
When asking questions, especially in voice mode:
- Ask questions one at a time - avoid bundling multiple questions
- Wait for the answer before proceeding to the next question
- Keep questions clear and concise for voice conversations
- This ensures clarity and prevents overwhelming the user in voice interactions
Example:
# Good - one question at a time
voicemode:converse("What type of voice service would you prefer?", wait_for_response=True)
# Wait for answer...
voicemode:converse("Would you like me to install it now?", wait_for_response=True)
# Avoid - multiple questions at once
# voicemode:converse("What voice do you want, and should I install Whisper, and do you need Kokoro too?")
Tips for Effective Use
- Parallel Operations: Use speak-without-waiting pattern for most actions
- Provider Selection: Let VoiceMode auto-select providers based on availability
- Voice Preferences: Set user preferences in ~/.voicemodefile
- Service Management: Start services before conversations for best performance
- Error Handling: Check service logs if voice interactions fail
- Audio Quality: Use local services (Whisper/Kokoro) for privacy and speed
Integration Notes
- VoiceMode runs as an MCP server via stdio transport
- Compatible with Claude Code and other MCP clients
- Supports concurrent instances with audio playback management
- Works with tmux and terminal multiplexers
Batching Voice Announcements with Audio Playback
When playing audio files (e.g., from cue files or samples), you can batch multiple voice announcements and playback commands in a single tool call. The tools execute sequentially within the batch, allowing for natural announce-then-play patterns:
# Batch multiple announce-play sequences in one call
voicemode:converse("Chapter 1 - Intro", wait_for_response=False)
Bash(command="mpv --start=00:00 --length=3 song.mp3")
voicemode:converse("Chapter 2 - Verse", wait_for_response=False)
Bash(command="mpv --start=00:10 --length=5 song.mp3")
voicemode:converse("Chapter 3 - Chorus", wait_for_response=False)
Bash(command="mpv --start=00:30 --length=5 song.mp3")
Key points:
- All tools in the batch execute sequentially (not in parallel)
- Each announcement plays before its corresponding audio
- No explicit delays needed - the TTS completes before Bash runs
- Efficient for playing multiple cue file chapters with narration
- Small gap between audio end and next tool call (API round-trip)