voicemode
This skill provides voice interaction capabilities for AI assistants. Use it when users mention voice mode, want to have a voice conversation with Claude, need to check voice service status, or want to manage voice services such as Whisper and Kokoro.
Install
mkdir -p .claude/skills/voicemode && curl -L -o skill.zip "https://mcp.directory/api/skills/download/190" && unzip -o skill.zip -d .claude/skills/voicemode && rm skill.zip

Installs to .claude/skills/voicemode
About this skill
VoiceMode
Overview
This skill enables natural voice conversations between Claude and users by providing access to VoiceMode's speech-to-text (STT) and text-to-speech (TTS) capabilities. It integrates with both local and cloud-based voice services for flexible, high-quality voice interactions.
When to Use This Skill
Load this skill when:
- User mentions "voice mode" or "voicemode"
- User says "converse", "speak to me", "talk to me", or similar phrases
- User wants to start or continue a voice conversation
- User needs to check voice service status (Whisper, Kokoro, LiveKit)
- User wants to manage voice services (start, stop, restart)
- User needs voice configuration or troubleshooting help
- User mentions voice-related issues or preferences
Core Capabilities
1. Voice Conversations
Start natural voice conversations using the converse tool:
# Basic conversation
voicemode:converse(message="Hello! How can I help you today?")
# With specific settings
voicemode:converse(
    message="Let me help you with that",
    voice="nova",              # TTS voice selection
    wait_for_response=True,    # Listen for user response
    listen_duration_max=60     # Maximum listening time
)
Key Parameters:
- message: Text to speak
- wait_for_response: Whether to listen for a response (default: true)
- voice: TTS voice name (auto-selected if not specified)
- tts_provider: Provider selection ("openai" or "kokoro")
- listen_duration_max: Max listening time in seconds (default: 120)
- disable_silence_detection: Disable auto-stop on silence
2. Service Management
Manage voice services using the service tool:
# Check service status
voicemode:service(service_name="whisper", action="status")
voicemode:service(service_name="kokoro", action="status")
# Start/stop services
voicemode:service(service_name="whisper", action="start")
voicemode:service(service_name="kokoro", action="stop")
# View service logs
voicemode:service(service_name="whisper", action="logs", lines=100)
Supported Services:
- whisper: Local STT using Whisper.cpp
- kokoro: Local TTS with multiple voices
- livekit: Room-based real-time communication
Actions:
- status: Check if the service is running
- start: Start the service
- stop: Stop the service
- restart: Restart the service
- logs: View recent logs
- enable: Start at boot/login
- disable: Remove from startup
3. Voice Configuration
VoiceMode supports multiple configuration methods:
Environment Variables:
- VOICEMODE_TTS_VOICE: Default TTS voice
- VOICEMODE_TTS_PROVIDER: Default TTS provider
- VOICEMODE_STT_PROVIDER: Default STT provider
- VOICEMODE_AUDIO_FORMAT: Audio format (wav, mp3, etc.)
Voice Preferences:
- Project-level: .voicemodefile in project root
- User-level: ~/.voicemodefile in home directory
Configuration Files:
- Main config: ~/.voicemode/config/config.yaml
- Pronunciation: ~/.voicemode/config/pronunciation.yaml
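As a quick sketch of the environment-variable method (variable names are from the list above; the specific values shown are illustrative assumptions, not required settings):

```shell
# Set session-level defaults for subsequent voicemode commands
export VOICEMODE_TTS_VOICE=nova
export VOICEMODE_TTS_PROVIDER=kokoro
export VOICEMODE_AUDIO_FORMAT=wav

# Confirm what will be inherited
echo "voice=$VOICEMODE_TTS_VOICE provider=$VOICEMODE_TTS_PROVIDER format=$VOICEMODE_AUDIO_FORMAT"
```

For settings that should persist across sessions, prefer the ~/.voicemodefile or config.yaml locations listed above rather than shell exports.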
Voice Service Architecture
Provider System
VoiceMode uses OpenAI-compatible endpoints for all services:
- Automatic discovery of available services
- Health checking and failover support
- Transparent switching between providers
Available Providers
Cloud Services (require an API key):
- OpenAI API: High-quality TTS/STT
Local Services (no API key needed):
- Whisper.cpp: Fast local STT
- Kokoro: Local TTS with multiple voices
- LiveKit: WebRTC-based communication
Audio Processing
- Requires FFmpeg for audio format conversion
- Supports PCM, MP3, WAV, FLAC, AAC, Opus formats
- WebRTC VAD for voice activity detection
- Automatic format negotiation based on provider
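Because FFmpeg is required for format conversion, a quick pre-flight check (plain POSIX shell, not a VoiceMode command) can rule out a common failure mode before starting services:

```shell
# Check that ffmpeg is on PATH and report its version
if command -v ffmpeg >/dev/null 2>&1; then
  echo "ffmpeg found: $(ffmpeg -version 2>/dev/null | head -n 1)"
else
  echo "ffmpeg missing - install it first (e.g. apt install ffmpeg or brew install ffmpeg)" >&2
fi
```

`voicemode deps` (shown below under "Checking Voice Setup") performs a broader dependency check.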
Common Workflows
Token Efficiency Note
When using voicemode converse via CLI commands, redirect STDERR to /dev/null to save tokens by suppressing verbose diagnostic output. This prevents FFmpeg warnings and debug messages from consuming context:
voicemode converse -m "Hello" 2>/dev/null
Note: Omit the 2>/dev/null redirection when debugging issues or troubleshooting audio problems, as STDERR contains useful diagnostic information.
Starting a Voice Conversation
When using MCP tools:
# Simple start
voicemode:converse("Hello! What would you like to discuss today?")
# With specific voice
voicemode:converse(
    message="Let's begin our conversation",
    voice="echo",          # or "alloy", "nova", etc.
    tts_provider="openai"
)
When using CLI directly:
# Simple conversation (redirect STDERR to save tokens)
voicemode converse 2>/dev/null
# Speak without waiting
voicemode converse -m "Hello there!" --no-wait 2>/dev/null
# Continuous conversation mode
voicemode converse --continuous 2>/dev/null
# With specific voice
voicemode converse --voice nova 2>/dev/null
# Note: Omit 2>/dev/null for debugging or diagnostics
Checking Voice Setup
When using MCP tools:
# Check all services
voicemode:service("whisper", "status")
voicemode:service("kokoro", "status")
When using CLI directly:
# Check service status
voicemode whisper service status
voicemode kokoro status
voicemode livekit status
# Check dependencies
voicemode deps
# Diagnostic commands
voicemode diag info
voicemode diag devices
voicemode diag registry
Managing Services
# Whisper service management
voicemode whisper service start
voicemode whisper service stop
voicemode whisper service restart
voicemode whisper service logs
# Kokoro service management
voicemode kokoro start
voicemode kokoro stop
voicemode kokoro restart
voicemode kokoro logs
# LiveKit service management
voicemode livekit start
voicemode livekit stop
voicemode livekit restart
Configuration Management
# Edit configuration file
voicemode config edit
# View configuration
voicemode config list
voicemode config get VOICEMODE_TTS_VOICE
# Set configuration
voicemode config set VOICEMODE_TTS_VOICE nova
Installation and Setup
Quick Install
# Install VoiceMode package
curl -sL https://voicemode.ai/install.sh | bash
# Or with UV
uv tool install voice-mode-install
voice-mode-install
# Update to latest version
voicemode update
Service Installation
Using CLI commands:
# Install Whisper for local STT
voicemode whisper service install
# Install Kokoro for local TTS
voicemode kokoro install
# Install LiveKit for room-based communication
voicemode livekit install
# Services auto-start after installation
Documentation References
For detailed information, reference these docs:
- docs/reference/: API and parameter documentation
- docs/tutorials/: Step-by-step guides
- docs/services/: Service-specific documentation
- CLAUDE.md: Project-specific Claude guidance
- README.md: Installation and general usage
Logging and Debugging
VoiceMode maintains comprehensive logs in ~/.voicemode/:
- logs/conversations/: Daily conversation logs
- logs/events/: Detailed operational events
- audio/: Saved audio recordings
- config/: User configuration files
To enable debug logging, set VOICEMODE_DEBUG=true or pass the --debug flag.
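A minimal sketch for working with these logs (directory names are from the list above; individual log filenames are assumptions and will vary):

```shell
# Turn on debug output for voicemode commands run from this shell
export VOICEMODE_DEBUG=true

# List the newest conversation logs, if any exist yet
ls -t ~/.voicemode/logs/conversations/ 2>/dev/null | head -n 3

# Watch recent operational events while reproducing an issue
tail -n 50 ~/.voicemode/logs/events/* 2>/dev/null
```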
Communication Guidelines
Voice Mode Parallel Operations (DEFAULT BEHAVIOR)
When using voice mode, ALWAYS use the parallel pattern by default:
- Speak without waiting (wait_for_response=false) before performing other actions
- Narrate actions while performing them - this creates natural conversation flow
- Execute tools in parallel - speak and act simultaneously for better responsiveness
Example patterns:
When using MCP tools:
# ALWAYS do this - speak while acting
voicemode:converse("Let me search for that information", wait_for_response=False)
Grep(pattern="search_term", path="/path") # Runs while speaking
When using CLI commands with Bash tool:
# Run voice announcement and action in parallel (redirect STDERR to save tokens)
voicemode converse -m "Let me check the service status" --no-wait 2>/dev/null &
voicemode whisper service status
# Note: Omit 2>/dev/null for debugging or diagnostics
Only wait for response when:
- Asking questions that need answers
- Getting confirmation for important actions
- At natural conversation endpoints
Asking Questions
When asking questions, especially in voice mode:
- Ask questions one at a time - avoid bundling multiple questions
- Wait for the answer before proceeding to the next question
- Keep questions clear and concise for voice conversations
- This ensures clarity and prevents overwhelming the user in voice interactions
Example:
# Good - one question at a time
voicemode:converse("What type of voice service would you prefer?", wait_for_response=True)
# Wait for answer...
voicemode:converse("Would you like me to install it now?", wait_for_response=True)
# Avoid - multiple questions at once
# voicemode:converse("What voice do you want, and should I install Whisper, and do you need Kokoro too?")
Tips for Effective Use
- Parallel Operations: Use speak-without-waiting pattern for most actions
- Provider Selection: Let VoiceMode auto-select providers based on availability
- Voice Preferences: Set user preferences in ~/.voicemodefile
- Service Management: Start services before conversations for best performance
- Error Handling: Check service logs if voice interactions fail
- Audio Quality: Use local services (Whisper/Kokoro) for privacy and speed
Integration Notes
- VoiceMode runs as an MCP server via stdio transport
- Compatible with Claude Code and other MCP clients
- Supports concurrent instances with audio playback management
- Works with tmux and terminal multiplexers
Batching Voice Announcements with Audio Playback
When playing audio files (e.g., from cue files or samples), you can batch multiple voice announcements and playback commands in a single tool call. The tools execute sequentially within the batch, allowing for natural announce-then-play patterns:
# Batch multiple announce-play sequences in one call
voicemode:converse("Chapter 1 - Intro", wait_for_response=False)
Bash(command="mpv --start=00:00 --length=3 song.mp3")
voicemode:converse("Chapter 2 - Verse", wait_for_response=False)
Bash(command="mpv --start=00:10 --length=5 song.mp3")
voicemode:converse("Chapter 3 - Chorus", wait_for_response=False)
Bash(command="mpv --start=00:30 --length=5 song.mp3")
Key points:
- All tools in the batch execute sequentially (not in parallel)
- Each announcement plays before its corresponding audio
- No explicit delays needed - the TTS completes before Bash runs
- Efficient for playing multiple cue file chapters with narration
- Small gap between audio end and next tool call (API round-trip)