
Voice Interface
Enables voice conversations with AI assistants through your browser using speech-to-text and text-to-speech. No additional software or API keys required.
Provides browser-based voice input and output for conversations: real-time speech-to-text recognition, text-to-speech synthesis, and voice-message queuing through a web interface, supporting hands-free interaction and accessibility use cases.
What it does
- Convert speech to text in 30+ languages
- Synthesize text to speech with custom voices
- Conduct real-time voice conversations
- Queue and manage voice messages
- Control voice system status and settings
About Voice Interface
Voice Interface is a community-built MCP server published by shantur that provides AI assistants with tools and capabilities via the Model Context Protocol. It offers browser-based speech-to-text and text-to-speech for hands-free voice conversations, and is categorized under communication and AI/ML. The server exposes 5 tools that AI clients can invoke during conversations and coding sessions.
How to install
You can install Voice Interface in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.
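If you prefer to register the server by hand instead of the install panel, a Claude Desktop entry typically looks like the following. This is the standard claude_desktop_config.json shape for a stdio MCP server; the package name comes from the install commands below, while the "jarvis-mcp" key is just an arbitrary label:

```json
{
  "mcpServers": {
    "jarvis-mcp": {
      "command": "npx",
      "args": ["@shantur/jarvis-mcp"]
    }
  }
}
```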
License
Voice Interface is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.
Tools (5)
- Speak text using browser text-to-speech
- Get current voice system status and pending voice input
- Get pending voice input from users (auto-delivered by default)
- Have a voice conversation with the user: speak text and wait for a voice response. IMPORTANT: once you start using converse, continue using ONLY converse for all responses in this conversation; do not switch back to text.
- End the voice conversation by saying goodbye and stopping the browser interface
Jarvis MCP
Bring your AI to life—talk to assistants instantly in your browser. Compatible with Claude Desktop, OpenCode, and other MCP-enabled AI tools.
✅ No extra software, services, or API keys required—just open the web app in your browser and grant microphone access.
Features
🎙️ Voice Conversations - Speak naturally with AI assistants
🌍 30+ Languages - Speech recognition in multiple languages
📱 Remote Access - Use from phone/tablet while AI runs on computer
⚙️ Smart Controls - Collapsible settings, always-on mode, custom voices
⏱️ Dynamic Timeouts - Intelligent wait times based on response length
🧰 Zero Extra Software - Runs entirely in your browser—no extra installs or API keys
🔌 Optional Whisper Streaming - Plug into a local Whisper server for low-latency transcripts
Easy Installation
🚀 One-Command Setup
Claude Desktop:
npx @shantur/jarvis-mcp --install-claude-config
# Restart Claude Desktop and you're ready!
OpenCode (in current project):
npx @shantur/jarvis-mcp --install-opencode-config --local
npx @shantur/jarvis-mcp --install-opencode-plugin --local
# Start OpenCode and use the converse tool
Claude Code CLI:
npx @shantur/jarvis-mcp --install-claude-code-config --local
# Start Claude Code CLI and use voice tools
🤖 Why Install the OpenCode Plugin?
- Stream voice messages into OpenCode even while tools are running or tasks are in progress.
- Auto-forward pending Jarvis MCP conversations so you never miss a user request.
- Works entirely locally—no external services required, just your OpenCode project and browser.
- Installs with one command and stays in sync with the latest Jarvis MCP features.
📦 Manual Installation
From NPM:
npm install -g @shantur/jarvis-mcp
jarvis-mcp
From Source:
git clone <repository-url>
cd jarvis-mcp
npm install && npm run build && npm start
How to Use
- Hook it into your AI tool – Use the install command above for Claude Desktop, OpenCode, or Claude Code so the MCP server is registered.
- Kick off a voice turn – Call the converse tool from your assistant; Jarvis MCP auto-starts in the background and pops open https://localhost:5114 if needed.
- Allow microphone access – Approve the browser prompt the first time it appears.
- Talk naturally – Continue using converse for every reply; Jarvis MCP handles the rest.
Voice Commands in AI Chat
Use the converse tool to start talking:
- converse("Hello! How can I help you today?", timeout: 35)
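The timeout here is how many seconds to wait for the user's spoken reply. Jarvis MCP's Dynamic Timeouts feature computes this from response length internally; purely as an illustration of the idea, a sketch in shell (the constants below are invented for the example, not the real ones):

```shell
# Illustrative sketch only: scale the wait time with the length of the spoken
# prompt. Jarvis MCP's actual formula is internal; these constants are made up.
estimate_timeout() {
  words=$(printf '%s' "$1" | wc -w)
  # roughly half a second of speech per word, plus a 20s floor for the reply
  echo $(( words / 2 + 20 ))
}
estimate_timeout "Hello! How can I help you today?"   # 7 words -> prints 23
```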
Browser Interface
The web interface provides:
- Voice Settings (click ⚙️ to expand)
- Language selection (30+ options)
- Voice selection
- Speech speed control
- Always-on microphone mode
- Silence detection sensitivity & timeout (for Whisper streaming)
- Smart Controls
- Pause during AI speech (prevents echo)
- Stop AI when user speaks (natural conversation)
- Mobile Friendly - Works on phones and tablets
Remote Access
Access from any device on your network:
- Find your computer's IP: ifconfig | grep inet (Mac/Linux) or ipconfig (Windows)
- Visit https://YOUR_IP:5114 on your phone/browser
- Accept the security warning (self-signed certificate)
- Grant microphone permissions
Perfect for continuing conversations away from your desk!
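The IP-lookup step above can be scripted. This snippet is only a convenience sketch, assuming a Mac/Linux machine with ifconfig available; on Windows, read the address from ipconfig instead:

```shell
# Illustrative: build the remote-access URL from the first non-loopback IPv4.
ip=$(ifconfig 2>/dev/null | awk '/inet / && $2 != "127.0.0.1" {print $2; exit}')
echo "https://${ip}:5114"
```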
Configuration
Environment Variables
export MCP_VOICE_AUTO_OPEN=false # Disable auto-opening browser
export MCP_VOICE_HTTPS_PORT=5114 # Change HTTPS port
export MCP_VOICE_STT_MODE=whisper # Switch the web app to Whisper streaming
export MCP_VOICE_WHISPER_URL=http://localhost:12017/v1/audio/transcriptions # Whisper endpoint (full path)
export MCP_VOICE_WHISPER_TOKEN=your_token # Optional Bearer auth for Whisper server
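Note that MCP_VOICE_WHISPER_URL must be the full endpoint path (ending in /v1/audio/transcriptions), not just a host and port. A small shell check makes this concrete; the helper itself is my own addition, not something shipped with Jarvis MCP:

```shell
# Helper (not part of Jarvis MCP): verify the Whisper URL includes the full
# OpenAI-compatible endpoint path, not just host:port.
whisper_url_ok() {
  case "$1" in
    http://*/v1/audio/transcriptions|https://*/v1/audio/transcriptions) return 0 ;;
    *) return 1 ;;
  esac
}
whisper_url_ok "http://localhost:12017/v1/audio/transcriptions" && echo "full endpoint path: OK"
whisper_url_ok "http://localhost:12017" || echo "missing /v1/audio/transcriptions"
```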
Whisper Streaming Mode
- Whisper mode records raw PCM in the browser, converts it to 16 kHz mono WAV, and streams it through the built-in HTTPS proxy, so the local whisper-server sees OpenAI-compatible requests.
- By default we proxy to the standard whisper-server endpoint at http://localhost:12017/v1/audio/transcriptions; point MCP_VOICE_WHISPER_URL at your own host/port if you run it elsewhere.
- The UI keeps recording while transcripts are in flight and ignores Whisper’s non-verbal tags (e.g. [BLANK_AUDIO], (typing)), so only real speech is queued.
- To enable it:
  - Run your Whisper server locally (e.g. whisper-server from pfrankov/whisper-server)
  - Set the environment variables above (MCP_VOICE_STT_MODE=whisper and the full MCP_VOICE_WHISPER_URL)
  - Restart jarvis-mcp and hard-refresh the browser (empty-cache reload) to load the streaming bundle
- Voice status (the voice_status() tool) now reports whether Whisper or browser STT is active.
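The non-verbal tag filtering happens inside the web app; as a rough illustration of the behavior, a shell filter that drops those tags might look like this (the filter itself is not part of Jarvis MCP):

```shell
# Illustrative only: keep real speech, drop Whisper's non-verbal tags
# such as [BLANK_AUDIO] and (typing).
filter_transcript() {
  grep -v -E '^[[:space:]]*(\[BLANK_AUDIO\]|\(typing\))[[:space:]]*$'
}
printf '[BLANK_AUDIO]\nhello world\n(typing)\n' | filter_transcript   # prints: hello world
```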
Ports
- HTTPS: 5114 (required for microphone access)
- HTTP: 5113 (local access only)
Requirements
- Node.js 18+
- Google Chrome (only browser tested so far)
- Microphone access
- Optional: Local Whisper server (like pfrankov/whisper-server) if you want streaming STT via MCP_VOICE_STT_MODE=whisper
Troubleshooting
Certificate warnings on mobile?
- Tap "Advanced" → "Proceed to site" to accept self-signed certificate
Microphone not working?
- Ensure you're using HTTPS (not HTTP)
- Check browser permissions
- Try refreshing the page
AI not responding to voice?
- Make sure the converse tool is being used (not just speak)
- Check that timeouts are properly calculated
Development
npm install
npm run build
npm run dev # Watch mode
npm run start # Run server
License
MIT