audio-gen
Generate audiobooks, podcasts, or educational audio content on demand. User provides an idea or topic, Claude AI writes a script, and ElevenLabs converts it to high-quality audio. Supports multiple formats (audiobook, podcast, educational), custom lengths, and voice effects. Use when asked to create audio content, make a podcast, generate an audiobook, or produce educational audio. Returns MP3 audio file via MEDIA token.
Install
mkdir -p .claude/skills/audio-gen && curl -L -o skill.zip "https://mcp.directory/api/skills/download/8397" && unzip -o skill.zip -d .claude/skills/audio-gen && rm skill.zip
Installs to .claude/skills/audio-gen
About this skill
🎙️ Audio Content Generator
Generate high-quality audiobooks, podcasts, or educational audio content on demand using AI-written scripts and ElevenLabs text-to-speech.
Quick Start
Create an audiobook chapter:
User: "Create a 5-minute audiobook chapter about a dragon discovering friendship"
Generate a podcast:
User: "Make a 10-minute podcast about the history of coffee"
Produce educational content:
User: "Generate a 15-minute educational audio explaining how neural networks work"
Content Formats
Audiobook
Style: Narrative storytelling with emotional depth
- Clear beginning, middle, and end
- Descriptive language and vivid imagery
- Dramatic pacing with thoughtful pauses
- Emotional tone that matches the story
- Use voice effects like `[whispers]`, `[excited]`, `[serious]` for impact
Example Structure:
[Opening hook - set the scene]
[long pause]
[Story development with character emotions]
[short pause] between sentences
[long pause] between paragraphs
[Climax with dramatic tension]
[long pause]
[Resolution and emotional closure]
Podcast
Style: Conversational and engaging
- Warm, welcoming intro (15-30 seconds)
- Main content with natural flow
- Transitions between topics
- Memorable outro with key takeaways
- Conversational tone throughout
Example Structure:
**Intro:** "Welcome to [topic]. I'm excited to share..."
[short pause]
**Main Content:** "Let's start with... [topic 1]"
[long pause] between segments
**Outro:** "Thanks for listening! Remember..."
Educational Content
Style: Clear explanations for learning
- Simple introductions to complex topics
- Step-by-step breakdowns
- Real-world examples and analogies
- Recap of key concepts at the end
- Enthusiastic delivery with `[excited]` for important points
Example Structure:
**Introduction:** What is [topic] and why it matters?
**Main Content:**
- Concept 1: Explanation + Example
- Concept 2: Explanation + Example
- Concept 3: Explanation + Example
**Summary:** Key takeaways and next steps
Length Guidelines
Word Count to Duration Conversion:
- 5 minutes = ~375 words
- 10 minutes = ~750 words
- 15 minutes = ~1,125 words
- 20 minutes = ~1,500 words
- 30 minutes = ~2,250 words
Pacing: These conversions assume an average delivery speed of ~75 words per minute
Practical Limits:
- Minimum: 2 minutes (~150 words)
- Maximum: 30 minutes (~2,250 words)
- Sweet spot: 5-15 minutes for best engagement
Workflow Instructions
Step 1: Understand the Request
Parse the user's request for:
- Content type (audiobook, podcast, educational, or inferred from topic)
- Topic/theme (what should the content be about)
- Target length (how many minutes)
- Tone/style (dramatic, casual, educational, etc.)
- Special requests (specific voice, emphasis on certain points)
Step 2: Calculate Word Count
target_words = target_minutes × 75
Example: 10 minutes = 10 × 75 = 750 words
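The calculation above can be sketched as a small helper. This is a minimal illustration, not part of the skill itself: the function name `target_word_count` is hypothetical, and the 2 to 30 minute bounds are taken from the Practical Limits listed earlier.

```python
WORDS_PER_MINUTE = 75             # pacing figure used throughout this skill
MIN_MINUTES, MAX_MINUTES = 2, 30  # practical limits from the Length Guidelines


def target_word_count(target_minutes: float) -> int:
    """Return the script word budget for a requested duration (hypothetical helper)."""
    if not MIN_MINUTES <= target_minutes <= MAX_MINUTES:
        raise ValueError(
            f"target must be between {MIN_MINUTES} and {MAX_MINUTES} minutes"
        )
    return round(target_minutes * WORDS_PER_MINUTE)


print(target_word_count(10))  # 10 x 75 = 750
```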
Step 3: Generate the Script
Write the complete script following these rules:
Content Guidelines:
- Start strong with an engaging hook
- Maintain natural, conversational flow
- Use active voice and simple sentence structure
- Include relevant examples and stories
- End with a satisfying conclusion
Formatting Rules:
- Add `[short pause]` after sentences (use sparingly, not every sentence)
- Add `[long pause]` between paragraphs or major sections
- Use voice effects strategically: `[whispers]`, `[shouts]`, `[excited]`, `[serious]`, `[sarcastic]`, `[sings]`, `[laughs]`
- Write numbers as words: "twenty-three" not "23"
- Spell out acronyms first time: "AI, or artificial intelligence"
- Avoid complex punctuation (em-dashes work, but semicolons don't read well)
- Remove markdown formatting before TTS conversion
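Two of these rules are easy to check mechanically. The sketch below flags leftover digits and semicolons in a draft; `lint_script` is a hypothetical name, and the checks are simple heuristics rather than anything the skill is documented to run.

```python
import re


def lint_script(text: str) -> list[str]:
    """Return warnings for two formatting rules (hypothetical heuristic checks)."""
    warnings = []
    if re.search(r"\d", text):
        warnings.append("digits found: write numbers as words")
    if ";" in text:
        warnings.append("semicolon found: rephrase or split the sentence")
    return warnings


print(lint_script("I saw 23 dragons; they flew."))
print(lint_script("I saw twenty-three dragons. They flew."))
```

An empty list means neither rule was violated; the other rules (acronyms, pause placement) still need a human or model read-through.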
Step 4: Present the Script
Show the script to the user and ask:
Here's the [format] script I've created (approximately [length] minutes):
[Display the script]
Would you like me to:
1. Generate the audio now
2. Make changes to the script
3. Adjust the length or tone
Step 5: Handle User Feedback
If user requests changes:
- Regenerate the script with adjustments
- Maintain the target word count
- Present the revised version
If user approves:
- Proceed to audio generation
Step 6: Generate Audio
Format the script for TTS:
- Remove any remaining markdown (headers, bold, italics)
- Ensure voice effects are in proper `[effect]` format
- Check that pauses are appropriately placed
- Verify numbers and acronyms are spelled out
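As a rough sketch of the markdown-removal step, the function below strips headers, bold, and italics with regular expressions while leaving bracketed voice effects untouched. `strip_markdown` is a hypothetical helper, and the patterns cover only the three constructs named above, not all of markdown.

```python
import re


def strip_markdown(text: str) -> str:
    """Remove headers, bold, and italics; keep [effect] tags (hypothetical helper)."""
    text = re.sub(r"^#{1,6}\s+", "", text, flags=re.MULTILINE)  # leading # headers
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)                # **bold**
    text = re.sub(r"(?<!\w)[*_](.+?)[*_](?!\w)", r"\1", text)   # *italic* or _italic_
    return text


sample = "## Intro\n**Welcome** to the *show*. [short pause]"
print(strip_markdown(sample))
```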
Invoke the TTS script:
IMPORTANT: The ELEVENLABS_API_KEY environment variable is already configured in the system. Simply invoke the TTS script directly.
uv run /home/clawdbot/clawdbot/skills/sag/scripts/tts.py \
-o /tmp/audio-gen-[timestamp]-[topic-slug].mp3 \
-m eleven_multilingual_v2 \
"[formatted_script]"
For long scripts, use heredoc:
uv run /home/clawdbot/clawdbot/skills/sag/scripts/tts.py \
-o /tmp/audio-gen-[timestamp]-[topic-slug].mp3 \
-m eleven_multilingual_v2 \
"$(cat <<'EOF'
[formatted_script]
EOF
)"
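If the command is launched from Python rather than a shell, passing the arguments as a list avoids quoting problems with long scripts. The sketch below only builds the argument list and output path; it assumes the `[timestamp]` placeholder is a Unix timestamp in seconds and that the slug is lowercase with hyphens, neither of which is stated above. `slugify` and `build_tts_command` are hypothetical names.

```python
import re
import time
from typing import Optional

# Script path exactly as given in the commands above.
TTS_SCRIPT = "/home/clawdbot/clawdbot/skills/sag/scripts/tts.py"


def slugify(topic: str) -> str:
    """Lowercase the topic and join alphanumeric runs with hyphens (assumed format)."""
    return re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")


def build_tts_command(script_text: str, topic: str,
                      model: str = "eleven_multilingual_v2",
                      timestamp: Optional[int] = None):
    """Return (argument list, output path) for the TTS invocation."""
    ts = int(time.time()) if timestamp is None else timestamp  # assumed: Unix seconds
    output_path = f"/tmp/audio-gen-{ts}-{slugify(topic)}.mp3"
    cmd = ["uv", "run", TTS_SCRIPT, "-o", output_path, "-m", model, script_text]
    return cmd, output_path


cmd, path = build_tts_command("Hello there.", "The History of Coffee",
                              timestamp=1700000000)
print(path)
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would then execute it without a shell, so no heredoc is needed.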
Return the result:
MEDIA:/tmp/audio-gen-[timestamp]-[topic-slug].mp3
Your [format] is ready! [Brief description of content]. Duration: approximately [X] minutes.
Voice Effects (Bracketed Tags)
Available voice modulation effects (use sparingly for impact):
- `[whispers]` - Soft, intimate delivery
- `[shouts]` - Loud, emphatic delivery
- `[excited]` - Enthusiastic, energetic tone
- `[serious]` - Grave, solemn tone
- `[sarcastic]` - Ironic, mocking tone
- `[sings]` - Musical, melodic delivery
- `[laughs]` - Amused, jovial tone
- `[short pause]` - Brief silence (~0.5s)
- `[long pause]` - Extended silence (~1-2s)
Best Practices:
- Use effects for emotional moments, not every sentence
- Pauses are your most powerful tool for pacing
- Voice effects work best in audiobooks and dramatic content
- Keep podcasts and educational content mostly natural
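Because only the effects listed above are documented, it can help to catch typos or invented tags before sending a script to TTS. The following is a minimal sketch with a hypothetical `unknown_tags` helper; it treats any bracketed text that is not in the list as suspect.

```python
import re

# Effects listed in this section.
KNOWN_TAGS = {
    "whispers", "shouts", "excited", "serious", "sarcastic",
    "sings", "laughs", "short pause", "long pause",
}


def unknown_tags(script: str) -> list[str]:
    """Return bracketed tags that are not in the documented list, sorted."""
    found = re.findall(r"\[([^\[\]]+)\]", script)
    return sorted({tag for tag in found if tag.strip().lower() not in KNOWN_TAGS})


print(unknown_tags("[whispers] Hello. [long pause] [yells] Hi!"))
```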
Error Handling
Script Too Long
If the generated script exceeds target by >20%:
The script I generated is [X] words ([Y] minutes), which is longer than your target of [Z] minutes. Would you like me to:
1. Condense it to fit the target length
2. Split it into multiple parts
3. Keep it as is
Script Too Short
If the generated script is under target by >20%:
The script is [X] words ([Y] minutes), shorter than your target. Would you like me to:
1. Expand it with more detail
2. Add additional examples or stories
3. Generate as is
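The two length checks above share one comparison against the target word count. The sketch below is one way to express it; `check_length` is a hypothetical name, and excluding bracketed tags from the word count is an assumption, since the skill does not say how tags are counted.

```python
import re

WORDS_PER_MINUTE = 75  # pacing figure used throughout this skill


def check_length(script: str, target_minutes: float, tolerance: float = 0.20) -> str:
    """Classify a script as 'too_long', 'too_short', or 'ok' against the target."""
    spoken = re.sub(r"\[[^\[\]]+\]", " ", script)  # assumption: tags are not spoken words
    actual_words = len(spoken.split())
    target_words = target_minutes * WORDS_PER_MINUTE
    if actual_words > target_words * (1 + tolerance):
        return "too_long"
    if actual_words < target_words * (1 - tolerance):
        return "too_short"
    return "ok"


print(check_length("word " * 750, target_minutes=10))
```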
TTS Generation Fails
If the TTS script fails:
I've created the script, but I'm unable to generate the audio right now. Here's your script:
[Display script]
Error: [specific error message]
You can:
1. Check that ELEVENLABS_API_KEY is configured
2. Use the script with your own text-to-speech tool
3. Try again in a moment
4. Ask me to troubleshoot the audio generation
Common TTS Issues:
- API key not set: Verify ELEVENLABS_API_KEY in config
- Rate limit: Wait a moment and try again
- Text too long: Break into smaller chunks (max ~5000 characters)
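For the "text too long" case, a script can be split at paragraph boundaries so that each piece stays under the limit. This is a sketch under assumptions: `chunk_text` is a hypothetical helper, paragraphs are taken to be separated by blank lines, and a single paragraph longer than the limit is cut at a fixed character offset, which may fall mid-word.

```python
def chunk_text(text: str, max_chars: int = 5000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        # Hard-split any single paragraph that exceeds the limit on its own.
        pieces = [paragraph[i:i + max_chars]
                  for i in range(0, len(paragraph), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


script = "a" * 3000 + "\n\n" + "b" * 3000 + "\n\n" + "c" * 1000
print([len(chunk) for chunk in chunk_text(script)])
```

Each chunk would then need its own TTS call; joining the resulting audio files is outside what this skill describes.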
Invalid Request
For unrealistic requests (e.g., "100-hour audiobook"):
That length would require [X] words and take significant time to generate. I recommend:
- Breaking it into multiple episodes/chapters
- Targeting 5-30 minutes per audio file
- Creating a series instead of one long file
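The numbers for such a reply follow directly from the per-file limit. As a small sketch (with a hypothetical `plan_episodes` helper), the request can be divided into the fewest equal-length parts that each fit within 30 minutes:

```python
import math

WORDS_PER_MINUTE = 75    # pacing figure used throughout this skill
MAX_MINUTES_PER_FILE = 30


def plan_episodes(total_minutes: float, max_minutes: float = MAX_MINUTES_PER_FILE):
    """Return (episode count, minutes per episode) for an over-long request."""
    count = math.ceil(total_minutes / max_minutes)
    return count, total_minutes / count


total = 100 * 60  # the "100-hour audiobook" example, in minutes
print(total * WORDS_PER_MINUTE)  # total words required
print(plan_episodes(total))      # episodes needed and length of each
```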
Tips for Best Results
For Engaging Audiobooks
- Focus on character emotions and sensory details
- Use pauses to build dramatic tension
- Vary sentence length for rhythm
- Include internal monologue and reflection
For Compelling Podcasts
- Start with a question or surprising fact
- Use conversational phrases: "You know what's interesting..."
- Include relatable examples from everyday life
- End with actionable takeaways
For Effective Educational Content
- Use the "explain like I'm five" approach
- Build from simple to complex concepts
- Repeat key terms and definitions
- Provide multiple examples for clarity
Technical Notes
TTS Implementation:
- Uses Python script: `~/.clawdbot/clawdbot/skills/sag/scripts/tts.py`
- No binary installation required (pure Python + requests)
- Directly calls ElevenLabs API
- Compatible with Linux and macOS
File Storage:
- Audio files are saved to `/tmp/audio-gen/`
- Filename format: `audio-gen-[timestamp]-[topic-slug].mp3`
- Files are automatically cleaned up after 24 hours
API Requirements:
- Anthropic API for script generation (already configured)
- ElevenLabs API for text-to-speech (configured via ELEVENLABS_API_KEY)
- Both services must be configured and have available credits
Supported Models:
- `eleven_multilingual_v2` - Best quality (default)
- `eleven_turbo_v2` - Faster generation
- `eleven_turbo_v2_5` - Fastest generation
- `eleven_multilingual_v1` - Legacy model
Cost Estimate:
- 10-minute audio (~750 words): approximately $1.43
  - Claude API: ~$0.075
  - ElevenLabs: ~$1.35
- Longer content scales proportionally
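Proportional scaling can be sketched as follows. The two per-10-minute figures are the estimates quoted above, not current pricing, and `estimate_cost` is a hypothetical helper; `Decimal` is used so that half-cent amounts round up the same way the $1.43 figure does.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-10-minute estimates quoted above; actual pricing may differ.
CLAUDE_PER_10_MIN = Decimal("0.075")
ELEVENLABS_PER_10_MIN = Decimal("1.35")


def estimate_cost(minutes: float) -> Decimal:
    """Scale the 10-minute estimate linearly and round to whole cents."""
    per_10_min = CLAUDE_PER_10_MIN + ELEVENLABS_PER_10_MIN
    cost = per_10_min * Decimal(str(minutes)) / Decimal(10)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(10))  # 1.43
print(estimate_cost(20))  # 2.85
```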
Generation Time:
- Script generation: 5-30 seconds (depending on length)
- Audio generation: 5-15 seconds (ElevenLabs processing)
- Total: Usually under 1 minute for 10-minute audio
Limitations
- Maximum Length: 30 minutes (~2,250 words) per audio file
  - For longer content, create multiple parts/episodes
- Single Voice: Currently supports one narrator voice
  - Cannot do multi-voice dialogue or character voices
- No Background Music: Pure voice narration only
  - No background music, sound effects, or audio mixing
- Real-time Generation: Each request generates fresh content
  - No pre-made templates or cached audio
- Language: Primarily English
  - ElevenLabs supports other languages, but content generation is optimized for English
Example Conversations
Example 1: Quick Audiobook
User: Create a 5-minute audiobook chapter abou
---
*Content truncated.*