speech
Use when implementing speech-to-text, live transcription, or audio transcription. Covers SpeechAnalyzer (iOS 26+), SpeechTranscriber, volatile/finalized results, AssetInventory model management, audio format handling.
Install
mkdir -p .claude/skills/speech && curl -L -o skill.zip "https://mcp.directory/api/skills/download/4992" && unzip -o skill.zip -d .claude/skills/speech && rm skill.zip

Installs to .claude/skills/speech
About this skill
Speech-to-Text with SpeechAnalyzer
Overview
SpeechAnalyzer is Apple's new speech-to-text API introduced in iOS 26. It powers Notes, Voice Memos, Journal, and Call Summarization. The on-device model is faster, more accurate, and better for long-form/distant audio than SFSpeechRecognizer.
Key principle: SpeechAnalyzer is modular—add transcription modules to an analysis session. Results stream asynchronously using Swift's AsyncSequence.
Decision Tree - SpeechAnalyzer vs SFSpeechRecognizer
Need speech-to-text?
├─ iOS 26+ only?
│ └─ Yes → SpeechAnalyzer (preferred)
├─ Need iOS 10-25 support?
│ └─ Yes → SFSpeechRecognizer (on iOS 26+, DictationTranscriber covers the same languages)
├─ Long-form audio (meetings, lectures)?
│ └─ Yes → SpeechAnalyzer
├─ Distant audio (across room)?
│ └─ Yes → SpeechAnalyzer
└─ Short dictation commands?
└─ Either works
SpeechAnalyzer advantages:
- Better for long-form and conversational audio
- Works well with distant speakers (meetings)
- On-device, private
- Model managed by system (no app size increase)
- Powers Notes, Voice Memos, Journal
DictationTranscriber (iOS 26+): Same languages as SFSpeechRecognizer, but doesn't require user to enable Siri/dictation in Settings.
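When an app must also run on earlier OS releases, a runtime availability check can pick the right API. A minimal sketch; transcribeWithAnalyzer and transcribeWithSFSpeechRecognizer are hypothetical functions you would implement with each framework:

```swift
import Speech

// Sketch: choose the transcription path at runtime.
// Both called functions are placeholders, not real API.
func transcribeAudio(at url: URL) async throws -> String {
    if #available(iOS 26, *) {
        return try await transcribeWithAnalyzer(url)           // SpeechAnalyzer path
    } else {
        return try await transcribeWithSFSpeechRecognizer(url) // legacy path
    }
}
```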
Red Flags
Use this skill when you see:
- "Live transcription"
- "Transcribe audio"
- "Speech-to-text"
- "SpeechAnalyzer" or "SpeechTranscriber"
- "Volatile results"
- Building Notes-like or Voice Memos-like features
Pattern 1 - File Transcription (Simplest)
Transcribe an audio file to text in one function.
import Speech
import AVFoundation

func transcribe(file: URL, locale: Locale) async throws -> AttributedString {
    // Set up transcriber
    let transcriber = SpeechTranscriber(
        locale: locale,
        preset: .offlineTranscription
    )

    // Collect results asynchronously
    async let transcriptionFuture = try transcriber.results
        .reduce(AttributedString()) { str, result in
            str + result.text
        }

    // Set up analyzer with transcriber module
    let analyzer = SpeechAnalyzer(modules: [transcriber])

    // Analyze the file (analyzeSequence(from:) takes an AVAudioFile)
    let audioFile = try AVAudioFile(forReading: file)
    if let lastSample = try await analyzer.analyzeSequence(from: audioFile) {
        try await analyzer.finalizeAndFinish(through: lastSample)
    } else {
        await analyzer.cancelAndFinishNow()
    }
    return try await transcriptionFuture
}
Key points:
- analyzeSequence(from:) reads the file and feeds its audio to the analyzer
- finalizeAndFinish(through:) ensures all results are finalized
- Results are AttributedString values with timing metadata
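A minimal call site for the function above; the file path is hypothetical:

```swift
// Hypothetical usage: transcribe a recording in US English.
let url = URL(fileURLWithPath: "/path/to/recording.m4a")
let transcript = try await transcribe(file: url, locale: Locale(identifier: "en-US"))
print(String(transcript.characters)) // plain-text view of the AttributedString
```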
Pattern 2 - Live Transcription Setup
For real-time transcription from microphone.
Step 1 - Configure SpeechTranscriber
import Speech
import AVFoundation

enum TranscriptionError: Error {
    case setupFailed, localeNotSupported, notSetUp, conversionFailed
}

class TranscriptionManager: ObservableObject {
    private var transcriber: SpeechTranscriber?
    private var analyzer: SpeechAnalyzer?
    private var analyzerFormat: AVAudioFormat?
    private var inputBuilder: AsyncStream<AnalyzerInput>.Continuation?

    @Published var finalizedTranscript = AttributedString()
    @Published var volatileTranscript = AttributedString()

    func setUp() async throws {
        // Create transcriber with options
        transcriber = SpeechTranscriber(
            locale: Locale.current,
            transcriptionOptions: [],
            reportingOptions: [.volatileResults], // Enable real-time updates
            attributeOptions: [.audioTimeRange]   // Include timing
        )
        guard let transcriber else { throw TranscriptionError.setupFailed }

        // Create analyzer with transcriber module
        analyzer = SpeechAnalyzer(modules: [transcriber])

        // Get the audio format the analyzer expects
        analyzerFormat = await SpeechAnalyzer.bestAvailableAudioFormat(
            compatibleWith: [transcriber]
        )

        // Ensure model is available
        try await ensureModel(for: transcriber)

        // Create input stream
        let (stream, builder) = AsyncStream<AnalyzerInput>.makeStream()
        inputBuilder = builder

        // Start analyzer
        try await analyzer?.start(inputSequence: stream)
    }
}
Step 2 - Ensure Model Availability
func ensureModel(for transcriber: SpeechTranscriber) async throws {
    let locale = Locale.current

    // Check if the language is supported
    let supported = await SpeechTranscriber.supportedLocales
    guard supported.contains(where: {
        $0.identifier(.bcp47) == locale.identifier(.bcp47)
    }) else {
        throw TranscriptionError.localeNotSupported
    }

    // Check if the model is installed
    let installed = await SpeechTranscriber.installedLocales
    if installed.contains(where: {
        $0.identifier(.bcp47) == locale.identifier(.bcp47)
    }) {
        return // Already installed
    }

    // Download the model
    if let downloader = try await AssetInventory.assetInstallationRequest(
        supporting: [transcriber]
    ) {
        // downloader.progress is a Foundation Progress; observe it for UI
        try await downloader.downloadAndInstall()
    }
}
Note: Models are stored in system storage, not app storage, so they don't add to your app's download size. Only a limited number of languages can be allocated at once.
Step 3 - Handle Results
func startResultHandling() {
    Task { @MainActor in // @Published updates must happen on the main actor
        guard let transcriber else { return }
        do {
            for try await result in transcriber.results {
                let text = result.text
                if result.isFinal {
                    // Finalized result - won't change
                    finalizedTranscript += text
                    volatileTranscript = AttributedString()

                    // Access timing info
                    for run in text.runs {
                        if let timeRange = run.audioTimeRange {
                            print("Time: \(timeRange)")
                        }
                    }
                } else {
                    // Volatile result - will be replaced
                    volatileTranscript = text
                }
            }
        } catch {
            print("Transcription failed: \(error)")
        }
    }
}
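To surface both transcripts in UI, one common approach (a sketch, not part of the original skill) is to render the volatile tail in a secondary style so users can see it is provisional:

```swift
import SwiftUI

// Sketch: append the volatile tail, dimmed, after the finalized text.
// Assumes the TranscriptionManager class defined above.
struct TranscriptView: View {
    @ObservedObject var manager: TranscriptionManager

    var body: some View {
        Text(manager.finalizedTranscript + dimmedVolatile)
    }

    private var dimmedVolatile: AttributedString {
        var tail = manager.volatileTranscript
        tail.foregroundColor = .secondary // style provisional text differently
        return tail
    }
}
```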
Pattern 3 - Audio Recording and Streaming
Connect AVAudioEngine to SpeechAnalyzer.
import AVFoundation

enum RecordingError: Error {
    case permissionDenied
}

class AudioRecorder {
    private let audioEngine = AVAudioEngine()
    private var outputContinuation: AsyncStream<AVAudioPCMBuffer>.Continuation?
    private let transcriptionManager: TranscriptionManager

    init(transcriptionManager: TranscriptionManager) {
        self.transcriptionManager = transcriptionManager
    }

    func startRecording() async throws {
        // Request permission
        guard await AVAudioApplication.requestRecordPermission() else {
            throw RecordingError.permissionDenied
        }

        // Configure audio session (iOS)
        #if os(iOS)
        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.playAndRecord, mode: .spokenAudio)
        try session.setActive(true, options: .notifyOthersOnDeactivation)
        #endif

        // Set up transcriber
        try await transcriptionManager.setUp()
        transcriptionManager.startResultHandling()

        // Stream audio to transcriber
        for await buffer in try audioStream() {
            try await transcriptionManager.streamAudio(buffer)
        }
    }

    private func audioStream() throws -> AsyncStream<AVAudioPCMBuffer> {
        // Create the stream first so the tap never yields into a nil continuation
        let stream = AsyncStream<AVAudioPCMBuffer> { continuation in
            outputContinuation = continuation
        }

        let inputNode = audioEngine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(
            onBus: 0,
            bufferSize: 4096,
            format: format
        ) { [weak self] buffer, _ in
            self?.outputContinuation?.yield(buffer)
        }

        audioEngine.prepare()
        try audioEngine.start()
        return stream
    }
}
Stream Audio with Format Conversion
// Stored properties aren't allowed in extensions, so declare the
// converter inside the TranscriptionManager class body:
//     private var converter: AVAudioConverter?

extension TranscriptionManager {
    func streamAudio(_ buffer: AVAudioPCMBuffer) async throws {
        guard let inputBuilder, let analyzerFormat else {
            throw TranscriptionError.notSetUp
        }

        // Convert to the analyzer's required format
        let converted = try convertBuffer(buffer, to: analyzerFormat)

        // Send to analyzer
        let input = AnalyzerInput(buffer: converted)
        inputBuilder.yield(input)
    }

    private func convertBuffer(
        _ buffer: AVAudioPCMBuffer,
        to format: AVAudioFormat
    ) throws -> AVAudioPCMBuffer {
        // Lazily initialize the converter
        if converter == nil {
            converter = AVAudioConverter(from: buffer.format, to: format)
        }
        guard let converter else {
            throw TranscriptionError.conversionFailed
        }

        // Note: convert(to:from:) doesn't resample; if sample rates
        // differ, use convert(to:error:withInputFrom:) instead.
        let outputBuffer = AVAudioPCMBuffer(
            pcmFormat: converter.outputFormat,
            frameCapacity: buffer.frameLength
        )!
        try converter.convert(to: outputBuffer, from: buffer)
        return outputBuffer
    }
}
Pattern 4 - Stopping Transcription
Properly finalize to get remaining volatile results as finalized.
func stopRecording() async {
    // Stop audio
    audioEngine.stop()
    audioEngine.inputNode.removeTap(onBus: 0)
    outputContinuation?.finish()

    // Finalize transcription (converts remaining volatile results to final).
    // In the earlier patterns, `analyzer` lives on TranscriptionManager, so
    // expose a finish method there or call it directly if it's in scope.
    try? await analyzer?.finalizeAndFinishThroughEndOfInput()

    // Cancel any pending result-handling task
    // (store the Task created in startResultHandling() to cancel it here)
    recognizerTask?.cancel()
}
Critical: Always call finalizeAndFinishThroughEndOfInput() to ensure volatile results are finalized.
Pattern 5 - Model Asset Management
Check Supported Languages
// Languages the API supports
let supported = await SpeechTranscriber.supportedLocales
// Languages currently installed on device
let installed = await SpeechTranscriber.installedLocales
Deallocate Languages
Content truncated.