transcript-fixer

Name: transcript-fixer
Author: daymade

Corrects speech-to-text transcription errors in meeting notes, lectures, and interviews using dictionary rules and AI. Learns patterns to build personalized correction databases. Use when working with transcripts containing ASR/STT errors, homophones, or Chinese/English mixed content requiring cleanup.

Install

mkdir -p .claude/skills/transcript-fixer && curl -L -o skill.zip "https://mcp.directory/api/skills/download/4280" && unzip -o skill.zip -d .claude/skills/transcript-fixer && rm skill.zip

Installs to .claude/skills/transcript-fixer

Transcript Fixer

Correct speech-to-text transcription errors through dictionary-based rules, AI-powered corrections, and automatic pattern detection. Build a personalized knowledge base that learns from each correction.

When to Use This Skill

Correcting ASR/STT errors in meeting notes, lectures, or interviews
Building domain-specific correction dictionaries
Fixing Chinese/English homophone errors or technical terminology
Collaborating on shared correction knowledge bases

Prerequisites

Python execution must use uv - never use system Python directly.

If uv is not installed:

# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows PowerShell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Quick Start

Default: Native AI Correction (no API key needed)

When invoked from Claude Code, the skill uses a two-phase approach:

Dictionary phase (script): Apply 700+ learned correction rules instantly
AI phase (Claude native): Claude reads the text directly and fixes ASR errors, adds paragraph breaks, removes filler words

# First time: Initialize database
uv run scripts/fix_transcription.py --init

# Phase 1: Dictionary corrections (instant, free)
uv run scripts/fix_transcription.py --input meeting.md --stage 1

After Stage 1, Claude should:

Read the Stage 1 output in ~3000-char chunks
Identify ASR errors (homophones, technical terms, broken sentences)
Present corrections in a table for user review (high/medium confidence)
Apply confirmed corrections and save stable patterns to dictionary
Optionally: add paragraph breaks and remove excessive filler words

Alternative: API-Based Batch Processing (for automation or large volumes):

# Set API key for automated AI corrections
export GLM_API_KEY="<api-key>"  # From https://open.bigmodel.cn/

# Run full pipeline (dict + API AI + diff report)
uv run scripts/fix_transcript_enhanced.py input.md --output ./corrected

Timestamp repair:

uv run scripts/fix_transcript_timestamps.py meeting.txt --in-place

Split transcript into sections and rebase each section to 00:00:00:

uv run scripts/split_transcript_sections.py meeting.txt \
  --first-section-name "课前聊天" \
  --section "正式上课::好，无缝切换嘛。对。那个曹总连上了吗？那个网页。" \
  --section "课后复盘::我们复盘一下。" \
  --rebase-to-zero

Output files:

*_stage1.md - Dictionary corrections applied
*_corrected.txt (native mode) or *_stage2.md (API mode) - Final version
*_对比.html - Visual diff (open in browser for best experience)

Generate word-level diff (recommended for reviewing corrections):

uv run scripts/generate_word_diff.py original.md corrected.md output.html

This creates an HTML file showing word-by-word differences with clear highlighting:

🔴 japanese 3 pro → 🟢 Gemini 3 Pro (complete word replacements)
Easy to spot exactly what changed without character-level noise
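The idea behind a word-level diff can be sketched with Python's standard difflib. This is only an illustration, not the implementation of generate_word_diff.py: the function name word_changes is hypothetical, and splitting on whitespace is an assumption that suits English text but not unsegmented Chinese.

```python
import difflib


def word_changes(original: str, corrected: str) -> list[tuple[str, str]]:
    """Return (old, new) pairs for every run of words that differs."""
    # Assumption: whitespace tokenization. Chinese text would need a real segmenter.
    a, b = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


if __name__ == "__main__":
    for old, new in word_changes(
        "we tested japanese 3 pro today",
        "we tested Gemini 3 Pro today",
    ):
        print(f"{old!r} -> {new!r}")
```

Comparing whole tokens rather than characters is what keeps the report free of character-level noise: an unchanged token such as "3" stays out of the change list.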

Example Session

Input transcript (meeting.md):

今天我们讨论了巨升智能的最新进展。
股价系统需要优化，目前性能不够好。

After Stage 1 (meeting_stage1.md):

今天我们讨论了具身智能的最新进展。  ← "巨升"→"具身" corrected
股价系统需要优化，目前性能不够好。  ← Unchanged (not in dictionary)

After Stage 2 (meeting_stage2.md):

今天我们讨论了具身智能的最新进展。
框架系统需要优化，目前性能不够好。  ← "股价"→"框架" corrected by AI

Learned pattern detected:

✓ Detected: "股价" → "框架" (confidence: 85%, count: 1)
  Run --review-learned after 2 more occurrences to approve

Core Workflow

The two-phase pipeline stores its corrections in ~/.transcript-fixer/corrections.db:

Initialize (first time): uv run scripts/fix_transcription.py --init
Add domain corrections: --add "错误词" "正确词" --domain <domain>
Phase 1 — Dictionary: --input file.md --stage 1 (instant, free)
Phase 2 — AI Correction: Claude reads output and fixes ASR errors natively (default), or use --stage 3 with GLM_API_KEY for API mode
Save stable patterns: --add "错误词" "正确词" after each fix session
Review learned patterns: --review-learned and --approve high-confidence suggestions

Domains: general, embodied_ai, finance, medical, or custom names including Chinese (e.g., 火星加速器, 具身智能)

Learning: Patterns appearing ≥3 times at ≥80% confidence move from AI to dictionary
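The learning threshold can be read as a simple predicate. The sketch below only restates the rule given here (at least 3 occurrences at 80% confidence or more). The function and parameter names are hypothetical and are not taken from the skill's scripts.

```python
def ready_for_dictionary(
    count: int,
    confidence: float,
    min_count: int = 3,
    min_confidence: float = 0.80,
) -> bool:
    """True when a learned pattern meets both promotion thresholds."""
    return count >= min_count and confidence >= min_confidence


if __name__ == "__main__":
    # The pattern from the example session: 85% confidence, seen once so far.
    print(ready_for_dictionary(count=1, confidence=0.85))
    # The same pattern after two more occurrences.
    print(ready_for_dictionary(count=3, confidence=0.85))
```

Under this reading, the "股价" → "框架" pattern from the example session (confidence 85%, count 1) becomes eligible for review only after two more occurrences.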

See references/workflow_guide.md for detailed workflows, references/script_parameters.md for complete CLI reference, and references/team_collaboration.md for collaboration patterns.

Critical Workflow: Dictionary Iteration

Save stable, reusable ASR patterns after each fix. This is the skill's core value.

After fixing errors manually, immediately save stable corrections to dictionary:

uv run scripts/fix_transcription.py --add "错误词" "正确词" --domain general

Do not save one-off deletions, ambiguous context-only rewrites, or section-specific cleanup to the dictionary.

See references/iteration_workflow.md for complete iteration guide with checklist.

FALSE POSITIVE RISKS -- READ BEFORE ADDING CORRECTIONS

Dictionary-based corrections are powerful but dangerous. Adding the wrong rule silently corrupts every future transcript. The --add command runs safety checks automatically, but you must understand the risks.

What is safe to add

ASR-specific gibberish: "巨升智能" -> "具身智能" (no real word sounds like "巨升智能")
Long compound errors: "语音是别" -> "语音识别" (4+ chars, unlikely to collide)
English transliteration errors: "japanese 3 pro" -> "Gemini 3 Pro"

What is NEVER safe to add

Common Chinese words: "仿佛", "正面", "犹豫", "传说", "增加", "教育" -- these appear correctly in normal text. Replacing them corrupts transcripts from better ASR models.
Words <=2 characters: Almost any 2-char Chinese string is a valid word or part of one. "线数" inside "产线数据" becomes "产线束据".
Both sides are real words: "仿佛->反复", "犹豫->抑郁" -- both forms are valid Chinese. The "error" is only an error for one specific ASR model.
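The three warning signs above can be expressed as a small checker. This is an illustration of the guidelines only, not the safety checks that --add actually runs. The names risk_flags and COMMON_WORDS are hypothetical, and the word set is a tiny sample standing in for a real lexicon.

```python
# Hypothetical sample lexicon; a real check would need a full word list.
COMMON_WORDS = {"仿佛", "反复", "犹豫", "抑郁", "正面", "传说", "增加", "教育"}


def risk_flags(wrong: str, right: str, common_words: set[str] = COMMON_WORDS) -> list[str]:
    """Return the false-positive warning signs that apply to a proposed rule."""
    flags = []
    if len(wrong) <= 2:
        # Short strings are likely to occur inside longer valid words.
        flags.append("short")
    if wrong in common_words:
        # The "error" form is itself a normal word.
        flags.append("common_word")
    if wrong in common_words and right in common_words:
        # Both forms are valid, so only context can tell them apart.
        flags.append("both_real_words")
    return flags


if __name__ == "__main__":
    print(risk_flags("巨升智能", "具身智能"))
    print(risk_flags("线数", "线束"))
    print(risk_flags("仿佛", "反复"))
```

A rule that raises any flag is a candidate for a context rule rather than a plain dictionary entry.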

When in doubt, use a context rule instead

Context rules use regex patterns that match only in specific surroundings, avoiding false positives:

# Instead of: --add "线数" "线束"
# Use a context rule in the database:
sqlite3 ~/.transcript-fixer/corrections.db "INSERT INTO context_rules (pattern, replacement, description, priority) VALUES ('(?<!产)线数(?!据)', '线束', 'ASR: 线数->线束 (not inside 产线数据)', 10);"
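To see why the lookbehind and lookahead matter, the same pattern can be tried in Python. The sketch assumes the rule is evaluated with a regex engine compatible with Python's re module, and the sample sentence is invented for illustration.

```python
import re

# Invented sample: one real ASR error ("线数") and one valid word ("产线数据").
TEXT = "检查线数是否完好，再导出产线数据。"

# Same pattern as the context rule above.
CONTEXT_RULE = re.compile(r"(?<!产)线数(?!据)")


def naive_fix(text: str) -> str:
    """Plain dictionary-style replacement: rewrites every occurrence."""
    return text.replace("线数", "线束")


def contextual_fix(text: str) -> str:
    """Context rule: skips matches preceded by 产 or followed by 据."""
    return CONTEXT_RULE.sub("线束", text)


if __name__ == "__main__":
    print(naive_fix(TEXT))
    print(contextual_fix(TEXT))
```

The plain replacement corrupts "产线数据" into "产线束据", while the context rule fixes only the standalone "线数".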

Auditing the dictionary

Run --audit periodically to scan all rules for false positive risks:

uv run scripts/fix_transcription.py --audit
uv run scripts/fix_transcription.py --audit --domain manufacturing

Forcing a risky addition

If you understand the risks and still want to add a flagged rule:

uv run scripts/fix_transcription.py --add "仿佛" "反复" --domain general --force

Native AI Correction (Default Mode)

Claude IS the AI. When running inside Claude Code, use Claude's own language understanding for Stage 2 corrections instead of calling an external API. This is the default behavior — no API key needed.

Workflow

Run Stage 1 (dictionary): uv run scripts/fix_transcription.py --input file.md --stage 1
Read the text in ~3000-character chunks (use cut -c<start>-<end> for single-line files)
Identify ASR errors — look for:
- Homophone errors (同音字): "上海文" → "上下文", "扩种" → "扩充"
- Broken sentence boundaries: "很大程。路上" → "很大程度上"
- Technical terms: "Web coding" → "Vibe Coding"
- Missing/extra characters: "沉沉默" → "沉默"
Present corrections in a table with confidence levels before applying:
- High confidence: clear ASR errors with unambiguous corrections
- Medium confidence: context-dependent, need user confirmation
Apply corrections to a copy of the file (never modify the original)
Save stable patterns to dictionary: --add "错误词" "正确词" --domain general
Generate word diff: uv run scripts/generate_word_diff.py original.md corrected.md diff.html
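The chunked reading in step 2 can also be done in Python when cut is inconvenient. This is a minimal sketch under stated assumptions: the helper name chunk_text is hypothetical, and chunks are fixed-size with no attempt to break at sentence boundaries.

```python
def chunk_text(text: str, size: int = 3000) -> list[str]:
    """Split text into consecutive chunks of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


if __name__ == "__main__":
    sample = "好" * 7000  # stand-in for a long single-line transcript
    chunks = chunk_text(sample)
    print([len(c) for c in chunks])
```

Python slices strings by Unicode code point, so a Chinese character is never split across two chunks.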

Enhanced AI Capabilities (Native Mode Only)

Native mode can do things the API mode cannot:

Intelligent paragraph breaks: Add \n\n at logical topic transitions in continuous text
Filler word reduction: Remove excessive repetition (这个这个这个 → 这个, 都都都都 → 都)
Interactive review: Present corrections for user confirmation before applying
Context-aware judgment: Use full document context to resolve ambiguous errors
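For the simplest filler cases, the transformation looks like the regex sketch below. It is a naive illustration only: native mode relies on Claude's judgment rather than a pattern, and the function name collapse_repeats and the rule of "three or more repeats of a one- or two-character unit" are assumptions.

```python
import re

# A one- or two-character CJK unit followed by at least two more copies of itself.
_REPEATS = re.compile(r"([\u4e00-\u9fff]{1,2}?)\1{2,}")


def collapse_repeats(text: str) -> str:
    """Collapse runs of three or more identical short CJK units to one copy."""
    return _REPEATS.sub(r"\1", text)


if __name__ == "__main__":
    print(collapse_repeats("这个这个这个我觉得都都都都可以"))
```

A pattern like this cannot tell stutter from deliberate repetition, which is why the source reserves this step for context-aware review.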

When to Use API Mode Instead

Use GLM_API_KEY + Stage 3 for:

Batch processing of multiple files in automation
Standalone script usage when Claude Code is not available
Consistent, reproducible processing without interactive review

Legacy Fallback Marker

When the script outputs [CLAUDE_FALLBACK] (GLM API error), switch to native mode automatically.

Database Operations

MUST read references/database_schema.md before any database operations.

Quick reference:

# View all corrections
sqlite3 ~/.transcript-fixer/corrections.db "SELECT * FROM active_corrections;"

# Check schema version
sqlite3 ~/.transcript-fixer/corrections.db "SELECT value FROM system_config WHERE key='schema_version';"

Stages

Stage	Description	Speed	Cost
1	Dictionary only	Instant	Free
1 + Native	Dictionary + Claude AI (default)	~1min	Free
3	Dictionary + API AI + diff report	~10s	API calls

Bundled Resources

Scripts:

ensure_deps.py - Initialize shared virtual environment (run once, optional)
fix_transcript_enhanced.py - Enhanced wrapper (recommended for interactive use)
fix_transcription.py - Core CLI (for automation)
fix_transcript_timestamps.py - Normalize/repair speaker timestamps and optionally rebase to zero
generate_word_diff.py - Generate word-level diff HTML for reviewing corrections
split_transcript_sections.py - Split a transcript by marker phrases and optionally rebase each section
`exa

Content truncated.
