lecture-transcript-slide-matcher

Combines YouTube lecture transcripts with PDF slides to create an interactive HTML page. Matches each slide to corresponding transcript segments, organized by key concepts. Use when users want to create synchronized lecture notes from transcript text files and slide PDFs.

Install

mkdir -p .claude/skills/lecture-transcript-slide-matcher && curl -L -o skill.zip "https://mcp.directory/api/skills/download/485" && unzip -o skill.zip -d .claude/skills/lecture-transcript-slide-matcher && rm skill.zip

Installs to .claude/skills/lecture-transcript-slide-matcher

About this skill

Lecture Transcript and Slide Matcher

Combines YouTube lecture transcripts (txt files) with corresponding PDF slides to create an interactive HTML page with synchronized content organized by key concepts.

Overview

This skill processes lecture materials and generates an HTML page with:

Left-hand table of contents (TOC) with key concepts
Main content area with slides and transcript segments for each concept
Automatic transcript cleaning (removes fillers, formats paragraphs)
Visual separation between sections

Workflow

The matching process involves these steps:

Convert transcript - Standardize timestamp format in transcript
Analyze content - Extract information from transcript and PDF
Create mapping - Match concepts to slides and transcript segments
Generate HTML - Produce the final interactive page

Step 1: Convert Transcript

Run the conversion script to standardize the transcript timestamp format:

python scripts/convert_transcript.py <transcript_input.txt> <transcript_output.txt>

This script:

Reads timestamps from separate lines
Converts them to [MM:SS] or [H:MM:SS] format
Attaches timestamps inline with text
Outputs a new transcript text file
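The conversion described above can be sketched as follows. This is a minimal illustration, not the code in scripts/convert_transcript.py: the raw layout (a bare timestamp such as 0:15 on its own line, followed by the spoken text) and the function names are assumptions.

```python
import re

# Assumed raw layout: a bare timestamp ("0:15", "12:34", "1:02:03") on its own
# line, followed by one or more lines of text belonging to that timestamp.
TIMESTAMP_LINE = re.compile(r"^(?:(\d+):)?(\d{1,2}):(\d{2})$")


def normalize_timestamp(raw):
    """Return "[MM:SS]" or "[H:MM:SS]" for a bare timestamp string."""
    match = TIMESTAMP_LINE.match(raw.strip())
    if match is None:
        raise ValueError(f"not a timestamp: {raw!r}")
    hours, minutes, seconds = match.groups()
    if hours is None:
        return f"[{int(minutes):02d}:{seconds}]"
    return f"[{int(hours)}:{int(minutes):02d}:{seconds}]"


def inline_timestamps(raw_text):
    """Attach each stand-alone timestamp to the text lines that follow it."""
    entries = []  # each item: [normalized stamp, list of text lines]
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP_LINE.match(line):
            entries.append([normalize_timestamp(line), []])
        elif entries:  # text before the first timestamp is dropped
            entries[-1][1].append(line)
    return "\n".join(f"{stamp} {' '.join(parts)}" for stamp, parts in entries if parts)
```

Calling inline_timestamps on the text "0:15", "Welcome to the lecture" (two lines) yields the single line "[00:15] Welcome to the lecture".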

Step 2: Analyze Content

Run the analysis script to understand the lecture materials:

python scripts/analyze_content.py <transcript.txt> <slides.pdf> [output_analysis.json]

This script:

Parses all transcript segments with timestamps
Extracts text previews from each PDF slide
Creates a mapping template
Outputs content_analysis.json with all information
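As a rough picture of the transcript-parsing part of this step, the sketch below splits a converted transcript into timestamped segments. The function name and the keys of the returned dictionaries are hypothetical; the real layout of content_analysis.json is defined by scripts/analyze_content.py and may differ.

```python
import re

# Matches "[MM:SS] text", "[H:MM:SS] text" or "[HH:MM:SS] text".
SEGMENT = re.compile(r"^\[(\d{1,2}(?::\d{2}){1,2})\]\s*(.*)$")


def parse_segments(transcript_text):
    """Split a transcript into {"timestamp", "text"} dictionaries (hypothetical keys)."""
    segments = []
    for line in transcript_text.splitlines():
        line = line.strip()
        match = SEGMENT.match(line)
        if match:
            segments.append({"timestamp": match.group(1), "text": match.group(2)})
        elif segments and line:
            # A line without a timestamp continues the previous segment.
            segments[-1]["text"] += " " + line
    return segments
```

The resulting list can be written out with json.dump for review alongside the slide previews.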

What to do:

Run the analysis script
Review the output JSON file
Examine transcript segments and slide previews
Identify the key concepts in the lecture

Step 3: Create Mapping

Create a mapping.json file that connects concepts to slides and transcript segments.

Option A: Let Claude create the mapping

After running the analysis script, ask Claude to create the mapping by providing:

The content_analysis.json file
The original transcript file (for full text)
Instructions on how to identify key concepts

Claude will analyze the content and create a comprehensive mapping.

Option B: Manual creation

Use the template in content_analysis.json as a starting point. See references/mapping_schema.md for complete documentation.

Mapping Structure

[
  {
    "title": "Key concept or insight",
    "slide_indices": [0, 1, 2],
    "transcript_segments": [
      {
        "start_time": "MM:SS or HH:MM:SS",
        "end_time": "MM:SS or HH:MM:SS",
        "text": "Full transcript text from this time range"
      }
    ]
  }
]

Key points:

Use 0-based indexing for slides (first slide = 0)
Timestamps must match those in the transcript ([MM:SS] or [HH:MM:SS]); the mapping examples in this document write the values without the square brackets
Include full transcript text, not summaries
Each TOC item represents one coherent concept
Multiple slides and transcript segments can map to one concept

See references/mapping_schema.md for detailed schema documentation and examples.
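Before generating the HTML, it can help to check a mapping against the key points above. The following is a minimal sketch of such a check, assuming only the three fields shown in the mapping structure; validate_mapping is a hypothetical helper and not part of the skill's scripts.

```python
import re

# MM:SS, H:MM:SS or HH:MM:SS, without brackets, as in the mapping examples.
TIME = re.compile(r"^\d{1,2}(:\d{2}){1,2}$")


def validate_mapping(mapping, slide_count):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    for number, entry in enumerate(mapping):
        label = f"entry {number}"
        if not entry.get("title"):
            problems.append(f"{label}: missing title")
        for index in entry.get("slide_indices", []):
            # Slide indices are 0-based, so the valid range is 0..slide_count-1.
            if not isinstance(index, int) or not 0 <= index < slide_count:
                problems.append(f"{label}: slide index {index!r} out of range")
        for segment in entry.get("transcript_segments", []):
            for key in ("start_time", "end_time"):
                if not TIME.match(str(segment.get(key, ""))):
                    problems.append(f"{label}: bad {key} {segment.get(key)!r}")
            if not segment.get("text", "").strip():
                problems.append(f"{label}: empty transcript text")
    return problems
```

Load mapping.json with json.load, pass the number of pages in the PDF as slide_count, and fix anything reported before running the generation step.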

Step 4: Generate HTML

Run the generation script to create the final HTML page:

python scripts/match_lecture_content.py <transcript.txt> <slides.pdf> <mapping.json> [output.html]

The script:

Parses the transcript and extracts all segments
Converts PDF pages to images (embedded as base64)
Reads the mapping JSON
Generates an interactive HTML page with:
- Left panel with TOC (clickable navigation)
- Main area with sections for each concept
- Slides displayed as images
- Cleaned and formatted transcript segments
- Visual separation between sections

Output: lecture_output.html (or specified filename)
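The page assembly described above can be illustrated with a much-simplified sketch. It assumes the slides have already been rendered to PNG bytes (the skill uses PyMuPDF for that step) and omits all styling and scripting; the function names, element ids and class names are hypothetical rather than taken from scripts/match_lecture_content.py.

```python
import base64
import html


def image_data_uri(png_bytes):
    """Encode PNG bytes as a data URI so the page needs no external image files."""
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")


def render_section(number, entry, slide_images):
    """Render one concept: heading, its slides, then its transcript segments."""
    parts = [f'<section id="concept-{number}">', f"<h2>{html.escape(entry['title'])}</h2>"]
    for index in entry["slide_indices"]:  # 0-based, as in mapping.json
        uri = image_data_uri(slide_images[index])
        parts.append(f'<img src="{uri}" alt="Slide {index + 1}">')
    for segment in entry["transcript_segments"]:
        badge = html.escape(f"{segment['start_time']} - {segment['end_time']}")
        parts.append(f'<p><span class="time">{badge}</span> {html.escape(segment["text"])}</p>')
    parts.append("</section><hr>")
    return "".join(parts)


def render_page(mapping, slide_images):
    """Build the page: TOC links in a nav element, one section per concept."""
    toc = "".join(
        f'<li><a href="#concept-{n}">{html.escape(e["title"])}</a></li>'
        for n, e in enumerate(mapping)
    )
    body = "".join(render_section(n, e, slide_images) for n, e in enumerate(mapping))
    return f"<!DOCTYPE html><html><body><nav><ul>{toc}</ul></nav><main>{body}</main></body></html>"
```

Embedding images as base64 keeps the output a single portable file, at the cost of a larger HTML document.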

Transcript Format Requirements

The transcript must use timestamp markers:

[00:15] Welcome to today's lecture on machine learning.
[00:45] We'll start by discussing supervised learning...
[02:30] Now let's look at an example with house prices...

Supported timestamp formats:

[HH:MM:SS] - Hours, minutes, seconds
[MM:SS] - Minutes, seconds
[H:MM:SS] - Single-digit hours
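All three formats reduce to a number of seconds, which makes timestamps easy to compare and sort. A small sketch of that conversion (timestamp_to_seconds is a hypothetical helper, not a function from the skill's scripts):

```python
def timestamp_to_seconds(stamp):
    """Convert "[MM:SS]", "[H:MM:SS]" or "[HH:MM:SS]" (brackets optional) to seconds."""
    parts = stamp.strip().strip("[]").split(":")
    if len(parts) not in (2, 3) or not all(part.isdigit() for part in parts):
        raise ValueError(f"unsupported timestamp: {stamp!r}")
    seconds = 0
    for part in parts:
        # Each field is worth 60 times the field to its right.
        seconds = seconds * 60 + int(part)
    return seconds
```

For example, "[02:30]" converts to 150 and "[1:02:03]" to 3723.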

Automatic Transcript Cleaning

The script automatically:

Removes filler words (um, uh, like, you know, etc.)
Removes conversational artifacts ([inaudible], [laughter], etc.)
Condenses multiple spaces
Breaks text into readable paragraphs (50 words per paragraph)
Displays only start and end timestamps for continuous segments
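The cleaning rules can be approximated with a few regular expressions. The sketch below is an assumption about how such cleaning might work, not the script's actual rules; it deliberately leaves out "like", since removing it blindly would also damage phrases such as "looks like".

```python
import re

# Illustrative filler and artifact lists; the script's real lists may be longer.
FILLERS = re.compile(r"\b(?:um|uh|you know)\b,?\s*", re.IGNORECASE)
ARTIFACTS = re.compile(r"\[(?:inaudible|laughter)\]", re.IGNORECASE)


def clean_transcript(text, words_per_paragraph=50):
    """Strip artifacts and fillers, condense whitespace, and split into paragraphs."""
    text = ARTIFACTS.sub(" ", text)
    text = FILLERS.sub("", text)
    text = re.sub(r"\s+", " ", text).strip()
    words = text.split()
    return [
        " ".join(words[start:start + words_per_paragraph])
        for start in range(0, len(words), words_per_paragraph)
    ]
```

Each returned string is one paragraph of at most words_per_paragraph words.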

HTML Output Features

Table of Contents (Left Panel)

Clickable items for navigation
Highlights current section on scroll
Fixed width, scrollable
Responsive (collapses on mobile)

Content Area

One section per TOC item
Section title as header
Slides displayed as images
Transcript segments below slides
Time range badges for each segment
Visual separators between sections
Smooth scrolling

Styling

Clean, professional appearance
Blue accent colors
Readable typography
Shadow effects for slides
Highlighted transcript containers

Best Practices

Identifying Key Concepts

Good concept granularity:

"Linear Regression: Mathematical Formulation"
"Gradient Descent Algorithm"
"Neural Networks: Forward Propagation"

Too broad:

"Machine Learning Overview" (entire lecture)

Too narrow:

"Definition of Theta" (single term)

Creating Effective Mappings

One concept per TOC item: Each entry should represent one coherent idea
Logical ordering: Follow lecture sequence
Complete coverage: Include all major concepts
Accurate alignment: Ensure slides and transcript truly correspond
Full transcript text: Don't summarize; include everything from the time range
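To fill the text field with everything from a time range rather than a summary, the matching segments can be collected programmatically. A minimal sketch, assuming segments are (timestamp, text) pairs and that both ends of the range are inclusive; both helpers are hypothetical.

```python
def to_seconds(stamp):
    """Convert "MM:SS" or "HH:MM:SS" (brackets optional) to seconds."""
    seconds = 0
    for part in stamp.strip("[]").split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def text_for_range(segments, start_time, end_time):
    """Join the text of every segment whose timestamp lies in [start_time, end_time]."""
    start, end = to_seconds(start_time), to_seconds(end_time)
    return " ".join(
        text for stamp, text in segments if start <= to_seconds(stamp) <= end
    )
```

With segments at 00:15, 00:45 and 02:30, the range 00:15 to 00:45 returns the text of the first two joined by a space.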

Handling Edge Cases

Concept spans non-contiguous slides:

{
  "title": "Example: Housing Price Prediction",
  "slide_indices": [5, 8, 12],
  "transcript_segments": [...]
}

Multiple transcript segments per concept:

{
  "title": "Backpropagation",
  "slide_indices": [15],
  "transcript_segments": [
    {"start_time": "20:00", "end_time": "22:30", "text": "..."},
    {"start_time": "23:00", "end_time": "25:45", "text": "..."}
  ]
}

No slides for a concept (discussion only):

{
  "title": "Q&A: Common Misconceptions",
  "slide_indices": [],
  "transcript_segments": [...]
}

Dependencies

The scripts require PyMuPDF for PDF processing:

pip install pymupdf --break-system-packages

Claude handles installation automatically when needed.

Example Usage

Complete workflow example:

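# Step 1: Convert the raw transcript (file names here are placeholders)
python scripts/convert_transcript.py raw_transcript.txt lecture.txt

# Step 2: Analyze
python scripts/analyze_content.py lecture.txt slides.pdf analysis.json

# Step 3: Create mapping (manually or with Claude's help)
# Start from the template in analysis.json or create a new mapping.json

# Step 4: Generate HTML
python scripts/match_lecture_content.py lecture.txt slides.pdf mapping.json output.html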

Reference Files

references/mapping_schema.md - Complete JSON schema documentation with examples
references/example_mapping.json - Sample mapping for a machine learning lecture

Troubleshooting

"PyMuPDF not installed": Run pip install pymupdf --break-system-packages

Timestamps don't match: Ensure timestamps in mapping.json exactly match those in the transcript file.

Slides not displaying: Verify slide_indices are 0-based (first slide = 0, not 1).

Text looks messy: The cleaning is automatic. If issues persist, check for unusual formatting in the transcript.

Missing concepts: Review the analysis output to ensure all relevant transcript segments and slides are covered.

Author

az9713
