voice-note-to-midi

Convert voice notes, humming, and melodic audio recordings to quantized MIDI files using ML-based pitch detection and intelligent post-processing

Install

mkdir -p .claude/skills/voice-note-to-midi && curl -L -o skill.zip "https://mcp.directory/api/skills/download/7202" && unzip -o skill.zip -d .claude/skills/voice-note-to-midi && rm skill.zip

Installs to .claude/skills/voice-note-to-midi

About this skill

🎵 Voice Note to MIDI

Transform your voice memos, humming, and melodic recordings into clean, quantized MIDI files ready for your DAW.

What It Does

This skill provides a complete audio-to-MIDI conversion pipeline with five stages:

Stem Separation - Uses HPSS (Harmonic-Percussive Source Separation) to isolate melodic content from drums, noise, and background sounds
ML-Powered Pitch Detection - Leverages Spotify's Basic Pitch model for accurate fundamental frequency extraction
Key Detection - Automatically detects the musical key of your recording using Krumhansl-Kessler key profiles
Intelligent Quantization - Snaps notes to a configurable timing grid with optional key-aware pitch correction
Post-Processing - Applies octave pruning, overlap-based harmonic removal, and legato note merging for clean output
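To make the key detection stage concrete, here is a minimal sketch of Krumhansl-Kessler key finding: build a pitch-class histogram from the detected MIDI pitches and correlate it against the major and minor profiles rotated to each of the 12 tonics. The function names (`pearson`, `detect_key`) are hypothetical, and the histogram here counts notes rather than weighting by duration; the hum2midi script may do either.

```python
from math import sqrt

# Krumhansl-Kessler probe-tone profiles, indexed by semitones above the tonic.
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def detect_key(midi_pitches):
    """Return (tonic_name, mode) for the best-correlating key (hypothetical helper)."""
    histogram = [0.0] * 12
    for pitch in midi_pitches:
        histogram[pitch % 12] += 1.0  # assumption: count notes, ignore duration

    best = None
    for tonic in range(12):
        # Rotate so that index 0 of the histogram is the candidate tonic.
        rotated = histogram[tonic:] + histogram[:tonic]
        for mode, profile in (("major", KK_MAJOR), ("minor", KK_MINOR)):
            score = pearson(rotated, profile)
            if best is None or score > best[0]:
                best = (score, NOTE_NAMES[tonic], mode)
    return best[1], best[2]


if __name__ == "__main__":
    # A C major phrase that leans on the tonic triad (C4 = MIDI 60).
    phrase = [60, 60, 60, 60, 62, 64, 64, 65, 67, 67, 67, 69, 71]
    print(detect_key(phrase))
```

Because the score is a correlation, short or chromatic phrases can produce near-ties between a key and its relative minor or dominant, which matches the caveats in the Troubleshooting section.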

Pipeline Architecture

Audio Input (WAV/M4A/MP3)
    ↓
┌─────────────────────────────────────┐
│ Step 1: Stem Separation (HPSS)      │
│ - Isolate harmonic content          │
│ - Remove drums/percussion           │
│ - Noise gating                      │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Step 2: Pitch Detection             │
│ - Basic Pitch ML model (Spotify)    │
│ - Polyphonic note detection         │
│ - Onset/offset estimation           │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Step 3: Analysis                    │
│ - Pitch class distribution          │
│ - Key detection                     │
│ - Dominant note identification      │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Step 4: Quantization & Cleanup      │
│ - Timing grid snap                  │
│ - Key-aware pitch correction        │
│ - Octave pruning (harmonic removal) │
│ - Overlap-based pruning             │
│ - Note merging (legato)             │
│ - Velocity normalization            │
└─────────────────────────────────────┘
    ↓
MIDI Output (Standard MIDI File)
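As an illustration of the octave pruning step in Step 4, the sketch below drops notes that sit more than an octave above the median pitch. The "median+12" rule is taken from the sample output in this document; the note representation as `(start_tick, end_tick, pitch)` tuples and the function name `prune_octave_harmonics` are assumptions made for the example, not the script's actual internals.

```python
from statistics import median


def prune_octave_harmonics(notes, semitones_above_median=12):
    """Drop notes pitched above median + `semitones_above_median`.

    `notes` is a list of (start_tick, end_tick, pitch) tuples (assumed layout).
    Returns (kept_notes, threshold); threshold is None for empty input.
    """
    if not notes:
        return [], None
    threshold = median(pitch for _, _, pitch in notes) + semitones_above_median
    kept = [note for note in notes if note[2] <= threshold]
    return kept, threshold


if __name__ == "__main__":
    # Median pitch is 55 (G3 when C4 = 60), so the cutoff is 67 (G4).
    notes = [
        (0, 240, 48), (240, 480, 52), (480, 720, 55), (720, 960, 55),
        (960, 1200, 55), (1200, 1440, 57), (1440, 1680, 60),
        (1680, 1920, 67), (1920, 2160, 79),  # 79 is likely an overtone
    ]
    kept, threshold = prune_octave_harmonics(notes)
    print(threshold, len(notes) - len(kept))
```

A median-based cutoff is robust to a few stray high detections, but it will also remove genuine leaps of more than an octave, so melodies with a wide range may need the pruned notes restored by hand.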

Setup

Prerequisites

Python 3.11+ (Python 3.14+ recommended)
FFmpeg (for audio format support)
pip

Installation

Quick Install (Recommended):

cd /path/to/voice-note-to-midi
./setup.sh

This automated script will:

Check that Python 3.11+ is installed
Create the ~/melody-pipeline directory
Set up the virtual environment
Install all dependencies (basic-pitch, librosa, music21, etc.)
Download and configure the hum2midi script
Add melody-pipeline to your PATH

Manual Install:

If you prefer manual setup:

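# Create the working directory and an isolated virtual environment
mkdir -p ~/melody-pipeline
cd ~/melody-pipeline
python3 -m venv venv-bp
source venv-bp/bin/activate
# Install the pipeline's Python dependencies
pip install basic-pitch librosa soundfile mido music21
# Make the script executable (hum2midi must already be in ~/melody-pipeline)
chmod +x ~/melody-pipeline/hum2midi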

Add to your PATH (optional):

echo 'export PATH="$HOME/melody-pipeline:$PATH"' >> ~/.bashrc
source ~/.bashrc

Verify Installation

cd ~/melody-pipeline
./hum2midi --help

Usage

Basic Usage

Convert a voice memo to MIDI:

./hum2midi my_humming.wav

This creates my_humming.mid with 16th-note quantization.

Specify Output File

./hum2midi input.wav output.mid

Command-Line Options

| Option | Description | Default |
|---|---|---|
| `--grid <value>` | Quantization grid: `1/4`, `1/8`, `1/16`, `1/32` | `1/16` |
| `--min-note <ms>` | Minimum note duration in milliseconds | `50` |
| `--no-quantize` | Skip quantization (output raw Basic Pitch MIDI) | disabled |
| `--key-aware` | Enable key-aware pitch correction | disabled |
| `--no-analysis` | Skip pitch analysis and key detection | disabled |
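To show what a `--grid` value means in MIDI ticks, here is a minimal sketch of grid snapping. It assumes a resolution of 960 ticks per quarter note, which is inferred from the sample output line "Grid: 240 ticks (1/16)" rather than stated anywhere in the documentation. The helper names (`grid_ticks`, `snap`, `quantize_note`) and the rule that a note keeps at least one grid step of length are illustrative assumptions.

```python
from fractions import Fraction


def grid_ticks(grid, ppq=960):
    """Ticks per grid step. '1/16' is a sixteenth of a whole note (4 quarters)."""
    step = Fraction(grid) * 4 * ppq
    if step <= 0 or step.denominator != 1:
        raise ValueError(f"grid {grid!r} does not divide evenly at ppq={ppq}")
    return int(step)


def snap(tick, step):
    """Round a tick position to the nearest grid line (ties round up)."""
    return ((tick + step // 2) // step) * step


def quantize_note(start, end, step):
    """Snap both ends of a note, keeping it at least one grid step long."""
    new_start = snap(start, step)
    new_end = max(snap(end, step), new_start + step)
    return new_start, new_end


if __name__ == "__main__":
    step = grid_ticks("1/16")
    print(step)                            # ticks per sixteenth note
    print(quantize_note(250, 470, step))   # slightly late note pulled onto the grid
    print(quantize_note(250, 300, step))   # very short note stretched to one step
```

Coarser grids such as `1/8` hide more timing jitter but also merge rhythmic detail, so it is usually worth starting at the default and only coarsening if the result still sounds uneven.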

Usage Examples

Quantize to eighth notes

./hum2midi melody.wav --grid 1/8

Key-aware quantization (recommended for tonal music)

./hum2midi song.wav --key-aware
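Conceptually, key-aware correction moves any detected pitch that falls outside the detected scale to a neighbouring scale tone. The sketch below shows one way to do that, assuming major and natural-minor scales and a tie-break that prefers the lower neighbour; the function name `snap_to_scale` and both of those choices are assumptions, since the documentation does not describe the script's exact rule.

```python
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
MINOR_STEPS = (0, 2, 3, 5, 7, 8, 10)  # natural minor (assumed)


def snap_to_scale(pitch, tonic_pc, mode="major"):
    """Return `pitch` moved to the nearest scale tone.

    `tonic_pc` is the tonic's pitch class (C=0, C#=1, ..., G=7, ..., B=11).
    In a seven-note diatonic scale every pitch is at most one semitone
    from a scale tone, so checking offsets 0, -1, +1 is sufficient.
    """
    steps = MAJOR_STEPS if mode == "major" else MINOR_STEPS
    allowed = {(tonic_pc + step) % 12 for step in steps}
    for offset in (0, -1, 1):  # assumption: prefer the lower neighbour on ties
        if (pitch + offset) % 12 in allowed:
            return pitch + offset
    return pitch  # unreachable for diatonic scales; kept as a safe fallback


if __name__ == "__main__":
    g_major = 7
    # F4 (65) is outside G major and moves down to E4 (64); F#4 (66) is kept.
    print([snap_to_scale(p, g_major) for p in (60, 61, 65, 66, 67)])
```

Since this replaces chromatic notes with scale tones, it is best left off for melodies that use deliberate accidentals or blue notes.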

Require longer minimum notes

./hum2midi humming.wav --min-note 100

Skip analysis for faster processing

./hum2midi quick.wav --no-analysis

Combine options

./hum2midi recording.wav output.mid --grid 1/8 --key-aware --min-note 80

Processing MIDI Input

You can also process existing MIDI files through the quantization pipeline:

./hum2midi input.mid output.mid --grid 1/16 --key-aware

This skips the audio processing steps and goes directly to analysis and quantization.

Sample Output

═══════════════════════════════════════════════════════════════
  hum2midi - Melody-to-MIDI Pipeline (Basic Pitch Edition)
  [Key-Aware Mode Enabled]
═══════════════════════════════════════════════════════════════

Input:  my_humming.wav
Output: my_humming.mid

→ Step 1: Stem Separation (HPSS)
  Isolating melodic content...
  Loaded: 5.23s @ 44100Hz
  ✓ Melody stem extracted → 5.23s

→ Step 2: Audio-to-MIDI Conversion (Basic Pitch)
  Running Spotify's Basic Pitch ML model on melody stem...
  ✓ Raw MIDI generated (Basic Pitch)

→ Step 3: Pitch Analysis & Key Detection
  Notes detected: 42 total, 7 unique
  Note range: C3 - G4
  Pitch classes: C3, E3, G3, A3, C4, D4, G4
  Dominant note: G3 (23.8% of notes)
  Detected key: G major

→ Step 4: Quantization & Cleanup
  Octave pruning: removed 3 harmonic notes above 67 (median+12)
  Overlap pruning: removed 2 harmonic notes at overlapping positions
  Note merging: merged 5 staccato chunks into legato notes (gap<=60 ticks)
  Grid:   240 ticks (1/16)
  Notes:  38 notes
  Key:    G major
  Key-aware: 2 notes corrected to scale
  Tempo:  120 BPM
  ✓ Quantized MIDI saved

═══════════════════════════════════════════════════════════════
  ✓ Done! Output: my_humming.mid
═══════════════════════════════════════════════════════════════

📊 ANALYSIS SUMMARY
─────────────────────────────────────────────────────────────
  Detected Notes: C3, E3, G3, A3, C4, D4, G4
  Detected Key:   G major
  Quantization:   Key-aware mode (notes snapped to scale)

MIDI Info: 38 notes, 7 unique pitches, 120 BPM
Pitches: C3, E3, G3, A3, C4, D4, G4
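The "Note merging" line in the log above can be pictured as follows: consecutive notes of the same pitch are joined when the silence between them is at most 60 ticks. This is only a sketch under that reading of the log; the tuple layout `(start_tick, end_tick, pitch)` and the function name `merge_legato` are hypothetical, and the real script may apply additional conditions.

```python
def merge_legato(notes, max_gap=60):
    """Merge same-pitch notes separated by a gap of at most `max_gap` ticks.

    `notes` is a list of (start_tick, end_tick, pitch) tuples (assumed layout).
    Returns a new list ordered by the start of each merged note.
    """
    merged = []
    last_index_for_pitch = {}  # pitch -> index of its most recent note in `merged`
    for start, end, pitch in sorted(notes):
        index = last_index_for_pitch.get(pitch)
        if index is not None and start - merged[index][1] <= max_gap:
            # Close enough to the previous note of this pitch: extend it.
            prev_start, prev_end, _ = merged[index]
            merged[index] = (prev_start, max(prev_end, end), pitch)
        else:
            last_index_for_pitch[pitch] = len(merged)
            merged.append((start, end, pitch))
    return merged


if __name__ == "__main__":
    notes = [
        (0, 200, 55),    # G3 chunk
        (240, 480, 55),  # 40-tick gap: merged into the chunk above
        (480, 720, 57),  # different pitch: kept separate
        (600, 700, 55),  # 120-tick gap from the merged G3: kept separate
    ]
    print(merge_legato(notes))
```

A small gap threshold keeps intentionally repeated notes distinct while still repairing sustained notes that the pitch detector split into fragments.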

Notes & Limitations

Audio Quality Matters

Clear, loud melody produces the best results
Background noise can cause false note detection
Reverb and effects may confuse pitch detection
Close-mic'd vocals work significantly better than room recordings

Musical Considerations

Monophonic sources work best (single melody line)
Polyphonic audio (chords, multiple instruments) will produce messy results
Vibrato and pitch bends may be quantized to stepped pitches
Rapid note passages may be missed or merged

Technical Limitations

Tempo is fixed at 120 BPM in output (time positions are preserved, but tempo may need adjustment in your DAW)
Note velocities are normalized but may need manual adjustment
Notes shorter than 50 ms are filtered out by default (adjust with --min-note)
Extreme pitch ranges may cause octave detection issues
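Because the output tempo is fixed at 120 BPM, tick positions map to seconds by a simple formula, which is useful when retiming the file in a DAW. The sketch below assumes 960 ticks per quarter note (inferred from the sample output, where a 1/16 grid is 240 ticks); the function names are hypothetical.

```python
def ticks_to_seconds(ticks, bpm=120, ppq=960):
    """Convert MIDI ticks to seconds at a constant tempo."""
    return ticks * 60 / (bpm * ppq)


def seconds_to_ticks(seconds, bpm=120, ppq=960):
    """Convert seconds to the nearest whole MIDI tick at a constant tempo."""
    return round(seconds * bpm / 60 * ppq)


if __name__ == "__main__":
    # One 1/16 grid step at the fixed output tempo.
    print(ticks_to_seconds(240))
    # The default 50 ms minimum note length, expressed in ticks.
    print(seconds_to_ticks(0.05))
    # The same moment in time re-expressed for a project running at 90 BPM.
    print(seconds_to_ticks(ticks_to_seconds(240), bpm=90))
```

If the original performance was not at 120 BPM, the notes still land at the right times in seconds, but they will not line up with bar lines until the project tempo is changed and the notes are re-aligned to the new grid.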

Post-Processing Recommendations

After generating MIDI, you may want to:

Import into your DAW and adjust tempo to match your original recording
Quantize further if stricter timing is needed
Adjust note velocities for dynamics
Apply swing/groove templates if the rigid grid sounds too mechanical
Edit individual notes that were misdetected (common with fast runs)

Supported Audio Formats

Input formats supported via FFmpeg:

WAV, AIFF, FLAC (lossless, best quality)
MP3, M4A, AAC (lossy, acceptable)
OGG, OPUS (open formats)
Most other formats FFmpeg supports

Troubleshooting

No notes detected

Check that the input file isn't silent or corrupted
Try lowering the --min-note threshold so that short notes are not filtered out
Verify audio has clear melodic content (not just noise)

Too many notes / messy output

Octave pruning and overlap pruning are on by default; check the Step 4 log to confirm they removed notes
Use --key-aware to constrain to musical scale
Check for background noise in source audio

Wrong key detected

Key detection works best with at least 8-10 measures of music
Chromatic passages may confuse the detector
Manually review and adjust in your DAW if needed

Notes in wrong octave

Basic Pitch sometimes detects harmonics instead of fundamentals
The pipeline includes pruning, but some may slip through
Use your DAW's transpose function for simple octave shifts

References

Basic Pitch - Spotify's polyphonic pitch detection model
librosa HPSS - Harmonic-Percussive Source Separation
Krumhansl-Kessler Key Profiles - Key detection algorithm

License

This skill integrates Basic Pitch by Spotify, which is licensed under Apache 2.0. The pipeline script and documentation are provided under the MIT license.


Author

openclaw
