ocr-image-to-markdown

Name: ocr-image-to-markdown
Author: hugohe3

41 views

10 installs

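Because no local OCR tool is available, this skill uses the Agent's multimodal capabilities to view images (PNG, JPG, etc.) and transcribe their content (text, tables, logic diagrams) into formatted Markdown.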

Install

mkdir -p .claude/skills/ocr-image-to-markdown && curl -L -o skill.zip "https://mcp.directory/api/skills/download/1744" && unzip -o skill.zip -d .claude/skills/ocr-image-to-markdown && rm skill.zip

Installs to .claude/skills/ocr-image-to-markdown

About this skill

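OCR Image Recognition to Markdown

This skill lets you "read" an image and convert its content into editable Markdown text. It is especially useful for extracting data tables, slide content, or document screenshots, particularly when external OCR libraries cannot be used.

Usage Guide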

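Identify the target image:
- Locate the image file you need to process.
- If needed, use list_dir to browse the directory.
View the image:
- Use the view_file tool to "see" the image content. The system lets you process image data directly.
- Key: you must call view_file on the image path so that your vision model can ingest it.
Transcribe the content:
- Based on what you see, transcribe the text into Markdown.
- Tables: convert any table you see into a standard Markdown table (| Header | ... |).
- Headings: use #, ##, etc. to mark headings found in the image, preserving the hierarchy.
- Text: transcribe paragraphs as plain text.
- Numbers: double-check every number, especially figures in financial statements.
Save the output:
- Use write_to_file to write the transcribed content to a .md file (for example, ocr_results.md).
- When processing multiple images, consider appending them to the same file or organizing them logically.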
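The "save the output" step can be pictured as follows. This is only a sketch of one possible file layout, not part of the skill: the helper name combine_transcriptions and the sample table values are hypothetical placeholders, and the Markdown strings stand in for text the Agent has already transcribed by viewing the images. It performs no text extraction itself.

```python
def combine_transcriptions(pages):
    """Join (image_name, markdown_text) pairs into one Markdown document.

    Hypothetical helper: the skill itself only says to append results to
    the same file or organize them logically.
    """
    parts = []
    for image_name, markdown_text in pages:
        # One second-level heading per source image keeps the origin traceable.
        parts.append(f"## {image_name}\n\n{markdown_text.strip()}\n")
    return "\n".join(parts)


# Placeholder transcriptions; in practice these come from viewing each image.
pages = [
    ("img1.png", "| Item | Q1 Revenue |\n| --- | --- |\n| Product A | 100 |"),
    ("img2.png", "Notes to the statement."),
]
combined = combine_transcriptions(pages)
print(combined)
```

The resulting string is what would be passed to write_to_file as the content of a single .md file; one heading per image makes it easy to check each section against its source.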

Best Practices

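Tables: Align rows and columns carefully. Standard Markdown tables do not support merged cells (rowspan/colspan). Expand merged cells according to the logical flow, or leave them blank.
Complex layouts: If the image has a complex layout (for example, two columns side by side), serialize it in logical reading order (top to bottom, left to right).
Charts/graphs: If the image contains a chart, describe the trend, or extract the visible data points into a list or table.
No code execution: Do not try to write or use Python libraries (such as pytesseract, easyocr, PIL) for text extraction. Rely directly on your own vision capabilities.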
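The two options for merged cells (expand them or leave them blank) can be illustrated with a small sketch. It only formats values that have already been read from the image and does no text extraction. The function name to_markdown_table, the repeat_merged flag, the convention that None marks a cell covered by a vertical merge, and the sample values are all assumptions made for illustration.

```python
def to_markdown_table(header, rows, repeat_merged=True):
    """Render rows as a Markdown table.

    A None cell marks a position covered by a vertically merged cell
    (an assumed convention for this sketch).
    """
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    previous = [""] * len(header)
    for row in rows:
        resolved = []
        for col, cell in enumerate(row):
            if cell is None:
                # Merged cell: repeat the value from the row above, or leave it blank.
                cell = previous[col] if repeat_merged else ""
            # Escape literal pipes so they do not split the cell.
            resolved.append(str(cell).replace("|", "\\|"))
        lines.append("| " + " | ".join(resolved) + " |")
        previous = resolved
    return "\n".join(lines)


# Placeholder data: "North" spans two rows in the imagined source table.
header = ["Region", "Quarter", "Revenue"]
rows = [["North", "Q1", "100"], [None, "Q2", "120"], ["South", "Q1", "90"]]

expanded = to_markdown_table(header, rows)
blanked = to_markdown_table(header, rows, repeat_merged=False)
print(expanded)
```

Repeating the merged value keeps every row self-describing, which tends to help when the table is later filtered or copied; leaving the cell blank stays closer to how the image looks.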

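Example Scenario

Request: "Convert these 3 financial report screenshots to markdown."

Execution:

list_dir to list the files: img1.png, img2.png, img3.png.
view_file to read img1.png.
(Internal processing): identify the header "Q1 Revenue" and the table row data.
view_file to read img2.png and img3.png.
write_to_file to create financial_report.md and write the combined content.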


