ocr-image-to-markdown


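Because no local OCR tool is available, this skill uses the agent's multimodal capabilities to view images (PNG, JPG, etc.) and transcribe their content (text, tables, logic diagrams) into formatted Markdown.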

Install

mkdir -p .claude/skills/ocr-image-to-markdown && curl -L -o skill.zip "https://mcp.directory/api/skills/download/1744" && unzip -o skill.zip -d .claude/skills/ocr-image-to-markdown && rm skill.zip

Installs to .claude/skills/ocr-image-to-markdown

About this skill

OCR: Image to Markdown

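This skill lets you "read" an image and convert its content into editable Markdown text. It is especially useful for extracting data tables, slide content, or document screenshots, particularly when no external OCR library is available.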

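Usage Guide

  1. Identify the target images:

    • Locate the image files you need to process.
    • If needed, use list_dir to browse the directory.
  2. View the images:

    • Use the view_file tool to "see" the image content; the system lets you process image data directly.
    • Key: you must call view_file on the image path so that your vision model can ingest it.
  3. Transcribe the content:

    • Transcribe what you see into Markdown.
    • Tables: convert the tables you see into standard Markdown tables (| Header | ... |).
    • Headings: mark headings in the image with #, ##, etc., preserving their hierarchy.
    • Text: transcribe paragraphs as plain text.
    • Numbers: double-check every number, especially figures in financial statements.
  4. Save the output:

    • Use write_to_file to write the transcription to a .md file (e.g. ocr_results.md).
    • If you process several images, consider appending them to the same file or organizing them logically.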
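When several images go into one file, one way to organize them logically is a section per source image. The Python below is only a sketch of that structure, assuming the per-image Markdown has already been produced by viewing each image (it performs no OCR itself). `combine_transcriptions` and `append_to_file` are hypothetical helpers, not part of the skill, which writes files with write_to_file.

```python
from pathlib import Path
from typing import Dict


def combine_transcriptions(results: Dict[str, str]) -> str:
    """Join per-image Markdown into one document with a '##' section per image.

    Sections follow the dict's insertion order, e.g. the order the images were viewed.
    """
    sections = [
        f"## {name}\n\n{markdown.strip()}\n" for name, markdown in results.items()
    ]
    return "\n".join(sections)


def append_to_file(path: Path, markdown: str) -> None:
    """Append a chunk of Markdown to an .md file, creating it if it does not exist."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(markdown)
```

For example, `combine_transcriptions({"img1.png": "A", "img2.png": "B"})` returns `"## img1.png\n\nA\n\n## img2.png\n\nB\n"`. Using `##` per image is an arbitrary choice; if the transcribed content already uses `##` headings, a different level may keep the hierarchy clearer.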

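Best Practices

  • Tables: align rows and columns carefully. Standard Markdown tables do not support merged cells (rowspan/colspan), so expand merged cells according to the table's logical flow, or leave the covered cells empty.
  • Complex layouts: if the layout is complex (e.g. two columns side by side), serialize it in logical reading order (top to bottom, left to right).
  • Charts/graphs: if the image contains a chart, describe the trend, or extract the visible data points as a list or table.
  • No code execution needed: do not try to write or use Python libraries (such as pytesseract, easyocr, PIL) for text extraction. Rely directly on your own vision capability.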

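Example Scenario

Request: "Convert these 3 screenshots of a financial report to Markdown."

Execution:

  1. list_dir to see the files: img1.png, img2.png, img3.png
  2. view_file to read img1.png
  3. (Internal processing): identify the table header "Q1 Revenue" and the table rows.
  4. view_file to read img2.png and img3.png
  5. write_to_file to create financial_report.md and write the combined content.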
