layout-analyzer
Analyze document structure and layout using surya - detect text blocks, tables, and reading order
Install
```shell
mkdir -p .claude/skills/layout-analyzer && curl -L -o skill.zip "https://mcp.directory/api/skills/download/3937" && unzip -o skill.zip -d .claude/skills/layout-analyzer && rm skill.zip
```
Installs to `.claude/skills/layout-analyzer`.
Layout Analyzer Skill
Overview
This skill enables document layout analysis using surya, an advanced document understanding system. Detect text blocks, tables, figures, and headings, and determine reading order in complex documents.
How to Use
- Provide the document image or PDF
- Specify what layout elements to detect
- I'll analyze the structure and return detected regions
Example prompts:
- "Analyze the layout of this document page"
- "Detect all tables and text blocks in this image"
- "Determine the reading order for this PDF page"
- "Find headings and paragraphs in this document"
Domain Knowledge
surya Fundamentals
```python
from surya.detection import DetectionPredictor
from surya.layout import LayoutPredictor
from surya.reading_order import ReadingOrderPredictor
from PIL import Image

# Load image
image = Image.open("document.png")

# Detect layout elements
layout_predictor = LayoutPredictor()
layout_result = layout_predictor([image])
```
Layout Element Types
| Element | Description |
|---|---|
| Text | Regular paragraph text |
| Title | Document/section titles |
| Section-header | Section headings |
| List-item | Bulleted/numbered items |
| Table | Tabular data |
| Figure | Images/diagrams |
| Caption | Figure/table captions |
| Footnote | Footnotes |
| Formula | Mathematical equations |
| Page-header | Headers |
| Page-footer | Footers |
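As a sketch of how these labels are typically consumed, detected elements can be grouped by type. The dicts below are a hypothetical stand-in mirroring the `label`/`bbox`/`confidence` fields used throughout this skill, not real surya output:

```python
from collections import defaultdict

# Hypothetical layout output: plain dicts mirroring the
# label/bbox/confidence fields shown in the snippets here.
elements = [
    {"label": "Title", "bbox": [50, 20, 550, 60], "confidence": 0.98},
    {"label": "Text", "bbox": [50, 80, 550, 300], "confidence": 0.95},
    {"label": "Table", "bbox": [50, 320, 550, 500], "confidence": 0.91},
    {"label": "Text", "bbox": [50, 520, 550, 700], "confidence": 0.93},
]

# Group detected element bboxes by their layout label
by_label = defaultdict(list)
for elem in elements:
    by_label[elem["label"]].append(elem["bbox"])

counts = {label: len(boxes) for label, boxes in by_label.items()}
```

The same grouping works on real results by reading `element.label` and `element.bbox` instead of dict keys.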
Text Detection
```python
from surya.detection import DetectionPredictor
from PIL import Image

# Initialize detector
detector = DetectionPredictor()

# Load image
image = Image.open("document.png")

# Detect text regions
results = detector([image])

# Access results
for page_result in results:
    for bbox in page_result.bboxes:
        print(f"Text region: {bbox.bbox}")
        print(f"Confidence: {bbox.confidence}")
```
Layout Analysis
```python
from surya.layout import LayoutPredictor
from PIL import Image

# Initialize layout predictor
layout_predictor = LayoutPredictor()

# Analyze layout
image = Image.open("document.png")
layout_results = layout_predictor([image])

# Process results
for page_result in layout_results:
    for element in page_result.bboxes:
        print(f"Type: {element.label}")
        print(f"Bbox: {element.bbox}")
        print(f"Confidence: {element.confidence}")
```
Reading Order Detection
```python
from surya.reading_order import ReadingOrderPredictor
from surya.layout import LayoutPredictor
from PIL import Image

# Get layout first
layout_predictor = LayoutPredictor()
image = Image.open("document.png")
layout_results = layout_predictor([image])

# Determine reading order
reading_order_predictor = ReadingOrderPredictor()
order_results = reading_order_predictor([image], layout_results)

# Access ordered elements
for page_result in order_results:
    for i, element in enumerate(page_result.ordered_bboxes):
        print(f"{i + 1}. {element.label}: {element.bbox}")
```
OCR with Layout
```python
from surya.ocr import OCRPredictor
from surya.layout import LayoutPredictor
from PIL import Image

# Initialize predictors
ocr_predictor = OCRPredictor()
layout_predictor = LayoutPredictor()

# Load image
image = Image.open("document.png")

# Get layout
layout_results = layout_predictor([image])

# Run OCR
ocr_results = ocr_predictor([image])

# Combine results
for layout, ocr in zip(layout_results, ocr_results):
    for layout_elem in layout.bboxes:
        print(f"Element: {layout_elem.label}")
        # Find OCR text lines within this layout element
        # (boxes_overlap is a rectangle-intersection helper, not part of surya)
        for text_line in ocr.text_lines:
            if boxes_overlap(layout_elem.bbox, text_line.bbox):
                print(f"  Text: {text_line.text}")
```
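The `boxes_overlap` helper used above is not part of surya. A minimal sketch, assuming `[x1, y1, x2, y2]` pixel coordinates as in the bboxes throughout this skill:

```python
def boxes_overlap(a, b):
    """Return True if two [x1, y1, x2, y2] boxes intersect."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Boxes overlap when each interval (x and y) intersects
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2
```

For stricter matching (e.g. requiring the text line to sit mostly inside the layout element), an intersection-over-area test can replace this simple check.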
Processing PDFs
```python
from surya.layout import LayoutPredictor
from pdf2image import convert_from_path

def analyze_pdf_layout(pdf_path):
    """Analyze layout of all pages in a PDF."""
    # Convert PDF pages to images
    images = convert_from_path(pdf_path)

    # Initialize predictor
    layout_predictor = LayoutPredictor()

    # Analyze all pages in one batch
    results = layout_predictor(images)

    document_structure = []
    for page_num, page_result in enumerate(results):
        page_elements = []
        for element in page_result.bboxes:
            page_elements.append({
                'type': element.label,
                'bbox': element.bbox,
                'confidence': element.confidence
            })
        document_structure.append({
            'page': page_num + 1,
            'elements': page_elements
        })
    return document_structure

structure = analyze_pdf_layout("document.pdf")
```
Visualization
```python
from surya.layout import LayoutPredictor
from PIL import Image, ImageDraw

def visualize_layout(image_path, output_path):
    """Visualize detected layout elements."""
    image = Image.open(image_path)
    layout_predictor = LayoutPredictor()
    results = layout_predictor([image])

    # Create drawing context
    draw = ImageDraw.Draw(image)

    # Color mapping for element types
    colors = {
        'Text': 'blue',
        'Title': 'red',
        'Table': 'green',
        'Figure': 'purple',
        'Section-header': 'orange',
        'List-item': 'cyan',
    }

    for element in results[0].bboxes:
        bbox = element.bbox
        color = colors.get(element.label, 'gray')
        # Draw rectangle
        draw.rectangle(bbox, outline=color, width=2)
        # Add label above the box
        draw.text((bbox[0], bbox[1] - 15),
                  f"{element.label} ({element.confidence:.2f})",
                  fill=color)

    image.save(output_path)
    return output_path
```
Best Practices
- Use High-Quality Images: 150+ DPI for best results
- Preprocess if Needed: Deskew rotated documents
- Validate Results: Check confidence scores
- Handle Multi-page: Process pages individually
- Combine with OCR: Get text within detected regions
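For the "Validate Results" practice, a minimal sketch of confidence filtering, using plain dicts in place of surya result objects (the 0.7 threshold is an illustrative choice, not a surya default):

```python
def filter_by_confidence(elements, threshold=0.7):
    """Keep only elements whose confidence meets the threshold."""
    return [e for e in elements if e["confidence"] >= threshold]

# Hypothetical detections mirroring the label/confidence fields above
elements = [
    {"label": "Text", "confidence": 0.95},
    {"label": "Table", "confidence": 0.55},   # low-confidence detection
    {"label": "Figure", "confidence": 0.88},
]

kept = filter_by_confidence(elements)
```

With real results, read `element.confidence` instead; tune the threshold per document type rather than hard-coding one value.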
Common Patterns
Document Structure Extraction
```python
def extract_document_structure(image_path):
    """Extract hierarchical document structure."""
    from surya.layout import LayoutPredictor
    from surya.reading_order import ReadingOrderPredictor
    from PIL import Image

    image = Image.open(image_path)

    # Get layout
    layout_predictor = LayoutPredictor()
    layout_results = layout_predictor([image])

    # Get reading order
    order_predictor = ReadingOrderPredictor()
    order_results = order_predictor([image], layout_results)

    structure = {
        'title': None,
        'sections': [],
        'tables': [],
        'figures': []
    }
    current_section = None

    # Walk elements in reading order and bucket them by type
    for element in order_results[0].ordered_bboxes:
        if element.label == 'Title':
            structure['title'] = element
        elif element.label == 'Section-header':
            current_section = {'header': element, 'content': []}
            structure['sections'].append(current_section)
        elif element.label == 'Table':
            structure['tables'].append(element)
        elif element.label == 'Figure':
            structure['figures'].append(element)
        elif current_section and element.label in ['Text', 'List-item']:
            current_section['content'].append(element)

    return structure
```
Table Region Extraction
```python
def extract_table_regions(image_path):
    """Extract table regions from a document image."""
    from surya.layout import LayoutPredictor
    from PIL import Image

    image = Image.open(image_path)
    layout_predictor = LayoutPredictor()
    results = layout_predictor([image])

    tables = []
    for element in results[0].bboxes:
        if element.label == 'Table':
            bbox = element.bbox
            # Crop the table region out of the page image
            table_image = image.crop(tuple(bbox))
            tables.append({
                'bbox': bbox,
                'image': table_image,
                'confidence': element.confidence
            })
    return tables
```
Examples
Example 1: Academic Paper Analysis
```python
from surya.layout import LayoutPredictor
from surya.reading_order import ReadingOrderPredictor
from pdf2image import convert_from_path

def analyze_academic_paper(pdf_path):
    """Analyze the structure of an academic paper."""
    images = convert_from_path(pdf_path)
    layout_predictor = LayoutPredictor()
    order_predictor = ReadingOrderPredictor()

    paper_structure = {
        'pages': [],
        'element_counts': {
            'Title': 0,
            'Section-header': 0,
            'Text': 0,
            'Table': 0,
            'Figure': 0,
            'Formula': 0,
            'Footnote': 0
        }
    }

    layout_results = layout_predictor(images)
    order_results = order_predictor(images, layout_results)

    for page_num, (layout, order) in enumerate(zip(layout_results, order_results)):
        page_structure = {
            'page': page_num + 1,
            'elements': []
        }
        for element in order.ordered_bboxes:
            page_structure['elements'].append({
                'type': element.label,
                'bbox': element.bbox,
                'order': element.position
            })
            # Count element types
            if element.label in paper_structure['element_counts']:
                paper_structure['element_counts'][element.label] += 1
        paper_structure['pages'].append(page_structure)

    return paper_structure

paper = analyze_academic_paper('research_paper.pdf')
print(f"Total tables: {paper['element_counts']['Table']}")
print(f"Total figures: {paper['element_counts']['Figure']}")
```
Example 2: Form Field Detection
```python
from surya.layout import LayoutPredictor
from PIL import Image

def detect_form_fields(image_path):
    """Detect form fields and labels."""
    image = Image.
```

*Content truncated.*
By openclaw