Document Forge

Name: Document Forge
Rating: 4.9 (29 reviews)
Author: cablate

Processes and converts documents across multiple formats including PDF, DOCX, HTML, CSV, and EPUB. Provides extraction, merging, splitting, and format conversion capabilities.

Integrates document processing libraries to enable extraction, conversion, and manipulation across multiple file formats including PDF, DOCX, HTML, CSV, and EPUB.

17439 views
8
Local (stdio)

productivity

What it does

Read content from PDF, DOCX, HTML, CSV, and TXT files
Convert DOCX to PDF or HTML format
Merge multiple PDFs into one file
Split PDF files into multiple documents
Convert HTML to plain text or Markdown
Clean and format HTML code

Best for

Content creators managing multiple document formats
Developers building document processing workflows
Researchers extracting text from various file types

10+ document processing tools
Multi-encoding support (UTF-8, Big5, GBK)
Preserves formatting during conversions

About Document Forge

Document Forge is a community-built MCP server published by cablate that provides AI assistants with tools and capabilities via the Model Context Protocol. It enables extraction, conversion, and editing of files such as PDF, DOCX, HTML, CSV, and EPUB. It is categorized under productivity. The server exposes 16 tools that AI clients can invoke during conversations and coding sessions.

How to install

You can install Document Forge in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.
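As a rough illustration of what "runs locally via the stdio transport" means in practice: an MCP client launches the server as a child process and exchanges JSON-RPC 2.0 messages with it over stdin and stdout, one JSON message per line. The sketch below only shows that framing. The helper names `encode_message` and `decode_messages` are hypothetical and are not part of Document Forge.

```python
import json


def encode_message(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps escapes newlines inside strings, so the payload itself
    # never contains a raw newline that would break the framing.
    line = json.dumps(message, separators=(",", ":"))
    return (line + "\n").encode("utf-8")


def decode_messages(buffer: bytes) -> list[dict]:
    """Parse every non-empty line in the buffer as one JSON message."""
    lines = buffer.decode("utf-8").split("\n")
    return [json.loads(line) for line in lines if line.strip()]


# "tools/list" asks an MCP server to describe the tools it exposes.
list_tools = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
wire = encode_message(list_tools)
```

In a real client these bytes would be written to the server process's stdin, and replies would be read line by line from its stdout. MCP client applications normally handle this for you.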

License

Document Forge is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

Tools (16)

document_reader

Read content from non-image document files at the specified paths. Supported formats: .pdf, .docx, .txt, .html, .csv

pdf_merger

Merge multiple PDF files into one

pdf_splitter

Split a PDF file into multiple files

docx_to_pdf

Convert DOCX files to PDF format

docx_to_html

Convert DOCX to HTML while preserving formatting
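An AI client invokes any of the tools above by sending an MCP `tools/call` request that names the tool and passes its arguments. The sketch below builds such a request for `pdf_merger`. The argument keys (`input_paths`, `output_dir`) are hypothetical placeholders, since this page does not document the tool schemas. The actual parameter names are whatever the server reports in its `tools/list` response.

```python
import json
from itertools import count

# Monotonic request ids so responses can be matched to requests.
_request_ids = count(1)


def tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names here are assumptions for illustration only.
request = tool_call(
    "pdf_merger",
    {"input_paths": ["part1.pdf", "part2.pdf"], "output_dir": "/tmp/merged"},
)
payload = json.dumps(request)
```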

Simple Document Processing MCP Server

A powerful Model Context Protocol (MCP) server providing comprehensive document processing capabilities.

Features

Document Reader

Read DOCX, PDF, TXT, HTML, CSV

Document Conversion

DOCX to HTML/PDF conversion
HTML to TXT/Markdown conversion
PDF manipulation (merge, split)

Text Processing

Multi-encoding conversion support (UTF-8, Big5, GBK)
Text formatting and cleaning
Text comparison and diff generation
Text splitting by lines or delimiter
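To make the text-processing features above concrete, here is a minimal standard-library sketch of encoding conversion, diff generation, and splitting. It illustrates the general techniques only and is not the server's implementation. All function names are hypothetical.

```python
import difflib


def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one text encoding to another (e.g. big5 -> utf-8)."""
    return data.decode(source).encode(target)


def unified_diff(old: str, new: str) -> str:
    """Return a unified diff between two texts."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="old",
            tofile="new",
        )
    )


def split_by_lines(text: str, lines_per_chunk: int) -> list[str]:
    """Split text into chunks of at most `lines_per_chunk` lines each."""
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    lines = text.splitlines(keepends=True)
    return [
        "".join(lines[i:i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


def split_by_delimiter(text: str, delimiter: str) -> list[str]:
    """Split text on every occurrence of a delimiter string."""
    return text.split(delimiter)
```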

HTML Processing

HTML cleaning and formatting
Resource extraction (images, links, videos)
Structure-preserving conversion
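Resource extraction of the kind listed above can be sketched with Python's built-in `html.parser`. This is an illustration of the idea under simple assumptions (image `src`, anchor `href`, and video `src` or nested `source` tags), not a description of how Document Forge does it. The class name `ResourceExtractor` is hypothetical.

```python
from html.parser import HTMLParser


class ResourceExtractor(HTMLParser):
    """Collect image, link, and video URLs from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.images: list[str] = []
        self.links: list[str] = []
        self.videos: list[str] = []
        self._in_video = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "img" and attr.get("src"):
            self.images.append(attr["src"])
        elif tag == "a" and attr.get("href"):
            self.links.append(attr["href"])
        elif tag == "video":
            self._in_video = True
            if attr.get("src"):
                self.videos.append(attr["src"])
        elif tag == "source" and self._in_video and attr.get("src"):
            # <source> only counts as a video when nested inside <video>.
            self.videos.append(attr["src"])

    def handle_endtag(self, tag):
        if tag == "video":
            self._in_video = False


extractor = ResourceExtractor()
extractor.feed(
    '<a href="/doc">Doc</a><img src="a.png">'
    '<video><source src="v.mp4"></video>'
)
```

After `feed` returns, the three lists on `extractor` hold the URLs in document order.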

Installation

Installing via Smithery

To install Document Processing Server for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @cablate/mcp-doc-forge --client claude

Manual Installation

npm install -g @cablate/mcp-doc-forge

Usage

CLI

mcp-doc-forge

With Dive Desktop

Click "+ Add MCP Server" in Dive Desktop
Copy and paste this configuration:

{
  "mcpServers": {
    "mcp-doc-forge": {
      "command": "npx",
      "args": [
        "-y",
        "@cablate/mcp-doc-forge"
      ],
      "enabled": true
    }
  }
}

Click "Save" to install the MCP server
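If you maintain a client configuration file by hand, the same entry can be merged into an existing `mcpServers` block without disturbing other servers. The sketch below assumes the key under `mcpServers` is a free-form label chosen by the user and that the client accepts the `command`, `args`, and `enabled` fields shown in the configuration above. The function name `add_server` and the label `mcp-doc-forge` are illustrative choices, not names required by the project.

```python
import copy
import json


def add_server(config: dict, label: str, package: str) -> dict:
    """Return a copy of `config` with an npx-launched MCP server added."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = updated.setdefault("mcpServers", {})
    servers[label] = {
        "command": "npx",
        "args": ["-y", package],
        "enabled": True,
    }
    return updated


# An existing configuration with some other (hypothetical) server in it.
existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}
merged = add_server(existing, "mcp-doc-forge", "@cablate/mcp-doc-forge")
config_text = json.dumps(merged, indent=2)
```

Writing `config_text` back to the client's configuration file is left out, because the file location differs between clients.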

License

MIT

Contributing

Community participation and contributions are welcome! Here are ways to contribute:

⭐️ Star the project if you find it helpful
🐛 Submit Issues: Report problems or provide suggestions
🔧 Create Pull Requests: Submit code improvements

Contact

If you have any questions or suggestions, feel free to reach out:

📧 Email: [email protected]
📧 GitHub: CabLate
🤝 Collaboration: Discussions about project cooperation are welcome
📚 Technical Guidance: Suggestions and guidance are sincerely welcome

Alternatives

GitHub

github

27.6k

Extend your developer tools with GitHub MCP Server for advanced automation, supporting GitHub Student and student packages integration.

Official, Remote, Popular

Task Master

eyaltoledano

25.8k

Boost productivity with Task Master: an AI-powered tool for project management and agile development workflows, integrated with popular editors.

Community, Popular

Mastra Docs

mastra-ai

21.8k

Mastra Docs: AI assistants with direct access to Mastra.ai’s full knowledge base for faster, smarter support and insights.

Official, Popular

Beads

steveyegge

18.6k

Beads — a drop-in memory upgrade for your coding agent that boosts context, speed, and reliability with zero friction.

Official, Popular

Related Skills

skill-forge

Automated skill creation workshop with intelligent source detection, smart path management, and end-to-end workflow automation. This skill should be used when users want to create a new skill or convert external resources (GitHub repositories, online documentation, or local directories) into a skill. Automatically fetches, organizes, and packages skills with proactive cleanup management.

teams-channel-post-writer

Creates educational Teams channel posts for internal knowledge sharing about Claude Code features, tools, and best practices. Applies when writing posts, announcements, or documentation to teach colleagues effective Claude Code usage, announce new features, share productivity tips, or document lessons learned. Provides templates, writing guidelines, and structured approaches emphasizing concrete examples, underlying principles, and connections to best practices like context engineering. Activates for content involving Teams posts, channel announcements, feature documentation, or tip sharing.

pdf-to-markdown

Convert entire PDF documents to clean, structured Markdown for full context loading. Use this skill when the user wants to extract ALL text from a PDF into context (not grep/search), when discussing or analyzing PDF content in full, when the user mentions "load the whole PDF", "bring the PDF into context", "read the entire PDF", or when partial extraction/grepping would miss important context. This is the preferred method for PDF text extraction over page-by-page or grep approaches.

1,402

literature-review

Conduct comprehensive, systematic literature reviews using multiple academic databases (PubMed, arXiv, bioRxiv, Semantic Scholar, etc.). This skill should be used when conducting systematic literature reviews, meta-analyses, research synthesis, or comprehensive literature searches across biomedical, scientific, and technical domains. Creates professionally formatted markdown documents and PDFs with verified citations in multiple citation styles (APA, Nature, Vancouver, etc.).

633

latex-writing

Guide LaTeX document authoring following best practices and proper semantic markup. Use proactively when: (1) writing or editing .tex files, (2) writing or editing .nw literate programming files, (3) literate-programming skill is active and working with .nw files, (4) user mentions LaTeX, BibTeX, or document formatting, (5) reviewing LaTeX code quality. Ensures proper use of semantic environments (description vs itemize), csquotes (\enquote{} not ``...''), and cleveref (\cref{} not \S\ref{}).

312

markitdown

Convert various file formats (PDF, Office documents, images, audio, web content, structured data) to Markdown optimized for LLM processing. Use when converting documents to markdown, extracting text from PDFs/Office files, transcribing audio, performing OCR on images, extracting YouTube transcripts, or processing batches of files. Supports 20+ formats including DOCX, XLSX, PPTX, PDF, HTML, EPUB, CSV, JSON, images with OCR, and audio with transcription.

199