Document Forge

By cablate

Processes and converts documents across multiple formats, including PDF, DOCX, HTML, CSV, and EPUB. It builds on existing document-processing libraries to provide text extraction, PDF merging and splitting, and format conversion.

What it does

  • Read content from PDF, DOCX, HTML, CSV, and TXT files
  • Convert DOCX to PDF or HTML format
  • Merge multiple PDFs into one file
  • Split PDF files into multiple documents
  • Convert HTML to plain text or Markdown
  • Clean and format HTML code

Best for

  • Content creators managing multiple document formats
  • Developers building document processing workflows
  • Researchers extracting text from various file types

  • 10+ document processing tools
  • Multi-encoding support (UTF-8, Big5, GBK)
  • Preserves formatting during conversions

About Document Forge

Document Forge is a community-built MCP server published by cablate that gives AI assistants document-processing tools through the Model Context Protocol. It enables fast extraction, conversion, and editing of files such as PDF, DOCX, HTML, CSV, and EPUB for easy mp… It is categorized under productivity and exposes 16 tools that AI clients can invoke during conversations and coding sessions.

How to install

Document Forge can be installed in Cursor, Claude Desktop, VS Code, and other MCP-compatible clients; the Installation section below covers Smithery and manual npm setup. The server runs locally on your machine via the stdio transport.
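Over stdio, the client starts the server as a child process (for example `mcp-doc-forge` or `npx -y @cablate/mcp-doc-forge`), and the two exchange JSON-RPC 2.0 messages over stdin and stdout, one message per line with no embedded newlines. The minimal Python sketch below shows only that framing; it does not launch the server, and the function names are hypothetical. A real client would also send the `initialize` handshake before any other request.

```python
import json
from typing import Any, Iterator


def encode_message(message: dict[str, Any]) -> bytes:
    """Serialize one JSON-RPC message as a single UTF-8 line for stdio."""
    # Without indent, json.dumps emits no raw newlines: "\n" inside strings is escaped
    line = json.dumps(message, ensure_ascii=False, separators=(",", ":"))
    return (line + "\n").encode("utf-8")


def decode_messages(stream: bytes) -> Iterator[dict[str, Any]]:
    """Split bytes read from the server's stdout into JSON-RPC messages, one per line."""
    for raw in stream.split(b"\n"):
        if raw.strip():  # skip the empty chunk after the final newline
            yield json.loads(raw)


# Example: the request a client sends to discover the server's tools
list_tools = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
frame = encode_message(list_tools)
print(frame)  # b'{"jsonrpc":"2.0","id":1,"method":"tools/list"}\n'
```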

License

Document Forge is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

Tools (16)

document_reader

Read content from non-image document files at specified paths. Supported formats: .pdf, .docx, .txt, .html, .csv.

pdf_merger

Merge multiple PDF files into one

pdf_splitter

Split a PDF file into multiple files

docx_to_pdf

Convert DOCX files to PDF format

docx_to_html

Convert DOCX to HTML while preserving formatting
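An MCP client invokes any of the tools above with a JSON-RPC `tools/call` request that carries the tool name and an `arguments` object. The Python sketch below builds such a request for `document_reader` and rejects file extensions that the tool's description does not list. The argument key `filePath` is a hypothetical placeholder: the real parameter names come from each tool's `inputSchema` in the server's `tools/list` response.

```python
import json
from itertools import count
from pathlib import PurePath
from typing import Any

# Extensions listed in document_reader's description
READER_EXTENSIONS = {".pdf", ".docx", ".txt", ".html", ".csv"}

_next_id = count(1)  # each JSON-RPC request needs its own id


def tool_call(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build an MCP `tools/call` JSON-RPC request for one tool invocation."""
    return {
        "jsonrpc": "2.0",
        "id": next(_next_id),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def read_document(path: str) -> dict[str, Any]:
    """Build a document_reader call, rejecting extensions it does not list."""
    if PurePath(path).suffix.lower() not in READER_EXTENSIONS:
        raise ValueError(f"document_reader does not list {path!r} as supported")
    # HYPOTHETICAL argument name: check the tool's inputSchema from tools/list
    return tool_call("document_reader", {"filePath": path})


request = read_document("/tmp/report.pdf")
print(json.dumps(request))
```

The extension check runs on the client side only to fail fast; the server remains the authority on what it can read.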

Simple Document Processing MCP Server

A Model Context Protocol (MCP) server that provides document reading, format conversion, and text and HTML processing tools.

Features

Document Reader

  • Read DOCX, PDF, TXT, HTML, CSV

Document Conversion

  • DOCX to HTML/PDF conversion
  • HTML to TXT/Markdown conversion
  • PDF manipulation (merge, split)
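To illustrate what HTML-to-TXT conversion involves, here is a minimal sketch using Python's standard `html.parser`. It is not the server's own converter, and the class and function names are hypothetical: it drops tags and `<script>`/`<style>` content, decodes entities, starts block-level elements on a new line, and collapses extra whitespace.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, dropping tags and <script>/<style> content."""

    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts: list[str] = []
        self._skip_depth = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")  # block-level elements start on a new line

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs within each line and drop empty lines
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


print(html_to_text("<h1>Title</h1><p>Hello &amp; <b>world</b></p><script>x=1</script>"))
```

Markdown output would follow the same traversal, emitting markers such as `#` for headings instead of bare newlines.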

Text Processing

  • Conversion between text encodings (UTF-8, Big5, GBK)
  • Text formatting and cleaning
  • Text comparison and diff generation
  • Text splitting by lines or delimiter
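Each of these text operations has a close counterpart in Python's standard library, where Big5 and GBK are built-in codecs. The sketch below only illustrates what the bullets mean; it is not the server's code, and the function names are hypothetical.

```python
import difflib
from typing import Optional


def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Re-encode bytes, e.g. Big5 or GBK to UTF-8 (strict: raises on invalid input)."""
    return data.decode(source).encode(target)


def split_text(text: str, delimiter: Optional[str] = None) -> list[str]:
    """Split by lines when no delimiter is given, otherwise by the literal delimiter."""
    return text.splitlines() if delimiter is None else text.split(delimiter)


def text_diff(old: str, new: str) -> str:
    """Unified diff of two texts, compared line by line."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="old",
            tofile="new",
        )
    )


big5_bytes = "文件".encode("big5")
print(transcode(big5_bytes, "big5").decode("utf-8"))
print(split_text("a,b,c", ","))
print(text_diff("a\nb\n", "a\nc\n"))
```

Strict decoding is a deliberate choice here: mislabelled input fails loudly instead of being silently replaced with `?` or U+FFFD.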

HTML Processing

  • HTML cleaning and formatting
  • Resource extraction (images, links, videos)
  • Structure-preserving conversion
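Resource extraction amounts to walking the HTML and collecting URL attributes from the relevant tags. The following minimal Python sketch, with hypothetical names and not the server's implementation, gathers `<img src>`, `<a href>`, and `<video>`/`<source>` URLs and resolves relative paths against an optional base URL.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class ResourceExtractor(HTMLParser):
    """Collect image, link, and video URLs from an HTML document."""

    def __init__(self, base_url: str = "") -> None:
        super().__init__()
        self.base_url = base_url
        self._video_depth = 0  # >0 while inside a <video> element
        self.found: dict[str, list[str]] = {"images": [], "links": [], "videos": []}

    def _add(self, kind: str, value: Optional[str]) -> None:
        if value:  # ignore missing or empty attributes
            self.found[kind].append(urljoin(self.base_url, value))

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            self._add("images", attrs.get("src"))
        elif tag == "a":
            self._add("links", attrs.get("href"))
        elif tag == "video":
            self._video_depth += 1
            self._add("videos", attrs.get("src"))
        elif tag == "source" and self._video_depth:
            # <source> also appears in <audio>/<picture>; only count it inside <video>
            self._add("videos", attrs.get("src"))

    def handle_endtag(self, tag):
        if tag == "video" and self._video_depth:
            self._video_depth -= 1


def extract_resources(html: str, base_url: str = "") -> dict[str, list[str]]:
    parser = ResourceExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.found


page = '<a href="/x">x</a><img src="p.png"><video><source src="v.mp4"></video>'
print(extract_resources(page, "https://example.com/docs/"))
```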

Installation

Installing via Smithery

To install Document Processing Server for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @cablate/mcp-doc-forge --client claude

Manual Installation

npm install -g @cablate/mcp-doc-forge

Usage

CLI

mcp-doc-forge

With Dive Desktop

  1. Click "+ Add MCP Server" in Dive Desktop
  2. Copy and paste this configuration:
{
  "mcpServers": {
    "mcp-doc-forge": {
      "command": "npx",
      "args": [
        "-y",
        "@cablate/mcp-doc-forge"
      ],
      "enabled": true
    }
  }
}
  3. Click "Save" to install the MCP server.
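If other servers are already configured, the entry can also be merged in programmatically instead of pasted by hand. The Python sketch below assumes the configuration is a JSON object with a top-level `mcpServers` map, as in the snippet above; the key `mcp-doc-forge` is an arbitrary label, and the `enabled` flag is copied from the Dive configuration (other clients may not need it).

```python
import json
from typing import Any


def add_doc_forge(config: dict[str, Any], name: str = "mcp-doc-forge") -> dict[str, Any]:
    """Return a copy of an MCP client config with a Document Forge entry added.

    Existing entries under "mcpServers" are kept, and the input is not mutated.
    """
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {
        "command": "npx",
        "args": ["-y", "@cablate/mcp-doc-forge"],
        "enabled": True,  # Dive-specific flag from the configuration above
    }
    return {**config, "mcpServers": servers}


existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
print(json.dumps(add_doc_forge(existing), indent=2))
```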

License

MIT

Contributing

Community participation and contributions are welcome! Here are ways to contribute:

  • ⭐️ Star the project if you find it helpful
  • 🐛 Submit Issues: Report problems or provide suggestions
  • 🔧 Create Pull Requests: Submit code improvements

Contact

If you have any questions or suggestions, feel free to reach out:

  • 📧 Email: [email protected]
  • 📧 GitHub: CabLate
  • 🤝 Collaboration: Open to discussing project cooperation
  • 📚 Technical Guidance: Suggestions and guidance are sincerely welcome

Alternatives

Related Skills

skill-forge

Automated skill creation workshop with intelligent source detection, smart path management, and end-to-end workflow automation. This skill should be used when users want to create a new skill or convert external resources (GitHub repositories, online documentation, or local directories) into a skill. Automatically fetches, organizes, and packages skills with proactive cleanup management.

teams-channel-post-writer

Creates educational Teams channel posts for internal knowledge sharing about Claude Code features, tools, and best practices. Applies when writing posts, announcements, or documentation to teach colleagues effective Claude Code usage, announce new features, share productivity tips, or document lessons learned. Provides templates, writing guidelines, and structured approaches emphasizing concrete examples, underlying principles, and connections to best practices like context engineering. Activates for content involving Teams posts, channel announcements, feature documentation, or tip sharing.

pdf-to-markdown

Convert entire PDF documents to clean, structured Markdown for full context loading. Use this skill when the user wants to extract ALL text from a PDF into context (not grep/search), when discussing or analyzing PDF content in full, when the user mentions "load the whole PDF", "bring the PDF into context", "read the entire PDF", or when partial extraction/grepping would miss important context. This is the preferred method for PDF text extraction over page-by-page or grep approaches.

literature-review

Conduct comprehensive, systematic literature reviews using multiple academic databases (PubMed, arXiv, bioRxiv, Semantic Scholar, etc.). This skill should be used when conducting systematic literature reviews, meta-analyses, research synthesis, or comprehensive literature searches across biomedical, scientific, and technical domains. Creates professionally formatted markdown documents and PDFs with verified citations in multiple citation styles (APA, Nature, Vancouver, etc.).

notebooklm

Query Google NotebookLM for source-grounded, citation-backed answers from uploaded documents. Reduces hallucinations through Gemini's document-only responses. Browser automation with library management and persistent authentication.

google-official-seo-guide

Official Google SEO guide covering search optimization, best practices, Search Console, crawling, indexing, and improving website search visibility based on official Google documentation
