
Document Forge
Document Forge processes and converts documents across multiple formats, including PDF, DOCX, HTML, CSV, and EPUB. It integrates document processing libraries to provide extraction, merging, splitting, and format conversion.
What it does
- Read content from PDF, DOCX, HTML, CSV, and TXT files
- Convert DOCX to PDF or HTML format
- Merge multiple PDFs into one file
- Split PDF files into multiple documents
- Convert HTML to plain text or Markdown
- Clean and format HTML code
About Document Forge
Document Forge is a community-built MCP server published by cablate. It gives AI assistants tools for fast extraction, conversion, and editing of files such as PDF, DOCX, HTML, CSV, and EPUB via the Model Context Protocol. It is categorized under productivity and exposes 16 tools that AI clients can invoke during conversations and coding sessions.
How to install
You can install Document Forge in Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. The server runs locally on your machine via the stdio transport.
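To illustrate what running over the stdio transport involves, here is a minimal sketch of how a client might frame a request. It assumes the standard MCP convention of one JSON-RPC 2.0 message per newline-terminated line; the helper name `frameRequest` is hypothetical, and the `initialize` handshake that a real session starts with is omitted.

```typescript
// JSON-RPC 2.0 request shape used by MCP clients.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Hypothetical helper: serialize one request as a single newline-terminated
// line, which is how messages are delimited on stdin/stdout.
function frameRequest(
  id: number,
  method: string,
  params?: Record<string, unknown>
): string {
  const request: JsonRpcRequest = { jsonrpc: "2.0", id, method };
  if (params !== undefined) {
    request.params = params;
  }
  // JSON.stringify escapes newlines inside strings, so the only raw
  // newline in the result is the terminator added here.
  return JSON.stringify(request) + "\n";
}

// `tools/list` asks the server to enumerate the tools it exposes.
const listToolsLine = frameRequest(1, "tools/list");
console.log(listToolsLine.trim());
// prints {"jsonrpc":"2.0","id":1,"method":"tools/list"}
```

In practice an MCP client such as Claude Desktop handles this framing for you; the sketch only shows what travels over the pipe.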
License
Document Forge is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.
Tools (16)
- Read content from non-image document files at specified paths, supporting .pdf, .docx, .txt, .html, and .csv
- Merge multiple PDF files into one
- Split a PDF file into multiple files
- Convert DOCX files to PDF format
- Convert DOCX to HTML while preserving formatting
Simple Document Processing MCP Server
A powerful Model Context Protocol (MCP) server providing comprehensive document processing capabilities.
Features
Document Reader
- Read DOCX, PDF, TXT, HTML, CSV
Document Conversion
- DOCX to HTML/PDF conversion
- HTML to TXT/Markdown conversion
- PDF manipulation (merge, split)
Text Processing
- Conversion between text encodings (UTF-8, Big5, GBK)
- Text formatting and cleaning
- Text comparison and diff generation
- Text splitting by lines or delimiter
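As a rough illustration of the last item, splitting by lines or by a delimiter can be sketched as follows. This is not the server's implementation; the function names `splitByLines` and `splitByDelimiter` are hypothetical, and trimming and dropping empty pieces are assumptions made for the example.

```typescript
// Split text into lines, accepting both Unix (\n) and Windows (\r\n) endings.
function splitByLines(text: string): string[] {
  return text.split(/\r?\n/);
}

// Split text on a literal delimiter, trimming whitespace around each piece
// and dropping pieces that end up empty.
function splitByDelimiter(text: string, delimiter: string): string[] {
  if (delimiter === "") {
    throw new Error("delimiter must not be empty");
  }
  return text
    .split(delimiter)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

const splitLines = splitByLines("first\r\nsecond\nthird");
const splitFields = splitByDelimiter("a; b;; c", ";");
console.log(splitLines);  // [ 'first', 'second', 'third' ]
console.log(splitFields); // [ 'a', 'b', 'c' ]
```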
HTML Processing
- HTML cleaning and formatting
- Resource extraction (images, links, videos)
- Structure-preserving conversion
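Resource extraction of the kind listed above can be approximated with a few regular expressions. The sketch below is illustrative only: the names `collectAttribute` and `extractResources` are hypothetical, it assumes quoted attribute values, and a regex is not a substitute for a real HTML parser (for example, it would also pick up a `data-src` attribute).

```typescript
interface ExtractedResources {
  images: string[];
  links: string[];
  videos: string[];
}

// Collect the quoted values of one attribute from every occurrence of one tag.
function collectAttribute(html: string, tag: string, attribute: string): string[] {
  const pattern = new RegExp(
    "<" + tag + "\\b[^>]*?\\b" + attribute + "\\s*=\\s*[\"']([^\"']*)[\"']",
    "gi"
  );
  const values: string[] = [];
  let match: RegExpExecArray | null = pattern.exec(html);
  while (match !== null) {
    const value = match[1];
    if (value !== undefined) {
      values.push(value);
    }
    match = pattern.exec(html);
  }
  return values;
}

// Gather image sources, link targets, and video sources from an HTML string.
function extractResources(html: string): ExtractedResources {
  return {
    images: collectAttribute(html, "img", "src"),
    links: collectAttribute(html, "a", "href"),
    videos: collectAttribute(html, "video", "src"),
  };
}

const sampleHtml =
  '<p><a href="https://example.com/docs">Docs</a> ' +
  '<img src="logo.png" alt="Logo"> <video src="intro.mp4"></video></p>';
const extracted = extractResources(sampleHtml);
console.log(extracted);
```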
Installation
Installing via Smithery
To install Document Processing Server for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @cablate/mcp-doc-forge --client claude
Manual Installation
npm install -g @cablate/mcp-doc-forge
Usage
CLI
mcp-doc-forge
With Dive Desktop
- Click "+ Add MCP Server" in Dive Desktop
- Copy and paste this configuration:
{
  "mcpServers": {
    "doc-forge": {
      "command": "npx",
      "args": [
        "-y",
        "@cablate/mcp-doc-forge"
      ],
      "enabled": true
    }
  }
}
- Click "Save" to install the MCP server
License
MIT
Contributing
Community participation and contributions are welcome! Here are ways to contribute:
- ⭐️ Star the project if you find it helpful
- 🐛 Submit Issues: Report problems or provide suggestions
- 🔧 Create Pull Requests: Submit code improvements
Contact
If you have any questions or suggestions, feel free to reach out:
- 📧 Email: [email protected]
- 📧 GitHub: CabLate
- 🤝 Collaboration: Project cooperation proposals are welcome
- 📚 Technical Guidance: Suggestions and guidance are sincerely welcome
