
DeepSeek
Connects to DeepSeek's language models for AI-powered chat completions, text generation, and code generation, with customizable parameters for tasks like writing assistance. Works as a remote service or local installation.
What it does
- Generate chat completions using DeepSeek models
- Create text and code with customizable parameters
- Check account balance and usage
- List available DeepSeek models
- Process AI requests via remote endpoint
- Run locally via Docker or npm package
About DeepSeek
DeepSeek is a community-built MCP server published by dmontgomery40 that provides AI assistants with tools and capabilities via the Model Context Protocol. DeepSeek offers an AI-powered chatbot and writing assistant for chat completions, writing help, and code generation. It is categorized under AI/ML.
How to install
You can install DeepSeek in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server can run locally on your machine via the stdio transport or be reached through the hosted remote endpoint described below.
License
DeepSeek is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.
DeepSeek MCP Server
As of February 24, 2026, this is the only DeepSeek MCP server repo linked in DeepSeek's official integration list and listed in the official MCP Registry.
Official DeepSeek MCP server for chat/completions/models/balance. See also the plain-language explainer: Why V4 is a big deal.
- Hosted remote endpoint: https://deepseek-mcp.ragweld.com/mcp
- Auth: Authorization: Bearer <token>
- Local package and Docker are also supported.
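A quick connectivity check for the hosted endpoint, as a sketch: MCP's streamable HTTP transport speaks JSON-RPC over POST, so a curl call like the one below should get a response. A server that requires an initialized session may answer with a JSON-RPC error instead of a tool list, which still confirms the URL and bearer token are wired up.
# Sanity-check the hosted endpoint (sketch; assumes DEEPSEEK_MCP_AUTH_TOKEN is exported)
curl -s -X POST https://deepseek-mcp.ragweld.com/mcp \
  -H "Authorization: Bearer $DEEPSEEK_MCP_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'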
Quick Install (Copy/Paste)
1) Set your hosted token once
export DEEPSEEK_MCP_AUTH_TOKEN="REPLACE_WITH_TOKEN"
2) Codex CLI (remote MCP)
codex mcp add deepseek --url https://deepseek-mcp.ragweld.com/mcp --bearer-token-env-var DEEPSEEK_MCP_AUTH_TOKEN
3) Claude Code (remote MCP)
claude mcp add --transport http deepseek https://deepseek-mcp.ragweld.com/mcp --header "Authorization: Bearer $DEEPSEEK_MCP_AUTH_TOKEN"
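To confirm the server registered in steps 2 and 3, both CLIs include a list subcommand (in recent versions):
codex mcp list
claude mcp list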
4) Cursor (remote MCP)
node -e 'const fs=require("fs"),p=process.env.HOME+"/.cursor/mcp.json";let j={mcpServers:{}};try{j=JSON.parse(fs.readFileSync(p,"utf8"))}catch{};j.mcpServers={...(j.mcpServers||{}),deepseek:{url:"https://deepseek-mcp.ragweld.com/mcp",headers:{Authorization:"Bearer ${env:DEEPSEEK_MCP_AUTH_TOKEN}"}}};fs.mkdirSync(process.env.HOME+"/.cursor",{recursive:true});fs.writeFileSync(p,JSON.stringify(j,null,2));'
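For reference, starting from an empty config, the one-liner above leaves ~/.cursor/mcp.json looking like this; you can also write it by hand:
{
  "mcpServers": {
    "deepseek": {
      "url": "https://deepseek-mcp.ragweld.com/mcp",
      "headers": {
        "Authorization": "Bearer ${env:DEEPSEEK_MCP_AUTH_TOKEN}"
      }
    }
  }
}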
5) Local install (stdio, if you prefer self-hosted)
DEEPSEEK_API_KEY="REPLACE_WITH_DEEPSEEK_KEY" npx -y deepseek-mcp-server
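If your client takes a JSON config for stdio servers (Claude Desktop's claude_desktop_config.json uses this shape), the npx invocation wires up like this minimal sketch:
{
  "mcpServers": {
    "deepseek": {
      "command": "npx",
      "args": ["-y", "deepseek-mcp-server"],
      "env": { "DEEPSEEK_API_KEY": "REPLACE_WITH_DEEPSEEK_KEY" }
    }
  }
}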
6) Local install with Docker (stdio, self-hosted)
docker pull docker.io/dmontgomery40/deepseek-mcp-server:0.4.0 && \
docker run --rm -i -e DEEPSEEK_API_KEY="REPLACE_WITH_DEEPSEEK_KEY" docker.io/dmontgomery40/deepseek-mcp-server:0.4.0
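The same stdio wiring works with the Docker image; a sketch of the equivalent client config:
{
  "mcpServers": {
    "deepseek": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "-e", "DEEPSEEK_API_KEY=REPLACE_WITH_DEEPSEEK_KEY",
        "docker.io/dmontgomery40/deepseek-mcp-server:0.4.0"
      ]
    }
  }
}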
Non-Technical Users
If you mostly use chat apps and don’t want terminal setup:
- Use Cursor’s MCP settings UI and add:
  - URL: https://deepseek-mcp.ragweld.com/mcp
  - Header: Authorization: Bearer <token>
- If your app does not support custom remote MCP servers with bearer headers yet, use Codex/Claude Code/Cursor as your MCP-enabled client and keep your usual model provider.
OpenRouter users (API + chat UI)
OpenRouter now documents MCP usage, but its MCP flow is SDK/client-centric (not “paste URL in chat and done” for most users). Easiest path is: keep OpenRouter for models, and connect this MCP server through an MCP-capable client (Codex/Claude Code/Cursor).
Remote vs Local (Which Should I Use?)
Remote server
Use remote if you want the fastest setup and centralized updates.
- Pros: no local server process, easy multi-device use, one shared endpoint.
- Cons: depends on network + hosted token.
Local server
Use local if you want full runtime control.
- Pros: fully self-managed, easy private-network workflows.
- Cons: you manage updates/secrets/process lifecycle.
Code Execution with MCP (What This Actually Means)
In basic tool-calling mode, the model usually needs:
- many tool definitions loaded into context before it starts;
- one model round-trip per tool call;
- intermediate results repeatedly fed back into context.
That works for small toolsets, but it scales poorly. You burn tokens on tool metadata, add latency from repeated inference hops, and raise failure risk when tools are similarly named or require multi-step orchestration.
Code execution changes the control flow. Instead of repeatedly asking the model to call one tool at a time, the model can write a small program that calls tools directly in an execution runtime. That runtime handles loops, branching, filtering, joins, retries, and result shaping. The model then gets a compact summary instead of every raw intermediate payload.
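As a concrete illustration, here is a minimal sketch of what a model-generated orchestration script could look like inside an execution runtime. The tools object and its method names are hypothetical stand-ins, not this server's actual tool schema; they are stubbed so the sketch runs standalone under Node.
// Hypothetical stand-in for runtime-injected MCP tool bindings.
// Method names are illustrative, not this server's real tool schema.
const tools = {
  async listModels() {
    return [{ id: "deepseek-chat" }, { id: "deepseek-reasoner" }];
  },
  async chatCompletion({ model, messages }) {
    return { content: `stub reply from ${model}` };
  },
};

async function orchestrate() {
  const models = await tools.listModels();   // one tool call, not one model turn per item
  const prompt = [{ role: "user", content: "Summarize MCP in one sentence." }];
  const results = [];
  for (const m of models) {                  // looping happens in code, not in the chat loop
    const r = await tools.chatCompletion({ model: m.id, messages: prompt });
    results.push({ model: m.id, preview: r.content.slice(0, 80) });
  }
  return results;                            // only this compact summary re-enters context
}

orchestrate().then((summary) => console.log(summary));
The point is the shape: many tool invocations collapse into one executed program, and only the final summary goes back to the model.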
Why this matters in practice:
- lower context pressure: you avoid dumping full tool catalogs and every raw result into prompt history;
- better orchestration: code handles deterministic logic that is awkward in pure natural-language loops;
- lower latency at scale: fewer model turns for multi-step workflows;
- usually better reliability: less chance of drifting tool choice across long chains.
Limits to keep in mind:
- code execution does not remove the need for good tool schemas and permissions;
- this is still an agent system, so guardrails/quotas/auditing matter;
- for tiny single-tool tasks, plain tool calling can still be simpler.
For this DeepSeek MCP server, the practical takeaway is: keep tool interfaces explicit and stable, then let MCP clients choose direct tool-calling or code-execution orchestration based on workload size and complexity.
Learn More (Curated)
- Anthropic Engineering: Code execution with MCP: Building more efficient agents. Why it matters: the clearest explanation of why direct tool-calling becomes expensive at scale, and how code execution reduces token overhead and orchestration friction.
- Anthropic Engineering: Introducing advanced tool use on the Claude Developer Platform. Why it matters: practical architecture for large tool ecosystems: Tool Search Tool, Programmatic Tool Calling, and Tool Use Examples.
- Cloudflare (Matt Carey, Feb 2026): Code Mode: give agents an entire API in 1,000 tokens. Why it matters: concrete implementation patterns for model-controlled tool discovery and token-efficient execution loops.
- Anthropic Help (updated 2026): Getting started with custom connectors using remote MCP. Why it matters: clean product-level explanation of what remote MCP is and when to use it.
- Cursor docs: Model Context Protocol (MCP). Why it matters: current mcp.json setup model for Cursor.
- OpenRouter docs: Using MCP Servers with OpenRouter. Why it matters: current integration path for OpenRouter-centric workflows.
Registry Identity
- MCP Registry name: io.github.DMontgomery40/deepseek
License
MIT
Related Skills
- Advanced context management with auto-compaction and dynamic context optimization for DeepSeek's 64k context window. Features intelligent compaction (merging, summarizing, extracting), query-aware relevance scoring, and hierarchical memory system with context archive. Logs optimization events to chat.
- Configuring and using the OpenCode CLI for headless LLM automation. Covers command syntax, provider configuration, Vertex AI setup, MCP servers, local models (Ollama), cloud providers (including DeepSeek, Kimi, and Mistral), and subprocess integration patterns.
- For Node.js backend services or CloudBase cloud functions (Express/Koa/NestJS, serverless, backend APIs) that need AI capabilities. Features text generation (generateText), streaming (streamText), and image generation (generateImage) via @cloudbase/node-sdk ≥3.16.0. Built-in models include Hunyuan (hunyuan-2.0-instruct-20251111 recommended), DeepSeek (deepseek-v3.2 recommended), and hunyuan-image for images. This is the only SDK that supports image generation. Not for browser/Web apps (use ai-model-web) or WeChat Mini Program (use ai-model-wechat).
- Smart LLM router that saves 78% on inference costs by routing every request to the cheapest capable model across 30+ models from OpenAI, Anthropic, Google, DeepSeek, and xAI.
- Train Mixture of Experts (MoE) models using DeepSpeed or HuggingFace. Use when training large-scale models with limited compute (5× cost reduction vs dense models), implementing sparse architectures like Mixtral 8x7B or DeepSeek-V3, or scaling model capacity without proportional compute increase. Covers MoE architectures, routing mechanisms, load balancing, expert parallelism, and inference optimization.
- Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters, needing maximum GPU efficiency (47% MFU on H100), or requiring tensor/pipeline/sequence/context/expert parallelism. Production-ready framework used for Nemotron, LLaMA, and DeepSeek.