DINO-X

Name: DINO-X
Rating: 4.5 (116 reviews)
Author: idea-research

Provides AI-powered object detection and visual analysis in images using natural language prompts. Works with local files or web URLs to find, locate, and describe specific objects or regions.

Empower LLMs with fine-grained visual understanding — detect, localize, and describe anything in images with natural language prompts.

What it does

Detect objects in images using natural language queries
Generate region-level descriptions of image areas
Count and locate specific objects with coordinates
Analyze full images for detailed understanding
Create annotated visualizations with bounding boxes
Process images from local files or web URLs

Best for

Building visual AI applications and chatbots
Automating visual inspection workflows
Creating multimodal reasoning systems

Fine-grained object detection and localization
Structured JSON outputs with coordinates
Multiple transport modes (local/cloud)

About DINO-X

DINO-X is a community-built MCP server published by idea-research that provides AI assistants with tools and capabilities via the Model Context Protocol. DINO-X is a powerful multimodal AI model that lets you detect, localize, and describe anything in images using natural language prompts. It is categorized under ai ml.

How to install

You can install DINO-X in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.

License

DINO-X is released under the Apache-2.0 license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

DINO-X MCP Server

DINO-X Official MCP Server — powered by the DINO-X and Grounding DINO models — brings fine-grained object detection and image understanding to your multimodal applications.

Why DINO-X MCP?

With DINO-X MCP, you can:

Fine-Grained Understanding: Full image detection, object detection, and region-level descriptions.
Structured Outputs: Get object categories, counts, locations, and attributes for VQA and multi-step reasoning tasks.
Composable: Works seamlessly with other MCP servers to build end-to-end visual agents or automation pipelines.

Transport Modes

DINO-X MCP supports two transport modes:

Feature	STDIO (default)	Streamable HTTP
Runtime	Local	Local or Cloud
Transport	Standard I/O	HTTP (streaming responses)
Input source	`file://` and `https://`	`https://` only
Visualization	Supported (saves annotated images locally)	Not supported (for now)
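The input-source row above can be checked before a request is sent. The following is a minimal sketch of such a check; the helper name `is_supported_input` and the mode labels `"stdio"` and `"http"` are illustrative assumptions, not part of the DINO-X MCP API.

```python
from urllib.parse import urlparse

# URI schemes each transport accepts, taken from the table above.
# The keys "stdio" and "http" are labels chosen for this sketch.
ALLOWED_SCHEMES = {
    "stdio": {"file", "https"},
    "http": {"https"},
}


def is_supported_input(image_uri: str, transport: str) -> bool:
    """Return True if the URI scheme is accepted by the given transport mode."""
    scheme = urlparse(image_uri).scheme.lower()
    return scheme in ALLOWED_SCHEMES[transport]


print(is_supported_input("file:///tmp/cat.jpg", "stdio"))
print(is_supported_input("file:///tmp/cat.jpg", "http"))
```

A client wrapper could run this check first and ask the user to upload a local file somewhere reachable over `https://` when the server runs in Streamable HTTP mode.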

Quick Start

1. Prepare an MCP client

Any MCP-compatible client works, e.g. Cursor, Claude Desktop, or VS Code.

2. Get your API key

Request an API key on the DINO-X platform (new users get a free quota).

3. Configure MCP

Option A: Official Hosted Streamable HTTP (Recommended)

Add the following to your MCP client config, replacing your-api-key with your own key:

{
  "mcpServers": {
    "dinox-mcp": {
      "url": "https://mcp.deepdataspace.com/mcp?key=your-api-key"
    }
  }
}
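If the config is generated by a script rather than edited by hand, the key should be URL-encoded so that reserved characters do not break the query string. This is a small sketch under that assumption; the function name `hosted_config` is hypothetical, and the endpoint is copied from the config above.

```python
import json
from urllib.parse import urlencode

# Hosted endpoint as shown in the Option A config.
HOSTED_ENDPOINT = "https://mcp.deepdataspace.com/mcp"


def hosted_config(api_key: str) -> dict:
    """Build the Option A client entry, URL-encoding the key."""
    query = urlencode({"key": api_key})
    return {"mcpServers": {"dinox-mcp": {"url": f"{HOSTED_ENDPOINT}?{query}"}}}


print(json.dumps(hosted_config("your-api-key"), indent=2))
```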

Option B: Use the NPM package locally (STDIO)

Install Node.js first. Either download the installer from nodejs.org or use one of the commands below:

# macOS / Linux
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
# or
wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash

# load nvm into current shell (choose the one you use)
source ~/.bashrc || true
source ~/.zshrc  || true

# install and use LTS Node.js
nvm install --lts
nvm use --lts

# Windows (one of the following)
winget install OpenJS.NodeJS.LTS
# or with Chocolatey (in admin PowerShell)
iwr -useb https://raw.githubusercontent.com/chocolatey/chocolatey/master/chocolateyInstall/InstallChocolatey.ps1 | iex
choco install nodejs-lts -y

Configure your MCP client:

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "npx",
      "args": ["-y", "@deepdataspace/dinox-mcp"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}

Note: Replace your-api-key-here with your real key.
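Most clients keep all MCP servers in a single JSON file, so adding the entry above should not overwrite servers that are already configured. The sketch below merges the `dinox-mcp` entry into an existing file. The helper name `add_dinox_server` and the file name `mcp.json` are assumptions for illustration; the real config path depends on your client.

```python
import json
import tempfile
from pathlib import Path


def add_dinox_server(config_path: Path, api_key: str, image_dir: str) -> dict:
    """Merge the dinox-mcp STDIO entry into a client config, keeping other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["dinox-mcp"] = {
        "command": "npx",
        "args": ["-y", "@deepdataspace/dinox-mcp"],
        "env": {
            "DINOX_API_KEY": api_key,
            "IMAGE_STORAGE_DIRECTORY": image_dir,
        },
    }
    config_path.write_text(json.dumps(config, indent=2))
    return config


# Demo against a throwaway file so no real client config is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mcp.json"  # hypothetical file name
    path.write_text(json.dumps({"mcpServers": {"other": {"command": "foo"}}}))
    merged = add_dinox_server(path, "your-api-key-here", "/path/to/your/image/directory")
    print(sorted(merged["mcpServers"]))
```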

Option C: Run from source locally

Make sure Node.js is installed (see Option B), then:

# clone
git clone https://github.com/IDEA-Research/DINO-X-MCP.git
cd DINO-X-MCP

# install deps
npm install

# build
npm run build

Configure your MCP client:

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "node",
      "args": ["/path/to/DINO-X-MCP/build/index.js"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}

CLI Flags & Environment Variables

Common flags

- --http: start in Streamable HTTP mode (STDIO is the default)
- --stdio: force STDIO mode
- --dinox-api-key=...: set the API key
- --enable-client-key: allow clients to pass the API key via the URL query ?key= (Streamable HTTP only)
- --port=8080: set the HTTP port (default 3020)

Environment variables

- DINOX_API_KEY (required unless the key is passed with --dinox-api-key or, in Streamable HTTP mode, supplied by clients via ?key=): DINO-X platform API key
- IMAGE_STORAGE_DIRECTORY (optional, STDIO): directory to save annotated images
- AUTH_TOKEN (optional, HTTP): if set, clients must send Authorization: Bearer <token>

Examples:

# STDIO (local)
node build/index.js --dinox-api-key=your-api-key

# Streamable HTTP (server provides a shared API key)
node build/index.js --http --dinox-api-key=your-api-key

# Streamable HTTP (custom port)
node build/index.js --http --dinox-api-key=your-api-key --port=8080

# Streamable HTTP (require client-provided API key via URL)
node build/index.js --http --enable-client-key
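When the server is launched from a script, the flag combinations above can be assembled programmatically. The following is a sketch only: `build_server_args` is a hypothetical helper, and its two validation rules are a reading of the flag descriptions (client keys and ports apply to HTTP mode), not behaviour confirmed by the server itself.

```python
from typing import List, Optional


def build_server_args(
    http: bool = False,
    api_key: Optional[str] = None,
    enable_client_key: bool = False,
    port: Optional[int] = None,
) -> List[str]:
    """Assemble the `node build/index.js` command line from the documented flags."""
    # Assumed checks: both flags are described as HTTP-related in the list above.
    if enable_client_key and not http:
        raise ValueError("--enable-client-key applies to Streamable HTTP only")
    if port is not None and not http:
        raise ValueError("--port only makes sense together with --http")
    args = ["node", "build/index.js"]
    if http:
        args.append("--http")
    if api_key:
        args.append(f"--dinox-api-key={api_key}")
    if enable_client_key:
        args.append("--enable-client-key")
    if port is not None:
        args.append(f"--port={port}")
    return args


print(" ".join(build_server_args(http=True, api_key="your-api-key", port=8080)))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with the key.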

Client config when using ?key=:

{
  "mcpServers": {
    "dinox-mcp": {
      "url": "http://localhost:3020/mcp?key=your-api-key"
    }
  }
}

Using AUTH_TOKEN with a gateway that injects Authorization: Bearer <token>:

AUTH_TOKEN=my-token node build/index.js --http --enable-client-key

Client example with supergateway:

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "supergateway",
        "--streamableHttp",
        "http://localhost:3020/mcp?key=your-api-key",
        "--oauth2Bearer",
        "my-token"
      ]
    }
  }
}
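To make the gateway setup concrete, the sketch below builds (without sending) the kind of HTTP request that reaches the server in this configuration: the API key travels in the `?key=` query parameter and the token in the `Authorization` header. The helper name `build_mcp_request` is hypothetical, and the empty JSON body is a placeholder rather than a valid MCP message.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_mcp_request(base_url: str, api_key: str, auth_token: str, body: bytes) -> Request:
    """Build (but do not send) a POST carrying both the URL key and the bearer token."""
    url = f"{base_url}?{urlencode({'key': api_key})}"
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {auth_token}",
            "Content-Type": "application/json",
            # Streamable HTTP clients accept either a JSON reply or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )


req = build_mcp_request("http://localhost:3020/mcp", "your-api-key", "my-token", b"{}")
print(req.full_url)
print(req.get_header("Authorization"))
```

Keeping the two credentials separate means the bearer token can guard access to the server while each client still pays for its own DINO-X usage through its own key.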

Tools

Capability	Tool ID	Transport	Input	Output
Full-scene object detection	`detect-all-objects`	STDIO / HTTP	Image URL	Category + bbox + (optional) captions
Text-prompted object detection	`detect-objects-by-text`	STDIO / HTTP	Image URL + English nouns (dot-separated for multiple, e.g., `person.car`)	Target object bbox + (optional) captions
Human pose estimation	`detect-human-pose-keypoints`	STDIO / HTTP	Image URL	17 keypoints + bbox + (optional) captions
Visualization	`visualize-detection-result`	STDIO only	Image URL + detection results array	Local path to annotated image
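MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` message. The sketch below builds such a message for `detect-objects-by-text`, including the dot-separated noun prompt described in the table. The argument names `imageFileUri` and `textPrompt` are assumptions for illustration; check the tool schema returned by the server's `tools/list` response for the real names.

```python
import json
from typing import Iterable


def text_prompt(nouns: Iterable[str]) -> str:
    """Join English nouns with dots, e.g. ["person", "car"] -> "person.car"."""
    cleaned = [n.strip() for n in nouns if n.strip()]
    if not cleaned:
        raise ValueError("at least one noun is required")
    return ".".join(cleaned)


def tools_call(request_id: int, image_url: str, nouns: Iterable[str]) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` message for detect-objects-by-text."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "detect-objects-by-text",
            # Argument names below are assumed for illustration only.
            "arguments": {
                "imageFileUri": image_url,
                "textPrompt": text_prompt(nouns),
            },
        },
    }


print(json.dumps(tools_call(1, "https://example.com/street.jpg", ["person", "car"]), indent=2))
```

In practice an MCP client or SDK builds this message for you; the sketch only shows what the prompt format looks like on the wire.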

🎬 Use Cases

🎯 Scenario	📝 Input prompt (sent with an input image)	✨ Output
Detection & Localization	💬 Prompt: `Detect and visualize the fire areas in the forest`
Object Counting	💬 Prompt: `Please analyze this warehouse image, detect all the cardboard boxes, count the total number`
Feature Detection	💬 Prompt: `Find all red cars in the image`
Attribute Reasoning	💬 Prompt: `Find the tallest person in the image, describe their clothing`
Full Scene Detection	💬 Prompt: `Find the fruit with the highest vitamin C content in the image`	Answer: Kiwi fruit (93mg/100g)
Pose Analysis	💬 Prompt: `Please analyze what yoga pose this is`
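Scenarios such as Object Counting and Attribute Reasoning come down to simple post-processing of the structured detection output. The sketch below shows one way this could look. The result shape (a list of objects with `category` and a `bbox` of `[x_min, y_min, x_max, y_max]`) and the sample values are assumptions made up for this example, not actual DINO-X output.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical detection result: category plus bbox as [x_min, y_min, x_max, y_max].
detections: List[Dict] = [
    {"category": "cardboard box", "bbox": [10, 20, 110, 120]},
    {"category": "cardboard box", "bbox": [130, 25, 220, 118]},
    {"category": "person", "bbox": [300, 40, 360, 240]},
    {"category": "person", "bbox": [400, 90, 450, 230]},
]


def count_by_category(items: List[Dict]) -> Counter:
    """Count how many detections fall into each category."""
    return Counter(item["category"] for item in items)


def tallest(items: List[Dict], category: str) -> Dict:
    """Pick the detection with the largest bbox height in pixels (y_max - y_min)."""
    candidates = [d for d in items if d["category"] == category]
    return max(candidates, key=lambda d: d["bbox"][3] - d["bbox"][1])


print(count_by_category(detections)["cardboard box"])  # 2
print(tallest(detections, "person")["bbox"])  # [300, 40, 360, 240]
```

Note that bounding-box height in pixels is only a proxy for real-world height, since it ignores perspective and distance from the camera.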