AI Vision MCP Server

honeyvig

Analyzes images and videos using Google's AI models to answer questions, detect objects, and understand visual content through natural language prompts.

Enables AI-powered image and video analysis using Google Gemini and Vertex AI models. Supports analyzing single or multiple images, detecting objects with bounding boxes, and video content analysis through natural language prompts.

What it does

  • Analyze single or multiple images with AI
  • Detect objects with precise bounding boxes
  • Process video content for analysis
  • Answer natural language questions about visual media
  • Extract text and details from images

Best for

  • Content creators analyzing media assets
  • Developers building vision-powered applications
  • Researchers processing visual data at scale

Powered by Google Gemini and Vertex AI. Supports both images and videos.

About AI Vision MCP Server

AI Vision MCP Server is a community-built MCP server published by honeyvig that provides AI assistants with tools and capabilities via the Model Context Protocol. It enables AI image analysis and video content analysis with Google Gemini and Vertex AI, including object detection. It is categorized under cloud infrastructure and AI/ML.

How to install

You can install AI Vision MCP Server in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.

License

AI Vision MCP Server is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

AI Vision MCP Server

A powerful Model Context Protocol (MCP) server that provides AI-powered image and video analysis using Google Gemini and Vertex AI models.

Features

  • Dual Provider Support: Choose between Google Gemini API and Vertex AI
  • Multimodal Analysis: Support for both image and video content analysis
  • Flexible File Handling: Upload via multiple methods (URLs, local files, base64)
  • Storage Integration: Built-in Google Cloud Storage support
  • Comprehensive Validation: Zod-based data validation throughout
  • Error Handling: Robust error handling with retry logic and circuit breakers
  • TypeScript: Full TypeScript support with strict type checking

Quick Start

Pre-requisites

You can use either the google provider or the vertex_ai provider. For simplicity, the google provider is recommended.

Below are the environment variables you need to set based on your selected provider. (Note: It’s recommended to set the timeout configuration to more than 5 minutes for your MCP client).

(i) Using Google AI Studio Provider

export IMAGE_PROVIDER="google" # or vertex_ai
export VIDEO_PROVIDER="google" # or vertex_ai
export GEMINI_API_KEY="your-gemini-api-key"

Get your API key from Google AI Studio.

(ii) Using Vertex AI Provider

export IMAGE_PROVIDER="vertex_ai"
export VIDEO_PROVIDER="vertex_ai"
export VERTEX_CREDENTIALS="/path/to/service-account.json"
export GCS_BUCKET_NAME="your-gcs-bucket"

Refer to the guideline here on how to set this up.
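
Before launching the server, it can help to confirm that the variables listed above are present for the selected provider. The following is a minimal sketch, not part of the server itself: the function name `missing_vars` is hypothetical, and treating an unset or unknown provider as a problem is an assumption of this example rather than documented server behavior.

```python
import os

# Variables each provider needs, as listed in the Pre-requisites section.
REQUIRED_VARS = {
    "google": ["GEMINI_API_KEY"],
    "vertex_ai": ["VERTEX_CREDENTIALS", "GCS_BUCKET_NAME"],
}


def missing_vars(env):
    """Return names of provider selectors or required variables that are unset."""
    missing = []
    for selector in ("IMAGE_PROVIDER", "VIDEO_PROVIDER"):
        provider = env.get(selector, "")
        if provider not in REQUIRED_VARS:
            # Assumption of this sketch: an unset or unknown provider is reported.
            missing.append(selector)
            continue
        for name in REQUIRED_VARS[provider]:
            if not env.get(name) and name not in missing:
                missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_vars(os.environ)
    print("Missing or invalid:", ", ".join(problems) if problems else "none")
```

Run it in the same shell where the `export` lines were executed; an empty result means every variable named in this section is set.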

Installation

Below are installation guides for this MCP server on different MCP clients, such as Claude Desktop, Claude Code, Cursor, and Cline.

Claude Desktop

Add to your Claude Desktop configuration:

(i) Using Google AI Studio Provider

{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}

(ii) Using Vertex AI Provider

{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CREDENTIALS": "/path/to/service-account.json",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}

Claude Code

(i) Using Google AI Studio Provider

claude mcp add ai-vision-mcp \
  -e IMAGE_PROVIDER=google \
  -e VIDEO_PROVIDER=google \
  -e GEMINI_API_KEY=your-gemini-api-key \
  -- npx ai-vision-mcp

(ii) Using Vertex AI Provider

claude mcp add ai-vision-mcp \
  -e IMAGE_PROVIDER=vertex_ai \
  -e VIDEO_PROVIDER=vertex_ai \
  -e VERTEX_CREDENTIALS=/path/to/service-account.json \
  -e GCS_BUCKET_NAME=ai-vision-mcp-{VERTEX_PROJECT_ID} \
  -- npx ai-vision-mcp

Note: Increase the MCP startup timeout to 1 minute and the MCP tool execution timeout to about 5 minutes by updating ~/.claude/settings.json as follows:

{
  "env": {
    "MCP_TIMEOUT": "60000",
    "MCP_TOOL_TIMEOUT": "300000"
  }
}

Cursor

Go to: Settings -> Cursor Settings -> MCP -> Add new global MCP server

Pasting the following configuration into your Cursor ~/.cursor/mcp.json file is the recommended approach. You may also install in a specific project by creating .cursor/mcp.json in your project folder. See Cursor MCP docs for more info.

(i) Using Google AI Studio Provider

{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}

(ii) Using Vertex AI Provider

{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CREDENTIALS": "/path/to/service-account.json",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}

Cline

Cline uses a JSON configuration file to manage MCP servers. To integrate the provided MCP server configuration:

  1. Open Cline and click on the MCP Servers icon in the top navigation bar.
  2. Select the Installed tab, then click Advanced MCP Settings.
  3. In the cline_mcp_settings.json file, add the following configuration:

(i) Using Google AI Studio Provider

{
  "mcpServers": {
    "ai-vision-mcp": {
      "timeout": 300,
      "type": "stdio",
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}

(ii) Using Vertex AI Provider

{
  "mcpServers": {
    "ai-vision-mcp": {
      "timeout": 300,
      "type": "stdio",
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CREDENTIALS": "/path/to/service-account.json",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}

Other MCP clients

The server uses stdio transport and follows the standard MCP protocol. It can be integrated with any MCP-compatible client by running:

npx ai-vision-mcp
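
The client configurations in this section all share the same server entry: the `npx` command, the `ai-vision-mcp` argument, and an `env` map. As an illustration only, a small script can generate that entry for either provider. The helper names `server_entry` and `client_config` are hypothetical, and client-specific keys such as Cline's `timeout` and `type` are left out of this sketch.

```python
import json


def server_entry(provider, **settings):
    """Build the "ai-vision-mcp" entry shared by the client configs."""
    env = {"IMAGE_PROVIDER": provider, "VIDEO_PROVIDER": provider}
    env.update(settings)  # provider-specific variables, e.g. GEMINI_API_KEY
    return {"command": "npx", "args": ["ai-vision-mcp"], "env": env}


def client_config(provider, **settings):
    """Wrap the server entry in the top-level "mcpServers" object."""
    return {"mcpServers": {"ai-vision-mcp": server_entry(provider, **settings)}}


if __name__ == "__main__":
    config = client_config("google", GEMINI_API_KEY="your-gemini-api-key")
    print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into a client's MCP configuration file and adjusted by hand where the client expects extra keys.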

MCP Tools

The server provides four main MCP tools:

1) analyze_image

Analyzes an image using AI and returns a detailed description.

Parameters:

  • imageSource (string): URL, base64 data, or file path to the image
  • prompt (string): Question or instruction for the AI
  • options (object, optional): Analysis options including temperature and max tokens

Examples:

  1. Analyze image from URL:
{
  "imageSource": "https://plus.unsplash.com/premium_photo-1710965560034-778eedc929ff",
  "prompt": "What is this image about? Describe what you see in detail."
}
  2. Analyze local image file:
{
  "imageSource": "C:\\Users\\username\\Downloads\\image.jpg",
  "prompt": "What is this image about? Describe what you see in detail."
}
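
MCP clients normally send these arguments for you. For readers wiring up their own client, the sketch below shows how the arguments above would be wrapped in a JSON-RPC 2.0 `tools/call` request, which is the standard MCP method for invoking a tool. The helper name `tool_call_request` is hypothetical, and a real session must complete the MCP `initialize` handshake before sending this message.

```python
import json


def tool_call_request(request_id, tool_name, arguments):
    """Serialize a JSON-RPC 2.0 "tools/call" request for an MCP server."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


if __name__ == "__main__":
    message = tool_call_request(1, "analyze_image", {
        "imageSource": "https://example.com/image.jpg",
        "prompt": "What is this image about? Describe what you see in detail.",
    })
    # Over the stdio transport, each message is written as one line of JSON.
    print(message)
```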

2) compare_images

Compares multiple images using AI and returns a detailed comparison analysis.

Parameters:

  • imageSources (array): Array of image sources (URLs, base64 data, or file paths) - minimum 2, maximum 4 images
  • prompt (string): Question or instruction for comparing the images
  • options (object, optional): Analysis options including temperature and max tokens

Examples:

  1. Compare images from URLs:
{
  "imageSources": [
    "https://example.com/image1.jpg",
    "https://example.com/image2.jpg"
  ],
  "prompt": "Compare these two images and tell me the differences"
}
  2. Compare mixed sources:
{
  "imageSources": [
    "https://example.com/image1.jpg",
    "C:\\Users\\username\\Downloads\\image2.jpg",
    "data:image/jpeg;base64,/9j/4AAQSkZJRgAB..."
  ],
  "prompt": "Which image has the best lighting quality?"
}
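
A caller can check the 2-to-4 image limit before sending a request. The sketch below is a client-side illustration, not the server's own validation: the function names are hypothetical, and the source-type heuristic is an assumption that only recognizes base64 input when it is written as a `data:` URI.

```python
def classify_source(source):
    """Guess whether a source is a URL, a base64 data URI, or a file path."""
    if source.startswith(("http://", "https://")):
        return "url"
    if source.startswith("data:") and ";base64," in source:
        return "base64"
    # Anything else is treated as a local file path in this sketch.
    return "file"


def check_image_sources(sources):
    """Enforce the documented 2 to 4 image limit and classify each source."""
    if not 2 <= len(sources) <= 4:
        raise ValueError(f"compare_images expects 2 to 4 images, got {len(sources)}")
    return [classify_source(source) for source in sources]


if __name__ == "__main__":
    print(check_image_sources([
        "https://example.com/image1.jpg",
        "C:\\Users\\username\\Downloads\\image2.jpg",
    ]))
```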

3) detect_objects_in_image

Detects objects in an image using AI vision models and generates an annotated image with bounding boxes. Returns the detected objects with their coordinates and saves the annotated image either to a specified file path or to a temporary directory.

Parameters:

  • imageSource (string): URL, base64 data, or file path to the image
  • prompt (string): Custom detection prompt describing what to detect or recognize in the image
  • outputFilePath (string, optional): Explicit output path for the annotated image

Configuration: This tool uses optimized default parameters for object detection and does not accept a runtime options parameter. To customize the AI parameters (temperature, topP, topK, maxTokens), use environment variables:

# Recommended environment variable settings for object detection (these are now the defaults)
TEMPERATURE_FOR_DETECT_OBJECTS_IN_IMAGE=0.0     # Deterministic responses
TOP_P_FOR_DETECT_OBJECTS_IN_IMAGE=0.95          # Nucleus sampling
TOP_K_FOR_DETECT_OBJECTS_IN_IMAGE=30            # Vocabulary selection
MAX_TOKENS_FOR_DETECT_OBJECTS_IN_IMAGE=8192     # High token limit for JSON
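
To preview which values would apply for a given environment, the variables above can be read with their documented defaults as fallbacks. This is an illustrative sketch only: the helper `detection_settings` is hypothetical, and how the server itself parses these variables is not shown in this README.

```python
import os

# Defaults as documented in the settings listed above.
DEFAULTS = {
    "TEMPERATURE_FOR_DETECT_OBJECTS_IN_IMAGE": 0.0,
    "TOP_P_FOR_DETECT_OBJECTS_IN_IMAGE": 0.95,
    "TOP_K_FOR_DETECT_OBJECTS_IN_IMAGE": 30,
    "MAX_TOKENS_FOR_DETECT_OBJECTS_IN_IMAGE": 8192,
}


def detection_settings(env):
    """Return each setting from env, falling back to the documented default."""
    settings = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name)
        # type(default) is float or int, so an override keeps the default's type.
        settings[name] = type(default)(raw) if raw not in (None, "") else default
    return settings


if __name__ == "__main__":
    for name, value in detection_settings(os.environ).items():
        print(f"{name}={value}")
```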

File Handling Logic:

  1. Explicit outputFilePath provided → Saves to the exact path specified
  2. No outputFilePath provided → Automatically saves to a temporary directory
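
The two cases above amount to a simple fallback. A minimal sketch of that decision follows; the function name and the temporary file naming scheme are assumptions made for illustration, since the server's actual temp-file names are not documented here.

```python
import os
import tempfile
import uuid


def resolve_output_path(output_file_path=None):
    """Use the explicit path when given, otherwise pick a temp-directory path."""
    if output_file_path:
        return output_file_path
    # Hypothetical naming scheme; the server's real temp-file names may differ.
    return os.path.join(tempfile.gettempdir(), f"annotated-{uuid.uuid4().hex}.png")


if __name__ == "__main__":
    print(resolve_output_path("annotated_image.png"))
    print(resolve_output_path())
```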

Response Types:

  • Returns file object when explicit outputFilePath is provided
  • Returns a tempFile object when outputFilePath is omitted and the annotated image is auto-saved to a temporary folder
  • Always includes detections array with detected objects and coordinates
  • Includes summary with percentage-based coordinates for browser automation
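
A client consuming this response has two small jobs: find the saved image under whichever key is present, and turn percentage-based coordinates into pixels for the target viewport. The sketch below assumes the response is a JSON object with the top-level `file` and `tempFile` keys named above. The function names are hypothetical, and the exact shape of the coordinate fields is not specified in this README, so the conversion takes plain numbers.

```python
def saved_image(response):
    """Return whichever of "file" or "tempFile" the response carries."""
    return response.get("file") or response.get("tempFile")


def percent_to_pixels(x_percent, y_percent, viewport_width, viewport_height):
    """Convert percentage-based coordinates to pixel coordinates."""
    return (
        round(viewport_width * x_percent / 100),
        round(viewport_height * y_percent / 100),
    )


if __name__ == "__main__":
    # A point at 50% across and 25% down a 1280x720 viewport.
    print(percent_to_pixels(50, 25, 1280, 720))
```

A browser automation script could pass the resulting pixel pair to its click or hover call.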

Examples:

  1. Basic object detection:
{
  "imageSource": "https://example.com/image.jpg",
  "prompt": "Detect all objects in this image"
}
  2. Save annotated image to specific path:
{
  "imageSource": "C:\\Users\\username\\Downloads\\image.jpg",
  "outputFilePath": "C:\\Users\\username\\Documents\\annotated_image.png"
}
  3. Custom detection prompt:
{
  "imageSource": "data:image/jpeg;base64,/9j/4AAQSkZJRgAB...",
  "prompt": "Detect and label all electronic devices in this image"
}

4) analyze_video

Analyzes a video using AI and returns a detailed description.

Parameters:

  • videoSource (string): YouTube URL,

README truncated. View full README on GitHub.
