AI Vision

Name: AI Vision
Rating: 4.8 (40 reviews)
Author: tan-yong-sheng

Analyzes images and videos using Google's Gemini or Vertex AI models, with intelligent file handling for different content types and sizes.

Integrates with Google's Gemini and Vertex AI models to analyze images, compare multiple images, and process video content with intelligent file handling that automatically optimizes upload strategies for different file sizes.

42527 views11Local (stdio)

ai ml

GitHub

What it does

Analyze images with AI-powered vision models
Process video content for insights and analysis
Compare multiple images side-by-side
Upload files via URLs, local paths, or base64
Store and manage media files in Google Cloud Storage
Switch between Gemini API and Vertex AI providers

Best for

Content creators analyzing visual mediaDevelopers building vision-enabled applicationsResearchers processing image/video datasetsTeams needing automated visual content analysis

Dual provider support (Gemini + Vertex AI)Handles both images and videosIntelligent file upload optimization

About AI Vision

AI Vision is a community-built MCP server published by tan-yong-sheng that provides AI assistants with tools and capabilities via the Model Context Protocol. AI Vision uses Google Cloud Vertex AI to analyze images and videos, leveraging intelligent file handling for optimized u It is categorized under ai ml.

How to install

You can install AI Vision in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.

License

AI Vision is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

AI Vision MCP Server

A powerful Model Context Protocol (MCP) server that provides AI-powered image and video analysis using Google Gemini and Vertex AI models.

Features

Dual Provider Support: Choose between Google Gemini API and Vertex AI
Multimodal Analysis: Support for both image and video content analysis
Flexible File Handling: Upload via multiple methods (URLs, local files, base64)
Storage Integration: Built-in Google Cloud Storage support
Comprehensive Validation: Zod-based data validation throughout
Error Handling: Robust error handling with retry logic and circuit breakers
TypeScript: Full TypeScript support with strict type checking

Quick Start

Pre-requisites

You could choose either to use google provider or vertex_ai provider. For simplicity, google provider is recommended.

Below are the environment variables you need to set based on your selected provider. (Note: It’s recommended to set the timeout configuration to more than 5 minutes for your MCP client).

(i) Using Google AI Studio Provider

export IMAGE_PROVIDER="google" # or vertex_ai
export VIDEO_PROVIDER="google" # or vertex_ai
export GEMINI_API_KEY="your-gemini-api-key"

Get your Google AI Studio's api key here

(ii) Using Vertex AI Provider

export IMAGE_PROVIDER="vertex_ai"
export VIDEO_PROVIDER="vertex_ai"
export VERTEX_CLIENT_EMAIL="[email protected]"
export VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
export VERTEX_PROJECT_ID="your-gcp-project-id"
export GCS_BUCKET_NAME="your-gcs-bucket"

Refer to the guideline here on how to set this up.

Installation

Below are the installation guide for this MCP on different MCP clients, such as Claude Desktop, Claude Code, Cursor, Cline, etc.

Claude Desktop

Add to your Claude Desktop configuration:

(i) Using Google AI Studio Provider

{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}

(ii) Using Vertex AI Provider

{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CLIENT_EMAIL": "[email protected]",
        "VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "VERTEX_PROJECT_ID": "your-gcp-project-id",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}

Claude Code

(i) Using Google AI Studio Provider

claude mcp add ai-vision-mcp \
  -e IMAGE_PROVIDER=google \
  -e VIDEO_PROVIDER=google \
  -e GEMINI_API_KEY=your-gemini-api-key \
  -- npx ai-vision-mcp

(ii) Using Vertex AI Provider

claude mcp add ai-vision-mcp \
  -e IMAGE_PROVIDER=vertex_ai \
  -e VIDEO_PROVIDER=vertex_ai \
  -e VERTEX_CLIENT_EMAIL=your-service-account@project.iam.gserviceaccount.com \
  -e VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n" \
  -e VERTEX_PROJECT_ID=your-gcp-project-id \
  -e GCS_BUCKET_NAME=ai-vision-mcp-{VERTEX_PROJECT_ID} \
  -- npx ai-vision-mcp

Note: Increase the MCP startup timeout to 1 minutes and MCP tool execution timeout to about 5 minutes by updating ~\.claude\settings.json as follows:

{
  "env": {
    "MCP_TIMEOUT": "60000",
    "MCP_TOOL_TIMEOUT": "300000"
  }
}

Cursor

Go to: Settings -> Cursor Settings -> MCP -> Add new global MCP server

Pasting the following configuration into your Cursor ~/.cursor/mcp.json file is the recommended approach. You may also install in a specific project by creating .cursor/mcp.json in your project folder. See Cursor MCP docs for more info.

(i) Using Google AI Studio Provider

{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}

(ii) Using Vertex AI Provider

{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CLIENT_EMAIL": "[email protected]",
        "VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "VERTEX_PROJECT_ID": "your-gcp-project-id",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}

Cline

Cline uses a JSON configuration file to manage MCP servers. To integrate the provided MCP server configuration:

Open Cline and click on the MCP Servers icon in the top navigation bar.
Select the Installed tab, then click Advanced MCP Settings.
In the cline_mcp_settings.json file, add the following configuration:

(i) Using Google AI Studio Provider

{
  "mcpServers": {
    "timeout": 300, 
    "type": "stdio",
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}

(ii) Using Vertex AI Provider

{
  "mcpServers": {
    "ai-vision-mcp": {
      "timeout": 300,
      "type": "stdio",
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CLIENT_EMAIL": "[email protected]",
        "VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "VERTEX_PROJECT_ID": "your-gcp-project-id",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}

Other MCP clients

The server uses stdio transport and follows the standard MCP protocol. It can be integrated with any MCP-compatible client by running:

npx ai-vision-mcp

MCP Tools

The server provides four main MCP tools:

1) `analyze_image`

Analyzes an image using AI and returns a detailed description.

Parameters:

imageSource (string): URL, base64 data, or file path to the image
prompt (string): Question or instruction for the AI
options (object, optional): Analysis options including temperature and max tokens

Examples:

Analyze image from URL:

{
  "imageSource": "https://plus.unsplash.com/premium_photo-1710965560034-778eedc929ff",
  "prompt": "What is this image about? Describe what you see in detail."
}

Analyze local image file:

{
  "imageSource": "C:\\Users\\username\\Downloads\\image.jpg",
  "prompt": "What is this image about? Describe what you see in detail."
}

2) `compare_images`

Compares multiple images using AI and returns a detailed comparison analysis.

Parameters:

imageSources (array): Array of image sources (URLs, base64 data, or file paths) - minimum 2, maximum 4 images
prompt (string): Question or instruction for comparing the images
options (object, optional): Analysis options including temperature and max tokens

Examples:

Compare images from URLs:

{
  "imageSources": [
    "https://example.com/image1.jpg",
    "https://example.com/image2.jpg"
  ],
  "prompt": "Compare these two images and tell me the differences"
}

Compare mixed sources:

{
  "imageSources": [
    "https://example.com/image1.jpg",
    "C:\\\\Users\\\\username\\\\Downloads\\\\image2.jpg",
    "data:image/jpeg;base64,/9j/4AAQSkZJRgAB..."
  ],
  "prompt": "Which image has the best lighting quality?"
}

3) `detect_objects_in_image`

Detects objects in an image using AI vision models and generates annotated images with bounding boxes. Returns detected objects with coordinates and either saves the annotated image to a file or temporary directory.

Parameters:

imageSource (string): URL, base64 data, or file path to the image
prompt (string): Custom detection prompt describing what to detect or recognize in the image
outputFilePath (string, optional): Explicit output path for the annotated image

Configuration: This function uses optimized default parameters for object detection and does not accept runtime options parameter. To customize the AI parameters (temperature, topP, topK, maxTokens), use environment variables:

# Recommended environment variable settings for object detection (these are now the defaults)
TEMPERATURE_FOR_DETECT_OBJECTS_IN_IMAGE=0.0     # Deterministic responses
TOP_P_FOR_DETECT_OBJECTS_IN_IMAGE=0.95          # Nucleus sampling
TOP_K_FOR_DETECT_OBJECTS_IN_IMAGE=30            # Vocabulary selection
MAX_TOKENS_FOR_DETECT_OBJECTS_IN_IMAGE=8192     # High token limit for JSON

File Handling Logic:

Explicit outputFilePath provided → Saves to the exact path specified
If not explicit outputFilePath → Automatically saves to temporary directory

Response Types:

Returns file object when explicit outputFilePath is provided
Returns tempFile object when explicit outputFilePath is not provided so the image file output is auto-saved to temporary folder
A

README truncated. View full README on GitHub.

Alternatives

Knowledge Graph Memory

anthropic

80.5k

Build persistent semantic networks for enterprise & engineering data management. Enable data persistence and memory acro

OfficialPopular

2.1k147

Context7

upstash

48.2k

Boost your AI code assistant with Context7: inject real-time API documentation from OpenAPI specification sources into y

OfficialRemotePopular

15.5k763

Blender

ahujasid

17.6k

Connect Blender to Claude AI for seamless 3D modeling. Use AI 3D model generator tools for faster, intuitive, interactiv

CommunityPopular

2.9k48

Google GenAI Toolbox

google

13.3k

Google GenAI Toolbox: open-source GenAI database agent and AI database connector for Google Cloud database—query Cloud S

OfficialPopular

215

Related Skills

Browse all skills

xcodebuildmcp

Official skill for XcodeBuildMCP. Use when doing iOS/macOS/watchOS/tvOS/visionOS work (build, test, run, debug, log, UI automation).

senior-computer-vision

World-class computer vision skill for image/video processing, object detection, segmentation, and visual AI systems. Expertise in PyTorch, OpenCV, YOLO, SAM, diffusion models, and vision transformers. Includes 3D vision, video analysis, real-time processing, and production deployment. Use when building vision AI systems, implementing object detection, training custom vision models, or optimizing inference pipelines.

openclaw-setup

Set up a complete OpenClaw personal AI assistant from scratch using Claude Code. Walks through AWS provisioning, OpenClaw installation, Telegram bot creation, API configuration, Google Workspace integration, security hardening, and all power features. Give this to Claude Code and it handles the rest.

computer-use-agents

Build AI agents that interact with computers like humans do - viewing screens, moving cursors, clicking buttons, and typing text. Covers Anthropic's Computer Use, OpenAI's Operator/CUA, and open-source alternatives. Critical focus on sandboxing, security, and handling the unique challenges of vision-based control. Use when: computer use, desktop automation agent, screen control AI, vision-based agent, GUI automation.

product-strategist

Strategic product leadership toolkit for Head of Product including OKR cascade generation, market analysis, vision setting, and team scaling. Use for strategic planning, goal alignment, competitive analysis, and organizational design.

cpo-advisor

Product leadership for scaling companies. Product vision, portfolio strategy, product-market fit, and product org design. Use when setting product vision, managing a product portfolio, measuring PMF, designing product teams, prioritizing at the portfolio level, reporting to the board on product, or when user mentions CPO, product strategy, product-market fit, product organization, portfolio prioritization, or roadmap strategy.

What it does

Best for

About AI Vision

How to install

License

AI Vision MCP Server

Features

Quick Start

Pre-requisites

Installation

MCP Tools

1) analyze_image

2) compare_images

3) detect_objects_in_image

Alternatives

Knowledge Graph Memory

Context7

Blender

Google GenAI Toolbox

Related Skills

1) `analyze_image`

2) `compare_images`

3) `detect_objects_in_image`