Google AI Studio

Name: Google AI Studio
Rating: 4.6 (40 reviews)
Author: eternnoir

Connects to the Google AI Studio / Gemini API to generate and analyze content from text, images, videos, audio, PDFs, and other file formats, including document conversion tasks. Requires a Google AI Studio API key.

26817 views · 5 · Local (stdio)

ai ml


What it does

Generate text content using Gemini models
Analyze images, videos, and audio files
Process PDF and Office documents
Set custom system prompts for AI behavior
Handle multiple files in a single request
Configure model parameters like temperature and output tokens

Best for

Content creators analyzing multimedia assets
Document processing and analysis workflows
Developers building AI-powered applications
Research teams working with mixed media content

Multimodal support for 10+ file types
Configurable file size and count limits
Conversation history support

About Google AI Studio

Google AI Studio is a community-built MCP server published by eternnoir that gives AI assistants access to the Google AI Studio / Gemini API through the Model Context Protocol. It processes images, videos, audio, PDFs, and text files for document conversion, analysis, and content generation. It is categorized under ai ml.

How to install

You can install Google AI Studio in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.

License

Google AI Studio is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

AI Studio MCP Server

A Model Context Protocol (MCP) server that integrates with Google AI Studio / Gemini API, providing content generation capabilities with support for files, conversation history, and system prompts.

Installation and Usage

Prerequisites

Node.js 20.0.0 or higher
Google AI Studio API key

Using npx (Recommended)

GEMINI_API_KEY=your_api_key npx -y aistudio-mcp-server

Local Installation

npm install -g aistudio-mcp-server
GEMINI_API_KEY=your_api_key aistudio-mcp-server

Configuration

Set your Google AI Studio API key as an environment variable:

export GEMINI_API_KEY=your_api_key_here

Optional Configuration

GEMINI_MODEL: Gemini model to use (default: gemini-2.5-flash)
GEMINI_TIMEOUT: Request timeout in milliseconds (default: 300000 = 5 minutes)
GEMINI_MAX_OUTPUT_TOKENS: Maximum output tokens (default: 8192)
GEMINI_MAX_FILES: Maximum number of files per request (default: 10)
GEMINI_MAX_TOTAL_FILE_SIZE: Maximum total file size in MB (default: 50)
GEMINI_TEMPERATURE: Temperature for generation (0-2, default: 0.2)

Example:

export GEMINI_API_KEY=your_api_key_here
export GEMINI_MODEL=gemini-2.5-flash
export GEMINI_TIMEOUT=600000  # 10 minutes
export GEMINI_MAX_OUTPUT_TOKENS=16384  # More output tokens
export GEMINI_MAX_FILES=5  # Limit to 5 files per request
export GEMINI_MAX_TOTAL_FILE_SIZE=100  # 100MB limit
export GEMINI_TEMPERATURE=0.7  # More creative responses
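
The way the documented defaults and the overrides above combine can be sketched in a few lines. This is not the server's own code (the server is a Node.js package); `load_gemini_config` is a hypothetical Python helper, and the range check on the temperature is an assumption based on the documented 0-2 range.

```python
import os

# Defaults as documented under "Optional Configuration".
DEFAULTS = {
    "GEMINI_MODEL": "gemini-2.5-flash",
    "GEMINI_TIMEOUT": 300000,            # milliseconds
    "GEMINI_MAX_OUTPUT_TOKENS": 8192,
    "GEMINI_MAX_FILES": 10,
    "GEMINI_MAX_TOTAL_FILE_SIZE": 50,    # MB
    "GEMINI_TEMPERATURE": 0.2,
}


def load_gemini_config(env=None):
    """Hypothetical helper: resolve settings from env, falling back to defaults."""
    env = os.environ if env is None else env
    if not env.get("GEMINI_API_KEY"):
        raise ValueError("GEMINI_API_KEY is required")
    config = {"GEMINI_API_KEY": env["GEMINI_API_KEY"]}
    for name, default in DEFAULTS.items():
        raw = env.get(name)
        # Environment values are strings; cast to the type of the default.
        config[name] = default if raw is None else type(default)(raw)
    # Assumption: values outside the documented 0-2 range are rejected.
    if not 0 <= config["GEMINI_TEMPERATURE"] <= 2:
        raise ValueError("GEMINI_TEMPERATURE must be between 0 and 2")
    return config
```

Any variable that is not set keeps its documented default, so exporting only GEMINI_API_KEY is enough to start.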

Available Tools

generate_content

Generates content using Gemini with comprehensive support for files, conversation history, and system prompts. Supports various file types including images, PDFs, Office documents, and text files.

Parameters:

user_prompt (string, required): User prompt for generation
system_prompt (string, optional): System prompt to guide AI behavior
files (array, optional): Array of files to include in generation
- Each file object must have either path or content
- path (string): Path to file
- content (string): Base64 encoded file content
- type (string, optional): MIME type (auto-detected from the file extension when omitted)
model (string, optional): Gemini model to use (default: gemini-2.5-flash)
temperature (number, optional): Temperature for generation (0-2, default: 0.2). Lower values produce more focused responses, higher values more creative ones
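
The parameter rules above can be expressed as a small pre-flight check on the client side. This is a minimal sketch, not the server's validation logic; the function name `validate_generate_content_args` and the wording of the messages are hypothetical.

```python
def validate_generate_content_args(args):
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    if not isinstance(args.get("user_prompt"), str):
        problems.append("user_prompt is required and must be a string")
    for i, file_entry in enumerate(args.get("files", [])):
        # Each file object must have either path or content.
        if "path" not in file_entry and "content" not in file_entry:
            problems.append(f"files[{i}] needs either path or content")
    temperature = args.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        problems.append("temperature must be between 0 and 2")
    return problems
```

An empty list means the arguments match the documented shape; it says nothing about whether the files exist or the model name is valid.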

Supported file types (Gemini 2.5 models):

Images: JPG, JPEG, PNG, GIF, WebP, SVG, BMP, TIFF
Video: MP4, AVI, MOV, WEBM, FLV, MPG, WMV (up to 10 files per request)
Audio: MP3, WAV, AIFF, AAC, OGG, FLAC (up to 15MB per file)
Documents: PDF (treated as images, one page = one image)
Text: TXT, MD, JSON, XML, CSV, HTML

File limitations:

Maximum file size: 15MB per audio/video/document file
Maximum total request size: 20MB (2GB when using Cloud Storage)
Video files: Up to 10 per request
PDF files follow image pricing (one page = one image)
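
As a rough illustration, the limits listed above could be checked before a request is sent. The sketch below is an assumption-laden example rather than the server's behavior: `check_file_limits` is a hypothetical name, files are classified by extension only, and 1MB is taken to be 1024 * 1024 bytes.

```python
import os

MB = 1024 * 1024  # assumption: the limits are binary megabytes

VIDEO_EXTS = {".mp4", ".avi", ".mov", ".webm", ".flv", ".mpg", ".wmv"}
AUDIO_EXTS = {".mp3", ".wav", ".aiff", ".aac", ".ogg", ".flac"}
DOCUMENT_EXTS = {".pdf"}


def check_file_limits(files):
    """files: list of (name, size_in_bytes). Returns a list of violations."""
    violations = []
    size_limited = VIDEO_EXTS | AUDIO_EXTS | DOCUMENT_EXTS
    video_count = 0
    total_size = 0
    for name, size in files:
        ext = os.path.splitext(name)[1].lower()
        total_size += size
        if ext in VIDEO_EXTS:
            video_count += 1
        # 15MB applies per audio/video/document file.
        if ext in size_limited and size > 15 * MB:
            violations.append(f"{name}: exceeds 15MB per-file limit")
    if video_count > 10:
        violations.append("more than 10 video files in one request")
    if total_size > 20 * MB:
        violations.append("total request size exceeds 20MB")
    return violations
```

The 2GB Cloud Storage case and the server-side GEMINI_MAX_FILES and GEMINI_MAX_TOTAL_FILE_SIZE settings are deliberately left out of this sketch.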

Basic example:

{
  "user_prompt": "Analyze this image and describe what you see",
  "files": [
    {
      "path": "/path/to/image.jpg"
    }
  ]
}
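
An MCP client normally sends these arguments for you. For orientation, the sketch below shows how the arguments object would be wrapped in a `tools/call` JSON-RPC request as defined by the Model Context Protocol. The helper name `build_tool_call` and the request id are illustrative, and a real session also needs the protocol's initialization handshake first, which is omitted here.

```python
import json


def build_tool_call(arguments, request_id=1):
    """Wrap generate_content arguments in an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "generate_content", "arguments": arguments},
    }


request = build_tool_call({
    "user_prompt": "Analyze this image and describe what you see",
    "files": [{"path": "/path/to/image.jpg"}],
})
# Over the stdio transport, each message is written as a single line of JSON.
line = json.dumps(request)
```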

PDF to Markdown conversion:

{
  "user_prompt": "Convert this PDF to well-formatted Markdown, preserving structure and formatting. Return only the Markdown content.",
  "files": [
    {
      "path": "/path/to/document.pdf"
    }
  ]
}

With system prompt:

{
  "system_prompt": "You are a helpful document analyst specialized in technical documentation",
  "user_prompt": "Please provide a detailed explanation of the authentication methods shown in this document",
  "files": [
    {"path": "/api-docs.pdf"}
  ]
}

Multiple files example:

{
  "user_prompt": "Compare these documents and images",
  "files": [
    {"path": "/document.pdf"},
    {"path": "/chart.png"},
    {"content": "base64encodedcontent", "type": "image/jpeg"}
  ]
}
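
The third entry above passes the file inline rather than by path. A minimal sketch of producing such an entry follows, assuming the bytes are already in memory; `inline_file_entry` is a hypothetical helper, and the MIME type is guessed with Python's standard `mimetypes` module, not with whatever detection the server uses for paths.

```python
import base64
import mimetypes


def inline_file_entry(filename, data):
    """Build a files[] entry that carries the bytes inline instead of a path."""
    entry = {"content": base64.b64encode(data).decode("ascii")}
    # Inline content has no path to inspect, so state the type when it is known.
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is not None:
        entry["type"] = mime_type
    return entry
```

Base64 encoding grows the payload by roughly a third, so a path is likely the better choice for large files when the server can read them directly.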

Common Use Cases

PDF to Markdown Conversion

To convert PDF files to Markdown format, use the generate_content tool with an appropriate prompt:

{
  "user_prompt": "Convert this PDF to well-formatted Markdown, preserving structure, headings, lists, and formatting. Include table of contents if the document has sections.",
  "files": [
    {
      "path": "/path/to/document.pdf"
    }
  ]
}

Image Analysis

Analyze images, charts, diagrams, or photos with detailed descriptions:

{
  "system_prompt": "You are an expert image analyst. Provide detailed, accurate descriptions of visual content.",
  "user_prompt": "Analyze this image and describe what you see. Include details about objects, people, text, colors, and composition.",
  "files": [
    {
      "path": "/path/to/image.jpg"
    }
  ]
}

For screenshots or technical diagrams:

{
  "user_prompt": "Describe this system architecture diagram. Explain the components and their relationships.",
  "files": [
    {
      "path": "/architecture-diagram.png"
    }
  ]
}

Audio Transcription

Generate transcripts from audio files:

{
  "system_prompt": "You are a professional transcription service. Provide accurate, well-formatted transcripts.",
  "user_prompt": "Please transcribe this audio file. Include speaker identification if multiple speakers are present, and format it with proper punctuation and paragraphs.",
  "files": [
    {
      "path": "/meeting-recording.mp3"
    }
  ]
}

For interview or meeting transcripts:

{
  "user_prompt": "Transcribe this interview and provide a summary of key points discussed.",
  "files": [
    {
      "path": "/interview.wav"
    }
  ]
}

MCP Client Configuration

Add this server to your MCP client configuration:

{
  "mcpServers": {
    "aistudio": {
      "command": "npx",
      "args": ["-y", "aistudio-mcp-server"],
      "env": {
        "GEMINI_API_KEY": "your_api_key_here",
        "GEMINI_MODEL": "gemini-2.5-flash",
        "GEMINI_TIMEOUT": "600000",
        "GEMINI_MAX_OUTPUT_TOKENS": "16384",
        "GEMINI_MAX_FILES": "10",
        "GEMINI_MAX_TOTAL_FILE_SIZE": "50",
        "GEMINI_TEMPERATURE": "0.2"
      }
    }
  }
}
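
If the client already has other servers configured, the entry above needs to be merged into the existing `mcpServers` object rather than replacing it. The sketch below illustrates one way to do that on an in-memory dictionary; `add_aistudio_server` is a hypothetical helper, and reading and writing the client's actual config file (whose location depends on the client) is left out.

```python
import copy


def add_aistudio_server(client_config, api_key, **overrides):
    """Return a copy of client_config with the aistudio server entry added."""
    updated = copy.deepcopy(client_config)
    env = {"GEMINI_API_KEY": api_key}
    # Environment values are strings in the JSON config, so cast overrides.
    env.update({name: str(value) for name, value in overrides.items()})
    updated.setdefault("mcpServers", {})["aistudio"] = {
        "command": "npx",
        "args": ["-y", "aistudio-mcp-server"],
        "env": env,
    }
    return updated
```

For example, `add_aistudio_server(config, "your_api_key_here", GEMINI_TIMEOUT=600000)` leaves other servers untouched and leaves every unspecified option at its documented default.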

Development

Setup

Make sure you have Node.js 20.0.0 or higher installed.

npm install
npm run build

Running locally

GEMINI_API_KEY=your_api_key npm run dev

License

MIT

