vertex-ai-media-master

Name: vertex-ai-media-master
Author: jeremylongshore

Automate video processing, audio generation, image creation, and marketing campaigns using Google Vertex AI multimodal capabilities.

Install

mkdir -p .claude/skills/vertex-ai-media-master && curl -L -o skill.zip "https://mcp.directory/api/skills/download/99" && unzip -o skill.zip -d .claude/skills/vertex-ai-media-master && rm skill.zip

Installs to .claude/skills/vertex-ai-media-master

About this skill

Vertex AI Media Master - Comprehensive Multimodal AI Operations

This Agent Skill provides comprehensive mastery of Google Vertex AI multimodal capabilities for video, audio, image, and text processing with focus on marketing applications.

Core Capabilities

🎥 Video Processing (Gemini 2.0/2.5)

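Video Understanding: Process long videos; a 1M-token context window holds about 1 hour of video at default media resolution or about 3 hours at low resolution (roughly double that with a 2M-token window)
Long Context Window: Gemini 2.5 Pro accepts about 1M input tokens, enough for long-form video content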
Audio Track Processing: Automatic audio transcription from video
Multi-video Analysis: Process multiple videos in single request
Video Summarization: Extract key moments, scenes, and insights
Marketing Use Cases:
- Analyze competitor video ads
- Extract highlights from long-form content
- Generate video summaries for social media
- Transcribe and caption video content
- Identify brand mentions and product placements
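Whether a video fits in one request comes down to its token cost. The sketch below estimates it, assuming the approximations in Google's Gemini video documentation of roughly 300 tokens per second of video at default media resolution and about 100 at low resolution (frames plus audio). The function names and the 8,192-token reserve for prompt and output are illustrative choices, not part of any SDK.

```python
# Approximate token rates per second of video (frames + audio), per Google's docs.
# Treat these as planning estimates, not billing-exact numbers.
TOKENS_PER_SECOND = {"default": 300, "low": 100}


def estimate_video_tokens(duration_s: float, resolution: str = "default") -> int:
    """Rough token count for a video of the given length."""
    return round(duration_s * TOKENS_PER_SECOND[resolution])


def max_video_minutes(context_tokens: int, resolution: str = "default",
                      reserved_tokens: int = 0) -> float:
    """Longest video (in minutes) that fits, leaving room for prompt and output."""
    usable = context_tokens - reserved_tokens
    return usable / TOKENS_PER_SECOND[resolution] / 60


def fits_in_context(duration_s: float, context_tokens: int,
                    resolution: str = "default", reserved_tokens: int = 8_192) -> bool:
    """True if the video plus a reserve for prompt/output fits in the window."""
    return estimate_video_tokens(duration_s, resolution) + reserved_tokens <= context_tokens


if __name__ == "__main__":
    for ctx in (1_048_576, 2_097_152):
        for res in ("default", "low"):
            print(f"{ctx:>9} tokens, {res:>7}: ~{max_video_minutes(ctx, res):.0f} min")
```

For a 1M-token window this works out to just under an hour at default resolution and just under three hours at low resolution, which is why switching resolution is the first lever for very long videos.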

🎵 Audio Generation & Processing

Lyria (lyria-002): Instrumental music generation from text prompts
Speech-to-Text: Transcribe audio with speaker diarization
Text-to-Speech: Generate natural voiceovers
Music Composition: Background music for campaigns
Audio Enhancement: Noise reduction and quality improvement
Marketing Use Cases:
- Generate podcast scripts and voiceovers
- Create audio ads and radio spots
- Produce background music for video campaigns
- Transcribe customer interviews
- Generate multilingual voiceovers
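Speaker diarization labels each recognized word with a speaker, and interview transcripts read better once those words are collapsed into turns. A minimal sketch, assuming the words have already been pulled out of the transcription result into hypothetical `(speaker_label, word)` pairs; Speech-to-Text exposes per-word speaker labels, but the exact field names differ between API versions, so that mapping step is left to you.

```python
from itertools import groupby
from typing import Iterable


def to_speaker_turns(words: Iterable[tuple[str, str]]) -> list[str]:
    """Collapse consecutive words from the same speaker into one transcript line.

    `words` is a hypothetical stream of (speaker_label, word) pairs taken from a
    diarized transcription result.
    """
    turns = []
    # groupby only merges *adjacent* items, which is exactly a speaker turn.
    for speaker, group in groupby(words, key=lambda pair: pair[0]):
        text = " ".join(word for _, word in group)
        turns.append(f"Speaker {speaker}: {text}")
    return turns


if __name__ == "__main__":
    sample = [("1", "Thanks"), ("1", "for"), ("1", "joining."),
              ("2", "Happy"), ("2", "to"), ("2", "help."), ("1", "Great.")]
    print("\n".join(to_speaker_turns(sample)))
```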

🖼️ Image Generation (Imagen 4 & Gemini 2.5 Flash Image)

Imagen 4: Highest quality text-to-image generation
Gemini 2.5 Flash Image: Interleaved image generation with text
Style Transfer: Apply brand styles to generated images
Product Visualization: Generate product mockups
Campaign Assets: Create ad creatives and social media graphics
Marketing Use Cases:
- Generate personalized ad images (Adios solution)
- Create social media graphics at scale
- Produce product lifestyle images
- Generate A/B test variations
- Create branded campaign visuals
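Imagen generates at a fixed set of aspect ratios rather than arbitrary pixel sizes, so channel specs such as 1920x1080 or 1080x1350 have to be mapped to the nearest supported ratio and resized or cropped afterwards. A sketch, assuming the ratio list 1:1, 3:4, 4:3, 9:16 and 16:9; verify that list against the documentation for the Imagen version you use.

```python
from fractions import Fraction

# Aspect ratios accepted by Imagen's aspect_ratio parameter (assumed list;
# check the docs for your model version).
SUPPORTED_RATIOS = ["1:1", "3:4", "4:3", "9:16", "16:9"]


def closest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported ratio whose width/height is nearest the target size."""
    target = Fraction(width, height)

    def distance(ratio: str) -> Fraction:
        w, h = (int(part) for part in ratio.split(":"))
        return abs(Fraction(w, h) - target)

    return min(SUPPORTED_RATIOS, key=distance)


if __name__ == "__main__":
    for size in [(1920, 1080), (1080, 1080), (1080, 1920), (1200, 628)]:
        print(size, "->", closest_aspect_ratio(*size))
```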

📢 Marketing Campaign Automation

ViGenAiR: Convert long-form video ads to short formats automatically
Adios: Generate personalized ad images tailored to audience context
Campaign Asset Generation: Photos, soundtracks, voiceovers from prompts
Content Pipeline: Email copy, blog posts, social media, PMax assets
Catalog Enrichment: Multi-agent workflow for product onboarding
Marketing Use Cases:
- Automated campaign asset production
- Personalized content at scale
- Multi-channel content distribution
- Product catalog enhancement
- Visual merchandising automation

🔧 Technical Implementation

API Integration:

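from google.cloud import aiplatform
from vertexai.generative_models import GenerativeModel, Part
from vertexai.preview.vision_models import ImageGenerationModel

# Initialize Vertex AI
# Note: Google has deprecated the generative AI modules of the Vertex AI SDK in favor of
# the Google Gen AI SDK (google-genai); new projects may prefer that SDK.
aiplatform.init(project="your-project", location="us-central1")

# Gemini 2.5 Pro for video
model = GenerativeModel("gemini-2.5-pro")

# Reference the video in Cloud Storage; Gemini also processes its audio track
video_file = Part.from_uri("gs://bucket/campaign-video.mp4", mime_type="video/mp4")

# Process video with audio (maximum length depends on context window and media resolution)
response = model.generate_content([
    "Analyze this video and extract key marketing insights",
    video_file,
])
print(response.text)

# Imagen 4 for image generation
# Model IDs differ between preview and GA releases; confirm the current ID in Model Garden
imagen = ImageGenerationModel.from_pretrained("imagen-4.0-generate-001")
images = imagen.generate_images(
    prompt="Professional product photo, studio lighting, white background",
    number_of_images=4,
)
images.images[0].save(location="product-photo-1.png")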
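`Part.from_uri` needs a MIME type for every Cloud Storage object. When a pipeline handles mixed media, a small helper can derive it from the file extension with Python's standard `mimetypes` module. A sketch; the helper name and the allow-list of media prefixes are illustrative, and the formats Gemini actually accepts are listed in the Vertex AI documentation.

```python
import mimetypes

# Media families this pipeline sends to Gemini (illustrative allow-list).
ALLOWED_PREFIXES = ("video/", "audio/", "image/")


def media_mime_type(gcs_uri: str) -> str:
    """Guess the MIME type of a gs:// object from its file extension."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {gcs_uri!r}")
    mime, _ = mimetypes.guess_type(gcs_uri)
    if mime is None or not mime.startswith(ALLOWED_PREFIXES):
        raise ValueError(f"unsupported or unknown media type for {gcs_uri!r}")
    return mime


if __name__ == "__main__":
    print(media_mime_type("gs://bucket/campaign-video.mp4"))
```

With the Vertex AI SDK the result would feed `Part.from_uri(uri, mime_type=media_mime_type(uri))`.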

Gemini 2.5 Flash Image (Interleaved Generation):

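# Generate images within text responses
model = GenerativeModel("gemini-2.5-flash-image")
response = model.generate_content([
    "Create a 5-step recipe with images for each step"
])
# Returns text and images interleaved as separate parts: text parts carry .text,
# image parts carry inline_data (mime_type + bytes), so iterate
# response.candidates[0].content.parts rather than reading response.text.
# If only text comes back, request image output explicitly
# (the google-genai SDK uses response_modalities=["TEXT", "IMAGE"]).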
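Interleaved output arrives as an ordered list of parts, some carrying text and some carrying inline image bytes. A sketch of separating them while keeping their order, assuming each part exposes `.text` or `.inline_data` (with `.mime_type` and `.data`), as parts do in Google's Gen AI and Vertex AI SDKs. The `InterleavedResult` container is hypothetical, and the demo uses `SimpleNamespace` stand-ins instead of real API objects.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class InterleavedResult:
    text_blocks: list = field(default_factory=list)
    images: list = field(default_factory=list)   # (mime_type, bytes) tuples
    order: list = field(default_factory=list)    # sequence of "text" / "image"


def split_interleaved(parts) -> InterleavedResult:
    """Separate text and image parts while recording their original order."""
    result = InterleavedResult()
    for part in parts:
        # getattr with a default also covers SDK parts that raise AttributeError.
        text = getattr(part, "text", None)
        if text:
            result.text_blocks.append(text)
            result.order.append("text")
            continue
        blob = getattr(part, "inline_data", None)
        if blob is not None and getattr(blob, "data", None):
            result.images.append((blob.mime_type, blob.data))
            result.order.append("image")
    return result


if __name__ == "__main__":
    fake_parts = [
        SimpleNamespace(text="Step 1: Whisk the eggs.", inline_data=None),
        SimpleNamespace(text=None,
                        inline_data=SimpleNamespace(mime_type="image/png", data=b"\x89PNG...")),
    ]
    r = split_interleaved(fake_parts)
    print(r.order, len(r.images))
```

With a real response you would pass `response.candidates[0].content.parts` and write each image's bytes to disk in step order.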

Audio Generation (Lyria):

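# Lyria is documented as a publisher model (lyria-002) called through the predict endpoint
from google.cloud import aiplatform_v1
from google.protobuf import json_format, struct_pb2

client = aiplatform_v1.PredictionServiceClient(
    client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"})
lyria = "projects/your-project/locations/us-central1/publishers/google/models/lyria-002"
prompt = {"prompt": "Upbeat background music for a product launch video"}
# Clip length is set by the model (about 30 seconds per clip); there is no duration argument
audio = client.predict(endpoint=lyria, instances=[json_format.ParseDict(prompt, struct_pb2.Value())])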
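Generated audio has to be decoded and saved before it can go into an edit. A sketch, assuming the response carries base64-encoded WAV data in each prediction under a `bytesBase64Encoded` key (an assumption to verify against the Lyria API reference) and that each prediction has already been converted to a plain dict. The helper `save_wav_prediction` is hypothetical; it uses only the standard `base64` and `wave` modules to check the header and report the clip length.

```python
import base64
import io
import wave
from pathlib import Path


def save_wav_prediction(prediction: dict, out_path: str,
                        field_name: str = "bytesBase64Encoded") -> float:
    """Decode base64 WAV audio from one prediction, write it, and return its length in seconds.

    `field_name` is an assumed key; confirm it against the Lyria response schema.
    """
    audio_bytes = base64.b64decode(prediction[field_name])
    # A WAV file starts with a RIFF header whose format tag is "WAVE".
    if audio_bytes[:4] != b"RIFF" or audio_bytes[8:12] != b"WAVE":
        raise ValueError("payload is not a WAV file")
    Path(out_path).write_bytes(audio_bytes)
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


if __name__ == "__main__":
    # Build a 1-second silent 48 kHz WAV locally to stand in for a real prediction.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(48_000)
        w.writeframes(b"\x00\x00" * 48_000)
    fake_prediction = {"bytesBase64Encoded": base64.b64encode(buf.getvalue()).decode()}
    print(save_wav_prediction(fake_prediction, "demo.wav"))
```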

📊 Marketing Workflow Automation

1. Multi-Channel Campaign Creation:

# One text prompt drafts every asset brief plus all of the copy;
# the images and music themselves still come from Imagen and Lyria
campaign = model.generate_content([
    """Create a product launch campaign for [product]:
    - Hero image brief (1920x1080)
    - 3 social media graphic briefs (1080x1080)
    - 30-second video script
    - Background music description
    - Email marketing copy
    - Instagram caption"""
])
print(campaign.text)
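Free-form text is hard to route into downstream tools. One option is to ask Gemini for JSON (for example by setting `response_mime_type="application/json"` in the generation config) and validate the result before anything is published. A sketch with a hypothetical key set that mirrors the campaign prompt; the schema is illustrative, not a Vertex AI format.

```python
import json

# Hypothetical schema mirroring the campaign prompt: key -> expected Python type.
REQUIRED_KEYS = {
    "hero_image_brief": str,
    "social_graphic_briefs": list,
    "video_script": str,
    "music_description": str,
    "email_copy": str,
    "instagram_caption": str,
}


def parse_campaign(raw_json: str, social_count: int = 3) -> dict:
    """Parse the model's JSON output and raise ValueError listing every problem found."""
    data = json.loads(raw_json)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in data:
            problems.append(f"missing {key}")
        elif not isinstance(data[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    briefs = data.get("social_graphic_briefs")
    if isinstance(briefs, list) and len(briefs) != social_count:
        problems.append(f"expected {social_count} social briefs, got {len(briefs)}")
    if problems:
        raise ValueError("; ".join(problems))
    return data
```

Collecting every problem before raising makes it easy to send one corrective follow-up prompt instead of several.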

2. Video Repurposing Pipeline:

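# Long-form to short-form conversion (ViGenAiR approach)
from vertexai.generative_models import Part

long_video = Part.from_uri("gs://bucket/original-ad-60s.mp4", mime_type="video/mp4")
response = model.generate_content([
    "Suggest 3 engaging 15-second clips from this video for TikTok/Reels. "
    "Give start and end timestamps (MM:SS) and a one-line hook for each.",
    long_video,
])
# Gemini returns timestamps and rationale, not edited files;
# cut and reframe the clips with a video tool such as ffmpeg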
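Gemini can propose clip boundaries, but a pipeline still has to cut the files. A sketch that pulls `MM:SS-MM:SS` ranges out of the model's reply (assuming the prompt asks for timestamps in that form) and builds ffmpeg command lines with ffmpeg's standard `-ss`, `-i`, `-t` and `-c copy` options. It only constructs the commands; the source must be a local copy of the video, and the output names are illustrative.

```python
import re

# Matches ranges such as "00:12-00:27" or "0:30 - 0:45".
RANGE = re.compile(r"(\d{1,2}):(\d{2})\s*-\s*(\d{1,2}):(\d{2})")


def parse_clip_ranges(reply: str) -> list[tuple[int, int]]:
    """Extract (start_s, end_s) pairs from the model's text, skipping inverted ranges."""
    ranges = []
    for m1, s1, m2, s2 in RANGE.findall(reply):
        start, end = int(m1) * 60 + int(s1), int(m2) * 60 + int(s2)
        if end > start:
            ranges.append((start, end))
    return ranges


def ffmpeg_cut_commands(src: str, ranges, max_len: int = 15) -> list[list[str]]:
    """Build one ffmpeg command per clip, trimming clips longer than max_len seconds."""
    commands = []
    for i, (start, end) in enumerate(ranges, start=1):
        duration = min(end - start, max_len)
        commands.append([
            "ffmpeg", "-ss", str(start), "-i", src,
            "-t", str(duration), "-c", "copy", f"clip_{i:02d}.mp4",
        ])
    return commands
```

Stream copy (`-c copy`) is fast but snaps cuts to keyframes; drop it and re-encode for frame-accurate cuts, or when adding a crop filter to reframe 16:9 footage for 9:16 placements.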

3. Personalized Ad Generation:

# Context-aware image generation (Adios approach)
# `audiences` and `product` are placeholders for your own segment data
for audience in audiences:
    ad_image = imagen.generate_images(
        prompt=f"Product ad for {product}, targeting {audience.demographics}, {audience.style_preference}",
        aspect_ratio="16:9",
    )
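The personalization loop expects `audiences` objects with `demographics` and `style_preference` attributes. One way to model them, sketched here with a hypothetical `Audience` dataclass, is to build and review every prompt before spending any Imagen calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Audience:
    """Hypothetical audience segment used to personalize ad prompts."""
    name: str
    demographics: str
    style_preference: str


def build_ad_prompts(product: str, audiences: list[Audience]) -> dict[str, str]:
    """Map each segment name to the Imagen prompt that would be sent for it."""
    return {
        a.name: f"Product ad for {product}, targeting {a.demographics}, {a.style_preference}"
        for a in audiences
    }


if __name__ == "__main__":
    segments = [
        Audience("students", "college students aged 18-24", "bright, casual, outdoors"),
        Audience("parents", "parents of young children", "warm, home setting"),
    ]
    for name, prompt in build_ad_prompts("a reusable water bottle", segments).items():
        print(f"{name}: {prompt}")
```

Keeping prompt construction separate from the API call also makes the prompts easy to log and A/B test.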

🎯 Best Practices for Jeremy

1. Project Setup:

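# Set environment variables
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_CLOUD_LOCATION="us-central1"

# Authenticate with Application Default Credentials (preferred over downloaded key files)
gcloud auth application-default login
# export GOOGLE_APPLICATION_CREDENTIALS="path/to/service-account.json"  # only if ADC is unavailable

# Install SDKs (google-genai also works against Vertex AI)
pip install --upgrade google-cloud-aiplatform google-genai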
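A quick preflight check catches missing configuration before a long batch job starts. A sketch in Python; which variables count as required is an assumption, and when credentials come from `gcloud auth application-default login` the key-file variable is not needed at all.

```python
import os
from typing import Mapping

REQUIRED_VARS = ("GOOGLE_CLOUD_PROJECT",)
OPTIONAL_FILE_VARS = ("GOOGLE_APPLICATION_CREDENTIALS",)


def preflight(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return human-readable problems; an empty list means the basics are in place."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    for name in OPTIONAL_FILE_VARS:
        path = env.get(name)
        # Only complain about the key file if the variable is set but points nowhere.
        if path and not os.path.isfile(path):
            problems.append(f"{name} points to a missing file: {path}")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("OK" if not issues else "\n".join(issues))
```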

2. Rate Limits & Quotas:

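Quotas vary by project, region, and model; check the Quotas page instead of assuming fixed numbers
Gemini 2.x on Vertex AI uses dynamic shared quota, so 429 (resource exhausted) errors can occur under load; retry them with exponential backoff
Imagen 4: per-project, per-region request quota; request an increase before high-volume campaigns
Monitor usage in Cloud Console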
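Quota errors are expected under load, and Google's guidance for 429 responses is to retry with exponential backoff. A generic sketch: `QuotaExceeded` is a stand-in for whatever your client raises (for libraries built on `google-api-core`, that is `google.api_core.exceptions.ResourceExhausted`), and `sleep` is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Stand-in for the client library's 429 / resource-exhausted error."""


def with_backoff(call: Callable[[], T], max_attempts: int = 5, base_delay: float = 1.0,
                 max_delay: float = 60.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on QuotaExceeded with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except QuotaExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # jitter spreads out synchronized retries
    raise RuntimeError("unreachable")  # the loop always returns or re-raises
```

Full jitter keeps many parallel workers from retrying in lockstep after a shared quota spike.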

3. Cost Optimization:

Use Gemini 2.5 Flash for faster, cheaper operations
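Batch image generation: request several images per call (up to 4 for most Imagen models) instead of one at a time
Use context caching for videos you analyze repeatedly instead of resending them with every prompt
Use low media resolution when fine visual detail isn't needed; it cuts video tokens to roughly a third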
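Because each Imagen request returns only a few images, batching in practice means grouping work into as few requests as possible. A sketch that plans `number_of_images` values for a target total; the cap of 4 is a parameter because some Imagen variants allow fewer per request.

```python
def plan_image_requests(total_images: int, per_request_max: int = 4) -> list[int]:
    """Split a total image count into per-request `number_of_images` values."""
    if total_images < 0 or per_request_max < 1:
        raise ValueError("total_images must be >= 0 and per_request_max >= 1")
    full, remainder = divmod(total_images, per_request_max)
    return [per_request_max] * full + ([remainder] if remainder else [])


if __name__ == "__main__":
    print(plan_image_requests(10))
```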

4. Security & Compliance:

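Keep any API keys or service-account keys in Secret Manager, never in code
Use service accounts with minimal permissions
Enable VPC Service Controls to guard against data exfiltration, and use regional endpoints for data residency
Enable Cloud Audit Logs (including Data Access logs) for Vertex AI to keep an audit trail of API calls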
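Platform audit logs show who called which API; an application-level record can add campaign context on top. A sketch that emits one JSON line per call and stores a SHA-256 digest of the prompt instead of the raw text, so customer data does not land in logs. The field names are illustrative.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(model: str, prompt: str, user: str, campaign_id: str) -> str:
    """Return a JSON line describing one model call without storing the prompt text."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "user": user,
        "campaign_id": campaign_id,
        # The digest lets you match a logged call to a known prompt without exposing it.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt_chars": len(prompt),
    }, sort_keys=True)


if __name__ == "__main__":
    print(audit_record("gemini-2.5-pro", "Analyze this video", "svc-marketing", "launch-q3"))
```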

🚀 Advanced Marketing Use Cases

1. Campaign Performance Analysis:

# Analyze competitor campaigns
from vertexai.generative_models import Part

competitor_videos = [
    Part.from_uri(uri, mime_type="video/mp4")
    for uri in ["gs://bucket/competitor1.mp4", "gs://bucket/competitor2.mp4"]
]
analysis = model.generate_content([
    "Compare these competitor videos: themes, messaging, CTAs, production quality",
    *competitor_videos,
])
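Gemini caps how many videos a single prompt may contain, and the total still has to fit in the context window, so a large competitive set has to be split across requests and the partial analyses merged afterwards. A sketch of the splitting step; the default of 10 videos per request is an assumption to check against the documented limit for your model.

```python
from typing import Iterator, Sequence


def chunk_uris(uris: Sequence[str], max_per_request: int = 10) -> Iterator[list[str]]:
    """Yield successive groups of at most `max_per_request` video URIs."""
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    for start in range(0, len(uris), max_per_request):
        yield list(uris[start:start + max_per_request])


if __name__ == "__main__":
    videos = [f"gs://bucket/competitor{i}.mp4" for i in range(1, 24)]
    for batch in chunk_uris(videos):
        print(len(batch), batch[0])
```

A final text-only request can then compare the per-batch summaries.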

2. Content Localization:

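# Generate multilingual campaigns
# `campaign_brief` (text) and `hero_image` (a Part, e.g. from Part.from_uri) come from earlier steps
localized = {}
for lang in ["en", "es", "fr", "de", "ja"]:
    localized[lang] = model.generate_content([
        f"Translate and culturally adapt this campaign for the {lang} market:",
        campaign_brief,
        hero_image,
    ]).text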

3. A/B Test Generation:

# Generate variations automatically
variations = []
for style in ["minimalist", "bold", "luxury", "playful"]:
    variation = imagen.generate_images(
        prompt=f"Product ad, {style} style, {brand_guidelines}",
        number_of_images=1
    )
    variations.append(variation)
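Generating variations is only half of an A/B test; each viewer should also see the same variation every time. A common approach, sketched here, hashes a stable user ID to pick a variant deterministically. The variant names match the style loop, and the salt is a hypothetical per-experiment string that reshuffles assignments when it changes.

```python
import hashlib
from collections import Counter
from typing import Sequence

VARIANTS = ("minimalist", "bold", "luxury", "playful")


def assign_variant(user_id: str, experiment_salt: str = "launch-ab-1",
                   variants: Sequence[str] = VARIANTS) -> str:
    """Deterministically map a user to a variant via a salted SHA-256 hash."""
    digest = hashlib.sha256(f"{experiment_salt}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    counts = Counter(assign_variant(f"user-{i}") for i in range(10_000))
    print(counts)
```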

📚 Reference Documentation

Official Documentation:

Vertex AI Multimodal: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/overview
Gemini 2.5 Pro: https://cloud.google.com/vertex-ai/generative-ai/docs/models
Imagen 4: https://cloud.google.com/vertex-ai/generative-ai/docs/image/overview
Video Understanding: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/video-understanding

Marketing Solutions:

GenAI for Marketing: https://github.com/GoogleCloudPlatform/genai-for-marketing
ViGenAiR (video repurposing)
Adios (personalized ad images)

Pricing:

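Prices change often; confirm them on the Vertex AI pricing page before budgeting.
Gemini 2.5 Pro: $1.25/1M input tokens and $10/1M output tokens for prompts up to 200K tokens ($2.50 and $15 above 200K)
Imagen 4: $0.04/image (Imagen 4 Fast is cheaper, Imagen 4 Ultra more expensive)
Video processing: billed as Gemini input tokens (roughly 300 tokens per second of video at default media resolution)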
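A back-of-the-envelope cost model helps decide between default and low media resolution. A sketch with prices passed in as parameters, since they change. The token rates are the approximations from Google's video documentation (about 300 tokens per second at default resolution, 100 at low), output tokens are billed separately, and the function ignores the higher tier above 200K input tokens.

```python
TOKENS_PER_SECOND = {"default": 300, "low": 100}


def video_analysis_cost(duration_s: float, output_tokens: int, input_price_per_m: float,
                        output_price_per_m: float, resolution: str = "default",
                        prompt_tokens: int = 0) -> float:
    """Estimated USD cost of one video-analysis request (single pricing tier)."""
    input_tokens = duration_s * TOKENS_PER_SECOND[resolution] + prompt_tokens
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


if __name__ == "__main__":
    # Example prices only; confirm current rates on the pricing page.
    for res in ("default", "low"):
        cost = video_analysis_cost(10 * 60, 2_000, 1.25, 10.0, resolution=res)
        print(f"10-minute video, {res} resolution: ${cost:.3f}")
```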

When This Skill Activates

This skill automatically activates when you mention:

Video processing, analysis, or understanding
Audio generation, music composition, or voiceovers
Image generation, ad creatives, or visual content
Marketing campaigns, content automation, or asset production
Gemini multimodal capabilities
Vertex AI media operations
Social media content, email marketing, or PMax campaigns

Integration with Other Tools

Google Cloud Services:

Cloud Storage for media asset management
BigQuery for campaign analytics
Cloud Functions for automation triggers
Vertex AI Pipelines for content workflows

Third-Party Integrations:

Social media APIs (LinkedIn, Twitter, Instagram)
Marketing automation platforms (HubSpot, Marketo)
CMS integrations (WordPress, Contentful)
DAM systems (Bynder, Cloudinary)

Success Metrics

Track These KPIs:

Asset generation speed (baseline: 5 images/min)
Content approval rate (target: >80%)
Campaign personalization scale (target: 1000+ variants)
Cost per asset (target: <$0.10/image)
Time saved vs manual production (target: 90% reduction)
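These KPIs can be computed from a simple per-asset log. A sketch, assuming each record is a hypothetical dict with `cost_usd`, `approved`, and `seconds` (generation time) fields, and that you supply an estimate of manual production time per asset; collecting those fields is up to your pipeline.

```python
from typing import Iterable, Mapping


def campaign_kpis(records: Iterable[Mapping], manual_minutes_per_asset: float) -> dict:
    """Summarize approval rate, cost per asset, throughput, and time saved."""
    rows = list(records)
    if not rows:
        raise ValueError("no asset records")
    if manual_minutes_per_asset <= 0:
        raise ValueError("manual_minutes_per_asset must be positive")
    n = len(rows)
    gen_seconds = sum(r["seconds"] for r in rows)
    manual_seconds = manual_minutes_per_asset * 60 * n
    return {
        "assets": n,
        "approval_rate": sum(1 for r in rows if r["approved"]) / n,
        "cost_per_asset": sum(r["cost_usd"] for r in rows) / n,
        "assets_per_minute": n / (gen_seconds / 60) if gen_seconds else float("inf"),
        "time_saved_pct": 100 * (1 - gen_seconds / manual_seconds),
    }
```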

This skill makes Jeremy a Vertex AI multimodal expert with instant access to video processing, audio generation, image creation, and marketing automation capabilities.

Prerequisites

Access to project files in {baseDir}/
Required tools and dependencies installed
Understanding of skill functionality
Permissions for file operations

Instructions

Identify skill activation trigger and context
Gather required inputs and parameters
Execute skill workflow systematically
Validate outputs meet requirements
Handle errors and edge cases appropriately
Provide clear results and next steps

Output

Primary deliverables based on skill purpose
Status indicators and success metrics
Generated files or configurations
Reports and summaries as applicable
Recommendations for follow-up actions

Error Handling

If execution fails:

Verify prerequisites are met
Check input parameters and formats
Validate file paths and permissions
Review error messages for root cause
Consult documentation for troubleshooting

Resources

Official documentation for related tools
Best practices guides
