vertex-ai-media-master
Automate video processing, audio generation, image creation, and marketing campaigns using Google Vertex AI multimodal capabilities.
Install
mkdir -p .claude/skills/vertex-ai-media-master && curl -L -o skill.zip "https://mcp.directory/api/skills/download/99" && unzip -o skill.zip -d .claude/skills/vertex-ai-media-master && rm skill.zip
Installs to .claude/skills/vertex-ai-media-master
About this skill
Vertex AI Media Master - Comprehensive Multimodal AI Operations
This Agent Skill provides comprehensive mastery of Google Vertex AI multimodal capabilities for video, audio, image, and text processing with focus on marketing applications.
Core Capabilities
🎥 Video Processing (Gemini 2.0/2.5)
- Video Understanding: Process videos up to 6 hours at low resolution or 2 hours at default resolution
- 1M Context Window: Gemini 2.5 Pro accepts long video inputs within its 1M-token context window
- Audio Track Processing: Automatic audio transcription from video
- Multi-video Analysis: Process multiple videos in single request
- Video Summarization: Extract key moments, scenes, and insights
- Marketing Use Cases:
- Analyze competitor video ads
- Extract highlights from long-form content
- Generate video summaries for social media
- Transcribe and caption video content
- Identify brand mentions and product placements
🎵 Audio Generation & Processing
- Lyria Model (2025): Native audio and music generation
- Speech-to-Text: Transcribe audio with speaker diarization
- Text-to-Speech: Generate natural voiceovers
- Music Composition: Background music for campaigns
- Audio Enhancement: Noise reduction and quality improvement
- Marketing Use Cases:
- Generate podcast scripts and voiceovers
- Create audio ads and radio spots
- Produce background music for video campaigns
- Transcribe customer interviews
- Generate multilingual voiceovers
🖼️ Image Generation (Imagen 4 & Gemini 2.5 Flash Image)
- Imagen 4: Highest quality text-to-image generation
- Gemini 2.5 Flash Image: Interleaved image generation with text
- Style Transfer: Apply brand styles to generated images
- Product Visualization: Generate product mockups
- Campaign Assets: Create ad creatives and social media graphics
- Marketing Use Cases:
- Generate personalized ad images (Adios solution)
- Create social media graphics at scale
- Produce product lifestyle images
- Generate A/B test variations
- Create branded campaign visuals
📢 Marketing Campaign Automation
- ViGenAiR: Convert long-form video ads to short formats automatically
- Adios: Generate personalized ad images tailored to audience context
- Campaign Asset Generation: Photos, soundtracks, voiceovers from prompts
- Content Pipeline: Email copy, blog posts, social media, PMax assets
- Catalog Enrichment: Multi-agent workflow for product onboarding
- Marketing Use Cases:
- Automated campaign asset production
- Personalized content at scale
- Multi-channel content distribution
- Product catalog enhancement
- Visual merchandising automation
🔧 Technical Implementation
API Integration:
import vertexai
from vertexai.generative_models import GenerativeModel, Part
from vertexai.preview.vision_models import ImageGenerationModel
# Initialize Vertex AI
vertexai.init(project="your-project", location="us-central1")
# Gemini 2.5 Pro for video
model = GenerativeModel("gemini-2.5-pro")
# Reference a video stored in Cloud Storage (placeholder URI)
video_file = Part.from_uri("gs://your-bucket/video.mp4", mime_type="video/mp4")
# Process video with audio
response = model.generate_content([
    "Analyze this video and extract key marketing insights",
    video_file,  # Up to 6 hours at low resolution
])
print(response.text)
# Imagen 4 for image generation
# "imagen-4" is shorthand; use the exact model ID listed in the Imagen documentation
imagen = ImageGenerationModel.from_pretrained("imagen-4")
images = imagen.generate_images(
    prompt="Professional product photo, studio lighting, white background",
    number_of_images=4,
)
Gemini 2.5 Flash Image (Interleaved Generation):
# Generate images within text responses
model = GenerativeModel("gemini-2.5-flash-image")
response = model.generate_content([
    "Create a 5-step recipe with images for each step"
])
# Returns text + images interleaved
Audio Generation (Lyria):
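# Illustrative only: confirm the import path, class name, and model ID against the
# current Vertex AI SDK before use; this interface is not verified here.
from vertexai.preview.audio_models import AudioGenerationModel
lyria = AudioGenerationModel.from_pretrained("lyria")
audio = lyria.generate_audio(
    prompt="Upbeat background music for product launch video, 30 seconds",
    duration=30,
)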
📊 Marketing Workflow Automation
1. Multi-Channel Campaign Creation:
# One prompt drafts the whole campaign plan as text; images and audio are generated in separate calls
campaign = model.generate_content([
    """Create a product launch campaign for [product]:
    - Hero image (1920x1080)
    - 3 social media graphics (1080x1080)
    - 30-second video script
    - Background music description
    - Email marketing copy
    - Instagram caption"""
])
2. Video Repurposing Pipeline:
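# Long-form to short-form conversion (ViGenAiR approach)
from vertexai.generative_models import Part
long_video = Part.from_uri("gs://bucket/original-ad-60s.mp4", mime_type="video/mp4")
response = model.generate_content([
    "Extract 3 engaging 15-second clips from this video for TikTok/Reels",
    long_video,
])
# The response describes the suggested clips; cutting and reformatting the video is a separate step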
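One way to make the clip suggestions usable by a downstream editing step is to ask the model for JSON and validate it before cutting anything. The sketch below assumes a hypothetical response shape (a JSON list of objects with `start` and `end` in seconds). Neither that shape nor the helper name `parse_clips` comes from Vertex AI.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    start: float  # seconds from the beginning of the source video
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def parse_clips(response_text: str, max_seconds: float = 15.0,
                source_seconds: float = 60.0) -> list[Clip]:
    """Parse a JSON list of {"start": ..., "end": ...} and keep only valid clips."""
    clips = []
    for item in json.loads(response_text):
        clip = Clip(float(item["start"]), float(item["end"]))
        in_bounds = 0 <= clip.start < clip.end <= source_seconds
        if in_bounds and clip.duration <= max_seconds:
            clips.append(clip)
    return clips


# Hypothetical model output: the middle clip is too long and gets dropped.
sample = '[{"start": 0, "end": 15}, {"start": 20, "end": 50}, {"start": 45, "end": 58}]'
clips = parse_clips(sample)
print(clips)
```

Validating bounds and length up front keeps malformed suggestions away from the video-cutting tool.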
3. Personalized Ad Generation:
# Context-aware image generation (Adios approach)
for audience in audiences:
    ad_image = imagen.generate_images(
        prompt=f"Product ad for {product}, targeting {audience.demographics}, {audience.style_preference}",
        aspect_ratio="16:9",
    )
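The loop above leaves `audiences` and `product` undefined. A minimal sketch of how they might be modeled follows. The `Audience` class and `build_ad_prompt` helper are hypothetical names introduced for illustration, and no Vertex AI call is made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Audience:
    # Hypothetical fields mirroring the attributes used in the loop above.
    demographics: str
    style_preference: str


def build_ad_prompt(product: str, audience: Audience) -> str:
    """Assemble one image prompt from a product name and an audience profile."""
    parts = [
        f"Product ad for {product}",
        f"targeting {audience.demographics}",
        audience.style_preference,
    ]
    # Drop empty fields so the prompt never ends with a dangling comma.
    return ", ".join(p.strip() for p in parts if p.strip())


audiences = [
    Audience("urban commuters aged 25-34", "clean minimalist style"),
    Audience("outdoor enthusiasts", ""),
]
prompts = [build_ad_prompt("trail running shoes", a) for a in audiences]
for prompt in prompts:
    print(prompt)
```

Building the prompt in one function makes it easy to log or review exactly what was sent for each audience.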
🎯 Best Practices for Jeremy
1. Project Setup:
# Set environment variables
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_APPLICATION_CREDENTIALS="path/to/service-account.json"
# Install SDK
pip install --upgrade google-cloud-aiplatform
2. Rate Limits & Quotas:
- Gemini 2.5 Pro: 2M tokens/min (video processing)
- Imagen 4: 100 images/min
- Monitor usage in Cloud Console
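Staying under a per-minute quota can also be handled on the client side. The following is a rough sketch of a rolling-window limiter. The class name `MinuteRateLimiter` is hypothetical, and the limit you pass in should come from your project's actual quota rather than the figures listed here.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Allow at most `limit` calls in any rolling window (60 seconds by default)."""

    def __init__(self, limit: int, clock=time.monotonic, window: float = 60.0):
        self.limit = limit
        self.clock = clock
        self.window = window
        self._calls = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have left the rolling window.
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        if len(self._calls) < self.limit:
            return 0.0
        return self.window - (now - self._calls[0])

    def record(self) -> None:
        """Register that a call was just made."""
        self._calls.append(self.clock())


# Demo with a fake clock so the example runs instantly.
fake_now = [0.0]
limiter = MinuteRateLimiter(limit=2, clock=lambda: fake_now[0])
limiter.record()
fake_now[0] = 10.0
limiter.record()
blocked_for = limiter.wait_time()    # both slots used; the oldest call frees up at t=60
fake_now[0] = 60.0
allowed_after = limiter.wait_time()  # the first call has left the window
print(blocked_for, allowed_after)
```

In real use, call `time.sleep(limiter.wait_time())` before each request and `limiter.record()` right after sending it.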
3. Cost Optimization:
- Use Gemini 2.5 Flash for faster, cheaper operations
- Batch image generation requests
- Use context caching when the same video is analyzed repeatedly
- Use low-resolution video setting when appropriate
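For repeated analysis, a simple complement to the points above is an application-side result cache keyed by video URI and prompt. This is a different technique from caching inside the model service. `AnalysisCache` and `fake_analyze` are hypothetical names, and the analysis function is a stand-in for a real model call.

```python
import hashlib
from typing import Callable


class AnalysisCache:
    """Memoize analysis results by (video URI, prompt) so repeats skip the model call."""

    def __init__(self, analyze: Callable[[str, str], str]):
        self._analyze = analyze
        self._store: dict[str, str] = {}
        self.misses = 0

    @staticmethod
    def _key(video_uri: str, prompt: str) -> str:
        return hashlib.sha256(f"{video_uri}\n{prompt}".encode("utf-8")).hexdigest()

    def get(self, video_uri: str, prompt: str) -> str:
        key = self._key(video_uri, prompt)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._analyze(video_uri, prompt)
        return self._store[key]


def fake_analyze(video_uri: str, prompt: str) -> str:
    # Stand-in for a real model call.
    return f"summary of {video_uri}"


cache = AnalysisCache(fake_analyze)
first = cache.get("gs://bucket/ad.mp4", "Summarize")
second = cache.get("gs://bucket/ad.mp4", "Summarize")  # served from the cache
print(first, cache.misses)
```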
4. Security & Compliance:
- Keep API keys in Secret Manager, never in code
- Use service accounts with minimal permissions
- Enable VPC Service Controls to guard against data exfiltration, and pin a regional location for data residency
- Log all API calls for audit trails
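Keeping configuration out of source code can be as simple as reading it from the environment at startup and failing fast when it is missing. In this sketch, `load_settings` and the `VERTEX_LOCATION` variable are hypothetical. Only `GOOGLE_CLOUD_PROJECT` appears in the setup step above.

```python
import os


def load_settings(env=os.environ) -> dict:
    """Read project settings from the environment instead of hard-coding them."""
    project = env.get("GOOGLE_CLOUD_PROJECT")
    if not project:
        raise RuntimeError("GOOGLE_CLOUD_PROJECT is not set")
    return {
        "project": project,
        # Hypothetical variable name; the default matches the region used earlier.
        "location": env.get("VERTEX_LOCATION", "us-central1"),
    }


# A plain dict stands in for the real environment so the example is repeatable.
settings = load_settings({"GOOGLE_CLOUD_PROJECT": "demo-project"})
print(settings)
```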
🚀 Advanced Marketing Use Cases
1. Campaign Performance Analysis:
# Analyze competitor campaigns
from vertexai.generative_models import Part
competitor_videos = ["gs://bucket/competitor1.mp4", "gs://bucket/competitor2.mp4"]
analysis = model.generate_content([
    "Compare these competitor videos: themes, messaging, CTAs, production quality",
    *[Part.from_uri(uri, mime_type="video/mp4") for uri in competitor_videos],
])
2. Content Localization:
# Generate multilingual campaigns
for lang in ["en", "es", "fr", "de", "ja"]:
    localized_content = model.generate_content([
        f"Translate and culturally adapt this campaign for {lang} market:",
        campaign_brief,
        hero_image,
    ])
3. A/B Test Generation:
# Generate variations automatically
variations = []
for style in ["minimalist", "bold", "luxury", "playful"]:
    variation = imagen.generate_images(
        prompt=f"Product ad, {style} style, {brand_guidelines}",
        number_of_images=1,
    )
    variations.append(variation)
📚 Reference Documentation
Official Documentation:
- Vertex AI Multimodal: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/overview
- Gemini 2.5 Pro: https://cloud.google.com/vertex-ai/generative-ai/docs/models
- Imagen 4: https://cloud.google.com/vertex-ai/generative-ai/docs/image/overview
- Video Understanding: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/video-understanding
Marketing Solutions:
- GenAI for Marketing: https://github.com/GoogleCloudPlatform/genai-for-marketing
- ViGenAiR (video repurposing)
- Adios (personalized ad images)
Pricing:
- Gemini 2.5 Pro: $3.50/1M input tokens, $10.50/1M output tokens
- Imagen 4: $0.04/image
- Video processing: Included in Gemini token pricing
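A quick budget estimate can be derived from the listed prices. The sketch below copies the figures above as constants. They are assumptions to be checked against the current price sheet, and `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Prices copied from the list above; confirm them before relying on the result.
GEMINI_INPUT_PER_M = Decimal("3.50")    # USD per 1M input tokens
GEMINI_OUTPUT_PER_M = Decimal("10.50")  # USD per 1M output tokens
IMAGEN_PER_IMAGE = Decimal("0.04")      # USD per generated image


def estimate_cost(input_tokens: int, output_tokens: int, images: int) -> Decimal:
    """Return the estimated spend in USD, rounded to cents."""
    million = Decimal(1_000_000)
    text_cost = (GEMINI_INPUT_PER_M * input_tokens
                 + GEMINI_OUTPUT_PER_M * output_tokens) / million
    return (text_cost + IMAGEN_PER_IMAGE * images).quantize(Decimal("0.01"))


cost = estimate_cost(input_tokens=2_000_000, output_tokens=100_000, images=50)
print(cost)  # 7.00 input + 1.05 output + 2.00 images = 10.05
```

`Decimal` is used instead of floats so that currency arithmetic does not pick up binary rounding noise.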
When This Skill Activates
This skill automatically activates when you mention:
- Video processing, analysis, or understanding
- Audio generation, music composition, or voiceovers
- Image generation, ad creatives, or visual content
- Marketing campaigns, content automation, or asset production
- Gemini multimodal capabilities
- Vertex AI media operations
- Social media content, email marketing, or PMax campaigns
Integration with Other Tools
Google Cloud Services:
- Cloud Storage for media asset management
- BigQuery for campaign analytics
- Cloud Functions for automation triggers
- Vertex AI Pipelines for content workflows
Third-Party Integrations:
- Social media APIs (LinkedIn, Twitter, Instagram)
- Marketing automation platforms (HubSpot, Marketo)
- CMS integrations (WordPress, Contentful)
- DAM systems (Bynder, Cloudinary)
Success Metrics
Track These KPIs:
- Asset generation speed (baseline: 5 images/min)
- Content approval rate (target: >80%)
- Campaign personalization scale (target: 1000+ variants)
- Cost per asset (target: <$0.10/image)
- Time saved vs manual production (target: 90% reduction)
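These targets can be checked mechanically once the raw numbers are collected. The following sketch uses hypothetical names (`CampaignStats`, `kpi_report`) and made-up sample figures. The thresholds are the targets listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CampaignStats:
    assets_generated: int
    assets_approved: int
    total_cost_usd: float
    manual_hours: float      # time the same work would take by hand
    automated_hours: float   # time actually spent with automation


def kpi_report(stats: CampaignStats) -> dict:
    """Compute the KPIs and compare each against its target."""
    approval_rate = stats.assets_approved / stats.assets_generated
    cost_per_asset = stats.total_cost_usd / stats.assets_generated
    time_saved = 1 - stats.automated_hours / stats.manual_hours
    return {
        "approval_rate": approval_rate,
        "approval_ok": approval_rate > 0.80,   # target: >80%
        "cost_per_asset": cost_per_asset,
        "cost_ok": cost_per_asset < 0.10,      # target: <$0.10
        "time_saved": time_saved,
        "time_saved_ok": time_saved >= 0.90,   # target: 90% reduction
    }


# Sample figures for illustration only.
report = kpi_report(CampaignStats(1000, 850, 40.0, 100.0, 8.0))
print(report)
```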
This skill makes Jeremy a Vertex AI multimodal expert with instant access to video processing, audio generation, image creation, and marketing automation capabilities.
Prerequisites
- Access to project files in {baseDir}/
- Required tools and dependencies installed
- Understanding of skill functionality
- Permissions for file operations
Instructions
- Identify skill activation trigger and context
- Gather required inputs and parameters
- Execute skill workflow systematically
- Validate outputs meet requirements
- Handle errors and edge cases appropriately
- Provide clear results and next steps
Output
- Primary deliverables based on skill purpose
- Status indicators and success metrics
- Generated files or configurations
- Reports and summaries as applicable
- Recommendations for follow-up actions
Error Handling
If execution fails:
- Verify prerequisites are met
- Check input parameters and formats
- Validate file paths and permissions
- Review error messages for root cause
- Consult documentation for troubleshooting
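Transient failures are often worth retrying before working through the checklist above. A minimal sketch of retry with exponential backoff follows. `call_with_retry` is a hypothetical helper, and production code would normally catch only error types known to be transient rather than every exception.

```python
import time


def call_with_retry(fn, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(); after each failure wait base_delay * 2**n, re-raising the last error."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)


# Demo: a function that fails twice and then succeeds. Delays are recorded
# instead of slept so the example finishes immediately.
calls = {"n": 0}
delays = []


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = call_with_retry(flaky, sleep=delays.append)
print(result, delays)
```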
Resources
- Official documentation for related tools
- Best practices guides
- Example use cases and templates
- Community forums and support channels