axiom-vision-ref

Name: axiom-vision-ref
Author: CharlesWiltgen

Vision framework API, VNDetectHumanHandPoseRequest, VNDetectHumanBodyPoseRequest, person segmentation, face detection, VNImageRequestHandler, recognized points, joint landmarks, VNRecognizeTextRequest, VNDetectBarcodesRequest, DataScannerViewController, VNDocumentCameraViewController, RecognizeDocumentsRequest

Install

mkdir -p .claude/skills/axiom-vision-ref && curl -L -o skill.zip "https://mcp.directory/api/skills/download/4362" && unzip -o skill.zip -d .claude/skills/axiom-vision-ref && rm skill.zip

Installs to .claude/skills/axiom-vision-ref

About this skill

Vision Framework API Reference

Comprehensive reference for Vision framework computer vision: subject segmentation, hand/body pose detection, person detection, face analysis, text recognition (OCR), barcode detection, and document scanning.

When to Use This Reference

Implementing subject lifting using VisionKit or Vision
Detecting hand/body poses for gesture recognition or fitness apps
Segmenting people from backgrounds or separating multiple individuals
Face detection and landmarks for AR effects or authentication
Combining Vision APIs to solve complex computer vision problems
Looking up specific API signatures and parameter meanings
Recognizing text in images (OCR) with VNRecognizeTextRequest
Detecting barcodes and QR codes with VNDetectBarcodesRequest
Building live scanners with DataScannerViewController
Scanning documents with VNDocumentCameraViewController
Extracting structured document data with RecognizeDocumentsRequest (iOS 26+)

Related skills: See axiom-vision for decision trees and patterns, axiom-vision-diag for troubleshooting

Vision Framework Overview

Vision provides computer vision algorithms for still images and video. Every request follows the same pattern.

Core workflow:

Create request (e.g., VNDetectHumanHandPoseRequest())
Create handler with image (VNImageRequestHandler(cgImage: image))
Perform request (try handler.perform([request]))
Access observations from request.results

Coordinate system: Lower-left origin, normalized (0.0-1.0) coordinates
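
Because UIKit and most pixel buffers use a top-left origin, points returned by Vision usually need a y-axis flip before they can be drawn or used as pixel indices. A minimal sketch of that conversion follows; the function name `topLeftPixelPoint(fromNormalized:imageSize:)` is hypothetical and not part of Vision.

```swift
import Foundation

/// Converts a Vision-normalized point (lower-left origin, 0.0-1.0 range)
/// into pixel coordinates with a top-left origin.
/// Hypothetical helper for illustration; not a Vision API.
func topLeftPixelPoint(fromNormalized point: CGPoint, imageSize: CGSize) -> CGPoint {
    CGPoint(
        x: point.x * imageSize.width,
        y: (1.0 - point.y) * imageSize.height  // flip the y-axis
    )
}

// A point one quarter in from the left and three quarters up from the bottom
let imageSize = CGSize(width: 1920, height: 1080)
let pixelPoint = topLeftPixelPoint(
    fromNormalized: CGPoint(x: 0.25, y: 0.75),
    imageSize: imageSize
)
print(pixelPoint)
```

For a 1920 x 1080 image, the normalized point (0.25, 0.75) maps to x = 480, y = 270 in top-left pixel coordinates.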

Performance: Run requests on a background queue. Vision work is resource-intensive and blocks the UI if performed on the main thread.
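
The four workflow steps, run off the main thread, can be sketched as follows. This is an illustration under assumptions rather than a definitive implementation: the function names `detectHandPoses(in:)` and `detectHandPosesInBackground(in:completion:)` are hypothetical, and the choice of a global `.userInitiated` queue and a limit of two hands are placeholders to adapt.

```swift
import Vision
import CoreGraphics
import Dispatch

/// Runs the core Vision workflow synchronously on the calling thread.
/// Hypothetical helper name; call it from a background queue.
func detectHandPoses(in image: CGImage) throws -> [VNHumanHandPoseObservation] {
    // 1. Create the request
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 2  // assumed limit for this sketch

    // 2. Create a handler for this one image
    let handler = VNImageRequestHandler(cgImage: image, options: [:])

    // 3. Perform the request (blocks until Vision finishes)
    try handler.perform([request])

    // 4. Read the observations; nil results are treated as "nothing found"
    return request.results ?? []
}

/// Dispatches the work to a background queue and reports back on the main queue.
func detectHandPosesInBackground(
    in image: CGImage,
    completion: @escaping (Result<[VNHumanHandPoseObservation], Error>) -> Void
) {
    DispatchQueue.global(qos: .userInitiated).async {
        let result = Result { try detectHandPoses(in: image) }
        DispatchQueue.main.async { completion(result) }
    }
}
```

Keeping the synchronous function separate from the dispatching wrapper makes the Vision call easy to reuse from other concurrency contexts, such as an existing capture queue.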

Request Handlers

Vision provides two request handlers for different scenarios.

VNImageRequestHandler

Analyzes a single image. Initialize with the image, perform requests against it, discard.

let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request1, request2])  // Multiple requests, one image

Initialize with: CGImage, CIImage, CVPixelBuffer, Data, or URL

Rule: One handler per image. Reusing a handler with a different image is unsupported.

VNSequenceRequestHandler

Analyzes a sequence of frames (video, camera feed). Initialize empty, pass each frame to perform(). Maintains inter-frame state for temporal smoothing.

let sequenceHandler = VNSequenceRequestHandler()

// In your camera/video frame callback:
func processFrame(_ pixelBuffer: CVPixelBuffer) throws {
    try sequenceHandler.perform([request], on: pixelBuffer)
}

Rule: Create once, reuse across frames. The handler tracks state between calls.

When to Use Which

Use Case	Handler
Single photo or screenshot	`VNImageRequestHandler`
Video stream or camera frames	`VNSequenceRequestHandler`
Temporal smoothing (pose, segmentation)	`VNSequenceRequestHandler`
One-off analysis of a CVPixelBuffer	`VNImageRequestHandler`

Requests That Benefit from Sequence Handling

These requests use inter-frame state when run through VNSequenceRequestHandler:

VNDetectHumanBodyPoseRequest — Smoother joint tracking
VNDetectHumanHandPoseRequest — Smoother landmark tracking
VNGeneratePersonSegmentationRequest — Temporally consistent masks
VNGeneratePersonInstanceMaskRequest — Stable person identity across frames
VNDetectDocumentSegmentationRequest — Stable document edges
Any VNStatefulRequest subclass — Designed for sequences

Common Mistake

Creating a new VNImageRequestHandler per video frame discards temporal context. Pose landmarks jitter, segmentation masks flicker, and you lose the smoothing that sequence handling provides.

// Wrong — loses temporal context every frame
func processFrame(_ buffer: CVPixelBuffer) throws {
    let handler = VNImageRequestHandler(cvPixelBuffer: buffer)
    try handler.perform([poseRequest])
}

// Right — maintains inter-frame state
let sequenceHandler = VNSequenceRequestHandler()
func processFrame(_ buffer: CVPixelBuffer) throws {
    try sequenceHandler.perform([poseRequest], on: buffer)
}

Subject Segmentation APIs

VNGenerateForegroundInstanceMaskRequest

Availability: iOS 17+, macOS 14+, tvOS 17+, visionOS 1+

Generates a class-agnostic instance mask of foreground objects (people, pets, buildings, food, shoes, etc.).

Basic Usage

let request = VNGenerateForegroundInstanceMaskRequest()
let handler = VNImageRequestHandler(cgImage: image)

try handler.perform([request])

guard let observation = request.results?.first as? VNInstanceMaskObservation else {
    return
}

InstanceMaskObservation

allInstances: IndexSet containing all foreground instance indices (excludes background 0)

instanceMask: CVPixelBuffer with UInt8 labels (0 = background, 1+ = instance indices)

instanceAtPoint(_:): Returns instance index at normalized point

let point = CGPoint(x: 0.5, y: 0.5)  // Center of image
let instance = observation.instanceAtPoint(point)

if instance == 0 {
    print("Background tapped")
} else {
    print("Instance \(instance) tapped")
}

Generating Masks

createScaledMask(for:croppedToInstancesContent:)

Parameters:

for: IndexSet of instances to include
croppedToInstancesContent:
- false = Output matches input resolution (for compositing)
- true = Tight crop around selected instances

Returns: Single-channel floating-point CVPixelBuffer (soft segmentation mask)

// All instances, full resolution
let mask = try observation.createScaledMask(
    for: observation.allInstances,
    croppedToInstancesContent: false
)

// Single instance, cropped
let instances = IndexSet(integer: 1)
let croppedMask = try observation.createScaledMask(
    for: instances,
    croppedToInstancesContent: true
)

Instance Mask Hit Testing

Access raw pixel buffer to map tap coordinates to instance labels:

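let instanceMask = observation.instanceMask

CVPixelBufferLockBaseAddress(instanceMask, .readOnly)
defer { CVPixelBufferUnlockBaseAddress(instanceMask, .readOnly) }

guard let baseAddress = CVPixelBufferGetBaseAddress(instanceMask) else {
    return
}
let width = CVPixelBufferGetWidth(instanceMask)
let height = CVPixelBufferGetHeight(instanceMask)
let bytesPerRow = CVPixelBufferGetBytesPerRow(instanceMask)

// Convert normalized tap to pixel coordinates in the mask.
// Assumes normalizedX/normalizedY use the same origin as the buffer rows
// (top-left); flip y with (1 - y) first if the point is in Vision's
// lower-left coordinate space.
let pixelPoint = VNImagePointForNormalizedPoint(
    CGPoint(x: normalizedX, y: normalizedY),
    width,
    height
)

// Clamp so a tap on the far edge cannot read past the buffer
let x = min(max(Int(pixelPoint.x), 0), width - 1)
let y = min(max(Int(pixelPoint.y), 0), height - 1)

// Calculate byte offset (one UInt8 label per pixel)
let offset = y * bytesPerRow + x

// Read instance label
let label = baseAddress.load(fromByteOffset: offset, as: UInt8.self)

// A background tap (label 0) selects every instance
let instances = label == 0 ? observation.allInstances : IndexSet(integer: Int(label))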

VisionKit Subject Lifting

ImageAnalysisInteraction (iOS)

Availability: iOS 16+, iPadOS 16+

Adds system-like subject lifting UI to views:

let interaction = ImageAnalysisInteraction()
interaction.preferredInteractionTypes = .imageSubject  // Or .automatic
imageView.addInteraction(interaction)

Interaction types:

.automatic: Subject lifting + Live Text + data detectors
.imageSubject: Subject lifting only (no interactive text)

ImageAnalysisOverlayView (macOS)

Availability: macOS 13+

let overlayView = ImageAnalysisOverlayView()
overlayView.preferredInteractionTypes = .imageSubject
nsView.addSubview(overlayView)

Programmatic Access

ImageAnalyzer

let analyzer = ImageAnalyzer()
let configuration = ImageAnalyzer.Configuration([.text, .visualLookUp])

let analysis = try await analyzer.analyze(image, configuration: configuration)

ImageAnalysis

subjects: [Subject] - All subjects in image

highlightedSubjects: Set<Subject> - Currently highlighted (user long-pressed)

subject(at:): Async lookup of subject at normalized point (returns nil if none)

// Get all subjects
let subjects = analysis.subjects

// Look up subject at tap
if let subject = try await analysis.subject(at: tapPoint) {
    // Process subject
}

// Change highlight state
analysis.highlightedSubjects = Set([subjects[0], subjects[1]])

Subject Struct

image: UIImage/NSImage - Extracted subject with transparency

bounds: CGRect - Subject boundaries in image coordinates

// Single subject image
let subjectImage = subject.image

// Composite multiple subjects
let compositeImage = try await analysis.image(for: [subject1, subject2])

Out-of-process: VisionKit analysis runs out-of-process. This benefits performance, but it limits the size of images that can be analyzed.

Person Segmentation APIs

VNGeneratePersonSegmentationRequest

Availability: iOS 15+, macOS 12+

Returns single mask containing all people in image:

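let request = VNGeneratePersonSegmentationRequest()
// Optional: trade accuracy for speed (.accurate is the default)
request.qualityLevel = .balanced  // .fast, .balanced, or .accurate

let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])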

guard let observation = request.results?.first as? VNPixelBufferObservation else {
    return
}

let personMask = observation.pixelBuffer  // CVPixelBuffer
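
One common use of the person mask is background replacement with Core Image. The sketch below assumes the mask is a single-channel buffer (so its values land in the red channel of the resulting `CIImage`) and that `original` has its extent origin at zero; the function name `replaceBackground(of:with:personMask:)` is hypothetical.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins
import CoreVideo

/// Composites the people in `original` over `background` using a person mask.
/// Hypothetical helper for illustration; not part of Vision or Core Image.
func replaceBackground(
    of original: CIImage,
    with background: CIImage,
    personMask: CVPixelBuffer
) -> CIImage? {
    let mask = CIImage(cvPixelBuffer: personMask)

    // The mask is usually smaller than the source image, so scale it up to match
    let scaleX = original.extent.width / mask.extent.width
    let scaleY = original.extent.height / mask.extent.height
    let scaledMask = mask.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))

    // The red-mask variant reads coverage from the red channel,
    // which is where a one-component buffer's values end up
    let blend = CIFilter.blendWithRedMask()
    blend.inputImage = original
    blend.backgroundImage = background
    blend.maskImage = scaledMask
    return blend.outputImage
}
```

Render the returned `CIImage` through a `CIContext` (or display it in a Metal-backed view) to obtain pixels; the filter itself only describes the composite.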

VNGeneratePersonInstanceMaskRequest

Availability: iOS 17+, macOS 14+

Returns separate masks for up to 4 people:

let request = VNGeneratePersonInstanceMaskRequest()
try handler.perform([request])

guard let observation = request.results?.first as? VNInstanceMaskObservation else {
    return
}

// Same InstanceMaskObservation API as foreground instance masks
let allPeople = observation.allInstances  // Up to 4 people (1-4)

// Get mask for person 1
let person1Mask = try observation.createScaledMask(
    for: IndexSet(integer: 1),
    croppedToInstancesContent: false
)

Limitations:

Segments up to 4 people
With >4 people: may miss people or combine them (typically background people)
Use VNDetectFaceRectanglesRequest to count faces if you need to handle crowded scenes
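
The face-count check suggested above can be sketched as follows. The helper names `faceCount(in:)` and `shouldUseInstanceMasks(faceCount:limit:)` are hypothetical, and falling back to the single combined mask when the count exceeds four is an assumed policy, not something the framework enforces.

```swift
import Vision
import CoreGraphics

/// Returns the number of faces Vision detects in the image.
/// Hypothetical helper for illustration.
func faceCount(in image: CGImage) throws -> Int {
    let request = VNDetectFaceRectanglesRequest()
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
    return request.results?.count ?? 0
}

/// Assumed policy: use per-person instance masks only when the scene
/// contains between one and `limit` faces.
func shouldUseInstanceMasks(faceCount: Int, limit: Int = 4) -> Bool {
    faceCount >= 1 && faceCount <= limit
}

// Usage (inside a throwing context, off the main thread):
// let count = try faceCount(in: image)
// if shouldUseInstanceMasks(faceCount: count) {
//     // run VNGeneratePersonInstanceMaskRequest
// } else {
//     // fall back to VNGeneratePersonSegmentationRequest
// }
```

Face count is only a proxy for person count: people facing away from the camera are not counted, so treat the result as a hint rather than a guarantee.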

Hand Pose Detection

VNDet

Content truncated.


