axiom-vision-diag


subject not detected, hand pose missing landmarks, low confidence observations, Vision performance, coordinate conversion, VisionKit errors, observation nil, text not recognized, barcode not detected, DataScannerViewController not working, document scan issues

Install

mkdir -p .claude/skills/axiom-vision-diag && curl -L -o skill.zip "https://mcp.directory/api/skills/download/7044" && unzip -o skill.zip -d .claude/skills/axiom-vision-diag && rm skill.zip

Installs to .claude/skills/axiom-vision-diag

About this skill

Vision Framework Diagnostics

Systematic troubleshooting for Vision framework issues: subjects not detected, missing landmarks, low confidence, performance problems, coordinate mismatches, text recognition failures, barcode detection issues, and document scanning problems.

Overview

Core Principle: When Vision doesn't work, the problem is usually:

  1. Environment (lighting, occlusion, edge of frame) - 40%
  2. Confidence threshold (ignoring low confidence data) - 30%
  3. Threading (blocking main thread causes frozen UI) - 15%
  4. Coordinates (mixing lower-left and top-left origins) - 10%
  5. API availability (using iOS 17+ APIs on older devices) - 5%

Always check environment and confidence BEFORE debugging code.

Red Flags

Symptoms that indicate Vision-specific issues:

| Symptom | Likely Cause |
| --- | --- |
| Subject not detected at all | Edge of frame, poor lighting, very small subject |
| Hand landmarks intermittently nil | Hand near edge, parallel to camera, glove/occlusion |
| Body pose skipped frames | Person bent over, upside down, flowing clothing |
| UI freezes during processing | Running Vision on main thread |
| Overlays in wrong position | Coordinate conversion (lower-left vs top-left) |
| Crash on older devices | Using iOS 17+ APIs without @available check |
| Person segmentation misses people | >4 people in scene (instance mask limit) |
| Low FPS in camera feed | maximumHandCount too high, not dropping frames |
| Text not recognized at all | Blurry image, stylized font, wrong recognition level |
| Text misread (wrong characters) | Language correction disabled, missing custom words |
| Barcode not detected | Wrong symbology, code too small, glare/reflection |
| DataScanner shows blank screen | Camera access denied, device not supported |
| Document edges not detected | Low contrast, non-rectangular, glare |
| Real-time scanning too slow | Processing every frame, region too large |
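
The "overlays in wrong position" row almost always comes down to Vision's lower-left-origin normalized coordinates versus UIKit's top-left origin. A minimal conversion sketch (assuming the image fills the view exactly; aspect-fit letterboxing needs an extra transform for the visible image rect):

```swift
import Vision
import UIKit

// Convert a Vision normalized point (origin bottom-left, 0...1)
// to a UIKit view point (origin top-left, in points).
func viewPoint(fromVisionPoint p: CGPoint, in viewSize: CGSize) -> CGPoint {
    CGPoint(x: p.x * viewSize.width,
            y: (1 - p.y) * viewSize.height)  // flip the Y axis
}

// Normalized rects can go through Vision's geometry helper,
// then get the same Y flip for UIKit drawing.
func viewRect(fromVisionRect r: CGRect, in viewSize: CGSize) -> CGRect {
    var rect = VNImageRectForNormalizedRect(r, Int(viewSize.width), Int(viewSize.height))
    rect.origin.y = viewSize.height - rect.origin.y - rect.height
    return rect
}
```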

Mandatory First Steps

Before investigating code, run these diagnostics:

Step 1: Verify Detection with Diagnostic Code

let request = VNGenerateForegroundInstanceMaskRequest()  // Or hand/body pose
let handler = VNImageRequestHandler(cgImage: testImage)

do {
    try handler.perform([request])

    if let results = request.results {
        print("✅ Request succeeded")
        print("Result count: \(results.count)")

        if let observation = results.first as? VNInstanceMaskObservation {
            print("All instances: \(observation.allInstances)")
            print("Instance count: \(observation.allInstances.count)")
        }
    } else {
        print("⚠️ Request succeeded but no results")
    }
} catch {
    print("❌ Request failed: \(error)")
}

Expected output:

  • ✅ Request succeeded, instance count > 0 → Detection working
  • ⚠️ Request succeeded, instance count = 0 → Nothing detected (see Decision Tree)
  • ❌ Request failed → API availability issue

Step 2: Check Confidence Scores

// For hand/body pose
if let observation = request.results?.first as? VNHumanHandPoseObservation,
   let allPoints = try? observation.recognizedPoints(.all) {

    for (key, point) in allPoints {
        print("\(key): confidence \(point.confidence)")

        if point.confidence < 0.3 {
            print("  ⚠️ LOW CONFIDENCE - unreliable")
        }
    }
}

Expected output:

  • Most landmarks > 0.5 confidence → Good detection
  • Many landmarks < 0.3 → Poor lighting, occlusion, or edge of frame
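
Once you know which landmarks are weak, filter them out before driving UI or gesture logic. A small helper sketch (name and threshold are illustrative; 0.3 matches the diagnostic above):

```swift
import Vision

// Keep only landmarks reliable enough to use; tune the
// threshold for your use case.
func reliablePoints(
    from observation: VNHumanHandPoseObservation,
    minimumConfidence: VNConfidence = 0.3
) -> [VNHumanHandPoseObservation.JointName: VNRecognizedPoint] {
    guard let all = try? observation.recognizedPoints(.all) else { return [:] }
    return all.filter { $0.value.confidence >= minimumConfidence }
}
```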

Step 3: Verify Threading

print("🧵 Thread: \(Thread.current)")

if Thread.isMainThread {
    print("❌ Running on MAIN THREAD - will block UI!")
} else {
    print("✅ Running on background thread")
}

Expected output:

  • ✅ Background thread → Correct
  • ❌ Main thread → Move to DispatchQueue.global()
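
A minimal sketch of that fix, moving the perform call onto a dedicated queue and hopping back to the main thread only for UI updates (queue label and function name are illustrative):

```swift
import Vision
import CoreGraphics
import Dispatch

let visionQueue = DispatchQueue(label: "vision.processing", qos: .userInitiated)

func detectHands(in image: CGImage,
                 completion: @escaping ([VNObservation]) -> Void) {
    visionQueue.async {
        // Heavy Vision work stays off the main thread
        let request = VNDetectHumanHandPoseRequest()
        let handler = VNImageRequestHandler(cgImage: image)
        try? handler.perform([request])

        // Return to the main thread only for UI updates
        DispatchQueue.main.async {
            completion(request.results ?? [])
        }
    }
}
```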

Decision Tree

Vision not working as expected?
│
├─ No results returned?
│  ├─ Check Step 1 output
│  │  ├─ "Request failed" → See Pattern 1a (API availability)
│  │  ├─ "No results" → See Pattern 1b (nothing detected)
│  │  └─ Results but count = 0 → See Pattern 1c (edge of frame)
│
├─ Landmarks have nil/low confidence?
│  ├─ Hand pose → See Pattern 2 (hand detection issues)
│  ├─ Body pose → See Pattern 3 (body detection issues)
│  └─ Face detection → See Pattern 4 (face detection issues)
│
├─ UI freezing/slow?
│  ├─ Check Step 3 (threading)
│  │  ├─ Main thread → See Pattern 5a (move to background)
│  │  └─ Background thread → See Pattern 5b (performance tuning)
│
├─ Overlays in wrong position?
│  └─ See Pattern 6 (coordinate conversion)
│
├─ Person segmentation missing people?
│  └─ See Pattern 7 (crowded scenes)
│
├─ VisionKit not working?
│  └─ See Pattern 8 (VisionKit specific)
│
├─ Text recognition issues?
│  ├─ No text detected → See Pattern 9a (image quality)
│  ├─ Wrong characters → See Pattern 9b (language/correction)
│  └─ Too slow → See Pattern 9c (recognition level)
│
├─ Barcode detection issues?
│  ├─ Barcode not detected → See Pattern 10a (symbology/size)
│  └─ Wrong payload → See Pattern 10b (barcode quality)
│
├─ DataScannerViewController issues?
│  ├─ Blank screen → See Pattern 11a (availability check)
│  └─ Items not detected → See Pattern 11b (data types)
│
└─ Document scanning issues?
   ├─ Edges not detected → See Pattern 12a (contrast/shape)
   └─ Perspective wrong → See Pattern 12b (corner points)

Diagnostic Patterns

Pattern 1a: Request Failed (API Availability)

Symptom: try handler.perform([request]) throws error

Common errors:

"VNGenerateForegroundInstanceMaskRequest is only available on iOS 17.0 or newer"
"VNDetectHumanBodyPose3DRequest is only available on iOS 17.0 or newer"

Root cause: Using iOS 17+ APIs on older deployment target

Fix:

if #available(iOS 17.0, *) {
    let request = VNGenerateForegroundInstanceMaskRequest()
    // ...
} else {
    // Fallback for iOS 15-16 (person segmentation requires iOS 15+)
    let request = VNGeneratePersonSegmentationRequest()
    // ...
}

Prevention: Check API availability in axiom-vision-ref before implementing

Time to fix: 10 min

Pattern 1b: No Results (Nothing Detected)

Symptom: request.results == nil or results.isEmpty

Diagnostic:

// 1. Save debug image to Photos
UIImageWriteToSavedPhotosAlbum(debugImage, nil, nil, nil)

// 2. Inspect visually
// - Is subject too small? (< 10% of image)
// - Is subject blurry?
// - Poor contrast with background?

Common causes:

  • Subject too small (resize or crop closer)
  • Subject too blurry (increase lighting, stabilize camera)
  • Low contrast (subject same color as background)

Fix:

// Crop image to focus on region of interest
// (cropImage is your helper, e.g. built on CGImage.cropping(to:))
let croppedImage = cropImage(sourceImage, to: regionOfInterest)
let handler = VNImageRequestHandler(cgImage: croppedImage)

Time to fix: 30 min

Pattern 1c: Edge of Frame Issues

Symptom: Subject detected intermittently as object moves across frame

Root cause: Partial occlusion when subject touches image edges

Diagnostic:

// Check if subject is near edges
if let observation = results.first as? VNInstanceMaskObservation,
   let mask = try? observation.generateScaledMaskForImage(
       forInstances: observation.allInstances,
       from: handler  // the VNImageRequestHandler used for the request
   ) {
    let bounds = calculateMaskBounds(mask)  // your helper: bounding box of nonzero mask pixels

    if bounds.minX < 0.1 || bounds.maxX > 0.9 ||
       bounds.minY < 0.1 || bounds.maxY > 0.9 {
        print("⚠️ Subject too close to edge")
    }
}

Fix:

// Add padding to capture area
let paddedRect = captureRect.insetBy(dx: -20, dy: -20)

// OR guide user with on-screen overlay
overlayView.addSubview(guideBox)  // Visual boundary

Time to fix: 20 min

Pattern 2: Hand Pose Issues

Symptom: VNDetectHumanHandPoseRequest returns nil or low confidence landmarks

Diagnostic:

if let observation = request.results?.first as? VNHumanHandPoseObservation {
    let thumbTip = try? observation.recognizedPoint(.thumbTip)
    let wrist = try? observation.recognizedPoint(.wrist)

    print("Thumb confidence: \(thumbTip?.confidence ?? 0)")
    print("Wrist confidence: \(wrist?.confidence ?? 0)")

    // Check hand orientation
    if let thumb = thumbTip, let wristPoint = wrist {
        let angleDegrees = atan2(
            thumb.location.y - wristPoint.location.y,
            thumb.location.x - wristPoint.location.x
        ) * 180 / .pi
        print("Hand angle: \(angleDegrees) degrees")

        if abs(angleDegrees) > 80 && abs(angleDegrees) < 100 {
            print("⚠️ Hand parallel to camera (hard to detect)")
        }
    }
}

Common causes:

| Cause | Confidence Pattern | Fix |
| --- | --- | --- |
| Hand near edge | Tips have low confidence | Adjust framing |
| Hand parallel to camera | All landmarks low | Prompt user to rotate hand |
| Gloves/occlusion | Fingers low, wrist high | Remove gloves or change lighting |
| Feet detected as hands | Unexpected hand detected | Add chirality check or ignore |

Fix for parallel hand:

// Detect and warn user
if avgConfidence < 0.4 {
    showWarning("Rotate your hand toward the camera")
}

Time to fix: 45 min
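
The chirality check mentioned in the causes table can be sketched with the observation's chirality property (iOS 15+); the function name is illustrative:

```swift
import Vision

// Spurious "hands" (e.g. feet) often come back with unknown
// chirality - treat only classified left/right hands as plausible.
func isPlausibleHand(_ observation: VNHumanHandPoseObservation) -> Bool {
    switch observation.chirality {
    case .left, .right:
        return true
    case .unknown:
        return false
    @unknown default:
        return false
    }
}
```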

Pattern 3: Body Pose Issues

Symptom: VNDetectHumanBodyPoseRequest skips frames or returns low confidence

Diagnostic:

if let observation = request.results?.first as? VNHumanBodyPoseObservation {
    let nose = try? observation.recognizedPoint(.nose)
    let root = try? observation.recognizedPoint(.root)

    if let nosePoint = nose, let rootPoint = root {
        let bodyAngle = atan2(
            nosePoint.location.y - rootPoint.location.y,
            nosePoint.location.x - rootPoint.location.x
        )

        let angleFromVertical = abs(bodyAngle - .pi / 2)

        if angleFromVertical > .pi / 4 {
            print("⚠️ Person bent over or upside down")
        }
    }
}

Common causes:

| Cause | Solution |
| --- | --- |
| Person bent |

Content truncated.
