axiom-vision-diag


subject not detected, hand pose missing landmarks, low confidence observations, Vision performance, coordinate conversion, VisionKit errors, observation nil, text not recognized, barcode not detected, DataScannerViewController not working, document scan issues

Install

mkdir -p .claude/skills/axiom-vision-diag && curl -L -o skill.zip "https://mcp.directory/api/skills/download/7044" && unzip -o skill.zip -d .claude/skills/axiom-vision-diag && rm skill.zip

Installs to .claude/skills/axiom-vision-diag

About this skill

Vision Framework Diagnostics

Systematic troubleshooting for Vision framework issues: subjects not detected, missing landmarks, low confidence, performance problems, coordinate mismatches, text recognition failures, barcode detection issues, and document scanning problems.

Overview

Core Principle: When Vision doesn't work, the problem is usually:

  1. Environment (lighting, occlusion, edge of frame) - 40%
  2. Confidence threshold (ignoring low confidence data) - 30%
  3. Threading (blocking main thread causes frozen UI) - 15%
  4. Coordinates (mixing lower-left and top-left origins) - 10%
  5. API availability (using iOS 17+ APIs on older devices) - 5%

Always check environment and confidence BEFORE debugging code.

Red Flags

Symptoms that indicate Vision-specific issues:

| Symptom | Likely Cause |
| --- | --- |
| Subject not detected at all | Edge of frame, poor lighting, very small subject |
| Hand landmarks intermittently nil | Hand near edge, parallel to camera, glove/occlusion |
| Body pose skipped frames | Person bent over, upside down, flowing clothing |
| UI freezes during processing | Running Vision on main thread |
| Overlays in wrong position | Coordinate conversion (lower-left vs top-left) |
| Crash on older devices | Using iOS 17+ APIs without @available check |
| Person segmentation misses people | >4 people in scene (instance mask limit) |
| Low FPS in camera feed | maximumHandCount too high, not dropping frames |
| Text not recognized at all | Blurry image, stylized font, wrong recognition level |
| Text misread (wrong characters) | Language correction disabled, missing custom words |
| Barcode not detected | Wrong symbology, code too small, glare/reflection |
| DataScanner shows blank screen | Camera access denied, device not supported |
| Document edges not detected | Low contrast, non-rectangular, glare |
| Real-time scanning too slow | Processing every frame, region too large |
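
The "overlays in wrong position" row almost always comes down to Vision's lower-left-origin normalized coordinates versus UIKit's top-left origin. A minimal conversion sketch (assuming the image fills the view exactly; aspect-fit letterboxing needs an extra transform for the visible image rect):

```swift
import Vision
import UIKit

// Convert a Vision normalized point (origin bottom-left, 0...1)
// to a UIKit view point (origin top-left, in points).
func viewPoint(fromVisionPoint p: CGPoint, in viewSize: CGSize) -> CGPoint {
    CGPoint(x: p.x * viewSize.width,
            y: (1 - p.y) * viewSize.height)  // flip the Y axis
}

// Normalized rects can go through Vision's geometry helper,
// then get the same Y flip for UIKit drawing.
func viewRect(fromVisionRect r: CGRect, in viewSize: CGSize) -> CGRect {
    var rect = VNImageRectForNormalizedRect(r, Int(viewSize.width), Int(viewSize.height))
    rect.origin.y = viewSize.height - rect.origin.y - rect.height
    return rect
}
```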

Mandatory First Steps

Before investigating code, run these diagnostics:

Step 1: Verify Detection with Diagnostic Code

let request = VNGenerateForegroundInstanceMaskRequest()  // Or hand/body pose
let handler = VNImageRequestHandler(cgImage: testImage)

do {
    try handler.perform([request])

    if let results = request.results {
        print("✅ Request succeeded")
        print("Result count: \(results.count)")

        if let observation = results.first as? VNInstanceMaskObservation {
            print("All instances: \(observation.allInstances)")
            print("Instance count: \(observation.allInstances.count)")
        }
    } else {
        print("⚠️ Request succeeded but no results")
    }
} catch {
    print("❌ Request failed: \(error)")
}

Expected output:

  • ✅ Request succeeded, instance count > 0 → Detection working
  • ⚠️ Request succeeded, instance count = 0 → Nothing detected (see Decision Tree)
  • ❌ Request failed → API availability issue

Step 2: Check Confidence Scores

// For hand/body pose
if let observation = request.results?.first as? VNHumanHandPoseObservation,
   let allPoints = try? observation.recognizedPoints(.all) {

    for (key, point) in allPoints {
        print("\(key): confidence \(point.confidence)")

        if point.confidence < 0.3 {
            print("  ⚠️ LOW CONFIDENCE - unreliable")
        }
    }
}

Expected output:

  • Most landmarks > 0.5 confidence → Good detection
  • Many landmarks < 0.3 → Poor lighting, occlusion, or edge of frame
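
Once you know which landmarks are weak, filter them out before driving UI or gesture logic. A small helper sketch (name and threshold are illustrative; 0.3 matches the diagnostic above):

```swift
import Vision

// Keep only landmarks reliable enough to use; tune the
// threshold for your use case.
func reliablePoints(
    from observation: VNHumanHandPoseObservation,
    minimumConfidence: VNConfidence = 0.3
) -> [VNHumanHandPoseObservation.JointName: VNRecognizedPoint] {
    guard let all = try? observation.recognizedPoints(.all) else { return [:] }
    return all.filter { $0.value.confidence >= minimumConfidence }
}
```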

Step 3: Verify Threading

print("🧵 Thread: \(Thread.current)")

if Thread.isMainThread {
    print("❌ Running on MAIN THREAD - will block UI!")
} else {
    print("✅ Running on background thread")
}

Expected output:

  • ✅ Background thread → Correct
  • ❌ Main thread → Move to DispatchQueue.global()
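
A minimal sketch of that fix, moving the perform call onto a dedicated queue and hopping back to the main thread only for UI updates (queue label and function name are illustrative):

```swift
import Vision
import CoreGraphics
import Dispatch

let visionQueue = DispatchQueue(label: "vision.processing", qos: .userInitiated)

func detectHands(in image: CGImage,
                 completion: @escaping ([VNObservation]) -> Void) {
    visionQueue.async {
        // Heavy Vision work stays off the main thread
        let request = VNDetectHumanHandPoseRequest()
        let handler = VNImageRequestHandler(cgImage: image)
        try? handler.perform([request])

        // Return to the main thread only for UI updates
        DispatchQueue.main.async {
            completion(request.results ?? [])
        }
    }
}
```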

Decision Tree

Vision not working as expected?
│
├─ No results returned?
│  ├─ Check Step 1 output
│  │  ├─ "Request failed" → See Pattern 1a (API availability)
│  │  ├─ "No results" → See Pattern 1b (nothing detected)
│  │  └─ Results but count = 0 → See Pattern 1c (edge of frame)
│
├─ Landmarks have nil/low confidence?
│  ├─ Hand pose → See Pattern 2 (hand detection issues)
│  ├─ Body pose → See Pattern 3 (body detection issues)
│  └─ Face detection → See Pattern 4 (face detection issues)
│
├─ UI freezing/slow?
│  ├─ Check Step 3 (threading)
│  │  ├─ Main thread → See Pattern 5a (move to background)
│  │  └─ Background thread → See Pattern 5b (performance tuning)
│
├─ Overlays in wrong position?
│  └─ See Pattern 6 (coordinate conversion)
│
├─ Person segmentation missing people?
│  └─ See Pattern 7 (crowded scenes)
│
├─ VisionKit not working?
│  └─ See Pattern 8 (VisionKit specific)
│
├─ Text recognition issues?
│  ├─ No text detected → See Pattern 9a (image quality)
│  ├─ Wrong characters → See Pattern 9b (language/correction)
│  └─ Too slow → See Pattern 9c (recognition level)
│
├─ Barcode detection issues?
│  ├─ Barcode not detected → See Pattern 10a (symbology/size)
│  └─ Wrong payload → See Pattern 10b (barcode quality)
│
├─ DataScannerViewController issues?
│  ├─ Blank screen → See Pattern 11a (availability check)
│  └─ Items not detected → See Pattern 11b (data types)
│
└─ Document scanning issues?
   ├─ Edges not detected → See Pattern 12a (contrast/shape)
   └─ Perspective wrong → See Pattern 12b (corner points)

Diagnostic Patterns

Pattern 1a: Request Failed (API Availability)

Symptom: try handler.perform([request]) throws error

Common errors:

"VNGenerateForegroundInstanceMaskRequest is only available on iOS 17.0 or newer"
"VNDetectHumanBodyPose3DRequest is only available on iOS 17.0 or newer"

Root cause: Using iOS 17+ APIs on older deployment target

Fix:

if #available(iOS 17.0, *) {
    let request = VNGenerateForegroundInstanceMaskRequest()
    // ...
} else {
    // Fallback for iOS 15-16 (person segmentation requires iOS 15+)
    let request = VNGeneratePersonSegmentationRequest()
    // ...
}

Prevention: Check API availability in axiom-vision-ref before implementing

Time to fix: 10 min

Pattern 1b: No Results (Nothing Detected)

Symptom: request.results == nil or results.isEmpty

Diagnostic:

// 1. Save debug image to Photos
UIImageWriteToSavedPhotosAlbum(debugImage, nil, nil, nil)

// 2. Inspect visually
// - Is subject too small? (< 10% of image)
// - Is subject blurry?
// - Poor contrast with background?

Common causes:

  • Subject too small (resize or crop closer)
  • Subject too blurry (increase lighting, stabilize camera)
  • Low contrast (subject same color as background)

Fix:

// Crop image to focus on region of interest
// (cropImage is your helper, e.g. built on CGImage.cropping(to:))
let croppedImage = cropImage(sourceImage, to: regionOfInterest)
let handler = VNImageRequestHandler(cgImage: croppedImage)

Time to fix: 30 min

Pattern 1c: Edge of Frame Issues

Symptom: Subject detected intermittently as object moves across frame

Root cause: Partial occlusion when subject touches image edges

Diagnostic:

// Check if subject is near edges
if let observation = results.first as? VNInstanceMaskObservation,
   let mask = try? observation.generateScaledMaskForImage(
       forInstances: observation.allInstances,
       from: handler  // the VNImageRequestHandler used for the request
   ) {
    let bounds = calculateMaskBounds(mask)  // your helper: bounding box of nonzero mask pixels

    if bounds.minX < 0.1 || bounds.maxX > 0.9 ||
       bounds.minY < 0.1 || bounds.maxY > 0.9 {
        print("⚠️ Subject too close to edge")
    }
}

Fix:

// Add padding to capture area
let paddedRect = captureRect.insetBy(dx: -20, dy: -20)

// OR guide user with on-screen overlay
overlayView.addSubview(guideBox)  // Visual boundary

Time to fix: 20 min

Pattern 2: Hand Pose Issues

Symptom: VNDetectHumanHandPoseRequest returns nil or low confidence landmarks

Diagnostic:

if let observation = request.results?.first as? VNHumanHandPoseObservation {
    let thumbTip = try? observation.recognizedPoint(.thumbTip)
    let wrist = try? observation.recognizedPoint(.wrist)

    print("Thumb confidence: \(thumbTip?.confidence ?? 0)")
    print("Wrist confidence: \(wrist?.confidence ?? 0)")

    // Check hand orientation
    if let thumb = thumbTip, let wristPoint = wrist {
        let angleDegrees = atan2(
            thumb.location.y - wristPoint.location.y,
            thumb.location.x - wristPoint.location.x
        ) * 180 / .pi
        print("Hand angle: \(angleDegrees) degrees")

        if abs(angleDegrees) > 80 && abs(angleDegrees) < 100 {
            print("⚠️ Hand parallel to camera (hard to detect)")
        }
    }
}

Common causes:

| Cause | Confidence Pattern | Fix |
| --- | --- | --- |
| Hand near edge | Tips have low confidence | Adjust framing |
| Hand parallel to camera | All landmarks low | Prompt user to rotate hand |
| Gloves/occlusion | Fingers low, wrist high | Remove gloves or change lighting |
| Feet detected as hands | Unexpected hand detected | Add chirality check or ignore |

Fix for parallel hand:

// Detect and warn user
if avgConfidence < 0.4 {
    showWarning("Rotate your hand toward the camera")
}

Time to fix: 45 min
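
The chirality check mentioned in the causes table can be sketched with the observation's chirality property (iOS 15+); the function name is illustrative:

```swift
import Vision

// Spurious "hands" (e.g. feet) often come back with unknown
// chirality - treat only classified left/right hands as plausible.
func isPlausibleHand(_ observation: VNHumanHandPoseObservation) -> Bool {
    switch observation.chirality {
    case .left, .right:
        return true
    case .unknown:
        return false
    @unknown default:
        return false
    }
}
```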

Pattern 3: Body Pose Issues

Symptom: VNDetectHumanBodyPoseRequest skips frames or returns low confidence

Diagnostic:

if let observation = request.results?.first as? VNHumanBodyPoseObservation {
    let nose = try? observation.recognizedPoint(.nose)
    let root = try? observation.recognizedPoint(.root)

    if let nosePoint = nose, let rootPoint = root {
        let bodyAngle = atan2(
            nosePoint.location.y - rootPoint.location.y,
            nosePoint.location.x - rootPoint.location.x
        )

        let angleFromVertical = abs(bodyAngle - .pi / 2)

        if angleFromVertical > .pi / 4 {
            print("⚠️ Person bent over or upside down")
        }
    }
}

Common causes:

| Cause | Solution |
| --- | --- |
| Person bent |

Content truncated.
