axiom-vision-diag
subject not detected, hand pose missing landmarks, low confidence observations, Vision performance, coordinate conversion, VisionKit errors, observation nil, text not recognized, barcode not detected, DataScannerViewController not working, document scan issues
Install
mkdir -p .claude/skills/axiom-vision-diag && curl -L -o skill.zip "https://mcp.directory/api/skills/download/7044" && unzip -o skill.zip -d .claude/skills/axiom-vision-diag && rm skill.zip
Installs to .claude/skills/axiom-vision-diag
About this skill
Vision Framework Diagnostics
Systematic troubleshooting for Vision framework issues: subjects not detected, missing landmarks, low confidence, performance problems, coordinate mismatches, text recognition failures, barcode detection issues, and document scanning problems.
Overview
Core Principle: When Vision doesn't work, the problem is usually:
- Environment (lighting, occlusion, edge of frame) - 40%
- Confidence threshold (ignoring low confidence data) - 30%
- Threading (blocking main thread causes frozen UI) - 15%
- Coordinates (mixing lower-left and top-left origins) - 10%
- API availability (using iOS 17+ APIs on older devices) - 5%
Always check environment and confidence BEFORE debugging code.
Red Flags
Symptoms that indicate Vision-specific issues:
| Symptom | Likely Cause |
|---|---|
| Subject not detected at all | Edge of frame, poor lighting, very small subject |
| Hand landmarks intermittently nil | Hand near edge, parallel to camera, glove/occlusion |
| Body pose skipped frames | Person bent over, upside down, flowing clothing |
| UI freezes during processing | Running Vision on main thread |
| Overlays in wrong position | Coordinate conversion (lower-left vs top-left) |
| Crash on older iOS versions | Using iOS 17+ APIs without an #available check |
| Person segmentation misses people | >4 people in scene (instance mask limit) |
| Low FPS in camera feed | maximumHandCount too high, not dropping frames |
| Text not recognized at all | Blurry image, stylized font, wrong recognition level |
| Text misread (wrong characters) | Language correction disabled, missing custom words |
| Barcode not detected | Wrong symbology, code too small, glare/reflection |
| DataScanner shows blank screen | Camera access denied, device not supported |
| Document edges not detected | Low contrast, non-rectangular, glare |
| Real-time scanning too slow | Processing every frame, region too large |
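The "not dropping frames" and "processing every frame" causes in the table can be addressed with a small gate that skips a frame while the previous one is still being analyzed. The following is a minimal sketch, not part of the original skill: `FrameGate`, `tryBegin`, and `end` are hypothetical names introduced for illustration.

```swift
import Foundation

/// Hypothetical helper: admits a frame only when no earlier frame
/// is still being processed. Everything else is dropped.
final class FrameGate {
    private let lock = NSLock()
    private var busy = false

    /// Returns true if the caller may process this frame; false means drop it.
    func tryBegin() -> Bool {
        lock.lock()
        defer { lock.unlock() }
        if busy { return false }
        busy = true
        return true
    }

    /// Call once processing of the admitted frame has finished.
    func end() {
        lock.lock()
        busy = false
        lock.unlock()
    }
}
```

In a capture callback this would typically read `guard gate.tryBegin() else { return }` followed by `defer { gate.end() }` around the Vision call. If frames come from `AVCaptureVideoDataOutput`, its `alwaysDiscardsLateVideoFrames` property may already cover part of this.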
Mandatory First Steps
Before investigating code, run these diagnostics:
Step 1: Verify Detection with Diagnostic Code
let request = VNGenerateForegroundInstanceMaskRequest() // Or hand/body pose
let handler = VNImageRequestHandler(cgImage: testImage)
do {
    try handler.perform([request])
    if let results = request.results {
        print("✅ Request succeeded")
        print("Result count: \(results.count)")
        if let observation = results.first as? VNInstanceMaskObservation {
            print("All instances: \(observation.allInstances)")
            print("Instance count: \(observation.allInstances.count)")
        }
    } else {
        print("⚠️ Request succeeded but no results")
    }
} catch {
    print("❌ Request failed: \(error)")
}
Expected output:
- ✅ Request succeeded, instance count > 0 → Detection working
- ⚠️ Request succeeded, instance count = 0 → Nothing detected (see Decision Tree)
- ❌ Request failed → API availability issue
Step 2: Check Confidence Scores
// For hand/body pose
if let observation = request.results?.first as? VNHumanHandPoseObservation {
    let allPoints = try observation.recognizedPoints(.all)
    for (key, point) in allPoints {
        print("\(key): confidence \(point.confidence)")
        if point.confidence < 0.3 {
            print(" ⚠️ LOW CONFIDENCE - unreliable")
        }
    }
}
Expected output:
- Most landmarks > 0.5 confidence → Good detection
- Many landmarks < 0.3 → Poor lighting, occlusion, or edge of frame
Step 3: Verify Threading
print("🧵 Thread: \(Thread.current)")
if Thread.isMainThread {
    print("❌ Running on MAIN THREAD - will block UI!")
} else {
    print("✅ Running on background thread")
}
Expected output:
- ✅ Background thread → Correct
- ❌ Main thread → Move to DispatchQueue.global()
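Moving work to `DispatchQueue.global()` usually also means hopping back to the main queue before touching UI. A minimal sketch of that round trip follows; `runInBackground` is a hypothetical helper name, and the `work` closure stands in for the `handler.perform` call from Step 1.

```swift
import Foundation

/// Hypothetical helper: runs `work` on a background queue and
/// delivers its outcome on `callbackQueue` (main by default).
func runInBackground<T: Sendable>(
    callbackQueue: DispatchQueue = .main,
    work: @escaping @Sendable () throws -> T,
    completion: @escaping @Sendable (Result<T, Error>) -> Void
) {
    DispatchQueue.global(qos: .userInitiated).async {
        // Capture success or the thrown error in a single value
        let result: Result<T, Error> = Result { try work() }
        callbackQueue.async { completion(result) }
    }
}
```

The thread check from Step 3 can be placed inside `work` to confirm it prints the background-thread message.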
Decision Tree
Vision not working as expected?
│
├─ No results returned?
│ ├─ Check Step 1 output
│ │ ├─ "Request failed" → See Pattern 1a (API availability)
│ │ ├─ "No results" → See Pattern 1b (nothing detected)
│ │ └─ Results but count = 0 → See Pattern 1c (edge of frame)
│
├─ Landmarks have nil/low confidence?
│ ├─ Hand pose → See Pattern 2 (hand detection issues)
│ ├─ Body pose → See Pattern 3 (body detection issues)
│ └─ Face detection → See Pattern 4 (face detection issues)
│
├─ UI freezing/slow?
│ ├─ Check Step 3 (threading)
│ │ ├─ Main thread → See Pattern 5a (move to background)
│ │ └─ Background thread → See Pattern 5b (performance tuning)
│
├─ Overlays in wrong position?
│ └─ See Pattern 6 (coordinate conversion)
│
├─ Person segmentation missing people?
│ └─ See Pattern 7 (crowded scenes)
│
├─ VisionKit not working?
│ └─ See Pattern 8 (VisionKit specific)
│
├─ Text recognition issues?
│ ├─ No text detected → See Pattern 9a (image quality)
│ ├─ Wrong characters → See Pattern 9b (language/correction)
│ └─ Too slow → See Pattern 9c (recognition level)
│
├─ Barcode detection issues?
│ ├─ Barcode not detected → See Pattern 10a (symbology/size)
│ └─ Wrong payload → See Pattern 10b (barcode quality)
│
├─ DataScannerViewController issues?
│ ├─ Blank screen → See Pattern 11a (availability check)
│ └─ Items not detected → See Pattern 11b (data types)
│
└─ Document scanning issues?
├─ Edges not detected → See Pattern 12a (contrast/shape)
└─ Perspective wrong → See Pattern 12b (corner points)
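For the "overlays in wrong position" branch, the core of the fix is flipping the Y axis: Vision reports normalized points with a lower-left origin, while UIKit views use a top-left origin. The sketch below assumes the image fills the view exactly (no aspect-fill cropping, no mirroring); `viewPoint(fromNormalized:viewSize:)` is a hypothetical helper name.

```swift
import Foundation

/// Converts a Vision normalized point (origin lower-left, range 0...1)
/// into a point in a top-left-origin view of the given size.
/// Assumes the image is drawn edge to edge in the view.
func viewPoint(fromNormalized p: CGPoint, viewSize: CGSize) -> CGPoint {
    CGPoint(
        x: p.x * viewSize.width,
        y: (1 - p.y) * viewSize.height  // flip Y: lower-left to top-left
    )
}
```

Vision's own `VNImagePointForNormalizedPoint` scales a point to pixel dimensions but keeps the lower-left origin, so the flip is still needed before drawing in UIKit.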
Diagnostic Patterns
Pattern 1a: Request Failed (API Availability)
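Symptom: The build fails with an availability diagnostic, or try handler.perform([request]) throws an error
Common errors (availability messages like these are reported by the compiler at build time):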
"VNGenerateForegroundInstanceMaskRequest is only available on iOS 17.0 or newer"
"VNDetectHumanBodyPose3DRequest is only available on iOS 17.0 or newer"
Root cause: Using iOS 17+ APIs with a deployment target older than iOS 17
Fix:
if #available(iOS 17.0, *) {
    let request = VNGenerateForegroundInstanceMaskRequest()
    // ...
} else {
    // Fallback for iOS 15-16 (VNGeneratePersonSegmentationRequest requires iOS 15)
    let request = VNGeneratePersonSegmentationRequest()
    // ...
}
Prevention: Check API availability in axiom-vision-ref before implementing
Time to fix: 10 min
Pattern 1b: No Results (Nothing Detected)
Symptom: request.results == nil or results.isEmpty
Diagnostic:
// 1. Save debug image to Photos
UIImageWriteToSavedPhotosAlbum(debugImage, nil, nil, nil)
// 2. Inspect visually
// - Is subject too small? (< 10% of image)
// - Is subject blurry?
// - Poor contrast with background?
Common causes:
- Subject too small (resize or crop closer)
- Subject too blurry (increase lighting, stabilize camera)
- Low contrast (subject same color as background)
Fix:
// Crop image to focus on region of interest
// (cropImage is a helper you supply, e.g. built on CGImage.cropping(to:))
let croppedImage = cropImage(sourceImage, to: regionOfInterest)
let handler = VNImageRequestHandler(cgImage: croppedImage)
Time to fix: 30 min
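When the region of interest is expressed in Vision's normalized coordinates, it has to be mapped to pixel coordinates before cropping. The following is a sketch under that assumption; `pixelCropRect(forNormalized:imageWidth:imageHeight:)` is a hypothetical helper, not part of Vision.

```swift
import Foundation

/// Maps a normalized region (Vision convention: origin lower-left, 0...1)
/// to a pixel rectangle with a top-left origin, the convention used by
/// CGImage.cropping(to:). The region is clamped to the image bounds.
func pixelCropRect(forNormalized region: CGRect, imageWidth: Int, imageHeight: Int) -> CGRect {
    let w = CGFloat(imageWidth)
    let h = CGFloat(imageHeight)
    // Clamp the region's edges to the unit square
    let x0 = max(0, region.origin.x)
    let y0 = max(0, region.origin.y)
    let x1 = min(1, region.origin.x + region.size.width)
    let y1 = min(1, region.origin.y + region.size.height)
    return CGRect(
        x: x0 * w,
        y: (1 - y1) * h,                // top edge of the region in top-left space
        width: max(0, x1 - x0) * w,
        height: max(0, y1 - y0) * h
    )
}
```

The result can be passed to `CGImage.cropping(to:)`, which returns an optional image. As an alternative to cropping, image-based Vision requests expose a `regionOfInterest` property that takes the normalized rectangle directly.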
Pattern 1c: Edge of Frame Issues
Symptom: Subject detected intermittently as object moves across frame
Root cause: Partial occlusion when subject touches image edges
Diagnostic:
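// Check if subject is near edges
if let observation = results.first as? VNInstanceMaskObservation {
    // Full-image mask (not cropped), so positions stay relative to the frame
    let mask = try observation.generateScaledMaskForImage(
        forInstances: observation.allInstances,
        from: handler
    )
    // calculateMaskBounds is a helper you supply; it should return the
    // mask's bounding box in normalized (0-1) coordinates
    let bounds = calculateMaskBounds(mask)
    if bounds.minX < 0.1 || bounds.maxX > 0.9 ||
       bounds.minY < 0.1 || bounds.maxY > 0.9 {
        print("⚠️ Subject too close to edge")
    }
}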
Fix:
// Add padding to capture area
let paddedRect = captureRect.insetBy(dx: -20, dy: -20)
// OR guide user with on-screen overlay
overlayView.addSubview(guideBox) // Visual boundary
Time to fix: 20 min
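The diagnostic above relies on a bounds helper that the skill does not define. One possible shape for it is sketched below, assuming the mask has already been copied out of its pixel buffer into a row-major `[Float]` array (that copy step is omitted). `normalizedMaskBounds` and `isNearEdge` are hypothetical names, and the 0.5 threshold is an assumption.

```swift
import Foundation

/// Normalized bounding box (0...1) of mask values above `threshold`,
/// or nil if no value exceeds it. `mask` is row-major with
/// `width * height` entries.
func normalizedMaskBounds(mask: [Float], width: Int, height: Int,
                          threshold: Float = 0.5) -> CGRect? {
    precondition(mask.count == width * height, "mask size mismatch")
    var minX = width, maxX = -1, minY = height, maxY = -1
    for y in 0..<height {
        for x in 0..<width where mask[y * width + x] > threshold {
            minX = min(minX, x); maxX = max(maxX, x)
            minY = min(minY, y); maxY = max(maxY, y)
        }
    }
    guard maxX >= 0 else { return nil }  // nothing above threshold
    return CGRect(
        x: CGFloat(minX) / CGFloat(width),
        y: CGFloat(minY) / CGFloat(height),
        width: CGFloat(maxX - minX + 1) / CGFloat(width),
        height: CGFloat(maxY - minY + 1) / CGFloat(height)
    )
}

/// True when the box comes within `margin` of any image edge.
func isNearEdge(_ b: CGRect, margin: CGFloat = 0.1) -> Bool {
    b.origin.x < margin || b.origin.y < margin ||
        b.origin.x + b.size.width > 1 - margin ||
        b.origin.y + b.size.height > 1 - margin
}
```

The edge test is symmetric, so it gives the same answer whether row 0 of the array is the top or the bottom of the image.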
Pattern 2: Hand Pose Issues
Symptom: VNDetectHumanHandPoseRequest returns nil or low confidence landmarks
Diagnostic:
if let observation = request.results?.first as? VNHumanHandPoseObservation {
    let thumbTip = try? observation.recognizedPoint(.thumbTip)
    let wrist = try? observation.recognizedPoint(.wrist)
    print("Thumb confidence: \(thumbTip?.confidence ?? 0)")
    print("Wrist confidence: \(wrist?.confidence ?? 0)")
    // Check hand orientation
    if let thumb = thumbTip, let wristPoint = wrist {
        // atan2 returns radians; convert to degrees before comparing
        let angleDegrees = atan2(
            thumb.location.y - wristPoint.location.y,
            thumb.location.x - wristPoint.location.x
        ) * 180 / .pi
        print("Hand angle: \(angleDegrees) degrees")
        if abs(angleDegrees) > 80 && abs(angleDegrees) < 100 {
            print("⚠️ Hand parallel to camera (hard to detect)")
        }
    }
}
Common causes:
| Cause | Confidence Pattern | Fix |
|---|---|---|
| Hand near edge | Tips have low confidence | Adjust framing |
| Hand parallel to camera | All landmarks low | Prompt user to rotate hand |
| Gloves/occlusion | Fingers low, wrist high | Remove gloves or change lighting |
| Feet detected as hands | Unexpected hand detected | Add chirality check or ignore |
Fix for parallel hand:
// Detect and warn user
// (avgConfidence: mean confidence across the hand's recognized points)
if avgConfidence < 0.4 {
    showWarning("Rotate your hand toward the camera")
}
Time to fix: 45 min
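The parallel-hand fix compares an average confidence against 0.4 without showing how that average is obtained. One way to compute it is sketched here; `averageConfidence` is a hypothetical helper, and returning 0 for an empty input is an assumption chosen so that "no landmarks" also triggers the warning.

```swift
/// Mean confidence across landmarks; returns 0 when there are none.
func averageConfidence(_ confidences: [Float]) -> Float {
    guard !confidences.isEmpty else { return 0 }
    return confidences.reduce(0, +) / Float(confidences.count)
}

// With a VNHumanHandPoseObservation, the input could be built as:
//   let points = try observation.recognizedPoints(.all)
//   let avgConfidence = averageConfidence(points.values.map { $0.confidence })
```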
Pattern 3: Body Pose Issues
Symptom: VNDetectHumanBodyPoseRequest skips frames or returns low confidence
Diagnostic:
if let observation = request.results?.first as? VNHumanBodyPoseObservation {
    let nose = try? observation.recognizedPoint(.nose)
    let root = try? observation.recognizedPoint(.root)
    if let nosePoint = nose, let rootPoint = root {
        let bodyAngle = atan2(
            nosePoint.location.y - rootPoint.location.y,
            nosePoint.location.x - rootPoint.location.x
        )
        let angleFromVertical = abs(bodyAngle - .pi / 2)
        if angleFromVertical > .pi / 4 {
            print("⚠️ Person bent over or upside down")
        }
    }
}
Common causes:
| Cause | Solution |
|---|---|
| Person bent |
Content truncated.