mflux-debugging


Debug MLX ports by comparing against a PyTorch/diffusers reference via exported tensors/images (export-then-compare).

Install

mkdir -p .claude/skills/mflux-debugging && curl -L -o skill.zip "https://mcp.directory/api/skills/download/2707" && unzip -o skill.zip -d .claude/skills/mflux-debugging && rm skill.zip

Installs to .claude/skills/mflux-debugging

About this skill

mflux debugging (MLX parity vs PyTorch/diffusers)

Use this skill when you are porting a model to MLX and need to prove numerical parity (or isolate where it diverges) versus a PyTorch reference implementation (often from diffusers).

This skill defaults to export-then-compare:

  • Run the reference once and export deterministic artifacts (tensors + optional images).
  • Load those artifacts in MLX and compare with clear thresholds.

When to Use

  • You suspect a port mismatch (wrong shapes/layout, RoPE, scheduler math, dtype casting, etc.).
  • You want a repeatable workflow to narrow down the first layer/block where outputs diverge.
  • You need evidence of correctness before refactoring (see mflux-model-porting).

Ground Rules (repo norms)

  • Use uv to run Python: uv run python -m ...
  • If you run pytest, preserve outputs: MFLUX_PRESERVE_TEST_OUTPUT=1 (see mflux-testing and the Makefile test targets).
  • Do not update or replace reference (“golden”) images unless explicitly asked.
  • Debug artifacts (tensor dumps) should live in a local folder and must not be committed unless explicitly asked.
  • If you need the broader porting workflow (milestones, ordering, when to refactor), follow mflux-model-porting.
  • RNG warning: PyTorch and MLX RNGs are different. Using the same integer seed is not enough for parity—export the exact initial noise/latents from the reference and load them in MLX.
  • Practical setup: the PyTorch reference repo (often diffusers/) and mflux/ are frequently next to each other on disk (e.g. both on your Desktop). Use absolute paths when in doubt.

Default Workflow (export-then-compare)

Preferred workflow: two tiny scripts + inline dumps

For day-to-day debugging, prefer a minimal paired repro:

  • One simple script in the reference repo (often diffusers/), e.g. diffusers/flux2_klein_edit_debug.py
  • One simple script in mflux/, e.g. mflux/flux2_klein_edit_debug.py

Keep them “boring”: hardcoded variables, no CLI, no framework, just a few np.savez(...) / mx.save(...) calls at the right spots.
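One possible shape for such an inline dump, shared by both scripts (the `RUN_DIR` path and `dump` helper name are illustrative, not part of mflux):

```python
import os
import numpy as np

# "./debug_artifacts/run01/mlx" on the MLX side of the pair.
RUN_DIR = "./debug_artifacts/run01/ref"

def dump(name: str, **arrays):
    """Save named numpy arrays for one checkpoint, e.g. dump('block_03', hidden=h).

    In the PyTorch script pass tensors through .detach().cpu().float().numpy();
    in the MLX script pass np.array(mx_array) instead.
    """
    os.makedirs(RUN_DIR, exist_ok=True)
    np.savez(os.path.join(RUN_DIR, f"{name}.npz"), **arrays)
```

Because both sides write plain .npz files with matching names, the compare step stays framework-agnostic.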

The key trick for RNG parity:

  • In the reference script, compute latents once, save them, then pass them back into the pipeline (latents=...) so the run definitely uses the dumped tensor.
  • In rare cases where a tensor needs to be saved from within a loop, make sure its name reflects the loop conditions (e.g. the 4th noise prediction in a 10-step loop).
  • In the MLX script, load that same latents file and feed it into the MLX run (do not rely on matching integer seeds).
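A minimal sketch of that trick, using plain NumPy for the file round trip. The `pipe(..., latents=...)` and `mx.array(...)` lines are illustrative comments; the exact argument names depend on your pipeline:

```python
import numpy as np

# Stand-in for the reference's prepared latents (the real shape depends on the model).
latents = np.random.default_rng(0).standard_normal((1, 16, 64, 64)).astype(np.float32)

# Reference script: save once, then pass the SAME tensor back into the pipeline,
# e.g. pipe(prompt, latents=torch.from_numpy(latents), ...) so the run
# definitely uses the dumped values.
np.save("latents.npy", latents)

# MLX script: load the same file, e.g. mx.array(np.load("latents.npy")),
# and feed it into the MLX run instead of sampling from a seed.
reloaded = np.load("latents.npy")
assert np.array_equal(latents, reloaded)  # the round trip is exact; no RNG matching needed
```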

1) Pick a single deterministic repro

  • Fix seed(s), prompt(s), height/width, steps, guidance, and any input image paths.
  • Keep the first repro small if possible (fewer steps, smaller resolution) to iterate quickly.

2) Decide your checkpoints (what to dump)

Start coarse, then narrow:

  • VAE: packed latents before decode; optional intermediate activations for one block if needed.
  • Transformer: hidden states at entry/exit of the model, then per-block (or every N blocks), then inside attention/MLP.
  • Text encoder: token ids + attention mask, embeddings output, then per-layer hidden states if needed.
  • Scheduler: timesteps/sigmas/alphas and the predicted noise/velocity per step.

Tip: as mflux-model-porting suggests, work “backwards from pixels”: validate VAE decode first with exported latents, then the diffusion/transformer loop, then the text encoder.

3) Export artifacts from the PyTorch reference (no logic changes)

Create a run directory like:

  • ./debug_artifacts/<run_id>/ref/

Export with one of these patterns:

  • NumPy: np.savez(path, **tensors_as_numpy)
  • PyTorch: torch.save(dict_of_tensors, path)
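A sketch of the NumPy export pattern. `to_numpy` and `export_checkpoint` are hypothetical helper names; the fp32 upcast means the dumps load cleanly on the MLX side without torch installed:

```python
import numpy as np

def to_numpy(t):
    """Accept torch tensors or numpy arrays; return a numpy array (bf16/fp16 upcast to fp32)."""
    if hasattr(t, "detach"):  # torch.Tensor path
        t = t.detach().cpu().float().numpy()
    return np.asarray(t)

def export_checkpoint(path, **tensors):
    """Write one .npz per checkpoint, e.g. export_checkpoint('ref/block_03.npz', hidden=h)."""
    np.savez(path, **{k: to_numpy(v) for k, v in tensors.items()})
```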

4) Run the MLX side with the same inputs and compare

Create a matching run directory:

  • ./debug_artifacts/<run_id>/mlx/

Load and compare tensors. For each checkpoint, report:

  • Shape + dtype
  • max_abs_diff, mean_abs_diff
  • max_rel_diff (guarding against division by zero)
  • Pass/fail with a clearly stated rtol/atol

Then inspect actual tensor values (e.g., the first 10 elements) rather than relying on summary stats alone: statistics can mislead, and small-looking stats can hide systematic drift or sign flips. Prefer runtime tensor dumps over code reading; code can use different conventions yet still represent the same math.
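One possible compare helper that reports exactly these fields (the function name and report format are ours):

```python
import numpy as np

def compare(name, ref, mlx, atol=1e-5, rtol=1e-5, eps=1e-8):
    """Print shape, abs/rel diff stats, and a pass/fail verdict for one checkpoint."""
    ref = np.asarray(ref, dtype=np.float64)
    mlx = np.asarray(mlx, dtype=np.float64)
    assert ref.shape == mlx.shape, f"{name}: shape mismatch {ref.shape} vs {mlx.shape}"
    diff = np.abs(ref - mlx)
    rel = diff / (np.abs(ref) + eps)  # eps guards division by zero
    ok = np.allclose(ref, mlx, atol=atol, rtol=rtol)
    print(f"{name}: shape={ref.shape} max_abs={diff.max():.3e} "
          f"mean_abs={diff.mean():.3e} max_rel={rel.max():.3e} "
          f"{'PASS' if ok else 'FAIL'} (atol={atol}, rtol={rtol})")
    # Always eyeball raw values too; summary stats can hide sign flips.
    print(f"  first 10 (ref): {ref.ravel()[:10]}")
    print(f"  first 10 (mlx): {mlx.ravel()[:10]}")
    return ok
```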

Suggested tolerance starting points (adjust per component):

  • fp32 comparisons: atol=1e-5, rtol=1e-5
  • fp16/bf16 comparisons: atol=1e-2, rtol=1e-2
  • If comparing images: compare both (a) the tensor before the final clamp and (b) the saved PNG visually, since tiny numeric diffs can look identical.
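These starting points can be kept in one place and passed straight to np.allclose; the `TOLS` dict is just a suggested convention:

```python
import numpy as np

# Starting tolerances keyed by comparison dtype; tune per component.
TOLS = {
    "fp32": dict(atol=1e-5, rtol=1e-5),
    "fp16": dict(atol=1e-2, rtol=1e-2),
}

a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
assert np.allclose(a, a + 5e-6, **TOLS["fp32"])       # tiny fp32-level drift passes
assert not np.allclose(a, a + 5e-4, **TOLS["fp32"])   # ...but fails at fp32 tolerance
assert np.allclose(a, a + 5e-4, **TOLS["fp16"])       # the looser fp16 tolerance accepts it
```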

If a checkpoint fails:

  • Add an earlier checkpoint and repeat (binary search through the forward path).

Common Causes of Divergence (high-signal checklist)

  • Layout mistakes: NCHW vs NHWC, transposes around convs/attention, flatten/reshape ordering. Conventions differ between libraries; convolution weights, for example, are stored channels-first (OIHW) in PyTorch.
  • Broadcasting: scale/shift vectors applied on the wrong axis (common in RoPE and modulation).
  • Dtype casting: reference silently upcasts to fp32 for norm/softmax; MLX path stays in fp16.
  • RoPE details: position ids, reshape order, whether cos/sin are broadcast over heads vs sequence.
  • Scheduler math: timestep indexing, sigma/alpha definitions, and off-by-one step order.
  • Scheduler config: compare sigma schedules directly.
  • Seed/RNG: ensure you aren’t comparing stochastic paths (dropout, noise sampling) without controlling RNG.
  • Device dtype: MPS float16 can produce NaNs; prefer bfloat16 for reference dumps if you see NaNs.
  • Do not use CPU for comparisons; always keep reference runs on MPS to avoid misleading behavior.
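As one concrete illustration of the dtype-casting item above, here is a NumPy sketch (the helper is ours; the same idea applies to torch/MLX) of how a softmax computed entirely in fp16 drifts from one computed in fp32 and cast back:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax in whatever dtype x carries."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

logits = (np.random.default_rng(0).standard_normal(512) * 10).astype(np.float16)

half = softmax(logits)                                        # every step rounds to fp16
full = softmax(logits.astype(np.float32)).astype(np.float16)  # upcast, compute, cast back

# Nonzero here means an fp16-only MLX path will not bit-match a reference
# that silently upcasts norm/softmax to fp32.
print("max_abs_diff:", np.abs(half.astype(np.float32) - full.astype(np.float32)).max())
```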

Artifact Hygiene

  • Prefer debug_artifacts/<run_id>/... at repo root.
  • Do not commit debug_artifacts/ unless explicitly asked.
  • If you convert the parity check into a test, follow the repo’s testing conventions and preserve outputs (see mflux-testing).
  • Clean up old artifacts once they are no longer needed; focus on the current problem and avoid confusion with stale artifacts from earlier tasks.

See Also

  • mflux-model-porting: correctness-first workflow (validate components and lock behavior before refactor).
  • mflux-testing: how to run tests safely and handle image outputs/goldens.
