Back to all posts

Microsoft MAI Coding Models Explained

At Build 2026 on June 2, Microsoft announced a family of seven in-house MAI models — and two of them matter directly to anyone who writes code. MAI-Code-1-Flash is Microsoft’s first home-grown coding model, now selectable inside GitHub Copilot. MAI-Thinking-1 is its first reasoning model. Both were trained, in Microsoft’s words, “without distillation from third-party models.” This piece walks through what each model is, why Microsoft building its own models (rather than only shipping OpenAI’s) is the real story, how MAI-Code-1-Flash slots into the Copilot model picker, and what it changes for a developer choosing a coding model in 2026 — using only the numbers Microsoft actually published.

June 22, 2026 ~12 min read2,650 words
Editorial illustration: a teal Microsoft-blue circuit glyph shaped like a terminal cursor blinking inside a GitHub Copilot model-picker pane, flanked by two concentric rings labeled MAI-Code and MAI-Thinking, set against a deep slate backdrop with a faint hill-climbing gradient rising from lower-left to upper-right.

MAI models in 60 seconds

  • Announced June 2, 2026 at Build 2026 by Microsoft AI (MAI), the division led by Mustafa Suleyman — a family of seven new in-house models.
  • MAI-Code-1-Flash is a small-tier coding model, rolling into the GitHub Copilot model picker in VS Code (and the default auto-picker).
  • MAI-Thinking-1 is Microsoft’s first reasoning model — a sparse Mixture-of-Experts model with a 256k context window, in private preview on Microsoft Foundry.
  • Trained “without distillation from third-party models,” Microsoft says — its pitch is owning the full stack, from architecture to post-training, on traceable data.
  • The strategic signal is independence: Microsoft now has frontier-class coding and reasoning models it built itself, not only the OpenAI models it licenses.
  • No published per-token price for MAI-Code-1-Flash; it shipped inside existing Copilot plans.

Why this matters for agent builders

MAI-Code-1-Flash was trained directly on Copilot’s agentic harness — its file-editing tools, terminal integrations, and multi-step task loops. That makes it a data point in a broader 2026 trend: coding models fitted to a specific tool surface rather than sold as general-purpose. Microsoft also says the MAI family will list on OpenRouter, Fireworks, and Baseten, so the models are reachable outside Copilot too.

What Microsoft actually shipped at Build 2026

On June 2, 2026, Microsoft AI published a post titled “Building a hill-climbing machine: Launching seven new MAI models.” The seven, per microsoft.ai/news: MAI-Thinking-1, MAI-Code-1-Flash, MAI-Image-2.5, MAI-Image-2.5-Flash, MAI-Transcribe-1.5, MAI-Voice-2, and MAI-Voice-2-Flash.

Two of those are squarely developer-facing — the coding model and the reasoning model — and they’re the focus here. The common thread Microsoft kept returning to is provenance. Mustafa Suleyman’s framing on the launch post: “We don’t distill from other labs and we don’t rely on opaque data. Our datasets are clean, traceable, and enterprise-grade.” The “hill-climbing machine” metaphor is his pitch for why Microsoft is doing this at all: “Our job at MAI is to help you do this — to push the frontier, and to build a hill-climbing machine to keep you at the frontier.”

Read past the marketing and the substantive claim is narrow but real: these are models Microsoft built end-to-end — architecture, training pipeline, post-training — rather than fine-tuning or wrapping a partner’s model. Whether the quality matches the framing is a separate question, and one the published benchmarks only partly answer.

MAI-Code-1-Flash: the Copilot-native coding model

MAI-Code-1-Flash is the headline for developers. Microsoft describes it on microsoft.ai/news/introducingmai-code-1-flash as a fast, efficient coding model trained “from the ground up on clean, traceable and enterprise-grade data, without distillation from third-party models.” The detail that makes it distinct: it was trained directly with the GitHub Copilot harnesses used in production — the same file editing tools, terminal integrations, and multi-step task loops a Copilot agent runs.

Microsoft’s efficiency claim is the one worth quoting precisely: it says the model solves “harder problems with up to 60% fewer tokens” on SWE-Bench Verified, attributing this to “adaptive solution length control, which helps the model adjust the depth of its response.” For a coding assistant billed by request and latency, fewer tokens per solved task is a meaningful efficiency lever — if the claim holds up on your own code.

On benchmarks, Microsoft frames MAI-Code-1-Flash against Claude Haiku 4.5 specifically — Anthropic’s small, fast tier, not Sonnet or Opus. Microsoft’s published numbers: SWE-Bench Pro at 51.2% vs. 35.2% (a “+16-point lead”), higher pass rates on SWE-Bench Verified, SWE-Bench Multilingual and Terminal Bench 2, and a +28.9-point margin on IF Bench (instruction following). On an adversarial-reasoning benchmark it cites 85.8% adjusted accuracy. Microsoft did not disclose the parameter count, context window, or a price in the launch post.

Read the comparison set carefully

Every benchmark delta above is Microsoft’s own, measured against Claude Haiku 4.5 — a small model in the same weight class, not a frontier flagship. There is no third-party reproduction yet. The honest read: MAI-Code-1-Flash looks strong for a small Copilot-tuned model; it is not positioned as a Claude Opus or GPT-class generalist, and Microsoft doesn’t claim it is.

MAI-Thinking-1: the reasoning model behind it

The companion release is MAI-Thinking-1, Microsoft AI’s first reasoning model. Per microsoft.ai/news/introducing-mai-thinking-1, it is a sparse Mixture-of-Experts model with roughly 35B active parameters out of about 1 trillion total, and a 256k-token context window (Microsoft frames that as enough for a 600-page document). Microsoft calls it “a medium-sized model that stands among the strongest models in its weight class.”

The reported figures: AIME 2025 at 97.0% and AIME 2026 at 94.5% on math; Microsoft says it goes “toe-to-toe with Claude Opus 4.6 on SWE-Bench Pro” on coding; and it cites a blind human side-by-side study across 1,276 tasks where users preferred MAI-Thinking-1 over Claude Sonnet 4.6. Like the coding model, it was trained “without distillation from third party models.” It launched in private preview on Microsoft Foundry, with a public MAI Playground preview promised.

Why include the reasoning model in a piece about coding? Two reasons. First, MAI-Thinking-1 is the clearest evidence that Microsoft’s in-house effort reaches the frontier, not just the small-and-fast tier — its strongest claims are against Opus and Sonnet, not Haiku. Second, the two models share a training philosophy and lineage; MAI-Code-1-Flash is the Copilot-shaped, latency-optimized sibling of the same in-house program.

Why Microsoft building its own models matters

Microsoft has been the most prominent face of the OpenAI partnership for years — Copilot, Azure OpenAI, and consumer Copilot all lean on GPT models. MAI doesn’t end that; Microsoft still ships GPT across its products. What changed at Build 2026 is that Microsoft now has its own frontier-class coding and reasoning models running in the same surfaces.

That independence buys Microsoft a few concrete things. It controls the training data and can state its provenance (“clean, traceable, enterprise-grade”) — which matters for enterprise customers nervous about data licensing. It controls the cost structure rather than paying a partner per token. And, most relevant to developers, it can tune a modelagainst the exact harness it ships in — MAI-Code-1-Flash was trained on Copilot’s own agentic tooling, something you can only do when you own the model.

The skeptical read is equally fair. A first in-house coding model that benchmarks against a small competitor model, with no disclosed parameters or price and no third-party reproduction, is a strategic flag in the ground more than a category-resetting release. The right framing is directional: Microsoft is signaling it intends to compete on its own models — and the “first in a new wave of purpose-built coding models,” per GitHub’s own changelog, implies more are coming.

How MAI-Code-1-Flash fits into GitHub Copilot

The distribution story is GitHub Copilot. Per GitHub’s June 2 changelog, MAI-Code-1-Flash appears in the Copilot model picker in VS Code and can be chosen by the default auto-picker — no setup required. GitHub describes it as Microsoft’s “latest small-tier coding model” that delivers “best-in-class quality for its size” and is “well-suited for lightweight coding workflows,” explicitly “designed and tuned specifically for GitHub Copilot.”

It began rolling out to Copilot Free, Student, Pro, Pro+, and Max plans, starting with a limited set of users and expanding “gradually over the coming weeks.” A follow-up June 18 changelog widened the surfaces to Copilot CLI, the Copilot app, Copilot Chat on GitHub, Visual Studio, GitHub Mobile, JetBrains IDEs, Eclipse, and Xcode; it noted Enterprise and Business access was still forthcoming.

One thing to be clear about: this does not mean Copilot dropped Claude, GPT, or Gemini. MAI-Code-1-Flash is one more option in a multi-model picker. The interesting shift is that Microsoft’s own model is now the default-eligible small-tier choice in its own product — and GitHub did not publish a premium-request multiplier for it in the launch materials, so check the Copilot model documentation for the authoritative cost.

If you wire Copilot or other coding agents into MCP tooling, the relevant catalog entries below are where to start. The MAI family is also slated to list on OpenRouter, so a single OpenRouter integration can reach these models alongside others.

How it compares to the field (only verified claims)

There is no neutral third-party head-to-head on MAI-Code-1-Flash yet, so this section sticks to attributed, verified statements and points you to our existing comparisons rather than re-deriving benchmarks.

  • vs. Claude (Anthropic) — Microsoft’s coding-model claims are against Claude Haiku 4.5 (the small tier), not Sonnet or Opus; for reasoning, MAI-Thinking-1 claims parity with Opus 4.6 on SWE-Bench Pro and a blind preference over Sonnet 4.6. All Microsoft-published. Match the tier to the comparison before reading anything into it.
  • vs. GPT (OpenAI) — Microsoft published no head-to-head against GPT models, even though Copilot ships both. MAI is the in-house alternative track, not a stated GPT replacement. The difference for developers is ownership and Copilot-specific tuning, not a claimed capability gap.
  • vs. Gemini (Google) — no Microsoft comparison published. For the current Gemini small/fast model, see our Gemini 3.5 Flash explainer, which walks through Google’s own published benchmarks the same way.
  • vs. dedicated coding tools — if you’re weighing Copilot’s model picker against other autocomplete engines, our Cursor Tab vs Copilot vs Codeium vs Tabnine vs Cody comparison covers the tool layer that sits above whichever model you pick.

For the broader assistant landscape — chat-first agents across Mistral, ChatGPT, Claude, and Gemini — our Le Chat vs ChatGPT vs Claude vs Gemini comparison is the better reference than re-listing numbers here.

What it means for choosing a coding model in 2026

A few practical reads if you’re deciding what to point your editor at.

1. If you live in GitHub Copilot, try it where it already is. MAI-Code-1-Flash is in the model picker with no setup, and the auto-picker may already route some of your requests to it. The cheapest evaluation is to select it on a few real tasks and compare against whatever you use now. The 60%-fewer-tokens claim, if it holds for your codebase, is a latency and quota win.

2. Treat the benchmarks as a starting hypothesis, not a verdict. They’re vendor-published, against a small competitor model, with no third-party reproduction. The model is positioned as small-tier and lightweight — good for fast everyday assistance, not advertised as your model for the hardest agentic builds.

3. Match the model tier to the task. MAI-Code-1-Flash is the fast, cheap default; for harder reasoning, the same lineage’s MAI-Thinking-1 (or a frontier Claude/GPT/Gemini model) is the heavier tool. This is the same routing logic our model-routing guide lays out — cheap model for the inner loop, heavy model for the hard calls.

4. Watch for what comes next. GitHub called MAI-Code-1-Flash “the first in a new wave of purpose-built coding models from Microsoft.” If you’re making a long-term tooling bet, the signal is that Microsoft intends to keep shipping in-house coding models — so today’s small-tier flash model is a floor, not a ceiling.

FAQ

What is Microsoft MAI?

MAI is Microsoft AI, the model-building division led by Mustafa Suleyman. At Build 2026 (June 2, 2026) it announced a family of seven in-house models — including MAI-Thinking-1 (a reasoning model) and MAI-Code-1-Flash (a coding model). Microsoft's framing on microsoft.ai/news: "We don't distill from other labs and we don't rely on opaque data. Our datasets are clean, traceable, and enterprise-grade." These are the first MAI frontier models trained end-to-end by Microsoft rather than wrapping or fine-tuning a partner's model.

What is MAI-Code-1-Flash?

MAI-Code-1-Flash is Microsoft's first in-house coding model, announced June 2, 2026 at Build 2026 and rolling into GitHub Copilot. GitHub describes it as Microsoft's "latest small-tier coding model" that delivers "best-in-class quality for its size" and is "Designed and tuned specifically for GitHub Copilot." Microsoft says it was trained "from the ground up on clean, traceable and enterprise-grade data, without distillation from third-party models," directly on the Copilot agentic harnesses (file-editing tools, terminal integrations, multi-step task loops) developers use in production.

Does GitHub Copilot use Microsoft's own model now?

Partly. As of June 2026, MAI-Code-1-Flash is one selectable model inside GitHub Copilot's model picker in VS Code, and it can be chosen by the default auto-picker. It is not the only model — Copilot still offers Claude, GPT, and Gemini options. GitHub's changelog says it began rolling out to Copilot Free, Student, Pro, Pro+, and Max plans, starting with a limited set of users and expanding "gradually over the coming weeks." A June 18 update added Copilot CLI, the Copilot app, Copilot Chat on GitHub, Visual Studio, GitHub Mobile, JetBrains, Eclipse, and Xcode; Enterprise and Business access was listed as forthcoming.

What is MAI-Thinking-1?

MAI-Thinking-1 is Microsoft AI's first reasoning model, also announced June 2, 2026. Per microsoft.ai/news, it is a sparse Mixture-of-Experts model with roughly 35B active parameters out of about 1 trillion total, and a 256k-token context window. Microsoft reports AIME 2025 at 97.0% and AIME 2026 at 94.5%, says it goes "toe-to-toe with Claude Opus 4.6 on SWE-Bench Pro," and cites a blind side-by-side study (1,276 tasks) where users preferred it over Claude Sonnet 4.6. It launched in private preview on Microsoft Foundry. These are Microsoft's own published figures.

Is MAI-Code-1-Flash better than Claude?

Microsoft's own benchmarks claim wins against Claude Haiku 4.5 specifically — note the comparison is to Anthropic's small/fast tier, not Sonnet or Opus. Microsoft reports SWE-Bench Pro at 51.2% vs. 35.2% for Haiku 4.5 (a "+16-point lead"), higher pass rates on SWE-Bench Verified, SWE-Bench Multilingual and Terminal Bench 2, and large instruction-following margins (IF Bench +28.9). These are vendor-published numbers with no third-party reproduction yet, and they compare a Copilot-native coding model to a general-purpose small model. Treat them as Microsoft's framing and run your own task before deciding.

How is MAI different from GPT — and why does Microsoft need its own models?

Microsoft remains a major OpenAI partner and still ships GPT models across Copilot. MAI is the in-house track Microsoft built in parallel — models it owns end-to-end (architecture, training pipeline, post-training) rather than licensing. The stated goal in the Build 2026 "hill-climbing machine" post is long-term self-sufficiency: control over data provenance, cost, and the ability to tune a model directly against Copilot's harness. So the practical difference for developers isn't GPT vs. MAI on capability so much as who owns the model and how tightly it's fitted to Copilot.

How much does MAI-Code-1-Flash cost in GitHub Copilot?

Microsoft and GitHub did not publish a per-token price or a Copilot premium-request multiplier for MAI-Code-1-Flash in the launch materials. It rolled out inside existing Copilot plans (Free, Student, Pro, Pro+, Max) rather than as a separately priced add-on. For an authoritative number, check GitHub's Copilot model documentation and pricing page — that is where any request multiplier would be published. Do not treat any third-party estimate as an official rate.

Where can I use the MAI models?

MAI-Code-1-Flash lives inside GitHub Copilot (VS Code first, then Copilot CLI, the Copilot app, Copilot Chat on GitHub, Visual Studio, JetBrains, Eclipse, Xcode and GitHub Mobile). MAI-Thinking-1 launched in private preview on Microsoft Foundry with a public MAI Playground preview promised. Microsoft also said the broader MAI family would be available through Microsoft Foundry plus third-party platforms OpenRouter, Fireworks, and Baseten — so you can reach the models outside the Microsoft stack as those listings go live.

Sources

Found an issue?

If something here is out of date — a new benchmark, a published price, a new Copilot surface — email [email protected] or read more on our about page. We keep these guides current.