Updated June 2026 · Developer guide · 15 min read

Magenta RealTime 2: Open Live Music Model Guide

Magenta RealTime 2 is Google Magenta's open-weights live music model for MacBook musicians, audio developers, and people building interactive AI instruments. This guide turns the launch posts, official MRT2 app page, technical blog, GitHub repo, Reddit notes, and early issues into a practical map.

[Image: Abstract live music model workflow with MIDI notes, waveform lanes, an audio engine core, and streaming output arcs.]
On this page - 18 sections
  1. Definition
  2. Why it matters
  3. Launch posts
  4. Mental model
  5. Quickstart
  6. Apps and plugins
  7. Controls
  8. Latency
  9. Hardware
  10. Architecture
  11. Developer paths
  12. What we got wrong
  13. Caveats
  14. Workflows
  15. Verdict
  16. FAQ
  17. Glossary
  18. Sources

One-sentence definition

Magenta RealTime 2 is an open-weights live music model that generates 48kHz stereo audio while responding to MIDI, text, and audio controls fast enough to be played like an instrument.

Answer first

Use MRT2 when you want an AI sound source inside a live music workflow: MIDI keyboard, DAW track, standalone jam app, creative controller, or custom C++/Python experiment. Keep conventional song generators in the stack when the task is a finished track from one prompt.

Why MRT2 matters

Most AI music demos still feel like batch jobs. You type a prompt, wait, and judge the rendered clip. MRT2 aims at a different interface: press keys, move a control, blend a prompt, and hear the model react while the session is still alive.

Google Magenta's technical post says the first Magenta RealTime model worked in chunks, which made control feel delayed. MRT2 moves to frame-level streaming, runs through an MLX-backed C++ engine on Apple Silicon, and ships with apps and Audio Unit plugins instead of only notebooks.

Model

230M / 2.4B

Small and base model sizes documented by Google and GitHub.

Frame size

40ms

Google says MRT2 moved from 2-second chunks to frame-level operation.

Output

48kHz

The apps require 48kHz stereo audio settings for playback.

Launch posts

The user-facing message is consistent across Google Gemma and Google Magenta: MRT2 is not only an open audio model. It is a playable instrument surface for live control. The official Magenta thread is more detailed; the Gemma thread is the broader ecosystem announcement.

  1. Official launch post

    Google Magenta introduces MRT2 as a live music model with open weights, an open source inference engine, apps, plugins, MIDI control, and low-latency MacBook playback.

  2. Official launch post

    Google Gemma frames MRT2 as an open model musicians can play as an instrument using MIDI, text, and audio on a MacBook.

  3. Official launch post

    Google Gemma points readers to the MRT2 app and plugin download page.

  4. Official launch post

    Google Gemma sends musicians and developers to the Google Magenta project for more experiments.

Mental model: five moving parts

MRT2 is easiest to understand as a stack. The public demos sit at the top, but the useful developer surface is the line between model, inference engine, and controls.

MIDI notes / text prompt / audio prompt
        |
        v
MusicCoCa style embedding + note/drum conditioning
        |
        v
Depthformer generates SpectroStream audio tokens
        |
        v
MLX-backed C++ streaming engine on Apple Silicon
        |
        v
Standalone app / AU plugin / custom instrument

MusicCoCa maps text and audio prompts into style embeddings. SpectroStream is the codec that turns audio into tokens and back. Depthformer is the transformer that generates the token stream. The C++ engine is what makes the real-time path practical on MacBook GPUs.

Smallest local run

For musicians, the shortest path is the Mac app bundle from the MRT2 apps page. For developers, the shortest reproducible path is the Python package and CLI. The docs use `uv`, Python 3.12, and the `magenta-rt` package.

curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv --python 3.12
source .venv/bin/activate

# Apple Silicon live path
uv pip install "magenta-rt[mlx]"

# Shared resources, then one streaming model
mrt models init
mrt models download

# Generate a four-second check clip
mrt mlx generate --prompt "disco funk" --duration 4.0 --model=mrt2_small

The GitHub installation docs say MLX is the live Apple Silicon path. JAX is still available for offline, batch, or research work, including Linux setups with the appropriate JAX wheel.
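The backend split described above can be written as a small check. This is a minimal sketch, not part of the `magenta-rt` package: `pick_backend` is a hypothetical helper, and the returned strings are just labels for the two paths the docs describe.

```python
import platform


def pick_backend(system: str, machine: str) -> str:
    """Return "mlx" for Apple Silicon macOS, otherwise "jax" (offline path).

    Hypothetical helper: the labels mirror the two backends named in the
    installation docs, not an official API.
    """
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    return "jax"


if __name__ == "__main__":
    # Report which path this machine would take.
    print(pick_backend(platform.system(), platform.machine()))
```

A setup script could call this before deciding which install extras to request, so that a Linux research box never tries the live MLX path.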

Apps and plugins

The MRT2 download page matters because it shows Google is not shipping only a model checkpoint. The bundle includes standalone apps, Audio Unit plugin support, and examples that demonstrate different interfaces for the same model.

MIDI steering

Hold a note or chord and the model generates accompaniment that follows the harmony.

Text-to-synth

Describe a playable instrument, such as a string ensemble or disco funk patch.

Audio cloning

Drop in a short audio reference and turn the sound into a playable source.

Prompt mixing

Move between text and audio prompts to explore hybrid styles.

Sound design

Combine musical prompts with noisy textures and modulate chaos over time.

Gesture control

Use MIDI, LFOs, camera gestures, Max, PureData, or SuperCollider style interfaces.

The Audio Unit plugin angle is the most practical part for DAW users. Google's install page tells users to register the plugin, refresh AU plugins in the DAW, and drag the plugin onto a MIDI track. That means the first serious tests will happen in normal music sessions, not only in research demos.

The controls are the product

MRT2 uses multiple control signals. Text and audio prompts steer style. MIDI notes steer pitch and harmony. Drum control can be used to suppress drums when the model needs to sit beside other tracks.

The technical post says note control is trained from audio and MIDI pairs, with MIDI labels inferred by MT3. It also describes two note modes: one where the model chooses attacks from held notes, and one where the user supplies onset timing. That is the difference between a model that follows a chord and a model that follows performance timing.
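To make the two note modes concrete, the sketch below turns the same MIDI notes into per-frame control data in both ways. The `Note` class, the function names, and the set-per-frame representation are assumptions for illustration; the source does not describe MRT2's actual conditioning format. Only the 40ms frame length comes from the technical post.

```python
from dataclasses import dataclass

FRAME_MS = 40  # frame length stated in the technical post


@dataclass(frozen=True)
class Note:
    pitch: int     # MIDI pitch number
    start_ms: int  # key-down time
    end_ms: int    # key-up time


def held_pitches(notes, n_frames):
    """Held-note mode: the set of pitches sounding during each frame."""
    frames = [set() for _ in range(n_frames)]
    for note in notes:
        first = note.start_ms // FRAME_MS
        last = (note.end_ms - 1) // FRAME_MS
        for f in range(first, min(last, n_frames - 1) + 1):
            frames[f].add(note.pitch)
    return frames


def onset_pitches(notes, n_frames):
    """Onset mode: only the pitches whose attack falls inside each frame."""
    frames = [set() for _ in range(n_frames)]
    for note in notes:
        f = note.start_ms // FRAME_MS
        if f < n_frames:
            frames[f].add(note.pitch)
    return frames
```

In the held-note view a chord looks the same in every frame it spans, so attack placement is left to the model. In the onset view each attack is pinned to one frame, which is what lets the output follow performance timing.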

Artifact

A useful test prompt

Try three passes with the same MIDI chord progression: `disco funk`, `string ensemble`, and a short audio reference. Then listen for three things: whether the harmony follows the keys, whether the style changes without collapsing, and whether the attack timing feels playable.

Latency: the real product test

Google's own post says MRT2 moved from the 2-second chunks of the first version to 40ms frames, with average control latency under 200ms once buffers and system overhead are counted. That is the central claim.

The engineering reason is frame-level autoregression. Instead of waiting for a large chunk boundary before new control information matters, MRT2 injects conditioning at each generation step. It also uses decoder-only streaming with sliding window attention so the model can continue over time without unbounded memory.

Generation unit | What changes                         | Why it matters
2-second chunks | Controls wait for the next chunk.    | Feels like a delayed effect.
40ms frames     | Controls can affect the next frame.  | Starts to feel playable.
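The difference between the two generation units can be put in numbers. This sketch assumes a control change only takes effect at the start of the next generation unit and ignores buffering and system overhead, which the post counts separately in its sub-200ms average. The function name is illustrative.

```python
import math


def control_takes_effect_ms(change_ms: float, unit_ms: float) -> float:
    """Start time of the first generation unit that begins at or after the change."""
    return math.ceil(change_ms / unit_ms) * unit_ms


# Compare the same control change under both generation units.
for label, unit in (("2-second chunks", 2000), ("40ms frames", 40)):
    effect = control_takes_effect_ms(130, unit)
    wait = effect - 130
    print(f"{label}: change at 130ms applies from {effect:.0f}ms (wait {wait:.0f}ms)")
```

A knob turned 130ms into a session waits 1870ms under chunked generation and 30ms under frame-level generation, before any audio buffering is added.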

Hardware and model choice

Start with the small model. The official docs say `mrt2_small` runs in real time on any Apple Silicon Mac, including Air models. The base model is larger and higher quality, but it has a much tighter timing budget.

Path        | Size   | Hardware                           | Use
mrt2_small  | 230M   | Any Apple Silicon Mac              | Best first install
mrt2_base   | 2.4B   | Pro/Max class Apple Silicon        | Higher quality, higher risk
Offline JAX | Either | Linux/NVIDIA or CPU research path  | Not the live plugin path

The app page also says first launch may download model weights: roughly 450MB for small and 2.5GB for base. Treat those numbers as install-size planning, not runtime memory guarantees.

Architecture notes for developers

MRT2 is a codec language model. A codec language model generates discrete audio tokens, then decodes those tokens back to audio. Google's technical appendix says SpectroStream compresses 48kHz stereo audio into token frames at 25Hz, with 12 tokens per frame.
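The figures in that paragraph tie directly to the 40ms frame quoted elsewhere in this guide. The arithmetic below uses only numbers from the source; the variable names are illustrative.

```python
SAMPLE_RATE_HZ = 48_000  # output sample rate
CHANNELS = 2             # stereo
FRAME_RATE_HZ = 25       # SpectroStream token frames per second
TOKENS_PER_FRAME = 12    # tokens per frame

# One token frame covers 1/25 of a second.
frame_ms = 1000 / FRAME_RATE_HZ

# Audio samples per channel covered by one frame.
samples_per_frame = SAMPLE_RATE_HZ // FRAME_RATE_HZ

# Tokens the model must generate for each second of audio.
tokens_per_second = FRAME_RATE_HZ * TOKENS_PER_FRAME

# Raw sample values (both channels) represented by each token.
values_per_token = (SAMPLE_RATE_HZ * CHANNELS) // tokens_per_second

print(frame_ms, samples_per_frame, tokens_per_second, values_per_token)
```

So a 25Hz frame rate is the same thing as a 40ms frame: each frame stands for 1,920 samples per channel, and real-time playback requires 300 tokens per second.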

The key design choice is not only tokenization. MRT2 uses a decoder-only architecture, local sliding window attention, and attention sink embeddings to keep continuous generation stable after old context is evicted. The team also dropped explicit positional embeddings for this path, relying on causal masking and sliding attention for length generalization.
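A rough way to picture sliding window attention with sinks is to list which positions a query can see at each step. The sketch models sink embeddings as extra slots that are never evicted and uses toy sizes; the window length, sink count, and this exact layout are assumptions, not values or details from the MRT2 appendix.

```python
def visible_keys(step: int, window: int, n_sinks: int) -> list[int]:
    """Key positions the query at `step` may attend to.

    Sink slots are modeled as negative indices: they sit outside the audio
    token timeline and are never evicted. Timeline positions are limited to
    the most recent `window` steps, and never look ahead (causal).
    """
    sinks = list(range(-n_sinks, 0))
    oldest = max(0, step - window + 1)
    return sinks + list(range(oldest, step + 1))


if __name__ == "__main__":
    for step in (2, 10, 10_000):
        print(step, visible_keys(step, window=4, n_sinks=1))
```

The list never grows beyond `window + n_sinks` entries, which is why memory stays bounded however long the session runs, while the sink slots give attention a stable target after early context has been dropped.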

Where to inspect the system

  • `magenta_rt/` is the Python inference library with JAX and MLX backends.
  • `core/` is the C++ inference engine for streaming apps.
  • `examples/mrt2/auv3` is the all-in-one Audio Unit plugin.
  • `examples/mrt2/standalone` is the standalone macOS app.

Developer paths

There are three serious developer paths. Use the app bundle if you are evaluating musical feel. Use the Python package if you are testing prompts, tokens, and offline generation. Use the C++ engine if you are building an instrument or plugin where audio callback behavior matters.

# C++ app development path from the docs
uv pip install "cmake<3.28"

cmake . -B build
cmake --build build --target hello_mrt2 -j10

./build/examples/hello_mrt2/hello_mrt2 \
  ~/Documents/Magenta/magenta-rt-v2/models/mrt2_small/mrt2_small.mlxfn \
  ~/Documents/Magenta/magenta-rt-v2/resources \
  100 \
  --prompt "ambient pads with sub bass"

For MCP.Directory readers, the interesting next step is not an MCP server yet. It is a local agent workflow around the model: generate prompt sets, test presets, catalog useful controller mappings, and create repeatable DAW session templates.

What we got wrong at first

The first read makes MRT2 sound like another open model release. That misses the point. The model only becomes different when the control loop is short enough for a musician to react to it.

We also initially treated the MacBook requirement as a simple limitation. It is a limitation, but it is also an architecture choice. Google started with Apple Silicon because MLX gives the C++ engine a predictable local GPU target and many musicians already use MacBooks. That does not remove the need for Windows and Linux paths, but it explains the launch shape.

Caveats and early friction

The launch reaction is positive, but the caveats are concrete. X replies ask for Windows, Linux, API access, and broader DAW clarity. One reply to Google Magenta asked directly about Windows; the project account answered that a broader release would be useful, but they started with Apple Silicon because the live path needs a moderately powerful GPU and MacBooks are common among musicians.

Real-time claims need local benchmarking

GitHub issue #39 reports a case where `mrt2_base` on an M3 Max exceeded the 40ms frame budget in the official benchmark. Treat the hardware table as guidance, then measure your exact model, MLX version, DAW buffer, and background GPU load.
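A local measurement can be as simple as timing each generated frame and counting how many exceed the budget. This is a minimal sketch, not the official benchmark: `generate_frame` is a hypothetical zero-argument callable standing in for whatever produces one frame in your setup, and only the 40ms budget comes from the source.

```python
import time

FRAME_BUDGET_MS = 40.0  # one frame of audio, per the source


def summarize(frame_times_ms, budget_ms=FRAME_BUDGET_MS):
    """Summarize per-frame compute times against the real-time budget."""
    ordered = sorted(frame_times_ms)
    mean_ms = sum(ordered) / len(ordered)
    return {
        "mean_ms": mean_ms,
        "max_ms": ordered[-1],
        "overruns": sum(1 for t in ordered if t > budget_ms),
        # Above 1.0 means frames are produced faster than they are played.
        "real_time_factor": budget_ms / mean_ms,
    }


def time_frames(generate_frame, n_frames):
    """Time a zero-argument callable once per frame, in milliseconds."""
    times = []
    for _ in range(n_frames):
        start = time.perf_counter()
        generate_frame()
        times.append((time.perf_counter() - start) * 1000.0)
    return times
```

Watch the overrun count and the maximum rather than the mean alone. A healthy average can still hide occasional slow frames, and those are what a performer hears as dropouts.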

Do not assume it is a cloud API

The launch is about open weights, local apps, plugins, and code. If your product needs a hosted API, the official sources here do not present one for MRT2.

Start with musical latency, not demo novelty

A clip can sound impressive and still fail as an instrument. Test note-following, onset feel, drift, buffer underruns, and whether performers can predict how the model responds.

Three workflows to try

The best first workflows are small and measurable. Do not start with an album. Start with one repeatable controller setup.

Workflow 1

MIDI duet sketch

Put the AU plugin on a MIDI track, hold simple chord changes, and record how the ensemble follows. Use `mrt2_small` first, then compare base only if latency stays stable.

Workflow 2

Text-to-synth preset bank

Generate ten playable patches from short prompts. Score them on attack clarity, style match, noise, and whether they sit behind a vocal or lead instrument.
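The scoring pass is easier to repeat if it is written down as data. This sketch assumes a 1 to 5 listening score per criterion; the scale, the equal weighting, and the `PatchScore` name are choices made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class PatchScore:
    prompt: str
    attack_clarity: int  # 1 (smeared) to 5 (crisp)
    style_match: int     # 1 (off-prompt) to 5 (on-prompt)
    noise: int           # 1 (clean) to 5 (noisy); lower is better
    sits_in_mix: int     # 1 (fights the lead) to 5 (sits behind it)

    def total(self) -> int:
        # Invert noise so that a higher total is always better.
        return (self.attack_clarity + self.style_match
                + (6 - self.noise) + self.sits_in_mix)


def rank(scores):
    """Return patches sorted best first."""
    return sorted(scores, key=lambda s: s.total(), reverse=True)
```

Keeping the raw per-criterion scores, not just the total, makes it possible to re-rank later if a session cares more about attack than about mix placement.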

Workflow 3

Prompt-mixing controller

Map one knob or XY controller between two style prompts. Listen for useful transitions, not only endpoint quality. This is where MRT2 feels least like a prompt box.
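One way to prototype the knob is to blend two style embeddings with a weight taken from a MIDI controller. The linear blend is an assumption: the source says MusicCoCa produces style embeddings and that prompts can be mixed, but it does not specify how MRT2 combines them. Both function names are illustrative.

```python
def cc_to_weight(cc_value: int) -> float:
    """Map a 7-bit MIDI CC value (0..127) to a blend weight in 0.0..1.0."""
    return max(0, min(127, cc_value)) / 127


def blend(style_a, style_b, weight):
    """Linear interpolation: weight 0.0 gives style_a, 1.0 gives style_b."""
    if len(style_a) != len(style_b):
        raise ValueError("style embeddings must have the same length")
    return [(1 - weight) * a + weight * b for a, b in zip(style_a, style_b)]
```

Recomputing the blend whenever the controller moves, then passing the result to the model as the current style, gives a continuous sweep between the two prompts. The interesting settings are usually somewhere in the middle of the knob's travel.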

Verdict

MRT2 is most interesting when you judge it as a live instrument runtime. The open weights matter, but the bigger story is the control surface. MIDI, text, audio, MLX, C++, apps, and plugins are all aimed at the same question: can a model be played?

The cautious answer is yes, for the right Mac and the right model size. The practical recommendation is simple: install the bundle, run the small model first, set audio to 48kHz, measure latency in your real DAW session, and only then decide whether the base model belongs in a live workflow.

FAQ

What is Magenta RealTime 2?

Magenta RealTime 2 is an open-weights live music model from Google Magenta. It generates 48kHz stereo audio in real time and can be controlled with MIDI, text prompts, and audio prompts.

Is Magenta RealTime 2 open source?

The magenta-realtime code repository is licensed under Apache 2.0, and Google describes MRT2 as an open-weights model. Check the model card and license terms before commercial redistribution or training workflows.

What hardware does MRT2 need?

The small 230M model is documented for real-time streaming on Apple Silicon Macs, including Air models. The base 2.4B model is higher quality and needs stronger Pro or Max class Apple Silicon for real-time streaming.

Does MRT2 work on Windows or Linux?

The live apps and plugins are Mac-first because the streaming engine uses MLX on Apple Silicon. The Python library also exposes JAX for offline and research generation on other hardware, but the launch materials do not present Windows live apps.

Can I use MRT2 inside a DAW?

Yes. The app bundle includes Audio Unit plugin support for DAWs, plus standalone apps and examples. The official install page says to register the plugin, refresh AU plugins in the DAW, and place it on a MIDI track.

What is the main early caveat?

Latency is the real product test. The official docs describe sub-200ms control latency, but an early GitHub issue reports a case where the base model missed the 40ms frame budget on an M3 Max benchmark setup.

Glossary

Codec language model

A model that predicts compressed audio tokens, then decodes them into sound.

MLX

Apple's machine-learning framework for Apple Silicon, used by MRT2's streaming engine.

Audio Unit

Apple's plugin format for DAWs such as Logic and other macOS music tools.

Open weights

Public model parameters that developers can download and run under the model terms.

Sources

This post uses first-party launch material, official Google Magenta docs, the GitHub repository, GitHub issues and PRs, Reddit discussion, and X reactions. Claims about hardware, install flow, features, and architecture are tied to the sources below.