Magenta RealTime 2: Open Live Music Model Guide

One-sentence definition

Magenta RealTime 2 is an open-weights live music model that generates 48kHz stereo audio while responding to MIDI, text, and audio controls fast enough to be played like an instrument.

Answer first

Use MRT2 when you want an AI sound source inside a live music workflow: MIDI keyboard, DAW track, standalone jam app, creative controller, or custom C++/Python experiment. Keep conventional song generators in the stack when the task is a finished track from one prompt.

Why MRT2 matters

Most AI music demos still feel like batch jobs. You type a prompt, wait, and judge the rendered clip. MRT2 aims at a different interface: press keys, move a control, blend a prompt, and hear the model react while the session is still alive.

Google Magenta's technical post says the first Magenta RealTime model worked in chunks, which made control feel delayed. MRT2 moves to frame-level streaming, runs through an MLX-backed C++ engine on Apple Silicon, and ships with apps and Audio Unit plugins instead of only notebooks.

Spec	Value	Note
Model	230M / 2.4B	Small and base model sizes documented by Google and GitHub.
Frame size	40ms	Google says MRT2 moved from 2-second chunks to frame-level operation.
Output	48kHz	The apps require 48kHz stereo audio settings for playback.

Launch posts

The user-facing message is consistent across Google Gemma and Google Magenta: MRT2 is not only an open audio model. It is a playable instrument surface for live control. The official Magenta thread is more detailed; the Gemma thread is the broader ecosystem announcement.

Official launch post
Google Magenta introduces MRT2 as a live music model with open weights, an open source inference engine, apps, plugins, MIDI control, and low-latency MacBook playback.
— Google Magenta Project (@GoogleMagenta) June 4, 2026
Official launch post
Google Gemma frames MRT2 as an open model musicians can play as an instrument using MIDI, text, and audio on a MacBook.
— Google Gemma (@googlegemma) June 4, 2026
Official launch post
Google Gemma points readers to the MRT2 app and plugin download page.
— Google Gemma (@googlegemma) June 4, 2026
Official launch post
Google Gemma sends musicians and developers to the Google Magenta project for more experiments.
— Google Gemma (@googlegemma) June 4, 2026

Mental model: five moving parts

MRT2 is easiest to understand as a stack. The public demos sit at the top, but the useful developer surface is the line between model, inference engine, and controls.

MIDI notes / text prompt / audio prompt
        |
        v
MusicCoCa style embedding + note/drum conditioning
        |
        v
Depthformer generates SpectroStream audio tokens
        |
        v
MLX-backed C++ streaming engine on Apple Silicon
        |
        v
Standalone app / AU plugin / custom instrument

MusicCoCa maps text and audio prompts into style embeddings. SpectroStream is the codec that turns audio into tokens and back. Depthformer is the transformer that generates the token stream. The C++ engine is what makes the real-time path practical on MacBook GPUs.

Smallest local run

For musicians, the shortest path is the Mac app bundle from the MRT2 apps page. For developers, the shortest reproducible path is the Python package and CLI. The docs use `uv`, Python 3.12, and the `magenta-rt` package.

curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv --python 3.12
source .venv/bin/activate

# Apple Silicon live path
uv pip install "magenta-rt[mlx]"

# Shared resources, then one streaming model
mrt models init
mrt models download

# Generate a four-second check clip
mrt mlx generate --prompt "disco funk" --duration 4.0 --model=mrt2_small

The GitHub installation docs say MLX is the live Apple Silicon path. JAX is still available for offline, batch, or research work, including Linux setups with the appropriate JAX wheel.

Apps and plugins

The MRT2 download page matters because it shows Google is not shipping only a model checkpoint. The bundle includes standalone apps, Audio Unit plugin support, and examples that demonstrate different interfaces for the same model.

MIDI steering

Hold a note or chord and the model generates accompaniment that follows the harmony.

Text-to-synth

Describe a playable instrument, such as a string ensemble or disco funk patch.

Audio cloning

Drop in a short audio reference and turn the sound into a playable source.

Prompt mixing

Move between text and audio prompts to explore hybrid styles.

Sound design

Combine musical prompts with noisy textures and modulate chaos over time.

Gesture control

Use MIDI, LFOs, camera gestures, Max, PureData, or SuperCollider style interfaces.

The Audio Unit plugin angle is the most practical part for DAW users. Google's install page tells users to register the plugin, refresh AU plugins in the DAW, and drag the plugin onto a MIDI track. That means the first serious tests will happen in normal music sessions, not only in research demos.

The controls are the product

MRT2 uses multiple control signals. Text and audio prompts steer style. MIDI notes steer pitch and harmony. Drum control can be used to suppress drums when the model needs to sit beside other tracks.

The technical post says note control is trained from audio and MIDI pairs, with MIDI labels inferred by MT3. It also describes two note modes: one where the model chooses attacks from held notes, and one where the user supplies onset timing. That is the difference between a model that follows a chord and a model that follows performance timing.
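One way to picture the difference between the two note modes is as two per-frame signals: which pitches are held, and which pitches start on that frame. The sketch below is purely illustrative and assumes a hypothetical representation (sets of MIDI pitches per 40ms frame); the source does not describe MRT2's actual conditioning format, and `frame_conditioning` is not part of the `magenta-rt` package.

```python
FRAME_MS = 40  # frame length stated in the source

def frame_conditioning(notes, num_frames):
    """Hypothetical per-frame view of MIDI notes.

    notes: iterable of (midi_pitch, start_ms, end_ms) with integer milliseconds.
    Returns (held, onsets): two lists of sets, one set per frame.
    """
    held = [set() for _ in range(num_frames)]
    onsets = [set() for _ in range(num_frames)]
    for pitch, start_ms, end_ms in notes:
        first = start_ms // FRAME_MS
        end = -(-end_ms // FRAME_MS)  # ceiling division: first frame after the note
        for frame in range(first, min(end, num_frames)):
            held[frame].add(pitch)
        if first < num_frames:
            onsets[first].add(pitch)  # only the frame where the note starts
    return held, onsets

# Two overlapping notes: C4 from 0 to 120 ms, E4 from 80 to 160 ms.
notes = [(60, 0, 120), (64, 80, 160)]
held, onsets = frame_conditioning(notes, num_frames=5)
for i in range(5):
    print(i, "held:", sorted(held[i]), "onsets:", sorted(onsets[i]))
```

In this picture, a held-notes mode would see only the `held` signal and leave attack placement to the model, while an onset mode would also see `onsets` and so could follow the performer's timing.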

A useful test prompt

Try three passes with the same MIDI chord progression: `disco funk`, `string ensemble`, and a short audio reference. Then listen for three things: whether the harmony follows the keys, whether the style changes without collapsing, and whether the attack timing feels playable.

Latency: the real product test

Google's own post says MRT2 moved from 2-second chunks in the first version to 40ms frames, with average control latency under 200ms once buffers and system overhead are counted. That latency figure is the central claim.

The engineering reason is frame-level autoregression. Instead of waiting for a large chunk boundary before new control information matters, MRT2 injects conditioning at each generation step. It also uses decoder-only streaming with sliding window attention so the model can continue over time without unbounded memory.

Generation unit	What changes	Why it matters
2-second chunks	Controls wait for the next chunk.	Feels like a delayed effect.
40ms frames	Controls can affect the next frame.	Starts to feel playable.
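A rough way to see why the unit size matters: if a control change lands at an arbitrary moment and only takes effect at the next generation boundary, the wait is at most one unit and, on average, half a unit. The sketch below works that arithmetic for the two unit sizes in the table. It models only the boundary wait, not buffers or system overhead, and the function name is illustrative.

```python
def boundary_wait_ms(unit_ms: float) -> dict:
    """Wait before a control change reaches the next generation boundary."""
    return {"worst_case_ms": unit_ms, "average_ms": unit_ms / 2}

for label, unit_ms in [("2-second chunks", 2000.0), ("40ms frames", 40.0)]:
    wait = boundary_wait_ms(unit_ms)
    print(
        f"{label}: worst case {wait['worst_case_ms']:.0f} ms, "
        f"average {wait['average_ms']:.0f} ms"
    )
```

The boundary wait alone drops from seconds to tens of milliseconds, which is why the remaining latency budget is dominated by buffering and system overhead rather than by the model's generation unit.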

Hardware and model choice

Start with the small model. The official docs say `mrt2_small` runs in real time on any Apple Silicon Mac, including Air models. The base model is larger and higher quality, but it has a much tighter timing budget: each frame still has to be generated within the same 40ms window.

Path	Size	Hardware	Use
mrt2_small	230M	Any Apple Silicon Mac	Best first install
mrt2_base	2.4B	Pro/Max class Apple Silicon	Higher quality, higher risk
Offline JAX	Either	Linux/NVIDIA or CPU research path	Not the live plugin path

The app page also says first launch may download model weights: roughly 450MB for small and 2.5GB for base. Treat those numbers as install-size planning, not runtime memory guarantees.

Architecture notes for developers

MRT2 is a codec language model. A codec language model generates discrete audio tokens, then decodes those tokens back to audio. Google's technical appendix says SpectroStream compresses 48kHz stereo audio into token frames at 25Hz, with 12 tokens per frame.
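The numbers in that paragraph fit together with the 40ms frame size, and a few lines of arithmetic make the relationship explicit. The constants below come from the source; the derived quantities are simple consequences, not figures quoted from Google.

```python
SAMPLE_RATE_HZ = 48_000   # output sample rate, from the source
FRAME_RATE_HZ = 25        # token frames per second, from the source
TOKENS_PER_FRAME = 12     # from the source
CHANNELS = 2              # stereo

frame_ms = 1000 / FRAME_RATE_HZ                       # duration of one token frame
samples_per_frame = SAMPLE_RATE_HZ // FRAME_RATE_HZ   # audio samples per channel per frame
tokens_per_second = FRAME_RATE_HZ * TOKENS_PER_FRAME  # tokens the model must emit each second
samples_per_token = samples_per_frame * CHANNELS / TOKENS_PER_FRAME

print(f"frame length: {frame_ms} ms")
print(f"samples per frame (per channel): {samples_per_frame}")
print(f"tokens per second: {tokens_per_second}")
print(f"stereo samples represented per token: {samples_per_token}")
```

A 25Hz frame rate is the same thing as a 40ms frame, so the real-time requirement can be read as producing 12 tokens, and decoding 1,920 samples per channel, inside every 40ms window.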

The key design choice is not only tokenization. MRT2 uses a decoder-only architecture, local sliding window attention, and attention sink embeddings to keep continuous generation stable after old context is evicted. The team also dropped explicit positional embeddings for this path, relying on causal masking and sliding attention for length generalization.
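To make the combination of sliding window attention and attention sinks concrete, here is a minimal mask sketch in plain Python. It assumes the common formulation in which a few initial positions stay visible to every later query; MRT2's sink embeddings may be implemented differently (for example as learned vectors rather than real positions), and the window and sink sizes here are arbitrary toy values.

```python
def attention_mask(seq_len: int, window: int, num_sinks: int) -> list[list[bool]]:
    """mask[q][k] is True when query position q may attend to key position k."""
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q            # never look at future positions
            in_window = q - k < window # recent context only
            is_sink = k < num_sinks    # assumed: first positions always stay visible
            row.append(causal and (in_window or is_sink))
        mask.append(row)
    return mask

mask = attention_mask(seq_len=6, window=3, num_sinks=1)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each query attends to at most `window + num_sinks` keys regardless of how long the stream has been running, which is the property that keeps memory bounded during continuous generation.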

Where to inspect the system

`magenta_rt/` is the Python inference library with JAX and MLX backends.
`core/` is the C++ inference engine for streaming apps.
`examples/mrt2/auv3` is the all-in-one Audio Unit plugin.
`examples/mrt2/standalone` is the standalone macOS app.

Developer paths

There are three serious developer paths. Use the app bundle if you are evaluating musical feel. Use the Python package if you are testing prompts, tokens, and offline generation. Use the C++ engine if you are building an instrument or plugin where audio callback behavior matters.

# C++ app development path from the docs
uv pip install "cmake<3.28"

cmake . -B build
cmake --build build --target hello_mrt2 -j10

./build/examples/hello_mrt2/hello_mrt2 \
  ~/Documents/Magenta/magenta-rt-v2/models/mrt2_small/mrt2_small.mlxfn \
  ~/Documents/Magenta/magenta-rt-v2/resources \
  100 \
  --prompt "ambient pads with sub bass"

For MCP.Directory readers, the interesting next step is not an MCP server yet. It is a local agent workflow around the model: generate prompt sets, test presets, catalog useful controller mappings, and create repeatable DAW session templates.

What we got wrong at first

The first read makes MRT2 sound like another open model release. That misses the point. The model only becomes different when the control loop is short enough for a musician to react to it.

We also initially treated the MacBook requirement as a simple limitation. It is a limitation, but it is also an architecture choice. Google started with Apple Silicon because MLX gives the C++ engine a predictable local GPU target and many musicians already use MacBooks. That does not remove the need for Windows and Linux paths, but it explains the launch shape.

Caveats and early friction

The launch reaction is positive, but the caveats are concrete. X replies ask for Windows, Linux, API access, and broader DAW clarity. One reply to Google Magenta asked directly about Windows; the project account answered that a broader release would be useful, but they started with Apple Silicon because the live path needs a moderately powerful GPU and MacBooks are common among musicians.

Real-time claims need local benchmarking

GitHub issue #39 reports a case where `mrt2_base` on an M3 Max exceeded the 40ms frame budget in the official benchmark. Treat the hardware table as guidance, then measure your exact model, MLX version, DAW buffer, and background GPU load.
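A local measurement does not need to be elaborate. The sketch below times a per-frame callable and summarizes the results against the 40ms budget. It is not the official benchmark: `time_frames` and `summarize_frame_times` are hypothetical helpers, and the callable passed in is a stand-in for whatever generates one frame in your setup.

```python
import statistics
import time

FRAME_BUDGET_MS = 40.0  # frame length stated in the source

def time_frames(generate_frame, num_frames):
    """Time a zero-argument callable that produces one frame; returns milliseconds."""
    times_ms = []
    for _ in range(num_frames):
        start = time.perf_counter()
        generate_frame()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return times_ms

def summarize_frame_times(frame_times_ms, budget_ms=FRAME_BUDGET_MS):
    """Report mean, max, and how many frames exceeded the budget."""
    over = [t for t in frame_times_ms if t > budget_ms]
    return {
        "mean_ms": statistics.fmean(frame_times_ms),
        "max_ms": max(frame_times_ms),
        "over_budget": len(over),
        "over_fraction": len(over) / len(frame_times_ms),
    }

# Fixed example timings, so the summary is reproducible.
print(summarize_frame_times([31.0, 35.5, 38.0, 44.0]))

# Stand-in callable; replace with a real per-frame generation step.
print(summarize_frame_times(time_frames(lambda: None, 10)))
```

The mean tells you whether the model keeps up on average; the maximum and the over-budget count tell you whether occasional slow frames could cause underruns at your buffer size.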

Do not assume it is a cloud API

The launch is about open weights, local apps, plugins, and code. If your product needs a hosted API, the official sources here do not present one for MRT2.

Start with musical latency, not demo novelty

A clip can sound impressive and still fail as an instrument. Test note-following, onset feel, drift, buffer underruns, and whether performers can predict how the model responds.

Three workflows to try

The best first workflows are small and measurable. Do not start with an album. Start with one repeatable controller setup.

Workflow 1

MIDI duet sketch

Put the AU plugin on a MIDI track, hold simple chord changes, and record how the ensemble follows. Use `mrt2_small` first, then compare base only if latency stays stable.

Workflow 2

Text-to-synth preset bank

Generate ten playable patches from short prompts. Score them on attack clarity, style match, noise, and whether they sit behind a vocal or lead instrument.
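A preset bank is easier to compare if the prompts and scores live in one small script. The sketch below builds command lines using only the `mrt mlx generate` flags shown in the quickstart and averages scores across the four criteria above. It prints the commands rather than running them; the 1 to 5 scale and the helper names are assumptions for illustration.

```python
import shlex

CRITERIA = ("attack_clarity", "style_match", "noise", "mix_fit")

def build_generate_command(prompt, duration_s=4.0, model="mrt2_small"):
    """Build the quickstart-style CLI call as an argument list."""
    return [
        "mrt", "mlx", "generate",
        "--prompt", prompt,
        "--duration", str(duration_s),
        f"--model={model}",
    ]

def average_score(scores):
    """Average 1-5 scores across all criteria (for noise, 5 means least noisy)."""
    missing = [name for name in CRITERIA if name not in scores]
    if missing:
        raise ValueError(f"missing scores: {missing}")
    return sum(scores[name] for name in CRITERIA) / len(CRITERIA)

# Prompts taken from this guide; extend the list to ten for a full bank.
prompts = ["disco funk", "string ensemble", "ambient pads with sub bass"]
for prompt in prompts:
    print(shlex.join(build_generate_command(prompt)))

example = {"attack_clarity": 4, "style_match": 3, "noise": 5, "mix_fit": 4}
print("average:", average_score(example))
```

Keeping the command builder separate from the scoring makes it simple to rerun the same prompts against a different model and compare averages side by side.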

Workflow 3

Prompt-mixing controller

Map one knob or XY controller between two style prompts. Listen for useful transitions, not only endpoint quality. This is where MRT2 feels least like a prompt box.
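The knob mapping can be prototyped before touching the model. The sketch below converts a MIDI control change value (0 to 127) into a mix amount and linearly interpolates between two style vectors. Linear interpolation is an assumption made for illustration; the source does not say how MRT2 blends prompts, and the vectors here are toy placeholders rather than real MusicCoCa embeddings.

```python
def cc_to_mix(cc_value: int) -> float:
    """Map a MIDI CC value (0-127) to a mix amount between 0.0 and 1.0."""
    if not 0 <= cc_value <= 127:
        raise ValueError("MIDI CC values must be in the range 0-127")
    return cc_value / 127

def blend_styles(style_a, style_b, mix):
    """Linear crossfade: mix=0.0 returns style_a, mix=1.0 returns style_b."""
    if len(style_a) != len(style_b):
        raise ValueError("style vectors must have the same length")
    return [(1 - mix) * a + mix * b for a, b in zip(style_a, style_b)]

# Toy placeholder vectors standing in for two style embeddings.
style_a = [0.0, 1.0]
style_b = [1.0, 0.0]

for cc in (0, 64, 127):
    print(cc, blend_styles(style_a, style_b, cc_to_mix(cc)))
```

Sweeping the knob slowly and listening at several intermediate values is the point of the exercise: the interesting material is usually somewhere between the two endpoints.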

Verdict

MRT2 is most interesting when you judge it as a live instrument runtime. The open weights matter, but the bigger story is the control surface: MIDI, text, audio, MLX, C++, apps, and plugins all aimed at the same question: can a model be played?

The cautious answer is yes, for the right Mac and the right model size. The practical recommendation is simple: install the bundle, run the small model first, set audio to 48kHz, measure latency in your real DAW session, and only then decide whether the base model belongs in a live workflow.

FAQ

What is Magenta RealTime 2?

Magenta RealTime 2 is an open-weights live music model from Google Magenta. It generates 48kHz stereo audio in real time and can be controlled with MIDI, text prompts, and audio prompts.

Is Magenta RealTime 2 open source?

The magenta-realtime code repository is Apache 2.0, and Google describes MRT2 as an open-weights model. Check the model card and license terms before commercial redistribution or training workflows.

What hardware does MRT2 need?

The small 230M model is documented for real-time streaming on Apple Silicon Macs, including Air models. The base 2.4B model is higher quality and needs stronger Pro or Max class Apple Silicon for real-time streaming.

Does MRT2 work on Windows or Linux?

The live apps and plugins are Mac-first because the streaming engine uses MLX on Apple Silicon. The Python library also exposes JAX for offline and research generation on other hardware, but the launch materials do not present Windows live apps.

Can I use MRT2 inside a DAW?

Yes. The app bundle includes Audio Unit plugin support for DAWs, plus standalone apps and examples. The official install page says to register the plugin, refresh AU plugins in the DAW, and place it on a MIDI track.

What is the main early caveat?

Latency is the real product test. The official docs describe sub-200ms control latency, but an early GitHub issue reports a case where the base model missed the 40ms frame budget on an M3 Max benchmark setup.

Glossary

Codec language model

A model that predicts compressed audio tokens, then decodes them into sound.

MLX

Apple Silicon machine-learning runtime used by MRT2's streaming engine.

Audio Unit

Apple's plugin format for DAWs such as Logic and other macOS music tools.

Open weights

Public model parameters that developers can download and run under the model terms.

Sources

This post uses first-party launch material, official Google Magenta docs, the GitHub repository, GitHub issues and PRs, Reddit discussion, and X reactions. Claims about hardware, install flow, features, and architecture are tied to these sources.
