slime-rl-training
Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.
Install
mkdir -p .claude/skills/slime-rl-training && curl -L -o skill.zip "https://mcp.directory/api/skills/download/7236" && unzip -o skill.zip -d .claude/skills/slime-rl-training && rm skill.zip
Installs to .claude/skills/slime-rl-training
About this skill
slime: LLM Post-Training Framework for RL Scaling
slime is an LLM post-training framework from Tsinghua's THUDM team, powering GLM-4.5, GLM-4.6, and GLM-4.7. It connects Megatron-LM for training with SGLang for high-throughput rollout generation.
When to Use slime
Choose slime when you need:
- Megatron-LM native training with SGLang inference
- Custom data generation workflows with flexible data buffers
- Training GLM, Qwen3, DeepSeek V3, or Llama 3 models
- Research-grade framework with production backing (Z.ai)
Consider alternatives when:
- You need enterprise-grade stability features → use miles
- You want flexible backend swapping → use verl
- You need PyTorch-native abstractions → use torchforge
Key Features
- Training: Megatron-LM with full parallelism support (TP, PP, DP, SP)
- Rollout: SGLang-based high-throughput generation with router
- Data Buffer: Flexible prompt management and sample storage
- Models: GLM-4.x, Qwen3, DeepSeek V3/R1, Llama 3
Architecture Overview
┌─────────────────────────────────────────────────────────┐
│                       Data Buffer                       │
│  - Prompt initialization and management                 │
│  - Custom data generation and filtering                 │
│  - Rollout sample storage                               │
└─────────────┬───────────────────────────┬───────────────┘
              │                           │
┌─────────────▼───────────┐ ┌─────────────▼───────────────┐
│ Training (Megatron-LM)  │ │  Rollout (SGLang + Router)  │
│ - Actor model training  │ │ - Response generation       │
│ - Critic (optional)     │ │ - Reward/verifier output    │
│ - Weight sync to rollout│ │ - Multi-turn support        │
└─────────────────────────┘ └─────────────────────────────┘
Installation
# Recommended: Docker
docker pull slimerl/slime:latest
docker run --rm --gpus all --ipc=host --shm-size=16g \
    -it slimerl/slime:latest /bin/bash
# Inside container
cd /root/slime && pip install -e . --no-deps
From Source
git clone https://github.com/THUDM/slime.git
cd slime
pip install -r requirements.txt
pip install -e .
Quick Start: GRPO Training
# Source model configuration
source scripts/models/qwen3-4B.sh
# Launch training
python train.py \
    --actor-num-nodes 1 \
    --actor-num-gpus-per-node 4 \
    --rollout-num-gpus 4 \
    --advantage-estimator grpo \
    --use-kl-loss --kl-loss-coef 0.001 \
    --rollout-batch-size 32 \
    --n-samples-per-prompt 8 \
    --global-batch-size 256 \
    --num-rollout 3000 \
    --prompt-data /path/to/data.jsonl \
    ${MODEL_ARGS[@]} ${CKPT_ARGS[@]}
Workflow 1: Standard GRPO Training
Use this workflow for training reasoning models with group-relative advantages.
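GRPO's group-relative advantage scores each sampled response against the other responses to the same prompt, so no critic is needed. A minimal sketch of that normalization (an illustration, not slime's internal implementation):

```python
from statistics import mean, stdev

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: z-score each reward within its prompt group."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# One prompt with --n-samples-per-prompt 8; correct samples got reward 1.0
advs = grpo_advantages([1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0])
```

Correct samples end up with positive advantage, incorrect ones negative, and the group's advantages sum to zero.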
Prerequisites Checklist
- Docker environment or Megatron-LM + SGLang installed
- Model checkpoint (HuggingFace or Megatron format)
- Training data in JSONL format
Step 1: Prepare Data
# data.jsonl format
{"prompt": "What is 2 + 2?", "label": "4"}
{"prompt": "Solve: 3x = 12", "label": "x = 4"}
Or with chat format:
{
  "prompt": [
    {"role": "system", "content": "You are a math tutor."},
    {"role": "user", "content": "What is 15 + 27?"}
  ],
  "label": "42"
}
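Either way, the file is one JSON object per line. A quick sketch for writing and sanity-checking data in this shape (the prompt/label field names match the --input-key and --label-key flags used below):

```python
import json

records = [
    {"prompt": "What is 2 + 2?", "label": "4"},
    {"prompt": [{"role": "system", "content": "You are a math tutor."},
                {"role": "user", "content": "What is 15 + 27?"}],
     "label": "42"},
]

# Write one JSON object per line
with open("data.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Sanity check: every line parses and carries both keys
with open("data.jsonl") as f:
    for line in f:
        rec = json.loads(line)
        assert "prompt" in rec and "label" in rec
```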
Step 2: Configure Model
Choose a pre-configured model script:
# List available models
ls scripts/models/
# glm4-9B.sh, qwen3-4B.sh, qwen3-30B-A3B.sh, deepseek-v3.sh, llama3-8B.sh, ...
# Source your model
source scripts/models/qwen3-4B.sh
Step 3: Launch Training
python train.py \
    --actor-num-nodes 1 \
    --actor-num-gpus-per-node 8 \
    --rollout-num-gpus 8 \
    --advantage-estimator grpo \
    --use-kl-loss \
    --kl-loss-coef 0.001 \
    --prompt-data /path/to/train.jsonl \
    --input-key prompt \
    --label-key label \
    --apply-chat-template \
    --rollout-batch-size 32 \
    --n-samples-per-prompt 8 \
    --global-batch-size 256 \
    --num-rollout 3000 \
    --save-interval 100 \
    --eval-interval 50 \
    ${MODEL_ARGS[@]}
Step 4: Monitor Training
- Check TensorBoard: tensorboard --logdir outputs/
- Verify reward curves are increasing
- Monitor GPU utilization across nodes
Workflow 2: Asynchronous Training
Use async mode for higher throughput by overlapping rollout and training.
When to Use Async
- Large models with long generation times
- High GPU idle time in synchronous mode
- Sufficient memory for buffering
Launch Async Training
python train_async.py \
    --actor-num-nodes 1 \
    --actor-num-gpus-per-node 8 \
    --rollout-num-gpus 8 \
    --advantage-estimator grpo \
    --async-buffer-size 4 \
    --prompt-data /path/to/train.jsonl \
    ${MODEL_ARGS[@]}
Async-Specific Parameters
--async-buffer-size 4 # Number of rollouts to buffer
--update-weights-interval 2 # Sync weights every N rollouts
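Conceptually, async mode keeps a bounded buffer of finished rollouts so the trainer rarely waits on generation. A toy illustration of that overlap using a bounded asyncio queue (a sketch of the idea, not slime's actual scheduler):

```python
import asyncio

async def rollout_worker(queue):
    """Produce rollout batches; blocks when the buffer is full (back-pressure)."""
    for step in range(6):
        await asyncio.sleep(0.01)           # stand-in for generation time
        await queue.put(f"rollout-{step}")

async def trainer(queue, results):
    """Consume buffered rollouts as soon as each one is ready."""
    for _ in range(6):
        batch = await queue.get()
        results.append(batch)

async def main():
    queue = asyncio.Queue(maxsize=4)  # mirrors --async-buffer-size 4
    results = []
    await asyncio.gather(rollout_worker(queue), trainer(queue, results))
    return results

results = asyncio.run(main())
```

The bounded queue is what trades memory for throughput: a larger buffer hides longer generation stalls but holds more samples (and staler policies) in flight.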
Workflow 3: Multi-Turn Agentic Training
Use this workflow for training agents with tool use or multi-step reasoning.
Prerequisites
- Custom generate function for multi-turn logic
- Tool/environment interface
Step 1: Define Custom Generate Function
# custom_generate.py
async def custom_generate(args, samples, evaluation=False):
    """Multi-turn generation with tool calling."""
    for sample in samples:
        conversation = sample.prompt
        for turn in range(args.max_turns):
            # Generate response
            response = await generate_single(conversation)
            # Check for tool call
            tool_call = extract_tool_call(response)
            if tool_call:
                tool_result = execute_tool(tool_call)
                conversation.append({"role": "assistant", "content": response})
                conversation.append({"role": "tool", "content": tool_result})
            else:
                break
        sample.response = response
        sample.reward = compute_reward(sample)
    return samples
Step 2: Launch with Custom Function
python train.py \
    --custom-generate-function-path custom_generate.py \
    --max-turns 5 \
    --prompt-data /path/to/agent_data.jsonl \
    ${MODEL_ARGS[@]}
See examples/search-r1/ for a complete multi-turn search example.
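The compute_reward helper in the generate function above is user-supplied. For the math-style data shown earlier, a minimal exact-match verifier could look like this (the sample fields are the ones used in this guide's examples; a hypothetical sketch, not a fixed slime API):

```python
def compute_reward(sample) -> float:
    """1.0 if the labeled answer appears in the final response, else 0.0."""
    return 1.0 if sample.label.strip() in sample.response else 0.0
```

Binary verifier rewards like this pair naturally with GRPO, since advantages are computed relative to the other samples for the same prompt.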
Configuration Reference
Three Argument Categories
slime uses three types of arguments:
1. Megatron Arguments (passed directly):
--tensor-model-parallel-size 2
--pipeline-model-parallel-size 1
--num-layers 32
--hidden-size 4096
2. SGLang Arguments (prefixed with --sglang-):
--sglang-mem-fraction-static 0.8
--sglang-context-length 8192
--sglang-log-level INFO
3. slime Arguments:
# Resource allocation
--actor-num-nodes 1
--actor-num-gpus-per-node 8
--rollout-num-gpus 8
--colocate # Share GPUs between training/inference
# Data
--prompt-data /path/to/data.jsonl
--input-key prompt
--label-key label
# Training loop
--num-rollout 3000
--rollout-batch-size 32
--n-samples-per-prompt 8
--global-batch-size 256
# Algorithm
--advantage-estimator grpo # or: gspo, ppo, reinforce_plus_plus
--use-kl-loss
--kl-loss-coef 0.001
Key Constraints
rollout_batch_size × n_samples_per_prompt = global_batch_size × num_steps_per_rollout
Example: 32 × 8 = 256 × 1
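A quick pre-flight check of this arithmetic before launching a run (plain Python, not a slime utility):

```python
def steps_per_rollout(rollout_batch_size, n_samples_per_prompt, global_batch_size):
    """Optimizer steps per rollout implied by the batch-size constraint."""
    total = rollout_batch_size * n_samples_per_prompt
    assert total % global_batch_size == 0, "global batch size must divide total rollout samples"
    return total // global_batch_size

# The quick-start settings: 32 prompts x 8 samples = 256 -> 1 step per rollout
assert steps_per_rollout(32, 8, 256) == 1
```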
Data Buffer System
slime's data buffer enables flexible data management:
Basic Data Source
class RolloutDataSource:
    def get_samples(self, num_samples):
        """Fetch prompts from dataset."""
        return self.dataset.sample(num_samples)

    def add_samples(self, samples):
        """Called after generation (no-op by default)."""
        pass
Buffered Data Source (Off-Policy)
class RolloutDataSourceWithBuffer(RolloutDataSource):
    def __init__(self):
        self.buffer = []

    def add_samples(self, samples):
        """Store generated samples for reuse."""
        self.buffer.extend(samples)

    def buffer_filter(self, args, buffer, num_samples):
        """Custom selection logic (prioritized, stratified, etc.)."""
        return select_best(buffer, num_samples)
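select_best above is a placeholder. One concrete buffer_filter, sketched here on the assumption that each buffered sample carries the reward attribute set during generation, keeps the highest-reward samples:

```python
def top_reward_filter(args, buffer, num_samples):
    """Keep the num_samples buffered samples with the highest reward."""
    ranked = sorted(buffer, key=lambda s: s.reward, reverse=True)
    return ranked[:num_samples]
```

Stratified or recency-weighted variants follow the same shape: rank or group the buffer, then return exactly num_samples entries.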
Common Issues and Solutions
Issue: SGLang Engine Crash
Symptoms: Inference engine dies mid-training
Solutions:
# Enable fault tolerance
--use-fault-tolerance
# Increase memory allocation
--sglang-mem-fraction-static 0.85
# Reduce batch size
--rollout-batch-size 16
Issue: Weight Sync Timeout
Symptoms: Training hangs after rollout
Solutions:
# Increase sync interval
--update-weights-interval 5
# Use colocated mode (no network transfer)
--colocate
Issue: OOM During Training
Symptoms: CUDA OOM in backward pass
Solutions:
# Enable gradient checkpointing
--recompute-activations
# Reduce micro-batch size
--micro-batch-size 1
# Enable sequence parallelism
--sequence-parallel
Issue: Slow Data Loading
Symptoms: GPU idle during data fetch
Solutions:
# Increase data workers
--num-data-workers 4
# Use streaming dataset
--streaming-data
Supported Models
| Model Family | Configurations |
|---|---|
| GLM | GLM-4.5, GLM-4.6, GLM-4.7, GLM-Z1-9B |
| Qwen | Qwen3 (4B, 8B, 30B-A3B), Qwen3-MoE, Qwen2.5 |
| DeepSeek | V3, V3.1, R1 |
| Llama | Llama 3 (8B, 70B) |
| Others | Kimi K2, Moonlight-16B |
Each model has pre-configured scripts in scripts/models/.
Advanced Topics
Co-location Mode
Share GPUs between training and inference to reduce memory:
python train.py \
    --colocate \
    --actor-num-gpus-per-node 8 \
    --sglang-mem-fraction-static 0.4 \
    ${MODEL_ARGS[@]}
Custom Reward Model
# custom_rm.py
class CustomRewardModel:
    def __init__(self, model_path):
        self.model = load_model(model_path)

    def compute_reward(self,
---
*Content truncated.*