pufferlib


High-performance reinforcement learning framework optimized for speed and scale. Use when you need fast parallel training, vectorized environments, multi-agent systems, or integration with game environments (Atari, Procgen, NetHack). Achieves 2-10x speedups over standard implementations. For quick prototyping or standard algorithm implementations with extensive documentation, use stable-baselines3 instead.

Install

mkdir -p .claude/skills/pufferlib && curl -L -o skill.zip "https://mcp.directory/api/skills/download/5176" && unzip -o skill.zip -d .claude/skills/pufferlib && rm skill.zip

Installs to .claude/skills/pufferlib

About this skill

PufferLib - High-Performance Reinforcement Learning

Overview

PufferLib is a high-performance reinforcement learning library designed for fast parallel environment simulation and training. It reaches millions of training steps per second through optimized vectorization, native multi-agent support, and an efficient PPO implementation (PuffeRL). The library provides the Ocean suite of 20+ environments and integrates with Gymnasium, PettingZoo, and specialized RL frameworks.

When to Use This Skill

Use this skill when:

Training RL agents with PPO on any environment (single or multi-agent)
Creating custom environments using the PufferEnv API
Optimizing performance for parallel environment simulation (vectorization)
Integrating existing environments from Gymnasium, PettingZoo, Atari, Procgen, etc.
Developing policies with CNN, LSTM, or custom architectures
Scaling RL to millions of steps per second for faster experimentation
Running multi-agent RL with native multi-agent environment support

Core Capabilities

1. High-Performance Training (PuffeRL)

PuffeRL is PufferLib's optimized PPO+LSTM training algorithm achieving 1M-4M steps/second.

Quick start training:

# CLI training
puffer train procgen-coinrun --train.device cuda --train.learning-rate 3e-4

# Distributed training
torchrun --nproc_per_node=4 train.py

Python training loop:

import pufferlib
from pufferlib import PuffeRL

# Create vectorized environment
env = pufferlib.make('procgen-coinrun', num_envs=256)

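# Create trainer (my_policy is a placeholder for a torch.nn.Module you define)
trainer = PuffeRL(
    env=env,
    policy=my_policy,
    device='cuda',
    learning_rate=3e-4,
    batch_size=32768
)

# Training loop (num_iterations is a placeholder; pick it for your step budget)
num_iterations = 1000
for iteration in range(num_iterations):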
    trainer.evaluate()  # Collect rollouts
    trainer.train()     # Train on batch
    trainer.mean_and_log()  # Log results
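
The settings above imply a rollout length per environment. A minimal sketch of that arithmetic follows, assuming each training batch is filled by one rollout across all environments, so that batch_size = num_envs × steps per environment. The helper name steps_per_env is hypothetical and not part of PufferLib.

```python
def steps_per_env(batch_size: int, num_envs: int) -> int:
    """Steps each environment contributes to one training batch.

    Assumes the batch is filled evenly by all environments (an assumption,
    not a documented PufferLib guarantee).
    """
    if num_envs <= 0:
        raise ValueError("num_envs must be positive")
    if batch_size % num_envs != 0:
        raise ValueError("batch_size must be a multiple of num_envs")
    return batch_size // num_envs


# Values taken from the example above: batch_size=32768, num_envs=256
horizon = steps_per_env(batch_size=32768, num_envs=256)
print(horizon)  # 128
```

Under this assumption, raising num_envs while keeping batch_size fixed shortens the rollout collected from each environment.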

For comprehensive training guidance, read references/training.md, which covers:

Complete training workflow and CLI options
Hyperparameter tuning with Protein
Distributed multi-GPU/multi-node training
Logger integration (Weights & Biases, Neptune)
Checkpointing and resuming training
Performance optimization tips
Curriculum learning patterns
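
Since PuffeRL is described as a PPO implementation, it may help to see the objective PPO optimizes. The sketch below is the standard PPO clipped surrogate for a single sample, written in plain Python. It is not taken from PuffeRL's source, and clip_eps=0.2 is an assumed, commonly used default.

```python
import math


def ppo_clipped_objective(new_logp, old_logp, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for one (state, action) sample.

    new_logp / old_logp are log-probabilities of the taken action under the
    current and the data-collecting policy. The objective is maximized.
    """
    ratio = math.exp(new_logp - old_logp)
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Taking the minimum removes the incentive to push the ratio far from 1.
    return min(ratio * advantage, clipped_ratio * advantage)


# Probability ratio 1.5 with a positive advantage is clipped to 1.2
print(round(ppo_clipped_objective(math.log(1.5), 0.0, 1.0), 6))  # 1.2
```

In a real trainer this quantity is averaged over the batch, negated to form a loss, and combined with value and entropy terms.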

2. Environment Development (PufferEnv)

Create custom high-performance environments with the PufferEnv API.

Basic environment structure:

import numpy as np
from pufferlib import PufferEnv

class MyEnvironment(PufferEnv):
    def __init__(self, buf=None):
        super().__init__(buf)

        # Define spaces
        self.observation_space = self.make_space((4,))
        self.action_space = self.make_discrete(4)

        self.reset()

    def reset(self):
        # Reset state and return initial observation
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        # Execute action, compute reward, check done.
        # _get_observation, _compute_reward and _is_done are helpers you implement.
        obs = self._get_observation()
        reward = self._compute_reward()
        done = self._is_done()
        info = {}

        return obs, reward, done, info
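
To make the reset()/step() contract concrete, here is a small self-contained toy environment. It is a sketch in plain Python and deliberately does not subclass PufferEnv. The class name LineWorld and its parameters are hypothetical and chosen only for illustration.

```python
class LineWorld:
    """Hypothetical toy environment: walk right along a line to reach the goal."""

    def __init__(self, size=5, max_steps=20):
        self.size = size
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def _get_observation(self):
        # One-hot encoding of the agent's position.
        return [1.0 if i == self.position else 0.0 for i in range(self.size)]

    def reset(self):
        # Reset state and return the initial observation.
        self.position = 0
        self.steps = 0
        return self._get_observation()

    def step(self, action):
        # Action 1 moves right, any other action moves left; clamp to bounds.
        delta = 1 if action == 1 else -1
        self.position = min(self.size - 1, max(0, self.position + delta))
        self.steps += 1
        reached_goal = self.position == self.size - 1
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self._get_observation(), reward, done, {}


env = LineWorld(size=5)
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    obs, reward, done, info = env.step(1)  # always move right
    total_reward += reward
print(total_reward)  # 1.0
```

The same three pieces (observation, reward, termination check) map onto the helper methods called in the PufferEnv skeleton above.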

Use the template script scripts/env_template.py, which provides complete single-agent and multi-agent environment templates with examples of:

Different observation space types (vector, image, dict)
Action space variations (discrete, continuous, multi-discrete)
Multi-agent environment structure
Testing utilities

For complete environment development, read references/environments.md, which covers:

PufferEnv API details and in-place operation patterns
Observation and action space definitions
Multi-agent environment creation
Ocean suite (20+ pre-built environments)
Performance optimization (Python to C workflow)
Environment wrappers and best practices
Debugging and validation techniques

3. Vectorization and Performance

Achieve maximum throughput with optimized parallel simulation.

Vectorization setup:

import pufferlib

# Automatic vectorization
env = pufferlib.make('environment_name', num_envs=256, num_workers=8)

# Performance benchmarks:
# - Pure Python envs: 100k-500k SPS
# - C-based envs: 100M+ SPS
# - With training: 400k-4M total SPS
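
With num_envs=256 and num_workers=8, each worker would own 32 environments if they are split evenly. The sketch below shows that partitioning. The even split and the helper name partition_envs are assumptions for illustration; PufferLib's actual assignment logic may differ.

```python
def partition_envs(num_envs, num_workers):
    """Assign contiguous environment indices to each worker (even split assumed)."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    if num_envs % num_workers != 0:
        raise ValueError("num_envs must be divisible by num_workers")
    per_worker = num_envs // num_workers
    return [range(w * per_worker, (w + 1) * per_worker) for w in range(num_workers)]


slices = partition_envs(num_envs=256, num_workers=8)
print(len(slices), len(slices[0]))  # 8 32
```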

Key optimizations:

Shared memory buffers for zero-copy observation passing
Busy-wait flags instead of pipes/queues
Surplus environments for async returns
Multiple environments per worker
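
The shared-memory idea in the list above can be illustrated with the Python standard library alone. This is a minimal sketch, not PufferLib's implementation: two handles attach to the same memory block, one writes an observation into a fixed slot and the other reads it, with nothing serialized or sent through a pipe. In real use the second handle would live in a worker process. The sizes and function names here are hypothetical.

```python
import struct
from multiprocessing import shared_memory

NUM_ENVS = 4     # hypothetical number of environments
OBS_DIM = 3      # hypothetical floats per observation
FLOAT_BYTES = 4  # size of one float32


def write_observation(buf, env_index, values):
    """Write one environment's observation into its slot of the shared buffer."""
    offset = env_index * OBS_DIM * FLOAT_BYTES
    struct.pack_into(f"{OBS_DIM}f", buf, offset, *values)


def read_observation(buf, env_index):
    """Read one environment's observation back out of the shared buffer."""
    offset = env_index * OBS_DIM * FLOAT_BYTES
    return struct.unpack_from(f"{OBS_DIM}f", buf, offset)


def shared_buffer_roundtrip():
    size = NUM_ENVS * OBS_DIM * FLOAT_BYTES
    owner = shared_memory.SharedMemory(create=True, size=size)
    try:
        # A second handle attached by name sees the same underlying memory.
        reader = shared_memory.SharedMemory(name=owner.name)
        try:
            write_observation(owner.buf, 2, (1.0, 2.0, 3.0))
            return read_observation(reader.buf, 2)
        finally:
            reader.close()
    finally:
        owner.close()
        owner.unlink()  # free the block once every handle is closed


print(shared_buffer_roundtrip())  # (1.0, 2.0, 3.0)
```

Because every environment writes into its own fixed slot, the trainer can read the whole batch as one contiguous buffer.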

For vectorization optimization, read references/vectorization.md, which covers:

Architecture and performance characteristics
Worker and batch size configuration
Serial vs multiprocessing vs async modes
Shared memory and zero-copy patterns
Hierarchical vectorization for large scale
Multi-agent vectorization strategies
Performance profiling and troubleshooting

4. Policy Development

Build policies as standard PyTorch modules with optional utilities.

Basic policy structure:

import torch.nn as nn
from pufferlib.pytorch import layer_init

class Policy(nn.Module):
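    def __init__(self, observation_space, action_space):
        super().__init__()

        # Derive layer sizes from the spaces (assumes a flat Box observation
        # space and a Discrete action space)
        obs_dim = observation_space.shape[0]
        num_actions = action_space.n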

        # Encoder
        self.encoder = nn.Sequential(
            layer_init(nn.Linear(obs_dim, 256)),
            nn.ReLU(),
            layer_init(nn.Linear(256, 256)),
            nn.ReLU()
        )

        # Actor and critic heads
        self.actor = layer_init(nn.Linear(256, num_actions), std=0.01)
        self.critic = layer_init(nn.Linear(256, 1), std=1.0)

    def forward(self, observations):
        features = self.encoder(observations)
        return self.actor(features), self.critic(features)
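
The actor head above returns unnormalized logits, one per discrete action. As a sketch of how such logits are usually turned into an action, the plain-Python example below applies a softmax and samples from the resulting distribution. It uses no PufferLib or PyTorch calls, and the function names are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Sample a discrete action index in proportion to softmax(logits)."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
print(softmax([0.0, 0.0]))               # [0.5, 0.5]
print(sample_action([0.0, 100.0], rng))  # 1
```

The critic output is not used for sampling; it estimates the state value that the training algorithm uses to compute advantages.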

For complete policy development, read references/policies.md, which covers:

CNN policies for image observations
Recurrent policies with optimized LSTM (3x faster inference)
Multi-input policies for complex observations
Continuous action policies
Multi-agent policies (shared vs independent parameters)
Advanced architectures (attention, residual)
Observation normalization and gradient clipping
Policy debugging and testing

5. Environment Integration

Seamlessly integrate environments from popular RL frameworks.

Gymnasium integration:

import gymnasium as gym
import pufferlib

# Wrap Gymnasium environment
gym_env = gym.make('CartPole-v1')
env = pufferlib.emulate(gym_env, num_envs=256)

# Or use make directly
env = pufferlib.make('gym-CartPole-v1', num_envs=256)

PettingZoo multi-agent:

# Multi-agent environment
env = pufferlib.make('pettingzoo-knights-archers-zombies', num_envs=128)

Supported frameworks:

Gymnasium / OpenAI Gym
PettingZoo (parallel and AEC)
Atari (ALE)
Procgen
NetHack / MiniHack
Minigrid
Neural MMO
Crafter
GPUDrive
MicroRTS
Griddly
And more...

For integration details, read references/integration.md, which covers:

Complete integration examples for each framework
Custom wrappers (observation, reward, frame stacking, action repeat)
Space flattening and unflattening
Environment registration
Compatibility patterns
Performance considerations
Integration debugging
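
Space flattening and unflattening, mentioned in the list above, can be sketched without any library. The example assumes a dict observation whose values are fixed-length lists of floats; the layout format and function names are hypothetical and do not reflect PufferLib's internal representation.

```python
def flatten_observation(obs, layout):
    """Concatenate dict entries into one flat list, in layout order."""
    flat = []
    for key, length in layout:
        values = obs[key]
        if len(values) != length:
            raise ValueError(f"{key!r} has length {len(values)}, expected {length}")
        flat.extend(values)
    return flat


def unflatten_observation(flat, layout):
    """Rebuild the dict observation from a flat list using the same layout."""
    obs = {}
    offset = 0
    for key, length in layout:
        obs[key] = flat[offset:offset + length]
        offset += length
    return obs


# Hypothetical layout: (key, number of floats) in a fixed order
layout = [("position", 2), ("velocity", 2), ("goal", 1)]
obs = {"position": [0.5, 1.5], "velocity": [0.0, -1.0], "goal": [3.0]}

flat = flatten_observation(obs, layout)
print(flat)                                        # [0.5, 1.5, 0.0, -1.0, 3.0]
print(unflatten_observation(flat, layout) == obs)  # True
```

Keeping the layout fixed lets every environment write into a single flat buffer while the policy can still recover the structured observation.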

Quick Start Workflow

For Training Existing Environments

Choose environment from Ocean suite or compatible framework
Use scripts/train_template.py as starting point
Configure hyperparameters for your task
Run training with CLI or Python script
Monitor with Weights & Biases or Neptune
Refer to references/training.md for optimization

For Creating Custom Environments

Start with scripts/env_template.py
Define observation and action spaces
Implement reset() and step() methods
Test environment locally
Vectorize with pufferlib.emulate() or make()
Refer to references/environments.md for advanced patterns
Optimize with references/vectorization.md if needed

For Policy Development

Choose architecture based on observations:
- Vector observations → MLP policy
- Image observations → CNN policy
- Sequential tasks → LSTM policy
- Complex observations → Multi-input policy
Use layer_init for proper weight initialization
Follow patterns in references/policies.md
Test with environment before full training

For Performance Optimization

Profile current throughput (steps per second)
Check vectorization configuration (num_envs, num_workers)
Optimize environment code (in-place ops, numpy vectorization)
Consider C implementation for critical paths
Use references/vectorization.md for systematic optimization
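
The first step above, profiling throughput, can be sketched with a small timing helper. It is a generic measurement in plain Python rather than a PufferLib utility; measure_sps and its parameters are hypothetical names, and step_fn stands in for whatever call advances your environments by one step.

```python
import time


def measure_sps(step_fn, num_steps, envs_per_step=1):
    """Return environment steps per second for num_steps calls to step_fn.

    envs_per_step is the number of environments advanced by each call,
    for example num_envs when stepping a vectorized environment.
    """
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return (num_steps * envs_per_step) / max(elapsed, 1e-12)


def dummy_step():
    # Stand-in workload; replace with a real environment step.
    sum(range(100))


sps = measure_sps(dummy_step, num_steps=10_000, envs_per_step=256)
print(f"{sps:,.0f} steps/second")
```

Measuring the environment alone first, and then again with the training loop included, helps show which side is the bottleneck.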

Resources

scripts/

train_template.py - Complete training script template with:

Environment creation and configuration
Policy initialization
Logger integration (WandB, Neptune)
Training loop with checkpointing
Command-line argument parsing
Multi-GPU distributed training setup

env_template.py - Environment implementation templates:

Single-agent PufferEnv example (grid world)
Multi-agent PufferEnv example (cooperative navigation)
Multiple observation/action space patterns
Testing utilities

references/

training.md - Comprehensive training guide:

Training workflow and CLI options
Hyperparameter configuration
Distributed training (multi-GPU, multi-node)
Monitoring and logging
Checkpointing
Protein hyperparameter tuning
Performance optimization
Common training patterns
Troubleshooting

environments.md - Environment development guide:

PufferEnv API and characteristics
Observation and action spaces
Multi-agent environments
Ocean suite environments
Custom environment development workflow
Python to C optimization path
Third-party environment integration
Wrappers and best practices
Debugging

vectorization.md - Vectorization optimization:

Architecture and key optimizations
Vectorization modes (serial, multiprocessing, async)
Worker and batch configuration
Shared memory and zero-copy patterns
Advanced vectorization (hierarchical, custom)
Multi-agent vectorization
Performance monitoring and p

Content truncated.

Author

K-Dense-AI
