mamba-architecture
State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV cache. Selective SSM with hardware-aware design. Mamba-1 (d_state=16) and Mamba-2 (d_state=128, multi-head). Models 130M-2.8B on HuggingFace.
Install
mkdir -p .claude/skills/mamba-architecture && curl -L -o skill.zip "https://mcp.directory/api/skills/download/2145" && unzip -o skill.zip -d .claude/skills/mamba-architecture && rm skill.zipInstalls to .claude/skills/mamba-architecture
About this skill
Mamba - Selective State Space Models
Quick start
Mamba is a state-space model architecture achieving O(n) linear complexity for sequence modeling.
Installation:
# Install causal-conv1d (optional, for efficiency)
pip install causal-conv1d>=1.4.0
# Install Mamba
pip install mamba-ssm
# Or both together
pip install mamba-ssm[causal-conv1d]
Prerequisites: Linux, NVIDIA GPU, PyTorch 1.12+, CUDA 11.6+
Basic usage (Mamba block):
import torch
from mamba_ssm import Mamba
batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")
model = Mamba(
d_model=dim, # Model dimension
d_state=16, # SSM state dimension
d_conv=4, # Conv1d kernel size
expand=2 # Expansion factor
).to("cuda")
y = model(x) # O(n) complexity!
assert y.shape == x.shape
Common workflows
Workflow 1: Language model with Mamba-2
Complete LM with generation:
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
from mamba_ssm.models.config_mamba import MambaConfig
import torch
# Configure Mamba-2 LM
config = MambaConfig(
d_model=1024, # Hidden dimension
n_layer=24, # Number of layers
vocab_size=50277, # Vocabulary size
ssm_cfg=dict(
layer="Mamba2", # Use Mamba-2
d_state=128, # Larger state for Mamba-2
headdim=64, # Head dimension
ngroups=1 # Number of groups
)
)
model = MambaLMHeadModel(config, device="cuda", dtype=torch.float16)
# Generate text
input_ids = torch.randint(0, 1000, (1, 20), device="cuda", dtype=torch.long)
output = model.generate(
input_ids=input_ids,
max_length=100,
temperature=0.7,
top_p=0.9
)
Workflow 2: Use pretrained Mamba models
Load from HuggingFace:
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
# Load pretrained model
model_name = "state-spaces/mamba-2.8b"
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b") # Use compatible tokenizer
model = MambaLMHeadModel.from_pretrained(model_name, device="cuda", dtype=torch.float16)
# Generate
prompt = "The future of AI is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
output_ids = model.generate(
input_ids=input_ids,
max_length=200,
temperature=0.7,
top_p=0.9,
repetition_penalty=1.2
)
generated_text = tokenizer.decode(output_ids[0])
print(generated_text)
Available models:
state-spaces/mamba-130mstate-spaces/mamba-370mstate-spaces/mamba-790mstate-spaces/mamba-1.4bstate-spaces/mamba-2.8b
Workflow 3: Mamba-1 vs Mamba-2
Mamba-1 (smaller state):
from mamba_ssm import Mamba
model = Mamba(
d_model=256,
d_state=16, # Smaller state dimension
d_conv=4,
expand=2
).to("cuda")
Mamba-2 (multi-head, larger state):
from mamba_ssm import Mamba2
model = Mamba2(
d_model=256,
d_state=128, # Larger state dimension
d_conv=4,
expand=2,
headdim=64, # Head dimension for multi-head
ngroups=1 # Parallel groups
).to("cuda")
Key differences:
- State size: Mamba-1 (d_state=16) vs Mamba-2 (d_state=128)
- Architecture: Mamba-2 has multi-head structure
- Normalization: Mamba-2 uses RMSNorm
- Distributed: Mamba-2 supports tensor parallelism
Workflow 4: Benchmark vs Transformers
Generation speed comparison:
# Benchmark Mamba
python benchmarks/benchmark_generation_mamba_simple.py \
--model-name "state-spaces/mamba-2.8b" \
--prompt "The future of machine learning is" \
--topp 0.9 --temperature 0.7 --repetition-penalty 1.2
# Benchmark Transformer
python benchmarks/benchmark_generation_mamba_simple.py \
--model-name "EleutherAI/pythia-2.8b" \
--prompt "The future of machine learning is" \
--topp 0.9 --temperature 0.7 --repetition-penalty 1.2
Expected results:
- Mamba: 5× faster inference
- Memory: No KV cache needed
- Scaling: Linear with sequence length
When to use vs alternatives
Use Mamba when:
- Need long sequences (100K+ tokens)
- Want faster inference than Transformers
- Memory-constrained (no KV cache)
- Building streaming applications
- Linear scaling important
Advantages:
- O(n) complexity: Linear vs quadratic
- 5× faster inference: No attention overhead
- No KV cache: Lower memory usage
- Million-token sequences: Hardware-efficient
- Streaming: Constant memory per token
Use alternatives instead:
- Transformers: Need best-in-class performance, have compute
- RWKV: Want RNN+Transformer hybrid
- RetNet: Need retention-based architecture
- Hyena: Want convolution-based approach
Common issues
Issue: CUDA out of memory
Reduce batch size or use gradient checkpointing:
model = MambaLMHeadModel(config, device="cuda", dtype=torch.float16)
model.gradient_checkpointing_enable() # Enable checkpointing
Issue: Slow installation
Install binary wheels (not source):
pip install mamba-ssm --no-build-isolation
Issue: Missing causal-conv1d
Install separately:
pip install causal-conv1d>=1.4.0
Issue: Model not loading from HuggingFace
Use MambaLMHeadModel.from_pretrained (not AutoModel):
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-2.8b")
Advanced topics
Selective SSM: See references/selective-ssm.md for mathematical formulation, state-space equations, and how selectivity enables O(n) complexity.
Mamba-2 architecture: See references/mamba2-details.md for multi-head structure, tensor parallelism, and distributed training setup.
Performance optimization: See references/performance.md for hardware-aware design, CUDA kernels, and memory efficiency techniques.
Hardware requirements
- GPU: NVIDIA with CUDA 11.6+
- VRAM:
- 130M model: 2GB
- 370M model: 4GB
- 790M model: 8GB
- 1.4B model: 14GB
- 2.8B model: 28GB (FP16)
- Inference: 5× faster than Transformers
- Memory: No KV cache (lower than Transformers)
Performance (vs Transformers):
- Speed: 5× faster inference
- Memory: 50% less (no KV cache)
- Scaling: Linear vs quadratic
Resources
- Paper (Mamba-1): https://arxiv.org/abs/2312.00752 (Dec 2023)
- Paper (Mamba-2): https://arxiv.org/abs/2405.21060 (May 2024)
- GitHub: https://github.com/state-spaces/mamba ⭐ 13,000+
- Models: https://huggingface.co/state-spaces
- Docs: Repository README and wiki
More by davila7
View all skills by davila7 →You might also like
flutter-development
aj-geddes
Build beautiful cross-platform mobile apps with Flutter and Dart. Covers widgets, state management with Provider/BLoC, navigation, API integration, and material design.
drawio-diagrams-enhanced
jgtolentino
Create professional draw.io (diagrams.net) diagrams in XML format (.drawio files) with integrated PMP/PMBOK methodologies, extensive visual asset libraries, and industry-standard professional templates. Use this skill when users ask to create flowcharts, swimlane diagrams, cross-functional flowcharts, org charts, network diagrams, UML diagrams, BPMN, project management diagrams (WBS, Gantt, PERT, RACI), risk matrices, stakeholder maps, or any other visual diagram in draw.io format. This skill includes access to custom shape libraries for icons, clipart, and professional symbols.
ui-ux-pro-max
nextlevelbuilder
"UI/UX design intelligence. 50 styles, 21 palettes, 50 font pairings, 20 charts, 8 stacks (React, Next.js, Vue, Svelte, SwiftUI, React Native, Flutter, Tailwind). Actions: plan, build, create, design, implement, review, fix, improve, optimize, enhance, refactor, check UI/UX code. Projects: website, landing page, dashboard, admin panel, e-commerce, SaaS, portfolio, blog, mobile app, .html, .tsx, .vue, .svelte. Elements: button, modal, navbar, sidebar, card, table, form, chart. Styles: glassmorphism, claymorphism, minimalism, brutalism, neumorphism, bento grid, dark mode, responsive, skeuomorphism, flat design. Topics: color palette, accessibility, animation, layout, typography, font pairing, spacing, hover, shadow, gradient."
godot
bfollington
This skill should be used when working on Godot Engine projects. It provides specialized knowledge of Godot's file formats (.gd, .tscn, .tres), architecture patterns (component-based, signal-driven, resource-based), common pitfalls, validation tools, code templates, and CLI workflows. The `godot` command is available for running the game, validating scripts, importing resources, and exporting builds. Use this skill for tasks involving Godot game development, debugging scene/resource files, implementing game systems, or creating new Godot components.
nano-banana-pro
garg-aayush
Generate and edit images using Google's Nano Banana Pro (Gemini 3 Pro Image) API. Use when the user asks to generate, create, edit, modify, change, alter, or update images. Also use when user references an existing image file and asks to modify it in any way (e.g., "modify this image", "change the background", "replace X with Y"). Supports both text-to-image generation and image-to-image editing with configurable resolution (1K default, 2K, or 4K for high resolution). DO NOT read the image file first - use this skill directly with the --input-image parameter.
fastapi-templates
wshobson
Create production-ready FastAPI projects with async patterns, dependency injection, and comprehensive error handling. Use when building new FastAPI applications or setting up backend API projects.
Related MCP Servers
Browse all serversAccess HuggingFace models, datasets, and Spaces easily. Utilize Hugging Face AI learning tools and transformers for adva
Boost your AI code assistant with Context7: inject real-time API documentation from OpenAPI specification sources into y
Enhance software testing with Playwright MCP: Fast, reliable browser automation, an innovative alternative to Selenium s
Extend your developer tools with GitHub MCP Server for advanced automation, supporting GitHub Student and student packag
Boost productivity with Task Master: an AI-powered tool for project management and agile development workflows, integrat
Optimize your codebase for AI with Repomix—transform, compress, and secure repos for easier analysis with modern AI tool
Stay ahead of the MCP ecosystem
Get weekly updates on new skills and servers.