implementing-llms-litgpt
Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, an educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.
Install
mkdir -p .claude/skills/implementing-llms-litgpt && curl -L -o skill.zip "https://mcp.directory/api/skills/download/4503" && unzip -o skill.zip -d .claude/skills/implementing-llms-litgpt && rm skill.zip
Installs to .claude/skills/implementing-llms-litgpt
About this skill
LitGPT - Clean LLM Implementations
Quick start
LitGPT provides 20+ pretrained LLM implementations with clean, readable code and production-ready training workflows.
Installation:
pip install 'litgpt[extra]'
Load and use any model:
from litgpt import LLM
# Load pretrained model
llm = LLM.load("microsoft/phi-2")
# Generate text
result = llm.generate(
    "What is the capital of France?",
    max_new_tokens=50,
    temperature=0.7
)
print(result)
List available models:
litgpt download list
Common workflows
Workflow 1: Fine-tune on custom dataset
Copy this checklist:
Fine-Tuning Setup:
- [ ] Step 1: Download pretrained model
- [ ] Step 2: Prepare dataset
- [ ] Step 3: Configure training
- [ ] Step 4: Run fine-tuning
Step 1: Download pretrained model
# Download Llama 3 8B
litgpt download meta-llama/Meta-Llama-3-8B
# Download Phi-2 (smaller, faster)
litgpt download microsoft/phi-2
# Download Gemma 2B
litgpt download google/gemma-2b
Models are saved to checkpoints/ directory.
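Once a model is downloaded, the Python API can load it from the local directory instead of the hub name. A minimal sketch, assuming LLM.load accepts the checkpoint path under the default checkpoints/ layout:
from litgpt import LLM

# Point LLM.load at the directory created by `litgpt download`.
llm = LLM.load("checkpoints/microsoft/phi-2")
print(llm.generate("Hello!", max_new_tokens=20))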
Step 2: Prepare dataset
LitGPT supports multiple formats:
Alpaca format (instruction-response):
[
    {
        "instruction": "What is the capital of France?",
        "input": "",
        "output": "The capital of France is Paris."
    },
    {
        "instruction": "Translate to Spanish: Hello, how are you?",
        "input": "",
        "output": "Hola, ¿cómo estás?"
    }
]
Save as data/my_dataset.json.
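If your source data lives elsewhere (CSV, a database), a small script can emit this format. A minimal sketch; the records below are hypothetical placeholders for your own (instruction, output) pairs:
import json
import os

# Hypothetical source records; replace with your own data.
records = [
    ("What is the capital of France?", "The capital of France is Paris."),
    ("Translate to Spanish: Hello, how are you?", "Hola, ¿cómo estás?"),
]

dataset = [{"instruction": q, "input": "", "output": a} for q, a in records]

os.makedirs("data", exist_ok=True)
with open("data/my_dataset.json", "w", encoding="utf-8") as f:
    json.dump(dataset, f, ensure_ascii=False, indent=2)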
Step 3: Configure training
# Full fine-tuning (requires 40GB+ GPU for 7B models)
litgpt finetune \
meta-llama/Meta-Llama-3-8B \
--data JSON \
--data.json_path data/my_dataset.json \
--train.max_steps 1000 \
--train.learning_rate 2e-5 \
--train.micro_batch_size 1 \
--train.global_batch_size 16
# LoRA fine-tuning (efficient, 16GB GPU)
litgpt finetune_lora \
microsoft/phi-2 \
--data JSON \
--data.json_path data/my_dataset.json \
--lora_r 16 \
--lora_alpha 32 \
--lora_dropout 0.05 \
--train.max_steps 1000 \
--train.learning_rate 1e-4
Step 4: Run fine-tuning
Training saves checkpoints to out/finetune/ automatically.
Monitor training:
# View logs
tail -f out/finetune/logs.txt
# TensorBoard (if using --train.logger_name tensorboard)
tensorboard --logdir out/finetune/lightning_logs
Workflow 2: LoRA fine-tuning on single GPU
Most memory-efficient option.
LoRA Training:
- [ ] Step 1: Choose base model
- [ ] Step 2: Configure LoRA parameters
- [ ] Step 3: Train with LoRA
- [ ] Step 4: Merge LoRA weights (optional)
Step 1: Choose base model
For limited GPU memory (12-16GB):
- Phi-2 (2.7B) - Best quality/size tradeoff
- Llama 3.2 1B - Smallest, fastest
- Gemma 2B - Good reasoning
Step 2: Configure LoRA parameters
# lora_r: LoRA rank (8-64; higher = more capacity)
# lora_alpha: scaling factor (typically 2×r)
# lora_dropout: regularization to prevent overfitting
# lora_query/value/projection: apply LoRA to these projections
# lora_key/mlp/head: usually not needed
litgpt finetune_lora \
microsoft/phi-2 \
--data JSON \
--data.json_path data/my_dataset.json \
--lora_r 16 \
--lora_alpha 32 \
--lora_dropout 0.05 \
--lora_query true \
--lora_key false \
--lora_value true \
--lora_projection true \
--lora_mlp false \
--lora_head false
LoRA rank guide:
- r=8: Lightweight, 2-4MB adapters
- r=16: Standard, good quality
- r=32: High capacity, use for complex tasks
- r=64: Maximum quality, 4× larger adapters
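Adapter size scales linearly with r: each adapted weight matrix of shape (d_out, d_in) gains low-rank factors A (r × d_in) and B (d_out × r). A rough estimator, using approximate Phi-2 dimensions (hidden size 2560, 32 layers) as illustrative values:
def lora_params(d_in: int, d_out: int, r: int) -> int:
    # LoRA adds the update B @ A: A is (r, d_in), B is (d_out, r).
    return r * (d_in + d_out)

d_model, n_layers, r = 2560, 32, 16
# Query and value projections (each d_model x d_model) in every layer:
total = n_layers * 2 * lora_params(d_model, d_model, r)
print(f"~{total/1e6:.1f}M adapter params, ~{total*2/1e6:.0f} MB in fp16")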
Step 3: Train with LoRA
litgpt finetune_lora \
microsoft/phi-2 \
--data JSON \
--data.json_path data/my_dataset.json \
--lora_r 16 \
--train.epochs 3 \
--train.learning_rate 1e-4 \
--train.micro_batch_size 4 \
--train.global_batch_size 32 \
--out_dir out/phi2-lora
# Memory usage: ~8-12GB for Phi-2 with LoRA
Step 4: Merge LoRA weights (optional)
Merge LoRA adapters into base model for deployment:
litgpt merge_lora \
out/phi2-lora/final \
--out_dir out/phi2-merged
Now use merged model:
from litgpt import LLM
llm = LLM.load("out/phi2-merged")
Workflow 3: Pretrain from scratch
Train new model on your domain data.
Pretraining:
- [ ] Step 1: Prepare pretraining dataset
- [ ] Step 2: Configure model architecture
- [ ] Step 3: Set up multi-GPU training
- [ ] Step 4: Launch pretraining
Step 1: Prepare pretraining dataset
LitGPT expects tokenized data. Use prepare_dataset.py:
python scripts/prepare_dataset.py \
--source_path data/my_corpus.txt \
--checkpoint_dir checkpoints/tokenizer \
--destination_path data/pretrain \
--split train,val
Step 2: Configure model architecture
Edit config file or use existing:
# config/pythia-160m.yaml
model_name: pythia-160m
block_size: 2048
vocab_size: 50304
n_layer: 12
n_head: 12
n_embd: 768
rotary_percentage: 0.25
parallel_residual: true
bias: true
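As a sanity check on a config, the parameter count of a GPT-style model can be approximated from these fields. A rough estimate that ignores biases, LayerNorms, and any untied output head:
def approx_params(n_layer: int, n_embd: int, vocab_size: int) -> int:
    # Each block: ~4*n_embd^2 for attention (QKV + output projection)
    # plus ~8*n_embd^2 for a 4x-expansion MLP; plus the token embedding.
    return 12 * n_layer * n_embd**2 + vocab_size * n_embd

# Values from the pythia-160m config above:
print(f"~{approx_params(12, 768, 50304)/1e6:.0f}M parameters")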
Step 3: Set up multi-GPU training
# Single GPU
litgpt pretrain \
--config config/pythia-160m.yaml \
--data.data_dir data/pretrain \
--train.max_tokens 10_000_000_000
# Multi-GPU with FSDP
litgpt pretrain \
--config config/pythia-1b.yaml \
--data.data_dir data/pretrain \
--devices 8 \
--train.max_tokens 100_000_000_000
Step 4: Launch pretraining
For large-scale pretraining on cluster:
# Using SLURM
sbatch --nodes=8 --gpus-per-node=8 \
pretrain_script.sh
# pretrain_script.sh content:
litgpt pretrain \
--config config/pythia-1b.yaml \
--data.data_dir /shared/data/pretrain \
--devices 8 \
--num_nodes 8 \
--train.global_batch_size 512 \
--train.max_tokens 300_000_000_000
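To sanity-check the training schedule, tokens per optimizer step follow directly from the batch and sequence sizes. A back-of-envelope sketch with values assumed from the command and config above:
global_batch = 512            # --train.global_batch_size
block_size = 2048             # sequence length from the config
target_tokens = 300_000_000_000

tokens_per_step = global_batch * block_size
print(f"{tokens_per_step/1e6:.2f}M tokens/step, "
      f"~{target_tokens // tokens_per_step:,} optimizer steps")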
Workflow 4: Convert and deploy model
Export LitGPT models for production.
Model Deployment:
- [ ] Step 1: Test inference locally
- [ ] Step 2: Quantize model (optional)
- [ ] Step 3: Convert to GGUF (for llama.cpp)
- [ ] Step 4: Deploy with API
Step 1: Test inference locally
from litgpt import LLM
llm = LLM.load("out/phi2-lora/final")
# Single generation
print(llm.generate("What is machine learning?"))
# Streaming
for token in llm.generate("Explain quantum computing", stream=True):
    print(token, end="", flush=True)
# Multiple prompts (generated sequentially)
prompts = ["Hello", "Goodbye", "Thank you"]
results = [llm.generate(p) for p in prompts]
Step 2: Quantize model (optional)
Reduce model size with minimal quality loss:
# 8-bit quantization (50% size reduction)
litgpt convert_lit_checkpoint \
out/phi2-lora/final \
--dtype bfloat16 \
--quantize bnb.int8
# 4-bit quantization (75% size reduction)
litgpt convert_lit_checkpoint \
out/phi2-lora/final \
--quantize bnb.nf4-dq # Double quantization
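The size reductions follow directly from bytes per parameter. A quick back-of-envelope for a 2.7B-parameter model such as Phi-2 (weights only, excluding activations and KV cache):
n_params = 2.7e9  # Phi-2
for label, bytes_per_param in [("fp16", 2), ("int8", 1), ("nf4", 0.5)]:
    print(f"{label}: ~{n_params * bytes_per_param / 1024**3:.1f} GB")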
Step 3: Convert to GGUF (for llama.cpp)
python scripts/convert_lit_checkpoint.py \
--checkpoint_path out/phi2-lora/final \
--output_path models/phi2.gguf \
--model_name microsoft/phi-2
Step 4: Deploy with API
from fastapi import FastAPI
from litgpt import LLM
app = FastAPI()
llm = LLM.load("out/phi2-lora/final")
@app.post("/generate")
def generate(prompt: str, max_tokens: int = 100):
    result = llm.generate(
        prompt,
        max_new_tokens=max_tokens,
        temperature=0.7
    )
    return {"response": result}
# Run: uvicorn api:app --host 0.0.0.0 --port 8000
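Because the endpoint declares plain str/int parameters, FastAPI reads them from the query string. A client call might look like this, assuming the requests package and the server running locally on port 8000:
import requests

resp = requests.post(
    "http://localhost:8000/generate",
    params={"prompt": "What is machine learning?", "max_tokens": 100},
)
print(resp.json()["response"])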
When to use vs alternatives
Use LitGPT when:
- Want to understand LLM architectures (clean, readable code)
- Need production-ready training recipes
- Educational purposes or research
- Prototyping new model ideas
- Lightning ecosystem user
Use alternatives instead:
- Axolotl/TRL: More fine-tuning features, YAML configs
- Megatron-Core: Maximum performance for >70B models
- HuggingFace Transformers: Broadest model support
- vLLM: Inference-only (no training)
Common issues
Issue: Out of memory during fine-tuning
Use LoRA instead of full fine-tuning:
# Instead of litgpt finetune (requires 40GB+)
litgpt finetune_lora # Only needs 12-16GB
Or lower the micro-batch size; LitGPT accumulates gradients automatically until the configured global batch size is reached:
litgpt finetune_lora \
... \
--train.micro_batch_size 1
Issue: Training too slow
Enable Flash Attention (built-in, automatic on compatible hardware):
# Already enabled by default on Ampere+ GPUs (A100, RTX 30/40 series)
# No configuration needed
Use a smaller micro-batch while keeping the global batch size:
--train.micro_batch_size 1 \
--train.global_batch_size 32
# Effective batch stays 32; LitGPT derives the accumulation steps from
# global_batch_size / (micro_batch_size × devices)
Issue: Model not loading
Check model name:
# List all available models
litgpt download list
# Download if not exists
litgpt download meta-llama/Meta-Llama-3-8B
Verify checkpoints directory:
ls checkpoints/
# Should see: meta-llama/Meta-Llama-3-8B/
Issue: LoRA adapters too large
Reduce LoRA rank:
--lora_r 8 # Instead of 16 or 32
Apply LoRA to fewer layers:
# Disable the projection and MLP adapters:
--lora_query true \
--lora_value true \
--lora_projection false \
--lora_mlp false
Advanced topics
Supported architectures: See references/supported-models.md for complete list of 20+ model families with sizes and capabilities.
Training recipes: See references/training-recipes.md for proven hyperparameter configurations for pretraining and fine-tuning.
FSDP configuration: See references/distributed-training.md for multi-GPU training with Fully Sharded Data Parallel.
Custom architectures: See references/custom-models.md for implementing new model architectures in LitGPT style.
Hardware requirements
- GPU: NVIDIA (CUDA 11.8+), AMD (ROCm), Apple Silicon (MPS)
- Memory:
- Inference (Phi-2): ~6GB