nnsight-remote-interpretability


Provides guidance for interpreting and manipulating neural network internals using nnsight, with optional NDIF remote execution. Use when you need to run interpretability experiments on massive models (70B+) without local GPU resources, or when working with any PyTorch architecture.

Install

mkdir -p .claude/skills/nnsight-remote-interpretability && curl -L -o skill.zip "https://mcp.directory/api/skills/download/6826" && unzip -o skill.zip -d .claude/skills/nnsight-remote-interpretability && rm skill.zip

Installs to .claude/skills/nnsight-remote-interpretability

About this skill

nnsight: Transparent Access to Neural Network Internals

nnsight (/ɛn.saɪt/) enables researchers to interpret and manipulate the internals of any PyTorch model, with the unique capability of running the same code locally on small models or remotely on massive models (70B+) via NDIF.

GitHub: ndif-team/nnsight (730+ stars)
Paper: NNsight and NDIF: Democratizing Access to Foundation Model Internals (ICLR 2025)

Key Value Proposition

Write once, run anywhere: The same interpretability code works on GPT-2 locally or Llama-3.1-405B remotely. Just toggle remote=True.

# Local execution (small model)
with model.trace("Hello world"):
    hidden = model.transformer.h[5].output[0].save()

# Remote execution (massive model) - same code!
with model.trace("Hello world", remote=True):
    hidden = model.model.layers[40].output[0].save()

When to Use nnsight

Use nnsight when you need to:

  • Run interpretability experiments on models too large for local GPUs (70B, 405B)
  • Work with any PyTorch architecture (transformers, Mamba, custom models)
  • Perform multi-token generation interventions
  • Share activations between different prompts
  • Access full model internals without reimplementation

Consider alternatives when:

  • You want consistent API across models → Use TransformerLens
  • You need declarative, shareable interventions → Use pyvene
  • You're training SAEs → Use SAELens
  • You only work with small models locally → TransformerLens may be simpler
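On the "any PyTorch architecture" bullet above: besides LanguageModel, nnsight ships a generic NNsight class that wraps any nn.Module with the same tracing mechanics. A minimal sketch with a toy network of our own:

import torch
from collections import OrderedDict
from nnsight import NNsight

# A small toy network (ours, for illustration); any nn.Module works
net = torch.nn.Sequential(OrderedDict([
    ("layer1", torch.nn.Linear(8, 16)),
    ("relu", torch.nn.ReLU()),
    ("layer2", torch.nn.Linear(16, 2)),
]))
model = NNsight(net)

with model.trace(torch.randn(1, 8)):
    # Same proxy mechanics as with LanguageModel
    hidden = model.layer1.output.save()

print(hidden.shape)  # torch.Size([1, 16])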

Installation

# Basic installation
pip install nnsight

# For vLLM support
pip install "nnsight[vllm]"

For remote NDIF execution, sign up at login.ndif.us for an API key.

Core Concepts

LanguageModel Wrapper

from nnsight import LanguageModel

# Load model (uses HuggingFace under the hood)
model = LanguageModel("openai-community/gpt2", device_map="auto")

# For larger models
model = LanguageModel("meta-llama/Llama-3.1-8B", device_map="auto")

Tracing Context

The trace context manager defers execution: operations inside it are collected into a computation graph that runs when the context exits:

from nnsight import LanguageModel

model = LanguageModel("gpt2", device_map="auto")

with model.trace("The Eiffel Tower is in") as tracer:
    # Access any module's output
    hidden_states = model.transformer.h[5].output[0].save()

    # Access attention patterns
    attn = model.transformer.h[5].attn.attn_dropout.input[0][0].save()

    # Modify activations
    model.transformer.h[8].output[0][:] = 0  # Zero out layer 8

    # Get final logits
    logits = model.output.logits.save()

# After context exits, access saved values
print(hidden_states.shape)  # [batch, seq, hidden]

Proxy Objects

Inside trace, module accesses return Proxy objects that record operations:

with model.trace("Hello"):
    # These are all Proxy objects - operations are deferred
    h5_out = model.transformer.h[5].output[0]  # Proxy
    h5_mean = h5_out.mean(dim=-1)              # Proxy
    h5_saved = h5_mean.save()                   # Save for later access

Workflow 1: Activation Analysis

Step-by-Step

from nnsight import LanguageModel
import torch

model = LanguageModel("gpt2", device_map="auto")

prompt = "The capital of France is"

with model.trace(prompt) as tracer:
    # 1. Collect activations from multiple layers
    layer_outputs = []
    for i in range(12):  # GPT-2 has 12 layers
        layer_out = model.transformer.h[i].output[0].save()
        layer_outputs.append(layer_out)

    # 2. Get attention patterns
    attn_patterns = []
    for i in range(12):
        # Access attention weights (after softmax)
        attn = model.transformer.h[i].attn.attn_dropout.input[0][0].save()
        attn_patterns.append(attn)

    # 3. Get final logits
    logits = model.output.logits.save()

# 4. Analyze outside context
for i, layer_out in enumerate(layer_outputs):
    print(f"Layer {i} output shape: {layer_out.shape}")
    print(f"Layer {i} norm: {layer_out.norm().item():.3f}")

# 5. Find top predictions
probs = torch.softmax(logits[0, -1], dim=-1)
top_tokens = probs.topk(5)
for token, prob in zip(top_tokens.indices, top_tokens.values):
    print(f"{model.tokenizer.decode(token)}: {prob.item():.3f}")

Checklist

  • Load model with LanguageModel wrapper
  • Use trace context for operations
  • Call .save() on values you need after context
  • Access saved values outside context
  • Use .shape, .norm(), etc. for analysis

Workflow 2: Activation Patching

Step-by-Step

from nnsight import LanguageModel
import torch

model = LanguageModel("gpt2", device_map="auto")

clean_prompt = "The Eiffel Tower is in"
corrupted_prompt = "The Colosseum is in"

# 1. Get clean activations
with model.trace(clean_prompt) as tracer:
    clean_hidden = model.transformer.h[8].output[0].save()

# 2. Patch clean into corrupted run
with model.trace(corrupted_prompt) as tracer:
    # Replace layer 8 output with clean activations
    model.transformer.h[8].output[0][:] = clean_hidden

    patched_logits = model.output.logits.save()

# 3. Compare predictions
paris_token = model.tokenizer.encode(" Paris")[0]
rome_token = model.tokenizer.encode(" Rome")[0]

patched_probs = torch.softmax(patched_logits[0, -1], dim=-1)
print(f"Paris prob: {patched_probs[paris_token].item():.3f}")
print(f"Rome prob: {patched_probs[rome_token].item():.3f}")

Systematic Patching Sweep

# First, cache clean activations for every layer
with model.trace(clean_prompt) as tracer:
    clean_cache = [model.transformer.h[i].output[0].save() for i in range(12)]

def patch_layer_position(layer, position, clean_cache, corrupted_prompt):
    """Patch a single layer/position from the clean run into the corrupted run."""
    with model.trace(corrupted_prompt) as tracer:
        # Get the corrupted run's activation at this layer
        current = model.transformer.h[layer].output[0]

        # Overwrite only the target position with the clean activation
        current[:, position, :] = clean_cache[layer][:, position, :]

        logits = model.output.logits.save()

    return logits

# Sweep over all layers and positions (compute_metric is sketched below)
seq_len = len(model.tokenizer.encode(corrupted_prompt))
results = torch.zeros(12, seq_len)
for layer in range(12):
    for pos in range(seq_len):
        logits = patch_layer_position(layer, pos, clean_cache, corrupted_prompt)
        results[layer, pos] = compute_metric(logits)
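
The sweep assumes a scoring function. compute_metric is a name introduced here for illustration, not part of nnsight; a common choice is the logit difference between the clean and corrupted answers:

def compute_metric(logits):
    """Hypothetical metric: logit difference between " Paris" (clean answer)
    and " Rome" (corrupted answer) at the final position."""
    paris_token = model.tokenizer.encode(" Paris")[0]
    rome_token = model.tokenizer.encode(" Rome")[0]
    last = logits[0, -1]
    return (last[paris_token] - last[rome_token]).item()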

Workflow 3: Remote Execution with NDIF

Run the same experiments on massive models without local GPUs.

Step-by-Step

from nnsight import LanguageModel

# 1. Load large model (will run remotely)
model = LanguageModel("meta-llama/Llama-3.1-70B")

# 2. Same code, just add remote=True
with model.trace("The meaning of life is", remote=True) as tracer:
    # Access internals of 70B model!
    layer_40_out = model.model.layers[40].output[0].save()
    logits = model.output.logits.save()

# 3. Results returned from NDIF
print(f"Layer 40 shape: {layer_40_out.shape}")

# 4. Generation with interventions (generate opens its own tracing context)
with model.generate("What is 2+2?", max_new_tokens=50, remote=True) as tracer:
    # Scale the prompt's final-position activation at layer 20
    model.model.layers[20].output[0][:, -1, :] *= 1.5

    output = model.generator.output.save()

print(model.tokenizer.decode(output[0]))

NDIF Setup

  1. Sign up at login.ndif.us
  2. Get API key
  3. Configure nnsight with the key:

from nnsight import CONFIG

CONFIG.set_default_api_key("your_key")

Available Models on NDIF

  • Llama-3.1-8B, 70B, 405B
  • DeepSeek-R1 models
  • Various open-weight models (check ndif.us for current list)
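
Loading one of these for remote use looks like any other load; a sketch (the exact hosted model IDs should be checked at ndif.us):

from nnsight import LanguageModel

# For remote-only use the weights stay on NDIF's servers;
# nnsight initializes the model lazily, so nothing large downloads locally
model = LanguageModel("meta-llama/Meta-Llama-3.1-405B")

with model.trace("Scaling laws suggest", remote=True):
    hidden = model.model.layers[60].output[0].save()

print(hidden.shape)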

Workflow 4: Cross-Prompt Activation Sharing

Share activations between different inputs in a single trace.

from nnsight import LanguageModel

model = LanguageModel("gpt2", device_map="auto")

with model.trace() as tracer:
    # First prompt
    with tracer.invoke("The cat sat on the"):
        cat_hidden = model.transformer.h[6].output[0].save()

    # Second prompt - inject cat's activations
    with tracer.invoke("The dog ran through the"):
        # Replace with cat's activations at layer 6
        model.transformer.h[6].output[0][:] = cat_hidden
        dog_with_cat = model.output.logits.save()

# The dog prompt now has cat's internal representations
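
To see the effect, compare the patched run's next-token predictions; a quick check using the logits saved above:

import torch

# Top predictions after injecting the cat prompt's layer-6 activations
probs = torch.softmax(dog_with_cat[0, -1], dim=-1)
top = probs.topk(5)
for token, prob in zip(top.indices, top.values):
    print(f"{model.tokenizer.decode(token)}: {prob.item():.3f}")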

Workflow 5: Gradient-Based Analysis

Access gradients during backward pass.

from nnsight import LanguageModel
import torch

model = LanguageModel("gpt2", device_map="auto")

with model.trace("The quick brown fox") as tracer:
    # Save activations and enable gradient
    hidden = model.transformer.h[5].output[0].save()
    hidden.retain_grad()

    logits = model.output

    # Compute loss on specific token
    target_token = model.tokenizer.encode(" jumps")[0]
    loss = -logits[0, -1, target_token]

    # Backward pass
    loss.backward()

# Access gradients
grad = hidden.grad
print(f"Gradient shape: {grad.shape}")
print(f"Gradient norm: {grad.norm().item():.3f}")

Note: Gradient access not supported for vLLM or remote execution.
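
A common use for these gradients is a quick attribution score: elementwise gradient-times-activation, summed over the hidden dimension, gives a per-token relevance estimate. A minimal sketch building on the values saved above (the scoring choice is ours, not an nnsight convention):

# Gradient-x-activation attribution per token position
attribution = (hidden * grad).sum(dim=-1)  # [batch, seq]

tokens = model.tokenizer.encode("The quick brown fox")
for tok, score in zip(tokens, attribution[0]):
    print(f"{model.tokenizer.decode(tok):>8}: {score.item():+.3f}")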

Common Issues & Solutions

Issue: Module path differs between models

# GPT-2 structure
model.transformer.h[5].output[0]

# LLaMA structure
model.model.layers[5].output[0]

# Solution: Check model structure
print(model)  # nnsight prints the wrapped module tree
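
If the same script must run across architectures, one option is a small helper that probes the common layer paths. This is a convenience sketch of our own, not an nnsight API:

def get_layers(model):
    """Hypothetical helper: return the decoder block list for common layouts."""
    if hasattr(model, "transformer"):  # GPT-2 style
        return model.transformer.h
    if hasattr(model, "model"):        # LLaMA style
        return model.model.layers
    raise AttributeError("Unknown architecture; inspect with print(model)")

# Usage is identical for both families:
# hidden = get_layers(model)[5].output[0].save()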

Issue: Forgetting to save

# WRONG: Value not accessible outside trace
with model.trace("Hello"):
    hidden = model.transformer.h[5].output[0]  # Not saved!

print(hidden)  # Error or wrong value

# RIGHT: Call .save()
with model.trace("Hello"):
    hidden = model.transformer.h[5].output[0].save()

print(hidden)  # Works!

Issue: Remote timeout

# For long operations, increase timeout
with model.trace("prompt", remote=True, timeout=300) as tracer:
    ...  # long-running operations

Issue: Memory with many saved activations

# Only save what you need
with model.trace("prompt"):
    # BAD: saving every layer's full output accumulates large tensors
    # all_outs = [model.transformer.h[i].output[0].save() for i in range(12)]

    # BETTER: save a reduced quantity instead
    norms = [model.transformer.h[i].output[0].norm(dim=-1).save() for i in range(12)]

---

*Content truncated.*
