pysam

Genomic file toolkit. Read and write SAM/BAM/CRAM alignments, VCF/BCF variants, and FASTA/FASTQ sequences; extract regions and calculate coverage in NGS data processing pipelines.

Install

mkdir -p .claude/skills/pysam && curl -L -o skill.zip "https://mcp.directory/api/skills/download/4427" && unzip -o skill.zip -d .claude/skills/pysam && rm skill.zip

Installs to .claude/skills/pysam

About this skill

Pysam

Overview

Pysam is a Python module for reading, manipulating, and writing genomic datasets. Read/write SAM/BAM/CRAM alignment files, VCF/BCF variant files, and FASTA/FASTQ sequences with a Pythonic interface to htslib. Query tabix-indexed files, perform pileup analysis for coverage, and execute samtools/bcftools commands.

When to Use This Skill

This skill should be used when:

Working with sequencing alignment files (BAM/CRAM)
Analyzing genetic variants (VCF/BCF)
Extracting reference sequences or gene regions
Processing raw sequencing data (FASTQ)
Calculating coverage or read depth
Implementing bioinformatics analysis pipelines
Quality control of sequencing data
Variant calling and annotation workflows

Quick Start

Installation

uv pip install pysam

Basic Examples

Read alignment file:

import pysam

# Open BAM file and fetch reads in region
samfile = pysam.AlignmentFile("example.bam", "rb")
for read in samfile.fetch("chr1", 1000, 2000):
    print(f"{read.query_name}: {read.reference_start}")
samfile.close()

Read variant file:

import pysam

# Open VCF file and iterate variants
vcf = pysam.VariantFile("variants.vcf")
for variant in vcf:
    print(f"{variant.chrom}:{variant.pos} {variant.ref}>{variant.alts}")
vcf.close()

Query reference sequence:

import pysam

# Open FASTA (requires a .fai index) and extract a 0-based, half-open region
fasta = pysam.FastaFile("reference.fasta")
sequence = fasta.fetch("chr1", 1000, 2000)
print(sequence)
fasta.close()

Core Capabilities

1. Alignment File Operations (SAM/BAM/CRAM)

Use the AlignmentFile class to work with aligned sequencing reads. This is appropriate for analyzing mapping results, calculating coverage, extracting reads, or quality control.

Common operations:

Open and read BAM/SAM/CRAM files
Fetch reads from specific genomic regions
Filter reads by mapping quality, flags, or other criteria
Write filtered or modified alignments
Calculate coverage statistics
Perform pileup analysis (base-by-base coverage)
Access read sequences, quality scores, and alignment information

Reference: See references/alignment_files.md for detailed documentation on:

Opening and reading alignment files
AlignedSegment attributes and methods
Region-based fetching with fetch()
Pileup analysis for coverage
Writing and creating BAM files
Coordinate systems and indexing
Performance optimization tips
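The filtering and writing operations listed above could be sketched roughly as follows. The helper names `passes_filters` and `write_filtered`, the default threshold of 20, and the choice of which flags to reject are assumptions made for illustration, not part of the skill. The `demo` object is a stand-in that only mimics the `AlignedSegment` attributes used here.

```python
from types import SimpleNamespace


def passes_filters(read, min_mapq=20):
    """Return True for primary, mapped, non-duplicate reads at or above min_mapq."""
    if read.is_unmapped or read.is_secondary or read.is_supplementary:
        return False
    if read.is_duplicate:
        return False
    return read.mapping_quality >= min_mapq


def write_filtered(in_path, out_path, contig, start, stop, min_mapq=20):
    """Copy reads overlapping [start, stop) that pass the filter; return the count."""
    import pysam  # imported lazily so passes_filters() can be reused without pysam

    kept = 0
    with pysam.AlignmentFile(in_path, "rb") as bam, \
            pysam.AlignmentFile(out_path, "wb", template=bam) as out:
        for read in bam.fetch(contig, start, stop):
            if passes_filters(read, min_mapq):
                out.write(read)
                kept += 1
    return kept


# Stand-in with the AlignedSegment attributes used above (illustration only).
demo = SimpleNamespace(is_unmapped=False, is_secondary=False,
                       is_supplementary=False, is_duplicate=False,
                       mapping_quality=42)
print(passes_filters(demo))  # True
```

Passing `template=bam` reuses the input header for the output file, so the written BAM keeps the same reference names and lengths.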

2. Variant File Operations (VCF/BCF)

Use the VariantFile class to work with genetic variants from variant calling pipelines. This is appropriate for variant analysis, filtering, annotation, or population genetics.

Common operations:

Read and write VCF/BCF files
Query variants in specific regions
Access variant information (position, alleles, quality)
Extract genotype data for samples
Filter variants by quality, allele frequency, or other criteria
Annotate variants with additional information
Subset samples or regions

Reference: See references/variant_files.md for detailed documentation on:

Opening and reading variant files
VariantRecord attributes and methods
Accessing INFO and FORMAT fields
Working with genotypes and samples
Creating and writing VCF files
Filtering and subsetting variants
Multi-sample VCF operations
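As a minimal sketch of quality-based variant filtering, the predicate below reads `qual` and `filter` from a `VariantRecord`. The names `variant_passes` and `iter_passing`, the threshold of 30, and the decision to accept records with an empty FILTER column are assumptions for illustration. The `good` and `bad` objects are stand-ins, not real pysam records.

```python
from types import SimpleNamespace


def variant_passes(record, min_qual=30.0):
    """Keep records with QUAL >= min_qual whose FILTER is empty or PASS only."""
    if record.qual is None or record.qual < min_qual:
        return False
    failed = [name for name in record.filter.keys() if name != "PASS"]
    return not failed


def iter_passing(path, min_qual=30.0):
    """Yield passing records from the VCF/BCF file at path."""
    import pysam  # imported lazily so variant_passes() can be reused without pysam

    with pysam.VariantFile(path) as vcf:
        for record in vcf:
            if variant_passes(record, min_qual):
                yield record


# Stand-ins exposing only the attributes used above (illustration only).
good = SimpleNamespace(qual=55.0, filter={"PASS": None})
bad = SimpleNamespace(qual=55.0, filter={"LowQual": None})
print(variant_passes(good), variant_passes(bad))  # True False
```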

3. Sequence File Operations (FASTA/FASTQ)

Use FastaFile for random access to reference sequences and FastxFile for reading raw sequencing data. This is appropriate for extracting gene sequences, validating variants against the reference, or processing raw reads.

Common operations:

Query reference sequences by genomic coordinates
Extract sequences for genes or regions of interest
Read FASTQ files with quality scores
Validate variant reference alleles
Calculate sequence statistics
Filter reads by quality or length
Convert between FASTA and FASTQ formats

Reference: See references/sequence_files.md for detailed documentation on:

FASTA file access and indexing
Extracting sequences by region
Handling reverse complement for genes
Reading FASTQ files sequentially
Quality score conversion and filtering
Working with tabix-indexed files (BED, GTF, GFF)
Common sequence processing patterns
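Validating a variant's reference allele, mentioned in the operations above, comes down to fetching the same span from the FASTA and comparing. The sketch below assumes a hypothetical helper `ref_allele_matches` and uses a small stand-in class in place of `pysam.FastaFile` so that it runs without any files.

```python
class FakeFasta:
    """Stand-in exposing fetch(contig, start, end) like pysam.FastaFile."""

    def __init__(self, sequences):
        self._sequences = sequences

    def fetch(self, contig, start, end):
        return self._sequences[contig][start:end]


def ref_allele_matches(fasta, chrom, start, ref):
    """Compare REF with the reference bases at 0-based [start, start + len(ref))."""
    observed = fasta.fetch(chrom, start, start + len(ref))
    return observed.upper() == ref.upper()


fasta = FakeFasta({"chr1": "ACGTACGT"})
print(ref_allele_matches(fasta, "chr1", 2, "GT"))  # True
print(ref_allele_matches(fasta, "chr1", 2, "GA"))  # False
```

With real data the call would look like `ref_allele_matches(pysam.FastaFile("reference.fasta"), record.chrom, record.start, record.ref)`, using the 0-based `record.start` rather than the 1-based `record.pos`.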

4. Integrated Bioinformatics Workflows

Pysam excels at integrating multiple file types for comprehensive genomic analyses. Common workflows combine alignment files, variant files, and reference sequences.

Common workflows:

Calculate coverage statistics for specific regions
Validate variants against aligned reads
Annotate variants with coverage information
Extract sequences around variant positions
Filter alignments or variants based on multiple criteria
Generate coverage tracks for visualization
Quality control across multiple data types

Reference: See references/common_workflows.md for detailed examples of:

Quality control workflows (BAM statistics, reference consistency)
Coverage analysis (per-base coverage, low coverage detection)
Variant analysis (annotation, filtering by read support)
Sequence extraction (variant contexts, gene sequences)
Read filtering and subsetting
Integration patterns (BAM+VCF, VCF+BED, etc.)
Performance optimization for complex workflows
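One way to approach the coverage workflows above is `AlignmentFile.count_coverage()`, which returns four per-position count arrays (A, C, G, T). The helpers `mean_depth` and `low_coverage_positions` and the `min_depth` default are hypothetical names and values chosen for this sketch. `FakeBam` is a stand-in that returns fixed counts so the example runs without a BAM file.

```python
class FakeBam:
    """Stand-in whose count_coverage() returns fixed A, C, G, T count arrays."""

    def count_coverage(self, contig, start, stop):
        return ([1, 0, 2], [0, 3, 0], [0, 0, 0], [1, 0, 0])


def per_base_depth(bam, contig, start, stop):
    """Sum the four base-count arrays into one depth value per position."""
    counts = bam.count_coverage(contig, start, stop)
    return [sum(column) for column in zip(*counts)]


def mean_depth(bam, contig, start, stop):
    """Mean depth over the 0-based, half-open region [start, stop)."""
    depths = per_base_depth(bam, contig, start, stop)
    return sum(depths) / len(depths) if depths else 0.0


def low_coverage_positions(bam, contig, start, stop, min_depth=10):
    """Return 0-based positions whose depth falls below min_depth."""
    depths = per_base_depth(bam, contig, start, stop)
    return [start + i for i, depth in enumerate(depths) if depth < min_depth]


bam = FakeBam()
print(per_base_depth(bam, "chr1", 100, 103))                       # [2, 3, 2]
print(low_coverage_positions(bam, "chr1", 100, 103, min_depth=3))  # [100, 102]
```

Note that `count_coverage()` applies a base quality threshold by default, so the depths it reports can be lower than a raw read count at the same position.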

Key Concepts

Coordinate Systems

Critical: Pysam uses 0-based, half-open coordinates (Python convention):

Start positions are 0-based (first base is position 0)
End positions are exclusive (not included in the range)
Region 1000-2000 includes bases 1000-1999 (1000 bases total)

Exception: Region strings passed to fetch() follow the samtools convention (1-based, inclusive). Both calls below select the same bases:

samfile.fetch("chr1", 999, 2000)            # 0-based, half-open: positions 999-1999
samfile.fetch(region="chr1:1000-2000")      # 1-based, inclusive string: positions 1000-2000

VCF files: The file format uses 1-based coordinates. VariantRecord.pos is 1-based, while VariantRecord.start is 0-based and VariantRecord.stop is the 0-based exclusive end.
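The conversion between the two conventions is simple arithmetic: add 1 to the start and leave the end unchanged. The two helpers below are hypothetical names written for this illustration, not pysam functions.

```python
def to_region_string(contig, start, end):
    """Convert 0-based, half-open [start, end) to a 1-based inclusive region string."""
    if start < 0 or end <= start:
        raise ValueError(f"invalid interval: {start}-{end}")
    return f"{contig}:{start + 1}-{end}"


def from_region_string(region):
    """Convert 'contig:first-last' (1-based inclusive) to (contig, start, end)."""
    contig, _, span = region.rpartition(":")
    first, _, last = span.replace(",", "").partition("-")
    return contig, int(first) - 1, int(last)


print(to_region_string("chr1", 999, 2000))    # chr1:1000-2000
print(from_region_string("chr1:1000-2000"))   # ('chr1', 999, 2000)
```

The length of the interval is `end - start` in the 0-based form and `last - first + 1` in the 1-based form; both give 1001 bases for this example.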

Indexing Requirements

Random access to specific genomic regions requires index files:

BAM files: Require .bai index (create with pysam.index())
CRAM files: Require .crai index
FASTA files: Require .fai index (create with pysam.faidx())
VCF.gz files: Require .tbi tabix index (create with pysam.tabix_index())
BCF files: Require .csi index

Without an index, use fetch(until_eof=True) for sequential reading.
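A small pre-flight check along these lines can catch a missing index before a region query fails. The mapping below mirrors the list above and assumes the index sits next to the data file with the suffix appended (for example `sample.bam.bai`); other layouts exist. The names `candidate_index`, `has_index`, and `build_index` are hypothetical, and `build_index` covers only the three pysam functions named above.

```python
import os

# Data-file suffix -> index suffix, following the list above.
INDEX_SUFFIX = {
    ".bam": ".bai",
    ".cram": ".crai",
    ".vcf.gz": ".tbi",
    ".bcf": ".csi",
    ".fa": ".fai",
    ".fasta": ".fai",
}


def candidate_index(path):
    """Return the expected index path for a data file, or None if unrecognised."""
    for data_suffix, index_suffix in INDEX_SUFFIX.items():
        if path.endswith(data_suffix):
            return path + index_suffix
    return None


def has_index(path):
    """True if the expected index file exists next to the data file."""
    index_path = candidate_index(path)
    return index_path is not None and os.path.exists(index_path)


def build_index(path):
    """Create an index with pysam for the formats listed with a creation function."""
    import pysam  # imported lazily so the checks above work without pysam

    if path.endswith(".bam"):
        pysam.index(path)
    elif path.endswith((".fa", ".fasta")):
        pysam.faidx(path)
    elif path.endswith(".vcf.gz"):
        pysam.tabix_index(path, preset="vcf")
    else:
        raise ValueError(f"no index builder sketched for {path}")


print(candidate_index("sample.bam"))  # sample.bam.bai
```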

File Modes

Specify format when opening files:

"rb" - Read BAM (binary)
"r" - Read SAM (text)
"rc" - Read CRAM
"wb" - Write BAM
"w" - Write SAM
"wc" - Write CRAM
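When file names follow the usual extensions, the mode string can be derived from the path. The helper `alignment_mode` is a hypothetical convenience written for this sketch, and choosing the format by file extension is an assumption.

```python
def alignment_mode(path, write=False):
    """Pick an AlignmentFile mode string from the file extension."""
    format_flag = {".bam": "b", ".cram": "c", ".sam": ""}
    lowered = path.lower()
    for suffix, flag in format_flag.items():
        if lowered.endswith(suffix):
            return ("w" if write else "r") + flag
    raise ValueError(f"unrecognised alignment file extension: {path}")


print(alignment_mode("example.bam"))              # rb
print(alignment_mode("out.cram", write=True))     # wc
print(alignment_mode("reads.sam"))                # r
```

The result would be passed as the second argument, for example `pysam.AlignmentFile(path, alignment_mode(path))`.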

Performance Considerations

Always use indexed files for random access operations
Use pileup() for column-wise analysis instead of repeated fetch operations
Use count() for counting instead of iterating and counting manually
Process regions in parallel when analyzing independent genomic regions
Close files explicitly to free resources
Use until_eof=True for sequential processing without index
Avoid multiple iterators unless necessary (use multiple_iterators=True if needed)
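The points about count() and parallel regions could be combined as in the sketch below: a contig is tiled into 0-based, half-open windows, and each worker opens its own file handle and calls `count()`. The names `windows`, `count_window`, and `count_in_parallel`, the window size, and the use of `ProcessPoolExecutor` are assumptions for illustration. Reads that span a window boundary overlap both windows and are counted in each.

```python
from concurrent.futures import ProcessPoolExecutor


def windows(length, size):
    """Yield 0-based, half-open (start, end) windows that tile [0, length)."""
    if size <= 0:
        raise ValueError("window size must be positive")
    for start in range(0, length, size):
        yield start, min(start + size, length)


def count_window(bam_path, contig, start, end):
    """Count reads overlapping one window, using a handle private to this worker."""
    import pysam  # imported lazily so windows() can be reused without pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        return bam.count(contig, start, end)


def count_in_parallel(bam_path, contig, length, size=1_000_000, workers=4):
    """Return {(start, end): read_count} for every window on the contig."""
    spans = list(windows(length, size))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(count_window, bam_path, contig, s, e)
                   for s, e in spans]
        return {span: future.result() for span, future in zip(spans, futures)}


print(list(windows(2500, 1000)))  # [(0, 1000), (1000, 2000), (2000, 2500)]
```

Opening the file inside each worker, rather than sharing one `AlignmentFile` across processes, keeps every iterator on its own file handle.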

Common Pitfalls

Coordinate confusion: Remember 0-based vs 1-based systems in different contexts
Missing indices: Many operations require index files—create them first
Partial overlaps: fetch() returns reads overlapping region boundaries, not just those fully contained
Iterator scope: Keep pileup iterator references alive to avoid "PileupProxy accessed after iterator finished" errors
Quality score editing: Assigning to query_sequence discards the existing query_qualities, so copy the qualities before changing the sequence and reassign them afterwards
Stream limitations: Only stdin/stdout are supported for streaming, not arbitrary Python file objects
Thread safety: While the GIL is released during I/O, comprehensive thread safety hasn't been fully validated
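The quality score pitfall above can be handled by saving the qualities before touching the sequence. The helper `trim_read` is a hypothetical name for this sketch, and `FakeRead` is a stand-in that imitates the reset behaviour so the example runs without pysam. The sketch does not adjust the CIGAR string, which a mapped read would also need.

```python
class FakeRead:
    """Stand-in that drops its qualities whenever the sequence is reassigned."""

    def __init__(self, sequence, qualities):
        self._sequence = sequence
        self.query_qualities = qualities

    @property
    def query_sequence(self):
        return self._sequence

    @query_sequence.setter
    def query_sequence(self, value):
        self._sequence = value
        self.query_qualities = None  # imitates the qualities being discarded


def trim_read(read, start, end):
    """Trim a read to query positions [start, end), keeping qualities in sync."""
    qualities = read.query_qualities  # save before changing the sequence
    read.query_sequence = read.query_sequence[start:end]
    if qualities is not None:
        read.query_qualities = qualities[start:end]
    return read


read = trim_read(FakeRead("ACGTACGT", [30, 31, 32, 33, 34, 35, 36, 37]), 2, 5)
print(read.query_sequence, read.query_qualities)  # GTA [32, 33, 34]
```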

Command-Line Tools

Pysam provides access to samtools and bcftools commands:

# Sort BAM file
pysam.samtools.sort("-o", "sorted.bam", "input.bam")

# Index BAM
pysam.samtools.index("sorted.bam")

# View specific region
pysam.samtools.view("-b", "-o", "region.bam", "input.bam", "chr1:1000-2000")

# BCF tools
pysam.bcftools.view("-O", "z", "-o", "output.vcf.gz", "input.vcf")

Error handling:

try:
    pysam.samtools.sort("-o", "output.bam", "input.bam")
except pysam.SamtoolsError as e:
    print(f"Error: {e}")

Resources

references/

Detailed documentation for each major capability:

alignment_files.md - Complete guide to SAM/BAM/CRAM operations, including AlignmentFile class, AlignedSegment attributes, fetch operations, pileup analysis, and writing alignments
variant_files.md - Complete guide to VCF/BCF operations, including VariantFile class, VariantRecord attributes, genotype handling, INFO/FORMAT fields, and multi-sample operations
sequence_files.md - Complete guide to FASTA/FASTQ operations, including FastaFile and FastxFile classes, sequence extraction, quality score handling, and tabix-indexed file access
common_workflows.md - Practical examples of integrated bioinformatics workflows combining multiple file types, including quality control, coverage analysis, variant validation, and sequence extraction

Getting Help

For detailed information on specific operations, refer to the appropriate reference document:

Working with BAM files or calculating coverage → alignment_files.md
Analyzing variants or genotypes → variant_files.md
Extracting sequences or processing FASTQ → sequence_files.md
Complex workflows integrating multiple file types → common_workflows.md

Official documentation: https://pysam.readthedocs.io/

Author

davila7
