bio-alignment-indexing

Create and use BAI/CSI indices for BAM/CRAM files using samtools and pysam. Use when enabling random access to alignment files or fetching specific genomic regions.

Install

mkdir -p .claude/skills/bio-alignment-indexing && curl -L -o skill.zip "https://mcp.directory/api/skills/download/9349" && unzip -o skill.zip -d .claude/skills/bio-alignment-indexing && rm skill.zip

Installs to .claude/skills/bio-alignment-indexing

About this skill

Version Compatibility

Reference examples tested with: pysam 0.22+, samtools 1.19+

Before using code patterns, verify installed versions match. If versions differ:

  • Python: pip show <package> then help(module.function) to check signatures
  • CLI: <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
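
The version check above can also be done from Python before running any of the examples. This is a minimal sketch using only the standard library; the helper names `installed_version`, `version_tuple`, and `meets_minimum` are hypothetical and not part of pysam.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Turn '0.22.1' into (0, 22, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(package, minimum):
    """True if the package is installed at or above the minimum version."""
    found = installed_version(package)
    return found is not None and version_tuple(found) >= version_tuple(minimum)


# Report what is installed; prints None / False when pysam is missing.
print("pysam:", installed_version("pysam"))
print("pysam >= 0.22:", meets_minimum("pysam", "0.22"))
```

The comparison is deliberately simple and ignores pre-release suffixes, so treat it as a quick sanity check rather than a full version parser.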

Alignment Indexing

Create indices for random access to alignment files using samtools and pysam.

"Index a BAM file" → Create a .bai/.csi index enabling random access to genomic regions.

  • CLI: samtools index file.bam
  • Python: pysam.index('file.bam')

Index Types

Index   Extension   Use Case
BAI     .bai        Standard BAM index, chromosomes < 512 Mbp
CSI     .csi        Large chromosomes, custom bin sizes
CRAI    .crai       CRAM index
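
The 512 Mbp figure for BAI corresponds to 2^29 bases. A minimal sketch of choosing between BAI and CSI from reference lengths follows; the function name `choose_index_format` is hypothetical, and treating any contig of 2^29 bases or more as needing CSI is a conservative assumption based on the table above.

```python
# BAI coordinate ceiling: 2**29 = 536,870,912 bases (512 Mbp).
BAI_LIMIT = 2 ** 29


def choose_index_format(contig_lengths):
    """Return 'csi' if any contig reaches the BAI limit, otherwise 'bai'.

    contig_lengths maps reference names to lengths in bases, for example
    the SN/LN values from the @SQ header lines.
    """
    longest = max(contig_lengths.values(), default=0)
    return "csi" if longest >= BAI_LIMIT else "bai"


# Human chr1 fits in BAI; a 600 Mbp chromosome (common in some plant
# genomes) does not.
print(choose_index_format({"chr1": 248_956_422}))
print(choose_index_format({"chr1A": 600_000_000}))
```

With pysam, the lengths could be taken from an open file as `dict(zip(bam.references, bam.lengths))`.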

samtools index

Create BAI Index

samtools index input.bam
# Creates input.bam.bai

Create CSI Index

samtools index -c input.bam
# Creates input.bam.csi

Specify Output Name

samtools index input.bam output.bai

Multi-threaded Indexing

samtools index -@ 4 input.bam

Index CRAM

samtools index input.cram
# Creates input.cram.crai

Index Requirements

Indexing requires coordinate-sorted files:

# Check sort order
samtools view -H input.bam | grep "^@HD"
# Should show SO:coordinate

# Sort if needed, then index
samtools sort -o sorted.bam input.bam
samtools index sorted.bam
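
The same sort-order check can be sketched in Python by reading the SO tag from the @HD line of the header text. The function name `sort_order` is hypothetical, and the header string below is an illustrative example rather than output from a real file.

```python
def sort_order(header_text):
    """Return the SO value from the @HD header line, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                key, _, value = field.partition(":")
                if key == "SO":
                    return value
    return None


# Example header text, as printed by `samtools view -H`.
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422"
print(sort_order(header))
```

With pysam, the header text is available as `str(bam.header)` on an open AlignmentFile, so the result can be compared against `'coordinate'` before calling `pysam.index()`.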

Using Indices for Region Access

Goal: Extract reads overlapping specific genomic coordinates from an indexed BAM.

Approach: With the index present, samtools view or pysam's AlignmentFile.fetch() can jump directly to the relevant file offset instead of scanning the entire file.
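
To illustrate why an index allows this jump, here is a sketch of the BAI binning function (`reg2bin`) as given in the SAM format specification, ported to Python. It maps a 0-based, half-open interval to the smallest bin that fully contains it. The index stores file offsets per bin, so a query only needs to visit the bins that overlap the requested region. samtools and pysam do this internally; the code is purely illustrative.

```python
def reg2bin(beg, end):
    """Return the BAI bin number for the 0-based, half-open interval [beg, end).

    Bin levels cover 16 kbp, 128 kbp, 1 Mbp, 8 Mbp, 64 Mbp and 512 Mbp.
    """
    end -= 1  # last base covered by the interval
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)  # 16 kbp bins start at 4681
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)  # 128 kbp bins start at 585
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)   # 1 Mbp bins start at 73
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)   # 8 Mbp bins start at 9
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)   # 64 Mbp bins start at 1
    return 0  # the single 512 Mbp bin


# A short read near the start of a chromosome lands in the first 16 kbp bin.
print(reg2bin(0, 150))
# A 1 Mbp span crossing a 1 Mbp boundary falls back to an 8 Mbp bin.
print(reg2bin(1_000_000, 2_000_000))
```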

samtools view with Region

# Requires index file present
samtools view input.bam chr1:1000000-2000000

Multiple Regions

samtools view input.bam chr1:1000-2000 chr2:3000-4000

Regions from BED File

samtools view -L regions.bed input.bam

pysam Python Alternative

Create Index

import pysam

pysam.index('input.bam')
# Creates input.bam.bai

Create CSI Index

# samtools flags are passed to pysam.index() as string arguments
pysam.index('-c', 'input.bam')
# Creates input.bam.csi

Fetch with Index

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    # fetch() requires index
    for read in bam.fetch('chr1', 1000000, 2000000):
        print(read.query_name)
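
Note that fetch() takes 0-based, half-open coordinates, while samtools region strings are 1-based and inclusive. A minimal sketch of converting between the two follows; the helper name `region_to_fetch_args` is hypothetical, and it assumes the region is always given in the full `contig:start-end` form.

```python
def region_to_fetch_args(region):
    """Convert 'chr1:1-1000' (1-based, inclusive) to ('chr1', 0, 1000).

    The result is 0-based and half-open, matching what fetch() expects.
    Only the full 'contig:start-end' form is handled in this sketch.
    """
    # rpartition keeps any colons that are part of the contig name itself.
    contig, _, span = region.rpartition(":")
    start_text, _, end_text = span.partition("-")
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    # Shift only the start: the inclusive 1-based end equals the exclusive 0-based end.
    return contig, start - 1, end


print(region_to_fetch_args("chr1:1-1000"))
print(region_to_fetch_args("chr1:1,000,000-2,000,000"))
```

The returned tuple can be unpacked straight into a call such as `bam.fetch(*region_to_fetch_args('chr1:1-1000'))`.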

Check if Indexed

import pysam
from pathlib import Path

def is_indexed(bam_path):
    # Check both BAI locations (input.bam.bai, input.bai) and the CSI location
    bam_path = Path(bam_path)
    return (Path(str(bam_path) + '.bai').exists() or
            bam_path.with_suffix('.bai').exists() or
            Path(str(bam_path) + '.csi').exists())

if not is_indexed('input.bam'):
    pysam.index('input.bam')

Fetch Multiple Regions

regions = [('chr1', 1000, 2000), ('chr1', 5000, 6000), ('chr2', 1000, 2000)]

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for chrom, start, end in regions:
        count = sum(1 for _ in bam.fetch(chrom, start, end))
        print(f'{chrom}:{start}-{end}: {count} reads')
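
The regions list could also be loaded from a BED file, the Python counterpart of samtools view -L. BED coordinates are already 0-based and half-open, so they can be passed to fetch() unchanged. The following is a minimal sketch; the function name `read_bed_regions` is hypothetical and only the first three BED columns are used.

```python
def read_bed_regions(lines):
    """Parse BED lines into (chrom, start, end) tuples for fetch().

    Blank lines, comments, and 'track'/'browser' header lines are skipped.
    """
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0] in ("track", "browser"):
            continue
        chrom, start, end = fields[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


# Example BED content; in practice, pass an open file handle instead.
example = ["# targets", "chr1\t1000\t2000\texon1", "chr2\t3000\t4000"]
print(read_bed_regions(example))
```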

Count Reads in Region

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    count = bam.count('chr1', 1000000, 2000000)
    print(f'Reads in region: {count}')

Get Reads Covering Position

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for read in bam.fetch('chr1', 1000000, 1000001):
        if read.reference_start <= 1000000 < read.reference_end:
            print(f'{read.query_name} covers position 1000000')

Index File Locations

samtools looks for indices in two locations:

input.bam.bai   # Standard location
input.bai       # Alternative location

For CRAM:

input.cram.crai

idxstats - Index Statistics

Get Per-Chromosome Counts

samtools idxstats input.bam

Output format:

chr1    248956422    5000000    0
chr2    242193529    4500000    0
*       0            0          10000

Columns: reference name, sequence length, mapped reads, unmapped reads. The final * row counts unmapped reads that have no reference assigned.
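
When idxstats output has been captured as text, for example from a subprocess call, the four columns can be parsed as in the sketch below. The names `IdxStat` and `parse_idxstats` are hypothetical, and the sample text mirrors the example output above.

```python
from typing import NamedTuple


class IdxStat(NamedTuple):
    contig: str
    length: int
    mapped: int
    unmapped: int


def parse_idxstats(text):
    """Parse `samtools idxstats` output into a list of IdxStat records."""
    stats = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Columns are whitespace-separated: name, length, mapped, unmapped.
        contig, length, mapped, unmapped = line.split()
        stats.append(IdxStat(contig, int(length), int(mapped), int(unmapped)))
    return stats


sample = (
    "chr1\t248956422\t5000000\t0\n"
    "chr2\t242193529\t4500000\t0\n"
    "*\t0\t0\t10000\n"
)
stats = parse_idxstats(sample)
total_mapped = sum(s.mapped for s in stats)
total_unmapped = sum(s.unmapped for s in stats)
print(f"mapped: {total_mapped}, unmapped: {total_unmapped}")
```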

Sum Total Mapped Reads

samtools idxstats input.bam | awk '{sum += $3} END {print sum}'

pysam idxstats

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for stat in bam.get_index_statistics():
        print(f'{stat.contig}: {stat.mapped} mapped, {stat.unmapped} unmapped')

FASTA Index (faidx)

A related but separate index: samtools faidx indexes a reference FASTA for random access to sequence regions.

samtools faidx reference.fa
# Creates reference.fa.fai

# Fetch region from indexed FASTA
samtools faidx reference.fa chr1:1000-2000

pysam FastaFile

with pysam.FastaFile('reference.fa') as ref:
    # fetch() is 0-based, half-open: this returns bases 1001-2000 in samtools (1-based) terms
    seq = ref.fetch('chr1', 1000, 2000)
    print(seq)

Quick Reference

Task              samtools                                pysam
Create BAI        samtools index file.bam                 pysam.index('file.bam')
Create CSI        samtools index -c file.bam              pysam.index('-c', 'file.bam')
Fetch region      samtools view file.bam chr1:1-1000      bam.fetch('chr1', 0, 1000)
Count in region   samtools view -c file.bam chr1:1-1000   bam.count('chr1', 0, 1000)
Index stats       samtools idxstats file.bam              bam.get_index_statistics()
Index FASTA       samtools faidx ref.fa                   Automatic with FastaFile

Common Errors

Error                                                  Cause                   Solution
random alignment retrieval only works for indexed BAM  Missing index           Run samtools index file.bam
file is not sorted                                     Unsorted BAM            Sort first with samtools sort
chromosome not found                                   Wrong chromosome name   Check names with samtools view -H

Related Skills

  • sam-bam-basics - View and convert alignment files
  • alignment-sorting - Sort BAM files (required before indexing)
  • alignment-filtering - Filter by regions using index
  • bam-statistics - Use idxstats for quick counts
  • sequence-io/read-sequences - Index FASTA with SeqIO.index_db()
