scikit-bio

Biological data toolkit. Sequence analysis, alignments, phylogenetic trees, diversity metrics (alpha/beta, UniFrac), ordination (PCoA), PERMANOVA, FASTA/Newick I/O, for microbiome analysis.

Install

mkdir -p .claude/skills/scikit-bio && curl -L -o skill.zip "https://mcp.directory/api/skills/download/6229" && unzip -o skill.zip -d .claude/skills/scikit-bio && rm skill.zip

Installs to .claude/skills/scikit-bio

About this skill

scikit-bio

Overview

scikit-bio is a comprehensive Python library for working with biological data. Apply this skill for bioinformatics analyses spanning sequence manipulation, alignment, phylogenetics, microbial ecology, and multivariate statistics.

When to Use This Skill

This skill should be used when the user:

  • Works with biological sequences (DNA, RNA, protein)
  • Needs to read/write biological file formats (FASTA, FASTQ, GenBank, Newick, BIOM, etc.)
  • Performs sequence alignments or searches for motifs
  • Constructs or analyzes phylogenetic trees
  • Calculates diversity metrics (alpha/beta diversity, UniFrac distances)
  • Performs ordination analysis (PCoA, CCA, RDA)
  • Runs statistical tests on biological/ecological data (PERMANOVA, ANOSIM, Mantel)
  • Analyzes microbiome or community ecology data
  • Works with protein embeddings from language models
  • Needs to manipulate biological data tables

Core Capabilities

1. Sequence Manipulation

Work with biological sequences using specialized classes for DNA, RNA, and protein data.

Key operations:

  • Read/write sequences from FASTA, FASTQ, GenBank, EMBL formats
  • Sequence slicing, concatenation, and searching
  • Reverse complement, transcription (DNA→RNA), and translation (RNA→protein)
  • Find motifs and patterns using regex
  • Calculate distances (Hamming, k-mer based)
  • Handle sequence quality scores and metadata

Common patterns:

import skbio

# Read a sequence from file (DNA.read returns a single record, the first by default)
seq = skbio.DNA.read('input.fasta')

# Sequence operations
rc = seq.reverse_complement()
rna = seq.transcribe()
protein = rna.translate()

# Find motifs (find_with_regex yields slice objects, so collect them in a list)
motif_positions = list(seq.find_with_regex('ATG[ACGT]{3}'))

# Check for properties
has_degens = seq.has_degenerates()
seq_no_gaps = seq.degap()

Important notes:

  • Use DNA, RNA, Protein classes for grammared sequences with validation
  • Use Sequence class for generic sequences without alphabet restrictions
  • Quality scores automatically loaded from FASTQ files into positional metadata
  • Metadata types: sequence-level (ID, description), positional (per-base), interval (regions/features)
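To make the operations above concrete, here is a minimal plain-Python sketch of what reverse complement, transcription, and Hamming distance compute. It is an illustration only and does not use or reproduce scikit-bio's implementation; the function names `reverse_complement`, `transcribe`, and `hamming_fraction` are hypothetical helpers, and only unambiguous A/C/G/T characters are handled.

```python
# Illustration only: plain-Python stand-ins, not scikit-bio's API.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(dna: str) -> str:
    """Rewrite the coding strand as RNA by replacing T with U."""
    return dna.replace("T", "U").replace("t", "u")


def hamming_fraction(a: str, b: str) -> float:
    """Proportion of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


example = "ATGCCA"
rc = reverse_complement(example)
rna = transcribe(example)
dist = hamming_fraction("ACGT", "ACGA")
```

With scikit-bio itself, the corresponding calls are the `reverse_complement()` and `transcribe()` methods shown in the common patterns above.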

2. Sequence Alignment

Perform pairwise and multiple sequence alignments using dynamic programming algorithms.

Key capabilities:

  • Global alignment (Needleman-Wunsch with semi-global variant)
  • Local alignment (Smith-Waterman)
  • Configurable scoring schemes (match/mismatch, gap penalties, substitution matrices)
  • CIGAR string conversion
  • Multiple sequence alignment storage and manipulation with TabularMSA

Common patterns:

from skbio import DNA
from skbio.alignment import local_pairwise_align_ssw, TabularMSA

# Pairwise alignment: returns the alignment, its score, and start/end positions
msa, score, start_end_positions = local_pairwise_align_ssw(seq1, seq2)

# Access aligned sequences (iterating a TabularMSA yields its rows)
aligned_seq1, aligned_seq2 = list(msa)

# Read multiple alignment from file
msa = TabularMSA.read('alignment.fasta', constructor=DNA)

# Calculate consensus
consensus = msa.consensus()

Important notes:

  • Use local_pairwise_align_ssw for local alignments (faster, SSW-based)
  • Use StripedSmithWaterman with protein=True and a substitution matrix for protein alignments
  • Affine gap penalties recommended for biological sequences
  • Can convert between scikit-bio, BioPython, and Biotite alignment formats
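As a rough illustration of the dynamic programming behind local alignment, the sketch below computes a Smith-Waterman score in plain Python. It is not scikit-bio's implementation: the function name `smith_waterman_score` and the scoring values are assumptions chosen for the example, and it uses a simple linear gap penalty for brevity, whereas the notes above recommend affine penalties for real data.

```python
# Illustration only: Smith-Waterman score with a linear gap penalty.
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for x in a:
        curr = [0]  # first column is always 0 for local alignment
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Clamping at 0 lets an alignment restart anywhere.
            curr.append(max(0, diag, up, left))
        best = max(best, max(curr))
        prev = curr
    return best


identical = smith_waterman_score("ACGT", "ACGT")
unrelated = smith_waterman_score("AAAA", "TTTT")
shared_core = smith_waterman_score("TTACGTT", "GGACGGG")
```

Only two rows of the matrix are kept because the score, unlike the traceback, does not need the full table.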

3. Phylogenetic Trees

Construct, manipulate, and analyze phylogenetic trees representing evolutionary relationships.

Key capabilities:

  • Tree construction from distance matrices (UPGMA, WPGMA, Neighbor Joining, GME, BME)
  • Tree manipulation (pruning, rerooting, traversal)
  • Distance calculations (patristic, cophenetic, Robinson-Foulds)
  • ASCII visualization
  • Newick format I/O

Common patterns:

from skbio import TreeNode
from skbio.tree import nj

# Read tree from file
tree = TreeNode.read('tree.nwk')

# Construct a tree from a DistanceMatrix (kept separate from the tree read above)
nj_tree = nj(distance_matrix)

# Tree operations
subtree = tree.shear(['taxon1', 'taxon2', 'taxon3'])
tips = list(tree.tips())
lca = tree.lowest_common_ancestor(['taxon1', 'taxon2'])

# Calculate distances
patristic_dist = tree.find('taxon1').distance(tree.find('taxon2'))
cophenetic_matrix = tree.tip_tip_distances()  # DistanceMatrix of tip-to-tip path lengths

# Compare trees (Robinson-Foulds distance)
rf_distance = tree.compare_rfd(other_tree)

Important notes:

  • Use nj() for neighbor joining (classic phylogenetic method)
  • Use upgma() for UPGMA (assumes molecular clock)
  • GME and BME are highly scalable for large trees
  • Trees can be rooted or unrooted; some metrics require specific rooting
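The patristic distance mentioned above is the sum of branch lengths along the path between two tips. The following plain-Python sketch shows the idea on a toy tree stored as a child-to-parent map. The `PARENT` dictionary and both helper functions are hypothetical and exist only for this example; they are not part of scikit-bio, and the approach assumes non-negative branch lengths.

```python
# Illustration only: toy tree ((A:1,B:2)X:3,C:4)root as child -> (parent, length).
PARENT = {
    "A": ("X", 1.0),
    "B": ("X", 2.0),
    "X": ("root", 3.0),
    "C": ("root", 4.0),
}


def distances_to_ancestors(node, parent):
    """Map the node and each of its ancestors to the path length from the node."""
    dist = 0.0
    out = {node: 0.0}
    while node in parent:
        node_parent, length = parent[node]
        dist += length
        out[node_parent] = dist
        node = node_parent
    return out


def patristic_distance(tip1, tip2, parent):
    """Sum of branch lengths on the path between two tips."""
    up1 = distances_to_ancestors(tip1, parent)
    up2 = distances_to_ancestors(tip2, parent)
    # With non-negative lengths, the lowest common ancestor gives the minimum.
    return min(up1[n] + up2[n] for n in up1 if n in up2)


d_ab = patristic_distance("A", "B", PARENT)
d_ac = patristic_distance("A", "C", PARENT)
```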

4. Diversity Analysis

Calculate alpha and beta diversity metrics for microbial ecology and community analysis.

Key capabilities:

  • Alpha diversity: richness, Shannon entropy, Simpson index, Faith's PD, Pielou's evenness
  • Beta diversity: Bray-Curtis, Jaccard, weighted/unweighted UniFrac, Euclidean distances
  • Phylogenetic diversity metrics (require tree input)
  • Rarefaction and subsampling
  • Integration with ordination and statistical tests

Common patterns:

from skbio.diversity import (alpha_diversity, beta_diversity,
                             get_alpha_diversity_metrics)

# Alpha diversity
alpha = alpha_diversity('shannon', counts_matrix, ids=sample_ids)
faith_pd = alpha_diversity('faith_pd', counts_matrix, ids=sample_ids,
                          tree=tree, otu_ids=feature_ids)

# Beta diversity
bc_dm = beta_diversity('braycurtis', counts_matrix, ids=sample_ids)
unifrac_dm = beta_diversity('unweighted_unifrac', counts_matrix,
                           ids=sample_ids, tree=tree, otu_ids=feature_ids)

# Get available metrics
print(get_alpha_diversity_metrics())

Important notes:

  • Counts must be integers representing abundances, not relative frequencies
  • Phylogenetic metrics (Faith's PD, UniFrac) require tree and OTU ID mapping
  • Use partial_beta_diversity() for computing specific sample pairs only
  • Alpha diversity returns Series, beta diversity returns DistanceMatrix
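For intuition about what two of these metrics measure, here is a small plain-Python sketch of Shannon entropy (alpha diversity within one sample) and Bray-Curtis dissimilarity (beta diversity between two samples). These are the textbook formulas written out by hand, not scikit-bio code; the function names are hypothetical, and the logarithm base is an explicit parameter because default bases can differ between tools and versions.

```python
# Illustration only: textbook formulas, not scikit-bio's implementation.
import math


def shannon(counts, base=2):
    """Shannon entropy of a vector of non-negative counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]  # skip zeros: 0*log(0) := 0
    return -sum(p * math.log(p, base) for p in props)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den


even_sample = shannon([10, 10, 10, 10])      # four equally abundant taxa
single_taxon = shannon([5, 0, 0])            # one taxon only
partial_overlap = bray_curtis([1, 2, 3], [3, 2, 1])
no_overlap = bray_curtis([1, 0], [0, 1])
```

Four equally abundant taxa give the maximum entropy for four categories (2 bits), while samples that share no taxa reach the Bray-Curtis maximum of 1.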

5. Ordination Methods

Reduce high-dimensional biological data to visualizable lower-dimensional spaces.

Key capabilities:

  • PCoA (Principal Coordinate Analysis) from distance matrices
  • CA (Correspondence Analysis) for contingency tables
  • CCA (Canonical Correspondence Analysis) with environmental constraints
  • RDA (Redundancy Analysis) for linear relationships
  • Biplot projection for feature interpretation

Common patterns:

from skbio.stats.ordination import pcoa, cca, OrdinationResults

# PCoA from distance matrix
pcoa_results = pcoa(distance_matrix)
pc1 = pcoa_results.samples['PC1']
pc2 = pcoa_results.samples['PC2']

# CCA with environmental variables
cca_results = cca(species_matrix, environmental_matrix)

# Save/load ordination results
pcoa_results.write('ordination.txt')
results = OrdinationResults.read('ordination.txt')

Important notes:

  • PCoA works with any distance/dissimilarity matrix
  • CCA reveals environmental drivers of community composition
  • Ordination results include eigenvalues, proportion explained, and sample/feature coordinates
  • Results integrate with plotting libraries (matplotlib, seaborn, plotly)
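Classical PCoA, in its textbook formulation, squares the distances, double-centers the result (Gower centering), and then takes an eigendecomposition of the centered matrix. The sketch below shows only the centering step in plain Python; `gower_center` is a hypothetical helper written for this example and is not taken from scikit-bio.

```python
# Illustration only: the double-centering step of classical PCoA.
def gower_center(dm):
    """Return B = -1/2 * J * D^2 * J for a square distance matrix D."""
    n = len(dm)
    a = [[-0.5 * d * d for d in row] for row in dm]
    row_means = [sum(row) / n for row in a]
    col_means = [sum(a[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [a[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


# Two samples at distance 2: coordinates -1 and +1 on a single axis.
two_points = gower_center([[0, 2], [2, 0]])
three_points = gower_center([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
```

For the two-sample case the centered matrix is `[[1, -1], [-1, 1]]`, whose single non-zero eigenvalue (2) corresponds to coordinates of -1 and +1, which recovers the original distance of 2.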

6. Statistical Testing

Perform hypothesis tests specific to ecological and biological data.

Key capabilities:

  • PERMANOVA: test group differences using distance matrices
  • ANOSIM: alternative test for group differences
  • PERMDISP: test homogeneity of group dispersions
  • Mantel test: correlation between distance matrices
  • Bioenv: find environmental variables correlated with distances

Common patterns:

from skbio.stats.distance import permanova, anosim, mantel

# Test if groups differ significantly
permanova_results = permanova(distance_matrix, grouping, permutations=999)
print(f"p-value: {permanova_results['p-value']}")

# ANOSIM test
anosim_results = anosim(distance_matrix, grouping, permutations=999)

# Mantel test between two distance matrices
mantel_results = mantel(dm1, dm2, method='pearson', permutations=999)
print(f"Correlation: {mantel_results[0]}, p-value: {mantel_results[1]}")

Important notes:

  • Permutation tests provide non-parametric significance testing
  • Use 999+ permutations for robust p-values
  • PERMANOVA sensitive to dispersion differences; pair with PERMDISP
  • Mantel tests assess matrix correlation (e.g., geographic vs genetic distance)
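The permutation logic shared by these tests can be sketched in a few lines of plain Python: compute a statistic on the real grouping, recompute it on shuffled group labels, and report how often the shuffled value is at least as large. The statistic used here (mean between-group distance minus mean within-group distance) is a deliberately simple stand-in chosen for the example, not PERMANOVA's pseudo-F, and both function names are hypothetical.

```python
# Illustration only: a generic permutation test on a distance matrix.
import random
from itertools import combinations


def between_minus_within(dm, grouping):
    """Mean between-group distance minus mean within-group distance."""
    within, between = [], []
    for i, j in combinations(range(len(grouping)), 2):
        if grouping[i] == grouping[j]:
            within.append(dm[i][j])
        else:
            between.append(dm[i][j])
    return sum(between) / len(between) - sum(within) / len(within)


def permutation_p_value(dm, grouping, permutations=999, seed=0):
    """One-sided p-value from shuffling group labels."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = between_minus_within(dm, grouping)
    labels = list(grouping)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if between_minus_within(dm, labels) >= observed:
            hits += 1
    # Adding 1 to both counts treats the observed labeling as one permutation.
    return (hits + 1) / (permutations + 1)


# Two well-separated clusters on a line.
points = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
dm = [[abs(a - b) for b in points] for a in points]
grouping = ["a"] * 4 + ["b"] * 4
p = permutation_p_value(dm, grouping)
```

Because the smallest attainable p-value is 1 / (permutations + 1), the number of permutations sets the resolution of the result, which is one reason to use 999 or more.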

7. File I/O and Format Conversion

Read and write 19+ biological file formats with automatic format detection.

Supported formats:

  • Sequences: FASTA, FASTQ, GenBank, EMBL, QSeq
  • Alignments: Clustal, PHYLIP, Stockholm
  • Trees: Newick
  • Tables: BIOM (HDF5 and JSON)
  • Distances: delimited square matrices
  • Analysis: BLAST+6/7, GFF3, Ordination results
  • Metadata: TSV/CSV with validation

Common patterns:

import skbio

# Read with automatic format detection (omit format to let scikit-bio sniff it)
seq = skbio.DNA.read('file.fasta')
tree = skbio.TreeNode.read('tree.nwk')

# Write to file
seq.write('output.fasta', format='fasta')

# Generator for large files (memory efficient); process() is a placeholder for your own function
for seq in skbio.io.read('large.fasta', format='fasta', constructor=skbio.DNA):
    process(seq)

# Convert formats: FASTQ needs a quality variant (or phred_offset), adjust it to your data.
# Pass the generator straight to write rather than converting it to a list.
seqs = skbio.io.read('input.fastq', format='fastq', constructor=skbio.DNA,
                     variant='illumina1.8')
skbio.io.write(seqs, format='fasta', into='output.fasta')

Important notes:

  • Use generators for large files to avoid memory issues
  • Format is auto-detected (sniffed) on read when the format parameter is omitted; the into parameter selects the type of object to create
  • Some objects can be written to multiple formats
  • Support for stdin/stdout piping with verify=False
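The generator advice above comes down to yielding one record at a time instead of building a full list. As a plain-Python illustration of that streaming pattern (not scikit-bio's FASTA reader), the hypothetical `iter_fasta` helper below parses records lazily from any iterable of lines.

```python
# Illustration only: a minimal streaming FASTA parser, not scikit-bio's reader.
import io


def iter_fasta(handle):
    """Yield (header, sequence) tuples one record at a time."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may wrap across several lines
    if header is not None:
        yield header, "".join(chunks)


example = io.StringIO(">seq1 first\nACGT\nAC\n>seq2\nGGTT\n")
records = list(iter_fasta(example))
```

Only the current record is held in memory while iterating, so the same loop works for a file of any size; `list()` is used here only to inspect the small example.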

8. Distance Matrices

Create and manipulate distance/dissimilarity matrices with statistical methods.

Key capabilities:

  • Store symmetric (DistanceMatrix) or asymmetric (DissimilarityMatrix) data
  • ID-based indexing and slicing
  • Integration with diversity, ordination, and statistical tests
  • Read/write delimited text format
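A distance matrix in this sense is square, has a zero diagonal, and is symmetric, with one unique ID per row. The plain-Python sketch below checks those properties by hand. `distance_matrix_problems` is a hypothetical helper for illustration and does not mirror scikit-bio's internal validation or its error messages.

```python
# Illustration only: hand-written checks, not scikit-bio's validation code.
def distance_matrix_problems(data, ids, tol=1e-12):
    """Return a list of problems; an empty list means the matrix looks valid."""
    problems = []
    n = len(ids)
    if len(set(ids)) != n:
        problems.append("IDs must be unique")
    if len(data) != n or any(len(row) != n for row in data):
        problems.append("data must be square and match the number of IDs")
        return problems  # the remaining checks need a square matrix
    for i in range(n):
        if abs(data[i][i]) > tol:
            problems.append(f"non-zero diagonal at {ids[i]}")
        for j in range(i + 1, n):
            if abs(data[i][j] - data[j][i]) > tol:
                problems.append(f"asymmetric pair {ids[i]}/{ids[j]}")
    return problems


good = distance_matrix_problems(
    [[0, 1, 2], [1, 0, 3], [2, 3, 0]], ["A", "B", "C"])
bad = distance_matrix_problems(
    [[0, 1, 2], [9, 0, 3], [2, 3, 0]], ["A", "B", "C"])
```

An asymmetric table such as `bad` is the kind of data the list above assigns to DissimilarityMatrix rather than DistanceMatrix.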

Common patterns:

from skbio import DistanceMatrix
import numpy as np

# Create from array
data = np.array([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
dm = DistanceMatrix(data, ids=['A', 'B', 'C'])

# Access distances
d

---

*Content truncated.*
