single-cell-multi-omics-integration

Name: single-cell-multi-omics-integration
Author: Starlitnightly

Quick-reference sheet for OmicVerse tutorials spanning MOFA, GLUE pairing, SIMBA integration, TOSICA transfer, and StaVIA cartography.

Install

mkdir -p .claude/skills/single-cell-multi-omics-integration && curl -L -o skill.zip "https://mcp.directory/api/skills/download/8380" && unzip -o skill.zip -d .claude/skills/single-cell-multi-omics-integration && rm skill.zip

Installs to .claude/skills/single-cell-multi-omics-integration

About this skill

Single-Cell Multi-Omics Integration

This skill covers OmicVerse's multi-omics integration tools for combining scRNA-seq, scATAC-seq, and other modalities. Each method addresses a different scenario—choose based on your data structure and analysis goal.

Method Selection Guide

Pick the right tool before writing any code:

Scenario	Method	Key Class
Paired RNA + ATAC from same cells	MOFA directly	`ov.single.pyMOFA`
Unpaired RNA + ATAC (different experiments)	GLUE pairing → MOFA	`ov.single.GLUE_pair` → `pyMOFA`
Multi-batch single-modality integration	SIMBA	`ov.single.pySIMBA`
Transfer labels from annotated reference	TOSICA	`ov.single.pyTOSICA`
Trajectory on preprocessed multi-omic data	StaVIA	`VIA.core.VIA`
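The table above can also be read as a small decision function. The sketch below is only an illustration of that mapping: the `Scenario` class, the goal strings, and `choose_method` are hypothetical names introduced here, not part of OmicVerse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical description of an analysis scenario (not an OmicVerse type)."""
    goal: str             # "joint_factors", "batch_integration", "label_transfer", "trajectory"
    paired: bool = False  # only meaningful when goal == "joint_factors"


def choose_method(scenario: Scenario) -> str:
    """Return the key class named in the selection table for a scenario."""
    if scenario.goal == "joint_factors":
        # Paired RNA + ATAC goes straight to MOFA; unpaired data is paired with GLUE first.
        if scenario.paired:
            return "ov.single.pyMOFA"
        return "ov.single.GLUE_pair -> ov.single.pyMOFA"

    table = {
        "batch_integration": "ov.single.pySIMBA",
        "label_transfer": "ov.single.pyTOSICA",
        "trajectory": "VIA.core.VIA",
    }
    if scenario.goal not in table:
        raise ValueError(f"unknown goal: {scenario.goal!r}")
    return table[scenario.goal]
```

For example, `choose_method(Scenario("joint_factors", paired=True))` returns the MOFA class name, while the same goal with `paired=False` returns the GLUE-then-MOFA route.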

Instructions

1. MOFA on paired multi-omics

Use MOFA when you have paired measurements (RNA + ATAC from the same cells). MOFA learns shared and modality-specific factors that explain variance across omics layers.

Load each modality as a separate AnnData object
Initialise pyMOFA with matching omics and omics_name lists
Run mofa_preprocess() to select HVGs, then mofa_run(outfile=...) to train
Inspect factors with pyMOFAART(model_path=...) for correlation, weights, and variance plots
Dependencies: mofapy2; CPU-only
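Because this workflow assumes paired measurements, it can be worth confirming that the modalities really share cell barcodes before initialising pyMOFA. The helpers below are a minimal sketch operating on plain lists of barcodes; `shared_barcodes`, `check_paired`, and the 0.9 threshold are assumptions made for illustration, not OmicVerse functions or defaults.

```python
def shared_barcodes(rna_names, atac_names):
    """Return barcodes present in both modalities, in RNA order."""
    atac_set = set(atac_names)
    return [barcode for barcode in rna_names if barcode in atac_set]


def check_paired(rna_names, atac_names, min_fraction=0.9):
    """Raise if too few cells are shared between the two modalities.

    min_fraction is a hypothetical threshold relative to the smaller modality.
    """
    smaller = min(len(rna_names), len(atac_names))
    if smaller == 0:
        raise ValueError("one of the modalities has no cells")

    shared = shared_barcodes(rna_names, atac_names)
    fraction = len(shared) / smaller
    if fraction < min_fraction:
        raise ValueError(
            f"only {fraction:.0%} of cells are shared; data may be unpaired"
        )
    return shared
```

With AnnData inputs, the barcode lists would typically come from `list(rna_adata.obs_names)` and `list(atac_adata.obs_names)`, and the returned list can be used to subset both objects to the same cells in the same order.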

2. GLUE pairing then MOFA

Use GLUE when RNA and ATAC come from different experiments (unpaired). GLUE aligns cells across modalities by learning a shared embedding, then MOFA identifies joint factors.

Start from GLUE-derived embeddings (.h5ad files with embeddings in .obsm)
Build GLUE_pair and call correlation() to match unpaired cells
Subset to HVGs and run MOFA as in the paired workflow
Dependencies: mofapy2, scglue, scvi-tools; GPU optional for MDE embedding
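To make the idea of matching unpaired cells concrete, here is a toy, pure-Python illustration of correlation-based pairing on a shared embedding. It is not the GLUE_pair implementation: it assumes that each cell is represented by a short embedding vector and that an RNA cell is paired with the ATAC cell whose vector has the highest Pearson correlation. All names below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / (sd_x * sd_y)


def match_cells(rna_emb, atac_emb):
    """Map each RNA cell to (best ATAC cell, correlation).

    rna_emb and atac_emb are dicts of cell name -> embedding vector.
    Matching is greedy, so one ATAC cell may be chosen by several RNA cells.
    """
    pairs = {}
    for rna_cell, rna_vec in rna_emb.items():
        best = max(atac_emb, key=lambda name: pearson(rna_vec, atac_emb[name]))
        pairs[rna_cell] = (best, pearson(rna_vec, atac_emb[best]))
    return pairs
```

A quick check with two cells per modality: `match_cells({"r1": [1, 2, 3], "r2": [3, 2, 1]}, {"a1": [3, 1.9, 1.2], "a2": [0.9, 2.1, 3.2]})` pairs `r1` with `a2` and `r2` with `a1`, since those vectors rise and fall together.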

3. SIMBA batch integration

Use SIMBA for multi-batch single-modality data (e.g., multiple pancreas studies). SIMBA builds a graph from binned features and learns batch-corrected embeddings via PyTorch-BigGraph.

Load concatenated AnnData with a batch column in .obs
Initialise pySIMBA(adata, workdir) and run the preprocessing pipeline
Call gen_graph() then train(num_workers=...) to learn embeddings
Apply batch_correction() to get harmonised AnnData with X_simba
Dependencies: simba, simba_pbg; GPU optional, needs adequate CPU threads
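Since training takes a `num_workers` argument and depends on available CPU threads, one option is to derive the value from the machine rather than hard-coding it. The helper below is a sketch; `pick_num_workers`, the reserve of two cores, and the cap of twelve are illustrative assumptions rather than SIMBA recommendations.

```python
import os


def pick_num_workers(reserve: int = 2, cap: int = 12) -> int:
    """Pick a worker count that leaves `reserve` cores free, within [1, cap]."""
    total = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(cap, total - reserve))
```

The result could then be passed along as `simba.train(num_workers=pick_num_workers())`.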

4. TOSICA reference transfer

Use TOSICA to transfer cell-type labels from a well-annotated reference to a query dataset. TOSICA uses a pathway-masked transformer that also provides attention-based interpretability.

Download gene-set GMT files with ov.utils.download_tosica_gmt()
Initialise pyTOSICA with reference AnnData, GMT path, label key, and project path
Train with train(epochs=...), save, then predict on query data
Dependencies: TOSICA (PyTorch transformer); depth=1 recommended (depth=2 doubles memory)
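Before training, it may help to confirm that the GMT file parses and that its gene symbols overlap the reference genes. The sketch below assumes the standard tab-separated GMT layout (set name, description, then gene symbols); `parse_gmt_lines`, `read_gmt`, and `gmt_coverage` are hypothetical helpers, not part of OmicVerse or TOSICA.

```python
def parse_gmt_lines(lines):
    """Parse GMT lines into {set_name: set_of_genes}; skips malformed lines."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # need a name, a description, and at least one gene
        name, _description, *genes = fields
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


def read_gmt(path):
    """Read a GMT file from disk."""
    with open(path, encoding="utf-8") as handle:
        return parse_gmt_lines(handle)


def gmt_coverage(gene_sets, reference_genes):
    """Fraction of reference genes that appear in at least one gene set."""
    reference = set(reference_genes)
    if not reference:
        return 0.0
    in_gmt = set().union(*gene_sets.values())
    return len(reference & in_gmt) / len(reference)
```

With an AnnData reference, the gene list would typically be `list(ref_adata.var_names)`. A coverage close to zero usually points to mismatched identifiers, such as Ensembl IDs in the data against gene symbols in the GMT file.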

5. StaVIA trajectory cartography

Use StaVIA/VIA for trajectory inference on preprocessed data with velocity information. VIA computes pseudotime, cluster graphs, and stream plots.

Preprocess with OmicVerse (HVGs, scale, PCA, neighbors, UMAP)
Configure VIA with root selection, components, neighbors, and resolution
Run v0.run_VIA(), where v0 is the configured VIA object, and extract pseudotime from its single_cell_pt_markov attribute
Dependencies: scvelo, pyVIA; CPU-bound
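Root selection fails when the chosen root is not one of the cluster labels, so it is cheap to check the names before configuring VIA. The helper below is a sketch on plain Python sequences; `validate_roots` is a hypothetical name, and the labels in the example are made up.

```python
def validate_roots(roots, labels):
    """Return the roots as strings, or raise if any is absent from labels."""
    valid = {str(label) for label in labels}
    missing = [str(root) for root in roots if str(root) not in valid]
    if missing:
        raise ValueError(
            f"root(s) {missing} not found in labels; valid names: {sorted(valid)}"
        )
    return [str(root) for root in roots]
```

With AnnData, the labels would typically be the same column that is passed as the true labels, for example `adata.obs['clusters']`. The comparison is case-sensitive, so `"hsc"` does not match `"HSC"`.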

Critical API Reference

MOFA: `omics` must be a list of separate AnnData objects

# CORRECT — each modality is a separate AnnData
mofa = ov.single.pyMOFA(omics=[rna_adata, atac_adata], omics_name=['RNA', 'ATAC'])

# WRONG — do NOT pass a single concatenated AnnData
# mofa = ov.single.pyMOFA(omics=combined_adata, omics_name=['RNA', 'ATAC'])  # TypeError!

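The omics list and omics_name list must have the same length and the same order, so that each name labels the AnnData at the same position. Because MOFA expects paired measurements, every AnnData in the list should describe the same cells from the same experiment.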

SIMBA: `preprocess()` must run before `gen_graph()`

# CORRECT — preprocess first, then build graph
simba = ov.single.pySIMBA(adata, workdir)
simba.preprocess(batch_key='batch', min_n_cells=3, method='lib_size', n_top_genes=3000, n_bins=5)
simba.gen_graph()
simba.train(num_workers=6)

# WRONG — skipping preprocess causes gen_graph to fail
# simba.gen_graph()  # KeyError: missing binned features

TOSICA: `gmt_path` must be an actual file path

# CORRECT — download GMT files first, then pass the file path
ov.utils.download_tosica_gmt()
tosica = ov.single.pyTOSICA(adata=ref, gmt_path='genesets/GO_bp.gmt', ...)

# WRONG — passing a database name string instead of file path
# tosica = ov.single.pyTOSICA(adata=ref, gmt_path='GO_Biological_Process', ...)  # FileNotFoundError!

MOFA HDF5: `outfile` directory must exist

import os
os.makedirs('models', exist_ok=True)  # Create output directory first
mofa.mofa_run(outfile='models/rna_atac.hdf5')

Defensive Validation Patterns

Always validate inputs before running integration methods:

# Before MOFA: verify inputs are compatible
assert isinstance(omics, list), "omics must be a list of AnnData objects"
assert len(omics) == len(omics_name), f"omics ({len(omics)}) and omics_name ({len(omics_name)}) must match in length"
for i, a in enumerate(omics):
    assert a.n_obs > 0, f"AnnData '{omics_name[i]}' has 0 cells"
    assert a.n_vars > 0, f"AnnData '{omics_name[i]}' has 0 genes/features"

# Before SIMBA: verify batch column exists
assert 'batch' in adata.obs.columns, "adata.obs must contain a 'batch' column for SIMBA"
assert adata.obs['batch'].nunique() > 1, "Need >1 batch for batch integration"

# Before TOSICA: verify GMT file exists and reference has labels
import os
assert os.path.isfile(gmt_path), f"GMT file not found: {gmt_path}. Run ov.utils.download_tosica_gmt() first."
assert label_name in ref_adata.obs.columns, f"Label column '{label_name}' not found in reference AnnData"

# Before StaVIA: verify PCA and neighbors are computed
assert 'X_pca' in adata.obsm, "PCA required. Run ov.pp.pca(adata) first."
assert 'neighbors' in adata.uns, "Neighbor graph required. Run ov.pp.neighbors(adata) first."
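Bare `assert` statements are skipped when Python runs with the `-O` flag, so in pipelines it can be safer to raise explicit exceptions. The sketch below wraps two of the checks above into functions; `validate_mofa_inputs` and `validate_batches` are hypothetical helpers, and the `SimpleNamespace` objects are stand-ins for AnnData used only to keep the example self-contained.

```python
from types import SimpleNamespace


def validate_mofa_inputs(omics, omics_name):
    """Raise a descriptive error if the MOFA inputs are inconsistent."""
    if not isinstance(omics, list):
        raise TypeError("omics must be a list of AnnData objects")
    if len(omics) != len(omics_name):
        raise ValueError(
            f"omics ({len(omics)}) and omics_name ({len(omics_name)}) must match in length"
        )
    for name, adata in zip(omics_name, omics):
        # Only the n_obs / n_vars attributes are needed here.
        if adata.n_obs == 0:
            raise ValueError(f"AnnData '{name}' has 0 cells")
        if adata.n_vars == 0:
            raise ValueError(f"AnnData '{name}' has 0 genes/features")


def validate_batches(batch_labels):
    """Raise if fewer than two distinct batches are present."""
    if len(set(batch_labels)) < 2:
        raise ValueError("Need >1 batch for batch integration")


# Stand-in objects with the same attributes AnnData exposes.
demo_omics = [
    SimpleNamespace(n_obs=100, n_vars=2000),
    SimpleNamespace(n_obs=100, n_vars=5000),
]
validate_mofa_inputs(demo_omics, ["RNA", "ATAC"])  # passes silently
validate_batches(["study1", "study2", "study1"])   # passes silently
```

For real data, `validate_batches(adata.obs['batch'])` performs the same check as the SIMBA assertion above.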

Troubleshooting

PermissionError or OSError writing MOFA HDF5: The output directory for mofa_run(outfile=...) must exist and be writable. Create it with os.makedirs() before training.
GLUE correlation() returns empty DataFrame: The RNA and ATAC embeddings have no overlapping features. Verify both AnnData objects have been through GLUE preprocessing and contain embeddings in .obsm.
SIMBA gen_graph() runs out of memory: Reduce n_top_genes (try 2000) or increase n_bins to compress the feature space. SIMBA graph construction scales with gene count.
TOSICA FileNotFoundError after download_tosica_gmt(): The download writes to genesets/ in the current working directory. Verify the file exists at the expected path, or pass an absolute path.
StaVIA root_user mismatch: The root must be a value that exists in the true_label array. Check adata.obs['clusters'].unique() to find valid root names.
ImportError: No module named 'mofapy2': Install with pip install mofapy2. Similarly, SIMBA needs pip install simba simba_pbg.
MOFA factors all zero or NaN: Input AnnData may have constant or all-zero features. Filter genes with sc.pp.filter_genes(adata, min_cells=10) before MOFA.

Examples

"I have paired scRNA and scATAC h5ad files—run MOFA to find shared factors and plot variance explained per factor."
"Integrate three pancreas batches using SIMBA and visualise the corrected embedding coloured by batch and cell type."
"Transfer cell type labels from my annotated reference to a new query dataset using TOSICA with GO biological process pathways."

References

MOFA tutorial: t_mofa.ipynb
GLUE+MOFA tutorial: t_mofa_glue.ipynb
SIMBA tutorial: t_simba.ipynb
TOSICA tutorial: t_tosica.ipynb
StaVIA tutorial: t_stavia.ipynb
Quick copy/paste commands: reference.md
