single-cell-multi-omics-integration
Quick-reference sheet for OmicVerse tutorials spanning MOFA, GLUE pairing, SIMBA integration, TOSICA transfer, and StaVIA cartography.
Install
mkdir -p .claude/skills/single-cell-multi-omics-integration && curl -L -o skill.zip "https://mcp.directory/api/skills/download/8380" && unzip -o skill.zip -d .claude/skills/single-cell-multi-omics-integration && rm skill.zip
Installs to .claude/skills/single-cell-multi-omics-integration
About this skill
Single-Cell Multi-Omics Integration
This skill covers OmicVerse's multi-omics integration tools for combining scRNA-seq, scATAC-seq, and other modalities. Each method addresses a different scenario—choose based on your data structure and analysis goal.
Method Selection Guide
Pick the right tool before writing any code:
| Scenario | Method | Key Class |
|---|---|---|
| Paired RNA + ATAC from same cells | MOFA directly | ov.single.pyMOFA |
| Unpaired RNA + ATAC (different experiments) | GLUE pairing → MOFA | ov.single.GLUE_pair → pyMOFA |
| Multi-batch single-modality integration | SIMBA | ov.single.pySIMBA |
| Transfer labels from annotated reference | TOSICA | ov.single.pyTOSICA |
| Trajectory on preprocessed multi-omic data | StaVIA | VIA.core.VIA |
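The table can also be kept as a small lookup, which is handy when a pipeline has to pick a method from a config value. This is only a sketch: the scenario keys and the `choose_method` helper are hypothetical names introduced here, while the method and class names are copied from the table.

```python
from __future__ import annotations

# Lookup mirroring the selection table. The scenario keys are hypothetical
# labels made up for this sketch; the values come from the table itself.
METHOD_BY_SCENARIO = {
    "paired_rna_atac": ("MOFA directly", "ov.single.pyMOFA"),
    "unpaired_rna_atac": ("GLUE pairing, then MOFA", "ov.single.GLUE_pair, then pyMOFA"),
    "multi_batch_single_modality": ("SIMBA", "ov.single.pySIMBA"),
    "label_transfer": ("TOSICA", "ov.single.pyTOSICA"),
    "trajectory": ("StaVIA", "VIA.core.VIA"),
}


def choose_method(scenario: str) -> tuple[str, str]:
    """Return (method, key class) for a scenario key, or raise listing the valid keys."""
    try:
        return METHOD_BY_SCENARIO[scenario]
    except KeyError:
        valid = ", ".join(sorted(METHOD_BY_SCENARIO))
        raise ValueError(f"Unknown scenario {scenario!r}; expected one of: {valid}") from None
```

Failing on an unknown key, instead of falling back to a default, keeps a typo in the config from silently selecting the wrong integration method.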
Instructions
1. MOFA on paired multi-omics
Use MOFA when you have paired measurements (RNA + ATAC from the same cells). MOFA learns shared and modality-specific factors that explain variance across omics layers.
- Load each modality as a separate AnnData object
- Initialise `pyMOFA` with matching `omics` and `omics_name` lists
- Run `mofa_preprocess()` to select HVGs, then `mofa_run(outfile=...)` to train
- Inspect factors with `pyMOFAART(model_path=...)` for correlation, weights, and variance plots
- Dependencies: `mofapy2`; CPU-only
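The training steps above can be sketched as a single function. The wrapper name `run_paired_mofa` is hypothetical. It uses only the `pyMOFA`, `mofa_preprocess()`, and `mofa_run(outfile=...)` calls named in this sheet, and assumes `omicverse` and `mofapy2` are installed and that the directory for `outfile` already exists.

```python
def run_paired_mofa(omics, omics_name, outfile):
    """Train MOFA on paired modalities and return the fitted wrapper object.

    omics      -- list of AnnData objects, one per modality
    omics_name -- list of modality names, same length and order as omics
    outfile    -- path of the HDF5 model to write
    """
    # Cheap check first, so a mismatch fails before the heavy import below.
    if len(omics) != len(omics_name):
        raise ValueError(
            f"omics ({len(omics)}) and omics_name ({len(omics_name)}) must match in length"
        )

    import omicverse as ov  # imported lazily; needs omicverse and mofapy2

    mofa = ov.single.pyMOFA(omics=list(omics), omics_name=list(omics_name))
    mofa.mofa_preprocess()          # HVG selection, as described in the steps above
    mofa.mofa_run(outfile=outfile)  # trains the model and writes the HDF5 file
    return mofa
```

A call would look like `run_paired_mofa([rna_adata, atac_adata], ['RNA', 'ATAC'], 'models/rna_atac.hdf5')`, where `rna_adata` and `atac_adata` are the two modalities loaded beforehand.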
2. GLUE pairing then MOFA
Use GLUE when RNA and ATAC come from different experiments (unpaired). GLUE aligns cells across modalities by learning a shared embedding, then MOFA identifies joint factors.
- Start from GLUE-derived embeddings (`.h5ad` files with embeddings in `.obsm`)
- Build `GLUE_pair` and call `correlation()` to match unpaired cells
- Subset to HVGs and run MOFA as in the paired workflow
- Dependencies: `mofapy2`, `scglue`, `scvi-tools`; GPU optional for MDE embedding
3. SIMBA batch integration
Use SIMBA for multi-batch single-modality data (e.g., multiple pancreas studies). SIMBA builds a graph from binned features and learns batch-corrected embeddings via PyTorch-BigGraph.
- Load concatenated AnnData with a `batch` column in `.obs`
- Initialise `pySIMBA(adata, workdir)` and run the preprocessing pipeline
- Call `gen_graph()` then `train(num_workers=...)` to learn embeddings
- Apply `batch_correction()` to get harmonised AnnData with `X_simba`
- Dependencies: `simba`, `simba_pbg`; GPU optional, needs adequate CPU threads
4. TOSICA reference transfer
Use TOSICA to transfer cell-type labels from a well-annotated reference to a query dataset. TOSICA uses a pathway-masked transformer that also provides attention-based interpretability.
- Download gene-set GMT files with `ov.utils.download_tosica_gmt()`
- Initialise `pyTOSICA` with reference AnnData, GMT path, label key, and project path
- Train with `train(epochs=...)`, save, then predict on query data
- Dependencies: TOSICA (PyTorch transformer); `depth=1` recommended (`depth=2` doubles memory)
5. StaVIA trajectory cartography
Use StaVIA/VIA for trajectory inference on preprocessed data with velocity information. VIA computes pseudotime, cluster graphs, and stream plots.
- Preprocess with OmicVerse (HVGs, scale, PCA, neighbors, UMAP)
- Configure VIA with root selection, components, neighbors, and resolution
- Run `v0.run_VIA()` and extract pseudotime from `single_cell_pt_markov`
- Dependencies: `scvelo`, `pyVIA`; CPU-bound
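Root selection is the configuration step most likely to go wrong, so it may be worth checking the root before building the VIA object. The helper below is a hypothetical sketch that assumes `root_user` is a list of label strings and `true_label` is the per-cell label sequence passed to VIA. It does not call VIA itself.

```python
def check_root(root_user, true_label):
    """Return root_user as a list, or raise if any root is absent from true_label."""
    known = set(true_label)
    missing = [root for root in root_user if root not in known]
    if missing:
        valid = ", ".join(sorted(str(label) for label in known))
        raise ValueError(
            f"root_user values not found in true_label: {missing}. Valid labels: {valid}"
        )
    return list(root_user)


# Stand-in labels for illustration only; real ones would come from adata.obs.
labels = ["HSC", "HSC", "Monocyte", "Erythroid"]
roots = check_root(["HSC"], labels)
```

Listing the valid labels in the error message saves a separate look-up of the unique cluster names when the root is misspelled.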
Critical API Reference
MOFA: omics must be a list of separate AnnData objects
# CORRECT — each modality is a separate AnnData
mofa = ov.single.pyMOFA(omics=[rna_adata, atac_adata], omics_name=['RNA', 'ATAC'])
# WRONG — do NOT pass a single concatenated AnnData
# mofa = ov.single.pyMOFA(omics=combined_adata, omics_name=['RNA', 'ATAC']) # TypeError!
The omics list and omics_name list must have the same length. Each AnnData should contain cells from the same experiment (paired measurements).
SIMBA: preprocess() must run before gen_graph()
# CORRECT — preprocess first, then build graph
simba = ov.single.pySIMBA(adata, workdir)
simba.preprocess(batch_key='batch', min_n_cells=3, method='lib_size', n_top_genes=3000, n_bins=5)
simba.gen_graph()
simba.train(num_workers=6)
# WRONG — skipping preprocess causes gen_graph to fail
# simba.gen_graph() # KeyError: missing binned features
TOSICA: gmt_path must be an actual file path
# CORRECT — download GMT files first, then pass the file path
ov.utils.download_tosica_gmt()
tosica = ov.single.pyTOSICA(adata=ref, gmt_path='genesets/GO_bp.gmt', ...)
# WRONG — passing a database name string instead of file path
# tosica = ov.single.pyTOSICA(adata=ref, gmt_path='GO_Biological_Process', ...) # FileNotFoundError!
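Beyond checking that the path exists, it can help to confirm that the GMT file parses and that its gene symbols overlap the reference. The sketch below relies on the standard GMT layout (tab-separated: set name, description, then genes). The helper names `read_gmt` and `gmt_coverage` are hypothetical and are not part of OmicVerse or TOSICA.

```python
from pathlib import Path


def read_gmt(gmt_path):
    """Parse a GMT file into {gene_set_name: set_of_genes}."""
    path = Path(gmt_path)
    if not path.is_file():
        raise FileNotFoundError(f"GMT file not found: {path}")
    gene_sets = {}
    with path.open() as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank lines and sets without any genes
            name, _description, *genes = fields
            gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


def gmt_coverage(gene_sets, var_names):
    """Fraction of distinct pathway genes that also occur in var_names."""
    pathway_genes = set().union(*gene_sets.values()) if gene_sets else set()
    if not pathway_genes:
        return 0.0
    return len(pathway_genes & set(var_names)) / len(pathway_genes)
```

With a reference AnnData named `ref`, `gmt_coverage(read_gmt('genesets/GO_bp.gmt'), ref.var_names)` gives a quick sanity number. A value near zero usually points to a symbol mismatch, such as a different species or identifier type, rather than a problem with the model.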
MOFA HDF5: outfile directory must exist
import os
os.makedirs('models', exist_ok=True) # Create output directory first
mofa.mofa_run(outfile='models/rna_atac.hdf5')
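When the output path comes from a variable, the directory can be derived from the path itself instead of being hard-coded. This is a minimal sketch using only the standard library; `prepare_outfile` is a hypothetical helper name.

```python
import os
from pathlib import Path


def prepare_outfile(outfile):
    """Create the parent directory of outfile, confirm it is writable, return the path."""
    path = Path(outfile)
    path.parent.mkdir(parents=True, exist_ok=True)  # also handles nested directories
    if not os.access(path.parent, os.W_OK):
        raise PermissionError(f"Output directory is not writable: {path.parent}")
    return str(path)
```

The returned string can be passed straight through, for example `mofa.mofa_run(outfile=prepare_outfile('models/rna_atac.hdf5'))`.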
Defensive Validation Patterns
Always validate inputs before running integration methods:
# Before MOFA: verify inputs are compatible
assert isinstance(omics, list), "omics must be a list of AnnData objects"
assert len(omics) == len(omics_name), f"omics ({len(omics)}) and omics_name ({len(omics_name)}) must match in length"
for i, a in enumerate(omics):
    assert a.n_obs > 0, f"AnnData '{omics_name[i]}' has 0 cells"
    assert a.n_vars > 0, f"AnnData '{omics_name[i]}' has 0 genes/features"
# Before SIMBA: verify batch column exists
assert 'batch' in adata.obs.columns, "adata.obs must contain a 'batch' column for SIMBA"
assert adata.obs['batch'].nunique() > 1, "Need >1 batch for batch integration"
# Before TOSICA: verify GMT file exists and reference has labels
import os
assert os.path.isfile(gmt_path), f"GMT file not found: {gmt_path}. Run ov.utils.download_tosica_gmt() first."
assert label_name in ref_adata.obs.columns, f"Label column '{label_name}' not found in reference AnnData"
# Before StaVIA: verify PCA and neighbors are computed
assert 'X_pca' in adata.obsm, "PCA required. Run ov.pp.pca(adata) first."
assert 'neighbors' in adata.uns, "Neighbor graph required. Run ov.pp.neighbors(adata) first."
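Because `assert` statements are skipped when Python runs with `-O`, pipelines that must always validate may prefer an explicit check. The sketch below restates the MOFA checks as a function that gathers every problem in one pass. `mofa_input_problems` is a hypothetical name, and the function only relies on the `n_obs` and `n_vars` attributes used above.

```python
def mofa_input_problems(omics, omics_name):
    """Return a list of human-readable problems; an empty list means the inputs look usable."""
    if not isinstance(omics, list):
        return ["omics must be a list of AnnData objects"]

    problems = []
    if len(omics) != len(omics_name):
        problems.append(
            f"omics ({len(omics)}) and omics_name ({len(omics_name)}) must match in length"
        )
    # zip stops at the shorter list, so a length mismatch cannot raise IndexError here.
    for name, adata in zip(omics_name, omics):
        if adata.n_obs == 0:
            problems.append(f"AnnData '{name}' has 0 cells")
        if adata.n_vars == 0:
            problems.append(f"AnnData '{name}' has 0 genes/features")
    return problems
```

A caller can then decide how strict to be, for instance raising `ValueError("; ".join(problems))` when the list is non-empty, so that all issues surface at once and not one assertion at a time.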
Troubleshooting
- `PermissionError` or `OSError` writing MOFA HDF5: The output directory for `mofa_run(outfile=...)` must exist and be writable. Create it with `os.makedirs()` before training.
- GLUE `correlation()` returns empty DataFrame: The RNA and ATAC embeddings have no overlapping features. Verify both AnnData objects have been through GLUE preprocessing and contain embeddings in `.obsm`.
- SIMBA `gen_graph()` runs out of memory: Reduce `n_top_genes` (try 2000) or increase `n_bins` to compress the feature space. SIMBA graph construction scales with gene count.
- TOSICA `FileNotFoundError` after `download_tosica_gmt()`: The download writes to `genesets/` in the current working directory. Verify the file exists at the expected path, or pass an absolute path.
- StaVIA `root_user` mismatch: The root must be a value that exists in the `true_label` array. Check `adata.obs['clusters'].unique()` to find valid root names.
- `ImportError: No module named 'mofapy2'`: Install with `pip install mofapy2`. Similarly, SIMBA needs `pip install simba simba_pbg`.
- MOFA factors all zero or NaN: Input AnnData may have constant or all-zero features. Filter genes with `sc.pp.filter_genes(adata, min_cells=10)` before MOFA.
Examples
- "I have paired scRNA and scATAC h5ad files—run MOFA to find shared factors and plot variance explained per factor."
- "Integrate three pancreas batches using SIMBA and visualise the corrected embedding coloured by batch and cell type."
- "Transfer cell type labels from my annotated reference to a new query dataset using TOSICA with GO biological process pathways."
References
- MOFA tutorial: `t_mofa.ipynb`
- GLUE+MOFA tutorial: `t_mofa_glue.ipynb`
- SIMBA tutorial: `t_simba.ipynb`
- TOSICA tutorial: `t_tosica.ipynb`
- StaVIA tutorial: `t_stavia.ipynb`
- Quick copy/paste commands: `reference.md`