tooluniverse-expression-data-retrieval

Retrieves gene expression and omics datasets from ArrayExpress and BioStudies with gene disambiguation, experiment quality assessment, and structured reports. Creates comprehensive dataset profiles with metadata, sample information, and download links. Use when users need expression data, omics datasets, or mention ArrayExpress (E-MTAB, E-GEOD) or BioStudies (S-BSST) accessions.

Install

mkdir -p .claude/skills/tooluniverse-expression-data-retrieval && curl -L -o skill.zip "https://mcp.directory/api/skills/download/1961" && unzip -o skill.zip -d .claude/skills/tooluniverse-expression-data-retrieval && rm skill.zip

Installs to .claude/skills/tooluniverse-expression-data-retrieval

About this skill

Gene Expression & Omics Data Retrieval

Retrieve gene expression experiments and multi-omics datasets with proper disambiguation and quality assessment.

IMPORTANT: Always use English terms in tool calls (gene names, tissue names, condition descriptions), even if the user writes in another language. Only try original-language terms as a fallback if English returns no results. Respond in the user's language.

Workflow Overview

Phase 0: Clarify Query (if ambiguous)
    ↓
Phase 1: Disambiguate Gene/Condition
    ↓
Phase 2: Search & Retrieve (Internal)
    ↓
Phase 3: Report Dataset Profile

Phase 0: Clarification (When Needed)

Ask the user ONLY if:

  • Gene name is ambiguous (e.g., "p53" → TP53 or MDM2 studies?)
  • Tissue/condition unclear for comparative studies
  • Organism not specified for non-human research

Skip clarification for:

  • Specific accession numbers (E-MTAB-*, E-GEOD-*, S-BSST*)
  • Clear disease/tissue + organism combinations
  • Explicit platform requests (RNA-seq, microarray)
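
The accession check in the skip list can be sketched with a regular expression. This is a minimal illustration, not part of ToolUniverse: `ACCESSION_PATTERNS` and `find_accession` are hypothetical names, and the patterns are an assumption built from the accession families named above.

```python
import re

# Hypothetical helper, not a ToolUniverse API. The patterns are assumptions
# derived from the accession families listed above (E-MTAB-*, E-GEOD-*, S-BSST*).
ACCESSION_PATTERNS = {
    "arrayexpress": re.compile(r"\bE-[A-Z]{4}-\d+\b", re.IGNORECASE),
    "biostudies": re.compile(r"\bS-BSST\d+\b", re.IGNORECASE),
}


def find_accession(query):
    """Return (database, accession) if the query names an accession, else None."""
    for database, pattern in ACCESSION_PATTERNS.items():
        match = pattern.search(query)
        if match:
            return database, match.group(0).upper()
    return None


print(find_accession("Get details for E-MTAB-5214"))      # ('arrayexpress', 'E-MTAB-5214')
print(find_accession("Find breast cancer RNA-seq data"))  # None
```

When the helper returns a match, the query can go straight to retrieval without a clarification round.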

Phase 1: Query Disambiguation

1.1 Gene Name Resolution

If searching by gene, first resolve official identifiers:

from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()

# For gene-focused searches, resolve official symbol first
# This helps construct better search queries
# Example: "p53" → "TP53" (official HGNC symbol)

Gene Disambiguation Checklist:

  • Official gene symbol identified (HGNC for human, MGI for mouse)
  • Common aliases noted for search expansion
  • Species confirmed
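
As a rough sketch of the checklist, the snippet below resolves an alias to an official symbol per species and keeps the alias for search expansion. The lookup table is illustrative only; a real workflow would query a nomenclature resource. `OFFICIAL_SYMBOLS` and `resolve_gene` are hypothetical names, not ToolUniverse functions.

```python
# Illustrative lookup only: a hardcoded table stands in for a real
# nomenclature query. All names here are hypothetical.
OFFICIAL_SYMBOLS = {
    ("p53", "Homo sapiens"): "TP53",   # HGNC symbol
    ("p53", "Mus musculus"): "Trp53",  # MGI symbol
}


def resolve_gene(name, species):
    """Return (official_symbol, search_terms) for a gene name and species."""
    official = OFFICIAL_SYMBOLS.get((name.lower(), species), name)
    # Put the official symbol first, then the user's alias if it differs.
    terms = [official]
    if name.lower() != official.lower():
        terms.append(name)
    return official, terms


print(resolve_gene("p53", "Homo sapiens"))  # ('TP53', ['TP53', 'p53'])
```

Unknown names fall through unchanged, so the search still runs with the user's original term.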

1.2 Construct Search Strategy

| User Query Type | Search Strategy |
|-----------------|-----------------|
| Specific accession | Direct retrieval |
| Gene + condition | "[gene] [condition]" + species filter |
| Disease only | "[disease]" + species filter |
| Technology-specific | Add platform keywords (RNA-seq, microarray) |
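
One way to turn the strategy table into code is a small planner that returns either a direct-retrieval plan or a keyword search plan. `build_search` and the plan dictionary layout are assumptions made for illustration; only the `keywords` and `species` names mirror the search parameters used elsewhere in this skill.

```python
def build_search(gene=None, condition=None, disease=None,
                 species=None, platform=None, accession=None):
    """Map a query onto the strategy table and return a plain dict plan."""
    # A specific accession always wins: retrieve it directly.
    if accession:
        return {"mode": "direct", "accession": accession}
    # Otherwise join whichever terms are present into one keyword string.
    terms = [t for t in (gene, condition, disease, platform) if t]
    if not terms:
        raise ValueError("need an accession or at least one search term")
    plan = {"mode": "search", "keywords": " ".join(terms)}
    if species:
        plan["species"] = species
    return plan


print(build_search(disease="breast cancer", platform="RNA-seq",
                   species="Homo sapiens"))
```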

Phase 2: Data Retrieval (Internal)

Search silently. Do NOT narrate the process.

2.1 Search Experiments

# ArrayExpress search
result = tu.tools.arrayexpress_search_experiments(
    keywords="[gene/disease] [condition]",
    species="[species]",
    limit=20
)

# BioStudies for multi-omics
biostudies_result = tu.tools.biostudies_search_studies(
    query="[keywords]",
    limit=10
)

2.2 Get Experiment Details

For top results, retrieve full metadata:

# Get details for each relevant experiment
details = tu.tools.arrayexpress_get_experiment_details(
    accession=accession
)

# Get sample information
samples = tu.tools.arrayexpress_get_experiment_samples(
    accession=accession
)

# Get available files
files = tu.tools.arrayexpress_get_experiment_files(
    accession=accession
)

2.3 BioStudies Retrieval

# Multi-omics study details
study_details = tu.tools.biostudies_get_study_details(
    accession=study_accession
)

# Study structure
sections = tu.tools.biostudies_get_study_sections(
    accession=study_accession
)

# Available files
files = tu.tools.biostudies_get_study_files(
    accession=study_accession
)

Fallback Chains

| Primary | Fallback | Notes |
|---------|----------|-------|
| ArrayExpress search | BioStudies search | ArrayExpress empty |
| arrayexpress_get_experiment_details | biostudies_get_study_details | E-GEOD may have BioStudies mirror |
| arrayexpress_get_experiment_files | Note "Files unavailable" | Some studies restrict downloads |
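
The fallback chain can be sketched as a wrapper around two callables. This is an illustration under assumptions: `search_with_fallback` is a hypothetical helper, and it treats both an exception and an empty result as a reason to fall back. In practice the callables would wrap the ArrayExpress and BioStudies tools and adapt their parameter names (`keywords` versus `query`).

```python
def search_with_fallback(primary, fallback, **query):
    """Call primary; if it raises or returns nothing, call fallback.

    Returns (source_label, results). Both arguments are plain callables,
    so the sketch runs without ToolUniverse installed.
    """
    try:
        results = primary(**query)
    except Exception:
        results = None
    if results:
        return "primary", results
    return "fallback", fallback(**query)


# Stub callables standing in for the real search tools.
def empty_primary(**query):
    return []


def stub_fallback(**query):
    return [{"accession": "stub-record"}]


print(search_with_fallback(empty_primary, stub_fallback, keywords="liver"))
```

Returning the source label alongside the results lets the report state which database actually supplied the data.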

Phase 3: Report Dataset Profile

Output Structure

Present as a Dataset Search Report. Hide search process.

# Expression Data: [Query Topic]

**Search Summary**
- Query: [gene/disease] in [species]
- Databases: ArrayExpress, BioStudies
- Results: [N] relevant experiments found

**Data Quality Overview**: [assessment based on criteria below]

---

## Top Experiments

### 1. [E-MTAB-XXXX]: [Title]

| Attribute | Value |
|-----------|-------|
| **Accession** | [accession with link] |
| **Organism** | [species] |
| **Experiment Type** | RNA-seq / Microarray |
| **Platform** | [specific platform] |
| **Samples** | [N] samples |
| **Release Date** | [date] |

**Description**: [Brief description from metadata]

**Experimental Design**:
- Conditions: [treatment vs control, etc.]
- Replicates: [N biological, M technical]
- Tissue/Cell type: [if specified]

**Sample Groups**:
| Group | Samples | Description |
|-------|---------|-------------|
| Control | [N] | [description] |
| Treatment | [N] | [description] |

**Data Files Available**:
| File | Type | Size |
|------|------|------|
| [filename] | Processed data | [size] |
| [filename] | Raw data | [size] |
| [filename] | Sample metadata | [size] |

**Quality Assessment**: ●●● High / ●●○ Medium / ●○○ Low
- Sample size: [adequate/limited]
- Replication: [yes/no]
- Metadata completeness: [complete/partial]

---

### 2. [E-GEOD-XXXXX]: [Title]
[Same structure as above]

---

## Multi-Omics Studies (from BioStudies)

### [S-BSSTXXXXX]: [Title]

| Attribute | Value |
|-----------|-------|
| **Accession** | [accession] |
| **Study Type** | [proteomics/metabolomics/integrated] |
| **Organism** | [species] |
| **Samples** | [N] |

**Data Types Included**:
- [ ] Transcriptomics
- [ ] Proteomics
- [ ] Metabolomics
- [ ] Other: [specify]

---

## Summary Table

| Accession | Type | Samples | Platform | Quality |
|-----------|------|---------|----------|---------|
| [E-MTAB-X] | RNA-seq | [N] | Illumina | ●●● |
| [E-GEOD-X] | Microarray | [N] | Affymetrix | ●●○ |

---

## Recommendations

**For [specific analysis type]**:
- Best experiment: [accession] - [reason]
- Alternative: [accession] - [reason]

**Data Integration Notes**:
- Platform compatibility: [notes on combining datasets]
- Batch considerations: [if applicable]

---

## Data Access

### Direct Download Links
- [E-MTAB-XXXX processed data](link)
- [E-MTAB-XXXX raw data](link)

### Database Links
- ArrayExpress: https://www.ebi.ac.uk/arrayexpress/experiments/[accession]
- BioStudies: https://www.ebi.ac.uk/biostudies/studies/[accession]

Retrieved: [date]
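
The Summary Table section of the template can be filled from structured records. The sketch below assumes each experiment is a dict with the keys `accession`, `type`, `samples`, `platform`, and `quality`; those key names and the `summary_table` helper are hypothetical, chosen to match the column headers.

```python
def summary_table(experiments):
    """Render the Summary Table section from a list of experiment dicts."""
    header = "| Accession | Type | Samples | Platform | Quality |"
    divider = "|-----------|------|---------|----------|---------|"
    rows = [
        "| {accession} | {type} | {samples} | {platform} | {quality} |".format(**e)
        for e in experiments
    ]
    return "\n".join([header, divider, *rows])


# Placeholder values copied from the template, not real data.
print(summary_table([
    {"accession": "[E-MTAB-X]", "type": "RNA-seq", "samples": "[N]",
     "platform": "Illumina", "quality": "●●●"},
]))
```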

Data Quality Tiers

Assessment criteria for expression experiments:

| Tier | Symbol | Criteria |
|------|--------|----------|
| High Quality | ●●● | ≥3 bio replicates, complete metadata, processed data available |
| Medium Quality | ●●○ | 2-3 replicates OR some metadata gaps, data accessible |
| Low Quality | ●○○ | No replicates, sparse metadata, or data access issues |
| Use with Caution | ○○○ | Single sample, no replication, outdated platform |

Include assessment rationale:

**Quality**: ●●● High
- ✓ 4 biological replicates per condition
- ✓ Complete sample annotations
- ✓ Processed and raw data available
- ✓ Recent RNA-seq platform
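
The tier table leaves some room for interpretation, so the following is one possible reading rather than a definitive rule. It assumes four inputs (sample count, biological replicates per condition, a metadata-completeness flag, a data-accessibility flag), folds "processed data available" into the accessibility flag, and ignores platform age. `quality_tier` is a hypothetical helper.

```python
def quality_tier(n_samples, bio_replicates, metadata_complete, data_accessible):
    """Return the tier symbol under one interpretation of the criteria table."""
    # Single-sample experiments are flagged regardless of anything else.
    if n_samples <= 1:
        return "○○○"
    # High: at least 3 replicates, complete metadata, data available.
    if bio_replicates >= 3 and metadata_complete and data_accessible:
        return "●●●"
    # Medium: replicated and accessible, but fewer replicates or metadata gaps.
    if bio_replicates >= 2 and data_accessible:
        return "●●○"
    # Low: no replicates, or data access issues.
    return "●○○"


print(quality_tier(n_samples=8, bio_replicates=4,
                   metadata_complete=True, data_accessible=True))  # ●●●
```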

Completeness Checklist

Every dataset report MUST include:

Per Experiment (Required)

  • Accession number with database link
  • Organism
  • Experiment type (RNA-seq/microarray/etc.)
  • Sample count
  • Brief description
  • Quality assessment

Search Summary (Required)

  • Query parameters stated
  • Number of results
  • Databases searched

Recommendations (Required)

  • Best dataset for user's purpose (or "No suitable data found")
  • Data access notes

Include Even If Empty

  • Multi-omics studies section (or "No multi-omics studies found")
  • Data integration notes (or "Single-platform data, no integration needed")
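
A completeness check along these lines can catch missing per-experiment fields before the report is emitted. The field names in `REQUIRED_FIELDS` are assumptions that mirror the "Per Experiment" list; `missing_fields` is a hypothetical helper.

```python
# Hypothetical field names mirroring the "Per Experiment (Required)" list.
REQUIRED_FIELDS = ("accession", "link", "organism", "experiment_type",
                   "sample_count", "description", "quality")


def missing_fields(experiment):
    """List required per-experiment fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if experiment.get(f) in (None, "", [])]


draft = {"accession": "[accession]", "organism": "Homo sapiens"}
print(missing_fields(draft))
```

An empty list means the experiment entry satisfies the per-experiment requirements.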

Common Use Cases

Disease Gene Expression

User: "Find breast cancer RNA-seq data"

result = tu.tools.arrayexpress_search_experiments(
    keywords="breast cancer RNA-seq",
    species="Homo sapiens",
    limit=20
)

→ Report top experiments with quality assessment

Gene-Specific Studies

User: "Find TP53 expression experiments in mouse"

result = tu.tools.arrayexpress_search_experiments(
    keywords="Trp53 TP53 p53",  # Mouse symbol (MGI) first, then aliases
    species="Mus musculus",
    limit=15
)

→ Report experiments studying this gene

Specific Accession Lookup

User: "Get details for E-MTAB-5214"

→ Single experiment profile with all details and files

Multi-Omics Integration

User: "Find proteomics and transcriptomics studies for liver disease"

→ Search both ArrayExpress and BioStudies, note integration potential


Error Handling

| Error | Response |
|-------|----------|
| "No experiments found" | Broaden keywords, remove species filter, try synonyms |
| "Accession not found" | Verify format (E-MTAB-*, E-GEOD-*, S-BSST*), check if withdrawn |
| "Files not available" | Note in report: "Data files restricted by submitter" |
| "API timeout" | Retry once, then note: "(metadata retrieval incomplete)" |

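The "retry once, then note" rule for timeouts can be sketched as follows. This assumes the underlying call signals a timeout by raising Python's built-in `TimeoutError`, which may not match how ToolUniverse tools actually report it; `call_with_retry` is a hypothetical helper.

```python
def call_with_retry(fn, *args, retries=1, **kwargs):
    """Call fn, retrying on TimeoutError; return (result, note).

    note is None on success, or the report annotation once retries run out.
    Assumes timeouts surface as the built-in TimeoutError.
    """
    for attempt in range(retries + 1):
        try:
            return fn(*args, **kwargs), None
        except TimeoutError:
            if attempt == retries:
                return None, "(metadata retrieval incomplete)"


def always_times_out():
    raise TimeoutError


print(call_with_retry(always_times_out))  # (None, '(metadata retrieval incomplete)')
```

The returned note can be placed directly in the affected report section.
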
Tool Reference

ArrayExpress (Gene Expression)

| Tool | Purpose |
|------|---------|
| arrayexpress_search_experiments | Keyword/species search |
| arrayexpress_get_experiment_details | Full metadata |
| arrayexpress_get_experiment_files | Download links |
| arrayexpress_get_experiment_samples | Sample annotations |

BioStudies (Multi-Omics)

| Tool | Purpose |
|------|---------|
| biostudies_search_studies | Multi-omics search |
| biostudies_get_study_details | Study metadata |
| biostudies_get_study_files | Data files |
| biostudies_get_study_sections | Study structure |

Search Parameters Reference

ArrayExpress

| Parameter | Description | Example |
|-----------|-------------|---------|
| keywords | Free text search | "breast cancer RNA-seq" |
| species | Scientific name | "Homo sapiens" |
| array | Platform filter | "Illumina" |
| limit | Max results | 20 |

BioStudies

| Parameter | Description | Example |
|-----------|-------------|---------|
| query | Free text | "proteomics liver" |
| limit | Max results | 10 |
