data-stats-analysis

Perform statistical tests, hypothesis testing, correlation analysis, and multiple testing corrections using scipy and statsmodels. Works with ANY LLM provider (GPT, Gemini, Claude, etc.).

Install

mkdir -p .claude/skills/data-stats-analysis && curl -L -o skill.zip "https://mcp.directory/api/skills/download/5549" && unzip -o skill.zip -d .claude/skills/data-stats-analysis && rm skill.zip

Installs to .claude/skills/data-stats-analysis

About this skill

Statistical Analysis (Universal)

Overview

This skill enables you to perform rigorous statistical analyses including t-tests, ANOVA, correlation analysis, hypothesis testing, and multiple testing corrections. Unlike cloud-hosted solutions, this skill uses standard Python statistical libraries (scipy, statsmodels, numpy) and executes locally in your environment, making it compatible with ALL LLM providers including GPT, Gemini, Claude, DeepSeek, and Qwen.

When to Use This Skill

  • Compare means between groups (t-tests, ANOVA)
  • Test for correlations between variables
  • Perform hypothesis testing with p-value calculation
  • Apply multiple testing corrections (FDR, Bonferroni)
  • Calculate statistical summaries and confidence intervals
  • Test for normality and distribution fitting
  • Perform non-parametric tests (Mann-Whitney, Kruskal-Wallis)

How to Use

Step 1: Import Required Libraries

import numpy as np
import pandas as pd
from scipy import stats
from scipy.stats import ttest_ind, mannwhitneyu, pearsonr, spearmanr
from scipy.stats import f_oneway, kruskal, chi2_contingency
from statsmodels.stats.multitest import multipletests
from statsmodels.stats.proportion import proportions_ztest
import warnings
warnings.filterwarnings('ignore')

Step 2: Two-Sample t-Test

# Compare means between two groups
# group1, group2: arrays of numeric values

# Perform independent t-test
t_statistic, p_value = ttest_ind(group1, group2)

print(f"t-statistic: {t_statistic:.4f}")
print(f"p-value: {p_value:.4e}")

if p_value < 0.05:
    print("✅ Significant difference between groups (p < 0.05)")
else:
    print("❌ No significant difference (p >= 0.05)")

# Check the equal-variance assumption with Levene's test
_, levene_p = stats.levene(group1, group2)
if levene_p < 0.05:
    # Variances differ: rerun as Welch's t-test (does not assume equal variances)
    t_stat, p_val = ttest_ind(group1, group2, equal_var=False)
    print(f"Welch's t-test p-value: {p_val:.4e}")
else:
    print("Equal variances assumed; the standard t-test result above applies")

Step 3: One-Way ANOVA

# Compare means across multiple groups
# groups: list of arrays, e.g., [group1, group2, group3]

# Perform one-way ANOVA
f_statistic, p_value = f_oneway(*groups)

print(f"F-statistic: {f_statistic:.4f}")
print(f"p-value: {p_value:.4e}")

if p_value < 0.05:
    print("✅ Significant difference between groups (p < 0.05)")
    print("Note: Use post-hoc tests to identify which groups differ")
else:
    print("❌ No significant difference between groups")

# Post-hoc pairwise t-tests with Bonferroni correction
from itertools import combinations

group_names = ['Group A', 'Group B', 'Group C']
pairwise_results = []

for (name1, data1), (name2, data2) in combinations(zip(group_names, groups), 2):
    _, p = ttest_ind(data1, data2)
    pairwise_results.append({
        'comparison': f'{name1} vs {name2}',
        'p_value': p
    })

# Apply Bonferroni correction
pairwise_df = pd.DataFrame(pairwise_results)
n_tests = len(pairwise_df)
pairwise_df['p_adjusted'] = pairwise_df['p_value'] * n_tests
pairwise_df['p_adjusted'] = pairwise_df['p_adjusted'].clip(upper=1.0)

print("\nPairwise Comparisons (Bonferroni-corrected):")
print(pairwise_df)
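
If statsmodels is available (it is imported in Step 1), Tukey's HSD is a common alternative to manually Bonferroni-corrected pairwise t-tests. A minimal sketch; flattening the groups into one value array with matching labels is an assumption about how the data are stored:

from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Flatten the groups into a single value array with matching group labels
values = np.concatenate(groups)
labels = np.concatenate([[name] * len(g) for name, g in zip(group_names, groups)])

tukey = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
print(tukey.summary())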

Step 4: Correlation Analysis

# Pearson correlation (linear relationships)
r_pearson, p_pearson = pearsonr(variable1, variable2)

print(f"Pearson correlation: r = {r_pearson:.4f}, p = {p_pearson:.4e}")

# Spearman correlation (monotonic relationships, robust to outliers)
r_spearman, p_spearman = spearmanr(variable1, variable2)

print(f"Spearman correlation: ρ = {r_spearman:.4f}, p = {p_spearman:.4e}")

# Interpretation
if abs(r_pearson) < 0.3:
    strength = "weak"
elif abs(r_pearson) < 0.7:
    strength = "moderate"
else:
    strength = "strong"

direction = "positive" if r_pearson > 0 else "negative"
print(f"Interpretation: {strength} {direction} correlation")

if p_pearson < 0.05:
    print("✅ Statistically significant (p < 0.05)")
else:
    print("❌ Not statistically significant")

Step 5: Multiple Testing Correction

# Scenario: Testing 1000 genes for differential expression
# p_values: array of p-values from individual tests

# Method 1: Benjamini-Hochberg FDR correction (recommended)
reject_fdr, p_adjusted_fdr, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')

# Method 2: Bonferroni correction (more conservative)
reject_bonf, p_adjusted_bonf, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')

# Create results DataFrame
results_df = pd.DataFrame({
    'gene': gene_names,
    'p_value': p_values,
    'q_value_fdr': p_adjusted_fdr,
    'p_adjusted_bonferroni': p_adjusted_bonf,
    'significant_fdr': reject_fdr,
    'significant_bonf': reject_bonf
})

# Summary
print(f"Original significant (p < 0.05): {(p_values < 0.05).sum()}")
print(f"Significant after FDR correction: {reject_fdr.sum()}")
print(f"Significant after Bonferroni correction: {reject_bonf.sum()}")

# Save results
results_df.to_csv('statistical_results.csv', index=False)
print("✅ Results saved to: statistical_results.csv")

Step 6: Non-Parametric Tests

# Use when data is not normally distributed

# Mann-Whitney U test (alternative to t-test)
u_statistic, p_value_mw = mannwhitneyu(group1, group2, alternative='two-sided')

print(f"Mann-Whitney U test:")
print(f"U-statistic: {u_statistic:.4f}")
print(f"p-value: {p_value_mw:.4e}")

# Kruskal-Wallis H test (alternative to ANOVA)
h_statistic, p_value_kw = kruskal(*groups)

print(f"\nKruskal-Wallis H test:")
print(f"H-statistic: {h_statistic:.4f}")
print(f"p-value: {p_value_kw:.4e}")

Advanced Features

Normality Testing

from scipy.stats import shapiro, normaltest, kstest

# Test if data follows normal distribution

# Shapiro-Wilk test (best for n < 5000)
stat_sw, p_sw = shapiro(data)
print(f"Shapiro-Wilk test: W={stat_sw:.4f}, p={p_sw:.4e}")

# D'Agostino-Pearson test
stat_dp, p_dp = normaltest(data)
print(f"D'Agostino-Pearson test: stat={stat_dp:.4f}, p={p_dp:.4e}")

# Interpretation
if p_sw < 0.05:
    print("❌ Data does NOT follow normal distribution (p < 0.05)")
    print("→ Recommendation: Use non-parametric tests (Mann-Whitney, Kruskal-Wallis)")
else:
    print("✅ Data appears normally distributed (p >= 0.05)")
    print("→ OK to use parametric tests (t-test, ANOVA)")

Chi-Square Test for Contingency Tables

# Test independence between categorical variables
# contingency_table: 2D array (rows=categories1, columns=categories2)

# Example: Cell type distribution across conditions
contingency_table = np.array([
    [50, 30, 20],  # Condition A: T cells, B cells, NK cells
    [40, 45, 15],  # Condition B
    [35, 25, 40]   # Condition C
])

chi2, p_value, dof, expected = chi2_contingency(contingency_table)

print(f"Chi-square statistic: {chi2:.4f}")
print(f"p-value: {p_value:.4e}")
print(f"Degrees of freedom: {dof}")
print(f"\nExpected frequencies:\n{expected}")

if p_value < 0.05:
    print("✅ Significant association between variables (p < 0.05)")
else:
    print("❌ No significant association")

Confidence Intervals

from scipy.stats import t as t_dist

def calculate_confidence_interval(data, confidence=0.95):
    """Calculate confidence interval for mean"""
    n = len(data)
    mean = np.mean(data)
    std_err = stats.sem(data)  # Standard error of mean

    # t-distribution critical value
    t_crit = t_dist.ppf((1 + confidence) / 2, df=n-1)

    margin_error = t_crit * std_err
    ci_lower = mean - margin_error
    ci_upper = mean + margin_error

    return mean, ci_lower, ci_upper

# Usage
mean, ci_low, ci_high = calculate_confidence_interval(data, confidence=0.95)

print(f"Mean: {mean:.4f}")
print(f"95% CI: [{ci_low:.4f}, {ci_high:.4f}]")

Effect Size Calculation

def cohens_d(group1, group2):
    """Calculate Cohen's d effect size"""
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)

    # Pooled standard deviation
    pooled_std = np.sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))

    # Cohen's d
    d = (np.mean(group1) - np.mean(group2)) / pooled_std

    return d

# Usage
effect_size = cohens_d(group1, group2)
print(f"Cohen's d: {effect_size:.4f}")

# Interpretation
if abs(effect_size) < 0.2:
    print("Effect size: negligible")
elif abs(effect_size) < 0.5:
    print("Effect size: small")
elif abs(effect_size) < 0.8:
    print("Effect size: medium")
else:
    print("Effect size: large")

Common Use Cases

Differential Gene Expression Statistical Testing

# Compare gene expression between two conditions
# gene_expression_df: rows=genes, columns=samples
# condition_labels: array indicating which condition each sample belongs to

results = []

for gene in gene_expression_df.index:
    # Get expression values for each condition
    cond1_expr = gene_expression_df.loc[gene, condition_labels == 'Condition1']
    cond2_expr = gene_expression_df.loc[gene, condition_labels == 'Condition2']

    # t-test
    t_stat, p_val = ttest_ind(cond1_expr, cond2_expr)

    # Log2 fold change (assumes strictly positive means; add a pseudocount if zeros occur)
    log2fc = np.log2(cond2_expr.mean() / cond1_expr.mean())

    results.append({
        'gene': gene,
        'log2FC': log2fc,
        'p_value': p_val,
        'mean_cond1': cond1_expr.mean(),
        'mean_cond2': cond2_expr.mean()
    })

deg_results = pd.DataFrame(results)

# Apply FDR correction
_, deg_results['q_value'], _, _ = multipletests(
    deg_results['p_value'],
    alpha=0.05,
    method='fdr_bh'
)

# Filter significant genes
significant_genes = deg_results[
    (deg_results['q_value'] < 0.05) &
    (abs(deg_results['log2FC']) > 1)
]

print(f"✅ Identified {len(significant_genes)} differentially expressed genes")
print(f"   - Upregulated: {(significant_genes['log2FC'] > 1).sum()}")
print(f"   - Downregulated: {(significant_genes['log2FC'] < -1).sum()}")

# Save
significant_genes.to_csv('deg_results.csv', index=False)
print("✅ Results saved to: deg_results.csv")

