pubchem-database

Name: pubchem-database
Author: benchflow-ai


Query PubChem via the PUG-REST API or PubChemPy (110M+ compounds). Search by name, CID, or SMILES; retrieve properties; run similarity and substructure searches; and access bioactivity data for cheminformatics work.

Install

mkdir -p .claude/skills/pubchem-database && curl -L -o skill.zip "https://mcp.directory/api/skills/download/3694" && unzip -o skill.zip -d .claude/skills/pubchem-database && rm skill.zip

Installs to .claude/skills/pubchem-database


PubChem Database

Overview

PubChem is the world's largest freely available chemical database, with 110M+ compounds and 270M+ bioactivities. Using the PUG-REST API and PubChemPy, you can query chemical structures by name, CID, or SMILES, retrieve molecular properties, perform similarity and substructure searches, and access bioactivity data.

When to Use This Skill

This skill should be used when:

Searching for chemical compounds by name, structure (SMILES/InChI), or molecular formula
Retrieving molecular properties (MW, LogP, TPSA, hydrogen bonding descriptors)
Performing similarity searches to find structurally related compounds
Conducting substructure searches for specific chemical motifs
Accessing bioactivity data from screening assays
Converting between chemical identifier formats (CID, SMILES, InChI)
Batch processing multiple compounds for drug-likeness screening or property analysis

Core Capabilities

1. Chemical Structure Search

Search for compounds using multiple identifier types:

By Chemical Name:

import pubchempy as pcp
compounds = pcp.get_compounds('aspirin', 'name')
compound = compounds[0]

By CID (Compound ID):

compound = pcp.Compound.from_cid(2244)  # Aspirin

By SMILES:

compound = pcp.get_compounds('CC(=O)OC1=CC=CC=C1C(=O)O', 'smiles')[0]

By InChI:

compound = pcp.get_compounds('InChI=1S/C9H8O4/...', 'inchi')[0]

By Molecular Formula:

compounds = pcp.get_compounds('C9H8O4', 'formula')
# Returns all compounds matching this formula
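PubChemPy builds PUG-REST URLs behind the scenes. If you call the API directly with requests, the URL layout is compound/<namespace>/<identifier>/<operation>/<output>. The sketch below shows that layout. compound_url is a hypothetical helper, not part of PubChemPy. It percent-encodes the identifier so that SMILES characters such as "(" and "=" stay inside one path segment. Strings containing "/", InChI in particular, are better sent in a POST body, so treat this helper as suited to names, CIDs, formulas, and simple SMILES.

```python
from urllib.parse import quote

# Base URL of the PUG-REST service, as used elsewhere in this skill
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_url(identifier, namespace, operation="cids", output="JSON"):
    """Build a PUG-REST compound URL (hypothetical helper, not a PubChemPy API).

    namespace: e.g. 'name', 'cid', 'smiles', 'formula'
    operation: e.g. 'cids', 'synonyms', 'property/MolecularWeight,XLogP'
    output:    e.g. 'JSON', 'CSV', 'SDF', 'PNG'
    """
    # Percent-encode the identifier so characters such as '(' and '='
    # in a SMILES string stay inside a single path segment.
    encoded = quote(str(identifier), safe="")
    return f"{PUG_REST}/compound/{namespace}/{encoded}/{operation}/{output}"


print(compound_url("aspirin", "name"))
print(compound_url("CC(=O)OC1=CC=CC=C1C(=O)O", "smiles", "property/MolecularWeight"))
print(compound_url(2244, "cid", "property/MolecularFormula,XLogP", "CSV"))
```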

2. Property Retrieval

Retrieve molecular properties for compounds using either high-level or low-level approaches:

Using PubChemPy (Recommended):

import pubchempy as pcp

# Get compound object with all properties
compound = pcp.get_compounds('caffeine', 'name')[0]

# Access individual properties
molecular_formula = compound.molecular_formula
molecular_weight = compound.molecular_weight
iupac_name = compound.iupac_name
smiles = compound.canonical_smiles
inchi = compound.inchi
xlogp = compound.xlogp  # Computed octanol/water partition coefficient (log P)
tpsa = compound.tpsa    # Topological polar surface area

Get Specific Properties:

# Request only specific properties
properties = pcp.get_properties(
    ['MolecularFormula', 'MolecularWeight', 'CanonicalSMILES', 'XLogP'],
    'aspirin',
    'name'
)
# Returns list of dictionaries
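get_properties unwraps the response for you. If you call the .../property/<names>/JSON endpoint directly with requests, the records sit under "PropertyTable", then "Properties". The sketch below shows that unwrapping on an illustrative payload, not a live response. Converting numeric fields explicitly is a defensive choice, because some values (MolecularWeight, for example) may come back as strings.

```python
def parse_property_table(payload):
    """Flatten a PUG-REST property response into a list of dicts.

    Property responses wrap their records as
    {"PropertyTable": {"Properties": [{...}, ...]}}.
    """
    return list(payload.get("PropertyTable", {}).get("Properties", []))


# Illustrative payload shaped like a PUG-REST response for aspirin (CID 2244)
sample = {
    "PropertyTable": {
        "Properties": [
            {"CID": 2244, "MolecularFormula": "C9H8O4", "MolecularWeight": "180.16"}
        ]
    }
}

rows = parse_property_table(sample)
for row in rows:
    # MolecularWeight may arrive as a string, so convert it explicitly
    print(row["CID"], row["MolecularFormula"], float(row["MolecularWeight"]))
```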

Batch Property Retrieval:

import pandas as pd
import pubchempy as pcp

compound_names = ['aspirin', 'ibuprofen', 'paracetamol']
all_properties = []

for name in compound_names:
    props = pcp.get_properties(
        ['MolecularFormula', 'MolecularWeight', 'XLogP'],
        name,
        'name'
    )
    all_properties.extend(props)

df = pd.DataFrame(all_properties)
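The loop above sends one request per name. Once you have CIDs, PUG-REST also accepts a comma-separated CID list in the URL path, so a batch can go out in a handful of requests. The sketch below shows that chunking. batch_property_urls and its default chunk size of 100 are assumptions for illustration, not documented limits. The CIDs used are aspirin (2244), ibuprofen (3672), and paracetamol (1983).

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_property_urls(cids, properties, chunk_size=100):
    """Build one PUG-REST property URL per chunk of CIDs (hypothetical helper)."""
    base = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid"
    prop_list = ",".join(properties)
    return [
        f"{base}/{','.join(str(c) for c in chunk)}/property/{prop_list}/CSV"
        for chunk in chunked(list(cids), chunk_size)
    ]


urls = batch_property_urls([2244, 3672, 1983], ["MolecularWeight", "XLogP"], chunk_size=2)
for url in urls:
    print(url)
```

Each URL can then be fetched with requests and loaded with pd.read_csv(io.StringIO(response.text)).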

Available Properties: MolecularFormula, MolecularWeight, CanonicalSMILES, IsomericSMILES, InChI, InChIKey, IUPACName, XLogP, TPSA, HBondDonorCount, HBondAcceptorCount, RotatableBondCount, Complexity, Charge, and many more (see references/api_reference.md for the complete list).
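For the drug-likeness screening mentioned earlier, the weight, XLogP, and hydrogen-bonding properties in this list are enough to apply Lipinski's rule of five. This is one common heuristic; the skill itself does not prescribe a particular filter. The minimal sketch below assumes each record is a property dict like those returned by get_properties, with MolecularWeight, XLogP, HBondDonorCount, and HBondAcceptorCount requested.

```python
def lipinski_violations(props):
    """Count rule-of-five violations for one PubChem property dict.

    A missing XLogP (PubChem does not report it for every compound) is
    treated here as no violation; that choice is an assumption.
    """
    violations = 0
    if float(props["MolecularWeight"]) > 500:
        violations += 1
    xlogp = props.get("XLogP")
    if xlogp is not None and float(xlogp) > 5:
        violations += 1
    if int(props["HBondDonorCount"]) > 5:
        violations += 1
    if int(props["HBondAcceptorCount"]) > 10:
        violations += 1
    return violations


def is_drug_like(props, max_violations=1):
    """Apply the usual reading of the rule: at most one violation."""
    return lipinski_violations(props) <= max_violations


# Illustrative property values for aspirin
aspirin = {"MolecularWeight": "180.16", "XLogP": 1.2,
           "HBondDonorCount": 1, "HBondAcceptorCount": 4}
print(lipinski_violations(aspirin), is_drug_like(aspirin))
```

Set max_violations=0 for a stricter screen.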

3. Similarity Search

Find structurally similar compounds using Tanimoto similarity:

import pubchempy as pcp

# Start with a query compound
query_compound = pcp.get_compounds('gefitinib', 'name')[0]
query_smiles = query_compound.canonical_smiles

# Perform similarity search
similar_compounds = pcp.get_compounds(
    query_smiles,
    'smiles',
    searchtype='similarity',
    Threshold=85,  # Similarity threshold (0-100)
    MaxRecords=50
)

# Process results
for compound in similar_compounds[:10]:
    print(f"CID {compound.cid}: {compound.iupac_name}")
    print(f"  MW: {compound.molecular_weight}")

Note: Similarity searches are asynchronous for large queries and may take 15-30 seconds to complete. PubChemPy handles the asynchronous pattern automatically.
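The Threshold parameter is a Tanimoto coefficient expressed as a percentage. PubChem computes it server-side on its own 2D fingerprints. The sketch below uses toy bit sets purely to show the arithmetic, T = |A & B| / (|A| + |B| - |A & B|), and how Threshold=85 maps to 0.85.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def passes_threshold(bits_a, bits_b, threshold):
    """PubChem's Threshold is a percentage (0-100), so scale it before comparing."""
    return tanimoto(bits_a, bits_b) >= threshold / 100


# Toy fingerprints: 4 shared bits out of 6 distinct bits overall
query = {1, 4, 9, 16, 25}
candidate = {1, 4, 9, 16, 36}
print(tanimoto(query, candidate))
print(passes_threshold(query, candidate, 85))
```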

4. Substructure Search

Find compounds containing a specific structural motif:

import pubchempy as pcp

# Search for compounds containing pyridine ring
pyridine_smiles = 'c1ccncc1'

matches = pcp.get_compounds(
    pyridine_smiles,
    'smiles',
    searchtype='substructure',
    MaxRecords=100
)

print(f"Found {len(matches)} compounds containing pyridine")
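PubChemPy hides PubChem's asynchronous search pattern. If you call PUG-REST directly, a substructure search (or a non-fast similarity search) first answers with a "Waiting" object that holds a "ListKey". You then poll compound/listkey/<key>/cids/JSON until an "IdentifierList" comes back. The sketch below implements that loop. wait_for_listkey is hypothetical. The caller supplies the fetch function, so the logic can be tested without network access.

```python
import time


def wait_for_listkey(fetch, initial, poll_interval=2.0, max_polls=30, sleep=time.sleep):
    """Poll a PUG-REST asynchronous search until its CID list is ready.

    fetch(listkey) must return the parsed JSON of
    .../compound/listkey/<listkey>/cids/JSON.
    """
    response = initial
    polls = 0
    while "Waiting" in response:
        if polls >= max_polls:
            raise TimeoutError("PubChem search did not finish in time")
        sleep(poll_interval)  # space out polling requests to respect rate limits
        response = fetch(response["Waiting"]["ListKey"])
        polls += 1
    # Finished searches return {"IdentifierList": {"CID": [...]}}
    return response["IdentifierList"]["CID"]


# Usage sketch with requests (needs network access, so left commented out):
# import requests
# BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound"
# first = requests.get(f"{BASE}/substructure/smiles/c1ccncc1/cids/JSON", timeout=30).json()
# cids = wait_for_listkey(
#     lambda key: requests.get(f"{BASE}/listkey/{key}/cids/JSON", timeout=30).json(),
#     first,
# )
```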

Common Substructures:

Benzene ring: c1ccccc1
Pyridine: c1ccncc1
Phenol: c1ccc(O)cc1
Carboxylic acid: C(=O)O

5. Format Conversion

Convert between different chemical structure formats:

import pubchempy as pcp

compound = pcp.get_compounds('aspirin', 'name')[0]

# Convert to different formats
smiles = compound.canonical_smiles
inchi = compound.inchi
inchikey = compound.inchikey
cid = compound.cid

# Download structure files: download(outformat, path, identifier, namespace, ...)
pcp.download('SDF', 'aspirin.sdf', 'aspirin', 'name', overwrite=True)
pcp.download('JSON', 'aspirin.json', 2244, 'cid', overwrite=True)
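An InChIKey has a fixed layout, so a quick format check can decide whether an unknown string should go to the inchikey namespace (for example pcp.get_compounds(key, 'inchikey')) rather than name or smiles. This is a minimal sketch. looks_like_inchikey is a hypothetical helper, and it checks only the layout, not whether the key exists in PubChem.

```python
import re

# A standard InChIKey is 14 letters, a hyphen, 10 letters, a hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text):
    """Cheap format check before sending a string to the 'inchikey' namespace."""
    return INCHIKEY_RE.fullmatch(text.strip()) is not None


print(looks_like_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))  # aspirin's InChIKey
print(looks_like_inchikey("CC(=O)OC1=CC=CC=C1C(=O)O"))     # a SMILES string
```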

6. Structure Visualization

Generate 2D structure images:

import pubchempy as pcp

# Download compound structure as PNG
pcp.download('PNG', 'caffeine.png', 'caffeine', 'name', overwrite=True)

# Using direct URL (via requests)
import requests

cid = 2244  # Aspirin
url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{cid}/PNG?image_size=large"
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop here rather than saving an error message as a .png

with open('structure.png', 'wb') as f:
    f.write(response.content)
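If a lookup fails, PUG-REST returns an error message instead of an image. Unless that case is caught, the message ends up in structure.png. A cheap extra guard is to check the fixed 8-byte PNG signature before writing. The sketch below shows this with ensure_png, a hypothetical helper.

```python
# Every PNG file begins with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def ensure_png(content):
    """Return the bytes unchanged if they are a PNG, otherwise raise ValueError."""
    if not content.startswith(PNG_SIGNATURE):
        # Error bodies are typically text, so include their start for debugging
        preview = content[:80].decode("utf-8", errors="replace")
        raise ValueError(f"Expected PNG data, got: {preview!r}")
    return content


# Usage with the requests snippet above:
# with open('structure.png', 'wb') as f:
#     f.write(ensure_png(response.content))
```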

7. Synonym Retrieval

Get all known names and synonyms for a compound:

import pubchempy as pcp

synonyms_data = pcp.get_synonyms('aspirin', 'name')

if synonyms_data:
    cid = synonyms_data[0]['CID']
    synonyms = synonyms_data[0]['Synonym']

    print(f"CID {cid} has {len(synonyms)} synonyms:")
    for syn in synonyms[:10]:  # First 10
        print(f"  - {syn}")
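Synonym lists often include CAS Registry Numbers mixed in with trade names and systematic names. A CAS number carries a check digit, so you can pull out valid ones reliably. The check digit is the sum of the other digits, weighted 1, 2, 3, ... from the right, modulo 10. The sketch below applies that rule. is_valid_cas and cas_numbers are hypothetical helpers, and the synonym list is an illustrative subset.

```python
import re

# CAS Registry Numbers look like 50-78-2: 2-7 digits, 2 digits, 1 check digit.
CAS_RE = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(text):
    """Check the CAS layout and its check digit."""
    m = CAS_RE.fullmatch(text)
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    # Weight 1 for the rightmost digit before the check digit, 2 for the next, ...
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(m.group(3))


def cas_numbers(synonyms):
    """Keep only the synonyms that are valid CAS Registry Numbers."""
    return [s for s in synonyms if is_valid_cas(s)]


print(cas_numbers(["aspirin", "50-78-2", "Acetylsalicylic acid", "50-78-3"]))
```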

8. Bioactivity Data Access

Retrieve biological activity data from assays:

import requests

# Get bioassay summary for a compound
cid = 2244  # Aspirin
url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{cid}/assaysummary/JSON"

response = requests.get(url)
if response.status_code == 200:
    data = response.json()
    # Process bioassay information
    table = data.get('Table', {})
    rows = table.get('Row', [])
    print(f"Found {len(rows)} bioassay records")

For more complex bioactivity queries, use the scripts/bioactivity_query.py helper script, which provides:

Bioassay summaries with activity outcome filtering
Assay target identification
Search for compounds by biological target
Active compound lists for specific assays
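The assaysummary response from the earlier request is a generic table. Column names are stored under "Table", then "Columns", then "Column", and each row's values under "Row", then "Cell". Zipping the two gives one dict per assay record, which makes outcome filtering straightforward. The sketch below uses a trimmed, illustrative payload. The column name "Activity Outcome" is assumed from typical assaysummary output, so check it against a real response.

```python
from collections import Counter


def table_to_records(payload):
    """Turn PUG-REST's {"Table": {"Columns": ..., "Row": ...}} layout into dicts."""
    table = payload.get("Table", {})
    columns = table.get("Columns", {}).get("Column", [])
    return [dict(zip(columns, row.get("Cell", []))) for row in table.get("Row", [])]


def outcome_counts(records, column="Activity Outcome"):
    """Tally records per activity outcome (column name is an assumption)."""
    return Counter(r.get(column, "") for r in records)


# Illustrative, heavily trimmed payload; real rows carry many more columns
sample = {"Table": {
    "Columns": {"Column": ["AID", "CID", "Activity Outcome"]},
    "Row": [{"Cell": ["1", "2244", "Inactive"]},
            {"Cell": ["2", "2244", "Active"]},
            {"Cell": ["3", "2244", "Inactive"]}]}}

records = table_to_records(sample)
print(outcome_counts(records))
active = [r for r in records if r["Activity Outcome"] == "Active"]
```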

9. Comprehensive Compound Annotations

Access detailed compound information through PUG-View:

import requests

cid = 2244
url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/{cid}/JSON"

response = requests.get(url)
if response.status_code == 200:
    annotations = response.json()
    # Contains extensive data including:
    # - Chemical and Physical Properties
    # - Drug and Medication Information
    # - Pharmacology and Biochemistry
    # - Safety and Hazards
    # - Toxicity
    # - Literature references
    # - Patents

Get Specific Section:

# Get only drug information (requests URL-encodes the spaces in the heading)
url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/{cid}/JSON"
response = requests.get(url, params={"heading": "Drug and Medication Information"})
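A PUG-View record is a tree. "Record" holds a list of "Section" objects, and each section has a "TOCHeading" and may nest further "Section" lists. Listing the headings of a full record is a simple way to find the exact string to pass as the heading parameter. The sketch below walks that tree. toc_headings is a hypothetical helper, and the record is a tiny illustrative skeleton.

```python
def toc_headings(sections, depth=0):
    """Recursively list (depth, TOCHeading) pairs from a PUG-View Section tree."""
    out = []
    for section in sections:
        out.append((depth, section.get("TOCHeading", "")))
        out.extend(toc_headings(section.get("Section", []), depth + 1))
    return out


# Illustrative skeleton of a PUG-View record (real records are far larger)
record = {"Record": {"RecordNumber": 2244, "Section": [
    {"TOCHeading": "Names and Identifiers",
     "Section": [{"TOCHeading": "Computed Descriptors"}]},
    {"TOCHeading": "Drug and Medication Information"},
]}}

for depth, heading in toc_headings(record["Record"]["Section"]):
    print("  " * depth + heading)
```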

Installation Requirements

Install PubChemPy for Python-based access:

uv pip install pubchempy

For direct API access and bioactivity queries:

uv pip install requests

Optional for data analysis:

uv pip install pandas

Helper Scripts

This skill includes Python scripts for common PubChem tasks:

scripts/compound_search.py

Provides utility functions for searching and retrieving compound information:

Key Functions:

search_by_name(name, max_results=10): Search compounds by name
search_by_smiles(smiles): Search by SMILES string
get_compound_by_cid(cid): Retrieve compound by CID
get_compound_properties(identifier, namespace, properties): Get specific properties
similarity_search(smiles, threshold, max_records): Perform similarity search
substructure_search(smiles, max_records): Perform substructure search
get_synonyms(identifier, namespace): Get all synonyms
batch_search(identifiers, namespace, properties): Batch search multiple compounds
download_structure(identifier, namespace, format, filename): Download structures
print_compound_info(compound): Print formatted compound information

Usage:

from scripts.compound_search import search_by_name, get_compound_properties

# Search for a compound
compounds = search_by_name('ibuprofen')

# Get specific properties
props = get_compound_properties('aspirin', 'name', ['MolecularWeight', 'XLogP'])

scripts/bioactivity_query.py

Provides functions for retrieving biological activity data:

Key Functions:

get_bioassay_summary(cid): Get bioassay summary for compound
get_compound_bioactivities(cid, activity_outcome): Get filtered bioactivities
get_assay_description(aid): Get detailed assay information
get_assay_targets(aid): Get biological targets for assay
search_assays_by_target(target_name, max_results): Find assays by target
get_active_compounds_in_assay(aid, max_results): Get active compounds
get_compound_annotations(cid, section): Get PUG-View annotations
summarize_bioactivities(cid): Generate bioactivity summary statistics
find_compounds_by_bioactivity(target, threshold, max_compounds): Find compounds by target

Usage:

from scripts.bioactivity_query import get_bioassay_summary, summarize_bioactivities

# Get bioactivity summary
summary = summarize_bioactivities(2244)  # Aspirin
print(f"Total assays: {summary['total_assays']}")
print(f"Active: {summary['active']}, Inactive: {summary['inactive']}")

API Rate Limits and Best Practices

Rate Limits:

Maximum 5 requests per second
Maximum 400 requests per minute
Maximum 300 seconds running time per minute
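A simple client-side guard for the first two limits is to space requests at least 0.2 s apart. At that spacing, 5 x 60 = 300 requests per minute stays under the 400-per-minute cap. The sketch below implements this. Throttle is a hypothetical class, and its clock and sleep are injectable so the behaviour can be tested without real waiting.

```python
import time


class Throttle:
    """Space out calls so that at most `rate` happen per second."""

    def __init__(self, rate=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous permitted call

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
```

Call throttle.wait() right before each PubChemPy call or requests.get. The running-time limit depends on how heavy each query is, so expensive searches may still need extra spacing.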

**Best Pr
