tooluniverse-binder-discovery
Discover novel small molecule binders for protein targets using structure-based and ligand-based approaches. Creates actionable reports with candidate compounds, ADMET profiles, and synthesis feasibility. Use when users ask to find small molecules for a target, identify novel binders, perform virtual screening, or need hit-to-lead compound identification.
Install
mkdir -p .claude/skills/tooluniverse-binder-discovery && curl -L -o skill.zip "https://mcp.directory/api/skills/download/3540" && unzip -o skill.zip -d .claude/skills/tooluniverse-binder-discovery && rm skill.zipInstalls to .claude/skills/tooluniverse-binder-discovery
About this skill
Small Molecule Binder Discovery Strategy
Systematic discovery of novel small molecule binders using 60+ ToolUniverse tools across druggability assessment, known ligand mining, similarity expansion, ADMET filtering, and synthesis feasibility.
KEY PRINCIPLES:
- Report-first approach - Create report file FIRST, then populate progressively
- Target validation FIRST - Confirm druggability before compound searching
- Multi-strategy approach - Combine structure-based and ligand-based methods
- ADMET-aware filtering - Eliminate poor compounds early
- Evidence grading - Grade candidates by supporting evidence
- Actionable output - Provide prioritized candidates with rationale
- English-first queries - Always use English terms in tool calls, even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language
Critical Workflow Requirements
1. Report-First Approach (MANDATORY)
DO NOT show search process or tool outputs to the user. Instead:
-
Create the report file FIRST - Before any data collection:
- File name:
[TARGET]_binder_discovery_report.md - Initialize with all section headers from the template (see REPORT_TEMPLATE.md)
- Add placeholder text:
[Researching...]in each section
- File name:
-
Progressively update the report - As you gather data:
- Update each section with findings immediately
- The user sees the report growing, not the search process
-
Output separate data files:
[TARGET]_candidate_compounds.csv- Prioritized compounds with SMILES, scores[TARGET]_bibliography.json- Literature references (optional)
2. Citation Requirements (MANDATORY)
Every piece of information MUST include its source:
*Source: ChEMBL via `ChEMBL_get_target_activities` (CHEMBL203)*
*Source: PDB via `get_protein_metadata_by_pdb_id` (1M17)*
*Source: ADMET-AI via `ADMETAI_predict_toxicity`*
*Source: NVIDIA NIM via `NvidiaNIM_alphafold2` (pLDDT: 90.94)*
Workflow Overview
Phase 0: Tool Verification (check parameter names)
|
Phase 1: Target Validation
|- 1.1 Resolve identifiers (UniProt, Ensembl, ChEMBL target ID)
|- 1.2 Assess druggability/tractability
| +- 1.2a GPCRdb integration (for GPCR targets)
| +- 1.2.5 Check therapeutic antibodies (Thera-SAbDab)
|- 1.3 Identify binding sites
+- 1.4 Predict structure (NvidiaNIM_alphafold2/esmfold)
|
Phase 2: Known Ligand Mining
|- ChEMBL bioactivity data
|- GtoPdb interactions
|- Chemical probes (Open Targets)
|- BindingDB affinity data (Ki/IC50/Kd)
|- PubChem BioAssay HTS data (screening hits)
+- SAR analysis from known actives
|
Phase 3: Structure Analysis
|- PDB structures with ligands
|- EMDB cryo-EM structures (for membrane targets)
|- Binding pocket analysis
+- Key interactions
|
Phase 3.5: Docking Validation (NvidiaNIM_diffdock/boltz2)
|- Dock reference inhibitor
+- Validate binding pocket geometry
|
Phase 4: Compound Expansion
|- 4.1-4.3 Similarity/substructure search
+- 4.4 De novo generation (NvidiaNIM_genmol/molmim)
|
Phase 5: ADMET Filtering
|- Physicochemical properties (Lipinski, QED)
|- Bioavailability, toxicity, CYP interactions
+- Structural alerts (PAINS)
|
Phase 6: Candidate Docking & Prioritization
|- Dock all candidates (NvidiaNIM_diffdock/boltz2)
|- Score by docking (40%) + ADMET (30%) + similarity (20%) + novelty (10%)
|- Assess synthesis feasibility
+- Generate final ranked list (top 20)
|
Phase 6.5: Literature Evidence
|- PubMed (peer-reviewed SAR studies)
|- EuropePMC preprints (source='PPR')
+- OpenAlex citation analysis
|
Phase 7: Report Synthesis & Delivery
Phase 0: Tool Verification
CRITICAL: Verify tool parameters before calling unfamiliar tools.
tool_info = tu.tools.get_tool_info(tool_name="ChEMBL_get_target_activities")
Known Parameter Corrections
| Tool | WRONG Parameter | CORRECT Parameter |
|---|---|---|
OpenTargets_* | ensembl_id | ensemblId (camelCase) |
ChEMBL_get_target_activities | chembl_target_id | target_chembl_id |
ChEMBL_search_similar_molecules | smiles | molecule (accepts SMILES, ChEMBL ID, or name) |
alphafold_get_prediction | uniprot | accession |
ADMETAI_* | smiles="..." | smiles=["..."] (must be list) |
NvidiaNIM_alphafold2 | seq | sequence |
NvidiaNIM_genmol | smiles="C..." | smiles="C...[*{1-3}]..." (must have mask) |
NvidiaNIM_boltz2 | sequence="..." | polymers=[{"molecule_type": "protein", "sequence": "..."}] |
Phase 1: Target Validation
1.1 Identifier Resolution
Resolve all IDs upfront and store for downstream queries:
1. UniProt_search(query=target_name, organism="human") -> UniProt accession
2. MyGene_query_genes(q=gene_symbol, species="human") -> Ensembl gene ID
3. ChEMBL_search_targets(query=target_name, organism="Homo sapiens") -> ChEMBL target ID
4. GtoPdb_get_targets(query=target_name) -> GtoPdb ID (if GPCR/channel/enzyme)
1.2 Druggability Assessment
Use multi-source triangulation:
OpenTargets_get_target_tractability_by_ensemblID(ensemblId)- tractability bucketDGIdb_get_gene_druggability(genes=[gene_symbol])- druggability categoriesOpenTargets_get_target_classes_by_ensemblID(ensemblId)- target class- For GPCRs:
GPCRdb_get_protein+GPCRdb_get_ligands+GPCRdb_get_structures - For antibody landscape:
TheraSAbDab_search_by_target(target=target_name)
Decision Point: If druggability < 2 stars, warn user about challenges.
1.3 Binding Site Analysis
ChEMBL_search_binding_sites(target_chembl_id)get_binding_affinity_by_pdb_id(pdb_id)for co-crystallized ligandsInterPro_get_protein_domains(accession)for domain architecture
1.4 Structure Prediction (NVIDIA NIM)
Requires NVIDIA_API_KEY. Two options:
- AlphaFold2:
NvidiaNIM_alphafold2(sequence, algorithm="mmseqs2")- high accuracy, 5-15 min - ESMFold:
NvidiaNIM_esmfold(sequence)- fast (~30s), max 1024 AA
Always report pLDDT confidence scores (>=90 very high, 70-90 confident, <70 caution).
Phase 2: Known Ligand Mining
Tools (in order of priority)
| Source | Tool | Strengths |
|---|---|---|
| ChEMBL | ChEMBL_get_target_activities | Curated, SAR-ready |
| BindingDB | BindingDB_get_ligands_by_uniprot | Direct Ki/Kd, literature links |
| GtoPdb | GtoPdb_get_target_interactions | Pharmacology focus (GPCRs, channels) |
| PubChem | PubChem_search_assays_by_target_gene | HTS screens, novel scaffolds |
| Open Targets | OpenTargets_get_chemical_probes_by_target_ensemblID | Validated probes |
Key Steps
- Get all bioactivities: filter to IC50/Ki/Kd < 10 uM
- Get molecule details for top actives:
ChEMBL_get_molecule - Identify chemical probes and approved drugs
- Analyze SAR: common scaffolds, key modifications
- Check off-target selectivity:
BindingDB_get_targets_by_compound
Phase 3: Structure Analysis
Tools
PDB_search_similar_structures(query=uniprot, type="sequence")- find PDB entriesget_protein_metadata_by_pdb_id(pdb_id)- resolution, methodget_binding_affinity_by_pdb_id(pdb_id)- co-crystal ligand affinitiesget_ligand_smiles_by_chem_comp_id(chem_comp_id)- ligand SMILES from PDBemdb_search(query)- cryo-EM structures (prefer for GPCRs, ion channels)alphafold_get_prediction(accession)- AlphaFold DB fallback
Phase 3.5: Docking Validation (NVIDIA NIM)
| Situation | Tool | Input |
|---|---|---|
| Have PDB + SDF | NvidiaNIM_diffdock | protein=PDB, ligand=SDF, num_poses=10 |
| Have sequence + SMILES | NvidiaNIM_boltz2 | polymers=[...], ligands=[...] |
Dock a known reference inhibitor first to validate the binding pocket.
Phase 4: Compound Expansion
4.1-4.3 Search-Based Expansion
Use 3-5 diverse actives as seeds, similarity threshold 70-85%:
ChEMBL_search_similar_molecules(molecule=SMILES, similarity=70)PubChem_search_compounds_by_similarity(smiles, threshold=0.7)ChEMBL_search_substructure(smiles=core_scaffold)STITCH_get_chemical_protein_interactions(identifier=gene, species=9606)
4.4 De Novo Generation (NVIDIA NIM)
GenMol - scaffold hopping with masked regions:
NvidiaNIM_genmol(smiles="...core...[*{3-8}]...tail...[*{1-3}]...", num_molecules=100, temperature=2.0, scoring="QED")
Mask syntax: [*{min-max}] specifies atom count range.
MolMIM - controlled analog generation:
NvidiaNIM_molmim(smi=reference_smiles, num_molecules=50, algorithm="CMA-ES")
Phase 5: ADMET Filtering
Apply filters sequentially (all take smiles=[list]):
| Step | Tool | Filter Criteria |
|---|---|---|
| Physicochemical | ADMETAI_predict_physicochemical_properties | Lipinski <= 1, QED > 0.3, MW 200-600 |
| Bioavailability | ADMETAI_predict_bioavailability | Oral bioavailability > 0.3 |
| Toxicity | ADMETAI_predict_toxicity | AMES < 0.5, hERG < 0.5, DILI < 0.5 |
| CYP | ADMETAI_predict_CYP_interactions | Flag CYP3A4 inhibitors |
| Alerts | ChEMBL_search_compound_structural_alerts | No PAINS |
Include a filter funnel table in the report showing pass/fail counts at each stage.
Phase 6: Candidate Docking & Prioritization
Scoring Framework
| Dimension | Weight | Source |
|---|---|---|
| Docking confidence | 40% | NvidiaNIM_diffdock/boltz2 |
| ADMET score | 30% | ADMETAI predictions |
| Similarity to known active | 20% | Tanimoto coefficient |
| Novelty | 10% | Not in ChEMBL + novel scaffold bonus |
Evidence Tiers
| Tier | Criteria |
|---|---|
| T0 (4 stars) | Docking score > reference inhibitor |
| T1 (3 stars) | Experimental IC50/Ki < 100 nM |
| T2 (2 stars) | Docking within 5% of reference OR IC50 100-1000 nM |
| T3 (1 star) | >80% similarity to T1 compound |
| T4 (0 stars) | 70-80% similarity, scaffold match |
| T5 (empty) | Generated molecule, ADMET-passed, no docking |
Deliver top 20 candidates with: Rank, ID, SMILES, Docking score, ADMET score, overall score, source, evidence tier.
Phase 6.5: Literature Evidence
PubMed_search_articles(query="[TARGET] inhibitor SAR")- peer-reviewedEuropePMC_search_articles(query, source="PPR")- preprints (not peer-reviewed)openalex_search_works(query)- citation analysis
Fallback Chains
Target ID: ChEMBL_search_targets -> GtoPdb_get_targets -> "Not in databases"
Druggability: OpenTargets tractability -> DGIdb druggability -> target class proxy
Bioactivity: ChEMBL -> BindingDB -> GtoPdb -> PubChem BioAssay -> "No data"
Structure: PDB -> EMDB (membrane) -> NvidiaNIM_alphafold2 -> NvidiaNIM_esmfold -> AlphaFold DB -> "None"
Similarity: ChEMBL similar -> PubChem similar -> "Search failed"
Docking: NvidiaNIM_diffdock -> NvidiaNIM_boltz2 -> similarity-based scoring
Generation: NvidiaNIM_genmol -> NvidiaNIM_molmim -> similarity search only
Literature: PubMed -> EuropePMC (preprints) -> OpenAlex
GPCR data: GPCRdb_get_protein -> GtoPdb_get_targets
NVIDIA NIM Runtime Reference
| Tool | Runtime | Notes |
|---|---|---|
NvidiaNIM_alphafold2 | 5-15 min | Async, max ~2000 AA |
NvidiaNIM_esmfold | ~30 sec | Max 1024 AA |
NvidiaNIM_diffdock | ~1-2 min | Per ligand |
NvidiaNIM_boltz2 | ~2-5 min | End-to-end complex |
NvidiaNIM_genmol | ~1-3 min | Depends on num_molecules |
NvidiaNIM_molmim | ~1-2 min | Close analog generation |
Always check: import os; nvidia_available = bool(os.environ.get("NVIDIA_API_KEY"))
Rate Limiting
| Database | Limit | Strategy |
|---|---|---|
| ChEMBL | ~10 req/sec | Batch queries |
| PubChem | ~5 req/sec | Batch endpoints |
| ADMET-AI | No strict limit | Batch SMILES in lists |
| NVIDIA NIM | API key quota | Cache results |
For large expansions (>500 compounds): batch in chunks of 100, prioritize top candidates for docking.
Reference Files
For detailed protocols, examples, and templates, see:
| File | Contents |
|---|---|
| WORKFLOW_DETAILS.md | Phase-by-phase procedures, code patterns, screening protocols, fallback chain details |
| TOOLS_REFERENCE.md | Complete tool reference with parameters, usage examples, and fallback chains |
| REPORT_TEMPLATE.md | Report file template, evidence grading system, section formatting examples |
| EXAMPLES.md | End-to-end workflow examples (EGFR, novel target, lead optimization, NVIDIA NIM) |
| CHECKLIST.md | Pre-delivery verification checklist for report quality |
More by mims-harvard
View all →You might also like
flutter-development
aj-geddes
Build beautiful cross-platform mobile apps with Flutter and Dart. Covers widgets, state management with Provider/BLoC, navigation, API integration, and material design.
drawio-diagrams-enhanced
jgtolentino
Create professional draw.io (diagrams.net) diagrams in XML format (.drawio files) with integrated PMP/PMBOK methodologies, extensive visual asset libraries, and industry-standard professional templates. Use this skill when users ask to create flowcharts, swimlane diagrams, cross-functional flowcharts, org charts, network diagrams, UML diagrams, BPMN, project management diagrams (WBS, Gantt, PERT, RACI), risk matrices, stakeholder maps, or any other visual diagram in draw.io format. This skill includes access to custom shape libraries for icons, clipart, and professional symbols.
godot
bfollington
This skill should be used when working on Godot Engine projects. It provides specialized knowledge of Godot's file formats (.gd, .tscn, .tres), architecture patterns (component-based, signal-driven, resource-based), common pitfalls, validation tools, code templates, and CLI workflows. The `godot` command is available for running the game, validating scripts, importing resources, and exporting builds. Use this skill for tasks involving Godot game development, debugging scene/resource files, implementing game systems, or creating new Godot components.
nano-banana-pro
garg-aayush
Generate and edit images using Google's Nano Banana Pro (Gemini 3 Pro Image) API. Use when the user asks to generate, create, edit, modify, change, alter, or update images. Also use when user references an existing image file and asks to modify it in any way (e.g., "modify this image", "change the background", "replace X with Y"). Supports both text-to-image generation and image-to-image editing with configurable resolution (1K default, 2K, or 4K for high resolution). DO NOT read the image file first - use this skill directly with the --input-image parameter.
ui-ux-pro-max
nextlevelbuilder
"UI/UX design intelligence. 50 styles, 21 palettes, 50 font pairings, 20 charts, 8 stacks (React, Next.js, Vue, Svelte, SwiftUI, React Native, Flutter, Tailwind). Actions: plan, build, create, design, implement, review, fix, improve, optimize, enhance, refactor, check UI/UX code. Projects: website, landing page, dashboard, admin panel, e-commerce, SaaS, portfolio, blog, mobile app, .html, .tsx, .vue, .svelte. Elements: button, modal, navbar, sidebar, card, table, form, chart. Styles: glassmorphism, claymorphism, minimalism, brutalism, neumorphism, bento grid, dark mode, responsive, skeuomorphism, flat design. Topics: color palette, accessibility, animation, layout, typography, font pairing, spacing, hover, shadow, gradient."
rust-coding-skill
UtakataKyosui
Guides Claude in writing idiomatic, efficient, well-structured Rust code using proper data modeling, traits, impl organization, macros, and build-speed best practices.
Stay ahead of the MCP ecosystem
Get weekly updates on new skills and servers.