DATA.GOV.HK

mcp-open-data-hk

Provides access to the Hong Kong government's official open data portal, letting you search, browse, and retrieve metadata for thousands of public datasets from DATA.GOV.HK.

Integrates with DATA.GOV.HK to provide comprehensive access to Hong Kong government datasets through search, filtering, and metadata retrieval tools for researchers, developers, and data scientists working with official Hong Kong public sector data.

What it does

  • Search datasets by keywords and metadata
  • Browse datasets by category and format
  • Retrieve detailed dataset information and metadata
  • List available data categories and formats
  • Filter datasets by file format (CSV, JSON, GeoJSON, etc.)
  • Get faceted search results for data exploration

Best for

  • Researchers analyzing Hong Kong public sector data
  • Developers building applications with HK government data
  • Data scientists exploring official Hong Kong datasets
  • Citizens accessing public information and statistics

  • No API key needed
  • Access to official Hong Kong government data
  • 8+ specialized search and filtering tools

About DATA.GOV.HK

DATA.GOV.HK is a community-built MCP server published by mcp-open-data-hk. It gives AI assistants tools for searching, filtering, and retrieving metadata for Hong Kong government datasets via the Model Context Protocol. It is categorized under analytics and data. The server exposes 8 tools that AI clients can invoke during conversations and coding sessions.

How to install

You can install DATA.GOV.HK in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.

License

DATA.GOV.HK is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

Tools (8)

list_datasets

Get a list of dataset IDs from data.gov.hk

Args:

  • limit: Maximum number of datasets to return (default: 1000)
  • offset: Offset of the first dataset to return
  • language: Language code (en, tc, sc)

get_dataset_details

Get detailed information about a specific dataset

Args:

  • dataset_id: The ID or name of the dataset to retrieve
  • language: Language code (en, tc, sc)
  • include_tracking: Add tracking information to dataset and resources

list_categories

Get a list of data categories (groups)

Args:

  • order_by: Field to sort by ('name' or 'packages') - deprecated, use sort instead
  • sort: Sorting of results ('name asc', 'package_count desc', etc.)
  • limit: Maximum number of categories to return
  • offset: Offset for pagination
  • all_fields: Return full group dictionaries instead of just names
  • language: Language code (en, tc, sc)

get_category_details

Get detailed information about a specific category (group)

Args:

  • category_id: The ID or name of the category to retrieve
  • include_datasets: Include a truncated list of the category's datasets
  • include_dataset_count: Include the full package count
  • include_extras: Include the category's extra fields
  • include_users: Include the category's users
  • include_groups: Include the category's sub groups
  • include_tags: Include the category's tags
  • include_followers: Include the category's number of followers
  • language: Language code (en, tc, sc)

search_datasets

Search for datasets by query term using the package_search API. This function searches across dataset titles, descriptions, and other metadata to find datasets matching the query term.

Args:

  • query: The Solr query string (e.g., "transport", "weather", "*:*" for all)
  • limit: Maximum number of datasets to return (default: 10, max: 1000)
  • offset: Offset for pagination
  • language: Language code (en, tc, sc)

Returns: A dictionary containing:

  • count: Total number of matching datasets
  • results: List of matching datasets (up to limit)
  • has_more: Boolean indicating if there are more results available

mcp-open-data-hk

This is an MCP (Model Context Protocol) server that provides access to data from DATA.GOV.HK, the official open data portal of the Hong Kong government.

Installation

Installing via Smithery

To install mcp-open-data-hk for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @mcp-open-data-hk/mcp-open-data-hk --client claude

Using uv (recommended)

When using uv, no specific installation is needed. We will use uvx to run mcp-open-data-hk directly.

Using PIP

Alternatively, you can install mcp-open-data-hk via pip:

pip install mcp-open-data-hk

After installation, you can run it as a script using:

python -m mcp_open_data_hk

After installation, configure your MCP-compatible client (like Cursor, Claude Code, or Claude Desktop) by adding one of the following to your settings.json:

Using uvx
{
  "mcpServers": {
    "mcp-open-data-hk": {
      "command": "uvx",
      "args": ["mcp-open-data-hk"]
    }
  }
}
Using pip installation
{
  "mcpServers": {
    "mcp-open-data-hk": {
      "command": "python",
      "args": ["-m", "mcp_open_data_hk"]
    }
  }
}
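
If the client already has other servers configured, the entry has to be merged into the existing "mcpServers" object rather than replacing it. The following is a minimal sketch of that merge on an in-memory dictionary; the helper name add_server_entry is hypothetical, and reading or writing the actual settings file is left out because its location depends on the client.

```python
import json
from copy import deepcopy


def add_server_entry(config: dict, use_uvx: bool = True) -> dict:
    """Return a copy of `config` with the mcp-open-data-hk entry added.

    Hypothetical helper: existing servers under "mcpServers" are preserved.
    """
    updated = deepcopy(config)
    servers = updated.setdefault("mcpServers", {})
    if use_uvx:
        # Matches the "Using uvx" configuration above.
        servers["mcp-open-data-hk"] = {
            "command": "uvx",
            "args": ["mcp-open-data-hk"],
        }
    else:
        # Matches the "Using pip installation" configuration above.
        servers["mcp-open-data-hk"] = {
            "command": "python",
            "args": ["-m", "mcp_open_data_hk"],
        }
    return updated


if __name__ == "__main__":
    existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
    print(json.dumps(add_server_entry(existing), indent=2))
```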

Features

The server provides the following tools to interact with the DATA.GOV.HK API:

  1. list_datasets - Get a list of dataset IDs
  2. get_dataset_details - Get detailed information about a specific dataset
  3. list_categories - Get a list of data categories
  4. get_category_details - Get detailed information about a specific category
  5. search_datasets - Search for datasets by query term with advanced options
  6. search_datasets_with_facets - Search datasets and return faceted results
  7. get_datasets_by_format - Get datasets by file format
  8. get_supported_formats - Get list of supported file formats

Tools

list_datasets

Get a list of dataset IDs from DATA.GOV.HK

Parameters:

  • limit (optional): Maximum number of datasets to return (default: 1000)
  • offset (optional): Offset of the first dataset to return
  • language (optional): Language code (en, tc, sc) - defaults to "en"
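
To make the three parameters concrete, here is a rough sketch of how they could map onto a CKAN-style package_list request. The base URL and the per-language path segments (en-data, tc-data, sc-data) are assumptions about the portal's public API and are not taken from this README; the server's own implementation may differ. The function only builds the URL and performs no network request.

```python
from urllib.parse import urlencode

# Assumption: the portal exposes its CKAN-style API under a per-language path.
LANGUAGE_PATHS = {"en": "en-data", "tc": "tc-data", "sc": "sc-data"}


def package_list_url(limit: int = 1000, offset: int = 0, language: str = "en") -> str:
    """Build a package_list URL for the given page and language (hypothetical helper)."""
    if language not in LANGUAGE_PATHS:
        raise ValueError(f"Unsupported language code: {language!r}")
    query = urlencode({"limit": limit, "offset": offset})
    return f"https://data.gov.hk/{LANGUAGE_PATHS[language]}/api/3/action/package_list?{query}"


if __name__ == "__main__":
    # Second page of 100 dataset IDs, Traditional Chinese.
    print(package_list_url(limit=100, offset=100, language="tc"))
```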

get_dataset_details

Get detailed information about a specific dataset

Parameters:

  • dataset_id: The ID or name of the dataset to retrieve
  • language (optional): Language code (en, tc, sc) - defaults to "en"
  • include_tracking (optional): Add tracking information to dataset and resources - defaults to False

list_categories

Get a list of data categories (groups)

Parameters:

  • order_by (optional): Field to sort by ('name' or 'packages') - deprecated, use sort instead
  • sort (optional): Sorting of results ('name asc', 'package_count desc', etc.) - defaults to "title asc"
  • limit (optional): Maximum number of categories to return
  • offset (optional): Offset for pagination
  • all_fields (optional): Return full group dictionaries instead of just names - defaults to False
  • language (optional): Language code (en, tc, sc) - defaults to "en"

get_category_details

Get detailed information about a specific category (group)

Parameters:

  • category_id: The ID or name of the category to retrieve
  • include_datasets (optional): Include a truncated list of the category's datasets - defaults to False
  • include_dataset_count (optional): Include the full package count - defaults to True
  • include_extras (optional): Include the category's extra fields - defaults to True
  • include_users (optional): Include the category's users - defaults to True
  • include_groups (optional): Include the category's sub groups - defaults to True
  • include_tags (optional): Include the category's tags - defaults to True
  • include_followers (optional): Include the category's number of followers - defaults to True
  • language (optional): Language code (en, tc, sc) - defaults to "en"

search_datasets

Search for datasets by query term using the package_search API.

This function searches across dataset titles, descriptions, and other metadata to find datasets matching the query term. It supports advanced Solr search parameters.

Parameters:

  • query (optional): The Solr query string (e.g., "transport", "weather", "*:*" for all) - defaults to "*:*"
  • limit (optional): Maximum number of datasets to return (default: 10, max: 1000)
  • offset (optional): Offset for pagination - defaults to 0
  • language (optional): Language code (en, tc, sc) - defaults to "en"

Returns: A dictionary containing:

  • count: Total number of matching datasets
  • results: List of matching datasets (up to limit)
  • search_facets: Faceted information about the results
  • has_more: Boolean indicating if there are more results available
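
The has_more flag together with offset makes it possible to walk through a result set page by page. The sketch below shows one way to do that. The search argument stands in for whatever function actually invokes the search_datasets tool, and fake_search is a hypothetical in-memory substitute so the example runs without a server.

```python
from typing import Callable, Iterator


def iter_all_results(
    search: Callable[..., dict], query: str = "*:*", page_size: int = 100
) -> Iterator[dict]:
    """Yield every matching dataset by following has_more across pages."""
    offset = 0
    while True:
        page = search(query=query, limit=page_size, offset=offset)
        yield from page["results"]
        # Stop when the tool reports no further pages (or returns an empty page).
        if not page["has_more"] or not page["results"]:
            break
        offset += len(page["results"])


def fake_search(query: str, limit: int, offset: int) -> dict:
    """Hypothetical stand-in that mimics the documented return shape."""
    data = [{"name": f"dataset-{i}"} for i in range(7)]
    chunk = data[offset:offset + limit]
    return {
        "count": len(data),
        "results": chunk,
        "has_more": offset + len(chunk) < len(data),
    }


if __name__ == "__main__":
    for dataset in iter_all_results(fake_search, page_size=3):
        print(dataset["name"])
```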

search_datasets_with_facets

Search for datasets and return faceted results for better data exploration.

This function is useful for exploring what types of data are available by showing counts of datasets grouped by tags, organizations, or other facets.

Parameters:

  • query (optional): The Solr query string - defaults to "*:*"
  • language (optional): Language code (en, tc, sc) - defaults to "en"

Returns: A dictionary containing:

  • count: Total number of matching datasets
  • search_facets: Faceted information about the results
  • sample_results: First 3 matching datasets
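
As an illustration of how the facet counts might be consumed, the sketch below ranks the values of one facet. It assumes search_facets follows the usual CKAN layout, where each facet holds an "items" list of entries with "display_name" and "count" keys; that layout is an assumption and is not spelled out in this README. The helper name top_facet_values is hypothetical.

```python
def top_facet_values(response: dict, facet: str = "tags", n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent values of a facet as (display_name, count) pairs."""
    # Assumed layout: response["search_facets"][facet]["items"] is a list of dicts.
    items = response.get("search_facets", {}).get(facet, {}).get("items", [])
    ranked = sorted(items, key=lambda item: item["count"], reverse=True)
    return [(item["display_name"], item["count"]) for item in ranked[:n]]


if __name__ == "__main__":
    # Made-up response used only to exercise the helper.
    sample = {
        "count": 7,
        "search_facets": {
            "tags": {
                "title": "tags",
                "items": [
                    {"name": "bus", "display_name": "bus", "count": 2},
                    {"name": "mtr", "display_name": "MTR", "count": 5},
                ],
            }
        },
        "sample_results": [],
    }
    print(top_facet_values(sample))
```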

get_datasets_by_format

Get datasets that have resources in a specific file format.

Parameters:

  • file_format: The file format to filter by (e.g., "CSV", "JSON", "GeoJSON")
  • limit (optional): Maximum number of datasets to return - defaults to 10
  • language (optional): Language code (en, tc, sc) - defaults to "en"

Returns: A dictionary containing:

  • count: Total number of matching datasets
  • results: List of matching datasets
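
One way to picture what "has resources in a specific file format" means is a client-side filter over dataset dictionaries. The sketch below assumes CKAN-style datasets, each with a "resources" list whose entries carry a "format" field. The server itself may instead delegate the filtering to the portal's search API, so treat this as an illustration of the semantics rather than its implementation.

```python
def has_format(dataset: dict, file_format: str) -> bool:
    """Return True if any resource of the dataset matches the format (case-insensitive)."""
    wanted = file_format.strip().lower()
    return any(
        (resource.get("format") or "").strip().lower() == wanted
        for resource in dataset.get("resources", [])
    )


def filter_by_format(datasets: list[dict], file_format: str, limit: int = 10) -> dict:
    """Mimic the documented return shape: total count plus up to `limit` results."""
    matches = [d for d in datasets if has_format(d, file_format)]
    return {"count": len(matches), "results": matches[:limit]}


if __name__ == "__main__":
    # Made-up datasets used only to exercise the helper.
    datasets = [
        {"name": "a", "resources": [{"format": "CSV"}, {"format": "JSON"}]},
        {"name": "b", "resources": [{"format": "PDF"}]},
        {"name": "c", "resources": [{"format": "csv"}]},
    ]
    print(filter_by_format(datasets, "csv"))
```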

get_supported_formats

Get a list of file formats supported by DATA.GOV.HK

Returns: A list of supported file formats

Local Testing

Run test scripts:

python tests/test_client.py
python tests/debug_search.py
python tests/comprehensive_test.py

Run server directly:

python -m src.mcp_open_data_hk

Run unit tests:

pytest tests/

Understanding Path Configuration

When installed as a package, the server can be referenced by its module name rather than file path. This is more convenient for users as they don't need to specify full file paths.

Installed Package:

{
  "mcpServers": {
    "mcp-open-data-hk": {
      "command": "python",
      "args": ["-m", "mcp_open_data_hk"]
    }
  }
}

Local Development (file path approach):

{
  "mcpServers": {
    "mcp-open-data-hk": {
      "command": "python",
      "args": ["-m", "src.mcp_open_data_hk"],
      "cwd": "/full/path/to/mcp-open-data-hk"
    }
  }
}

The package installation approach is recommended for end users, while the file path approach is useful for local development and testing.

Example Queries

Once installed, try these queries with your AI assistant:

  1. "List some datasets from the Hong Kong government data portal via mcp-open-data-hk mcp."
  2. "Find datasets related to transportation in Hong Kong. Use mcp-open-data-hk."
  3. "What categories of data are available on DATA.GOV.HK? Use mcp-open-data-hk."
  4. "Get details about the flight information dataset. Use mcp-open-data-hk."
  5. "Search for datasets about weather in Hong Kong. Use mcp-open-data-hk."
  6. "What file formats are supported by DATA.GOV.HK? Use mcp-open-data-hk."
  7. "Find CSV datasets about population. Use mcp-open-data-hk."
  8. "Show me the most common tags in transport datasets. Use mcp-open-data-hk."

The AI will automatically use the appropriate tools from your MCP server to fetch the requested information.

Troubleshooting

Common Issues

  1. Module not found errors: Make sure you've installed the dependencies with pip install -e . for local development, or pip install mcp-open-data-hk for the published package.

  2. Path issues: Ensure the cwd in your IDE configuration is the correct absolute path to the project root.

  3. Permission errors: On Unix systems, make sure the scripts have execute permissions:

    chmod +x src/mcp_open_data_hk/__main__.py
    
  4. FastMCP not found: Install it with:

    pip install fastmcp
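
Issues 1 and 4 both come down to a module not being importable by the Python interpreter that the client launches. A small check like the following, run with that same interpreter, can confirm which modules are missing. The helper name missing_modules is hypothetical; importlib.util.find_spec is part of the standard library.

```python
import importlib.util


def missing_modules(names=("mcp_open_data_hk", "fastmcp")) -> list[str]:
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```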
    

Testing the Connection

If you're having issues, you can test the connection manually:

  1. Run the server in one terminal:

    python -m src.mcp_open_data_hk
    
  2. In another terminal, run the test client:

    python tests/test_client.py
    

If this works, the issue is likely in the IDE configuration.

Extending the Server

You can extend the server by adding more tools in src/mcp_open_data_hk/server.py. Follow the existing patterns:

  1. Add a new function decorated with @mcp.tool
  2. Provide a clear docstring explaining the function and parameters
  3. Implement the functionality
  4. Test with the client

The server automatically exposes all functions decorated with @mcp.tool to MCP clients.
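
As a rough sketch of these steps, the function below shows what the body of a new tool could look like. The tool name summarize_dataset and its behaviour are hypothetical and not part of the project. It is written as a plain function so it can be tested without FastMCP installed; inside server.py it would additionally carry the @mcp.tool decorator, as noted in the comment. It assumes a CKAN-style dataset dictionary with a "resources" list.

```python
# Inside src/mcp_open_data_hk/server.py this function would be registered
# by placing the decorator directly above it:
#
#     @mcp.tool
#     def summarize_dataset(...): ...


def summarize_dataset(details: dict, max_resources: int = 5) -> dict:
    """Summarize a dataset dictionary (hypothetical example tool).

    Args:
        details: A dataset dictionary, e.g. as returned by get_dataset_details
        max_resources: Maximum number of resource URLs to include
    """
    resources = details.get("resources", [])
    return {
        "title": details.get("title", ""),
        "resource_count": len(resources),
        # Distinct formats, normalized to upper case; missing ones become "UNKNOWN".
        "formats": sorted({(r.get("format") or "unknown").upper() for r in resources}),
        "sample_urls": [r["url"] for r in resources[:max_resources] if r.get("url")],
    }


if __name__ == "__main__":
    example = {
        "title": "Example dataset",
        "resources": [
            {"format": "csv", "url": "https://example.com/a.csv"},
            {"format": "JSON"},
        ],
    }
    print(summarize_dataset(example))
```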

GitHub Workflows

This project includes GitHub Actions workflows for CI/CD:

  1. CI Workflow: Runs tests across multiple Python versions (3.10-3.12) on every push/PR to main branch
  2. Publish Workflow: Automatically builds and publishes to TestPyPI on every push to main, and to PyPI on version tags (v*.*.*)
  3. Code Quality Workflow: Checks code formatting and linting on every push/PR
  4. Release Workflow: Automatically creates GitHub releases when tags are pushed

Setup for Publishing (Trusted Publishing)

This project uses PyPI's Trusted Publishing which is more secure than using API t


README truncated. View full README on GitHub.
