Malaysia Open Data

hithereiamaliff

Provides AI-friendly access to Malaysia's government open datasets through unified search, data parsing, and geocoding capabilities. Connects to official Malaysian government data sources including statistics, weather, and transport data.

A unified AI-first bridge to Malaysia's open data ecosystem with intelligent search, context-aware geocoding, and comprehensive documentation for seamless AI integration.

What it does

  • Search Malaysian government datasets with intelligent query handling
  • Parse Parquet data files directly with support for up to 500 rows
  • Geocode Malaysian addresses using multiple providers (Google Maps, GrabMaps, OpenStreetMap)
  • Access real-time weather forecasts and warnings
  • Retrieve public transport and GTFS data
  • Browse interactive data visualization dashboards

Best for

  • Researchers analyzing Malaysian government statistics
  • Developers building apps with Malaysian location data
  • Data scientists working with Southeast Asian datasets
  • Applications requiring Malaysian weather or transport data

  • Real-time access to official Malaysian government data
  • Intelligent search with synonym expansion and fuzzy matching
  • Multi-provider geocoding optimized for Malaysia

About Malaysia Open Data

Malaysia Open Data is a community-built MCP server published by hithereiamaliff that provides AI assistants with tools and capabilities via the Model Context Protocol. It is categorized under analytics data.

How to install

You can install Malaysia Open Data in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.

License

Malaysia Open Data is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

Malaysia Open Data MCP

MCP Endpoint: https://mcp.techmavie.digital/datagovmy/mcp

Analytics Dashboard: https://mcp.techmavie.digital/datagovmy/analytics/dashboard

MCP (Model Context Protocol) server for Malaysia's Open Data APIs, providing easy access to government datasets and collections.

Note that this is NOT an official MCP server from the Government of Malaysia, nor from the Malaysia Open Data, Jabatan Digital Negara, or Ministry of Digital teams.

Features

  • Enhanced Unified Search with flexible tokenization and synonym expansion
    • Intelligent query handling with term normalization
    • Support for plurals and common prefixes (e.g., "e" in "epayment")
    • Smart prioritization for different data types
  • Parquet File Support using pure JavaScript
    • Parse Parquet files directly in the browser or Node.js
    • Support for BROTLI compression
    • Intelligent date field handling for empty date objects
    • Increased row limits (up to 500 rows) for comprehensive data retrieval
    • Fallback to metadata estimation when parsing fails
    • Automatic dashboard URL mapping for visualization
  • Live Data Access Architecture
    • Real-time index fetching from GitHub (data-gov-my/datagovmy-meta)
    • In-memory caching with configurable TTL
    • Dynamic API calls for detailed metadata
  • Multi-Provider Geocoding
    • Support for Google Maps, GrabMaps, and Nominatim (OpenStreetMap)
    • Intelligent service selection based on location and available API keys
    • GrabMaps optimization for locations in Malaysia
    • Automatic fallback between providers
  • Comprehensive Data Sources
    • Malaysia's Data Catalogue with rich metadata
    • Interactive Dashboards for data visualization
    • Department of Statistics Malaysia (DOSM) data
    • Weather forecast and warnings
    • Public transport and GTFS data
  • Multi-Provider Malaysian Geocoding
    • Optimized for Malaysian addresses and locations
    • Three-tier geocoding system: GrabMaps, Google Maps, and Nominatim
    • Prioritizes local knowledge with GrabMaps for better Malaysian coverage
    • Automatic fallback to Nominatim when no API keys are provided
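
The provider selection and fallback described above can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: the names `providerOrder` and `geocodeWithFallback` are hypothetical, and the ordering used for locations outside Malaysia is an assumption, since the README only states that GrabMaps is preferred for Malaysian locations and that Nominatim is the keyless fallback.

```typescript
// Hypothetical sketch of multi-provider selection; names are illustrative only.
type Provider = "grabmaps" | "google" | "nominatim";

interface Availability {
  grabmaps: boolean; // true when GrabMaps credentials are configured
  google: boolean;   // true when a Google Maps API key is configured
}

interface GeocodeResult {
  provider: Provider;
  lat: number;
  lon: number;
}

type Geocoder = (address: string) => Promise<{ lat: number; lon: number } | null>;

// Returns the providers in the order they would be tried.
// Assumption: outside Malaysia, Google Maps is tried before GrabMaps.
function providerOrder(available: Availability, inMalaysia: boolean): Provider[] {
  const keyed: Provider[] = inMalaysia ? ["grabmaps", "google"] : ["google", "grabmaps"];
  const usable = keyed.filter((p) => (p === "grabmaps" ? available.grabmaps : available.google));
  return [...usable, "nominatim"]; // Nominatim needs no key, so it is always last
}

// Tries each provider in turn and returns the first non-empty result.
async function geocodeWithFallback(
  address: string,
  order: Provider[],
  geocoders: Record<Provider, Geocoder>
): Promise<GeocodeResult | null> {
  for (const provider of order) {
    try {
      const hit = await geocoders[provider](address);
      if (hit) return { provider, lat: hit.lat, lon: hit.lon };
    } catch {
      // This provider failed; fall through to the next one.
    }
  }
  return null;
}
```

With no API keys configured, `providerOrder` returns only `["nominatim"]`, which matches the documented fallback behaviour.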

Architecture

This MCP server fetches dataset and dashboard metadata live from the data-gov-my/datagovmy-meta GitHub repository:

  • Live GitHub Indexes — Fetches all dataset and dashboard metadata via the GitHub Trees API and raw content URLs
  • Cache Pre-Warming — Indexes are fetched immediately on server startup, so the first user request is fast
  • In-Memory Caching — Indexes are cached in memory with a configurable TTL (default: 1 hour)
  • Background Refresh — When cache expires, stale data is served instantly while a background refresh fetches updated indexes. Users never experience fetch delays after the initial startup.
  • Dynamic Detail Fetching — Individual dataset/dashboard details are fetched on-demand from GitHub raw content

This approach provides several benefits:

  • Always up-to-date with the latest datasets and dashboards
  • No static data that goes stale
  • No fetch delay on requests after startup (pre-warmed cache + background refresh)
  • Consistent data access patterns
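
The pre-warming, TTL, and background-refresh behaviour described above is commonly called stale-while-revalidate. The sketch below shows one way such a cache could look. It is an illustration under assumptions, not the server's implementation: the class name, the `load` callback, and the injectable clock are all hypothetical, and only the one-hour default TTL comes from this README.

```typescript
// Hypothetical stale-while-revalidate cache; not taken from the server's source.
class StaleWhileRevalidateCache<T> {
  private value: T | undefined = undefined;
  private fetchedAt = 0;
  private refreshing: Promise<void> | null = null;

  constructor(
    private readonly load: () => Promise<T>,           // e.g. fetches the GitHub indexes
    private readonly ttlMs: number = 60 * 60 * 1000,    // 1 hour, the documented default
    private readonly now: () => number = () => Date.now() // injectable clock (assumption)
  ) {}

  // Call once at startup so the first request finds a warm cache.
  async preWarm(): Promise<void> {
    await this.refresh();
  }

  async get(): Promise<T> {
    if (this.value === undefined) {
      await this.refresh(); // cold cache: the caller has to wait
    } else if (this.now() - this.fetchedAt >= this.ttlMs) {
      // Expired: start a background refresh and serve the stale value now.
      void this.refresh().catch(() => undefined);
    }
    return this.value as T;
  }

  // Starts a refresh unless one is already in flight (avoids duplicate fetches).
  private refresh(): Promise<void> {
    if (this.refreshing === null) {
      this.refreshing = Promise.resolve()
        .then(() => this.load())
        .then((fresh) => {
          this.value = fresh;
          this.fetchedAt = this.now();
        })
        .then(
          () => { this.refreshing = null; },
          (err) => { this.refreshing = null; throw err; }
        );
    }
    return this.refreshing;
  }
}
```

In this sketch a failed background refresh is swallowed and the stale value keeps being served. Whether the real server behaves that way is not stated in the README.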

Documentation

  • TOOLS.md - Detailed information about available tools and best practices
  • PROMPT.md - AI integration guidelines and usage patterns

AI Integration

When integrating this MCP server with AI models:

  1. Use the unified search tool first - Always start with search_all for any data queries
  2. Follow the correct URL patterns - Use https://data.gov.my/... and https://open.dosm.gov.my/...
  3. Leverage Parquet file tools - Use parse_parquet_file to access data directly or get_parquet_info for metadata
  4. Live indexes - Dataset and dashboard indexes are fetched live from GitHub and cached in memory
  5. Consider dashboard visualization - For complex data, use the dashboard links provided by find_dashboard_for_parquet
  6. Leverage the multi-provider Malaysian geocoding - For Malaysian location queries, the system automatically selects the best provider (GrabMaps, Google Maps, or Nominatim) with fallback to Nominatim when no API keys are configured

Refer to PROMPT.md for comprehensive AI integration guidelines.
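
As a rough illustration of the workflow above, the following sketch builds the JSON-RPC `tools/call` messages an MCP client would send for the first and third steps. The envelope (`jsonrpc`, `method`, `params.name`, `params.arguments`) follows the MCP specification; the argument names `query` and `url`, the helper `toolCall`, and the example Parquet URL are assumptions for illustration. Check TOOLS.md for the real argument schemas.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Hypothetical helper that wraps a tool name and arguments in the envelope.
function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Step 1: always start with the unified search.
// "query" is an assumed argument name.
const searchRequest = toolCall("search_all", { query: "household income by state" });

// Step 3: read a Parquet file found via the search results.
// "url" is an assumed argument name and the URL is a placeholder.
const parquetRequest = toolCall("parse_parquet_file", {
  url: "https://example.com/path/to/dataset.parquet",
});

console.log(JSON.stringify(searchRequest));
console.log(JSON.stringify(parquetRequest));
```

In practice an MCP client library usually builds these messages for you; the sketch only makes the call order and message shape concrete.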

Installation

npm install

Quick Start (Hosted Server)

The easiest way to use this MCP server is via the hosted endpoint. No installation required!

Server URL:

https://mcp.techmavie.digital/datagovmy/mcp

Using Your Own API Keys

You can provide your own API keys via URL query parameters:

https://mcp.techmavie.digital/datagovmy/mcp?googleMapsApiKey=YOUR_KEY

Or via headers:

  • X-Google-Maps-Api-Key: YOUR_KEY
  • X-GrabMaps-Api-Key: YOUR_KEY
  • X-AWS-Access-Key-Id: YOUR_KEY
  • X-AWS-Secret-Access-Key: YOUR_KEY
  • X-AWS-Region: ap-southeast-5

Supported Query Parameters:

| Parameter | Description |
| --- | --- |
| googleMapsApiKey | Google Maps API key for geocoding |
| grabMapsApiKey | GrabMaps API key for Southeast Asia geocoding |
| awsAccessKeyId | AWS Access Key ID for AWS Location Service |
| awsSecretAccessKey | AWS Secret Access Key |
| awsRegion | AWS Region (default: ap-southeast-5) |

⚠️ Important: GrabMaps Requirements

To use GrabMaps geocoding, you need ALL FOUR parameters:

  • grabMapsApiKey
  • awsAccessKeyId
  • awsSecretAccessKey
  • awsRegion

GrabMaps uses AWS Location Service under the hood, so AWS credentials are required alongside the GrabMaps API key.
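
A small helper can assemble the endpoint URL from the query parameters listed above and catch an incomplete GrabMaps configuration before any request is made. This is a sketch only: `buildEndpointUrl` and `ApiKeyOptions` are hypothetical names, and the rule it enforces (all four values required once `grabMapsApiKey` is set) is taken from the requirements section above.

```typescript
// Hypothetical helper; the parameter names match the table of supported query parameters.
interface ApiKeyOptions {
  googleMapsApiKey?: string;
  grabMapsApiKey?: string;
  awsAccessKeyId?: string;
  awsSecretAccessKey?: string;
  awsRegion?: string;
}

const HOSTED_ENDPOINT = "https://mcp.techmavie.digital/datagovmy/mcp";

function buildEndpointUrl(options: ApiKeyOptions, baseUrl: string = HOSTED_ENDPOINT): string {
  const grabParams: (keyof ApiKeyOptions)[] = [
    "grabMapsApiKey",
    "awsAccessKeyId",
    "awsSecretAccessKey",
    "awsRegion",
  ];

  // GrabMaps needs its own key plus all three AWS values.
  if (options.grabMapsApiKey) {
    const missing = grabParams.filter((name) => !options[name]);
    if (missing.length > 0) {
      throw new Error(`GrabMaps requires all four parameters; missing: ${missing.join(", ")}`);
    }
  }

  const allParams: (keyof ApiKeyOptions)[] = ["googleMapsApiKey", ...grabParams];
  const pairs: string[] = [];
  for (const name of allParams) {
    const value = options[name];
    if (value) pairs.push(`${name}=${encodeURIComponent(value)}`);
  }
  return pairs.length === 0 ? baseUrl : `${baseUrl}?${pairs.join("&")}`;
}
```

Because query strings often end up in logs and browser history, the header form listed earlier may be the safer way to pass secrets where your client supports custom headers.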

Client Configuration

For Claude Desktop / Cursor / Windsurf, add to your MCP configuration:

{
  "mcpServers": {
    "malaysia-opendata": {
      "transport": "streamable-http",
      "url": "https://mcp.techmavie.digital/datagovmy/mcp"
    }
  }
}

With your own API key:

{
  "mcpServers": {
    "malaysia-opendata": {
      "transport": "streamable-http",
      "url": "https://mcp.techmavie.digital/datagovmy/mcp?googleMapsApiKey=YOUR_KEY"
    }
  }
}

Self-Hosted (VPS)

If you prefer to run your own instance, see deploy/DEPLOYMENT.md for detailed VPS deployment instructions with Docker and Nginx.

Analytics Dashboard

The hosted server includes a built-in analytics dashboard:

Dashboard URL: https://mcp.techmavie.digital/datagovmy/analytics/dashboard

Analytics Endpoints

| Endpoint | Description |
| --- | --- |
| /analytics | Full analytics summary (JSON) |
| /analytics/tools | Detailed tool usage stats (JSON) |
| /analytics/dashboard | Visual dashboard with charts (HTML) |

The dashboard tracks:

  • Total requests and tool calls
  • Tool usage distribution
  • Hourly request trends (last 24 hours)
  • Requests by endpoint
  • Top clients by user agent
  • Recent tool calls feed

The dashboard auto-refreshes every 30 seconds.

Available Tools

Unified Search

  • search_all: Primary search tool — searches across both datasets and dashboards with intelligent fallback and scoring
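
To make the idea of query handling more concrete, here is a minimal sketch of tokenization with plural stripping, "e" prefix handling (as in "epayment"), and synonym expansion. It is not the scoring logic `search_all` actually uses: the function `expandQuery`, the synonym table, and the list of "e"-prefixed stems are all made up for illustration.

```typescript
// Hypothetical vocabularies; the real server's tables are not documented here.
const SYNONYMS: Record<string, string[]> = {
  income: ["salary", "wage"],
};
const E_PREFIX_STEMS = new Set<string>(["payment", "commerce", "wallet"]);

// Expands a raw query into a list of normalized search terms.
function expandQuery(query: string): string[] {
  const tokens = query
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);

  const terms = new Set<string>();
  for (const token of tokens) {
    const forms = [token];
    // Naive plural handling: "datasets" -> "dataset" (but not "class" -> "clas").
    if (token.length > 3 && token.endsWith("s") && !token.endsWith("ss")) {
      forms.push(token.slice(0, -1));
    }
    for (const form of forms) {
      terms.add(form);
      // "epayment" also matches "payment" when the stem is a known word.
      if (form.startsWith("e") && E_PREFIX_STEMS.has(form.slice(1))) {
        terms.add(form.slice(1));
      }
      for (const synonym of SYNONYMS[form] ?? []) {
        terms.add(synonym);
      }
    }
  }
  return Array.from(terms);
}
```

For example, `expandQuery("ePayments")` yields `["epayments", "epayment", "payment"]`, so a dataset titled with any of those forms could be matched.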

Data Catalogue

  • list_datasets_catalogue: Lists available datasets in the Data Catalogue
  • search_datasets_catalogue: Searches datasets in the Data Catalogue
  • filter_datasets_catalogue: Filters datasets by frequency, geography, demography, data source, or year range
  • get_dataset_details: Gets metadata/details for a specific dataset
  • get_dataset_filters: Gets available filter options for datasets

Dashboards

  • list_dashboards: Lists all available dashboards
  • search_dashboards: Searches dashboards by query
  • get_dashboard_details: Gets comprehensive metadata for a dashboard
  • get_dashboard_charts: Gets chart configurations for a specific dashboard

Department of Statistics Malaysia (DOSM)

  • list_dosm_datasets: Lists available datasets from DOSM
  • get_dosm_dataset: Gets data from a specific DOSM dataset

Parquet File Handling

  • parse_parquet_file: Parse and display data from a Parquet file URL
    • Supports up to 500 rows for comprehensive data analysis
    • Automatically handles empty date objects with appropriate formatting
    • Processes BigInt values for proper JSON serialization
  • get_parquet_info: Get metadata and structure information about a Parquet file
  • find_dashboard_for_parquet: Find the corresponding dashboard URL for a Parquet file
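
The BigInt note matters because `JSON.stringify` throws a `TypeError` when it meets a BigInt value, and Parquet 64-bit integer columns are typically decoded as BigInt in JavaScript. The sketch below shows one possible way to cap rows at 500 and make them JSON-safe. The function name and the choice to emit small values as numbers and large ones as strings are assumptions, not the server's documented behaviour.

```typescript
const MAX_ROWS = 500; // row limit stated in this README

// Hypothetical helper: caps the row count and serializes BigInt values safely.
function toJsonSafeRows(rows: Record<string, unknown>[], limit: number = MAX_ROWS): string {
  const capped = rows.slice(0, Math.min(limit, MAX_ROWS));
  return JSON.stringify(capped, (_key, value) => {
    if (typeof value === "bigint") {
      const asNumber = Number(value);
      // Keep exact precision: only convert when the value fits a safe integer.
      return Number.isSafeInteger(asNumber) ? asNumber : value.toString();
    }
    return value;
  });
}
```

Values beyond the safe-integer range come out as strings, so a consumer has to be ready for either type in the same column.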

Weather

  • get_weather_forecast: Gets weather forecast for Malaysia
  • get_weather_warnings: Gets current weather warnings for Malaysia
  • get_earthquake_warnings: Gets earthquake warnings for Malaysia

Transport

  • list_transport_agencies: Lists available transport agencies with GTFS data
  • get_transport_data: Gets GTFS data for a specific transport agency

GTFS Parsing

  • parse_gtfs_static: Parses GTFS Static data (ZIP files with CSV data) for a specific transport provider
  • parse_gtfs_realtime: Parses GTFS Realtime data (Protocol Buffer format) for vehicle positions
  • get_transit_routes: Extracts route information from GTFS data
  • get_transit_stops: Extracts stop information from GTFS data, optionally filtered by route
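
GTFS Static feeds are ZIP archives of CSV files, and stop information lives in `stops.txt`. As a rough idea of what extracting stops involves once that file has been unzipped, the sketch below parses its text into objects. The column names `stop_id`, `stop_name`, `stop_lat`, and `stop_lon` come from the GTFS specification; the function `parseStops` and the `Stop` shape are hypothetical, and the parser deliberately ignores quoted fields that contain commas, which a real CSV parser would need to handle.

```typescript
interface Stop {
  stopId: string;
  name: string;
  lat: number;
  lon: number;
}

// Minimal stops.txt parser (assumes no quoted fields containing commas).
function parseStops(csvText: string): Stop[] {
  const lines = csvText
    .replace(/^\uFEFF/, "") // strip a UTF-8 byte order mark if present
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];

  const header = (lines[0] ?? "").split(",").map((name) => name.trim());
  const idCol = header.indexOf("stop_id");
  const nameCol = header.indexOf("stop_name");
  const latCol = header.indexOf("stop_lat");
  const lonCol = header.indexOf("stop_lon");
  if (idCol < 0 || nameCol < 0 || latCol < 0 || lonCol < 0) {
    throw new Error("stops.txt is missing one of: stop_id, stop_name, stop_lat, stop_lon");
  }

  return lines.slice(1).map((line) => {
    const cells = line.split(",");
    return {
      stopId: (cells[idCol] ?? "").trim(),
      name: (cells[nameCol] ?? "").trim(),
      lat: Number(cells[latCol] ?? ""),
      lon: Number(cells[lonCol] ?? ""),
    };
  });
}
```

Filtering stops by route, as `get_transit_stops` optionally does, additionally requires joining through `trips.txt` and `stop_times.txt`, which this sketch does not attempt.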

Flood Warnings

  • get_flood_warnings: Gets current flood warnings for Malaysia, filterable by state, district, and severity

Test

  • hello: A simple test tool to verify that the MCP server is working correctly

Data-Catalogue Information Retrieval

The MCP server provides robust handling for data-catalogue information retrieval:

Date Handling in Parquet Files

  • Empty Date Objects: The system automatically detects and handles empty date objects in parquet files
  • Dataset-Specific Handling: Special handling for known datasets like employment_sector with annual data from 2001-2022
  • Pattern Recognition: Detects date patterns in existing data to maintain co

README truncated. View full README on GitHub.
