
Webpage Timestamps
Extracts when web pages were created, modified, or published by analyzing HTML meta tags, HTTP headers, JSON-LD structured data, microdata, OpenGraph, and Twitter cards. Returns timestamps with confidence scores and intelligent consolidation to help you assess content freshness and publication dates.
What it does
- Extract creation timestamps from webpages
- Extract modification timestamps from webpages
- Extract publication timestamps from webpages
- Process multiple URLs in batch
- Analyze JSON-LD and microdata for dates
- Score timestamp confidence levels
About Webpage Timestamps
Webpage Timestamps is a community-built MCP server published by fabien-desablens that provides AI assistants with tools and capabilities via the Model Context Protocol. Webpage Timestamps extracts and consolidates creation, modification, and publication dates from web pages for accurate freshness analysis. It is categorized under search/web and analytics/data. This server exposes 2 tools that AI clients can invoke during conversations and coding sessions.
How to install
You can install Webpage Timestamps in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.
License
Webpage Timestamps is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.
Tools (2)
- `extract_timestamps`: Extract creation, modification, and publication timestamps from a webpage
- `batch_extract_timestamps`: Extract timestamps from multiple webpages in batch
MCP Webpage Timestamps
A powerful Model Context Protocol (MCP) server for extracting webpage creation, modification, and publication timestamps. This tool is designed for web scraping and temporal analysis of web content.
Features
- Comprehensive Timestamp Extraction: Extracts creation, modification, and publication timestamps from webpages
- Multiple Data Sources: Supports HTML meta tags, HTTP headers, JSON-LD, microdata, OpenGraph, Twitter cards, and heuristic analysis
- Confidence Scoring: Provides confidence levels (high/medium/low) for extracted timestamps
- Batch Processing: Extract timestamps from multiple URLs simultaneously
- Configurable: Customizable timeout, user agent, redirect handling, and heuristic options
- Production Ready: Robust error handling, comprehensive logging, and TypeScript support
Installation
Quick Install
npm install -g mcp-webpage-timestamps
Usage with npx
npx mcp-webpage-timestamps
Installing via Smithery
To install mcp-webpage-timestamps for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @Fabien-desablens/mcp-webpage-timestamps --client claude
Prerequisites
- Node.js 18.0.0 or higher
- npm or yarn
Development Install
git clone https://github.com/Fabien-desablens/mcp-webpage-timestamps.git
cd mcp-webpage-timestamps
npm install
npm run build
Usage
As MCP Server
The server can be used with any MCP-compatible client. Here's how to configure it:
Claude Desktop Configuration
Add to your claude_desktop_config.json:
{
"mcpServers": {
"webpage-timestamps": {
"command": "npx",
"args": ["mcp-webpage-timestamps"],
"env": {}
}
}
}
Cline Configuration
Add to your MCP settings:
{
"mcpServers": {
"webpage-timestamps": {
"command": "npx",
"args": ["mcp-webpage-timestamps"]
}
}
}
Direct Usage
# Start the server
npm start
# Or run in development mode
npm run dev
API Reference
Tools
extract_timestamps
Extract timestamps from a single webpage.
Parameters:
- `url` (string, required): The URL of the webpage to extract timestamps from
- `config` (object, optional): Configuration options
Configuration Options:
- `timeout` (number): Request timeout in milliseconds (default: 10000)
- `userAgent` (string): User agent string for requests
- `followRedirects` (boolean): Whether to follow HTTP redirects (default: true)
- `maxRedirects` (number): Maximum number of redirects to follow (default: 5)
- `enableHeuristics` (boolean): Enable heuristic timestamp detection (default: true)
Example:
{
"name": "extract_timestamps",
"arguments": {
"url": "https://example.com/article",
"config": {
"timeout": 15000,
"enableHeuristics": true
}
}
}
batch_extract_timestamps
Extract timestamps from multiple webpages in batch.
Parameters:
- `urls` (array of strings, required): Array of URLs to extract timestamps from
- `config` (object, optional): Same configuration options as `extract_timestamps`
Example:
{
"name": "batch_extract_timestamps",
"arguments": {
"urls": [
"https://example.com/article1",
"https://example.com/article2",
"https://example.com/article3"
],
"config": {
"timeout": 10000
}
}
}
Response Format
Both tools return a result object with the following structure (shown in TypeScript notation):
{
url: string;
createdAt?: Date;
modifiedAt?: Date;
publishedAt?: Date;
sources: TimestampSource[];
confidence: 'high' | 'medium' | 'low';
errors?: string[];
}
TimestampSource:
{
type: 'html-meta' | 'http-header' | 'json-ld' | 'microdata' | 'opengraph' | 'twitter' | 'heuristic';
field: string;
value: string;
confidence: 'high' | 'medium' | 'low';
}
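The server consolidates sources internally, but when working with the raw `sources` array on the client side you can rank entries yourself. The following is an illustrative sketch (not part of the package's API) that picks the highest-confidence source:

```typescript
// Illustrative helper: pick the highest-confidence entry from a
// result's `sources` array. The types mirror the structure above.
type Confidence = 'high' | 'medium' | 'low';

interface TimestampSource {
  type: string;
  field: string;
  value: string;
  confidence: Confidence;
}

const rank: Record<Confidence, number> = { high: 3, medium: 2, low: 1 };

function bestSource(sources: TimestampSource[]): TimestampSource | undefined {
  // Copy before sorting so the caller's array is left untouched.
  return [...sources].sort((a, b) => rank[b.confidence] - rank[a.confidence])[0];
}
```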
Supported Timestamp Sources
HTML Meta Tags
- `article:published_time`
- `article:modified_time`
- `date`
- `pubdate`
- `publishdate`
- `last-modified`
- `dc.date.created`
- `dc.date.modified`
- `dcterms.created`
- `dcterms.modified`
HTTP Headers
- `Last-Modified`
- `Date`
JSON-LD Structured Data
- `datePublished`
- `dateModified`
- `dateCreated`
Microdata
- `datePublished`
- `dateModified`
OpenGraph
- `og:article:published_time`
- `og:article:modified_time`
- `og:updated_time`
Twitter Cards
- `twitter:data1` (when containing date information)
Heuristic Analysis
- Time elements with `datetime` attributes
- Common date patterns in text
- Date-related CSS classes
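The server itself parses HTML with Cheerio, but as a rough sketch of what a meta-tag lookup involves, a minimal regex-based version (illustrative only, with a hypothetical `findMetaDate` helper) might look like:

```typescript
// Illustrative sketch only: look up a <meta> tag's content attribute
// by its property/name. Assumes the property appears before content,
// which is typical but not guaranteed in real-world HTML.
function findMetaDate(html: string, name: string): string | undefined {
  const re = new RegExp(
    `<meta[^>]+(?:property|name)=["']${name}["'][^>]+content=["']([^"']+)["']`,
    'i'
  );
  return re.exec(html)?.[1];
}
```

A proper HTML parser is preferable in practice; regexes break on attribute reordering and unusual quoting.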
Development
Scripts
# Development with hot reload
npm run dev
# Build the project
npm run build
# Run tests
npm test
# Run tests in watch mode
npm run test:watch
# Lint code
npm run lint
# Fix linting issues
npm run lint:fix
# Format code
npm run format
Testing
The project includes comprehensive tests:
# Run all tests
npm test
# Run tests with coverage
npm test -- --coverage
# Run specific test file
npm test -- extractor.test.ts
Code Quality
- TypeScript: Full TypeScript support with strict type checking
- ESLint: Code linting with recommended rules
- Prettier: Code formatting
- Jest: Unit and integration testing
- 95%+ Test Coverage: Comprehensive test suite
Examples
Basic Usage
import { TimestampExtractor } from './src/extractor.js';
const extractor = new TimestampExtractor();
const result = await extractor.extractTimestamps('https://example.com/article');
console.log('Published:', result.publishedAt);
console.log('Modified:', result.modifiedAt);
console.log('Confidence:', result.confidence);
console.log('Sources:', result.sources.length);
Custom Configuration
const extractor = new TimestampExtractor({
timeout: 15000,
userAgent: 'MyBot/1.0',
enableHeuristics: false,
maxRedirects: 3
});
const result = await extractor.extractTimestamps('https://example.com');
Batch Processing
const urls = [
'https://example.com/article1',
'https://example.com/article2',
'https://example.com/article3'
];
const results = await Promise.all(
urls.map(url => extractor.extractTimestamps(url))
);
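Note that `Promise.all` rejects the whole batch if any single extraction throws. If partial results are acceptable, a `Promise.allSettled` variant (a sketch, not from the project docs; `batchSafe` is a hypothetical name) keeps the successes and records the failures:

```typescript
// Sketch: isolate per-URL failures so one bad URL doesn't sink the batch.
// `extract` stands in for a call like extractor.extractTimestamps(url).
async function batchSafe<T>(
  urls: string[],
  extract: (url: string) => Promise<T>
): Promise<Array<{ url: string; result?: T; error?: string }>> {
  const settled = await Promise.allSettled(urls.map(extract));
  return settled.map((s, i) =>
    s.status === 'fulfilled'
      ? { url: urls[i], result: s.value }
      : { url: urls[i], error: String(s.reason) }
  );
}
```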
Use Cases
- Content Analysis: Analyze temporal aspects of web content
- Web Scraping: Extract temporal metadata from scraped pages
- SEO Analysis: Analyze publication and modification patterns
- Research: Study temporal aspects of web content
- Content Management: Track content lifecycle and updates
Error Handling
The extractor handles various error conditions gracefully:
- Network Errors: Timeout, connection refused, DNS resolution failures
- HTTP Errors: 404, 500, and other HTTP status codes
- Parsing Errors: Invalid HTML, malformed JSON-LD, unparseable dates
- Configuration Errors: Invalid URLs, timeout values, etc.
All errors are captured in the errors array of the response, allowing for robust error handling and debugging.
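Since errors are reported in the `errors` array rather than thrown, a consumer should check it before trusting the extracted dates. A minimal sketch of such a guard (the `isUsable` predicate and its threshold are illustrative, not part of the package):

```typescript
// Sketch: treat a result as usable only if it reported no errors and
// its consolidated confidence is above 'low'.
interface ExtractionResult {
  url: string;
  publishedAt?: Date;
  confidence: 'high' | 'medium' | 'low';
  errors?: string[];
}

function isUsable(r: ExtractionResult): boolean {
  return (!r.errors || r.errors.length === 0) && r.confidence !== 'low';
}
```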
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
- Fork the repository
- Clone your fork:
`git clone https://github.com/Fabien-desablens/mcp-webpage-timestamps.git`
- Install dependencies: `npm install`
- Create a branch: `git checkout -b feature/your-feature`
- Make your changes
- Run tests: `npm test`
- Commit your changes: `git commit -m 'Add some feature'`
- Push to the branch: `git push origin feature/your-feature`
- Submit a pull request
Code Style
- Follow the existing code style
- Use TypeScript for all new code
- Add tests for new functionality
- Update documentation as needed
License
MIT License - see the LICENSE file for details.
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Wiki
Changelog
See CHANGELOG.md for a detailed history of changes.
Acknowledgments
- Model Context Protocol for the excellent MCP framework
- Cheerio for HTML parsing
- Axios for HTTP requests
- date-fns for date parsing and manipulation