
Webpage Timestamps
Extracts when web pages were created, modified, or published by analyzing HTML meta tags, HTTP headers, JSON-LD structured data, microdata, OpenGraph, and Twitter cards. Returns timestamps with confidence scores and intelligent consolidation to help you assess content freshness and publication dates.
What it does
- Extract creation timestamps from webpages
- Extract modification timestamps from webpages
- Extract publication timestamps from webpages
- Process multiple URLs in batch
- Analyze JSON-LD and microdata for dates
- Score timestamp confidence levels
About Webpage Timestamps
Webpage Timestamps is a community-built MCP server published by fabien-desablens that provides AI assistants with tools and capabilities via the Model Context Protocol. Webpage Timestamps extracts and consolidates creation, modification, and publication dates from web pages for accurate freshness analysis. It is categorized under search/web and analytics/data. This server exposes 2 tools that AI clients can invoke during conversations and coding sessions.
How to install
You can install Webpage Timestamps in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.
License
Webpage Timestamps is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.
Tools (2)
- `extract_timestamps`: Extract creation, modification, and publication timestamps from a webpage
- `batch_extract_timestamps`: Extract timestamps from multiple webpages in batch
MCP Webpage Timestamps
A powerful Model Context Protocol (MCP) server for extracting webpage creation, modification, and publication timestamps. This tool is designed for web scraping and temporal analysis of web content.
Features
- Comprehensive Timestamp Extraction: Extracts creation, modification, and publication timestamps from webpages
- Multiple Data Sources: Supports HTML meta tags, HTTP headers, JSON-LD, microdata, OpenGraph, Twitter cards, and heuristic analysis
- Confidence Scoring: Provides confidence levels (high/medium/low) for extracted timestamps
- Batch Processing: Extract timestamps from multiple URLs simultaneously
- Configurable: Customizable timeout, user agent, redirect handling, and heuristic options
- Production Ready: Robust error handling, comprehensive logging, and TypeScript support
Installation
Quick Install
npm install -g mcp-webpage-timestamps
Usage with npx
npx mcp-webpage-timestamps
Installing via Smithery
To install mcp-webpage-timestamps for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @Fabien-desablens/mcp-webpage-timestamps --client claude
Prerequisites
- Node.js 18.0.0 or higher
- npm or yarn
Development Install
git clone https://github.com/Fabien-desablens/mcp-webpage-timestamps.git
cd mcp-webpage-timestamps
npm install
npm run build
Usage
As MCP Server
The server can be used with any MCP-compatible client. Here's how to configure it:
Claude Desktop Configuration
Add to your claude_desktop_config.json:
{
"mcpServers": {
"webpage-timestamps": {
"command": "npx",
"args": ["mcp-webpage-timestamps"],
"env": {}
}
}
}
Cline Configuration
Add to your MCP settings:
{
"mcpServers": {
"webpage-timestamps": {
"command": "npx",
"args": ["mcp-webpage-timestamps"]
}
}
}
Direct Usage
# Start the server
npm start
# Or run in development mode
npm run dev
API Reference
Tools
extract_timestamps
Extract timestamps from a single webpage.
Parameters:
- `url` (string, required): The URL of the webpage to extract timestamps from
- `config` (object, optional): Configuration options
Configuration Options:
- `timeout` (number): Request timeout in milliseconds (default: 10000)
- `userAgent` (string): User agent string for requests
- `followRedirects` (boolean): Whether to follow HTTP redirects (default: true)
- `maxRedirects` (number): Maximum number of redirects to follow (default: 5)
- `enableHeuristics` (boolean): Enable heuristic timestamp detection (default: true)
Example:
{
"name": "extract_timestamps",
"arguments": {
"url": "https://example.com/article",
"config": {
"timeout": 15000,
"enableHeuristics": true
}
}
}
batch_extract_timestamps
Extract timestamps from multiple webpages in batch.
Parameters:
- `urls` (array of strings, required): Array of URLs to extract timestamps from
- `config` (object, optional): Same configuration options as `extract_timestamps`
Example:
{
"name": "batch_extract_timestamps",
"arguments": {
"urls": [
"https://example.com/article1",
"https://example.com/article2",
"https://example.com/article3"
],
"config": {
"timeout": 10000
}
}
}
Response Format
Both tools return a result object with the following structure (shown in TypeScript notation):
{
url: string;
createdAt?: Date;
modifiedAt?: Date;
publishedAt?: Date;
sources: TimestampSource[];
confidence: 'high' | 'medium' | 'low';
errors?: string[];
}
TimestampSource:
{
type: 'html-meta' | 'http-header' | 'json-ld' | 'microdata' | 'opengraph' | 'twitter' | 'heuristic';
field: string;
value: string;
confidence: 'high' | 'medium' | 'low';
}
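The server consolidates sources internally, but when working with the raw `sources` array on the client side you can rank entries yourself. The following is an illustrative sketch (not part of the package's API) that picks the highest-confidence source:

```typescript
// Illustrative helper: pick the highest-confidence entry from a
// result's `sources` array. The types mirror the structure above.
type Confidence = 'high' | 'medium' | 'low';

interface TimestampSource {
  type: string;
  field: string;
  value: string;
  confidence: Confidence;
}

const rank: Record<Confidence, number> = { high: 3, medium: 2, low: 1 };

function bestSource(sources: TimestampSource[]): TimestampSource | undefined {
  // Copy before sorting so the caller's array is left untouched.
  return [...sources].sort((a, b) => rank[b.confidence] - rank[a.confidence])[0];
}
```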
Supported Timestamp Sources
HTML Meta Tags
- `article:published_time`
- `article:modified_time`
- `date`
- `pubdate`
- `publishdate`
- `last-modified`
- `dc.date.created`
- `dc.date.modified`
- `dcterms.created`
- `dcterms.modified`
HTTP Headers
- `Last-Modified`
- `Date`
JSON-LD Structured Data
- `datePublished`
- `dateModified`
- `dateCreated`
Microdata
- `datePublished`
- `dateModified`
OpenGraph
- `og:article:published_time`
- `og:article:modified_time`
- `og:updated_time`
Twitter Cards
- `twitter:data1` (when containing date information)
Heuristic Analysis
- Time elements with `datetime` attributes
- Common date patterns in text
- Date-related CSS classes
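The server itself parses HTML with Cheerio, but as a rough sketch of what a meta-tag lookup involves, a minimal regex-based version (illustrative only, with a hypothetical `findMetaDate` helper) might look like:

```typescript
// Illustrative sketch only: look up a <meta> tag's content attribute
// by its property/name. Assumes the property appears before content,
// which is typical but not guaranteed in real-world HTML.
function findMetaDate(html: string, name: string): string | undefined {
  const re = new RegExp(
    `<meta[^>]+(?:property|name)=["']${name}["'][^>]+content=["']([^"']+)["']`,
    'i'
  );
  return re.exec(html)?.[1];
}
```

A proper HTML parser is preferable in practice; regexes break on attribute reordering and unusual quoting.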
Development
Scripts
# Development with hot reload
npm run dev
# Build the project
npm run build
# Run tests
npm test
# Run tests in watch mode
npm run test:watch
# Lint code
npm run lint
# Fix linting issues
npm run lint:fix
# Format code
npm run format
Testing
The project includes comprehensive tests:
# Run all tests
npm test
# Run tests with coverage
npm test -- --coverage
# Run specific test file
npm test -- extractor.test.ts
Code Quality
- TypeScript: Full TypeScript support with strict type checking
- ESLint: Code linting with recommended rules
- Prettier: Code formatting
- Jest: Unit and integration testing
- 95%+ Test Coverage: Comprehensive test suite
Examples
Basic Usage
import { TimestampExtractor } from './src/extractor.js';
const extractor = new TimestampExtractor();
const result = await extractor.extractTimestamps('https://example.com/article');
console.log('Published:', result.publishedAt);
console.log('Modified:', result.modifiedAt);
console.log('Confidence:', result.confidence);
console.log('Sources:', result.sources.length);
Custom Configuration
const extractor = new TimestampExtractor({
timeout: 15000,
userAgent: 'MyBot/1.0',
enableHeuristics: false,
maxRedirects: 3
});
const result = await extractor.extractTimestamps('https://example.com');
Batch Processing
const urls = [
'https://example.com/article1',
'https://example.com/article2',
'https://example.com/article3'
];
const results = await Promise.all(
urls.map(url => extractor.extractTimestamps(url))
);
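Note that `Promise.all` rejects the whole batch if any single extraction throws. If partial results are acceptable, a `Promise.allSettled` variant (a sketch, not from the project docs; `batchSafe` is a hypothetical name) keeps the successes and records the failures:

```typescript
// Sketch: isolate per-URL failures so one bad URL doesn't sink the batch.
// `extract` stands in for a call like extractor.extractTimestamps(url).
async function batchSafe<T>(
  urls: string[],
  extract: (url: string) => Promise<T>
): Promise<Array<{ url: string; result?: T; error?: string }>> {
  const settled = await Promise.allSettled(urls.map(extract));
  return settled.map((s, i) =>
    s.status === 'fulfilled'
      ? { url: urls[i], result: s.value }
      : { url: urls[i], error: String(s.reason) }
  );
}
```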
Use Cases
- Content Analysis: Analyze temporal aspects of web content
- Web Scraping: Extract temporal metadata from scraped pages
- SEO Analysis: Analyze publication and modification patterns
- Research: Study temporal aspects of web content
- Content Management: Track content lifecycle and updates
Error Handling
The extractor handles various error conditions gracefully:
- Network Errors: Timeout, connection refused, DNS resolution failures
- HTTP Errors: 404, 500, and other HTTP status codes
- Parsing Errors: Invalid HTML, malformed JSON-LD, unparseable dates
- Configuration Errors: Invalid URLs, timeout values, etc.
All errors are captured in the errors array of the response, allowing for robust error handling and debugging.
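Since errors are reported in the `errors` array rather than thrown, a consumer should check it before trusting the extracted dates. A minimal sketch of such a guard (the `isUsable` predicate and its threshold are illustrative, not part of the package):

```typescript
// Sketch: treat a result as usable only if it reported no errors and
// its consolidated confidence is above 'low'.
interface ExtractionResult {
  url: string;
  publishedAt?: Date;
  confidence: 'high' | 'medium' | 'low';
  errors?: string[];
}

function isUsable(r: ExtractionResult): boolean {
  return (!r.errors || r.errors.length === 0) && r.confidence !== 'low';
}
```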
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
- Fork the repository
- Clone your fork:
`git clone https://github.com/Fabien-desablens/mcp-webpage-timestamps.git`
- Install dependencies: `npm install`
- Create a branch: `git checkout -b feature/your-feature`
- Make your changes
- Run tests: `npm test`
- Commit your changes: `git commit -m 'Add some feature'`
- Push to the branch: `git push origin feature/your-feature`
- Submit a pull request
Code Style
- Follow the existing code style
- Use TypeScript for all new code
- Add tests for new functionality
- Update documentation as needed
License
MIT License - see the LICENSE file for details.
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Wiki
Changelog
See CHANGELOG.md for a detailed history of changes.
Acknowledgments
- Model Context Protocol for the excellent MCP framework
- Cheerio for HTML parsing
- Axios for HTTP requests
- date-fns for date parsing and manipulation