Submit

πŸ† MCP-Server-Template - REFERENCE IMPLEMENTATION πŸ†

@SteffenHebestreit

a year ago
developer-tools
This repository implements a Model Context Protocol (MCP) server for web crawling capabilities, exposing crawlers as LangChain-compatible tools or any MCP-compliant client. It includes two main services:
Content

πŸ† MCP-Server-Template - REFERENCE IMPLEMENTATION πŸ†

β–ˆβ–ˆβ–ˆβ•—   β–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—     β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—
β–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β•β•β•β•β•β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—    β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β•β•β•β–ˆβ–ˆβ•”β•β•β•β•β•
β–ˆβ–ˆβ•”β–ˆβ–ˆβ–ˆβ–ˆβ•”β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘     β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•    β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—  
β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘     β–ˆβ–ˆβ•”β•β•β•β•     β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β•  β–ˆβ–ˆβ•”β•β•β•  
β–ˆβ–ˆβ•‘ β•šβ•β• β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘         β–ˆβ–ˆβ•‘  β–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘     
β•šβ•β•     β•šβ•β• β•šβ•β•β•β•β•β•β•šβ•β•         β•šβ•β•  β•šβ•β•β•šβ•β•β•β•β•β•β•β•šβ•β•     
     
   100% COMPLIANT IMPLEMENTATION βœ…

This repository implements a reference-grade Model Context Protocol (MCP) server for web crawling capabilities, exposing crawlers as tools for any MCP-compliant client. It achieves 100% compliance with the official MCP specification version 2024-11-05.

🎯 MCP Compliance Status: 100% COMPLIANT βœ…

This server achieves 100% compliance with the official Model Context Protocol specification:

βœ… Core MCP Features Implemented

  • Session Management: Complete session lifecycle with Mcp-Session-Id headers
  • Protocol Initialization: Proper initialize and notifications/initialized handshake
  • Capability Negotiation: Full support for protocol version negotiation (2024-11-05, 2025-03-26)
  • Transport Interfaces: Callbacks for onclose, onerror, onmessage
  • Standard Error Codes: All JSON-RPC 2.0 error codes (-32700, -32600, -32601, -32602, -32603)

πŸš€ Transport Protocols

  • Modern Streamable HTTP: /mcp endpoint with GET info endpoint and enhanced POST handling
  • Official SSE Pattern: Separate GET /mcp/sse (connection) + POST /mcp/messages (data)
  • Legacy SSE Support: Backward compatible POST /mcp/sse endpoint

πŸ› οΈ Method Support

  • Modern Methods: tools/list, tools/call, resources/list, resources/read
  • Legacy Methods: mcp.capabilities, mcp.tool.use, mcp.resource.list, mcp.resource.get
  • Initialization: initialize, notifications/initialized

🎯 Enhanced Features (June 2025)

  • JSON Schema Support: Created schemaConverter.ts utility for converting Joi schemas to JSON Schema format
  • Development-Friendly: Relaxed rate limiting (1 min window, 1000 requests) for testing environment
  • Enhanced Debugging: Detailed request/response logging in Streamable HTTP handler
  • HTTP Transport Improvements: Proper JSON response termination and enhanced error handling
  • Tool Definitions: Dedicated schemas for crawl and crawlWithMarkdown tools with improved parameter validation

πŸ§ͺ Testing

Run MCP compliance tests: npm run test:mcp-compliance

Documentation

  • OVERVIEW.md: High-level architecture and conceptual overview.
  • CODE_STRUCTURE.md: Detailed explanation of each source file and its purpose.
  • MCP_API.md: Detailed API endpoint specifications, JSON-RPC methods, request/response schemas, examples, and sequence diagrams.

Folder Structure

.
β”œβ”€β”€ .env                        # Environment variables configuration file
β”œβ”€β”€ .env-template               # Template for environment variables
β”œβ”€β”€ .github/                    # GitHub-specific files (workflows, templates)
β”œβ”€β”€ .gitignore                  # Git ignore configuration
β”œβ”€β”€ CODE_STRUCTURE.md           # Detailed code structure documentation
β”œβ”€β”€ CONTRIBUTING.md             # Contribution guidelines
β”œβ”€β”€ LICENSE.md                  # MIT License file
β”œβ”€β”€ MCP_API.md                  # API specifications document
β”œβ”€β”€ OVERVIEW.md                 # System overview document
β”œβ”€β”€ README.md                   # Project overview and quick start (this file)
β”œβ”€β”€ docker-compose.yml          # Defines multi-container environment
β”œβ”€β”€ package.json                # Root package with workspace configuration
β”‚
└── mcp-service/                # MCP server implementation
    β”œβ”€β”€ Dockerfile              # Docker configuration for MCP server
    β”œβ”€β”€ package.json            # Package configuration
    β”œβ”€β”€ tsconfig.json           # TypeScript configuration
    β”œβ”€β”€ tsconfig.node.json      # TypeScript Node.js-specific configuration
    └── src/
        β”œβ”€β”€ index.ts            # Entry point for MCP server
        β”œβ”€β”€ config/             # Centralized configuration
        β”‚   β”œβ”€β”€ index.ts        # Main configuration entry point
        β”‚   β”œβ”€β”€ appConfig.ts    # Application settings
        β”‚   β”œβ”€β”€ mcpConfig.ts    # MCP-specific settings
        β”‚   β”œβ”€β”€ securityConfig.ts # Security-related settings
        β”‚   β”œβ”€β”€ crawlConfig.ts  # Web crawling settings
        β”‚   └── utils.ts        # Configuration utility functions
        β”œβ”€β”€ controllers/        # API endpoint controllers
        β”‚   β”œβ”€β”€ resourceController.ts
        β”‚   └── toolController.ts
        β”œβ”€β”€ mcp/                # MCP protocol implementation
        β”‚   └── SimpleMcpServer.ts
        β”œβ”€β”€ routes/             # Route definitions
        β”‚   β”œβ”€β”€ apiRoutes.ts    # General API endpoints
        β”‚   β”œβ”€β”€ mcpRoutes.ts    # MCP-specific endpoints (SSE)
        β”‚   └── mcpStreamableRoutes.ts # MCP endpoints with Streamable HTTP
        β”œβ”€β”€ server/             # Unified server implementation
        β”‚   └── server.ts       # Express and MCP server integration        β”œβ”€β”€ services/           # Business logic services
        β”‚   └── crawlExecutionService.ts  # Web crawling service
        β”œβ”€β”€ types/              # TypeScript type definitions
        β”‚   β”œβ”€β”€ mcp.ts          # MCP type definitions
        β”‚   β”œβ”€β”€ modelcontextprotocol.d.ts # MCP SDK type declarations
        β”‚   └── module.d.ts     # Module declarations for external libraries
        β”œβ”€β”€ utils/              # Utility functions
        β”‚   β”œβ”€β”€ logger.ts       # Logging utilities
        β”‚   β”œβ”€β”€ requestLogger.ts # HTTP request logging middleware
        β”‚   └── schemaConverter.ts # Joi to JSON Schema conversion utilities
        └── test/               # Test files
            β”œβ”€β”€ mcp-compliance-test.ts # Comprehensive MCP compliance tests
            └── mcp-compliance-test-simple.ts # Simple MCP tests

Quick Start

Using Docker

docker-compose up --build

Running Locally

  1. Install dependencies:
    npm install
    
  2. Navigate to the mcp-service directory:
    cd mcp-service
    npm install
    
  3. Define environment variables (see Configuration).
  4. Build and start the server:
    npm run build
    npm start
    
    # Or for development with auto-reload:
    npm run dev
    

API Endpoints

The server provides multiple endpoints:

  • MCP Streamable HTTP (Recommended): /mcp - Modern JSON-RPC over HTTP with streaming support
  • MCP SSE (Deprecated): /mcp/sse - Legacy Server-Sent Events endpoint
  • API Endpoints: /api/health, /api/version - General server information

See detailed API documentation in MCP_API.md.

Testing Endpoints

The modern approach recommended by the MCP specification.

Capabilities Request

curl -X POST http://localhost:${PORT:-3000}/mcp \
  -H "Content-Type: application/json" \
  -d '{
      "jsonrpc": "2.0",
      "method": "mcp.capabilities",
      "params": {},
      "id": 1
    }'

Use Tool (crawl)

curl -X POST http://localhost:${PORT:-3000}/mcp \
  -H "Content-Type: application/json" \
  -d '{
      "jsonrpc": "2.0",
      "method": "mcp.tool.use",
      "params": {
        "name": "crawl",
        "parameters": { 
          "url": "https://example.com", 
          "maxPages": 3,
          "depth": 1,
          "strategy": "bfs",
          "captureScreenshots": true,
          "captureNetworkTraffic": false,
          "waitTime": 2000
        }
      },
      "id": 2
    }'

Use Tool (crawlWithMarkdown)

curl -X POST http://localhost:${PORT:-3000}/mcp \
  -H "Content-Type: application/json" \
  -d '{
      "jsonrpc": "2.0",
      "method": "mcp.tool.use",
      "params": {
        "name": "crawlWithMarkdown",
        "parameters": { 
          "url": "https://example.com", 
          "query": "What is this site about?",
          "maxPages": 2,
          "depth": 1,
          "waitTime": 1500
        }
      },
      "id": 3
    }'

MCP SSE Endpoint (Deprecated)

The legacy approach that uses Server-Sent Events (SSE).

Capabilities Request

curl -N -X POST http://localhost:${PORT:-3000}/mcp/sse \
  -H "Content-Type: application/json" \
  -d '{
      "jsonrpc": "2.0",
      "method": "mcp.capabilities",
      "params": {},
      "id": 1
    }'

Use Tool (crawl)

curl -N -X POST http://localhost:${PORT:-3000}/mcp/sse \
  -H "Content-Type: application/json" \
  -d '{
      "jsonrpc": "2.0",
      "method": "mcp.tool.use",
      "params": {
        "name": "crawl",
        "parameters": { "url": "https://example.com", "maxPages": 1 }
      },
      "id": 2
    }'

API Endpoints

Health Check

curl http://localhost:${PORT:-3000}/api/health

Response:

OK

Version Info

curl http://localhost:${PORT:-3000}/api/version

Response:

{"name":"webcrawl-mcp","version":"1.0.0","description":"MCP Server for scrape websites"}

List Available Tools

curl http://localhost:${PORT:-3000}/api/tools

Response:

{
  "success": true,
  "tools": [
    {
      "name": "crawl",
      "description": "Crawl a website and extract text content and tables.",
      "parameterDescription": "URL to crawl along with optional crawling parameters like maxPages, depth, strategy, etc.",
      "returnDescription": "Object containing success status, original URL, extracted text content, optional tables, and optional error message.",
      "endpoint": "/api/tools/crawl"
    },
    {
      "name": "crawlWithMarkdown", 
      "description": "Crawl a website and return markdown-formatted content, potentially answering a specific query.",
      "parameterDescription": "URL to crawl, optional crawling parameters, and an optional query.",
      "returnDescription": "Object containing success status, original URL, markdown content, and optional error message.",
      "endpoint": "/api/tools/crawlWithMarkdown"
    }
  ]
}

Direct Tool Execution

Crawl Tool

curl -X POST http://localhost:${PORT:-3000}/api/tools/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "maxPages": 3,
    "depth": 1,
    "strategy": "bfs",
    "captureScreenshots": true,
    "captureNetworkTraffic": false,
    "waitTime": 2000
  }'

Crawl with Markdown Tool

curl -X POST http://localhost:${PORT:-3000}/api/tools/crawlWithMarkdown \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "query": "What is this site about?",
    "maxPages": 2,
    "depth": 1
  }'

Environment Variables

  • PORT (default: 3000): Port for the MCP server.
  • NODE_ENV (default: development): Environment mode (development or production).
  • MCP_NAME, MCP_VERSION, MCP_DESCRIPTION: MCP server identification.
  • CRAWL_DEFAULT_MAX_PAGES (default: 10): Default maximum pages to crawl.
  • CRAWL_DEFAULT_DEPTH (default: 3): Default crawl depth.
  • CRAWL_DEFAULT_STRATEGY (default: bfs): Default crawl strategy (bfs|dfs|bestFirst).
  • CRAWL_DEFAULT_WAIT_TIME (default: 1000): Default wait time in ms between requests.
  • LOG_LEVEL (default: info): Logging level (debug|info|warn|error).
  • CACHE_TTL (default: 3600): Cache TTL in seconds.
  • MAX_REQUEST_SIZE (default: 10mb): Maximum HTTP payload size.
  • CORS_ORIGINS (default: *): Allowed origins for CORS.
  • RATE_LIMIT_WINDOW (default: 900000): Rate limit window in milliseconds (15 minutes).
  • RATE_LIMIT_MAX_REQUESTS (default: 100): Max requests per rate limit window.
  • PUPPETEER_EXECUTABLE_PATH (optional): Custom path to Chrome/Chromium executable for Puppeteer.
  • PUPPETEER_SKIP_DOWNLOAD (default: false): Skip automatic Chromium download during installation.

Configuration

All configuration is centralized in the src/config directory with separate modules for different aspects of the system:

  • appConfig.ts: Core application settings
  • mcpConfig.ts: MCP server specific settings
  • securityConfig.ts: Security-related settings
  • crawlConfig.ts: Web crawling default parameters

Key Components

  • Server: Unified server implementation integrating Express and MCP capabilities.
  • Routes: Organized in separate files for API and MCP endpoints.
  • SimpleMcpServer: Implements MCP discovery and tool invocation logic.
  • Controllers: toolController and resourceController for handling business logic.
  • Configuration: Centralized configuration system with module-specific settings.
  • MCP Transport: Supports both modern Streamable HTTP and legacy SSE transport methods.
  • Web Crawling: Advanced Puppeteer-based crawler with error handling, multiple strategies, and content extraction capabilities.

Web Crawling Features

The MCP server includes a comprehensive web crawling service with the following features:

Browser Management

  • Robust Error Handling: Improved browser initialization with proper error handling and fallback mechanisms
  • Custom Chrome Path: Support for custom Chrome/Chromium executable paths via PUPPETEER_EXECUTABLE_PATH
  • Resource Management: Automatic browser cleanup and connection management

Content Extraction

  • Text Extraction: Intelligent visible text extraction that skips hidden elements and scripts
  • Markdown Conversion: HTML to Markdown conversion with custom rules for tables and code blocks
  • Table Extraction: Structured extraction of HTML tables with caption support
  • Screenshot Capture: Optional full-page screenshot functionality

Crawling Strategies

  • Breadth-First Search (BFS): Default strategy for systematic exploration
  • Depth-First Search (DFS): Prioritizes deeper paths in the site structure
  • Best-First Search: Simple heuristic-based crawling (shorter paths first)

Advanced Options

  • Network Traffic Monitoring: Optional tracking of HTTP requests during crawling
  • Multi-page Crawling: Support for crawling multiple pages with depth control
  • Wait Time Configuration: Configurable delays between page loads
  • Screenshot Capture: Optional full-page screenshots saved to temporary directory

Error Resilience

  • Browser Launch Fallbacks: Multiple strategies for browser initialization
  • Connection Recovery: Automatic reconnection handling for disconnected browsers
  • URL Validation: Robust URL parsing and validation with error handling
  • Timeout Management: Configurable timeouts to prevent hanging on slow pages

Customization

  • Add new routes in the routes directory.
  • Extend MCP capabilities by modifying SimpleMcpServer or adding new controllers.
  • Tune performance and security via environment variables in the config directory.

Architecture Diagram

graph LR
  Client --> Server
  Server --> Router["Routes (API/MCP)"]
  Router --> |SSE| SimpleMcpServer
  Router --> |Streamable HTTP| SimpleMcpServer
  Router --> ApiControllers
  SimpleMcpServer --> Controllers
  Controllers --> CrawlExecutionService

References

License

This project is licensed under the MIT License (see the license field in package.json).

Β© 2025 MCP.so. All rights reserved.

Build with ShipAny.