anthropics/mcp-builder
Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate external APIs or services, whether in Python (FastMCP) or Node/TypeScript (MCP SDK).

MCP Server Development Guide

Overview

Create MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. The quality of an MCP server is measured by how well it enables LLMs to accomplish real-world tasks.


Process

🚀 High-Level Workflow

Creating a high-quality MCP server involves four main phases:

Phase 1: Deep Research and Planning

1.1 Understand Modern MCP Design

API Coverage vs. Workflow Tools: Balance comprehensive API endpoint coverage with specialized workflow tools. Workflow tools can be more convenient for specific tasks, while comprehensive coverage gives agents flexibility to compose operations. Performance varies by client—some clients benefit from code execution that combines basic tools, while others work better with higher-level workflows. When uncertain, prioritize comprehensive API coverage.

Tool Naming and Discoverability: Clear, descriptive tool names help agents find the right tools quickly. Use consistent prefixes (e.g., github_create_issue, github_list_repos) and action-oriented naming.

Context Management: Agents benefit from concise tool descriptions and the ability to filter/paginate results. Design tools that return focused, relevant data. Some clients support code execution which can help agents filter and process data efficiently.

Actionable Error Messages: Error messages should guide agents toward solutions with specific suggestions and next steps.
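
For example, a sketch of an actionable error result using the TypeScript SDK's CallToolResult shape (the github_list_issues tool and repository names here are hypothetical, not from this skill's reference files):

import type { CallToolResult } from "@modelcontextprotocol/sdk/types.js";

function issueNotFound(repo: string, issueNumber: number): CallToolResult {
  return {
    isError: true,
    content: [{
      type: "text",
      // Say what failed, the likely cause, and the concrete next step.
      text: `Issue #${issueNumber} not found in '${repo}'. ` +
        `Verify the repository name, or call github_list_issues with ` +
        `repo='${repo}' to see valid issue numbers.`,
    }],
  };
}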

1.2 Study MCP Protocol Documentation

Navigate the MCP specification:

Start with the sitemap to find relevant pages: https://modelcontextprotocol.io/sitemap.xml

Then fetch specific pages with the .md suffix for Markdown format (e.g., https://modelcontextprotocol.io/specification/draft.md).

Key pages to review:

  • Specification overview and architecture
  • Transport mechanisms (streamable HTTP, stdio)
  • Tool, resource, and prompt definitions

1.3 Study Framework Documentation

Recommended stack:

  • Language: TypeScript (high-quality SDK support and good compatibility across many execution environments, e.g., MCPB; AI models also generate TypeScript well, helped by its broad usage, static typing, and good linting tools)
  • Transport: Streamable HTTP with stateless JSON responses for remote servers (simpler to scale and maintain than stateful sessions and streamed responses); stdio for local servers. A minimal setup sketch follows this list.
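
A minimal sketch of this stack, assuming the TypeScript SDK's McpServer connected over stdio for local use (the server name is a placeholder; the streamable HTTP variant is noted in comments):

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";

const server = new McpServer({ name: "example-service", version: "1.0.0" });

// ...registerTool calls (see Phase 2.3) go here...

// Local servers: stdio transport.
const transport = new StdioServerTransport();
await server.connect(transport);

// Remote servers: use StreamableHTTPServerTransport from
// "@modelcontextprotocol/sdk/server/streamableHttp.js", configured stateless
// (sessionIdGenerator: undefined) with enableJsonResponse: true, and mounted
// on an HTTP route that calls transport.handleRequest(req, res, req.body).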

Load framework documentation:

For TypeScript (recommended):

  • TypeScript SDK: Use WebFetch to load https://raw.githubusercontent.com/modelcontextprotocol/typescript-sdk/main/README.md
  • ⚡ TypeScript Guide - TypeScript patterns and examples

For Python:

  • Python SDK: Use WebFetch to load https://raw.githubusercontent.com/modelcontextprotocol/python-sdk/main/README.md
  • 🐍 Python Guide - Python patterns and examples

1.4 Plan Your Implementation

Understand the API: Review the service's API documentation to identify key endpoints, authentication requirements, and data models. Use web search and WebFetch as needed.

Tool Selection: Prioritize comprehensive API coverage. List endpoints to implement, starting with the most common operations.


Phase 2: Implementation

2.1 Set Up Project Structure

See the language-specific implementation guides (listed under Reference Files below) for project setup.

2.2 Implement Core Infrastructure

Create shared utilities (a client sketch follows this list):

  • API client with authentication
  • Error handling helpers
  • Response formatting (JSON/Markdown)
  • Pagination support
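
A minimal sketch of such a client, assuming a hypothetical Bearer-token API at api.example.com with cursor-based pagination (adapt the auth and paging scheme to the real service):

const BASE_URL = "https://api.example.com";

export async function apiRequest<T>(
  path: string,
  params: Record<string, string> = {},
): Promise<T> {
  const url = new URL(path, BASE_URL);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value); // includes cursor/limit for pagination
  }
  const response = await fetch(url, {
    headers: { Authorization: `Bearer ${process.env.API_TOKEN}` },
  });
  if (!response.ok) {
    // Preserve status and body so tool handlers can build actionable messages.
    throw new Error(
      `API error ${response.status} ${response.statusText}: ${await response.text()}`,
    );
  }
  return (await response.json()) as T;
}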

2.3 Implement Tools

For each tool:

Input Schema (see the example after this list):

  • Use Zod (TypeScript) or Pydantic (Python)
  • Include constraints and clear descriptions
  • Add examples in field descriptions
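
For instance, a Zod schema sketch for the hypothetical github_list_issues tool, written as the plain object of validators (a ZodRawShape) that registerTool's inputSchema expects:

import { z } from "zod";

export const listIssuesInput = {
  repo: z.string()
    .regex(/^[\w.-]+\/[\w.-]+$/)
    .describe("Repository in 'owner/name' form, e.g. 'acme/web'"),
  state: z.enum(["open", "closed", "all"]).default("open")
    .describe("Filter issues by state"),
  limit: z.number().int().min(1).max(100).default(30)
    .describe("Maximum results per page"),
  cursor: z.string().optional()
    .describe("Opaque pagination cursor from a previous response"),
};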

Output Schema:

  • Define outputSchema where possible for structured data
  • Use structuredContent in tool responses (TypeScript SDK feature)
  • Helps clients understand and process tool outputs

Tool Description:

  • Concise summary of functionality
  • Parameter descriptions
  • Return type schema

Implementation:

  • Async/await for I/O operations
  • Proper error handling with actionable messages
  • Support pagination where applicable
  • Return both text content and structured data when using modern SDKs

Annotations (a combined registration example follows this list):

  • readOnlyHint: true if the tool does not modify its environment
  • destructiveHint: true if the tool may perform destructive updates
  • idempotentHint: true if repeated calls with the same arguments have no additional effect
  • openWorldHint: true if the tool interacts with external entities (e.g., the web)
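
Putting the pieces together, a sketch of a complete registration using the hypothetical apiRequest helper and listIssuesInput schema from the earlier sketches, on the server instance from Phase 1.3 (the endpoint shape is illustrative, not a real API):

import { z } from "zod";

server.registerTool(
  "github_list_issues",
  {
    title: "List issues",
    description: "List issues in a repository, with state filtering and pagination.",
    inputSchema: listIssuesInput,
    outputSchema: {
      issues: z.array(z.object({ number: z.number(), title: z.string() })),
      nextCursor: z.string().optional(),
    },
    annotations: {
      readOnlyHint: true,
      destructiveHint: false,
      idempotentHint: true,
      openWorldHint: true, // calls an external service
    },
  },
  async ({ repo, state, limit, cursor }) => {
    const data = await apiRequest<{
      issues: { number: number; title: string }[];
      nextCursor?: string;
    }>(`/repos/${repo}/issues`, {
      state,
      limit: String(limit),
      ...(cursor ? { cursor } : {}),
    });
    return {
      // Text content for clients that only render text...
      content: [{ type: "text", text: JSON.stringify(data, null, 2) }],
      // ...plus structured data matching outputSchema.
      structuredContent: data,
    };
  },
);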

Phase 3: Review and Test

3.1 Code Quality

Review for:

  • No duplicated code (DRY principle)
  • Consistent error handling
  • Full type coverage
  • Clear tool descriptions

3.2 Build and Test

TypeScript:

  • Run npm run build to verify compilation
  • Test with MCP Inspector: npx @modelcontextprotocol/inspector (e.g., npx @modelcontextprotocol/inspector node dist/index.js to launch your built server)

Python:

  • Verify syntax: python -m py_compile your_server.py
  • Test with MCP Inspector

See language-specific guides for detailed testing approaches and quality checklists.


Phase 4: Create Evaluations

After implementing your MCP server, create comprehensive evaluations to test its effectiveness.

Load the ✅ Evaluation Guide for complete evaluation guidelines.

4.1 Understand Evaluation Purpose

Use evaluations to test whether LLMs can effectively use your MCP server to answer realistic, complex questions.

4.2 Create 10 Evaluation Questions

To create effective evaluations, follow the process outlined in the evaluation guide:

  1. Tool Inspection: List available tools and understand their capabilities
  2. Content Exploration: Use READ-ONLY operations to explore available data
  3. Question Generation: Create 10 complex, realistic questions
  4. Answer Verification: Solve each question yourself to verify answers

4.3 Evaluation Requirements

Ensure each question is:

  • Independent: Not dependent on other questions
  • Read-only: Only non-destructive operations required
  • Complex: Requiring multiple tool calls and deep exploration
  • Realistic: Based on real use cases humans would care about
  • Verifiable: Single, clear answer that can be verified by string comparison
  • Stable: Answer won't change over time

4.4 Output Format

Create an XML file with this structure:

<evaluation>
  <qa_pair>
    <question>Find discussions about AI model launches with animal codenames. One model needed a specific safety designation that uses the format ASL-X. What number X was being determined for the model named after a spotted wild cat?</question>
    <answer>3</answer>
  </qa_pair>
<!-- More qa_pairs... -->
</evaluation>

Reference Files

📚 Documentation Library

Load these resources as needed during development:

Core MCP Documentation (Load First)

  • MCP Protocol: Start with the sitemap at https://modelcontextprotocol.io/sitemap.xml, then fetch specific pages with the .md suffix
  • 📋 MCP Best Practices - Universal MCP guidelines including:
    • Server and tool naming conventions
    • Response format guidelines (JSON vs Markdown)
    • Pagination best practices
    • Transport selection (streamable HTTP vs stdio)
    • Security and error handling standards

SDK Documentation (Load During Phase 1/2)

  • Python SDK: Fetch from https://raw.githubusercontent.com/modelcontextprotocol/python-sdk/main/README.md
  • TypeScript SDK: Fetch from https://raw.githubusercontent.com/modelcontextprotocol/typescript-sdk/main/README.md

Language-Specific Implementation Guides (Load During Phase 2)

  • 🐍 Python Implementation Guide - Complete Python/FastMCP guide with:

    • Server initialization patterns
    • Pydantic model examples
    • Tool registration with @mcp.tool
    • Complete working examples
    • Quality checklist
  • ⚡ TypeScript Implementation Guide - Complete TypeScript guide with:

    • Project structure
    • Zod schema patterns
    • Tool registration with server.registerTool
    • Complete working examples
    • Quality checklist

Evaluation Guide (Load During Phase 4)

  • ✅ Evaluation Guide - Complete evaluation creation guide with:
    • Question creation guidelines
    • Answer verification strategies
    • XML format specifications
    • Example questions and answers
    • Running an evaluation with the provided scripts

Grade adjusted by static analysis guardrails

AI scored this skill as grade A, but static analysis findings capped it to B:

  • Recursive deletion pattern (rm -rf) (max: B)

Overall Score: 86/100

Grade: B (Good)

  • Safety: 82
  • Quality: 90
  • Clarity: 85
  • Completeness: 87

Summary

This skill guides developers through creating Model Context Protocol (MCP) servers that enable LLMs to interact with external APIs and services. It provides a structured 4-phase workflow covering research, implementation, testing, and evaluation, with language-specific guides for Python (FastMCP) and TypeScript (MCP SDK), plus reference materials on best practices and evaluation methodologies.

Static Analysis Findings

2 findings

Patterns detected by deterministic static analysis before AI scoring.

Credential Exposure

SEC-020: Direct .env File Access (8x in 3 files)

  • scripts/evaluation.py: .env (2x)
  • scripts/connections.py: .env (2x)
  • reference/node_mcp_server.md: .env (4x)

Destructive Operation

SEC-001: Recursive Deletion (caps grade at B)

  • reference/node_mcp_server.md: rm -rf

Detected Capabilities

  • Guidance on MCP protocol design and tool naming conventions
  • Python and TypeScript implementation frameworks and patterns
  • Input validation strategies using Pydantic (Python) and Zod (TypeScript)
  • Pagination, error handling, and response formatting techniques
  • Evaluation harness (scripts/evaluation.py) for testing MCP servers with Claude
  • Documentation and project structure recommendations
  • Async/await and type safety best practices

Trigger Keywords

Phrases that MCP clients use to match this skill to user intent.

build mcp server, integrate api with agents, create mcp tools, mcp server evaluation, fastmcp python, mcp typescript sdk, tool schema validation, llm tool testing

Risk Signals

  • INFO: Direct .env file access for loading environment variables (scripts/evaluation.py, scripts/connections.py, reference/node_mcp_server.md)
  • INFO: Recursive deletion (rm -rf) reference (reference/node_mcp_server.md, in the package.json 'clean' script example)
  • INFO: External URL references and WebFetch usage (SKILL.md Phase 1.2 and 1.3; reference materials fetch from raw.githubusercontent.com and modelcontextprotocol.io)
  • WARNING: API request execution and tool calls to external services (scripts/evaluation.py: agent_loop function; scripts/connections.py: MCP connection classes)
  • INFO: Environment variable usage for API authentication (scripts/evaluation.py and scripts/connections.py: env parameter handling)
  • INFO: Anthropic API key requirement (scripts/evaluation.py uses the Anthropic client; reference/evaluation.md setup instructions)

Referenced Domains

External domains referenced in skill content, detected by static analysis.

api.example.com, example.com, modelcontextprotocol.io, raw.githubusercontent.com, www.apache.org

Use Cases

  • Build an MCP server to integrate a third-party API with AI agents
  • Create evaluation tests for an MCP server to verify LLM tool usage
  • Design tools with proper input validation, error handling, and response formatting
  • Set up a TypeScript or Python MCP project with best practices
  • Evaluate whether an MCP server enables LLMs to solve complex real-world tasks

Quality Notes

  • STRENGTH: Comprehensive 4-phase workflow provides clear structure from research through evaluation. Each phase has defined objectives and deliverables.
  • STRENGTH: Language-specific implementation guides (Python/TypeScript) are well-organized with complete working examples, type definitions, and error handling patterns.
  • STRENGTH: Best practices documentation covers tool naming, pagination, response formats, security, authentication, and error handling. Adheres to agentskills.io specification.
  • STRENGTH: Evaluation guide is exceptionally thorough—explains purpose, provides concrete question/answer guidelines, includes examples of good vs. poor questions, and guides verification process.
  • STRENGTH: All reference files are present and comprehensive. Supporting documentation includes Zod/Pydantic patterns, async/await best practices, and complete project setup.
  • STRENGTH: The evaluation harness (scripts/evaluation.py) is production-ready with multiple transport support (stdio, SSE, HTTP), structured reporting, and metrics collection.
  • STRENGTH: Security guidance is explicit—environment variable usage for credentials, input validation patterns, authentication best practices, DNS rebinding protection for HTTP servers.
  • STRENGTH: Clear annotations system (readOnlyHint, destructiveHint, idempotentHint, openWorldHint) helps clients understand tool behavior.
  • MODERATE: Some sections reference external URLs (e.g., MCP specification sitemap) that require WebFetch calls—agents must have network access. This is documented but adds a dependency.
  • MODERATE: The skill assumes agents have familiarity with async programming, schema validation (Zod/Pydantic), and HTTP APIs. Steep learning curve for beginners.
  • MINOR: Python and TypeScript guides are lengthy (25-28KB each). While comprehensive, could benefit from shorter quick-reference sections at the start.
  • MINOR: The evaluation guide emphasizes creating 'very challenging' questions, but doesn't provide guidance on when to simplify if a tool set genuinely cannot support complex tasks.
Model: claude-haiku-4-5-20251001 · Analyzed: Apr 5, 2026
