
MCP Integration — Enjoyable, Skimmable, Production‑Safe (10‑minute Edition)

Estimated read: ~10 minutes  •  Audience: engineers & product teams who prefer clear, fast writing.
Promise: No fluff. By the end, you’ll know what MCP is, how to ship it, and how to keep it safe — with copy‑paste snippets and checklists.


🥤 The shortest possible “what & why”

  • What: MCP standardizes how an AI host discovers and calls your capabilities (tools, resources, prompts) via a simple client–server protocol.
  • Why now: It reduces glue code and breakage when models or backends change.
  • Win: predictable contracts (JSON Schemas), safer writes (approvals), easier audits (logs/traces), cleaner upgrade paths.

🧭 Quick decision tree

Do you already have a private HTTP service?
  ├─ Yes → Streamable HTTP (control headers/tokens; stays in VPC)
  └─ No
       ├─ Need a public connector? → Hosted tools
       ├─ Already stream events? → HTTP + SSE
       └─ Desktop/local dev? → stdio

🔑 Core concepts without jargon

  • Host: your AI application that orchestrates conversations and user experience.
  • Client: the MCP‑speaking piece inside the host.
  • Server: a process that exposes capabilities to the host.
  • Capabilities:
    • Tools: callable functions; inputs validated with JSON Schema.
    • Resources: readable URIs (files, DB views, API projections).
    • Prompts: reusable, parameterized templates.

Message flow (conceptual):

Host(Client)  -- list_tools ---------------------->  Server
Host(Client)  <-- tools[] -------------------------  Server
Host(Model)   -- call tool + args ---------------->  Server
Host(Model)   <-- result / error ------------------  Server
Host(Client)  -- read resource / prompt ---------->  Server
Host(Client)  <-- content -------------------------  Server
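
If you prefer code to diagrams, here is a minimal client‑side sketch of the same flow, assuming the official Python mcp SDK’s stdio client; the server command and tool name are placeholders, so adjust them to your setup.

# Minimal sketch of the flow above: list tools, then call one.
# Assumes the Python "mcp" SDK; invoice_server.py and invoice.get are placeholders.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    params = StdioServerParameters(command="python", args=["invoice_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()                        # handshake
            tools = await session.list_tools()                # list_tools -> tools[]
            print([tool.name for tool in tools.tools])
            result = await session.call_tool(                 # call tool + args
                "invoice.get", {"invoice_id": "INV-000123"}
            )
            print(result)                                     # result / error

asyncio.run(main())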

📦 Copy‑paste starter pack

1) Tool schema skeleton (trim to fit)

{
  "name": "invoice.pay.v2",
  "description": "Authorize and schedule payment for an invoice",
  "input_schema": {
    "type": "object",
    "properties": {
      "invoice_id": { "type": "string", "pattern": "INV-[0-9]{6}" },
      "amount": { "type": "number", "minimum": 0 },
      "currency": { "type": "string", "enum": ["USD","EUR","GBP"] },
      "scheduled_for": { "type": "string", "format": "date-time" },
      "note": { "type": "string", "maxLength": 200 }
    },
    "required": ["invoice_id","amount","currency"]
  }
}
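
A validation sketch to go with the skeleton, using the Python jsonschema library; the "additionalProperties": false line is our addition, matching the “reject unknown fields” fix later in this piece.

# Validate proposed arguments against the schema above before any call runs.
# additionalProperties: false is added here so unknown fields are rejected.
from jsonschema import Draft202012Validator

PAY_INPUT_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "invoice_id": {"type": "string", "pattern": "INV-[0-9]{6}"},
        "amount": {"type": "number", "minimum": 0},
        "currency": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
        "scheduled_for": {"type": "string", "format": "date-time"},
        "note": {"type": "string", "maxLength": 200},
    },
    "required": ["invoice_id", "amount", "currency"],
}

validator = Draft202012Validator(PAY_INPUT_SCHEMA)

def validate_args(args: dict) -> None:
    # Raises jsonschema.ValidationError with the path to the failing field.
    validator.validate(args)

validate_args({"invoice_id": "INV-000123", "amount": 42.5, "currency": "USD"})  # ok
# validate_args({"invoice_id": "INV-000123", "amount": 42.5, "currency": "XYZ"})  # raises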

2) Human‑readable approval request

Request: Pay INV-000123 for 42.50 USD tomorrow
Tool: invoice.pay.v2
Arguments: {"invoice_id":"INV-000123","amount":42.5,"currency":"USD"}
Risk: medium  |  Two-person rule: no  |  Request ID: r_4a71
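
If you would rather generate that preview than hand‑write it, here is an illustrative formatter; the fields mirror the example above and everything else is a placeholder.

import json

def approval_preview(summary: str, tool: str, args: dict,
                     risk: str, two_person: bool, request_id: str) -> str:
    # Render the one-line-per-field approval text shown above.
    return "\n".join([
        f"Request: {summary}",
        f"Tool: {tool}",
        f"Arguments: {json.dumps(args, separators=(',', ':'))}",
        f"Risk: {risk}  |  Two-person rule: {'yes' if two_person else 'no'}"
        f"  |  Request ID: {request_id}",
    ])

print(approval_preview(
    "Pay INV-000123 for 42.50 USD tomorrow",
    "invoice.pay.v2",
    {"invoice_id": "INV-000123", "amount": 42.5, "currency": "USD"},
    risk="medium", two_person=False, request_id="r_4a71",
))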

3) Minimal log fields

ts, request_id, user_id, tenant_id, server, tool, args_redacted, outcome, latency_ms
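
One way to emit those fields as structured JSON log lines, sketched in Python; the redaction list is only an example, not a complete policy.

import json
import logging
import time

logging.basicConfig(level=logging.INFO)
REDACTED_FIELDS = {"note", "account_number"}  # example; define your own policy

def log_tool_call(request_id: str, user_id: str, tenant_id: str, server: str,
                  tool: str, args: dict, outcome: str, latency_ms: int) -> None:
    record = {
        "ts": time.time(),
        "request_id": request_id,
        "user_id": user_id,
        "tenant_id": tenant_id,
        "server": server,
        "tool": tool,
        "args_redacted": {k: ("***" if k in REDACTED_FIELDS else v) for k, v in args.items()},
        "outcome": outcome,
        "latency_ms": latency_ms,
    }
    logging.getLogger("mcp.audit").info(json.dumps(record))

log_tool_call("r_4a71", "u_1", "t_42", "payables", "invoice.pay.v2",
              {"invoice_id": "INV-000123", "note": "rush"}, "scheduled", 840)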

🎯 Do this / Not that

  • Do this: Allowlist tools per task/tenant; default deny.
    Not that: Expose every tool and hope the model guesses right.
  • Do this: Require approval for writes; show a friendly summary.
    Not that: Let writes happen silently behind the scenes.
  • Do this: Add idempotency keys and timeouts.
    Not that: Retry blindly until a duplicate payment lands.
  • Do this: Return structured results whenever possible.
    Not that: Force brittle text parsing of critical outcomes.
  • Do this: Trace list‑tools and tool‑calls with request IDs.
    Not that: Debug blind when a tenant’s run fails.

🧪 Tests you’ll actually run

case "pay valid invoice":
  input = {"invoice_id":"INV-000123","amount":42.5,"currency":"USD"}
  expect status="scheduled"

case "reject wrong currency":
  input = {"invoice_id":"INV-000124","amount":10,"currency":"XYZ"}
  expect error="INVALID_INPUT"

case "timeout then retry":
  simulate TIMEOUT once → expect retry w/ backoff → success
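
The same cases as runnable pytest tests; the in‑file pay_invoice and ToolError stubs only stand in for your real handler, so swap in an import of your server module (the timeout/retry case still needs fault injection in your harness).

import pytest

class ToolError(Exception):
    # Stand-in for your server's error type; carries a stable error code.
    def __init__(self, code: str, message: str = ""):
        super().__init__(message or code)
        self.code = code

def pay_invoice(args: dict) -> dict:
    # Stub with the contract the tests expect; replace with the real tool.
    if args.get("currency") not in {"USD", "EUR", "GBP"}:
        raise ToolError("INVALID_INPUT", "unsupported currency")
    return {"status": "scheduled", "invoice_id": args["invoice_id"]}

def test_pay_valid_invoice():
    result = pay_invoice({"invoice_id": "INV-000123", "amount": 42.5, "currency": "USD"})
    assert result["status"] == "scheduled"

def test_reject_wrong_currency():
    with pytest.raises(ToolError) as exc:
        pay_invoice({"invoice_id": "INV-000124", "amount": 10, "currency": "XYZ"})
    assert exc.value.code == "INVALID_INPUT"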

🧰 Five patterns that ship

  1. Read‑first guardrail: every write tool has a paired read tool so the model can confirm state before proposing a change.
  2. Human‑in‑the‑loop writes: show a one‑line, human summary of the action; require explicit approval.
  3. Scoped resources: tools accept tenant/project scopes; hosts inject only what’s needed for the task.
  4. Structured outputs: return JSON with stable fields; give text only for user‑visible summaries.
  5. Idempotent by default: writes require an idempotency_key and return the same result on repeat.
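
A minimal sketch of pattern 5, assuming an in‑memory store; a real service would persist idempotency keys (for example in a database) with a TTL.

# Repeat calls with the same idempotency_key return the stored result
# instead of executing the write again.
_results_by_key: dict[str, dict] = {}

def execute_payment(args: dict) -> dict:
    # Placeholder for the real side-effecting write.
    return {"status": "scheduled", "invoice_id": args["invoice_id"]}

def pay_idempotent(args: dict, idempotency_key: str) -> dict:
    if idempotency_key in _results_by_key:
        return _results_by_key[idempotency_key]   # safe repeat, no second charge
    result = execute_payment(args)
    _results_by_key[idempotency_key] = result
    return result

first = pay_idempotent({"invoice_id": "INV-000123"}, "idem_abc123")
second = pay_idempotent({"invoice_id": "INV-000123"}, "idem_abc123")
assert first == second  # the retry returned the original result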

🧩 Pitfalls and fixes

  • Pitfall: model invents arguments.
    Fix: strict schema validation; reject unknown fields; echo validated args in the preview.
  • Pitfall: retries cause duplicate side‑effects.
    Fix: idempotency keys; store‑and‑forward with safe repeats.
  • Pitfall: timeouts cause user confusion.
    Fix: per‑tool timeouts and partial results; clear status messages.
  • Pitfall: hidden model prompts.
    Fix: sampling transparency; let users view the final prompt for sensitive actions.
  • Pitfall: over‑broad access.
    Fix: scope tokens and resources to tenant/project; default deny.

📖 Case Study #1 — Finance “Tuesday Payables” (2 minutes)

Finance asks: “Pay vendor ACME for last month.” The assistant connects to the MCP server via Streamable HTTP. The host filters tools to three: invoice.get, report.summary, and invoice.pay. It fetches open invoices (read), shows a preview of the write call, and asks for approval: “Pay INV‑000123 for 42.50 USD tomorrow, request r_4a71.” You approve. The client sets an idempotency_key, the server executes, and returns a structured result. Logs capture who approved and how long it took. Later, audit is a screenshot away.


🧰 Case Study #2 — Support Automation (RAG + Actions) (3 minutes)

Problem: Support agents spend time hunting across docs, tickets, and Slack for fixes, then run routine operations (restart a job, block a token). You want a safe assist that finds info and can act with guardrails.

Server capabilities:

  • resource.search_kb: query docs and past tickets; returns snippets with sources.
  • tool.summarize_incident: condenses multiple snippets into a short action list.
  • tool.restart_worker (sensitive): restarts a named worker in a scoped project; requires approval.

Host flow:

  1. Filter to the three capabilities above for the “incident triage” task.
  2. Run resource.search_kb with the incident ID; stream snippets and display sources.
  3. Call tool.summarize_incident to produce a bullet plan.
  4. Propose tool.restart_worker with arguments and a one‑line summary; require approval.
  5. If approved, call with an idempotency_key; log request ID, approver, and outcome.

Result: New agents resolve issues faster; risky actions are gated; audits are one click. Upgrading the model doesn’t break the flow because schemas and approvals are stable.


🧪 Myths vs Reality

  • Myth: “MCP is just another agent framework.”
    Reality: MCP is a protocol for capability discovery and calls. It complements whatever agent stack you use.
  • Myth: “Schemas slow us down.”
    Reality: Minimal schemas (types, enums, examples) speed you up by preventing invalid calls and brittle parsing.
  • Myth: “If we trust the model, we don’t need approvals.”
    Reality: Approvals are about governance, not model quality. They document intent and limit blast radius.
  • Myth: “stdio is unsafe.”
    Reality: stdio avoids network exposure; pair it with OS sandboxing and explicit consent prompts.

⚙️ Latency & Cost Knobs (Cheat Sheet)

  • Cache discovery: cache the tools/resources list with a short TTL (see the sketch after this list).
  • Batch reads: collapse small lookups into single calls.
  • Stream results: show incremental progress for large outputs.
  • Timeout tiers: fast for reads (<1s), longer for reports; surface partials.
  • Backpressure: cap concurrent tool calls per user and per tenant.
  • Prompt economy: concise tool descriptions reduce tokens and deliberation.
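
The discovery‑cache knob in a short sketch; list_tools_from_server is a stand‑in for your client call.

import time

_TTL_SECONDS = 30.0
_tool_cache: dict[str, tuple[float, list]] = {}

def list_tools_from_server(server_id: str) -> list:
    # Placeholder: in practice, call your MCP client's list-tools here.
    return ["invoice.get", "report.summary", "invoice.pay.v2"]

def cached_tool_list(server_id: str) -> list:
    # Serve from cache while fresh; refresh after the TTL expires.
    now = time.monotonic()
    hit = _tool_cache.get(server_id)
    if hit and now - hit[0] < _TTL_SECONDS:
        return hit[1]
    tools = list_tools_from_server(server_id)
    _tool_cache[server_id] = (now, tools)
    return tools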

🛡️ Mini Threat Model (5 checks)

  1. Input integrity: validate against schemas; reject unknown fields.
  2. Context poisoning: treat tool descriptions and resources as untrusted unless from trusted servers.
  3. Scope leaks: attach tenant/project scopes; never wildcard.
  4. Replay & duplication: require idempotency_key for writes.
  5. Human confirmation: approvals with human‑readable summaries for any irreversible action.

🗺️ Rollout in four weeks

  1. Week 1: pick one read tool with clear value. Define schema + golden tests. Turn on traces.
  2. Week 2: add one write tool behind approval. Implement idempotency, error taxonomy, and timeouts.
  3. Week 3: pilot with a test tenant. Add dashboards for latency/errors/denials. Red‑team for prompt injection and scope leaks.
  4. Week 4: write runbooks, finalize retention & access controls, and roll out gradually.

❓ FAQ (one‑liners)

  • Does MCP include authentication? It defines how to talk; auth and network controls are your choices. Use scoped tokens and private networking where possible.
  • Can I use multiple transports? Yes. Prototype with stdio, scale with HTTP, expose a subset as Hosted tools.
  • What breaks during model upgrades? Poorly specified tools. Schemas + tests prevent surprises.
  • How do I keep auditors happy? Log request IDs, tool names, redacted args, outcomes, approvals, and retention policy.

🧩 Glossary (single‑line definitions)

  • Host: the app that connects and orchestrates calls.
  • Client: the MCP‑speaking piece inside the host.
  • Server: your process that exposes tools/resources/prompts.
  • Transport: how they talk (Hosted, HTTP, SSE, stdio).
  • Idempotency key: a token that turns retries into safe repeats.
  • Approval: an explicit human confirmation step for sensitive actions.

📎 Appendix: runbooks & codes

Runbook: elevated error rate

1) Identify tool + tenant on dashboard
2) Check recent deploys or schema changes
3) Sample 5 failing requests (redacted args)
4) Roll back if spike follows a deploy
5) Add circuit breaker if upstream is degraded

Suggested error codes

INVALID_INPUT        // schema validation failed
PERMISSION_DENIED    // auth or scope check failed
NOT_FOUND            // resource missing
RATE_LIMITED         // client should retry with backoff
TIMEOUT              // partial or no result
INTERNAL             // unexpected server error
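
The same codes as a small Python taxonomy, plus a retryability hint and the jittered backoff the troubleshooting list below recommends; treat it as a sketch, not a finished client policy.

import enum
import random

class ErrorCode(str, enum.Enum):
    INVALID_INPUT = "INVALID_INPUT"          # schema validation failed
    PERMISSION_DENIED = "PERMISSION_DENIED"  # auth or scope check failed
    NOT_FOUND = "NOT_FOUND"                  # resource missing
    RATE_LIMITED = "RATE_LIMITED"            # retry with backoff
    TIMEOUT = "TIMEOUT"                      # partial or no result
    INTERNAL = "INTERNAL"                    # unexpected server error

RETRYABLE = {ErrorCode.RATE_LIMITED, ErrorCode.TIMEOUT}

def should_retry(code: ErrorCode, attempt: int, max_attempts: int = 3) -> bool:
    return code in RETRYABLE and attempt < max_attempts

def backoff_seconds(attempt: int, base: float = 0.5, cap: float = 10.0) -> float:
    # Full-jitter backoff: random delay up to an exponentially growing cap.
    return random.uniform(0, min(cap, base * (2 ** attempt)))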



🧱 Design Review Checklist (use before exposing a new tool)

  1. Purpose clear? One sentence that a non‑owner can understand.
  2. Inputs validated? Types, ranges, enums, formats; examples included.
  3. Side‑effects mapped? Read vs write; data touched; rollback story.
  4. Idempotency? Key accepted and enforced for writes.
  5. Scopes? Tenant/project/resource URIs required; no wildcards.
  6. Approval? Who approves; summary text reviewed; two‑person rule?
  7. Errors? Stable codes + human remediation notes.
  8. Observability? Trace spans + minimal logs + dashboards.
  9. Runbook? “Elevated errors” and “slow tool” procedures exist.

📤 Structured Results — Good vs Better

Good:

{"status":"scheduled","invoice_id":"INV-000123"}

Better:
{
  "status": "scheduled",
  "invoice_id": "INV-000123",
  "scheduled_for": "2025-11-15T09:00:00Z",
  "amount": {"value": 42.5, "currency": "USD"},
  "approval": {"by":"alice","at":"2025-11-11T10:03:11Z"},
  "request_id": "r_4a71"
}
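
One way to keep the “better” shape stable across releases is to build it from typed structures; a dataclass sketch, with field names mirroring the example above.

from dataclasses import asdict, dataclass

@dataclass
class Money:
    value: float
    currency: str

@dataclass
class Approval:
    by: str
    at: str  # ISO 8601 timestamp

@dataclass
class PayResult:
    status: str
    invoice_id: str
    scheduled_for: str
    amount: Money
    approval: Approval
    request_id: str

result = PayResult(
    status="scheduled",
    invoice_id="INV-000123",
    scheduled_for="2025-11-15T09:00:00Z",
    amount=Money(42.5, "USD"),
    approval=Approval(by="alice", at="2025-11-11T10:03:11Z"),
    request_id="r_4a71",
)
print(asdict(result))  # serialize as the tool's structured result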

🧬 Versioning Strategies

  • Major in name: invoice.pay.v2 — easiest to reason about; keep v1 until consumers migrate.
  • Capability flags: expose invoice.pay plus capabilities=["scheduled","partial_refund"]; use only when necessary.
  • Deprecation policy: announce N+2 releases ahead; keep smoke tests for old versions during the window.

🗣️ Approval UX — Microcopy you can reuse

  • “You’re about to pay $42.50 for INV‑000123 on Nov 15. Continue?”
  • “This action changes production data. A human will review and approve before it runs.”
  • “We’ll log the request ID and outcome for audit. No sensitive fields are stored.”

🔒 Data Privacy in Practice

  • Minimize inputs: only pass the fields a tool needs; mask personal data when possible.
  • Limit retention: set explicit TTLs for logs and caches; document who can access audits.
  • Purpose binding: tag logs with the task and tenant to prevent off‑purpose reuse.

🏢 Multi‑Tenant Patterns

  • Require tenant_id on every call; inject it at the host from session context (see the sketch after this list).
  • Partition logs and metrics by tenant; alert per‑tenant anomalies separately.
  • Use distinct approval policies per tenant where risk differs.
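
A sketch of the first bullet, combined with the default‑deny allowlist from “Do this / Not that”: the host injects tenant_id from session context and refuses tools that are not allowlisted; the names here are illustrative.

def call_scoped(session_context: dict, call_tool, tool: str, args: dict):
    # Default deny: refuse tools that are not allowlisted for this tenant.
    allowed = session_context.get("allowed_tools", set())
    if tool not in allowed:
        raise PermissionError(f"PERMISSION_DENIED: {tool} not allowed for this tenant")
    # Inject the tenant scope from the session, never from the model.
    scoped_args = {**args, "tenant_id": session_context["tenant_id"]}
    return call_tool(tool, scoped_args)

# Example:
# call_scoped({"tenant_id": "t_42", "allowed_tools": {"invoice.get"}},
#             my_call_tool, "invoice.get", {"invoice_id": "INV-000123"})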

🚀 Migration: From Ad‑hoc Tool Calls to MCP

  1. Inventory existing “tool‑like” endpoints (internal APIs, scripts, Lambda functions).
  2. Pick three: one read, one transform, one write (low risk). Wrap them with MCP schemas.
  3. Add approval + idempotency to the write; create golden tests for all three.
  4. Switch the host to discover tools via MCP; keep the old path behind a feature flag.
  5. Measure errors/latency for a week; then retire the old path.

🩺 Troubleshooting — Top 7 errors you’ll see

  1. INVALID_INPUT: schema mismatch — log the path to the failing field, not the whole payload.
  2. PERMISSION_DENIED: missing scope — include which scope is required in the message.
  3. RATE_LIMITED: tell the client the retry window; prefer jittered backoff.
  4. TIMEOUT: surface partials and the last successful step.
  5. INTERNAL: include a stable error code and request ID; avoid leaking stack traces to users.
  6. NOT_FOUND: clarify whether the resource or the tool version was missing.
  7. CONFLICT: when idempotency detects a prior write — return the prior result.

📈 Metrics & SLOs

  • Availability: % successful tool calls per tool and per tenant.
  • Latency: p50/p95/p99 by tool; separate network vs execution time.
  • Error mix: user vs system errors; watch for invalid‑input spikes after deploys.
  • Governance: approval rates and denials by tool; mean time to approve.
  • Cost: average tokens per run (prompts + tool descriptions); cache hit rate.

🧭 Deploy Playbook (small but real)

1) Merge server with: schemas, error codes, traces, runbooks
2) Deploy to staging; run golden tests + fuzz + fault injection
3) Enable in host for internal users only; add dashboards
4) Choose one low‑risk tenant; run for a week
5) Fix top 3 issues; enable for 10% of tenants; monitor
6) Full rollout with approvals; keep deprecation window for old path

📝 Change Management — One‑paragraph template

We’re enabling MCP for payables with three capabilities: invoice.get, report.summary, and invoice.pay. Writes require human approval; all calls are logged with request IDs and redacted arguments. Expect clearer errors, faster triage, and stable contracts across model updates. Contact #assist‑mcp for questions.

🧪 Benchmark Template (keep it honest)

Task: "Schedule payment for last week's ACME invoice"
Setup: same prompts, same tenant, same network
Metrics: success rate, p95 latency, tokens, approvals
Baselines: human‑only, ad‑hoc tool calls
Result: MCP reduces invalid calls 70%, p95 latency −18%, tokens −22%
Notes: approval adds 1–2s median; acceptable for writes

