Methodology version 1.0 — published March 2026
MCP servers are not generic APIs. They are tools that AI agents call in autonomous loops, often without human oversight between calls. The failure modes are different from, and more consequential than, those of a standard API integration.
An agent calling a server with an undocumented destructive side effect may irreversibly delete data. An agent that cannot parse a server's error response may retry indefinitely, cascading failures across a workflow. A server whose tool descriptions are written for humans — not agents — causes the model to misuse or mis-select the tool entirely. A server leaking credentials in response headers compromises the agent's entire environment.
None of these failure modes are captured by existing registries. Star counts, install counts, and uptime monitoring tell you nothing about whether a server is safe to run in an autonomous agent loop. Sagentum's 8-dimension standard was built specifically for this context.
Dimension 1: Can an AI agent understand what this tool does, when to use it, and what it will receive back — without human interpretation?
Dimension 2: Does the server return predictable, structured output every time?
Dimension 3: When something goes wrong, does the server tell the agent what happened and whether to retry?
Dimension 4: Does the server handle credentials and permissions safely?
Dimension 5: Is it safe for an agent to call this server multiple times with the same parameters? Are destructive operations clearly labelled?
Dimension 6: Is the server documented well enough that a developer can integrate it without guesswork?
Dimension 7: Is there evidence the server is maintained and will continue to be?
Dimension 8: Can a developer and their agent access this server without friction?
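The destructive-operations question above has a direct mechanical answer in MCP itself: the protocol's tool annotations let a server declare each tool's side effects to the agent. A minimal sketch of a tool definition for a hypothetical delete_record tool (the annotation field names follow the MCP specification; the tool, its description, and its schema are illustrative):

```python
# Hypothetical MCP tool definition; annotation field names follow the MCP spec.
delete_record_tool = {
    "name": "delete_record",
    "description": (
        "Permanently delete a record by ID. Irreversible. "
        'Returns {"deleted": true} on success.'
    ),
    "inputSchema": {
        "type": "object",
        "properties": {"record_id": {"type": "string"}},
        "required": ["record_id"],
    },
    "annotations": {
        "readOnlyHint": False,    # the tool mutates state
        "destructiveHint": True,  # the deletion cannot be undone
        "idempotentHint": True,   # repeating the call with the same ID is a no-op
    },
}
```

A server that sets these hints honestly gives an agent exactly what this dimension asks for: a machine-readable label for destructive and repeat-safe operations, rather than a warning buried in prose.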
Dimensions 1 and 5 are weighted 1.5× because they are the most consequential for autonomous agent use; all other dimensions are weighted 1.0×. Dimensions marked Not Tested are excluded from both the numerator and the denominator.
Pass = 1.0 | Partial = 0.5 | Fail = 0.0
Score = (Σ weight × value) ÷ (Σ weight) × 100
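As a concrete check of the arithmetic, the rubric above can be sketched in a few lines of Python (the weights, result values, and Not Tested exclusion follow this document; the function itself is illustrative, not Sagentum's actual harness code):

```python
# Weighted scoring sketch. Dimensions 1 and 5 carry weight 1.5, the rest 1.0.
WEIGHTS = {1: 1.5, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.5, 6: 1.0, 7: 1.0, 8: 1.0}
VALUES = {"pass": 1.0, "partial": 0.5, "fail": 0.0}

def score(results):
    """results maps dimension number -> 'pass' | 'partial' | 'fail' | 'not_tested'."""
    # Not Tested dimensions drop out of both numerator and denominator.
    tested = {d: r for d, r in results.items() if r != "not_tested"}
    if not tested:
        return None  # nothing was tested, so no score can be produced
    numerator = sum(WEIGHTS[d] * VALUES[r] for d, r in tested.items())
    denominator = sum(WEIGHTS[d] for d in tested)
    return numerator / denominator * 100

print(score({1: "pass", 2: "pass", 3: "partial", 4: "fail",
             5: "pass", 6: "pass", 7: "not_tested", 8: "pass"}))
```

In this example the numerator is 6.5 and the denominator is 8.0 (dimension 7 is excluded entirely), giving a score of 81.25.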
The test harness makes at most 15 calls per server per assessment, identified by the User-Agent string Sagentum/1.0 (+https://sagentum.com/testing-policy). Server developers can opt out by emailing testing-opt-out@sagentum.com. Opting out is not penalised.
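For server operators who want to recognise assessment traffic in their request logs, a minimal sketch of a User-Agent check (the header value is quoted from this document; the helper function is hypothetical):

```python
# Hypothetical helper for spotting Sagentum assessment traffic in request logs.
SAGENTUM_UA = "Sagentum/1.0 (+https://sagentum.com/testing-policy)"

def is_sagentum_probe(headers: dict) -> bool:
    """Return True when a request's User-Agent marks it as a Sagentum test call."""
    return headers.get("User-Agent", "").startswith("Sagentum/")
```

Matching on the product prefix rather than the full string keeps the check stable across future harness versions.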
If a published score is incorrect, submit specific counter-evidence — a quote from documentation that contradicts the assessment, or a test result showing different behaviour. Disputes without specific evidence are acknowledged but do not trigger re-assessment. Valid disputes are reviewed within 48 hours and resolved with a public changelog note.
Disputes are not negotiations. A score changes only when new evidence changes what the dimension criteria require.
The scoring rubric is versioned. Every assessment record carries the rubric version used to produce it. The rubric may only be amended quarterly. Prior versions are permanently documented — old assessments are never retroactively changed.
Current rubric version: 1.0