Every agent evaluated by Pecta accumulates a reputation score tied to its
agent_id. The score is a portable 0–1000 integer stored centrally, which means it travels with the agent across every platform and integration mode — whether the agent is gated via the SDK, the proxy, or the REST API. A single score reflects all of an agent’s evaluations regardless of where they originated.
Scoring formula
The score is a weighted sum of four components, each normalised to its maximum contribution:

| Component | Weight | Description |
|---|---|---|
| pass_rate | 400 | Fraction of evaluations in the rolling window that passed. 1.0 = all pass. |
| latency_score | 250 | Normalised inverse of average latency. Faster agents score higher. |
| streak | 200 | Rewards consecutive passing evaluations. A long unbroken streak raises the score. |
| volume | 150 | Rewards consistent usage. More evaluations in the window increase this component up to its cap. |
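The weighted sum can be sketched as follows. The weights and the 500-evaluation window come from this page; the latency normalisation curve, the streak cap of 50, and the `target_latency_ms` parameter are illustrative assumptions, not Pecta's published formula:

```python
from collections import deque

def reputation(evaluations, target_latency_ms=1000.0):
    """Sketch of the weighted score over (passed, latency_ms) tuples, oldest first."""
    window = deque(evaluations, maxlen=500)  # rolling window: oldest entries age out
    if not window:
        return 0
    n = len(window)

    pass_rate = sum(1 for passed, _ in window if passed) / n

    # Assumed normalisation: 1.0 at zero latency, decaying as average latency grows.
    avg_latency = sum(lat for _, lat in window) / n
    latency_score = 1.0 / (1.0 + avg_latency / target_latency_ms)

    # Consecutive passes, counted backwards from the most recent evaluation.
    streak = 0
    for passed, _ in reversed(window):
        if not passed:
            break
        streak += 1
    streak_score = min(streak / 50, 1.0)  # assumed cap at 50 consecutive passes

    volume = min(n / 500, 1.0)  # saturates once the window is full

    return round(400 * pass_rate +
                 250 * latency_score +
                 200 * streak_score +
                 150 * volume)
```

Under these assumptions a fully saturated window of fast, passing evaluations yields the maximum of 1000, and every component is bounded, so the result always lands in 0–1000.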
Lifecycle states
The score goes through four lifecycle states as evaluations accumulate:

| State | Condition | Meaning |
|---|---|---|
| new | 0 evaluations | Agent has never been evaluated. |
| calibrating | 1–49 evaluations | Score is being established; shown as N/50 in the dashboard. |
| active | 50–499 evaluations | Score is statistically reliable and fully displayed. |
| mature | 500+ evaluations | Rolling window is fully saturated; oldest evaluations age out as new ones arrive. |
The score only becomes visible to external consumers at 50 evaluations. Below that threshold the data is insufficient for a statistically reliable signal, so the dashboard displays the calibration progress instead of a raw number.
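The thresholds above map directly to a state function. The N/50 calibration format comes from the table; the exact display strings are assumed for illustration:

```python
def lifecycle_state(eval_count: int) -> str:
    """Map an agent's evaluation count to its lifecycle state."""
    if eval_count == 0:
        return "new"
    if eval_count < 50:
        return "calibrating"
    if eval_count < 500:
        return "active"
    return "mature"

def dashboard_display(eval_count: int, score: int) -> str:
    """Below 50 evaluations the raw score is hidden; show calibration progress instead."""
    if eval_count < 50:
        return f"{eval_count}/50"
    return str(score)
```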
Rolling window
Reputation is computed over the last 500 evaluations. When a new evaluation arrives and the window is full, the oldest entry is evicted. This means an agent can recover from a bad period: sustained good behaviour will eventually push earlier failures out of the window.

Portability
The score is stored centrally under a stable agent_id per organization. Any integration mode that uses the same agent_id writes to the same window:
- An SDK evaluation in your Node.js service
- A proxy evaluation from Claude Desktop on a developer’s laptop
- A REST API evaluation from your Python test suite
You choose the agent_id string, so you can segment agents as finely as you need (e.g. dsp-bidder-prod vs dsp-bidder-staging).
Reading the score
The REST API returns the current reputation inline on every /v1/evaluate response:
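A response might look like the following. Every field name here is an illustrative assumption based on the concepts on this page (score, state, window occupancy), not the confirmed schema:

```json
{
  "result": "pass",
  "reputation": {
    "score": 812,
    "state": "active",
    "evaluations_in_window": 347
  }
}
```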