Every agent evaluated by Pecta accumulates a reputation score tied to its agent_id. The score is a portable 0–1000 integer stored centrally, which means it travels with the agent across every platform and integration mode — whether the agent is gated via the SDK, the proxy, or the REST API. A single score reflects all of an agent’s evaluations regardless of where they originated.
The score is a weighted sum of four components. Each component is normalised to the range 0–1 and multiplied by its weight, which is also its maximum contribution:
Score = (pass_rate × 400) + (latency_score × 250) + (streak × 200) + (volume × 150)
| Component | Weight | Description |
|---|---|---|
| pass_rate | 400 | Fraction of evaluations in the rolling window that passed. 1.0 = all pass. |
| latency_score | 250 | Normalised inverse of average latency. Faster agents score higher. |
| streak | 200 | Rewards consecutive passing evaluations. A long unbroken streak raises the score. |
| volume | 150 | Rewards consistent usage. More evaluations in the window increase this component up to its cap. |
The maximum possible score is 1000. A brand-new agent with perfect evaluations climbs toward 1000 as it accumulates volume.
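As a rough illustration of the formula above, the following Python sketch combines the four components into a single score. It assumes each component has already been normalised to the range 0.0–1.0; the function name `reputation_score`, the clamping of out-of-range inputs, and the rounding to the nearest integer are assumptions made for this example, not Pecta's documented implementation.

```python
# Weights are taken from the component table; everything else is illustrative.
WEIGHTS = {"pass_rate": 400, "latency_score": 250, "streak": 200, "volume": 150}


def reputation_score(pass_rate: float, latency_score: float,
                     streak: float, volume: float) -> int:
    """Combine four components (each assumed to lie in [0.0, 1.0]) into a 0-1000 score."""
    components = {
        "pass_rate": pass_rate,
        "latency_score": latency_score,
        "streak": streak,
        "volume": volume,
    }
    total = 0.0
    for name, value in components.items():
        clamped = min(max(value, 0.0), 1.0)  # assumption: out-of-range inputs are clamped
        total += clamped * WEIGHTS[name]
    return round(total)  # assumption: the stored integer is the rounded sum


print(reputation_score(1.0, 1.0, 1.0, 1.0))  # 1000
print(reputation_score(1.0, 0.8, 0.5, 0.2))  # 400 + 200 + 100 + 30 = 730
```

The second call shows why a new agent with a perfect pass rate still sits below 1000: the streak and volume components have not yet reached their caps.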
Lifecycle states
The score goes through four lifecycle states as evaluations accumulate:
| State | Condition | Meaning |
|---|---|---|
| new | 0 evaluations | Agent has never been evaluated. |
| calibrating | 1–49 evaluations | Score is being established; shown as N/50 in the dashboard. |
| active | 50–499 evaluations | Score is statistically reliable and fully displayed. |
| mature | 500+ evaluations | Rolling window is fully saturated; oldest evaluations age out as new ones arrive. |
The score only becomes visible to external consumers at 50 evaluations. Below that threshold the data is insufficient for a statistically reliable signal, so the dashboard displays the calibration progress instead of a raw number.
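The thresholds above map naturally onto a small lookup. The sketch below derives the lifecycle state from an evaluation count and decides what a dashboard might display. The function names are hypothetical, and showing `0/50` for an agent in the `new` state is an assumption, since the source only describes the N/50 display for calibrating agents.

```python
CALIBRATION_THRESHOLD = 50  # evaluations needed before the score is shown
WINDOW_SIZE = 500           # evaluations at which the rolling window is full


def lifecycle_state(eval_count: int) -> str:
    """Map an evaluation count to one of the four lifecycle states."""
    if eval_count < 0:
        raise ValueError("eval_count must be non-negative")
    if eval_count == 0:
        return "new"
    if eval_count < CALIBRATION_THRESHOLD:
        return "calibrating"
    if eval_count < WINDOW_SIZE:
        return "active"
    return "mature"


def display_value(score: int, eval_count: int) -> str:
    """Return calibration progress below the threshold, otherwise the score."""
    if eval_count < CALIBRATION_THRESHOLD:
        # Assumption: agents with 0 evaluations are also shown as progress (0/50).
        return f"{eval_count}/{CALIBRATION_THRESHOLD}"
    return str(score)


print(lifecycle_state(124), display_value(812, 124))  # active 812
print(lifecycle_state(12), display_value(640, 12))    # calibrating 12/50
```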
Rolling window
Reputation is computed over the last 500 evaluations. When a new evaluation arrives and the window is full, the oldest entry is evicted. This means an agent can recover from a bad period: sustained good behaviour will eventually push earlier failures out of the window.
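The eviction behaviour can be sketched with a fixed-size queue. The example below is a minimal in-memory model, assuming each evaluation is reduced to a pass/fail boolean; the class name `RollingWindow` is hypothetical, and the real service presumably stores more than this per evaluation.

```python
from collections import deque

WINDOW_SIZE = 500


class RollingWindow:
    """Keep the most recent WINDOW_SIZE pass/fail results."""

    def __init__(self, size: int = WINDOW_SIZE) -> None:
        # A deque with maxlen drops the oldest entry automatically when full.
        self.results = deque(maxlen=size)

    def record(self, passed: bool) -> None:
        self.results.append(passed)

    def pass_rate(self) -> float:
        if not self.results:
            return 0.0
        return sum(self.results) / len(self.results)


window = RollingWindow()

# A bad period: 100 failures, followed by 400 passes that fill the window.
for _ in range(100):
    window.record(False)
for _ in range(400):
    window.record(True)
print(window.pass_rate())  # 0.8

# 100 further passes evict the 100 old failures.
for _ in range(100):
    window.record(True)
print(window.pass_rate())  # 1.0
```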
Portability
The score is stored centrally under a stable agent_id per organization. Any integration mode that uses the same agent_id writes to the same window:
- An SDK evaluation in your Node.js service
- A proxy evaluation from Claude Desktop on a developer’s laptop
- A REST API evaluation from your Python test suite
All three update the same rolling window and the same score. You control the agent_id string, so you can segment agents as finely as you need (e.g. dsp-bidder-prod vs dsp-bidder-staging).
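One way to picture this is a store keyed by organization and agent_id, where the integration mode is recorded but plays no part in the key. The sketch below is an in-memory illustration only; the function `record_evaluation`, the `source` labels, and the organization name `acme` are hypothetical and do not describe Pecta's actual storage.

```python
from collections import defaultdict, deque

WINDOW_SIZE = 500

# One window per (organization, agent_id) pair; the integration mode is not part of the key.
windows = defaultdict(lambda: deque(maxlen=WINDOW_SIZE))


def record_evaluation(org_id: str, agent_id: str, source: str, passed: bool) -> int:
    """Append a result to the agent's shared window and return its evaluation count."""
    window = windows[(org_id, agent_id)]
    window.append((source, passed))
    return len(window)


# Three integration modes, one agent_id: all land in the same window.
record_evaluation("acme", "dsp-bidder-prod", "sdk", True)
record_evaluation("acme", "dsp-bidder-prod", "proxy", True)
prod_count = record_evaluation("acme", "dsp-bidder-prod", "rest", False)
print(prod_count)  # 3

# A different agent_id gets its own, separate window.
staging_count = record_evaluation("acme", "dsp-bidder-staging", "sdk", True)
print(staging_count)  # 1
```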
Reading the score
The REST API returns the current reputation inline on every /v1/evaluate response:
```json
{
  "evaluation_id": "v7k2mQpXtR9fNwJz",
  "passed": true,
  "reputation": {
    "score": 812,
    "lifecycle": "active",
    "eval_count": 124
  }
}
```
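Because the reputation arrives inline, a caller can act on it without a second request. The following sketch parses the sample body shown above and applies a simple trust check. The `MIN_SCORE` threshold and the `is_trusted` helper are hypothetical choices made by the caller, not part of the Pecta API.

```python
import json

# Sample body copied from the /v1/evaluate response shown above.
response_body = """
{
  "evaluation_id": "v7k2mQpXtR9fNwJz",
  "passed": true,
  "reputation": {"score": 812, "lifecycle": "active", "eval_count": 124}
}
"""

MIN_SCORE = 700  # hypothetical threshold chosen by the caller


def is_trusted(body: str, min_score: int = MIN_SCORE) -> bool:
    """True if the evaluation passed and the score is established and high enough."""
    data = json.loads(body)
    reputation = data["reputation"]
    # Only treat the score as meaningful once calibration has finished.
    established = reputation["lifecycle"] in ("active", "mature")
    return bool(data["passed"] and established and reputation["score"] >= min_score)


print(is_trusted(response_body))  # True
```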
You can also fetch the score directly:
```shell
curl https://api.pecta.ai/v1/reputation/research-bot-v2 \
  -H "Authorization: Bearer $PECTA_API_KEY"
```

```json
{
  "agent_id": "research-bot-v2",
  "score": 812,
  "lifecycle": "active",
  "eval_count": 124,
  "window_size": 500
}
```
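The same request can be made from Python using only the standard library. This is a minimal sketch that mirrors the curl call above; the helper names `build_reputation_request` and `fetch_reputation` are hypothetical, error handling is omitted, and the response is assumed to have the JSON shape shown above.

```python
import json
import os
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.pecta.ai/v1/reputation"


def build_reputation_request(agent_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for an agent's reputation without sending it."""
    url = f"{BASE_URL}/{quote(agent_id, safe='')}"  # escape the agent_id for use in a path
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def fetch_reputation(agent_id: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body."""
    request = build_reputation_request(agent_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    api_key = os.environ.get("PECTA_API_KEY")
    if api_key:  # only call the API when a key is configured
        reputation = fetch_reputation("research-bot-v2", api_key)
        print(reputation["score"], reputation["lifecycle"])
```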
Scores are served with low latency and reflect evaluations ingested within seconds.