Runs

A run is a single end-to-end cycle: compile your behaviour spec into training data, augment examples with AI, fine-tune the model, and auto-evaluate the result.

The Run Object

{
  "id": "e0b7694b-2c65-4199-89a1-fc54a6a6010c",
  "behavior_spec_id": "cafd8799-...",
  "run_number": 1,
  "status": "completed",
  "spec_snapshot": { ... },
  "dataset_id": "dc66546b-...",
  "fine_tune_job_id": "b3e2b918-...",
  "model_id": "96e9f0d9-...",
  "hyperparameters": {
    "augment": true,
    "n_epochs": 4,
    "lora_rank": 8,
    "lora_alpha": 16
  },
  "eval_summary": {
    "total": 5,
    "avg_score": 0.82,
    "pass_rate": 0.8,
    "scoring_method": "llm_judge",
    "regressions": 0,
    "improvements": 3
  },
  "started_at": "2026-03-06T10:30:00.000Z",
  "completed_at": "2026-03-06T10:57:50.000Z"
}

Run Lifecycle

Status	Description
`preparing`	Compiling spec → augmenting examples → uploading to provider
`training`	Fine-tuning job running on the configured training provider
`evaluating`	Model being tested against the spec's examples
`completed`	Eval results available
`failed`	Error — check the `error` field
`cancelled`	Manually cancelled
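
For custom integrations, the lifecycle above can be followed with a small polling helper. This is a minimal Python sketch, not part of any official client: `fetch_status` is a hypothetical caller-supplied function (for example, a wrapper around GET /api/v1/runs/:id) that returns the run's current status string, and the poll interval is an arbitrary choice.

```python
import time
from typing import Callable, List

# Statuses taken from the lifecycle table above.
IN_FLIGHT = ("preparing", "training", "evaluating")
TERMINAL = ("completed", "failed", "cancelled")


def is_terminal(status: str) -> bool:
    """True once a run can no longer change status."""
    if status not in IN_FLIGHT + TERMINAL:
        raise ValueError(f"unknown run status: {status!r}")
    return status in TERMINAL


def wait_for_terminal(
    fetch_status: Callable[[], str],
    poll_seconds: float = 10.0,
    max_polls: int = 1000,
) -> List[str]:
    """Poll until the run reaches a terminal status.

    fetch_status is a hypothetical caller-supplied function. Returns the
    sequence of statuses observed, with consecutive repeats collapsed.
    """
    seen: List[str] = []
    for _ in range(max_polls):
        status = fetch_status()
        if not seen or seen[-1] != status:
            seen.append(status)
        if is_terminal(status):
            return seen
        time.sleep(poll_seconds)
    raise TimeoutError("run did not reach a terminal status")
```

From the command line, tt runs watch <run-id> covers the same need without any custom code.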

Spec Snapshot

Every run captures a spec_snapshot — a frozen copy of the behaviour spec at run time. You can freely edit your spec between runs; each run preserves exactly what it trained on.

Eval Summary

Field	Description
`avg_score`	Mean score across all examples (0–1)
`pass_rate`	Fraction of examples that passed (score ≥ 0.7)
`exact_match_rate`	Fraction of near-perfect scores (≥ 0.95)
`avg_latency_ms`	Mean inference latency per example
`scoring_method`	`llm_judge` or `similarity`
`regressions`	Examples that scored ≥ 0.1 worse than previous run
`improvements`	Examples that scored ≥ 0.1 better than previous run
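
To make the thresholds in the table concrete, the sketch below recomputes the aggregate fields from per-example scores. It is an illustration only: how the service computes these values internally is not documented here, keying scores by an example id is an assumption, and scores that differ by exactly 0.1 may land on either side of the threshold because of floating-point rounding.

```python
from typing import Dict, Optional

PASS_THRESHOLD = 0.7           # pass_rate threshold from the table
NEAR_PERFECT_THRESHOLD = 0.95  # exact_match_rate threshold from the table
DELTA_THRESHOLD = 0.1          # regression / improvement threshold


def summarize(scores: Dict[str, float],
              previous: Optional[Dict[str, float]] = None) -> dict:
    """Aggregate per-example scores, keyed by a hypothetical example id."""
    if not scores:
        raise ValueError("scores must not be empty")
    total = len(scores)
    values = list(scores.values())
    summary = {
        "total": total,
        "avg_score": sum(values) / total,
        "pass_rate": sum(v >= PASS_THRESHOLD for v in values) / total,
        "exact_match_rate": sum(v >= NEAR_PERFECT_THRESHOLD for v in values) / total,
        "regressions": 0,
        "improvements": 0,
    }
    for example_id, score in scores.items():
        # Examples without a score in the previous run are not compared.
        if previous is None or example_id not in previous:
            continue
        delta = score - previous[example_id]
        if delta <= -DELTA_THRESHOLD:
            summary["regressions"] += 1
        elif delta >= DELTA_THRESHOLD:
            summary["improvements"] += 1
    return summary
```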

The recommended way to work with runs is the tt CLI — each endpoint below shows the tt command first, followed by the equivalent REST call.

Start a Run

POST /api/v1/behavior-specs/:id/runs

CLI

tt runs start <spec-id> --epochs 4 --lr 0.00002 --batch-size 8
tt runs start <spec-id> --dataset <dataset-id> --long-examples truncate --max-seq-length 4096
tt runs start <spec-id> --dataset <dataset-id> --max-output-tokens 512 --eval-reserved-output-tokens 128

Use tt runs watch <run-id> after starting to stream the run through preparing → training → evaluating → completed.

Use tt runs diagnose <run-id> while training is in progress to see recent epoch, loss, pace, and estimated remaining training time without direct infrastructure access.

Equivalent REST call

curl -X POST https://tunedtensor.com/api/v1/behavior-specs/:id/runs \
  -H "Authorization: Bearer <api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "augment": true,
    "hyperparameters": {
      "n_epochs": 4,
      "learning_rate": 0.00002,
      "lora_rank": 8,
      "lora_alpha": 16,
      "long_examples": "truncate",
      "max_seq_length": 4096,
      "max_output_tokens": 512,
      "eval_reserved_output_tokens": 128
    }
  }'

Parameter	Default	Description
`augment`	true	Use AI to expand examples into a larger training set
`hyperparameters.n_epochs`	3 if omitted; dashboard starts at 4	Number of training epochs (1–20)
`hyperparameters.learning_rate`	auto	Learning rate
`hyperparameters.batch_size`	provider default	Training batch size
`hyperparameters.lora_rank`	16 if omitted; dashboard starts at 8	LoRA adapter rank
`hyperparameters.lora_alpha`	32 if omitted; dashboard starts at 16	LoRA alpha scaling factor
`hyperparameters.long_examples`	error	Training preflight policy for rows that exceed the sequence budget: `error`, `truncate`, or `skip`
`hyperparameters.max_seq_length`	provider default	Maximum training sequence length in tokens; larger values may require a larger training instance
`hyperparameters.max_output_tokens`	runner default	Desired evaluation output budget in tokens for long-response tasks
`hyperparameters.eval_reserved_output_tokens`	runner default	Minimum evaluation output tokens to reserve per row when inputs approach the context window

The request returns immediately with the run in preparing status; the remaining work happens asynchronously.
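
When calling the REST endpoint from code, it can help to check the documented constraints before sending the request. The following Python sketch builds the JSON body and validates only what the parameter table states (the 1–20 epoch range and the three long_examples policies). The function name is hypothetical, and leaving out the hyperparameters key entirely when none are given is an assumption about what the API accepts.

```python
from typing import Any, Dict

LONG_EXAMPLE_POLICIES = {"error", "truncate", "skip"}


def build_run_request(augment: bool = True, **hyperparameters: Any) -> Dict[str, Any]:
    """Build the JSON body for POST /api/v1/behavior-specs/:id/runs.

    Omitted hyperparameters are left out so server-side defaults apply.
    """
    n_epochs = hyperparameters.get("n_epochs")
    if n_epochs is not None and not 1 <= n_epochs <= 20:
        raise ValueError("n_epochs must be between 1 and 20")
    policy = hyperparameters.get("long_examples")
    if policy is not None and policy not in LONG_EXAMPLE_POLICIES:
        raise ValueError(
            f"long_examples must be one of {sorted(LONG_EXAMPLE_POLICIES)}"
        )
    body: Dict[str, Any] = {"augment": augment}
    if hyperparameters:
        body["hyperparameters"] = dict(hyperparameters)
    return body
```

The resulting dictionary can be serialized with json.dumps and sent as the request body with any HTTP client.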

Estimate a Run

POST /api/v1/behavior-specs/:id/runs/estimate

Use the same request body as Start a Run to preview estimated training tokens, cost, and wall-clock duration before creating a run. Duration is a rough range based on completed run history.

CLI

tt runs estimate <spec-id> --epochs 4
tt runs estimate <spec-id> --dataset <dataset-id> --train-ratio 0.8 --validation-ratio 0.1 --test-ratio 0.1

Equivalent REST call

curl -X POST https://tunedtensor.com/api/v1/behavior-specs/:id/runs/estimate \
  -H "Authorization: Bearer <api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "augment": true,
    "hyperparameters": { "n_epochs": 4 }
  }'

Example response

{
  "data": {
    "estimated_training_tokens": 120000,
    "estimated_cost_cents": 22,
    "estimated_epochs": 4,
    "duration": {
      "estimated_minutes": 58,
      "range_minutes": { "low": 42, "high": 78 },
      "confidence": "medium",
      "sample_count": 12,
      "basis": "matched_model"
    }
  }
}
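
A short Python sketch of how a client might turn the estimate response above into a one-line summary before confirming a run. The function name and output wording are illustrative choices; only the field names come from the example response.

```python
def describe_estimate(response: dict) -> str:
    """Render the estimate response as a one-line summary."""
    data = response["data"]
    duration = data["duration"]
    dollars = data["estimated_cost_cents"] / 100  # cents to dollars
    low = duration["range_minutes"]["low"]
    high = duration["range_minutes"]["high"]
    return (
        f"~{data['estimated_training_tokens']:,} tokens, "
        f"${dollars:.2f}, "
        f"{low}-{high} min "
        f"({duration['confidence']} confidence, {duration['sample_count']} samples)"
    )
```

For the example response shown above, this returns "~120,000 tokens, $0.22, 42-78 min (medium confidence, 12 samples)".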

Live Diagnostics

GET /api/v1/runs/:id/diagnostics

CLI

tt runs diagnose <run-id>

Diagnostics include a plain-language summary plus recent learning signals such as epoch progress, loss, training pace, and estimated remaining training time.

Equivalent REST call

curl https://tunedtensor.com/api/v1/runs/:id/diagnostics \
  -H "Authorization: Bearer <api-key>"

Run Report

GET /api/v1/runs/:id/report

CLI

tt runs report <run-id>
tt runs report <run-id> --mode failures
tt runs report <run-id> --split test

Reports expose sanitized evaluation artifacts through the Tuned Tensor API, including aggregate base-vs-tuned deltas and side-by-side Expected, Base, and Tuned outputs for regressions or failed examples.

Equivalent REST call

curl https://tunedtensor.com/api/v1/runs/:id/report \
  -H "Authorization: Bearer <api-key>"

Cost & credits

Runs are charged from your starter or prepaid credit balance only after successful completion. Cost is calculated as:

cost_cents = ceil(epochs × training_tokens × model_rate / 1_000_000)

The model rate is the price in cents per 1M training tokens, per epoch (see the rate card). Tokens are counted by the training provider after tokenisation. Successful runs debit the final provider-reported cost; failed and cancelled runs are free.
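
The formula can be checked with a few lines of Python. This is a worked sketch only: the rate of 45 cents per 1M tokens is a made-up value for illustration, not a figure from the rate card, and the token count is simply borrowed from the estimate example.

```python
import math


def cost_cents(epochs: int, training_tokens: int, model_rate: float) -> int:
    """ceil(epochs * training_tokens * model_rate / 1_000_000), in cents."""
    if epochs < 1 or training_tokens < 0 or model_rate < 0:
        raise ValueError("epochs must be >= 1; tokens and rate must be non-negative")
    return math.ceil(epochs * training_tokens * model_rate / 1_000_000)


# Hypothetical rate: 45 cents per 1M training tokens per epoch.
example = cost_cents(epochs=4, training_tokens=120_000, model_rate=45)
```

With these illustrative numbers, 4 × 120,000 × 45 / 1,000,000 = 21.6, which rounds up to 22 cents.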

If your credit balance is too low at start time, the request returns 402 insufficient_credits:

{
  "error": {
    "code": "insufficient_credits",
    "message": "This run is estimated at $0.42 but you have $0.10.",
    "required_cents": 42,
    "available_cents": 10,
    "topup_url": "/dashboard/billing"
  }
}
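
A client can use the structured fields in this error to work out how much to top up. The sketch below is a minimal Python example; the function name is hypothetical, and it assumes the error body has exactly the shape shown above.

```python
def credit_shortfall_cents(error_body: dict) -> int:
    """Return how many cents are missing before the run can start.

    Returns 0 when the body is not an insufficient_credits error.
    """
    error = error_body.get("error", {})
    if error.get("code") != "insufficient_credits":
        return 0
    return max(0, error["required_cents"] - error["available_cents"])
```

For the example error above, the shortfall is 42 − 10 = 32 cents.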

Free-eligible runs use the monthly free-run quota before credits. For paid runs or runs outside the free quota, top up at Dashboard → Billing or run tt topup. See Billing & credits for full pricing details.

List Runs for a Spec

GET /api/v1/behavior-specs/:id/runs

CLI

tt runs list --spec <spec-id>

Equivalent REST call

curl https://tunedtensor.com/api/v1/behavior-specs/:id/runs \
  -H "Authorization: Bearer <api-key>"

List All Runs

GET /api/v1/runs

CLI

tt runs list

Equivalent REST call

curl https://tunedtensor.com/api/v1/runs \
  -H "Authorization: Bearer <api-key>"

Returns runs across all specs; each run includes a _spec_name field for display.

Get Run Detail

GET /api/v1/runs/:id

CLI

# One-shot fetch
tt runs get <run-id>

# Live streaming until terminal state
tt runs watch <run-id>

Equivalent REST call

curl https://tunedtensor.com/api/v1/runs/:id \
  -H "Authorization: Bearer <api-key>"

Returns the full run with _evals — per-example results sorted by score (worst first). Each eval includes:

prompt, expected, actual
score (0–1), passed (boolean)
reasoning — LLM judge's explanation
latency_ms — inference time
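
As an illustration of working with these fields, the Python sketch below pulls the lowest-scoring failed examples out of a run detail response. The function name and the `limit` parameter are hypothetical; the field names are the ones listed above.

```python
from typing import List


def failed_examples(run: dict, limit: int = 3) -> List[dict]:
    """Return up to `limit` failed evals, lowest score first."""
    evals = run.get("_evals", [])
    failures = [e for e in evals if not e["passed"]]
    # The API is described as returning worst-first; sort anyway so the
    # helper does not depend on that ordering.
    failures.sort(key=lambda e: e["score"])
    return failures[:limit]
```

Each returned item still carries prompt, expected, actual, and reasoning, which is usually enough to see why the example failed.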

Cancel a Run

POST /api/v1/runs/:id/cancel

CLI

tt runs cancel <run-id>

Equivalent REST call

curl -X POST https://tunedtensor.com/api/v1/runs/:id/cancel \
  -H "Authorization: Bearer <api-key>"

Cancels runs in preparing, training, or evaluating status. Also cancels the provider fine-tuning job if running.