Runs
A run is a single end-to-end cycle: compile your behaviour spec into training data, augment examples with AI, fine-tune the model, and auto-evaluate the result.
The Run Object
{
  "id": "e0b7694b-2c65-4199-89a1-fc54a6a6010c",
  "behavior_spec_id": "cafd8799-...",
  "run_number": 1,
  "status": "completed",
  "spec_snapshot": { ... },
  "dataset_id": "dc66546b-...",
  "fine_tune_job_id": "b3e2b918-...",
  "model_id": "96e9f0d9-...",
  "hyperparameters": {
    "augment": true,
    "n_epochs": 4,
    "lora_rank": 8,
    "lora_alpha": 16
  },
  "eval_summary": {
    "total": 5,
    "avg_score": 0.82,
    "pass_rate": 0.8,
    "scoring_method": "llm_judge",
    "regressions": 0,
    "improvements": 3
  },
  "started_at": "2026-03-06T10:30:00.000Z",
  "completed_at": "2026-03-06T10:57:50.000Z"
}
Run Lifecycle
| Status | Description |
|---|---|
| preparing | Compiling spec → augmenting examples → uploading to provider |
| training | Fine-tuning job running on the configured training provider |
| evaluating | Model being tested against the spec's examples |
| completed | Eval results available |
| failed | Error occurred; check the error field |
| cancelled | Manually cancelled |
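The lifecycle table implies that a client can stop polling once a run reaches completed, failed, or cancelled. The following is a minimal sketch of that loop, not the behaviour of tt runs watch itself. `poll_until_terminal` and its `fetch_status` argument are hypothetical names: the caller is assumed to supply a function that fetches the run (for example via GET /api/v1/runs/:id) and returns its status string.

```python
import time
from typing import Callable

# Terminal statuses taken from the lifecycle table; all others are in progress.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def poll_until_terminal(
    fetch_status: Callable[[], str],
    interval_s: float = 10.0,
    max_polls: int = 360,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status() until it returns a terminal status, then return it.

    fetch_status is a hypothetical caller-supplied function; the polling
    interval and the poll limit are illustrative defaults, not documented values.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_s)
    raise TimeoutError(f"run still in progress after {max_polls} polls")


# Usage with a canned status sequence standing in for real API responses.
_fake_statuses = iter(["preparing", "training", "training", "evaluating", "completed"])
final_status = poll_until_terminal(lambda: next(_fake_statuses), sleep=lambda _s: None)
print(final_status)
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any particular HTTP client and makes it easy to exercise without network access.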
Spec Snapshot
Every run captures a spec_snapshot — a frozen copy of the behaviour spec at run time. You can freely edit your spec between runs; each run preserves exactly what it trained on.
Eval Summary
| Field | Description |
|---|---|
| avg_score | Mean score across all examples (0–1) |
| pass_rate | Fraction of examples that passed (score ≥ 0.7) |
| exact_match_rate | Fraction of near-perfect scores (≥ 0.95) |
| avg_latency_ms | Mean inference latency per example, in milliseconds |
| scoring_method | llm_judge or similarity |
| regressions | Examples that scored ≥ 0.1 worse than in the previous run |
| improvements | Examples that scored ≥ 0.1 better than in the previous run |
The recommended way to work with runs is the tt CLI. Each endpoint below shows the tt command first, followed by the equivalent REST call.
Start a Run
POST /api/v1/behavior-specs/:id/runs
CLI
tt runs start <spec-id> --epochs 4 --lr 0.00002 --batch-size 8
tt runs start <spec-id> --dataset <dataset-id> --long-examples truncate --max-seq-length 4096
tt runs start <spec-id> --dataset <dataset-id> --max-output-tokens 512 --eval-reserved-output-tokens 128
Use tt runs watch <run-id> after starting to stream the run through preparing → training → evaluating → completed.
Use tt runs diagnose <run-id> while training is in progress to see recent epoch, loss, pace, and estimated remaining training time without direct infrastructure access.
Equivalent REST call
curl -X POST https://tunedtensor.com/api/v1/behavior-specs/:id/runs \
  -H "Authorization: Bearer <api-key>" \
  -H "Content-Type: application/json" \
  -d '{
  "augment": true,
  "hyperparameters": {
    "n_epochs": 4,
    "learning_rate": 0.00002,
    "lora_rank": 8,
    "lora_alpha": 16,
    "long_examples": "truncate",
    "max_seq_length": 4096,
    "max_output_tokens": 512,
    "eval_reserved_output_tokens": 128
  }
}'
| Parameter | Default | Description |
|---|---|---|
| augment | true | Use AI to expand examples into a larger training set |
| hyperparameters.n_epochs | 3 if omitted; dashboard starts at 4 | Number of training epochs (1–20) |
| hyperparameters.learning_rate | auto | Learning rate |
| hyperparameters.batch_size | provider default | Training batch size |
| hyperparameters.lora_rank | 16 if omitted; dashboard starts at 8 | LoRA adapter rank |
| hyperparameters.lora_alpha | 32 if omitted; dashboard starts at 16 | LoRA alpha scaling factor |
| hyperparameters.long_examples | error | Training preflight policy for rows that exceed the sequence budget: error, truncate, or skip |
| hyperparameters.max_seq_length | provider default | Maximum training sequence length in tokens; larger values may require a larger training instance |
| hyperparameters.max_output_tokens | runner default | Desired evaluation output budget in tokens for long-response tasks |
| hyperparameters.eval_reserved_output_tokens | runner default | Minimum evaluation output tokens to reserve per row when inputs approach the context window |
Returns immediately with status preparing. Work happens asynchronously.
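When building the request body programmatically, it can help to check the documented constraints client-side before sending. The sketch below is one possible approach rather than an official client: `build_run_request` is a hypothetical helper, and it assumes that leaving a hyperparameter out of the body lets the server apply the defaults listed in the table.

```python
import json

# Allowed values for hyperparameters.long_examples, per the parameter table.
LONG_EXAMPLE_POLICIES = {"error", "truncate", "skip"}


def build_run_request(augment: bool = True, **hyperparameters) -> dict:
    """Build a Start a Run body, dropping unset values so defaults apply."""
    hp = {name: value for name, value in hyperparameters.items() if value is not None}

    n_epochs = hp.get("n_epochs")
    if n_epochs is not None and not 1 <= n_epochs <= 20:
        raise ValueError("n_epochs must be between 1 and 20")

    policy = hp.get("long_examples")
    if policy is not None and policy not in LONG_EXAMPLE_POLICIES:
        raise ValueError(f"long_examples must be one of {sorted(LONG_EXAMPLE_POLICIES)}")

    body = {"augment": augment}
    if hp:
        body["hyperparameters"] = hp
    return body


# Usage: the resulting JSON can be passed to curl -d or any HTTP client.
request_body = build_run_request(n_epochs=4, long_examples="truncate", max_seq_length=4096)
print(json.dumps(request_body, indent=2))
```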
Estimate a Run
POST /api/v1/behavior-specs/:id/runs/estimate
Use the same request body as Start a Run to preview estimated training tokens, cost, and wall-clock duration before creating a run. Duration is a rough range based on completed run history.
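The estimate response (a sample appears later in this section) reports cost in cents and nests the duration fields under data.duration. As a small illustration of reading it, the sketch below turns a response into a one-line summary. `describe_estimate` is a hypothetical helper, and the field names are taken from that sample response.

```python
def describe_estimate(response: dict) -> str:
    """Format an estimate response as a short human-readable line."""
    data = response["data"]
    duration = data["duration"]
    rng = duration["range_minutes"]
    dollars = data["estimated_cost_cents"] / 100  # cents to dollars
    return (
        f"{data['estimated_training_tokens']:,} training tokens, "
        f"{data['estimated_epochs']} epochs, about ${dollars:.2f}; "
        f"roughly {duration['estimated_minutes']} min "
        f"(range {rng['low']}-{rng['high']}, {duration['confidence']} confidence)"
    )


# Values copied from the sample response in this section.
sample = {
    "data": {
        "estimated_training_tokens": 120000,
        "estimated_cost_cents": 22,
        "estimated_epochs": 4,
        "duration": {
            "estimated_minutes": 58,
            "range_minutes": {"low": 42, "high": 78},
            "confidence": "medium",
            "sample_count": 12,
            "basis": "matched_model",
        },
    }
}
summary_line = describe_estimate(sample)
print(summary_line)
```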
CLI
tt runs estimate <spec-id> --epochs 4
tt runs estimate <spec-id> --dataset <dataset-id> --train-ratio 0.8 --validation-ratio 0.1 --test-ratio 0.1
Equivalent REST call
curl -X POST https://tunedtensor.com/api/v1/behavior-specs/:id/runs/estimate \
  -H "Authorization: Bearer <api-key>" \
  -H "Content-Type: application/json" \
  -d '{
  "augment": true,
  "hyperparameters": { "n_epochs": 4 }
}'
Example response
{
  "data": {
    "estimated_training_tokens": 120000,
    "estimated_cost_cents": 22,
    "estimated_epochs": 4,
    "duration": {
      "estimated_minutes": 58,
      "range_minutes": { "low": 42, "high": 78 },
      "confidence": "medium",
      "sample_count": 12,
      "basis": "matched_model"
    }
  }
}
Live Diagnostics
GET /api/v1/runs/:id/diagnostics
CLI
tt runs diagnose <run-id>
Diagnostics include a plain-language summary plus recent learning signals such as epoch progress, loss, training pace, and estimated remaining training time.
Equivalent REST call
curl https://tunedtensor.com/api/v1/runs/:id/diagnostics \
  -H "Authorization: Bearer <api-key>"
Cost & Credits
Runs are charged from your prepaid credit balance. Cost is calculated as:
cost_cents = ceil(epochs × training_tokens × model_rate / 1_000_000)
Here model_rate is the price in cents per 1M training tokens, per epoch (see the rate card). Tokens are counted by the training provider after tokenisation. Successful runs debit the final provider-reported cost. Failed and cancelled runs are free.
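As a worked instance of the formula, the sketch below computes the cost for a run of 3 epochs over 250,000 training tokens. The rate of 45 cents per 1M tokens is made up for illustration; actual rates come from the rate card, and `run_cost_cents` is a hypothetical helper name.

```python
import math


def run_cost_cents(epochs: int, training_tokens: int, model_rate_cents: float) -> int:
    """Apply cost_cents = ceil(epochs * training_tokens * model_rate / 1_000_000)."""
    return math.ceil(epochs * training_tokens * model_rate_cents / 1_000_000)


# Hypothetical rate: 45 cents per 1M training tokens, per epoch.
# 3 * 250_000 * 45 / 1_000_000 = 33.75, which rounds up to 34 cents.
cost = run_cost_cents(epochs=3, training_tokens=250_000, model_rate_cents=45)
print(cost)  # 34
```

Because of the ceiling, any non-zero amount of training costs at least 1 cent under this formula.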
If your credit balance is too low at start time, the request returns 402 insufficient_credits:
{
  "error": {
    "code": "insufficient_credits",
    "message": "This run is estimated at $0.42 but you have $0.10.",
    "required_cents": 42,
    "available_cents": 10,
    "topup_url": "/dashboard/billing"
  }
}
Top up at Dashboard → Billing or run tt topup. See Billing & credits for full pricing details.
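A client that starts runs automatically may want to detect this error and report how much credit is missing. The following sketch assumes the caller already has the HTTP status code and the parsed JSON body. `credit_shortfall_cents` is a hypothetical helper, and the field names come from the error sample above.

```python
def credit_shortfall_cents(status_code: int, body: dict) -> int:
    """Return the missing credit in cents, or 0 if this is not a credit error."""
    error = body.get("error") or {}
    if status_code != 402 or error.get("code") != "insufficient_credits":
        return 0
    return max(0, error["required_cents"] - error["available_cents"])


# Values copied from the sample 402 response.
error_body = {
    "error": {
        "code": "insufficient_credits",
        "message": "This run is estimated at $0.42 but you have $0.10.",
        "required_cents": 42,
        "available_cents": 10,
        "topup_url": "/dashboard/billing",
    }
}
shortfall = credit_shortfall_cents(402, error_body)
print(f"Top up at least ${shortfall / 100:.2f}")
```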
List Runs for a Spec
GET /api/v1/behavior-specs/:id/runs
CLI
tt runs list --spec <spec-id>
Equivalent REST call
curl https://tunedtensor.com/api/v1/behavior-specs/:id/runs \
  -H "Authorization: Bearer <api-key>"
List All Runs
GET /api/v1/runs
CLI
tt runs list
Equivalent REST call
curl https://tunedtensor.com/api/v1/runs \
  -H "Authorization: Bearer <api-key>"
Returns runs across all specs with _spec_name for display.
Get Run Detail
GET /api/v1/runs/:id
CLI
# One-shot fetch
tt runs get <run-id>
# Live streaming until terminal state
tt runs watch <run-id>
Equivalent REST call
curl https://tunedtensor.com/api/v1/runs/:id \
  -H "Authorization: Bearer <api-key>"
Returns the full run with _evals: per-example results sorted by score (worst first). Each eval includes:
- prompt, expected, actual
- score (0–1), passed (boolean)
- reasoning: the LLM judge's explanation
- latency_ms: inference time
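One common use of the per-example results is to pull out the failures for review. The sketch below shows one way to do that on an already-parsed run. `failing_evals` is a hypothetical helper, and the sample evals are made-up values that only follow the field list above.

```python
def failing_evals(run: dict, limit: int = 3) -> list[dict]:
    """Return up to `limit` failed evals, lowest score first."""
    failed = [e for e in run.get("_evals", []) if not e["passed"]]
    # Sort explicitly rather than relying on the order of the response.
    return sorted(failed, key=lambda e: e["score"])[:limit]


# Made-up run detail containing only the fields this sketch reads.
run_detail = {
    "_evals": [
        {"prompt": "p2", "score": 0.62, "passed": False,
         "reasoning": "Partially correct", "latency_ms": 380},
        {"prompt": "p1", "score": 0.35, "passed": False,
         "reasoning": "Wrong format", "latency_ms": 410},
        {"prompt": "p3", "score": 0.91, "passed": True,
         "reasoning": "Matches expected output", "latency_ms": 395},
    ]
}
worst = failing_evals(run_detail)
for item in worst:
    print(f"{item['score']:.2f}  {item['prompt']}  {item['reasoning']}")
```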
Cancel a Run
POST /api/v1/runs/:id/cancel
CLI
tt runs cancel <run-id>
Equivalent REST call
curl -X POST https://tunedtensor.com/api/v1/runs/:id/cancel \
  -H "Authorization: Bearer <api-key>"
Cancels runs in preparing, training, or evaluating status. Also cancels the provider fine-tuning job if one is running.