Tuned Tensor

Runs

A run is a single end-to-end cycle: compile your behaviour spec into training data, augment examples with AI, fine-tune the model, and auto-evaluate the result.

The Run Object

{
  "id": "e0b7694b-2c65-4199-89a1-fc54a6a6010c",
  "behavior_spec_id": "cafd8799-...",
  "run_number": 1,
  "status": "completed",
  "spec_snapshot": { ... },
  "dataset_id": "dc66546b-...",
  "fine_tune_job_id": "b3e2b918-...",
  "model_id": "96e9f0d9-...",
  "hyperparameters": {
    "augment": true,
    "n_epochs": 4,
    "lora_rank": 8,
    "lora_alpha": 16
  },
  "eval_summary": {
    "total": 5,
    "avg_score": 0.82,
    "pass_rate": 0.8,
    "scoring_method": "llm_judge",
    "regressions": 0,
    "improvements": 3
  },
  "started_at": "2026-03-06T10:30:00.000Z",
  "completed_at": "2026-03-06T10:57:50.000Z"
}
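The started_at and completed_at fields are ISO 8601 UTC timestamps, so a run's wall-clock duration can be computed from the run object itself. This is useful for comparing a finished run against the duration range from Estimate a Run. The sketch below makes these assumptions:

  • The object has already been decoded into a dict.
  • run_duration_seconds is a hypothetical helper.
  • A missing or null completed_at means the run has not finished.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def run_duration_seconds(run: dict) -> Optional[float]:
    """Seconds between started_at and completed_at, or None if unfinished (assumed)."""
    started, completed = run.get("started_at"), run.get("completed_at")
    if not started or not completed:
        return None
    return (parse_timestamp(completed) - parse_timestamp(started)).total_seconds()


run = {
    "status": "completed",
    "started_at": "2026-03-06T10:30:00.000Z",
    "completed_at": "2026-03-06T10:57:50.000Z",
}
print(run_duration_seconds(run))  # 1670.0 (27 min 50 s)
```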

Run Lifecycle

Status       Description
preparing    Compiling spec → augmenting examples → uploading to provider
training     Fine-tuning job running on the configured training provider
evaluating   Fine-tuned model being tested against the spec's examples
completed    Eval results available
failed       Run stopped with an error; check the error field
cancelled    Run was manually cancelled
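Client code that watches runs usually needs two checks: whether a status is final, and whether it can still be cancelled. The following is a small sketch of both. The happy path comes from the table and the cancellable set comes from Cancel a Run. Two points are assumptions rather than documented behaviour:

  • Treating completed, failed, and cancelled as final.
  • The helper names.

```python
# Statuses from the Run Lifecycle table. A successful run moves through
# preparing -> training -> evaluating -> completed.
HAPPY_PATH = ("preparing", "training", "evaluating", "completed")

# Assumption: a run in one of these statuses never changes again.
TERMINAL = frozenset({"completed", "failed", "cancelled"})

# Statuses that Cancel a Run accepts.
CANCELLABLE = frozenset({"preparing", "training", "evaluating"})


def is_terminal(status: str) -> bool:
    """Return True once a run has reached a final status."""
    return status in TERMINAL


def can_cancel(status: str) -> bool:
    """Return True if the cancel endpoint accepts a run in this status."""
    return status in CANCELLABLE


def describe(run: dict) -> str:
    """One-line, human-readable progress description for a run dict."""
    status = run["status"]
    if status == "failed":
        # Failed runs carry an error field; its exact shape is not documented here.
        return f"failed: {run.get('error')}"
    if status in HAPPY_PATH:
        step = HAPPY_PATH.index(status) + 1
        return f"{status} (step {step} of {len(HAPPY_PATH)})"
    return status


print(describe({"status": "training"}))  # training (step 2 of 4)
```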

Spec Snapshot

Every run captures a spec_snapshot — a frozen copy of the behaviour spec at run time. You can freely edit your spec between runs; each run preserves exactly what it trained on.
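Each run keeps its own frozen spec_snapshot, so comparing the snapshots of two runs shows what changed in the spec between them. This is a minimal sketch under the following assumptions:

  • Both runs have been fetched as dicts.
  • The example keys (name, system_prompt) are hypothetical, because the snapshot's internal fields are not documented in this section.
  • Only top-level keys are compared.

```python
def diff_snapshots(old: dict, new: dict) -> dict:
    """Map every top-level key whose value differs to an (old, new) pair.

    A key missing from one snapshot is reported with None on that side.
    """
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


# Hypothetical snapshot contents, for illustration only.
run_1 = {"spec_snapshot": {"name": "support-bot", "system_prompt": "Be concise."}}
run_2 = {"spec_snapshot": {"name": "support-bot", "system_prompt": "Be concise and polite."}}
print(diff_snapshots(run_1["spec_snapshot"], run_2["spec_snapshot"]))
# {'system_prompt': ('Be concise.', 'Be concise and polite.')}
```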

Eval Summary

Field             Description
avg_score         Mean score across all examples (0–1)
pass_rate         Fraction of examples that passed (score ≥ 0.7)
exact_match_rate  Fraction of examples with near-perfect scores (score ≥ 0.95)
avg_latency_ms    Mean inference latency per example, in milliseconds
scoring_method    llm_judge or similarity
regressions       Number of examples that scored at least 0.1 lower than in the previous run
improvements      Number of examples that scored at least 0.1 higher than in the previous run
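The score-based fields can be reproduced from per-example scores, which helps when reading a summary. For example, with five examples, a pass_rate of 0.8 means four examples scored at least 0.7.

This is a minimal sketch of the documented thresholds. It assumes examples can be matched across runs by a stable key (the keys below are hypothetical). It leaves out avg_latency_ms and scoring_method, because those come from inference timing and the scorer rather than from the scores.

```python
from statistics import mean
from typing import Dict, Optional

PASS_THRESHOLD = 0.7    # pass_rate: score >= 0.7
EXACT_THRESHOLD = 0.95  # exact_match_rate: score >= 0.95
CHANGE_DELTA = 0.1      # regressions / improvements: change of at least 0.1


def summarize(scores: Dict[str, float],
              previous: Optional[Dict[str, float]] = None) -> dict:
    """Recompute the score-based eval_summary fields from per-example scores.

    scores and previous map a (hypothetical) stable example key to its 0-1 score;
    previous holds the prior run's scores, if there was a prior run.
    """
    if not scores:
        raise ValueError("no scores to summarize")
    values = list(scores.values())
    total = len(values)
    summary = {
        "total": total,
        "avg_score": mean(values),
        "pass_rate": sum(s >= PASS_THRESHOLD for s in values) / total,
        "exact_match_rate": sum(s >= EXACT_THRESHOLD for s in values) / total,
        "regressions": 0,
        "improvements": 0,
    }
    for key, score in scores.items():
        if previous is None or key not in previous:
            continue  # only examples scored in both runs can be compared
        # Round away float noise such as 0.75 - 0.9 == -0.15000000000000002.
        delta = round(score - previous[key], 9)
        if delta <= -CHANGE_DELTA:
            summary["regressions"] += 1
        elif delta >= CHANGE_DELTA:
            summary["improvements"] += 1
    return summary


# Made-up scores for five examples, plus the previous run's scores.
current = {"a": 1.0, "b": 0.9, "c": 0.75, "d": 0.6, "e": 0.85}
prior = {"a": 0.7, "b": 0.95, "c": 0.9, "d": 0.6}
print(summarize(current, prior))
# Expect pass_rate 0.8, exact_match_rate 0.2, improvements 1 (a), regressions 1 (c)
```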

The recommended way to work with runs is the tt CLI — each endpoint below shows the tt command first, followed by the equivalent REST call.

Start a Run

POST /api/v1/behavior-specs/:id/runs

CLI

tt runs start <spec-id> --epochs 4 --lr 0.00002 --batch-size 8
tt runs start <spec-id> --dataset <dataset-id> --long-examples truncate --max-seq-length 4096
tt runs start <spec-id> --dataset <dataset-id> --max-output-tokens 512 --eval-reserved-output-tokens 128

Use tt runs watch <run-id> after starting to stream the run as it moves through preparing → training → evaluating → completed.

Use tt runs diagnose <run-id> while training is in progress to see recent epoch, loss, pace, and estimated remaining training time without direct infrastructure access.

Equivalent REST call

curl -X POST https://tunedtensor.com/api/v1/behavior-specs/:id/runs \
  -H "Authorization: Bearer <api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "augment": true,
    "hyperparameters": {
      "n_epochs": 4,
      "learning_rate": 0.00002,
      "lora_rank": 8,
      "lora_alpha": 16,
      "long_examples": "truncate",
      "max_seq_length": 4096,
      "max_output_tokens": 512,
      "eval_reserved_output_tokens": 128
    }
  }'

Parameter                                    Default                                 Description
augment                                      true                                    Use AI to expand examples into a larger training set
hyperparameters.n_epochs                     3 if omitted (dashboard starts at 4)    Number of training epochs (1–20)
hyperparameters.learning_rate                auto                                    Learning rate
hyperparameters.batch_size                   provider default                        Training batch size
hyperparameters.lora_rank                    16 if omitted (dashboard starts at 8)   LoRA adapter rank
hyperparameters.lora_alpha                   32 if omitted (dashboard starts at 16)  LoRA alpha scaling factor
hyperparameters.long_examples                error                                   Training preflight policy for rows that exceed the sequence budget: error, truncate, or skip
hyperparameters.max_seq_length               provider default                        Maximum training sequence length in tokens; larger values may require a larger training instance
hyperparameters.max_output_tokens            runner default                          Desired evaluation output budget in tokens for long-response tasks
hyperparameters.eval_reserved_output_tokens  runner default                          Minimum evaluation output tokens to reserve per row when inputs approach the context window

Returns immediately with status preparing; compilation, augmentation, training, and evaluation then continue asynchronously.
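Outside the CLI, the same request can be built with Python's standard library. This minimal sketch only checks the two documented constraints: n_epochs must be between 1 and 20, and long_examples must be one of the three policies. Every other default is left to the server.

start_run_request is a hypothetical helper. It builds the request without sending it, so you can inspect the request first.

```python
import json
import urllib.request

BASE_URL = "https://tunedtensor.com/api/v1"
LONG_EXAMPLE_POLICIES = ("error", "truncate", "skip")


def start_run_request(spec_id: str, api_key: str, augment: bool = True,
                      **hyperparameters) -> urllib.request.Request:
    """Build (without sending) the POST that starts a run for a behaviour spec."""
    epochs = hyperparameters.get("n_epochs")
    if epochs is not None and not 1 <= epochs <= 20:
        raise ValueError("n_epochs must be between 1 and 20")
    policy = hyperparameters.get("long_examples")
    if policy is not None and policy not in LONG_EXAMPLE_POLICIES:
        raise ValueError(f"long_examples must be one of {LONG_EXAMPLE_POLICIES}")
    # Hyperparameters left out here fall back to the defaults in the parameter table.
    body = {"augment": augment, "hyperparameters": hyperparameters}
    return urllib.request.Request(
        f"{BASE_URL}/behavior-specs/{spec_id}/runs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = start_run_request("<spec-id>", "<api-key>", n_epochs=4, learning_rate=0.00002,
                        lora_rank=8, lora_alpha=16)
# urllib.request.urlopen(req) sends it; the new run comes back in status preparing.
```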

Estimate a Run

POST /api/v1/behavior-specs/:id/runs/estimate

Use the same request body as Start a Run to preview estimated training tokens, cost, and wall-clock duration before creating a run. Duration is a rough range based on completed run history.

CLI

tt runs estimate <spec-id> --epochs 4
tt runs estimate <spec-id> --dataset <dataset-id> --train-ratio 0.8 --validation-ratio 0.1 --test-ratio 0.1

Equivalent REST call

curl -X POST https://tunedtensor.com/api/v1/behavior-specs/:id/runs/estimate \
  -H "Authorization: Bearer <api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "augment": true,
    "hyperparameters": { "n_epochs": 4 }
  }'

Example response

{
  "data": {
    "estimated_training_tokens": 120000,
    "estimated_cost_cents": 22,
    "estimated_epochs": 4,
    "duration": {
      "estimated_minutes": 58,
      "range_minutes": { "low": 42, "high": 78 },
      "confidence": "medium",
      "sample_count": 12,
      "basis": "matched_model"
    }
  }
}
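The following sketch turns the estimate into a go/no-go check before starting the run. It makes these assumptions:

  • The response has the shape shown in the example response.
  • You already know your current balance in cents. Fetching the balance is not covered here, so available_cents is an input.
  • The start-time credit check compares this estimate with your balance, as the insufficient_credits message suggests.

summarize_estimate is a hypothetical helper.

```python
def summarize_estimate(response: dict, available_cents: int) -> str:
    """Format a run estimate and compare its cost with a known credit balance."""
    est = response["data"]
    cost = est["estimated_cost_cents"]
    duration = est["duration"]
    low, high = duration["range_minutes"]["low"], duration["range_minutes"]["high"]
    if available_cents >= cost:
        verdict = "enough credit"
    else:
        verdict = f"short by {cost - available_cents} cents"
    return (
        f"~${cost / 100:.2f}, about {duration['estimated_minutes']} min "
        f"({low}-{high} min, {duration['confidence']} confidence); {verdict}"
    )


estimate = {
    "data": {
        "estimated_training_tokens": 120000,
        "estimated_cost_cents": 22,
        "estimated_epochs": 4,
        "duration": {
            "estimated_minutes": 58,
            "range_minutes": {"low": 42, "high": 78},
            "confidence": "medium",
            "sample_count": 12,
            "basis": "matched_model",
        },
    }
}
print(summarize_estimate(estimate, available_cents=10))
# ~$0.22, about 58 min (42-78 min, medium confidence); short by 12 cents
```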

Live Diagnostics

GET /api/v1/runs/:id/diagnostics

CLI

tt runs diagnose <run-id>

Diagnostics include a plain-language summary plus recent learning signals such as epoch progress, loss, training pace, and estimated remaining training time.

Equivalent REST call

curl https://tunedtensor.com/api/v1/runs/:id/diagnostics \
  -H "Authorization: Bearer <api-key>"

Cost & credits

Runs are charged from your prepaid credit balance. Cost is calculated as:

cost_cents = ceil(epochs × training_tokens × model_rate / 1_000_000)

model_rate is the model's rate per 1M training tokens per epoch; see the rate card. Tokens are counted by the training provider after tokenisation. Successful runs are charged the final cost reported by the provider. Failed and cancelled runs are free.
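The formula translates directly into code. This sketch assumes model_rate is expressed in cents, because the formula's result is cost_cents. The rate of 45 used below is invented for illustration; look up the real value on the rate card.

Decimal keeps the ceiling exact, so a fractional rate never picks up a spurious extra cent from float rounding.

```python
from decimal import Decimal, ROUND_CEILING


def run_cost_cents(epochs: int, training_tokens: int, model_rate) -> int:
    """ceil(epochs * training_tokens * model_rate / 1_000_000).

    model_rate is assumed to be cents per 1M training tokens per epoch. Pass it
    as an int, str, or Decimal; Decimal arithmetic keeps the ceiling exact.
    """
    raw = Decimal(epochs) * Decimal(training_tokens) * Decimal(model_rate) / Decimal(1_000_000)
    return int(raw.to_integral_value(rounding=ROUND_CEILING))


# Hypothetical rate of 45 cents per 1M tokens per epoch (not a real rate-card value).
print(run_cost_cents(4, 120_000, 45))  # 4 * 120,000 * 45 / 1,000,000 = 21.6, rounded up to 22
```

This is the formula-based figure. The amount actually debited for a successful run is the provider-reported cost.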

If your credit balance is below the run's estimated cost when you start it, the request fails with HTTP 402 and the error code insufficient_credits:

{
  "error": {
    "code": "insufficient_credits",
    "message": "This run is estimated at $0.42 but you have $0.10.",
    "required_cents": 42,
    "available_cents": 10,
    "topup_url": "/dashboard/billing"
  }
}
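A client can detect this error and report how much credit is missing. This is a minimal sketch that relies only on the fields shown in the error body. credit_shortfall is a hypothetical helper.

```python
import json
from typing import Optional, Tuple, Union


def credit_shortfall(status_code: int,
                     body: Union[str, bytes]) -> Optional[Tuple[int, Optional[str]]]:
    """For a 402 insufficient_credits response, return (cents still needed, topup_url).

    Returns None for any other status code or error code.
    """
    if status_code != 402:
        return None
    error = json.loads(body).get("error", {})
    if error.get("code") != "insufficient_credits":
        return None
    return error["required_cents"] - error["available_cents"], error.get("topup_url")


body = '''{"error": {"code": "insufficient_credits",
  "message": "This run is estimated at $0.42 but you have $0.10.",
  "required_cents": 42, "available_cents": 10,
  "topup_url": "/dashboard/billing"}}'''
print(credit_shortfall(402, body))  # (32, '/dashboard/billing')
```

With urllib.request.urlopen, a 402 response is raised as urllib.error.HTTPError. Pass its .code and .read() to the helper.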

Top up at Dashboard → Billing or run tt topup. See Billing & credits for full pricing details.

List Runs for a Spec

GET /api/v1/behavior-specs/:id/runs

CLI

tt runs list --spec <spec-id>

Equivalent REST call

curl https://tunedtensor.com/api/v1/behavior-specs/:id/runs \
  -H "Authorization: Bearer <api-key>"

List All Runs

GET /api/v1/runs

CLI

tt runs list

Equivalent REST call

curl https://tunedtensor.com/api/v1/runs \
  -H "Authorization: Bearer <api-key>"

Returns runs across all specs. Each run includes a _spec_name field with the spec's name for display.
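_spec_name makes it easy to group the combined list. This minimal sketch picks the most recent run for each spec. It assumes two things, neither of which is documented here:

  • The response has already been decoded into a list of run objects. The list response envelope is not shown.
  • run_number increases with each run of the same spec.

The spec names in the example are hypothetical.

```python
def latest_run_per_spec(runs: list) -> dict:
    """Map each _spec_name to the run with the highest run_number."""
    latest = {}
    for run in runs:
        name = run["_spec_name"]
        if name not in latest or run["run_number"] > latest[name]["run_number"]:
            latest[name] = run
    return latest


runs = [
    {"_spec_name": "support-bot", "run_number": 1, "status": "completed"},
    {"_spec_name": "support-bot", "run_number": 2, "status": "training"},
    {"_spec_name": "sql-helper", "run_number": 1, "status": "failed"},
]
for name, run in latest_run_per_spec(runs).items():
    print(f"{name}: run {run['run_number']} is {run['status']}")
# support-bot: run 2 is training
# sql-helper: run 1 is failed
```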

Get Run Detail

GET /api/v1/runs/:id

CLI

# One-shot fetch
tt runs get <run-id>

# Live streaming until terminal state
tt runs watch <run-id>

Equivalent REST call

curl https://tunedtensor.com/api/v1/runs/:id \
  -H "Authorization: Bearer <api-key>"

Returns the full run with _evals — per-example results sorted by score (worst first). Each eval includes:

  • prompt, expected, actual
  • score (0–1), passed (boolean)
  • reasoning — LLM judge's explanation
  • latency_ms — inference time
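tt runs watch does the polling for you. Over REST, a loop around GET /api/v1/runs/:id has the same effect. This is a minimal sketch with these assumptions:

  • completed, failed, and cancelled are the final statuses.
  • fetch_run is a hypothetical callable you supply; it performs the GET and returns the decoded run.
  • The 30-second interval is an arbitrary choice, not a documented limit.

```python
import time
from typing import Callable, List

# Assumption: a run in one of these statuses never changes again.
TERMINAL = frozenset({"completed", "failed", "cancelled"})


def wait_for_run(fetch_run: Callable[[], dict],
                 interval_s: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the run reaches a terminal status, then return it.

    fetch_run is supplied by the caller and should GET /api/v1/runs/:id
    and return the decoded run; sleep is injectable so tests do not wait.
    """
    while True:
        run = fetch_run()
        if run["status"] in TERMINAL:
            return run
        sleep(interval_s)


def failing_evals(run: dict) -> List[dict]:
    """Failed per-example results; _evals is sorted worst first, so this list is too."""
    return [ev for ev in run.get("_evals", []) if not ev["passed"]]
```

In practice, add a timeout or a maximum number of polls, because the sketch loops until the run finishes.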

Cancel a Run

POST /api/v1/runs/:id/cancel

CLI

tt runs cancel <run-id>

Equivalent REST call

curl -X POST https://tunedtensor.com/api/v1/runs/:id/cancel \
  -H "Authorization: Bearer <api-key>"

Cancels a run that is in preparing, training, or evaluating status. If the run's fine-tuning job is still running at the provider, that job is cancelled too.
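A client can check the status first and skip the call for a run that has already finished. This is a minimal sketch using the standard library. It relies only on the three cancellable statuses listed here. cancel_request is a hypothetical helper that builds the request but does not send it.

```python
import urllib.request
from typing import Optional

# Statuses that POST /api/v1/runs/:id/cancel accepts.
CANCELLABLE = frozenset({"preparing", "training", "evaluating"})


def cancel_request(run: dict, api_key: str) -> Optional[urllib.request.Request]:
    """Build (without sending) the cancel request, or return None if the
    run's status is not one that can be cancelled."""
    if run["status"] not in CANCELLABLE:
        return None
    return urllib.request.Request(
        f"https://tunedtensor.com/api/v1/runs/{run['id']}/cancel",
        headers={"Authorization": f"Bearer {api_key}"},
        method="POST",
    )


req = cancel_request({"id": "e0b7694b-2c65-4199-89a1-fc54a6a6010c", "status": "training"}, "<api-key>")
# Send it with urllib.request.urlopen(req) when you are ready.
```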