CLI Tool

The tt CLI lets you manage behaviour specs, runs, models, and datasets from the command line — without writing API calls. It is the recommended way to use Tuned Tensor.

Open source (MIT). Source and issues at github.com/tunedtensor/tuned-tensor-cli. Published to npm as @tuned-tensor/cli.

Installation

Install globally via npm:

npm install -g @tuned-tensor/cli

Verify the installation with tt --version.

Or run from source:

git clone https://github.com/tunedtensor/tuned-tensor-cli.git
cd tuned-tensor-cli
npm install
npm run build
npm link

Authentication

Store your API key so you don't need to pass it on every command:

tt auth login
# or:
tt auth login <api-key>

Check your auth status:

tt auth status

Other auth commands:

tt auth logout — Remove stored credentials
tt -k <api-key> <command> — Pass API key inline (overrides stored key)
tt -u https://your-api.example.com <command> — Use a custom base URL (e.g. for local dev)

Commands Overview

Command	Description
`tt auth`	Manage authentication (login, logout, status)
`tt specs`	List, create, get, update, delete behaviour specs
`tt runs`	List runs, start a run, get details, diagnose, report regressions, cancel, watch
`tt datasets`	List and manage datasets
`tt label`	Teacher-label unlabeled JSONL/CSV rows, review outputs, and promote reviewed rows into datasets
`tt models`	List supported base models, inspect, download, serve, export, and delete fine-tuned models
`tt balance`	Show credit balance and recent transactions
`tt topup`	Open Stripe Checkout to add credits
`tt init`	Create a local behaviour spec file (tunedtensor.json)
`tt push`	Push local spec to the Tuned Tensor API
`tt eval`	Validate a local behaviour spec file

Quick Examples

List specs and runs

tt specs list
tt runs list

Start a run

tt runs estimate <spec-id>
tt runs start <spec-id>
tt runs start <spec-id> --epochs 5 --lr 0.0001 --batch-size 8
tt runs start <spec-id> --dataset <dataset-id-or-prefix> --train-ratio 0.8 --validation-ratio 0.1 --test-ratio 0.1
tt runs start <spec-id> --dataset <dataset-id-or-prefix> --long-examples truncate --max-seq-length 4096
tt runs start <spec-id> --dataset <dataset-id-or-prefix> --max-output-tokens 512 --eval-reserved-output-tokens 128
tt runs start <spec-id> --parent-model <model-id-or-prefix>
tt runs start <spec-id> --no-llm-judge

Use tt runs estimate <spec-id> before starting to preview rough cost, wall-clock duration, and whether the run is expected to use free monthly quota or paid credits.

Use tt runs watch <run-id> to poll until the run completes.

For live learning progress, use tt runs diagnose <run-id>. It reports recent epoch, loss, pace, and estimated remaining training time through the Tuned Tensor API.

After completion, use tt runs report <run-id> to compare aggregate base-vs-tuned metrics and inspect side-by-side Expected, Base, and Tuned outputs. Add --mode failures for worst tuned failures or --split test / --split all for held-out results.

Spec, run, dataset, and model commands accept full UUIDs or unambiguous ID prefixes of at least four characters.

Use uploaded datasets and evaluation caps

Pass --dataset <dataset-id-or-prefix> to train from an uploaded dataset instead of inline spec examples. Add --train-ratio, --validation-ratio, and --test-ratio to override the default 80/10/10 split.

Use --max-eval-examples <n> and --max-test-eval-examples <n> to cap primary and secondary test evaluation passes for larger datasets. The runs backend still clamps values to its configured ceiling.

For long uploaded rows, use --long-examples error|truncate|skip to choose the training preflight policy. Advanced runs can also pass --max-seq-length <tokens> when the selected model and training instance can support a larger sequence length, and --max-output-tokens <tokens> / --eval-reserved-output-tokens <tokens> to tune eval output budgets for long-response tasks.

Use --no-llm-judge to opt out of LLM judging for a new run.

Label real data safely

Use tt label when you have inputs but still need target outputs. Upload a JSONL file with one { "input": "..." } object per line, or a CSV with --input-column. Tuned Tensor drafts labels under a behaviour spec, then waits for review before any rows become a training dataset.

tt label upload unlabeled.jsonl --spec <spec-id> --name "support labels" --watch
tt label rows <job-id> --status labeled
tt label accept <job-id> --all
tt label promote <job-id> --name "reviewed-support-labels"

Labeling jobs run a deterministic sanitization pass before teacher calls. Ordinary PII such as email addresses, phone numbers, SSNs, and credit-card-like values is replaced with redaction placeholders. Rows containing secret-like content, including password assignments, bearer tokens, API keys, connection strings, or private keys, are marked failed and are not sent to the teacher model.

Promotion re-scans reviewed rows before creating the dataset. Blocked rows are excluded, and promotion fails if edited outputs introduce secret-like content.

Download and serve a model locally

tt models download <model-id> --output model.tar.gz

# One-time setup for local reference serving
tt models setup-runtime

# Serve by model ID, archive, or extracted model directory
tt models serve <model-id> --spec tunedtensor.json
tt models serve model.tar.gz --spec tunedtensor.json
tt models serve ./models/my-model --spec tunedtensor.json --device mps

# Enforce JSON output for local chat completions
tt models serve ./models/my-model --spec tunedtensor.json --json-schema schema.json

# Managed serving starts on demand, idles down, serializes requests, and logs JSONL
tt models serve <model-id> --spec tunedtensor.json --managed \
  --idle-timeout 300 \
  --restart-after-requests 100 \
  --json-schema schema.json

# Export downloadable weights to GGUF and optionally package them for Ollama
tt models export <model-id> --format gguf --quant q4_k_m --ollama
tt models export <model-id> --quant q8_0 --ollama --print-command

tt models download downloads models that have a Tuned Tensor-hosted artifact. In interactive terminals it shows download progress, transfer rate, and ETA; --json output remains machine-readable. Hosted models can still be used for inference through their model ID, but may not expose downloadable weights.

tt models export converts downloadable model weights to GGUF with llama.cpp and can write an Ollama Modelfile with the behaviour spec prompt embedded. Use --llama-cpp <dir> or the explicit --convert-script / --quantize-binflags when llama.cpp tools are not on your path, and--print-command to inspect the planned conversion before running it.

tt models serve starts a local OpenAI-compatible chat completions endpoint. It applies the behaviour spec prompt from tunedtensor.json by default, preserving the behaviour spec prompt used during training. Use --spec <path> to point at a specific spec, or --no-spec-prompt when you intentionally want raw model behaviour. Use --json-schema <path> to require local chat completions to satisfy a JSON Schema, with malformed responses rejected as HTTP 422 after the configured repair attempts. Add --managed when a local app needs an on-demand wrapper: the CLI keeps the model warm while requests are active, idles it down after --idle-timeout, serializes generation requests, restarts after --restart-after-requests or failed health checks, and emits JSONL request logs with latency, request size, schema validity, and a configurable --gate-field result. Use --device auto, --device cpu, --device cuda, or --device mps to control the inference device.

tt models setup-runtime installs an isolated local Python runtime for reference serving. It chooses Python 3.10-3.12, creates a managed virtual environment in the Tuned Tensor cache, and installs the serving dependencies.

Manage credits

tt balance              # show credit balance and recent transactions
tt topup --amount 25    # open Stripe Checkout for a $25 top-up
tt topup --amount 25 --no-open  # print the checkout URL instead
tt topup                # interactive amount picker

tt balance shows your current prepaid and subscription credit balance. Free-eligible runs use the monthly free-run quota before credits; if a paid run returns 402 insufficient_credits, top up and retry.

Create a spec from a local file

# Create a local spec template
tt init -n "My Spec" --model Qwen/Qwen3.5-2B

# Edit tunedtensor.json, then push to the API
tt push

Local spec file (tunedtensor.json)

tt init scaffolds a tunedtensor.json file in the current directory. This is your source of truth — edit it, keep it in version control, and tt push whenever you want to sync it to Tuned Tensor. For field-by-field guidance, see the spec file guide.

The typical loop:

tt init --name "Customer Support Bot" --model Qwen/Qwen3.5-2B

# Edit tunedtensor.json — add examples, guidelines, constraints

tt push                    # create or update the spec on the server
tt runs estimate <spec-id> # preview rough cost, duration, and billing source
tt runs start <spec-id>    # kick off a run
tt runs start <spec-id> --parent-model <model-id>  # continue from a completed model
tt runs diagnose <run-id>  # inspect live learning progress
tt runs watch <run-id>     # stream status until complete

Supported spec base models are google/gemma-4-E2B-it, google/gemma-4-E4B-it, Qwen/Qwen3.5-2B, Qwen/Qwen3-VL-2B-Instruct, Qwen/Qwen3.5-4B, meta-llama/Llama-3.2-3B-Instruct, microsoft/Phi-4-mini-instruct, ibm-granite/granite-3.3-2b-instruct, and bigcode/starcoder2-3b. Unsupported models fail local CLI validation before the API request is sent.

Print the same list from the CLI:

tt models base
tt models base --json

Spec validation with tt eval

tt eval validates your local tunedtensor.json. It checks required fields, confirms examples are present, warns when guidelines are missing, and checks simple constraints against example outputs. It does not call a model or the Playground API.

tt eval

Configuration

Credentials are stored in ~/.config/tuned-tensor/config.json (respects XDG_CONFIG_HOME).

API key precedence:

--api-key / -k flag
TUNED_TENSOR_API_KEY environment variable
Stored config from tt auth login

Global options

-k, --api-key <key> — Override stored API key
-u, --base-url <url> — Custom API base URL (default: https://tunedtensor.com)
--json — Output raw JSON
--no-color — Disable colors
-h, --help — Show command help