CLI Tool
The tt CLI lets you manage behaviour specs, runs, models, and datasets from the command line — without writing API calls. It is the recommended way to use Tuned Tensor.
@tuned-tensor/cli.Installation
Install globally via npm:
npm install -g @tuned-tensor/cliVerify the installation with tt --version.
Or run from source:
git clone https://github.com/tunedtensor/tuned-tensor-cli.git
cd tuned-tensor-cli
npm install
npm run build
npm linkAuthentication
Store your API key so you don't need to pass it on every command:
tt auth login
# or:
tt auth login <api-key>Check your auth status:
tt auth statusOther auth commands:
tt auth logout— Remove stored credentialstt -k <api-key> <command>— Pass API key inline (overrides stored key)tt -u https://your-api.example.com <command>— Use a custom base URL (e.g. for local dev)
Commands Overview
| Command | Description |
|---|---|
tt auth | Manage authentication (login, logout, status) |
tt specs | List, create, get, update, delete behaviour specs |
tt runs | List runs, start a run, get details, diagnose, cancel, watch |
tt datasets | List and manage datasets |
tt models | List supported base models, inspect, download, serve, export, and delete fine-tuned models |
tt balance | Show credit balance and recent transactions |
tt topup | Open Stripe Checkout to add credits |
tt init | Create a local behaviour spec file (tunedtensor.json) |
tt push | Push local spec to the Tuned Tensor API |
tt eval | Validate a local behaviour spec file |
Quick Examples
List specs and runs
tt specs list
tt runs listStart a run
tt runs estimate <spec-id>
tt runs start <spec-id>
tt runs start <spec-id> --epochs 5 --lr 0.0001 --batch-size 8
tt runs start <spec-id> --dataset <dataset-id-or-prefix> --train-ratio 0.8 --validation-ratio 0.1 --test-ratio 0.1
tt runs start <spec-id> --dataset <dataset-id-or-prefix> --long-examples truncate --max-seq-length 4096
tt runs start <spec-id> --dataset <dataset-id-or-prefix> --max-output-tokens 512 --eval-reserved-output-tokens 128
tt runs start <spec-id> --parent-model <model-id-or-prefix>
tt runs start <spec-id> --no-llm-judgeUse tt runs estimate <spec-id> before starting to preview rough cost and wall-clock duration from completed run history.
Use tt runs watch <run-id> to poll until the run completes.
For live learning progress, use tt runs diagnose <run-id>. It reports recent epoch, loss, pace, and estimated remaining training time through the Tuned Tensor API.
Spec, run, dataset, and model commands accept full UUIDs or unambiguous ID prefixes of at least four characters.
Use uploaded datasets and evaluation caps
Pass --dataset <dataset-id-or-prefix> to train from an uploaded dataset instead of inline spec examples. Add --train-ratio, --validation-ratio, and --test-ratio to override the default 80/10/10 split.
Use --max-eval-examples <n> and --max-test-eval-examples <n> to cap primary and secondary test evaluation passes for larger datasets. The runs backend still clamps values to its configured ceiling.
For long uploaded rows, use --long-examples error|truncate|skip to choose the training preflight policy. Advanced runs can also pass --max-seq-length <tokens> when the selected model and training instance can support a larger sequence length, and --max-output-tokens <tokens> / --eval-reserved-output-tokens <tokens> to tune eval output budgets for long-response tasks.
Use --no-llm-judge to opt out of LLM judging for a new run.
Download and serve a model locally
tt models download <model-id> --output model.tar.gz
# One-time setup for local reference serving
tt models setup-runtime
# Serve by model ID, archive, or extracted model directory
tt models serve <model-id> --spec tunedtensor.json
tt models serve model.tar.gz --spec tunedtensor.json
tt models serve ./models/my-model --spec tunedtensor.json --device mps
# Enforce JSON output for local chat completions
tt models serve ./models/my-model --spec tunedtensor.json --json-schema schema.json
# Managed serving starts on demand, idles down, serializes requests, and logs JSONL
tt models serve <model-id> --spec tunedtensor.json --managed \
--idle-timeout 300 \
--restart-after-requests 100 \
--json-schema schema.json
# Export downloadable weights to GGUF and optionally package them for Ollama
tt models export <model-id> --format gguf --quant q4_k_m --ollama
tt models export <model-id> --quant q8_0 --ollama --print-commandtt models download downloads models that have a Tuned Tensor-hosted artifact. In interactive terminals it shows download progress, transfer rate, and ETA; --json output remains machine-readable. Hosted models can still be used for inference through their model ID, but may not expose downloadable weights.
tt models export converts downloadable model weights to GGUF with llama.cpp and can write an Ollama Modelfile with the behaviour spec prompt embedded. Use --llama-cpp <dir> or the explicit --convert-script / --quantize-binflags when llama.cpp tools are not on your path, and--print-command to inspect the planned conversion before running it.
tt models serve starts a local OpenAI-compatible chat completions endpoint. It applies the behaviour spec prompt from tunedtensor.json by default, preserving the behaviour spec prompt used during training. Use --spec <path> to point at a specific spec, or --no-spec-prompt when you intentionally want raw model behaviour. Use --json-schema <path> to require local chat completions to satisfy a JSON Schema, with malformed responses rejected as HTTP 422 after the configured repair attempts. Add --managed when a local app needs an on-demand wrapper: the CLI keeps the model warm while requests are active, idles it down after --idle-timeout, serializes generation requests, restarts after --restart-after-requests or failed health checks, and emits JSONL request logs with latency, request size, schema validity, and a configurable --gate-field result. Use --device auto, --device cpu, --device cuda, or --device mps to control the inference device.
tt models setup-runtime installs an isolated local Python runtime for reference serving. It chooses Python 3.10-3.12, creates a managed virtual environment in the Tuned Tensor cache, and installs the serving dependencies.
Manage credits
tt balance # show credit balance and recent transactions
tt topup --amount 25 # open Stripe Checkout for a $25 top-up
tt topup --amount 25 --no-open # print the checkout URL instead
tt topup # interactive amount pickertt balance shows your current credit balance. If a run returns 402 insufficient_credits, top up and retry.
Create a spec from a local file
# Create a local spec template
tt init -n "My Spec" --model Qwen/Qwen3.5-2B
# Edit tunedtensor.json, then push to the API
tt pushLocal spec file (tunedtensor.json)
tt init scaffolds a tunedtensor.json file in the current directory. This is your source of truth — edit it, keep it in version control, and tt push whenever you want to sync it to Tuned Tensor.
The typical loop:
tt init --name "Customer Support Bot" --model Qwen/Qwen3.5-2B
# Edit tunedtensor.json — add examples, guidelines, constraints
tt push # create or update the spec on the server
tt runs estimate <spec-id> # preview rough cost and duration
tt runs start <spec-id> # kick off a run
tt runs start <spec-id> --parent-model <model-id> # continue from a completed model
tt runs diagnose <run-id> # inspect live learning progress
tt runs watch <run-id> # stream status until completeSupported spec base models are google/gemma-4-E2B-it, google/gemma-4-E4B-it, Qwen/Qwen3.5-2B, Qwen/Qwen3.5-4B, meta-llama/Llama-3.2-3B-Instruct, microsoft/Phi-4-mini-instruct, ibm-granite/granite-3.3-2b-instruct, and bigcode/starcoder2-3b. Unsupported models fail local CLI validation before the API request is sent.
Print the same list from the CLI:
tt models base
tt models base --jsonSpec validation with tt eval
tt eval validates your local tunedtensor.json. It checks required fields, confirms examples are present, warns when guidelines are missing, and checks simple constraints against example outputs. It does not call a model or the Playground API.
tt evalConfiguration
Credentials are stored in ~/.config/tuned-tensor/config.json (respects XDG_CONFIG_HOME).
API key precedence:
--api-key/-kflagTUNED_TENSOR_API_KEYenvironment variable- Stored config from
tt auth login
Global options
-k, --api-key <key>— Override stored API key-u, --base-url <url>— Custom API base URL (default: https://tunedtensor.com)--json— Output raw JSON--no-color— Disable colors-h, --help— Show command help
See Also
- Quickstart — Full workflow with CLI and REST API
- Authentication — API keys and sessions
- Behaviour Specs — Schema and API reference