Tuned Tensor
DocsDashboard

CLI Tool

The tt CLI lets you manage behaviour specs, runs, models, and datasets from the command line — without writing API calls. It is the recommended way to use Tuned Tensor.

Open source (MIT). Source and issues at github.com/tunedtensor/tuned-tensor-cli. Published to npm as @tuned-tensor/cli.

Installation

Install globally via npm:

npm install -g @tuned-tensor/cli

Verify the installation with tt --version.

Or run from source:

git clone https://github.com/tunedtensor/tuned-tensor-cli.git
cd tuned-tensor-cli
npm install
npm run build
npm link

Authentication

Store your API key so you don't need to pass it on every command:

tt auth login
# or:
tt auth login <api-key>

Check your auth status:

tt auth status

Other auth commands:

  • tt auth logout — Remove stored credentials
  • tt -k <api-key> <command> — Pass API key inline (overrides stored key)
  • tt -u https://your-api.example.com <command> — Use a custom base URL (e.g. for local dev)

Commands Overview

CommandDescription
tt authManage authentication (login, logout, status)
tt specsList, create, get, update, delete behaviour specs
tt runsList runs, start a run, get details, diagnose, cancel, watch
tt datasetsList and manage datasets
tt modelsList supported base models, inspect, download, serve, export, and delete fine-tuned models
tt balanceShow credit balance and recent transactions
tt topupOpen Stripe Checkout to add credits
tt initCreate a local behaviour spec file (tunedtensor.json)
tt pushPush local spec to the Tuned Tensor API
tt evalValidate a local behaviour spec file

Quick Examples

List specs and runs

tt specs list
tt runs list

Start a run

tt runs estimate <spec-id>
tt runs start <spec-id>
tt runs start <spec-id> --epochs 5 --lr 0.0001 --batch-size 8
tt runs start <spec-id> --dataset <dataset-id-or-prefix> --train-ratio 0.8 --validation-ratio 0.1 --test-ratio 0.1
tt runs start <spec-id> --dataset <dataset-id-or-prefix> --long-examples truncate --max-seq-length 4096
tt runs start <spec-id> --dataset <dataset-id-or-prefix> --max-output-tokens 512 --eval-reserved-output-tokens 128
tt runs start <spec-id> --parent-model <model-id-or-prefix>
tt runs start <spec-id> --no-llm-judge

Use tt runs estimate <spec-id> before starting to preview rough cost and wall-clock duration from completed run history.

Use tt runs watch <run-id> to poll until the run completes.

For live learning progress, use tt runs diagnose <run-id>. It reports recent epoch, loss, pace, and estimated remaining training time through the Tuned Tensor API.

Spec, run, dataset, and model commands accept full UUIDs or unambiguous ID prefixes of at least four characters.

Use uploaded datasets and evaluation caps

Pass --dataset <dataset-id-or-prefix> to train from an uploaded dataset instead of inline spec examples. Add --train-ratio, --validation-ratio, and --test-ratio to override the default 80/10/10 split.

Use --max-eval-examples <n> and --max-test-eval-examples <n> to cap primary and secondary test evaluation passes for larger datasets. The runs backend still clamps values to its configured ceiling.

For long uploaded rows, use --long-examples error|truncate|skip to choose the training preflight policy. Advanced runs can also pass --max-seq-length <tokens> when the selected model and training instance can support a larger sequence length, and --max-output-tokens <tokens> / --eval-reserved-output-tokens <tokens> to tune eval output budgets for long-response tasks.

Use --no-llm-judge to opt out of LLM judging for a new run.

Download and serve a model locally

tt models download <model-id> --output model.tar.gz

# One-time setup for local reference serving
tt models setup-runtime

# Serve by model ID, archive, or extracted model directory
tt models serve <model-id> --spec tunedtensor.json
tt models serve model.tar.gz --spec tunedtensor.json
tt models serve ./models/my-model --spec tunedtensor.json --device mps

# Enforce JSON output for local chat completions
tt models serve ./models/my-model --spec tunedtensor.json --json-schema schema.json

# Managed serving starts on demand, idles down, serializes requests, and logs JSONL
tt models serve <model-id> --spec tunedtensor.json --managed \
  --idle-timeout 300 \
  --restart-after-requests 100 \
  --json-schema schema.json

# Export downloadable weights to GGUF and optionally package them for Ollama
tt models export <model-id> --format gguf --quant q4_k_m --ollama
tt models export <model-id> --quant q8_0 --ollama --print-command

tt models download downloads models that have a Tuned Tensor-hosted artifact. In interactive terminals it shows download progress, transfer rate, and ETA; --json output remains machine-readable. Hosted models can still be used for inference through their model ID, but may not expose downloadable weights.

tt models export converts downloadable model weights to GGUF with llama.cpp and can write an Ollama Modelfile with the behaviour spec prompt embedded. Use --llama-cpp <dir> or the explicit --convert-script / --quantize-binflags when llama.cpp tools are not on your path, and--print-command to inspect the planned conversion before running it.

tt models serve starts a local OpenAI-compatible chat completions endpoint. It applies the behaviour spec prompt from tunedtensor.json by default, preserving the behaviour spec prompt used during training. Use --spec <path> to point at a specific spec, or --no-spec-prompt when you intentionally want raw model behaviour. Use --json-schema <path> to require local chat completions to satisfy a JSON Schema, with malformed responses rejected as HTTP 422 after the configured repair attempts. Add --managed when a local app needs an on-demand wrapper: the CLI keeps the model warm while requests are active, idles it down after --idle-timeout, serializes generation requests, restarts after --restart-after-requests or failed health checks, and emits JSONL request logs with latency, request size, schema validity, and a configurable --gate-field result. Use --device auto, --device cpu, --device cuda, or --device mps to control the inference device.

tt models setup-runtime installs an isolated local Python runtime for reference serving. It chooses Python 3.10-3.12, creates a managed virtual environment in the Tuned Tensor cache, and installs the serving dependencies.

Manage credits

tt balance              # show credit balance and recent transactions
tt topup --amount 25    # open Stripe Checkout for a $25 top-up
tt topup --amount 25 --no-open  # print the checkout URL instead
tt topup                # interactive amount picker

tt balance shows your current credit balance. If a run returns 402 insufficient_credits, top up and retry.

Create a spec from a local file

# Create a local spec template
tt init -n "My Spec" --model Qwen/Qwen3.5-2B

# Edit tunedtensor.json, then push to the API
tt push

Local spec file (tunedtensor.json)

tt init scaffolds a tunedtensor.json file in the current directory. This is your source of truth — edit it, keep it in version control, and tt push whenever you want to sync it to Tuned Tensor.

The typical loop:

tt init --name "Customer Support Bot" --model Qwen/Qwen3.5-2B

# Edit tunedtensor.json — add examples, guidelines, constraints

tt push                    # create or update the spec on the server
tt runs estimate <spec-id> # preview rough cost and duration
tt runs start <spec-id>    # kick off a run
tt runs start <spec-id> --parent-model <model-id>  # continue from a completed model
tt runs diagnose <run-id>  # inspect live learning progress
tt runs watch <run-id>     # stream status until complete

Supported spec base models are google/gemma-4-E2B-it, google/gemma-4-E4B-it, Qwen/Qwen3.5-2B, Qwen/Qwen3.5-4B, meta-llama/Llama-3.2-3B-Instruct, microsoft/Phi-4-mini-instruct, ibm-granite/granite-3.3-2b-instruct, and bigcode/starcoder2-3b. Unsupported models fail local CLI validation before the API request is sent.

Print the same list from the CLI:

tt models base
tt models base --json

Spec validation with tt eval

tt eval validates your local tunedtensor.json. It checks required fields, confirms examples are present, warns when guidelines are missing, and checks simple constraints against example outputs. It does not call a model or the Playground API.

tt eval

Configuration

Credentials are stored in ~/.config/tuned-tensor/config.json (respects XDG_CONFIG_HOME).

API key precedence:

  1. --api-key / -k flag
  2. TUNED_TENSOR_API_KEY environment variable
  3. Stored config from tt auth login

Global options

  • -k, --api-key <key> — Override stored API key
  • -u, --base-url <url> — Custom API base URL (default: https://tunedtensor.com)
  • --json — Output raw JSON
  • --no-color — Disable colors
  • -h, --help — Show command help

See Also