Fine-tune an open model. See if it got better.

Tuned Tensor runs the baseline and fine-tuned candidate on the same representative examples, then reports score changes and regressions—not just training loss. Run it locally on Linux with a compatible NVIDIA GPU, or use managed cloud training.

Apache-2.0 local runner · no Tuned Tensor account required · run artifacts are stored on your disk

TT Local

NVIDIA / Linux

npm install -g @tuned-tensor/local
tt-local init --model Qwen/Qwen3.5-2B
# replace the placeholder and add representative examples
tt-local doctor
tt-local run

Node 22+, uv, and a compatible NVIDIA GPU with sufficient VRAM are required for bundled training. Read the setup guide.

Evaluation built into the run

Know whether your fine-tune beats the baseline.

Training loss measures fit to the training objective, not performance on your task. Tuned Tensor compares base and tuned outputs on representative evaluation cases. TT Local creates a holdout when the spec has enough examples; for prebuilt data, it requires a test or validation split by default.
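The holdout idea can be sketched in a few lines. The source does not say how TT Local selects its holdout, so this is only one common approach, shown under stated assumptions: a deterministic hash-based split, with hypothetical names (`is_holdout`, `split_examples`), a hypothetical `id` field, and an assumed 20% holdout fraction.

```python
import hashlib


def is_holdout(example_id: str, holdout_fraction: float = 0.2) -> bool:
    """Map an example ID to a stable bucket in [0, 1) and compare it to the fraction."""
    digest = hashlib.sha256(example_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < holdout_fraction


def split_examples(examples, holdout_fraction: float = 0.2):
    """Partition examples into (train, holdout) without relying on list order."""
    train, holdout = [], []
    for example in examples:
        target = holdout if is_holdout(example["id"], holdout_fraction) else train
        target.append(example)
    return train, holdout


# Hypothetical usage: the "id" field is an assumption, not part of the spec format.
examples = [{"id": f"case-{i}"} for i in range(100)]
train, holdout = split_examples(examples)
print(len(train), len(holdout))
```

Hashing the ID rather than shuffling keeps each example on the same side of the split across runs, so evaluation cases never drift into the training set when new examples are added.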

Baseline and tuned, side by side

Evaluate the same cases with both the baseline and the fine-tuned candidate.

Metrics tied to the task

TT Local measures pass rate, average score, exact match, or field-level JSON accuracy.

Regressions you can inspect

See which examples became worse and compare the expected, base, and tuned outputs.
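As a rough illustration of what a regression list involves, the sketch below filters per-case results for examples where the tuned score fell below the base score. The `CaseResult` record and its field names are hypothetical and do not reflect TT Local's report schema.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    # Hypothetical per-case record; the real report format may differ.
    case_id: str
    expected: str
    base_output: str
    tuned_output: str
    base_score: float
    tuned_score: float


def find_regressions(results):
    """Return cases where the tuned model scored lower, worst drop first."""
    regressions = [r for r in results if r.tuned_score < r.base_score]
    return sorted(regressions, key=lambda r: r.tuned_score - r.base_score)


results = [
    CaseResult("a", "urgent", "urgent", "urgent", 1.0, 1.0),
    CaseResult("b", "spam", "spam", "normal", 1.0, 0.0),
    CaseResult("c", "normal", "spam", "normal", 0.0, 1.0),
]
for r in find_regressions(results):
    print(r.case_id, r.expected, r.base_output, r.tuned_output)
```

An aggregate score can rise while individual cases get worse, which is why the per-case view matters alongside the headline numbers.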

An auditable run record

Keep the spec, selected evaluation cases, model artifact, events, and JSON report together.

Choose normalized exact match, field-level JSON scoring, or an optional LLM judge in TT Local. Reports summarise evaluation on the cases you provide; they do not guarantee production performance.
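To make the metric names concrete, here is a minimal sketch of how such scores are commonly computed. The normalization rules (trim, lowercase, collapse whitespace), the pass threshold, and all function names are assumptions for illustration; TT Local's own definitions may differ.

```python
import json


def normalize(text: str) -> str:
    """Assumed normalization: trim, lowercase, and collapse runs of whitespace."""
    return " ".join(text.strip().lower().split())


def exact_match(expected: str, actual: str) -> float:
    return 1.0 if normalize(expected) == normalize(actual) else 0.0


def json_field_accuracy(expected: str, actual: str) -> float:
    """Fraction of expected top-level fields that the output reproduces exactly."""
    expected_obj = json.loads(expected)
    try:
        actual_obj = json.loads(actual)
    except json.JSONDecodeError:
        return 0.0  # unparseable output scores zero
    if not isinstance(actual_obj, dict) or not expected_obj:
        return 0.0
    hits = sum(
        1 for key, value in expected_obj.items()
        if key in actual_obj and actual_obj[key] == value
    )
    return hits / len(expected_obj)


def summarize(scores, pass_threshold: float = 1.0):
    """Aggregate per-case scores into pass rate and average score."""
    if not scores:
        return {"pass_rate": 0.0, "average_score": 0.0}
    passed = sum(1 for s in scores if s >= pass_threshold)
    return {"pass_rate": passed / len(scores), "average_score": sum(scores) / len(scores)}


print(exact_match("  Urgent ", "urgent"))
print(json_field_accuracy('{"label": "spam", "priority": 2}', '{"label": "spam", "priority": 3}'))
print(summarize([1.0, 0.5, 1.0, 0.0]))
```

Pass rate and average score answer different questions: the first counts cases that fully meet the bar, while the second gives partial credit, so the two can move by different amounts.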

A test-driven workflow

Define, train, compare, decide.

Your behaviour spec defines what success looks like. The run report shows how the tuned model performed against the baseline model.

Define the task

Write the behaviour, examples, constraints, and base model in tunedtensor.json.

Evaluate the baseline

Run the baseline model against representative evaluation cases.

Fine-tune

Train a LoRA adapter with SFT. TT Local also supports DPO for text models.
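For readers new to LoRA, the arithmetic below shows why an adapter is much smaller than the layer it modifies: instead of updating a full weight matrix, LoRA trains two low-rank factors. The layer size and rank are illustrative assumptions, not TT Local defaults.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int):
    """Compare trainable parameters: full matrix vs. two low-rank factors."""
    full = d_out * d_in              # every weight in the original matrix
    adapter = rank * (d_out + d_in)  # B is d_out x rank, A is rank x d_in
    return full, adapter


# Assumed example: one 2048 x 2048 projection with rank 16.
full, adapter = lora_param_counts(2048, 2048, 16)
print(full, adapter, adapter / full)
```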

Evaluate the tuned model

Use the same cases, generation settings, and scoring criteria.
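A like-for-like comparison can be sketched as a small harness that feeds both models identical inputs, settings, and scoring. Everything here is hypothetical: `evaluate_pair`, the case fields, and the stand-in generate functions are illustrations, not TT Local's API.

```python
def evaluate_pair(cases, base_generate, tuned_generate, score, settings):
    """Run both models on each case with one shared settings dict and one scorer."""
    frozen = dict(settings)  # copy so neither model can mutate the shared settings
    rows = []
    for case in cases:
        base_out = base_generate(case["input"], dict(frozen))
        tuned_out = tuned_generate(case["input"], dict(frozen))
        rows.append({
            "id": case["id"],
            "expected": case["expected"],
            "base_output": base_out,
            "tuned_output": tuned_out,
            "base_score": score(case["expected"], base_out),
            "tuned_score": score(case["expected"], tuned_out),
        })
    return rows


# Stand-in models and scorer, for illustration only.
cases = [{"id": "1", "input": "Invoice overdue", "expected": "billing"}]
rows = evaluate_pair(
    cases,
    base_generate=lambda text, cfg: "general",
    tuned_generate=lambda text, cfg: "billing",
    score=lambda expected, actual: float(expected == actual),
    settings={"temperature": 0.0, "max_new_tokens": 16},
)
print(rows[0]["base_score"], rows[0]["tuned_score"])
```

Holding the cases, settings, and scorer fixed means any score difference can be attributed to the fine-tune rather than to a change in the evaluation itself.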

Review the result

Inspect score deltas and failures before you keep, iterate, or ship the model.

Choose your compute

Choose local or managed training.

Both paths begin with the same core behaviour fields and compare the baseline and tuned models. What changes is who operates the training infrastructure.

Local-first

TT Local

Run on hardware you control

The Apache-2.0 runner trains and evaluates small models on Linux with a compatible NVIDIA GPU and sufficient VRAM, including NVIDIA DGX Spark. TT Local requires no Tuned Tensor account and does not wait for a hosted GPU.

No Tuned Tensor account or hosted GPU queue
Bundled Transformers/PEFT training, plus hooks for custom backends
Specs, adapters, run events, and reports stored as local files

Set up TT Local · GitHub

Base-model downloads use Hugging Face unless already cached. OpenRouter is optional for teacher labeling and LLM-judge scoring.

Managed Tuned Tensor

We manage the training infrastructure

Use managed Tuned Tensor when you do not want to maintain a local GPU environment. Start times depend on available managed GPU capacity.

No local CUDA environment to maintain
Managed training, teacher labeling, model lineage, and auto-tune
Downloadable model artifacts and side-by-side run reports

Use managed cloud · Cloud pricing

Best fit

Tasks where “better” can be tested.

Tuned Tensor is designed for focused application tasks with clear success criteria, not frontier-scale model research. If you can provide representative examples and score the outputs, you can decide whether the fine-tune is worth using.

Structured extraction into strict JSON

Classification, safety, and triage

Routing and tool-selection decisions

Format, tone, and policy adherence

Measured on held-out data

Email triage: test pass rate rose 27.5 percentage points.

We fine-tuned Qwen 3.5 2B on 8,000 public training examples. On a 200-example held-out test sample, pass rate improved from 61.5% to 89.0% and average score improved from 0.537 to 0.862.
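The headline figure can be checked with simple arithmetic. The pass counts below are not reported in the source; they are derived here from the published percentages and the 200-case sample size.

```python
cases = 200
base_rate, tuned_rate = 0.615, 0.890

# Derived counts (an inference from the percentages, not reported figures).
base_passed = round(base_rate * cases)
tuned_passed = round(tuned_rate * cases)

# Percentage-point change in pass rate, computed from whole-case counts.
delta_points = 100 * (tuned_passed - base_passed) / cases

# Change in average score, rounded to the precision the page reports.
score_delta = round(0.862 - 0.537, 3)

print(base_passed, tuned_passed, delta_points, score_delta)
# prints: 123 178 27.5 0.325
```

The change is stated in percentage points (89.0 minus 61.5), which is different from a relative improvement over the base pass rate.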

Read the full evaluation · Inspect the spec

These results are specific to this task and evaluation setup. Test on data that reflects your production workload.

Base pass rate: 61.5%

Tuned pass rate: 89.0%

Pass-rate change: +27.5 pts

Held-out cases: 200

Compare your fine-tune with the model you started from.

Define the task, train a small open-weight model, and get a base-vs-tuned evaluation report before you decide to ship it.

View the local quickstart · Read the local setup guide