Specify behaviour, fine-tune open-weight models, and use evaluation feedback to start the next run.
Tuned Tensor turns a behaviour spec into a tuned open-weight model, then uses evaluation feedback to improve the next run.
01
Rules, constraints, examples, and base model.
02
Compile training data and tune an open-weight model.
03
Score outputs, inspect failures, and catch regressions.
04
Use AI feedback to improve the spec and start the next run.
Auto-tune can repeat the loop until the target score is reached or the iteration limit is hit.
Use case story
We fine-tuned Qwen 3.5 2B to turn raw emails into structured category, priority, and next-action decisions, then measured the gains against the base model on validation and held-out test examples.
Email triage, from spec to local model
A focused fine-tune measured against baseline behaviour.
Dataset
9,882 rows
Validation pass rate
41.7% -> 75.0%
Test avg score
0.623 -> 0.829
The article includes the behaviour spec, dataset shape, run command, evaluation results, lessons learned, and local serving notes.
Start from small open-weight models that are ready for managed LoRA fine-tuning.
google/gemma-4-E2B-it
google/gemma-4-E4B-it
Qwen/Qwen3.5-2B
Qwen/Qwen3.5-4B
meta-llama/Llama-3.2-3B-Instruct
microsoft/Phi-4-mini-instruct
ibm-granite/granite-3.3-2b-instruct
bigcode/starcoder2-3b
Explore the documentation to see how behaviour specs, runs, and evaluations work under the hood.