Behaviour control infrastructure

Specify behaviour, fine-tune open-weight models, and use evaluation feedback to start the next run.

npm i -g @tuned-tensor/cli
tt init
tt runs start <spec-id>

Open-source CLI (MIT licence) on GitHub. An agent-readable workflow is available at /skill.md.

A closed loop for model behaviour

Tuned Tensor turns a behaviour spec into a tuned open-weight model, then uses evaluation feedback to improve the next run.

01 Specify behaviour

Rules, constraints, examples, and base model.

02 Fine-tune

Compile training data and tune an open-weight model.

03 Evaluate

Score outputs, inspect failures, and catch regressions.

04 Auto-tune loop

Use AI feedback to improve the spec and start the next run.
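As a rough illustration of what "catch regressions" in step 03 can mean, the sketch below compares per-example scores from two runs and flags the examples that got worse. The function name, the score dictionaries, and the 0.05 tolerance are all hypothetical; this is not Tuned Tensor's evaluation format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regression:
    example_id: str
    before: float
    after: float


def find_regressions(baseline, candidate, tolerance=0.05):
    """Return examples whose score dropped by more than `tolerance`.

    `baseline` and `candidate` map example IDs to scores in [0, 1].
    Both the data shape and the tolerance are illustrative assumptions.
    """
    regressions = []
    for example_id, before in baseline.items():
        after = candidate.get(example_id)
        if after is None:
            continue  # example was not scored in the candidate run
        if before - after > tolerance:
            regressions.append(Regression(example_id, before, after))
    # Largest drops first, so the worst failures are inspected first.
    return sorted(regressions, key=lambda r: r.after - r.before)


# Made-up scores for three examples across two runs.
baseline_scores = {"ex-1": 0.90, "ex-2": 0.40, "ex-3": 0.75}
candidate_scores = {"ex-1": 0.60, "ex-2": 0.85, "ex-3": 0.74}

regressions = find_regressions(baseline_scores, candidate_scores)
for r in regressions:
    print(f"{r.example_id}: {r.before:.2f} -> {r.after:.2f}")
```

Comparing per example rather than on the average matters here: in the made-up data the average score goes up while one example still gets clearly worse.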

Each run preserves a versioned record of:
Spec version
Run report
AI feedback
Next iteration

Auto-tune can repeat the loop until the target score is reached or the iteration limit is hit.
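The stopping rule described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the CLI's implementation: `run_and_score` and `revise_spec` are hypothetical callables standing in for a fine-tune-plus-evaluation run and for the AI feedback step.

```python
from typing import Callable, List, Tuple


def auto_tune(
    spec: str,
    run_and_score: Callable[[str], float],
    revise_spec: Callable[[str, float], str],
    target_score: float,
    max_iterations: int,
) -> List[Tuple[str, float]]:
    """Repeat run -> evaluate -> revise until one stop condition holds.

    Returns the (spec, score) pair for every run that was executed.
    """
    history: List[Tuple[str, float]] = []
    for _ in range(max_iterations):      # stop condition 2: iteration limit
        score = run_and_score(spec)
        history.append((spec, score))
        if score >= target_score:        # stop condition 1: target reached
            break
        spec = revise_spec(spec, score)  # feedback feeds the next run
    return history


# Toy stand-ins: each revision appends "+" and adds 0.1 to the score.
def toy_run(spec: str) -> float:
    return (50 + 10 * spec.count("+")) / 100


def toy_revise(spec: str, score: float) -> str:
    return spec + "+"


history = auto_tune("base", toy_run, toy_revise, target_score=0.8, max_iterations=10)
for spec_version, score in history:
    print(spec_version, score)
```

With these toy functions the loop stops after four runs because the target of 0.8 is reached; with `max_iterations=2` it would stop after two runs without reaching it.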

Use case story

Fine-tune a small model to triage email

We fine-tuned Qwen 3.5 2B to turn raw emails into structured category, priority, and next-action decisions, then measured the gains against the base model on validation and held-out test examples.

Email triage, from spec to local model

A focused fine-tune measured against baseline behaviour.

Dataset: 9,882 rows

Validation pass rate (base -> fine-tuned): 41.7% -> 75.0%

Test avg score (base -> fine-tuned): 0.623 -> 0.829

The article includes the behaviour spec, dataset shape, run command, evaluation results, lessons learned, and local serving notes.
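To put the headline numbers in context, the short calculation below derives the absolute and relative gains from the before and after figures quoted above. Only those four figures come from the page; the helper function is illustrative.

```python
def gain(before: float, after: float):
    """Return (absolute gain, relative gain) between two measurements."""
    absolute = after - before
    relative = absolute / before
    return absolute, relative


# Figures quoted above: base model -> fine-tuned model.
pass_abs, pass_rel = gain(41.7, 75.0)      # validation pass rate, in percent
score_abs, score_rel = gain(0.623, 0.829)  # test average score

print(f"Validation pass rate: +{pass_abs:.1f} points ({pass_rel:.1%} relative)")
print(f"Test avg score: +{score_abs:.3f} ({score_rel:.1%} relative)")
# Validation pass rate: +33.3 points (79.9% relative)
# Test avg score: +0.206 (33.1% relative)
```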

Supported base models

Start from small open-weight models that are ready for managed LoRA fine-tuning.

Model docs
google, E2B: gemma-4-E2B-it (google/gemma-4-E2B-it)

google, E4B: gemma-4-E4B-it (google/gemma-4-E4B-it)

Qwen, 2B: Qwen3.5-2B (Qwen/Qwen3.5-2B)

Qwen, 4B: Qwen3.5-4B (Qwen/Qwen3.5-4B)

meta-llama, 3B: Llama-3.2-3B-Instruct (meta-llama/Llama-3.2-3B-Instruct)

microsoft, 3.8B: Phi-4-mini-instruct (microsoft/Phi-4-mini-instruct)

ibm-granite, 2B: granite-3.3-2b-instruct (ibm-granite/granite-3.3-2b-instruct)

bigcode, 3B: starcoder2-3b (bigcode/starcoder2-3b)
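For a sense of why LoRA keeps fine-tuning light on small models like those listed above, the sketch below counts trainable parameters for a single weight matrix. LoRA freezes the original d_out x d_in matrix and trains two low-rank factors of shapes d_out x r and r x d_in. The dimensions and rank used here are hypothetical and are not taken from any listed model or from Tuned Tensor's settings.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int):
    """Compare full fine-tuning with LoRA for one weight matrix.

    Full fine-tuning updates every entry of the d_out x d_in matrix.
    LoRA trains only the two low-rank factors (d_out x rank, rank x d_in).
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Hypothetical layer: a 2048 x 2048 projection with LoRA rank 16.
full, lora = lora_param_counts(d_out=2048, d_in=2048, rank=16)
fraction = lora / full

print(f"Full fine-tune: {full:,} trainable parameters")
print(f"LoRA (r=16):    {lora:,} trainable parameters ({fraction:.2%})")
# Full fine-tune: 4,194,304 trainable parameters
# LoRA (r=16):    65,536 trainable parameters (1.56%)
```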

Want to learn more?

Explore the documentation to see how behaviour specs, runs, and evaluations work under the hood.