Product update

A new labeling loop for training data

Jun 14, 2026 · 3 min read

Raw rows should not be the hard part

The fastest fine-tuning loop still stalls if the useful data is sitting in an unlabeled CSV. The new labeling workflow gives that step a dedicated path inside Tuned Tensor: upload raw rows, generate candidate outputs, review them, and promote approved examples into a training dataset.

It is designed for the messy middle between having examples and having a model. You can keep the behaviour spec close to the dataset, inspect generated labels before they reach a training run, and move only reviewed rows forward.

We recently ran that loop on a small finance-news corpus from Hugging Face. The goal was not to publish another pile of raw article text. It was to turn a manageable batch of market headlines and descriptions into reviewed structured labels that could support a finance-specific fine-tune.

The loop

Upload

Bring in unlabeled rows without hand-writing every target output first.

Generate

Use the labeling pipeline to draft structured candidate labels.

Review

Approve, edit, or reject rows before they become training data.

Promote

Turn reviewed labels into examples that can feed the next run.

A real finance-news pass

We selected 1,000 finance-news rows from a roughly five-thousand-row source dataset, labeled them against a market-analysis spec, reviewed the outputs, and promoted every accepted row into a training dataset.

Rows accepted: 1,000

Settled cost: $6.48

Teacher tokens: 651k prompt / 129k completion
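The three figures above imply a per-row cost and a blended token rate. The arithmetic below is a quick sketch: the token counts are the rounded values from this post, so the derived numbers are approximate, and the blended rate is a back-of-envelope figure rather than a published price.

```python
# Figures reported in this post for the finance-news pass.
rows_accepted = 1_000
settled_cost_usd = 6.48
prompt_tokens = 651_000      # "651k", rounded in the post
completion_tokens = 129_000  # "129k", rounded in the post

total_tokens = prompt_tokens + completion_tokens
cost_per_row = settled_cost_usd / rows_accepted
tokens_per_row = total_tokens / rows_accepted
# Derived figure only: prompt and completion tokens are likely priced differently.
blended_cost_per_million = settled_cost_usd / total_tokens * 1_000_000

print(f"cost per row: ${cost_per_row:.5f}")
print(f"teacher tokens per row: {tokens_per_row:.0f}")
print(f"blended cost per 1M tokens: ${blended_cost_per_million:.2f}")
```

On these figures the pass works out to roughly $0.0065 per accepted row and about 780 teacher tokens per row.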

The spec asked the teacher to return strict JSON with a summary, sentiment, market relevance, asset class, entities, tickers, risk flags, time horizon, actionability, and rationale. That made review much easier than reviewing free prose, because every row had the same shape.

{
  "summary": "TotalEnergies raises dividend and announces a $2B buyback after record 2022 profit on surging energy prices.",
  "sentiment": "positive",
  "market_relevance": "high",
  "asset_class": "equities",
  "tickers": ["TTE"],
  "risk_flags": ["dividend", "earnings"],
  "time_horizon": "short_term",
  "actionability": "trade_relevant",
  "rationale": "Record profit, dividend hike, and buyback announcement are direct catalysts for TotalEnergies equity price movement."
}
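Because every row shares one shape, a quick structural check can run before human review. The sketch below is an illustration, not part of the Tuned Tensor tooling: the key list comes from the spec description above, while the type rules (three list fields, everything else a non-empty string) are assumptions inferred from the sample row.

```python
import json

# Keys listed in the spec description above.
REQUIRED_KEYS = {
    "summary", "sentiment", "market_relevance", "asset_class", "entities",
    "tickers", "risk_flags", "time_horizon", "actionability", "rationale",
}
# Assumption: these three fields are lists of strings, as in the sample row.
LIST_KEYS = {"entities", "tickers", "risk_flags"}


def check_label(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    try:
        label = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(label, dict):
        return ["top-level value is not an object"]

    present = set(label)
    problems = []
    for key in sorted(REQUIRED_KEYS - present):
        problems.append(f"missing key: {key}")
    for key in sorted(present - REQUIRED_KEYS):
        problems.append(f"unexpected key: {key}")
    for key in sorted(LIST_KEYS & present):
        value = label[key]
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{key} should be a list of strings")
    for key in sorted((REQUIRED_KEYS - LIST_KEYS) & present):
        value = label[key]
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{key} should be a non-empty string")
    return problems
```

Run against the sample row above, which has no entities field, this check would report that one missing key. A reviewer can then decide whether to edit the row or tighten the spec.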

Use it from the CLI

The dashboard is useful for review, but the whole labeling path is scriptable with tt label. Start with a behaviour spec and an unlabeled file. JSONL rows need an input field; CSV files need a column that holds the input text. Make sure the CLI is current: labeling commands are available in @tuned-tensor/cli 0.4.20 and later.
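If the raw rows start life in a spreadsheet export, a small script can wrap each one in the input field that JSONL uploads expect. The sketch below assumes hypothetical title and description columns and reuses the instruction text from the JSONL example in this post; adjust both to match your own file and spec.

```python
import csv
import io
import json

# Instruction text copied from the JSONL example in this post.
INSTRUCTION = (
    "Label this finance-news row. Return only strict JSON with keys summary, "
    "sentiment, market_relevance, asset_class, entities, tickers, risk_flags, "
    "time_horizon, actionability, and rationale."
)


def rows_to_jsonl(csv_text: str) -> str:
    """Turn CSV text with title/description columns into JSONL upload rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        # "title" and "description" are hypothetical column names.
        title = (row.get("title") or "").strip()
        description = (row.get("description") or "").strip()
        if not title and not description:
            continue  # nothing to label in this row
        prompt = f"{INSTRUCTION}\n\nTitle: {title}\nDescription: {description}"
        lines.append(json.dumps({"input": prompt}, ensure_ascii=False))
    return "\n".join(lines) + ("\n" if lines else "")


sample_csv = (
    "title,description\n"
    '"Energy company raises dividend","Record profit reported"\n'
    ",\n"
)
print(rows_to_jsonl(sample_csv), end="")
```

Keeping the instruction in one constant means every row carries an identical prompt, which makes drafted labels easier to compare during review.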

# JSONL input: one raw task per line
{"input":"Label this finance-news row. Return only strict JSON with keys summary, sentiment, market_relevance, asset_class, entities, tickers, risk_flags, time_horizon, actionability, and rationale.\n\nTitle: <market headline>\nDescription: <licensed or internal source text>"}

# Or CSV input with a text column, for example: news_text
news_text,source_domain,row_id
"Energy company raises dividend after record profit...",example.com,row_001

Install or update the CLI, check that you are signed in, and look up the spec ID. Then upload the raw rows under that spec. With --watch, the CLI stays attached until the teacher has drafted labels; without it, the job keeps running in the cloud.

npm install -g @tuned-tensor/cli@latest
tt --version
tt auth status
tt specs list

tt label upload finance-news-selected-1000.jsonl \
  --spec <spec-id> \
  --name krosskinetic-finance-news-market-labels-1000 \
  --watch

Labeling places a credit hold based on the estimate, then settles against actual teacher-token usage. If an individual row exceeds the input limit, that row is marked failed while valid rows continue through the job.
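Since oversized rows fail individually, it can be worth scanning a JSONL file before uploading it. The sketch below is a local pre-flight check, not a CLI feature. This post does not state the actual input limit or its unit, so max_input_chars is a hypothetical character threshold you would set yourself.

```python
import json


def preflight(jsonl_text: str, max_input_chars: int) -> dict[str, list[int]]:
    """Group 0-based row indexes (blank lines skipped) by likely outcome."""
    report = {"ok": [], "missing_input": [], "too_long": [], "bad_json": []}
    lines = [line for line in jsonl_text.splitlines() if line.strip()]
    for index, line in enumerate(lines):
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            report["bad_json"].append(index)
            continue
        text = row.get("input") if isinstance(row, dict) else None
        if not isinstance(text, str) or not text.strip():
            # JSONL rows need an input field.
            report["missing_input"].append(index)
        elif len(text) > max_input_chars:
            # Hypothetical limit: the real one may be measured in tokens.
            report["too_long"].append(index)
        else:
            report["ok"].append(index)
    return report


sample = "\n".join([
    json.dumps({"input": "short headline"}),
    json.dumps({"text": "wrong field name"}),
    json.dumps({"input": "x" * 50}),
])
print(preflight(sample, max_input_chars=20))
```

Rows flagged here can be trimmed or dropped locally, so the job spends its time on rows that are likely to produce labels.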

After labeling finishes, review the drafted outputs. Accept the rows that are good, edit the rows that need a sharper target, and reject anything that should not enter training.

tt label rows <job-id> --status labeled
tt label accept <job-id> --rows 0,1,4
# Placeholder payload: replace it with corrected JSON in the shape your spec defines.
tt label edit <job-id> --row 3 --output '{"category":"billing","priority":"high","action":"reply"}'
tt label reject <job-id> --rows 8,13

# For a clean first pass, accept every unreviewed labeled row.
tt label accept <job-id> --all
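When review decisions are recorded somewhere else first, such as a spreadsheet or a script, they can be grouped into the accept and reject commands shown above. The helper below is a hypothetical sketch: the decisions mapping is an assumed structure, and it only builds command strings using the --rows form from this post. Rows marked for editing are left out because each edit needs its own corrected output.

```python
def review_commands(job_id: str, decisions: dict[int, str]) -> list[str]:
    """Build one accept and one reject command from per-row decisions."""
    commands = []
    for verb in ("accept", "reject"):
        rows = sorted(index for index, choice in decisions.items() if choice == verb)
        if rows:
            row_list = ",".join(str(index) for index in rows)
            commands.append(f"tt label {verb} {job_id} --rows {row_list}")
    return commands


# Hypothetical decisions keyed by row index, mirroring the commands above.
decisions = {0: "accept", 1: "accept", 3: "edit", 4: "accept", 8: "reject", 13: "reject"}
for command in review_commands("<job-id>", decisions):
    print(command)
```

Printing the commands rather than running them leaves a chance to read the row lists before anything changes.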

Promote reviewed rows into a validated dataset, then use that dataset in a normal training run.

tt label promote <job-id> --name "KrossKinetic finance news market labels selected 1000"
tt datasets list
tt runs estimate <spec-id> --dataset <dataset-id>
tt runs start <spec-id> --dataset <dataset-id>
tt runs diagnose <run-id>

Why it matters

Fine-tuning gets better when the data pipeline is part of the product, not a side quest in a notebook. The labeling loop makes it easier to go from a vague pile of raw task examples to a reviewed dataset that the rest of the Tuned Tensor run system can train and evaluate.

The finance-news run is the shape we like: small enough to inspect, large enough to matter, costed clearly, promoted only after review, and packaged with the spec that created it.

It also gives future improvements a clean home: better label generation, review shortcuts, quality checks, and promotion rules can all build on the same workflow.
