Which Base Model Should I Use?

Tuned Tensor supports nine small open-weight base models for managed LoRA-style behaviour tuning. Training prices currently range from $0.45 to $0.70 per 1M training tokens per epoch.
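To make the pricing unit concrete, here is a minimal sketch of the arithmetic. It assumes cost scales linearly with training tokens and with epochs, which is what "per 1M training tokens per epoch" suggests. The function name and the example dataset size are illustrative and not part of any Tuned Tensor API, and actual billing may differ.

```python
def estimate_training_cost(training_tokens: int,
                           price_per_million: float,
                           epochs: int = 1) -> float:
    """Rough training cost in USD.

    Assumption: billing is linear in tokens and epochs.
    This helper is hypothetical, not a Tuned Tensor function.
    """
    if training_tokens < 0 or epochs < 1:
        raise ValueError("training_tokens must be >= 0 and epochs >= 1")
    return training_tokens / 1_000_000 * price_per_million * epochs


# Illustrative numbers only: a 2.5M-token dataset trained for 3 epochs,
# priced at the lowest and highest tiers quoted above.
tokens = 2_500_000
epochs = 3
low = estimate_training_cost(tokens, 0.45, epochs)
high = estimate_training_cost(tokens, 0.70, epochs)
print(f"Estimated cost: ${low:.2f} to ${high:.2f}")
```

At these dataset sizes the gap between the cheapest and most expensive tier is a few dollars per run, so task fit usually matters more than price when choosing a model.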

The practical default is Qwen/Qwen3.5-2B. Start there unless you already know you need stronger reasoning, a specific model ecosystem, a permissively licensed enterprise model, or code-specialised tuning.

Quick Routing

Use Qwen 4B for better general quality, Phi-4-mini for reasoning-heavy tasks, Qwen3-VL 2B for OCR/image-to-JSON workflows, Granite 2B for business/RAG/document workflows, Gemma E2B/E4B for local deployment with multimodal or agentic flexibility, Llama 3.2 3B for broad ecosystem compatibility, and StarCoder2 3B only for code-focused tuning.
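The routing above can be sketched as a simple lookup that falls back to the default model. The model identifiers are the ones listed on this page. The need labels (such as "reasoning" or "ocr") and the function name are hypothetical, chosen only for illustration, and the split between the two Gemma sizes is an assumption based on their descriptions.

```python
DEFAULT_MODEL = "Qwen/Qwen3.5-2B"

# Hypothetical need labels mapped to the base models named on this page.
ROUTES = {
    "general_quality": "Qwen/Qwen3.5-4B",
    "reasoning": "microsoft/Phi-4-mini-instruct",
    "ocr": "Qwen/Qwen3-VL-2B-Instruct",
    "business_rag": "ibm-granite/granite-3.3-2b-instruct",
    "gemma_flexibility": "google/gemma-4-E2B-it",
    "gemma_flexibility_quality": "google/gemma-4-E4B-it",
    "llama_ecosystem": "meta-llama/Llama-3.2-3B-Instruct",
    "code": "bigcode/starcoder2-3b",
}


def pick_base_model(need=None):
    """Return the routed model for a need label, or the default if unknown."""
    return ROUTES.get(need, DEFAULT_MODEL)


print(pick_base_model())             # no specific need: use the default
print(pick_base_model("reasoning"))  # reasoning-heavy task
```

Falling back to the default for unrecognised labels mirrors the guidance to start with Qwen 2B unless a specific need is already clear.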

Model Comparison

Model	Price (per 1M training tokens per epoch)	Main strength	Use this when...	Avoid / be careful when...
`google/gemma-4-E2B-it`	$0.45	Efficient local and multimodal-capable Gemma option with reasoning, coding, function calling, native system-role support, and 128K context.	You want a small local model with strong conversational control, multimodal potential, or Google/Gemma ecosystem compatibility.	You only need plain-text business workflows; Qwen 2B or Granite 2B may be simpler defaults.
`google/gemma-4-E4B-it`	$0.70	Higher-capacity Gemma option with the same Gemma strengths as E2B, but more room for quality.	Choose it over Gemma E2B when answer quality matters more than cost or latency, especially for agentic, coding, reasoning, or multimodal-adjacent use cases.	It costs more than E2B, Qwen 2B, and Granite. Use it when the extra capability is worth it.
`Qwen/Qwen3.5-2B`	$0.45	Best cheap default: broad multilingual coverage, long context, efficient tuning, and good task-specific behaviour.	You want the safest starting point for most behaviour specs, especially classification, extraction, routing, support replies, structured output, or quick experiments.	You need maximum reasoning quality or more robust multi-step work; step up to Qwen 4B, Phi, or Gemma E4B.
`Qwen/Qwen3-VL-2B-Instruct`	$0.55	Small Qwen vision-language model for image, screenshot, and document extraction workflows with strong OCR-oriented multimodal behaviour.	Use it for OCR, document-to-JSON, screenshot understanding, chart/table extraction, or other image-input text-output tuning.	You only have text examples; Qwen 2B is cheaper and simpler for text-only behaviour specs.
`Qwen/Qwen3.5-4B`	$0.70	Stronger general-purpose Qwen option with more capacity, very long context, multilingual support, and agent/coding/reasoning orientation.	You like Qwen 2B but need better accuracy, instruction following, long-context performance, or agentic workflows and can pay the higher training cost.	You are extremely latency or cost constrained; Qwen 2B is cheaper and likely enough for narrow behaviour tuning.
`meta-llama/Llama-3.2-3B-Instruct`	$0.55	Familiar, well-supported chat/instruct model for multilingual assistant, retrieval, summarisation, rewriting, and local AI use cases.	You want broad ecosystem support, Llama compatibility, and a known instruction-tuned chat baseline.	For highly specialised reasoning or math, Phi may be better. For narrow low-cost tuning, Qwen 2B is cheaper.
`microsoft/Phi-4-mini-instruct`	$0.70	Strong small-model reasoning, especially math and logic, with 128K context for memory/compute-constrained and latency-bound settings.	Use it for logic-heavy assistants, mathy workflows, policy reasoning, rule application, or structured decisioning where reasoning quality matters.	It is in Tuned Tensor's higher price tier; it is not the first pick for simple classification or extraction.
`ibm-granite/granite-3.3-2b-instruct`	$0.45	Enterprise/business-friendly 2B model for reasoning, instruction following, RAG, extraction, classification, function calling, multilingual, and long-document tasks; Apache 2.0.	Use it for business assistants, RAG, document QA, summarisation, compliance-style workflows, structured outputs, and permissive-license needs.	For consumer-chat polish or multimodal tasks, Gemma, Qwen, or Llama may fit better.
`bigcode/starcoder2-3b`	$0.55	Code-specialist base model trained for programming languages and fill-in-the-middle code completion.	Use it when the target behaviour is code completion, code transformation, repository-specific coding style, or developer tooling.	It is not an instruction model. Do not use it for general chat or support bots unless your tuning data is very code-specific.

Recommended Default

Choose Qwen/Qwen3.5-2B for the first run of most behaviour specs. It is the default in the API docs because it is inexpensive, efficient, multilingual, and strong enough for many narrow tuning targets.

Move up only when the task makes the tradeoff clear: more general quality from Qwen 4B, better reasoning from Phi, business/document fit from Granite, OCR/image-to-JSON behaviour from Qwen3-VL, Gemma ecosystem or multimodal flexibility from Gemma, Llama compatibility from Llama 3.2, or code-specialised behaviour from StarCoder2.