Skip to main content

Fine-tuning platforms

Dated content — June 2026

This page names specific tools, models, and prices, which rotate quarterly. The selection logic is durable; the names are a snapshot. Cross-check the Model snapshot for current model names and pricing.

In one line: Where you actually run the training job. Pick after you've decided fine-tuning is the right tool — most of the time, it isn't.

In plain English

Fine-tuning takes a base model and continues training it on your data so it gets better at your specific task. It's not magic: it can't teach the model facts (that's RAG), it doesn't fix bad prompts, and it costs more time and money than people expect. When it works, it's transformative — a small specialized model that beats a frontier model on your one task, for 1/10th the cost per call. When it doesn't, you've spent two weeks for nothing. The platforms here are the how; deciding whether is the bigger question.

→ Going deeper: This page is the platform layer. For the method — the decision tree, data prep, LoRA vs full SFT, and how to evaluate a fine-tune — see Chapter 7: Fine-tuning & Customization, starting with When to fine-tune and LoRA / QLoRA.

The major options (2026)

PlatformTypeModels supportedStyleBest for
OpenAI fine-tuningHostedGPT-4o, GPT-5.1 mini, o-mini linesSFT, DPO, RFTEasiest path; production-ready in days
Anthropic (via Bedrock)HostedClaude family (limited)SFTEnterprise; AWS-stack
Together AIHostedLlama, Mistral, Qwen, DeepSeekSFT, LoRA, DPOOpen-weight fine-tuning at scale
FireworksHostedOpen weightsLoRA, SFTFast iteration on open models
ReplicateHostedMany open modelsSFT, LoRAQuick and visual
ModalServerless GPUsAnything (you write the script)DIYCustom training loops, full control
Hugging Face AutoTrainHostedHF Hub modelsNo-code SFTNon-engineers; small experiments
UnslothOSS libraryLlama, Mistral, Qwen, Gemma2x faster, 70% less memoryPairs with Modal / RunPod
AxolotlOSS config-drivenMost open modelsYAML configsReproducible community recipes
PredibaseHosted (Ludwig)Many open modelsLoRA-as-a-serviceProduction LoRA at scale
Mosaic AI (Databricks)EnterpriseOpen weightsSFT, continued pre-trainingDatabricks shops
NVIDIA NeMo / NIMsSelf / NVIDIA-managedNVIDIA-curated modelsEnterprise full-stackNVIDIA enterprise

Default pick for most teams

Don't fine-tune. First exhaust prompting, structured output, RAG, and tier-routing. The frontier model is updated faster than you can re-train, and prompt iteration is hours, not days.

When you've genuinely decided to fine-tune:

  • You want the easiest path: OpenAI fine-tuning on gpt-5.1-mini for SFT, or RFT if you have graders.
  • You want an open model you can rehost cheaply: Together AI or Fireworks to fine-tune Llama / Mistral, then serve it on the same platform.
  • You want full control or weird training setups: Modal + Unsloth + Llama / Qwen / Gemma. ~$5–$50 a job; pure Python.

Fine-tuning flavors

  • SFT (Supervised Fine-Tuning){input, ideal_output} pairs. The default. Teaches the model "for inputs like this, produce outputs like this." Needs ~500+ clean examples.
  • DPO (Direct Preference Optimization){input, chosen, rejected} triples. Teaches preferences ("this answer style is better than that style") without needing a reward model.
  • RFT (Reinforcement Fine-Tuning, OpenAI 2024+) — Grader function defines the reward. Useful when you have a programmatic check ("the answer is correct iff this regex matches") but can't write the perfect output by hand.
  • LoRA / QLoRA — Adapter-based: you don't touch the full model, you train a small low-rank addition. ~100× cheaper than full fine-tuning; quality usually within 1–3% of full SFT for most tasks.
  • Continued pre-training — Take a base model and keep training on raw domain text (medical literature, legal corpora). Rare; expensive; for specialty domains only.

When fine-tuning makes sense

  • You've exhausted prompting and RAG and hit a quality ceiling.
  • You have hundreds to thousands of high-quality examples of the input → ideal-output pattern.
  • The task is narrow and stable — you're not going to need to retrain monthly.
  • Latency or cost pressure justifies a smaller specialized model (e.g. Sonnet → fine-tuned Haiku).
  • You want specific output style/format that prompting can't enforce reliably.
  • You need on-prem and want a model fully under your control.

When it doesn't

  • "The model doesn't know X." That's a RAG problem, not a fine-tune problem.
  • You have < 200 clean examples. You'll overfit; the model will get worse on anything not in the training set.
  • The task changes often. Every retrain is a new bill.
  • A larger frontier model would solve it without retraining (often does).
  • You haven't built evals yet. You can't tell if fine-tuning helped.

Minimum integration

OpenAI SFT — three steps:

import openai
client = openai.OpenAI()

# 1. Upload JSONL with {"messages": [...]} per line
training_file = client.files.create(
file=open("train.jsonl", "rb"),
purpose="fine-tune",
)

# 2. Kick off a fine-tune
job = client.fine_tuning.jobs.create(
training_file=training_file.id,
model="gpt-5.1-mini-2025-08",
method={"type": "supervised"},
)

# 3. When done, the fine-tuned model has its own ID; call it like any other.
client.chat.completions.create(model=job.fine_tuned_model, messages=[...])

Together — LoRA on Llama:

from together import Together
client = Together()

job = client.fine_tuning.create(
training_file="train.jsonl",
model="meta-llama/Llama-3.3-70B-Instruct-Reference",
lora=True,
n_epochs=3,
)
# Same shape as OpenAI; serve the resulting model on Together's endpoints.

Modal + Unsloth — DIY at half the cost:

# Modal function that runs Unsloth's training script on an A100
import modal
app = modal.App("ft")
img = modal.Image.debian_slim().pip_install("unsloth", "trl", "torch")

@app.function(image=img, gpu="A100-80GB", timeout=3600)
def train():
from unsloth import FastLanguageModel
model, tok = FastLanguageModel.from_pretrained("meta-llama/Llama-3.3-8B")
# ... write SFTTrainer config, train, push to HF hub or save to volume

Pricing & cost notes (May 2026)

PlatformTraining costInference markup
OpenAI fine-tuning~$3/Mtok training~2× base model for inference
Anthropic (Bedrock)enterprise pricingusually ~1.5–2× base
Together~$0.40/Mtok training (Llama 70B)same as base + small LoRA premium
Fireworks~$0.50/Mtok trainingbase + ~$0.20/Mtok
Modal + Unsloth$1.50/hr GPU; ~$5–$50 per jobself-served = your GPU cost
Predibaseusage-basedusage-based

A typical "1000-example SFT on a small model" costs $5–$50 in training; the real cost is your engineering time and the per-call inference surcharge after.

Pitfalls

  • Fine-tuning before evals. You can't measure success. Always build the eval set first.
  • Fine-tuning to teach facts. Use RAG. Fine-tuning bakes in patterns, not knowledge.
  • Dirty training data. One mis-labeled example per fifty is the difference between a great model and a confidently-wrong one. Spend time on data quality.
  • Too few examples. Below ~200 you're more likely to overfit than to learn.
  • No held-out set. Train and evaluate on the same data = you have a memorizer, not a generalizer.
  • Fine-tuning a model you can't roll back. Always keep the base-model call wired up so you can A/B and revert.
  • Catastrophic forgetting. The model gets great at your task and worse at general reasoning. Mix in some general examples or evaluate broadly.
  • Forgetting that frontier models keep getting better. A fine-tune that won by 10% over GPT-5.1 today may be tied by GPT-5.2 in six months. Re-check periodically.
  • Hand-rolling distributed training. Modal, Together, Unsloth — they exist so you don't have to. Don't reinvent.
🤔 Quick checkQuick check

→ Next: Stack checkpoint