Intermediate·4 min read

Fine-Tuning

Fine-tuning is the process of continuing to train a pre-trained model on a smaller, task-specific or domain-specific dataset to adapt its behavior. It

Definition

Fine-tuning is the process of continuing to train a pre-trained model on a smaller, task-specific or domain-specific dataset to adapt its behavior. It modifies the model's parameters (all or a subset) to improve performance on a target domain, task, or behavioral style.

Why Fine-Tune?

  • Base/instruct models are general-purpose — they may underperform on specialized tasks
  • Fine-tuning gives the model domain knowledge and task-specific behavior
  • More efficient than training from scratch (leverages existing pre-trained knowledge)
  • Can shape tone, format, persona, refusal behavior
  • Types of Fine-Tuning

    Full Fine-Tuning

  • All model parameters are updated during training
  • Most expressive — best performance potential
  • Requires significant GPU memory (same as pre-training the model size)
  • Risk of catastrophic forgetting (model forgets general capabilities)
  • Parameter-Efficient Fine-Tuning (PEFT)

    Fine-tune only a small subset of parameters to save compute/memory:

    | Method | Description | Trainable Params |

    |--------|-------------|-----------------|

    | LoRA | Adds low-rank decomposition matrices to attention layers | ~0.1–1% of total |

    | QLoRA | LoRA on a quantized (4-bit) base model | ~0.1–1% |

    | Prefix Tuning | Prepends trainable tokens to input | Tiny |

    | Prompt Tuning | Learns soft prompt embeddings only | Tiny |

    | Adapters | Inserts small trainable modules between layers | ~1–5% |

    Instruction Fine-Tuning (IFT / SFT)

  • Fine-tune on (instruction, response) pairs
  • Teaches the model the instruct format and helpful behavior
  • Also called Supervised Fine-Tuning (SFT)
  • Domain-Specific Fine-Tuning

  • Fine-tune on domain text (medical papers, legal documents, code)
  • Model learns domain vocabulary, conventions, and reasoning
  • Examples: BioMedLM, LegalBERT, CodeLLaMA
  • The Fine-Tuning Process

    1. Choose a base/instruct model to start from

    2. Prepare dataset: (prompt, response) pairs, typically 1K–100K examples

    3. Format using chat template: apply the model's expected instruct format

    4. Configure training: learning rate, batch size, epochs, max sequence length

    5. Train with low learning rate: typically 1e-5 to 1e-4 (much lower than pre-training)

    6. Evaluate: compare against base model on target task metrics

    7. Merge or deploy: with LoRA, merge adapter weights back into base model

    Dataset Requirements

    | Quantity | Quality | Format |

    |---------|---------|--------|

    | 1K–10K examples sufficient for format/style | High quality >> high quantity | Must match model's chat template |

    | More data needed for knowledge injection | Diverse examples generalize better | Consistent instruction style |

    LoRA: The Dominant PEFT Method

    LoRA (Low-Rank Adaptation) works by decomposing weight updates:

    `

    Original weight matrix: W (d × d) — frozen

    LoRA update: ΔW = A × B where A is (d × r), B is (r × d), r << d

    New weight at inference: W + ΔW = W + AB

    `

  • r (rank) is typically 4–64
  • Only A and B are trained (tiny vs. full W)
  • After training, merge: W_new = W + AB — no inference overhead
  • QLoRA: Fine-Tuning on Consumer Hardware

    QLoRA (Quantized LoRA):

    1. Quantize the base model to 4-bit (NF4 format)

    2. Add LoRA adapters in full precision

    3. Train only the LoRA adapters

    4. Result: Fine-tune a 70B model on a single 48GB GPU (vs. 8× 80GB GPUs for full fine-tuning)

    Common Fine-Tuning Platforms

    | Platform | Notes |

    |----------|-------|

    | HuggingFace TRL | SFTTrainer, DPOTrainer — most popular |

    | Axolotl | Config-driven, supports many architectures |

    | LLaMA Factory | Flexible UI and CLI fine-tuning |

    | Unsloth | 2× faster training, low VRAM |

    | AWS SageMaker | Managed cloud fine-tuning |

    | Azure ML / Vertex AI | Enterprise cloud fine-tuning |

    Evaluation After Fine-Tuning

  • Task-specific metrics: BLEU, ROUGE, accuracy, F1
  • Human evaluation: preference over base model
  • Benchmark regression: ensure general capabilities didn't degrade
  • MT-Bench, Alpaca Eval: instruction-following quality
  • Risks and Mitigations

    | Risk | Description | Mitigation |

    |------|-------------|-----------|

    | Catastrophic forgetting | Loses general capabilities | Use PEFT (LoRA), mix in general data |

    | Overfitting | Memorizes training set | More data, regularization, early stopping |

    | Alignment degradation | Safety behaviors weaken | Include safety examples in fine-tune data |

    | Data quality issues | Noisy data hurts performance | Curate and filter carefully |

    Related Concepts

  • Base Model, Instruct Model, LoRA, RLHF, Pre-training, Parameters, SFT, QLoRA

Go Deeper With Live Instruction

This topic is covered in depth in our llm engineering program (Session 3).