Definition
A Large Language Model (LLM) is a deep learning model trained on massive text corpora that generates text one token at a time, repeatedly predicting the next token from the tokens that precede it.
Core Mechanism
- Built on the Transformer architecture (introduced in "Attention Is All You Need", 2017)
- Uses self-attention to weigh the relevance of every token against every other token in the input
- Processes all input tokens in parallel (unlike RNNs, which process them sequentially); generation still emits one token at a time
- Output is a probability distribution over the vocabulary at each step; the next token is either the most probable one (greedy decoding) or sampled from the distribution
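
The two ideas above, attention weights over the input and a probability distribution over the vocabulary, can be sketched in plain Python. This is an illustration under stated assumptions, not how any production model is implemented: the embeddings, vocabulary, and logits are made-up toy values, and `self_attention` skips the learned query/key/value projections that real Transformers apply.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values.

    Simplification: real Transformers first apply learned projections
    to obtain separate queries, keys, and values.
    """
    dim = len(vectors[0])
    weights, outputs = [], []
    for query in vectors:
        # Score this token against every token in the input.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all token vectors.
        outputs.append([
            sum(w * vec[c] for w, vec in zip(row, vectors))
            for c in range(dim)
        ])
    return weights, outputs


# Hypothetical 2-dimensional embeddings for a three-token input.
token_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attn_weights, attn_outputs = self_attention(token_vectors)

# Hypothetical toy vocabulary and output-head scores (logits).
vocab = ["the", "cat", "sat", "mat"]
logits = [1.0, 3.0, 0.5, 2.0]
probs = softmax(logits)

# Greedy decoding: take the most probable token.
greedy_token = vocab[max(range(len(probs)), key=lambda i: probs[i])]

# Sampling: draw a token in proportion to its probability.
rng = random.Random(0)  # fixed seed so the draw is repeatable
sampled_token = rng.choices(vocab, weights=probs, k=1)[0]

print(greedy_token, sampled_token)
```

Each row of `attn_weights` sums to 1 and records how strongly one token attends to every token in the input. Greedy decoding always returns the highest-scoring token, while sampling can return any token with nonzero probability.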
Architecture Components
- Embedding Layer: converts tokens to dense vectors
- Transformer Blocks (stacked), each containing:
  - Multi-Head Self-Attention
  - Feed-Forward Network (FFN)
  - Layer Normalization
  - Residual Connections
- Output Head (LM Head): linear layer projecting to vocabulary size, followed by softmax
Scale
- "Large" refers to parameter count: billions to trillions of parameters
- Examples: GPT-4 (size undisclosed; ~1T is an unofficial estimate), Claude 3 Opus (size undisclosed), LLaMA 3 (8B and 70B), Mistral 7B
- Scaling laws (e.g., Chinchilla): loss falls predictably as parameters, training data, and compute grow; Chinchilla found that, for a fixed compute budget, parameters and training tokens should grow in roughly equal proportion
Training Objective
- Next-token prediction (autoregressive/causal language modeling)
- Given tokens [t1, t2, ..., tn], predict t(n+1)
- Loss function: cross-entropy between the predicted distribution and the true next token
Capabilities (Emergent at Scale)
- Text generation, summarization, translation
- Code generation and debugging
- Reasoning, question answering
- Few-shot and zero-shot task generalization
Limitations
- No real-time knowledge (knowledge cutoff)
- Prone to hallucination (fluent but factually incorrect output)
- Context window limits (only a bounded number of tokens fit in one input)
- No persistent memory across sessions by default
Related Concepts
- Token, Tokenization, Embeddings, Parameters, Pre-training, Inference
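
To make the next-token training objective concrete, here is a minimal sketch of the cross-entropy loss. It assumes a toy vocabulary of four token ids and hand-written logits standing in for a real model's output; every name and number is hypothetical. The prediction at position i is scored on how much probability it assigns to the token at position i+1, and the loss is the average negative log-probability of those true next tokens.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_loss(logits_per_position, token_ids):
    """Average cross-entropy of predicting token_ids[i + 1] at position i."""
    losses = []
    for i, logits in enumerate(logits_per_position):
        target = token_ids[i + 1]          # the true next token
        probs = softmax(logits)            # predicted distribution
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)


# Hypothetical sequence over a 4-token vocabulary (ids 0..3).
token_ids = [0, 2, 1, 3]

# One row of logits per input position that has a next token (positions 0..2).
# An untrained model that has learned nothing: every token equally likely.
uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]

# A model that strongly favors the correct next token at every position.
confident_logits = [
    [0.0, 0.0, 5.0, 0.0],  # after token 0, favors token 2
    [0.0, 5.0, 0.0, 0.0],  # after token 2, favors token 1
    [0.0, 0.0, 0.0, 5.0],  # after token 1, favors token 3
]

uniform_loss = next_token_loss(uniform_logits, token_ids)
confident_loss = next_token_loss(confident_logits, token_ids)

print(uniform_loss, confident_loss)
```

With uniform predictions the loss equals the natural log of the vocabulary size, here ln 4. Training pushes the loss down by shifting probability toward the true next token, which is what the second set of logits imitates.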
Key Variants
| Type | Description |
|------|-------------|
| Base/Pretrained | Raw next-token predictor |
| Instruct-tuned | Fine-tuned to follow instructions |
| RLHF-aligned | Further trained with reinforcement learning from human feedback (RLHF) |
| Multimodal | Handles text + images/audio |