Beginner·3 min read

Token

Definition

A token is the atomic unit of text that an LLM processes. It is a piece of text — such as a word, sub-word, character, or punctuation symbol — that the model reads and generates one unit at a time.

Why Tokens (Not Characters or Words)?

  • Characters → too granular, very long sequences, poor semantic grouping
  • Words → vocabulary explodes (millions of rare/compound words), can't handle unknown words
  • Sub-word tokens → best balance: compact vocabulary (~32K–128K tokens), handles rare words by splitting them, retains common words whole
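
The trade-off between the first two options can be seen in a minimal sketch. The sentence, the tiny word vocabulary, and the `<unk>` placeholder below are all hypothetical choices made for illustration; real tokenizers are considerably more involved.

```python
# Toy comparison of character-level and word-level tokenization.
# The vocabulary and the "<unk>" placeholder are assumptions for illustration.

def char_tokenize(text: str) -> list[str]:
    """One token per character: never fails, but sequences get long."""
    return list(text)

def word_tokenize(text: str, vocab: set[str], unk: str = "<unk>") -> list[str]:
    """One token per whitespace-separated word; unseen words collapse to `unk`."""
    return [w if w in vocab else unk for w in text.split()]

text = "the model reads untokenizable text"
word_vocab = {"the", "model", "reads", "text"}  # assumed training vocabulary

char_tokens = char_tokenize(text)
word_tokens = word_tokenize(text, word_vocab)

print(len(char_tokens))  # 34 tokens for a five-word sentence
print(word_tokens)       # ['the', 'model', 'reads', '<unk>', 'text']
```

The character version produces a long sequence, while the word version loses the unseen word entirely. Sub-word schemes aim to avoid both outcomes by splitting the rare word into smaller known pieces.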

Common Tokenization Schemes

| Scheme | Description | Used By |
|--------|-------------|---------|
| BPE (Byte Pair Encoding) | Iteratively merges the most frequent pair of adjacent symbols (bytes or characters) | GPT-2, GPT-4, LLaMA |
| WordPiece | Similar to BPE, but picks merges that maximize language model likelihood | BERT |
| SentencePiece | Language-agnostic library that tokenizes raw text without whitespace pre-splitting; implements BPE and unigram models | T5, Gemini |
| tiktoken | OpenAI's fast BPE implementation | GPT-3.5, GPT-4 |
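
To make the BPE row concrete, here is a minimal training sketch. It assumes a hypothetical word-frequency corpus, works on characters rather than raw bytes for readability, and breaks ties between equally frequent pairs lexicographically (an arbitrary choice made only to keep the result deterministic). Production implementations differ in many details.

```python
from collections import Counter
from typing import Optional

Word = tuple[str, ...]  # a word as a sequence of current symbols

def most_frequent_pair(words: dict[Word, int]) -> Optional[tuple[str, str]]:
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs: Counter = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    if not pairs:
        return None
    # Tie-break on the pair itself so the result is deterministic (assumption).
    return max(pairs, key=lambda p: (pairs[p], p))

def merge_pair(words: dict[Word, int], pair: tuple[str, str]) -> dict[Word, int]:
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged: dict[Word, int] = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def train_bpe(corpus: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules from a word -> frequency corpus."""
    words = {tuple(word): freq for word, freq in corpus.items()}
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges

# Hypothetical corpus: word -> number of occurrences.
corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = train_bpe(corpus, 3)
print(merges)  # [('s', 't'), ('e', 'st'), ('o', 'w')]
```

Each learned merge adds one entry to the vocabulary, so the number of merges is what controls the final vocabulary size.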

Token Examples (GPT-4 tokenizer)

| Text | Tokens | Count |
|------|--------|-------|
| "Hello, world!" | ["Hello", ",", " world", "!"] | 4 |
| "tokenization" | ["token", "ization"] | 2 |
| "LLM" | ["L", "LM"] or ["LLM"] | varies |

Key Properties

  • Vocabulary size: typically 32K–128K unique tokens
  • Token ≠ word: one word can be 1–4 tokens; one token can span multiple characters
  • Special tokens: markers such as <|endoftext|>, [CLS], [SEP], and [PAD] control model behavior rather than encode ordinary text
  • Whitespace matters: " hello" and "hello" are often different tokens
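
The properties above can be illustrated with a small encoder. This is a simplified sketch: it uses greedy longest-match lookup against a hypothetical eight-entry vocabulary (`VOCAB` and its IDs are invented here), whereas real BPE tokenizers apply learned merge rules and have tens of thousands of entries.

```python
# Hypothetical toy vocabulary: token string -> integer ID.
VOCAB = {
    "Hello": 0, ",": 1, " world": 2, "!": 3,
    "hello": 4, " hello": 5,          # leading space makes a distinct token
    "token": 6, "ization": 7,
}

def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match: at each position take the longest vocab entry."""
    ids, i = [], 0
    max_len = max(len(token) for token in vocab)
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for end in range(min(len(text), i + max_len), i, -1):
            piece = text[i:end]
            if piece in vocab:
                ids.append(vocab[piece])
                i = end
                break
        else:
            raise ValueError(f"no token covers {text[i]!r} at position {i}")
    return ids

print(encode("Hello, world!", VOCAB))  # [0, 1, 2, 3]
print(encode("tokenization", VOCAB))   # [6, 7]  one word, two tokens
print(encode("hello hello", VOCAB))    # [4, 5]  "hello" vs " hello"
```

The last call shows why whitespace matters: the same word maps to two different IDs depending on whether a space precedes it.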

Token Counting Rules of Thumb

  • 1 token ≈ 4 characters (English)
  • 1 token ≈ 0.75 words (English)
  • Non-English languages are typically less efficient (more tokens per word)
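  • Code efficiency depends on the tokenizer: newer vocabularies merge runs of whitespace and common keywords, while older ones spend many tokens on indentation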
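
The two English ratios translate directly into a rough estimator. The function names below are hypothetical, and the result is only an approximation; an exact count requires running the model's actual tokenizer.

```python
import math

CHARS_PER_TOKEN = 4     # rule of thumb for English text
WORDS_PER_TOKEN = 0.75  # rule of thumb for English text

def estimate_tokens_from_chars(text: str) -> int:
    """Approximate token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def estimate_tokens_from_words(text: str) -> int:
    """Approximate token count from whitespace-separated word count."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)

text = "Tokens are the atomic units that a language model reads and writes."
print(estimate_tokens_from_chars(text))  # 17  (67 characters / 4)
print(estimate_tokens_from_words(text))  # 16  (12 words / 0.75)
```

The two estimates rarely agree exactly, which is a reminder that both are heuristics and that they apply to English prose only.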

Practical Implications

  • Cost: APIs charge per token (input + output)
  • Context limits: models have a maximum token count they can process at once (context window)
  • Latency: output is generated one token at a time, so longer responses take longer to produce
  • Prompt design: being concise saves tokens and cost
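
A short sketch shows how cost and context limits can be checked before sending a request. The prices and the 8,192-token window are placeholder values, not any provider's actual figures, and the assumption that prompt and output share one window does not hold for every model.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 usd_per_1k_input: float, usd_per_1k_output: float) -> float:
    """Cost in USD when input and output tokens are billed at separate rates."""
    return (input_tokens * usd_per_1k_input
            + output_tokens * usd_per_1k_output) / 1000

def fits_context(prompt_tokens: int, max_output_tokens: int,
                 context_window: int) -> bool:
    """Assumes the prompt and the generated output share one context window."""
    return prompt_tokens + max_output_tokens <= context_window

# Placeholder prices (USD per 1,000 tokens) and window size, for illustration only.
cost = request_cost(1500, 500, usd_per_1k_input=0.50, usd_per_1k_output=1.50)
print(cost)                             # 1.5
print(fits_context(7000, 1000, 8192))   # True
print(fits_context(8000, 1000, 8192))   # False
```

Trimming the prompt lowers both the input cost and the share of the window it occupies, which leaves more room for the response.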

Related Concepts

  • Tokenization, Embeddings, Context Window, Vocabulary, LLM
