Daniel Hladik AI Automation Engineer


Token (in LLM context)

The basic unit of text that LLMs work with - roughly 3/4 of an English word, or about 4 characters.

What is a token?

A token is the smallest unit of text that large language models (LLMs) work with. It is not necessarily a whole word - the model first splits text into tokens (parts of words, whole words, or punctuation marks) and then processes those.
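To make the splitting concrete, here is a toy illustration in Python. It is NOT a real LLM tokenizer (production models use learned subword vocabularies such as BPE); it only shows that text breaks into word pieces and punctuation rather than neat whole words:

```python
import re

def toy_tokenize(text: str) -> list[str]:
    # Illustrative splitter: runs of word characters, or single
    # punctuation marks. Real tokenizers split on learned subwords.
    return re.findall(r"\w+|[^\w\s]", text)

print(toy_tokenize("Tokenization isn't word-splitting!"))
# ['Tokenization', 'isn', "'", 't', 'word', '-', 'splitting', '!']
```

Note how the contraction and the hyphenated word each break into several pieces - real tokenizers behave similarly, just with subword boundaries learned from data.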

How large is a token?

  • In English: approximately 3/4 of a word, or about 4 characters (100 words ≈ 133 tokens)
  • In other languages: often more tokens per word, especially for morphologically rich languages or non-Latin scripts
  • Numbers, punctuation, and special characters may each be a separate token
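The rules of thumb above can be turned into a quick estimator. This is a rough heuristic only - the exact count depends on the model's tokenizer - but the ~4 characters per token figure for English is good enough for back-of-the-envelope sizing:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text:
    ~4 characters per token, i.e. ~3/4 of a word per token
    (100 words ≈ 133 tokens). A heuristic, not an exact count."""
    return max(1, round(len(text) / 4))

sample = "Tokens are the basic unit of text that language models process."
print(estimate_tokens(sample))
```

For exact counts you would use the tokenizer shipped with the specific model, since each model's vocabulary splits text differently.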

Why tokens matter

  • Cost: LLM API calls are billed per token (input + output)
  • Limits: The model's context window is also expressed in tokens
  • Chunking: When splitting documents for RAG, chunk size is typically measured in tokens
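Since billing covers input and output tokens separately (usually at different rates), the cost of a call is easy to sketch. The prices below are assumptions for illustration - real per-token rates vary by provider and model:

```python
# Hypothetical prices; real rates vary by provider and model.
PRICE_PER_1M_INPUT = 3.00    # USD per 1M input tokens (assumed)
PRICE_PER_1M_OUTPUT = 15.00  # USD per 1M output tokens (assumed)

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call: input and output tokens billed separately."""
    return (input_tokens * PRICE_PER_1M_INPUT
            + output_tokens * PRICE_PER_1M_OUTPUT) / 1_000_000

# e.g. a 2,000-token prompt producing a 500-token answer:
print(f"${call_cost(2_000, 500):.4f}")  # $0.0135
```

Output tokens are typically several times more expensive than input tokens, which is why long generated answers dominate the bill even when prompts are large.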