Daniel Hladik AI Automation Engineer


Context Window

The maximum amount of text (measured in tokens) that an LLM can see and process in a single request.

What is a context window?

A context window is the maximum amount of text that an LLM can receive and process at once. Think of it as the model's working surface: everything that fits on it is visible to the model; anything that doesn't fit is invisible.

Context window sizes

  • GPT-4o: 128,000 tokens (approximately 96,000 words)
  • Claude 3.5 Sonnet: 200,000 tokens
  • Gemini 1.5 Pro: 1,000,000 tokens
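A quick way to compare a document against these limits is a rough heuristic of about 4 characters per token for English text. The sketch below uses that heuristic; the exact count depends on each model's tokenizer, and `estimate_tokens` and `fits_in_window` are illustrative names, not a real API:

```python
# Approximate context window sizes from the list above, in tokens.
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "claude-3.5-sonnet": 200_000,
    "gemini-1.5-pro": 1_000_000,
}

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Real counts come from the model's tokenizer (e.g. tiktoken for GPT models).
    return max(1, len(text) // 4)

def fits_in_window(text: str, model: str) -> bool:
    return estimate_tokens(text) <= CONTEXT_WINDOWS[model]

# A ~600,000-character document is roughly 150,000 tokens:
document = "word " * 120_000
print(fits_in_window(document, "gpt-4o"))             # False (over 128k)
print(fits_in_window(document, "claude-3.5-sonnet"))  # True  (under 200k)
```

The same document can overflow one model's window while fitting comfortably in another's, which is why the limit matters when choosing a model for long inputs.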

Why the context window matters for RAG

If a document is too large to fit in the context window, only part of it can be processed. That is why RAG splits documents into smaller pieces (chunks) and sends only the relevant sections to the model - not the entire document.
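That split-and-retrieve idea can be sketched in a few lines. This is a minimal illustration only: it uses fixed-size character chunks and naive keyword overlap for relevance, where a real RAG pipeline would use embedding similarity; `chunk_text`, `retrieve`, and the sizes are assumptions made for the example:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    # Fixed-size character chunks with overlap, so a sentence cut at one
    # chunk boundary still appears whole in the neighboring chunk.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

def retrieve(chunks: list[str], query: str, k: int = 2) -> list[str]:
    # Naive relevance score: how many query words each chunk contains.
    # Real RAG systems rank chunks by embedding similarity instead.
    words = set(query.lower().split())
    scored = sorted(chunks, key=lambda c: -len(words & set(c.lower().split())))
    return scored[:k]

doc = ("The context window limits how much text a model sees. " * 30
       + "Retrieval picks only relevant chunks to send. " * 5)
chunks = chunk_text(doc)
top = retrieve(chunks, "relevant chunks retrieval")
# Only the top-k chunks go into the prompt, not the whole document.
```

Each chunk stays far below the context window, so the prompt sent to the model contains just the retrieved sections plus the user's question.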