Bias (in AI)
A systematic skew in an AI model's outputs, typically inherited from training data, that leads to unfair or incorrect results.
What is bias in AI?
Bias is a systematic error in an AI model's behavior. The model does not answer neutrally but leans in a particular direction: it favors certain demographic groups, reproduces social stereotypes, or carries a political slant. Bias is mostly inherited from training data: an LLM learns from human-written text, prejudices included.
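How skew gets inherited can be illustrated with a toy co-occurrence count. The corpus below is entirely made up for the example; real training corpora are web-scale, but the mechanism is the same - a model learns whatever statistical associations its text contains:

```python
from collections import Counter

# Hypothetical mini-corpus standing in for web-scale training text.
corpus = [
    "he is a software developer",
    "he works as a developer",
    "she is a software developer",
    "he is a developer at a startup",
    "she is a nurse",
    "he is a developer",
]

# Count which pronoun co-occurs with "developer" in each sentence.
counts = Counter()
for sentence in corpus:
    words = sentence.split()
    if "developer" in words:
        for pronoun in ("he", "she"):
            if pronoun in words:
                counts[pronoun] += 1

print(counts)  # skewed toward "he" - a model trained on this text inherits the skew
```

A statistical model trained on this corpus would associate "developer" with "he" four times more strongly than with "she", regardless of whether that reflects reality.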
Types of bias
- Data bias: Training data does not proportionally represent reality (e.g. texts mention male software developers far more often than female ones)
- Selection bias: The choice of data sources is skewed - e.g. internet text is predominantly English
- Confirmation bias: The model reinforces prevailing opinions because they appear most often in the data
- Labeling bias: Annotators inject their own prejudices when labeling data
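The first type, data bias, can be sketched in a few lines. The 80/20 split below is a hypothetical number chosen for illustration; the point is that a frequency-based model reproduces the skew in its data rather than reality:

```python
from collections import Counter

# Hypothetical imbalanced training set: 80% of "developer" examples
# are labeled "male", 20% "female".
training_data = [("developer", "male")] * 80 + [("developer", "female")] * 20

def most_common_label(data, category):
    # Stand-in for any statistical model that learns base rates from its data:
    # it simply predicts the majority label seen for a category.
    labels = Counter(label for cat, label in data if cat == category)
    return labels.most_common(1)[0][0]

print(most_common_label(training_data, "developer"))  # → male
```

The model's answer reflects the composition of its training set, not the real-world distribution - which is exactly what "data bias" means.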
How to mitigate bias
- Careful preparation and diversification of training data
- RLHF (Reinforcement Learning from Human Feedback) and safety tuning
- Guardrails and output filters for sensitive domains
- Regular output audits - tests against known bias benchmarks
- Clear instructions in the system prompt telling the model when to withhold an opinion
- Adjusting the model via fine-tuning on more balanced data
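A regular output audit from the list above can be as simple as a counterfactual template test: swap a demographic term in an otherwise identical prompt and check whether the scores diverge. Everything here is a hedged sketch - `toy_sentiment` is a hypothetical, deliberately biased stand-in for a real model's scoring API, and the template, groups, and tolerance are assumptions:

```python
# Template with a single demographic slot; only the slot varies between runs.
TEMPLATE = "{person} applied for the engineering job."
GROUPS = ["He", "She"]

def toy_sentiment(text: str) -> float:
    # Hypothetical model stub returning a suitability score.
    # It is deliberately biased so the audit has something to catch.
    return 0.9 if text.startswith("He") else 0.7

def audit(template, groups, score_fn, tolerance=0.05):
    # Score each counterfactual variant and pass only if the scores
    # stay within `tolerance` of each other.
    scores = {g: score_fn(template.format(person=g)) for g in groups}
    spread = max(scores.values()) - min(scores.values())
    return scores, spread <= tolerance

scores, passed = audit(TEMPLATE, GROUPS, toy_sentiment)
print(scores, "PASS" if passed else "FAIL")  # → FAIL: the stub scores groups unequally
```

Real bias benchmarks work on the same principle at scale: many templates, many demographic slots, and statistical tests instead of a single tolerance check.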
Bias is closely related to hallucinations - both are reasons to keep a human in the loop for important decisions.