Bias (in AI)
A systematic skew in an AI model's outputs, typically inherited from training data, that leads to unfair or incorrect results.
What is bias in AI?
Bias is a systematic error in an AI model's behavior. The model does not answer neutrally but leans in a particular direction: it favors certain demographic groups, reproduces social stereotypes, or carries a political slant. Bias is mostly inherited from training data: an LLM learns from human-written text, prejudices included.
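How skew gets inherited can be illustrated with a toy co-occurrence count. The corpus below is entirely made up for the example; real training corpora are web-scale, but the mechanism is the same - a model learns whatever statistical associations its text contains:

```python
from collections import Counter

# Hypothetical mini-corpus standing in for web-scale training text.
corpus = [
    "he is a software developer",
    "he works as a developer",
    "she is a software developer",
    "he is a developer at a startup",
    "she is a nurse",
    "he is a developer",
]

# Count which pronoun co-occurs with "developer" in each sentence.
counts = Counter()
for sentence in corpus:
    words = sentence.split()
    if "developer" in words:
        for pronoun in ("he", "she"):
            if pronoun in words:
                counts[pronoun] += 1

print(counts)  # skewed toward "he" - a model trained on this text inherits the skew
```

A statistical model trained on this corpus would associate "developer" with "he" four times more strongly than with "she", regardless of whether that reflects reality.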
Types of bias
- Data bias: Training data does not proportionally represent reality (e.g. texts mention male software developers far more often than female ones)
- Selection bias: The choice of data sources is skewed - e.g. internet text is predominantly English
- Confirmation bias: The model reinforces prevailing opinions because they appear most often in the data
- Labeling bias: Annotators inject their own prejudices when labeling data
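The first type, data bias, can be sketched in a few lines. The 80/20 split below is a hypothetical number chosen for illustration; the point is that a frequency-based model reproduces the skew in its data rather than reality:

```python
from collections import Counter

# Hypothetical imbalanced training set: 80% of "developer" examples
# are labeled "male", 20% "female".
training_data = [("developer", "male")] * 80 + [("developer", "female")] * 20

def most_common_label(data, category):
    # Stand-in for any statistical model that learns base rates from its data:
    # it simply predicts the majority label seen for a category.
    labels = Counter(label for cat, label in data if cat == category)
    return labels.most_common(1)[0][0]

print(most_common_label(training_data, "developer"))  # → male
```

The model's answer reflects the composition of its training set, not the real-world distribution - which is exactly what "data bias" means.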
How to mitigate bias
- Careful preparation and diversification of training data
- RLHF (Reinforcement Learning from Human Feedback) and safety tuning
- Guardrails and output filters for sensitive domains
- Regular output audits - tests against known bias benchmarks
- Clear instructions in the system prompt telling the model when to withhold an opinion
- Adjusting the model via fine-tuning on more balanced data
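A regular output audit from the list above can be as simple as a counterfactual template test: swap a demographic term in an otherwise identical prompt and check whether the scores diverge. Everything here is a hedged sketch - `toy_sentiment` is a hypothetical, deliberately biased stand-in for a real model's scoring API, and the template, groups, and tolerance are assumptions:

```python
# Template with a single demographic slot; only the slot varies between runs.
TEMPLATE = "{person} applied for the engineering job."
GROUPS = ["He", "She"]

def toy_sentiment(text: str) -> float:
    # Hypothetical model stub returning a suitability score.
    # It is deliberately biased so the audit has something to catch.
    return 0.9 if text.startswith("He") else 0.7

def audit(template, groups, score_fn, tolerance=0.05):
    # Score each counterfactual variant and pass only if the scores
    # stay within `tolerance` of each other.
    scores = {g: score_fn(template.format(person=g)) for g in groups}
    spread = max(scores.values()) - min(scores.values())
    return scores, spread <= tolerance

scores, passed = audit(TEMPLATE, GROUPS, toy_sentiment)
print(scores, "PASS" if passed else "FAIL")  # → FAIL: the stub scores groups unequally
```

Real bias benchmarks work on the same principle at scale: many templates, many demographic slots, and statistical tests instead of a single tolerance check.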
Bias is closely related to hallucinations - both are reasons to keep a human in the loop for important decisions.