Daniel Hladik AI Automation Engineer


Bias (in AI)

A systematic skew of an AI model originating in training data that leads to unfair or incorrect outputs.

What is bias in AI?

Bias is a systematic error in an AI model's behavior. The model does not answer neutrally but leans in a particular direction - for example, it favors certain demographic groups, reproduces social stereotypes, or shows a political slant. Bias is mostly inherited from the training data: an LLM learns from human-written text, and that text carries human prejudices.
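How skewed text turns into skewed outputs can be shown with a deliberately tiny sketch. The corpus and the "model" below are toy assumptions, not a real LLM: the model simply predicts the pronoun it has seen most often after a given word, which is enough to show the skew being inherited.

```python
from collections import Counter

# Toy corpus with a built-in stereotype - illustrative, not real data.
corpus = [
    "the engineer fixed his laptop",
    "the engineer opened his terminal",
    "the engineer reviewed her code",
    "the nurse checked her notes",
]

def most_likely_pronoun(word: str) -> str:
    """Naive 'model': return the pronoun most often seen after `word`."""
    pronouns = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        if word in tokens:
            i = tokens.index(word)
            for t in tokens[i:]:
                if t in ("his", "her"):
                    pronouns[t] += 1
    return pronouns.most_common(1)[0][0]

print(most_likely_pronoun("engineer"))  # "his" - the 2:1 skew in the data wins
```

A real LLM is vastly more complex, but the mechanism is the same: whatever association dominates the training text dominates the predictions.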

Types of bias

  • Data bias: Training data does not proportionally represent reality (e.g. more male than female software developers)
  • Selection bias: Data selection is skewed - e.g. the internet is mostly English-language content
  • Confirmation bias: The model reinforces prevailing opinions because they appear most often in the data
  • Labeling bias: Annotators inject their own prejudices when labeling data
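Data bias, the first item above, is also the easiest to measure: count how each group is represented in the training set. The sample data below is a made-up illustration of the "more male than female software developers" example.

```python
from collections import Counter

# Hypothetical toy training set: (text, demographic label) pairs.
# The skew is built in on purpose to mirror the data-bias example above.
samples = [
    ("fixed a null-pointer bug", "male"),
    ("refactored the billing service", "male"),
    ("optimized the SQL queries", "male"),
    ("wrote the deployment script", "female"),
]

counts = Counter(label for _, label in samples)
total = sum(counts.values())
for label, n in counts.items():
    # 75% vs 25% - a model trained on this will associate
    # "developer" contexts mostly with one group.
    print(f"{label}: {n / total:.0%} of training examples")
```

Representation checks like this are cheap to run before training and catch the most obvious imbalances early.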

How to mitigate bias

  • Careful preparation and diversification of training data
  • RLHF (Reinforcement Learning from Human Feedback) and safety tuning
  • Guardrails and output filters for sensitive domains
  • Regular output audits - tests against known bias benchmarks
  • Clear instructions in the system prompt telling the model when to withhold an opinion
  • Adjusting the model via fine-tuning on more balanced data
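One practical form of the output audit mentioned above is a counterfactual test: send the model two prompts that differ only in a demographic cue (here, a name) and flag any difference in the response. `query_model` is a placeholder, not a real API; a stub stands in so the sketch runs.

```python
# Minimal counterfactual audit sketch. In a real audit, `query_model`
# would call an actual LLM API; the stub below just makes this runnable.
def query_model(prompt: str) -> str:
    return "score: 7/10"  # stub response

TEMPLATE = "Rate this CV summary from 1-10: '{name} has 5 years of Python experience.'"
NAME_PAIRS = [("Jakub", "Tereza"), ("John", "Amina")]  # hypothetical test pairs

for name_a, name_b in NAME_PAIRS:
    out_a = query_model(TEMPLATE.format(name=name_a))
    out_b = query_model(TEMPLATE.format(name=name_b))
    flag = "OK" if out_a == out_b else "DIFFERS"
    print(f"{name_a} vs {name_b}: {flag}")
```

Real LLM outputs are non-deterministic, so a production audit would compare many sampled responses statistically rather than checking exact string equality.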

Bias is closely related to hallucinations: both are systematic failure modes, and both are reasons to keep a human in the loop for important decisions.