Prompt Injection
A technique where a user attempts to bypass the original instructions of an AI model using a specially crafted command.
What is prompt injection?
Prompt injection is a technique in which a user attempts to overwrite or bypass the original instructions of an AI model (system prompt) by entering a specially formulated command into the chat.
Attacker's goal
Force the agent to do something it is not supposed to - for example, reveal sensitive data, ignore security rules, or start behaving differently than intended.
How to defend
- Define clear security rules in the system prompt
- Validate user inputs
- Limit the agent's access to only necessary tools and data