Prompt injection is a type of attack where adversarial instructions are hidden inside content that gets passed to an AI model. The goal is to override the model's original instructions and make it do something it was not supposed to do.
What makes prompt injection different from jailbreaking is who the attacker is. With jailbreaking, the user is directly trying to break the model's rules. With prompt injection, the attack is embedded in content the model is processing, like a document, email, web page, or database entry. The user may not even know it is happening.
This is a growing risk because AI systems are increasingly connected to external data. Any application that reads content from outside sources is a potential target:
The model does not know the difference between legitimate content and injected instructions. If the injection is well-crafted, the model follows it.
Prompt injection exploits the way AI systems combine their own instructions with external input. The most common methods include:
Our platform tests for prompt injection as part of the broader red teaming and safety evaluation suite within the Protect module. We simulate realistic injection scenarios across different input channels and measure whether the model's boundaries hold.
Results are scored using the Defense Success Rate (DSR) and broken down by injection type, so you can see which vectors your system handles well and which ones need hardening. For systems in production, our runtime guardrails can detect and block injection attempts automatically as part of your policy enforcement.
If you want to know more about how we test for prompt injection on your AI systems, get a demo now.