Security boundary
Severity: Low to High
Prompt Injection encompasses a variety of attacks where a researcher skillfully inserts hidden or malicious instructions into user prompts, system messages, or external data sources. By doing so, they can override safety mechanisms, leak training sources, disclose sensitive data, or gain insights into the model’s internal configurations. This manipulation tricks the model into revealing information or producing responses that violate the intended constraints.
Example:
A researcher interacts with a customer support chatbot trained on private billing data. By embedding a carefully crafted message that impersonates an authorized administrator, the researcher convinces the model to disclose personal details of another customer’s transaction history. Despite policies to prevent such disclosures, the chatbot follows the injected directive, illustrating a successful prompt injection attack.
References:
- MITRE ATLAS: LLM Prompt Injection (AML.T0051), LLM Jailbreak (AML.T0054), LLM Data Leakage (AML.T0057)
- OWASP LLM 2025: LLM01:2025 Prompt Injection, LLM02:2025 Sensitive Information Disclosure, LLM04:2025 Data and Model Poisoning
- OWASP LLM 2023-2024: LLM02: Sensitive Information Disclosure, LLM03: Training Data Poisoning
- avid-effect:security:S0300 (over-permissive api)
- avid-effect:security:S0301 (information leak)
- avid-effect:ethics:E0507 (deliberative misinformation)