Security boundary
Severity: High
Content Manipulation focuses on injecting harmful or misleading elements into the data that the model consumes or produces. By poisoning the training data or guiding the model to generate code and scripts that impact end-users, attackers introduce subtle backdoors, biases, and triggers. These manipulations cause the model to produce outputs that compromise user experiences, embed malicious scripts, or skew results, turning the AI into a vehicle for exploitation.
Example:
A threat actor contributes imperceptible yet malicious instructions within publicly available training text. Once the model is retrained, a secret trigger phrase prompts it to output harmful code that, when displayed on a webpage, executes client-side attacks against users. This demonstrates how training data poisoning and content manipulation can create covert vulnerabilities triggered after deployment.
References:
- MITRE ATLAS: Poison Training Data (AML.T0020)
- OWASP LLM 2025: LLM04:2025 Data and Model Poisoning
- OWASP LLM 2023-2024: LLM03: Training Data Poisoning
- avid-effect:security:S0600 (data poisoning)
- avid-effect:security:S0601 (ingest poisoning)
- avid-effect:ethics:E0507 (deliberative misinformation)