Language
This category focuses on the use of specific linguistic techniques, such as prompt injection or stylization, to influence the model's output.
Strategies
Note | Description |
---|---|
Code and Encode | This strategy encompasses techniques that utilize various coding methods, such as Base64 or ROT13, to bypass model restrictions and manipulate outputs. |
Prompt Injection | This technique enables attackers to override original instructions and employed controls by crafting specific wording of instructions, often resembling SQL injection methods, to manipulate the model's behavior. |
Stylizing | This strategy involves using a method of questioning that encourages critical thinking and exposes biases by subtly referencing identity elements without using direct slurs or toxic language, thereby signaling to the model about certain groups of people. |