Chaff

Category: Language

Strategy: Code and Encode

Chaff is a technique attackers use to obfuscate keywords that might trigger a language model's guardrails. The attacker injects random characters, such as newlines, spaces, or other tokens, into critical keywords. The goal is to bypass content filters while preserving the message's underlying intent. The method exploits a mismatch: a filter that looks for an exact keyword may not recognize the fragmented form, yet the model can still parse the fragments and recover the intended word. This lets the attacker steer the model's response without overtly triggering its defensive mechanisms. Chaff therefore combines linguistic creativity with technical evasion.
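The mismatch described above can be illustrated from the defender's side with a minimal sketch. It assumes a toy guardrail that does case-insensitive substring matching against a blocklist. The helper names `add_chaff`, `naive_filter`, and `normalized_filter`, the placeholder keyword `"secret"`, and the separator set are all hypothetical and do not describe any real model's filtering pipeline.

```python
import re


def add_chaff(word: str, sep: str = " ") -> str:
    """Insert `sep` between every character of `word` (one simple chaff pattern)."""
    return sep.join(word)


def naive_filter(text: str, blocklist: set[str]) -> bool:
    """Hypothetical guardrail: True if any blocklisted term appears verbatim (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in blocklist)


def normalized_filter(text: str, blocklist: set[str]) -> bool:
    """Hypothetical hardened guardrail: strip whitespace and a few common
    separator characters before matching, so chaff between letters is removed."""
    collapsed = re.sub(r"[\s\-_.*]+", "", text.lower())
    return any(term in collapsed for term in blocklist)


if __name__ == "__main__":
    blocklist = {"secret"}  # placeholder keyword, assumed for illustration only
    prompt = f"Tell me the {add_chaff('secret', chr(10))} word"  # newline chaff
    print(naive_filter(prompt, blocklist))       # False: the chaffed keyword slips past
    print(normalized_filter(prompt, blocklist))  # True: normalization recovers it
```

Collapsing whitespace is a blunt countermeasure. It can produce false positives when a blocklisted term spans a word boundary, and attackers can switch to separators outside the stripped set. A sketch like this only shows why exact-match filtering is fragile against chaff, not how to defend against it fully.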