0din logo

Strategies

Strategies are the high-level plans or approaches that guide the actions taken during red teaming activities. They encompass the overall objectives and methodologies that participants adopt when interacting with language models. Strategies may involve specific goals, such as testing the model's limits, exposing biases, or eliciting particular types of responses. The paper discusses various strategies that participants reported using, highlighting the thought processes and intentions behind their actions. These strategies serve as a foundation for the more specific techniques that are employed to achieve the desired outcomes.

Note Description
Code and Encode This strategy encompasses techniques that utilize various coding methods, such as Base64 or ROT13, to bypass model restrictions and manipulate outputs.
Emulations This strategy involves mimicking the behavior of other systems or models to test the robustness and responses of a language model under different simulated conditions.
Meta Prompting This technique focuses on manipulating the prompting process itself to influence the model's behavior and responses, often by employing higher-level prompting strategies.
Persuasion and Manipulation This strategy focuses on employing rhetorical techniques to influence the model's responses by framing prompts in a way that persuades or manipulates the output.
Prompt Injection This technique enables attackers to override original instructions and employed controls by crafting specific wording of instructions, often resembling SQL injection methods, to manipulate the model's behavior.
Re-storying This technique involves continuing a narrative in a way that misaligns the original goal of a prompt, effectively repurposing the story to achieve a different outcome than initially intended.
Roleplaying This strategy involves prompting the language model to assume a specific role or persona, which can influence its responses based on the characteristics and moral codes associated with that role. Techniques include claiming authority or inventing personas to elicit different types of outputs.
Scatter Shot This strategy involves prompting the language model to assume a specific role or persona, which can influence its responses based on the characteristics and moral codes associated with that role. Techniques include claiming authority or inventing personas to elicit different types of outputs.
Socratic Questioning This strategy involves generating multiple outputs from a language model by using the "Regenerate response" feature to explore a range of possible interpretations and responses.
Stylizing This strategy involves using a method of questioning that encourages critical thinking and exposes biases by subtly referencing identity elements without using direct slurs or toxic language, thereby signaling to the model about certain groups of people.
Switching Genres This strategy involves adjusting the language and style of prompts to increase the likelihood of obtaining the desired output. Techniques include using formal language, servile language, synonymous language, capitalizing text for urgency, and providing examples to guide the model's responses.
World Building This technique involves changing the genre of the prompt to elicit different types of responses from the model. By framing the request within a specific genre, such as poetry, games, or forum posts, users can manipulate the model's output to align with the conventions and expectations of that genre.