The GenAI Bug Bounty Program

0DIN’s Real-World Jailbreak Benchmark: The Gold Standard For LLM Security Evaluation

By Marco Figueroa | July 31, 2025

0DIN’s latest security deep-dive benchmarks five frontier LLMs against more than 450 live “real-world” jailbreak probes, turning continuous bug-bounty findings into an always-on scanner that replays each exploit across every model. The results crown OpenAI’s ChatGPT o4-mini the hardest to crack (only 6 effective jailbreaks), followed by Anthropic Claude Sonnet 4, GPT-4o, Claude Opus 4, and Claude Sonnet 3.7. Beyond the leaderboard, the blog explains how multilayer guardrails—moderation heads co-trained with the base model, multi-step Constitutional AI, dynamic red-teaming feedback loops, and segmentation-and-scrub techniques—shrink the attack window, while new modalities (vision, audio, long context) inevitably open fresh vectors. Key takeaways urge practitioners to embed safety in the model rather than the wrapper, treat red-teaming as a perpetual feedback loop, and expect today’s “secure” score to be tomorrow’s baseline. In short, 0DIN argues that LLM security is a living system: relentless exploitation met by equally relentless alignment innovation.

Engineering Confidence: 14 Critical Questions for Secure LLM + RAG Deployment

By Marco Figueroa & Andre Ludwig - Ankura.com | July 17, 2025

The rapid evolution of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) and Model Protocol Context (MCP) implementation has led many developers and teams to quickly adopt and integrate these powerful technologies. Driven by the fear of missing out (FOMO) on potential competitive advantages and the eagerness to rapidly deliver value, teams often bypass essential controls, formality, and critical oversight. Although the rapid adoption of emerging technologies is common if not inevitable, it often precedes proper due diligence, exposing projects to serious intellectual property (IP), privacy, and security risks.

Phishing For Gemini

By Marco Figueroa | July 10, 2025

A researcher that submitted to 0DIN demonstrated a prompt-injection vulnerability in Google Gemini for Workspace that allows a threat-actor to hide malicious instructions inside an email. When the recipient clicks “Summarize this email”, Gemini faithfully obeys the hidden prompt and appends a phishing warning that looks as if it came from Google itself.

Because the injected text is rendered in white-on-white (or otherwise hidden), the victim never sees the instruction in the original message, only the fabricated “security alert” in the AI generated summary. Similar indirect prompt attacks on Gemini were first reported in 2024, and Google has already published mitigations, but the technique remains viable today.

ChatGPT Guessing Game Leads To Users Extracting Free Windows OS Keys & More

By Marco Figueroa | July 08, 2025

In a recent submission last year researchers discovered a method to bypass AI guardrails designed to prevent sharing of sensitive or harmful information. The technique leverages the game mechanics of language models, such as GPT-4o and GPT-4o-mini, by framing the interaction as a harmless guessing game. By cleverly obscuring details using HTML tags and positioning the request as part of the game’s conclusion, the AI inadvertently returned valid Windows product keys. This case underscores the challenges of reinforcing AI models against sophisticated social engineering and manipulation tactics.

ODIN Product Launch: Threat Intelligence Feed & Model Scanner

By Marco Figueroa | July 07, 2025

0DIN flips the switch on two tightly-integrated products designed to help security, ML-ops, and governance teams stay ahead of fast-moving GenAI threats. Introducing the 0DIN Threat Intelligence Feed and the 0DIN Model Scanner. Together, they close the gap between discovery and remediation, bringing true threat-informed defence to LLM deployments.

Quantifying the Unruly: A Scoring System for Jailbreak Tactics

By Pedram Amini | June 12, 2025

As large language models continue to evolve, so do the adversarial strategies designed to bypass their safeguards. Jailbreaks, also known as prompt injections, have moved beyond clever hacks and into the realm of sophisticated, reusable exploits. At 0DIN.ai, we’ve been analyzing these tactics not just for novelty, but with the goal of building a structured, measurable way to compare them.

The result is JEF, our Jailbreak Evaluation Framework (Github, PyPI). JEF is a scoring system that assigns numeric values to jailbreak methods based on their severity, flexibility, and real-world impact.

0DIN Secures the Future of AI Shopping

By Marco Figueroa | February 10, 2025

Executive Summary As organizations race to integrate GenAI technology into their products, the importance of rigorous testing cannot be overstated. A recent discovery by 0DIN researchers exposed a critical vulnerability in Amazon’s AI assistant Rufus, which allowed malicious requests to slip through built-in guardrails via ASCII encoding. This blog explores how the vulnerability was discovered, the steps taken to exploit it, and how Amazon rapidly addressed the issue. It underscores the broader lesson that AI security must evolve beyond traditional safeguards to tackle emerging threats.

See the public disclosure report: 0xF48A25FC - Amazon Rufus Guardrail Jailbreak via ASCII Integer Encoding

Poison in the Pipeline: Liberating models with Basilisk Venom

By Marco Figueroa & Pliny the Liberator (@elder_plinius) | February 06, 2025

The blog addresses a critical security challenge in modern generative AI development—data poisoning. As organizations rapidly deploy new frontier models trained on massive, decentralized datasets (ranging from curated corpora to publicly sourced data), adversaries have an increased opportunity to inject malicious or misleading information into these training sets. This deliberate “poisoning” subtly alters model behavior, potentially bypassing safeguards without immediate detection.

This blog explains the stages of data poisoning:

Data Collection: Vast and often unvetted datasets risk inclusion of malicious prompts.
Preprocessing & Filtering: Standard cleaning may overlook subtle, embedded potentially harmful content.
Model Training and Fine-Tuning: Once integrated, these instructions become part of the model’s learned behavior.
Delayed Manifestation: The poisoning effect may only emerge months later during specific query triggers, making detection challenging.

Prompt Injecting Your Way To Shell: OpenAI's Containerized ChatGPT Environment

By Marco Figueroa | November 14, 2024

Dive into OpenAI’s containerized ChatGPT environment, demonstrating how users can interact with its underlying structure through controlled prompt injections and file management techniques. By exploring ChatGPT's sandboxed Debian Bookworm environment, readers gain insights into navigating command executions, file manipulation, and the model's internal configuration, revealing both the potential and boundaries of OpenAI's secure design.

Inyección de Prompts, el Camino a una Shell: Entorno de Contenedores de ChatGPT de OpenAI

By Marco Figueroa | November 14, 2024

Comprender el entorno de contenedores de ChatGPT de OpenAI en Debian Bookworm te permite explorar cómo los usuarios pueden interactuar con su marco subyacente mediante inyecciones de instrucciones controladas y técnicas de gestión de archivos. En este post puedes obtener conocimientos prácticos sobre la ejecución de comandos, operaciones con archivos y la configuración interna del modelo. Este análisis destaca tanto las fortalezas como las limitaciones de la arquitectura segura de OpenAI, ofreciendo una comprensión más profunda de sus capacidades.

Blog

Subscribe today!

0DIN’s Real-World Jailbreak Benchmark: The Gold Standard For LLM Security Evaluation

Engineering Confidence: 14 Critical Questions for Secure LLM + RAG Deployment

Phishing For Gemini

ChatGPT Guessing Game Leads To Users Extracting Free Windows OS Keys & More

ODIN Product Launch: Threat Intelligence Feed & Model Scanner

Quantifying the Unruly: A Scoring System for Jailbreak Tactics

0DIN Secures the Future of AI Shopping

Poison in the Pipeline: Liberating models with Basilisk Venom

Prompt Injecting Your Way To Shell: OpenAI's Containerized ChatGPT Environment

Inyección de Prompts, el Camino a una Shell: Entorno de Contenedores de ChatGPT de OpenAI