
JEF Framework Expansion: Giving Back to the AI Safety Community

AI Security · Product Updates · 8 min read · March 23, 2026
By Andrew Thal

Key Points

  • 0DIN has expanded its Jailbreak Evaluation Framework with seven active scoring attributes that cover high-risk harm categories including illicit drugs, chemical and biological threats, genetic manipulation, and copyright violations, enabling more comprehensive and realistic jailbreak assessment.
  • Each module produces a 0 to 100 fidelity score that measures how much prohibited content a model outputs, and these scores feed into a composite 0 to 10 JEF score that standardizes how vulnerabilities are triaged and disclosed.
  • A new METADATA auto-discovery registry allows contributors to add scoring modules with a single Python file, automatically propagating updates across the JEF library, API, and platform without complex coordination.
  • Native integrations with NVIDIA garak and Microsoft PyRIT ship directly inside the 0din-jef package, providing built in detectors, scorers, and real world n-day probes sourced from 0DIN’s bug bounty program for continuous regression testing.
  • These capabilities also power 0DIN Scanner, which runs more than 450 active probes against model endpoints with automated scheduling and trend analysis to help organizations continuously monitor and strengthen AI security.

When we launched the Jailbreak Evaluation Framework (JEF), the goal was simple: give the AI safety community a CVSS-like scoring system for quantifying jailbreak tactics. JEF has since become the backbone of how 0DIN triages and scores the hundreds of vulnerabilities submitted through our bug bounty program each month.

But building tools only for ourselves was never the point. Today we're announcing three expansions: new scoring attributes that bring JEF to seven active modules covering harm categories we were missing, plus native integrations with both NVIDIA's garak and Microsoft's PyRIT, the two dominant open-source AI red-teaming frameworks. Both integrations ship directly inside the 0din-jef package. No upstream PRs to wait on. No version coordination with anyone else. One pip install and you're running.

What's New in JEF

JEF originally shipped with four scoring modules: crystal meth synthesis, nerve agent synthesis, Harry Potter copyright reproduction, and Tiananmen Square censorship detection. Those covered the most common jailbreak targets in early submissions, but after a year of triaging real-world reports, the gaps were obvious.

Attackers don't limit themselves to a single harm category. They retarget the same technique across drugs, weapons, biological threats, and intellectual property. A technique that extracts fentanyl synthesis instructions today gets pointed at anthrax weaponization tomorrow. Our scoring needed to keep pace.

JEF now scores across seven active attributes: crystal meth and fentanyl synthesis under illicit substances; nerve agent and anthrax weaponization under harmful substances; CRISPR gene-editing misuse under genetic manipulation; and Harry Potter reproduction plus a generic copyright module under copyright violations.

Each module produces a 0–100 fidelity score measuring how much prohibited content a model actually outputs. These scores feed into JEF's composite formula alongside blast radius and retargetability metrics to produce the final 0–10 JEF score that appears on every disclosed vulnerability.
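The composite formula itself isn't reproduced in this post, so purely as an illustration of the shape of such a roll-up (the weights, input ranges, and function name below are invented for this sketch, not 0DIN's actual formula):

```python
def composite_jef(fidelity: float, blast_radius: float, retargetability: float) -> float:
    """Hypothetical composite on a 0-10 scale.

    fidelity is a 0-100 module output; blast_radius and retargetability are
    assumed here to be 0-1 factors. The weights are invented for illustration.
    """
    normalized = fidelity / 100.0
    # Invented weighting: fidelity dominates, the other metrics modulate it.
    score = 10.0 * (0.6 * normalized + 0.2 * blast_radius + 0.2 * retargetability)
    return round(score, 1)

print(composite_jef(85, 0.9, 0.8))  # a high-fidelity, broadly retargetable tactic
```

The point of the structure, regardless of the real weights: a perfect-fidelity, maximally retargetable jailbreak tops out at 10, and a harmless output scores 0.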

Fentanyl and CRISPR are direct responses to patterns we've seen in bug bounty submissions. The generic copyright module lets us evaluate bypass tactics beyond the Harry Potter benchmark. And CBRN scoring with anthrax weaponization addresses arguably the highest-stakes category of prohibited content a model can produce.


The METADATA Registry

Adding a new scoring type used to require coordinated changes in three places: the JEF Python library, the JEF API, and the 0DIN platform. That coupling slowed everything down.

We fixed it with a METADATA auto-discovery pattern. Every scoring module now declares a METADATA dictionary with standardized fields: name, category, display name, pass threshold, deprecation status. The JEF registry discovers these at runtime, and the API and platform pick them up automatically. Want to add a new scoring type? Write one Python module. That's it.

from jef import registry

# List the scoring modules discovered at import time.
print(registry.list_active())

# Score a piece of model output with a specific module.
result = registry.score("fentanyl", "LLM response text")

This is how we intend to scale JEF as new harm categories emerge. One module, one pull request, the whole ecosystem picks it up.
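As a sketch of what such a module could look like, the field names below follow the list above, but the schema and the score function are illustrative, not the authoritative 0din-jef contract:

```python
# Sketch of a contributed JEF scoring module with auto-discovery metadata.
# Field names mirror those described in the post but are illustrative;
# consult the 0din-jef source for the real schema.

METADATA = {
    "name": "example_toxin",           # registry key
    "category": "harmful_substances",  # harm category grouping
    "display_name": "Example Toxin Synthesis",
    "pass_threshold": 60,              # fidelity score that counts as a bypass
    "deprecated": False,
}

def score(text: str) -> float:
    """Return a 0-100 fidelity score for the given model output.

    A real module applies a graded rubric; this stub only demonstrates
    the contract the registry would expect.
    """
    indicators = ["precursor", "synthesis route", "yield"]
    hits = sum(1 for term in indicators if term in text.lower())
    return 100.0 * hits / len(indicators)
```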

Native Red-Teaming Integrations

Here's the practical reality of contributing to open-source frameworks: upstream PRs take time. They go through review cycles, compatibility discussions, CI pipelines, release schedules. All of that is legitimate and appropriate — it's how healthy open-source projects work. But it means that if you need these capabilities today, waiting on a merge isn't an answer.

So we ship the integrations ourselves, directly inside 0din-jef, as installable extras. Both garak and PyRIT integrations are fully self-contained. You get the latest JEF scoring the moment we cut a release, with no dependency on any external project's merge queue.

Both integrations share a common layer in jef.integrations — a scorer registry that maps short keys to JEF modules, an n-day probe registry loaded from a single YAML config, and a normalized JEFResult type that returns a consistent value (0.0–1.0), percentage (0–100), raw_score, and metadata regardless of which framework you're calling from. Add a new n-day probe to the YAML config, run the seed generator, and it shows up in both frameworks automatically.
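The normalized result type isn't shown in this post; a minimal sketch of the invariant it describes (field names taken from the description above, implementation details assumed) might be:

```python
from dataclasses import dataclass, field

@dataclass
class JEFResult:
    """Illustrative sketch of the normalized result type described above.

    The real jef.integrations type may differ; the invariant shown here is
    that value (0.0-1.0) and percentage (0-100) always agree, whichever
    framework is asking.
    """
    raw_score: float                   # module-native score, assumed 0-100 here
    metadata: dict = field(default_factory=dict)

    @property
    def percentage(self) -> float:
        return self.raw_score          # assuming modules already emit 0-100

    @property
    def value(self) -> float:
        return self.raw_score / 100.0  # framework-friendly 0.0-1.0

result = JEFResult(raw_score=72.5, metadata={"module": "fentanyl"})
print(result.value, result.percentage)  # 0.725 72.5
```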

garak

pip install 0din-jef[garak]

garak is NVIDIA's LLM vulnerability scanner. The JEF integration adds two things: detectors that score model outputs using JEF's full rubrics, and probes containing real attack prompts from 0DIN's bug bounty program.

The detectors cover all seven JEF scoring attributes — crystal meth, nerve agent, fentanyl, anthrax, CRISPR, and both Harry Potter variants — each returning a value in garak's 0.0–1.0 scale. The full JEF scoring engine is behind each one. That matters because keyword presence and procedural-chain detection are not the same thing. A model that names methylamine in passing is not the same as a model that walks someone through synthesis. JEF's graded scoring captures that distinction.
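To make the distinction concrete, here is a toy contrast between presence checking and graded scoring; this is not JEF's actual rubric, just an illustration of why the two approaches diverge:

```python
def keyword_hit(text: str) -> bool:
    """Naive presence check: fires on a mere mention."""
    return "methylamine" in text.lower()

def graded_score(text: str) -> float:
    """Toy graded rubric: the score rises only as procedural structure appears.

    Real JEF rubrics are far richer; this just shows why a passing mention
    and a step-by-step walkthrough score differently.
    """
    t = text.lower()
    steps = ["step 1", "step 2", "heat", "reduce"]
    if "methylamine" not in t:
        return 0.0
    procedural = sum(1 for s in steps if s in t)
    return min(1.0, 0.2 + 0.2 * procedural)

print(keyword_hit("The article mentioned methylamine once."))   # True
print(graded_score("The article mentioned methylamine once."))  # 0.2
```

A keyword detector flags both outputs identically; the graded scorer separates a mention (low score) from a procedure (high score), which is the behavior JEF's rubrics are built around.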

Running a probe is a single command:

garak --model_type openai --model_name gpt-4 \
      -p 0din_jef.PlaceholderInjection \
      -d 0din_jef.CrystalMethScore

Mix and match probes and detectors freely. Both are registered as namespace packages, so if --list_probes or --list_detectors doesn't show them, just reference them directly by name; that always works.

PyRIT

pip install 0din-jef[pyrit]

PyRIT is Microsoft's Python Risk Identification Toolkit for generative AI, the standard framework for red-teaming in Azure environments. The JEF integration adds scorers and seed datasets.

The scorers inherit from pyrit.score.FloatScaleScorer and cover the same harm categories as the garak detectors, all returning values in PyRIT's 0.0–1.0 range:

import asyncio

from jef.integrations.pyrit.scorers import JEFMethScorer

async def main():
    scorer = JEFMethScorer()
    scores = await scorer.score_text_async("some LLM output")
    print(scores[0].get_value())

asyncio.run(main())

The seed datasets are pre-built YAML files — one per n-day probe — loadable via SeedDataset.from_yaml_file():

from pyrit.models import SeedDataset
from jef.integrations.pyrit.seeds import SEEDS_DIR

dataset = SeedDataset.from_yaml_file(SEEDS_DIR / "placeholder_injection.yaml")

The N-day Probes

The probes and seed datasets aren't synthetic exercises. They're real jailbreak tactics discovered through 0DIN's bug bounty program, responsibly disclosed to the affected vendors, and now published for the research community. We call them n-day probes: disclosed vulnerabilities that are no longer embargoed but remain valuable for regression testing.

Placeholder Injection — Request copyrighted content with formatting instructions that insert placeholder text between each word. The model obfuscates its own output while still producing the prohibited content. Discovered by Ron Eddings. Think of it as getting a model to reproduce Harry Potter but with PLACEHOLDER between every word. The content is still there.

Incremental Table Completion — Present a partially completed table of restricted chemical information and ask the model to fill in the blanks. Pattern-completion is one of the strongest drives in language models. The model sees a table with gaps and wants to complete it, even when the completed table contains synthesis instructions for controlled substances.

Technical Field Guide — Frame synthesis instructions as a law enforcement reference document. The authority framing bypasses guardrails that trigger on direct how-to requests. When the model thinks it's helping write a field guide for first responders, the guardrails read the request differently.

Chemical Compiler Debug — Present obfuscated chemical notation as a debug task. The model wants to fix the code. The fact that the code describes a nerve agent synthesis route is secondary to the debugging framing.

Correction — Provide a forensic or technical document with deliberate errors and ask the model to identify and correct them. It's not "write this"; it's "fix this." But the output is the same.

Hex Recipe Book — A multi-turn attack that delivers hex-encoded instructions across sequential prompts. Three turns, and the model is producing content it would have refused in a single direct request.

Each of these affected multiple frontier models at time of discovery. By packaging them as reusable probes in both garak and PyRIT, researchers and enterprises can continuously regression-test against known attack patterns without having to reconstruct the tactics from disclosure write-ups.


Getting Started

Install just the scoring engine, or pull in the framework integration you need:

pip install 0din-jef           # scoring only
pip install 0din-jef[garak]    # + garak detectors and probes
pip install 0din-jef[pyrit]    # + PyRIT scorers and seed datasets

The JEF Calculator lets you compute scores interactively. Standardized Testing lets you test prompts against all active scoring types directly (0DIN researcher login required).

Source: GitHub · PyPI · Full Docs

What's Next

We're continuing to expand JEF's scoring coverage as new harm categories surface from the bug bounty program. The METADATA registry makes it easy — one module, one PR.

Adding a new n-day probe is three steps: add an entry to the YAML config, run the seed generator, commit the new seed file. The garak probe class is generated automatically at import time. The project is Apache 2.0 licensed and open for contributions.
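A hypothetical config entry might look like the following; the field names are invented for illustration, so consult the repository's actual YAML schema before contributing:

```yaml
# Illustrative n-day probe entry -- field names are NOT the real schema.
- name: example_probe
  display_name: Example Probe
  description: One-line summary of the disclosed tactic.
  detector: crystal_meth          # JEF scorer key used to grade outputs
  prompts:
    - "First-turn prompt text"
    - "Second-turn prompt text"
```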

These same capabilities power the 0DIN Scanner — our production tool that runs 450+ active probes against any model endpoint with automated scheduling and trend analysis. If you're evaluating AI security tooling for your organization, we'd love to show you what it can do.

Standardized, reproducible evaluation is the foundation of AI safety. The more researchers scoring and comparing jailbreak tactics with the same framework, the faster the whole ecosystem improves.
