Security boundary
Severity: Severe
Weights & Layers Disclosure targets the core of the AI's intellectual property: its learned parameters and architectural details. By extracting or deducing these internal components, an attacker can replicate the model's capabilities, clone its performance, and analyze its structure for weaknesses. This compromises competitive advantage, reveals proprietary techniques, and facilitates further adversarial activity, from unauthorized redistribution to more precise prompt-based exploitation.
Example:
Through a carefully engineered exploit on the model-serving infrastructure, a researcher retrieves the AI’s internal weights and layer configurations. Armed with this data, they create a near-identical replica without incurring the original training costs. This unauthorized duplication undermines the owner’s investment and could lead to widespread, uncontrolled use of the model’s technology.
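A minimal sketch of why this disclosure is so damaging: once the weight matrices and layer configuration are known, rebuilding a functionally identical model requires no training data or compute. The tiny two-layer MLP below is a hypothetical stand-in for the victim model; all names (`victim_weights`, `stolen_weights`, `forward`) are invented for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Victim" model: a tiny 2-layer MLP whose parameters stand in for the
# proprietary weights an attacker might exfiltrate.
victim_weights = {
    "W1": rng.standard_normal((4, 8)),
    "b1": rng.standard_normal(8),
    "W2": rng.standard_normal((8, 2)),
    "b2": rng.standard_normal(2),
}

def forward(weights, x):
    """Forward pass defined entirely by the disclosed layer configuration."""
    h = np.maximum(0, x @ weights["W1"] + weights["b1"])  # ReLU hidden layer
    return h @ weights["W2"] + weights["b2"]

# The attacker's "clone" is just the same architecture loaded with the
# exfiltrated parameters.
stolen_weights = {k: v.copy() for k, v in victim_weights.items()}

x = rng.standard_normal((3, 4))
print(np.allclose(forward(victim_weights, x), forward(stolen_weights, x)))
```

Because the clone reproduces the victim's outputs exactly, downstream users cannot distinguish it from the original, which is what makes unauthorized redistribution and white-box weakness analysis feasible.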
References:
- MITRE ATLAS: Full ML Model Access (AML.T0044)
- OWASP LLM 2023-2024: LLM10: Model Theft
- avid-effect:security:S0500 (exfiltration)
- avid-effect:security:S0502 (model theft)