
Netcrook


WIKICROOK

Attack success rate

The share of test attempts in which an adversarial method achieves its intended outcome.

Attack success rate is the percentage of test attempts in which an adversarial technique achieves its goal. In security testing, that goal might be making a model reveal restricted content, follow a malicious instruction, leak data, or trigger an unsafe tool action. The metric is usually calculated as successful attacks divided by total attempts, then multiplied by 100.
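The calculation described above can be sketched in a few lines of Python. The function name `attack_success_rate` and its parameters are illustrative choices, not part of any standard tool, and the input checks are an assumption about how invalid counts should be handled.

```python
def attack_success_rate(successes: int, attempts: int) -> float:
    """Return the attack success rate as a percentage.

    Hypothetical helper: successful attacks divided by total attempts,
    multiplied by 100.
    """
    if attempts <= 0:
        # The rate is undefined without at least one attempt.
        raise ValueError("attempts must be a positive integer")
    if not 0 <= successes <= attempts:
        raise ValueError("successes must be between 0 and attempts")
    return 100.0 * successes / attempts


# Illustrative numbers only: 12 successful attacks out of 200 attempts.
print(attack_success_rate(12, 200))  # 6.0
```

Rejecting a zero attempt count, rather than returning 0, keeps an empty test run from being mistaken for a perfect defense.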

This measure matters because it turns subjective observations into a comparable risk signal. A model that refuses one prompt may still have a high attack success rate across many attempts, especially when an attacker can adapt over multiple turns. In AI security, teams use this metric to compare defenses against prompt injection, jailbreaking, and tool-use abuse, and to spot cases where single-turn tests underestimate real-world exposure. A lower rate generally indicates stronger resilience, but the result only applies to the specific test setup and threat model used.
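To illustrate how single-turn tests can underestimate exposure, the sketch below scores the same set of conversations two ways: counting only the first turn, and counting a conversation as compromised if any turn succeeds. The data layout (one list of per-turn booleans per conversation), the function names, and the sample values are all assumptions made for illustration, not measured results.

```python
from typing import Sequence


def single_turn_asr(conversations: Sequence[Sequence[bool]]) -> float:
    """Percentage of conversations where the first turn alone succeeded."""
    if not conversations:
        raise ValueError("at least one conversation is required")
    hits = sum(1 for turns in conversations if turns and turns[0])
    return 100.0 * hits / len(conversations)


def multi_turn_asr(conversations: Sequence[Sequence[bool]]) -> float:
    """Percentage of conversations where any turn succeeded."""
    if not conversations:
        raise ValueError("at least one conversation is required")
    hits = sum(1 for turns in conversations if any(turns))
    return 100.0 * hits / len(conversations)


# Made-up outcomes: True marks a turn where the attack achieved its goal.
sample = [
    [False, False, True],   # attacker succeeds on the third turn
    [False, False, False],  # model holds for the whole conversation
    [True],                 # attack works immediately
    [False, True],          # attacker adapts and succeeds on turn two
]

print(single_turn_asr(sample))  # 25.0
print(multi_turn_asr(sample))   # 75.0
```

Under these assumed outcomes the same model looks three times more exposed once adaptive follow-up turns are counted, which is why the test setup should be reported alongside the rate.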
