Automated Probing on MMLU
[Chart: Error Rate (%) over time. Metrics: Error Rate (%), Attack Success Rate. Labeled point: PAIR, 46, Feb 13, 2026.]
Updated 4d ago
Evaluation Results
Method       Settings                    Date      Error Rate (%)   Attack Success Rate
PAIR         Generator Model=GPT-5....   2026.02   46               94.87
AutoDetect   Generator Model=GPT-5....   2026.02   67               -
PROBELLM     Generator Model=GPT-5....   2026.02   86               -