Automated Probing on MBPP
[Chart: Error Rate (%) and Attack Success Rate by method. Plotted point: PAIR, Error Rate 0.38%, Feb 13, 2026.]
Evaluation Results

Method      Setting                    Date     Error Rate (%)  Attack Success Rate
PAIR        Generator Model=GPT-5....  2026.02  0.38            94.85
AutoDetect  Generator Model=GPT-5....  2026.02  0.46            -
PROBELLM    Generator Model=GPT-5....  2026.02  0.9             -