Red-teaming on GPT-5 Mini
Chart (Coverage): Ours (ME), 72.32, Feb 25, 2026. Metrics: Coverage, Diversity, Peak AD.
Updated 26d ago
Evaluation Results
Method      Date      Coverage  Diversity  Peak AD
Ours (ME)   2026.02   72.32     0          0.5
PAIR        2026.02   58.56     0          0.5
TAP         2026.02   30.08     0          0.5
Random      2026.02   9.01      0          0.5