Instruction Hierarchy Robustness on RealGuardrails Distractors
[Chart: Score by date. GPT-5-Mini-R: 0.95 (Mar 11, 2026)]
Updated 1mo ago
Evaluation Results
Method          Date      Score
GPT-5-Mini-R    2026.03   0.95
GPT-5-Mini      2026.03   0.88