Non-Agentic Performance Evaluation on HarmBench (test)
[Chart: Mean Score. Top result: Llama 4 Maverick, 73.92 (Mar 5, 2026). Available metrics: Mean Score, Score Std Dev, Minimum Score, Maximum Score.]
Updated 1mo ago
Evaluation Results
Method             Date     Mean Score  Score Std Dev  Minimum Score  Maximum Score
Llama 4 Maverick   2026.03  73.92       16.33          38             93.33
GPT-4o             2026.03  62.33       14.88          40             80
Gemini 2.5 Pro     2026.03  62.33       17.17          40             85.33
Claude Sonnet 4.5  2026.03  60.5        11.13          40             73.33