Non-Agentic Performance Evaluation on Fortress (test)
[Chart: Mean Score by date. Llama Maverick 4: 78.75 Mean Score, Mar 5, 2026. Available metrics: Mean Score, Std Dev, Min Score, Max Score.]
Evaluation Results
Method             Date     Mean Score  Std Dev  Min Score  Max Score
Llama Maverick 4   2026.03  78.75       17.27    50         100
GPT4o              2026.03  70          13.09    50         90
Claude Sonnet 4.5  2026.03  69.42       6.82     60         80
Gemini 2.5 Pro     2026.03  63.46       20.25    30         87.67