Multi-task Language and Code Understanding on Open LLM Leaderboard and HumanEval
[Chart: benchmark scores over time for ARC, Hellaswag, MMLU, TruthfulQA, Winogrande, GSM8K, and HumanEval; e.g. Mistral-Pro scored 63.2 on ARC as of Jan 4, 2024]
Evaluation Results

| Method | Date | ARC | Hellaswag | MMLU | TruthfulQA | Winogrande | GSM8K | HumanEval |
|---|---|---|---|---|---|---|---|---|
| Mistral-Pro | 2024.01 | 63.2 | 82.6 | 60.6 | 48.3 | 78.9 | 50.6 | 32.9 |
| Gemma-7B | 2024.01 | 61.9 | 82.2 | 64.6 | 44.8 | 79 | 50.9 | 32.3 |
| Mistral-7B | 2024.01 | 60.8 | 83.3 | 62.7 | 42.6 | 78 | 39.2 | 28.7 |
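To compare the three models at a glance, the per-benchmark scores above can be collapsed into a simple unweighted average. This is a minimal illustrative sketch: the scores are copied from the Evaluation Results table, but the averaging itself is our own convenience, not a metric the leaderboard reports.

```python
# Unweighted mean across the seven benchmarks (ARC, Hellaswag, MMLU,
# TruthfulQA, Winogrande, GSM8K, HumanEval), per model.
scores = {
    "Mistral-Pro": [63.2, 82.6, 60.6, 48.3, 78.9, 50.6, 32.9],
    "Gemma-7B":    [61.9, 82.2, 64.6, 44.8, 79.0, 50.9, 32.3],
    "Mistral-7B":  [60.8, 83.3, 62.7, 42.6, 78.0, 39.2, 28.7],
}

averages = {m: round(sum(v) / len(v), 2) for m, v in scores.items()}
for model, avg in sorted(averages.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {avg}")
```

By this crude average Mistral-Pro and Gemma-7B land within a fraction of a point of each other, with Mistral-7B a few points behind; note that an unweighted mean treats very different benchmarks (e.g. GSM8K math vs. HumanEval code) as equally important.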