
General Benchmarks


| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| General Language Understanding and Reasoning | MMLU, HellaSwag, OBQA, WinoGrande, ARC-C, PiQA, SciQ, LogiQA | MMLU Accuracy | 35.68 | 70 |
| General Multimodal Understanding | General Benchmarks | Average Score | 74 | 12 |
| General Language Modeling | Llama 3.1 8B | Generation Quality Score | 66.5 | 11 |
| General Multimodal Reasoning | General Benchmarks | Top-1 Accuracy | 57.8 | 6 |
| Natural Language Understanding and Reasoning | Italian | ARC-C-it | 37.47 | 6 |
| General Language Understanding | MMLU, AlpacaEval, Arena-Hard | MMLU Accuracy | 73.41 | 4 |
| General Language Evaluation | 12 general benchmarks (avg) | General Average Score | 68.24 | 3 |
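The "General Average Score" row reports a single number over 12 benchmarks. A minimal sketch of how such an aggregate is commonly computed, assuming an unweighted (macro) mean; the benchmark names and scores below are illustrative, not the actual 12 benchmarks behind that row:

```python
def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean of per-benchmark scores (macro average)."""
    return sum(scores.values()) / len(scores)

# Hypothetical per-benchmark scores for illustration only.
example_scores = {
    "MMLU": 73.41,      # matches the table's MMLU Accuracy row
    "ARC-C": 37.47,     # illustrative
    "HellaSwag": 80.0,  # illustrative
}

print(round(macro_average(example_scores), 2))
```

An unweighted mean treats every benchmark equally regardless of its question count; a weighted variant would instead weight each score by its number of examples.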