
OpenCompass

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Multi-task Language Understanding and Reasoning | OpenCompass SIQA, GSM8K, WiC, HumanEval, MMLU, CSQA | SIQA 66.79 | 30 |
| Multimodal Evaluation Collection | OpenCompass | OpenCompass Score 65.1 | 19 |
| Reasoning | OpenCompass (test) | CMMLU 69.58 | 11 |
| Large Language Model Evaluation | OpenCompass | cMMLU 84.88 | 11 |
| Multimodal Evaluation | Opencompass | Average Score 69.1 | 10 |
| Large Model Performance Prediction | OpenCompass 95% masking September 30, 2024 cutoff (temporal split) | RMSE 8.75 | 10 |
| Visual Question Answering | OpenCompass | MMBench 82.2 | 6 |
| Multimodal Understanding | OpenCompass | Average Score 67 | 5 |