
Aggregated Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Quantization Performance Summary | Aggregated Benchmarks (HellaSwag, MMLU, Arc-C, MATH-500) Average Score | 1.014 | 22 |
| Reward Modeling | Aggregated Benchmarks Macro Average Score (excl. MM-RB, VL-RB) | 74.74 | 12 |
| Overall Language Model Evaluation | Aggregated Benchmarks (STEM, Code, IF, General) Average Score | 61.7 | 7 |
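The "Macro Average Score" column aggregates per-benchmark scores with equal weight per benchmark, regardless of each dataset's size. A minimal sketch of that aggregation; the benchmark names and score values below are hypothetical illustrations, not the leaderboard's actual data:

```python
def macro_average(scores: dict[str, float]) -> float:
    """Equal-weight mean over benchmarks (macro average)."""
    return sum(scores.values()) / len(scores)

# Hypothetical per-benchmark scores, for illustration only.
scores = {"HellaSwag": 80.0, "MMLU": 70.0, "Arc-C": 65.0, "MATH-500": 55.0}
print(macro_average(scores))  # 67.5
```

Excluding subsets (as in "excl. MM-RB, VL-RB") simply means dropping those keys from the dict before averaging.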