Performance Prediction Suite (GSM8k, MATH, BBH, TriviaQA, MBPP, AGIEval, DROP, MMLU-pro)

Mean Absolute Prediction Error (%): 1.55

COD (Complete)

Feb 24, 2025
Updated 1mo ago

Evaluation Results

Method          Date     Mean   Max    Per-benchmark errors (%)
COD (Complete)  2025.02  1.55   2.68   2.68   0.79   0.47   1.97   2.42   1.64   1.05   1.39
-               2025.02  2.24   5.26   4.7    0.5    2.91   1.98   0.89   5.26   1.08   0.57
-               2025.02  3.1    6      4      3.86   0.64   0.68   1.75   6      4.11   3.72
-               2025.02  5.02   8.8    6.71   8.8    3.51   4      7.34   6.78   0.26   2.74
-               2025.02  5.17   13.05  4.23   5.88   13.05  5.86   2.55   0.82   1.53   7.42
-               -        5.29   9.39   9.39   6.95   2.33   5.81   5.52   1.41   5.37   5.55
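
In each results row, the leading figure appears to be the mean of the eight per-benchmark errors, and the second figure appears to be their maximum. The following is a minimal Python sketch of that check, using the values from the row that begins with 1.55. Reading the row this way is an assumption, and the `summarize` helper is hypothetical, introduced only for illustration.

```python
from statistics import mean

# Per-benchmark absolute prediction errors (%) taken from the row whose
# leading value is 1.55. Treating the first two figures of the row as
# mean and max of these eight values is an assumption being checked here.
errors = [2.68, 0.79, 0.47, 1.97, 2.42, 1.64, 1.05, 1.39]


def summarize(values):
    """Return (mean, max) of the errors, with the mean rounded to two decimals."""
    return round(mean(values), 2), max(values)


if __name__ == "__main__":
    print(summarize(errors))  # (1.55, 2.68)
```

The same check reproduces the leading two figures of the other rows, which is what suggests the mean and max interpretation.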