
Performance Prediction Evaluation Suite

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Performance Prediction | Performance Prediction Evaluation Suite: 70B model on GSM8K, MATH, BBH, TriviaQA, MBPP, AGIEval, DROP, MMLU-Pro (evaluation sets) | Mean Absolute Prediction Error (%): 1.55 | 6
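The SOTA result is a mean absolute prediction error given in percent. The sketch below shows one plausible reading of that metric. It assumes each predicted and observed benchmark score is an accuracy in percent (0 to 100) and that the metric is the average absolute gap over the evaluation sets. The function name `mean_absolute_prediction_error` and the sample scores are hypothetical and do not come from the suite.

```python
from typing import Sequence


def mean_absolute_prediction_error(predicted: Sequence[float],
                                   actual: Sequence[float]) -> float:
    """Average of |predicted - actual| across benchmarks.

    Assumption (not confirmed by the suite): both inputs are scores in
    percent, so the result is in percentage points.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(predicted) == 0:
        raise ValueError("need at least one benchmark score")
    # Sum the absolute per-benchmark errors, then average them.
    total = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total / len(predicted)


# Hypothetical scores for two benchmarks, for illustration only.
print(mean_absolute_prediction_error([50.0, 30.0], [52.0, 29.0]))  # 1.5
```

Under this reading, the reported 1.55 would mean the predicted scores for the 70B model differed from the observed scores by about 1.55 percentage points on average across the listed evaluation sets.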