SOTA Performance Prediction benchmarks and papers with code | Wizwand
Performance Prediction
Benchmarks
| Dataset | SOTA method | Metric | SOTA score | Results | Last updated |
| --- | --- | --- | --- | --- | --- |
| WMT Benchmarks Average WMT'14 & WMT'19 (aggregation) | LLM-PP | MAE | 0.29 | 16 | 1mo ago |
| WMT En-De 2019 (val) | LLM-PP | MAE | 0.29 | 16 | 1mo ago |
| WMT En-Fr 2014 (val) | LLM-PP | MAE | 0.28 | 16 | 1mo ago |
| WMT En-De 2014 (val) | LLM-Distill-PP | MAE | 0.22 | 16 | 1mo ago |
| ARC 1.2k (test) | Metabench | MAE | 1.14 | 11 | 1mo ago |
| Winogrande (WG) 1.3k (test) | DISCO | MAE | 1 | 11 | 1mo ago |
| HellaSwag (HS) 10k (test) | Metabench | MAE | 0.8 | 11 | 1mo ago |
| MMLU 14k (test) | DISCO | MAE | 1.07 | 11 | 1mo ago |
| Large Model Performance Prediction Dataset 80% masking (test) | STAR | RMSE | 7.5 | 10 | 1mo ago |
| MNLI source domains (out-of-domain) | Cosine distance (fine-tuned) | ROC AUC | 0.683 | 10 | 1mo ago |
| MNLI source domains (in-domain) | Cosine distance (fine-tuned) | ROC AUC | 0.699 | 10 | 1mo ago |
| Sentiment temporal (out-of-domain) | Cosine distance (fine-tuned) | ROC AUC | 0.834 | 10 | 1mo ago |
| Sentiment temporal (in-domain) | Cosine distance (fine-tuned) | ROC AUC | 0.852 | 10 | 1mo ago |
| Sentiment categories (out-of-domain) | Cosine distance (fine-tuned) | ROC AUC | 0.822 | 10 | 1mo ago |
| Sentiment categories (in-domain) | Cosine distance (fine-tuned) | ROC AUC | 0.845 | 10 | 1mo ago |
| Tatoeba | Lasso | MAE | 5.82 | 9 | 1mo ago |
| MewsliX | MAML | MAE | 9.33 | 9 | 1mo ago |
| LAREQA | Average across Tasks | MAE | 1.51 | 9 | 1mo ago |
| XQUAD | MDGPR | MAE | 3.15 | 9 | 1mo ago |
| TyDiQA | Average within Task | MAE | 4.29 | 9 | 1mo ago |
| XCOPA | MDGPR | MAE | 1.96 | 9 | 1mo ago |
| PAWS | Lasso | MAE | 0.8 | 9 | 1mo ago |
| MLQA | Group Lasso | MAE | 2.21 | 9 | 1mo ago |
| Performance Prediction Evaluation Suite 70B Model on GSM8k, MATH, BBH, TriviaQA, MBPP, AGIEval, DROP, MMLU-pro (evaluation sets) | COD (Complete) | Mean Absolute Prediction Error (%) | 1.55 | 6 | 1mo ago |
| ARC-LoRA (test) | W2T | MAE | 0.32 | 5 | 1mo ago |
Showing 25 of 33 rows
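The leaderboard mixes three metrics: MAE and RMSE (lower is better) and ROC AUC (higher is better). The sketch below shows the standard textbook definitions in plain Python so the scores can be read consistently. It assumes predicted and true scores are on the same scale (for example, accuracy in percentage points); the individual papers may aggregate or normalize differently, and the function names `mae`, `rmse`, and `roc_auc` are illustrative, not taken from any listed method.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of |true - predicted| (lower is better)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: penalizes large misses more than MAE does."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive (label 1) outscores
    a random negative (label 0), counting ties as 1/2. This pairwise form is
    equivalent to the normalized Mann-Whitney U statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical true vs. predicted benchmark accuracies (percentage points).
    true_acc = [70.0, 55.0, 80.0]
    pred_acc = [71.0, 53.0, 80.5]
    print(round(mae(true_acc, pred_acc), 4))   # 1.1667
    print(round(rmse(true_acc, pred_acc), 4))  # 1.3229
    # Hypothetical binary labels and detector scores.
    print(roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.2]))  # 0.75
```

Because RMSE squares each error before averaging, it is always at least as large as MAE on the same predictions, so MAE and RMSE values in the table are not directly comparable.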