Performance Prediction on HellaSwag (HS) 10k (test)
[Chart: MAE over time. Labeled point: Metabench, MAE 0.8, Oct 9, 2025. Available metrics: MAE, Rank Correlation.]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | MAE | Rank Correlation |
| --- | --- | --- | --- | --- |
| Metabench | Selection=Best for val... | 2025.10 | 0.8 | 0.974 |
| DISCO | Selection=High JSD, Pr... | 2025.10 | 0.86 | 0.972 |
| DISCO | Selection=High PDS, Pr... | 2025.10 | 1.01 | 0.984 |
| tinyBenchmarks | Selection=Anchor-corr,... | 2025.10 | 1.27 | 0.937 |
| DISCO | Selection=High PDS, Pr... | 2025.10 | 1.32 | 0.956 |
| Model signature | Selection=Random, Pred... | 2025.10 | 1.36 | 0.938 |
| Model signature | Selection=Random, Pred... | 2025.10 | 1.49 | 0.899 |
| DISCO | Selection=High JSD, Pr... | 2025.10 | 1.5 | 0.944 |
| tinyBenchmarks | Selection=Random, Pred... | 2025.10 | 1.96 | 0.819 |
| tinyBenchmarks | Selection=Anchor-IRT,... | 2025.10 | 2.19 | 0.83 |
| Baseline | Selection=Random, Pred... | 2025.10 | 2.85 | 0.839 |