
Biased Overtraining Evaluation on ArxivRollBench

0.22 RSII

Phi-1

Jul 25, 2025
Updated 8d ago

Evaluation Results

Method  Links
2025.07  0.225.21
2025.07  0.452
2025.07  0.54.02
2025.07  0.511.76
2025.07  0.516.2
2025.07  0.532.39
2025.07  0.573.27
2025.07  0.643.84
2025.07  0.662.21
2025.07  0.765.84
2025.07  0.782.91
2025.07  0.963.46
2025.07  1.193.66
2025.07  1.33.88