OOD Quality Estimation on Mathematical Reasoning OOD (Near-shift)

Kendall Tau: 0.159

TV Score

[Chart: Kendall Tau over time; y-axis ticks 0.03004, 0.06352, 0.097, 0.13048; date shown: May 22, 2024]

Evaluation Results

Method    Date       Kendall Tau    (unlabeled)    Links
-         2024.05    0.159          0.158          -
-         2024.05    0.131          0.146          -
-         2024.05    0.123          0.154          -
-         2024.05    0.113          0.134          -
-         2024.05    0.074          0.05           -
-         2024.05    0.059          0.098          -
-         2024.05    0.057          0.057          -
-         2024.05    0.038          0.026          -
-         2024.05    0.038          0.012          -
-         2024.05    0.036          0.115          -
-         2024.05    0.036          0.029          -
-         2024.05    0.035          0.058          -
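
Kendall Tau, the metric reported on this page, is a rank correlation: it compares how two lists order the same items. The sketch below shows one common variant (tau-a, which counts concordant and discordant pairs and ignores ties). The page does not say which variant or which tooling the benchmark uses, so treat this as an illustration only. The function name `kendall_tau` and the sample values are hypothetical and not taken from the leaderboard.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total number of pairs.

    Assumption: tied pairs count as neither concordant nor discordant.
    The variant used by the benchmark is not stated in the source.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two items to form a pair")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # A pair is concordant when both lists order items i and j the same way.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical data: one OOD score and one quality measurement per sample.
ood_scores = [0.1, 0.4, 0.35, 0.8]
quality = [0.9, 0.6, 0.5, 0.2]
tau = kendall_tau(ood_scores, quality)
print(f"Kendall tau-a: {tau:.3f}")
```

A value near 1 or -1 means the two rankings agree or disagree almost everywhere, while a value near 0 means they are largely unrelated.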