Uncertainty Calibration on RewardBench

0.009 Kuiper

Verbalized

[Chart omitted; data point dated Dec 23, 2025]
Updated 3d ago

Evaluation Results

Method    Links
2025.12   0.009   0.007
2025.12   0.023   0.027
2025.12   0.031   0.033
2025.12   0.031   0.031
2025.12   0.032   0.038
2025.12   0.033   0.035
2025.12   0.034   0.035
2025.12   0.035   0.045
2025.12   0.035   0.059
2025.12   0.039   0.071
2025.12   0.039   0.052
2025.12   0.04    0.041
2025.12   0.041   0.043
2025.12   0.065   0.067
2025.12   0.067   0.078
2025.12   0.076   0.075
2025.12   0.078   0.078
2025.12   0.092   0.106
2025.12   0.111   0.128
2025.12   0.12    0.145
2025.12   0.128   0.14
2025.12   0.166   0.19
2025.12   0.232   0.249
2025.12   0.234   0.251