
Reward Modeling on UltraFeedback core250 (held-out evaluation)

Delta (Δ): 3.543

TEA

May 11, 2026
Updated 21d ago

Evaluation Results

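| Date | Delta (Δ) | ? | ? | ? | ? | ? | ? | ? |
|---|---|---|---|---|---|---|---|---|
| 2026.05 | 3.543 | - | - | - | 72.8 | 1.2 | 26 | 25.425 |
| 2026.05 | 3.542 | - | - | - | 73.6 | 0 | 26.4 | 26.184 |
| 2026.05 | 3.424 | - | - | - | 73.6 | 0 | 26.4 | 24.509 |
| 2026.05 | 3.204 | - | - | - | 73.2 | 0 | 26.8 | 23.406 |
| 2026.05 | 2.888 | - | - | - | 71.6 | 0 | 28.4 | 22.127 |
| 2026.05 | 2.453 | - | - | - | 66.8 | 0 | 33.2 | 20.588 |
| 2026.05 | 1.929 | - | - | - | 63.2 | 0 | 36.8 | 18.727 |
| 2026.05 | 1.392 | - | - | - | 59.2 | 0 | 40.8 | 16.564 |
| 2026.05 | 0.8 | - | - | - | 53.6 | 0 | 46.4 | 14.001 |
| 2026.05 | 0.292 | - | 6.284 | 0.164 | 60.8 | 0.8 | 38.4 | - |
| 2026.05 | 0.279 | - | -0.735 | 0.146 | 65.6 | 0 | 34.4 | - |
| 2026.05 | 0.249 | - | 5.614 | 0.13 | 63.6 | 0.8 | 35.6 | - |
| 2026.05 | 0.248 | - | 6.801 | 0.109 | 58.8 | 1.6 | 39.6 | - |
| 2026.05 | 0.24 | - | 1.34 | 0.118 | 70 | 0 | 30 | - |
| 2026.05 | 0.212 | - | 2.831 | 0.1 | 70.8 | 0 | 29.2 | - |
| 2026.05 | 0.212 | - | 4.858 | 0.1 | 67.6 | 0 | 32.4 | - |
| 2026.05 | 0.209 | - | 7.303 | 0.005 | 56 | 3.2 | 40.8 | - |
| 2026.05 | 0.196 | - | 3.96 | 0.097 | 69.6 | 0 | 30.4 | - |
| 2026.05 | - | -1.013 | - | - | - | - | - | - |
| 2026.05 | - | 1.1 | - | - | - | - | - | - |
| 2026.05 | - | 2.619 | - | - | - | - | - | - |
| 2026.05 | - | 3.764 | - | - | - | - | - | - |
| 2026.05 | - | 4.646 | - | - | - | - | - | - |
| 2026.05 | - | 5.364 | - | - | - | - | - | - |
| 2026.05 | - | 5.992 | - | - | - | - | - | - |
| 2026.05 | - | 6.553 | - | - | - | - | - | - |
| 2026.05 | - | 7.093 | - | - | - | - | - | - |
| 2026.05 | - | - | - | - | - | - | - | 13.202 |
| 2026.05 | - | - | - | - | - | - | - | 15.173 |
| 2026.05 | - | - | - | - | - | - | - | 16.797 |
| 2026.05 | - | - | - | - | - | - | - | 18.135 |
| 2026.05 | - | - | - | - | - | - | - | 19.239 |
| 2026.05 | - | - | - | - | - | - | - | 20.202 |
| 2026.05 | - | - | - | - | - | - | - | 21.085 |
| 2026.05 | - | - | - | - | - | - | - | 21.882 |
| 2026.05 | - | - | - | - | - | - | - | 22.642 |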