Reward-wise QA fairness and alignment on OQA

JS Divergence (FI): 0.927

BASE

[Chart: JS Divergence (FI); y-axis ticks 0.924384, 0.942042, 0.9597, 0.977358; x-axis label Apr 5, 2026]
Updated 1mo ago

Evaluation Results

Method  Links

2026.04  0.927   0.9221  0.8886  0.614  0.681  0.629  0.499  0.561  0.477  0.989
2026.04  0.9283  0.9247  0.8983  0.654  0.722  0.683  0.539  0.601  0.532  0.988
2026.04  0.9285  0.9308  0.9016  0.65   0.763  0.679  0.531  0.658  0.551  0.99
2026.04  0.9309  0.9274  0.9095  0.684  0.741  0.695  0.571  0.623  0.555  0.967
2026.04  0.9318  0.9337  0.9288  0.768  0.835  0.835  0.62   0.714  0.7    0.979
2026.04  0.9329  0.9384  0.9105  0.704  0.781  0.699  0.601  0.653  0.583  0.967
2026.04  0.961   0.9596  0.9448  0.722  0.786  0.742  0.675  0.734  0.625  0.966
2026.04  0.9621  0.9644  0.9632  0.797  0.86   0.876  0.735  0.823  0.819  0.973
2026.04  0.9621  0.9645  0.9617  0.801  0.859  0.877  0.735  0.824  0.806  0.996
2026.04  0.9902  0.9926  0.9828  0.748  0.823  0.812  0.685  0.781  0.707  0.921
2026.04  0.9909  0.9887  0.9717  0.722  0.78   0.72   0.686  0.735  0.596  1
2026.04  0.9918  0.9937  0.9902  0.803  0.856  0.878  0.732  0.811  0.792  1
2026.04  0.9922  0.994   0.9923  0.816  0.872  0.872  0.765  0.841  0.815  1
2026.04  0.9923  0.9946  0.9925  0.796  0.858  0.869  0.736  0.828  0.816  0.991
2026.04  0.9924  0.9942  0.9939  0.814  0.872  0.871  0.766  0.842  0.823  0.985