
Generative Task Evaluation on Generative tasks single-preference setting

78 Accuracy (Ignored)

Qwen2.5-7B + RP-Reasoner

-3.1217.943960.06
Jan 23, 2026
Updated 1mo ago

Evaluation Results

Method    Links

2026.01    78    78    62    73    0.66    0.76    1.62    1.01
2026.01    72    94    98    88    1.06    0.32    0.28    0.55
2026.01    72   100   100    91    1.52    0.98    1.06    1.19
2026.01    70    98   100    89    1.1     0.96    0.9     0.99
2026.01    46    94    60    67    1.94    1.08    1.96    1.66
2026.01    38    96    66    67    2.2     1.04    1.72    1.65
2026.01     8   100   100    69    3.08    1       1.04    1.71
2026.01     7   100   100    61    3.22    1.02    1.07    1.79
2026.01     6   100   100    69    3.28    0.9     0.98    1.72
2026.01     4    98    98    67    3.72    1.06    0.92    1.9
2026.01     2    98    88    63    3.74    1.2     1.74    2.23
2026.01     2   100    96    66    3.98    0.98    0.96    1.97
2026.01     2   100   100    67    3.24    1       0.94    1.73
2026.01     0   100   100    67    3.84    1.1     1.64    2.19
2026.01     0   100    98    66    3.9     1.24    1.52    2.22
2026.01     0   100   100    67    3.88    1.12    1.5     2.16