
Reward Modeling on PPE-IFEval

Accuracy: 0.75

Gemini-2.5-Flash

[Chart: Accuracy (axis ticks 0.4484 to 0.6833), Feb 2, 2026]
Updated 1mo ago

Evaluation Results

Accuracy  Date
0.75      2026.02
0.72      2026.02
0.708     2026.02
0.708     2026.02
0.67      2026.02
0.632     2026.02
0.612     2026.02
0.61      2026.02
0.604     2026.02
0.602     2026.02
0.592     2026.02
0.59
0.58      2026.02
0.552     2026.02
0.538     2026.02
0.51      2026.02
0.51      2026.02
0.46