
Reward Modeling on PPE-IFEval

Accuracy: 0.75

Gemini-2.5-Flash

[Chart: Accuracy, axis ticks 0.4484, 0.5267, 0.605, 0.6833; Feb 2, 2026]
Updated 4d ago

Evaluation Results

Accuracy  Date
0.75      2026.02
0.72      2026.02
0.708     2026.02
0.708     2026.02
0.67      2026.02
0.632     2026.02
0.612     2026.02
0.61      2026.02
0.604     2026.02
0.602     2026.02
0.592     2026.02
0.59
0.58      2026.02
0.552     2026.02
0.538     2026.02
0.51      2026.02
0.51      2026.02
0.46