
Outcome Reasoning on Code-Preference

77 (F1 Mean)

GPT-5


Evaluation Results

Date     F1 Mean  Second score
2025.05  77       71
2025.05  74.4     66.8
2025.05  62.7     55.9
2025.05  58.3     51.6
2025.05  57.1     50.4
2025.05  54.7     48.3
2025.05  46.6     40