Simple Reasoning on CSQA

91.75 Accuracy

[Chart: Accuracy with Baseline series; date shown Nov 28, 2025]
Updated 4d ago

Evaluation Results

Date     Accuracy
2025.11  91.75     -
2025.11  91.46     -
2025.11  90.93     -
2025.11  90.78     22.05
2025.11  90.68     -
2025.11  90.64     22.78
2025.11  90.13     21.36
2025.11  89.31     24.04
2025.11  88.91     -
2025.11  87.55     22.11
2025.11  68.79     -
2025.11  68.73     -
2025.11  67.86     -
2025.11  65.44     -
2025.11  65.27     -