
Out-of-Domain Reasoning on BGQA, CRUX Eval, Strategy QA, and Table Bench

71.1 BGQA Accuracy

QwQ-32B-Preview

[Chart: BGQA Accuracy, May 13, 2026]
Updated 14d ago

Evaluation Results

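Date     Method            BGQA  CRUX Eval  Strategy QA  Table Bench  Avg.
2026.05  QwQ-32B-Preview   71.1  65.2       88.2         51.5         69
2026.05  (name missing)    68.7  35.1       95.6         46.8         -
2026.05  (name missing)    66.8  57         92           43.8         64.9
2026.05  (name missing)    58.3  59.6       88.8         34.2         -
2026.05  (name missing)    53    58.1       91.3         43.2         61.4
2026.05  (name missing)    51.3  28         85.3         36.2         50.2
2026.05  (name missing)    50.3  38.5       92.2         32.4         53.4
2026.05  (name missing)    50.2  46         88.2         32.3         54.2
2026.05  (name missing)    49    11.1       84.4         34.2         44.7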
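
The page does not say what the last value in each row is. Where it is present, it appears to match the unweighted mean of the four preceding scores. A minimal sketch of that check follows, with the numbers copied from the rows above. The column order after BGQA (CRUX Eval, Strategy QA, Table Bench) is assumed from the page title, and the helper name `mean_matches` and its tolerance are illustrative only.

```python
# Rows copied from the results above:
# (BGQA, CRUX Eval, Strategy QA, Table Bench, reported last value).
# Assumption: the column order after BGQA follows the page title.
# Rows whose last value is "-" are left out.
rows = [
    (71.1, 65.2, 88.2, 51.5, 69.0),
    (66.8, 57.0, 92.0, 43.8, 64.9),
    (53.0, 58.1, 91.3, 43.2, 61.4),
    (51.3, 28.0, 85.3, 36.2, 50.2),
    (50.3, 38.5, 92.2, 32.4, 53.4),
    (50.2, 46.0, 88.2, 32.3, 54.2),
    (49.0, 11.1, 84.4, 34.2, 44.7),
]


def mean_matches(row, tol=0.06):
    """Return True if the last value is within tol of the mean of the others."""
    *scores, reported = row
    return abs(sum(scores) / len(scores) - reported) <= tol


# The tolerance allows for the reported value being rounded to one decimal.
all_match = all(mean_matches(r) for r in rows)
print(all_match)
```

If the check holds for every listed row, reading the last column as an average is at least consistent with the data, though the page itself does not confirm it.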