Commonsense Reasoning on Com^2-hard Intervention (test)

54.77 Accuracy

Generator Baseline

[Chart: Accuracy; Feb 5, 2026]
Updated 4d ago

Evaluation Results

Method  Links
2026.02  54.77
2026.02  54.77
2026.02  13.69
2026.02  8.3
2026.02  4.98