
General Language Model Evaluation on Average of 10 tasks

45.02 Overall Performance

T-SPIN

[Performance chart] Jan 13, 2026
Updated 4d ago

Evaluation Results

Method  Links
2026.01   45.02    0.12
2026.01   44.9     1.34
2026.01   44.17    -
2026.01   43.56    0.24
2026.01   43.32    0.61
2026.01   42.71    -
2026.01   42.6     -
2026.01   42.52   -0.08
2026.01   42.51    -
2026.01   42.32    1.02
2026.01   42.3    -0.22
2026.01   41.3    -1