
Multi-task Evaluation on Average (GSM8K-CoT, MATH, MBPP, HumanEval)

Accuracy: 51.6

TAD-Q

[Chart: accuracy over time. Y-axis ticks: 43.8, 45.825, 47.85, 49.875. X-axis label: May 10, 2026.]
Updated 22d ago

Evaluation Results

Method    Links
2026.05    51.6    5.08225.2
2026.05    49.9    5.76257.1
2026.05    46.2    146.2
2026.05    45.9    6.25206.1
2026.05    45.5    2.3684.7
2026.05    45.5    3.9128.6
2026.05    44.1    2.4787.9