
General Capability on BBH, GSM8K, MMLU, TruthfulQA, HumanEval, MBPP

Average Score: 26.77

ADG

[Chart: Average Score, Apr 12, 2026]
Updated 4d ago

Evaluation Results

Date       Score
2026.04    26.77
2026.04    26.23
2026.04    26.13
2026.04    26.03
2026.04    25.95
2026.04    25.85
2026.04    25.61
2026.04    25.57
2026.04    25.45
2026.04    25.33
2026.04    25.32
2026.04    25.14
2026.04    25.06
2026.04    25.01
2026.04    24.99
2026.04    24.99
2026.04    24.95
2026.04    24.8
2026.04    24.79
2026.04    24.74
2026.04    24.74
2026.04    24.73
2026.04    24.73
2026.04    24.68
2026.04    24.67
2026.04    24.55
2026.04    24.52
2026.04    24.21
2026.04    24.2
2026.04    23.12