General Knowledge Reasoning on MMLU-CF

75.9 Accuracy

GHG-TDA

[Chart: accuracy over time, Feb 10, 2026 to Mar 13, 2026; y-axis ticks at 64.772, 67.661, 70.55, 73.439]
Updated 11d ago

Evaluation Results

Date      Accuracy
2026.02   75.9
2026.02   75.2
2026.02   75
2026.02   74.4
2026.02   74.3
2026.02   74.1
2026.02   73.2
2026.03   73.2
2026.03   73.2
2026.02   73
2026.02   73
2026.02   72.1
2026.03   72.1
2026.03   72.1
2026.03   72
2026.03   71.6
2026.03   71.5
2026.03   71.5
2026.03   71.5
2026.03   71.4
2026.03   71.3
2026.03   71.1
2026.03   71.1
2026.03   71
2026.03   70.9
2026.03   70.8
2026.03   70.8
2026.03   70.8
2026.03   70.8
2026.03   70.6
2026.03   70.6
2026.03   70.6
2026.03   70.5
2026.03   70.5
2026.03   70.4
2026.03   70.4
2026.03   70.4
2026.03   70.3
2026.03   70.2
2026.03   70.2
2026.03   70.1
2026.03   70.1
2026.03   70
2026.03   69.9
2026.03   69.9
2026.03   69.8
2026.03   69.7
2026.03   69.6
2026.03   69.5
2026.03   69.5
2026.03   69.4
2026.03   69.3
2026.03   66.1
2026.03   65.8
2026.03   65.2