
Common-sense Reasoning on BBH

58.27 Accuracy

ReMiT

[Chart: Accuracy over time, Jan 13, 2026 to Feb 3, 2026]
Updated 4d ago

Evaluation Results

Date     Accuracy  Method  Links
2026.02  58.27
2026.02  57.1
2026.02  56.5
2026.02  55.32
2026.02  47.21
2026.01  45.56
2026.01  45.05
2026.01  44.97
2026.01  44.91
2026.01  44.9
2026.01  44.56
2026.02  44.37
2026.01  44.26
2026.02  44.2
2026.01  44.06
2026.02  43.39
2026.02  43.13
2026.01  42.88
2026.01  42.5
2026.01  41.97
2026.01  41.4
2026.02  40.45
2026.02  32.07
2026.02  30.87
2026.02  30.38
2026.02  29.33
2026.02  28.43