
General Reasoning on BIG-Bench Hard (val)

Top result: TAIA, 43.46 Accuracy

[Chart: Accuracy over time, May 30, 2024]

Evaluation Results

Rank  Date     Accuracy
1     2024.05  43.46
2     2024.05  42.54
3     2024.05  37.35
4     2024.05  36.09
5     2024.05  35.85
6     2024.05  35.05
7     2024.05  33.19
8     2024.05  32.64
9     2024.05  32.54
10    2024.05  31.47
11    2024.05  31.3
12    2024.05  30.96
13    2024.05  30.93
14    2024.05  30.76
15    2024.05  30.2
16    2024.05  30.03
17    2024.05  29.58
18    2024.05  29.1
19    2024.05  28.8
20    2024.05  28.63
21    2024.05  27.71
22    2024.05  26.36
23    2024.05  26
24    2024.05  24.07
25    2024.05  23.67
26    2024.05  23.24
27    2024.05  22.5
28    2024.05  22.49
29    2024.05  22.24
30    2024.05  21.93
31    2024.05  20.21
32    2024.05  19.09
33    2024.05  18.86
34    2024.05  16.8
35    2024.05  13.9
36    2024.05  12.99