Symbolic and Logical Reasoning on CodaSet BBH ID (test)

94.29 Accuracy

Qwen3-235B-A22B

[Chart: accuracy axis ticks 84.5244, 87.0597, 89.595, 92.1303; date label May 25, 2026]
Updated 8d ago

Evaluation Results

Method | Links
94.295.4
2026.05
93.891.9
2026.05
93.891.9
92.852.7
2026.05
92.794.7
92.275.8
2026.05
92.161.6
2026.05
92.165.8
2026.05
91.584.4
91.464.2
90.5413.7
88.064.7
86.11.5
85.641.7
85.351
2026.05
84.91.1