
Reasoning on BBEH (Accuracy, Delta Avg)

81.2 Accuracy

CoT2-Meta

Mar 30, 2026
Updated 18d ago

Evaluation Results

Date      Accuracy   Delta Avg
2026.03   81.2       10.5
2026.03   77.8       5.7
2026.03   75.9       3
2026.03   75.8       14.5
2026.03   74.4       -
2026.03   72.5       12.2
2026.03   71.2       8.3
2026.03   69.1       6.4
2026.03   68.9       4.8
2026.03   66.8       3.4
2026.03   65.4       -
2026.03   64.5       -