Logical Reasoning on BBH (Accuracy, Loss)

52.5 Exact Match Accuracy (BBH Logical Reasoning)

Dream-7B-Base + DyStruct

[Chart: Exact Match Accuracy; y-axis ticks 43.348, 45.724, 48.1, 50.476; date label May 10, 2026]
Updated 22d ago

Evaluation Results

Date     Exact Match Accuracy  Accuracy  Loss
2026.05  52.5                  -         -
2026.05  51.7                  -         -
2026.05  49.3                  -         -
2026.05  44.9                  -         -
2026.05  44.8                  -         -
2026.05  43.7                  -         -
2026.04  -                     24.8      1.7
2026.04  -                     23        1.689
2026.04  -                     47.4      1.529
2026.04  -                     44.4      1.54