
Multi-task Reasoning on BigBench Hard

Score: 31.1

GPT-4o

Aug 8, 2025
Updated 3mo ago

Evaluation Results

Rank  Score  Date
1     31.1   2025.08
2     30.4   2025.08
3     30.1   2025.08
4     28.7
5     8.2    2025.08