Multi-task Generalization on StoryCloze, OpenQA, ARC-E, ARC-C combined

Average Accuracy: 87.76

Trajectory

[Plot: accuracy trajectory; y-axis ticks at 68.8216, 73.7383, 78.655, 83.5717; x-axis: Mar 1, 2026]
Updated 1mo ago

Evaluation Results

Method    Links

Date      Average Accuracy
2026.03   87.76              94.28   85.58
2026.03   84.05              85.24   83.66
2026.03   84.04              88.2    82.65
2026.03   77.2               -       -
2026.03   75.14              84.32   72.08
2026.03   74.96              75.6    74.74
2026.03   71                 -       -
2026.03   69.55              80.31   65.97