StoryCloze, OpenQA, ARC

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Multi-task Generalization | StoryCloze, OpenQA, ARC-E, ARC-C combined | Average Accuracy: 87.76 | 8