
Out-of-Domain Reasoning Suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Reasoning | Out-of-Domain Reasoning Suite | ARC-c Score: 94.5 | 29 |
| Out-of-Domain Reasoning | Out-of-Domain Reasoning Suite (BGQA, CRUX Eval, Strategy QA, Table Bench) | BGQA Accuracy: 71.1 | 9 |