
Suite

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Image Classification | 16-dataset Suite Aggregate | ARobust: 23.21 | 9
Zero-shot Evaluation | Suite 0-shot | Accuracy: 70.97 | 5
Recovery Evaluation | Zero-shot Suite (CSQA, OBQA, PIQA, SIQA, HellaSwag, WinoG, ARC-e, ARC-c, SciQ) | Average Score: 55.84 | 3