Evaluation Suite (ARC-e, ARC-c, PIQA, HellaS., WinoG., BoolQ)

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Zero-shot Classification | Evaluation Suite (ARC-e, ARC-c, PIQA, HellaS., WinoG., BoolQ) Zero-shot (test) | ARC-e Accuracy: 92.59 | 30