
task suite (BoolQ, WinoG., PIQA, OBQA, HellaS., ARC-e, ARC-c)

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Zero-shot Task Accuracy | Zero-shot task suite (BoolQ, WinoG., PIQA, OBQA, HellaS., ARC-e, ARC-c) (test) | BoolQ Accuracy: 83.76 | 15 |