
CSR

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Common-sense reasoning | CSR (ARC-Easy, ARC-Challenge, BoolQ, PIQA, SIQA, HellaSwag, OpenBookQA, WinoGrande) zero-shot lm-evaluation-harness v0.4.2 | Accuracy: 68.95 | 32
Commonsense Reasoning | CSR (Commonsense Reasoning Suite) | Average Accuracy: 72 | 10
Common Sense Reasoning | CSR zero-shot | CF Score: 5.2 | 2