Standard LLM Benchmarks

Benchmarks

Task Name: Reasoning and Question Answering
Dataset Name: Standard LLM Benchmarks (BoolQ, RTE, HellaSwag, ARC, OpenBookQA, PIQA)
SOTA Result: Avg Accuracy 67.24
Trend: 15