
Standard LLM Benchmarks

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Reasoning and Question Answering | Standard LLM Benchmarks (BoolQ, RTE, HellaSWAG, ARC, OpenBookQA, PiQA) | Avg Accuracy: 67.24 | 15