SOTA Large Language Model Reasoning benchmarks and papers with code | Wizwand

Large Language Model Reasoning

Benchmarks

Dataset Name	SOTA Method	Metric	Trend
3 LLM Tasks (CMMLU, GSM8K, HumanEval) (test)	(not listed)	Average Accuracy: 40.4	7, 5mo ago

Showing 1 of 1 rows