SOTA Comprehensive LLM Evaluation benchmarks and papers with code | Wizwand

Comprehensive LLM Evaluation

Benchmarks

Dataset Name	SOTA Method	Metric	Trend
PostTrainBench (test)		AIME 2025: 53.33		17	1mo ago