PostTrainBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Multi-task Evaluation | PostTrainBench | AIME 25 Score: 29.17 | 41 |
| Comprehensive LLM Evaluation | PostTrainBench (test) | AIME 2025: 53.33 | 17 |
| General Reasoning Average | PostTrainBench | Average (%): 44.81 | 17 |