
Reasoning

Benchmarks

Task Name                      Dataset Name          SOTA Result                Trend
Chain-of-Thought Reasoning     Reasoning Dataset     Accuracy (Acc): 86.9       21
Reasoning                      7 reasoning datasets  Reasoning Accuracy: 65.74  15
Natural Language Generation    Reasoning             ROUGE-1: 74.23             8
System Performance Evaluation  Reasoning             Throughput: 194.21         8
Tokenizer compression          Reasoning             Bits per Token: 3.51       5