Reasoning

Benchmarks

Task Name                      Dataset Name          SOTA Result                     Trend
Chain-of-Thought Reasoning     Reasoning Dataset     Accuracy (Acc): 86.9            21
Reasoning                      7 reasoning datasets  Reasoning Accuracy: 65.74       15
Natural Language Generation    Reasoning             ROUGE-1: 74.23                  8
System Performance Evaluation  Reasoning             Throughput: 194.21              8
Visual Reasoning               Reasoning             Average Score: 72.2             5
Clustering                     Reasoning             Spearman's Rho: 0.76            5
Tokenizer compression          Reasoning             Bits per Token: 3.51            5
Zero-shot transfer attack      Reasoning             Attack Success Rate (ASR): 0.6  4