General Reasoning on Average Overall
Top result: MPD, 71.3 P@4 (May 9, 2026)
[Chart: P@4 over time; alternate view: Token Count]
Updated 22d ago
Evaluation Results
Method          Date      P@4    Token Count
MPD             2026.05   71.3   6,100
CRISP           2026.05   70.7   7,600
Direct Comp.    2026.05   68.9   6,900
Vanilla LLM     2026.05   68.2   8,200
LiteCoT         2026.05   56.8   7,300
Chain-of-Draft  2026.05   55.1   5,900