Logical Reasoning on Reasoning Gym Leg Counting
[Chart: Accuracy over time. Best result: 50.67, TokUR (EU), May 16, 2025. Available metrics: Accuracy, AUROC, AUPRC.]
Evaluation Results
| Method | Configuration | Date | Accuracy | AUROC | AUPRC |
| --- | --- | --- | --- | --- | --- |
| TokUR (EU) | Base Model=Qwen-2.5-3B... | 2025.05 | 50.67 | 69.66 | 51.27 |
| TokUR (TU) | Base Model=Qwen-2.5-3B... | 2025.05 | 48.67 | 69.58 | 48.16 |
| TokUR (AU) | Base Model=Qwen-2.5-3B... | 2025.05 | 48.67 | 69.51 | 47.69 |
| CoT (Lower-Bound) | Base Model=Qwen-2.5-3B... | 2025.05 | 35 | - | - |
| Self-Certainty | Base Model=Qwen-2.5-3B... | 2025.05 | 34 | 50.46 | 32.88 |
| LL | Base Model=Qwen-2.5-3B... | 2025.05 | 32.67 | 37.92 | 27.1 |
| DeepConf | Base Model=Qwen-2.5-3B... | 2025.05 | 30 | 42.48 | 28.96 |
| PE | Base Model=Qwen-2.5-3B... | 2025.05 | 26 | 38.87 | 27.44 |
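The page does not state how AUROC and AUPRC are computed. As a rough illustration only, the sketch below assumes each answer comes with a confidence score (higher means more likely correct) and a 0/1 correctness label, and uses average precision as a common estimator of AUPRC. The function names and the toy data are hypothetical and are not taken from the benchmark.

```python
def auroc(scores, labels):
    """Fraction of (positive, negative) pairs where the positive
    has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision measured at each positive item,
    scanning from the highest score down (tied scores keep input order)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


# Toy data (made up): confidence scores and whether each answer was correct.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]

print(f"AUROC: {auroc(scores, labels):.4f}")
print(f"AUPRC (average precision): {average_precision(scores, labels):.4f}")
```

Both functions return values in the range 0 to 1, whereas the table's figures appear to be on a 0 to 100 scale. The pairwise AUROC loop is quadratic in the number of answers, which is fine for small evaluation sets but not for large ones.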