Logical Reasoning on Reasoning Gym Color Cube
[Chart: Accuracy (*) by date. Plotted point: TokUR (EU), 32, May 16, 2025. Available metrics: Accuracy (*), AUROC, AUPRC.]
Updated 5d ago
Evaluation Results
| Method | Details | Date | Accuracy (*) | AUROC | AUPRC |
|---|---|---|---|---|---|
| TokUR (EU) | Base Model=Qwen-2.5-3B... | 2025.05 | 32 | 60.08 | 33.71 |
| TokUR (TU) | Base Model=Qwen-2.5-3B... | 2025.05 | 31.33 | 58.33 | 31.14 |
| Self-Certainty | Base Model=Qwen-2.5-3B... | 2025.05 | 28 | 50.62 | 30.8 |
| TokUR (AU) | Base Model=Qwen-2.5-3B... | 2025.05 | 28 | 56.57 | 30.02 |
| CoT (Lower-Bound) | Base Model=Qwen-2.5-3B... | 2025.05 | 26 | - | - |
| LL | Base Model=Qwen-2.5-3B... | 2025.05 | 24 | 46.36 | 26.01 |
| DeepConf | Base Model=Qwen-2.5-3B... | 2025.05 | 24 | 49.01 | 24.41 |
| PE | Base Model=Qwen-2.5-3B... | 2025.05 | 22 | 46.73 | 28.75 |