Logical Reasoning on Reasoning Gym Zebra Puzzles
[Chart: Accuracy (*) by date. Highlighted entry: TokUR (TU), 39.33, May 16, 2025. Metric views: Accuracy (*), AUROC, AUPRC.]
Updated 5d ago
Evaluation Results
| Method | Details | Date | Accuracy (*) | AUROC | AUPRC |
|---|---|---|---|---|---|
| TokUR (TU) | Base Model=Qwen-2.5-3B... | 2025.05 | 39.33 | 71.38 | 41.42 |
| TokUR (AU) | Base Model=Qwen-2.5-3B... | 2025.05 | 39.33 | 71.66 | 41.71 |
| TokUR (EU) | Base Model=Qwen-2.5-3B... | 2025.05 | 37.33 | 71.28 | 41.2 |
| CoT (Lower-Bound) | Base Model=Qwen-2.5-3B... | 2025.05 | 33.67 | - | - |
| Self-Certainty | Base Model=Qwen-2.5-3B... | 2025.05 | 30 | 47.77 | 22.95 |
| DeepConf | Base Model=Qwen-2.5-3B... | 2025.05 | 26 | 42.41 | 21.07 |
| PE | Base Model=Qwen-2.5-3B... | 2025.05 | 24 | 44.7 | 21.76 |
| LL | Base Model=Qwen-2.5-3B... | 2025.05 | 22 | 42.15 | 21.02 |