Logical Reasoning on Rule-chaining
[Figure: Accuracy chart; top entry OVM, Accuracy 84; x-axis date May 22, 2026]
Evaluation Results
| Method | Configuration | Date | Accuracy |
|---|---|---|---|
| OVM | Base model=Qwen2.5, Tr... | 2026.05 | 84 |
| Self-Consistency | Base model=Qwen2.5, Tr... | 2026.05 | 78 |
| PT-SFT | Base model=Qwen2.5, Tr... | 2026.05 | 77 |
| HyperGuide | Base model=Qwen2.5, Tr... | 2026.05 | 77 |
| Self-Consistency | Base model=Mistral, Tr... | 2026.05 | 74.5 |
| HyperGuide | Base model=GPT-OSS, Tr... | 2026.05 | 74 |
| SoftCoT | Base model=Qwen2.5, Tr... | 2026.05 | 73.5 |
| PT-SFT | Base model=GPT-OSS, Tr... | 2026.05 | 72 |
| PT-SFT | Base model=Mistral, Tr... | 2026.05 | 72 |
| OVM | Base model=Mistral, Tr... | 2026.05 | 71.5 |
| OVM | Base model=GPT-OSS, Tr... | 2026.05 | 70 |
| HyperGuide | Base model=Mistral, Tr... | 2026.05 | 70 |
| SoftCoT | Base model=GPT-OSS, Tr... | 2026.05 | 67.5 |
| SoftCoT | Base model=Mistral, Tr... | 2026.05 | 63.5 |
| Few-shot | Base model=Mistral, Tr... | 2026.05 | 54 |
| Few-shot | Base model=Qwen2.5, Tr... | 2026.05 | 53 |
| Tree of Thoughts | Base model=Qwen2.5, Tr... | 2026.05 | 52 |
| Self-Consistency | Base model=GPT-OSS, Tr... | 2026.05 | 52 |
| Tree of Thoughts | Base model=GPT-OSS, Tr... | 2026.05 | 50 |
| Tree of Thoughts | Base model=Mistral, Tr... | 2026.05 | 50 |
| Few-shot | Base model=GPT-OSS, Tr... | 2026.05 | 16.1 |