Structural Reasoning on C2S
[Chart: Accuracy over time. Best result: GPT-4o, 77.9 (May 2, 2025). Updated 1mo ago.]
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| GPT-4o | Evaluation Protocol=0-... | 2025.05 | 77.9 |
| Always Tell Me The Odds | Backbone=Qwen2.5-14B-I... | 2025.05 | 75.6 |
| Always Tell Me The Odds | Backbone=Qwen2.5-14B-I... | 2025.05 | 75.2 |
| Always Tell Me The Odds | Backbone=Qwen2.5-14B-I... | 2025.05 | 73.5 |
| DeepSeek-R1-Distill-Qwen-32B | Evaluation Protocol=0-... | 2025.05 | 68.5 |
| Always Tell Me The Odds | Backbone=Qwen2.5-7B-In... | 2025.05 | 68.4 |
| Always Tell Me The Odds | Backbone=Qwen2.5-8B-In... | 2025.05 | 65.5 |
| RoBERTa-L | Type=Encoder, Reasonin... | 2025.05 | 50.6 |
| Llama-3-Instruct | Evaluation Protocol=Pr... | 2025.05 | 49.4 |