Logical Reasoning on PrOntoQA
[Chart: Calibrated Accuracy over time, May 29, 2024 to May 12, 2026. Highlighted result: Llama 3.1 8B, 91.4.]
Updated 21d ago
Evaluation Results
| Method | Details | Date | Calibrated Accuracy |
|---|---|---|---|
| Llama 3.1 8B | Model Size=8B | 2026.05 | 91.4 |
| Qwen2.5-7B | Model Size=7B | 2026.05 | 80.7 |
| Gemma 3 27B | Model Size=27B | 2026.05 | 77.4 |
| DS-R1-7B | Model Size=7B | 2026.05 | 77.1 |
| Qwen2.5-32B | Model Size=32B | 2026.05 | 67.8 |
| SC+IC (tune) | Backbone=MIXTRAL-8×7B,... | 2024.05 | 63.8 |
| Qwen 3.5 9B | Model Size=9B | 2026.05 | 63.5 |
| SC+IC (tune) | Backbone=MISTRAL-7B, P... | 2024.05 | 60.4 |
| SC+IC (tune) | Backbone=MIXTRAL-8×7B,... | 2024.05 | 59.3 |
| SC+IC (tune) | Backbone=LLAMA-2-13B,... | 2024.05 | 56.6 |
| SC+IC (tune) | Backbone=MISTRAL-7B, P... | 2024.05 | 56.6 |
| SC+IC (tune) | Backbone=LLAMA-2-7B, P... | 2024.05 | 55.7 |
| SC+IC (tune) | Backbone=LLAMA-2-13B,... | 2024.05 | 54.5 |
| DS-R1-32B | Model Size=32B | 2026.05 | 52.9 |
| SC+IC (tune) | Backbone=LLAMA-2-7B, P... | 2024.05 | 50.8 |
| Qwen2.5-14B | Model Size=14B | 2026.05 | 49.6 |
| DS-R1-14B | Model Size=14B | 2026.05 | 49.5 |