Abductive Reasoning on Disamb-QA
[Figure: Accuracy over time. Top result: PACS, 93 Accuracy, May 8, 2026.]
Updated 23d ago
Evaluation Results
| Method | Backbone | Date | Accuracy |
|---|---|---|---|
| PACS | Llama 3.3 70B | 2026.05 | 93 |
| SC-20 | Llama 3.3 70B | 2026.05 | 92 |
| COT | Llama 3.3 70B | 2026.05 | 89 |
| PACS | Llama 3-Instr... | 2026.05 | 88 |
| If-Beam | Llama 3.3 70B | 2026.05 | 88 |
| ARGOS | Llama 3.3 70B | 2026.05 | 83 |
| SC-20 | Llama 3-Instr... | 2026.05 | 81 |
| ARGOS | Llama 3-Instr... | 2026.05 | 80 |
| LoT | Llama 3.3 70B | 2026.05 | 78 |
| COT | Llama 3-Instr... | 2026.05 | 74 |
| If-Beam | Llama 3-Instr... | 2026.05 | 72 |
| LoT | Llama 3-Instr... | 2026.05 | 71 |
| LLM-Tree | Llama 3-Instr... | 2026.05 | 56 |
| LLM-Tree | Llama 3.3 70B | 2026.05 | 56 |