Question Answering on Distractor hard
[Chart: Accuracy (Distractor hard) over time; best result 62.2 (NWCAD), Apr 17, 2026]
Evaluation Results
Method        | Model          | Date    | Accuracy (Distractor hard)
NWCAD         | Llama-3.1-70B  | 2026.04 | 62.2
Baseline      | Llama-3.1-70B  | 2026.04 | 58.4
With-context  | Llama-3.1-70B  | 2026.04 | 57.8
NWCAD         | Ministral-3-8B | 2026.04 | 55.83
AdaCAD        | Llama-3.1-70B  | 2026.04 | 54.8
AdaCAD        | Ministral-3-8B | 2026.04 | 53.21
With-context  | Ministral-3-8B | 2026.04 | 52.82
NWCAD         | Llama-3.1-8B   | 2026.04 | 52.16
CoCoA         | Llama-3.1-70B  | 2026.04 | 52
With-context  | Llama-3.1-8B   | 2026.04 | 51.25
AdaCAD        | Llama-3.1-8B   | 2026.04 | 48.49
Baseline      | Ministral-3-8B | 2026.04 | 41.42
CAD           | Ministral-3-8B | 2026.04 | 41.02
Baseline      | Llama-3.1-8B   | 2026.04 | 40.81
CoCoA         | Ministral-3-8B | 2026.04 | 40.5
CoCoA         | Llama-3.1-8B   | 2026.04 | 39.71
CAD           | Llama-3.1-8B   | 2026.04 | 32.5
CAD           | Llama-3.1-70B  | 2026.04 | 29.8
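Because every method is listed for the same three models, the results are easiest to compare per model, as the gain (in accuracy points) over that model's Baseline row. The sketch below does this arithmetic on the values copied from the results listed above; the dictionary layout and the helper names `gains_over_baseline` and `best_method` are illustrative assumptions, not part of the benchmark or any of the listed methods.

```python
# Accuracy (Distractor hard) values copied from the results table.
RESULTS = {
    "Llama-3.1-70B": {"Baseline": 58.4, "With-context": 57.8, "NWCAD": 62.2,
                      "AdaCAD": 54.8, "CoCoA": 52.0, "CAD": 29.8},
    "Ministral-3-8B": {"Baseline": 41.42, "With-context": 52.82, "NWCAD": 55.83,
                       "AdaCAD": 53.21, "CoCoA": 40.5, "CAD": 41.02},
    "Llama-3.1-8B": {"Baseline": 40.81, "With-context": 51.25, "NWCAD": 52.16,
                     "AdaCAD": 48.49, "CoCoA": 39.71, "CAD": 32.5},
}


def gains_over_baseline(results):
    """Hypothetical helper: {model: {method: accuracy - baseline}}, rounded to 2 decimals."""
    return {
        model: {
            method: round(acc - scores["Baseline"], 2)
            for method, acc in scores.items()
            if method != "Baseline"
        }
        for model, scores in results.items()
    }


def best_method(scores):
    """Hypothetical helper: name of the highest-accuracy entry for one model."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for model, gains in gains_over_baseline(RESULTS).items():
        print(model, "best:", best_method(RESULTS[model]), gains)
```

For example, NWCAD adds 3.8 points over Baseline on Llama-3.1-70B but about 14.41 points on Ministral-3-8B, while CAD falls 28.6 points below Baseline on Llama-3.1-70B.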