Factuality Evaluation on NQ-Swap
[Chart: Science Category Score (y-axis) by date (x-axis; Feb 22, 2025). Top result: CoDA, 43.7. Available metrics: Science Category Score, EM (%), Entity Score.]
Updated 3d ago
Evaluation Results
| Method | Backbone | Date | Science Category Score | EM (%) | Entity Score |
|--------|----------|------|------------------------|--------|--------------|
| CoDA | Mistral-7b | 2025.02 | 43.7 | - | 27.7 |
| CoT | Mistral-7b | 2025.02 | 39 | - | 19.5 |
| CoDA | Llama-2-7b-chat | 2025.02 | 38.9 | - | 26.8 |
| Dola | Mistral-7b | 2025.02 | 38.4 | - | 15.9 |
| SR | Mistral-7b | 2025.02 | 38.2 | - | 13.8 |
| CoT | Llama-2-7b-chat | 2025.02 | 36.7 | - | 19.2 |
| Greedy | Mistral-7b | 2025.02 | 36.7 | - | 12.6 |
| USC | Mistral-7b | 2025.02 | 35.9 | - | 11.4 |
| SR | Llama-2-7b-chat | 2025.02 | 35.8 | - | 14.2 |
| Greedy | Llama-2-7b-chat | 2025.02 | 33.4 | - | 8.5 |
| Dola | Llama-2-7b-chat | 2025.02 | 33 | - | 13.8 |
| USC | Llama-2-7b-chat | 2025.02 | 32.9 | - | 9.4 |