Fact Verification on FEVER-S
[Chart: Accuracy over time. Highlighted point: CPO, Accuracy 54, Jun 13, 2024. Available metrics: Accuracy, Latency (s/ins).]
Evaluation Results
| Method | Backbone | Date | Accuracy | Latency (s/ins) |
|--------|------------|---------|----------|-----------------|
| CPO | Mistral-7B | 2024.06 | 54 | 53 |
| ToT | LLAMA2-13B | 2024.06 | 51.3 | 3,433.2 |
| CPO | LLAMA2-13B | 2024.06 | 50.7 | 68.2 |
| ToT | Mistral-7B | 2024.06 | 50.5 | 5,537.8 |
| CoT | LLAMA2-13B | 2024.06 | 50 | 61.1 |
| TS-SFT | Mistral-7B | 2024.06 | 49.7 | 49.5 |
| CPO | LLAMA2-7B | 2024.06 | 49 | 41.2 |
| TS-SFT | LLAMA2-13B | 2024.06 | 48.8 | 58 |
| CoT | Mistral-7B | 2024.06 | 48 | 51 |
| ToT | LLAMA2-7B | 2024.06 | 47.5 | 2,539.5 |
| TS-SFT | LLAMA2-7B | 2024.06 | 46 | 40.4 |
| CoT | LLAMA2-7B | 2024.06 | 44.3 | 40.6 |
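
The results above pair each method with three backbones, so the accuracy/latency trade-off between two methods can be read off per backbone. The following is a minimal Python sketch of that arithmetic, comparing ToT with CPO. The numbers are transcribed from the results listed above; the names `ROWS`, `latency_ratio`, and `accuracy_gap` are hypothetical helpers introduced for illustration and are not part of the leaderboard. The unit "s/ins" is taken as given (presumably seconds per instance).

```python
# Rows transcribed from the results above:
# (method, backbone, accuracy, latency in s/ins).
ROWS = [
    ("CPO", "Mistral-7B", 54.0, 53.0),
    ("ToT", "LLAMA2-13B", 51.3, 3433.2),
    ("CPO", "LLAMA2-13B", 50.7, 68.2),
    ("ToT", "Mistral-7B", 50.5, 5537.8),
    ("CoT", "LLAMA2-13B", 50.0, 61.1),
    ("TS-SFT", "Mistral-7B", 49.7, 49.5),
    ("CPO", "LLAMA2-7B", 49.0, 41.2),
    ("TS-SFT", "LLAMA2-13B", 48.8, 58.0),
    ("CoT", "Mistral-7B", 48.0, 51.0),
    ("ToT", "LLAMA2-7B", 47.5, 2539.5),
    ("TS-SFT", "LLAMA2-7B", 46.0, 40.4),
    ("CoT", "LLAMA2-7B", 44.3, 40.6),
]


def latency_ratio(rows, slow="ToT", fast="CPO"):
    """Return {backbone: slow latency / fast latency} where both methods are listed."""
    latency = {(method, backbone): lat for method, backbone, _, lat in rows}
    backbones = sorted({backbone for _, backbone, _, _ in rows})
    return {
        b: latency[(slow, b)] / latency[(fast, b)]
        for b in backbones
        if (slow, b) in latency and (fast, b) in latency
    }


def accuracy_gap(rows, first="CPO", second="ToT"):
    """Return {backbone: first accuracy - second accuracy} where both methods are listed."""
    accuracy = {(method, backbone): acc for method, backbone, acc, _ in rows}
    backbones = sorted({backbone for _, backbone, _, _ in rows})
    return {
        b: accuracy[(first, b)] - accuracy[(second, b)]
        for b in backbones
        if (first, b) in accuracy and (second, b) in accuracy
    }


ratios = latency_ratio(ROWS)
gaps = accuracy_gap(ROWS)
for backbone in ratios:
    print(f"{backbone}: ToT/CPO latency ratio {ratios[backbone]:.1f}, "
          f"CPO - ToT accuracy {gaps[backbone]:+.1f}")
```

Passing other method names (for example `slow="CoT"`) gives the same comparison for any other pair listed in the results.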