Long-form Question Answering on LongFact
Top result: EWE, 75.9 VeriScore F1 (Dec 24, 2024)
Metrics: VeriScore F1, AlpacaEval Win Rate
Evaluation Results
| Method | Base Model | Date | VeriScore F1 | AlpacaEval Win Rate |
|---|---|---|---|---|
| EWE | Llama-3.1 70B | 2024.12 | 75.9 | 50.1 |
| RA | Llama-3.1 70B | 2024.12 | 72.7 | 41.2 |
| DRAGIN | Llama-3.1 70B | 2024.12 | 71.5 | 38.2 |
| COVE w/ Retrieval | Llama-3.1 70B | 2024.12 | 67.4 | 31.8 |
| EWE | Llama-3.1 8B | 2024.12 | 67.3 | 40.5 |
| RA | Llama-3.1 8B | 2024.12 | 65.9 | 28.1 |
| Llama-3.1 | Llama-3.1 70B | 2024.12 | 64.3 | - |
| DRAGIN | Llama-3.1 8B | 2024.12 | 63.9 | 15.9 |
| COVE | Llama-3.1 70B | 2024.12 | 63.8 | 39.3 |
| NEST | Llama-3.1 70B | 2024.12 | 63.2 | 9.1 |
| Llama-3.1 | Llama-3.1 8B | 2024.12 | 63.1 | 40.6 |
| NEST | Llama-3.1 8B | 2024.12 | 62.3 | 4.2 |
| COVE w/ Retrieval | Llama-3.1 8B | 2024.12 | 53.5 | 12.2 |
| COVE | Llama-3.1 8B | 2024.12 | 44.1 | 8.8 |