Long-form Question Answering on AlpacaFact
Figure: VeriScore F1 progress chart. Best result: EWE, VeriScore F1 66.9 (Dec 24, 2024). Available metrics: VeriScore F1, AlpacaEval Win Rate.
Evaluation Results
| Method | Base model | Date | VeriScore F1 | AlpacaEval Win Rate |
| --- | --- | --- | --- | --- |
| EWE | Llama-3.1 70B | 2024.12 | 66.9 | 49.9 |
| RA | Llama-3.1 70B | 2024.12 | 66 | 43.1 |
| EWE | Llama-3.1 8B | 2024.12 | 65.5 | 28 |
| DRAGIN | Llama-3.1 70B | 2024.12 | 65.3 | 31.5 |
| Llama-3.1 | Llama-3.1 8B | 2024.12 | 65.3 | 26.7 |
| COVE w/ Retrieval | Llama-3.1 70B | 2024.12 | 64 | 28.8 |
| RA | Llama-3.1 8B | 2024.12 | 63.9 | 18.5 |
| Llama-3.1 | Llama-3.1 70B | 2024.12 | 63.8 | - |
| COVE | Llama-3.1 70B | 2024.12 | 61.5 | 33.3 |
| DRAGIN | Llama-3.1 8B | 2024.12 | 61.3 | 11.1 |
| NEST | Llama-3.1 70B | 2024.12 | 58.1 | 30.2 |
| NEST | Llama-3.1 8B | 2024.12 | 57.8 | 9.1 |
| COVE w/ Retrieval | Llama-3.1 8B | 2024.12 | 54.6 | 12.5 |
| COVE | Llama-3.1 8B | 2024.12 | 51.3 | 15.1 |
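The results above are sorted by VeriScore F1 only. The following is a minimal sketch of how you might re-rank them by AlpacaEval Win Rate, or compare the 70B and 8B variants of each method. The row values are copied from the results. The tuple layout and the helper names `rank_by_win_rate` and `size_gap` are hypothetical, chosen for illustration. The missing win rate ("-") for Llama-3.1 70B is stored as `None`.

```python
from typing import Optional

# (method, base model, VeriScore F1, AlpacaEval win rate); None means "-" (not reported).
# Layout is a hypothetical encoding of the leaderboard rows.
RESULTS: list[tuple[str, str, float, Optional[float]]] = [
    ("EWE", "Llama-3.1 70B", 66.9, 49.9),
    ("RA", "Llama-3.1 70B", 66.0, 43.1),
    ("EWE", "Llama-3.1 8B", 65.5, 28.0),
    ("DRAGIN", "Llama-3.1 70B", 65.3, 31.5),
    ("Llama-3.1", "Llama-3.1 8B", 65.3, 26.7),
    ("COVE w/ Retrieval", "Llama-3.1 70B", 64.0, 28.8),
    ("RA", "Llama-3.1 8B", 63.9, 18.5),
    ("Llama-3.1", "Llama-3.1 70B", 63.8, None),
    ("COVE", "Llama-3.1 70B", 61.5, 33.3),
    ("DRAGIN", "Llama-3.1 8B", 61.3, 11.1),
    ("NEST", "Llama-3.1 70B", 58.1, 30.2),
    ("NEST", "Llama-3.1 8B", 57.8, 9.1),
    ("COVE w/ Retrieval", "Llama-3.1 8B", 54.6, 12.5),
    ("COVE", "Llama-3.1 8B", 51.3, 15.1),
]


def rank_by_win_rate(rows):
    """Sort rows by AlpacaEval win rate (descending); rows without a score go last."""
    scored = [r for r in rows if r[3] is not None]
    unscored = [r for r in rows if r[3] is None]
    return sorted(scored, key=lambda r: r[3], reverse=True) + unscored


def size_gap(rows, metric: int) -> dict[str, float]:
    """Score(70B) minus score(8B) per method; metric is the column index (2 = F1, 3 = win rate).

    Methods missing either size, or missing the score, are skipped.
    """
    scores = {(r[0], r[1]): r[metric] for r in rows}
    gaps: dict[str, float] = {}
    for method in dict.fromkeys(r[0] for r in rows):  # unique methods, in table order
        big = scores.get((method, "Llama-3.1 70B"))
        small = scores.get((method, "Llama-3.1 8B"))
        if big is not None and small is not None:
            gaps[method] = round(big - small, 1)
    return gaps


if __name__ == "__main__":
    for method, base, f1, win in rank_by_win_rate(RESULTS)[:3]:
        print(f"{method} ({base}): win rate {win}, VeriScore F1 {f1}")
    print(size_gap(RESULTS, 2))
```

Sorting the scored rows separately keeps the unreported Llama-3.1 70B entry from being compared against numbers. The size gap shows, for example, that the plain Llama-3.1 8B row scores higher on VeriScore F1 than the 70B row (a gap of -1.5).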