Long-form Question Answering on Fava
[Chart: VeriScore F1 by date. Top result: EWE, 61 (Dec 24, 2024). Alternate metric: AlpacaEval Win Rate.]
Evaluation Results
Method | Base Model | Date | VeriScore F1 | AlpacaEval Win Rate
------ | ---------- | ---- | ------------ | -------------------
EWE | Llama-3.1 70B | 2024.12 | 61.0 | 50.1
DRAGIN | Llama-3.1 70B | 2024.12 | 57.2 | 33.9
RA | Llama-3.1 70B | 2024.12 | 56.8 | 37.1
EWE | Llama-3.1 8B | 2024.12 | 53.1 | 36.2
COVE w/ Retrieval | Llama-3.1 70B | 2024.12 | 52.6 | 23.1
Llama-3.1 | Llama-3.1 70B | 2024.12 | 52.0 | -
RA | Llama-3.1 8B | 2024.12 | 51.8 | 16.8
DRAGIN | Llama-3.1 8B | 2024.12 | 51.1 | 10.0
Llama-3.1 | Llama-3.1 8B | 2024.12 | 51.0 | 36.5
NEST | Llama-3.1 70B | 2024.12 | 50.3 | 24.1
NEST | Llama-3.1 8B | 2024.12 | 50.2 | 14.1
COVE | Llama-3.1 70B | 2024.12 | 49.5 | 33.4
COVE w/ Retrieval | Llama-3.1 8B | 2024.12 | 39.5 | 5.3
COVE | Llama-3.1 8B | 2024.12 | 38.7 | 11.0

("-" means no AlpacaEval Win Rate is reported for that entry.)
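To compare methods within each base-model size, the sketch below computes each method's VeriScore F1 relative to the plain "Llama-3.1" row for the same base model. It is an illustrative sketch only. The values are transcribed from the table above, treating the "Llama-3.1" row as the unaugmented baseline is an assumption (the leaderboard does not label it as such), and the names RESULTS and gains_over_baseline are hypothetical.

```python
# Rows transcribed from the leaderboard table:
# (method, base model, VeriScore F1, AlpacaEval win rate or None if not reported)
RESULTS = [
    ("EWE", "Llama-3.1 70B", 61.0, 50.1),
    ("DRAGIN", "Llama-3.1 70B", 57.2, 33.9),
    ("RA", "Llama-3.1 70B", 56.8, 37.1),
    ("EWE", "Llama-3.1 8B", 53.1, 36.2),
    ("COVE w/ Retrieval", "Llama-3.1 70B", 52.6, 23.1),
    ("Llama-3.1", "Llama-3.1 70B", 52.0, None),
    ("RA", "Llama-3.1 8B", 51.8, 16.8),
    ("DRAGIN", "Llama-3.1 8B", 51.1, 10.0),
    ("Llama-3.1", "Llama-3.1 8B", 51.0, 36.5),
    ("NEST", "Llama-3.1 70B", 50.3, 24.1),
    ("NEST", "Llama-3.1 8B", 50.2, 14.1),
    ("COVE", "Llama-3.1 70B", 49.5, 33.4),
    ("COVE w/ Retrieval", "Llama-3.1 8B", 39.5, 5.3),
    ("COVE", "Llama-3.1 8B", 38.7, 11.0),
]


def gains_over_baseline(results, baseline="Llama-3.1"):
    """Return {base_model: [(method, delta_f1), ...]} sorted by delta, largest first.

    Assumes (hypothetically) that the row whose method equals `baseline`
    is the no-augmentation reference for its base model.
    """
    # VeriScore F1 of the baseline row for each base model.
    base_f1 = {model: f1 for method, model, f1, _ in results if method == baseline}
    gains = {}
    for method, model, f1, _ in results:
        if method == baseline:
            continue
        # Round to one decimal to match the table's precision.
        gains.setdefault(model, []).append((method, round(f1 - base_f1[model], 1)))
    for rows in gains.values():
        rows.sort(key=lambda r: r[1], reverse=True)
    return gains


if __name__ == "__main__":
    for model, rows in gains_over_baseline(RESULTS).items():
        print(model)
        for method, delta in rows:
            print(f"  {method:<18} {delta:+.1f}")
```

Under this reading, EWE is 9.0 points above the Llama-3.1 row on the 70B base model and 2.1 points above it on the 8B base model. COVE and COVE w/ Retrieval fall below the 8B baseline.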