Long-form Question Answering on Fava
[Chart: leading result over time. Data point shown: EWE, VeriScore F1 = 61, Dec 24, 2024. Available metrics: VeriScore F1, AlpacaEval Win Rate.]
Evaluation Results
Method             Base Model     Date     VeriScore F1  AlpacaEval Win Rate
EWE                Llama-3.1 70B  2024.12  61            50.1
DRAGIN             Llama-3.1 70B  2024.12  57.2          33.9
RA                 Llama-3.1 70B  2024.12  56.8          37.1
EWE                Llama-3.1 8B   2024.12  53.1          36.2
COVE w/ Retrieval  Llama-3.1 70B  2024.12  52.6          23.1
Llama-3.1          Llama-3.1 70B  2024.12  52            -
RA                 Llama-3.1 8B   2024.12  51.8          16.8
DRAGIN             Llama-3.1 8B   2024.12  51.1          10
Llama-3.1          Llama-3.1 8B   2024.12  51            36.5
NEST               Llama-3.1 70B  2024.12  50.3          24.1
NEST               Llama-3.1 8B   2024.12  50.2          14.1
COVE               Llama-3.1 70B  2024.12  49.5          33.4
COVE w/ Retrieval  Llama-3.1 8B   2024.12  39.5          5.3
COVE               Llama-3.1 8B   2024.12  38.7          11