Long-form Question Answering on Biography
[Chart: VeriScore F1 over time. Top result: EWE, 49.7 VeriScore F1 (Dec 24, 2024). Metrics tracked: VeriScore F1, AlpacaEval Win Rate.]
Evaluation Results
| Method | Base Model | Date | VeriScore F1 | AlpacaEval Win Rate |
|---|---|---|---|---|
| EWE | Llama-3.1 70B | 2024.12 | 49.7 | 50.2 |
| RA | Llama-3.1 70B | 2024.12 | 43.8 | 49.4 |
| DRAGIN | Llama-3.1 70B | 2024.12 | 42.8 | 33.5 |
| EWE | Llama-3.1 8B | 2024.12 | 42.2 | 21.5 |
| NEST | Llama-3.1 8B | 2024.12 | 41.8 | 21.8 |
| NEST | Llama-3.1 70B | 2024.12 | 41.5 | 22.1 |
| RA | Llama-3.1 8B | 2024.12 | 41.4 | 21.3 |
| COVE w/ Retrieval | Llama-3.1 70B | 2024.12 | 38.2 | 29.4 |
| COVE | Llama-3.1 70B | 2024.12 | 37.7 | 31.3 |
| Llama-3.1 | Llama-3.1 70B | 2024.12 | 37.1 | - |
| DRAGIN | Llama-3.1 8B | 2024.12 | 34.7 | 11.4 |
| COVE w/ Retrieval | Llama-3.1 8B | 2024.12 | 29.1 | 10.2 |
| Llama-3.1 | Llama-3.1 8B | 2024.12 | 28.9 | 24.2 |
| COVE | Llama-3.1 8B | 2024.12 | 25.1 | 13.3 |