Long-context Answering with Citations on MultifieldQA
[Chart: Citation Recall over time. Plotted point: GPT-4o, 79 (Sep 4, 2024). Selectable metrics: Citation Recall, Citation Precision, Citation F1 Score.]
Updated 4d ago
Evaluation Results
Method                  Setting         Link (date)  Citation Recall  Citation Precision  Citation F1 Score
GPT-4o                  Strategy=LAC-S  2024.09      79               87.9                80.6
LongCite-8B             Strategy=LAC-S  2024.09      74.7             93                  80.8
GLM-4                   Strategy=LAC-S  2024.09      72.3             80.1                73.6
Mistral-Large-Instruct  Strategy=LAC-S  2024.09      71.8             80.7                73.8
LongCite-9B             Strategy=LAC-S  2024.09      67.3             91                  74.8
Claude-3-sonnet         Strategy=LAC-S  2024.09      64.7             85.8                71.3
Llama-3.1-70B-Instruct  Strategy=LAC-S  2024.09      53.2             65.2                53.9
GLM-4-9B-chat           Strategy=LAC-S  2024.09      51.1             60.6                52
Llama-3.1-8B-Instruct   Strategy=LAC-S  2024.09      29.8             44.3                31.6
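
The results above are listed in order of Citation Recall, but the ranking changes under the other two metrics. The following is a minimal Python sketch for re-ranking the entries by any metric. The scores are copied from the results above; the names `Result`, `RESULTS`, `rank_by`, and `harmonic_mean` are hypothetical helpers introduced for illustration and are not part of the leaderboard or any benchmark toolkit.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """One leaderboard row (hypothetical container, for illustration only)."""
    method: str
    recall: float
    precision: float
    f1: float


# Scores copied from the results above; every row uses Strategy=LAC-S.
RESULTS = [
    Result("GPT-4o", 79.0, 87.9, 80.6),
    Result("LongCite-8B", 74.7, 93.0, 80.8),
    Result("GLM-4", 72.3, 80.1, 73.6),
    Result("Mistral-Large-Instruct", 71.8, 80.7, 73.8),
    Result("LongCite-9B", 67.3, 91.0, 74.8),
    Result("Claude-3-sonnet", 64.7, 85.8, 71.3),
    Result("Llama-3.1-70B-Instruct", 53.2, 65.2, 53.9),
    Result("GLM-4-9B-chat", 51.1, 60.6, 52.0),
    Result("Llama-3.1-8B-Instruct", 29.8, 44.3, 31.6),
]


def rank_by(metric: str) -> list[str]:
    """Return method names sorted by `metric` ('recall', 'precision', 'f1'), best first."""
    ordered = sorted(RESULTS, key=lambda r: getattr(r, metric), reverse=True)
    return [r.method for r in ordered]


def harmonic_mean(precision: float, recall: float) -> float:
    """Harmonic mean of two scores; returns 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


if __name__ == "__main__":
    for metric in ("recall", "precision", "f1"):
        print(metric, rank_by(metric)[:3])
```

Ranked this way, GPT-4o leads on Citation Recall, while LongCite-8B leads on both Citation Precision and Citation F1 Score. Note also that the listed F1 values are not the harmonic mean of the listed precision and recall: for GPT-4o, `harmonic_mean(87.9, 79.0)` is about 83.2, against a listed 80.6. One possible explanation, which is an assumption and not stated on this page, is that F1 is computed per example and then averaged.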