Long-context Question Answering on NarrativeQA (EM)
Best result: Qwen2.5-OpAmp-72B, 61.7 Exact Match (Feb 18, 2025)
Evaluation Results
| Method | Details | Date | Exact Match |
|---|---|---|---|
| Qwen2.5-OpAmp-72B | Parameters=72B, Adapta... | 2025.02 | 61.7 |
| Llama3.3-70B-inst | Parameters=70B, Type=I... | 2025.02 | 61.5 |
| GPT-4o-0806 | Version=0806 | 2025.02 | 61.5 |
| DeepSeek-V3 | Version=V3 | 2025.02 | 60.5 |
| Qwen2.5-72B-inst | Parameters=72B, Type=I... | 2025.02 | 60.2 |
| Llama3-ChatQA2-70B | Parameters=70B, Versio... | 2025.02 | 59.8 |
| Llama3.1-OpAmp-8B | Parameters=8B | 2025.02 | 57.4 |
| Llama3.1-8B-inst | Parameters=8B | 2025.02 | 55.9 |
| Llama3-ChatQA2-8B | Parameters=8B | 2025.02 | 53.1 |
| Qwen2.5-7B-inst | Parameters=7B | 2025.02 | 47.7 |
| Mistral-7B-inst-v0.3 | Parameters=7B | 2025.02 | 44.7 |
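
The page does not say how the Exact Match scores are computed. For orientation, here is a minimal sketch of one common way to compute the metric. It assumes SQuAD-style answer normalization (lowercasing, removing punctuation and English articles) and multiple reference answers per question. The function names `normalize_answer`, `exact_match`, and `exact_match_score` are hypothetical and are not taken from this benchmark.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the benchmark may use another rule.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> float:
    """Return 1.0 if the normalized prediction equals any normalized reference."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(ref) for ref in references))


def exact_match_score(predictions: list[str], references: list[list[str]]) -> float:
    """Average exact match over a dataset, reported as a percentage."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Under these assumptions, a score such as 61.7 would mean that about 61.7% of predicted answers matched a reference answer after normalization.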