Long Context Question Answering on MultiFieldQA (Accuracy)
[Chart: Accuracy over time. Top result: 57.33 (POP), Feb 3, 2026.]
Evaluation Results
Method      Backbone           Date     Accuracy
POP         Gemma-3-12B-I...   2026.02  57.33
Full Model  Gemma-3-12B-I...   2026.02  55.9
Wanda       Gemma-3-12B-I...   2026.02  55.28
Full Model  Llama-3.1-8B-...   2026.02  54.57
Full Model  Qwen3-VL-8B-I...   2026.02  53.53
POP         Llama-3.1-8B-...   2026.02  52.88
Wanda       Qwen3-VL-8B-I...   2026.02  52.87
Wanda       Llama-3.1-8B-...   2026.02  52.8
POP         Qwen3-VL-8B-I...   2026.02  52.34
SliceGPT    Qwen3-VL-8B-I...   2026.02  40.76
ShortGPT    Qwen3-VL-8B-I...   2026.02  21.44
SliceGPT    Llama-3.1-8B-...   2026.02  12.35
SliceGPT    Gemma-3-12B-I...   2026.02  10.83
ShortGPT    Llama-3.1-8B-...   2026.02  6.8
ShortGPT    Gemma-3-12B-I...   2026.02  1.58