Scene Text-centric Visual Question Answering on AI2D
Best accuracy: 76.1 (DeFacto)
[Chart: Accuracy over time; date shown: Sep 25, 2025]
Updated 13d ago
Evaluation Results
Method      | Backbone          | Date    | Accuracy
DeFacto     | Qwen2.5-VL-7B     | 2025.09 | 76.1
GRIT        | Qwen2.5-VL-3B     | 2025.09 | 75.5
Qwen2.5-VL  | Qwen2.5-VL-7B     | 2025.09 | 69.5
Visual-SR1  | Qwen2.5-VL-7B     | 2025.09 | 69
ViCrop      | LLaVA-1.5 (Vi...  | 2025.09 | 68.8
DeepEyes    | Qwen2.5-VL-7B     | 2025.09 | 37