Scene Text Visual Question Answering on ST-VQA
[Chart: Accuracy; highlighted point: Qwen 2.5 VL + ViCrop (rel-att), 68.96, Nov 25, 2025]
Evaluation Results
Method                           Details                    Date     Accuracy
Qwen 2.5 VL + ViCrop (rel-att)   Base Model=Qwen 2.5 VL...  2025.11  68.96
Qwen 2.5 VL + CropVLM            Base Model=Qwen 2.5 VL...  2025.11  68.31
Qwen 2.5 VL + ViCrop (grad-att)  Base Model=Qwen 2.5 VL...  2025.11  68.09
Qwen 2.5 VL + UV-CoT             Base Model=Qwen 2.5 VL...  2025.11  67.91
Qwen 2.5 VL                      Base Model=Qwen 2.5 VL...  2025.11  65.49
LLaVA 1.5 + UV-CoT               Base Model=LLaVA 1.5,...   2025.11  59.3
LLaVA 1.5 + ViCrop (grad-att)    Base Model=LLaVA 1.5,...   2025.11  57.06
LLaVA 1.5 + ViCrop (rel-att)     Base Model=LLaVA 1.5,...   2025.11  56.95
LLaVA 1.5 + CropVLM              Base Model=LLaVA 1.5,...   2025.11  56.81
LLaVA 1.5                        Base Model=LLaVA 1.5,...   2025.11  52.48