Scene Text-Centric Visual Question Answering on STVQA
[Chart: Accuracy over time. Top result: InternVL, 0.622 accuracy, Jul 23, 2024.]
Evaluation Results
Method                Links                                  Accuracy
InternVL              Model Generation Capab... (2024.07)    0.622
InternLM-XComposer2   Model Generation Capab... (2024.07)    0.596
Monkey                Model Generation Capab... (2024.07)    0.547
TextHarmony-Chat      Model Generation Capab... (2024.07)    0.513
mPLUG-Owl2            Model Generation Capab... (2024.07)    0.498
TextHarmony           Model Generation Capab... (2024.07)    0.497
TextHarmony*          Model Generation Capab... (2024.07)    0.472
DocPedia              Model Generation Capab... (2024.07)    0.455
LLaVAR                Model Generation Capab... (2024.07)    0.392
LLaVA1.5-7B           Model Generation Capab... (2024.07)    0.381
UniDoc                Model Generation Capab... (2024.07)    0.352
MM-Interleaved        Model Generation Capab... (2024.07)    0.264
SEED-LLaMA-14B        Model Generation Capab... (2024.07)    0.201
MiniGPT5              Model Generation Capab... (2024.07)    0.024