Multimodal Question Answering

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Updated
ScienceQA (test)	LLaVa + GPT-4 (judge)	Accuracy	92.53	73	1mo ago
MMBench CN	Qwen3-VL-8B	Accuracy	84.6	61	22d ago
ScienceQA	CASHEW	Accuracy	97.8	61	1mo ago
MMQA		Accuracy	70.5	36	4mo ago
MMBench English		MMBen	85.7	33	29d ago
WebQA Average	Nemo-Emb-1B	F1 Score	50.8	32	1mo ago
MMBench EN	Qwen3-VL-8B	Accuracy	86.3	30	22d ago
M2RAG	Ours-top3	B-1 Score	42.95	28	2mo ago
SUPERGLASSES 1.0 (Leaderboard)	SUPERLENS‡ (Ours)	Accuracy (Easy)	49.68	28	3mo ago
ManyModalQA (test)	MAMMQA	Accuracy (Text)	92.5	27	3mo ago
MMBench en (test)		Accuracy	89	26	3mo ago
LiveVQA	OpenSearch-VL-32B	Pass@1	70.5	24	2mo ago
MM-Vet	Qwen3-VL-4B	Total Score	68.3	24	4mo ago
Aggregate (Open-WikiTable, 2WikiMQA, InfoSeek, Dyn-VQA, TabFact, WebQA)	MoRE-7B	Average Score	55.93	22	3mo ago
WebQA	MoRE-7B	F1-Recall	90.92	22	3mo ago
TabFact	MoRE-3B	F1-Recall	52.6	22	3mo ago
Dyn-VQA	R1-Distill-Qwen-32B	F1-Recall	39.98	22	3mo ago
2WikiMQA	MoRE-7B	F1-Recall	55.47	22	3mo ago
Open-WikiTable	MoRE-7B	F1 Recall	53.9	22	3mo ago
ScienceQA v1.3 (test)		NAT Score	0.9019	21	4mo ago
SEED-Bench	QMoSLoRA	Accuracy (All)	71.1	21	4mo ago
MMBench Chinese		MMBCN Score	85.1	20	29d ago
MMQA	FES-RAG-top5	BLEU-1	49.31	20	2mo ago
MMQA k=1		ROrig@k	57.4	20	1mo ago
MME-RealWorld-Lite 1.0 (test)	HART-7B	Perception (AD) Acc	57.7	19	4mo ago

Showing 25 of 75 rows