
Long-context document understanding on MMLongBench-Doc

Top result: Synthetic Reasoning, 55.8 Accuracy


Evaluation Results

Method (date)		Accuracy	F1
Synthetic Reasoning 2026.03		55.8	-
Plain Distillation 2026.03		54.9	-
Qwen3 VL 235B A22B Instruct 2026.03		54.8	-
LongPO 2026.03		53.6	-
No-think 2026.03		52.2	-
Qwen3 VL 32B Instruct 2026.03		51.8	-
Synthetic Reasoning 2026.03		47.5	-
Plain Distillation 2026.03		46.6	-
GPT-4.1 2026.04		45.6	-
Qwen Thinking Traces 2026.03		45	-
No-think 2026.03		43	-
GPT-4o 2024.10		42.9	44.9
GPT-4o 2026.04		42.8	-
Doc-V* (GRPO) 2026.04		42.1	-
MoLoRAG 2026.04		41	-
DocSeeker 2026.04		40.1	-
Mistral 3.1 Small 24B 2026.03		39.9	-
Doc-V* (SFT) 2026.04		39.8	-
DocSeeker-SFT 2026.04		38.6	-
Qwen2.5-VL (RAG Top-5) 2026.04		36.1	-
Claude-3.7-Sonnet 2026.04		33.9	-
URaG 2026.04		33.8	-
Qwen2-VL-72B 2024.10		33.3	35.7
CogDoc 2026.04		33	-
GPT-4V 2026.04		32.4	-
GPT-4o mini 2024.10		29	28.6
Docopilot 2026.04		28.8	-
Baseline-SFT (short-answer) 2026.04		28.8	-
GPT-4o mini 2026.04		28.6	-
ARIA 2024.10		28.3	24.6
Gemini-1.5-Pro 2024.10		28.2	20.6
Gemini-1.5-Pro 2026.04		28.2	-
Qwen2.5-VL (Baseline) 2026.04		28	-
Gemini-1.5-Flash 2024.10		27	21.3
VRAG-RL 2026.04		26.6	-
Baseline 2026.04		25.4	-
InternVL3 2026.04		24.1	-
SV-RAG 2026.04		23	-
Qwen2-VL-7B 2024.10		21.3	22.7
M3DocRAG 2026.04		21	-
Vis-RAG 2026.04		18.8	-
VisRAG 2026.04		18.8	-
VDocRAG 2026.04		18.4	-
InternVL2-40B 2024.10		18.2	17.9
InternVL-Chat-V1.5 2024.10		14.6	13
Llama3.2-11B 2024.10		13.8	11.3
mPLUG-DocOwl2 2026.04		13.4	-
MiniCPM-V-2.6 2024.10		11.5	11.6
Idefics2 2024.10		7	6.8
Pixtral-12B 2024.10		6.4	6