Multi-document Question Answering on Loong

Average Score: 68

Baseline

Updated 2mo ago

Evaluation Results

Method	Links	Average Score
Baseline 2026.03		68	0.273	249.1
SPD-RAG 2026.03		58.1	0.103	564.1
ProxyCoT-RL 2026.04		42.51	40.83	-
Normal RAG 2026.03		33	0.08	412.5
Agentic RAG 2026.03		32.8	0.098	334.7
Qwen3-4B-Instruct 2026.04		24.91	37.76	-
ProxyCoT-RL 2026.04		24.32	32.05	-
Gemma3-4B-IT 2026.04		3.55	25.85	-