
O$^2$-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering

About

Large Language Models (LLMs), despite their advancements, are fundamentally limited by their static parametric knowledge, which hinders performance on tasks requiring open-domain, up-to-date information. While enabling LLMs to interact with external knowledge environments is a promising solution, current efforts primarily address closed-ended problems. Open-ended questions, which are characterized by the lack of a standard answer or by non-unique and diverse answers, remain underexplored. To bridge this gap, we present O$^2$-Searcher, a novel search agent that leverages reinforcement learning to tackle both open-ended and closed-ended questions in the open domain. O$^2$-Searcher uses an efficient, locally simulated search environment for dynamic knowledge acquisition, effectively decoupling external world knowledge from the model's sophisticated reasoning processes. It employs a unified training mechanism with meticulously designed reward functions, enabling the agent to identify problem types and adopt different answer generation strategies. Furthermore, to evaluate performance on complex open-ended tasks, we construct O$^2$-QA, a high-quality benchmark featuring 300 manually curated, multi-domain open-ended questions with associated web page caches. Extensive experiments show that O$^2$-Searcher, using only a 3B model, significantly surpasses leading LLM agents on O$^2$-QA. It also achieves SOTA results on various closed-ended QA benchmarks against similarly sized models, while performing on par with much larger ones.

Jianbiao Mei, Tao Hu, Daocheng Fu, Licheng Wen, Xuemeng Yang, Rong Wu, Pinlong Cai, Xinyu Cai, Xing Gao, Yu Yang, Chengjun Xie, Botian Shi, Yong Liu, Yu Qiao • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Multi-hop Question Answering | 2WikiMultihopQA | EM 37.4 | 387 |
| Multi-hop Question Answering | HotpotQA (test) | -- | 255 |
| Multi-hop Question Answering | 2WikiMultiHopQA (test) | EM 37.4 | 195 |
| Question Answering | PopQA | -- | 186 |
| Multi-hop Question Answering | Bamboogle | Exact Match 34.4 | 128 |
| Multi-hop Question Answering | HotpotQA | Exact Match (EM) 38.8 | 117 |
| Question Answering | TriviaQA | -- | 112 |
| Question Answering | HotpotQA | EM 38.8 | 109 |
| Question Answering | 2WikiMultihopQA | EM 37.4 | 107 |
| Multi-hop Question Answering | Bamboogle (test) | EM 34.4 | 84 |
Showing 10 of 23 rows
