O$^2$-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering
About
Large Language Models (LLMs), despite their advancements, are fundamentally limited by their static parametric knowledge, which hinders performance on tasks that require up-to-date, open-domain information. Enabling LLMs to interact with external knowledge environments is a promising solution, but current efforts primarily address closed-ended problems. Open-ended questions, which are characterized by the lack of a standard answer or by admitting multiple, diverse answers, remain underexplored. To bridge this gap, we present O$^2$-Searcher, a novel search agent that uses reinforcement learning to tackle both open-ended and closed-ended questions in the open domain. O$^2$-Searcher relies on an efficient, locally simulated search environment for dynamic knowledge acquisition, decoupling external world knowledge from the model's reasoning process. It employs a unified training mechanism with carefully designed reward functions, enabling the agent to identify the question type and adapt its answer-generation strategy accordingly. Furthermore, to evaluate performance on complex open-ended tasks, we construct O$^2$-QA, a high-quality benchmark featuring 300 manually curated, multi-domain open-ended questions with associated web page caches. Extensive experiments show that O$^2$-Searcher, using only a 3B model, significantly surpasses leading LLM agents on O$^2$-QA. It also achieves state-of-the-art (SOTA) results on various closed-ended QA benchmarks among similarly sized models, while performing on par with much larger ones.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-hop Question Answering | 2WikiMultihopQA | EM | 37.4 | 387 |
| Multi-hop Question Answering | HotpotQA (test) | -- | -- | 255 |
| Multi-hop Question Answering | 2WikiMultiHopQA (test) | EM | 37.4 | 195 |
| Question Answering | PopQA | -- | -- | 186 |
| Multi-hop Question Answering | Bamboogle | EM | 34.4 | 128 |
| Multi-hop Question Answering | HotpotQA | EM | 38.8 | 117 |
| Question Answering | TriviaQA | -- | -- | 112 |
| Question Answering | HotpotQA | EM | 38.8 | 109 |
| Question Answering | 2WikiMultihopQA | EM | 37.4 | 107 |
| Multi-hop Question Answering | Bamboogle (test) | EM | 34.4 | 84 |
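The closed-ended results above are exact match (EM) scores. This page does not say how answers are normalized before they are compared. The sketch below assumes the SQuAD-style normalization that is widely used for QA benchmarks such as HotpotQA and 2WikiMultihopQA: lowercasing, removing punctuation and the articles "a", "an", and "the", and collapsing whitespace. The function names `normalize_answer`, `exact_match`, and `em_score` are illustrative only and do not come from O$^2$-Searcher's code.

```python
import re
import string

# Assumption: SQuAD-style normalization; the page does not specify O^2-Searcher's exact rules.
_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation, drop articles (a/an/the), collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """Return 1.0 if the normalized prediction equals any normalized gold answer, else 0.0."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(g) for g in gold_answers))


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Average EM over a dataset, reported as a percentage (as in the table above)."""
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    preds = ["The Eiffel Tower", "Paris, France"]
    golds = [["Eiffel Tower"], ["Paris"]]
    print(em_score(preds, golds))
```

In this example the first prediction matches after normalization and the second does not, so the script prints `50.0`. EM is strict: "Paris, France" gets no credit against the gold answer "Paris". This strictness is one reason a single-reference metric like EM suits closed-ended questions but not open-ended ones with many valid answers.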