GrowOVER: How Can LLMs Adapt to Growing Real-World Knowledge?
About
In the real world, knowledge is constantly evolving, which can render existing knowledge-based datasets outdated. This unreliability highlights the critical need for continuous updates to ensure both accuracy and relevance in knowledge-intensive tasks. To address this, we propose GrowOVER-QA and GrowOVER-Dialogue, dynamic open-domain QA and dialogue benchmarks that undergo a continuous cycle of updates, keeping pace with the rapid evolution of knowledge. Our research indicates that retrieval-augmented language models (RaLMs) struggle with knowledge that they have not been trained on or that has been recently updated. Consequently, we introduce a novel retrieval-interactive language model framework, in which the language model evaluates and reflects on its answers to trigger further re-retrieval. Our exhaustive experiments demonstrate that our training-free framework significantly improves upon existing methods, performing comparably to or even surpassing continuously trained language models.
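The retrieve, answer, evaluate, and re-retrieve cycle described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `answer_with_reretrieval`, the `retrieve` and `generate` callables, the confidence score, the `threshold`, and the widening `top_k` schedule are all hypothetical names and assumptions chosen for illustration, and the toy corpus stands in for a real retriever and language model.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not part of GrowOVER):
#   retrieve(query, k)            -> top-k passages, best first
#   generate(question, passages)  -> (answer, self-assessed confidence in [0, 1])
Retriever = Callable[[str, int], List[str]]
Generator = Callable[[str, List[str]], Tuple[str, float]]


def answer_with_reretrieval(
    question: str,
    retrieve: Retriever,
    generate: Generator,
    threshold: float = 0.5,
    max_rounds: int = 3,
    top_k: int = 2,
) -> Tuple[str, float]:
    """Answer a question, re-retrieving while the model's confidence is low."""
    seen: List[str] = []
    best_answer, best_conf = "", float("-inf")
    for round_idx in range(max_rounds):
        # Widen the search each round and keep only passages not yet used.
        candidates = retrieve(question, top_k * (round_idx + 1))
        fresh = [p for p in candidates if p not in seen]
        if not fresh:
            break  # nothing new to read, stop early
        seen.extend(fresh)
        answer, conf = generate(question, fresh)
        if conf > best_conf:
            best_answer, best_conf = answer, conf
        if conf >= threshold:
            break  # the model judges its answer reliable enough
    return best_answer, best_conf


# Toy stand-ins so the sketch runs without any model or index.
CORPUS = [
    "Berlin is the capital of Germany.",
    "Madrid is the capital of Spain.",
    "Paris is the capital of France.",
]


def toy_retrieve(query: str, k: int) -> List[str]:
    # Pretend the corpus is already ranked for the query.
    return CORPUS[:k]


def toy_generate(question: str, passages: List[str]) -> Tuple[str, float]:
    # High confidence only when a passage actually mentions the answer.
    if any("Paris" in p for p in passages):
        return "Paris", 0.9
    return "unknown", 0.1


result = answer_with_reretrieval(
    "What is the capital of France?", toy_retrieve, toy_generate
)
```

In this toy run the first round reads two irrelevant passages and yields a low-confidence answer, so a second, wider retrieval is issued before the loop stops. Because the loop only calls the model and the retriever, it requires no parameter updates, which matches the training-free setting the abstract describes.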
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Question Answering | GROWOVER-QA Contriever (NEW) | F1 Score (Month 9) 23.6 | 10 |
| Dialogue | GROWOVER-DIALOGUE (UNCHANGED) | BLEU (Month 9) 4.68 | 6 |
| Dialogue | GROWOVER-DIALOGUE (NEW) | BLEU (Month 9) 5.36 | 6 |
| Dialogue Response Generation | GROWOVER-DIALOGUE (CHANGED) | BLEU (Month 9) 7.26 | 6 |
| Dialogue Response Generation | GROWOVER-DIALOGUE (ALL) | BLEU Score (Month 9) 4.7 | 6 |
| Question Answering | GROWOVER-QA (All) | Score 944.9 | 6 |
| Question Answering | GROWOVER-QA (New split) | QA Score 939.4 | 6 |
| Question Answering | GROWOVER-QA | QA Score 928.2 | 6 |
| Question Answering | GROWOVER-QA (Unchanged) | Metric 9 (GROWOVER-QA) 45.7 | 6 |
| Dialogue Response Generation | GROWOVER-DIALOGUE (NEW) | Metric Value (Month 9) 3.61 | 5 |