Learning Dense Representations of Phrases at Scale
About
Open-domain question answering can be reformulated as a phrase retrieval problem, without the need for processing documents on-demand during inference (Seo et al., 2019). However, current phrase retrieval models heavily depend on sparse representations and still underperform retriever-reader approaches. In this work, we show for the first time that we can learn dense representations of phrases alone that achieve much stronger performance in open-domain QA. We present an effective method to learn phrase representations from the supervision of reading comprehension tasks, coupled with novel negative sampling methods. We also propose a query-side fine-tuning strategy, which can support transfer learning and reduce the discrepancy between training and inference. On five popular open-domain QA datasets, our model DensePhrases improves over previous phrase retrieval models by 15%-25% absolute accuracy and matches the performance of state-of-the-art retriever-reader models. Our model is easy to parallelize due to pure dense representations and processes more than 10 questions per second on CPUs. Finally, we directly use our pre-indexed dense phrase representations for two slot filling tasks, showing the promise of utilizing DensePhrases as a dense knowledge base for downstream tasks.
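The core idea above, answering questions purely by nearest-neighbor search over pre-indexed dense phrase vectors, can be illustrated with a minimal sketch. This is an assumption-laden toy, not the DensePhrases implementation: the random "encoder" and the tiny in-memory index stand in for the paper's learned phrase/question encoders and its large-scale vector index.

```python
# Toy sketch of phrase retrieval as maximum inner-product search (MIPS).
# The "encoder" here is random vectors; the real system uses learned dense
# phrase and question encoders and a compressed index over billions of phrases.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-indexed corpus: every candidate phrase gets one dense vector.
phrases = ["Barack Obama", "Honolulu, Hawaii", "August 4, 1961"]
phrase_vecs = rng.normal(size=(len(phrases), 128)).astype(np.float32)

def retrieve(question_vec: np.ndarray, k: int = 1) -> list[str]:
    """Return the top-k phrases by inner product with the question vector."""
    scores = phrase_vecs @ question_vec  # a single matmul over the whole index
    top = np.argsort(-scores)[:k]
    return [phrases[i] for i in top]

# At inference only the question is encoded; no documents are read on-demand.
q = phrase_vecs[1] + 0.01 * rng.normal(size=128).astype(np.float32)
print(retrieve(q))  # -> ['Honolulu, Hawaii']
```

Because answers are plain inner products against a fixed index, the search parallelizes trivially, which is what makes the reported CPU throughput possible.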
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Open Question Answering | Natural Questions (NQ) (test) | Exact Match (EM) | 41.6 | 134 |
| Open-domain Question Answering | TriviaQA (test) | Exact Match | 56.3 | 80 |
| Passage retrieval | TriviaQA (test) | Top-100 Accuracy | 85.8 | 67 |
| Open-domain Question Answering | TriviaQA | EM | 50.7 | 62 |
| Open-domain Question Answering | WebQuestions (WebQ) (test) | Exact Match (EM) | 41.5 | 55 |
| Open-domain Question Answering | NQ (Natural Questions) | EM | 40.9 | 33 |
| Open-domain Question Answering | CuratedTREC (test) | Exact Match (EM) | 33.6 | 26 |
| End-to-end Open-Domain Question Answering | TREC (test) | Exact Match (EM) | 53.9 | 21 |
| Reading Comprehension | SQuAD (dev) | F1 Score | 0.863 | 15 |
| Open-domain Question Answering | Natural Questions (NQ) (test) | Accuracy | 40.9 | 14 |