| Dataset Name | SOTA Method | Metric | Score | | Updated |
|---|---|---|---|---|---|
| PopQA | Llama 2 | Accuracy | 18 | 32 | 1mo ago |
| DyKnow (130 time-sensitive facts, Wikidata-derived) | | Correctness | 80 | 24 | 1mo ago |
| Wikidata knowledge infusion | PretrainRL | Accuracy | 64.69 | 18 | 1mo ago |
| Factual Evaluation Suite (HHEM, PopQA, TriviaQA) | | HHEM Accuracy | 96.22 | 12 | 15d ago |
| WikiBench | Genius | WikiBench Score | 28.75 | 3 | 1mo ago |