| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| COQA zero-shot (test) | | Exact Match (EM) | 70.85 | 32 | 3mo ago |
| CoQA | Llama-2 | Accuracy | 75.9 | 29 | 3mo ago |
| CoQA | TAD | PRR | 40.7 | 22 | 1mo ago |
| ConvMix 1.0 (test) | CONVINSE | P@1 (All) | 34.2 | 21 | 3mo ago |
| LoCoMo Overall | MemWeaver | Avg Rank (F1) | 1 | 20 | 3mo ago |
| LoCoMo Single-Hop | LoCoMo | F1 Score | 42.39 | 20 | 3mo ago |
| LoCoMo Open-Domain | MemWeaver | F1 | 20.73 | 20 | 3mo ago |
| LoCoMo Temporal | MemWeaver | F1 Score | 50.83 | 20 | 3mo ago |
| LoCoMo Multi-Hop | MemWeaver | F1 Score | 26 | 20 | 3mo ago |
| CoQA official (test) | | Overall F1 | 88.8 | 17 | 3mo ago |
| CoQA (dev) | UNILM | Overall F1 | 0.849 | 14 | 3mo ago |
| LoCoMo (test) | | F1 Score | 50.7 | 12 | 16d ago |
| QReCC (test) | GroGU | EM (%) | 120 | 12 | 3mo ago |
| TopiOCQA (test) | | EM | 20.9 | 12 | 3mo ago |
| COQA | AutoMix + T | AIBC | 86.5 | 12 | 3mo ago |
| CHATRAG BENCH 1.0 (test) | Llama3-ChatQA-1.5-70B | Average Score (w/o HDial) | 57.14 | 12 | 3mo ago |
| Abg-CoQA Unambiguous | IntentRL | Overlap F1 | 84.4 | 10 | 3mo ago |
| Abg-CoQA Ambiguous | IntentRL | Overlap F1 | 72.9 | 10 | 3mo ago |
| CoQA | RWKV-7B | F1 Score | 62.65 | 10 | 3mo ago |
| QuAC | SINKTRACK | Accuracy | 53.51 | 9 | 1mo ago |
| QuAC 3,000 3 | SINKTRACK | Accuracy | 56.2 | 9 | 1mo ago |
| QuAC-2 2,000 | SINKTRACK | Accuracy | 58.05 | 9 | 1mo ago |
| QuAC 1,000 1 | SINKTRACK | Accuracy | 59.4 | 9 | 1mo ago |
| ConvQuestions (test) | | P@1 | 0.397 | 9 | 3mo ago |
| CANARD | EXCORD | F1 Score | 0.681 | 9 | 3mo ago |