| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| COQA zero-shot (test) | | Exact Match (EM) | 70.85 | 32 | 4d ago |
| CoQA | Llama-2 | Accuracy | 75.9 | 29 | 4d ago |
| ConvMix 1.0 (test) | CONVINSE | P@1 (All) | 34.2 | 21 | 4d ago |
| LoCoMo Overall | MemWeaver | Avg Rank (F1) | 1 | 20 | 4d ago |
| LoCoMo Single-Hop | LoCoMo | F1 Score | 42.39 | 20 | 4d ago |
| LoCoMo Open-Domain | MemWeaver | F1 | 20.73 | 20 | 4d ago |
| LoCoMo Temporal | MemWeaver | F1 Score | 50.83 | 20 | 4d ago |
| LoCoMo Multi-Hop | MemWeaver | F1 Score | 26 | 20 | 4d ago |
| CoQA official (test) | | Overall F1 | 88.8 | 17 | 4d ago |
| CoQA (dev) | UNILM | Overall F1 | 0.849 | 14 | 2d ago |
| QReCC (test) | GroGU | EM (%) | 120 | 12 | 4d ago |
| TopiOCQA (test) | | EM | 20.9 | 12 | 4d ago |
| COQA | AutoMix + T | AIBC | 86.5 | 12 | 4d ago |
| CHATRAG BENCH 1.0 (test) | Llama3-ChatQA-1.5-70B | Average Score (w/o HDial) | 57.14 | 12 | 4d ago |
| Abg-CoQA Unambiguous | IntentRL | Overlap F1 | 84.4 | 10 | 4d ago |
| Abg-CoQA Ambiguous | IntentRL | Overlap F1 | 72.9 | 10 | 4d ago |
| CoQA | RWKV-7B | F1 Score | 62.65 | 10 | 4d ago |
| ConvQuestions (test) | | P@1 | 0.397 | 9 | 4d ago |
| CANARD | EXCORD | F1 Score | 0.681 | 9 | 4d ago |
| QuAC | EXCORD | F1 Score | 67.7 | 9 | 4d ago |
| LoCoMo | MemLoRA | L Metric | 44.5 | 8 | 4d ago |
| CoQA | MMKE | EM | 60.3 | 8 | 4d ago |
| QuAC (test) | | F1 Score | 66.1 | 7 | 4d ago |
| ConvMix-5T 1.0 (test) | CONVINSE | P@1 | 32.1 | 7 | 4d ago |
| ShARC Long (dev) | DOCHOPPER | Easy Accuracy | 72.4 | 7 | 4d ago |