| Dataset Name | SOTA Method | Metric | Value | Trend | Last updated |
|---|---|---|---|---|---|
| LoCoMo | ShardMemo | F1 | 0.6634 | 36 | 4d ago |
| LoCoMo Temporal (test) | - | F1 Score | 44.09 | 24 | 4d ago |
| TransientTables (test) | Gemini-2.0-Flash | EM | 80.39 | 24 | 4d ago |
| TimeQA Easy | - | R-1 | 63.4 | 20 | 4d ago |
| TIME QUESTIONS 1.0 (test) | QUASAR | P@1 | 75.4 | 18 | 4d ago |
| TimeQA Hard v1 | CoT+RL pipeline | R-1 | 0.504 | 12 | 4d ago |
| TimeQA Easy v1 | CoT+RL pipeline | R-1 Score | 58 | 12 | 4d ago |
| TIQ 1.0 (test) | FAITH | P@1 | 0.491 | 10 | 4d ago |
| TimeQA Hard | T5-L-FiD-PIT | EM | 52.7 | 7 | 4d ago |
| ReasonQA Multi-hop | T5-large PIT-SFT | Set Accuracy | 85 | 7 | 4d ago |
| ReasonQA Single-hop | T5-large PIT-SFT | Set Accuracy | 95.1 | 7 | 4d ago |
| ActivityNet RTL | LITA | Score | 44 | 5 | 4d ago |
| TIMEQUESTIONS (test) | EXAQT | P@1 (Overall) | 56.5 | 4 | 4d ago |
| MultiTQ | - | - | - | 0 | 4d ago |