| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Who&When | AEA-4B | Agent Accuracy | 60.79 | 22 | 8d ago |
| Who&When Total | AgenTracer (G) | Step-level Accuracy | 36.22 | 13 | 2mo ago |
| Who&When Hand-Crafted | Famas | Step-level Accuracy | 41.38 | 13 | 2mo ago |
| Who&When Algorithm-Generated | AgenTracer (G) | Step-level Accuracy | 42.86 | 13 | 2mo ago |
| Who & When Boundary | All-at-Once | Agent Attribution Accuracy | 38.71 | 6 | 2mo ago |
| Who & When Remove ID | All-at-Once | Agent Attribution Accuracy | 26.47 | 6 | 2mo ago |
| Who & When Baseline | All-at-Once | Agent Attribution Accuracy | 54.33 | 6 | 2mo ago |
| MemTraceBench Overall | MemTrace | ETA | 54.38 | 4 | 7d ago |
| MemTraceBench EverMemOS | MemTrace-OBS | ETA | 11.67 | 4 | 7d ago |
| MemTraceBench Mem0 | MemTrace | ETA | 70 | 4 | 7d ago |
| MemTraceBench RAG | MemTrace-OBS | ETA | 87.5 | 4 | 7d ago |
| MemTraceBench Long-Context | MemTrace-OBS | ETA | 7.5 | 4 | 7d ago |
| Magentic | Our Baseline | Agent Accuracy | 81.2 | 2 | 3mo ago |
| τ-bench | Our Baseline | Agent Accuracy | 75.9 | 2 | 3mo ago |