| Dataset Name | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| SocialNetwork (test) | FT (Failure Taxonomy) | Risk Drop | -0.1 | 50 | 3mo ago |
| TwinMarket (test) | Ours (Shapley-based framework) | Risk Drop | 7,790 | 50 | 3mo ago |
| EconAgent (test) | | Risk Drop | -86.35 | 50 | 3mo ago |
| ImageNet (test) | Rollout | Deletion Score | 69.34 | 30 | 2mo ago |
| LongRA | HETA | MoRF | 44 | 6 | 1mo ago |
| IFEval, EvalPlus, MATH, and GAIA2 (60 failure cases) | AttnLRP | Average Tokens to Fix | 1.7 | 3 | 1mo ago |