| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| STRATEGYQA | MarODE(αβ) | Somers' D | 0.2735 | 15 | 3mo ago |
| PROOFWRITER | MarODE | Somers' D | 0.339 | 15 | 3mo ago |
| ENTAILMENTBANK | MarODE(αγ) | Somers' D | 0.1773 | 15 | 3mo ago |
| GSM8K | MarODE | Somers' D | 0.1858 | 11 | 3mo ago |
| EmoArt salience extension (test) | FAB-G | Dice (Sample Mean) | 87.42 | 10 | 16d ago |
| 120-tool benchmark 500 tasks simulated | Tool Attention | Mean Score | 4.43 | 5 | 1mo ago |
| AgriChain 1.0 (test) | AgriChain-VL3B | Faithfulness | 4.6 | 5 | 1mo ago |
| 3-player Leduc Hold'em (test) | Qwen2.5-7B_ToolPoker | Hit Rate (HR) | 193 | 3 | 3mo ago |
| HelpSteer1 sampled (train) | MA-SAPO | Usefulness Score | 3.89 | 2 | 2mo ago |