| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| gamma-Bench | FoPO | Guessing Performance 94.87 | 28 | 1mo ago | |
| CheckmateInOne | SELF-THOUGHT | Acc@t1 65.33 | 24 | 3mo ago | |
| stra | SRPO | Pass Rate 74.9 | 14 | 8d ago | |
| VariableSum Dollar OOD (held-out variant) | DEPT | Win Rate 30.47 | 12 | 22d ago | |
| RandomValue Negotiation OOD (held-out variant) | DEPT | Win Rate 17.08 | 12 | 22d ago | |
| HardCore Don'tSayIt OOD (held-out variant) | DEPT | Win Rate 22.92 | 12 | 22d ago | |