| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Trajectory Ranking | RBM OOD 1.0 (test) | Kendall's Tau-a | 0.66 | 8 |
| Reward alignment | RBM-EVAL ID | Pearson r (VOC) | 0.92 | 8 |
| Goodness-of-fit testing | RBM perturbation=0.06 | Null Rejection Rate | 100 | 7 |
| Goodness-of-fit testing | RBM (perturbation=0.04) | Null Rejection Rate | 100 | 7 |
| Goodness-of-fit testing | RBM perturbation=0.02 | Null Rejection Rate | 100 | 7 |
| Goodness-of-fit testing | RBM perturbation=0 | Null Rejection Rate | 0 | 7 |