| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | GridWorld (full) | Accuracy: 95 | 22 |
| Environment Unlearning | GridWorld | Reconstruction Rate (Pre-unlearn): 100 | 18 |
| Anomaly Detection | Gridworld (test) | Mechanism Mean AUPR: 88.25 | 13 |
| Anomaly Detection | Gridworld newobj | AUROC: 98.22 | 13 |
| Anomaly Detection | Gridworld mechanism | AUROC: 84.59 | 13 |
| POMDP Abstraction | GridWorld 3 x 3 | Validation Error: 0 | 10 |
| Trajectory Unlearning | GridWorld (test) | Pre-unlearn Score: 96 | 9 |
| Trajectory Unlearning | GridWorld | Unlearn Efficacy: 98 | 9 |
| State Inference Attack | GridWorld | Pre-unlearn Acc: 99 | 9 |
| Proactive Assistance | Gridworld | Speedup: 24.5 | 8 |
| Planning | 8x8 two-room gridworld (test) | Validity (%): 0.89 | 8 |
| Reinforcement Learning | GridWorld | Average Update Time (s): 0.1419 | 7 |
| Navigation | Gridworld | Avg Episode Return: 42.9 | 6 |
| Flow Matching | GridWorld (test) | Flow Matching Loss: 1.33 | 5 |
| Flow Matching | GridWorld (val) | Flow Matching Loss: 1.25 | 5 |
| Inverse Transition Learning | Gridworld | Epsilon Matching: 78 | 5 |
| Inverse Transition Learning | Gridworld 40% stochastic-policy states | Epsilon Matching Error: 0.37 | 5 |
| Reinforcement Learning | Gridworld | Sample Complexity: 33 | 5 |
| Deep Reinforcement Learning | Gridworld (test) | Usefulness: 74.2 | 4 |
| Navigation Reasoning | Gridworld o.o.d 20x20 (out-of-domain) | Pass@1: 91.5 | 4 |
| Navigation Reasoning | Gridworld 10x10 (in-domain) | Pass@1: 100 | 4 |
| Multi-Objective Constraint Inference | 5x5 Gridworld | CMSE: 0.027 | 3 |
| Optimal policy search | GridWorld | Iterations (gamma=0.9): 8 | 3 |
| Generating CFMDPs | GridWorld p = 0.4 | Mean Execution Time (s): 0.336 | 2 |
| Generating CFMDPs | GridWorld p = 0.9 | Mean Execution Time (s): 0.261 | 2 |