| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Explicit Attack | Toy | Avg Queries (E) | 500 | 17 |
| Rationalization | Toy (test) | HI-F1 | 76.02 | 12 |
| 2d multi-goal | Toy | Recovery Time (%) | 3.2 | 8 |
| Classification | Toy Synthetic Skew (test) | F1 Score | 99.93 | 7 |
| Classification | Toy (test) | F1 Score | 99.92 | 5 |
| Cell detection | TOY | AP @ IoU=0.50 | 99.98 | 4 |
| Real2Sim Reconstruction and Interaction Prediction | Toy4K real-world experiment | Stability | 73.3 | 2 |
| High-dimensional prediction | Toy-512 | Average Regret | 0.29 | 2 |
| High-dimensional prediction | Toy-256 | Average Regret | 1.29 | 2 |
| High-dimensional prediction | Toy-128 | Average Regret | 4.18 | 2 |
| High-dimensional prediction | Toy-64 | Average Regret | 5.61 | 2 |
| Counterfactual Prediction | Toy 4 | MAE (do(n1) -> n2) | 0.443 | 2 |
| Counterfactual Prediction | Toy 3 | MAE (do(n1), n2) | 0.451 | 2 |
| Counterfactual Prediction | Toy 2 | MAE (do(n1) -> n2) | 0.303 | 2 |
| Counterfactual Prediction | Toy 1 | MAE (do(n1) -> n2) | 0.434 | 2 |
| Counterfactual Prediction | Toy 4 (test) | MAE (do(n1) -> n2) | 0.158 | 2 |
| Counterfactual Prediction | Toy 3 (test) | MAE (do(n1) -> n2) | 0.443 | 2 |