| PolicyBench Overall Average | Deepseek R1 | Accuracy: 66.34 | | 11 | 4d ago |
| PolicyBench Level 3 (US) | | Accuracy: 77 | | 11 | 4d ago |
| PolicyBench Level 3 (CN) | | Accuracy: 80.34 | | 11 | 4d ago |
| PolicyBench Level 2 (US) | | Accuracy: 68.95 | | 11 | 4d ago |
| PolicyBench Level 2 (CN) | Deepseek R1 | Accuracy: 62.92 | | 11 | 4d ago |
| PolicyBench Level 1 (US) | Deepseek R1 | Accuracy: 59.33 | | 11 | 4d ago |
| PolicyBench Level 1 (CN) | Deepseek R1 | Accuracy: 62.02 | | 11 | 4d ago |
| 20-Link Pole off-policy | BBO | Sum of sqrt MSE: 415.37 | | 7 | 1mo ago |
| 400-State Random MDP on-policy | BBO | Sum of sqrt MSE: 24.74 | | 7 | 1mo ago |
| 14-State Boyan Chain on-policy | | Sum of sqrt MSE: 25.06 | | 7 | 1mo ago |
| 20-Link Pole off-policy | BBO | MSE: 4.17 | | 7 | 1mo ago |
| 20-Link Pole on-policy | BBO | MSE: 4.26 | | 7 | 1mo ago |
| Cart-Pole off-policy, perfect features | BBO | MSE: 0.17 | | 7 | 1mo ago |
| Cart-Pole on-policy, perfect features | BBO | MSE: 0.15 | | 7 | 1mo ago |
| Cart-Pole off-policy, impoverished features | | MSE: 2.33 | | 7 | 1mo ago |
| Cart-Pole on-policy, impoverished features | | MSE: 2.37 | | 7 | 1mo ago |
| 400-State Random MDP off-policy | BBO | MSE: 0.11 | | 7 | 1mo ago |
| 400-State Random MDP on-policy | BBO | MSE: 0.07 | | 7 | 1mo ago |
| 14-State Boyan Chain on-policy | | MSE: 0.1 | | 7 | 1mo ago |
| 400-State Random MDP off-policy | BBO | Sum of sqrt MSE: 29.65 | | 6 | 1mo ago |