| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| PIQA | RS | Accuracy | 81.34 | 74 | 2d ago |
| Cut the Rope | | Success Rate | 26.14 | 36 | 1mo ago |
| Angry Birds | Qwen-7B^P&I GRPO | Success Rate | 47.06 | 36 | 1mo ago |
| Pooltool | Gemini-2.5-Pro | Att. 1 Score | 36.5 | 36 | 1mo ago |
| Kinetix | | Att. 1 Score | 26.89 | 36 | 1mo ago |
| I-PHYRE DeepPHY (test) | Qwen-3B^I GRPO | Att. 1 Score | 37.6 | 36 | 1mo ago |
| PHYRE DeepPHY (test) | Qwen-7B^P&I GRPO | Att. 1 Score | 14.92 | 36 | 1mo ago |
| PIQA | Mistral Small 24B Base 2501 | Accuracy | 91.3 | 34 | 1mo ago |
| PhyX | SketchThinker-R1-7B | Accuracy | 48.6 | 24 | 4d ago |
| GameBench 1.0 (test) | | AVG Score | 56.1 | 22 | 1mo ago |
| PIQA | | Accuracy | 82.21 | 20 | 1mo ago |
| PHYBench | AERO | Pass@1 Accuracy | 5.3 | 12 | 1mo ago |
| PhysicsEval | AERO | Pass@1 Accuracy | 87.9 | 12 | 1mo ago |
| UGPhysics | AERO | Pass@1 Accuracy | 21.7 | 12 | 1mo ago |
| PROST | GPT-NeoX | Accuracy | 29.6 | 12 | 1mo ago |
| VideoPhy 2 | VideoScoreV2 | Accuracy | 0.386 | 8 | 1mo ago |
| CosmosReason1-Bench | Gemini-2.5-Pro | Overall Score | 64.7 | 8 | 1mo ago |
| Physics-IQ Single Frame | Phantom | Physics-IQ Score | 29.59 | 7 | 8d ago |
| PHYRE-1B cross-template (test) | RPIN | AUCCESS | 42.2 | 7 | 1mo ago |
| PHYRE-1B within-template (test) | Dynamics-Aware DQN | AUCCESS | 86.2 | 7 | 1mo ago |
| PIQA | Dual | PIQA Normalized Performance | 40.9 | 6 | 1mo ago |
| PHYRE Cross-template 1.0 | RPIN | Success Rate | 50.86 | 6 | 1mo ago |
| PHYRE Within-template 1.0 | RPIN | Success Rate (AUCCESS) | 85.49 | 6 | 1mo ago |
| PHYRE-2B cross-template (test) | Dynamics-Aware DQN | AUCCESS | 24.3 | 5 | 1mo ago |
| PHYRE-2B within-template (test) | Dynamics-Aware DQN | AUCCESS | 77.6 | 5 | 1mo ago |