| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| Public Physics Benchmarks (GPQA, SciBench, PhysReason) (test) | Reasoning Style Transfer | GPQA Accuracy | 60.46 | 21 | 15d ago |
| Physics C-Eval WebInstruct | Qwen3-4B-Base (HEAL) | C-Eval Score | 80.15 | 18 | 1mo ago |
| OlyBench Phy | TTS | Acceptance Length | 4.5 | 12 | 22d ago |
| IPhO Mechanics | Qwen3-30B + RL (synthetic) | Accuracy | 40 | 10 | 1mo ago |
| HCV | Qwen3-30B + RL (synthetic) | Accuracy | 59 | 10 | 1mo ago |
| Synthetic Symbolic | Qwen2.5-32B + RL (synthetic) | Accuracy | 10.4 | 10 | 1mo ago |
| Synthetic Numeric | Qwen2.5-32B + RL (synthetic) | Accuracy | 21.9 | 10 | 1mo ago |
| QFT Synthetic Hard | | Accuracy | 44.5 | 6 | 1mo ago |
| QFT Synthetic Medium | | Accuracy | 88.8 | 6 | 1mo ago |
| QFT Synthetic Easy | | Accuracy | 89 | 6 | 1mo ago |
| arXiv human-adapted | | Accuracy (hep-th) | 60.4 | 6 | 1mo ago |
| Human-Adapted (Pedagogical) | | Accuracy (Textbooks) | 61.4 | 6 | 1mo ago |
| PhyBench | Ministral 3 | pass@16 | 26 | 6 | 3mo ago |
| PhysReason | ARYA | Overall Score | 73.3 | 4 | 2mo ago |
| PHYSICS | Qwen2.5-32B + RL (synthetic) | Mean Accuracy | 43.09 | 2 | 1mo ago |