
PHYRE

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Physical Reasoning | PHYRE-1B cross-template (test) | AUCCESS | 42.2 | 7 |
| Physical Reasoning | PHYRE-1B within-template (test) | AUCCESS | 86.2 | 7 |
| Physical Reasoning | PHYRE Cross-template 1.0 | Success Rate | 50.86 | 6 |
| Physical Reasoning | PHYRE Within-template 1.0 | Success Rate (AUCCESS) | 85.49 | 6 |
| Physical Reasoning | PHYRE-2B cross-template (test) | AUCCESS | 24.3 | 5 |
| Physical Reasoning | PHYRE cross-template B | AUCCESS | 56.31 | 5 |
| Physical Reasoning | PHYRE within-template B | AUCCESS | 85.49 | 5 |
| Trajectory Prediction | PHYRE-C (test) | Avg Prediction Error | 9.22 | 4 |
| Trajectory Prediction | PHYRE-W (t ∈ [T_train, 2 × T_train]) | Average Prediction Error | 11.1 | 4 |
| Trajectory Prediction | PHYRE-W (train) | Avg Prediction Error | 1.31 | 4 |
| Planning | PHYRE cross-task generalization B-tier | AUCCESS | 42.2 | 3 |
| Planning | PHYRE within-task generalization B-tier | AUCCESS | 85.2 | 3 |