
PHASE

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Utterance Classification | Phase 2 debugging Overall 13 turns | F1 Score: 91 | 12 |
| Relational inference | PHASE | Graph Accuracy: 79.21 | 10 |
| Trajectory Prediction | PHASE (test) | ADE: 0.801 | 10 |
| Profile Classification | Phase 10x10 grid 3 | Profile Accuracy: 48.9 | 7 |
| Criterion Validity Analysis | Phase 2 | Spearman's rho: 0.351 | 6 |
| Criterion Validity Analysis | Phase 1 | Spearman's rho: 0.607 | 6 |
| Direct Verifier Evaluation | Phase 2 (test) | Actual Accuracy: 15 | 4 |
| Trial outcome prediction | Phase 2 | Log Loss: 0.629 | 3 |
| Trial outcome prediction | Phase 1 (trials) | Log Loss: 0.565 | 3 |