Sequential Decision Making on ALFWorld (val_seen)
[Chart: Total Wall-Clock Time (s). Highlighted entry: Selective-rollout gate, 15,527 s, May 7, 2026. Selectable metrics: Total Wall-Clock Time (s), Groups Cut by Gate (of 600), Gradient L2-Norm, Held-Out Eval (Iter 0, Pre-training), Held-Out Eval (Iter 60).]
Updated 26d ago
Evaluation Results
| Method | Details | Date | Total Wall-Clock Time (s) | Groups Cut by Gate (of 600) | Gradient L2-Norm | Held-Out Eval (Iter 0, Pre-training) | Held-Out Eval (Iter 60) |
|---|---|---|---|---|---|---|---|
| Selective-rollout gate | Backbone=Qwen2.5-7B, L... | 2026.05 | 15,527 | 106 | 0.145 | 39.5 | 44 |
| baseline | Backbone=Qwen2.5-7B, L... | 2026.05 | 17,393 | 0 | 0.125 | 39.5 | 41.5 |
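
To make the comparison between the two entries easier to read, the reported figures can be turned into relative differences. The following is a minimal sketch, not part of the benchmark page: the dictionary keys and the `summarize` helper are hypothetical names chosen for illustration, and the held-out eval scores are treated as plain numbers in whatever unit the page reports.

```python
# Values copied from the Evaluation Results reported above.
# Key names and the summarize() helper are hypothetical, for illustration only.
RESULTS = {
    "Selective-rollout gate": {
        "wall_clock_s": 15527.0,
        "groups_cut": 106,
        "eval_iter0": 39.5,
        "eval_iter60": 44.0,
    },
    "baseline": {
        "wall_clock_s": 17393.0,
        "groups_cut": 0,
        "eval_iter0": 39.5,
        "eval_iter60": 41.5,
    },
}
TOTAL_GROUPS = 600  # from the column header "Groups Cut by Gate (of 600)"


def summarize(results, method="Selective-rollout gate", reference="baseline"):
    """Compare one method against a reference row of the results."""
    m, r = results[method], results[reference]
    return {
        # Fraction of the reference wall-clock time that the method saves.
        "wall_clock_saving": (r["wall_clock_s"] - m["wall_clock_s"]) / r["wall_clock_s"],
        # Fraction of the 600 groups that the gate cut.
        "groups_cut_fraction": m["groups_cut"] / TOTAL_GROUPS,
        # Change in held-out eval score from iteration 0 to iteration 60.
        "method_eval_gain": m["eval_iter60"] - m["eval_iter0"],
        "reference_eval_gain": r["eval_iter60"] - r["eval_iter0"],
    }


if __name__ == "__main__":
    for name, value in summarize(RESULTS).items():
        print(f"{name}: {value:.4f}")
```

Under these numbers the gate run takes roughly 10.7% less wall-clock time than the baseline while cutting about 17.7% of the 600 groups, and its held-out eval score rises by 4.5 between iteration 0 and iteration 60, against 2.0 for the baseline. With a single run per method, these differences indicate a direction and should not be read as statistically established.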