Average Reward on AlfWorld (Agent Task)
[Chart: Average Reward (y-axis) by date. Highlighted entry: Ceiling Model, 59 (Jul 25, 2025).]
Updated 1mo ago
Evaluation Results
Method                        Details                      Date     Average Reward
Ceiling Model                 -                            2025.07  59
W2SG with MCTS                -                            2025.07  57.5
W2SG with Tree DPO            -                            2025.07  56
SFT Strong Model + Best of N  -                            2025.07  55.2
SFT Strong Model              Base Model=Llama-2-13b...    2025.07  53.7
SFT Strong Model              Base Model=Llama-2-13b...    2025.07  51.5
SFT Weak Model                Base Model=Llama-2-7b+SFT    2025.07  44.8