Science Experiment Simulation on SciWorld
[Chart: Average Reward. Top result: 69.1 (MCTS), Jul 25, 2025. Updated 1mo ago.]
Evaluation Results

Method     Base LLM     Date      Average Reward
MCTS       Llama2-70B   2025.07   69.1
Tree DPO   Llama2-70B   2025.07   67.4
SFT        Llama2-70B   2025.07   64.6