Sequential Decision Making on HotPotQA
[Figure: Average Steps per Episode. Highlighted entry: Teacher (LLaMA-13B), 4.7, May 20, 2025.]
Evaluation Results
| Method | Configuration | Date | Average Steps per Episode |
|---|---|---|---|
| Teacher (LLaMA-13B) | Backbone=LLaMA, Model... | 2025.05 | 4.7 |
| Teacher (OPT-13B) | Backbone=OPT, Model Si... | 2025.05 | 4.8 |
| Structured Agent Distillation | Backbone=LLaMA, Model... | 2025.05 | 4.8 |
| Structured Agent Distillation | Backbone=OPT, Model Si... | 2025.05 | 4.9 |
| Structured Agent Distillation | Backbone=OPT, Model Si... | 2025.05 | 5 |
| Token-level | Backbone=LLaMA, Model... | 2025.05 | 5.2 |
| Structured Agent Distillation | Backbone=OPT, Model Si... | 2025.05 | 5.3 |
| Token-level | Backbone=OPT, Model Si... | 2025.05 | 5.3 |
| SeqKD | Backbone=LLaMA, Model... | 2025.05 | 5.5 |
| Token-level | Backbone=OPT, Model Si... | 2025.05 | 5.6 |
| SeqKD | Backbone=OPT, Model Si... | 2025.05 | 5.6 |
| KD | Backbone=OPT, Model Si... | 2025.05 | 5.7 |
| KD | Backbone=LLaMA, Model... | 2025.05 | 5.7 |
| SeqKD | Backbone=OPT, Model Si... | 2025.05 | 5.9 |
| Token-level | Backbone=OPT, Model Si... | 2025.05 | 6 |
| KD | Backbone=OPT, Model Si... | 2025.05 | 6.1 |
| SeqKD | Backbone=OPT, Model Si... | 2025.05 | 6.2 |
| KD | Backbone=OPT, Model Si... | 2025.05 | 6.5 |