Flight Recommendation Evaluation
[Chart: Inference Time per Round (s). Highlighted entry: Direct prompting, 0.52 s, Apr 5, 2026]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | Inference Time per Round (s) |
|---|---|---|---|
| Direct prompting | Backbone=Gemma, Calls=... | 2026.04 | 0.52 |
| Direct prompting | Backbone=Qwen, Calls=1... | 2026.04 | 0.63 |
| Direct prompting | Backbone=Llama, Calls=... | 2026.04 | 0.74 |
| Bayesian Teaching | Backbone=Qwen, Calls=1... | 2026.04 | 1.05 |
| Bayesian Teaching | Backbone=Llama, Calls=... | 2026.04 | 1.34 |
| Bayesian Teaching | Backbone=Gemma, Calls=... | 2026.04 | 1.42 |
| CoT | Backbone=Gemma, Calls=... | 2026.04 | 2.15 |
| CoT | Backbone=Qwen, Calls=1... | 2026.04 | 2.42 |
| CoT | Backbone=Llama, Calls=... | 2026.04 | 2.87 |
| Self-consistency | Backbone=Gemma, Calls=... | 2026.04 | 3.67 |
| Self-consistency | Backbone=Qwen, Calls=5... | 2026.04 | 3.68 |
| Self-consistency | Backbone=Llama, Calls=... | 2026.04 | 3.79 |
| ADAPTFUSE | Backbone=Gemma, Calls=... | 2026.04 | 4.35 |
| ADAPTFUSE | Backbone=Qwen, Calls=5... | 2026.04 | 4.51 |
| ADAPTFUSE | Backbone=Llama, Calls=... | 2026.04 | 4.63 |