Plan Generation on RecipeNLG (test)
Top result: 91.3 BLEU (Self-Ask)
[Figure: results over time, dated Jan 27, 2026. Metrics: BLEU, Result Violation.]
Evaluation Results
Method          Setting                    Date      BLEU   Result Violation
Self-Ask        reveal_range=k in [1, 4]   2026.01   91.3   15.7
ReAct           reveal_range=k in [1, 4]   2026.01   91.2   59.9
SQ-BCP          reveal_range=k in [1, 4]   2026.01   90.7    5.8
CoT             reveal_range=k in [1, 4]   2026.01   90.0   64.1
ToS             reveal_range=k in [1, 4]   2026.01   89.8   64.2
Direct Prompt   reveal_range=k in [1, 4]   2026.01   89.7   65.7
ToT             reveal_range=k in [1, 4]   2026.01   89.2   66.5
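The BLEU scores in the Evaluation Results table span only about two points, while Result Violation varies much more widely, so the method that ranks first depends on which metric you sort by. The short sketch below transcribes the table rows and ranks them both ways. It assumes that a lower Result Violation is better; the page does not state this, so treat that direction as an assumption. The helper names are hypothetical and used only for illustration.

```python
# Rows transcribed from the Evaluation Results table: (method, BLEU, Result Violation).
RESULTS = [
    ("Self-Ask", 91.3, 15.7),
    ("ReAct", 91.2, 59.9),
    ("SQ-BCP", 90.7, 5.8),
    ("CoT", 90.0, 64.1),
    ("ToS", 89.8, 64.2),
    ("Direct Prompt", 89.7, 65.7),
    ("ToT", 89.2, 66.5),
]


def best_by_bleu(rows):
    """Return the row with the highest BLEU score."""
    return max(rows, key=lambda row: row[1])


def best_by_violation(rows):
    """Return the row with the lowest Result Violation.

    ASSUMPTION: lower Result Violation is better; the leaderboard does not say so.
    """
    return min(rows, key=lambda row: row[2])


def bleu_spread(rows):
    """Difference between the highest and lowest BLEU score, rounded to one decimal."""
    scores = [row[1] for row in rows]
    return round(max(scores) - min(scores), 1)


if __name__ == "__main__":
    print(best_by_bleu(RESULTS)[0])       # Self-Ask
    print(best_by_violation(RESULTS)[0])  # SQ-BCP
    print(bleu_spread(RESULTS))           # 2.1
```

Under that assumption, Self-Ask leads on BLEU by only 0.1 over ReAct, while SQ-BCP has the lowest Result Violation at a BLEU cost of 0.6.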