Plan Generation on WikiHow (test)
[Chart: ROUGE-1 by date; top result Self-Ask, 56.1 (Jan 27, 2026). Also available for ROUGE-2 and Constraint Violation Rate.]
Updated 1mo ago
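The page does not say how ROUGE-1 and ROUGE-2 are computed. It does not specify the tokenizer, whether stemming is applied, or whether F1 or recall is reported. As a rough reference only, here is a minimal sketch of ROUGE-N F1 against a single reference plan. It assumes lowercase whitespace tokenization. The function names are hypothetical, and this is not the benchmark's own scorer.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 on a 0-100 scale.

    Assumption: lowercase whitespace tokenization, one reference, no stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps min counts, i.e. clipped n-gram matches.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 100 * 2 * precision * recall / (precision + recall)


reference = "preheat the oven then mix the flour"
candidate = "preheat oven and mix the flour"
print(round(rouge_n_f1(candidate, reference, n=1), 1))  # 76.9
print(round(rouge_n_f1(candidate, reference, n=2), 1))  # 36.4
```

ROUGE-2 drops much faster than ROUGE-1 here because reordering or inserting a single word breaks adjacent word pairs. The same effect shows in the table, where ROUGE-2 scores sit below ROUGE-1 for every method.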
Evaluation Results
| Method | Setting | Links | ROUGE-1 | ROUGE-2 | Constraint Violation Rate (%) |
|---|---|---|---|---|---|
| Self-Ask | reveal_range=k in [1, 4] | 2026.01 | 56.1 | 47.4 | 26 |
| ReAct | reveal_range=k in [1, 4] | 2026.01 | 55.8 | 47.4 | 76.9 |
| ToS | reveal_range=k in [1, 4] | 2026.01 | 53.3 | 45.4 | 82.6 |
| ToT | reveal_range=k in [1, 4] | 2026.01 | 52.9 | 45.2 | 94.7 |
| SQ-BCP | reveal_range=k in [1, 4] | 2026.01 | 52.7 | 45.9 | 14.9 |
| CoT | reveal_range=k in [1, 4] | 2026.01 | 48.5 | 44.7 | 83.2 |
| Direct Prompt | reveal_range=k in [1, 4] | 2026.01 | 46.3 | 42.1 | 78.3 |
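ROUGE-1 and Constraint Violation Rate do not rank the methods the same way. SQ-BCP is fifth on ROUGE-1 but has the lowest violation rate. One way to read the table is to ask which methods are not beaten on both axes at once, i.e. the Pareto front. The sketch below assumes that higher ROUGE-1 and lower Constraint Violation Rate are better. The scores are copied from the table, and the helper names are hypothetical.

```python
# (ROUGE-1, Constraint Violation Rate) copied from the results table.
RESULTS = {
    "Self-Ask": (56.1, 26.0),
    "ReAct": (55.8, 76.9),
    "ToS": (53.3, 82.6),
    "ToT": (52.9, 94.7),
    "SQ-BCP": (52.7, 14.9),
    "CoT": (48.5, 83.2),
    "Direct Prompt": (46.3, 78.3),
}


def dominates(a, b):
    """True if a is no worse than b on both axes and strictly better on one.

    Assumption: higher ROUGE-1 is better, lower violation rate is better.
    """
    (r1_a, cvr_a), (r1_b, cvr_b) = a, b
    no_worse = r1_a >= r1_b and cvr_a <= cvr_b
    strictly_better = r1_a > r1_b or cvr_a < cvr_b
    return no_worse and strictly_better


def pareto_front(results):
    """Names of methods that no other method dominates, sorted alphabetically."""
    return sorted(
        name
        for name, score in results.items()
        if not any(dominates(other, score) for other in results.values())
    )


print(pareto_front(RESULTS))  # ['SQ-BCP', 'Self-Ask']
```

Under these assumptions, only Self-Ask (best ROUGE-1) and SQ-BCP (lowest violation rate) are non-dominated. Every other listed method scores lower than Self-Ask on ROUGE-1 while also violating constraints more often.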