Plan Generation on Commonly solved tasks
[Chart: Runtime Ratio over time. Highlighted point: DeepSeek, 0.36, Aug 19, 2025. Selectable metrics: Runtime Ratio, Runtime F5-3 Success Rate, Plan Length Ratio, Plan Length F5-3 Success Rate.]
Updated 27d ago
Evaluation Results
| Method | Method details | Links | Runtime Ratio | Runtime F5-3 Success Rate | Plan Length Ratio | Plan Length F5-3 Success Rate |
|---|---|---|---|---|---|---|
| DeepSeek | N=1442, Configuration=... | 2025.08 | 0.36 | 97 | 1.1 | 61 |
| GPT | N=1112, Configuration=... | 2025.08 | 0.42 | 96 | 1.57 | 42 |
| Qwen | N=1324, Configuration=... | 2025.08 | 0.43 | 96 | 1.29 | 56 |
| Llama | N=1043, Configuration=... | 2025.08 | 0.89 | 88 | 1.34 | 49 |