Planning on Commonly solved planning tasks F5-3 vs lm
[Chart: Runtime Ratio. Highlighted result: DeepSeek, 0.33, Aug 19, 2025. Available metrics: Runtime Ratio, Runtime Success Rate, Plan Length Ratio, Plan Length Equality Rate.]
Updated 27d ago
Evaluation Results
Method    Configuration              Date     Runtime Ratio  Runtime Success Rate  Plan Length Ratio  Plan Length Equality Rate
DeepSeek  Configuration=F5-3 (be...  2025.08  0.33           98                    1.43               34
Qwen      Configuration=F5-3 (be...  2025.08  0.36           98                    1.65               32
Llama     Configuration=F5-3 (be...  2025.08  0.37           98                    1.77               39
GPT       Configuration=F5-3 (be...  2025.08  0.43           97                    2.19               28