High-level planning on Simulated Tasks >7 actions (Long split)
Best Success Rate: 65.18 (Gemini)
[Chart: Success Rate, Oct 13, 2024]
Updated 1mo ago
Evaluation Results
Method    Description                  Date       Success Rate
Gemini    Planning LLM=Gemini, D...    2024.10    65.18
GPT       Planning LLM=GPT, DINO...    2024.10    56
GPT       Planning LLM=GPT, DINO...    2024.10    48.28
GPT       Planning LLM=GPT, DINO...    2024.10    47.14