Plan Generation on Robo Challenge (Online)
Best reported result: 85.7 Plan Accuracy (State-Aware CP-SAT Repair)
[Chart omitted: Plan Accuracy (y-axis) by date (x-axis), through May 31, 2026]
Evaluation Results
| Method | Model | Date | Plan Accuracy |
|---|---|---|---|
| State-Aware CP-SAT Repair | Gemini-3-flash | 2026.05 | 85.7 |
| State-Aware CP-SAT Repair | Qwen3.6 35B A3B | 2026.05 | 85.0 |
| CP-SAT Formalizer | Gemini-3-flash | 2026.05 | 83.6 |
| State-Aware CP-SAT Repair | GPT-5-mini | 2026.05 | 83.6 |
| State-Aware CP-SAT Repair | DeepSeek-V4-Flash | 2026.05 | 83.6 |
| CP-SAT Formalizer | Qwen3.6 35B A3B | 2026.05 | 72.1 |
| Planner | GPT-5-mini | 2026.05 | 35.0 |
| Planner | Qwen3.6 35B A3B | 2026.05 | 30.7 |
| CP-SAT Formalizer | GPT-5-mini | 2026.05 | 25.0 |
| Planner | Gemini-3-flash | 2026.05 | 17.1 |
| Planner | DeepSeek-V4-Flash | 2026.05 | 12.9 |
| CP-SAT Formalizer | DeepSeek-V4-Flash | 2026.05 | 3.6 |
| PDDL2.1 Formalizer | Gemini-3-flash | 2026.05 | 2.9 |
| PDDL2.1 Formalizer | GPT-5-mini | 2026.05 | 0.0 |
| PDDL2.1 Formalizer | DeepSeek-V4-Flash | 2026.05 | 0.0 |
| PDDL2.1 Formalizer | Qwen3.6 35B A3B | 2026.05 | 0.0 |
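Each method in the Evaluation Results was run with the same four backbone models, so the rows can be aggregated by method to compare pipelines independently of the model choice. The following is a minimal Python sketch, assuming the scores are transcribed by hand from the table; `RESULTS` and `summarize_by_method` are hypothetical names for illustration and are not part of any benchmark tooling.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical hand transcription of the leaderboard: (method, model, plan accuracy).
RESULTS = [
    ("State-Aware CP-SAT Repair", "Gemini-3-flash", 85.7),
    ("State-Aware CP-SAT Repair", "Qwen3.6 35B A3B", 85.0),
    ("CP-SAT Formalizer", "Gemini-3-flash", 83.6),
    ("State-Aware CP-SAT Repair", "GPT-5-mini", 83.6),
    ("State-Aware CP-SAT Repair", "DeepSeek-V4-Flash", 83.6),
    ("CP-SAT Formalizer", "Qwen3.6 35B A3B", 72.1),
    ("Planner", "GPT-5-mini", 35.0),
    ("Planner", "Qwen3.6 35B A3B", 30.7),
    ("CP-SAT Formalizer", "GPT-5-mini", 25.0),
    ("Planner", "Gemini-3-flash", 17.1),
    ("Planner", "DeepSeek-V4-Flash", 12.9),
    ("CP-SAT Formalizer", "DeepSeek-V4-Flash", 3.6),
    ("PDDL2.1 Formalizer", "Gemini-3-flash", 2.9),
    ("PDDL2.1 Formalizer", "GPT-5-mini", 0.0),
    ("PDDL2.1 Formalizer", "DeepSeek-V4-Flash", 0.0),
    ("PDDL2.1 Formalizer", "Qwen3.6 35B A3B", 0.0),
]


def summarize_by_method(rows):
    """Group scores by method; return {method: (mean, best, worst)} sorted by mean, descending."""
    scores = defaultdict(list)
    for method, _model, acc in rows:
        scores[method].append(acc)
    summary = {m: (mean(v), max(v), min(v)) for m, v in scores.items()}
    return dict(sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True))


if __name__ == "__main__":
    for method, (avg, best, worst) in summarize_by_method(RESULTS).items():
        print(f"{method:<28} mean={avg:6.2f} best={best:5.1f} worst={worst:5.1f}")
```

Grouping by method separates the pipeline from the backbone: State-Aware CP-SAT Repair stays within about 2 points across all four models (83.6 to 85.7), while CP-SAT Formalizer ranges from 3.6 to 83.6 depending on the model.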