Robotic Task Generation on 900 Generated Tasks Suite
[Chart: top Task Clarity score: RoboGene, 99.1 (Feb 18, 2026). Selectable metrics: Task Clarity, Type Consistency, Logical Validity, Object Coverage, Skill Coverage, Physical Feasibility.]
Updated 3mo ago
Evaluation Results
Method          Date     Task Clarity  Type Consistency  Logical Validity  Object Coverage  Skill Coverage  Physical Feasibility
RoboGene        2026.02  99.1          98.76             98.99             63.23            91.52           98.99
Human           2026.02  96.44         84.11             94.89             35.8             17.69           94.78
Gemini 2.5 Pro  2026.02  87.91         86.71             66.91             21.72            24.58           67.9
GPT-4o          2026.02  79.22         85.55             76.22             31.05            25.42           74.67
Rule-based      2026.02  55.33         39                40.44             21.45            48.4            48.11