Open Jar on Real-world
87.5 Success Rate (Base Policy)
[Chart: Success Rate over time, Oct 9, 2025 to Feb 4, 2026]
Updated 1mo ago
Evaluation Results
Method       Setting                    Links    Success Rate
Base Policy  Source (Number of Demo...  2025.10  87.5
Base Policy  Source (Number of Demo...  2025.10  78.1
Base Policy  Source (Number of Demo...  2025.10  56.3
GeneralVLA   Evaluation Protocol=0-...  2026.02  50
R2RGen       Source (Number of Demo...  2025.10  50
CAP          Evaluation Protocol=0-...  2026.02  36.67
Robopoint    Evaluation Protocol=0-...  2026.02  20
DemoGen      Source (Number of Demo...  2025.10  18.8
Base Policy  Source (Number of Demo...  2025.10  3.1