| Dataset Name | SOTA Method | Metric | Trend | Last Updated | |
|---|---|---|---|---|---|
| Cover Object (Combined) | GPT-4o Image QA | TPR100 | 16 | 1mo ago | |
| Cover Object Policy Success Rate: 3% (Out-of-Distribution) | GPT-4o Image QA | TPR100 | 16 | 1mo ago | |
| Cover Object Policy Success Rate: 98% (In-Distribution) | | TPR100 | 16 | 1mo ago | |
| Close Box Combined | GPT-4o Image QA | TPR100 | 12 | 1mo ago | |
| Close Box Out-of-Distribution | GPT-4o Image QA | TPR100 | 12 | 1mo ago | |
| Close Box In-Distribution | | TPR100 | 12 | 1mo ago | |