| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multimodal Retrieval (text query to multimodal candidate) | MBE 2.0 | R@1 | 43.34 | 50 |
| Image-based Retrieval | MBE benchmark | Recall@1 | 26.76 | 20 |
| Attribute Prediction | MBE 3.0 1.0 (test) | Accuracy | 49.92 | 13 |
| Product Classification | MBE 3.0 1.0 (test) | Accuracy | 86.4 | 13 |
| Multimodal Retrieval (q^t -> e^i) | MBE 3.0 1.0 (test) | R@1 | 10.24 | 13 |
| Multimodal Retrieval (q^i -> e^t) | MBE 3.0 1.0 (test) | R@1 | 11.39 | 13 |
| Multimodal Retrieval (q^mm -> e^mm) | MBE 3.0 1.0 (test) | Recall@1 | 16.79 | 13 |
| Multimodal Retrieval (q^t -> e^mm) | MBE 3.0 1.0 (test) | Recall@1 | 12.57 | 13 |
| Multimodal Retrieval (q^i -> e^mm) | MBE 3.0 1.0 (test) | Recall@1 | 16.14 | 13 |
| Attribute Prediction | MBE 2.0 | Accuracy | 84.29 | 10 |
| Product Classification | MBE 2.0 | Accuracy | 68.08 | 10 |
| Product Classification | Our MBE | Accuracy | 66.57 | 10 |
| Text-based Retrieval | MBE benchmark | Recall@1 | 16.92 | 10 |