Vision Model + Dual Text Decoders
| Method | Date | | | | | | | | | Links |
|---|---|---|---|---|---|---|---|---|---|---|
| | | 0.853 | 0.804 | 0.756 | 0.724 | 0.494 | 0.844 | 4.883 | 0.718 | |
| | 2024.03 | 0.844 | 0.783 | 0.721 | 0.671 | 0.469 | 0.833 | 2.879 | 0.739 | |
| | 2024.03 | 0.835 | 0.777 | 0.722 | 0.685 | 0.471 | 0.816 | 4.37 | 0.689 | |
| | 2024.03 | 0.835 | 0.777 | 0.719 | 0.672 | 0.467 | 0.828 | 3.01 | 0.717 | |
| | 2024.03 | 0.831 | 0.766 | 0.701 | 0.653 | 0.457 | 0.812 | 2.939 | 0.695 | |
| | 2024.03 | 0.824 | 0.753 | 0.683 | 0.631 | 0.448 | 0.807 | 2.791 | 0.688 | |
| | 2024.03 | 0.821 | 0.752 | 0.685 | 0.636 | 0.448 | 0.803 | 3.073 | 0.691 | |
| | 2024.03 | 0.81 | 0.745 | 0.681 | 0.633 | 0.446 | 0.797 | 3.139 | 0.676 | |