| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Visual Question Answering | VRSBench VQA 1.0 (test) | Category Accuracy | 65.84 | 15 |
| Remote Sensing Visual Grounding | VRSBench-VG official (test) | Acc@0.5 | 63.72 | 14 |
| Visual Grounding | VRSBench Ref | IoU@50 | 54.71 | 10 |
| Visual Question Answering | VRSBench | Avg@5 | 63.09 | 10 |
| Image Captioning | VRSBench-Cap | BLEU-4 | 102.8 | 9 |
| Long Captioning | VRSBench | BLEU-1 | 48.1 | 7 |
| Remote Sensing Visual Grounding | VRSBench (test) | Accuracy @ 0.5% Threshold | 63.31 | 7 |
| Grounding | VRSBench (val) | Accuracy @ IoU 0.5 | 49.8 | 5 |
| VQA | VRSBench (val) | Accuracy | 77.8 | 5 |
| Captioning | VRSBench (val) | BLEU-4 | 14.7 | 5 |
| Remote Sensing Visual Grounding | VRSBench-VG (FAST-T) | Accuracy @0.5 | 49.71 | 5 |
| Text-to-image generation | VRSBench | FID | 4.51 | 3 |