| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Social Interaction Question Answering | SIQA | Accuracy | 86.9 | 109 |
| Commonsense Reasoning | SIQA | Accuracy | 89.85 | 106 |
| Social Commonsense Reasoning | SIQA | Accuracy | 86.9 | 89 |
| Reasoning | SIQA | Accuracy | 83.2 | 44 |
| Social Commonsense Reasoning | SIQA (test) | Accuracy | 83.3 | 20 |
| Social Commonsense Question Answering | SIQA | Accuracy | 80.04 | 14 |
| Scientific Image Quality Assessment Understanding | SIQA-U | Scientific Completeness | 60.5 | 14 |
| Zero-shot Common Sense Reasoning | SIQA | Accuracy (Zero-shot) | 41.91 | 12 |
| Reasoning | SIQA (leave-one-out setup) | Average Accuracy | 82.4 | 12 |
| Scientific Image Quality Assessment | SIQA-S 1.0 (test) | Perception SRCC | 0.857 | 12 |
| Reasoning | SIQA | Accuracy Improvement | 2.12 | 12 |
| Reasoning | SIQA (val) | Accuracy | 35.47 | 9 |
| Reward Prediction | SIQA (out-of-domain) | Accuracy | 76.89 | 6 |
| Commonsense Reasoning | SIQA (test) | Accuracy | 40.28 | 6 |
| Social Reasoning | SIQA | Performance (%) | 15.2 | 6 |
| Social Interaction Question Answering | SIQA | Normalized PLL Score | 15.4 | 4 |