| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| CWQ (test) | | Binary Accuracy 92.88 | 24 | 3mo ago | |
| RuDE base | DeepSeek-V3.1 | AD 93.1 | 16 | 21d ago | |
| MME | | MME Score 115.88 | 3 | 2mo ago | |
| POPE Adversarial | | Accuracy 80.1 | 3 | 2mo ago | |
| POPE Popular | | Accuracy 82.7 | 3 | 2mo ago | |
| POPE Random | | Accuracy 84.6 | 3 | 2mo ago | |