| Dataset Name | SOTA Method | Metric | Trend | Updated |
|---|---|---|---|---|
| MAQA-ΔK−1 | P(True) | KL Divergence: -0.149 | 48 | 1mo ago |
| MAQA | P(True) | Hamming Distance: 0.04 | 28 | 1mo ago |
| JEC-QA | InternVL3-2B | Score: 64.2 | 4 | 22d ago |
| MMLU Multi-Answer | InternVL3-2B | Overall Score: 61.45 | 4 | 22d ago |
| AMBIGQA (test) | recall-then-verify | F1 (All Questions): 46.2 | 3 | 1mo ago |
| AMBIGQA (dev) | recall-then-verify | F1 (All Questions): 52.1 | 3 | 1mo ago |
| WEBQSP (test) | recall-then-verify | F1 (All Questions): 0.558 | 2 | 1mo ago |
| WEBQSP (dev) | recall-then-verify | F1 (All Questions): 55.4 | 2 | 1mo ago |