| Dataset Name | Metric | Score | | Updated |
|---|---|---|---|---|
| Code-Preference | F1 (X) | 80.2 | 7 | 4d ago |
| Open-Critic | F1 (X) | 71.7 | 7 | 4d ago |
| HumanEval Exe | F1 (X) | 71.4 | 7 | 4d ago |
| MalAlgoQA | F1 (X) | 84.1 | 7 | 4d ago |
| Arithmetic | F1 (X) | 88.2 | 7 | 4d ago |
| COCO | F1 (X) | 73.6 | 7 | 4d ago |
| CVQA Count | F1 (X) | 74.7 | 7 | 4d ago |
| CVQA Bool | F1 (X) | 79.8 | 7 | 4d ago |
| RNN-Topo | F1 (X) | 87.9 | 7 | 4d ago |
| CLOMO | F1 (X) | 89.8 | 7 | 4d ago |
| CRASS | F1 (X) | 92.3 | 7 | 4d ago |