| Dataset Name | SOTA Method | Metric | Score | | Last Updated |
|---|---|---|---|---|---|
| Code-Preference | M' | F1 Mean | 77 | 7 | 4d ago |
| Open-Critic | M' | F1 Mean | 75.3 | 7 | 4d ago |
| HumanEval Exe | M' | F1 Mean | 75.7 | 7 | 4d ago |
| MalAlgoQA | M' | F1 Mean | 85.1 | 7 | 4d ago |
| Arithmetic | M' | F1 Mean | 87.8 | 7 | 4d ago |
| COCO | M' | F1 Mean | 77.8 | 7 | 4d ago |
| CVQA Count | M' | F1 Mean | 0.792 | 7 | 4d ago |
| CVQA-Bool | M' | F1 Score | 81.2 | 7 | 4d ago |
| RNN-Topo | M' | F1 Mean | 88.9 | 7 | 4d ago |
| CLOMO | M' | F1 Mean | 90.2 | 7 | 4d ago |
| CRASS | M' | F1 Mean | 92.1 | 7 | 4d ago |