| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Biomedical Intelligence Evaluation | BixBench 205 (Evaluation) | Accuracy 85.9 | 25 |
| Automated auditing | BIXBench (Verified-50) | Recall (A) 83.3 | 6 |
| Quantitative reasoning and autonomous analysis | BixBench Human Verified-50 | Accuracy 83.33 | 3 |
| Quantitative reasoning and autonomous analysis | BixBench-Verified-50 Full set | Accuracy 90 | 3 |