| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Scientific Reasoning | Materials | Average Score @16 (1h): 78 | 16 |
| Question Answering | Materials held-out (test) | BERTScore: 71.36 | 6 |
| Outlier Detection | Materials | Precision: 87.5 | 3 |
| Cross-domain generalization | Materials (test) | Accuracy: 99.3 | 3 |
| Scientific Image Analysis | Materials (test) | Improvement (%): 4.3 | 2 |