| Dataset Name | SOTA Method | Metric | Trend | Updated |
|---|---|---|---|---|
| Finance Domain | TimeSeriesExamAgent | Specificity: 8.29 | 3 | 1mo ago |
| Security reasoning tasks | | Functional Efficacy Score: 35.6 | 3 | 3mo ago |
| Science reasoning tasks | | Param & Constraint Acc: 38.5 | 3 | 3mo ago |
| Data reasoning tasks | | Correctness: 37.9 | 3 | 3mo ago |
| Software reasoning tasks | | Functional Correctness: 35.2 | 3 | 3mo ago |
| Medicine Domain | TimeSeriesExamAgent | Specificity: 8.43 | 2 | 1mo ago |