| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Next-action prediction | OPeRA (test) | Action Generation Acc | 52.92 | 31 |
| Lung function regression | OPERA (test) | FVC MAE (Breath) | 0.848 | 7 |
| Health condition inference | OPERA Obstructive (Lung) | AUROC | 75.2 | 7 |
| Health condition inference | OPERA Smoker Cough | AUROC | 0.83 | 7 |
| Reasoning and Persona Consistency | OPeRA (test) | Pages per Session | 5.3 | 7 |
| Human-likeness evaluation | OPeRA | Human-likeness Score | 4.21 | 5 |
| Autonomous LLM Agent Verification | OPERA | Mean Td (s) | 11.9 | 3 |