| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Agent Safety Judgment | ROME Shortcut Decision-Making 1.0 | F1 Score: 94.53 | 24 |
| Agent Safety Judgment | ROME Contextual Ambiguity 1.0 | F1 Score: 91.92 | 24 |
| Agent Safety Judgment | ROME Implicit Risks 1.0 (IR) | F1 Score: 86.81 | 24 |
| Agent Safety Judgment | ROME Original 1.0 (Seed) | F1 Score: 94.53 | 24 |
| Agent Safety Judgment | ROME IR unsafe | F1 Score: 95.74 | 4 |
| Agent Safety Judgment | ROME CA unsafe subset | F1 Score: 97.35 | 4 |
| Agent Safety Judgment | ROME SDM unsafe | F1 Score: 100 | 4 |
| Land Surface Temperature Estimation | Rome | RMSE: 2.088 | 2 |