| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Adversarial Attack | Kaggle | Average Cumulative Reward | 0.96 | 32 |
| Personality Detection | Kaggle | I/E Score | 90.12 | 13 |
| 16-types multiclass personality classification | Kaggle | F1 Score (%) | 41.34 | 10 |
| 4-dimensional binary personality classification | Kaggle | Macro F1 | 80.57 | 10 |
| Weather Type Classification | Kaggle (test) | F1 Score | 98 | 6 |
| Sudoku solving | Kaggle Unfiltered (generalization) | Accuracy | 99.9 | 6 |
| Model and Hyperparameter Selection | Kaggle private (test) | p-rank | 80.56 | 6 |
| Omission Detection | Kaggle Dataset | Accuracy | 65.2 | 4 |
| Hallucination Detection | Kaggle Dataset | Accuracy | 85.4 | 4 |