| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Story Evaluation | HANNA (test) | Pearson Correlation 0.6155 | 16 |
| Length-Constrained Text Generation | HANNA | Win Rate 23 | 10 |
| Text Generation | HANNA (test) | LCTG Error Rate 2.58 | 10 |
| Interactive Navigation | HANNA (UNSEEN-ALL) | Success Rate (SR) 10,000 | 7 |
| Interactive Navigation | HANNA (SEEN-ENV) | Success Rate 10,000 | 7 |
| Story-level evaluation | HANNA | Coherence (RP) 0.678 | 6 |
| Story Generation | HANNA | Win HANNA Score 31.62 | 4 |
| Story Generation | HANNA 1.0 (test) | Overall Score 3.59 | 4 |