
HANNA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Story Evaluation | HANNA (test) | Pearson Correlation: 0.6155 | 16 |
| Length-Constrained Text Generation | HANNA | Win Rate: 23 | 10 |
| Text Generation | HANNA (test) | LCTG Error Rate: 2.58 | 10 |
| Interactive Navigation | HANNA (UNSEEN-ALL) | Success Rate (SR): 10,000 | 7 |
| Interactive Navigation | HANNA (SEEN-ENV) | Success Rate: 10,000 | 7 |
| Story-level evaluation | HANNA | Coherence (RP): 0.678 | 6 |
| Story Generation | HANNA | Win HANNA Score: 31.62 | 4 |
| Story Generation | HANNA 1.0 (test) | Overall Score: 3.59 | 4 |