
HANNA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Length-Constrained Text Generation | HANNA | Win Rate: 23 | 10 |
| Text Generation | HANNA (test) | LCTG Error Rate: 2.58 | 10 |
| Interactive Navigation | HANNA (UNSEEN-ALL) | Success Rate (SR): 10,000 | 7 |
| Interactive Navigation | HANNA (SEEN-ENV) | Success Rate: 10,000 | 7 |
| Story-level evaluation | HANNA | Coherence (RP): 0.678 | 6 |