20Q

Benchmarks

Task Name                     | Dataset Name               | Metric                        | SOTA Result | Trend
----------------------------- | -------------------------- | ----------------------------- | ----------- | -----
Confidence Estimation         | 20Q                        | Accuracy                      | 33.87       | 20
Multi-step interaction        | 20Q                        | Winrate                       | 32.1        | 15
20 Questions                  | 20Q Breeds                 | Worst Case Interaction Length | 6.6         | 8
20 Questions                  | 20Q S128                   | Worst Case Interaction Length | 10.8        | 8
20 Questions                  | 20Q Common                 | Worst Case Interaction Length | 10          | 8
Information Seeking           | 20Q Common weighted (test) | Worst-case Weighted Payoff    | 235.7       | 8
Event Plausibility Prediction | 20Q (test)                 | AUC                           | 0.74        | 6