Information Seeking on 20Q Common weighted (test)
[Chart: Worst-case Weighted Payoff over time (y-axis ticks 148.548 to 216.426). Top result: DP, 235.7, Feb 2, 2026.]
Evaluation Results
Method   Backbone LLM   Date      Worst-case Weighted Payoff
DP       Qwen 2.5...    2026.02   235.7
UoT      Qwen 2.5...    2026.02   228.5
UoT      GPT 4.1        2026.02   227.4
DC       Qwen 2.5...    2026.02   225.1
DP       GPT 4.1        2026.02   224
DC       GPT 4.1        2026.02   199.2
GoT      GPT 4.1        2026.02   152.1
GoT      Qwen 2.5...    2026.02   151.9