Troubleshooting on TS FloDial weighted (test)
[Chart: Worst-case Weighted Payoff by date. Top entry: DC, 113.6. Date shown: Feb 2, 2026.]
Evaluation Results
| Method | Backbone LLM | Date | Worst-case Weighted Payoff |
|--------|--------------|---------|------|
| DC | Qwen 2.5... | 2026.02 | 113.6 |
| DC | GPT 4.1 | 2026.02 | 97.4 |
| DP | GPT 4.1 | 2026.02 | 90.1 |
| UoT | GPT 4.1 | 2026.02 | 81 |
| DP | Qwen 2.5... | 2026.02 | 80 |
| UoT | Qwen 2.5... | 2026.02 | 74 |
| GoT | Qwen 2.5... | 2026.02 | 62.3 |
| GoT | GPT 4.1 | 2026.02 | 61.4 |