Reasoning Quality Evaluation on 3-player Leduc Hold'em (test)
[Chart: Hit Rate (HR) by date. Top entry: Qwen2.5-7B_ToolPoker, 193 (Jan 31, 2026). Selectable metrics: Hit Rate (HR), False Alarm Rate (FA), Accuracy (AC), Average Score.]
Updated 4d ago
Evaluation Results
Method                Details                    Links    Hit Rate (HR)  False Alarm Rate (FA)  Accuracy (AC)  Average Score
--------------------  -------------------------  -------  -------------  ---------------------  -------------  -------------
Qwen2.5-7B_ToolPoker  Backbone=Qwen2.5-7B, F...  2026.01  193            190                    1.88           1.9
GPT-4.1-mini          Model type=Baseline        2026.01  100            175                    1.83           1.52
Qwen2.5-7B            Model type=Vanilla         2026.01  93             88                     1.6            1.14