Human Pairwise Comparison on Gaming Content N=60 samples (Unfamiliar Games)
[Chart: Win %, Tie %, and GPT-5.1 Win % by method. Highlighted result: MeepleLM, 86.7 Win % (Jan 12, 2026).]
Updated 4d ago
Evaluation Results

Method     Metric                Date      Win %   Tie %   GPT-5.1 Win %
MeepleLM   Marketing vs. R...    2026.01   86.7    6.7     6.6
MeepleLM   Final Choice (T...    2026.01   73.3    10      16.7
MeepleLM   Risk Awareness        2026.01   70      16.7    13.3
MeepleLM   Decision Confid...    2026.01   66.7    20      13.3
GPT-5.1    Final Choice (T...    2026.01   16.7    10      73.3
GPT-5.1    Decision Confid...    2026.01   13.3    20      66.7
GPT-5.1    Risk Awareness        2026.01   13.3    16.7    70
GPT-5.1    Marketing vs. R...    2026.01   6.6     6.7     86.7