Negotiation performance and belief calibration on CaSiNo native (held-out dialogues)
Figure: Accept-F1 over time (chart metrics: Accept-F1, Bid Cosine Similarity, Macro Strategy F1, Brier Score). Top Accept-F1: 0.947, 70B structured-CoT, May 6, 2026.
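The four metrics are not defined on this page. As a rough guide only, the sketch below computes each one under its standard textbook definition. How the benchmark actually encodes bids as vectors, which strategy labels it scores, what counts as a positive "accept" decision, and which probabilities feed the Brier score are all assumptions here, not details taken from the source; every function name is hypothetical.

```python
import math
from typing import Hashable, Sequence, Set


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accept_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 with 'accept' (1) as the positive class (assumed interpretation)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return _f1(tp, fp, fn)


def bid_cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two bid vectors, e.g. item counts per issue (assumed encoding)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero bids
    return dot / (norm_a * norm_b)


def macro_strategy_f1(
    y_true: Sequence[Set[Hashable]], y_pred: Sequence[Set[Hashable]]
) -> float:
    """Unweighted mean of per-label F1, allowing several strategy labels per utterance."""
    labels = set().union(*y_true, *y_pred)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        tp = sum(label in t and label in p for t, p in zip(y_true, y_pred))
        fp = sum(label not in t and label in p for t, p in zip(y_true, y_pred))
        fn = sum(label in t and label not in p for t, p in zip(y_true, y_pred))
        scores.append(_f1(tp, fp, fn))
    return sum(scores) / len(scores)


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes; lower is better."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)
```

For example, two identical bids give `bid_cosine([3, 0, 0], [3, 0, 0]) == 1.0`, while a perfectly confident, correct forecaster has a Brier score of 0.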
Evaluation Results
Method                Settings                   Date     Accept-F1  Bid Cosine Similarity  Macro Strategy F1  Brier Score (lower is better)
--------------------  -------------------------  -------  ---------  ---------------------  -----------------  -----------------------------
70B structured-CoT    Model scale=70B, Proto...  2026.05  0.947      0.815                  0.16               0.194
Distilled 8B student  Model scale=8B, Protoc...  2026.05  0.908      0.915                  0.19               0.114
Bayesian teacher      Protocol=Protocol 3, P...  2026.05  0.897      0.744                  0.048              0.085
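To read the table programmatically, the sketch below picks the leading method for each metric. It treats Brier Score as lower-is-better and the other three metrics as higher-is-better, the usual direction for F1 and cosine similarity; the page does not label metric directions itself, so that mapping is an assumption. The names `RESULTS`, `LOWER_IS_BETTER`, and `best_per_metric` are hypothetical.

```python
from typing import Dict

# Scores copied from the Evaluation Results table above.
RESULTS: Dict[str, Dict[str, float]] = {
    "70B structured-CoT": {"Accept-F1": 0.947, "Bid Cosine Similarity": 0.815,
                           "Macro Strategy F1": 0.16, "Brier Score": 0.194},
    "Distilled 8B student": {"Accept-F1": 0.908, "Bid Cosine Similarity": 0.915,
                             "Macro Strategy F1": 0.19, "Brier Score": 0.114},
    "Bayesian teacher": {"Accept-F1": 0.897, "Bid Cosine Similarity": 0.744,
                         "Macro Strategy F1": 0.048, "Brier Score": 0.085},
}

# Brier score is an error measure, so smaller is better; the others are higher-is-better.
LOWER_IS_BETTER = {"Brier Score"}


def best_per_metric(results: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Return the top method for each metric, respecting each metric's direction."""
    metrics = next(iter(results.values())).keys()
    best = {}
    for metric in metrics:
        choose = min if metric in LOWER_IS_BETTER else max
        best[metric] = choose(results, key=lambda method: results[method][metric])
    return best


for metric, method in best_per_metric(RESULTS).items():
    print(f"{metric}: {method}")
```

On these numbers no single method leads every column: the 70B model has the best Accept-F1, the distilled 8B student the best bid similarity and strategy F1, and the Bayesian teacher the lowest Brier score.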