Negotiation vs GPT-5.4 High Reasoning Seller on Standard Held-Out Test Set

0.4081 Reward

Qwen3-30B-A3B-Instruct-2507-trained

Updated 3mo ago

Evaluation Results

Method	Links	Reward
Qwen3-30B-A3B-Instruct-2507-trained 2026.04		0.4081	75	40.81	0
Qwen3-30B-A3B-Instruct-2507-untrained 2026.04		0.2744	60.5	3.89	32.4
gpt-5.4-high-reasoning 2026.04		0.1823	91.4	18.23	0
gpt-5.4-no-reasoning 2026.04		0.1458	84.8	16.14	1.6
DeepSeek-V3.1-thinking 2026.04		0.1223	92.6	13.02	0.8
DeepSeek-V3.1-nothink 2026.04		0.1204	90.6	14.77	2.7
Kimi-K2-Thinking 2026.04		0.106	90.6	12.56	1.2