Response Selection on P-Soups Style

Accuracy: 0.88

Qwen3-32B (thinking)

Updated 4mo ago

Evaluation Results

Method	Links	Accuracy
Qwen3-32B (thinking) 2026.01		0.88
Qwen3-8B (thinking) 2026.01		0.875
ALIGNXPLORE+ 2026.01		0.8633
GPT-OSS-20B 2026.01		0.86
DeepSeek-R1-671B 2026.01		0.8566
DeepSeek-R1-671B 2026.01		0.85
Qwen3-8B (thinking) 2026.01		0.85
Qwen3-32B (thinking) 2026.01		0.8366
GPT-OSS-20B 2026.01		0.83
ALIGNXPLORE+ 2026.01		0.8033
ALIGNXPLORE 2026.01		0.78
ALIGNXPLORE 2026.01		0.7483
TALLRec 2026.01		0.7016
DS-R1-Distill-Qwen-7B 2026.01		0.6583
DS-R1-Distill-Qwen-7B 2026.01		0.6083
Qwen3-8B (non-thinking) 2026.01		0.4233