
Response Selection on P-Soups Expertise

Best accuracy: 83.66 (Qwen3-32B, thinking)


Evaluation Results

Method                     Date      Accuracy
Qwen3-32B (thinking)       2026.01   83.66
Qwen3-8B (thinking)        2026.01   83.5
Qwen3-8B (thinking)        2026.01   82.66
ALIGNXPLORE+               2026.01   82.5
DeepSeek-R1-671B           2026.01   82.33
Qwen3-32B (thinking)       2026.01   81.67
GPT-OSS-20B                2026.01   81.66
DeepSeek-R1-671B           2026.01   79
ALIGNXPLORE+               2026.01   78.5
GPT-OSS-20B                2026.01   77.5
ALIGNXPLORE                2026.01   72.66
ALIGNXPLORE                2026.01   69.16
DS-R1-Distill-Qwen-7B      2026.01   66
DS-R1-Distill-Qwen-7B      2026.01   64.16
TALLRec                    2026.01   60.16
Qwen3-8B (non-thinking)    2026.01   38.33
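The leaderboard reports accuracy but does not define the evaluation protocol. The following is a minimal sketch that assumes each P-Soups expertise example asks the model to choose among candidate responses, with a choice counting as correct when it matches the response preferred under the stated expertise preference. Under that assumption, accuracy is the percentage of correct selections. The function name `selection_accuracy` and the toy inputs are hypothetical and are not P-Soups data.

```python
from typing import Sequence


def selection_accuracy(predicted: Sequence[int], preferred: Sequence[int]) -> float:
    """Percentage of examples where the model picked the preferred response.

    Hypothetical helper. Assumption: each example offers candidate responses
    indexed 0..k-1, `predicted` holds the model's chosen index, and
    `preferred` holds the index of the response matching the user's
    stated expertise preference.
    """
    if len(predicted) != len(preferred):
        raise ValueError("predicted and preferred must have the same length")
    if len(predicted) == 0:
        raise ValueError("need at least one example")
    # Count positions where the model's choice equals the preferred choice.
    correct = sum(p == g for p, g in zip(predicted, preferred))
    return 100.0 * correct / len(predicted)


# Toy example (not P-Soups data): 5 of 6 choices match the preference.
print(round(selection_accuracy([0, 1, 1, 0, 1, 0], [0, 1, 1, 0, 1, 1]), 2))  # 83.33
```

The scores in the table are on a 0 to 100 scale, so the sketch returns a percentage rather than a fraction.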