Response Selection on P-Soups Informativeness

Accuracy: 78.07

ALIGNXPLORE+

Updated 4mo ago

Evaluation Results

Method	Date	Accuracy
ALIGNXPLORE+	2026.01	78.07
ALIGNXPLORE+	2026.01	76.57
ALIGNXPLORE	2026.01	76.24
Qwen3-8B (thinking)	2026.01	75.08
ALIGNXPLORE	2026.01	74.41
Qwen3-8B (thinking)	2026.01	74.08
Qwen3-32B (thinking)	2026.01	73.58
Qwen3-32B (thinking)	2026.01	73.25
DeepSeek-R1-671B	2026.01	72.59
GPT-OSS-20B	2026.01	69.93
GPT-OSS-20B	2026.01	68.77
DeepSeek-R1-671B	2026.01	66.61
DS-R1-Distill-Qwen-7B	2026.01	58.63
DS-R1-Distill-Qwen-7B	2026.01	56.14
TALLRec	2026.01	51.66
Qwen3-8B (non-thinking)	2026.01	46.84