Response Selection on P-Soups Style

Accuracy: 0.88

Qwen3-32B (thinking)

[Chart: accuracy; axis ticks 0.405032, 0.528341, 0.65165, 0.774959; Jan 8, 2026]
Updated 1mo ago

Evaluation Results

Date      Accuracy
2026.01   0.88
2026.01   0.875
2026.01   0.8633
2026.01   0.86
2026.01   0.8566
2026.01   0.85
2026.01   0.85
2026.01   0.8366
2026.01   0.83
2026.01   0.8033
2026.01   0.78
2026.01   0.7483
2026.01   0.7016
2026.01   0.6583
2026.01   0.6083
2026.01   0.4233