
Multimodal Conversation on PCogAlignBench (LS2)

0.852 LLM Judge Score

GRPO (Latent Action)

[Chart: LLM Judge Score (axis ticks 0.66896, 0.71648, 0.764, 0.81152); Jan 12, 2026]
Updated 3d ago

Evaluation Results

Date     LLM Judge Score  Method  Links
2026.01  0.852
2026.01  0.852
2026.01  0.851
2026.01  0.851
2026.01  0.85
2026.01  0.845
2026.01  0.84
2026.01  0.839
2026.01  0.837
2026.01  0.836
2026.01  0.836
2026.01  0.836
2026.01  0.835
2026.01  0.834
2026.01  0.828
2026.01  0.828
2026.01  0.81
2026.01  0.799
2026.01  0.71
2026.01  0.676