
Multimodal Conversation on PCogAlignBench (LS2)

LLM Judge Score: 0.852

GRPO (Latent Action)

Jan 12, 2026
Updated 1mo ago

Evaluation Results

Date       LLM Judge Score
2026.01    0.852
2026.01    0.852
2026.01    0.851
2026.01    0.851
2026.01    0.85
2026.01    0.845
2026.01    0.84
2026.01    0.839
2026.01    0.837
2026.01    0.836
2026.01    0.836
2026.01    0.836
2026.01    0.835
2026.01    0.834
2026.01    0.828
2026.01    0.828
2026.01    0.81
2026.01    0.799
2026.01    0.71
2026.01    0.676