Multimodal Conversation on PCogAlignBench LS1

LLM Judge Score: 0.903

DAPO (Latent Action)

[Chart: LLM Judge Score by date (Jan 12, 2026)]
Updated 3d ago

Evaluation Results

Date       LLM Judge Score
2026.01    0.903
2026.01    0.901
2026.01    0.898
2026.01    0.897
2026.01    0.879
2026.01    0.874
2026.01    0.872
2026.01    0.871
2026.01    0.87
2026.01    0.854
2026.01    0.85
2026.01    0.849
2026.01    0.845
2026.01    0.844
2026.01    0.835
2026.01    0.835
2026.01    0.808
2026.01    0.808
2026.01    0.721
2026.01    0.678