
Subjective Evaluation on WildBench

Score: 0.8604

STEP3-VL-10B

[Chart: x-axis Jan 14, 2026 to Apr 9, 2026; y-axis ticks 0.19532, 0.367985, 0.54065, 0.713315]
Updated 8d ago

Evaluation Results

Date     Score
2026.01  0.8604
2026.01  0.7236
2026.01  0.6309
2026.01  0.5645
2026.04  0.5589
2026.04  0.5573
2026.04  0.5529
2026.04  0.5521
2026.04  0.551
2026.04  0.5473
2026.04  0.4508
2026.04  0.4485
2026.04  0.436
2026.04  0.4247
2026.04  0.4177
2026.04  0.4172
2026.01  0.3404
2026.04  0.2405
2026.04  0.2209