Visual Programming and Reasoning on HumanEval_V

Accuracy: 28.5

Kimi-K2.5

May 21, 2026
Updated 12d ago

Evaluation Results

Date      Accuracy
2026.05   28.5
2026.05   28.1
2026.05   27.7
2026.05   25.3
2026.05   23.7
2026.05   23.3
2026.05   21.7
2026.05   19.4
2026.05   13.3
2026.05   5.5
2026.05   4
2026.05   4
2026.05   4
2026.05   4
2026.05   2.8
2026.05   2.7
2026.05   2.4
2026.05   1.6
2026.05   1.6
2026.05   1.6
2026.05   1.2
2026.05   1.2