
Instruction Generation on Eval-400 In-house (test)

Top Correctness: 66 (Gemini-3-Pro)


Evaluation Results

Method	Date	Correctness	(unlabeled)	(unlabeled)	(unlabeled)
Gemini-3-Pro	2026.04	66	21	12	1
Qwen3-VL-235B-A22B (SFT+DPO)	2026.04	66	23	10.4	0.6
Qwen3-VL-32B (SFT+DPO)	2026.04	57	29	13.25	0.75
GPT-4.1	2026.04	48.75	42.25	8.25	0.75
Qwen3-VL-235B-A22B	2026.04	41.75	47.75	8.5	2
GLM-4.5V (106B-A12B)	2026.04	38.6	51.2	9.2	1
Qwen3-VL-32B	2026.04	37	51.5	11	0.5
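In every row, the four score columns add up to 100. This suggests they form a percentage breakdown in which Correctness is one category, although the page does not say so and names only the first column. The short sketch below copies the scores from the table and checks that property. It also ranks the models by Correctness, keeping the tie at 66 in table order. The names `ROWS`, `row_totals`, and `rank_by_correctness` are hypothetical and used only for illustration.

```python
# Scores copied from the Eval-400 In-house (test) table.
# Column order: Correctness, then three columns whose labels the page does not give.
ROWS = {
    "Gemini-3-Pro": (66, 21, 12, 1),
    "Qwen3-VL-235B-A22B (SFT+DPO)": (66, 23, 10.4, 0.6),
    "Qwen3-VL-32B (SFT+DPO)": (57, 29, 13.25, 0.75),
    "GPT-4.1": (48.75, 42.25, 8.25, 0.75),
    "Qwen3-VL-235B-A22B": (41.75, 47.75, 8.5, 2),
    "GLM-4.5V (106B-A12B)": (38.6, 51.2, 9.2, 1),
    "Qwen3-VL-32B": (37, 51.5, 11, 0.5),
}


def row_totals(rows):
    """Sum each model's four scores, rounding to absorb floating-point error."""
    return {name: round(sum(scores), 6) for name, scores in rows.items()}


def rank_by_correctness(rows):
    """Sort model names by the Correctness column, highest first.

    sorted() is stable even with reverse=True, so tied models keep table order.
    """
    return sorted(rows, key=lambda name: rows[name][0], reverse=True)


if __name__ == "__main__":
    for name, total in row_totals(ROWS).items():
        print(f"{name}: {total}")
    print(rank_by_correctness(ROWS))
```

The ranking only reproduces the table's existing order. The sum check is the useful part, because any transcription error in a row would show up as a total other than 100.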