Instruction Generation on Eval-400 In-house (test)
[Chart: Correctness. Top result: 66, Gemini-3-Pro (Apr 9, 2026). Available metrics: Correctness, Precision (P0), Precision (P1), Precision (P2).]
Updated 9d ago
Evaluation Results
| Method | Settings | Date | Correctness | Precision (P0) | Precision (P1) | Precision (P2) |
|---|---|---|---|---|---|---|
| Gemini-3-Pro | | 2026.04 | 66 | 21 | 12 | 1 |
| Qwen3-VL-235B-A22B (SFT+DPO) | Training=SFT+DPO | 2026.04 | 66 | 23 | 10.4 | 0.6 |
| Qwen3-VL-32B (SFT+DPO) | Training=SFT+DPO | 2026.04 | 57 | 29 | 13.25 | 0.75 |
| GPT-4.1 | | 2026.04 | 48.75 | 42.25 | 8.25 | 0.75 |
| Qwen3-VL-235B-A22B | Mode=base | 2026.04 | 41.75 | 47.75 | 8.5 | 2 |
| GLM-4.5V (106B-A12B) | | 2026.04 | 38.6 | 51.2 | 9.2 | 1 |
| Qwen3-VL-32B | Mode=base | 2026.04 | 37 | 51.5 | 11 | 0.5 |
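
The results above can be compared programmatically. The following is a minimal Python sketch, not part of the benchmark itself: the `RESULTS` list is transcribed by hand from the figures listed above, and the helper names `row_total` and `correctness_gain` are hypothetical. It checks that the four metric values of every method add up to 100 (which suggests, but does not confirm, that they are percentage shares of the test set) and computes how much Correctness the SFT+DPO variants gain over their base models.

```python
# Rows transcribed from the results above:
# (method, Correctness, Precision P0, Precision P1, Precision P2).
RESULTS = [
    ("Gemini-3-Pro", 66, 21, 12, 1),
    ("Qwen3-VL-235B-A22B (SFT+DPO)", 66, 23, 10.4, 0.6),
    ("Qwen3-VL-32B (SFT+DPO)", 57, 29, 13.25, 0.75),
    ("GPT-4.1", 48.75, 42.25, 8.25, 0.75),
    ("Qwen3-VL-235B-A22B", 41.75, 47.75, 8.5, 2),
    ("GLM-4.5V (106B-A12B)", 38.6, 51.2, 9.2, 1),
    ("Qwen3-VL-32B", 37, 51.5, 11, 0.5),
]


def row_total(row):
    """Sum the four metric values of one result row (hypothetical helper)."""
    return sum(row[1:])


def correctness_gain(base_name, tuned_name):
    """Correctness of the tuned model minus that of its base model."""
    scores = {name: correctness for name, correctness, *_ in RESULTS}
    return scores[tuned_name] - scores[base_name]


if __name__ == "__main__":
    for row in RESULTS:
        # Each row is expected to total 100, up to float rounding.
        print(f"{row[0]}: total = {row_total(row):.2f}")

    for base in ("Qwen3-VL-235B-A22B", "Qwen3-VL-32B"):
        gain = correctness_gain(base, f"{base} (SFT+DPO)")
        print(f"{base}: SFT+DPO gain in Correctness = {gain:.2f}")
```

On these figures, SFT+DPO raises Correctness by 24.25 points for Qwen3-VL-235B-A22B and by 20 points for Qwen3-VL-32B.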