Hallucination Evaluation on AMBER Generative Task

Coverage: 67.1

GPT-4V

[Chart: results over time, Apr 28, 2026 to May 24, 2026]
Updated 8d ago

Evaluation Results

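| Date    | Coverage | Hal  | Cog | CHAIR |
|---------|----------|------|-----|-------|
| 2026.05 | 67.1     | 30.7 | 2.6 | 4.6   |
| 2026.05 | 56.1     | 19.4 | 1.4 | 3.4   |
| 2026.05 | 55.7     | 22.5 | 1.9 | 4.1   |
| 2026.05 | 53.8     | 45.8 | 5.6 | 11    |
| 2026.05 | 53.2     | 18.5 | 1.6 | 3.5   |
| 2026.05 | 52.6     | 30.4 | 3.2 | 6.4   |
| 2026.05 | 52.4     | 24.5 | 2.4 | 4.4   |
| 2026.05 | 51.5     | 16.6 | 1.3 | 3.3   |
| 2026.05 | 50.5     | 39.1 | 4.6 | 8.5   |
| 2026.05 | 50.3     | 16.1 | 1   | 3     |
| 2026.05 | 50.2     | 37.3 | 4.3 | 8.8   |
| 2026.05 | 49.7     | 27.3 | 2.7 | 5.6   |
| 2026.04 | 49.2     | 23.4 | 1.2 | -     |
| 2026.04 | 48.5     | 22.2 | 1.3 | -     |
| 2026.05 | 47.9     | 11.6 | 0.9 | 2.2   |
| 2026.05 | 47.3     | 13.4 | 1.2 | 2.1   |
| 2026.05 | 46.1     | 25.1 | 2.1 | 6.3   |
| 2026.05 | 45.9     | 32.1 | 4.1 | 8.1   |
| 2026.04 | 45       | 31.8 | 2.6 | -     |
| 2026.04 | 44.5     | 31   | 2.2 | -     |
| 2026.04 | 44.4     | 31.8 | 2.6 | -     |
| 2026.04 | 42.4     | 31   | 2.2 | -     |
| 2026.04 | 41.5     | 36.3 | 3.1 | -     |
| 2026.04 | 40.5     | 5.6  | 0.4 | -     |
| 2026.04 | 39.6     | 10.4 | 0.5 | -     |
| 2026.04 | 38       | 44.5 | 31  | -     |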