
Generative Hallucination Evaluation on AMBER-g

CHAIR Score: 2.2

UE-DPO

[Chart: CHAIR Score over time, Apr 1, 2026 to May 6, 2026]
Updated 27d ago

Evaluation Results

Date      CHAIR Score   Metric 2   Metric 3   Metric 4
2026.05   2.2           -          0.9        14.4
2026.05   2.5           -          0.9        12.5
2026.05   2.9           -          1          17.4
2026.05   3             -          1          16.2
2026.05   3             -          0.6        14.7
2026.05   4.4           -          2.4        24.5
2026.04   4.5           21.8       1.3        -
2026.04   4.6           25.3       2.3        -
2026.04   4.9           20.4       1.2        -
2026.04   5.1           27.4       2.7        -
2026.04   5.4           22.6       1          -
2026.05   5.6           -          2.7        27.3
2026.05   5.7           -          1.4        22.6
2026.05   6.3           -          2.1        25.1
2026.05   6.4           -          3.2        30.4
2026.04   6.5           42.8       4          -
2026.05   6.6           -          3.4        32.2
2026.05   6.8           -          3.3        31.8
2026.04   7.1           63.8       7.6        -
2026.04   7.2           34.2       3.8        -
2026.05   7.4           -          3.9        34.3
2026.04   7.5           35.9       4.3        -
2026.04   7.6           56.2       6.6        -
2026.04   7.6           63.5       7.3        -
2026.05   7.7           -          4.2        34.7
2026.05   7.7           -          4          38.6
2026.05   8             -          4.3        44.4
2026.04   8.3           68.2       8.1        -
2026.05   9.7           -          5.3        46.6