Hallucination Evaluation on HaluEval (Avg. Metric)

[Chart: Accuracy (ACC) over time. Y-axis: 87 to 100. X-axis: Oct 10, 2025 to May 8, 2026. Labeled point: Multi-DPOP.]
Updated 14d ago

Evaluation Results

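Method names, links, and the names of metric columns 2 to 7 were not preserved. "-" marks a value that is not reported.

Date | ACC | Metric 2 | Metric 3 | Metric 4 | Metric 5 | Metric 6 | Metric 7
2025.10 | 100 | - | - | - | - | - | -
2025.10 | 100 | - | - | - | - | - | -
2025.10 | 99.5 | - | - | - | - | - | -
2025.10 | 99.5 | - | - | - | - | - | -
2025.10 | 99.5 | - | - | - | - | - | -
2025.10 | 99.5 | - | - | - | - | - | -
2026.05 | 99 | - | 85.67 | 96.92 | 89.3 | 84.4 | -
2025.10 | 99 | - | - | - | - | - | -
2025.10 | 99 | - | - | - | - | - | -
- | 98.67 | - | 89.2 | 97.34 | 88.85 | 89.83 | -
2025.10 | 98.5 | - | - | - | - | - | -
2025.10 | 98.5 | - | - | - | - | - | -
- | 98.33 | - | 88.67 | 97.95 | 89.18 | 89.68 | -
2026.05 | 98.33 | - | 90.7 | 98.08 | 91.27 | 90.85 | -
2025.10 | 98.1 | - | - | - | - | - | -
2025.10 | 98.1 | - | - | - | - | - | -
- | 98 | - | 88.07 | 97.52 | 89.17 | 87.15 | -
- | 98 | - | 87.58 | 97.35 | 89.25 | 87.72 | -
2025.10 | 97.5 | - | - | - | - | - | -
2025.10 | 97.5 | - | - | - | - | - | -
2025.10 | 97.3 | - | - | - | - | - | -
2025.10 | 97.2 | - | - | - | - | - | -
2025.10 | 97.2 | - | - | - | - | - | -
2025.10 | 96.8 | - | - | - | - | - | -
- | 96.67 | - | 85.13 | 93.9 | 88.15 | 87.3 | -
2026.05 | 96.67 | - | 85.44 | 94.1 | 87.75 | 87.92 | -
- | 96.33 | - | 85.92 | 94.18 | 87.3 | 83.85 | -
2025.10 | 96.2 | - | - | - | - | - | -
2025.10 | 96.2 | - | - | - | - | - | -
2025.10 | 96.1 | - | - | - | - | - | -
2025.10 | 95.8 | - | - | - | - | - | -
2025.10 | 95.8 | - | - | - | - | - | -
2025.10 | 95.4 | - | - | - | - | - | -
2025.10 | 94.6 | - | - | - | - | - | -
2025.10 | 94.5 | - | - | - | - | - | -
2025.10 | 93.2 | - | - | - | - | - | -
2026.05 | 93 | - | 88.37 | 94.87 | 87.14 | 89.45 | -
2025.10 | 92.8 | - | - | - | - | - | -
2025.10 | 92.1 | - | - | - | - | - | -
2025.10 | 91.4 | - | - | - | - | - | -
2025.10 | 91.2 | - | - | - | - | - | -
2025.10 | 91.2 | - | - | - | - | - | -
- | 91 | - | 88.43 | 93.15 | 86.88 | 89.85 | -
2025.10 | 90 | - | - | - | - | - | -
2025.10 | 90 | - | - | - | - | - | -
2025.10 | 89.1 | - | - | - | - | - | -
2025.10 | 89.1 | - | - | - | - | - | -
2025.10 | 88.3 | - | - | - | - | - | -
2025.10 | 88.3 | - | - | - | - | - | -
2025.10 | 87.5 | - | - | - | - | - | -
2025.10 | 87.5 | - | - | - | - | - | -
2026.04 | - | 19.1 | - | - | - | - | -
2026.04 | - | 16.7 | - | - | - | - | -
2026.04 | - | 20.1 | - | - | - | - | -
2026.04 | - | 22.2 | - | - | - | - | -
2026.04 | - | 23.5 | - | - | - | - | -
2026.04 | - | 10.7 | - | - | - | - | -
2026.05 | - | - | - | - | - | - | 0
2026.05 | - | - | - | - | - | - | 15.3
2026.05 | - | - | - | - | - | - | 15.5
2026.05 | - | - | - | - | - | - | 7.6
2026.05 | - | - | - | - | - | - | 0.4
2026.05 | - | - | - | - | - | - | 0.6
2026.05 | - | - | - | - | - | - | 39.9
2026.05 | - | - | - | - | - | - | 14.5
2026.05 | - | - | - | - | - | - | 1.6
2026.05 | - | - | - | - | - | - | 0.6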