
Object Hallucination Probing on GQA POPE Random

Best reported: Scalpel, 89.93 Accuracy (GQA POPE)

[Chart: Accuracy (GQA POPE) over time, Dec 29, 2025 to Apr 9, 2026]
Updated 9d ago

Evaluation Results

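Date | Accuracy (GQA POPE) |  |  |  |  |  |  |  |  |  | F1 | Precision | Recall
2026.02 | 89.93 | - | - | - | - | - | - | - | - | - | 89.87 | - | -
2026.04 | 89.5 | - | - | - | - | - | - | - | - | - | 89.45 | 90 | 89
2026.02 | 89.3 | - | - | - | - | - | - | - | - | - | 89.49 | - | -
2026.04 | 89.26 | - | - | - | - | - | - | - | - | - | 88.78 | 92.93 | 85
2025.12 | 89.03 | - | - | - | - | - | - | - | - | - | 89.06 | 81.1 | -
2026.04 | 89 | - | - | - | - | - | - | - | - | - | 88.63 | 91.66 | 85.8
2026.04 | 88.37 | - | - | - | - | - | - | - | - | - | 88.02 | 90.73 | 85.47
2026.02 | 88.13 | - | - | - | - | - | - | - | - | - | 88.91 | - | -
2026.04 | 87.97 | - | - | - | - | - | - | - | - | - | 86.67 | 97.11 | 78.27
2026.04 | 87.7 | - | - | - | - | - | - | - | - | - | 88 | 85.9 | 90.2
2025.12 | 87.14 | - | - | - | - | - | - | - | - | - | 86.32 | 94.65 | -
2025.12 | 87.09 | - | - | - | - | - | - | - | - | - | 87.96 | 80.46 | -
2025.12 | 86.78 | - | - | - | - | - | - | - | - | - | 86.39 | 87.06 | -
2026.02 | 86.65 | - | - | - | - | - | - | - | - | - | 86.99 | - | -
2026.04 | 86.65 | - | - | - | - | - | - | - | - | - | 86.99 | 84.85 | 89.24
2026.04 | 86.2 | - | - | - | - | - | - | - | - | - | 84.62 | 95.55 | 75.93
2026.04 | 86.11 | - | - | - | - | - | - | - | - | - | 86.11 | 96.87 | 77.53
2025.12 | 86.1 | - | - | - | - | - | - | - | - | - | 87.31 | 80.3 | -
2025.12 | 86.1 | - | - | - | - | - | - | - | - | - | 84.81 | 93.78 | -
2025.12 | 85.95 | - | - | - | - | - | - | - | - | - | 85.08 | 94.22 | -
2026.04 | 85.9 | - | - | - | - | - | - | - | - | - | 85.29 | 89.1 | 81.8
2025.12 | 85.69 | - | - | - | - | - | - | - | - | - | 84.67 | 93.11 | -
2026.04 | 85.59 | - | - | - | - | - | - | - | - | - | 85.33 | 86.88 | 83.84
2025.12 | 85.4 | - | - | - | - | - | - | - | - | - | 85.12 | 85.64 | -
2026.04 | 85.33 | - | - | - | - | - | - | - | - | - | 86.12 | 95.95 | 78.13
2025.12 | 85.21 | - | - | - | - | - | - | - | - | - | 84.21 | 92.05 | -
2026.04 | 84.96 | - | - | - | - | - | - | - | - | - | 86.25 | 79.44 | 94.33
2025.12 | 84.9 | - | - | - | - | - | - | - | - | - | 83.96 | 89.51 | -
2025.12 | 84.87 | - | - | - | - | - | - | - | - | - | 85.39 | 82.52 | -
2026.04 | 84.33 | - | - | - | - | - | - | - | - | - | 81.94 | 96.73 | 71.07
2025.12 | 84.2 | - | - | - | - | - | - | - | - | - | 85.77 | 78 | -
2026.02 | 83.73 | - | - | - | - | - | - | - | - | - | 82.95 | - | -
2025.12 | 83.23 | - | - | - | - | - | - | - | - | - | 85.05 | 76.73 | -
2025.12 | 83.07 | - | - | - | - | - | - | - | - | - | 83.87 | 80.06 | -
2025.12 | 82.83 | - | - | - | - | - | - | - | - | - | 83.56 | 80.16 | -
2025.12 | 82.23 | - | - | - | - | - | - | - | - | - | 84.03 | 76.32 | -
2025.12 | 79.67 | - | - | - | - | - | - | - | - | - | 80.99 | 76.05 | -
2026.04 | 74.76 | - | - | - | - | - | - | - | - | - | 66.6 | 98.43 | 50.33
2026.04 | 74.73 | - | - | - | - | - | - | - | - | - | 65.55 | 98.43 | 50.26
2026.01 | 59.7 | - | - | - | - | - | - | - | - | - | 56.43 | 61.41 | 52.2
2026.01 | 58.97 | - | - | - | - | - | - | - | - | - | 55.35 | 60.7 | 50.87
2026.01 | 58.93 | - | - | - | - | - | - | - | - | - | 54.44 | 61.13 | 49.07
2025.06 | - | 89.47 | 77.03 | 63.83 | 70.23 | 74.87 | 68.07 | - | - | 70.81 | - | - | -
2025.06 | - | - | 78 | 68.33 | 71.47 | 73.63 | 70.2 | - | - | 72.33 | - | - | -
2025.06 | - | 84.27 | 85.13 | 83.57 | 82.53 | 82.67 | - | - | - | 83.63 | - | - | -
2025.06 | - | 87.23 | 82.53 | 73.03 | 74 | - | - | 65.1 | 74.3 | 73.95 | - | - | -
2025.06 | - | - | 83.63 | 76.1 | 73.43 | - | - | 55.5 | 80.2 | 73.77 | - | - | -
2025.06 | - | 85.17 | 79.93 | 81.9 | 62.03 | - | - | 80.4 | - | 77.89 | - | - | -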
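POPE (Polling-based Object Probing Evaluation) asks a model yes/no questions of the form "Is there a <object> in the image?". In the random setting, the absent objects are sampled at random, and answers are scored with "yes" (object present) as the positive class. The following is a minimal sketch of how accuracy, precision, recall and F1 follow from those answers. It assumes answers are plain strings and that any reply beginning with "yes" counts as a yes. That is a simplification, not the official POPE answer parsing. `pope_scores`, `f1_from` and `PopeScores` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PopeScores:
    accuracy: float
    precision: float
    recall: float
    f1: float


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (works on 0-1 or 0-100 scales)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pope_scores(predictions: list[str], labels: list[str]) -> PopeScores:
    """Score yes/no answers, treating "yes" (object present) as the positive class.

    Assumption: a prediction counts as "yes" if it starts with "yes"
    (case-insensitive); this is a simplified parser, not POPE's own.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    pred_yes = [p.strip().lower().startswith("yes") for p in predictions]
    gold_yes = [g.strip().lower() == "yes" for g in labels]

    # Confusion-matrix counts over paired (prediction, label) answers.
    tp = sum(p and g for p, g in zip(pred_yes, gold_yes))
    fp = sum(p and not g for p, g in zip(pred_yes, gold_yes))
    fn = sum(not p and g for p, g in zip(pred_yes, gold_yes))
    tn = sum(not p and not g for p, g in zip(pred_yes, gold_yes))

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return PopeScores(accuracy, precision, recall, f1_from(precision, recall))


if __name__ == "__main__":
    scores = pope_scores(
        ["Yes", "yes, there is", "No", "no", "Yes"],
        ["yes", "yes", "yes", "no", "no"],
    )
    print(scores)
```

The toy run has 2 true positives, 1 false positive, 1 false negative and 1 true negative, so accuracy is 0.6 and precision, recall and F1 are each 2/3. As a consistency check, if the last three populated columns of the table are F1, precision and recall, `f1_from(92.93, 85)` gives about 88.79. This is in line with the 88.78 listed beside those two values.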