Mislabel detection on Weak Reference Labels Mislabels + subjective
Chart: AP of submitted methods over time (selectable metrics: AP, P@437, P@874; date shown on the x-axis: May 28, 2026). Best result: 76.2 AP, 4PL Δℓi (proposed).
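The page reports average precision (AP) and precision at two fixed cut-offs (P@437, P@874) but does not define them. A minimal sketch of the standard definitions follows. It assumes three things: each method gives every example a mislabel score, the metrics are computed over the ranking those scores induce, and the table values are these fractions multiplied by 100. The helper names are illustrative. Note that 874 is exactly 2 × 437; the page does not say how the cut-offs were chosen.

```python
from typing import Sequence


def rank_by_score(scores: Sequence[float]) -> list[int]:
    # Indices ordered from most to least suspicious; ties keep input order
    # because Python's sort is stable even with reverse=True.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def precision_at_k(scores: Sequence[float], is_mislabel: Sequence[bool], k: int) -> float:
    """Fraction of true mislabels among the k highest-scoring examples."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of examples")
    top = rank_by_score(scores)[:k]
    return sum(bool(is_mislabel[i]) for i in top) / k


def average_precision(scores: Sequence[float], is_mislabel: Sequence[bool]) -> float:
    """Non-interpolated AP: mean of precision@rank over the ranks that hold a true mislabel."""
    hits = 0
    total = 0.0
    for rank, i in enumerate(rank_by_score(scores), start=1):
        if is_mislabel[i]:
            hits += 1
            total += hits / rank
    n_pos = sum(bool(y) for y in is_mislabel)
    return total / n_pos if n_pos else 0.0


scores = [0.9, 0.8, 0.7, 0.6]
labels = [True, False, True, False]
print(average_precision(scores, labels))  # (1/1 + 2/3) / 2
print(precision_at_k(scores, labels, 2))
```

Multiply either value by 100 to put it on the scale used in the table. With tied scores, this AP matches the usual step-wise definition only up to the tie-breaking order.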
Evaluation Results
| Method | Configuration | Date | AP | P@437 | P@874 |
|---|---|---|---|---|---|
| 4PL Δℓi (proposed) | Criterion=Forced-ceili... | 2026.05 | 76.2 | 92.7 | 81 |
| top-10 disagreement | | 2026.05 | 73.8 | 90.8 | 78.6 |
| XGBoost (4PL params) | Model=XGBoost, Feature... | 2026.05 | 70.3 | 88.6 | 74 |
| plain 2PL (low ai) | Model=2PL, Selection=l... | 2026.05 | 70 | 78.7 | 70.9 |
| 4PL, low di | Model=4PL, Selection=l... | 2026.05 | 69.8 | 88.3 | 73.8 |
| plain 4PL, single-stage (low di) | Model=4PL, Protocol=si... | 2026.05 | 69.8 | 91.5 | 74 |
| GLAD | | 2026.05 | 66.1 | 85.1 | 67.7 |
| low ri | Selection=low response... | 2026.05 | 62.6 | 87.6 | 67.2 |
| overall disagreement | | 2026.05 | 57.3 | 81.7 | 66.2 |
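Several rows rank examples using parameters of a four-parameter logistic (4PL) item response model. The standard 4PL gives the probability of a "correct" response as c + (d − c) / (1 + exp(−a(θ − b))). Here a is discrimination, b is difficulty, c is the lower asymptote, d is the upper asymptote (ceiling), and θ is respondent ability. 2PL is the special case c = 0, d = 1. The page does not define Δℓi, and its "Criterion=Forced-ceili..." string is truncated. The sketch below is therefore a hypothetical reading only, not the proposed method. It treats each example as an item and each annotator or model as a respondent, with the 0/1 response recording agreement with the weak reference label. It then scores an item by the log-likelihood lost when its ceiling d is forced to 1 while a, b, c are held fixed. All function names are illustrative.

```python
import math
from typing import Sequence


def irf_4pl(theta: float, a: float, b: float, c: float, d: float) -> float:
    """Standard 4PL item response function."""
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


def item_loglik(responses: Sequence[int], thetas: Sequence[float],
                a: float, b: float, c: float, d: float) -> float:
    """Bernoulli log-likelihood of one item's 0/1 responses under the 4PL."""
    eps = 1e-12  # clamp probabilities away from 0 and 1 to keep log finite
    ll = 0.0
    for y, theta in zip(responses, thetas):
        p = min(max(irf_4pl(theta, a, b, c, d), eps), 1.0 - eps)
        ll += math.log(p) if y else math.log(1.0 - p)
    return ll


def forced_ceiling_delta(responses: Sequence[int], thetas: Sequence[float],
                         a: float, b: float, c: float, d: float) -> float:
    """HYPOTHETICAL Δℓ_i: log-likelihood lost when the ceiling d is forced to 1,
    holding a, b, c fixed (no refit). Larger values flag likelier mislabels."""
    free = item_loglik(responses, thetas, a, b, c, d)
    forced = item_loglik(responses, thetas, a, b, c, 1.0)
    return free - forced


# Three high-ability respondents all disagree with the reference label.
print(forced_ceiling_delta([0, 0, 0], [2.0, 2.0, 2.0], a=1.0, b=0.0, c=0.0, d=0.3))
```

Under this reading, a large value means the responses strongly prefer a ceiling below 1. In other words, even high-ability respondents often disagree with the reference label, which is the pattern expected of a mislabeled example. If a, b, c, d are joint maximum-likelihood estimates, the value is non-negative. A proper likelihood-ratio statistic would refit a, b, c under the constraint d = 1 rather than hold them fixed.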