Label Error Detection on AlleNoise 1,000-item balanced subset
Chart: Precision over time (Recall and F1-Score views also available). Highest point: Adjudicator, 100 (Dec 5, 2025).
Evaluation Results
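| Method      | Configuration        | Date    | Precision | Recall | F1-Score |
|-------------|----------------------|---------|-----------|--------|----------|
| Adjudicator | Knowledge Graph=Full | 2025.12 | 100       | 98     | 99       |
| Adjudicator | Knowledge Graph=None | 2025.12 | 85        | 45     | 59       |
| Single LLM  | -                    | 2025.12 | 66        | 38     | 48       |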
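The F1-Score column is the harmonic mean of precision and recall. The minimal sketch below recomputes it from the reported precision and recall as a consistency check. It assumes the scores are percentages on a 0-100 scale and that F1 was derived from the aggregate precision and recall shown here; the page does not state how the scores were aggregated. The helper name `f1_score` is hypothetical and not part of any benchmark tooling.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same 0-100 scale)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing is detected correctly
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the evaluation results table.
results = {
    "Adjudicator (Knowledge Graph=Full)": (100, 98),
    "Adjudicator (Knowledge Graph=None)": (85, 45),
    "Single LLM": (66, 38),
}

for method, (p, r) in results.items():
    # Print unrounded F1; rounding to an integer should match the table.
    print(f"{method}: F1 = {f1_score(p, r):.2f}")
```

Rounded to whole numbers, the recomputed values (about 98.99, 58.85 and 48.23) agree with the reported 99, 59 and 48.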