KID-Bench

Benchmarks

Task Name                     | Dataset Name               | SOTA Result                      | Trend
Knowledge Conflict Resolution | KID-Bench v2               | Performance (Difficulty A): 97.6 | 4
Knowledge Conflict Resolution | KID-Bench Category C v2    | Accuracy (C-Light): 78.1         | 3
Knowledge Combination         | KID-Bench Category B v2    | Accuracy: 68                     | 3
Novel Knowledge Recall        | KID-Bench Category A v2    | Accuracy: 97.1                   | 3