
WikiBigEdit

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Lifelong Model Editing | WikiBigEdit | Efficacy 81.22 | 63 |
| Model Editing | WikiBigEdit | Efficacy 79.6 | 49 |
| Model Editing | WikiBigEdit | MMLU 69.5 | 34 |
| Hallucination Correction | WikiBigEdit | Error Rate (ERR) 1 | 24 |
| Model Editing | WikiBigEdit 3,000 samples (test) | Reliability 99.9 | 13 |
| Sequential Model Editing | WikiBigEdit (500K edits) | Efficacy 74.54 | 12 |
| Model Editing | WikiBigEdit 500K edits | Efficacy 74.54 | 9 |
| Sequential Knowledge Editing | WikiBigEdit | Reliability 99.2 | 6 |