WikiBigEdit

Benchmarks

Task Name                 Dataset Name                      SOTA Result         Trend
Model Editing             WikiBigEdit                       Efficacy 79.6       49
Model Editing             WikiBigEdit                       MMLU 69.5           34
Hallucination Correction  WikiBigEdit                       Error Rate (ERR) 1  24
Model Editing             WikiBigEdit 3,000 samples (test)  Reliability 99.9    13