
Counterfact

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Knowledge Editing | Counterfact | Efficacy 9,387 | 301 |
| Sequential model editing | Counterfact | Efficacy 99.77 | 61 |
| Subject inference attack | CounterFact batch-edit tasks | Recall 100 | 36 |
| Lifelong Model Editing | CounterFact | Efficacy 73.06 | 33 |
| Sequential Knowledge Editing | CounterFact sequential editing 10,000 Samples | Efficacy Success 99.5 | 33 |
| Knowledge Editing | Counterfact uns | Edit Success Rate 94.56 | 30 |
| Model Editing | CounterFact | Reliability 92 | 30 |
| Knowledge Editing | Counterfact 10,000 facts | Relational Score 10,000 | 27 |
| Knowledge Model Editing | CounterFact | Efficacy 64.85 | 26 |
| Model Editing | CounterFact | Reliability 79.4 | 26 |
| Model Editing | CounterFact | Efficacy 96.12 | 24 |
| Classification Probing | Counterfact (test) | Probe Acc (Best Layer) 89.6 | 21 |
| Knowledge Editing | Counterfact Full (test) | Rel. Accuracy 99 | 21 |
| Sequential Knowledge Editing | CounterFact larger | Efficacy 98.97 | 14 |
| Sequential Knowledge Editing | CounterFact top | Efficacy 93.87 | 14 |
| Lifelong Knowledge Editing | COUNTERFACT | Reliability 67.1 | 14 |
| Model Editing | CounterFact 3,000 samples (test) | Reliability 9,980 | 13 |
| Knowledge Editing | Counterfact (test) | RwA 99.86 | 12 |
| Prompt recovery attack | CounterFact | Top-1 Accuracy 60 | 12 |
| Model Editing | CounterFact | Efficacy 98.1 | 10 |
| Sequential Model Editing | CounterFact full (10K sequential edits) (test) | Efficacy 94.45 | 10 |
| Sequential Knowledge Editing | CounterFact | Efficacy 100 | 10 |
| Prompt recovery attack | CounterFact (test) | Top-1 Accuracy 54 | 9 |
| Model Editing | COUNTERFACT 7,500-record GPT-2 XL (test) | Score 89.2 | 9 |
| Model Editing | COUNTERFACT | Efficacy 99.75 | 8 |
Showing 25 of 43 rows