Model Editing on CounterFact (Efficacy, Paraphrase, Specificity, Avg.)
[Chart: metric scores over time — Efficacy, Paraphrase, Specificity, Average Score; latest Efficacy point 98.1 (FT), Oct 9, 2025]
Evaluation Results
| Method | Base Model | Date    | Efficacy | Paraphrase | Specificity | Average Score |
|--------|-----------|---------|---------:|-----------:|------------:|--------------:|
| FT     | GPT-J     | 2025.10 |     98.1 |       69.4 |        82.7 |          1.59 |
| FT     | Qwen3-8B  | 2025.10 |     97.9 |       64.2 |        77.4 |          1.04 |
| ACE    | Qwen3-8B  | 2025.10 |     91.2 |       80.7 |        74.6 |         54.27 |
| ACE    | GPT-J     | 2025.10 |     89.7 |       83.6 |        70.6 |         43.58 |
| PMET   | GPT-J     | 2025.10 |     74.6 |       63.2 |        64.1 |         31.07 |
| PMET   | Qwen3-8B  | 2025.10 |     70.7 |       61.7 |        51.9 |         17.26 |
| MEMIT  | GPT-J     | 2025.10 |     57.0 |       51.9 |        66.2 |         30.09 |
| ROME   | GPT-J     | 2025.10 |     54.1 |       54.3 |        61.4 |         27.48 |
| MEMIT  | Qwen3-8B  | 2025.10 |     50.3 |       53.6 |        66.4 |         10.27 |
| ROME   | Qwen3-8B  | 2025.10 |     45.2 |       42.9 |        53.7 |         24.08 |