
Model Editing on ZsRE (Reliability, Generalization, MMLU, IFEval, TruthfulQA, ARC-C, GSM8K)

Top result: CrispEdit, 80.5 Reliability (Feb 17, 2026)
Updated 1mo ago

Evaluation Results

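| Date | Reliability | Generalization | MMLU | IFEval | TruthfulQA | ARC-C | GSM8K |
|---|---|---|---|---|---|---|---|
| 2026.02 | 80.5 | 69 | 69.5 | 67.9 | 50.5 | 55 | 76 |
| 2026.02 | 72.8 | 60.6 | 67.8 | 70.2 | 53.6 | 52 | 74 |
| 2026.02 | 71.1 | 62.9 | 67.8 | 70.2 | 53.6 | 52 | 74 |
| 2026.02 | 70.1 | 60.6 | 52.7 | 47.7 | 46.3 | 40.5 | 45.5 |
| 2026.02 | 69.5 | 59.7 | 69.5 | 70.1 | 51.6 | 54 | 75.5 |
| 2026.02 | 57.4 | 50.9 | 69.5 | 67.9 | 50.5 | 55 | 76 |
| 2026.02 | 48.1 | 39.4 | 52.7 | 47.7 | 46.3 | 40.5 | 45.5 |
| 2026.02 | 46.8 | 43.1 | 69.3 | 45 | 48.7 | 43 | 50 |
| 2026.02 | 25.2 | 22.1 | 69.5 | 70.1 | 51.6 | 54 | 75.5 |
| 2026.02 | 22.7 | 17.4 | 69.3 | 72.5 | 51.8 | 54.5 | 73 |
| 2026.02 | 20 | 16.3 | 69.3 | 72.5 | 51.8 | 54.5 | 73 |
| 2026.02 | 18.7 | 7.2 | 67.8 | 70.8 | 52 | 56 | 71 |
| 2026.02 | 16.6 | 15.5 | 69.2 | 29.6 | 50.8 | 42 | 39.5 |
| 2026.02 | 9.9 | 8.3 | 69.3 | 45 | 48.7 | 43 | 50 |
| 2026.02 | 9.1 | 7.4 | 67.8 | 70.8 | 52 | 56 | 71 |
| 2026.02 | 4.4 | 4 | 67.3 | 64.6 | 56 | 47 | 67 |
| 2026.02 | 3.6 | 3.5 | 68.8 | 19.4 | 52.8 | 40.5 | 6.5 |
| 2026.02 | 2.9 | 2.1 | 69.5 | 69.3 | 50.7 | 58 | 73.5 |
| 2026.02 | 2.1 | 1.7 | 69.5 | 69.3 | 50.7 | 58 | 73.5 |
| 2026.02 | 1.9 | 2 | 69.2 | 29.6 | 50.8 | 42 | 39.5 |
| 2026.02 | 1.3 | 0.9 | 67.3 | 64.6 | 56 | 47 | 67 |
| 2026.02 | 0.9 | 1.2 | 68.8 | 19.4 | 52.8 | 40.5 | 6.5 |
| 2026.02 | 0.1 | 0 | 22.9 | 0 | 51.3 | 23.5 | 0 |
| 2026.02 | 0.1 | 0.1 | 22.9 | 0 | 51.3 | 23.5 | 0 |
| 2026.02 | 0 | 0 | 22.9 | 18.2 | 0 | 26 | 0 |
| 2026.02 | 0 | 0 | 22.9 | 18.2 | 0 | 26 | 0 |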