Knowledge Retention on WMDP cyber (retain)
[Chart: Rt over time. Top result: 54.1 (MANSU), May 14, 2026.]
Evaluation Results
Method       Backbone            Date     Rt
MANSU        Qwen-3-8B           2026.05  54.1
Zero-shot    Qwen-3-8B           2026.05  53.7
LUNAR        Qwen-3-8B           2026.05  49.8
Zero-shot    Llama-3.1-8B-...    2026.05  47.7
Surgical GA  Qwen-3-8B           2026.05  47.3
NPO          Qwen-3-8B           2026.05  43.7
LUNAR        Llama-3.1-8B-...    2026.05  42.8
NPO          Llama-3.1-8B-...    2026.05  40
MANSU        Llama-3.1-8B-...    2026.05  39.1
Global GA    Llama-3.1-8B-...    2026.05  39
Global GA    Qwen-3-8B           2026.05  37
Surgical GA  Llama-3.1-8B-...    2026.05  36.7
SimNPO       Qwen-3-8B           2026.05  28
SimNPO       Llama-3.1-8B-...    2026.05  27.3
GU+SimNPO    Qwen-3-8B           2026.05  26.7
GU+SimNPO    Llama-3.1-8B-...    2026.05  23
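
The results above mix two backbones, so a raw ranking can hide how much each method changes retention relative to its own starting point. The sketch below is one way to read the numbers: it subtracts the Zero-shot score on the same backbone from each method's Rt. It assumes, without confirmation from the page, that Zero-shot is the unmodified model and that a higher Rt is better. The names `scores` and `delta_vs_zero_shot` are illustrative, and the Llama backbone label is kept truncated exactly as the page shows it.

```python
# Rt scores copied from the results above: (method, backbone) -> Rt.
QWEN = "Qwen-3-8B"
LLAMA = "Llama-3.1-8B-..."  # truncated in the source; kept as-is

scores = {
    ("MANSU", QWEN): 54.1,
    ("Zero-shot", QWEN): 53.7,
    ("LUNAR", QWEN): 49.8,
    ("Zero-shot", LLAMA): 47.7,
    ("Surgical GA", QWEN): 47.3,
    ("NPO", QWEN): 43.7,
    ("LUNAR", LLAMA): 42.8,
    ("NPO", LLAMA): 40.0,
    ("MANSU", LLAMA): 39.1,
    ("Global GA", LLAMA): 39.0,
    ("Global GA", QWEN): 37.0,
    ("Surgical GA", LLAMA): 36.7,
    ("SimNPO", QWEN): 28.0,
    ("SimNPO", LLAMA): 27.3,
    ("GU+SimNPO", QWEN): 26.7,
    ("GU+SimNPO", LLAMA): 23.0,
}


def delta_vs_zero_shot(scores):
    """Return {(method, backbone): Rt minus the Zero-shot Rt on that backbone}.

    Assumption: Zero-shot is treated as the baseline for each backbone.
    """
    deltas = {}
    for (method, backbone), rt in scores.items():
        if method == "Zero-shot":
            continue  # the baseline itself has no delta
        baseline = scores[("Zero-shot", backbone)]
        deltas[(method, backbone)] = round(rt - baseline, 1)
    return deltas


deltas = delta_vs_zero_shot(scores)

# Print methods from smallest retention loss to largest.
for (method, backbone), d in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{method:<12} {backbone:<18} {d:+.1f}")
```

Read this way, a method that tops one backbone may sit well below the baseline on the other, which the flat ranking alone does not make obvious.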