Knowledge Retention on WMDP bio (retain)
Best Rt: 80.3 (Zero-shot)
[Chart: Rt by date; x-axis tick at May 14, 2026]
Updated 19d ago
Evaluation Results
| Method | Backbone | Date | Rt |
|---|---|---|---|
| Zero-shot | Qwen-3-8B | 2026.05 | 80.3 |
| Zero-shot | Llama-3.1-8B-... | 2026.05 | 76.3 |
| MANSU | Qwen-3-8B | 2026.05 | 67.1 |
| LUNAR | Qwen-3-8B | 2026.05 | 65.5 |
| LUNAR | Llama-3.1-8B-... | 2026.05 | 61.9 |
| Surgical GA | Llama-3.1-8B-... | 2026.05 | 56 |
| MANSU | Llama-3.1-8B-... | 2026.05 | 52.3 |
| NPO | Llama-3.1-8B-... | 2026.05 | 50.3 |
| Surgical GA | Qwen-3-8B | 2026.05 | 30.3 |
| NPO | Qwen-3-8B | 2026.05 | 30.3 |
| GU+SimNPO | Qwen-3-8B | 2026.05 | 27.7 |
| Global GA | Llama-3.1-8B-... | 2026.05 | 26 |
| SimNPO | Qwen-3-8B | 2026.05 | 25.7 |
| GU+SimNPO | Llama-3.1-8B-... | 2026.05 | 24.7 |
| Global GA | Qwen-3-8B | 2026.05 | 24.7 |
| SimNPO | Llama-3.1-8B-... | 2026.05 | 21 |