Knowledge Retention on MUSE utility
Top result: LUNAR, Util Score 18.4
[Chart: Util Score over time; x-axis tick at May 14, 2026]
Updated 19d ago
Evaluation Results
| Method | Backbone | Date | Util Score |
| --- | --- | --- | --- |
| LUNAR | Llama-3.1-8B-... | 2026.05 | 18.4 |
| LUNAR | Qwen-3-8B | 2026.05 | 15.9 |
| Surgical GA | Qwen-3-8B | 2026.05 | 2 |
| MANSU | Qwen-3-8B | 2026.05 | 1.9 |
| NPO | Llama-3.1-8B-... | 2026.05 | 1.3 |
| NPO | Qwen-3-8B | 2026.05 | 1.1 |
| Global GA | Qwen-3-8B | 2026.05 | 0.7 |
| MANSU | Llama-3.1-8B-... | 2026.05 | 0.6 |
| GU+SimNPO | Qwen-3-8B | 2026.05 | 0.5 |
| Global GA | Llama-3.1-8B-... | 2026.05 | 0 |
| Surgical GA | Llama-3.1-8B-... | 2026.05 | 0 |
| SimNPO | Llama-3.1-8B-... | 2026.05 | 0 |
| GU+SimNPO | Llama-3.1-8B-... | 2026.05 | 0 |
| SimNPO | Qwen-3-8B | 2026.05 | 0 |