Concept Alignment on RepE (test)
[Chart: HARM over time for Self Modulation (Jan 2, 2025); selectable metrics: HARM, FAIR, HAPPY, FEAR]
Updated 3mo ago
Evaluation Results
| Method | Setting | Date | HARM | FAIR | HAPPY | FEAR |
|---|---|---|---|---|---|---|
| Self Modulation | Target LLM (mt)=Llama3... | 2025.01 | 100 | 36 | 8.74 | 9.28 |
| Self Modulation | Target LLM (mt)=Llama2... | 2025.01 | 96 | 56 | 8.52 | 7.26 |
| L-Cross Modulation | Target LLM (mt)=Llama2... | 2025.01 | 96 | 64 | 9.16 | 8.84 |
| L-Cross Modulation | Target LLM (mt)=Llama2... | 2025.01 | 96 | 54 | 8.92 | 7.96 |
| Self Modulation | Target LLM (mt)=Qwen2,... | 2025.01 | 88 | 50 | 7.04 | 6.26 |
| No Modulation | Target LLM (mt)=Llama2... | 2025.01 | 0 | 98 | 5.56 | 5.74 |