Concept Identifiability on Refusal
[Chart: MCC over time. Plotted point: Linear Probe, MCC 0.9, Feb 14, 2025. Updated 1mo ago.]
Evaluation Results

Method          Model Backbone   Date      MCC
Linear Probe    Gemma-2-2B       2025.02   0.9
SSAE            Gemma-2-2B       2025.02   0.6233
GemmaScope      Gemma-2-2B       2025.02   0.6119
ReLU-SAE        Gemma-2-2B       2025.02   0.6116
JumpReLU SAE    Gemma-2-2B       2025.02   0.5699
TopK-SAE        Gemma-2-2B       2025.02   0.5686