
AXBENCH

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Concept-based Steering | AXBENCH (test) | Overall Steering Score | 1.102 | 28 |
| Concept Steering | AxBench (Held-in) | HMean | 1.185 | 25 |
| LLM-judge evaluation | AXBENCH | Concept Score | 92.5 | 22 |
| LLM Steering | AxBench | Steering Score | 0.74 | 18 |
| Activation Steering | AxBench Gemma-2-2B layer 20 | Steering Score | 0.871 | 18 |
| Activation Steering | AxBench Gemma-2-9B layer 20 | Steering Score | 1.12 | 17 |
| Concept Steering | AXBENCH D_L20^G9B | Steering Score | 1.079 | 12 |
| Latent Concept Detection | AxBench full 500 concepts | Mean AUROC | 96.5 | 9 |
| Concept Steering | AXBENCH D_L10^G2B | Steering Score | 0.803 | 9 |
| Concept Steering | AXBENCH D_L32^Q32B | Steering Score | 1.102 | 7 |
| Concept Steering | AxBench (Held-out) | HMean | 1.113 | 6 |