Output-based feature description faithfulness on GPT2 MLP SAE

40.9 Faithfulness Score

EnsembleR (VP+TC)

[Chart: Faithfulness Score over time; y-axis ticks 33.724, 35.587, 37.45, 39.313; x-axis Jan 14, 2025]
Updated 4d ago

Evaluation Results

Date      Faithfulness Score
2025.01   40.9
2025.01   40.3
2025.01   38.3
2025.01   38.1
2025.01   37.2
2025.01   37.1
2025.01   36.5
2025.01   34