
Multiple-Choice Question Answering on Mechanistic Interpretability Benchmark (MIB) MCQA (standard)

0.04 CMD

EAP-IG-inputs

[Chart: y-axis ticks 0.0364, 0.0607, 0.085, 0.1093; x-axis Feb 10, 2026]
Updated 3mo ago

Evaluation Results

Method  Links
2026.02  0.04  0.96
2026.02  0.0595
2026.02  0.0595
2026.02  0.05  0.95
2026.02  0.0694
2026.02  0.07  0.93
2026.02  0.0992
2026.02  0.13  0.87
2026.02  0.13  0.87