Multiple-Choice Question Answering on Mechanistic Interpretability Benchmark (MIB) MCQA (standard)
[Chart: CMD by date. Highlighted point: EAP-IG-inputs, CMD 0.04, Feb 10, 2026. Available metrics: CMD, CPR.]
Updated 3mo ago
Evaluation Results
| Method | Model | Date | CMD | CPR |
|---|---|---|---|---|
| EAP-IG-inputs | OPT-1.3B | 2026.02 | 0.04 | 0.96 |
| EAP-IG-inputs | Qwen2.5-0.5B | 2026.02 | 0.05 | 0.95 |
| EAP-IG-inputs | Llama3.2-1B | 2026.02 | 0.05 | 0.95 |
| EAP | OPT-1.3B | 2026.02 | 0.05 | 0.95 |
| EAP | Qwen2.5-0.5B | 2026.02 | 0.06 | 0.94 |
| Circuit Fingerprint | OPT-1.3B | 2026.02 | 0.07 | 0.93 |
| Circuit Fingerprint | Qwen2.5-0.5B | 2026.02 | 0.09 | 0.92 |
| EAP | Llama3.2-1B | 2026.02 | 0.13 | 0.87 |
| Circuit Fingerprint | Llama3.2-1B | 2026.02 | 0.13 | 0.87 |