Recovering canonical instrumental variables on Housing → Crime
[Chart: EM and CM scores by date. Plotted point: IV Co-Scientist, EM 0.75, Feb 8, 2026.]
Updated 4d ago
Evaluation Results
| Method | Backbone | Date | EM | CM |
|---|---|---|---|---|
| IV Co-Scientist | GPT-4o | 2026.02 | 0.75 | 0.83 |
| IV Co-Scientist | Llama3.1 70B | 2026.02 | 0.59 | 0.75 |
| IV Co-Scientist | QwQ | 2026.02 | 0.39 | 0.75 |
| IV Co-Scientist | o3-mini | 2026.02 | 0.37 | 0.53 |
| IV Co-Scientist | Llama3.1 8B | 2026.02 | 0.36 | 0.49 |