Identifying flawed instruments on GDP → Conflict
[Chart: HG Score and Critic Score over time; labeled point: IV Co-Scientist (GPT-4o), Feb 8, 2026]
Updated 4d ago
Evaluation Results
Method | Configuration | Date | HG Score | Critic Score
--- | --- | --- | --- | ---
IV Co-Scientist (GPT-4o) | Backbone=GPT-4o, Perso... | 2026.02 | 1 | 0
IV Co-Scientist (o3-mini) | Backbone=o3-mini, Pers... | 2026.02 | 1 | 0
IV Co-Scientist (QwQ) | Backbone=QwQ, Persona=... | 2026.02 | 1 | 0
IV Co-Scientist (Llama3.1 70B) | Backbone=Llama3.1 70B,... | 2026.02 | 1 | 0
IV Co-Scientist (Llama3.1 8B) | Backbone=Llama3.1 8B,... | 2026.02 | 0 | 1