Identifying flawed instruments on Protests → Prices
[Chart: HG Score and Critic Score by date. Highlighted point: IV Co-Scientist (GPT-4o), HG Score 1. Date axis: Feb 8, 2026.]
Updated 4d ago
Evaluation Results
| Method | Backbone | Date | HG Score | Critic Score |
| --- | --- | --- | --- | --- |
| IV Co-Scientist (GPT-4o) | GPT-4o | 2026.02 | 1 | 0 |
| IV Co-Scientist (o3-mini) | o3-mini | 2026.02 | 1 | 1 |
| IV Co-Scientist (QwQ) | QwQ | 2026.02 | 1 | 1 |
| IV Co-Scientist (Llama3.1 8B) | Llama3.1 8B | 2026.02 | 1 | 0 |
| IV Co-Scientist (Llama3.1 70B) | Llama3.1 70B | 2026.02 | 1 | 0 |