Instrumental Variable Discovery on Gapminder GDP → Health
[Chart: top Relevance score 14.28 (IV Co-Scientist), Feb 8, 2026. Available metrics: Relevance, Cnorm.]
Evaluation Results
| Method          | Backbone LLM  | Date    | Relevance | Cnorm |
|-----------------|---------------|---------|-----------|-------|
| IV Co-Scientist | GPT-4o        | 2026.02 | 14.28     | 51.5  |
| IV Co-Scientist | o3-mini       | 2026.02 | 14.28     | 51.5  |
| IV Co-Scientist | QwQ           | 2026.02 | 13.1      | 54.1  |
| IV Co-Scientist | Llama3.1 8b   | 2026.02 | 13.1      | 54.1  |
| IV Co-Scientist | Llama3.1 70b  | 2026.02 | 12.65     | 51.3  |