Causal Judgment on BIG-Bench Hard
Best accuracy: 69.5 (GPT-4)
[Chart: Accuracy; Mar 23, 2026]
Evaluation Results
Method                                      Date     Accuracy
GPT-4                                       2026.03  69.5
DeIllusionLLM (Distillation source=GP...)   2026.03  68.98
Qwen2.5-72B                                 2026.03  42.78
DeIllusionLLM (Distillation source=Qw...)   2026.03  41.71
Llama3.3 70B                                2026.03  17.11
Mixtral-8x7B                                2026.03  16.58