Causal Judgment on BIG-Bench Hard

Accuracy: 69.5

GPT-4

[Chart: accuracy by date (Mar 23, 2026)]
Updated 25d ago

Evaluation Results

Method | Links
2026.03
69.5
2026.03
68.98
42.78
2026.03
41.71
17.11
16.58