Causal Variable Identification on HumanEval Exe

71.4 F1 (X)

GPT-o4

Updated 5mo ago

Evaluation Results

Method	Links
GPT-o4 2025.05		71.4	69.2	66	70.3
GPT-5 2025.05		69.3	66.9	63.2	67.4
Llama4-M 2025.05		68.9	66.7	63	67.5
Llama4-S 2025.05		67.8	65.7	61.9	66.3
Qwen3 2025.05		63.7	61.9	51.9	59.7
DeepSeek 2025.05		62.1	59.8	53.6	60.5
Gemini2.5 2025.05		59.6	57.3	49.7	57