
Next-state prediction on SciWorld

98.64 EM Accuracy

Llama3.1-8B

Updated 4mo ago

Evaluation Results

Method	Links	EM Accuracy
Llama3.1-8B 2025.12		98.64
Qwen2.5-7B 2025.12		98.6
Claude-sonnet-4.5 2025.12		73.08
Gemini-2.5-flash 2025.12		61.2
Claude-sonnet-4.5 2025.12		56.83
GPT-4o-mini 2025.12		56.26
GPT-4.1 2025.12		51.56
GPT-4-turbo 2025.12		50.08
GPT-5 2025.12		49.44
GPT-4o 2025.12		48.98
GPT-4o 2025.12		45.78
Gemini-2.5-flash 2025.12		44.81
GPT-4o-mini 2025.12		40.68
GPT-4.1 2025.12		35.65
GPT-4-turbo 2025.12		34.14
GPT-5 2025.12		13.06