
Next-state prediction on TextWorld (TW)

70.6 EM Accuracy

Qwen2.5-7B

Updated 4mo ago

Evaluation Results

Method	Date	EM Accuracy
Qwen2.5-7B	2025.12	70.6
Llama3.1-8B	2025.12	70.45
Claude-sonnet-4.5	2025.12	49.12
GPT-5	2025.12	44.27
Gemini-2.5-flash	2025.12	40.35
Claude-sonnet-4.5	2025.12	17.7
GPT-4o	2025.12	14.11
GPT-4.1	2025.12	13.39
GPT-4-turbo	2025.12	11.66
GPT-4o-mini	2025.12	11.43
GPT-5	2025.12	9.2
GPT-4o	2025.12	7.86
Gemini-2.5-flash	2025.12	3.51
GPT-4o-mini	2025.12	0.36
GPT-4-turbo	2025.12	0
GPT-4.1	2025.12	0