Scientific Reasoning on ScienceWorld

75.9 Success Rate

CVT-RL

Updated 1mo ago

Evaluation Results

Method	Success Rate
CVT-RL 2026.06	75.9	-	-	-
Information-matched CF process 2026.06	72.7	-	-	-
Compute-matched non-causal 2026.06	69.1	-	-	-
Constrained process RL 2026.06	66.7	-	-	-
RLVMR 2026.06	66.4	-	-	-
BEACON 2026.05	64.3	-	-	83.7
LongRLVR 2026.06	63.2	-	-	-
Q-RAG 2026.06	62.1	-	-	-
TROLL-style 2026.06	59.6	-	-	-
PPO-RLVR 2026.06	54.3	-	-	-
GiGPO 2026.05	53.4	-	-	69.2
GRPO 2026.05	49.1	-	-	61.8
GPT-4o (ReAct) 2026.05	45.4	-	-	54.3
BEACON 2026.05	45.3	-	-	58.9
SFT 2026.06	42.7	-	-	-
Gemini-2.5-Pro (ReAct) 2026.05	36.7	-	-	47.8
GiGPO 2026.05	25.8	-	-	35.6
PPO 2026.05	24	-	-	37.1
GRPO 2026.05	21.1	-	-	31.7
Reflexion 2026.05	11.7	-	-	23.4
PPO 2026.05	10.9	-	-	29.3
ReAct 2026.05	7.8	-	-	17.4
Direct Prompt 2026.05	4.2	-	-	11.4
Reflexion 2026.05	3.9	-	-	7.1
ReAct 2026.05	1.2	-	-	9
Direct Prompt 2026.05	0.7	-	-	5.9
SFT 2025.11	-	64.5	56.1	-
ETO 2025.11	-	72.6	65.3	-
Co-Evolving Agents 2025.11	-	74.5	65.5	-