Open-Ended Dialogue (out-of-distribution)

MT-Bench: 68.3

CausalRM

Updated 5mo ago

Evaluation Results

Method	Links	MT-Bench
CausalRM 2026.01		68.3	60.9	53.9	66.2	62.3
Standard RM 2026.01		68.2	57.8	54.2	58.5	59.7
InfoRM 2026.01		66.7	60.1	50.4	61.9	59.8
GoalRM 2026.01		66.2	59.3	53.8	63.6	60.7