Evaluator Alignment with Human Judgments on DynSess-Eval 8 personas (test)
[Chart: Role Consistency Rank Acc by method over time; top result DynSess-Eval at 67 (May 28, 2026)]
Evaluation Results
| Method | Eval Type | Date | Role Consistency Rank Acc | Interactive Ability Rank Acc | Interactive Ability NMAE | Human-Likeness Rank Acc | Human-Likeness NMAE | Role Consistency NMAE | Contextual Coherence Rank Acc | Contextual Coherence NMAE |
|---|---|---|---|---|---|---|---|---|---|---|
| DynSess-Eval | Session-Level | 2026.05 | 67 | 83 | 26 | 77 | 27 | 33 | 73 | 22 |
| DynSess-Eval (w/o Rubric-Anchored) | Session-Leve... | 2026.05 | 63 | 73 | 49 | 67 | 59 | 44 | 70 | 59 |
| CharacterRM | Turn-Level | 2026.05 | 57 | - | - | 50 | 37 | 37 | 50 | 51 |
| CharacterArena | Trajectory Rank | 2026.05 | 57 | 53 | - | - | - | - | 60 | - |
| CharacterJudge | Turn-Level | 2026.05 | 53 | 27 | 51 | 40 | 45 | 41 | 57 | 54 |
| RMTBench | Pseudo-Session | 2026.05 | 50 | 33 | 53 | 57 | 39 | 41 | 53 | 50 |
| DynSess-Eval (w/o Session-Level eval) | Turn-Level,... | 2026.05 | 43 | 20 | 52 | 53 | 67 | 45 | 63 | 67 |