
DynSess-Eval

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Role-playing | DynSess-Eval Human Evaluation 1.0 (test) | Average Score: 3.38 | 10
Evaluator Alignment with Human Judgments | DynSess-Eval 8 personas (test) | Role Consistency Rank Acc: 67 | 7