Role-playing agent evaluation

Benchmarks

Dataset Name	SOTA Method	Metric	Score	Trend
LLM Court 5 legal scenarios 1.0 (test)	(not listed)	QS d BRF Score	92.5	7, 3mo ago
PersonaGym	(not listed)	Action Justification	4.13	4, 2mo ago