CN

Benchmarks

Task Name	Dataset Name	SOTA Result
Multi-Agent Reinforcement Learning	CN rac-dist	Mean Episodic Reward: 888	21
Multi-Agent Reinforcement Learning	CN rdist	Mean Episodic Reward: -161	21
Multi-Agent Reinforcement Learning	CN rdete	Mean Episodic Reward: -154	21
Role-Playing Evaluation (Conversational-Naturalness)	CN	Win Rate: 65	9
CN task stream	CN (Medium)	Backward Transfer: 35	8
CN task stream	CN (Expert)	Backward Transfer: 5.83	8
Multi-agent Continual Cooperation	CN Medium	Forward Transfer: 16	7
Multi-agent Continual Cooperation	CN Expert	Forward Transfer: 6.74	7
Cooperative Navigation	CN MPE hard	Mean Episode Reward: 3.37	7
Cooperative Navigation	CN MPE medium	Mean Episode Reward: 3.21	7
Soft Query Answering	CN15k	1P Score: 16.6	6
Backdoor Attack	CN (test)	Runtime (s): 28.3	4
Intent Prediction	CN	Accuracy: 55.2	4
Function Invocation	CN Ver. Dual	Token Usage: 1,377.9	3
Function Invocation	CN (Single)	Invocation Accuracy: 0.89	3
Competitive ratio of ε-LDP online (Wε) vs. ε-LDP offline (Lε) stopping	Cn independent non-negative random variables	Lower Bound Competitive Ratio: 1	1
Competitive ratio of ε-LDP online (Wε) vs. non-private offline (M) stopping	Cn independent non-negative random variables	Lower Bound Competitive Ratio: 2	1
Competitive ratio of ε-LDP online (Wε) vs. non-private online (V) stopping	Cn independent non-negative random variables	Lower Bound Competitive Ratio: 1	1
Competitive ratio of non-private online (V) vs. non-private offline (M) stopping	Cn independent non-negative random variables	Lower Bound Competitive Ratio: 1	1
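Several rows report Backward Transfer and Forward Transfer. As a rough guide to reading them, the sketch below computes these two metrics using the widely used continual-learning definitions from Gradient Episodic Memory (Lopez-Paz and Ranzato, 2017). This is an assumption: the CN task-stream and continual-cooperation benchmarks may use a different sign convention, normalization, or performance measure, so the numbers it produces are not guaranteed to be comparable with the table. The function names and the performance matrix `R` are illustrative, not taken from any benchmark code.

```python
from typing import Sequence


def backward_transfer(R: Sequence[Sequence[float]]) -> float:
    """Average change on earlier tasks after training on the final task.

    Assumed convention (GEM-style): R[i][j] is performance on task j after
    training sequentially on tasks 0..i. Negative values indicate forgetting.
    """
    T = len(R)
    if T < 2:
        raise ValueError("need at least two tasks")
    return sum(R[T - 1][i] - R[i][i] for i in range(T - 1)) / (T - 1)


def forward_transfer(R: Sequence[Sequence[float]], baseline: Sequence[float]) -> float:
    """Average gain on each task before training on it, relative to a baseline.

    baseline[j] is the (hypothetical) performance of an untrained or randomly
    initialized model on task j.
    """
    T = len(R)
    if T < 2:
        raise ValueError("need at least two tasks")
    return sum(R[i - 1][i] - baseline[i] for i in range(1, T)) / (T - 1)


if __name__ == "__main__":
    # Toy 3-task performance matrix, purely illustrative.
    R = [[0.8, 0.3, 0.2],
         [0.7, 0.9, 0.4],
         [0.6, 0.8, 0.95]]
    b = [0.1, 0.2, 0.3]
    print(backward_transfer(R))  # about -0.15
    print(forward_transfer(R, b))  # about 0.1
```

Under this convention a positive Backward Transfer would mean later training improved earlier tasks; since the table reports positive values, the source benchmark may instead measure retained performance or forgetting magnitude.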
