HuggingGPT Human Evaluation Set

Benchmarks

Task Name	Dataset Name	SOTA Result	Trend
Task Planning	HuggingGPT Human Evaluation Set (130 diverse requests, test)	Passing Rate: 0.9122	3
Model Selection	HuggingGPT Human Evaluation Set (130 diverse requests, test)	Passing Rate: 93.89%	1
