Divide-and-conquer based Large-Scale Spectral Clustering

About

Spectral clustering is one of the most popular clustering methods. However, how to balance the efficiency and effectiveness of the large-scale spectral clustering with limited computing resources has not been properly solved for a long time. In this paper, we propose a divide-and-conquer based large-scale spectral clustering method to strike a good balance between efficiency and effectiveness. In the proposed method, a divide-and-conquer based landmark selection algorithm and a novel approximate similarity matrix approach are designed to construct a sparse similarity matrix within low computational complexities. Then clustering results can be computed quickly through a bipartite graph partition process. The proposed method achieves a lower computational complexity than most existing large-scale spectral clustering methods. Experimental results on ten large-scale datasets have demonstrated the efficiency and effectiveness of the proposed method. The MATLAB code of the proposed method and experimental datasets are available at https://github.com/Li-Hongmin/MyPaperWithCode.

Hongmin Li, Xiucai Ye, Akira Imakura, Tetsuya Sakurai• 2021

Related benchmarks

Task	Dataset	Result
Clustering	MNIST (test)	--	136
Spectral Clustering	Letters	Time cost (s)0.9	29
Spectral Clustering	pendigits	Time Cost (s)0.64	29
Spectral Clustering	USPS	Time (s)1.25	29
Spectral Clustering	MNIST	Time Cost (s)5.11	28
Spectral Clustering	Covertype	Time Cost (s)13.15	22
Clustering	USPS (test)	ACC82.55	19
Spectral Clustering	CG-10M	Time Cost (s)281.1	11
Spectral Clustering	Flower-20M	Time Cost (s)837.4	11
Clustering	letters (test)	Accuracy33.54	9

Showing 10 of 30 rows

Other info

Code

Follow for update

@wizwand_team Discord