Divide-and-conquer based Large-Scale Spectral Clustering
About
Spectral clustering is one of the most popular clustering methods. However, how to balance the efficiency and effectiveness of the large-scale spectral clustering with limited computing resources has not been properly solved for a long time. In this paper, we propose a divide-and-conquer based large-scale spectral clustering method to strike a good balance between efficiency and effectiveness. In the proposed method, a divide-and-conquer based landmark selection algorithm and a novel approximate similarity matrix approach are designed to construct a sparse similarity matrix within low computational complexities. Then clustering results can be computed quickly through a bipartite graph partition process. The proposed method achieves a lower computational complexity than most existing large-scale spectral clustering methods. Experimental results on ten large-scale datasets have demonstrated the efficiency and effectiveness of the proposed method. The MATLAB code of the proposed method and experimental datasets are available at https://github.com/Li-Hongmin/MyPaperWithCode.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Clustering | MNIST (test) | -- | 122 | |
| Spectral Clustering | Letters | Time cost (s)0.9 | 29 | |
| Spectral Clustering | pendigits | Time Cost (s)0.64 | 29 | |
| Spectral Clustering | USPS | Time (s)1.25 | 29 | |
| Spectral Clustering | MNIST | Time Cost (s)5.11 | 28 | |
| Spectral Clustering | Covertype | Time Cost (s)13.15 | 22 | |
| Clustering | USPS (test) | ACC82.55 | 19 | |
| Spectral Clustering | CG-10M | Time Cost (s)281.1 | 11 | |
| Spectral Clustering | Flower-20M | Time Cost (s)837.4 | 11 | |
| Clustering | letters (test) | Accuracy33.54 | 9 |