Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

Ultra-Scalable Spectral Clustering and Ensemble Clustering

About

This paper focuses on scalability and robustness of spectral clustering for extremely large-scale datasets with limited resources. Two novel algorithms are proposed, namely, ultra-scalable spectral clustering (U-SPEC) and ultra-scalable ensemble clustering (U-SENC). In U-SPEC, a hybrid representative selection strategy and a fast approximation method for K-nearest representatives are proposed for the construction of a sparse affinity sub-matrix. By interpreting the sparse sub-matrix as a bipartite graph, the transfer cut is then utilized to efficiently partition the graph and obtain the clustering result. In U-SENC, multiple U-SPEC clusterers are further integrated into an ensemble clustering framework to enhance the robustness of U-SPEC while maintaining high efficiency. Based on the ensemble generation via multiple U-SEPC's, a new bipartite graph is constructed between objects and base clusters and then efficiently partitioned to achieve the consensus clustering result. It is noteworthy that both U-SPEC and U-SENC have nearly linear time and space complexity, and are capable of robustly and efficiently partitioning ten-million-level nonlinearly-separable datasets on a PC with 64GB memory. Experiments on various large-scale datasets have demonstrated the scalability and robustness of our algorithms. The MATLAB code and experimental data are available at https://www.researchgate.net/publication/330760669.

Dong Huang, Chang-Dong Wang, Jian-Sheng Wu, Jian-Huang Lai, Chee-Keong Kwoh• 2019

Related benchmarks

TaskDatasetResultRank
ClusteringMNIST (test)--
122
ClusteringMNIST
NMI0.7502
92
ClusteringUSPS
NMI73.89
82
ClusteringUSPS
Accuracy0.7817
36
Spectral ClusteringLetters
Time cost (s)1.44
29
Spectral Clusteringpendigits
Time Cost (s)1.01
29
Spectral ClusteringUSPS
Time (s)1.59
29
Spectral ClusteringMNIST
Time Cost (s)7.48
28
Clusteringpendigits
NMI (%)85.34
26
Spectral ClusteringCovertype
Time Cost (s)14.08
22
Showing 10 of 73 rows
...

Other info

Code

Follow for update