
CNAK : Cluster Number Assisted K-means

About

Determining the number of clusters present in a dataset is an important problem in cluster analysis. Conventional clustering techniques generally assume that this parameter is supplied up front by the user. Recently, the robustness of a given clustering algorithm has been analyzed to measure cluster stability or instability, which in turn determines the cluster number. In this paper, we propose a method that analyzes cluster stability to predict the cluster number. Within the same computational framework, the technique also finds representatives of the clusters. The method is suited to big data because the algorithm is designed around Monte Carlo simulation. We also explore a few pertinent issues in clustering. Experiments reveal that the proposed method is capable of identifying a single cluster. It is robust on high-dimensional datasets and performs reasonably well on datasets with cluster imbalance. Moreover, it can indicate a cluster hierarchy, if one is present. Overall, we observe significant improvement in speed and quality when predicting the cluster number as well as the composition of clusters in a large dataset.

Jayasree Saha, Jayanta Mukherjee • 2019
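
The abstract describes predicting the cluster number by analyzing cluster stability under Monte Carlo sampling, but it does not spell out the procedure. The sketch below is therefore not the authors' CNAK algorithm; it is a generic stability heuristic written under stated assumptions: for each candidate k, k-means is run on pairs of random samples, the two centroid sets are matched one-to-one, and the k with the smallest average matched distance is reported. All function names (`kmeans`, `matching_cost`, `instability`, `estimate_k`, `make_blobs`), the farthest-first initialisation, the sample fraction, and the trial count are illustrative choices, not taken from the paper.

```python
import itertools
import math
import random


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means with farthest-first initialisation (an assumption)."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest already-chosen seed.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def matching_cost(a, b):
    """Mean centroid distance under the best one-to-one matching (brute force, small k only)."""
    k = len(a)
    return min(
        sum(math.dist(a[i], b[j]) for i, j in enumerate(perm))
        for perm in itertools.permutations(range(k))
    ) / k


def instability(points, k, rng, trials=20, fraction=0.5):
    """Average matching cost between k-means runs on pairs of random samples."""
    m = max(k, int(len(points) * fraction))
    total = 0.0
    for _ in range(trials):
        first = kmeans(rng.sample(points, m), k, rng)
        second = kmeans(rng.sample(points, m), k, rng)
        total += matching_cost(first, second)
    return total / trials


def estimate_k(points, k_values, seed=0):
    """Return the candidate k with the lowest instability, plus all scores."""
    rng = random.Random(seed)
    scores = {k: instability(points, k, rng) for k in k_values}
    return min(scores, key=scores.get), scores


def make_blobs(centres, per_blob, spread, seed=0):
    """Synthetic 2-D Gaussian blobs, used only to exercise the sketch."""
    rng = random.Random(seed)
    return [
        (rng.gauss(cx, spread), rng.gauss(cy, spread))
        for cx, cy in centres
        for _ in range(per_blob)
    ]


if __name__ == "__main__":
    data = make_blobs([(0, 0), (10, 0), (0, 10)], per_blob=60, spread=0.5)
    best_k, scores = estimate_k(data, range(1, 6))
    print(best_k, scores)
```

Because each run touches only a random sample, the cost per candidate k depends on the sample size and trial count rather than on the full dataset, which is the usual motivation for a sampling-based design. The raw distance score used here is scale-dependent, so comparing it across very different values of k would likely need normalisation in practice.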

Related benchmarks

Task                       Dataset          Result                               Rank
Cluster number prediction  S2               Cluster Count (NC): 15               15
Cluster number prediction  Unbalanced       Number of Clusters (NC): 3           6
Cluster number prediction  Jain             Number of Clusters (NC): 3           6
Cluster number prediction  A1               NC: 20                               6
Cluster number prediction  Asymmetric       NC: 5                                6
Cluster number prediction  COVID-19         Number of Clusters (NC): 3           6
Cluster number prediction  MNIST            Number of Clusters (NC): 10          6
Cluster number prediction  Optical          NC: 10                               6
Cluster number prediction  pendigits        Normalized Clustering (NC): 11       6
Cluster number prediction  Multi-objective  Number of Clusters Score (NC): 3     6
Showing 10 of 17 rows
