
Supporting Clustering with Contrastive Learning

About

Unsupervised clustering aims at discovering the semantic categories of data according to some distance measured in the representation space. However, different categories often overlap with each other in the representation space at the beginning of the learning process, which poses a significant challenge for distance-based clustering in achieving good separation between different categories. To this end, we propose Supporting Clustering with Contrastive Learning (SCCL) -- a novel framework to leverage contrastive learning to promote better separation. We assess the performance of SCCL on short text clustering and show that SCCL significantly advances the state-of-the-art results on most benchmark datasets with 3%-11% improvement on Accuracy and 4%-15% improvement on Normalized Mutual Information. Furthermore, our quantitative analysis demonstrates the effectiveness of SCCL in leveraging the strengths of both bottom-up instance discrimination and top-down clustering to achieve better intra-cluster and inter-cluster distances when evaluated with the ground truth cluster labels.
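The abstract names bottom-up instance discrimination as one of the two ingredients but does not spell out a loss. As a rough illustration only, the sketch below computes a generic InfoNCE-style contrastive loss over paired views in plain Python. The function names, the cosine similarity, and the `temperature` default are assumptions made for this example and are not taken from the paper; the actual SCCL objective may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def instance_contrastive_loss(view_a, view_b, temperature=0.5):
    """Generic InfoNCE-style loss (an illustrative assumption, not the paper's exact loss).

    view_a[i] and view_b[i] are embeddings of two augmented views of instance i.
    Each embedding is pulled toward its paired view and pushed away from
    every other embedding in the batch.
    """
    embeddings = list(view_a) + list(view_b)
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same instance
        # Temperature-scaled similarities to every embedding except itself.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(2 * n)
            if j != i
        }
        log_denominator = math.log(sum(math.exp(s) for s in logits.values()))
        total += log_denominator - logits[positive]
    return total / (2 * n)


# Toy batch: two instances, two views each.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]
aligned_loss = instance_contrastive_loss(view_a, view_b)
mismatched_loss = instance_contrastive_loss(view_a, view_b[::-1])
```

In this toy batch the loss is lower when each embedding is paired with its own second view than when the pairs are swapped, which is the behavior that pushes instances apart in the representation space.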

Dejiao Zhang, Feng Nan, Xiaokai Wei, Shangwen Li, Henghui Zhu, Kathleen McKeown, Ramesh Nallapati, Andrew Arnold, Bing Xiang • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| New Intent Discovery | BANKING | NMI 63.43 | 76 |
| New Intent Discovery | M-CID | NMI 55.18 | 75 |
| Short Text Clustering | SearchSnippets | Accuracy 85.2 | 38 |
| Short Text Clustering | AGNews | ACC 88.2 | 38 |
| Short Text Clustering | StackOverflow | Accuracy 75.5 | 38 |
| Short Text Clustering | Tweet | Accuracy 78.2 | 28 |
| New Intent Discovery | StackOverflow | NMI 68.69 | 27 |
| Event Detection | MAVEN (test) | F1 Score 24.24 | 26 |
| New Intent Discovery | CLINC | NMI 79.14 | 20 |
| Clustering | Bank77 | NMI 81.8 | 19 |
Showing 10 of 23 rows
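Most rows above report clustering Accuracy or Normalized Mutual Information (NMI). The sketch below shows one common way to compute both from ground-truth and predicted cluster labels, using only the Python standard library. Two assumptions are made here that the page does not confirm: NMI is normalized by the arithmetic mean of the two entropies (other normalizations exist), and accuracy uses the best one-to-one mapping between predicted clusters and true classes, found by brute force rather than the Hungarian algorithm, so it is only practical for a handful of clusters. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def normalized_mutual_information(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization (an assumed variant)."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    # Mutual information between the two labelings.
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in count_joint.items()
    )

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    mean_entropy = (entropy(count_true) + entropy(count_pred)) / 2
    # Both labelings are a single cluster: treat them as identical.
    return mutual_info / mean_entropy if mean_entropy > 0 else 1.0


def clustering_accuracy(labels_true, labels_pred):
    """Accuracy under the best one-to-one cluster-to-class mapping (brute force)."""
    true_ids = sorted(set(labels_true))
    pred_ids = sorted(set(labels_pred))
    # Pad the shorter side so every permutation is a full one-to-one mapping.
    size = max(len(true_ids), len(pred_ids))
    true_ids += [None] * (size - len(true_ids))
    pred_ids += [None] * (size - len(pred_ids))
    count_joint = Counter(zip(labels_true, labels_pred))
    best_matches = max(
        sum(count_joint[(t, p)] for t, p in zip(perm, pred_ids))
        for perm in permutations(true_ids)
    )
    return best_matches / len(labels_true)


# Toy example: predicted cluster ids are a relabeling of the true classes.
labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [1, 1, 2, 2, 0, 0]
accuracy = clustering_accuracy(labels_true, labels_pred)
nmi = normalized_mutual_information(labels_true, labels_pred)
```

Both functions return values in the range 0 to 1, whereas the table lists scores on a 0 to 100 scale. Because cluster ids are arbitrary, a prediction that merely renames the true classes scores perfectly on both metrics.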
