Large Language Models Enable Few-Shot Clustering
About
Unlike traditional unsupervised clustering, semi-supervised clustering allows users to provide meaningful structure to the data, which helps the clustering algorithm match the user's intent. Existing approaches to semi-supervised clustering require a significant amount of feedback from an expert to improve the clusters. In this paper, we ask whether a large language model (LLM) can amplify an expert's guidance to enable query-efficient, few-shot semi-supervised text clustering. We show that LLMs are surprisingly effective at improving clustering. We explore three stages where LLMs can be incorporated into clustering: before clustering (improving input features), during clustering (providing constraints to the clusterer), and after clustering (using LLMs for post-correction). We find that incorporating LLMs in the first two stages routinely yields significant improvements in cluster quality, and that LLMs let a user trade off cost against accuracy to produce the desired clusters. We release our code and LLM prompts for public use.
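The "before clustering" stage can be sketched in miniature: an LLM expands each text with intent keyphrases, and the enriched representation is then clustered. Everything below is illustrative, not the paper's actual implementation; the LLM call is replaced by a toy keyword lexicon, and a simple greedy cosine-similarity grouping stands in for a real clusterer such as k-means over embeddings.

```python
# Hedged sketch of LLM-enriched features for text clustering.
# `llm_keyphrases` is a hypothetical stand-in for prompting an LLM with
# "Generate keyphrases describing the intent of this utterance."
from collections import Counter
import math

def llm_keyphrases(text):
    # Toy lexicon simulating LLM-generated keyphrases (assumption, not real data).
    lexicon = {
        "card": ["banking", "payment"],
        "transfer": ["banking", "money-movement"],
        "weather": ["forecast", "smalltalk"],
        "rain": ["forecast", "smalltalk"],
    }
    phrases = []
    for word in text.lower().split():
        phrases.extend(lexicon.get(word, []))
    return phrases

def featurize(text):
    # Bag-of-words over the original tokens plus the LLM keyphrases.
    return Counter(text.lower().split() + llm_keyphrases(text))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def greedy_cluster(texts, threshold=0.25):
    # Assign each text to the first cluster whose representative it resembles;
    # a stand-in for k-means over the enriched feature space.
    clusters = []  # list of (representative_features, member_indices)
    for i, text in enumerate(texts):
        feats = featurize(text)
        for rep, members in clusters:
            if cosine(feats, rep) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((feats, [i]))
    return [members for _, members in clusters]

texts = [
    "my card was declined",
    "transfer money to my card",
    "will it rain tomorrow",
    "what is the weather like",
]
print(greedy_cluster(texts))  # the keyphrases pull the two banking and two weather texts together
```

Without the `llm_keyphrases` expansion, the last two utterances share no surface tokens and would land in separate clusters; the LLM-added features are what link them, which is the intuition behind the paper's first stage.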
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Intent Classification | Banking77 (test) | Accuracy | 72 | 151 |
| Short Text Clustering | Tweet | Accuracy | 61.8 | 28 |
| Short Text Clustering | Clinc150 (test) | NMI | 92.6 | 23 |
| Short Text Clustering | Bank 77 (test) | NMI | 82.4 | 22 |
| Clustering | Bank77 | NMI | 83.4 | 19 |
| Clustering | CLINC | Accuracy | 84.1 | 15 |
| Dialogue Intent Clustering | Chinese dialogue intent dataset (test) | NMI Gain | 5.97 | 12 |
| Retrieval Judgment | RAL2M covidqa, expertqa, hagrid, hotpotqa, msmarco (test) | Accuracy | 58.3 | 10 |
| Clustering | Adobe Lightroom (test) | Accuracy | 71.5 | 9 |
| Clustering | OpenAI Codex (test) | Accuracy | 40.3 | 9 |