One-Shot Coresets: The Case of k-Clustering

About

Scaling clustering algorithms to massive data sets is a challenging task. Recently, several successful approaches based on data summarization methods, such as coresets and sketches, have been proposed. While these techniques provide provably good and small summaries, they are inherently problem-dependent: the practitioner has to commit to a fixed clustering objective before even exploring the data. Can one instead construct small data summaries for a wide range of clustering problems simultaneously? In this work, we answer this question in the affirmative by proposing an efficient algorithm that constructs such one-shot summaries for k-clustering problems while retaining strong theoretical guarantees.
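The core idea of a "one-shot" summary is that a single weighted subset can stand in for the full data when evaluating several k-clustering objectives, such as k-median (z = 1) and k-means (z = 2), both of the form cost_z(X, Q) = sum over x of d(x, Q)^z. The sketch below illustrates only that idea and is not the authors' construction: it assumes a generic importance-sampling scheme (half uniform, half proportional to squared distance to the data mean), and the names `clustering_cost` and `sample_summary` are hypothetical. It yields an unbiased cost estimate for any objective, but it does not reproduce the paper's uniform guarantees over all center sets.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def clustering_cost(points: Sequence[Point], centers: Sequence[Point], z: float,
                    weights: Optional[Sequence[float]] = None) -> float:
    """Weighted k-clustering cost: sum_i w_i * min_c d(x_i, c)^z.

    z = 1 gives the k-median objective, z = 2 gives k-means.
    """
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(math.dist(p, c) for c in centers) ** z
               for p, w in zip(points, weights))


def sample_summary(points: Sequence[Point], m: int, seed: int = 0
                   ) -> Tuple[List[Point], List[float]]:
    """Hypothetical sketch: draw m points i.i.d. from
    q(x) = 1/(2n) + d(x, mean)^2 / (2 * sum of squared distances)
    and weight each draw by 1 / (m * q(x)).

    The uniform term keeps q(x) > 0 for every point, so the weighted sum
    is an unbiased estimate of sum_x f(x) for ANY f, including every
    clustering cost above (unbiasedness only, no worst-case bound).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    mean = tuple(sum(p[j] for p in points) / n for j in range(dim))
    sq = [math.dist(p, mean) ** 2 for p in points]
    total = sum(sq)
    q = [0.5 / n + (0.5 * s / total if total > 0 else 0.5 / n) for s in sq]
    idx = rng.choices(range(n), weights=q, k=m)
    return [points[i] for i in idx], [1.0 / (m * q[i]) for i in idx]


if __name__ == "__main__":
    rng = random.Random(1)
    # Three Gaussian blobs of 300 points each.
    data = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in [(0, 0), (8, 8), (0, 8)] for _ in range(300)]
    summary, w = sample_summary(data, m=2000, seed=42)
    centers = [(0.5, 0.5), (7.5, 8.0), (0.0, 7.0)]
    for z in (1, 2):  # one summary, evaluated under two objectives
        exact = clustering_cost(data, centers, z)
        approx = clustering_cost(summary, centers, z, w)
        print(f"z={z}: exact={exact:.1f} summary={approx:.1f}")
```

The summary is built once, without knowing z, and then reused for both objectives; this mirrors the motivation in the abstract, where committing to an objective before exploring the data is what one-shot summaries avoid.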

Olivier Bachem, Mario Lucic, Silvio Lattanzi • 2017

Related benchmarks

Task	Dataset	Metric	Result
Coreset Construction	Crime	Wasserstein Distance	2.01	30
Coreset Construction	drug	Wasserstein Distance	4.16	30
Coreset Construction	German Credit	Wasserstein Distance	0.26	28
Coreset Construction	Adult	Wasserstein Distance	9.37	24
Classification	Credit Dataset (test)	DD	0.07	10
Classification	Drug Dataset (test)	DD	0.16	10
Classification	Crime Dataset (test)	DD	0.45	10
Classification	Adult Dataset (test)	DD	0.12	8
