Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Lumbermark: Resistant Clustering by Chopping Up Mutual Reachability Minimum Spanning Trees

About

We introduce Lumbermark, a robust divisive clustering algorithm capable of detecting clusters of varying sizes, densities, and shapes. Lumbermark iteratively chops off large limbs connected by protruding segments of a dataset's mutual reachability minimum spanning tree. The use of mutual reachability distances smoothens the data distribution and decreases the influence of low-density objects, such as noise points between clusters or outliers at their peripheries. The algorithm can be viewed as an alternative to HDBSCAN that produces partitions with user-specified sizes. A fast, easy-to-use implementation of the new method is available in the open-source 'lumbermark' package for Python and R. We show that Lumbermark performs well on benchmark data and hope it will prove useful to data scientists and practitioners across different fields.

Marek Gagolewski• 2026

Related benchmarks

TaskDatasetResultRank
ClusteringRandom datasets Uniform distribution on the unit interval in R^d (test)
Runtime (s)0.01
117
ClusteringBenchmark dataset repository All labels v1.1.0
Cases with AR < 0.8 Count11
14
ClusteringBenchmark dataset repository Third-party labels only v1.1.0
Cases with AR < 0.86
14
Showing 3 of 3 rows

Other info

Follow for update