Coresets for Relational Data and The Applications

About

A coreset is a small set that approximately preserves the structure of the original input data set, so an algorithm can be run on the coreset instead to reduce the total computational cost. Conventional coreset techniques assume that the input data set is explicitly available for processing. However, this assumption may not hold in real-world scenarios. In this paper, we consider the problem of coreset construction over relational data. Namely, the data is decoupled into several relational tables, and it could be very expensive to directly materialize the data matrix by joining the tables. We propose a novel approach called "aggregation tree with pseudo-cube" that builds a coreset from the bottom up. Moreover, our approach neatly circumvents several troublesome issues of relational learning problems [Khamis et al., PODS 2019]. Under some mild assumptions, we show that our coreset approach can be applied to machine learning tasks such as clustering, logistic regression, and SVM.
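To make the notion of a coreset concrete, the following is a minimal sketch of a weighted coreset for the k-means cost, built by simple importance sampling (a mix of uniform and distance-to-mean probabilities). It is not the paper's "aggregation tree with pseudo-cube" method: it assumes the data matrix is already materialized in memory, which is exactly the situation the paper aims to avoid. The function names `kmeans_cost` and `sample_coreset`, the synthetic data, and the sample size are all illustrative assumptions.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, centers, weights=None):
    """Weighted k-means cost: sum of w * (squared distance to nearest center)."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(
        w * min(sq_dist(p, c) for c in centers)
        for p, w in zip(points, weights)
    )


def sample_coreset(points, m, rng):
    """Draw m weighted points by importance sampling (illustrative, hypothetical helper).

    Each point is sampled with probability
        q_i = 0.5 / n + 0.5 * d_i / sum(d),
    where d_i is its squared distance to the data mean, and receives weight
    1 / (m * q_i) so that the weighted cost is an unbiased estimate of the
    full cost for any fixed set of centers.
    """
    n = len(points)
    dim = len(points[0])
    mean = tuple(sum(p[j] for p in points) / n for j in range(dim))
    d = [sq_dist(p, mean) for p in points]
    total = sum(d)
    if total > 0:
        probs = [0.5 / n + 0.5 * di / total for di in d]
    else:
        # All points coincide; fall back to uniform sampling.
        probs = [1.0 / n] * n
    idx = rng.choices(range(n), weights=probs, k=m)
    coreset = [points[i] for i in idx]
    weights = [1.0 / (m * probs[i]) for i in idx]
    return coreset, weights


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: two Gaussian blobs (an assumption for illustration only).
    data = [
        (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
        for cx, cy in [(0.0, 0.0), (8.0, 8.0)]
        for _ in range(2000)
    ]
    centers = [(0.0, 0.0), (8.0, 8.0)]
    core, w = sample_coreset(data, 400, rng)
    print("full cost   :", kmeans_cost(data, centers))
    print("coreset cost:", kmeans_cost(core, centers, w))
```

Running the script prints the cost on all 4000 points next to the weighted cost on the 400 sampled points, and the two values should be close. In the relational setting described above, the difficulty is that `data` is only defined implicitly as a join of several tables, so a construction like this one, which scans every row, is not directly applicable.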

Jiaxiang Chen, Qingyuan Yang, Ruomin Huang, Hu Ding • 2022

Related benchmarks

Task | Dataset | Result | Rank
Logistic Regression | Q1 | Runtime (s): 208 | 16
Support Vector Machine (SVM) | HOME CREDIT Query 1 1.0 | Runtime (s): 208 | 15
Support Vector Machine optimization | Home Credit Query Q1 (train) | Runtime (s): 208 | 15
Logistic Regression | Q2 | Runtime (s): 228 | 10
SVM Classification | Home Credit Q2 | Runtime (s): 228 | 10
