Bridged Clustering: Semi-Supervised Sparse Bridging
About
We introduce Bridged Clustering, a semi-supervised framework for learning predictors from unpaired input ($X$) and output ($Y$) datasets. Our method first clusters $X$ and $Y$ independently, then learns a sparse, interpretable bridge between the two sets of clusters using only a few paired examples. At inference, a new input $x$ is assigned to its nearest input cluster, and the centroid of the linked output cluster is returned as the prediction $\hat{y}$. Unlike traditional semi-supervised learning (SSL), Bridged Clustering explicitly leverages output-only data, and unlike dense transport-based methods, it maintains a sparse and interpretable alignment. Through theoretical analysis, we show that when the mis-clustering and mis-bridging rates are bounded, our algorithm is an effective and efficient predictor. Empirically, our method is competitive with state-of-the-art methods while remaining simple, model-agnostic, and highly label-efficient in low-supervision settings.
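The three steps described in the abstract (cluster each side independently, bridge clusters with a few pairs, predict via the linked centroid) can be illustrated with a minimal sketch. The abstract does not fix a clustering algorithm, a distance, or a bridging rule, so everything specific here is an assumption made for illustration: plain k-means with Euclidean distance, a majority-vote bridge, and the hypothetical helper names `kmeans`, `learn_bridge`, and `predict`. It is not the authors' implementation.

```python
import math
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance, assumed)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means (an assumed choice of clusterer).

    Initialised from the first k points so the toy example is deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the previous centroid if a cluster received no points.
        centroids = [mean(g) if g else centroids[j] for j, g in enumerate(groups)]
    return centroids


def learn_bridge(pairs, x_centroids, y_centroids):
    """Sparse bridge: each input cluster links to exactly one output cluster,
    chosen by majority vote over the few paired examples (assumed rule)."""
    votes = {}
    for x, y in pairs:
        i = nearest(x, x_centroids)
        votes.setdefault(i, Counter())[nearest(y, y_centroids)] += 1
    return {i: counts.most_common(1)[0][0] for i, counts in votes.items()}


def predict(x, x_centroids, y_centroids, bridge):
    """Assign x to its nearest input cluster and return the linked output centroid.

    Raises KeyError if that input cluster was never seen in a paired example.
    """
    return y_centroids[bridge[nearest(x, x_centroids)]]


# Toy data: unpaired inputs and outputs, plus two paired examples.
x_data = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
y_data = [(100.0,), (-100.0,), (101.0,), (99.0,), (-101.0,), (-99.0,)]
pairs = [((0.5, 0.5), (-100.0,)), ((10.5, 10.5), (100.0,))]

x_centroids = kmeans(x_data, k=2)
y_centroids = kmeans(y_data, k=2)
bridge = learn_bridge(pairs, x_centroids, y_centroids)

print(bridge)
print(predict((1, 1), x_centroids, y_centroids, bridge))
```

In this sketch the bridge is a dictionary with one entry per input cluster, so the alignment can be read off directly; that is the sense in which it is sparse, in contrast to a dense coupling matrix between all input and output points.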
Related benchmarks
| Task | Dataset | Win rate | Rank |
|---|---|---|---|
| Cluster Mapping | Flickr30k rev. (inductive) | 60 | 7 |
| Cluster Mapping | Flickr30k (inductive) | 45 | 7 |
| Bridged Clustering | Flickr30k standard (transductive) | 56 | 6 |
| Bridged Clustering | WIT standard (transductive) | 12 | 6 |
| Bridged Clustering | WIT reversed input/output mapping (transductive) | 9 | 6 |
| Cluster Mapping | WIT (inductive) | 12 | 6 |
| Cluster Mapping | WIT rev. (inductive) | 9 | 6 |
| Bridged Clustering | BIOSCAN standard (transductive) | 67 | 5 |
| Bridged Clustering | Flickr30k reversed input/output mapping (transductive) | 71 | 5 |
| Cluster Mapping | BIOSCAN (inductive) | 67 | 5 |