Matrix Information Theory for Self-Supervised Learning
About
The maximum entropy encoding framework provides a unified perspective for many non-contrastive learning methods like SimSiam, Barlow Twins, and MEC. Inspired by this framework, we introduce Matrix-SSL, a novel approach that leverages matrix information theory to interpret the maximum entropy encoding loss as matrix uniformity loss. Furthermore, Matrix-SSL enhances the maximum entropy encoding method by seamlessly incorporating matrix alignment loss, directly aligning covariance matrices in different branches. Experimental results reveal that Matrix-SSL outperforms state-of-the-art methods on the ImageNet dataset under linear evaluation settings and on MS-COCO for transfer learning tasks. Specifically, when performing transfer learning tasks on MS-COCO, our method outperforms previous SOTA methods such as MoCo v2 and BYOL up to 3.3% with only 400 epochs compared to 800 epochs pre-training. We also try to introduce representation learning into the language modeling regime by fine-tuning a 7B model using matrix cross-entropy loss, with a margin of 3.1% on the GSM8K dataset over the standard cross-entropy loss. Code available at https://github.com/yifanzhang-pro/Matrix-SSL.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Mathematical Reasoning | MATH | Accuracy30.2 | 535 | |
| Mathematical Reasoning | GSM8K | Accuracy (GSM8K)72.3 | 358 | |
| Image Classification | ImageNet (val) | -- | 300 | |
| Instance Segmentation | MS COCO 2014 2017 (val) | AP Mask @ IoU=0.7538 | 46 | |
| Object Detection | COCO 2014 (val) | AP7544.2 | 9 | |
| Image Classification | ImageNet 1k (train) | Accuracy @ 100 Epochs69.2 | 8 | |
| Semi-Supervised Learning | ImageNet 1% labels (train val) | Top-1 Acc45.158 | 2 | |
| Semi-Supervised Learning | ImageNet 10% labels (train val) | Top-1 Acc63.94 | 2 |