TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation
About
Pixel-wise image segmentation is a demanding task in computer vision. Classical U-Net architectures, composed of an encoder and a decoder, are widely used for segmenting medical images, satellite images, and similar data. Typically, a neural network initialized with weights from a network pre-trained on a large dataset such as ImageNet performs better than one trained from scratch on a small dataset. In some practical applications, particularly in medicine and traffic safety, the accuracy of the models is of utmost importance. In this paper, we demonstrate how a U-Net-type architecture can be improved by using a pre-trained encoder. Our code and the corresponding pre-trained weights are publicly available at https://github.com/ternaus/TernausNet. We compare three weight initialization schemes: LeCun uniform, an encoder initialized with weights from VGG11, and the full network trained on the Carvana dataset. This network architecture was part of the winning solution (1st out of 735) in the Kaggle Carvana Image Masking Challenge.
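For readers unfamiliar with the baseline scheme named in the abstract, the following is a minimal sketch of LeCun uniform initialization in plain Python. It assumes the common definition in which weights are drawn from U(-limit, limit) with limit = sqrt(3 / fan_in), which gives each weight a variance of 1 / fan_in. The function name `lecun_uniform` and the layer sizes are illustrative and are not taken from the TernausNet code.

```python
import math
import random


def lecun_uniform(fan_in, fan_out, rng=None):
    """Return a fan_out x fan_in weight matrix drawn from U(-limit, limit).

    Assumes the usual LeCun uniform definition: limit = sqrt(3 / fan_in),
    so that each weight has variance 1 / fan_in.
    """
    rng = rng or random.Random()
    limit = math.sqrt(3.0 / fan_in)
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]


# Hypothetical example: a 3x3 convolution with 64 input channels has
# fan_in = 64 * 3 * 3 = 576, so the sampling limit is about 0.072.
fan_in = 64 * 3 * 3
weights = lecun_uniform(fan_in, fan_out=8, rng=random.Random(0))
limit = math.sqrt(3.0 / fan_in)
```

In the comparison described above, this kind of random initialization is the from-scratch baseline; the other two schemes replace it with weights copied from a pre-trained network.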
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Surgical Instrument Segmentation | EndoVis 2018 (test) | Ch_IoU | 46.22 | 32 |
| Surgical Instrument Segmentation | EndoVis 2017 (test) | mIoU | 33.78 | 22 |
| Surgical Tool Segmentation | CaDIS (test) | IoU (m) | 46.47 | 7 |
| Tool Segmentation | Sankara-MSICS (test) | mIoU | 42.76 | 6 |
| Surgical Instrument Segmentation | Endovis to Surgery Case 1 | Mean Dice (Domain A) | 94.2 | 5 |
| Surgical Instrument Segmentation | UCL to Surgery Case 2 (Domain A: UCL, Domain B: Surgery) | Dice (Domain A) | 95.8 | 5 |
| Surgical Instrument Segmentation | Endovis to UCL Case 3 | Mean Dice (Domain A) | 93.3 | 5 |
| Surgical Instrument Segmentation | UCL to Endovis Case 4 (Domain A: UCL, Domain B: Endovis) | Mean Dice (Domain A) | 93.4 | 5 |
| Instrument Part Segmentation | EndoVis 2018 (test) | mDice (%) | 61.78 | 5 |
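
The benchmarks above report variants of IoU and Dice. As a rough illustration of the two underlying quantities, the sketch below computes them for a single pair of binary masks using the standard definitions IoU = |A ∩ B| / |A ∪ B| and Dice = 2|A ∩ B| / (|A| + |B|). The function name `iou_and_dice`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for this example; the per-class or per-image averaging behind each benchmark (Ch_IoU, mIoU, mDice) is not specified here and is not reproduced.

```python
def iou_and_dice(pred, target):
    """Compute (IoU, Dice) for two binary masks given as nested lists of 0/1."""
    pred_flat = [int(bool(v)) for row in pred for v in row]
    target_flat = [int(bool(v)) for row in target for v in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(p & t for p, t in zip(pred_flat, target_flat))
    pred_area = sum(pred_flat)
    target_area = sum(target_flat)
    union = pred_area + target_area - intersection

    # Assumed convention: two empty masks count as a perfect match.
    if union == 0:
        return 1.0, 1.0

    iou = intersection / union
    dice = 2 * intersection / (pred_area + target_area)
    return iou, dice


# Toy 2x3 masks: 2 overlapping pixels, 3 predicted, 3 in the ground truth.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou, dice = iou_and_dice(pred, target)  # IoU = 2/4, Dice = 4/6
```

For a single mask pair the two scores are related by Dice = 2 · IoU / (1 + IoU), so Dice is never lower than IoU; the table's values appear to be these scores expressed as percentages.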