
Generalized SAM: Efficient Fine-Tuning of SAM for Variable Input Image Sizes

About

There has been much recent research on improving the efficiency of fine-tuning foundation models. In this paper, we propose a novel efficient fine-tuning method that allows the input image size of the Segment Anything Model (SAM) to be variable. SAM is a powerful foundation model for image segmentation trained on huge datasets, but it requires fine-tuning to recognize arbitrary classes. The input image size of SAM is fixed at 1024×1024, resulting in substantial computational demands during training. Furthermore, the fixed input image size may cause a loss of image information, e.g., due to the fixed aspect ratio. To address this problem, we propose Generalized SAM (GSAM). Unlike previous methods, GSAM is the first to apply random cropping during training with SAM, thereby significantly reducing the computational cost of training. Experiments on datasets of various types and pixel counts show that GSAM trains more efficiently than SAM and other SAM fine-tuning methods, while achieving comparable or higher accuracy.
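The core idea described above, jointly random-cropping an image and its segmentation mask so that training runs on smaller, variable-sized inputs instead of the fixed 1024×1024 resolution, can be sketched as follows. This is an illustrative sketch only, not the paper's actual GSAM implementation; the function name and parameters are hypothetical.

```python
import numpy as np

def random_crop_pair(image, mask, crop_hw, rng=None):
    """Jointly crop an image and its segmentation mask to crop_hw.

    Illustrative sketch of the random-cropping idea from the abstract
    (hypothetical helper, not the paper's actual training code).
    image: (H, W, C) array; mask: (H, W) array; crop_hw: (ch, cw).
    """
    rng = rng if rng is not None else np.random.default_rng()
    h, w = image.shape[:2]
    ch, cw = crop_hw
    if ch > h or cw > w:
        raise ValueError("crop size exceeds image size")
    # Pick a random top-left corner so the crop stays inside the image.
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    # Apply the identical window to image and mask to keep them aligned.
    return (image[top:top + ch, left:left + cw],
            mask[top:top + ch, left:left + cw])
```

Because each crop is much smaller than 1024×1024, a training step touches far fewer pixels; a model whose encoder accepts variable input sizes can then be fine-tuned directly on these crops.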

Sota Kato, Hinako Mitsuoka, Kazuhiro Hotta · 2024

Related benchmarks

| Task | Dataset | mIoU | Rank |
| --- | --- | --- | --- |
| Semantic segmentation | Cityscapes | 76.25 | 218 |
| Semantic segmentation | CamVid | 73.99 | 70 |
| Semantic segmentation | ISBI 2012 | 80.53 | 13 |
| Semantic segmentation | Kvasir-SEG | 88.76 | 13 |
| Semantic segmentation | M-Building | 80.69 | 9 |
| Semantic segmentation | Synapse | 72.78 | 9 |
| Semantic segmentation | Trans10K | 89.19 | 9 |
