ViT-V-Net: Vision Transformer for Unsupervised Volumetric Medical Image Registration

About

In the last decade, convolutional neural networks (ConvNets) have dominated a variety of medical imaging applications and achieved state-of-the-art performance in them. However, the performance of ConvNets is still limited by their lack of understanding of long-range spatial relations in an image. The recently proposed Vision Transformer (ViT) for image classification uses a purely self-attention-based model that learns long-range spatial relations to focus on the relevant parts of an image. Nevertheless, ViT emphasizes low-resolution features because of its consecutive downsampling steps, resulting in a lack of detailed localization information and making it unsuitable for image registration. Recently, several ViT-based image segmentation methods have been combined with ConvNets to improve the recovery of detailed localization information. Inspired by them, we present ViT-V-Net, which bridges ViT and ConvNet to perform volumetric medical image registration. The experimental results presented here demonstrate that the proposed architecture achieves superior performance to several top-performing registration methods.

Junyu Chen, Yufan He, Eric C. Frey, Ye Li, Yong Du • 2021

Related benchmarks

Task	Dataset	Result
Image Registration	OASIS (test)	Dice Coefficient 46.59	57
Image Registration	OASIS	Dice 78.05	22
Medical Image Registration	XCAT to-CT	DSC 58.2	19
Brain MRI registration	JHU inter-patient	DSC 72.9	18
Brain MRI registration	IXI atlas-to-patient	DSC 0.734	18
3D Brain tissues registration	CANDI 3D Brain MRI	DSC (%) 76.8	11
3D Cardiac structure registration	MM-WHS, ASOCA, and CAT08 3D Cardiac CT	DSC (%) 73.5	11
2D Brain tissues registration	OASIS 2D Brain MRI 1	DSC 0.491	11
Image Registration	IXI	DSC 76.81	9
Volumetric Medical Image Registration	Brain MRI (test)	Dice 72.6	5

Showing 10 of 12 rows
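
All results above are Dice scores (DSC), which measure how well an anatomical label in the warped moving image overlaps the same label in the fixed image. The following is a minimal sketch of that metric, not the authors' evaluation code; the function name `dice_coefficient` and the toy label arrays are assumptions made for illustration.

```python
import numpy as np


def dice_coefficient(seg_a, seg_b, label):
    """Dice overlap of one label between two integer label maps (hypothetical helper)."""
    a = np.asarray(seg_a) == label
    b = np.asarray(seg_b) == label
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")  # label is absent from both maps
    # Dice = 2 * |A intersect B| / (|A| + |B|)
    return float(2.0 * np.logical_and(a, b).sum() / denom)


# Toy 1D label maps standing in for a warped segmentation and a fixed segmentation.
warped = np.array([0, 1, 1, 1, 0, 2, 2, 0])
fixed = np.array([0, 1, 1, 0, 0, 2, 2, 2])

scores = [dice_coefficient(warped, fixed, lab) for lab in (1, 2)]
mean_dice = float(np.mean(scores))
```

The function returns a value between 0 and 1. Some rows in the table appear to report this fraction directly (for example 0.734), while others appear to report it as a percentage (for example 76.81), so values should be compared only within the same benchmark.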
