CytoCLIP: Learning Cytoarchitectural Characteristics in Developing Human Brain Using Contrastive Language Image Pre-Training

About

The functions of different regions of the human brain are closely linked to their distinct cytoarchitecture, which is defined by the spatial arrangement and morphology of the cells. Identifying brain regions by their cytoarchitecture enables various scientific analyses of the brain. However, delineating these areas manually in brain histological sections is time-consuming and requires specialized knowledge. An automated approach is therefore needed to reduce the effort required from human experts. To address this, we propose CytoCLIP, a suite of vision-language models derived from pre-trained Contrastive Language-Image Pre-Training (CLIP) models that learn joint visual-text representations of brain cytoarchitecture. CytoCLIP comprises two model variants: one trained on low-resolution whole-region images to capture the overall cytoarchitectural pattern of an area, and the other trained on high-resolution image tiles to learn detailed cellular-level representations. The training dataset is created from Nissl-stained histological sections of developing fetal brains at different gestational weeks. It covers 86 distinct regions for the low-resolution images and 379 regions for the high-resolution tiles. We evaluate the models' understanding of cytoarchitecture and their generalization ability on region classification and cross-modal retrieval tasks. Experiments are performed under several data setups, including samples of different ages and different sectioning planes. The results show that CytoCLIP outperforms existing methods, achieving a weighted F1 score of 0.87 for whole-region classification and 0.91 for high-resolution image tile classification.
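The abstract does not state CytoCLIP's training objective. Since the models are derived from pre-trained CLIP, one reasonable assumption is that fine-tuning keeps CLIP's symmetric contrastive loss, in which each image is pulled toward its own text description and pushed away from every other description in the batch (and vice versa). The pure-Python sketch below illustrates that loss; the function names and the 0.07 temperature are illustrative choices, not settings taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mean_diagonal_cross_entropy(logits):
    """Cross-entropy where row i's correct class is column i, averaged over rows."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(image_embeddings, text_embeddings, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched (image, text) pairs.

    Pair i is (image_embeddings[i], text_embeddings[i]); every other pairing in
    the batch acts as a negative. The temperature is an illustrative assumption,
    not a value reported for CytoCLIP.
    """
    images = [l2_normalize(v) for v in image_embeddings]
    texts = [l2_normalize(v) for v in text_embeddings]
    # logits[i][j]: temperature-scaled cosine similarity of image i and text j
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
              for img in images]
    # Transpose so each row scores one text against every image
    logits_per_text = [list(col) for col in zip(*logits)]
    image_to_text = mean_diagonal_cross_entropy(logits)
    text_to_image = mean_diagonal_cross_entropy(logits_per_text)
    return 0.5 * (image_to_text + text_to_image)


aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(f"aligned pairs: {aligned:.6f}, swapped pairs: {swapped:.4f}")
```

With correctly matched pairs the loss is close to zero; when the texts are swapped it rises to roughly 1/temperature (about 14.29), which is the signal that drives matching image and text embeddings together during training.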

Pralaypati Ta, Sriram Venkatesaperumal, Keerthi Ram, Mohanasankar Sivaprakasam• 2026

Related benchmarks

Task	Dataset	Result
Image-to-Image Retrieval	DHARANI Complete Region 1.0 (val)	Recall@1: 4.8	6
Region Classification	DHARANI Complete Region	Precision: 95.8	6
Image-to-Image Retrieval	DHARANI High Res. Tiles 1.0 (val)	Recall@1: 4.4	5
Image-to-Text Retrieval	DHARANI Complete Region 1.0 (val)	Recall@1: 4.8	5
Text-to-Image Retrieval	DHARANI Complete Region 1.0 (val)	Recall@1: 5.7	5
Image-to-Text Retrieval	DHARANI High Res. Tiles 1.0 (val)	Recall@1: 5.6	4
Text-to-Image Retrieval	DHARANI High Res. Tiles 1.0 (val)	Recall@1: 5.3	4
Region Classification	DHARANI High Res. Tiles	Precision: 91.6	1
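The retrieval rows above report Recall@K, the percentage of queries whose correct match appears among the top K retrieved items. The listing does not describe the evaluation protocol, so the sketch below assumes a hit means at least one of the top-k gallery items carries the query's region label. The function name, similarity scores, and labels are hypothetical. The same function covers image-to-text, text-to-image, and image-to-image retrieval; only the query and gallery embeddings that produce the similarity matrix change.

```python
def recall_at_k(similarity, query_labels, gallery_labels, k=1):
    """Percentage of queries whose top-k gallery results include a matching label.

    similarity[i][j] is the score between query i and gallery item j. Counting a
    hit as "any of the top-k results shares the query's region label" is an
    assumption, not a protocol confirmed by the source.
    """
    hits = 0
    for scores, label in zip(similarity, query_labels):
        # Rank gallery indices from most to least similar and keep the first k
        top_k = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
        if any(gallery_labels[j] == label for j in top_k):
            hits += 1
    return 100.0 * hits / len(query_labels)


# Toy example: 3 image queries scored against 3 region-name texts
similarity = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.1],
]
regions = ["A", "B", "C"]  # hypothetical region labels
for k in (1, 2, 3):
    print(f"Recall@{k}: {recall_at_k(similarity, regions, regions, k):.1f}")
# Recall@1: 33.3
# Recall@2: 66.7
# Recall@3: 100.0
```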
