CytoCLIP: Learning Cytoarchitectural Characteristics in Developing Human Brain Using Contrastive Language Image Pre-Training

About

The functions of different regions of the human brain are closely linked to their distinct cytoarchitecture, which is defined by the spatial arrangement and morphology of the cells. Identifying brain regions by their cytoarchitecture enables various scientific analyses of the brain. However, delineating these areas manually in brain histological sections is time-consuming and requires specialized knowledge. An automated approach is necessary to minimize the effort needed from human experts. To address this, we propose CytoCLIP, a suite of vision-language models derived from pre-trained Contrastive Language-Image Pre-Training (CLIP) frameworks to learn joint visual-text representations of brain cytoarchitecture. CytoCLIP comprises two model variants: one is trained using low-resolution whole-region images to understand the overall cytoarchitectural pattern of an area, and the other is trained on high-resolution image tiles for detailed cellular-level representation. The training dataset is created from Nissl-stained histological sections of developing fetal brains of different gestational weeks. It includes 86 distinct regions for low-resolution images and 384 brain regions for high-resolution tiles. We evaluate the models' understanding of the cytoarchitecture and generalization ability using region classification and cross-modal retrieval tasks. Multiple experiments are performed under various data setups, including data from samples of different ages and sectioning planes. Experimental results demonstrate that CytoCLIP outperforms existing methods. It achieves an F1 score of 0.87 for whole-region classification and 0.91 for high-resolution image tile classification.
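As a rough illustration of how a joint visual-text embedding space can be used for region classification, the sketch below assigns an image to the label whose text embedding has the highest cosine similarity to the image embedding. This is a common way to use CLIP-style models, but the abstract does not state that CytoCLIP classifies regions exactly this way. The function names, the region labels, and the three-dimensional embedding values are all hypothetical and chosen only to make the example runnable.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def classify_region(image_embedding, text_embeddings):
    """Return the label whose text embedding is closest to the image embedding."""
    return max(
        text_embeddings,
        key=lambda label: cosine_similarity(image_embedding, text_embeddings[label]),
    )


# Toy embeddings: hypothetical values, not produced by any real model.
text_embeddings = {
    "cortical plate": [0.9, 0.1, 0.0],
    "ventricular zone": [0.1, 0.9, 0.2],
    "thalamus": [0.0, 0.2, 0.9],
}
image_embedding = [0.2, 0.8, 0.3]

predicted = classify_region(image_embedding, text_embeddings)
print(predicted)
```

In a real pipeline the embeddings would come from the image and text encoders of the trained model rather than from hand-written lists.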

Pralaypati Ta, Sriram Venkatesaperumal, Keerthi Ram, Mohanasankar Sivaprakasam • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image-to-Image Retrieval | DHARANI Complete Region 1.0 (val) | Recall@1 | 4.8 | 6 |
| Region Classification | DHARANI Complete Region | Precision | 95.8 | 6 |
| Image-to-Image Retrieval | DHARANI High Res. Tiles 1.0 (val) | Recall@1 | 4.4 | 5 |
| Image-to-Text Retrieval | DHARANI Complete Region 1.0 (val) | Recall@1 | 4.8 | 5 |
| Text-to-Image Retrieval | DHARANI Complete Region 1.0 (val) | Recall@1 | 5.7 | 5 |
| Image-to-Text Retrieval | DHARANI High Res. Tiles 1.0 (val) | Recall@1 | 5.6 | 4 |
| Text-to-Image Retrieval | DHARANI High Res. Tiles 1.0 (val) | Recall@1 | 5.3 | 4 |
| Region Classification | DHARANI High Res. Tiles | Precision | 91.6 | 1 |
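
The retrieval rows above report a Recall@K-style metric. A minimal sketch of one common definition follows: a query counts as a hit if at least one relevant item appears among its K highest-scoring candidates. The benchmark's exact definition is not given on this page, so this convention is an assumption, and the function name, similarity scores, and relevance sets are hypothetical toy data.

```python
def recall_at_k(similarity, relevant, k=1):
    """Fraction of queries with at least one relevant item in the top-k results.

    similarity[q][i] is the score between query q and candidate i.
    relevant[q] is the set of candidate indices considered correct for query q.
    """
    hits = 0
    for q, scores in enumerate(similarity):
        # Rank candidate indices from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if any(i in relevant[q] for i in ranked[:k]):
            hits += 1
    return hits / len(similarity)


# Toy data: 3 queries scored against 3 candidates (hypothetical values).
similarity = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.1],
]
relevant = [{0}, {1}, {1}]

r1 = recall_at_k(similarity, relevant, k=1)
r2 = recall_at_k(similarity, relevant, k=2)
print(r1, r2)
```

The same function covers image-to-image, image-to-text, and text-to-image retrieval; only the choice of which embeddings act as queries and which act as candidates changes.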