MixFormer: A Mixed CNN-Transformer Backbone for Medical Image Segmentation

doi:10.1109/TIM.2024.3497060

Status	Forthcoming
Title	MixFormer: A Mixed CNN-Transformer Backbone for Medical Image Segmentation
Authors	Liu, Jun 1; Li, Kunqi 1; Huang, Chun 1; Dong, Hua 1; Song, Yusheng 2; Li, Rihui 3
Date	2024-12
Source Publication	IEEE Transactions on Instrumentation and Measurement
ISSN	0018-9456
Volume	74
Pages	5001220
Abstract	Background and Objectives: Transformers using self-attention mechanisms have recently advanced medical imaging by modeling long-range semantic dependencies, though they lack CNNs' ability to capture local spatial details. This study introduced a novel segmentation network derived from a mixed CNN-Transformer (MixFormer) feature extraction backbone to enhance medical image segmentation. Method: The MixFormer network integrates global and local information from Transformer and CNN architectures during the downsampling process. To capture the inter-scale perspective, we introduced a Multi-scale Spatial-aware Fusion (MSAF) module, enabling effective interaction between coarse and fine feature representations. Additionally, we proposed a Mixed Multi-branch Dilated Attention (MMDA) module to bridge the semantic gap between encoding and decoding stages while emphasizing specific regions. Lastly, we implemented a CNN-based upsampling approach to recover low-level features, substantially improving segmentation accuracy. Results: Experimental validations on prevalent medical image datasets demonstrated the superior performance of MixFormer. On the Synapse dataset, our approach achieved a mean Dice Similarity Coefficient (DSC) of 82.64% and a mean Hausdorff Distance (HD) of 12.67 mm. On the ACDC dataset, the DSC was 91.01%. On the ISIC 2018 dataset, the model achieved a mean Intersection over Union (mIOU) of 0.841, Accuracy of 0.958, Precision of 0.910, Recall of 0.934, and an F1 score of 0.913. For the Kvasir-SEG dataset, we recorded a mean Dice of 0.9247, mIOU of 0.8615, Precision of 0.9181, and Recall of 0.9463. On the CVC-ClinicDB dataset, the results were a mean Dice of 0.9441, mIOU of 0.8922, Precision of 0.9437, and Recall of 0.9458. Conclusion: These findings underscore the superior segmentation performance of MixFormer compared to most mainstream segmentation networks, including CNNs and other Transformer-based structures.
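The overlap metrics quoted in the abstract (Dice, IoU, Precision, Recall) have standard definitions in terms of true positives, false positives, and false negatives. The following is a minimal sketch of those definitions for a single pair of binary masks, not the authors' evaluation code. The function name `overlap_metrics`, the flat 0/1 mask representation, and the convention of scoring an empty denominator as 1.0 are all assumptions made for illustration; the reported means would additionally average over classes or cases, which this sketch does not do.

```python
def overlap_metrics(pred, target):
    """Compute Dice, IoU, precision, and recall for two binary masks.

    Both masks are equal-length flat sequences of 0/1 values
    (a hypothetical representation chosen for this sketch).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")

    # Count pixel-level agreement and disagreement.
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if not p and t)

    def ratio(num, den):
        # Assumed convention: an empty denominator counts as a perfect score.
        return num / den if den else 1.0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


# Toy example: six pixels, two of them correctly predicted as foreground.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
scores = overlap_metrics(pred, target)
print(scores)
```

For a single mask pair, Dice and IoU are linked by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. This matches the ordering of the Dice and mIOU figures quoted above, although averages over many images need not satisfy the identity exactly.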
Keyword	Medical Image Segmentation (Seg); Mixed Convolutional Neural Network (CNN)–Transformer Backbone; Mixed Multibranch Dilated Attention (MMDA); Multi-scale Spatial-aware Fusion (MSAF)
DOI	10.1109/TIM.2024.3497060
Indexed By	SCIE
Language	English
WOS Research Area	Engineering ; Instruments & Instrumentation
WOS Subject	Engineering, Electrical & Electronic ; Instruments & Instrumentation
WOS ID	WOS:001370775800005
Publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 445 HOES LANE, PISCATAWAY, NJ 08855-4141
Scopus ID	2-s2.0-85209255807
Document Type	Journal article
Collection	Faculty of Science and Technology DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING INSTITUTE OF COLLABORATIVE INNOVATION
Corresponding Author	Li, Rihui
Affiliation	1.Nanchang Hangkong University, Department of Information Engineering, Nanchang, Jiangxi, 330063, China 2.The People's Hospital of Ganzhou, Department of Interventional Radiology, Ganzhou, Jiangxi, 341000, China 3.University of Macau, Faculty of Science and Technology, Center for Cognitive and Brain Sciences, Institute of Collaborative Innovation, The Department of Electrical and Computer Engineering, Macao
Corresponding Author Affiliation	INSTITUTE OF COLLABORATIVE INNOVATION
Recommended Citation GB/T 7714	Liu, Jun, Li, Kunqi, Huang, Chun, et al. MixFormer: A Mixed CNN-Transformer Backbone for Medical Image Segmentation[J]. IEEE Transactions on Instrumentation and Measurement, 2024, 74, 5001220.
APA	Liu, Jun, Li, Kunqi, Huang, Chun, Dong, Hua, Song, Yusheng, & Li, Rihui (2024). MixFormer: A Mixed CNN-Transformer Backbone for Medical Image Segmentation. IEEE Transactions on Instrumentation and Measurement, 74, 5001220.
MLA	Liu, Jun, et al. "MixFormer: A Mixed CNN-Transformer Backbone for Medical Image Segmentation." IEEE Transactions on Instrumentation and Measurement 74 (2024): 5001220.
