Residential College: false
Status: Forthcoming
MixFormer: A Mixed CNN-Transformer Backbone for Medical Image Segmentation
Liu, Jun (1); Li, Kunqi (1); Huang, Chun (1); Dong, Hua (1); Song, Yusheng (2); Li, Rihui (3)
2024-12
Source Publication: IEEE Transactions on Instrumentation and Measurement
ISSN: 0018-9456
Volume: 74; Pages: 5001220
Abstract

Background and Objectives: Transformers built on self-attention have recently advanced medical imaging by modeling long-range semantic dependencies, but they lack the ability of CNNs to capture local spatial detail. This study introduced a segmentation network built on a mixed CNN-Transformer (MixFormer) feature-extraction backbone to improve medical image segmentation.

Method: The MixFormer network integrates global information from Transformer layers and local information from CNN layers throughout the downsampling process. To capture inter-scale context, we introduced a Multi-scale Spatial-aware Fusion (MSAF) module that enables effective interaction between coarse and fine feature representations. We also proposed a Mixed Multi-branch Dilated Attention (MMDA) module that bridges the semantic gap between the encoding and decoding stages while emphasizing specific regions. Finally, we implemented a CNN-based upsampling path that recovers low-level features, substantially improving segmentation accuracy.

Results: Experiments on widely used medical image datasets demonstrated the superior performance of MixFormer. On the Synapse dataset, our approach achieved a mean Dice Similarity Coefficient (DSC) of 82.64% and a mean Hausdorff Distance (HD) of 12.67 mm. On the ACDC dataset, the DSC was 91.01%. On the ISIC 2018 dataset, the model achieved a mean Intersection over Union (mIoU) of 0.841, an Accuracy of 0.958, a Precision of 0.910, a Recall of 0.934, and an F1 score of 0.913. On the Kvasir-SEG dataset, it achieved a mean Dice of 0.9247, an mIoU of 0.8615, a Precision of 0.9181, and a Recall of 0.9463. On the CVC-ClinicDB dataset, it achieved a mean Dice of 0.9441, an mIoU of 0.8922, a Precision of 0.9437, and a Recall of 0.9458.

Conclusion: These findings indicate that MixFormer outperforms most mainstream segmentation networks, including CNN-based and other Transformer-based architectures.
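The overlap metrics reported above (Dice/DSC, IoU, Accuracy, Precision, Recall, F1) have standard textbook definitions in terms of true/false positive/negative pixel counts, and the Hausdorff distance is the largest distance from a boundary point of one region to the nearest point of the other. The sketch below computes those standard definitions for a single 2-D binary mask pair in plain Python. It is not the authors' evaluation code: the helper names (`confusion_counts`, `overlap_metrics`, `hausdorff_distance`) are hypothetical, returning 0.0 for an empty denominator is an assumption, and distances come out in pixel units (multiplying by the pixel spacing would give millimetres). Published pipelines sometimes use a percentile variant of HD, so numbers from this sketch need not match the paper's.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A 2-D binary mask: 1 marks foreground (e.g. a lesion or organ), 0 background.
Mask = Sequence[Sequence[int]]


def confusion_counts(pred: Mask, gt: Mask) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN) pixels for two binary masks of the same shape."""
    tp = fp = fn = tn = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            if p and g:
                tp += 1  # predicted foreground, truly foreground
            elif p:
                fp += 1  # predicted foreground, truly background
            elif g:
                fn += 1  # missed foreground
            else:
                tn += 1  # correctly predicted background
    return tp, fp, fn, tn


def _safe_div(num: float, den: float) -> float:
    # Assumption: an empty denominator yields 0.0 rather than raising.
    return num / den if den else 0.0


def overlap_metrics(pred: Mask, gt: Mask) -> Dict[str, float]:
    """Dice, IoU, accuracy, precision, recall and F1 for one mask pair."""
    tp, fp, fn, tn = confusion_counts(pred, gt)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    return {
        "dice": _safe_div(2 * tp, 2 * tp + fp + fn),
        "iou": _safe_div(tp, tp + fp + fn),
        "accuracy": _safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
    }


def foreground_points(mask: Mask) -> List[Tuple[int, int]]:
    """(row, col) coordinates of every foreground pixel."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def hausdorff_distance(a: Mask, b: Mask) -> float:
    """Symmetric Hausdorff distance between foreground sets, in pixel units.

    Brute force O(|A| * |B|); both masks must contain foreground, otherwise
    max()/min() over an empty sequence raises ValueError.
    """
    pa, pb = foreground_points(a), foreground_points(b)

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(pa, pb), directed(pb, pa))


if __name__ == "__main__":
    pred = [[0, 1, 1],
            [0, 1, 0],
            [0, 0, 0]]
    gt = [[0, 1, 1],
          [0, 0, 0],
          [0, 1, 0]]
    print(overlap_metrics(pred, gt))
    print(hausdorff_distance(pred, gt))
```

For this toy pair (2 TP, 1 FP, 1 FN, 5 TN), Dice is 2/3, IoU is 0.5, and the Hausdorff distance is 1.0 pixel. For a single image, F1 equals Dice algebraically (2PR/(P+R) = 2TP/(2TP+FP+FN)); dataset-level figures such as those above also depend on how scores are averaged across images, which the abstract does not specify.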

Keywords: Medical Image Segmentation (Seg); Mixed Convolutional Neural Network (CNN)-Transformer Backbone; Mixed Multi-branch Dilated Attention (MMDA); Multi-scale Spatial-aware Fusion (MSAF)
DOI: 10.1109/TIM.2024.3497060
Indexed By: SCIE
Language: English
WOS Research Area: Engineering; Instruments & Instrumentation
WOS Subject: Engineering, Electrical & Electronic; Instruments & Instrumentation
WOS ID: WOS:001370775800005
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 445 HOES LANE, PISCATAWAY, NJ 08855-4141
Scopus ID: 2-s2.0-85209255807
Document Type: Journal article
Collection: Faculty of Science and Technology; Department of Electrical and Computer Engineering; Institute of Collaborative Innovation
Corresponding Author: Li, Rihui
Affiliations:
1. Nanchang Hangkong University, Department of Information Engineering, Nanchang, Jiangxi, 330063, China
2. The People's Hospital of Ganzhou, Department of Interventional Radiology, Ganzhou, Jiangxi, 341000, China
3. University of Macau, Faculty of Science and Technology, Center for Cognitive and Brain Sciences, Institute of Collaborative Innovation, Department of Electrical and Computer Engineering, Macao
Corresponding Author Affiliation: Institute of Collaborative Innovation
Recommended Citation
GB/T 7714: Liu, Jun, Li, Kunqi, Huang, Chun, et al. MixFormer: A Mixed CNN-Transformer Backbone for Medical Image Segmentation[J]. IEEE Transactions on Instrumentation and Measurement, 2024, 74, 5001220.
APA: Liu, J., Li, K., Huang, C., Dong, H., Song, Y., & Li, R. (2024). MixFormer: A Mixed CNN-Transformer Backbone for Medical Image Segmentation. IEEE Transactions on Instrumentation and Measurement, 74, 5001220.
MLA: Liu, Jun, et al. "MixFormer: A Mixed CNN-Transformer Backbone for Medical Image Segmentation." IEEE Transactions on Instrumentation and Measurement 74 (2024): 5001220.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.