Status | Forthcoming |
MixFormer: A Mixed CNN-Transformer Backbone for Medical Image Segmentation | |
Liu, Jun1; Li, Kunqi1; Huang, Chun1; Dong, Hua1; Song, Yusheng2; Li, Rihui3 | |
2024-12 | |
Source Publication | IEEE Transactions on Instrumentation and Measurement |
ISSN | 0018-9456 |
Volume | 74 |
Pages | 5001220 |
Abstract | Background and Objectives: Transformers using self-attention mechanisms have recently advanced medical imaging by modeling long-range semantic dependencies, though they lack CNNs' ability to capture local spatial details. This study introduced a novel segmentation network derived from a mixed CNN-Transformer (MixFormer) feature extraction backbone to enhance medical image segmentation. Method: The MixFormer network seamlessly integrates global and local information from Transformer and CNN architectures during the downsampling process. To comprehensively capture the inter-scale perspective, we introduced a Multi-scale Spatial-aware Fusion (MSAF) module, enabling effective interaction between coarse and fine feature representations. Additionally, we proposed a Mixed Multi-branch Dilated Attention (MMDA) module to bridge the semantic gap between encoding and decoding stages while emphasizing specific regions. Lastly, we implemented a CNN-based upsampling approach to recover low-level features, substantially improving segmentation accuracy. Results: Experimental validations on prevalent medical image datasets demonstrated the superior performance of MixFormer. On the Synapse dataset, our approach achieved a mean Dice Similarity Coefficient (DSC) of 82.64% and a mean Hausdorff Distance (HD) of 12.67 mm. On the ACDC dataset, the DSC was 91.01%. On the ISIC 2018 dataset, the model achieved a mean Intersection over Union (mIOU) of 0.841, Accuracy of 0.958, Precision of 0.910, Recall of 0.934, and an F1 score of 0.913. For the Kvasir-SEG dataset, we recorded a mean Dice of 0.9247, mIOU of 0.8615, Precision of 0.9181, and Recall of 0.9463. On the CVC-ClinicDB dataset, the results were a mean Dice of 0.9441, mIOU of 0.8922, Precision of 0.9437, and Recall of 0.9458. Conclusion: These findings underscore the superior segmentation performance of MixFormer compared to most mainstream segmentation networks such as CNNs and other Transformer-based structures. |
Keyword | Medical Image Segmentation (Seg) ; Mixed Convolutional Neural Network (CNN)–Transformer Backbone ; Mixed Multibranch Dilated Attention (MMDA) ; Multi-scale Spatial-aware Fusion (MSAF) |
DOI | 10.1109/TIM.2024.3497060 |
Indexed By | SCIE |
Language | English |
WOS Research Area | Engineering ; Instruments & Instrumentation |
WOS Subject | Engineering, Electrical & Electronic ; Instruments & Instrumentation |
WOS ID | WOS:001370775800005 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 445 HOES LANE, PISCATAWAY, NJ 08855-4141 |
Scopus ID | 2-s2.0-85209255807 |
Document Type | Journal article |
Collection | Faculty of Science and Technology ; DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING ; INSTITUTE OF COLLABORATIVE INNOVATION |
Corresponding Author | Li, Rihui |
Affiliation | 1. Nanchang Hangkong University, Department of Information Engineering, Nanchang, Jiangxi, 330063, China ; 2. The People's Hospital of Ganzhou, Department of Interventional Radiology, Ganzhou, Jiangxi, 341000, China ; 3. University of Macau, Faculty of Science and Technology, Center for Cognitive and Brain Sciences, Institute of Collaborative Innovation, The Department of Electrical and Computer Engineering, Macao |
Corresponding Author Affiliation | INSTITUTE OF COLLABORATIVE INNOVATION |
Recommended Citation GB/T 7714 | Liu, Jun, Li, Kunqi, Huang, Chun, et al. MixFormer: A Mixed CNN-Transformer Backbone for Medical Image Segmentation[J]. IEEE Transactions on Instrumentation and Measurement, 2024, 74, 5001220. |
APA | Liu, J., Li, K., Huang, C., Dong, H., Song, Y., & Li, R. (2024). MixFormer: A Mixed CNN-Transformer Backbone for Medical Image Segmentation. IEEE Transactions on Instrumentation and Measurement, 74, 5001220. |
MLA | Liu, Jun, et al. "MixFormer: A Mixed CNN-Transformer Backbone for Medical Image Segmentation." IEEE Transactions on Instrumentation and Measurement 74 (2024): 5001220. |
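The abstract reports overlap metrics (Dice/DSC, mIoU, Precision, Recall, Accuracy) without stating their formulas. As an illustration only, the following is a minimal sketch of the standard per-mask definitions for a single binary segmentation, not the authors' evaluation code. The function names `confusion_counts` and `segmentation_metrics` are hypothetical, the masks are assumed to be flat sequences of 0/1 values, and returning 0.0 for an empty denominator is an assumed convention. The reported figures are means over classes or images, which this sketch does not cover, and the Hausdorff Distance is omitted.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(pred: Sequence[int], target: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if not p and t)
    tn = len(pred) - tp - fp - fn
    return tp, fp, fn, tn


def segmentation_metrics(pred: Sequence[int], target: Sequence[int]) -> Dict[str, float]:
    """Compute standard overlap metrics for one binary mask pair."""
    tp, fp, fn, tn = confusion_counts(pred, target)

    def ratio(num: int, den: int) -> float:
        # Assumed convention: an undefined ratio (empty denominator) is reported as 0.0.
        return num / den if den else 0.0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),      # DSC = 2|A∩B| / (|A| + |B|)
        "iou": ratio(tp, tp + fp + fn),               # IoU = |A∩B| / |A∪B|
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


if __name__ == "__main__":
    # Toy 8-pixel example: 2 true positives, 1 false positive, 1 false negative.
    predicted = [1, 1, 0, 0, 1, 0, 0, 0]
    reference = [1, 0, 0, 0, 1, 1, 0, 0]
    for name, value in segmentation_metrics(predicted, reference).items():
        print(f"{name}: {value:.4f}")
```

For a single binary mask, Dice equals the pixel-wise F1 score (the harmonic mean of precision and recall). Averaged dataset-level values can still differ depending on how the averaging is done.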