Residential College | false |
Status | Published |
SQ-ViT: A Multi-Scale Vision Transformer With Quaternion For Endoscopic Images Classification | |
Jin, Zhanjun1; Huang, Guoheng1 | |
2024-12-16 | |
Source Publication | IEEE Transactions on Consumer Electronics |
ISSN | 0098-3063 |
Abstract | In the field of medical consumer electronics, endoscopic imaging, especially electronic nasopharyngoscope imaging, often suffers from low resolution, and the resulting loss of image detail makes endoscopic image classification difficult. Recent Vision Transformer (ViT)-based methods have shown promise in addressing this problem. However, ViT relies heavily on global context information to maintain performance, and the limited pixel count of low-resolution images makes it hard to capture adequate global context. To address these challenges, we propose the Sequential Quaternion Vision Transformer (SQ-ViT), which improves multi-scale feature utilization by feeding sampled features into subsequent encoder layers. Specifically, we introduce the Multi-scale Visual Feature Fusion (MVFF) module, which segments the image into multiple superpixel blocks and refines the contour and color information of the processed image, enhancing the representation of visual features. In addition, our proposed Quaternion Interactive Encoder (QIE) captures visual information more effectively. Experiments demonstrate the effectiveness of SQ-ViT in improving multi-scale feature utilization and in addressing the challenges of low-resolution endoscopic image classification. The source code will be released at https://github.com/jinzhanjun625/SQViT. |
Keyword | Endoscopic Images Classification; Endoscopy; Interpretability; Quaternion Convolution; Superpixel; Vision Transformer |
DOI | 10.1109/TCE.2024.3518755 |
Language | English |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Scopus ID | 2-s2.0-85213027035 |
Document Type | Journal article |
Collection | Faculty of Science and Technology; Department of Computer and Information Science |
Corresponding Author | Huang, Guoheng; Zhong, Guo |
Affiliation | 1.Guangdong University of Technology, School of Computer Science and Technology, Guangzhou, Guangdong, 510000, China 2.Sun Yat-sen University First Affiliated Hospital Department of Nephrology, Department of Otorhinolaryngology, Guangzhou, Guangdong, 510000, China 3.Macao Polytechnic University, Faculty of Applied Sciences, 999078, Macao 4.University of Macao, Department of Computer and Information Science, 999078, Macao 5.Guangdong University of Foreign Studies, School of Information Science and Technology, Guangzhou, Guangdong, 510000, China |
Recommended Citation GB/T 7714 | Jin, Zhanjun,Huang, Guoheng,Zhang, Feng,et al. SQ-ViT: A Multi-Scale Vision Transformer With Quaternion For Endoscopic Images Classification[J]. IEEE Transactions on Consumer Electronics, 2024. |
APA | Jin, Z., Huang, G., Zhang, F., Yuan, X., Zhu, D., Tan, Z., Pun, C. M., & Zhong, G. (2024). SQ-ViT: A multi-scale vision transformer with quaternion for endoscopic images classification. IEEE Transactions on Consumer Electronics. https://doi.org/10.1109/TCE.2024.3518755 |
MLA | Jin, Zhanjun, et al. "SQ-ViT: A Multi-Scale Vision Transformer With Quaternion For Endoscopic Images Classification." IEEE Transactions on Consumer Electronics, 2024. |
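The abstract does not describe the internals of the Quaternion Interactive Encoder (QIE), but the keywords name quaternion convolution, whose core operation is the Hamilton product. The sketch below is an illustration of that general building block, not the authors' code. It assumes the common convention of encoding an RGB pixel as a pure quaternion (0 + R·i + G·j + B·k) and simplifies to a 1-D, single-channel "valid" convolution; the names `Quaternion`, `hamilton`, `rgb_to_quaternion`, and `quaternion_conv1d` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Quaternion(NamedTuple):
    """A quaternion w + x*i + y*j + z*k."""
    w: float
    x: float
    y: float
    z: float


def hamilton(p: Quaternion, q: Quaternion) -> Quaternion:
    """Hamilton product p * q. Note that it is non-commutative (i*j = k, j*i = -k)."""
    return Quaternion(
        p.w * q.w - p.x * q.x - p.y * q.y - p.z * q.z,
        p.w * q.x + p.x * q.w + p.y * q.z - p.z * q.y,
        p.w * q.y - p.x * q.z + p.y * q.w + p.z * q.x,
        p.w * q.z + p.x * q.y - p.y * q.x + p.z * q.w,
    )


def rgb_to_quaternion(r: float, g: float, b: float) -> Quaternion:
    """Assumed encoding: an RGB pixel becomes the pure quaternion 0 + r*i + g*j + b*k."""
    return Quaternion(0.0, r, g, b)


def quaternion_conv1d(
    pixels: List[Tuple[float, float, float]], kernel: List[Quaternion]
) -> List[Quaternion]:
    """Hypothetical 1-D 'valid' quaternion convolution (correlation form).

    Each output is the sum over k of hamilton(kernel[k], pixel[t + k]), so a single
    quaternion weight mixes the R, G and B components instead of filtering them separately.
    """
    qs = [rgb_to_quaternion(*p) for p in pixels]
    n, m = len(qs), len(kernel)
    out: List[Quaternion] = []
    for t in range(n - m + 1):
        acc = Quaternion(0.0, 0.0, 0.0, 0.0)
        for k in range(m):
            prod = hamilton(kernel[k], qs[t + k])
            # Component-wise quaternion addition.
            acc = Quaternion(*(a + b for a, b in zip(acc, prod)))
        out.append(acc)
    return out
```

For example, `quaternion_conv1d([(0.1, 0.2, 0.3), (0.4, 0.5, 0.6)], [Quaternion(1, 0, 0, 0)])` applies the identity quaternion and returns the two pixels unchanged as pure quaternions. With a general weight, every output component depends on all three color channels. This channel coupling is the usual motivation for using quaternion operations on color images, though how the QIE applies it is not stated in this record.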
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.