Residential College | false |
Status | Published |
RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM | |
Song, Ziying1,2; Zhang, Guoxing3; Liu, Lin1,2; Yang, Lei4; Xu, Shaoqing5; Jia, Caiyan1,2; Jia, Feiyang1,2; Wang, Li6 | |
2024 | |
Conference Name | 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024 |
Source Publication | Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence |
Pages | 1272-1280 |
Conference Date | 3-9 August 2024 |
Conference Place | Jeju, South Korea |
Publisher | International Joint Conferences on Artificial Intelligence |
Abstract | Multi-modal 3D object detectors are dedicated to exploring secure and reliable perception systems for autonomous driving (AD). Although they achieve state-of-the-art (SOTA) performance on clean benchmark datasets, they tend to overlook the complexity and harsh conditions of real-world environments. The emergence of visual foundation models (VFMs) presents both opportunities and challenges for improving the robustness and generalization of multi-modal 3D object detection in AD. Therefore, we propose RoboFusion, a robust framework that leverages VFMs like SAM to tackle out-of-distribution (OOD) noise scenarios. We first adapt the original SAM to AD scenarios, yielding SAM-AD. To align SAM or SAM-AD with multi-modal methods, we then introduce AD-FPN for upsampling the image features extracted by SAM. We employ wavelet decomposition to denoise the depth-guided images, further reducing noise and weather interference. Finally, we employ self-attention mechanisms to adaptively reweight the fused features, enhancing informative features while suppressing excess noise. In summary, RoboFusion significantly reduces noise by leveraging the generalization and robustness of VFMs, thereby enhancing the resilience of multi-modal 3D object detection. Consequently, RoboFusion achieves SOTA performance in noisy scenarios, as demonstrated by the KITTI-C and nuScenes-C benchmarks. Code is available at https://github.com/adept-thu/RoboFusion. |
Keyword | Computer Vision |
DOI | 10.24963/ijcai.2024/141 |
Language | English |
Scopus ID | 2-s2.0-85204313151 |
Document Type | Conference paper |
Collection | Faculty of Science and Technology |
Corresponding Author | Jia, Caiyan |
Affiliation | 1.School of Computer Science and Technology, Beijing Jiaotong University, China 2.Beijing Key Lab of Traffic Data Analysis and Mining, China 3.Hebei University of Science and Technology, China 4.Tsinghua University, China 5.University of Macau, Macao 6.Beijing Institute of Technology, China |
Recommended Citation GB/T 7714 | Song, Ziying,Zhang, Guoxing,Liu, Lin,et al. RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM[C]:International Joint Conferences on Artificial Intelligence, 2024, 1272-1280. |
APA | Song, Z., Zhang, G., Liu, L., Yang, L., Xu, S., Jia, C., Jia, F., & Wang, L. (2024). RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 1272-1280. |
Files in This Item: | There are no files associated with this item. |
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.