Status: Published
RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM
Song, Ziying [1,2]; Zhang, Guoxing [3]; Liu, Lin [1,2]; Yang, Lei [4]; Xu, Shaoqing [5]; Jia, Caiyan [1,2]; Jia, Feiyang [1,2]; Wang, Li [6]
2024
Conference Name: 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024
Source Publication: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Pages: 1272-1280
Conference Date: 3-9 August 2024
Conference Place: Jeju, South Korea
Publisher: International Joint Conferences on Artificial Intelligence
Abstract

Multi-modal 3D object detectors are intended to support secure and reliable perception systems for autonomous driving (AD). Although they achieve state-of-the-art (SOTA) performance on clean benchmark datasets, they tend to overlook the complexity and harsh conditions of real-world environments. The emergence of visual foundation models (VFMs) presents both opportunities and challenges for improving the robustness and generalization of multi-modal 3D object detection in AD. We therefore propose RoboFusion, a robust framework that leverages VFMs such as SAM to tackle out-of-distribution (OOD) noise scenarios. We first adapt the original SAM to AD scenarios, yielding SAM-AD. To align SAM or SAM-AD with multi-modal methods, we then introduce AD-FPN, which upsamples the image features extracted by SAM. We employ wavelet decomposition to denoise the depth-guided images, further reducing noise and weather interference. Finally, we employ self-attention mechanisms to adaptively reweight the fused features, enhancing informative features while suppressing excess noise. In summary, RoboFusion significantly reduces noise by leveraging the generalization and robustness of VFMs, thereby enhancing the resilience of multi-modal 3D object detection. Consequently, RoboFusion achieves SOTA performance in noisy scenarios, as demonstrated on the KITTI-C and nuScenes-C benchmarks. Code is available at https://github.com/adept-thu/RoboFusion.
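
As a rough illustration of the wavelet-denoising idea mentioned in the abstract, the sketch below applies a one-level Haar decomposition along the last axis of an array, soft-thresholds the detail coefficients, and reconstructs the signal. The abstract does not state the wavelet family, the number of decomposition levels, or the thresholding rule, so the Haar basis, the soft-threshold rule, and the function names (`haar_forward`, `haar_inverse`, `soft_threshold`, `haar_denoise`) are assumptions made for illustration and are not taken from the RoboFusion code.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_forward(x):
    """One-level Haar transform along the last axis (length must be even)."""
    x = np.asarray(x, dtype=float)
    if x.shape[-1] % 2 != 0:
        raise ValueError("last axis must have even length")
    even, odd = x[..., 0::2], x[..., 1::2]
    approx = (even + odd) / SQRT2  # low-pass: scaled pairwise sums
    detail = (even - odd) / SQRT2  # high-pass: scaled pairwise differences
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed samples."""
    out = np.empty(approx.shape[:-1] + (2 * approx.shape[-1],), dtype=float)
    out[..., 0::2] = (approx + detail) / SQRT2
    out[..., 1::2] = (approx - detail) / SQRT2
    return out


def soft_threshold(coeffs, threshold):
    """Shrink coefficients toward zero; small ones become exactly zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)


def haar_denoise(x, threshold):
    """Assumed denoising rule: keep the approximation, shrink the details."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, threshold))


if __name__ == "__main__":
    noisy = np.array([1.0, 1.2, 3.0, 2.8])
    # With a threshold larger than every detail coefficient, each pair of
    # samples is replaced by its mean.
    print(haar_denoise(noisy, threshold=1.0))
```

With a threshold of zero the transform reconstructs its input, which is a quick check that the forward and inverse steps are consistent. A practical pipeline would more likely use a multi-level 2D transform from a wavelet library than this single-level 1D version.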

Keyword: Computer Vision
DOI: 10.24963/ijcai.2024/141
Language: English
Scopus ID: 2-s2.0-85204313151
Document Type: Conference paper
Collection: Faculty of Science and Technology
Corresponding Author: Jia, Caiyan
Affiliation:
1. School of Computer Science and Technology, Beijing Jiaotong University, China
2. Beijing Key Lab of Traffic Data Analysis and Mining, China
3. Hebei University of Science and Technology, China
4. Tsinghua University, China
5. University of Macau, Macao
6. Beijing Institute of Technology, China
Recommended Citation
GB/T 7714: Song, Ziying, Zhang, Guoxing, Liu, Lin, et al. RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM[C]. International Joint Conferences on Artificial Intelligence, 2024: 1272-1280.
APA: Song, Z., Zhang, G., Liu, L., Yang, L., Xu, S., Jia, C., Jia, F., & Wang, L. (2024). RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 1272-1280.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.