RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM

doi:10.24963/ijcai.2024/141

Residential College	false
Status	Published
Title	RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM
Author	Song, Ziying 1,2; Zhang, Guoxing 3; Liu, Lin 1,2; Yang, Lei 4; Xu, Shaoqing 5; Jia, Caiyan 1,2; Jia, Feiyang 1,2; Wang, Li 6
Year	2024
Conference Name	33rd International Joint Conference on Artificial Intelligence, IJCAI 2024
Source Publication	Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Pages	1272-1280
Conference Date	3-9 August 2024
Conference Place	Jeju, South Korea
Publisher	International Joint Conferences on Artificial Intelligence
Abstract	Multi-modal 3D object detectors are dedicated to exploring secure and reliable perception systems for autonomous driving (AD). Although they achieve state-of-the-art (SOTA) performance on clean benchmark datasets, they tend to overlook the complexity and harsh conditions of real-world environments. The emergence of visual foundation models (VFMs) presents both opportunities and challenges for improving the robustness and generalization of multi-modal 3D object detection in AD. We therefore propose RoboFusion, a robust framework that leverages VFMs such as SAM to tackle out-of-distribution (OOD) noise scenarios. We first adapt the original SAM to AD scenarios, yielding SAM-AD. To align SAM or SAM-AD with multi-modal methods, we then introduce AD-FPN, which upsamples the image features extracted by SAM. We employ wavelet decomposition to denoise the depth-guided images, further reducing noise and weather interference. Finally, we employ self-attention mechanisms to adaptively reweight the fused features, enhancing informative features while suppressing excess noise. In summary, RoboFusion significantly reduces noise by leveraging the generalization and robustness of VFMs, thereby enhancing the resilience of multi-modal 3D object detection. Consequently, RoboFusion achieves SOTA performance in noisy scenarios, as demonstrated on the KITTI-C and nuScenes-C benchmarks. Code is available at https://github.com/adept-thu/RoboFusion.
Keyword	Computer Vision
DOI	10.24963/ijcai.2024/141
Language	English
Scopus ID	2-s2.0-85204313151
Document Type	Conference paper
Collection	Faculty of Science and Technology
Corresponding Author	Jia, Caiyan
Affiliation	1. School of Computer Science and Technology, Beijing Jiaotong University, China; 2. Beijing Key Lab of Traffic Data Analysis and Mining, China; 3. Hebei University of Science and Technology, China; 4. Tsinghua University, China; 5. University of Macau, Macao; 6. Beijing Institute of Technology, China
Recommended Citation GB/T 7714	Song, Ziying,Zhang, Guoxing,Liu, Lin,et al. RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM[C]:International Joint Conferences on Artificial Intelligence, 2024, 1272-1280.
APA	Song, Z., Zhang, G., Liu, L., Yang, L., Xu, S., Jia, C., Jia, F., & Wang, L. (2024). RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 1272-1280.
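
The abstract mentions wavelet decomposition as a denoising step. The sketch below is not the authors' implementation (that is in the linked repository and operates on depth-guided image features); it only illustrates the general idea on a 1D signal, assuming a single-level Haar transform with soft thresholding of the detail coefficients. All function names and the threshold value are hypothetical choices made for this illustration.

```python
import numpy as np


def haar_dwt(signal):
    """One level of the orthonormal Haar transform (input length must be even)."""
    pairs = np.asarray(signal, dtype=float).reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)  # low-pass: local averages
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)  # high-pass: local differences
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt, interleaving the reconstructed even and odd samples."""
    out = np.empty(2 * approx.size)
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out


def soft_threshold(coeffs, threshold):
    """Shrink coefficients toward zero; values below the threshold become zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)


def wavelet_denoise(signal, threshold):
    """Decompose, shrink only the detail (high-frequency) band, then reconstruct."""
    approx, detail = haar_dwt(signal)
    return haar_idwt(approx, soft_threshold(detail, threshold))


# Illustrative usage: a piecewise-constant signal with small alternating noise.
clean = np.repeat([0.0, 4.0, 4.0, 0.0], 2)
noisy = clean + 0.1 * np.array([1, -1, 1, -1, 1, -1, 1, -1])
denoised = wavelet_denoise(noisy, threshold=0.2)
```

Only the detail band is thresholded because, in this toy setting, the noise lives in the high-frequency differences while the signal lives in the local averages. A threshold of zero leaves the input unchanged, since the Haar transform is exactly invertible.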
