Status | Forthcoming |
Title | Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching |
Author | Chu, Meng (1); Zheng, Zhedong (2); Ji, Wei (1); Wang, Tingyu (3); Chua, Tat Seng (1) |
Year | 2025 |
Source Publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volume | 15069 LNCS |
Pages | 213-231 |
Abstract | Navigating drones through natural language commands remains challenging due to the dearth of accessible multi-modal datasets and the stringent precision requirements for aligning visual and textual data. To address this pressing need, we introduce GeoText-1652, a new natural language-guided geolocalization benchmark. This dataset is systematically constructed through an interactive human-computer process leveraging Large Language Model (LLM) driven annotation techniques in conjunction with pre-trained vision models. GeoText-1652 extends the established University-1652 image dataset with spatial-aware text annotations, thereby establishing one-to-one correspondences between image, text, and bounding box elements. We further introduce a new optimization objective to leverage fine-grained spatial associations, called blending spatial matching, for region-level spatial relation matching. Extensive experiments reveal that our approach maintains a competitive recall rate compared with other prevailing cross-modality methods. This underscores the promising potential of our approach in elevating drone control and navigation through the seamless integration of natural language commands in real-world scenarios. |
Keyword | Drone Navigation; Geolocalization; Spatial Relation Matching; Text Guidance |
DOI | 10.1007/978-3-031-73247-8_13 |
Language | English |
Scopus ID | 2-s2.0-85210022886 |
Document Type | Conference paper |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Affiliation | 1. School of Computing, National University of Singapore, Singapore, Singapore; 2. FST and ICI, University of Macau, Macao; 3. School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, China |
Recommended Citation GB/T 7714 | Chu, Meng, Zheng, Zhedong, Ji, Wei, et al. Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching[C], 2025, 213-231. |
APA | Chu, Meng., Zheng, Zhedong., Ji, Wei., Wang, Tingyu., & Chua, Tat Seng (2025). Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 15069 LNCS, 213-231. |