Status: Published
Top-k Spatio-Textual Similarity Join
Hu, Huiqi1; Li, Guoliang1; Bao, Zhifeng2; Feng, Jianhua1; Wu, Yongwei1; Gong, Zhiguo3; Xu, Yaoqiang4
2016-02-01
Source Publication: IEEE Transactions on Knowledge and Data Engineering
ISSN: 1041-4347
Volume: 28, Issue: 2, Pages: 551-565
Abstract

With the development of location-based services (LBS), LBS users are generating increasing amounts of spatio-textual data, e.g., check-ins and attraction reviews. Since a spatio-textual entity may have different representations, possibly due to GPS deviations or typographical errors, effective methods are needed to integrate spatio-textual data from different sources. In this paper, we study the problem of top-k spatio-textual similarity join (TOPK-STJOIN), which identifies the k most similar pairs from two spatio-textual data sets. A major challenge in TOPK-STJOIN is to efficiently identify the top-k similar pairs by considering both textual relevancy and spatial proximity. Traditional join algorithms that consider only one dimension (textual or spatial) are inefficient because they cannot exploit pruning on the other dimension. To address this challenge, we propose a signature-based top-k join framework. We first generate a spatio-textual signature set for each object such that if two objects are in the top-k similar pairs, their signature sets must overlap. With this property, we can prune large numbers of dissimilar pairs that share no common signature. We find that the order in which signatures are accessed has a significant effect on performance. We therefore compute an upper bound for each signature and propose a best-first accessing method that preferentially accesses signatures with large upper bounds, allowing pairs with small upper bounds to be pruned. We prove the optimality of our best-first accessing method. Next, we optimize the spatio-textual signatures and propose progressive signatures to further improve the pruning power. Experimental results on real-world datasets show that our algorithm achieves high performance and good scalability, and significantly outperforms baseline approaches.
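The signature-based, best-first idea described in the abstract can be sketched roughly as follows. This is not the authors' algorithm: the similarity measure (textual Jaccard multiplied by a linear spatial proximity that drops to zero at a hypothetical parameter `radius`), the signature scheme ((grid cell, token) keys), and the per-signature upper bound (a token-count ratio bound times a bounding-box distance bound) are all illustrative assumptions, and names such as `Obj`, `build_signatures`, and `topk_join` are hypothetical. What the sketch does show is the core mechanism: any pair with positive similarity shares at least one signature, signatures are visited in descending upper-bound order, and the scan stops once the next upper bound cannot beat the current k-th best score.

```python
import heapq
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    """A spatio-textual object: an id, a point, and a non-empty token set."""
    oid: str
    x: float
    y: float
    tokens: frozenset


def similarity(s, t, radius):
    """Assumed measure: Jaccard(tokens) * max(0, 1 - dist / radius)."""
    union = s.tokens | t.tokens
    jac = len(s.tokens & t.tokens) / len(union) if union else 0.0
    prox = max(0.0, 1.0 - math.dist((s.x, s.y), (t.x, t.y)) / radius)
    return jac * prox


def build_signatures(S, T, radius):
    """Map each (cell_x, cell_y, token) signature to its S and T objects.

    S objects sign their own grid cell; T objects sign all 9 neighbouring
    cells, so any pair with dist < radius and a shared token shares a key.
    """
    sigs = defaultdict(lambda: ([], []))
    for s in S:
        cx, cy = math.floor(s.x / radius), math.floor(s.y / radius)
        for tok in s.tokens:
            sigs[(cx, cy, tok)][0].append(s)
    for t in T:
        cx, cy = math.floor(t.x / radius), math.floor(t.y / radius)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for tok in t.tokens:
                    sigs[(cx + dx, cy + dy, tok)][1].append(t)
    # Only signatures present on both sides can produce candidate pairs.
    return {g: lists for g, lists in sigs.items() if lists[0] and lists[1]}


def upper_bound(ss, ts, radius):
    """Cheap bound on the similarity of any pair drawn from ss x ts."""
    # Jaccard(a, b) <= min(|a|, |b|) / max(|a|, |b|).
    text_ub = max(min(a, b) / max(a, b)
                  for a in {len(s.tokens) for s in ss}
                  for b in {len(t.tokens) for t in ts})

    # The gap between the two bounding boxes lower-bounds every pair distance.
    def bbox(objs):
        xs, ys = [o.x for o in objs], [o.y for o in objs]
        return min(xs), max(xs), min(ys), max(ys)

    ax0, ax1, ay0, ay1 = bbox(ss)
    bx0, bx1, by0, by1 = bbox(ts)
    gap = math.hypot(max(0.0, ax0 - bx1, bx0 - ax1),
                     max(0.0, ay0 - by1, by0 - ay1))
    return text_ub * max(0.0, 1.0 - gap / radius)


def topk_join(S, T, k, radius):
    """Return up to k (score, s_id, t_id) triples, best first."""
    if k <= 0:
        return []
    sigs = build_signatures(S, T, radius)
    bounds = {g: upper_bound(ss, ts, radius) for g, (ss, ts) in sigs.items()}
    heap, seen = [], set()  # min-heap holding the current top-k
    for g in sorted(bounds, key=bounds.get, reverse=True):  # best-first
        if len(heap) == k and bounds[g] <= heap[0][0]:
            break  # no unvisited signature can yield a better pair
        ss, ts = sigs[g]
        for s in ss:
            for t in ts:
                if (s.oid, t.oid) in seen:
                    continue  # already verified via another signature
                seen.add((s.oid, t.oid))
                score = similarity(s, t, radius)
                if score <= 0:
                    continue
                if len(heap) < k:
                    heapq.heappush(heap, (score, s.oid, t.oid))
                elif score > heap[0][0]:
                    heapq.heapreplace(heap, (score, s.oid, t.oid))
    return sorted(heap, reverse=True)
```

Under this assumed measure, a pair that shares no token or lies at least `radius` apart scores zero and never enters the result, so fewer than k pairs may be returned. The early stop is safe because any positive-similarity pair not yet verified can only occur in signatures that have not been visited, and each of those has an upper bound no larger than the current k-th score.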

Keywords: Similarity Join; Spatio-textual Join; Top-k Join; Spatio-textual Signature
DOI: 10.1109/ICDE.2016.7498433
Indexed By: SCIE
Language: English
WOS Research Area: Computer Science; Engineering
WOS Subject: Computer Science, Artificial Intelligence; Computer Science, Information Systems; Engineering, Electrical & Electronic
WOS ID: WOS:000369006800020
Scopus ID: 2-s2.0-84980416753
Document Type: Journal article
Collection: Faculty of Science and Technology, Department of Computer and Information Science
Affiliations:
1. Tsinghua Univ, Dept Comp Sci, Tsinghua Natl Lab Informat Sci & Technol TNList, Beijing 100084, Peoples R China
2. RMIT Univ, Comp Sci & Info Tech, Melbourne, Vic, Australia
3. Univ Macau, Dept Comp & Informat Sci, Macau, Peoples R China
4. East China Grid, Shanghai, Peoples R China
Recommended Citation
GB/T 7714: Hu, Huiqi, Li, Guoliang, Bao, Zhifeng, et al. Top-k Spatio-Textual Similarity Join[J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(2): 551-565.
APA: Hu, H., Li, G., Bao, Z., Feng, J., Wu, Y., Gong, Z., & Xu, Y. (2016). Top-k spatio-textual similarity join. IEEE Transactions on Knowledge and Data Engineering, 28(2), 551-565.
MLA: Hu, Huiqi, et al. "Top-k Spatio-Textual Similarity Join." IEEE Transactions on Knowledge and Data Engineering 28.2 (2016): 551-565.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.