Status | Published |
Title | FastThetaJoin: An Optimization on Multi-way Data Stream θ-join with Range Constraints |
Author | Ziyue Hu (1, 2); Xiaopeng Fan (1); Yang Wang (1); Chengzhong Xu (3) |
Date | 2020-09-29 |
Conference Name | 20th International Conference on Algorithms and Architectures for Parallel Processing |
Source Publication | ICA3PP 2020: Algorithms and Architectures for Parallel Processing |
Conference Date | 2020/10/02-2020/10/04 |
Conference Place | New York, NY |
Country | USA |
Abstract | In this paper, we propose FastThetaJoin, an optimization technique for the θ-join operation on multi-way data streams, an essential query used in many data analytics tasks. The θ-join operation on multi-way data streams is notoriously difficult because it always involves a tremendous shuffle cost from data movement between multiple operation components, which makes it hard to implement efficiently in a distributed environment. Like previous methods, FastThetaJoin tries to minimize the number of θ-joins, but it differs from them in how it makes partitions, deletes unnecessary data items, and performs the Cartesian product. FastThetaJoin not only effectively minimizes the number of θ-joins but also substantially improves the efficiency of its operations in a distributed environment. We implemented FastThetaJoin in the framework of Spark Streaming, characterized by its efficient bucket implementation of parameterized windows. The experimental results show that, compared with existing solutions, the proposed method speeds up θ-join processing while reducing its overhead. The effect of the optimization is correlated with the nature of the data streams: the greater the data difference, the more apparent the optimization effect. |
Keyword | Θ-join ; Theta Join ; Multi-way Data Streams ; Data Streams ; Spark Streaming |
DOI | 10.1007/978-3-030-60245-1_12 |
Indexed By | CPCI-S |
Language | English |
WOS Research Area | Computer Science |
WOS Subject | Computer Science, Hardware & Architecture ; Computer Science, Software Engineering ; Computer Science, Theory & Methods |
WOS ID | WOS:000719289200012 |
Scopus ID | 2-s2.0-85092650298 |
Document Type | Conference paper |
Collection | Faculty of Science and Technology |
Corresponding Author | Yang Wang |
Affiliation | 1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China ; 2. University of Chinese Academy of Sciences, Beijing, China ; 3. University of Macau, Macau, China |
Recommended Citation GB/T 7714 | Ziyue Hu,Xiaopeng Fan,Yang Wang,et al. FastThetaJoin: An Optimization on Multi-way Data Stream θ-join with Range Constraints[C], 2020. |
APA | Hu, Z., Fan, X., Wang, Y., & Xu, C. (2020). FastThetaJoin: An Optimization on Multi-way Data Stream θ-join with Range Constraints. ICA3PP 2020: Algorithms and Architectures for Parallel Processing. |