Efficient parallel skyline query processing for high-dimensional data

Tang, Mingjie; Yu, Yongyang; Aref, Walid G.; Malluhi, Qutaibah M.; Ouzzani, Mourad

Author	Tang, Mingjie
Author	Yu, Yongyang
Author	Aref, Walid G.
Author	Malluhi, Qutaibah M.
Author	Ouzzani, Mourad
Date available	2024-07-17T07:14:43Z
Publication date	2019
Publication name	Proceedings - International Conference on Data Engineering
Source	Scopus
Identifier	http://dx.doi.org/10.1109/ICDE.2019.00251
ISBN	10844627
URI	http://hdl.handle.net/10576/56745
Abstract	Given a set of multidimensional data points, skyline queries retrieve those points that are not dominated by any other point in the set. Due to the ubiquitous use of skyline queries, such as in preference-based query answering and decision making, and the large amount of data that these queries have to deal with, enabling their scalable processing is of critical importance. However, there are several outstanding challenges that have not been well addressed. More specifically, in this paper, we tackle the data straggler and data skew challenges introduced by distributed skyline query processing, as well as the ensuing high computation cost of merging skyline candidates. We thus introduce a new efficient three-phase approach for large-scale processing of skyline queries. In the first preprocessing phase, the data is partitioned along the Z-order curve. We utilize a novel data partitioning approach that formulates data partitioning as an optimization problem to minimize the size of intermediate data. In the second phase, each compute node partitions the input data points into disjoint subsets, and then performs the skyline computation on each subset to produce skyline candidates in parallel. In the final phase, we build an index and employ an efficient algorithm to merge the generated skyline candidates. Extensive experiments demonstrate that the proposed skyline algorithm achieves more than one order of magnitude improvement in performance compared to existing state-of-the-art approaches.
Sponsor	III. EXPERIMENTAL RESULTS We evaluate the performance of the proposed techniques using synthetic and real-world data. We use a Hadoop cluster consisting of 6 computing nodes, and set up one Hadoop virtual machine cluster (version 2.6) on Amazon EC2 with 48 nodes; each node has an Intel Xeon E5-2666 v3 (Haswell) and 8 GB of memory. Figure 2 shows that the proposed approaches can achieve up to one order of magnitude speedup over the existing state-of-the-art approach. IV. CONCLUSIONS This paper demonstrates an efficient solution for the problem of parallel skyline query processing. Part of this work is supported by the National Natural Science Foundation of China (Grant No. 61802364).
Language	en
Publisher	IEEE Computer Society
Subject	Big data; Parallel computation; Skyline query
Title	Efficient parallel skyline query processing for high-dimensional data
Type	Conference
Pages	2113-2114
Volume number	2019-April
dc.accessType	Abstract Only
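
The three-phase approach described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation. It assumes that smaller values are preferred in every dimension and that coordinates are non-negative integers. It splits the Z-ordered points into equal-sized chunks instead of solving the paper's partitioning optimization problem, and it merges candidates with a plain pairwise check instead of an index. All function names (`dominates`, `skyline`, `z_value`, `partition_by_z_order`, `three_phase_skyline`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive quadratic skyline: keep the points no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def z_value(p: Point, bits: int = 8) -> int:
    """Z-order (Morton) value obtained by interleaving coordinate bits,
    most significant bit first."""
    z = 0
    for bit in range(bits - 1, -1, -1):
        for coord in p:
            z = (z << 1) | ((coord >> bit) & 1)
    return z


def partition_by_z_order(points: Sequence[Point], k: int, bits: int = 8) -> List[List[Point]]:
    """Phase 1 (simplified): sort along the Z-order curve and cut the
    sequence into at most k contiguous chunks of equal size."""
    ordered = sorted(points, key=lambda p: z_value(p, bits))
    size = max(1, -(-len(ordered) // k))  # ceiling division, at least 1
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def three_phase_skyline(points: Sequence[Point], k: int = 4, bits: int = 8) -> List[Point]:
    parts = partition_by_z_order(points, k, bits)
    # Phase 2: local skyline per partition (done in parallel in a real system).
    candidates = [p for part in parts for p in skyline(part)]
    # Phase 3 (simplified): merge candidates with another skyline pass.
    return skyline(candidates)
```

Because any point dominated within its own partition is also dominated globally, taking the skyline of the local candidates yields the same result as a single skyline pass over all points; partitioning only reduces the work of the final merge.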


