Similarity Group-by Operators for Multi-Dimensional Relational Data

Tang, Mingjie; Tahboub, Ruby Y.; Aref, Walid G.; Atallah, Mikhail J.; Malluhi, Qutaibah M.; Ouzzani, Mourad; Silva, Yasin N.

Author	Tang, Mingjie
Author	Tahboub, Ruby Y.
Author	Aref, Walid G.
Author	Atallah, Mikhail J.
Author	Malluhi, Qutaibah M.
Author	Ouzzani, Mourad
Author	Silva, Yasin N.
Date available	2021-05-20T04:35:57Z
Date issued	2016
Publication name	IEEE Transactions on Knowledge and Data Engineering
Source	Scopus
ISSN	10414347
URI	http://dx.doi.org/10.1109/TKDE.2015.2480400
URI	http://hdl.handle.net/10576/18429
Abstract	The SQL group-by operator plays an important role in summarizing and aggregating large datasets in a data analytics stack. While the standard group-by operator, which is based on equality, is useful in several applications, allowing similarity-aware grouping provides a more realistic view of real-world data that could lead to better insights. The Similarity SQL-based Group-By operator (SGB, for short) extends the semantics of the standard SQL Group-by by grouping data with similar but not necessarily equal values. While existing similarity-based grouping operators efficiently realize these approximate semantics, they primarily focus on one-dimensional attributes and treat multi-dimensional attributes independently. However, correlated attributes, such as in spatial data, are processed independently, and hence, groups in the multi-dimensional space are not detected properly. To address this problem, we introduce two new SGB operators for multi-dimensional data. The first operator is the clique (or distance-to-all) SGB, where all the tuples in a group are within some distance from each other. The second operator is the distance-to-any SGB, where a tuple belongs to a group if the tuple is within some distance from any other tuple in the group. Since a tuple may satisfy the membership criterion of multiple groups, we introduce three different semantics to deal with such a case: (i) eliminate the tuple, (ii) put the tuple in any one group, and (iii) create a new group for this tuple. We implement and test the new SGB operators and their algorithms inside PostgreSQL. The overhead introduced by these operators proves to be minimal and the execution times are comparable to those of the standard Group-by. The experimental study, based on TPC-H and social check-in data, demonstrates that the proposed algorithms can achieve up to three orders of magnitude enhancement in performance over baseline methods developed to solve the same problem. © 1989-2012 IEEE.
Sponsor	The authors would like to thank three anonymous reviewers for their criticism and comments. This publication was made possible by NPRP Grants 4-1534-1-247 and 09-622-1-090 from the Qatar National Research Fund (a member of Qatar Foundation) and by the National Science Foundation Grants IIS-1117766, CPS-1329979, and Science and Technology Center CCF-0939370; and by sponsors of the Center for Education and Research in Information Assurance and Security. The statements made herein are solely the responsibility of the authors.
Language	en
Publisher	IEEE Computer Society
Subject	multidimensional data; query processing; relational database; similarity query; SQL operators
Title	Similarity Group-by Operators for Multi-Dimensional Relational Data
Type	Conference
Pages	510-523
Issue number	2
Volume number	28
dc.accessType	Abstract Only
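
Illustrative sketch (not part of the record): the abstract describes two grouping semantics and three ways to handle a tuple that fits several groups. The Python below is a minimal, order-dependent sketch of those ideas under stated assumptions. It is not the authors' PostgreSQL implementation or their algorithms. The function names `sgb_any` and `sgb_all`, the `on_overlap` option strings, and the choice of Euclidean distance are all hypothetical and chosen only for illustration.

```python
from math import dist  # Euclidean distance; the metric is an assumption


def sgb_any(points, eps):
    """Distance-to-any grouping, sketched as connected components of the
    graph that links two tuples when they are within eps of each other."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive all-pairs comparison; quadratic, for illustration only.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


def sgb_all(points, eps, on_overlap="join_any"):
    """Clique (distance-to-all) grouping, sketched greedily in input order.
    on_overlap is a hypothetical switch for the three overlap semantics:
    "eliminate", "join_any", or "new_group"."""
    groups = []
    for p in points:
        # Groups the tuple could join: it must be within eps of every member.
        fits = [g for g in groups if all(dist(p, q) <= eps for q in g)]
        if len(fits) > 1 and on_overlap == "eliminate":
            continue  # (i) drop the tuple
        if not fits or (len(fits) > 1 and on_overlap == "new_group"):
            groups.append([p])  # (iii) new group, or no group fits
        else:
            fits[0].append(p)  # (ii) any one group; here, the first that fits
    return groups
```

For example, with `eps = 1` the chain `(0, 0), (1, 0), (2, 0)` forms a single group under `sgb_any`, because each tuple is within range of a neighbour, while `sgb_all` splits it, because the two end tuples are 2 apart. The greedy pass depends on input order, so its output is one valid grouping rather than a canonical one.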

Files in this item

There are no files associated with this item.

This item appears in the following collection(s)

Interdisciplinary Research and Smart Designs [15 items]
