The similarity-aware relational intersect database operator
Author | Al Marri, Wadha J. |
Author | Malluhi, Qutaibah |
Author | Ouzzani, Mourad |
Author | Tang, Mingjie |
Author | Aref, Walid G. |
Editor | Traina, Agma Juci Machado |
Editor | Traina Jr., Caetano |
Editor | Cordeiro, Robson Leonardo Ferreira |
Available date | 2016-05-01T13:33:02Z |
Publication Date | 2014 |
Publication Name | Similarity Search and Applications: 7th International Conference, SISAP 2014, Los Cabos, Mexico, October 29-31, 2014. Proceedings |
Resource | Scopus |
Citation | Al Marri, W.J., Malluhi, Q., Ouzzani, M., Tang, M., Aref, W.G. "The similarity-aware relational intersect database operator" (2014) Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8821, pp. 164-175. |
ISBN | 978-3-319-11987-8 |
ISBN | 978-3-319-11988-5 (Online) |
Abstract | Identifying similarities in large datasets is an essential operation in many applications such as bioinformatics, pattern recognition, and data integration. To make the underlying database system similarity-aware, the core relational operators have to be extended. Several similarity-aware relational operators have been proposed that introduce similarity processing at the database engine level, e.g., similarity joins and similarity group-by. This paper extends the semantics of the set intersection operator to operate over similar values. The paper describes the semantics of the similarity-based set intersection operator, and develops an efficient query processing algorithm for evaluating it. The proposed operator is implemented inside an open-source database system, namely PostgreSQL. Several queries from the TPC-H benchmark are extended to include similarity-based set intersection predicates. Performance results demonstrate up to three orders of magnitude speedup in performance over equivalent queries that only employ regular operators. |
Sponsor | Supported by NPRP grant 4-1534-1-247 from the Qatar National Research Fund and by National Science Foundation grants IIS 0916614, IIS 1117766, and IIS 0964639. |
Language | en |
Publisher | Springer International Publishing |
Series relation | Lecture Notes in Computer Science |
Subject | bioinformatics; data integration; pattern recognition; query processing; semantics; database operators; query processing algorithms; regular operators; relational operator; set intersection; similarity group-by; three orders of magnitude; TPC-H benchmarks |
Type | Conference Paper |
Pagination | 164-175 |
Volume Number | 8821 |
This item appears in the following Collection(s)
- Computer Science & Engineering [2402 items]
- Interdisciplinary & Smart Design [15 items]