The similarity-aware relational intersect database operator

Al Marri, Wadha J.; Malluhi, Qutaibah; Ouzzani, Mourad; Tang, Mingjie; Aref, Walid G.

Date

2014

Author

Al Marri, Wadha J.
Malluhi, Qutaibah
Ouzzani, Mourad
Tang, Mingjie
Aref, Walid G.

Abstract

Identifying similarities in large datasets is an essential operation in many applications such as bioinformatics, pattern recognition, and data integration. To make the underlying database system similarity-aware, the core relational operators have to be extended. Several similarity-aware relational operators have been proposed that introduce similarity processing at the database engine level, e.g., similarity joins and similarity group-by. This paper extends the semantics of the set intersection operator to operate over similar values. The paper describes the semantics of the similarity-based set intersection operator and develops an efficient query processing algorithm for evaluating it. The proposed operator is implemented inside an open-source database system, namely PostgreSQL. Several queries from the TPC-H benchmark are extended to include similarity-based set intersection predicates. Performance results demonstrate a speedup of up to three orders of magnitude over equivalent queries that employ only regular operators.

DOI/handle

http://dx.doi.org/10.1007/978-3-319-11988-5_15
http://hdl.handle.net/10576/4483

Collections

Computer Science & Engineering [2518 items]
Interdisciplinary & Smart Design [45 items]