The similarity-aware relational intersect database operator

Al Marri, Wadha J.; Malluhi, Qutaibah; Ouzzani, Mourad; Tang, Mingjie; Aref, Walid G. (Springer International Publishing, 2014, Conference Paper)

Identifying similarities in large datasets is an essential operation in many applications such as bioinformatics, pattern recognition, and data integration. To make the underlying database system similarity-aware, the core ...

Efficient processing of Hamming-distance-based similarity-search queries over MapReduce

Tang, Mingjie; Yu, Yongyang; Aref, Walid G.; Malluhi, Qutaibah M.; Ouzzani, Mourad (OpenProceedings.org, University of Konstanz, University Library, 2015, Conference Paper)

Similarity search is crucial to many applications. Of particular interest are two flavors of the Hamming distance range query, namely, the Hamming select and the Hamming join (Hamming-select and Hamming-join, respectively). ...

Approving updates in collaborative databases

Mershad, Khaleel; Malluhi, Qutaibah M.; Ouzzani, Mourad; Tang, Mingjie; Aref, Walid G. (Institute of Electrical and Electronics Engineers Inc., 2015, Conference Paper)

Data curation activities in collaborative databases mandate that collaborators interact until they converge and agree on the content of their data. Typically, updates by a member of the collaboration are made visible to ...

LocationSpark: A distributed in-memory data management system for big spatial data

Tang, Mingjie; Yu, Yongyang; Malluhi, Qutaibah M.; Ouzzani, Mourad; Aref, Walid G. (VLDB Endowment, 2015, Conference Paper)

We present LocationSpark, a spatial data processing system built on top of Apache Spark, a widely used distributed data processing system. LocationSpark offers a rich set of spatial query operators, e.g., range search, ...

Efficient parallel skyline query processing for high-dimensional data

Tang, Mingjie; Yu, Yongyang; Aref, Walid G.; Malluhi, Qutaibah M.; Ouzzani, Mourad (IEEE Computer Society, 2019, Conference Paper)

Given a set of multidimensional data points, skyline queries retrieve those points that are not dominated by any other points in the set. Due to the ubiquitous use of skyline queries, such as in preference-based query ...

Similarity Group-By operators for multi-dimensional relational data

Tang, Mingjie; Tahboub, Ruby Y.; Aref, Walid G.; Atallah, Mikhail J.; Malluhi, Qutaibah M.; et al. (Institute of Electrical and Electronics Engineers Inc., 2016, Conference Paper)

The SQL group-by operator plays an important role in summarizing and aggregating large datasets in a data analytics stack. The Similarity SQL-based Group-By operator (SGB, for short) extends the semantics of the standard ...

In-memory distributed matrix computation processing & optimization

Yu, Yongyang; Tang, Mingjie; Aref, Walid G.; Malluhi, Qutaibah M.; Abbas, Mostafa M.; et al. (IEEE Computer Society, 2017, Conference Paper)

The use of large-scale machine learning and data mining methods is becoming ubiquitous in many application domains ranging from business intelligence and bioinformatics to self-driving cars. These methods heavily rely on ...