Efficient processing of hamming-distance-based similarity-search queries over MapReduce
الملخص
Similarity search is crucial to many applications. Of particular interest are two flavors of the Hamming distance range query, namely, the Hamming select and the Hamming join (Hamming-select and Hamming-join, respectively). Hamming distance is widely used in approximate near neighbor search for high dimensional data, such as images and document collections. For example, using predefined similarity hash functions, high-dimensional data is mapped into one-dimensional binary codes that are, then linearly scanned to perform Hamming-distance comparisons. These distance comparisons on the binary codes are usually costly and, often involves excessive redundancies. This paper introduces a new index, termed the HA-Index, that speeds up distance comparisons and eliminates redundancies when performing the two flavors of Hamming distance range queries. An efficient search algorithm based on the HA-index is presented. A distributed version of the HA-index is introduced and algorithms for realizing Hamming distance-select and Hamming distance-join operations on a MapReduce platform are prototyped. Extensive experiments using real datasets demonstrates that the HA-index and the corresponding search algorithms achieve up to two orders of magnitude speedup over existing state-of-the-art approaches, while saving more than ten times in memory space.
المجموعات
- علوم وهندسة الحاسب [2266 items ]
وثائق ذات صلة
عرض الوثائق المتصلة بواسطة: العنوان، المؤلف، المنشئ والموضوع.
-
Substring search over encrypted data
Shikfa, Abdullatif ( Hamad bin Khalifa University Press (HBKU Press) , 2018 , Conference Paper)Our data, be it personal or professional, is increasingly outsourced. This results from the development of cloud computing in the past ten years, a paradigm that shifts computing to a utility. Even without realizing it, ... -
Pure random orthogonal search (PROS): A plain and elegant parameterless algorithm for global optimization
Plevris, Vagelis; Bakas, Nikolaos P.; Solorzano, German ( MDPI AG , 2021 , Article)A new, fast, elegant, and simple stochastic optimization search method is proposed, which exhibits surprisingly good performance and robustness considering its simplicity. We name the algorithm pure random orthogonal search ... -
MINTED: Multicast virtual network embedding in cloud data centers with delay constraints
Ayoubi, Sara; Assi, Chadi; Shaban, Khaled; Narayanan, Lata ( Institute of Electrical and Electronics Engineers Inc. , 2015 , Article)Network virtualization is regarded as the pillar of cloud computing, enabling the multi-tenancy concept where multiple Virtual Networks (VNs) can cohabit the same substrate network. With network virtualization, the problem ...