Efficient processing of hamming-distance-based similarity-search queries over MapReduce

Tang, Mingjie; Yu, Yongyang; Aref, Walid G.; Malluhi, Qutaibah M.; Ouzzani, Mourad

Author	Tang, Mingjie
Author	Yu, Yongyang
Author	Aref, Walid G.
Author	Malluhi, Qutaibah M.
Author	Ouzzani, Mourad
Date available	2024-07-17T07:14:49Z
Publication date	2015
Publication name	EDBT 2015 - 18th International Conference on Extending Database Technology, Proceedings
Source	Scopus
Identifier	http://dx.doi.org/10.5441/002/edbt.2015.32
URI	http://hdl.handle.net/10576/56770
Abstract	Similarity search is crucial to many applications. Of particular interest are two flavors of the Hamming distance range query, namely, the Hamming select and the Hamming join (Hamming-select and Hamming-join, respectively). Hamming distance is widely used in approximate near neighbor search for high-dimensional data, such as images and document collections. For example, using predefined similarity hash functions, high-dimensional data is mapped into one-dimensional binary codes that are then linearly scanned to perform Hamming-distance comparisons. These distance comparisons on the binary codes are usually costly and often involve excessive redundancies. This paper introduces a new index, termed the HA-Index, that speeds up distance comparisons and eliminates redundancies when performing the two flavors of Hamming distance range queries. An efficient search algorithm based on the HA-Index is presented. A distributed version of the HA-Index is introduced, and algorithms for realizing Hamming distance-select and Hamming distance-join operations on a MapReduce platform are prototyped. Extensive experiments using real datasets demonstrate that the HA-Index and the corresponding search algorithms achieve up to two orders of magnitude speedup over existing state-of-the-art approaches, while reducing memory usage by more than a factor of ten.
Language	en
Publisher	OpenProceedings.org, University of Konstanz, University Library
Subject	Binary codes; Bins; Clustering algorithms; Hash functions; Learning algorithms; Query processing; Redundancy; Corresponding search algorithms; Document collection; High dimensional data; Near neighbor searches; Orders of magnitude; Search Algorithms; Similarity search; State-of-the-art approach; Hamming distance
Title	Efficient processing of hamming-distance-based similarity-search queries over MapReduce
Type	Conference
Pages	361-372
dc.accessType	Open Access
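
The two query types named in the abstract can be illustrated with a minimal Python sketch of the linear-scan baseline that the abstract describes as costly. This is not the HA-Index and not the authors' code; the function names, the integer representation of binary codes, and the sample values are all assumptions made for illustration.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def hamming_select(codes, query, threshold):
    """Hamming-select by linear scan: codes within `threshold` bits of `query`."""
    return [c for c in codes if hamming_distance(c, query) <= threshold]


def hamming_join(left, right, threshold):
    """Hamming-join by nested-loop scan: pairs (l, r) within `threshold` bits."""
    return [(l, r) for l in left for r in right
            if hamming_distance(l, r) <= threshold]


# Hypothetical 4-bit codes, written as Python integers.
codes = [0b1010, 0b1000, 0b0101, 0b1111]
print(hamming_select(codes, 0b1011, 1))
print(hamming_join([0b1010, 0b0101], [0b1000, 0b0100], 1))
```

Both functions compare every candidate code against the query, which is the redundancy that an index over the binary codes is meant to avoid.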

Files in this item

Name: paper-263.pdf
Size: 1.881Mb
Format: PDF

This item appears in the following collections

Computer Science & Engineering [2484 items]

Related documents

Substring search over encrypted data

Studying effectiveness of Web search for fact checking

A single-objective Sequential Search Assistance-based Multi-Objective Algorithm Framework