Simple item record

Author: Tang, Mingjie
Author: Yu, Yongyang
Author: Aref, Walid G.
Author: Malluhi, Qutaibah M.
Author: Ouzzani, Mourad
Date available: 2024-07-17T07:14:49Z
Date issued: 2015
Publication name: EDBT 2015 - 18th International Conference on Extending Database Technology, Proceedings
Source: Scopus
Identifier: http://dx.doi.org/10.5441/002/edbt.2015.32
URI: http://hdl.handle.net/10576/56770
Abstract: Similarity search is crucial to many applications. Of particular interest are two flavors of the Hamming distance range query, namely, the Hamming select and the Hamming join (Hamming-select and Hamming-join, respectively). Hamming distance is widely used in approximate near neighbor search for high-dimensional data, such as images and document collections. For example, using predefined similarity hash functions, high-dimensional data is mapped into one-dimensional binary codes that are then linearly scanned to perform Hamming-distance comparisons. These distance comparisons on the binary codes are usually costly and often involve excessive redundancies. This paper introduces a new index, termed the HA-Index, that speeds up distance comparisons and eliminates redundancies when performing the two flavors of Hamming distance range queries. An efficient search algorithm based on the HA-Index is presented. A distributed version of the HA-Index is introduced, and algorithms for realizing Hamming-select and Hamming-join operations on a MapReduce platform are prototyped. Extensive experiments using real datasets demonstrate that the HA-Index and the corresponding search algorithms achieve up to two orders of magnitude speedup over existing state-of-the-art approaches, while reducing memory usage by more than a factor of ten.
Language: en
Publisher: OpenProceedings.org, University of Konstanz, University Library
Subject: Binary codes
Bins
Clustering algorithms
Hash functions
Learning algorithms
Query processing
Redundancy
Corresponding search algorithms
Document collection
High dimensional data
Near neighbor searches
Orders of magnitude
Search Algorithms
Similarity search
State-of-the-art approach
Hamming distance
Title: Efficient processing of Hamming-distance-based similarity-search queries over MapReduce
Type: Conference Paper
Pages: 361-372
Access type: Open Access
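
The abstract describes a baseline in which binary codes are linearly scanned and compared by Hamming distance. The following is a minimal Python sketch of that baseline only, not of the HA-Index itself, whose structure this record does not describe. The function names (`hamming_distance`, `hamming_select`, `hamming_join`) and the representation of binary codes as Python integers are assumptions made for illustration.

```python
from typing import List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    # XOR leaves a 1 exactly where the codes differ; count those bits.
    return bin(a ^ b).count("1")


def hamming_select(codes: List[int], query: int, threshold: int) -> List[int]:
    """Range query by linear scan: codes within `threshold` bits of `query`."""
    return [c for c in codes if hamming_distance(c, query) <= threshold]


def hamming_join(left: List[int], right: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Nested-loop join: all (l, r) pairs within `threshold` bits of each other."""
    return [
        (l, r)
        for l in left
        for r in right
        if hamming_distance(l, r) <= threshold
    ]


if __name__ == "__main__":
    codes = [0b0000, 0b0001, 0b0111, 0b1111]
    print(hamming_select(codes, 0b0000, 1))
    print(hamming_join([0b0000, 0b1111], [0b0001, 0b0111], 1))
```

The select compares the query against every code and the join compares every pair, which is the cost and redundancy that, according to the abstract, the HA-Index is designed to reduce.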

