Enabling indexing and retrieval of historical Arabic manuscripts through template matching based word spotting
Abstract
We present a holistic, segmentation-free, query-by-example word spotting technique based on template matching and apply it to a dataset of historical Arabic handwritten manuscript images. First, the document images and the query word images are pre-processed to separate text from the noisy background and are converted to binary images. A pixel-based approach then computes the similarity between the pre-processed query word template and the document images using the correlation similarity measure. Slight variations in font size are tolerated by adjusting the similarity threshold. Our robust pre-processing algorithm significantly enhances the performance of the learning-free, template-matching-based word spotting approach. The proposed technique is simple and efficient because it involves no time-consuming learning steps. Experiments with a historical Arabic dataset yield promising results. The technique generates the locations at which a query word image occurs, which is the fundamental step towards building searchable indexes for historical manuscripts.
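The pipeline described in the abstract (binarize, slide the query template over the page, keep positions whose correlation exceeds a threshold) can be illustrated with a minimal sketch. The abstract does not specify the pre-processing algorithm or the exact correlation formula, so the code below makes two assumptions: a plain global threshold stands in for the pre-processing step, and the similarity is the Pearson correlation coefficient between the template and each same-size window. The function names and the parameters `threshold` and `min_score` are hypothetical and chosen only for illustration.

```python
import numpy as np


def binarize(gray, threshold=128):
    """Assumed stand-in for pre-processing: dark ink -> 1, light background -> 0."""
    return (np.asarray(gray) < threshold).astype(np.float64)


def correlation_map(page, template):
    """Pearson correlation between the template and every same-size page window."""
    ph, pw = page.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    scores = np.zeros((ph - th + 1, pw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            window = page[y:y + th, x:x + tw]
            w = window - window.mean()
            denom = t_norm * np.sqrt((w ** 2).sum())
            # Constant windows (e.g. blank background) have zero variance;
            # their score is left at 0 instead of dividing by zero.
            if denom > 0:
                scores[y, x] = (t * w).sum() / denom
    return scores


def spot_word(page_gray, query_gray, min_score=0.8):
    """Return (row, col, score) for each window scoring at least min_score."""
    page = binarize(page_gray)
    template = binarize(query_gray)
    scores = correlation_map(page, template)
    rows, cols = np.nonzero(scores >= min_score)
    return [(int(r), int(c), float(scores[r, c])) for r, c in zip(rows, cols)]
```

Calling `spot_word(page_gray, query_gray)` on two grayscale arrays returns the top-left coordinates of candidate occurrences, which is the kind of location list an index could be built from. Lowering `min_score` loosens the match, which is one way to read the abstract's remark that slight font-size variations are tolerated by adjusting the similarity threshold. The double loop is written for clarity, not speed.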
Collections
- Computer Science & Engineering