Learning-free handwritten word spotting method for historical handwritten documents
Abstract
Word spotting on degraded and noisy historical documents can be challenging because of the computational time and memory required to scan the entire document image. This paper proposes a new, effective technique for multi-language word spotting that uses two different feature extraction techniques: Histogram of Oriented Gradients (HOG) and Speeded Up Robust Features (SURF). First, regions of interest (ROIs) are extracted using a cross-correlation measure; the extracted ROIs are then re-ranked using feature extraction and matching methods. The algorithm handles two scenarios: segmentation-based and segmentation-free. It also supports searching for words that occur once as well as words that occur multiple times in the image. Evaluations were conducted on the George Washington and HADARA datasets using a standard evaluation method. The proposed method shows improved performance over contemporary word spotting techniques.
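The two-stage idea described in the abstract (candidate ROIs from cross-correlation, then re-ranking by feature matching) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`ncc_map`, `top_rois`, `orientation_histogram`, `spot`) are hypothetical, a single global gradient-orientation histogram stands in for a full HOG descriptor, SURF is omitted entirely, and no non-maximum suppression is applied to the candidates.

```python
import numpy as np

def ncc_map(image, template):
    """Normalized cross-correlation score for every top-left placement of the template."""
    th, tw = template.shape
    ih, iw = image.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    scores = np.zeros((ih - th + 1, iw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            w = image[y:y + th, x:x + tw]
            w = w - w.mean()
            denom = np.sqrt((w ** 2).sum()) * t_norm
            scores[y, x] = (w * t).sum() / denom if denom > 0 else 0.0
    return scores

def top_rois(scores, k):
    """Return the k highest-scoring (y, x) positions (assumption: no non-maximum suppression)."""
    flat = np.argsort(scores, axis=None)[::-1][:k]
    return [tuple(int(v) for v in np.unravel_index(i, scores.shape)) for i in flat]

def orientation_histogram(patch, bins=9):
    """Simplified HOG-like descriptor: one magnitude-weighted histogram of
    unsigned gradient orientations over the whole patch (assumption: no cells or blocks)."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist

def spot(image, query, k=5):
    """Stage 1: pick k candidate ROIs by cross-correlation.
    Stage 2: re-rank them by descriptor distance to the query (smallest first)."""
    th, tw = query.shape
    q_desc = orientation_histogram(query)
    ranked = []
    for (y, x) in top_rois(ncc_map(image, query), k):
        d = orientation_histogram(image[y:y + th, x:x + tw])
        ranked.append((float(np.linalg.norm(d - q_desc)), (y, x)))
    ranked.sort()
    return [pos for _, pos in ranked]
```

Calling `spot(page, query_word, k=5)` on two 2-D grayscale arrays returns the candidate positions ordered from best to worst match. In a segmentation-based setting the candidates would be pre-segmented word images rather than sliding-window positions.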