Pashto Characters Recognition Using Multi-Class Enabled Support Vector Machine
Abstract
Over the last two decades, significant work has been reported on the recognition of cursive languages, especially Arabic, Urdu and Persian. Comparable work on Pashto is lacking because there is no standard database and little prior research, which acts as a major barrier for the research community. The slight differences in shape between Pashto characters pose an additional challenge for researchers. This paper presents an efficient optical character recognition (OCR) system for handwritten Pashto characters, based on a multi-class support vector machine (SVM) and multiple feature extraction techniques. These techniques are a zoning feature extractor, the discrete cosine transform (DCT), the discrete wavelet transform (DWT), Gabor filters and the histogram of oriented gradients (HoG). A hybrid feature map is developed by combining the individual feature maps. The work is carried out on a newly developed medium-sized dataset of handwritten Pashto characters that contains 200 handwritten samples for each of the 44 characters in the Pashto language. Recognition results are generated for the proposed model using both the individual feature maps and the hybrid feature map. Overall accuracy rates of 63.30%, 65.13%, 68.55%, 68.28%, 67.02% and 83% are obtained with the zoning, HoG, Gabor filter, DCT, DWT and hybrid feature maps, respectively. The applicability of the proposed model is also tested by comparing its results with those of a convolutional neural network model. The convolutional neural network-based model achieved an accuracy rate of 81.02%, which is lower than that of the multi-class support vector machine. The highest accuracy rate of 83%, obtained by the multi-class SVM model with the hybrid feature map, reflects the applicability of the proposed model.
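As a rough illustration of two of the ideas named in the abstract, the sketch below computes zoning features (the fraction of foreground pixels in each cell of a grid laid over a binary character image) and builds a hybrid feature vector by simple concatenation. The abstract does not state the grid size, the image size, or how the feature maps are combined, so the function names, the `zones_per_side` parameter and the use of concatenation are all assumptions made for illustration, not the authors' implementation.

```python
def zoning_features(image, zones_per_side=4):
    """Return the foreground-pixel density of each zone, in row-major order.

    `image` is a square list of lists holding 0 (background) or 1 (ink).
    The grid size is an assumption; the abstract does not specify it.
    """
    size = len(image)
    if size % zones_per_side != 0 or any(len(row) != size for row in image):
        raise ValueError("image must be square and evenly divisible into zones")
    step = size // zones_per_side
    features = []
    for zone_row in range(zones_per_side):
        for zone_col in range(zones_per_side):
            ink = sum(
                image[r][c]
                for r in range(zone_row * step, (zone_row + 1) * step)
                for c in range(zone_col * step, (zone_col + 1) * step)
            )
            features.append(ink / (step * step))
    return features


def hybrid_feature_map(*feature_vectors):
    """Concatenate several feature vectors computed for the same image.

    Concatenation is an assumed way of "combining" the feature maps.
    """
    combined = []
    for vector in feature_vectors:
        combined.extend(vector)
    return combined
```

In a full pipeline, a vector like this would be computed for every sample and passed, together with the character labels, to a multi-class SVM; the DCT, DWT, Gabor and HoG features would be additional arguments to `hybrid_feature_map`.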