Offline Pashto Characters Dataset for OCR Systems
Author | Khan, Sulaiman |
Author | Khan, Habib Ullah |
Author | Nazir, Shah |
Available date | 2022-12-27T09:39:12Z |
Publication Date | 2021-07-27 |
Publication Name | Security and Communication Networks |
Identifier | http://dx.doi.org/10.1155/2021/3543816 |
Citation | Khan, S., Khan, H. U., & Nazir, S. (2021). Offline Pashto Characters Dataset for OCR Systems. Security and Communication Networks, 2021. |
ISSN | 1939-0114 |
Abstract | In computer vision and artificial intelligence, image-based text recognition and analysis play a key role in text retrieval. Enabling a machine learning technique to recognize handwritten characters of a specific language requires a standard dataset. Acceptable handwritten character datasets are available for many languages, including English and Arabic. However, the lack of datasets for handwritten Pashto characters hinders the application of suitable machine learning algorithms for extracting useful insights. To address this issue, this study presents the first handwritten Pashto characters image dataset (HPCID) for scientific research. The dataset consists of 14,784 samples: 336 samples for each of the 44 characters in the Pashto character set. The samples were collected on A4-sized paper from students of the Pashto Department at the University of Peshawar, Khyber Pakhtunkhwa, Pakistan. In total, 336 students and faculty members contributed to the data collection phase. The dataset contains multisize, multifont, and multistyle characters of varying structures. |
Language | en |
Publisher | Hindawi |
Subject | Hand-written characters; Text recognition; Machine learning |
Type | Article |
Volume Number | 2021 |
ESSN | 1939-0122 |