Offline Pashto Characters Dataset for OCR Systems
Author | Khan, Sulaiman |
Author | Khan, Habib Ullah |
Author | Nazir, Shah |
Date available | 2022-12-27T09:39:12Z |
Publication date | 2021-07-27 |
Publication name | Security and Communication Networks |
Identifier | http://dx.doi.org/10.1155/2021/3543816 |
Citation | Khan, S., Khan, H. U., & Nazir, S. (2021). Offline Pashto Characters Dataset for OCR Systems. Security and Communication Networks, 2021. |
ISSN | 1939-0114 |
Abstract | In computer vision and artificial intelligence, image-based text recognition and analysis play a key role in text retrieval. Enabling a machine learning technique to recognize the handwritten characters of a specific language requires a standard dataset. Acceptable handwritten character datasets are available for many languages, including English and Arabic. However, the lack of datasets for handwritten Pashto characters hinders the application of suitable machine learning algorithms for extracting useful insights. To address this issue, this study presents the first handwritten Pashto characters image dataset (HPCID) for scientific research. The dataset consists of 14,784 samples: 336 samples for each of the 44 Pashto characters. The samples were collected on A4-sized paper from students of the Pashto Department at the University of Peshawar, Khyber Pakhtunkhwa, Pakistan. In total, 336 students and faculty members contributed to the data accumulation phase of the proposed database. The dataset contains multisize, multifont, and multistyle characters of varying structures. |
Language | en |
Publisher | Hindawi |
Subject | Hand-written characters; Text recognition; Machine learning |
Type | Article |
Volume number | 2021 |
ESSN | 1939-0122 |
This record appears in the following collections
- Accounting and Information Systems