Crosslingual automatic diacritization for Egyptian Colloquial Dialect

Zayyan, Ayman A.; Elmahdy, Mohamed; Husni, Husniza Binti; Yousf, Shahrul Azmi

Date

2016

Abstract

In this paper, we address the problem of missing diacritic marks in most written resources of dialectal Arabic. Our aim is to implement a scalable and extensible platform that automatically restores the diacritic marks of undiacritized dialectal Arabic text. Several rule-based and statistical techniques are proposed: a morphological analyzer-based approach, maximum likelihood estimation (MLE), and statistical n-gram models. The proposed platform also includes helper tools for text preprocessing and encoding conversion. The diacritization accuracy of each technique is evaluated in terms of Diacritic Error Rate (DER) and Word Error Rate (WER). The approach trains several n-gram models on different lexical units, and a data pool combining Modern Standard Arabic (MSA) and dialectal Arabic data was used to train the models. © 2016 IEEE.
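The abstract names a maximum likelihood estimate technique and evaluation by DER and WER without giving details. As a rough illustration only, the Python (3.9+) sketch below shows a word-level MLE baseline, in which each undiacritized word receives the diacritized form seen most often for it in training, together with one common way of computing DER and WER. It is not the authors' implementation and does not cover their n-gram models: the names `MLEDiacritizer`, `segment`, and `error_rates` are hypothetical, and the diacritic set (U+064B to U+0652), the handling of unseen words, and the counting conventions (every base letter counts toward DER; a word is wrong if any of its letters is) are all assumptions.

```python
from collections import Counter, defaultdict

# Assumption: the diacritic set is U+064B (fathatan) through U+0652 (sukun);
# other marks such as U+0670 (superscript alef) are treated as base letters.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(word: str) -> str:
    """Remove all diacritic marks, leaving only the base letters."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


def segment(word: str) -> list[tuple[str, str]]:
    """Split a word into (base letter, attached diacritics) pairs.

    Marks on one letter are sorted so that, e.g., shadda+fatha and
    fatha+shadda compare equal. A mark with no preceding letter is dropped.
    """
    pairs: list[tuple[str, str]] = []
    for ch in word:
        if ch not in DIACRITICS:
            pairs.append((ch, ""))
        elif pairs:
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
    return [(base, "".join(sorted(marks))) for base, marks in pairs]


class MLEDiacritizer:
    """Hypothetical word-level MLE baseline: each undiacritized word gets the
    diacritized form observed most often for it in the training data."""

    def __init__(self) -> None:
        self.counts: defaultdict[str, Counter[str]] = defaultdict(Counter)

    def train(self, diacritized_words: list[str]) -> None:
        for word in diacritized_words:
            self.counts[strip_diacritics(word)][word] += 1

    def diacritize(self, words: list[str]) -> list[str]:
        result = []
        for word in words:
            forms = self.counts.get(word)  # .get() does not insert new keys
            # Assumption: words never seen in training are left undiacritized.
            result.append(forms.most_common(1)[0][0] if forms else word)
        return result


def error_rates(reference: list[str], hypothesis: list[str]) -> tuple[float, float]:
    """Return (DER, WER) for word-aligned lists that share base letters."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis differ in length")
    char_errors = char_total = word_errors = 0
    for ref, hyp in zip(reference, hypothesis):
        ref_seg, hyp_seg = segment(ref), segment(hyp)
        if [b for b, _ in ref_seg] != [b for b, _ in hyp_seg]:
            raise ValueError(f"base letters differ: {ref!r} vs {hyp!r}")
        wrong = sum(r != h for (_, r), (_, h) in zip(ref_seg, hyp_seg))
        char_errors += wrong
        char_total += len(ref_seg)
        word_errors += wrong > 0  # a word counts as wrong if any letter is
    return char_errors / char_total, word_errors / len(reference)


if __name__ == "__main__":
    kataba = "\u0643\u064e\u062a\u064e\u0628\u064e"  # fatha on every letter
    kutub = "\u0643\u064f\u062a\u064f\u0628"  # damma on the first two letters
    model = MLEDiacritizer()
    model.train([kataba, kataba, kutub])
    undiacritized = [strip_diacritics(kataba), strip_diacritics(kutub)]
    predicted = model.diacritize(undiacritized)
    print(error_rates([kataba, kutub], predicted))  # (0.5, 0.5)
```

In the usage example both words share the same undiacritized form, so the MLE baseline outputs the more frequent training form for each; the second word is then wrong on all 3 of its letters, giving a DER of 3/6 and a WER of 1/2. This illustrates why a context-free word-level MLE is weak on ambiguous words and why context-aware n-gram models are worth comparing against it. DER definitions in the literature vary (for example, in whether case endings or letters with no diacritic are counted), so figures from this sketch are not directly comparable to the paper's.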

DOI/handle

http://dx.doi.org/10.1109/AICCSA.2016.7945665
http://hdl.handle.net/10576/22397

Collections

Computer Science & Engineering