Automatic diacritics restoration for modern standard Arabic text

Zayyan, Ayman A.; Elmahdy, Mohamed; Husni, Husniza binti; Al Ja'am, Jihad M.

Date

2016

Abstract

This paper investigates the problem of missing diacritic marks in most written Arabic resources. Our aim is to implement a scalable and extensible platform that automatically restores missing diacritic marks in Modern Standard Arabic text. Several rule-based and statistical techniques are proposed: a morphological analyzer-based approach, maximum likelihood estimation, and statistical n-gram models. The diacritization accuracy of each technique was evaluated using Diacritic Error Rate (DER) and Word Error Rate (WER). The proposed platform includes helper tools for text preprocessing and encoding conversion. It yielded a WER of 7.1% and a DER of 3.9%. When case endings were ignored, the platform yielded a WER of 5.1% and a DER of 2.7%. © 2016 IEEE.
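
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the exact definitions, so the following are assumptions: DER is taken as the fraction of letters whose attached diacritics differ from the reference, WER as the fraction of words containing at least one such letter, and "ignoring the case ending" as skipping the last letter of every word. The function names `split_letters` and `error_rates` are hypothetical and are not taken from the paper's platform.

```python
# Arabic diacritic marks (tashkeel), U+064B (fathatan) through U+0652 (sukun).
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}


def split_letters(word):
    """Return a list of (letter, marks) pairs for one word."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS and pairs:
            # Attach the mark to the most recent base letter.
            letter, marks = pairs[-1]
            pairs[-1] = (letter, marks + ch)
        elif ch not in DIACRITICS:
            pairs.append((ch, ""))
    return pairs


def error_rates(reference, hypothesis, ignore_case_ending=False):
    """Return (DER, WER) as fractions, under the assumed definitions."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("texts must contain the same number of words")

    letters = letter_errors = word_errors = 0
    for ref_word, hyp_word in zip(ref_words, hyp_words):
        ref_pairs = split_letters(ref_word)
        hyp_pairs = split_letters(hyp_word)
        if [l for l, _ in ref_pairs] != [l for l, _ in hyp_pairs]:
            raise ValueError("words differ in their base letters")
        if ignore_case_ending:
            # Assumption: the case ending sits on the final letter.
            ref_pairs, hyp_pairs = ref_pairs[:-1], hyp_pairs[:-1]
        # Compare marks as sets so that mark order (e.g. shadda + fatha)
        # does not count as an error.
        wrong = sum(
            set(r) != set(h)
            for (_, r), (_, h) in zip(ref_pairs, hyp_pairs)
        )
        letters += len(ref_pairs)
        letter_errors += wrong
        word_errors += wrong > 0

    der = letter_errors / letters if letters else 0.0
    wer = word_errors / len(ref_words) if ref_words else 0.0
    return der, wer


FATHA, DAMMA = "\u064E", "\u064F"
# "kataba alwaladu": kaf, teh, beh / alef, lam, waw, lam, dal.
first = "\u0643" + FATHA + "\u062A" + FATHA + "\u0628" + FATHA
stem = "\u0627\u0644\u0648" + FATHA + "\u0644" + FATHA + "\u062F"
reference = first + " " + stem + DAMMA
hypothesis = first + " " + stem + FATHA  # wrong case ending on the last word

print(error_rates(reference, hypothesis))                           # (0.125, 0.5)
print(error_rates(reference, hypothesis, ignore_case_ending=True))  # (0.0, 0.0)
```

In this toy example the only mistake is a case-ending error, so both rates drop to zero once the final letter of each word is excluded, which mirrors why the reported figures improve when the case ending is ignored.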

DOI/handle

http://dx.doi.org/10.1109/ISCAIE.2016.7575067
http://hdl.handle.net/10576/21085

Collections

Computer Science & Engineering