Simple but not naive: Fine-grained Arabic dialect identification using only n-grams

Eltanbouly, Sohaila; Bashendy, May; Elsayed, Tamer

View / Open

W19-4624.pdf (267.9Kb)

Date

2019

Author

Eltanbouly, Sohaila
Bashendy, May
Elsayed, Tamer

Abstract

This paper presents the participation of the Qatar University team in the MADAR shared task, which addresses sentence-level fine-grained Arabic Dialect Identification over 25 Arabic dialects in addition to Modern Standard Arabic. Arabic Dialect Identification is not a trivial task, since different dialects share some features, e.g., the same character set and some vocabulary. We adopted a very simple approach in terms of extracted features and classification models: we use only word and character n-grams as features, and Naive Bayes models as classifiers. Surprisingly, this simple approach achieved non-naive performance. The official results, reported on a held-out test set, show that our best submitted run identifies the dialect of a given sentence with an accuracy of 64.58%.
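
To make the approach in the abstract concrete, the following is a minimal sketch of a Naive Bayes classifier over word and character n-grams, written in plain Python. It is not the authors' implementation. The n-gram orders, the add-one smoothing, the class and function names, the three illustrative labels, and the toy training sentences are all assumptions chosen for the example. None of them come from the MADAR data or from the paper.

```python
import math
from collections import Counter, defaultdict


def extract_ngrams(sentence, word_n=(1, 2), char_n=(2, 3)):
    """Return word and character n-grams as tagged strings.

    The orders (word 1-2, char 2-3) are assumed defaults, not the paper's.
    """
    feats = []
    words = sentence.split()
    for n in word_n:
        for i in range(len(words) - n + 1):
            feats.append("w:" + " ".join(words[i:i + n]))
    for n in char_n:
        for i in range(len(sentence) - n + 1):
            feats.append("c:" + sentence[i:i + n])
    return feats


class NgramNaiveBayes:
    """Multinomial Naive Bayes with add-alpha smoothing over n-gram counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            feats = extract_ngrams(sentence)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        # Total n-gram count per class, used in the likelihood denominator.
        self.totals = {y: sum(c.values()) for y, c in self.feat_counts.items()}
        return self

    def predict(self, sentence):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        # N-grams never seen in training are ignored for every class.
        feats = [f for f in extract_ngrams(sentence) if f in self.vocab]
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            denom = self.totals[label] + self.alpha * vocab_size
            for f in feats:
                score += math.log((self.feat_counts[label][f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hand-written sentences with illustrative labels (not MADAR data).
train_sentences = [
    "عايز اروح البيت دلوقتي",
    "ازيك عامل ايه",
    "ابغى اروح البيت الحين",
    "شلونك شخبارك",
    "أريد أن أذهب إلى البيت الآن",
    "كيف حالك اليوم",
]
train_labels = ["Cairo", "Cairo", "Doha", "Doha", "MSA", "MSA"]

model = NgramNaiveBayes().fit(train_sentences, train_labels)
print(model.predict("عايز اكل دلوقتي"))
print(model.predict("ابغى اكل الحين"))
```

Character n-grams let the model pick up on sub-word cues even when a whole word was never seen in training, which is one plausible reason such a simple feature set can remain competitive when dialects share much of their vocabulary.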

DOI/handle

http://hdl.handle.net/10576/60891

Collections

Computer Science and Engineering [2482 items]