Author: Eltanbouly, Sohaila
Author: Bashendy, May
Author: Elsayed, Tamer
Available date: 2024-11-05T06:05:20Z
Publication Date: 2019
Publication Name: ACL 2019 - 4th Arabic Natural Language Processing Workshop, WANLP 2019 - Proceedings of the Workshop
Abstract: This paper presents the participation of the Qatar University team in the MADAR shared task, which addresses sentence-level fine-grained Arabic Dialect Identification over 25 Arabic dialects in addition to Modern Standard Arabic. Arabic Dialect Identification is not a trivial task because different dialects share some features, e.g., the same character set and some vocabulary. We opted for a very simple approach in terms of extracted features and classification models: we use only word and character n-grams as features and Naive Bayes models as classifiers. Surprisingly, this simple approach achieved non-naive performance. The official results, reported on a held-out test set, show that our best submitted run identifies the dialect of a given sentence with an accuracy of 64.58%.
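The approach named in the abstract (word and character n-grams fed to a Naive Bayes classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the n-gram orders, the add-one smoothing, the three toy training sentences, and the labels `CAI`, `BEI`, and `DOH` are all assumptions chosen for illustration, and the helper names `extract_ngrams` and `NgramNaiveBayes` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def extract_ngrams(sentence, char_orders=(1, 2, 3), word_orders=(1,)):
    """Return word and character n-gram features for one sentence."""
    tokens = sentence.split()
    features = []
    for n in word_orders:
        for i in range(len(tokens) - n + 1):
            features.append("w:" + " ".join(tokens[i:i + n]))
    # Pad with spaces so word boundaries show up in character n-grams.
    padded = " " + " ".join(tokens) + " "
    for n in char_orders:
        for i in range(len(padded) - n + 1):
            features.append("c:" + padded[i:i + n])
    return features


class NgramNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (an assumption, not from the paper)

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            feats = extract_ngrams(sentence)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(fc.values()) for c, fc in self.feature_counts.items()}
        return self

    def predict(self, sentence):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            denom = self.totals[label] + self.alpha * vocab_size
            for feat in extract_ngrams(sentence):
                if feat not in self.vocab:
                    continue  # skip n-grams never seen in training
                numer = self.feature_counts[label][feat] + self.alpha
                score += math.log(numer / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data for illustration only; not taken from the MADAR corpus.
train_sentences = ["ازيك عامل ايه", "كيفك شو اخبارك", "شلونك شخبارك"]
train_labels = ["CAI", "BEI", "DOH"]
model = NgramNaiveBayes().fit(train_sentences, train_labels)
print(model.predict("ازيك"))
```

With real data, the same two pieces (a feature extractor and a count-based classifier) would be trained on the labeled sentences of the shared task and scored on the held-out test set; the reported 64.58% accuracy refers to the authors' system, not to this sketch.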
Publisher: Association for Computational Linguistics (ACL)
Subject: Character sets; Classification (of information); Arabic dialects; Dialect identification; Fine grained; Modern standards; Qatar university; Sentence level; Simple approach; University teams; Bayesian networks
Title: Simple but not naive: Fine-grained Arabic dialect identification using only n-grams
Access type: Open Access
