Normalization of Arabic Dialects into Modern Standard Arabic using BERT and GPT-2
Journal of Data Mining & Digital Humanities, 2024, Vol. NLP4DH
Abstract
We present an encoder-decoder model for normalizing Arabic dialects into Modern Standard Arabic (MSA), built from BERT-based and GPT-2-based models. Arabic is a language of many dialects that differ from MSA not only in pronunciation but also in morphology, grammar, and lexical choice. This diversity can be troublesome even for a native Arabic speaker, let alone a computer. Several NLP tools work well for MSA and for some of the main dialects but fail to cover the Arabic language as a whole. Based on our manual evaluation, our model normalizes sentences entirely correctly 46% of the time and almost correctly 26% of the time.