Fine-Tuning MARBERT for Sentiment Analysis in Jordanian Arabic Dialects Using a Synthetic Dialectal Corpus
Abstract
Arabic Natural Language Processing (NLP) has recently witnessed remarkable progress with the emergence of transformer-based architectures such as AraBERT and MARBERT. However, dialectal variation across the Arab world continues to pose a substantial challenge to model generalization, particularly for underrepresented dialects such as Jordanian Arabic. The present study introduces an efficient end-to-end framework for evaluating and improving sentiment analysis performance on Jordanian social media data. A dedicated corpus of 900 authentic social media posts was collected through a Python-based scraping and preprocessing pipeline, designed to capture Jordanian lexical markers, code-switching with English tokens, emojis, and platform-specific linguistic noise. The dataset was evenly distributed across three sentiment categories (positive, negative, and neutral) and subsequently partitioned into training, validation, and test sets following an 80/10/10 ratio. We fine-tuned the MARBERT model on this curated corpus using transfer learning and evaluated its performance using macro-averaged precision, recall, and F1-score. The results indicate a marked improvement in both dialect recognition and sentiment differentiation compared with baseline performance, with the fine-tuned model achieving a macro F1-score of 0.88. This study contributes an openly reproducible pipeline for low-resource dialect modeling, offering methodological insight into sentiment analysis for Jordanian Arabic and establishing a foundation for future validation on larger, human-annotated datasets.
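
To illustrate the kind of fine-tuning and evaluation setup the abstract describes, the following is a minimal sketch using the Hugging Face Transformers library with the public MARBERT checkpoint (assumed here to be "UBC-NLP/MARBERT"). The file names, label encoding, sequence length, and hyperparameters are illustrative assumptions, not the authors' reported settings; the 80/10/10 split files are taken to be produced by the scraping and preprocessing pipeline.

# Sketch: fine-tuning MARBERT for 3-class sentiment classification on a
# Jordanian-dialect corpus. Assumes CSV files with "text" and "label" columns
# (labels: 0 = negative, 1 = neutral, 2 = positive); all settings are illustrative.
import numpy as np
from datasets import load_dataset
from sklearn.metrics import precision_recall_fscore_support
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "UBC-NLP/MARBERT"  # assumed public checkpoint on the Hugging Face Hub

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

# Hypothetical 80/10/10 split files emitted by the preprocessing pipeline.
dataset = load_dataset("csv", data_files={"train": "train.csv",
                                          "validation": "valid.csv",
                                          "test": "test.csv"})

def tokenize(batch):
    # Truncate/pad posts to a fixed length; 128 tokens is an assumed cap for short posts.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Macro-averaged precision, recall, and F1, matching the metrics in the abstract.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average="macro")
    return {"precision": precision, "recall": recall, "macro_f1": f1}

training_args = TrainingArguments(
    output_dir="marbert-jordanian-sentiment",
    num_train_epochs=3,                # assumed; the study's schedule may differ
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    compute_metrics=compute_metrics,
)

trainer.train()
print(trainer.evaluate(dataset["test"]))  # macro metrics on the held-out test set

A setup along these lines is what "transfer learning" refers to in this context: all MARBERT encoder weights are updated jointly with a newly initialized three-way classification head, rather than training a model from scratch on the 900-post corpus.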
