Research Article

Fine-Tuning MARBERT for Sentiment Analysis in Jordanian Arabic Dialects Using a Synthetic Dialectal Corpus

Authors

  • Hashem AL Drous Independent Researcher
  • Hashem Barakat Independent Researcher
  • Anas Bani Atta Assistant Professor, Faculty of Business, Middle East University, Amman, Jordan

Abstract

Arabic Natural Language Processing (NLP) has recently witnessed remarkable progress with the emergence of transformer-based architectures such as AraBERT and MARBERT. However, dialectal variation across the Arab world continues to pose a substantial challenge to model generalization, particularly for underrepresented dialects such as Jordanian Arabic. The present study introduces an efficient end-to-end framework for evaluating and improving sentiment analysis performance on Jordanian social media data. A dedicated corpus of 900 authentic social media posts was collected through a Python-based scraping and preprocessing pipeline designed to capture Jordanian lexical markers, code-switching with English tokens, emojis, and platform-specific linguistic noise. The dataset was evenly distributed across three sentiment categories (positive, negative, and neutral) and then partitioned into training, validation, and test sets in an 80/10/10 ratio. We fine-tuned the MARBERT model on this curated corpus using transfer learning and evaluated its performance with macro-averaged precision, recall, and F1-score. The results indicate a marked improvement in both dialect recognition and sentiment differentiation compared with baseline performance, with the fine-tuned model achieving a macro F1-score of 0.88. This study contributes an openly reproducible pipeline for low-resource dialect modeling, offering methodological insight into sentiment analysis for Jordanian Arabic and establishing a foundation for future validation on larger, human-annotated datasets.
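As an illustration of the data partitioning and the evaluation metric named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the function names `stratified_split` and `macro_f1`, the random seed, and the placeholder posts are assumptions introduced here for illustration only. The sketch assumes 900 posts with 300 per class, which is one reading of "evenly distributed", and a per-class (stratified) split, which the abstract does not explicitly confirm.

```python
import random
from collections import defaultdict

LABELS = ("positive", "negative", "neutral")


def stratified_split(items, labels, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split (item, label) pairs into train/val/test, class by class."""
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append((item, label))

    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_val = round(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Placeholder data standing in for the corpus (not real posts).
posts = [f"post_{i}" for i in range(900)]
post_labels = [LABELS[i % 3] for i in range(900)]
train, val, test = stratified_split(posts, post_labels)
print(len(train), len(val), len(test))
```

Under these assumptions the split yields 720 training, 90 validation, and 90 test posts, with each class equally represented in every partition. Macro averaging weights the three classes equally, so a model that neglects one sentiment class is penalized even when its overall accuracy is high.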

Article information

Journal

Journal of Computer Science and Technology Studies

Volume (Issue)

7 (11)

Pages

373-386

Published

2025-11-09

How to Cite

AL Drous, H., Barakat, H., & Bani Atta, A. (2025). Fine-Tuning MARBERT for Sentiment Analysis in Jordanian Arabic Dialects Using a Synthetic Dialectal Corpus. Journal of Computer Science and Technology Studies, 7(11), 373-386. https://doi.org/10.32996/jcsts.2025.7.11.36

Keywords:

Arabic NLP, Sentiment Analysis, Dialectal Arabic