Fine-Tuning MARBERT for Sentiment Analysis in Jordanian Arabic Dialects Using a Synthetic Dialectal Corpus
Abstract
Arabic Natural Language Processing (NLP) has recently witnessed remarkable progress with the emergence of transformer-based architectures such as AraBERT and MARBERT. However, dialectal variation across the Arab world continues to pose a substantial challenge to model generalization, particularly for underrepresented dialects such as Jordanian Arabic. The present study introduces an efficient end-to-end framework for evaluating and improving sentiment analysis performance on Jordanian social media data. A dedicated corpus of 900 authentic social media posts was collected through a Python-based scraping and preprocessing pipeline, designed to capture Jordanian lexical markers, code-switching with English tokens, emojis, and platform-specific linguistic noise. The dataset was evenly distributed across three sentiment categories (positive, negative, and neutral) and subsequently partitioned into training, validation, and test sets following an 80/10/10 ratio. We fine-tuned the MARBERT model on this curated corpus using transfer learning and evaluated its performance using macro-averaged precision, recall, and F1-score. The results indicate a marked improvement in both dialect recognition and sentiment differentiation compared with baseline performance, with the fine-tuned model achieving a macro F1-score of 0.88. This study contributes an openly reproducible pipeline for low-resource dialect modeling, offering methodological insight into sentiment analysis for Jordanian Arabic and establishing a foundation for future validation on larger, human-annotated datasets.
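
To illustrate the kind of fine-tuning and evaluation setup the abstract describes, the following is a minimal sketch using the Hugging Face Transformers library with the public MARBERT checkpoint (assumed here to be "UBC-NLP/MARBERT"). The file names, label encoding, sequence length, and hyperparameters are illustrative assumptions, not the authors' reported settings; the 80/10/10 split files are taken to be produced by the scraping and preprocessing pipeline.

# Sketch: fine-tuning MARBERT for 3-class sentiment classification on a
# Jordanian-dialect corpus. Assumes CSV files with "text" and "label" columns
# (labels: 0 = negative, 1 = neutral, 2 = positive); all settings are illustrative.
import numpy as np
from datasets import load_dataset
from sklearn.metrics import precision_recall_fscore_support
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "UBC-NLP/MARBERT"  # assumed public checkpoint on the Hugging Face Hub

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

# Hypothetical 80/10/10 split files emitted by the preprocessing pipeline.
dataset = load_dataset("csv", data_files={"train": "train.csv",
                                          "validation": "valid.csv",
                                          "test": "test.csv"})

def tokenize(batch):
    # Truncate/pad posts to a fixed length; 128 tokens is an assumed cap for short posts.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Macro-averaged precision, recall, and F1, matching the metrics in the abstract.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average="macro")
    return {"precision": precision, "recall": recall, "macro_f1": f1}

training_args = TrainingArguments(
    output_dir="marbert-jordanian-sentiment",
    num_train_epochs=3,                # assumed; the study's schedule may differ
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    compute_metrics=compute_metrics,
)

trainer.train()
print(trainer.evaluate(dataset["test"]))  # macro metrics on the held-out test set

A setup along these lines is what "transfer learning" refers to in this context: all MARBERT encoder weights are updated jointly with a newly initialized three-way classification head, rather than training a model from scratch on the 900-post corpus.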
