Research Article

A Comparative Study of Machine Learning Models for Predicting Customer Churn in Retail Banking: Insights from Logistic Regression, Random Forest, GBM, and SVM

Authors

  • Md Parvez Ahmed, Master of Science in Information Technology, Washington University of Science and Technology
  • Md Arif, Department of Management Science and Quantitative Methods, Gannon University, USA
  • Abdullah Al Mamun, Department of Computer & Info Science, Gannon University, Erie, Pennsylvania, USA
  • Fuad Mahmud, Department of Information Assurance and Cybersecurity, Gannon University, USA
  • Tauhedur Rahman, Dahlkemper School of Business, Gannon University, USA
  • Md Jamil Ahmmed, Department of Information Technology Project Management, Business Analytics, St. Francis College, USA
  • Sanjida Nowshin Mou, Department of Management Science and Quantitative Methods, Gannon University, USA
  • Pinky Akter, Master of Science in Information Technology, Washington University of Science and Technology, USA
  • Muhammad Shoyaibur Rahman Chowdhury, Department of Information Technology, Gannon University, USA
  • Md Kafil Uddin, Dahlkemper School of Business, Gannon University, USA

Abstract

Customer churn is a significant challenge in retail banking: it causes substantial financial losses and undermines long-term growth. This study compares four machine learning models, Logistic Regression, Random Forest, Gradient Boosting Machine (GBM), and Support Vector Machine (SVM), for predicting customer churn. Using a dataset from a leading bank, we performed data preprocessing and feature engineering, then evaluated each model on accuracy, precision, recall, F1-score, and AUC-ROC. GBM outperformed the other models, achieving an accuracy of 87.2% and an AUC-ROC of 0.91, which indicates a strong ability to distinguish churned from non-churned customers. Random Forest followed closely, while SVM and Logistic Regression reached moderate accuracy. These results show that machine learning can strengthen customer retention strategies in banking: by identifying at-risk customers and the factors that contribute to churn, banks can target interventions to improve satisfaction and loyalty. Future research could explore real-time data analysis and the integration of qualitative customer insights to refine predictive models and retention strategies.
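
For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, F1-score, and AUC-ROC are conventionally computed for a binary churn label. It is not the authors' code: the function names (confusion_counts, classification_metrics, auc_roc), the 0.5 decision threshold, and the eight toy customers are all assumptions made for illustration, and the toy values are not taken from the study's dataset.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 means 'churned'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (churned, retained) pairs in which the
    churned customer received the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration, not from the study).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]                    # 1 = customer churned
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]    # hypothetical model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # assumed 0.5 threshold

print(classification_metrics(y_true, y_pred))
print(auc_roc(y_true, scores))
```

On these toy labels, accuracy, precision, recall, and F1 each come out to 0.75, and the AUC-ROC is 0.9375. The pairwise AUC computation is quadratic in the number of customers, which is fine for a sketch; a library implementation would normally be used on a real dataset.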

Article information

Journal

Journal of Computer Science and Technology Studies

Volume (Issue)

6 (4)

Pages

92-101

Published

2024-10-08

How to Cite

Md Parvez Ahmed, Md Arif, Abdullah Al Mamun, Fuad Mahmud, Tauhedur Rahman, Md Jamil Ahmmed, Sanjida Nowshin Mou, Pinky Akter, Muhammad Shoyaibur Rahman Chowdhury, & Md Kafil Uddin. (2024). A Comparative Study of Machine Learning Models for Predicting Customer Churn in Retail Banking: Insights from Logistic Regression, Random Forest, GBM, and SVM. Journal of Computer Science and Technology Studies, 6(4), 92–101. https://doi.org/10.32996/jbms.2024.6.4.12

Keywords

Customer Churn, Retail Banking, Machine Learning, Predictive Modeling, Logistic Regression, Random Forest, Gradient Boosting Machine (GBM), Support Vector Machine (SVM)