Comparative Analysis of Machine Learning Models for Data Classification: An In-Depth Exploration

Abdul Wajid Fazil; Musawer Hakimi; Rohullah Akbari; Mohammad Mustafa Quchi; Khudai Qul Khaliqyar

doi:10.32996/jcsts.2023.5.4.16

Research Article

Authors

Abdul Wajid Fazil Lecturer, Department of Information Systems, Badakhshan University, Afghanistan
Musawer Hakimi Lecturer, Department of Computer Science, Samangan University, Afghanistan https://orcid.org/0009-0001-6591-2452
Rohullah Akbari Student, Information Systems Department, Kabul University, Afghanistan
Mohammad Mustafa Quchi Lecturer, Department of Network Engineering, Faryab University, Afghanistan
Khudai Qul Khaliqyar Lecturer, Department of IT, Badakhshan University, Afghanistan

Abstract

This research examines data classification with three machine learning models: Random Forest, Support Vector Machine (SVM) and Logistic Regression. The dataset, derived from the Australian Government's Bureau of Meteorology, comprises weather observations from 2008 to 2017, with additional columns such as 'RainToday' and the target variable 'RainTomorrow'. The models are evaluated with Accuracy Score, Jaccard Index, F1-Score, Log Loss, Recall Score and Precision Score. Using the NumPy, pandas, Matplotlib and scikit-learn libraries, data pre-processing involves one-hot encoding, balancing to address class imbalance, and splitting the data into training and test sets. Results are reported through metrics such as ROC-AUC, Log Loss and Jaccard Score; Random Forest achieves the highest ROC-AUC (0.98), compared with SVM (0.89) and Logistic Regression (0.88). The analysis also examines the confusion matrix of each model, providing insight into its predictive accuracy. The study offers insight into the effectiveness of these models for weather prediction, with Random Forest emerging as a robust choice. The methodology can be extended to other classification tasks, providing a foundation for applying machine learning in other domains.
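As an illustration of the evaluation metrics named in the abstract, the following minimal sketch computes Accuracy, Precision, Recall, F1-Score, Jaccard Index and Log Loss for a binary 'RainTomorrow'-style target from their standard definitions. It is not the authors' code: the function names (`confusion_counts`, `classification_metrics`, `binary_log_loss`), the 0.5 decision threshold and the toy labels and probabilities are assumptions made for the example, and none of the values come from the paper's dataset.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Compute the threshold-based metrics from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Jaccard index of the positive class: |intersection| / |union|.
    jaccard = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "jaccard": jaccard,
    }


def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels under y_prob."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


# Toy data (invented for illustration): 1 = rain tomorrow, 0 = no rain.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
loss = binary_log_loss(y_true, y_prob)
print(metrics)
print(loss)
```

Accuracy, Precision, Recall, F1-Score and the Jaccard Index depend only on the thresholded predictions, whereas Log Loss is computed from the predicted probabilities, so two models with identical confusion matrices can still differ in Log Loss.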

Article information

Journal

Journal of Computer Science and Technology Studies

Volume (Issue)

5 (4)

DOI

https://doi.org/10.32996/jcsts.2023.5.4.16

Pages

160-168

Published

2023-12-04
