Comparative analysis of explainable machine learning models for cancer classification using cytological features
Abstract
Breast cancer is among the leading causes of cancer-related deaths globally, with the greatest impact in low-resource, high-volume health care facilities where timely and accurate screening is paramount. This study presents an explainable machine learning framework for breast cancer diagnosis based on quantitative features extracted from fine needle aspirate images of breast masses. The dataset contains 569 samples described by 30 real-valued predictors of cell nuclei morphology; it has no missing values and a moderately imbalanced class distribution. A structured preprocessing pipeline is applied, comprising a split into training and held-out test sets, feature normalization, and careful handling of class imbalance. Several classification models, among them Random Forest and Gaussian Naive Bayes, are compared in terms of predictive accuracy and reliability. Experimental results show that the Random Forest model achieves the best performance, with an accuracy of 0.96 on the held-out test set and balanced precision and recall across the benign and malignant classes. Its confusion matrix shows a low misclassification rate, with only three false positives and three false negatives. In contrast, Gaussian Naive Bayes reaches a lower accuracy of 0.93 and is less sensitive to malignant cases, because its feature-independence assumption is not fully met in the dataset, as verified by correlation analysis. These results are further supported by receiver operating characteristic analysis, with an area under the curve of 1.00 for Random Forest and 0.99 for Gaussian Naive Bayes. The findings emphasize the importance of model selection in clinical decision support systems, especially where false negatives must be minimized. The proposed framework emphasizes interpretability and practical usability, making it suitable for deployment-oriented screening workflows in resource-constrained settings.
This paper shows that explainable machine learning models trained on structured cytological features can provide effective and reliable support for the early detection of breast cancer.
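The workflow summarized in the abstract (train/test split, feature normalization, handling of class imbalance, and a Random Forest versus Gaussian Naive Bayes comparison by accuracy, confusion matrix, and ROC AUC) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the data correspond to the 569-sample, 30-feature breast cancer dataset bundled with scikit-learn, which the abstract does not name. The 25% test fraction, the random seed, the number of trees, and the use of `class_weight="balanced"` are also assumptions, so the resulting numbers need not match the reported ones.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled dataset (569 samples, 30 features)
# stands in for the data described in the abstract.
# In this copy of the data, label 0 is malignant and label 1 is benign.
X, y = load_breast_cancer(return_X_y=True)

# Assumption: stratified 75/25 split; the abstract does not state the ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Each model is wrapped in a pipeline so that normalization is fitted on the
# training data only and then applied unchanged to the held-out test set.
models = {
    "random_forest": make_pipeline(
        StandardScaler(),
        # Assumption: class weighting as one possible way to handle imbalance.
        RandomForestClassifier(
            n_estimators=200, class_weight="balanced", random_state=0
        ),
    ),
    "gaussian_nb": make_pipeline(StandardScaler(), GaussianNB()),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # Probability of the positive label (1 = benign here), used for ROC AUC.
    scores = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "auc": roc_auc_score(y_test, scores),
        # Rows are true labels, columns are predicted labels (order: 0, 1).
        "confusion": confusion_matrix(y_test, predictions),
    }

for name, metrics in results.items():
    print(name, round(metrics["accuracy"], 3), round(metrics["auc"], 3))
    print(metrics["confusion"])
```

Because label 0 is the malignant class in this copy of the data, the top-right cell of each printed confusion matrix counts malignant cases predicted as benign, which is the error type the abstract singles out as most important to minimize.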
Article information
Journal
Journal of Medical and Health Studies
Volume (Issue)
4 (5)
Pages
110-150
Published
Copyright
Copyright (c) 2023 https://creativecommons.org/licenses/by/4.0/
Open access

This work is licensed under a Creative Commons Attribution 4.0 International License.
