Research Article

Optimizing Lung Cancer Risk Prediction with Advanced Machine Learning Algorithms and Techniques

Authors

  • Joy Chakra Bortty, Department of Computer Science, Westcliff University, Irvine, California, USA
  • Proshanta Kumar Bhowmik, Department of Business Analytics, Trine University, Angola, IN, USA
  • Syed Ali Reza, Department of Data Analytics, University of The Potomac (UOTP), Washington, USA
  • Irin Akter Liza, Master of Science in Business Analytics, College of Graduate and Professional Studies (CGPS), Trine University, USA
  • Mohammed Nazmul Islam Miah, Master of Public Administration, Management Sciences, and Quantitative Methods, Gannon University, Erie, PA, USA
  • Muhammad Shoyaibur Rahman Chowdhury, Master’s in Information Technology, Gannon University, Erie, PA, USA
  • Md Al Amin, School of Business, International American University, Los Angeles, California, USA

Abstract

Lung cancer is among the leading causes of cancer death in the U.S.A. and globally, causing more deaths than breast, prostate, and colorectal cancers combined. It therefore presents a significant global health burden, with an estimated 2.2 million new cases and 1.8 million deaths annually. Given the complex etiology of lung cancer, there is an urgent need for more accurate and reliable prediction models capable of integrating diverse risk factors. Although current screening and imaging modalities are effective, they are often costly and invasive. The main objective of this study was to develop and evaluate machine learning models that predict lung cancer risk from integrated demographic, environmental, and lifestyle variables. The dataset for lung cancer risk prediction was compiled from multiple sources, particularly Cleveland hospital records and public health databases in the U.S.; we also drew on large-scale epidemiological studies such as the National Lung Screening Trial (NLST) and the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial. These sources provided the data from which the machine learning models were developed, as they contained valuable information on demographics, past medical history, lifestyle habits, and clinical symptoms. The experiment used three machine learning algorithms: Logistic Regression, XG-Boost, and Random Forest. Accuracy, precision, recall, and F1 score were used as performance metrics. Overall, the Logistic Regression model outperformed the Random Forest and XG-Boost models, achieving the highest scores on all four metrics. This indicates that Logistic Regression was slightly better at balancing true positives against false positives and false negatives.
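The four performance metrics named above can all be derived from the confusion-matrix counts (true/false positives and negatives). The following is a minimal sketch in Python using only the standard library; the function name `binary_metrics` and the toy labels are illustrative assumptions, not the study's code or data.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive class).

    Hypothetical helper for illustration; ratios with a zero denominator are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted positive, truly positive
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted positive, truly negative
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positive
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly rejected negative
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return Metrics(accuracy, precision, recall, f1)


# Toy labels for illustration only; these are NOT results from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
print(binary_metrics(y_true, y_pred))
```

Because F1 can also be written as 2TP / (2TP + FP + FN), it penalizes false positives and false negatives together, which makes it a natural single-number summary of the balance described above.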
The Random Forest model showed intermediate performance, ranking second after Logistic Regression. A substantial body of empirical studies has established that machine learning techniques such as Logistic Regression and Random Forest considerably improve the detection of lung cancer. Although Logistic Regression remains very useful because of its simplicity and interpretability, Random Forest and XG-Boost are more capable of modeling complex nonlinear interactions in high-dimensional data. Such advanced models can provide more accurate, personalized risk estimates and have the potential to contribute substantially to early detection and better clinical decision-making for lung cancer.
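The interpretability of logistic regression comes from its additive log-odds form: the predicted probability is sigmoid(β₀ + Σ βⱼxⱼ), and exp(βⱼ) is the odds ratio for a one-unit increase in xⱼ with the other features held fixed. The sketch below assumes hypothetical feature names and coefficient values chosen purely for illustration; they are not estimates reported by the study.

```python
import math


def sigmoid(z):
    """Logistic function mapping log-odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(intercept, coefs, features):
    """Predicted probability = sigmoid(intercept + sum(beta_j * x_j))."""
    z = intercept + sum(coefs[name] * value for name, value in features.items())
    return sigmoid(z)


def odds_ratios(coefs):
    """exp(beta_j): multiplicative change in odds per one-unit increase in x_j, others fixed."""
    return {name: math.exp(beta) for name, beta in coefs.items()}


# HYPOTHETICAL intercept, coefficients and feature names, for illustration only;
# they are NOT fitted values from the study.
intercept = -4.0
coefs = {"current_smoker": 1.5, "age_decades": 0.4, "radon_exposure": 0.3}

# A hypothetical individual: current smoker, aged 65, with radon exposure
patient = {"current_smoker": 1, "age_decades": 6.5, "radon_exposure": 1}

print(round(predict_risk(intercept, coefs, patient), 3))
print({name: round(ratio, 2) for name, ratio in odds_ratios(coefs).items()})
```

Because the logit is a plain sum of per-feature terms, an interaction such as smoking combined with radon exposure would have to be supplied as an explicit product feature. Tree ensembles such as Random Forest and XG-Boost can instead capture such interactions through successive splits, which is the trade-off between interpretability and flexibility described above.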

Article information

Journal

Journal of Medical and Health Studies

Volume (Issue)

5 (4)

Pages

35-48

Published

2024-10-21

How to Cite

Joy Chakra Bortty, Proshanta Kumar Bhowmik, Syed Ali Reza, Irin Akter Liza, Mohammed Nazmul Islam Miah, Muhammad Shoyaibur Rahman Chowdhury, & Md Al Amin. (2024). Optimizing Lung Cancer Risk Prediction with Advanced Machine Learning Algorithms and Techniques. Journal of Medical and Health Studies, 5(4), 35–48. https://doi.org/10.32996/jmhs.2024.5.4.7


Keywords:

Lung cancer prediction; Early detection; Advanced machine learning algorithms; Logistic Regression; Random Forest; XG-Boost