Heart Disease Risk Prediction Using Machine Learning: A Data-Driven Approach for Early Diagnosis and Prevention
Abstract
Cardiovascular diseases remain a leading cause of death worldwide and a major challenge to healthcare systems in both developing and developed countries. In the US alone, nearly a fifth of all deaths in a year are caused by cardiovascular diseases, which imposes a heavy burden on public and economic resources. The chief aim of this work was to build and rigorously test machine learning models that effectively predict heart disease risk across various populations, based on well-annotated datasets and clearly labeled variables such as age, systolic/diastolic blood pressure, cholesterol level, type of chest pain, and electrocardiogram results. We used the publicly accessible Cleveland Heart Disease dataset for this study. The data consisted of 303 patient records and 14 important attributes typical for cardiovascular health: age, sex, resting blood pressure, serum cholesterol, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, and ST depression induced by exercise, among others. The target variable, which is labeled in the data using five categories, marks the presence or absence of heart disease and was binarized for classification purposes (1 = disease, 0 = no disease). To develop a strong predictive model for identifying people vulnerable to heart disease, three established supervised classification algorithms were adopted: Logistic Regression, Random Forest Classifier, and XGBoost Classifier (Extreme Gradient Boosting). To assess the accuracy and reliability of these models, a battery of evaluation metrics was used, each offering a distinct view of model performance. The XGBoost model achieved a high training accuracy and a closely matching test accuracy, which indicated good generalization to unseen data.
The deployment of machine learning-based heart disease risk prediction models in preventive care represents a major advance for the U.S. public healthcare sector. These models can be integrated into the electronic health record systems used in clinics, hospitals, and primary care to automatically flag high-risk individuals using real-time clinical data. Machine learning-driven heart disease prediction models also have transformative value in remote health monitoring and telemedicine, which have emerged as major trends in the U.S., particularly in the aftermath of the COVID-19 pandemic. One of the key strengths of machine learning models is that they can provide customizable risk scores attuned to the diverse demographic profile of the United States. As machine learning and AI technologies continue to mature, there is increasing potential to expand their use to predict not only heart disease but also associated comorbid conditions such as stroke, metabolic syndrome, chronic kidney disease, and type 2 diabetes.
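The label binarization and the training-versus-test accuracy comparison mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the five diagnosis categories are coded as the integers 0 to 4, with 0 meaning no disease, which is the usual coding of the Cleveland data but is not stated in the abstract. The function names `binarize_target` and `accuracy` are hypothetical helpers introduced for illustration, not the authors' code.

```python
from typing import Iterable, List, Sequence


def binarize_target(labels: Iterable[int]) -> List[int]:
    """Collapse a five-level diagnosis label into 0 (no disease) or 1 (disease).

    Assumption: categories are coded 0..4 and only 0 means "no disease".
    """
    binary = []
    for label in labels:
        if label not in (0, 1, 2, 3, 4):
            raise ValueError(f"unexpected label: {label!r}")
        binary.append(1 if label > 0 else 0)
    return binary


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

Computing `accuracy` separately on the training split and on the held-out test split, and checking that the two values are close, is one simple way to judge the kind of generalization the abstract describes. A large gap between them would instead suggest overfitting.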