Research Article

Machine learning based clinical decision support for heart disease prediction using structured patient data

Authors

  • Saeed Ur Rashid, Master of Business Administration in Data Analytics, Westcliff University, Irvine, California, USA
  • Md Ismail Hossain Siddiqui, Master of Science in Engineering/Industrial Management, Westcliff University, Irvine, California, USA
  • Farhad Uddin Mahmud, Master of Business Administration in Management Information Systems, International American University, Los Angeles, California, USA
  • Md. Soebur Rahman, Master of Business Administration in Management Information Systems, International American University, Los Angeles, California, USA
  • Abdul Aziz Kabir, Master of Business Administration in Data Analytics, Westcliff University, Irvine, California, USA
  • Ramisa Samin Shammah, College of Technology and Engineering, Westcliff University, Irvine, USA

Abstract

Heart disease remains a leading cause of mortality worldwide, necessitating reliable and efficient predictive models for early diagnosis and clinical decision support. This study presents a comprehensive machine learning framework for heart disease prediction using a structured clinical dataset of 920 patient records with diverse demographic, physiological, and diagnostic attributes. The dataset includes both numerical and categorical features, which require careful preprocessing and encoding to ensure compatibility across different model architectures.

A range of classification algorithms is systematically evaluated: Logistic Regression, K-Nearest Neighbors, Decision Tree, Random Forest, Support Vector Machine, Extreme Gradient Boosting, and Light Gradient Boosting Machine. Model performance is assessed using multiple evaluation metrics (accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve), together with five-fold cross-validation to examine stability and generalization behavior.

The experimental results demonstrate consistently high predictive performance across most models, with several approaches achieving near-perfect classification metrics and minimal variation across cross-validation folds. In contrast, K-Nearest Neighbors exhibits slightly lower performance, highlighting differences in sensitivity to local data structure. Analysis of feature distributions and pairwise relationships indicates strong separability between classes, driven particularly by clinically relevant variables such as chest pain type, exercise-induced angina, ST depression, and maximum heart rate.

Further evaluation using confusion matrices, receiver operating characteristic curves, and precision–recall curves confirms the robustness of the predictive models and their ability to distinguish between diseased and non-diseased cases with high reliability.
Despite the strong performance, the study acknowledges potential dataset-specific characteristics that may influence model behavior and emphasizes the importance of external validation for clinical deployment.
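
The abstract names the evaluation metrics and five-fold cross-validation but does not say how they are computed. The following is a minimal, dependency-free Python sketch of the standard definitions, assuming binary labels coded 1 = heart disease present and 0 = absent. All function names (`confusion_counts`, `classification_metrics`, `roc_auc`, `stratified_k_folds`) are hypothetical and are not the authors' code. Stratifying the folds by class is an assumption here (a common choice for classification); the abstract does not state whether the folds were stratified or shuffled.

```python
"""Hypothetical sketch of the evaluation metrics and fold split named in the abstract."""
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), assuming label 1 = disease present, 0 = absent."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_k_folds(y: Sequence[int], k: int = 5) -> List[List[int]]:
    """Deal row indices of each class round-robin into k folds so that every
    fold keeps roughly the overall class ratio. No shuffling is done here."""
    folds: List[List[int]] = [[] for _ in range(k)]
    position = 0
    for label in sorted(set(y)):
        for i, t in enumerate(y):
            if t == label:
                folds[position % k].append(i)
                position += 1
    return folds
```

For example, with `y_true = [1, 1, 1, 0, 0, 0]`, `y_pred = [1, 1, 0, 0, 0, 1]` and predicted scores `[0.9, 0.8, 0.4, 0.3, 0.2, 0.6]`, the sketch gives accuracy, precision, recall and F1 of 2/3 each and a ROC AUC of 8/9. In practice, libraries such as scikit-learn provide tested equivalents (`StratifiedKFold`, `roc_auc_score`); the pairwise AUC loop above is quadratic and is meant only to make the definition explicit.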

Article information

Journal

Journal of Computer Science and Technology Studies

Volume (Issue)

6 (1)

Pages

340-350

Published

2024-02-25

Keywords:

heart disease prediction, clinical decision support, machine learning, classification models, structured clinical data, cross-validation, ROC AUC, precision-recall analysis, medical diagnosis, predictive analytics