Research Article

Stacking-Based Ensemble Learning for Prostate Cancer Prediction Using Tabular Clinical Data

Authors

  • Ahmed Ali Linkon, Department of Computer Science, Westcliff University, Irvine, CA 92614, USA
  • Mostafizur Rahman Shakil, Department of Engineering Management, Westcliff University, Irvine, CA 92614, USA
  • Shahriar Ahmed, School of Business, International American University, 3440 Wilshire Blvd STE 1000, Los Angeles, CA 90010, USA
  • Md Rashel Miah, Department of Business Administration, Westcliff University, Irvine, CA 92614, USA
  • Asif Hassan Malik, Department of Chemistry, York College, The City University of New York (CUNY), Jamaica, NY 11451, USA

Abstract

This study introduces ProstaEnsembleNet, a tabular learning framework that integrates diverse predictors for preliminary risk stratification from epidemiological data and routinely collected clinical features. Using a public Kaggle prostate cancer prediction dataset with 29 predictors, we benchmarked classical machine learning models (Gradient Boosting, XGBoost, LightGBM, Random Forest, Support Vector Machine (SVM), Gaussian Naïve Bayes, and k-nearest neighbors (KNN)) alongside deep tabular models (TabNet and a multilayer perceptron). Preprocessing comprised categorical encoding and z-score normalization, and class imbalance was addressed by applying SMOTE within each training fold to reduce resampling leakage. Performance was evaluated with stratified 10-fold cross-validation using accuracy, recall, F1-score, balanced error rate, and PR-AUC. Among the individual learners, LightGBM showed strong sensitivity, with a recall of 0.9714 (±0.0051) and an F1-score of 0.9062 (±0.0025). The ProstaEnsembleNet stacking ensemble, which uses a logistic regression meta-learner, achieved the best overall performance: an accuracy of 0.8390 (±0.0019), a recall of 0.9839 (±0.0025), an F1-score of 0.9122 (±0.0011), and a PR-AUC of 0.8592 (±0.0058). The stacking ensemble significantly outperformed a voting ensemble on F1-score and recall in paired fold-wise testing (Holm-adjusted p-value = 0.008). Ablation analyses confirmed that SMOTE substantially improves minority-sensitive metrics across models and that logistic regression is a stable meta-learner, with negligible losses relative to more complex alternatives. These findings suggest that stacked ensembles are a robust decision-support approach for tabular prostate cancer risk prediction. However, external validation, calibration analysis, and prospective evaluation are essential before clinical deployment.
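The phrase "within-fold SMOTE" describes an ordering of steps that is easy to get wrong, so a minimal sketch may help. The code below is not the authors' implementation. It is a standard-library Python illustration on synthetic toy data, and the names `stratified_folds`, `smote`, and `within_fold_splits` are hypothetical. It assumes binary labels and a simplified SMOTE that interpolates between a minority sample and one of its nearest minority neighbours. The point it illustrates is that the data are split into stratified folds first, and only the training portion of each split is oversampled, so the held-out fold never contains or influences synthetic rows.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, rng):
    """Assign indices to k folds, dealing each class out round-robin."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds


def smote(X, y, minority, k_neighbors, rng):
    """Simplified SMOTE for binary labels: add interpolated minority rows
    until both classes have the same count."""
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_new = (len(y) - len(minority_rows)) - len(minority_rows)
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        base = rng.choice(minority_rows)
        # Nearest minority neighbours of `base` (squared Euclidean distance).
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k_neighbors]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out


def within_fold_splits(X, y, minority, k=10, k_neighbors=5, seed=0):
    """Yield (X_train_resampled, y_train_resampled, X_test, y_test) per fold.
    Resampling happens after the split and touches training rows only."""
    rng = random.Random(seed)
    folds = stratified_folds(y, k, rng)
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        X_res, y_res = smote(X_train, y_train, minority, k_neighbors, rng)
        yield X_res, y_res, [X[i] for i in test_idx], [y[i] for i in test_idx]


# Synthetic toy data (an assumption for illustration): 80 majority rows
# labelled 0 and 20 minority rows labelled 1, each with two features.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(80)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(20)]
y = [0] * 80 + [1] * 20

splits = list(within_fold_splits(X, y, minority=1))
for fold, (X_res, y_res, X_te, y_te) in enumerate(splits):
    print(fold, "train:", y_res.count(0), y_res.count(1), "test:", len(y_te))
```

In each iteration a classifier would be fitted on the resampled training rows and scored on the untouched test rows. With real data, an established implementation such as the `SMOTE` sampler and `Pipeline` class in the imbalanced-learn package would likely be preferable to this hand-rolled version.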

Article information

Journal

Journal of Medical and Health Studies

Volume (Issue)

7 (4)

Pages

43-56

Published

2026-03-01

How to Cite

Ahmed Ali Linkon, Mostafizur Rahman Shakil, Shahriar Ahmed, Md Rashel Miah, & Asif Hassan Malik. (2026). Stacking-Based Ensemble Learning for Prostate Cancer Prediction Using Tabular Clinical Data. Journal of Medical and Health Studies, 7(4), 43-56. https://doi.org/10.32996/jmhs.2026.7.4.4

Keywords:

Prostate cancer, Imbalance handling, Significance testing, Feature selection, Ensemble learning, Decision support