Stacking-Based Ensemble Learning for Prostate Cancer Prediction Using Tabular Clinical Data

Ahmed Ali Linkon; Mostafizur Rahman Shakil; Shahriar Ahmed; Md Rashel Miah; Asif Hassan Malik

doi:10.32996/jmhs.2026.7.4.4

Research Article

Authors

Ahmed Ali Linkon, Department of Computer Science, Westcliff University, Irvine, CA 92614, USA
Mostafizur Rahman Shakil, Department of Engineering Management, Westcliff University, Irvine, CA 92614, USA
Shahriar Ahmed, School of Business, International American University, 3440 Wilshire Blvd STE 1000, Los Angeles, CA 90010, USA
Md Rashel Miah, Department of Business Administration, Westcliff University, Irvine, CA 92614, USA
Asif Hassan Malik, Department of Chemistry, York College, The City University of New York (CUNY), Jamaica, NY 11451, USA

Abstract

This study introduces ProstaEnsembleNet, a tabular learning framework that combines diverse predictors for preliminary risk stratification from epidemiological data and routinely collected clinical features. We used a public Kaggle prostate cancer prediction dataset comprising 29 predictors to benchmark classical machine learning models, including Gradient Boosting, XGBoost, LightGBM, Random Forest, Support Vector Machine (SVM), Gaussian Naïve Bayes, and k-nearest neighbors (KNN), as well as deep tabular models such as TabNet and a multilayer perceptron. Preprocessing comprised categorical encoding and z-score normalization, and class imbalance was addressed with SMOTE applied within each fold to reduce resampling leakage. Performance was evaluated using stratified 10-fold cross-validation, measuring accuracy, recall, F1-score, balanced error rate, and PR-AUC. Among the individual learners, LightGBM demonstrated strong sensitivity, with a recall of 0.9714 (±0.0051) and an F1-score of 0.9062 (±0.0025). The ProstaEnsembleNet stacking ensemble, which uses a logistic regression meta-learner, achieved the best overall performance, with an accuracy of 0.8390 (±0.0019), a recall of 0.9839 (±0.0025), an F1-score of 0.9122 (±0.0011), and a PR-AUC of 0.8592 (±0.0058). It significantly outperformed voting on F1-score and recall in paired fold-wise testing (Holm-adjusted p-value = 0.008). Ablation analyses confirmed that SMOTE substantially improves minority-sensitive metrics across models and that logistic regression is a stable meta-learner, with negligible losses compared to more complex alternatives. These findings suggest that stacked ensembles are a robust decision-support approach for tabular prostate cancer risk prediction. However, external validation, calibration analysis, and prospective evaluation are crucial before clinical deployment.
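
The evaluation quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The function names binary_metrics and holm_adjust are hypothetical, the toy labels and p-values are made up, and the balanced error rate is assumed here to be the mean of the two per-class error rates, since the abstract does not give its definition. The Holm routine is the standard step-down adjustment; which tests the authors pooled into one family is not stated.

```python
from typing import Dict, List, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, recall, F1, and balanced error rate for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "f1": f1,
        # Assumption: BER = average of false-negative and false-positive rates.
        "ber": 1.0 - 0.5 * (recall + specificity),
    }


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Toy inputs for illustration only; they are not taken from the study.
toy_true = [1, 1, 1, 1, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0]
print(binary_metrics(toy_true, toy_pred))
print(holm_adjust([0.01, 0.04, 0.03]))
```

In a cross-validated comparison such as the one described, metrics like these would be computed once per fold for each model, and the paired fold-wise p-values for the chosen comparisons would then be passed together through the Holm adjustment.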

Article information

Journal: Journal of Medical and Health Studies

Volume (Issue): 7 (4)

DOI: https://doi.org/10.32996/jmhs.2026.7.4.4

Pages: 43-56

Published: 2026-03-01

Copyright: Open access. This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Ahmed Ali Linkon, Mostafizur Rahman Shakil, Shahriar Ahmed, Md Rashel Miah, & Asif Hassan Malik. (2026). Stacking-Based Ensemble Learning for Prostate Cancer Prediction Using Tabular Clinical Data. Journal of Medical and Health Studies, 7(4), 43-56. https://doi.org/10.32996/jmhs.2026.7.4.4
