A Fusion of Machine Learning and Traditional Statistical Forecasting Models for Analyzing American Healthcare Expenditure

Anik Biswas; Abdullah Al Mahmud Ashik; Safiul Islam

doi:10.32996/jmhs.2024.5.1.12

Research Article

Authors

Anik Biswas College of Graduate and Professional Studies, Trine University, Angola, Indiana, USA
Abdullah Al Mahmud Ashik College of Graduate and Professional Studies, Trine University, Angola, Indiana, USA
Safiul Islam College of Graduate and Professional Studies, Trine University, Angola, Indiana, USA

Abstract

US healthcare expenditure rose from 5.0% of GDP in 1960 to 18.3% in 2021, far exceeding that of comparable high-income nations despite persistently inferior health outcomes, most notably lower life expectancy. Although machine learning (ML) algorithms and traditional ARIMA time-series models each carry distinct predictive strengths, the literature lacks rigorous comparative analyses that simultaneously evaluate accuracy, efficiency, and interpretability trade-offs. This study addresses that gap by benchmarking five modeling approaches: (1) Random Forest, (2) Gradient Boosting Machine, (3) Support Vector Regression, (4) ARIMA(0,1,1), and (5) a novel Hybrid Fusion model that integrates Bayesian feature selection with an ML ensemble and ARIMA residual correction. The data are 62 years of annual US healthcare expenditure (1960–2021; N=62 observations) from the National Health Expenditure Accounts (NHEA) of the Centers for Medicare & Medicaid Services. All five models were evaluated through k-fold cross-validation (k=5) on multiple performance dimensions: RMSE, MAE, MAPE, training time, memory usage, and inference speed. Random Forest achieved the lowest RMSE (0.297%) among the individual ML algorithms and forecast healthcare spending of 18.81% of GDP for 2050, compared with 17.92% for ARIMA(0,1,1) (RMSE 0.456%). Support Vector Regression demonstrated severe overfitting (RMSE 12.75%), confirming its unsuitability for small datasets (N=62). The novel Hybrid Fusion approach achieved an RMSE of 0.261%, a 12% improvement over Random Forest alone. Linear regression analysis confirmed statistically significant growth in healthcare spending of 0.239% of GDP per year (t=107.79, p<2.16E-70) across the full 62-year period. Ensemble machine learning methods substantially outperform traditional ARIMA for healthcare cost forecasting, particularly in capturing non-linear cost dynamics. Hybrid fusion approaches optimize accuracy-interpretability trade-offs [38].
Practical decision frameworks guide healthcare organizations in algorithm selection based on organizational priorities: accuracy-critical contexts favor Random Forest or Hybrid Fusion, real-time systems benefit from ARIMA, and balanced requirements suit a standard Random Forest implementation.
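
The accuracy metrics named in the abstract (RMSE, MAE, MAPE) and the reported 12% improvement can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' code: the function names (rmse, mae, mape, relative_improvement) are hypothetical, and only the three RMSE values are taken from the abstract.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def relative_improvement(baseline_error, new_error):
    """Percentage reduction in error relative to a baseline (hypothetical helper)."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# RMSE values as reported in the abstract (percentage points of GDP).
rf_rmse = 0.297      # Random Forest
hybrid_rmse = 0.261  # Hybrid Fusion
arima_rmse = 0.456   # ARIMA(0,1,1)

# Hybrid Fusion vs. Random Forest: about 12.1%, consistent with the stated 12%.
improvement_vs_rf = relative_improvement(rf_rmse, hybrid_rmse)

# Hybrid Fusion vs. ARIMA(0,1,1): about 42.8%, derived from the same reported values.
improvement_vs_arima = relative_improvement(arima_rmse, hybrid_rmse)
```

In a 5-fold evaluation such as the one described, each metric would be computed on every held-out fold and then averaged; the fold construction itself is not specified in the abstract and is left out of this sketch.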

Article information

Journal

Journal of Medical and Health Studies

Volume (Issue)

5 (1)

DOI

https://doi.org/10.32996/jmhs.2024.5.1.12

Pages

95-105

Published

2024-03-30

Copyright

Open access

This work is licensed under a Creative Commons Attribution 4.0 International License.
