Research Article

A Fusion of Machine Learning and Traditional Statistical Forecasting Models for Analyzing American Healthcare Expenditure

Authors

  • Anik Biswas, College of Graduate and Professional Studies, Trine University, Angola, Indiana, USA
  • Abdullah Al Mahmud Ashik, College of Graduate and Professional Studies, Trine University, Angola, Indiana, USA
  • Safiul Islam, College of Graduate and Professional Studies, Trine University, Angola, Indiana, USA

Abstract

US healthcare expenditure rose from 5.0% of GDP in 1960 to 18.3% in 2021, far exceeding that of comparable high-income nations despite persistently inferior health outcomes, most notably lower life expectancy. Although machine learning (ML) algorithms and traditional ARIMA time-series models each have distinct predictive strengths, the literature lacks rigorous comparative analyses that simultaneously evaluate accuracy, efficiency, and interpretability trade-offs. This study addresses that gap by benchmarking five modeling approaches: (1) Random Forest, (2) Gradient Boosting Machine, (3) Support Vector Regression, (4) ARIMA(0,1,1), and (5) a novel Hybrid Fusion model that integrates Bayesian feature selection with an ML ensemble and ARIMA residual correction. The data are 62 annual observations (1960–2021; N=62) of US healthcare expenditure from the National Health Expenditure Accounts (NHEA) of the Centers for Medicare & Medicaid Services. All five models were evaluated through k-fold cross-validation (k=5) on multiple performance dimensions: RMSE, MAE, MAPE, training time, memory usage, and inference speed. Random Forest achieved the lowest RMSE (0.297%) among the individual ML algorithms and forecast healthcare spending of 18.81% of GDP for 2050, compared with 17.92% for ARIMA(0,1,1) (RMSE 0.456%). Support Vector Regression showed severe overfitting (RMSE 12.75%), confirming its unsuitability for small datasets (N=62). The novel Hybrid Fusion approach achieved an RMSE of 0.261%, a 12% improvement over Random Forest alone. Linear regression analysis confirmed statistically significant annual growth in healthcare spending of 0.239% of GDP (t=107.79, p<2.16E-70) across the full 62-year period. Ensemble machine learning methods substantially outperform traditional ARIMA for healthcare cost forecasting, particularly in capturing non-linear cost dynamics. Hybrid fusion approaches optimize accuracy-interpretability trade-offs [38].
Practical decision frameworks guide healthcare organizations in algorithm selection based on organizational priorities: accuracy-critical contexts favor Random Forest or Hybrid Fusion; real-time systems benefit from ARIMA; balanced requirements suit a standard Random Forest implementation.
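The accuracy metrics named in the abstract (RMSE, MAE, MAPE) and the reported 12% improvement can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the three-point sample series are hypothetical and chosen only for illustration, and the only figures taken from the abstract are the two RMSE values (0.297 for Random Forest, 0.261 for Hybrid Fusion).

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def relative_improvement(baseline, candidate):
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline - candidate) / baseline

# Hypothetical sample values (share of GDP, in percent); NOT from the study.
actual = [17.0, 17.5, 18.0]
predicted = [16.8, 17.6, 18.3]

print(f"RMSE: {rmse(actual, predicted):.3f}")
print(f"MAE:  {mae(actual, predicted):.3f}")
print(f"MAPE: {mape(actual, predicted):.3f}%")

# RMSE values reported in the abstract: Random Forest 0.297, Hybrid Fusion 0.261.
print(f"Hybrid vs. RF improvement: {relative_improvement(0.297, 0.261):.1f}%")
```

With the reported RMSE values, the relative reduction works out to roughly 12%, consistent with the figure stated in the abstract.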

Article information

Journal

Journal of Medical and Health Studies

Volume (Issue)

5 (1)

Pages

95-105

Published

2024-03-30

Views

59

Downloads

29

Keywords:

Healthcare cost prediction, machine learning, ARIMA forecasting, Random Forest, Support Vector Regression, hybrid ensemble methods, comparative analysis, healthcare expenditure forecasting.