A Fusion of Machine Learning and Traditional Statistical Forecasting Models for Analyzing American Healthcare Expenditure
Abstract
US healthcare expenditure rose from 5.0% of GDP in 1960 to 18.3% in 2021, far exceeding that of comparable high-income nations despite persistently inferior health outcomes, most notably lower life expectancy. Although machine learning (ML) algorithms and traditional ARIMA time-series models each carry distinct predictive strengths, the literature lacks rigorous comparative analyses that simultaneously evaluate accuracy, efficiency, and interpretability trade-offs. This study addresses that gap by benchmarking five modeling approaches: (1) Random Forest, (2) Gradient Boosting Machine, (3) Support Vector Regression, (4) ARIMA(0,1,1), and (5) a novel Hybrid Fusion model that integrates Bayesian feature selection with an ML ensemble and ARIMA residual correction. The analysis uses National Health Expenditure Accounts (NHEA) data from the Centers for Medicare & Medicaid Services (1960–2021; N=62 annual observations). All five models were evaluated through k-fold cross-validation (k=5) on multiple performance dimensions: RMSE, MAE, MAPE, training time, memory usage, and inference speed. Random Forest achieved the lowest RMSE (0.297%) among the individual ML algorithms and forecast healthcare spending of 18.81% of GDP for 2050, compared with 17.92% for ARIMA(0,1,1) (RMSE 0.456%). Support Vector Regression showed severe overfitting (RMSE 12.75%), confirming its unsuitability for datasets this small (N=62). The novel Hybrid Fusion approach achieved an RMSE of 0.261%, a 12% improvement over Random Forest alone. Linear regression analysis confirmed statistically significant annual growth in healthcare spending of 0.239% of GDP (t=107.79, p<2.16E-70) across the full 62-year period. Ensemble machine learning methods substantially outperform traditional ARIMA for healthcare cost forecasting, particularly in capturing non-linear cost dynamics. Hybrid fusion approaches optimize accuracy-interpretability trade-offs [38].
Practical decision frameworks guide healthcare organizations in algorithm selection based on organizational priorities: accuracy-critical contexts favor Random Forest or Hybrid Fusion; real-time systems benefit from ARIMA; balanced requirements suit a standard Random Forest implementation.
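The accuracy metrics named in the abstract (RMSE, MAE, MAPE) and the reported 12% improvement can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the three-point series is made-up toy data, and only the two RMSE values (0.297 and 0.261) are taken from the abstract.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def relative_improvement(baseline_error, new_error):
    """Percentage reduction in error relative to a baseline model."""
    return 100.0 * (baseline_error - new_error) / baseline_error

# Toy data (hypothetical, NOT from the study): spending as % of GDP.
toy_actual = [17.0, 18.0, 20.0]
toy_predicted = [17.5, 17.5, 20.0]
toy_rmse = rmse(toy_actual, toy_predicted)
toy_mae = mae(toy_actual, toy_predicted)
toy_mape = mape(toy_actual, toy_predicted)

# RMSE values as reported in the abstract.
rf_rmse = 0.297      # Random Forest
hybrid_rmse = 0.261  # Hybrid Fusion
improvement = relative_improvement(rf_rmse, hybrid_rmse)
print(f"Hybrid Fusion vs Random Forest: {improvement:.1f}% lower RMSE")
```

The relative improvement works out to roughly 12%, consistent with the figure stated in the abstract. In a k-fold setting, each metric would typically be computed per fold and then averaged; the abstract does not say how the folds were constructed, so that step is left out here.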
Article information
Journal
Journal of Medical and Health Studies
Volume (Issue)
5 (1)
Pages
95-105
Published
Copyright
Copyright (c) 2024 https://creativecommons.org/licenses/by/4.0/
Open access

This work is licensed under a Creative Commons Attribution 4.0 International License.
