Product Demand Forecasting with Neural Networks and Macroeconomic Indicators: A Comparative Study among Product Categories

In the fiercely competitive global corporate arena, the intricacies of demand forecasting in the retail sector have become a focal point. While previous research has delved into various methodologies, it consistently overlooks the distinct performances of forecasting models within different retail product categories. Understanding these variations in prediction performances is pivotal, enabling firms to fine-tune forecasting models for each category. This study bridges this gap by scrutinizing the prediction performances of models tailored to different product categories. Building on recent research, we incorporate external macroeconomic indicators, namely the Consumer Price Index, the Consumer Sentiment Index, and the unemployment rate.


Introduction
In a highly competitive market, accurate demand forecasting holds paramount importance for firms, directly influencing their financial performance. It enables firms to align their operations with market demand, granting them a competitive edge in the highly globalized market landscape. An increasing number of firms are turning to advanced prediction models to enhance their supply chain management. This trend has spurred a surge in studies dedicated to forecasting future demand. The literature has explored a wide array of statistical techniques as well as advanced Artificial Intelligence (AI) and Machine Learning (ML) approaches, including sophisticated Neural Networks-based algorithms. The literature predominantly relies on time series data of customer demand for specific retail products, coupled with relevant variables such as product price, store attributes, promotions, and information about special events or occasions, in developing forecasting models.
With the current development of forecasting capabilities, firms typically develop a general predictive model that is common to all types of product categories. However, it is important to recognize that different product categories exhibit different patterns of product demand. Therefore, a single forecasting model for all product categories together is not optimal for capturing the underlying patterns across categories. This necessitates building separate forecasting models for individual product categories.
In this study, we address this crucial concern by developing a separate forecasting model for each product category. We also enrich the time series data of customer demand through the inclusion of macroeconomic variables, namely the Consumer Price Index (CPI), the Consumer Sentiment Index (ICS), and the unemployment rate, as reflective indicators of the economic environment. This enriched dataset is employed to develop a multi-layer Deep Neural Networks model for forecasting future demand for each product category. Specifically, we implement Long Short-Term Memory (LSTM) models, which are capable of capturing latent non-linear trends from observed data through hidden layers and retaining a memory of past information. We further extend the analysis by examining the contribution of features using dominance analysis and quantifying the strength of features in explaining retail product demand. Our proposed approach yields reasonably low prediction errors for all models developed.

Literature Review
Due to the escalating need for companies to enhance supply chain decisions, a substantial body of literature has emerged, focusing on refining prediction accuracy through various methodologies. These encompass statistical techniques like Auto-Regressive (AR), Auto-Regressive Integrated Moving Average (ARIMA), Seasonal ARIMA (SARIMA), and Vector Auto-Regressive (VAR) models, alongside advanced AI algorithms utilizing Artificial Neural Networks. Recent studies have leaned towards hybrid prediction models, amalgamating diverse approaches to generate forecasts. Each approach has its own set of strengths and limitations, tailored for specific applications.
Early studies in demand forecasting predominantly relied on statistical techniques such as Moving Average (MA), ARIMA, SARIMA, and VAR models, which necessitate stationarity and assume linear relationships between variables. For instance, Fattah et al. (2018) applied various ARIMA models, utilizing the Box-Jenkins time series procedure, to forecast demand in a food company. Ediger and Akar (2007) employed ARIMA and SARIMA methods to estimate Turkey's future primary energy demand from 2005 to 2020. However, these statistical approaches, designed for linear relationships, may fall short in real-world scenarios characterized by non-linear associations in time series data.
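To make the classical linear approach concrete, the following is a minimal sketch of fitting an AR(p) model by ordinary least squares and rolling it forward. This is an illustration of the general technique only, not the models used in the cited studies, which rely on the full ARIMA/Box-Jenkins machinery (e.g., statsmodels' ARIMA class, which also handles differencing and seasonal terms).

```python
import numpy as np

def fit_ar(series, p):
    """Fit AR(p): y_t = c + phi_1*y_{t-1} + ... + phi_p*y_{t-p}, by least squares."""
    y = np.asarray(series, dtype=float)
    # Row t uses the lagged values y_{t-1}, ..., y_{t-p} as predictors.
    X = np.column_stack([y[p - i - 1 : len(y) - i - 1] for i in range(p)])
    X = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [c, phi_1, ..., phi_p]

def forecast_ar(series, coef, steps):
    """Roll the fitted AR model forward `steps` periods, feeding forecasts back in."""
    p = len(coef) - 1
    history = list(series[-p:])
    out = []
    for _ in range(steps):
        nxt = coef[0] + sum(coef[i + 1] * history[-i - 1] for i in range(p))
        history.append(nxt)
        out.append(nxt)
    return out
```

The linearity assumption is visible directly in the model form: each forecast is a fixed linear combination of past values, which is precisely what limits these methods when demand exhibits non-linear dynamics.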
To address this, computational intelligence methods such as Artificial Neural Networks, tree-based techniques such as Random Forests, and kernel-based methods such as Support Vector Machines (SVM) emerged to overcome the limitations of linear assumptions. Empirical findings consistently highlight the superior performance of models developed using Neural Network algorithms. Despite the strides made in developing models with improved forecasting accuracy, the literature has mostly ignored the importance of examining model performances individually across product categories. Furthermore, existing studies in the literature have failed to highlight feature dominance. In this study, we contribute to the literature by providing a comprehensive evaluation of the comparative performances of forecasting models for different categories, along with a detailed analysis of the comparative strengths of the features included in the models. Following Haque (2023) and Haque et al. (2023), we enrich historical retail demand data with macroeconomic information. This augmented dataset is leveraged to construct an LSTM model for projecting future demand for each category.

Data
Data employed in this study is generously provided by Walmart. The original historical sales dataset is expansive, encapsulating information on the sale of 3,049 items over a duration of 1,913 days. It is structured with a column for each unique day and a corresponding row for each unique product. This dataset also encompasses product IDs, product categories, and store locations. The calendar dataset furnishes insights into special events, holidays, promotions, event types, and related information for each day across the respective states. Furthermore, the product pricing dataset contains records of the price of each individual product on each day at every store.

Data Preprocessing
We restrict our analysis to 300 product items due to computational resource limitations. The data employed in this study comprises the sales history of 300 distinct items from three different product categories across ten stores situated in three states (CA, TX, and WI): 100 food items, 100 hobby items, and 100 household items. Notably, each store within these three states stocks an identical set of 300 products. In addition to the records of products sold, the dataset encompasses essential attributes, including product ID, pricing details, product category, department, store specifications, promotional activities, day of the week, and any pertinent events coinciding with the sale of a product. This comprehensive information is distributed across three distinct datasets: historical sales data, calendar data, and product pricing data. We finally split this dataset into three different datasets, where each dataset represents a group of 100 items within a product category. Subsequently, we train a model for each product category comprising 100 items.
Incorporated into this study are key macroeconomic variables, namely the Consumer Price Index (CPI), the Consumer Sentiment Index (ICS), and the unemployment rate. The historical CPI and unemployment data were sourced from the World Bank's World Development Indicators (WDI) database, while the historical ICS data was obtained from the University of Michigan's website.
To ensure comprehensive data integration, we merged the calendar dataset with the macroeconomic data based on the "date" column, culminating in a unified dataset that features a dedicated row for each day. This combined dataset was subsequently merged with the product pricing dataset and the sales dataset. The final dataset encapsulates critical information regarding product sales, pricing, and promotional activities for each of the 300 products across each store in the three states. To maintain relevance and avoid incorporating outdated trends, we restricted the dataset to the most recent 600 days. Additionally, we introduced rolling averages and rolling standard deviations computed over several lag values of sale prices as integral components of the input features.
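The merge-and-feature-engineering pipeline described above can be sketched in pandas as follows. The column names and toy values here are illustrative assumptions, not the actual Walmart dataset schema, and the rolling features are shown on unit sales for brevity.

```python
import pandas as pd

# Toy stand-ins for the three source files (column names are assumptions).
dates = pd.date_range("2016-01-01", periods=8, freq="D")
calendar = pd.DataFrame({"date": dates,
                         "event": [None, "Holiday", None, None, None, None, "SuperBowl", None]})
macro = pd.DataFrame({"date": dates, "cpi": 236.9, "ics": 92.0, "unemployment": 4.9})
sales = pd.DataFrame({
    "date": list(dates) * 2,
    "item_id": ["FOODS_001"] * 8 + ["HOBBIES_001"] * 8,
    "category": ["FOODS"] * 8 + ["HOBBIES"] * 8,
    "units_sold": [3, 4, 2, 5, 6, 4, 7, 5, 1, 0, 2, 1, 3, 2, 1, 0],
})

# 1) Merge the calendar with the macroeconomic data on "date" (one row per day),
#    then join the result onto the sales records.
daily = calendar.merge(macro, on="date")
df = (sales.merge(daily, on="date")
           .sort_values(["item_id", "date"])
           .reset_index(drop=True))

# 2) Rolling mean and rolling standard deviation per item over a few windows.
for w in (3, 7):
    grp = df.groupby("item_id")["units_sold"]
    df[f"roll_mean_{w}"] = grp.transform(lambda s: s.rolling(w, min_periods=1).mean())
    df[f"roll_std_{w}"] = grp.transform(lambda s: s.rolling(w, min_periods=1).std())

# 3) Split into one dataset per product category, one model trained on each.
per_category = {cat: sub.copy() for cat, sub in df.groupby("category")}
```

Sorting by item and date before computing rolling statistics ensures each window only aggregates chronologically ordered observations of the same item.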

Neural Networks
In this investigation, we employ the LSTM Neural Networks algorithm to train a predictive model for each product category. Neural Networks, also referred to as Artificial Neural Networks, seek to emulate the workings of the human brain. They consist of interconnected nodes arranged in layers, including input, hidden, and output layers. Each node in the input and hidden layers is linked to every node in the subsequent layer with a specific weight. This interconnected network of nodes has the capacity to discern intricate, non-linear, and concealed relationships within the data. Neural Networks learn from training data in a manner akin to how the human brain learns.
Conceptually, each node can be likened to a linear regression with associated weights. These weights are initialized at the outset of the training process. Following each cycle of forward and backward propagation, the weights are fine-tuned to minimize the cost function. The rate at which this adjustment occurs is dictated by a parameter known as the learning rate. Various types of Neural Network algorithms exist, with LSTM being one of them. LSTM possesses a memory component that relies on past observations, rendering it well-suited for applications involving time-series data. In this study, we opt for a multi-layer LSTM neural network algorithm for the model training process.
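To make the memory mechanism concrete, the following is a minimal single-step LSTM cell written in NumPy. This is an illustrative sketch of the standard LSTM equations with randomly chosen weights, not the multi-layer training code used in this study (which relies on a deep-learning framework).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W (4h x d), U (4h x h) and b (4h,) stack the
    input-gate, forget-gate, output-gate and candidate transforms."""
    hid = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0 * hid : 1 * hid])   # input gate: how much new information to write
    f = sigmoid(z[1 * hid : 2 * hid])   # forget gate: how much old memory to keep
    o = sigmoid(z[2 * hid : 3 * hid])   # output gate: how much memory to expose
    g = np.tanh(z[3 * hid : 4 * hid])   # candidate cell update
    c = f * c_prev + i * g              # long-term memory carried across time steps
    h = o * np.tanh(c)                  # hidden state / output at this step
    return h, c

# Run the cell over a short demand sequence (random weights for illustration).
rng = np.random.default_rng(1)
d, hid = 3, 5                           # input features, hidden units
W = rng.normal(size=(4 * hid, d))
U = rng.normal(size=(4 * hid, hid))
b = np.zeros(4 * hid)
h, c = np.zeros(hid), np.zeros(hid)
for x in rng.normal(size=(10, d)):      # ten time steps
    h, c = lstm_step(x, h, c, W, U, b)
```

The cell state `c` is what lets the network retain information from earlier observations: the forget gate decides how much of it survives each step, which is the property that makes LSTM suitable for time-series demand data.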

Results and Discussion
As discussed in the previous sections, we implemented the LSTM model on three different datasets to train three different models. Furthermore, we compare the outcomes derived from a dataset containing macroeconomic indicators with those from a dataset devoid of such variables. The outcomes from each model are based on an identical architecture, featuring several hidden layers designed to forecast demand for 28 future days. To ensure that any prior trends not present in current data do not lead to undesirable outcomes, we utilize time series data from the most recent 600 days for training.

In time series forecasting, stakeholders seek not only to predict future values but also to discern the contribution of each feature in explaining the outcome. While Deep Learning models typically outperform conventional statistical methods in prediction accuracy, they are often perceived as black boxes, lacking interpretability. We therefore identify the relevant features for model building and quantify their relative strengths using dominance analysis. Dominance analysis compares pairs of features across all possible subsets of the features in a model to determine the additional contribution that each feature makes to the model. A candidate feature is considered dominant when it makes a larger contribution, in terms of R-squared value, to every possible subset of features than any of the other features. We use the Python Dominance Analysis package to implement it and find the strength of dominance. Results for the most dominant features are presented in Table 2. These results show that product price, the 7-day rolling mean of sales, ICS, CPI, day of the week, and the unemployment rate are dominant predictors; the results also report their strength of importance.

A comparative picture of the performances of the models trained on different datasets highlights the importance of training separate models for individual categories of products. We also observe that the average RMSE and MAE are smaller when we train individual models than the RMSE and MAE of the model trained on the dataset for all three product categories. Furthermore, the error is lowest for the model trained on food items. These results are intuitive: demand for food items is less elastic, less affected by external factors, and more predictable, whereas luxury items are more sensitive to market conditions and less predictable.
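The idea behind dominance weights can be sketched as follows: for each feature, average its incremental R-squared contribution over subsets of the remaining features, first within each subset size and then across sizes. This is a simplified NumPy illustration of the general principle, not the dominance-analysis package used in the study, and the feature names are placeholders.

```python
import numpy as np
from itertools import combinations

def r_squared(X, y):
    """R^2 of an OLS fit with intercept; an empty feature set gives R^2 = 0."""
    Z = np.column_stack([np.ones(len(y)), X]) if X.shape[1] else np.ones((len(y), 1))
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def general_dominance(X, y, names):
    """General dominance weights: each feature's incremental R^2 averaged
    within each subset size, then across sizes; weights sum to the full R^2."""
    p = X.shape[1]
    scores = {}
    for j, name in enumerate(names):
        others = [k for k in range(p) if k != j]
        level_means = []
        for r in range(p):  # subsets of the other features, by size
            incs = [
                r_squared(X[:, list(s) + [j]], y) - r_squared(X[:, list(s)], y)
                for s in combinations(others, r)
            ]
            level_means.append(np.mean(incs))
        scores[name] = float(np.mean(level_means))
    return scores
```

On synthetic data where only one feature drives the outcome, that feature receives by far the largest weight, mirroring how the analysis in Table 2 ranks product price and the rolling sales mean ahead of the other predictors.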

Conclusion
In our current research, we advocate for developing separate, tailored forecasting models for different product categories. We identify the most dominant features and their relative strengths and develop individual models for different product categories. In general, performance is superior for individual models when compared with a model trained using all product categories. The analysis also reveals that demand for necessity items is more predictable, with lower prediction errors, whereas luxury items have higher prediction errors. Our study makes a noteworthy contribution to the existing body of literature by empirically demonstrating ways to improve model performances. The insights gleaned from this study present valuable opportunities for firms to refine their supply chain management practices and enhance overall financial performance. These results are based on an openly available dataset on Kaggle and examine three different product categories. Future studies can look into a larger set of product categories to ensure that the results are generalizable.
For example, Wang et al. (2021) compared ARIMA and LSTM for predicting future order demand, with LSTM demonstrating enhanced performance in short-term forecasting. Prybutok and Mitchel (1999) compared the performance of Neural Networks with Box-Jenkins ARIMA and regression models for forecasting daily maximum ozone levels, favoring the Neural Networks model. Similarly, Mitrea and Wu (2009) compared various statistical methods with Neural Networks models, corroborating the superiority of the latter. Neural Network algorithms possess distinct advantages; being non-parametric and data-driven, they can capture non-linear relationships without the need for prior specification. This makes them well-suited for real-world data characterized by complex non-linear associations. Given this adaptability and superior performance, recent literature predominantly gravitates towards Neural Networks-based models for forecasting applications. Studies by Tanizaki et al. (2018), Yue-Fang Gao et al. (2009), Palkar et al. (2020), and Chen et al. (2021) exemplify the widespread adoption of Machine Learning models, particularly Neural Networks-based approaches, in diverse forecasting contexts. Haque (2023) utilizes Neural Networks and takes into account macroeconomic variables to forecast retail demands.

Table 1: Comparative Model Performances
Comprehensive results for model performances for the three types of items are outlined in Table 1. Columns 1 and 2 represent performances obtained from the dataset that is devoid of macroeconomic indicators, whereas columns 3 and 4 are from a dataset that includes such variables. For the dataset that excludes such variables, the LSTM model trained on hobby items has an RMSE of 1.61 and MAE of 2.03. The RMSE and MAE of the LSTM model trained on household items are 1.57 and 1.66, respectively. Similarly, the model trained on food items data has an RMSE of 1.25 and MAE of 1.09. Finally, we have also trained a legacy model on items comprising all three categories together; its RMSE and MAE are 2.59 and 2.61, respectively. Moreover, concerning the dataset incorporating external variables, results align closely with those of the dataset omitting macroeconomic factors, albeit with slight indications suggesting potential enhancement in model performance.
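For reference, the two error metrics reported in Table 1 follow their standard definitions, shown here in NumPy (equivalent functions exist in scikit-learn):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error: errors are squared before averaging,
    so large misses are penalised more heavily."""
    e = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))

def mae(y_true, y_pred):
    """Mean Absolute Error: the average magnitude of the forecast errors."""
    e = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(e)))
```

Because squaring emphasises outliers, RMSE is never smaller than MAE on the same forecasts; comparing the two therefore also indicates how concentrated the large errors are.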

Table 2:
Dominance Analysis Results