Predicting Bank Failures with Machine Learning Algorithms: A Comparison of Boosting and Cost-Sensitive Models

Predicting bank failures has long been an essential subject in the literature due to the significance of banks for a country's economic prosperity. Acting as intermediaries, banks channel funds between creditors and debtors and are therefore considered the backbone of an economy; hence, it is important to build early warning systems that distinguish insolvent banks from solvent ones, so that troubled banks can apply for assistance and avoid bankruptcy in financially turbulent times. In this paper, we focus on two machine learning disciplines for predicting bank failures: boosting and cost-sensitive methods. Boosting methods are widely used in the literature for their strong predictive capability. Cost-Sensitive Forest (CS-Forest), by contrast, is relatively new and was originally invented to address class imbalance in software defect detection; bank failure datasets exhibit a similar imbalance, containing far more healthy banks than failed ones, which can mislead learning algorithms into performing worse than they otherwise could. Our results show that although total classification performance is somewhat similar across models, the Recall rate for failed banks is much higher with CS-Forest than with GLMBoost and XGBoost. Thus, we suggest researchers consider the CS-Forest model when predicting bank failures with imbalanced datasets.


1. Introduction
Banks are among the most important players in a society, acting as intermediaries that channel funds from creditors to debtors efficiently to sustain economic stability. A well-structured financial system is essential for a modern economy, and the performance of banks has a significant impact on society. Therefore, understanding and analyzing bank failures is an essential part of the literature. The study of bank failure is important for two reasons. First, understanding the factors related to a bank's failure enables regulatory authorities to manage and supervise banks more efficiently. Second, the ability to differentiate between sound banks and troubled ones reduces the expected cost of bank failure. In other words, if examiners can detect problems early enough, regulatory actions can be taken either to prevent a bank from failing or to minimize the costs to the public and thus to taxpayers (Thomson, 1991). This paper is organized as follows: The first section gives a general outlook on the banking sector and the importance of banks, introductory literature on how bank failure prediction has evolved, and brief information on the datasets and models used in this article.
The second section gives a comprehensive list of previous studies related to bank or company failure prediction and several machine learning algorithms and their performance.
The third section gives a general description of the data used in the research; afterwards, we define the methods and performance metrics used in this study and most of the terms that appear in the results section of our paper.
In the fourth and last section, all the results are displayed together with the selected indicators, which can be found in Table II, and their confusion matrices in Table III, where we discuss the model performances and the methodology used to demonstrate our results.

2. Literature Review
Cleary and Hebb (2015) analyzed the failures of 132 banks for the period 2002-2009 using discriminant analysis. The prediction efficiency for bank failure was 92% on the sample data. Furthermore, they performed the same analysis to predict bank failures between 2010 and 2011, with efficiency ranging between 90% and 95%. Chiaramonte et al. (2016) investigated US commercial bank data from 2004 to 2012 to analyze how well the Z-score can forecast bank failure. The outcome of the investigation showed that the Z-score could forecast 76% of bank failures. It is important to state that macro-level indicators did not increase the precision of the forecast. On the other hand, the forecast efficiency of the Z-score in predicting bank default remains stable within a three-year forward window. Ekinci and Erdal (2016) analysed bank failure prediction for 37 commercial banks in Turkey between 1997 and 2001. In the dataset, 20 banks were healthy and 17 had failed. They used Logistic Regression, J48, and Voted Perceptron as base learners along with different hybrid ensembles. Their empirical findings indicated that hybrid ensemble machine learning models perform better than traditional base and ensemble models. Le and Viviani (2017) analyzed bank failures using both traditional techniques and machine learning on a sample of 3000 US banks, of which 1438 were failed and 1562 were active. They used discriminant analysis and logistic regression as traditional techniques and, for machine learning, artificial neural networks, support vector machines, and k-nearest neighbours. CAMEL ratios, represented by 31 financial ratios, were used in the analysis.

The empirical findings state that artificial neural networks and k-nearest neighbours are the most precise methods for predicting bank failures. Gogas et al. (2018) implemented machine learning models to forecast bank failures. Their data consist of a total of 1443 U.S. banks, of which 481 failed between 2007 and 2013. They implemented a two-step feature selection procedure to identify the most informative variables. Afterwards, the selected variables were fed into an SVM model for a training-testing learning process. The model achieved a 99.22% forecasting accuracy and outperformed the well-established Ohlson's score.

Carmona et al. (2019) implemented the Extreme Gradient Boosting method to forecast failures of 157 US national commercial banks from 2001 to 2015, considering 30 financial ratios in their model. Their results state that lower values of retained earnings to average equity, pretax return on assets, and total risk-based capital ratio are linked with bank failure. Additionally, they suggest that retained earnings should be kept within the company in stressful periods and that dividend policies should be reconsidered. Their results also showed that when non-performing loans account for a large share of a bank's balance sheet and the levels of risk coverage and capitalization are low, the chances of bank failure are high. Petropoulos et al. (2020) used various modeling techniques to forecast bank insolvencies, drawing on data from US financial institutions for the period 2008-2014, with CAMELS indicators as the main data. Their results show that Random Forests consistently perform better than a series of other models, such as Logistic Regression, Linear Discriminant Analysis, Support Vector Machines, and Neural Networks. Zizi et al. (2020) used logistic regression to determine the reasons for the financial failure of SMEs, retrieving data on healthy and failed companies from a Moroccan bank. Their findings state that the autonomy ratio, interest to sales, asset turnover, days in accounts receivable, and duration of trade payables increase the probability of financial failure, while repayment capacity and return on assets reduce it. The given variables show an overall classification rate of healthy and failing SMEs of 91.11% three years before failure and 84.44% two years and one year before failure.

Data
The data were collected from the Taiwan Economic Journal for the years 1999 to 2009. Company bankruptcy was defined based on the business regulations of the Taiwan Stock Exchange. In total, there are 1000 banks, of which 780 are healthy and 220 are failed. Additionally, the ratios used for this research can be found in Appendix I.

Methods
In this experiment, we used three different machine learning algorithms to classify bank failures in Taiwan: two boosting algorithms and one cost-sensitive algorithm. For boosting, we chose XGBoost and GLMBoost; for the cost-sensitive model, we used CS-Forest.
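As an illustrative sketch only: CS-Forest itself is a WEKA package with no standard Python port, but a class-weighted random forest (scikit-learn) conveys the core cost-sensitive idea of penalizing misclassified failed banks more heavily. The data, weights, and split here are synthetic placeholders, not the study's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# Synthetic imbalanced data: ~78% healthy (0), ~22% failed (1),
# mirroring the 780/220 split in our dataset.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.78, 0.22], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# class_weight raises the cost of missing a failed bank (class 1);
# the 1:5 ratio is an illustrative choice, not a tuned value.
clf = RandomForestClassifier(n_estimators=100,
                             class_weight={0: 1, 1: 5},
                             random_state=42)
clf.fit(X_tr, y_tr)
print("Recall on failed banks:", recall_score(y_te, clf.predict(X_te)))
```

The design choice is the same one CS-Forest makes internally: trading a little accuracy on the majority class for higher Recall on the minority class.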
We used WEKA 3.9.5 (Waikato Environment for Knowledge Analysis) for the experiments and the data processing. All the models we used are included as WEKA packages; however, for XGBoost, we used the R extension of WEKA, where the analysis is set up in WEKA but the algorithm runs under the R console. It therefore requires R to be installed on the computer to proceed with the analysis.

Gradient Boosting
Guelman (2012) defines gradient boosting as an iterative algorithm that combines parameterized functions with poor individual performance to produce more accurate forecasting rules. While other statistical learning methods, such as support vector machines and neural networks, provide comparable accuracy, gradient boosting yields more interpretable results. The method is highly robust and can be applied to both classification and regression problems with a variety of response distributions, for instance Gaussian, Poisson, and Laplace.
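The iterative combination of weak learners described above can be sketched with scikit-learn's gradient boosting classifier; the data here are synthetic placeholders, not the study's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each of the 200 shallow trees (depth 2, a "weak" learner) is fitted to
# the gradient of the loss of the ensemble built so far, then added with
# a shrinkage factor (learning_rate).
gb = GradientBoostingClassifier(n_estimators=200, max_depth=2,
                                learning_rate=0.1, random_state=0)
gb.fit(X_tr, y_tr)
print("Test accuracy:", gb.score(X_te, y_te))
```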

GLMBoost
Adamu et al. (2019) define Generalized Linear Models (GLMs) as a generalization of linear regression that allows both normally and non-normally distributed response variables. A GLM is a broad class of models for relating responses to linear combinations of predictor variables. Fitting generalized linear models with the gradient boosting algorithm is called GLMBoost. GLMBoost fits linear models through component-wise boosting: each column of the design matrix is individually fitted and selected using a simple linear model. The algorithm is therefore a gradient booster that optimizes certain loss functions using linear models on a per-component basis. Using GLMBoost for any linear model, the results are more accurate and reliable than those of the corresponding GLM function.
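The component-wise fitting described above can be made concrete with a minimal sketch of L2 (squared-error) boosting: at each iteration, every column is fitted alone to the current residuals and only the best-fitting coefficient is updated, shrunk by a step size. This is a hand-rolled illustration of the idea, not the mboost/WEKA GLMBoost implementation used in the study.

```python
import numpy as np

def glmboost_l2(X, y, n_iter=300, nu=0.1):
    """Component-wise L2 boosting: each step fits every column alone to
    the residuals and updates only the best one, shrunk by nu."""
    n, p = X.shape
    coef = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept
    for _ in range(n_iter):
        best_j, best_beta, best_sse = 0, 0.0, np.inf
        for j in range(p):
            xj = X[:, j]
            beta = xj @ resid / (xj @ xj)        # simple least-squares fit
            sse = np.sum((resid - beta * xj) ** 2)
            if sse < best_sse:
                best_j, best_beta, best_sse = j, beta, sse
        coef[best_j] += nu * best_beta           # shrink and update
        resid = y - intercept - X @ coef
    return intercept, coef

# Synthetic check: recover a sparse linear signal y = 3*x2 - 2*x7 + noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 2] - 2 * X[:, 7] + rng.normal(scale=0.1, size=200)
b0, b = glmboost_l2(X, y)
```

Because only one coefficient moves per step, early stopping of the loop performs implicit variable selection, which is the property that makes GLMBoost attractive for high-dimensional ratio data.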

XG Boost
XGBoost is an application of gradient boosting machines created by Tianqi Chen in 2014. Designed for efficiency and scalability, its parallel tree boosting capabilities make it significantly faster than other tree-based ensemble algorithms (Quinto, 2020). The Analytics Vidhya Content Team (2018) defines the features of XGBoost as below:


- The model has an option to penalize complex models through both L1 and L2 regularization, which helps prevent overfitting.
- XGBoost incorporates a sparsity-aware split finding algorithm to handle different types of sparsity patterns in the data.
- XGBoost has a distributed weighted quantile sketch algorithm to handle weighted data effectively.

Hanley and McNeil (1982) defined AUC as a single scalar value that measures the overall performance of a binary classifier. AUC is an abbreviation for the area under a receiver operating characteristic (ROC) curve.

Hosmer and Lemeshow (2000) ranked ROC values as:
If ROC = 0.5: this suggests no discrimination (same as flipping a coin).
If ROC ≥ 0.9: This is considered outstanding discrimination.

K-fold cross-validation
In this experiment, we used 10-fold cross-validation to measure model performance, as it minimizes the bias associated with random sampling of the training data (Chou and Pham, 2013). Yumurtaci et al. (2015) describe the 10-fold CV as follows: the dataset is randomly split into ten (k) subsets of the same size, in which each class is represented in approximately the same proportions as in the full dataset. Each subset is then held out in turn while the learning scheme is trained on the remaining nine-tenths (k-1 subsets), and its error rate is calculated on the holdout set. Thus, the learning procedure is executed a total of ten times on different training sets.

Results
We analyzed three classification models: GLMBoost, XGBoost, and CS-Forest. Our ranking methodology for the success of the models is based on three metrics: classification rate, Recall, and AUC. The classification rate stands for the number of correctly classified instances (both true positives and true negatives) in the model, as shown in Table IV. AUC (the area under the ROC curve) is a commonly used metric for classification problems: true positive responses push the AUC closer to 1, while false positive results pull it toward 0.5, the same result as tossing a coin. Finally, the Recall rate measures the accurately predicted instances of the failed banks. It therefore needs to be taken into consideration when comparing models, since the main purpose is to raise the Recall rate as high as possible and thereby reduce the false-negative rate. The best classification rate was provided by XGBoost, followed by CS-Forest, with GLMBoost the least successful classifier. On the other hand, CS-Forest achieved the best AUC, followed by XGBoost, with GLMBoost again the least successful performer. Finally, CS-Forest also achieved the best Recall rate, followed by XGBoost, with GLMBoost the weakest. We can conclude from the above-mentioned results that there is strong competition between XGBoost and CS-Forest in terms of predictive capability. XGBoost provides the best results in predicting non-failed banks, while CS-Forest predicts failed banks best by a clear margin: it correctly classifies 38 more failed banks than XGBoost and 72 more than GLMBoost. It is important to state that, most of the time, the purpose of related research is to misclassify failed banks as little as possible. Therefore, even though XGBoost provides very strong results on healthy banks, it does not particularly focus on reducing Type II error. Thus, we highly suggest using CS-Forest when working with imbalanced datasets for bank failure prediction. Despite the excellent performance of XGBoost, it cannot handle the imbalance problem as CS-Forest does.
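The evaluation scheme described above, stratified 10-fold cross-validation scored by classification rate, Recall on the failed-bank class, and AUC, can be sketched in scikit-learn; the classifier and synthetic imbalanced data below are placeholders, not the study's models or dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in data with the paper's ~78/22 class imbalance.
X, y = make_classification(n_samples=1000, weights=[0.78, 0.22],
                           random_state=7)

# Stratified folds keep the class proportions of the full dataset,
# as in the 10-fold CV description above.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
scores = cross_validate(RandomForestClassifier(random_state=7), X, y,
                        cv=cv, scoring=["accuracy", "recall", "roc_auc"])

print("Classification rate:", scores["test_accuracy"].mean())
print("Recall (failed banks):", scores["test_recall"].mean())
print("AUC:", scores["test_roc_auc"].mean())
```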

Conclusion
Prediction of bank failures has been a particularly interesting subject due to the risky nature of the financial sector, especially in turbulent times. Developing technologies and algorithms help us advance our predictions and increase the accuracy of our experiments. In our experiment, we used the Taiwanese bank failure dataset to assess the predictive capability of up-to-date models. The most successful bank failure models are currently based on gradient boosting methods, owing to their strength in learning the dataset and producing better predictions. In this article, however, we applied a newer technique to bank failure prediction and compared it with the best methods in the literature. Cost-Sensitive Forest was created to predict software defects in datasets with imbalance problems. We implemented the same model for bank failure prediction because the two problems show similar dataset behaviour: bank failure datasets contain far more healthy banks than failed ones, and algorithms confused by this imbalance may perform worse than they otherwise could. Our results show that even though total classification performance is somewhat similar, the Recall rate for failed banks is much higher with CS-Forest than with GLMBoost and XGBoost. Thus, we suggest researchers consider the CS-Forest model when dealing with bank failure prediction on imbalanced datasets.
Appendix I:

ROA (before interest and depreciation).
Cash Flow Rate.
Net Value Per Share.
Persistent EPS in the Last Four Seasons.
Cash Flow Per Share.
Revenue Per Share (in Yuan).
Cash Reinvestment %.
Interest Expense Ratio.
Debt Ratio %.
Net Worth to Assets.
Operating Profit to Paid-in Capital.
Total Asset Turnover.
Operating Funds to Liability.