Comparative Analysis of the Predictive Power of Machine Learning Models for Forecasting the Credit Ratings of Machine-Building Companies

The purpose of this study is to compare the predictive power of different machine learning models in reproducing Moody’s credit ratings assigned to machine-building companies. The study closes several gaps found in the literature related to the choice of explanatory variables and the formation of a data sample for modeling. The task is highly relevant: there is a growing need for high-precision, low-cost models for reproducing the credit ratings of machine-building companies (internal credit ratings). This is due to the ongoing growth of credit risks in the industry, as well as the limited number of public ratings assigned to these companies by international rating agencies because of the high cost of the rating process. The study compares the predictive power of three models: ordered logistic regression, random forest, and gradient boosting. The sample includes 109 machine-building enterprises from 18 countries between 2005 and 2016. The financial indicators of companies that correspond to Moody’s industry methodology and the macroeconomic indicators of the companies’ home countries are used as explanatory variables. The results show that the machine learning models (random forest and gradient boosting) have the greatest predictive ability among the models studied. The random forest model demonstrated a prediction accuracy of 50% and the gradient boosting model 47%. Their predictive power is almost twice as high as the accuracy of ordered logistic regression (25%). In addition, the article tested two different ways of forming a sample: the random method and one that accounts for the time factor. The results showed that the use of random sampling increases the predictive power of the models. The incorporation of macroeconomic variables into the models does not improve their predictive power. The explanation is that rating agencies follow a “through the cycle” rating approach to ensure rating stability.
The results of the study may be useful for researchers who are engaged in assessing the accuracy of empirical methods for modeling credit ratings, as well as banking industry practitioners who use such models directly to assess the creditworthiness of machine-building companies.


Introduction
In the past few years the fourth industrial revolution has fundamentally changed the business environment and business models of machine-building companies (MBC). It provides new opportunities for profit and increases company value in this industry, but it also exposes companies to elevated risks. The dangers are as follows: 1) uncertainty in regard to key suppliers and delivery prices; 2) reduction of the product life cycle; 3) discontinuity of operations caused by technology breakdowns, information failures and external interference; 4) shortage of qualified staff at all levels; 5) increased competition from manufacturers in emerging markets, as well as from companies in other industries; 6) other internal and external risks [1]. Growing uncertainty, volatility and variability of the external and internal environment increase the probability of default of MBC. This makes the task of constructing high-precision models of MBC credit risk assessment relevant. Investors need these models to evaluate MBC creditworthiness over the planning horizon when deciding whether to provide financing.
In order to assess MBC creditworthiness, investors use credit ratings (CR) assigned by international rating agencies, such as Moody's Investors Service, Fitch Ratings or Standard and Poor's [2]. Ratings provide an opportunity to thoroughly examine an MBC's financial and business profiles, evaluate its advantages and disadvantages and predict the likelihood that it will settle its financial obligations on time. A CR also helps to compare the credit quality of companies from various countries and markets [3]. The credit rating is a kind of "seal of excellence" for an MBC. It enables an MBC to appeal to more investors, to increase the amounts and periods of financing, to reduce the cost of capital and, as its credit profile improves, to gradually increase the probability of cooperating with investors [4].
The high cost of assigning and maintaining a CR, as well as the demanding requirements of international rating agencies for minimum company size and quality of corporate governance, are among the drawbacks of a CR [3]. As a result, the use of CRs is limited to large multi-industry manufacturers, mainly from developed markets. Credit ratings do not cover small and medium-size MBC or firms from emerging markets because these lack the financial and organizational resources to maintain a CR. Another disadvantage of a CR is its long update interval, typically one year [4].
In order to eliminate these blind spots, investors evaluate internal credit ratings (ICR) of companies, including MBC. The approach, which has proved to be efficient, reproduces the missing credit ratings using empirical models based on public financial and non-financial company data [3]. The resulting ICR are unbiased and low-cost assessments of companies' creditworthiness. However, the predictive power of ICR (i.e. the ability to reproduce CR accurately) varies greatly depending on the models underlying the ICR [5]. Furthermore, the literature review showed that the majority of studies in this field use samples of companies from numerous industries (as a rule, from developed countries), thus leaving out the specific nature of MBC's operations and special features of their work in developed markets. Some other drawbacks were also revealed: a short observation period in the samples and inconsistency between the explanatory variables in the models and the factors used by international rating agencies.
Our research fills the abovementioned gaps in the literature. Its purpose is 1) to compare the predictive power of different machine learning models in reproducing Moody's credit ratings of MBC; and 2) to identify the optimum model in terms of data availability, forecast accuracy and result interpretability. For modelling we selected the creditworthiness factors that explicitly capture the special aspects of MBC operations and correspond to Moody's credit rating methodologies. The MBC sample comprises companies from both developed and emerging markets. We have also verified whether the addition of macroeconomic factors enhances the accuracy of CR prediction, as demonstrated in the literature [6]. The paper covers the 2005-2016 period. The research results may be useful to theorists who evaluate the accuracy of empirical CR modelling methods and to practitioners who use such models to assess MBC's creditworthiness.

Literature review
There is a range of models aimed to assess and predict credit ratings. They differ in their assumptions. The majority of studies use linear regression, logistic regressions or the discriminant analysis method. These are standard approaches to credit rating modelling. Besides, some studies use neural networks or duration and hazard models to predict rating transitions.

Econometric Methods
Early studies [7] use the univariate parameter method to predict the probability of default. Later Altman [8] used linear discriminant analysis to predict credit quality. At the close of the 20th century logit and probit models were first applied because they have greater predictive power than models based on discriminant and quadratic discriminant analysis. Martin [9] and Ohlson [10] were the first to use logit regression to construct a model of bank bankruptcy probability. Empirical studies [11] revealed that ordered logistic regression models yield better results and have greater predictive power than the least squares and discriminant analysis methods. The ordered logistic regression method is used in many recent studies dedicated to business and economics issues [12][13][14]. This method is well suited to defining credit ratings because of its ordered structure. Apart from that, it was noted that those methods had the greatest predictive power in comparison to linear regression, linear discriminant analysis, quadratic discriminant analysis and discriminant analysis of the mixture of distributions.
At present a lot of studies are dedicated to the use of the LASSO model [16] in order to search for the parameters that are most significant for the prediction of corporate credit rating.

Machine Learning Methods
The task of assigning a credit rating may also be treated as a classification problem. In the 21st century machine learning methods used to forecast the probability of default and corporate credit quality have gained popularity. Machine learning models may be "trained" using a sample of ratings and corresponding data. For example, in neural networks training is defined as a search for the weights that give the most accurate result [17]. However, the majority of such studies are conducted outside economic analysis, as part of the development and use of alternative methods in computer science. Support Vector Machines (SVM) [18] were proposed as a method with great predictive power; however, building such a model requires numerous financial and non-financial indicators. Apart from Support Vector Machines, classification trees [19][20][21] and neural networks [22][23][24] gained popularity for predicting ratings and the probability of bankruptcy. In some studies Support Vector Machines and neural networks demonstrate the same predictive accuracy of about 80% [25]. A comparison of a neural network model with linear discriminant analysis in forecasting Moody's ratings for different companies [26] showed that the neural network delivers an accuracy of 79%, which exceeds the result of discriminant analysis (33%).
Gradient boosting is another alternative method of credit rating forecasting. Paper [27] shows that gradient boosting outperforms the decision tree method in terms of the predictive power of credit scoring models. Another paper [28] notes that the gradient boosting algorithm demonstrates the greatest predictive power when compared with random forest, decision tree and neural network models.
Each of the above methods of credit rating forecasting has its advantages and disadvantages. For instance, econometric methods are easy to use and interpret. However, their predictive power is low and amounts to 40-50% on average [11]. Apart from that, the data must be selected before it is used in econometric methods. Machine learning models have great predictive power; however, the majority of them are not interpretable and may be subject to overfitting [29].

Explanatory Variables
The literature defines three groups of factors that explain CR. The first category comprises financial ratios and financial data [11]. The second category consists of corporate management and risk management factors [14; 30]. The third category includes macroeconomic factors. Studies [5; 13] reveal that in CR prediction for financial organizations, the introduction of macroeconomic variables into the models significantly improves the quality of model fit and enhances predictive power. However, when CRs were modelled for non-financial companies, some of the macroeconomic indicators (e.g., GDP growth) turned out to be insignificant or their signs failed to meet expectations [6]. A major issue in the selection of variables for analysis is multicollinearity between explanatory variables [13]; therefore, the choice of the model specification and variable selection are of great significance.
The absence of focus on a particular industry (in our case, machine building) is a gap in CR modelling: in the majority of studies CR modelling is performed using a sample of companies from various industries (in most cases the industries are identified by introducing dummy variables into the models). This makes it impossible to clearly define the explanatory variables characteristic of a particular industry. In addition, only companies from certain countries (Taiwan, USA, Korea, China) are examined, which prevents the modelling results from being generalized to a wide range of such companies. Besides, studies are limited by the following: 1) a short time interval in the samples; 2) use of explanatory variables other than the ones utilized by rating agencies. The purpose of this paper is to fill the above gaps.

Research Methodology
We have built an MBC credit quality assessment model that emulates Moody's rating. For this purpose, we applied the following methods: ordered logistic regression (OLR), random forest (RF) and gradient boosting (GB).
In the model predicting the CR of an MBC, the dependent variable $Y_t$ is the MBC's credit rating assigned by Moody's at time $t$. The agency assigns a rating expressed in letter notation in accordance with its own scale [34]. We converted the rating to an ordinal numerical scale, where whole numbers correspond to the letter notations of the rating in ascending order: the lower the rating, the bigger the number (Table 1). $Y_t$ takes the numerical value of the rating from Table 1.
Ordered logistic regression. Since the dependent variable $Y_t$ is ordinal and takes $k$ values of the rating levels within the range [1; 18], we applied ordered logistic regression (OLR) [6]. We introduce the latent variable $z$ related to the rating value and the explanatory variables as follows:
$$z_i = x_i \Theta + e_i, \qquad y_i = r \ \text{if} \ \tau_{r-1} < z_i \le \tau_r, \qquad (2)$$
where $i$ is the sequential number of the observation; $x_i$ is the set of values of the explanatory variables; $\tau_r$ are the threshold values of the rating level cut-offs; $e_i$ are errors, which are assumed to be normally distributed with zero mathematical expectation. By using this model we expect to obtain an estimate of the coefficient vector $\Theta$, as well as a set of cut-off thresholds for the rating levels $(\tau_1, \dots, \tau_{k-1})$, by applying the maximum likelihood method to the following system of equations:
$$P(y_i = 1) = F(\tau_1 - x_i \Theta),$$
$$P(y_i = r) = F(\tau_r - x_i \Theta) - F(\tau_{r-1} - x_i \Theta), \quad r = 2, \dots, k-1, \qquad (3)$$
$$P(y_i = k) = 1 - F(\tau_{k-1} - x_i \Theta),$$
where $F(u) = 1 / (1 + e^{-u})$ is the logistic function [6]; $P(y_i = r)$ is the probability of assigning an MBC with the set of values $x_i$ to the rating grade $r$.
In equation (3) standard errors are specified in the White-Huber form, which makes them robust to heteroscedasticity.
After obtaining the estimates of $\Theta$ and $\tau$, the predicted probabilities $\hat{P}_j$ from equation (3) are calculated. The MBC is assigned the rating $j$ for which the value of $\hat{P}_j$ is the biggest. We use the McFadden $R^2$ criterion [6] as a measure of how well the model approximates the actual data; it is a variation of the $R^2$ criterion widely used in econometrics. Other indicators presented in section 2 will also be quality criteria.
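The assignment rule described above can be illustrated with a minimal sketch. It assumes the standard ordered logit form, in which the probability of each grade is the difference of logistic functions evaluated at adjacent thresholds. The coefficient values, thresholds and the two-variable observation below are hypothetical and are not estimates from this study.

```python
import math


def logistic(u):
    """Logistic function F(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def rating_probabilities(x, theta, taus):
    """Return [P(y=1), ..., P(y=k)] for k-1 increasing thresholds `taus`."""
    index = sum(xj * tj for xj, tj in zip(x, theta))  # linear index x * theta
    # Cumulative probabilities P(y <= r); the last grade closes at 1.
    cumulative = [logistic(tau - index) for tau in taus] + [1.0]
    probabilities, previous = [], 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


def predict_rating(x, theta, taus):
    """Assign the grade (1-based) with the largest predicted probability."""
    probabilities = rating_probabilities(x, theta, taus)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best + 1


# Hypothetical inputs: two explanatory variables, four rating grades.
theta = [0.8, -1.2]
taus = [-1.0, 0.5, 2.0]
x = [1.5, 0.3]

probs = rating_probabilities(x, theta, taus)
predicted = predict_rating(x, theta, taus)
print([round(p, 3) for p in probs], predicted)
```

In practice the coefficients and thresholds would come from a maximum likelihood fit; the sketch only shows how a fitted model turns one observation into a predicted grade.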
Random forest. Unlike OLR, random forest (RF) is a machine learning algorithm that builds a multitude of decision tree models during training [32]. The output is obtained by voting over the classes predicted by the individual trees for a classification model, and as an average response (averaging) for a regression model [35]. The result of the rating forecasting task is the average value of multiple regression trees:
$$\hat{y}(x) = \frac{1}{G} \sum_{g=1}^{G} h(x; T_g),$$
where $G$ is the number of trees; $h$ is the regression tree function obtained at the input $T_g$.
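The averaging step described above can be sketched as follows. This is a deliberately simplified illustration, not the implementation used in the study: each "tree" is a one-split regression stump fitted on a bootstrap sample of a single feature, and the random feature selection of a full random forest is omitted. The leverage values and numerical ratings are hypothetical.

```python
import random


def fit_stump(rows, targets):
    """Fit a one-split regression tree on feature 0 by minimizing squared error."""
    best = None
    for threshold in sorted(set(row[0] for row in rows)):
        left = [t for row, t in zip(rows, targets) if row[0] <= threshold]
        right = [t for row, t in zip(rows, targets) if row[0] > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # the bootstrap sample had a single feature value
        mean = sum(targets) / len(targets)
        return lambda row: mean
    _, threshold, left_mean, right_mean = best
    return lambda row: left_mean if row[0] <= threshold else right_mean


def fit_forest(rows, targets, n_trees, seed=0):
    """Fit each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx]))
    return trees


def predict_forest(trees, row):
    """Average the predictions of all trees."""
    return sum(tree(row) for tree in trees) / len(trees)


# Hypothetical data: one leverage ratio per firm and its numerical rating.
rows = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
targets = [3, 4, 4, 9, 10, 10]

trees = fit_forest(rows, targets, n_trees=25)
low = predict_forest(trees, [0.7])
high = predict_forest(trees, [3.8])
print(round(low, 2), round(high, 2))
```

The averaged output is a real number, so a rating forecast would still need to be rounded to the nearest grade on the numerical scale.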
Gradient boosting (GB). This is also an ensemble learning method, but it applies a different ensemble formation strategy. The algorithm trains weak models sequentially, over many iterations, each time taking into consideration the current error of the whole ensemble, in order to provide a more accurate assessment of the corporate credit rating. Gradient descent is used to find the parameters ϴ that optimize the target function φ(y, f(x)) [36].
The sample is not balanced across rating categories (Figure 2). Explanatory variables comprise financial indicators that represent MBCs' performance results, as well as macroeconomic variables of the countries in which they do business. We used Moody's methodology for manufacturing companies [34] to compile the list of financial indicators. Financial indicators and ratings data were obtained from Thomson Reuters Eikon, and macroeconomic data from the World Bank. Table 2 contains the list of variables, their descriptive statistics and the expected signs of their influence on the rating.

Data Preparation
We built a correlation matrix and excluded the most correlated variables (with pairwise correlation coefficients exceeding 0.8) in order to solve the multicollinearity problem in the OLR model. For the remaining variables we evaluated the variance inflation factors (VIF) [37] and eliminated all variables with a VIF exceeding 5 from the sample. In order to evaluate the predictive power of explanatory variables, we also applied principal component analysis (PCA) [38]. When modelling ratings using machine learning methods, we applied the entire set of independent variables without the abovementioned selection. Machine learning methods are not susceptible to the multicollinearity problem, while a large set of variables allows them to find the optimum combination of factors. The models were built on a training set (70% of the data) and their quality was verified out of sample on a test set (30% of the data) that was not used for fitting.
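The correlation screening step can be sketched as follows. The text does not state which variable of a highly correlated pair was dropped, so the greedy keep-the-first rule below is an assumption, and the variable names and values are hypothetical. The helper `vif_from_r2` uses the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one explanatory variable on the others; under that definition the cut-off of 5 corresponds to R² = 0.8.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(columns, limit=0.8):
    """Keep a variable only if |correlation| with every kept variable <= limit.

    Assumption: variables are scanned in insertion order and the first
    member of a highly correlated pair is the one that is kept.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= limit for k in kept):
            kept.append(name)
    return kept


def vif_from_r2(r_squared):
    """Variance inflation factor for a variable with auxiliary R-squared."""
    return 1.0 / (1.0 - r_squared)


# Hypothetical indicators for five firm-year observations.
columns = {
    "debt_to_ebitda": [1.0, 2.0, 3.0, 4.0, 5.0],
    "debt_to_capital": [0.20, 0.35, 0.50, 0.70, 0.85],
    "ebita_margin": [0.12, 0.05, 0.15, 0.07, 0.10],
}

kept = drop_correlated(columns)
print(kept, round(vif_from_r2(0.8), 2))
```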

Research Hypotheses
Hypothesis H1. The gradient boosting model will provide the greatest predictive power of the rating model. In other words, this model will demonstrate the greatest probability of concordance of the predicted and observed ratings, $P(\Delta = 0)$, where $\Delta$ is the deviation of the predicted rating from the observed one. The random forest model will be second in predictive accuracy after gradient boosting. OLR will have the lowest predictive power among the three considered models. This corresponds with the evidence presented in paper [27]. Another reason not to expect high predictive power from the OLR model is that its coefficients are estimated using the maximum likelihood function, and since the sample is unbalanced, its results may be biased towards the most frequent rating values.
Hypothesis H2. Random separation of the data into training and test samples will provide greater predictive power than a separation that takes the time factor into consideration, in which the training set (70% of the sample) comprises the earliest observations and the test sample (30% of the sample) consists of the most recent observations. Since the sample is unbalanced, we presume that a random separation into the training and test samples may provide a more accurate rating prediction.
Hypothesis H3. Addition of macroeconomic variables to the model will improve its predictive power. This is consistent with the data from [5; 31] which demonstrated that macroeconomic variables were statistically significant and their addition to the model enhanced its predictive power. In order to validate this hypothesis, we evaluated specifications of models with macroeconomic explanatory variables and without them.
Hypothesis H4. The gradient boosting model has the lowest probability of deviation of the predicted rating from the observed one by more than one step, $P(|\Delta| \geq 2)$. Among the considered models OLR will demonstrate the highest probability of deviation by more than one step. This corresponds to the evidence presented in the paper [27].
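The two sample-formation schemes compared in Hypothesis H2 can be sketched as follows. This is an illustration under stated assumptions: the observation records, the `year` field and the firm labels are hypothetical, and the 70/30 cut is applied to the number of observations rather than to calendar years.

```python
import random


def random_split(observations, train_share=0.7, seed=0):
    """Shuffle the observations and cut them at the training share."""
    shuffled = observations[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_share * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def chronological_split(observations, train_share=0.7):
    """Put the earliest observations in training and the latest in test."""
    ordered = sorted(observations, key=lambda obs: obs["year"])
    cut = int(round(train_share * len(ordered)))
    return ordered[:cut], ordered[cut:]


# Hypothetical panel: five firms observed annually from 2005 to 2016.
observations = [
    {"firm": firm, "year": year}
    for year in range(2005, 2017)
    for firm in ("A", "B", "C", "D", "E")
]

rand_train, rand_test = random_split(observations)
time_train, time_test = chronological_split(observations)
print(len(rand_train), len(rand_test), len(time_train), len(time_test))
```

With a random split, observations of the same firm from adjacent years can fall into both sets, which is one plausible reason why it yields higher measured accuracy than the chronological split.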
The smaller the dispersion of deviations of the predicted rating from the observed one, the wider the possibilities of using the ICR model to assess the level of interest rates an MBC can expect to receive. This is because interest rates may change significantly when the rating changes by more than one step [6].
Table 3 presents the results of forecasting MBC credit ratings by applying the abovementioned models. For the purpose of comparability, we also report the results of credit rating prediction using the "naive model", i.e. a value of an MBC credit rating obtained from a random number generator. In order to evaluate the predictive power, we applied assessment metrics for multiclass classification models [39]. The predictive power metric (Accuracy) is the ratio of the number of correct rating forecasts to the total number of assessed ratings. The modified accuracy is the ratio of the number of forecasts with a maximum error of one rating step to the total number of observations. The completeness metric (Recall) evaluates the model's capability to select the correct rating, and the Precision metric measures the share of correctly identified positive results in the total number of results predicted as positive, i.e. the model's capability to distinguish a correct rating from other ratings. The F1 Score metric is the harmonic mean of Precision and Recall. The Kappa Accuracy metric is the ratio of the difference between the probability of correct model classification and the probability of a random correct classification to the probability of a random wrong classification. Finally, the Akaike information criterion (AIC) indicates a relative order of the compared models: the smaller the indicator, the better the model from the point of view of its predictive power (Figures 3 and 4).
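Three of the metrics described above can be computed as in the following minimal sketch. It assumes ratings are coded as integers on the numerical scale and that Kappa is Cohen's kappa with chance agreement derived from the marginal frequencies of observed and predicted ratings. The two rating vectors are hypothetical and are not results of this study.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Share of forecasts that match the observed rating exactly."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def modified_accuracy(actual, predicted, tolerance=1):
    """Share of forecasts that miss by at most `tolerance` rating steps."""
    hits = sum(abs(a - p) <= tolerance for a, p in zip(actual, predicted))
    return hits / len(actual)


def cohen_kappa(actual, predicted):
    """(p_observed - p_chance) / (1 - p_chance)."""
    n = len(actual)
    p_observed = accuracy(actual, predicted)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement: both marginals pick the same grade independently.
    p_chance = sum(actual_counts[grade] * predicted_counts[grade]
                   for grade in actual_counts) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical observed and predicted numerical ratings.
actual = [5, 5, 6, 7, 7, 8, 9, 9]
predicted = [5, 6, 6, 7, 9, 8, 8, 9]

acc = accuracy(actual, predicted)
mod_acc = modified_accuracy(actual, predicted)
kappa = cohen_kappa(actual, predicted)
print(acc, mod_acc, round(kappa, 4))
```

The gap between accuracy and modified accuracy shows how many misses are off by a single step, which is the quantity Hypothesis H4 is concerned with.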

Results and Discussion
H3 was not confirmed. Addition of macroeconomic variables did not enhance the predictive power of the models. On the contrary, it made the results worse. This conclusion was confirmed by analysis of the variable importance diagrams for the GB and RF models (Figures 5 and 6). This may be because international rating agencies, in trying to keep rating scores stable, use the "through the cycle" approach and evaluate the constant component of an MBC's credit risk. However, since our conclusion disagrees with the conclusions of other research papers [5; 31], the obtained result needs to be studied further.
H4 was confirmed partially. In the GB model, modified accuracy is the highest in all model specifications except for the one that accounts for neither the time factor nor macroeconomic variables. In the OLR model, in turn, modified accuracy is the lowest in all model specifications. The differences in modified accuracy between the GB and RF models under the various sample creation methods require further research. Nevertheless, in our opinion, the gradient boosting model is more promising for building an ICR model to evaluate the level of interest rates an MBC may count on.

Conclusion
In this paper we compared the predictive power of ordered logistic regression and machine learning models for modelling the internal credit ratings of machine-building companies. Random forest and gradient boosting were used as machine learning models. The task is relevant because, on the one hand, MBCs' credit risks are still increasing and, on the other hand, just a few MBCs have a public credit rating. The paper filled the gaps in the literature in the following ways: 1) use of explanatory indicators that take the specific character of the machine-building industry into consideration to the greatest extent; 2) use of a sample spanning a significant period of time that covers the whole credit cycle; 3) inclusion of companies from both developed and emerging economies in the sample. The results showed that the predictive power of machine learning models is almost twice as high as that of ordered logistic regression, and that the share of predicted ratings that deviate from the actual ones by more than one step is low. Therefore, machine learning models may have a wide practical application in building internal credit ratings of machine-building companies. Apart from that, we found that a random division into the training and test samples enhanced the models' predictive power when compared to a division according to the time factor.
However, we failed to confirm that adding macroeconomic indicators to the model as explanatory variables enhances its predictive power. Therefore, future studies need to perform additional testing of the effect of adding macroeconomic factors. Another line of research is to evaluate how the addition of non-financial indicators to the model specification affects its predictive power. The non-financial factors comprise those that define MBCs' competitive advantages in their target markets, operational performance indicators, knowledge capital efficiency indicators and MBC corporate governance efficiency indicators. Finally, a separate line of research may compare various sets of explanatory variables in order to improve the predictive power of CR assessment models for different industries, such as the oil and gas industry, the metalworking and mineral industry, the chemical industry, the automotive industry, etc.