The Yield Curve as a Recession Leading Indicator. An Application for Gradient Boosting and Random Forest

The most representative decision-tree ensemble methods are used to examine the variable importance of Treasury term spreads for predicting US economic recessions, while also generating rules for recession detection. A strategy is proposed for training the classifiers with Treasury term spread data, and the results are compared in order to select the best model for interpretability. We also discuss the use of the SHapley Additive exPlanations (SHAP) framework to understand US recession forecasts by analyzing feature importance. Consistent with the existing literature, we identify the most relevant Treasury term spreads for predicting US economic recessions and propose a methodology for detecting relevant rules for recession detection. In this case, the most relevant term spread found is the 3-month to 6-month spread, which we propose that economic authorities monitor. Finally, the methodology detected rules with high lift in predicting economic recessions that can be used by these entities for this purpose. This latter result stands in contrast to a growing body of literature demonstrating that machine learning methods are useful for interpretation when comparing many alternative algorithms; we discuss the interpretation of our result and propose further research lines aligned with this work.

According to this idea, in a competitive financial environment, the term structure should respond to international market forces, which are considered key for assessing the impact of monetary policy and, more importantly, for expressing the economy's behaviour. Indeed, if monetary policy is effective, changes in short-term policy interest rates should affect long-term ones [2]. In this sense, the need to forecast and prevent economic recessions has become of great importance to policymakers, practitioners and researchers. In this respect, the use of economic and financial variables as carriers of predictive information, together with the application of several econometric methods and machine learning models, has focused on achieving better accuracy in predicting the possible turning points of the business cycle and, in particular, economic recessions [3]. This literature review tries to shed some light on the most important and prominent works on the topic.
As previously mentioned, the term structure holds implications in macroeconomics or finance and the shape of the yield curve (see [4] for a survey). According to this, an upward sloping yield curve suggests that future short-term rates are expected to rise.
Conversely, a downward-sloping yield curve may mean that future short-term rates are expected to drop. As [5] state, the yield curve's slope (the difference between the longer-maturity and the shorter-maturity interest rates) is an important source of information on the evolution of the real economy. Accordingly, they found that a positive curve slope is associated with future increases in real economic activity when using macroeconomic variables, possessing significant predictive power, or with its economic implications for monetary policy [6], [7]. To understand the background of the term structure, we briefly review the Expectations Hypothesis of the Term Structure (EHTS). This hypothesis describes the relationship between short-term and long-term interest rates and is the most influential theory explaining term structure relations. It establishes that long-term interest rates are determined by an average of the current and expected short-term interest rates [8]. Therefore, this relationship between both types of interest rates indicates that their spread holds meaningful information on future changes in short-term rates and plays an important role in the potential effectiveness of monetary policy [9], [10], or in reflecting economic agents' anticipations of future events such as recessions (see [11] for a survey). According to [12], the inversion of the yield curve is viewed as a consistent predictor of recessions and future economic activity, and there is an important reason that explains the flattening or inversion of the yield curve: monetary tightening. A tightening of monetary policy means a rise in short-term interest rates aimed at reducing inflation. The consequence of monetary tightening is that the economy may slow down.
Consequently, shorter-term interest rates are considered indicators of demand for credit and future inflation. Therefore, longer-term interest rates would tend to decrease and flatten the yield curve, an example of the relation between the yield curve behaviour and recessions. Definitely, the yield curve's steepness would help us predict and determine a future recession [13].
The literature on this topic has tried to demonstrate the role of the term structure or the yield curve as a good forecasting tool for recessions [14]. The influential papers of [5] and [15] should be noted. These works showed that the yield curve might be employed to predict real growth in consumption, investment, or aggregate GNP and, more importantly, they demonstrated its relation with NBER-dated recessions. For its part, [16] suggests that, among the different variables used in that work, the term spread is the significant predictor of recessions at horizons beyond three months. In this respect, many previous papers have treated the topic by relating GDP growth to the yield curve slope (see [17]-[25], among others, or [26] for a thorough survey of the topic). Another important work by [27] argues for the convenience of applying models that use the yield curve to predict recessions. In other influential papers in the literature, the term spread is also found useful in predicting recessions even for professional forecasters, as [28] suggested, and [29] combined the term spread with stock returns to measure their accuracy in predicting recessions. The results were positive, and the term spread was found to be a valuable predictor of recessions for the German and US economies. In work similar to [28], [30] compared the strength of the yield curve in forecasting recessions with the data used in [28], evidencing the power of the former and suggesting the suitability of using this indicator. For its part, [31] also examined the capability of the term structure to predict recessions and highlighted the power of this indicator over other leading indicators. However, its strength as a predictor decreased after the financial crisis due to the volatility of macroeconomic variables, and its predictive power has fallen over the last decade.
Furthermore, [3], in line with the previous literature, find that the ability of the term structure to predict recessions is stronger over the twelve-month horizon when using a probit model similar to the one used by [5] or [13]. Additionally, [32] provided further evidence of the potential of the yield curve in forecasting the future state of the US economy over horizons ranging from one quarter to two years. Besides, [33] recognized that the yield curve contains information on future GDP growth and that its predictability varies with time, forecast horizons, and quantiles of the distribution of future growth; nonetheless, a significant empirical contribution of their work is that it seems more efficient to predict future expansionary phases, which are more common than recessions, for which the latter appears to perform better. Finally, although [34] find that developments in the stock market diminish the efficacy of the yield curve in forecasting future economic activity, they show the fitness of this indicator for predicting economic activity in many of the world's most important economies, such as the US, Canada and Europe, and, more importantly, when periods of financial stress are analyzed.
From another empirical perspective, the use of techniques based on machine learning algorithms has emerged in the literature. In this sense, [35] argues for the suitability of machine learning techniques for central banking or monetary policy issues, as applied in other real-life topics. Likewise, [36] demonstrated that the yield curve is a robust and consistent predictor of economic activity when US business cycle turning points are examined using four different methods, i.e., equally-weighted forecasts, Bayesian Model Averaging (BMA), and linear and non-linear machine learning boosting algorithms. An important paper in the literature by [37] compares different Support Vector Machine (SVM hereafter) and logit models when using the yield curve as a leading indicator, being "the first empirical investigation on the relation between the yield curve and an economy's real output, using an SVM classifier". The model created is helpful for policymakers in forecasting future recessions. Reaffirming this latter study, [38] show that the yield curve is a useful tool for assessing future economic activity, achieving a 100% forecasting accuracy for recessions. For its part, [39] demonstrated that the predictive power of boosted regression trees is considerably better than that of standard probit models. Their findings show that short rates and the yield curve are crucial leading indicators for recession forecasts during the 1974-2014 period. Finally, [40] employ several machine learning methods, such as the Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Net, Discriminant Analysis classifiers, Bayesian classifiers, and classification and regression trees (CART), in line with the existing literature, and reconfirm the ability of the yield curve to act as an early warning system for predicting recessions in the United States.
Specifically, the yield curve remains a consistent and reliable predictor of recession over the 12-month forecast horizon. [41] also apply a battery of machine learning methods: decision trees, random forests, extremely randomized trees, support vector machines (SVM), and artificial neural networks. They find that almost all the machine learning models appropriately predict the global financial crisis of 2007-2008 and, additionally, indicate that the flatter or more inverted the yield curve is, the higher the chance of a crisis, exposing the tendency of chasing performance or increased risk-taking that can often be seen before financial crises.
To the best of our knowledge, our approach, i.e., Gradient Boosting and Random Forest machine learning methods, allows us to reach better accuracy than previous papers on the topic. These machine learning algorithms let us identify the most relevant variables associated with the main variable, which has not been done before in the literature. Additionally, we extend the time horizon, i.e., we use updated data compared with previous studies. Indeed, our results indicate that our algorithm lets us signal and choose the most influential variables for predicting economic recessions among the term spreads analysed. The analysis highlights some of the most important term spreads, namely 3-month-6-month, 2-year-5-year and 5-year-10-year. Furthermore, for these variables, the lift metric is computed to detect intervals with a higher probability of recession, and it is also applied in the rule description methods. Results suggest that the most important term spread is 3-month-6-month, compared with the term spreads mentioned in the literature. The results offer some considerations for monetary authorities, policymakers and practitioners, such as monitoring the aforementioned term spread as a tool for detecting economic recessions.
The rest of the paper is organized as follows. Section II presents the data and methodology used in the paper. Section III then shows and discusses the results; the concluding remarks are in section

A. Introduction
A supervised method is proposed that predicts economic crisis cycles and can also identify the key factors that drive this phenomenon. Assessing variable importance is an important task, as reflected in many fields of study, and several approaches address this question [42]-[45].
A decision-tree ensemble classification method is proposed for interpretability, rather than only for predicting economic recessions, using the different term spreads as independent variables. In this way, the variable importance is computed to measure which variables are the most relevant for predicting economic crisis cycles. The model is interpreted further by analyzing the dependencies with the most correlated variables and the dependency of feature values on the target variable, to understand this phenomenon better. Finally, a rule extraction process is proposed that could be useful for interpreting and detecting economic recessions.

B. Data Description
For our empirical analysis, we employ a monthly sample of Treasury constant maturity interest rates at nine different maturities from January 1969 to November 2020 (amounting to 601 observations for each interest rate series). The data corresponds to the constant maturity rates of 3-month, 6-month, 1-year, 2-year, 3-year, 5-year, 7-year, 10-year and 20-year.
The data is collected from the Federal Reserve Economic Data (FRED) database maintained by the Economic Research Division of the Federal Reserve Bank of St. Louis. Since the 1-month Treasury constant maturity rate is only available from January 2001, we have picked these maturities considering the availability of interest rate data consistent with the period studied. We treat the 3-month, 6-month and 1-year rates as short-term, including the 1-year rate in this group because it offers more robustness in our assessment. Conversely, we consider the rest of the maturity rates as long-term.
From the 9 interest rates, 36 spread variables are obtained, each computed as the difference between two rates; this follows the number of combinations without repetition C(n, r), with n = 9 and r = 2 being the set and subset sizes, respectively. As shown in Table I, the interest rates show similar statistical properties. Nevertheless, the short-term 3-month and 6-month interest rates present lower means and medians and higher standard deviations. On the contrary, long-term interest rates show the opposite: higher means and medians and lower standard deviations. Henceforth, to save space in figures and tables, term spreads are abbreviated using M and Y for month and year interest rates, respectively, e.g. M3-Y10 for the 3-month-10-year term spread. In Fig.1 A, the interest rates are plotted, and the general trend is decreasing. Fig.1 B shows the computed term spreads for all combinations of interest rates: there are expansion stages, in which the spreads diverge, and flattening stages, in which the term spreads converge and invert, which could be an early indicator of economic recession.
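The construction of the 36 spreads can be illustrated with a minimal Python sketch. The rates below are hypothetical values for a single month, and the sign convention (first maturity minus second, matching the label order M3-Y10) is an assumption of this sketch rather than something the text specifies.

```python
from itertools import combinations

# Maturity labels follow the paper's abbreviation (M = month, Y = year).
MATURITIES = ["M3", "M6", "Y1", "Y2", "Y3", "Y5", "Y7", "Y10", "Y20"]

def term_spreads(rates):
    """Return {"A-B": rates[A] - rates[B]} for every unordered pair of maturities."""
    return {
        f"{a}-{b}": rates[a] - rates[b]
        for a, b in combinations(MATURITIES, 2)
    }

# Hypothetical rates (in percent) for one month, for illustration only.
example_rates = {"M3": 1.5, "M6": 1.6, "Y1": 1.7, "Y2": 1.9, "Y3": 2.0,
                 "Y5": 2.2, "Y7": 2.4, "Y10": 2.5, "Y20": 2.8}
spreads = term_spreads(example_rates)
print(len(spreads))  # number of pairs, C(9, 2) = 36
```

Applied month by month, the same function would yield one 36-column row per observation.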
Because they are built from combinations of the same rates, the term spread variables show several strong correlations. The correlation coefficient is used to verify collinearity, and it is argued that collinearity is certain when the correlation coefficient is 0.9 or higher [46]. The correlation analysis between variables is shown in Fig.2, where the plot of Pearson's correlation coefficients reveals highly correlated features. In line with the literature, results show a consistent negative relationship in the difference between long-term and short-term interest rates and consequently in the term spreads [1]. This is taken into account when interpreting the feature importance presented in the results.
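As a rough illustration of the 0.9 collinearity check, the following Python sketch flags pairs of series whose absolute Pearson correlation reaches the threshold. The three short series are invented toy values, not data from this study, and the helper names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def collinear_pairs(columns, threshold=0.9):
    """Pairs of column names whose absolute correlation reaches the threshold."""
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs

# Toy series (hypothetical values): the first two move almost together.
toy = {
    "M3-Y10": [-1.0, -0.5, 0.2, 0.8, -0.3],
    "M6-Y10": [-0.9, -0.4, 0.3, 0.7, -0.2],
    "Y2-Y5": [0.1, -0.3, 0.2, -0.1, 0.0],
}
flagged = collinear_pairs(toy)
print(flagged)
```

In this toy input only the pair M3-Y10 and M6-Y10 is flagged, since both spreads share the same long leg.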
The literature has mainly focused on continuous variables such as growth rates in GNP, GDP, industrial production, consumption and investment, among others [1]. In this work, only interest rates are used as predictors, as the main purpose is not to improve on the best predictive results in the literature but to understand the relationships, importance and rules linking interest rates with economic recessions.

Variable Target Lift
In machine learning, lift is a metric used to assess the performance of a targeting model at predicting or classifying cases as having an enhanced response relative to the population as a whole.
This metric is straightforward to understand: a targeting model is performing well if the response within the target is much better than the average for the population. In other words, lift is simply the ratio of these values: target response divided by average response [47]. It is defined as:

$$\text{Lift} = \frac{\text{target response}}{\text{average response}} \qquad (1)$$

These indicators, shown in Table II, are useful in the exploratory data analysis stage to understand, for each decile of a variable, which range of values has more impact on the positive target. This can be used as an early exploratory rule for detecting economic recessions. It is complementary information, as the decile split does not guarantee the optimal value range of a variable for maximizing the lift; in contrast, the lift computed for the ranges of tree-based rules may give a better separation, as it comes from a supervised method. Even so, the decile analysis helps initially to understand these economic processes. For the sake of simplicity, in Table II the target lift is computed only for the most important variables, as shown in section III. From this table, some initial patterns can be found. Generally, almost every term spread has a high lift for economic recession at high deciles, except for 3-year-3-month. On the contrary, the 3-year-3-month and 2-year-6-month term spreads show high lift for low and mid deciles. This is an initial indicator of the higher probability of recession in those deciles; the specific value ranges are given in the decile interval table in the appendix.
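A decile lift table of this kind could be computed along the following lines. This is a minimal Python sketch on invented data: equal-count bins stand in for deciles, and the function name and toy values are assumptions of the sketch.

```python
def lift_by_bin(values, target, n_bins=10):
    """Lift of each quantile bin: positive rate in the bin / overall positive rate.

    Assumes at least n_bins observations and at least one positive target.
    """
    overall = sum(target) / len(target)
    order = sorted(range(len(values)), key=lambda i: values[i])
    lifts = []
    for b in range(n_bins):
        lo = b * len(order) // n_bins
        hi = (b + 1) * len(order) // n_bins
        idx = order[lo:hi]
        rate = sum(target[i] for i in idx) / len(idx)
        lifts.append(rate / overall)
    return lifts

# Toy example (hypothetical): 20 observations, positives only at the 4 largest values.
values = list(range(20))
target = [0] * 16 + [1] * 4
lifts = lift_by_bin(values, target)
print(lifts)
```

Here the overall positive rate is 0.2, so the two top deciles, which contain only positives, reach a lift of 5, while the remaining deciles have a lift of 0.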

C. Methodology
The main purpose of this work is not only to offer a model for predicting economic recessions but also to offer a methodology yielding a sufficiently accurate model that is able to explain variable importance, dependencies and economic recession detection rules.
Decision-tree ensemble methods are supervised learning methods for modeling the relationship between the dependent variable y and the feature vector x. These techniques are a common choice in current machine learning research and have a wide range of applications in regression, classification and other tasks [48], [49].
The two main families of decision-tree ensemble methods, bagging and boosting, are applied in this work in a classification setting to estimate the economic crisis cycles. The advantages of these methods are that they often provide very high predictive accuracy, can optimize different loss functions, offer several hyperparameter tuning options that make the function fit flexible, generally require no data pre-processing, and often work well with both categorical and numerical values.
To train the models, the data is split into training and test sets, where the training set consists of all available variables for all observations from January 1969 to December 1999 and the test set covers January 2000 to January 2020, with the corresponding binary supervised target indicating an economic crisis cycle. In other words, the models should learn which features are relevant from a selected time interval in order to predict a more recent one, which should be relevant not only for predicting the economic crisis cycles but also for the interpretability of the current situation.
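A chronological split of this kind can be sketched in a few lines of Python. The row layout (date, features, target) and the placeholder rows are assumptions made for illustration; only the cutoff date of January 2000 comes from the text.

```python
from datetime import date

def chronological_split(rows, cutoff):
    """Split (date, features, target) rows into those before and from the cutoff."""
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test

# Placeholder monthly rows (hypothetical: no features, target fixed at 0).
months = [date(y, m, 1) for y in range(1969, 2021) for m in range(1, 13)]
rows = [(d, None, 0) for d in months if d <= date(2020, 1, 1)]

train, test = chronological_split(rows, cutoff=date(2000, 1, 1))
print(train[-1][0], test[0][0])
```

Unlike a random split, this keeps every test observation strictly later than every training observation, which avoids leaking future information into the fitted model.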

Random Forest Classifier
Random Forest (RF) was proposed by [50] as an ensemble method for regression based on individual decision trees; the original classification approach, based on Stochastic Discrimination, was proposed by [51], [52].
Ranger is a fast implementation of RF [53], or recursive partitioning, particularly suited for high-dimensional data. The R implementation ranger was used to fit an RF model with the settings considered optimal [54].
What makes Random Forest powerful is that it builds several weak decision trees in parallel, which is computationally cheap, and combines them into a single, strong learner by averaging or taking the majority vote; the result is often an accurate learning algorithm.
The pseudocode is shown in Algorithm 1. The algorithm works as follows: for each tree in the forest, a bootstrap sample is selected from S, where S(i) denotes the i-th bootstrap sample. A decision tree is then trained as follows: at each node of the tree, instead of examining all possible feature splits, a random subset of features f ⊆ F is selected, where F is the set of features. The node then splits on the best feature in f rather than in F. In practice, f is much smaller than F. Narrowing the set of features drastically speeds up the learning of a tree.

ALGORITHM I. Random Forest algorithm
function RandomForest(S, F)
    H ← ∅
    for each tree i in the forest do
        S(i) ← a bootstrap sample from S
        tree ← RandomizedTreeLearn(S(i), F)
        H ← H ∪ {tree}
    end for
    return H
end function
function RandomizedTreeLearn(S, F)
    At each node:
        f ← small subset of F
        Split on best feature in f
    return learned tree
end function

The RF algorithm is a bagging technique for building an ensemble of decision trees, and this technique is known to reduce the variance of the algorithm. In traditional bagging with decision trees, the constituent trees may be highly correlated because the same features tend to be used repeatedly to split the bootstrap samples. Restricting each split test to a small, random sample of features decreases the correlation between trees in the ensemble and improves the performance of the algorithm.
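The bagging and feature-subsampling steps can be made concrete with a small Python sketch. It is deliberately simplified and is not the ranger implementation used in this work: each tree is a depth-one stump, so the random feature subset is drawn once per tree (its single node), and the data are invented toy values.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def learn_stump(X, y, features):
    """Best single split (feature, threshold) among the given feature subset."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    if best is None:  # no valid split: always predict the majority class
        m = majority(y)
        return (0, float("inf"), m, m)
    return best[1:]

def random_forest(X, y, n_trees=25, n_sub=2, seed=0):
    """Bagged stumps, each trained on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample S(i)
        f = rng.sample(range(n_features), n_sub)     # random subset f of F
        forest.append(learn_stump([X[i] for i in idx], [y[i] for i in idx], f))
    return forest

def predict(forest, row):
    """Majority vote over the stumps."""
    votes = [left if row[j] <= t else right for j, t, left, right in forest]
    return majority(votes)

# Toy data (hypothetical): columns 0 and 1 separate the classes, column 2 is noise.
X = [[-1.0, -0.9, 0.3], [-0.8, -0.7, 0.9], [-0.5, -0.6, 0.1], [-0.2, -0.3, 0.7],
     [0.2, 0.1, 0.4], [0.5, 0.6, 0.8], [0.8, 0.7, 0.2], [1.0, 0.9, 0.6]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
forest = random_forest(X, y)
print(predict(forest, [-1.0, -0.9, 0.5]), predict(forest, [1.0, 0.9, 0.5]))
```

Full decision trees would repeat the subset draw and the split search recursively at every node; the stump keeps the sketch short while preserving the two sources of randomness that distinguish RF from plain bagging.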

Gradient Boosting Machine
The gradient boosting machine (GBM) proposed by [55] is a robust machine learning algorithm owing to its flexibility and efficiency in performing regression tasks.
The main difference between boosting and traditional machine learning techniques is that the optimization is carried out in function space. In other words, the function estimate $\hat{f}$ is parameterized in the additive functional form:

$$\hat{f}(x) = \sum_{i=0}^{M} \hat{f}_i(x) \qquad (2)$$

In this notation, M is the number of iterations, $\hat{f}_0$ is the initial guess and $\{\hat{f}_i\}_{i=1}^{M}$ are the function increments, also known as "boosts".
To make the functional approach feasible in practice, a comparable approach of parameterizing the family of functions can be followed. We introduce the parameterized "base-learner" functions $h(x, \theta)$ to distinguish them from the overall ensemble function estimates $\hat{f}(x)$. Different families of base-learners, such as decision trees, and different loss functions can be chosen.
The "greedy stagewise" approach of incrementing the function with base-learners can then be formulated.
For the function estimate at the t-th iteration, the optimization rule is:

$$\hat{f}_t \leftarrow \hat{f}_{t-1} + \rho_t \, h(x, \theta_t), \qquad (\rho_t, \theta_t) = \arg\min_{\rho, \theta} \sum_{i=1}^{N} \Psi\big(y_i, \hat{f}_{t-1}(x_i) + \rho \, h(x_i, \theta)\big)$$

where $\Psi$ is the chosen loss function and $(x_i, y_i)$, i = 1, ..., N, are the training observations. The optimal step size $\rho_t$ should be specified at each iteration.
The gradient boosting algorithm proposed by Friedman [55] can be summarized by the pseudocode in Algorithm 2.

[ALGORITHM II. Gradient boosting algorithm]
The theory and formulation of GBM are available in [55], to which interested readers are referred for a more thorough explanation of this method.
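The stagewise idea can be illustrated with a short Python sketch. It makes several simplifying assumptions that differ from the setup of this work: a single numeric feature, squared-error loss (so the negative gradient is the residual), regression stumps as base-learners, and a fixed step size in place of a per-iteration line search. All names and data are hypothetical.

```python
def fit_stump(x, residuals):
    """Least-squares regression stump: (threshold, left mean, right mean).

    Assumes x contains at least two distinct values.
    """
    best = None
    for t in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def gradient_boost(x, y, n_iter=50, step=0.1):
    """Additive model: initial constant plus n_iter shrunken stumps."""
    f0 = sum(y) / len(y)                 # initial guess
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_iter):
        # Negative gradient of the squared-error loss = current residuals.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, residuals)
        stumps.append((t, lm, rm))
        pred = [p + step * (lm if xi <= t else rm) for xi, p in zip(x, pred)]
    return f0, stumps, step

def boosted_predict(model, xi):
    f0, stumps, step = model
    return f0 + sum(step * (lm if xi <= t else rm) for t, lm, rm in stumps)

# Toy data (hypothetical): a step function of one feature.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = gradient_boost(x, y)
print(boosted_predict(model, 2), boosted_predict(model, 7))
```

Each iteration fits a base-learner to what the current ensemble still gets wrong and adds a damped version of it, so the predictions approach the targets geometrically in this toy case.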
In this work, Extreme Gradient Boosting (XGB), a version of GBM proposed by [56], was applied as the boosting method for classification using the R library xgboost.

Classifier Evaluation
For training the models, the data was partitioned as explained in the previous sections: the predictive accuracy of the models was measured by splitting the data into training and test sets.
The training set covers 1970 to 1999, with 360 instances and a binary target variable with 16% positives (5 crisis cycles). The test set covers 2000 to 2020, with 251 instances and 14% positives in the binary target (3 crisis cycles).
As this is a classification task, the error assessment was performed using the predicted class for the selected models and computing accuracy metrics from the confusion matrix.
Let {P, N} be the positive and negative instance classes and let $\{\tilde{P}, \tilde{N}\}$ be the predictions produced by a classifier. Let P(P|I) be the posterior probability that an instance I is positive. There is no unique metric for assessing a classification task; the choice depends on the characteristics to be evaluated. We consider precision the most suitable metric for this purpose, as it measures the positives correctly classified among the observations classified as positive.
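The confusion-matrix metrics discussed here can be computed as in the following minimal Python sketch. The labels and predictions are invented for illustration, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def classification_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,    # TP among predicted positives
        "recall": tp / (tp + fn) if tp + fn else 0.0,       # TP among actual positives
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # TN among actual negatives
    }

# Hypothetical predictions: no false positives, one missed positive.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With no false positives, precision and specificity both equal 1, while the missed positive lowers recall to 2/3.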

Model Interpretation
The interpretability of a statistical model helps to understand why certain decisions or predictions have been made; for this reason, measuring variable importance is an important task in many applications. In this era of making machine learning explainable, several authors have conducted extensive reviews of methods [57], [58].
The most common variable importance measure, based on Gini impurity, has been tested by several researchers using both simulated and real data, and this metric tends to be biased in many scenarios [58]-[60]. As studied in subsection II.B, the variables here are mutually correlated and collinear, so Gini variable importance is expected to be biased [59], [60].
Interpretability can also be classified as either local or global, that is, explaining an individual prediction or the entire model behaviour [61].

a. SHAP Variable Importance
SHapley Additive exPlanations (SHAP) is an additive model explanation approach in which each prediction is explained by the contribution of the features of the dataset to the model's output [62], [63]. SHAP comes from the field of game theory: it is the solution to the problem of computing the contribution to a model's prediction of every subset of features, given a dataset with m features.
The method requires retraining the model on all feature subsets $S \subseteq F$, where F is the set of all available features. An importance value is assigned to every feature, accounting for the impact on the model's prediction of including that feature. To compute this effect, a model $f_{S \cup \{i\}}$ is trained with that feature present and another model $f_S$ is trained with the feature withheld. Then, the predictions of both models are compared on the current input, $f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S)$, where $x_S$ denotes the values of the input features in S. The Shapley value of feature i is the weighted average of these differences over all subsets $S \subseteq F \setminus \{i\}$:

$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|! \, (|F| - |S| - 1)!}{|F|!} \left[ f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S) \right]$$

To calculate the importance of the features from the SHAP contributions, the mean of each feature is retrieved for each SHAP matrix. Then, the resulting vectors are summed.
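The subset enumeration behind Shapley values can be shown with a small Python sketch. Instead of retraining a model for every subset, a hypothetical value function stands in for the model output on each feature subset; its payoffs, and the use of three term spread names as feature labels, are assumptions made purely for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values; value(S) is the model output for feature subset S."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                s = frozenset(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

# Hypothetical value function: additive payoffs plus one interaction term.
PAYOFF = {"M3-M6": 1.0, "Y2-Y5": 2.0, "Y5-Y10": 0.0}

def toy_value(subset):
    total = sum(PAYOFF[f] for f in subset)
    if {"M3-M6", "Y2-Y5"} <= subset:
        total += 1.0  # interaction shared by the two features
    return total

phi = shapley_values(list(PAYOFF), toy_value)
print(phi)
```

The interaction payoff is split equally between the two features involved, and the values sum to the difference between the full-set and empty-set outputs. Exact enumeration grows exponentially with the number of features, which is why practical SHAP implementations rely on model-specific or sampling-based approximations.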

b. SHAP Dependence Plots
For every feature and data instance, a point is plotted with the feature value on the x-axis and the corresponding Shapley value on the y-axis; this is the SHAP feature dependence plot.
Mathematically, the plot for feature j contains the following points: $\{(x_j^{(i)}, \phi_j^{(i)})\}_{i=1}^{n}$, where n is the number of instances. SHAP dependence plots are an alternative to partial dependence plots and accumulated local effects. While those methods show average effects, SHAP dependence also shows the variance on the y-axis.

c. Rules Extraction
Tree ensembles such as random forests and boosted trees are accurate but difficult to understand. In this work, the framework of the interpretable tree (inTrees) is used to extract, measure, prune, select, and summarize rules from a tree ensemble and calculate frequent variable interactions [64].
Tree ensemble methods consist of multiple decision trees [53], [55]. A rule can be extracted by following the path from a decision tree's root node to a leaf node. This rule extraction process, explained in Algorithm 3, is relevant for understanding and filtering the rules for interpreting the phenomenon.
Given a rule {C ⇒ T}, C is the rule's condition, a conjunction of variable-value pairs aggregated along the path from the root node to the current node, and T is the rule's output. Cnode denotes the variable-value pair used to split the current node, leafNode denotes the flag indicating whether the current node is a leaf node, and prednode denotes the prediction at a leaf node.
ALGORITHM III. ruleExtract algorithm
The ruleExtract method in Algorithm 3 is used to extract rules from a decision tree. As tree ensembles consist of multiple decision trees, the final rule set is the combination of the rules extracted from each decision tree in the ensemble.
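The root-to-leaf traversal can be sketched in Python as follows. This is an illustration of the idea rather than the inTrees implementation: the nested-dictionary tree format, the function name and the toy thresholds are all assumptions of the sketch.

```python
def rule_extract(node, condition=()):
    """Return one (condition, prediction) rule per leaf of a binary tree.

    A leaf is {"pred": label}; an internal node is
    {"feature": name, "threshold": t, "left": node, "right": node}.
    """
    if "pred" in node:  # leaf node: emit the rule {C => pred}
        return [(condition, node["pred"])]
    f, t = node["feature"], node["threshold"]
    rules = []
    # Extend the conjunction C with the split used at the current node.
    rules += rule_extract(node["left"], condition + ((f, "<=", t),))
    rules += rule_extract(node["right"], condition + ((f, ">", t),))
    return rules

# Hypothetical tree over two term spreads (thresholds invented for illustration).
toy_tree = {
    "feature": "M3-M6", "threshold": -0.1,
    "left": {"pred": 1},
    "right": {
        "feature": "Y2-Y5", "threshold": -0.3,
        "left": {"pred": 1},
        "right": {"pred": 0},
    },
}
rules = rule_extract(toy_tree)
for cond, pred in rules:
    print(" AND ".join(f"{f} {op} {t}" for f, op, t in cond), "=>", pred)

# For an ensemble, the rule sets of the individual trees are concatenated.
ensemble_rules = [r for tree in [toy_tree, toy_tree] for r in rule_extract(tree)]
```

The duplicated rules in `ensemble_rules` show why a de-duplication step is needed once rules from many trees are pooled.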
In this work, the inTrees framework is applied to the data set. For the winning classifier, the ruleExtract method is applied. As a result, several rules are extracted, and a rule post-processing step is performed. This post-processing comprises de-duplicating rules and computing rule quality metrics. The rule metrics are: length, the number of conditions within a rule; support, the percentage of observations that fulfil the rule condition; error, which for classification tasks is the proportion of incorrectly classified instances among those that fulfil the rule condition; and the target lift (epigraph II.B.1), computed for every rule as the proportion of positive targets within the rule condition compared with the variable range.
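The rule metrics can be illustrated with a minimal Python sketch. The rule format (a tuple of (feature, operator, threshold) conditions plus a predicted label), the toy observations, and the reading of lift as the positive rate among covered observations divided by the overall positive rate are assumptions of this sketch.

```python
def satisfies(row, condition):
    """True if the observation fulfils every (feature, op, threshold) term."""
    return all(row[f] <= t if op == "<=" else row[f] > t for f, op, t in condition)

def rule_metrics(rule, rows, target):
    """Length, support, error and target lift of one rule on a data set."""
    condition, pred = rule
    covered = [y for row, y in zip(rows, target) if satisfies(row, condition)]
    if not covered:
        return None
    base_rate = sum(target) / len(target)
    return {
        "length": len(condition),
        "support": len(covered) / len(rows),
        "error": sum(1 for y in covered if y != pred) / len(covered),
        "lift": (sum(covered) / len(covered)) / base_rate,
    }

# Hypothetical observations of one spread and a 0/1 recession flag.
spread = [-0.3, -0.2, -0.15, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
rows = [{"M3-M6": v} for v in spread]
target = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

rule = ((("M3-M6", "<=", -0.1),), 1)
m = rule_metrics(rule, rows, target)
print(m)
```

In this toy case the rule covers three of ten observations, two of which are positives, so its lift is about 3.3 against a base rate of 0.2.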

III. RESULTS & DISCUSSION
In this work, a methodology is proposed for understanding the economic recession phenomenon and extracting rules as an early recession detection method, balanced against obtaining a model with suitable predictive accuracy, which is the main scope of interpretable models in machine learning. The methodology begins by benchmarking the proposed models to obtain the feature importance for the winning model (see epigraph II.C.4.a). From this step, the main variables that drive economic recessions are detected, and the dependencies with the most correlated variables and the interaction of feature values with the target variable are analyzed to understand this phenomenon better (see epigraph II.C.4.b). To conclude, a rule extraction process is performed to propose rules useful for the early detection of economic recessions (see epigraph II.C.4.c).
As the first step, two tree-based classification models are fitted to the data; Table IV shows the resulting accuracy metrics for the fitted models. When assessing predictive accuracy, the yield curve performs quite well, although additional information can improve its predictive performance [65]. Here, the main purpose is to build a model for interpretability, balanced with predictive accuracy, using term spreads as the only independent variables. Despite using only interest rate variables, suitable classification metrics are obtained when employing term spreads to predict an economic recession. The XGB model has the better classification metrics: for the positive target class, the precision shows that no false positives are obtained, and for this reason specificity also takes its maximum value. Recall is high but not at its maximum, showing that, despite a balanced classification of negative and positive labels, false negatives are present. After fitting and selecting the winning model, the model interpretation, which is the most important part of this work, starts with the feature importance as the first relevant output for interpreting which variables are the main predictors of economic recessions. The variable importance is obtained by computing the mean absolute SHAP value over all instances for every feature in the training and test sets. The results, given in Table V in the appendix, are plotted in Fig.3 for better understanding. In Fig.3, the features are sorted by variable importance in descending order, from the most relevant at the top to the least relevant at the bottom.
Besides, considering only which variables are present, Fig.3.A and 3.B show similar results for the most important variables. However, as the test set contains the more recent data, it is expected to be more representative of future values and may be more accurate for extrapolating this information to the near future; for this reason, the main analysis focuses on the test set. In previous studies, the best results in forecasting an economic recession are obtained by taking the difference between two interest rates whose maturities are far apart. [65] suggested that the 3-month-10-year term spread provides a suitable combination of accuracy and validity in the long term for predicting economic recessions. However, most term spreads are highly correlated and provide similar information about the economy, so the particular choice of maturities amounts mainly to fine-tuning.
Results suggest that the most important term spreads are 3-month-6-month, 2-year-5-year, 5-year-10-year, 3-year-7-year, 3-year-3-month and 2-year-6-month. Although this work uses more recent data than previous studies, the literature suggests as a rule of thumb that the difference between 10-year and 3-month Treasury rates becomes negative in early recessions, providing a reasonable accuracy and time prevalence [65]. Although this term spread is not found to be the most relevant here, most term spreads are highly correlated and provide similar information about the economy's behaviour, so the particular choices concerning maturity amount mainly to fine-tuning and not to a reversal of results [65]. The caveat is that a reference point that works for one spread may not work for others. For example, the 2-year to 10-year term spread may invert in advance of the 3-month to 10-year term spread, which tends to be higher [1]. Along these lines, some of the most important variables, such as the 5-year-10-year term spread, align with the literature, as they could invert earlier than the 10-year-3-month term spread.
SHAP contribution values are plotted for the training and test sets in Fig.4.A and 4.B. This method produces an estimate for each individual sample, because SHAP values are local explainers. Consequently, results can differ because the training and test sets contain different instances; in this case, there are slight differences between both results. In addition, this plot provides information about the feature values and the position of the instances on the plot. The horizontal location shows whether the effect of that value is associated with a higher or lower prediction (right or left, respectively); the vertical location shows the variable importance. The colour gradient shows whether that variable is high (dark) or low (light) for that observation.
As argued before, the analysis focuses on the test set results. The SHAP contribution analysis can complement the decile target lift results in Table II, which is a preliminary analysis that does not use the best splitting method for finding the ranges that maximize lift. For the 3-year-6-month and 5-year-10-year term spreads, higher values (deciles 9-10) have a higher lift. The SHAP contribution plot in Fig.4 shows the same relationship for these spreads: the dark-coloured instances lie on the right side of the plot and the light ones on the left, which indicates that high values are associated with positive predictions of economic recession. By contrast, the opposite behaviour is shown by the 2-year-5-year, 3-year-7-year, 3-year-3-month and 2-year-6-year spreads, which is broadly aligned with the decile target lift values of Table II: the lower the values, the higher the lift, in other words, the higher the probability of economic recession.
As the SHAP contribution plot shows local interpretability and the decile target lift is not an optimized method for splitting ranges to maximize lift, these complementary results may also differ in nuance, since they analyse the data from different perspectives.
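To make the decile target lift more concrete, the following Python sketch computes a lift per decile for one feature. It rests on two assumptions that are not confirmed by this section: lift is taken as the recession rate within a decile divided by the overall recession rate, and deciles are built as equal-frequency bins on the sorted feature values. The function name and the toy data are hypothetical.

```python
import numpy as np

def decile_target_lift(feature, target, n_bins=10):
    """Lift per equal-frequency bin: bin recession rate / overall rate."""
    feature = np.asarray(feature, dtype=float)
    target = np.asarray(target, dtype=float)
    base_rate = target.mean()
    # Sort observations by feature value and cut them into n_bins groups.
    order = np.argsort(feature, kind="stable")
    bins = np.array_split(order, n_bins)
    return [float(target[idx].mean() / base_rate) for idx in bins]

# Toy data: 20 observations, recessions only at the highest feature values.
toy_feature = np.arange(20, dtype=float)
toy_target = np.array([0] * 16 + [1] * 4)
lifts = decile_target_lift(toy_feature, toy_target)
```

In this toy case the two top deciles have a lift of 5 and the remaining deciles a lift of 0, which mirrors the pattern "the higher the value, the higher the lift" described for some spreads. The sketch expects at least as many observations as bins.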
Once the main features that impact economic recession prediction are detected, the dependence between the most important variables and the variables most correlated with them is studied. Dependence plots have been explained in epigraph II.C.4.b; more information can be found in [62], [63]. In essence, this plot shows feature values of the most important variables on the x-axis and SHAP values of the most correlated variable on the y-axis; additionally, the points are coloured on a gradient according to the feature value of the designated variable.
To select the most correlated variable, the pairwise Pearson correlation computed in subsection II.B is used. By sorting the correlation coefficients, the most correlated feature is selected for each of the most important variables; the result is Table VI in the appendix. Results suggest that the variables most correlated with the most important ones belong to the same maturity range; for long-term spreads, the most correlated variables are also long-term ones. This information complements the previous findings with the dependence and relationship between the most important variables and those most correlated with them, which helps complete the overview of the processes that affect economic recessions.
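The selection of the most correlated variable can be sketched in Python as follows. This is an illustration under assumptions: the coefficients are ranked by absolute value (the text only states that they are sorted), and the function name, column labels and toy data are hypothetical.

```python
import numpy as np

def most_correlated(data, names, target_name):
    """Return the feature with the largest |Pearson r| to target_name."""
    data = np.asarray(data, dtype=float)
    # Pairwise Pearson correlation between columns.
    corr = np.corrcoef(data, rowvar=False)
    i = names.index(target_name)
    abs_corr = np.abs(corr[i]).copy()
    abs_corr[i] = -np.inf  # exclude the variable itself
    j = int(np.argmax(abs_corr))
    return names[j], float(corr[i, j])

# Toy term-spread columns: illustrative values only.
toy_names = ["spread_a", "spread_b", "spread_c"]
toy_data = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 2.0],
    [5.0, 10.5, 3.0],
])
partner, r = most_correlated(toy_data, toy_names, "spread_a")
```

Here `spread_b` is returned as the partner of `spread_a`, since it moves almost linearly with it, while `spread_c` is only weakly (and negatively) correlated.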
The dependence plot for the most important variables is shown in Fig.5. On the x-axis, the horizontal location is the actual value of the most correlated variable, and on the y-axis, the vertical location shows what having that value did to the prediction. Additionally, the relationship between the two is shown with a loess regression line. A positive slope indicates that the higher the variable value, the higher the model's prediction for the most correlated variable; the opposite holds for negative slopes. Two kinds of relationships are found. The first is a positive trend, in Fig.5 plots A, B, D and F, where the highest correlation is with the 20-year-6-month and 1-year-3-month term spreads, respectively. The second is a positive trend with asymptotic behaviour, in Fig.5 plots C and E, where the correlated variable is the 1-year-2-year term spread. In addition, the colour gradient shows the y-axis feature value, from light to dark as the variable value goes from low to high. Generally speaking, the larger the value of the most correlated variables, the smaller the SHAP value of this variable is. At this point, decile target lift, feature contribution, feature importance and feature dependence have been presented; taken together, this information serves as an early indicator of which ranges of variable values carry a higher probability of economic recession and which variables are the most relevant to the economic recession process.
Finally, epigraph II.C.4.c proposes a methodology for identifying rules for economic recession detection. Rule extraction and initial postprocessing yield 359 rules together with their metrics; to save space, the table is not presented in the appendix, but it can be requested as a document accompanying this work.
The extracted rules from the winning model can be filtered in several ways; as an initial exploratory study, this work proposes a frequency maximization criterion and a lift maximization criterion for discovering interesting rules. Under the frequency maximization criterion, rules are sorted by support in descending order, so the first rules are the most frequent. This criterion does not sort results by the lift, error or length metrics of the rules. Table VII shows some rules for the frequency maximization criterion; the maximum Support for a rule is 0.95% of observations that satisfy the condition. In terms of lift, these rules show values close to 1, which means that observations satisfying them are no more likely to fall in an economic recession than those in other data ranges; a rule with a lift close to 0, however, would indicate a high probability of not finding an economic recession. As previously explained, XGB is a tree-ensemble model that assembles simple trees into a complex non-linear model. In this way, the rule extraction may provide rules with a low level of complexity. Due to this sorting method, the top rules present low Length, low Lift and error rate, which qualifies them as simplistic and inaccurate rules.
Under the lift maximization criterion, rules are sorted by lift in descending order, so the first rules are those with the greatest impact on economic recession detection. These rules may cover few observations, but as a recession is a rare event, the support of a rule that identifies recessions is expected to be a small percentage. Table VIII shows some rules for the lift maximization criterion; the maximum Lift for a rule is 7.17, that is, 7.17 times more probability of economic recession for the observations that satisfy the condition compared with the overall observations. More complex rules are found by this sorting criterion, with a low error rate and a high probability of economic recession; therefore, the more interesting rules may be found here. The interpretation of these rules is straightforward: a condition value is presented for every term spread involved in the rule, and when these conditions are satisfied, the support (the percentage of observations that satisfy the rule) is computed along with the respective lift.
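The two sorting criteria can be illustrated with a short Python sketch. It assumes that support is the share of observations satisfying a rule and that lift is the recession rate among those observations divided by the overall recession rate; the rule conditions, thresholds and data below are hypothetical and do not correspond to Tables VII or VIII.

```python
import numpy as np

def rule_metrics(mask, target):
    """Support and lift of a rule given as a boolean mask over observations."""
    mask = np.asarray(mask, dtype=bool)
    target = np.asarray(target, dtype=float)
    support = mask.mean()
    base_rate = target.mean()
    # Recession rate among observations that satisfy the rule.
    rule_rate = target[mask].mean() if mask.any() else 0.0
    return {"support": float(support), "lift": float(rule_rate / base_rate)}

# Toy data: 10 observations, 2 recessions (illustrative only).
toy_target = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
toy_spread = np.array([1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, -0.1, -0.2])
toy_rules = {
    "spread <= 0.0": toy_spread <= 0.0,
    "spread <= 0.9": toy_spread <= 0.9,
}
metrics = {name: rule_metrics(m, toy_target) for name, m in toy_rules.items()}

# Frequency maximization: most frequent rules first.
by_support = sorted(metrics, key=lambda r: metrics[r]["support"], reverse=True)
# Lift maximization: rules with the highest lift first.
by_lift = sorted(metrics, key=lambda r: metrics[r]["lift"], reverse=True)
```

In this toy example the broad rule ranks first by support but has a lift close to 1, whereas the narrow rule covers few observations and has a lift of 5, which is the trade-off discussed above.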
For the first rule, the 2-year-6-month and 20-year-3-month term spreads are involved, which indicates an interaction between these variables in the rule with regard to economic recession detection. In addition, the 20-year-3-month term spread is itself an important indicator, as it may invert earlier than the 3-month-10-year term spread stated as relevant in previous studies [65].
Regarding the interpretation of the threshold values, each value is compared with the min, mean and max over all the historical data for the corresponding term spread (see Table IX in the appendix), so that the threshold is interpreted as a small, average or big value according to which of these descriptive statistics it is closest to; when a value is equally close to two statistics, priority is given to the average. As a result, the first threshold is labelled as a small value and the second as an average value. The qualitative interpretation of this rule is therefore formulated as follows: "When the 2-year-6-month term spread is lower than or equal to a small value and the 20-year-3-month term spread is greater than the average value, there is over seven times more probability of economic recession than the probability of economic recession for the complementary conditions". Historically, the conditions of this rule were fulfilled in the economic recession of 2008.
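The labelling of a threshold as small, average or big can be sketched in Python as follows. The function name and the numeric values are hypothetical; only the nearest-statistic logic and the priority given to the average follow the description above.

```python
def label_threshold(value, minimum, mean, maximum):
    """Label a rule threshold by the closest of min, mean and max."""
    # "average" is listed first so that it wins when two distances are equal.
    distances = {
        "average": abs(value - mean),
        "small": abs(value - minimum),
        "big": abs(value - maximum),
    }
    return min(distances, key=distances.get)

# Hypothetical descriptive statistics for one term spread.
toy_min, toy_mean, toy_max = -1.0, 0.5, 3.0
labels = [label_threshold(v, toy_min, toy_mean, toy_max)
          for v in (-0.9, 0.6, 2.9, -0.25)]
```

The last value, -0.25, is equally distant from the minimum and the mean, so it is labelled as average, in line with the tie-breaking priority.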
For the second rule, the 2-year-3-year, 5-year-10-year and 3-month-6-month term spreads are involved, which mainly describes an interaction between these variables with regard to economic recession detection: "When the Y2-Y3 and Y5-Y10 term spreads are lower than or equal to their average values and the M3-M6 term spread is greater than its average value, there is over six times more probability of economic recession than the probability of economic recession for the complementary conditions". These conditions were fulfilled in the economic recessions of 1990, 1991, 2001 and 2008.
The other rules from Table VIII can be described in the same way as the rules explained above, and they fulfil the conditions of the economic recessions of 1970, 1974, 1980, 1981 and 1982. This technique provides a set of rules for detecting economic recessions; with proper data updating and model retraining, these rules can be used in practice and acted upon through economic policies, among other uses.
To summarize the findings, Table X shows the main results, except for those of the dependence analysis.

 ✓
As a result, the main variables for predicting economic recession are detected, and their dependence on the most correlated variables is studied; the SHAP values for the positive (recession) class are considered together with the preliminary information from the Decile Target Lift. In addition, some of the top rules contain the most important variables and are consistent with the ideas discussed in this work.

III. CONCLUSION
Regarding the term structure, long-term rates could explain changes in future short-term rates. Building on an understanding of the term structure and the yield curve, our goal is to create an interpretable forecasting model that can accurately inform us about future recessions, which could be a valuable tool for practitioners, researchers, governments and central banks. Three main groups are concerned: the public sector, the private sector (households, banks and investors), and the Federal Reserve. From an investor's point of view, this information could be useful for making investment decisions under different strategies, as expanding economic activity is correlated with stock market expansion [66]. By using the term spread to anticipate a possible economic recession, the Federal Reserve could modify interest rates to try to reduce the effect of this phenomenon.
The relevant term spreads found are 3-month-6-month, 2-year-5-year, 5-year-10-year, 3-year-7-year, 3-year-3-month and 2-year-6-month. Furthermore, for these variables, the lift metric is computed in order to detect initial intervals with a higher probability of recession, which is complementary to the SHAP contribution values analysis and is applied in the rule description methods. By implementing the necessary policy mix, economic authorities can dampen the effects of the recession, minimize its duration, or steer the economy away from it altogether. As the model provides some false negative alarms, we expect that implementing fiscal and monetary policy may put some inflationary pressure on the economy. Finally, the methodology proposes a novel application in this topic by extracting rules for understanding and detecting economic recessions. With this technique, several descriptive conditions allow the user to understand this phenomenon and provide indicators for detecting a recession, with the goal of minimizing the magnitude of its effects.
It is important to note that the yield curve's predictive power is statistical evidence and that, despite its accuracy, it is impossible to assure future results.
Thus, we encourage validating and updating these rules with reasonable frequency as the market evolves.
The literature suggests that the best predictor of economic recessions in the USA is the 3-month-10-year term spread. Nevertheless, we found that the 3-month-6-month spread is the most relevant for detecting recessions and that it appears in the main recession detection rules. Therefore, monitoring this spread can be a useful tool for recession identification and a valid indicator of market expectations. In this context, the best rule is found to associate this short-term 3-month-6-month predictor with longer-term spreads such as 5-year-10-year and 2-year-3-year, as illustrated by the rule: "When the Y2-Y3 and Y5-Y10 term spreads are lower than or equal to their average values and the M3-M6 term spread is greater than its average value, there is over six times more probability of economic recession than the probability of economic recession for the complementary conditions".
As suggestions for future work, several paths can be followed. On the accuracy side, improving the model's predictive accuracy is relevant in order to have high-quality tools with an impact on predicting this phenomenon. On the interpretability side, as different exogenous variables can be added, the variable interactions can be studied further to understand the yield curve inversion together with other variables relevant for designing policies to prevent and control recessions. On the rule generation side, as rules may change over time because variable importance may vary, a predictive maintenance system could be proposed to keep the rules updated and valid over time.