A comparison of methodologies in the stress testing of credit risk – alternative scenario and dependency constructs

In the aftermath of the financial crisis of the last decade, banking supervisors have sought to determine the optimal level of capital that an institution should hold in order to support its risk-taking activities. The experience of this downturn has given rise to the conclusion that traditional approaches, such as regulatory or economic capital, are inadequate to this end, leading to the prevalence of supervisory stress testing as a primary tool of prudential supervision. A critical input into this process is the set of macroeconomic scenarios, either provided by the prudential supervisors or developed by financial institutions. Prevalent in the industry is the combination of expert opinion with an econometric methodology, for example the Vector Autoregression (“VAR”) model, which captures the dependency structure among and between macroeconomic explanatory variables and banking loss / income target variables. Despite the prevalence of this approach, we know from the finance literature that Gaussian VAR models are unable to cope with the empirical fact of deviation from normality. In this paper we investigate the alternative Markov Switching VAR (“MS-VAR”) model, which features more commonly in the academic realm than in practice. We conduct an empirical experiment using data from regulatory filings and the macroeconomic data released by the Federal Reserve for mandated stress testing exercises. Our finding is that the MS-VAR model performs better than the VAR model, both in producing severe scenarios that are more conservative and in showing superior predictive accuracy. Furthermore, we find that the multiple equation VAR model outperforms the single equation autoregressive (“AR”) models according to various metrics across all modeling segments.


Introduction
Following the financial crisis of the prior decade, prudential supervisors have turned to stress testing as a primary mechanism with which to gauge the resiliency of financial institutions, in terms of the adequacy of their capital and liquidity resources to withstand extreme economic scenarios (Acharya, 2009; Demirguc-Kunt et al., 2010). Prior to this, the primary means of risk measurement and management, particularly in the field of credit risk (Merton, 1974), was through advanced mathematical, statistical and quantitative techniques and models, which gives rise to model risk. Model risk (Board of Governors of the Federal Reserve System, 2011; "FRB-BOG") can be defined as the potential that a model does not sufficiently capture the risks it is used to assess, and the danger that it may underestimate potential risks in the future. Stress testing has been used by supervisors to assess the reliability of credit risk models, as can be seen in the revised Basel framework (Basel Committee on Banking Supervision, 2006, 2009a, b, c, d, 2010a, b; "BCBS") and the Federal Reserve's Comprehensive Capital Analysis and Review ("CCAR") program.
A clear pattern that we have observed is that most of the high-profile failures of the financial crisis era involved firms whose internal risk models the supervisors considered to be robust, and which were deemed to have sufficient capital resources to survive a downturn (Schuermann, 2014). This set of surprise failures revealed that the question of capital adequacy had not been answered: internal models estimate a positive default probability that is in line with the supervisor's or institution's risk appetite, yet such methodologies were unable to estimate the actual potential dangers. This inability was an impetus in the search for different capital adequacy assessment methodologies, prime among these being the discipline of stress testing.
There are a number of modeling considerations that institutions must address in estimating losses on their credit portfolios, and in the context of stress testing for CCAR purposes we can focus on some particularities relevant to the design of scenarios and the choice of risk factors. There are two broad categories of model types in use. Bottom-up models are loan- or obligor-level models used by banks to forecast the expected losses of retail and wholesale loans. The expected loss is calculated for each loan by conditioning on macroeconomic or financial/obligor-specific variables, and the sum of expected losses across all loans then provides an estimate of portfolio losses. The primary advantages of bottom-up models are the ease of modeling the heterogeneity of the underlying loans and the interaction of loan-level risk factors. The primary disadvantage of loan-level models is that, while a variety of loan-level methodologies can be used, these models are much more complex to specify and estimate. They generally require more sophisticated econometric and simulation techniques, and model validation standards may be more stringent. In contrast, top-down models are pool-level (or segment-level) models used by banks to forecast charge-off rates by retail and wholesale loan types as a function of macroeconomic and financial variables. In most cases, banks use only one to four macroeconomic and financial risk drivers as explanatory variables in these models. These variables are usually determined through interaction between model development teams and line of business experts. The primary advantage of top-down models has been the ready availability of data and the simplicity of model estimation. The primary disadvantage of pool-level models is that borrower-specific characteristics are generally not used as variables, except at the aggregate level using pool averages. Modeling challenges include the determination of an appropriate loss horizon (e.g., for CCAR it is a 9-quarter duration), the determination of an appropriate averaging methodology, appropriate data segmentation and loss aggregation, as well as the annualization of loss rates. In this paper we consider top-down models.
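The contrast between the two model types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the decomposition of loan-level expected loss into exposure, default probability and loss given default, the linear charge-off rate, and all names and numbers below are hypothetical and are not taken from the models estimated in this paper.

```python
# Hypothetical loan-level records: (exposure, probability of default, loss given default).
loans = [
    (100_000.0, 0.02, 0.40),
    (250_000.0, 0.01, 0.35),
    (50_000.0, 0.05, 0.60),
]


def bottom_up_expected_loss(loans):
    """Bottom-up: compute an expected loss per loan, then sum across loans."""
    return sum(ead * pd * lgd for ead, pd, lgd in loans)


def top_down_expected_loss(balance, intercept, betas, macro):
    """Top-down: a pool-level charge-off rate, assumed linear in a few macro drivers."""
    rate = intercept + sum(b * x for b, x in zip(betas, macro))
    return balance * max(rate, 0.0)  # floor the rate at zero


portfolio_el = bottom_up_expected_loss(loans)
# One macro driver (e.g., an unemployment rate of 5.0) with an assumed sensitivity.
pool_el = top_down_expected_loss(400_000.0, 0.002, [0.001], [5.0])
```

The bottom-up function needs one record per loan, whereas the top-down function needs only the pool balance and a handful of macroeconomic drivers, which is the data availability trade-off described above.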
This paper proceeds as follows. Section 2 reviews the available literature on stress testing and scenario generation. Section 3 presents the competing econometric methodologies for generating scenarios, the time series Vector Autoregressive ("VAR") and Markov Switching VAR ("MS-VAR") models. Section 4 presents the empirical implementation, the data description, and a discussion of the estimation results and their implications. Section 5 concludes the study and provides directions for future research.

Review of the literature
Since the dawn of modern risk management in the 1990s, stress testing has been a tool used to address the basic question of how exposures or positions behave under adverse conditions. Traditionally this form of stress testing has been in the domain of sensitivity analysis (e.g., shocks to spreads, prices, volatilities, etc.) or historical scenario analysis (e.g., historical episodes such as Black Monday 1987 or the post-Lehman bankruptcy period; or hypothetical situations such as a modern version of the Great Depression or stagflation). These analyses are particularly suited to market risk, where data are plentiful, but for other risk types in data-scarce environments (e.g., operational, credit, reputational or business risk) there is a greater reliance on hypothetical scenario analysis (e.g., natural disasters, computer fraud, litigation events, etc.).
Stress testing first appears in the supervisory realm under the auspices of the Basel I Accord, within the 1995 Amendment on Market Risks (BCBS, 1988, 1996). The contemporaneous publication of RiskMetrics™ (J.P. Morgan, 1994) established the practice of market risk management as an analytical discipline in its own right and subsumed several stress testing methodologies previously developed in this context. Jorion (1996) discusses aspects of stress testing in a book addressing Value-at-Risk ("VaR"). Kupiec (1999), Berkowitz et al. (1999) and the Committee on Global Financial Systems survey (2000; "CGFS") analyze stress testing in a trading and treasury VaR context. Mosser et al. (2001) noted that most stress testing of the period relied upon transparent and easily identifiable historical market factors with respect to asset classes in the trading book.
However, in the case of the banking book (e.g., corporate/C&I or consumer loans), this approach of asset class shocks does not carry over as well, since to the extent that these assets are less marketable there are more idiosyncrasies to account for. Therefore, stress testing with respect to credit risk evolved later, and as a separate discipline, in the domain of credit portfolio modeling. However, even in the seminal examples of CreditMetrics™ (J.P. Morgan, 1997) and CreditRisk+™ (Wilde, 1997), stress testing was not a component of such models. The commonality of all such credit portfolio models was subsequently demonstrated (Koyluoglu and Hickman, 1998), as was the correspondence between the state of the economy and the credit loss distribution, which implies that this framework is naturally amenable to stress testing. In this spirit, a class of models was built upon the CreditMetrics™ framework through macroeconomic stress testing on credit portfolios using credit migration matrices (Bangia et al., 2002). Nevertheless, prior to the financial crisis, supervisory guidance for stress testing was rather unformed in the banking book as compared to other areas such as interest rate, counterparty or country risk (FRB-BOG, 1996, 1999, 2002).
In the decade following the financial crisis there has been a great expansion of the literature on stress testing. Foglia (2009) surveys the existing credit risk stress testing literature of this era. Inanoglu and Jacobs, Jr. (2009) address the aggregation of risk types of capital models in the stress testing and sensitivity analysis of economic capital. Jacobs, Jr. (2010) extends Inanoglu and Jacobs, Jr. (2009) to the validation of models for stressed capital. Schuermann (2014) analyzes the predominance of stress testing as a supervisory tool in terms of rationales for its utility, outlines for its execution, as well as guidelines and opinions on disseminating the output under various conditions. Jacobs, Jr. (2013) surveys practices and supervisory expectations for the stress testing of credit risk in the context of a ratings migration methodology in the CreditMetrics™ framework. Rebonato (2010) proposes a Bayesian causal network model for stress testing, which has the capability to cohesively incorporate expert knowledge in the model design and methodology of the stress testing process. Another recent study features the application of a Bayesian regression model for credit loss implemented using Fed Y9 data, wherein regulated financial institutions report their gains and losses, in conjunction with Federal Reserve scenarios; this model can formally incorporate exogenous factors such as supervisory scenarios, and can also quantify the uncertainty in model output that results from stochastic model inputs (Jacobs, Jr. et al., 2015). Jacobs (2015) presents an analysis of the impact of asset price bubbles on standard credit risk measures and provides evidence that asset price bubbles are a phenomenon that must be taken into consideration in the proper determination of economic capital for both credit risk management and measurement purposes. The author also calibrates the model to historical equity prices and, in a stress testing exercise, projects credit losses under both baseline and stressed conditions for bubble and non-bubble parameter estimate settings. Jacobs (2017b) extends Jacobs (2015) by performing a sensitivity analysis of the models with respect to key parameters, empirically calibrating the model to a long history of equity prices, and simulating the model under normal and stressed parameter settings. While the author finds statistically significant evidence that the historical S&P index exhibits only mild bubble behavior, this translates into an underestimation of potential extreme credit losses according to standard measures by an order of magnitude; however, the degree of relative underestimation of risk due to asset price bubbles is significantly attenuated under the stressed parameter settings of the model.
The relative merits of various risk measures and the aggregation of varying risk types, classic examples being Value-at-Risk ("VaR") and related quantities, have been discussed extensively by prior research (Jorion, 1997, 2006). An important result in the domain of modeling dependency structures is a general result of mathematical statistics due to Sklar (1956), which allows the combination of arbitrary marginal risk distributions into a joint distribution while preserving a non-normal correlation structure, and which readily found an application in finance. Among the early academics to introduce this methodology are Embrechts et al. (1999, 2002, 2003). It was applied to credit risk management and credit derivatives by Li (2000). The notion of copulas as a generalization of dependence according to linear correlations is used as a motivation for applying the technique to understanding tail events in Frey and McNeil (2001). This treatment of tail dependence contrasts with Poon et al. (2004), who instead use a data intensive multivariate extension of extreme value theory, which requires observations of joint tail events. Inanoglu and Jacobs (2009) develop a coherent approach to aggregating different risk types for diversified financial institutions. The authors model the main risks faced (market, credit and operational), which have distinct distributional properties and have historically been modeled in differing frameworks, contributing to the modeling effort by providing tools and insights to practitioners and regulators.
On the topic of scenario generation we find rather limited literature to date. Bidder and McKenna (2015) propose the use of robust forecasting analysis to estimate adverse scenarios in stress testing that are generated from a single pessimistic view with respect to a baseline predictive model, the so-called "worst case distribution", a means of assessing weaknesses within a framework that can account for model misspecifications in a general sense. Frame et al. (2015) examine the Office of Federal Housing Enterprise Oversight's ("OFHEO") risk-based capital stress test for Fannie Mae and Freddie Mac. The authors conclude that the treatment of the key driver in the model, 30-year fixed-rate mortgage performance, left the model specification and parameter settings fixed in the forecast period, and that the house price stress scenario was insufficiently severe, resulting in a significant underprediction of mortgage credit losses and associated capital requirements during the downturn. Finally, we extend Jacobs (2016) on the topic of scenario generation and stress testing employing the MS-VAR model by additionally considering, within this framework, the prediction of credit loss in a multiple equation setting.

Time series VAR methodologies for estimation and scenario generation
Stress testing is principally concerned with the policy advisory functions of macroeconomic forecasting, wherein stressed loss projections are leveraged by risk managers and supervisors as a decision-support tool informing the resiliency of institutions during stress periods.¹ Traditionally, the ways in which these objectives have been achieved ranged from high-dimensional multi-equation models all the way down to single-equation rules, the latter being the product of economic theories. Many of these methodologies were found to be inaccurate and unstable during the economic tumult of the 1970s, as empirical regularities such as Okun's Law or the Phillips Curve started to fail. Starting with Sims (1980) and the VAR methodology we saw the arrival of a new paradigm: as opposed to the univariate AR modeling framework (Box and Jenkins, 1976; Brockwell and Davis, 1991; Commandeur and Koopman, 2007), the VAR model is a flexible multi-equation model, still in the linear class, in which variables can be explained by their own and other variables' lags, including variables exogenous to the system. We consider the VAR methodology to be appropriate in the application of stress testing, as our modeling interest concerns relationships and forecasts of multiple macroeconomic and bank-specific variables. We also consider the MS-VAR paradigm in this study, which is closely related to this linear time-invariant VAR model. In this framework we analyze the dynamic propagation of innovations and the effects of regime change in a system. A basis for this approach is the statistics of probabilistic functions of Markov chains (Baum and Petrie, 1966; Baum et al., 1970). The MS-VAR model also subsumes the mixtures of normal distributions (Pearson, 1984) and hidden Markov-chain (Blackwell and Koopmans, 1957; Heller, 1965) frameworks. All of these approaches are further related to Markov-chain regression models (Goldfeld and Quandt, 1973) and to the statistical analysis of the Markov-switching models (Hamilton, 1988, 1989). Most closely aligned to our application is the theory of doubly stochastic processes (Tjostheim, 1986), which incorporates the MS-VAR model as a Gaussian autoregressive process conditioned on an exogenous regime generating process.

Let $Y_t = (Y_{1t}, \ldots, Y_{kt})^T$ be a $k$-dimensional vector valued time series, the output variables of interest, in our application with the entries representing some loss measure in a particular segment, that may be influenced by a set of observable input variables denoted by $X_t = (X_{1t}, \ldots, X_{rt})^T$, an $r$-dimensional vector valued time series also referred to as exogenous variables, and in our context representing a set of macroeconomic factors. This gives rise to the $\mathrm{VARMAX}(p, q, s)$ ("vector autoregressive-moving average with exogenous variables") representation:

$$Y_t = \sum_{j=1}^{p} \Phi_j Y_{t-j} + \sum_{j=0}^{s} \Theta_j X_{t-j} + \varepsilon_t - \sum_{j=1}^{q} \Theta^*_j \varepsilon_{t-j} \qquad (1)$$

Which is equivalent to:

$$\Phi(B) Y_t = \Theta(B) X_t + \Theta^*(B) \varepsilon_t \qquad (2)$$

where $\Phi(B) = I_k - \sum_{j=1}^{p} \Phi_j B^j$, $\Theta(B) = \sum_{j=0}^{s} \Theta_j B^j$ and $\Theta^*(B) = I_k - \sum_{j=1}^{q} \Theta^*_j B^j$ are lag polynomials of respective orders $p$, $s$ and $q$, $\varepsilon_t$ is the $k$-dimensional innovation process, and $B$ is the back-shift operator that satisfies $B^j Y_t = Y_{t-j}$.³

³ Note that the VARMAX model (1)-(2) could be written in various equivalent forms, involving a lower triangular coefficient matrix for $Y_t$ at lag zero, or a leading coefficient matrix for $\varepsilon_t$ at lag zero, or even a more general form that contains a leading (non-singular) coefficient matrix for $Y_t$ at lag zero that reflects instantaneous links amongst the output variables that are motivated by theoretical considerations, provided that the proper identifiability conditions are satisfied (Hanan, 1971; Kohn, 1979). In the econometrics setting, such a model form is usually referred to as a dynamic simultaneous equations model or a dynamic structural equation model. A related model, obtained by multiplying the dynamic simultaneous equations model form by the inverse of the lag 0 coefficient matrix, is referred to as the reduced form model, which has a state space representation (Hanan, 1988).
In ARMAX ("autoregressive-moving average with exogenous variables") models, the correlations amongst the elements of $Y_t$ are not taken into account, hence the parameter matrices $\Theta_j$ have a diagonal structure (Brockwell and Davis, 1991).
In this study we consider a vector autoregressive model with exogenous variables ("VARX"), denoted by $\mathrm{VARX}(p, s)$, which restricts the Moving Average ("MA") terms beyond lag zero to be zero, or $\Theta^*_j = 0$ for $j \geq 1$:

$$Y_t = \sum_{j=1}^{p} \Phi_j Y_{t-j} + \sum_{j=0}^{s} \Theta_j X_{t-j} + \varepsilon_t \qquad (3)$$

The rationale for this restriction is three-fold. First, MA terms were in no case significant in the model estimations, so that the data simply do not support a VARMAX representation. Second, the VARX model avails us of the very convenient dse package in R, which has computational and analytical advantages (R Development Core Team, 2017). Finally, the VARX framework is more practical and intuitive than the more elaborate VARMAX model, and allows for superior communication of results to practitioners.
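To make the VARX restriction concrete, the following Python sketch simulates a small VARX(1, 0) system (two output series, one exogenous input) and recovers the coefficient matrices by equation-by-equation least squares. It is a minimal illustration under assumed parameter values, not the estimation code used in this study (which relies on the dse package in R); the names `PHI`, `THETA`, `simulate` and `fit_varx1` are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Assumed, illustrative parameters: k = 2 outputs, r = 1 input, p = 1, s = 0.
PHI = [[0.5, 0.1], [0.2, 0.3]]  # Phi_1 (own and cross lags of the outputs)
THETA = [0.4, -0.3]             # Theta_0 (a single column because r = 1)


def simulate(n, seed=0):
    """Simulate Y_t = Phi_1 Y_{t-1} + Theta_0 X_t + e_t with Gaussian innovations."""
    rng = random.Random(seed)
    y, x = [[0.0, 0.0]], [0.0]
    for _ in range(1, n):
        x_t = rng.gauss(0.0, 1.0)
        prev = y[-1]
        y.append([
            sum(PHI[i][j] * prev[j] for j in range(2)) + THETA[i] * x_t + rng.gauss(0.0, 0.5)
            for i in range(2)
        ])
        x.append(x_t)
    return y, x


def fit_varx1(y, x):
    """Regress each output on both lagged outputs and the contemporaneous input."""
    rows = [[y[t - 1][0], y[t - 1][1], x[t]] for t in range(1, len(y))]
    return [ols(rows, [y[t][i] for t in range(1, len(y))]) for i in range(2)]


y, x = simulate(5000)
estimates = fit_varx1(y, x)  # row i holds estimates of (Phi_1[i][0], Phi_1[i][1], Theta_0[i])
```

Setting the off-diagonal entries of `PHI` to zero yields the system of single equation AR models with exogenous variables that serves as the comparison case later in the paper.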
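We now consider the MS-VARX generalization of the VARX methodology with changes in regime, where the parameters of the VARX system $B = (\Phi^T, \Theta^T)^T \in \mathbb{R}^{p+s}$ will be time-varying. However, the process might be time-invariant conditional on an unobservable regime variable $s_t \in \{1, \ldots, M\}$, denoting the state at time $t$ out of $M$ feasible states. In that case, the conditional probability density of the observed time series $Y_t$ is given by:

$$p(Y_t \mid \mathcal{Y}_{t-1}, \mathcal{X}_t, s_t) = f(Y_t \mid \mathcal{Y}_{t-1}, \mathcal{X}_t; B_m) \quad \text{if } s_t = m, \quad m = 1, \ldots, M \qquad (4)$$

where $B_m$ is the VAR parameter matrix in regime $m$, $\mathcal{Y}_{t-1}$ denotes the observations of the output variables up to time $t-1$ and $\mathcal{X}_t$ the observations of the input variables up to time $t$. Therefore, given a regime $s_t$, the conditional $\mathrm{VARX}(p, s \mid s_t)$ system in expectation form can be written as:

$$E[Y_t \mid \mathcal{Y}_{t-1}, \mathcal{X}_t, s_t] = \sum_{j=1}^{p} \Phi_j(s_t) Y_{t-j} + \sum_{j=0}^{s} \Theta_j(s_t) X_{t-j} \qquad (5)$$

We define the innovation term as:

$$\varepsilon_t = Y_t - E[Y_t \mid \mathcal{Y}_{t-1}, \mathcal{X}_t, s_t] \qquad (6)$$

The innovation process $\varepsilon_t$ is a Gaussian, zero-mean white noise process having variance-covariance matrix $\Sigma(s_t)$:

$$\varepsilon_t \sim \mathrm{NID}(0, \Sigma(s_t)) \qquad (7)$$

If the $\mathrm{VARX}(p, s \mid s_t)$ process is defined conditionally upon an unobservable regime $s_t$ as in equation (4), the description of the process generating mechanism should be made complete by specifying the stochastic assumptions of the MS-VAR model. In this construct, $s_t$ follows a discrete state homogeneous Markov chain:

$$\Pr(s_t \mid \{s_{t-j}\}_{j \geq 1}, \mathcal{Y}_{t-1}, \mathcal{X}_t) = \Pr(s_t \mid s_{t-1}; \rho) \qquad (8)$$

where $\rho$ denotes the parameter vector of the regime generating process. We estimate the MS-VAR model using the MSBVAR package in R (R Development Core Team, 2017). Finally, note that in the remainder of the document outside this section we will use the acronyms VAR and MS-VAR instead of VARX and MS-VARX to refer to our competing modeling methodologies.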
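The regime mechanism can be illustrated with a scalar, two-regime analogue of the MS-VAR model. The Python sketch below is a simplification under assumed parameters (a single series, an AR(1) within each regime, and illustrative transition probabilities); it is not the MSBVAR estimation used in this study, and the names `P`, `PHI`, `MU` and `SIGMA` are hypothetical.

```python
import random

# Assumed two-regime setup (0 = normal, 1 = stress); all numbers are illustrative.
P = [[0.95, 0.05],  # row m holds Pr(s_t = 0 | s_{t-1} = m), Pr(s_t = 1 | s_{t-1} = m)
     [0.20, 0.80]]
PHI = [0.3, 0.6]    # regime-specific AR(1) coefficients
MU = [0.0, 1.0]     # regime-specific intercepts (an adverse drift under stress)
SIGMA = [0.5, 2.0]  # regime-specific innovation standard deviations


def stationary_stress_probability(p):
    """Long-run probability of regime 1 for a two-state homogeneous Markov chain."""
    return p[0][1] / (p[0][1] + p[1][0])


def simulate_ms_ar(n, seed=0):
    """Simulate s_t from the chain and y_t = mu(s_t) + phi(s_t) y_{t-1} + sigma(s_t) e_t."""
    rng = random.Random(seed)
    s, y = 0, 0.0
    regimes, path = [], []
    for _ in range(n):
        # The next regime depends only on the current one (first-order Markov property).
        s = 1 if rng.random() < P[s][1] else 0
        y = MU[s] + PHI[s] * y + SIGMA[s] * rng.gauss(0.0, 1.0)
        regimes.append(s)
        path.append(y)
    return regimes, path


regimes, path = simulate_ms_ar(20000)
stress_share = sum(regimes) / len(regimes)  # should be close to the stationary probability
```

Because the innovations are drawn from a mixture of a low-variance and a high-variance regime, the simulated series has heavier tails than a single-regime Gaussian AR(1), which is the property that motivates the MS-VAR model here.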

Empirical implementation
The Federal Reserve's CCAR stress testing exercise requires U.S. domiciled top-tier financial institutions to submit comprehensive capital plans conditioned upon the prescribed supervisory set of scenarios (base, adverse and severe) and at least a single bank-specific set of scenarios. The supervisory scenarios consist of 9-quarter paths of critical macroeconomic variables ("MVs"). Institutions materially engaged in trading activities are additionally required to project an instantaneous market or counterparty credit loss shock conditioned on the institution's idiosyncratic scenario, as well as on the supervisory prescribed market risk stress scenario. Additionally, large custodian banks are asked to estimate a potential default of their largest counterparty.
Institutions are asked to submit post-stress capital projections in their capital plan starting September 30th of the year, spanning the nine-quarter planning horizon that begins in the fourth quarter of the current year, defining movements of key MVs. In this study we consider the MVs of the 2015 CCAR, and their base as well as severely adverse scenarios:
• Real Gross Domestic Product Growth ("RGDP")
[...]
• [...] within industry accepted thresholds of acceptability
• Scenarios rank order intuitively (i.e., severely adverse scenario stress losses exceeding base scenario expected losses)
We considered a diverse set of macroeconomic drivers representing varied dimensions of the economic environment, and a sufficient number of drivers (balancing the consideration of avoiding over-fitting) by industry standards (i.e., at least 2-3 and no more than 5-7 independent variables). According to these criteria, we identify the optimal set, focusing on 5 of the 9 most commonly used national Fed CCAR MVs as input variables in the VAR model:
• Real Gross Domestic Investment ("RDIG")
• Unemployment Rate ("UNEMP")
• Commercial Real Estate Price Index ("CREPI")
• BBB Corporate Credit Spread ("BBBCS")
• CBOE's Equity Volatility Index ("VIX")
Similarly, we identify the following loss segments (with loss measured by Gross Charge-off Rates, "GCOs") according to the same criteria, in conjunction with the requirement that they cover the most prevalent portfolio types in typical traditional banking institutions:
• Residential Real Estate ("RESI")
• Commercial Real Estate ("CRE")
• Consumer Credit ("CONS")
• Commercial and Industrial ("C&I")
This historical data, 60 quarterly observations from 1Q01 to 4Q15,⁵ are summarized in Table 1 in terms of distributional statistics and correlations, as well as in Figures 1 to 9 of this section. Across all series, when looking at the time series dimension (in the left panels of the figures, with levels on the top and percent changes on the bottom), we observe that the credit cycle is clearly reflected, with indicators of economic or financial stress (health) and charge-off loss rates displaying peaks (troughs) in the recession of 2001-2002 and in the financial crisis of 2007-2008, with the latter episode dominating in terms of severity by an order of magnitude. However, there are some differences in the timing, extent and duration of these spikes across macroeconomic variables and loss rates. These patterns are reflected in the percent change transformations of the variables as well, with spikes in these series that correspond to the cyclical peaks and troughs, although there is also much more idiosyncratic variation observed when looking at the data in this form. Shifting focus to the smoothed histogram graphs (in the right panels of the figures, with levels on the top and percent changes on the bottom), we note that there are significant deviations from normality in terms of skewness and excess kurtosis relative to the Gaussian case, although the extent of these deviations varies significantly across variables (e.g., in the case of the VIX the non-normality is extreme, and in the case of certain indices or loss rates the bounded domain is itself a clear violation of normality). Furthermore, such deviations from normality are accentuated by an order of magnitude when examining the distributions of the variables in percent change form, which holds generally, although the extent of the deviations varies somewhat across variables. Finally, we note that in general the variation relative to the mean is an order of magnitude greater for the percent changes than for the levels.
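The distributional statistics discussed above (skewness, excess kurtosis and variation relative to the mean, for levels and for percent changes) can be computed as in the following Python sketch. The function names and the short `vix_like` series are hypothetical and serve only to illustrate the calculations; they are not the data of Table 1.

```python
import math


def pct_change(levels):
    """Percent change between consecutive observations."""
    return [100.0 * (b - a) / a for a, b in zip(levels, levels[1:])]


def central_moment(x, order):
    """Population central moment of the given order."""
    m = sum(x) / len(x)
    return sum((v - m) ** order for v in x) / len(x)


def skewness(x):
    """Third standardized moment; zero for a symmetric (e.g., Gaussian) distribution."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def excess_kurtosis(x):
    """Fourth standardized moment minus 3; zero in the Gaussian case."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2 - 3.0


def coefficient_of_variation(x):
    """Standard deviation relative to the absolute value of the mean."""
    return math.sqrt(central_moment(x, 2)) / abs(sum(x) / len(x))


# Hypothetical volatility-index-like series with a single stress spike.
vix_like = [14.0, 15.0, 13.0, 16.0, 45.0, 30.0, 20.0, 17.0]
level_stats = (skewness(vix_like), excess_kurtosis(vix_like))
change_stats = (skewness(pct_change(vix_like)), excess_kurtosis(pct_change(vix_like)))
```

Applying `coefficient_of_variation` to a series and to its percent changes gives the "variation relative to the mean" comparison made at the end of the paragraph above.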
The correlations amongst all of the independent and dependent variables, in both their level and percent change forms, are displayed in Tables 2 through 4. We first describe the main features of the dependency structure within the group of input macroeconomic variables, then within the group of output loss rate variables, and finally the cross-correlations between these two groups. We observe that all correlations have intuitive signs and magnitudes that suggest significant relationships, although the latter are not large enough to suggest any issues with multicollinearity. While the correlations of the percent change transformations are generally lower, they are still intuitive and of reasonable magnitudes. We also note that percent changes of variables are negatively (positively) correlated with levels when indicators are those of economic strength (weakness). The correlation matrix amongst the macroeconomic variables appears in Table 2. Considering some of the stronger relationships amongst the levels, the correlations between UNEMP/VIX, CREPI/UNEMP and BBBCS/RDIG are 36.5%, −36.0% and −21.5%, respectively. Amongst the percent changes, the correlations between BBBCS/CREPI, UNEMP/RDIG and VIX/CREPI are 34.3%, −7.8% and 28.6%, respectively. The correlation matrix amongst the credit loss rate variables appears in Table 3. Considering some of the stronger relationships amongst the levels, the correlations between CRE/RESI, CONS/CRE and C&I/CONS are 86.3%, 90.4% and 79.8%, respectively. Amongst the percent changes, the correlations between CONS/CRE, C&I/CRE and C&I/CONS are 26.2%, 15.5% and 38.1%, respectively. The correlation matrix amongst the credit loss rate and macroeconomic variables appears in Table 4. Considering some of the stronger relationships amongst the levels, the correlations between UNEMP/CRE, CREPI/C&I and UNEMP/RESI are 89.8%, 58.8% and 92.8%, respectively. Amongst the percent changes, the correlations between UNEMP/C&I, UNEMP/CONS and VIX/CRE are 41.7%, 25.5% and 27.2%, respectively.
In Table 5 we display the Augmented Dickey-Fuller ("ADF"; Dickey and Fuller, 1981) statistics of the macroeconomic variables under consideration. We reject the null hypothesis of a unit root process (i.e., of non-stationarity) in only one case for the variables in level form, whereas in percent change form we are able to reject it in all cases at the 5% significance level or better. We also show results of the Kwiatkowski, Phillips, Schmidt and Shin ("KPSS"; Kwiatkowski et al., 1992) test, in which the null hypothesis is a stationary time series: for the percent changes we are unable to reject the null hypothesis in any case, whereas for the variables in level form we do reject it in some cases. Taken in combination with the observations regarding the correlation analysis of Table 1, this leads to the choice of modeling the percent changes in the macroeconomic variables in order to generate base and stress scenarios. As a practice, when modeling in a time series framework, it is preferable to work with data that are jointly stationary.
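The logic of the unit root test can be sketched with the simplest (unaugmented) Dickey-Fuller regression, in which the change in a series is regressed on a constant and the lagged level. The Python sketch below is an illustrative simplification: it omits the augmentation lags of the ADF test reported in Table 5, applies the asymptotic 5% critical value of −2.86 for the constant-only case, and uses a simulated random walk rather than the data of this study.

```python
import math
import random

DF_CRITICAL_5PCT = -2.86  # asymptotic 5% critical value (constant, no trend)


def dickey_fuller_t(y):
    """t-statistic on rho in diff(y)_t = a + rho * y_{t-1} + e_t (no augmentation lags)."""
    x = y[:-1]
    d = [b - a for a, b in zip(y, y[1:])]
    n = len(x)
    mx, md = sum(x) / n, sum(d) / n
    sxx = sum((v - mx) ** 2 for v in x)
    rho = sum((v - mx) * (w - md) for v, w in zip(x, d)) / sxx
    a = md - rho * mx
    rss = sum((w - a - rho * v) ** 2 for v, w in zip(x, d))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope estimate
    return rho / se


rng = random.Random(1)
level = [0.0]
for _ in range(400):
    level.append(level[-1] + rng.gauss(0.0, 1.0))  # a pure random walk "level" series
change = [b - a for a, b in zip(level, level[1:])]  # its first differences

t_level = dickey_fuller_t(level)    # compare with DF_CRITICAL_5PCT
t_change = dickey_fuller_t(change)  # strongly negative: the unit root is rejected
```

A statistic below the critical value rejects the unit root null, which is the pattern reported above for the percent changes but not, in general, for the levels.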
A critical modeling consideration for the MS-VAR estimation is the choice of process generation distributions for the normal and the stressed regimes. As described in the summary statistics of Table 1, we find that when analyzing the macroeconomic data in percent change form, there is considerable skewness in the direction of adverse changes (i.e., right skewness for variables where increases denote deteriorating economic conditions, such as UNEMP, and left skewness for variables where declines are a sign of weakening conditions, such as RDIG). Furthermore, in normal regimes where percent changes are small we find that a normal distribution adequately describes the error distribution, whereas when such changes are at extreme levels in the adverse direction we find that a log-normal distribution does a good job of characterizing the data generating process.⁶
Another important modeling consideration with respect to scenario generation is the methodology for partitioning the space of scenario paths across our 6 macroeconomic variables for the Base and Severe scenarios. In the case of the Severe scenario, we identify the paths in which all six macroeconomic variables exceed their historical 99.0th percentile in at least a single quarter, and then for each variable we take an average across that set of paths in each quarter. It is our view that this is a reasonable definition of a Severe scenario, and in our risk advisory practice we have observed similar definitions in the industry.⁷ In the case of the Base scenario, we take an average across all paths in a given quarter for a given variable. The scenarios are shown in Figures 10 to 14, where we show for each macroeconomic variable the base and severe scenarios for the VAR and MS-VAR models,⁸ and also compare these to the corresponding Fed scenarios, along with the historical time series. We make the following general conclusions regarding the different scenario generation methodologies:
• In the Severe scenario, the MS-VAR model is far more conservative than the VAR model, and is always at least matching and in some cases even well exceeding historical peaks or troughs in the adverse direction.
• In terms of magnitude, the VAR model is similar to the Fed scenarios, but the trajectories of either the VAR or MS-VAR model tend to be more regular, rising at a more gradual pace into the forecast period.
• In the Base scenarios, the Fed model is rather similar to the VAR model, but in all cases the MS-VAR model produces a higher base, which is driven by the skewness of the mixture error distribution.
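The partitioning of simulated paths into Base and Severe scenarios described above can be sketched as follows. This Python fragment is illustrative only: each path is assumed to be a mapping from a variable name to its quarterly values, all variables are assumed to be oriented so that larger values are more adverse, and the thresholds stand in for the historical 99th percentiles; the tiny two-quarter example data are hypothetical.

```python
def average_paths(paths):
    """Average a list of paths (dict: variable -> quarterly values) quarter by quarter."""
    return {
        v: [sum(p[v][q] for p in paths) / len(paths) for q in range(len(paths[0][v]))]
        for v in paths[0]
    }


def base_scenario(paths):
    """Base scenario: the average across all simulated paths."""
    return average_paths(paths)


def severe_scenario(paths, thresholds):
    """Severe scenario: the average across paths in which every variable exceeds
    its threshold in at least one quarter; None if no path qualifies."""
    selected = [
        p for p in paths
        if all(max(p[v]) > thresholds[v] for v in thresholds)
    ]
    return average_paths(selected) if selected else None


# Hypothetical simulated paths (two variables, two quarters) and assumed thresholds.
paths = [
    {"UNEMP": [1.0, 2.0], "VIX": [1.0, 1.0]},
    {"UNEMP": [3.0, 6.0], "VIX": [9.0, 2.0]},
    {"UNEMP": [7.0, 2.0], "VIX": [1.0, 8.0]},
]
thresholds = {"UNEMP": 5.0, "VIX": 7.0}

base = base_scenario(paths)
severe = severe_scenario(paths, thresholds)
```

For variables where declines are adverse (such as RDIG), the same functions could be applied after flipping the sign of the series, so that the exceedance test remains one-sided.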
modelling U.S. Treasury yields. We observe that this mixture characterizes well the empirical distributions of the data in this paper.
7 We have performed a sensitivity analysis, available upon request, using the 95th and 99.9th percentiles, and the results are not greatly changed.
8 Estimation results for the VAR and MS-VAR models are available upon request. The models are all convergent and goodness-of-fit metrics are in line with industry standards. Signs of coefficient estimates are in line with economic intuition and estimates are all significant at conventional levels. We use the dse, tseries and MSBVAR libraries in R to perform the estimations (R Development Core Team, 2017).
The estimation results are summarized in Tables 6 and 7. Table 6 tabulates the results of the VAR(1) estimation of a 4-equation system, while Table 7 tabulates the results of the single equation AR(1) models for each portfolio segment separately. Below we highlight the main conclusions of this study in regard to the difference between the multiple and single equation estimations (detailed descriptions of estimation results and residual diagnostics are given in an Addendum to this paper):
• In both the VAR and AR models, all coefficient estimates are of intuitive sign and statistically significant at conventional confidence levels, although we note that the significance levels are generally higher for the VAR as compared to the AR models.
• Residual diagnostics reveal lack of serial autocorrelation and a Gaussian distribution in both VAR and AR models, although we note that the quality of residuals is somewhat better for the VAR as compared to the AR models.
• Across all 4 segments, according to the likelihood ratio statistic, we reject the hypothesis that the restrictions of the single equation AR models are justified.
• The results of the estimation are broadly consistent across the VAR and AR models, but with a few notable differences, such that the autocorrelation terms are larger in the AR models than in the VAR model.
• The VAR models show greater sensitivity to macroeconomic factors than do the AR models.
• The VAR models are generally more accurate according to standard measures of model fit with respect to each segment.
• The VAR is more conservative than the AR as measured by cumulative 9-quarter percentage error, in the sense of under-predicting (over-predicting) to a lesser degree during the downturn (recent) period.
The results of the scenario analysis with respect to the credit loss segments, for both AR and VAR estimation, as well as for the three scenario generation methodologies (Fed, VAR and MS-VAR), are shown in Tables 8 and 9, as well as in Figures 15 through 18. The results across modeling segments are in line with the scenario analysis for each macroeconomic variable discussed in this section, but the conservatism of the Severe forecasts is accentuated in the VAR model and dampened in the AR models. In the Severe scenario, the MS-VAR model is far more conservative than the VAR model, reflecting the greater sensitivity to macroeconomic factors noted in the estimation results, and always at least matching and in some cases well exceeding historical peaks or troughs in the adverse direction. As an example, for the C&I segment in the VAR estimation, the cumulative loss as a multiple of that in the downturn period is 1.05 in the MS-VAR model but only 0.85 (0.36) in the VAR (Fed) scenario generation models. In the AR estimation, however, the corresponding multiple is 0.75 in the MS-VAR model and 0.71 (0.29) in the VAR (Fed) scenario generation models.
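The cumulative loss multiple quoted for the C&I segment can be illustrated with a minimal sketch. It assumes that the multiple is the sum of projected loss rates over the 9-quarter horizon divided by the sum of realized loss rates over a downturn window of the same length; the function name and the example numbers are hypothetical and are not taken from the paper's tables.

```python
def cumulative_loss_multiple(scenario_losses, downturn_losses, horizon=9):
    """Ratio of cumulative scenario losses to cumulative downturn losses.

    Assumption: both series are quarterly loss rates and the same
    number of quarters (horizon) is summed in each.
    """
    if len(scenario_losses) < horizon or len(downturn_losses) < horizon:
        raise ValueError("each series needs at least `horizon` quarters")
    scenario_total = sum(scenario_losses[:horizon])
    downturn_total = sum(downturn_losses[:horizon])
    if downturn_total == 0:
        raise ValueError("downturn losses sum to zero; multiple undefined")
    return scenario_total / downturn_total


# Hypothetical quarterly charge-off rates (percent), for illustration only.
severe_projection = [0.5] * 9
downturn_history = [0.4] * 9
multiple = cumulative_loss_multiple(severe_projection, downturn_history)
```

A multiple above 1 indicates that the scenario projects cumulative losses exceeding those realized in the downturn window, which is the sense in which a scenario is described as conservative here.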
In Table 10 we implement the Diebold-Mariano (Diebold and Mariano, 2002) accuracy tests of the hypothesis that the multiple equation VAR model outperforms the MS-VAR model. This is an important exercise, as the literature notes that regime switching time series models are often prone to the problem of over-fitting (Dacco and Satchell, 1999; Engel, 1994). We are able to reject the null hypothesis that the MS-VAR model is outperformed by the VAR model, both on a 1-step ahead and on an out-of-sample basis.⁹ In the latter, we recalibrate the model leaving out the last 8 quarters of data, and predict credit losses over this period.
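The mechanics of a Diebold-Mariano comparison can be sketched as follows. This is an illustration rather than the paper's implementation: it assumes a squared-error loss, a long-run variance built from the first horizon - 1 autocovariances of the loss differential, and a standard normal reference distribution. The function name and the forecast errors in the example are hypothetical.

```python
import math
from statistics import NormalDist


def diebold_mariano(errors_a, errors_b, horizon=1):
    """Return (DM statistic, two-sided p-value) for two forecast error series.

    Assumptions: squared-error loss; the loss differential is
    d_t = e_a,t^2 - e_b,t^2; a positive statistic means model A has
    the larger average loss (model B is more accurate).
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("error series must have equal length")
    n = len(errors_a)
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    d_bar = sum(d) / n

    def autocov(lag):
        # Sample autocovariance of the loss differential at the given lag.
        return sum((d[t] - d_bar) * (d[t - lag] - d_bar)
                   for t in range(lag, n)) / n

    # Long-run variance: lag 0 plus twice the first horizon - 1 lags.
    long_run_var = autocov(0) + 2 * sum(autocov(k) for k in range(1, horizon))
    if long_run_var <= 0:
        raise ValueError("non-positive variance estimate; test undefined")
    stat = d_bar / math.sqrt(long_run_var / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    return stat, p_value


# Hypothetical forecast errors for two competing models, illustration only.
errors_model_a = [2.0, -1.0, 3.0, -2.0, 1.5, -2.5]
errors_model_b = [0.5, -0.2, 0.4, -0.1, 0.3, -0.6]
dm_stat, dm_p = diebold_mariano(errors_model_a, errors_model_b)
```

In an out-of-sample exercise of the kind described above, the two error series would be the hold-out forecast errors of the competing models over the same quarters. With only 8 hold-out quarters the normal approximation is rough, so a small-sample correction may be preferable.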

Conclusion and future directions
This paper has analyzed the estimation methodologies and the macroeconomic scenarios that are key ingredients of stress testing processes such as the Federal Reserve's CCAR program. We have analyzed the implications for estimation methodology of the supervisory requirement that banks develop their own macroeconomic scenarios. A standard approach such as the VAR statistical model, which exploits the dependency structure between both macroeconomic drivers and modeling segments, has been examined in the context of the well-known phenomenon of fat-tailed distributions that deviate from a Gaussian error structure. We have investigated the implications of the MS-VAR challenger model, commonly seen in academia yet not prevalent in practice. These competing models have been empirically tested with Federal Reserve macroeconomic data released for CCAR purposes and Y9 regulatory filings. Our main finding is that the MS-VAR model produces more conservative Severe loss projections than the VAR model, as well as greater forecast accuracy according to the Diebold-Mariano tests, which we attribute to the ability of the regime switching paradigm to better accommodate extreme events observed in history that deviate from normality. The MS-VAR model is capable in the Severe scenario of at least matching and sometimes well exceeding historical extremes in the direction of augmented losses, as compared to the VAR model. The VAR model bears similarities to the Fed model in terms of the magnitude of scenarios, but we observe that the trajectories of either the VAR or MS-VAR models tend to be more regular. The Fed Base scenario is rather close to that of the VAR model, but the MS-VAR model projects an augmented Base in all cases, which we attribute to the skewness of the error distribution in the regime-switching or mixture setting.
As a second main conclusion, we have considered the case of banks that model the risk of their portfolios using top-of-the-house modeling techniques, and have addressed the issue of how to incorporate the correlation of risks amongst the different segments. An approach to incorporate this dependency structure was proposed, and the bias that results from ignoring this aspect was quantified, through estimating VAR time series models for credit loss using Fed Y9 data. We found that the multiple equation VAR model outperforms the single equation AR models according to various metrics across all modeling segments. The results of the estimation are broadly consistent across the VAR and AR models, but with a few notable differences (e.g., most segments exhibit significant but mild autocorrelation, and different subsets of the macroeconomic variables are significant across different segments). Across all 4 segments, according to the likelihood ratio statistic, we reject the hypothesis that the restrictions of the single equation AR models are justified. Furthermore, while the VAR models are generally more accurate according to standard measures of model fit with respect to each segment, it is inconclusive whether the VAR or AR models are more or less conservative as measured by cumulative 9-quarter losses.
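For reference, a likelihood ratio comparison of this kind is commonly written in the following textbook form. The paper does not state the exact statistic it uses, so this is an assumed form and the symbols are introduced here for illustration: T is the number of observations, \hat{\Sigma}_{U} is the residual covariance matrix of the unrestricted VAR, \hat{\Sigma}_{R} is that of the system with the cross-segment lag coefficients restricted to zero (the single equation AR case), and q is the number of zero restrictions imposed.

```latex
\[
LR \;=\; T\left(\ln\bigl|\hat{\Sigma}_{R}\bigr| - \ln\bigl|\hat{\Sigma}_{U}\bigr|\right)
\;\overset{a}{\sim}\; \chi^{2}(q)
\]
```

A value of LR above the chosen critical value of the chi-squared distribution with q degrees of freedom leads to rejection of the restrictions, which is the sense in which the single equation AR models are rejected in favour of the VAR.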
There are several directions in which this line of research could be extended, including but not limited to the following:
• More granular classes of credit risk models, such as ratings migration or PD/LGD scorecard/regression
• Alternative data-sets, for example bank or loan level data
• More general classes of regression model, such as logistic, semi-parametric or machine learning / artificial intelligence techniques (Jacobs, 2018)
• Applications related to stress testing, such as regulatory or economic capital

Φ2
represent sensitivities of output variables to their own lags and to lags of other output variables, while the corresponding matrices Θ_j are model sensitivities of output variables to contemporaneous and lagged values of input variables 3. It follows that the dependency structure of the output variables Y_t, as given by the autocovariance function, is dependent upon the parameters Φ_j, and hence the correlations amongst the Y_t, as well as the correlations amongst the X_t that depend upon the parameters Θ_j. In contrast, in a system of univariate
In fact, the exogenous variables {X_t} can represent both stochastic and non-stochastic (deterministic) variables, examples being sinusoidal seasonal (periodic) functions of time, used to represent the seasonal fluctuations in the output process {Y_t}, or intervention analysis modelling, in which a simple step (or pulse indicator) function takes the values 0 or 1 to indicate the effect on output of unusual intervention events in the system.

Figure 1. Time series and kernel density plot - Real Domestic Investment Growth.

Figure 2. Time series and kernel density plot - Unemployment Rate.

Figure 3. Time series and kernel density plot - Commercial Real Estate Index.

Figure 4. Time series and kernel density plot - BBB Corporate Bond Rate.

Figure 5. Time series and kernel density plot - CBOE Equity Market Volatility Index.

Figure 6. Time series and kernel density plot - Residential Real Estate Loan Charge-off Rates.

Figure 7. Time series and kernel density plot - Commercial Real Estate Loan Charge-off Rates.

Figure 8. Time series and kernel density plot - Consumer Loan Charge-off Rates.

Figure 9. Time series and kernel density plot - Commercial and Industrial Loan Charge-off Rates.

Figure 10. Historical time series, Base and Severe scenarios for the VAR, MS-VAR and Fed Models - Real Disposable Income Growth.

Figure 11. Historical time series, Base and Severe scenarios for the VAR, MS-VAR and Fed Models - Unemployment Rate.

Figure 12. Historical time series, Base and Severe scenarios for the VAR, MS-VAR and Fed Models - Commercial Real Estate Price Index.

Figure 13. Historical time series, Base and Severe scenarios for the VAR, MS-VAR and Fed Models - BB Corporate Credit Spread.

Figure 14. Historical time series, Base and Severe scenarios for the VAR, MS-VAR and Fed Models - VIX Equity Market Volatility Index.
• Transformations of chosen variables should indicate stationarity
• Signs of coefficient estimates are economically intuitive
• Probability values of coefficient estimates indicate statistical significance at conventional confidence levels
• Residual diagnostics indicate white noise behavior
• Model performance metrics (goodness of fit, risk ranking and cumulative error measures) are

• Real Gross Domestic Investment ("RDIG")
• Consumer Price Index ("CPI")
• Real Disposable Personal Income ("RDPI")
• Unemployment Rate ("UNEMP")
• Three-month Treasury Bill Rate ("3MTBR")

Table 1. Summary statistics of historical Y9 credit loss rates and Federal Reserve Macroeconomic Variables.

Table 2. Correlations amongst Federal Reserve Macroeconomic Variables.

Table 3. Correlations among historical Y9 credit loss rates.

Table 4. Correlations amongst historical Y9 credit loss rates and Federal Reserve Macroeconomic Variables.

Table 5. Augmented Dickey-Fuller and Kwiatkowski-Phillips-Schmidt-Shin stationarity test statistics of Credit Loss Rates and Macroeconomic Variables.

Table 6. Vector Autoregressive Model Estimation Results Compared (Fed Macroeconomic Variables and Aggregate Y9 Bank Charge-offs).

Table 7. Single Equation Autoregressive Model Estimation Results Compared (Fed Macroeconomic Variables and Aggregate Y9 Bank Charge-offs).

Table 8. Vector Autoregressive Model Estimation and Gaussian Vector Autoregressive vs. Regime Switching Vector Autoregressive Scenario Generation Compared (Fed Macroeconomic Variables and Aggregate Y9 Bank Charge-offs).

Table 9. Single Equation Autoregressive Model Estimation and Gaussian Vector Autoregressive vs. Regime Switching Vector Autoregressive Scenario Generation Compared (Fed Macroeconomic Variables and Aggregate Y9 Bank Charge-offs).

Figure 15. Vector Autoregressive vs. Single Equation Autoregressive Model Estimation and Gaussian Vector Autoregressive vs. Regime Switching Vector Autoregressive Scenario Generation Compared - Residential Real Estate.

Figure 16. Vector Autoregressive vs. Single Equation Autoregressive Model Estimation and Gaussian Vector Autoregressive vs. Regime Switching Vector Autoregressive Scenario Generation Compared - Commercial Real Estate.

Figure 17. Vector Autoregressive vs. Single Equation Autoregressive Model Estimation and Gaussian Vector Autoregressive vs. Regime Switching Vector Autoregressive Scenario Generation Compared - Consumer Loans.

Figure 18. Vector Autoregressive vs. Single Equation Autoregressive Model Estimation and Gaussian Vector Autoregressive vs. Regime Switching Vector Autoregressive Scenario Generation Compared - Commercial and Industrial Loans.

Table 10. Diebold-Mariano Accuracy Tests - Gaussian Vector Autoregressive vs. Regime Switching Vector Autoregressive Scenario Generation Compared (Fed Macroeconomic Variables and Aggregate Y9 Bank Charge-offs).