Adjusted Extreme Conditional Quantile Autoregression with Application to Risk Measurement

In this paper, we propose an extreme conditional quantile estimator. Derivation of the estimator is based on extreme quantile autoregression. A noncrossing restriction is added during estimation to avert possible quantile crossing. Consistency of the estimator is derived, and simulation results to support its validity are also presented. Using Average Root Mean Squared Error (ARMSE), we compare the performance of our estimator with the performances of two existing extreme conditional quantile estimators. Backtest results of the one-day-ahead conditional Value at Risk forecasts are also given.


Introduction
Correct specification of a loss/returns distribution is key to the accuracy of a risk measure such as Value at Risk. As noted in [9], the major difference among many estimators of Value at Risk lies in the estimation of the distribution of returns. Modeling financial data is complicated by its failure to exhibit standard statistical properties such as normality, independence, and identical distribution [18]. Statistical tests have revealed that returns exhibit fat tails, time-varying volatility, and volatility clustering. Moreover, [7] showed that returns exhibit serial correlation over long time horizons. Models based on mean autoregression coupled with results from extreme value theory, such as the AR(1)-GARCH(1,1) model in [30], incorporate most of the aforementioned characteristics of financial data but suffer from a lack of robustness due to the effect of extreme observations on the mean. Extreme quantile autoregression, as in [2,24,25] among others, leads to a more robust model. This is because it combines regression quantiles, introduced by [17], in an autoregressive fashion while using extreme value techniques on the resulting residuals to capture the tail behaviour. A major challenge of this approach is possible quantile crossing. The challenge of quantile crossing has been addressed by smoothing suggestions in [5,6,16], among others, in a nonparametric setting. Equally, the conditional location-scale model used in obtaining Restricted Regression Quantiles (RRQ) in [11] averts possible crossing in extreme quantiles but can suffer from the same problem when estimating the median. To avert quantile crossing even at the middle, [1] added a forced ordering constraint in the estimation of multiple quantiles. Simulation results revealed that, based on the standard error of the estimates, the noncrossing quantile regression in [1] produces better estimates in the middle. However, the RRQ estimator produced better estimates at the tails, especially when the sample size was large.
The lack of monotonicity in the estimation of conditional quantiles is addressed in [3] through sorting of originally estimated nonmonotone quantile curves using a functional delta method. The monotonic quantile functions obtained were found to be closer to the true quantile than the nonmonotonic quantiles. Functional limit theory for the rearranged estimators was also derived. The resulting monotonic quantile functions were then used in estimating economic functions using Vietnam data. However, the model was not extended to extreme cases to cover heavy tails beyond the sample. Parametric quantile regression is used in [27] to estimate percentiles in positive-valued datasets. Specifically, a linear quantile regression model was used with the error term assumed to follow a generalized gamma distribution. The idea of quantile regression was achieved by allowing the parameters of the error distribution to depend on the univariate covariate, leading to a location-scale model. The four-, five-, and six-parameter generalized conditional gamma distributions were considered, and a likelihood ratio test was used in selecting the best-fit model for each dataset. Asymptotics for the three resulting models were also derived. However, the use of the generalized gamma distribution limits the model to applications where the covariate is greater than zero. This, together with the fact that some financial datasets have heavier tails than the gamma distribution, limits the application of the model in finance.
We seek to improve the extreme conditional quantile estimator in [25] using an interquantile dispersion from the central conditional quantile.

Methods and Estimation
Let S_t, t ∈ R_+ ∪ {0}, be a real-valued financial time series on a complete probability space (Ω, F, P). We assume that S_t ∈ R_+ and that it is F_t-measurable, where F_t, t ∈ R_+, is an increasing sequence of σ-algebras representing information available up to time t. In particular, let S_t be the value of a portfolio at trading time t. The return on the portfolio at time t, used to quantify the gain in value of the portfolio from trading time t − Δt to trading time t, is given by the log-return r_t = log(S_t / S_{t−Δt}), so that X_t = −r_t is the corresponding loss return of S_t.
Definition 1 (Risk Measure). A risk measure is a function ρ from a set L of risks in a financial position (in this case, the loss distribution) to R; that is, ρ: L → R. We assume that X_t can be expressed using a linear heteroscedastic model of the form X_t = μ_t + e_t, where μ_t ≡ f: R^d → R is the conditional mean function of X_t given F_{t−1} and is defined as μ_t = Y_t′β. The e_t are errors, and Y_t is a d-dimensional process which is F_{t−1}-measurable. In particular, Y_t has 1 as its first element followed by a collection of the last observed returns up to time t − 1; that is, Y_t = (1, X_{t−1}, …, X_{t−d+1})′. To ensure that the model is smooth and obeys some of the financial norms such as clustering of shocks, we further assume that e_t can be decomposed as e_t = σ_t ϵ_t, where the ϵ_t are independent and identically distributed random variables and σ_t > 0 is the conditional volatility. In this case, X_t is said to assume a location-scale model of the form X_t = Y_t′β + σ_t ϵ_t. The corresponding α-quantile of X_t under this formulation is given by Q_α(X_t | F_{t−1}) = μ_t + σ_t q^ϵ_α, where q^ϵ_α is the α-quantile of ϵ_t. Let us now define a conditional quantile autoregressive model on X_t of the form X_t = μ_{t,θ} + ε_t, where μ_{t,θ} is the central conditional θ-quantile of X_t and the ε_t are errors with zero θ-quantile. Let ε_t = σ_{t,θ} Z_t, where σ_{t,θ} is the central conditional scale of X_t and the Z_t are i.i.d. residuals.
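Under this location-scale formulation, a conditional quantile is the conditional location shifted by the scaled innovation quantile. A minimal sketch (the function and argument names are illustrative, not from the paper):

```python
def conditional_quantile(mu_t, sigma_t, q_eps_alpha):
    """Alpha-quantile of X_t = mu_t + sigma_t * eps_t: under a location-scale
    model, quantiles shift with the location and stretch with the scale."""
    return mu_t + sigma_t * q_eps_alpha

# Illustrative values: standard-normal innovations have q_eps(0.95) ~ 1.6449.
q95 = conditional_quantile(mu_t=0.1, sigma_t=2.0, q_eps_alpha=1.6449)
```

Because σ_t > 0, the map α ↦ q^ϵ_α being increasing guarantees the resulting conditional quantiles cannot cross for a fixed (μ_t, σ_t).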
Using an approach similar to Points Over Threshold (POT), we propose an extreme conditional quantile of the form given in equation (8) and refer to it as the adjusted extreme conditional quantile. That is, suppose that we are interested in an extreme quantile, μ_{t,θ,α}, for some α ≈ 1 or 0; the idea is to estimate the central quantile, μ_{t,θ}, and scale, σ_{t,θ}, for some level θ in the middle and approximate the extreme quantile as
μ_{t,θ,α} = μ_{t,θ} + σ_{t,θ}(q^z_α − q^z_θ), (8)
where q^z_α and q^z_θ are the α- and θ-quantiles of Z_t, respectively, for α, θ ∈ (0, 1). If the parametric distribution of Z_t is known, then q^z_α and q^z_θ are easily determined as the inverse of the cumulative distribution of Z_t at probability levels α and θ, respectively; otherwise, appropriate estimates are determined. Note that μ_{t,θ,α} is μ^m_{t,α} in conditional quantile autoregressive form. Observe that when α = θ, equation (8) reduces to
μ_{t,θ,θ} = μ_{t,θ}, (9)
which is the central conditional quantile of X_t given Y_t = y. This confirms that indeed ε_t has a zero θ-quantile. From equation (8), we obtain the following estimator for the extreme conditional quantile:
μ̂_{t,θ,α} = μ̂_{t,θ} + σ̂_{t,θ}(q̂^z_α − q̂^z_θ), (10)
where q̂^z_θ and q̂^z_α are appropriate estimates of the θ- and α-quantiles of the i.i.d. residuals, respectively. We compare this estimator with the estimator μ̂^m_{t,α} proposed in [25], where c_α is the resulting coefficient from a quantile regression of the residuals at level 100α%, and with the estimator μ̂^h_{t,θ,α} proposed in [11].
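The adjusted extreme conditional quantile of equation (8) is computable directly once the central quantile, the scale, and the two residual quantiles are available. A minimal sketch with illustrative numbers (not taken from the paper):

```python
def adjusted_extreme_quantile(mu_theta, sigma_theta, q_z_alpha, q_z_theta):
    """Equation (8): extrapolate from the central theta-quantile to the
    extreme alpha-quantile via the residual interquantile gap."""
    return mu_theta + sigma_theta * (q_z_alpha - q_z_theta)

# When alpha == theta the two residual quantiles coincide, the adjustment
# vanishes, and the central quantile is recovered.
central = adjusted_extreme_quantile(0.05, 1.5, q_z_alpha=0.0, q_z_theta=0.0)
extreme = adjusted_extreme_quantile(0.05, 1.5, q_z_alpha=3.0, q_z_theta=0.0)
```

In practice q_z_alpha would come from an EVT fit to the tail of the i.i.d. residuals, while q_z_theta is estimated from their bulk.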

Estimation of Central Quantiles
Let μ_t: R^d → R be an unknown smooth function and define the loss function
M_θ(x, μ_t) = θ|x − μ_t|_+ + (1 − θ)|x − μ_t|_−, (13)
where |x − μ_t|_+ and |x − μ_t|_− represent the positive and negative parts of x − μ_t, respectively, and I_{x−μ_t ≤ 0} is the indicator function. Assuming that the conditional quantile process is well defined, we expect
E[I_{X_t−μ_{t,θ} ≤ 0} | Y_t = y] = θ. (14)
So the θ-conditional quantile of X_t is given by
μ_{t,θ} = argmin_{μ_t} E[M_θ(X_t, μ_t) | Y_t = y]. (15)
Note that equation (14) can be used to check whether the conditional quantile process is correctly specified or not. We impose the following regularity assumptions to ensure consistency of the conditional quantiles. Assumption 1. X_t, Y_t are identically distributed with joint probability density f_{X,Y}(x, y) and a continuous conditional probability density f_{X|Y}(x|y) of X_t given Y_t = y.
From the sample analog of equation (15), we obtain
μ̂_{t,θ} = argmin_{μ_t} (1/n) Σ_{t=1}^n M_θ(X_t, μ_t), (16)
which is the θ-conditional quantile estimator for a sample of size n. To overcome the limitation of quantile crossing, we used the approach in [1], where the required quantiles are estimated simultaneously under a noncrossing constraint via the optimization problem
min Σ_{i=1}^k w(θ_i) Σ_{t=1}^n M_{θ_i}(X_t, μ_t(θ_i)) subject to μ_t(θ_1) ≤ μ_t(θ_2) ≤ · · · ≤ μ_t(θ_k), (17)
for some weight function w(θ_i) > 0. A convenient practical choice of the weight function, which was also adopted for this study, is w(θ_i) = 1, ∀i = 1, …, k.
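The simultaneous, weight-one estimation with a noncrossing restriction can be sketched for an intercept-only model by adding a soft ordering penalty to the pinball (check) loss. This is an illustrative stand-in for the constrained program of [1], not the authors' implementation:

```python
import numpy as np
from scipy.optimize import minimize

def pinball(u, theta):
    # Check loss: theta * positive part + (1 - theta) * negative part of u.
    return np.where(u >= 0, theta * u, (theta - 1.0) * u)

def noncrossing_quantiles(x, thetas, penalty=1e4):
    """Jointly fit the theta_i sample quantiles (intercept-only model,
    w(theta_i) = 1) with a penalty standing in for the ordering constraint."""
    thetas = np.asarray(thetas)

    def objective(q):
        loss = sum(pinball(x - q[i], t).mean() for i, t in enumerate(thetas))
        crossing = np.clip(q[:-1] - q[1:], 0.0, None).sum()  # q must increase
        return loss + penalty * crossing

    q0 = np.quantile(x, thetas)  # ordered warm start
    return minimize(objective, q0, method="Nelder-Mead").x

rng = np.random.default_rng(0)
x = rng.standard_t(df=4, size=2000)  # heavy-tailed toy data
q = noncrossing_quantiles(x, [0.25, 0.5, 0.75])
```

With covariates, q would be replaced by one coefficient vector per level θ_i, and the ordering constraint imposed over the design points.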

The Scale Function.
To still maintain dependence and ensure positivity of the scale function, this study incorporated a scale function in the form of a quantile autoregressive (QAR) function on the absolute values of the nonstandardized residuals. This was achieved by replacing μ_{t,θ} with its corresponding estimate in equation (7), so that ε̂_t = X_t − μ̂_{t,θ} and
|ε̂_t| = σ_{t,θ} + ϖ_t, (18)
where σ_{t,θ} is the central conditional θ-quantile of |ε_t|, ϑ_{t,θ} is the conditional scale of |ε_t|, the η_t are i.i.d. residuals, and ϖ_t = ϑ_{t,θ} η_t are errors with zero θ-quantile. Similarly, as in Section 2.1, we let σ_t: R^d → R_+ be an unknown smooth function and define the loss function M_θ(x, σ_t). We assume that the QAR process in equation (18) obeys the four regularity assumptions given earlier so that, using the noncrossing quantile regression approach, we obtain the following estimate of the scale:
σ̂_{t,θ} = argmin_{σ_t} (1/n) Σ_{t=1}^n M_θ(|ε̂_t|, σ_t), (19)
where M_θ(|ε̂_t|, σ_t) is a loss function of the form given in equation (13).
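For intuition, with an intercept-only scale the estimate above reduces to the θ-quantile of the absolute residuals. A toy check, assuming Gaussian residuals with true scale 2 (illustrative only):

```python
import numpy as np

def scale_from_residuals(eps, theta):
    """Intercept-only sketch of the scale estimate: the theta-quantile of the
    absolute residuals |eps_t| serves as the positive scale sigma_{t,theta}."""
    return np.quantile(np.abs(eps), theta)

rng = np.random.default_rng(1)
eps = 2.0 * rng.standard_normal(100_000)   # residuals with true scale 2
sigma = scale_from_residuals(eps, theta=0.5)
# Median of |N(0, 2^2)| is 2 * 0.6745 ~ 1.349, so sigma should be near 1.35.
```

Working with |ε̂_t| is what guarantees positivity of the fitted scale, since a quantile of a nonnegative variable is nonnegative.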

Extreme Value Theory (EVT).
Most financial datasets are heavy-tailed [12]. Therefore, it is fundamentally important to incorporate extreme value theory in the estimation of extreme quantiles. A basic requirement for the application of EVT is independence in the underlying distribution. The study in [25] observed that it is appropriate to assume that, at high (low) levels of α, the standardized residuals in equation (7), given by Z_t = ε̂_t / σ̂_{t,θ}, where σ̂_{t,θ} is the estimate of the scale, are approximately independent, which allows us to apply EVT. Let Z_1, Z_2, … follow a common distribution function F, and consider the sample maximum M_n = max(Z_1, …, Z_n). Pursuant to the Fisher-Tippett theorem in [8], the random variable Z (or, alternatively, the distribution F of Z) is said to belong to the Maximum Domain of Attraction (MDA) of the extreme value distribution H if there exist norming constants c_n > 0 and d_n ∈ R such that
P((M_n − d_n)/c_n ≤ z) → H(z) as n → ∞.
Consequently, H is referred to as the Generalized Extreme Value (GEV) distribution.
This study applied the Points Over Threshold (POT) method because it uses more data, leading to better estimates compared to the Block Maxima method. POT models the distribution of all excesses above a particular threshold u,
F_u(z) = P(Z − u ≤ z | Z > u),
where 0 ≤ z ≤ z_F − u and z_F is the right endpoint of F.
Theorem 1 (Pickands-Balkema-de Haan) (see [22]). We can find a (positive, measurable) function β(u) such that
lim_{u→z_F} sup_{0≤z<z_F−u} |F_u(z) − G_{λ,β(u)}(z)| = 0
if and only if F ∈ MDA(H_λ), λ ∈ R, where G_{λ,β(u)}(z) is the Generalized Pareto Distribution (GPD) given by
G_{λ,β}(z) = 1 − (1 + λz/β)^{−1/λ} for λ ≠ 0, and G_{0,β}(z) = 1 − exp(−z/β),
with z ≥ 0 when λ ≥ 0 and 0 ≤ z ≤ −β/λ when λ < 0. λ and β are the shape and scale parameters, respectively.
A major challenge in POT is accuracy in choosing a threshold to separate extreme observations from the center of the distribution [30]. Among the methods discussed in [32], most authors, such as those of [21,28,30], prefer the conventional method, in which a threshold is chosen so that between 5% and 10% of the sample data are classified as extreme observations. Although the conventional method is subjective, the choice of the threshold can be checked for appropriateness using a mean excess plot. The mean excess function for the GPD is given by
e(u) = (β + λu)/(1 − λ),
where 1 − λ > 0. This implies that an optimal threshold corresponds to the start of approximate linearity of the mean excess plot, with the sign of the slope, λ/(1 − λ), indicating the specific family of the GPD. A positive sign corresponds to the Fréchet family, while a negative sign implies the Weibull family [8].
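The mean excess diagnostic can be computed empirically: for each candidate threshold u, average the exceedances z − u over the observations with z > u and look for the onset of linearity. A sketch using exponential data, for which the GPD shape is λ = 0 and the mean excess function is flat:

```python
import numpy as np

def mean_excess(z, u):
    """Empirical mean excess e(u): mean of (z - u) over observations z > u.
    For GPD tails, e(u) is approximately linear in u with slope lambda/(1 - lambda)."""
    exceed = z[z > u]
    return np.nan if exceed.size == 0 else (exceed - u).mean()

rng = np.random.default_rng(2)
# Exponential(1) data: memorylessness makes e(u) constant at beta = 1.
z = rng.exponential(scale=1.0, size=200_000)
e1, e2 = mean_excess(z, 0.5), mean_excess(z, 1.5)
```

Plotting mean_excess over a grid of thresholds reproduces the mean excess plot used in Figure 3 to validate the chosen threshold.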
For convenience in making inferences on the variability of the estimated quantiles, a recommendation in [13] on the use of the Probability Weighted Moments (PWM) method in estimating the parameters of the GPD was adopted. Using the first and second PWMs, we obtain the corresponding parameter estimates as
λ̂ = 2 − M_0/(M_0 − 2M_1), β̂ = 2M_0M_1/(M_0 − 2M_1),
where M_0 and M_1 are obtained by setting k = 0 and k = 1 in
M_k = E[Z(1 − F(Z))^k] = β/((k + 1)(k + 1 − λ)),
which is the PWM of the GPD with λ < 1. See [10,23] for details on the PWM method. For a sample of size n, the corresponding PWM estimates are given by
M̂_k = (1/n) Σ_{j=1}^n (1 − p_{j:n})^k x_{j:n},
where x_{1:n} ≤ · · · ≤ x_{n:n} is the ordered sample and p_{j:n} = (j + c)/(n + δ) for suitable constants c and δ. As recommended in [20], c = −0.35 and δ = 0. The overall distribution of the standardized residuals was obtained by splicing the GPD with the empirical bulk distribution at the threshold, using the approach in [21,29,30] among others, to obtain
F(z) = F(u) + (1 − F(u)) G_{λ,β}(z − u)
for z > u, with G_{λ,β}(z − u) the Generalized Pareto Distribution. When F(u) is approximated empirically, we obtain the following estimate of F(z):
F̂(z) = 1 − (m/N)(1 + λ̂(z − u)/β̂)^{−1/λ̂}, (30)
where N is the sample size, m is the number of exceedances above the threshold u, and β̂ together with λ̂ are the estimated GPD parameters. Inverting equation (30), we get the following estimate for the quantile of the standardized residuals at level α:
q̂^z_α = u + (β̂/λ̂)[((N/m)(1 − α))^{−λ̂} − 1].
Lemma 4 (consistency of error quantiles). Let Z_1, Z_2, …, Z_n be i.i.d. random variables from a CDF F belonging to the MDA of H(λ) and satisfying α < F(q^z_α + ϵ) for any ϵ > 0. Then, for every ϵ > 0, q̂^z_α lies within ϵ of q^z_α with probability approaching one. Using equation (10), we obtained the one-step VaR predictions as
VaR^α_{t+1} = μ̂_{t+1,θ} + σ̂_{t+1,θ}(q̂^z_α − q̂^z_θ),
where μ̂_{t+1,θ} and σ̂_{t+1,θ} are the corresponding one-step θ-quantile and scale estimates, respectively, from the linear conditional quantile process.
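Once the GPD parameters are estimated, the spliced tail CDF of equation (30) can be inverted in closed form. The sketch below implements the standard POT inversion with illustrative parameter values (the numbers are not from the paper):

```python
def tail_cdf(z, u, lam, beta, N, m):
    """Spliced tail estimator, equation (30): empirical bulk below u,
    fitted GPD above, with m of N observations exceeding u."""
    return 1.0 - (m / N) * (1.0 + lam * (z - u) / beta) ** (-1.0 / lam)

def tail_quantile(alpha, u, lam, beta, N, m):
    """Closed-form inverse of tail_cdf at probability level alpha > 1 - m/N."""
    return u + (beta / lam) * (((N / m) * (1.0 - alpha)) ** (-lam) - 1.0)

# Illustrative: threshold 2.0, shape 0.1, scale 1.2, 400 of 4000 exceedances.
q99 = tail_quantile(0.99, u=2.0, lam=0.1, beta=1.2, N=4000, m=400)
```

The inversion is only valid in the tail region, i.e., for levels α above the empirical probability 1 − m/N of the threshold.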

Simulations
To evaluate the accuracy of our estimators, a sample of size T = 4250 was generated using the model, in which Z_t follows Student's t-distribution with 4 degrees of freedom. The sample was partitioned into design data of size n and test data of size T − n. Figure 1 presents a sample path of the model superimposed with the median. Note that, for simulation purposes, θ = 0.5 was used in estimating the central quantile.
Clearly, from the sample path, there is some level of volatility clustering, which is common in most financial data. An ACF plot of the resulting standardized residuals in Figure 2 confirms that they are indeed independent. z = 2.3283 was chosen as the threshold to ensure that 10% of the resulting ordered standardized errors were classified as extremes. This was confirmed by the approximate linearity of the mean excess plot after the threshold, as shown in Figure 3. The corresponding shape and scale parameter estimates from the GPD fit were 0.1156182 and 1.272937, respectively. Table 1 outlines the sample statistics of the estimates of the various quantiles together with the data. Figure 4 shows the corresponding quantile estimates at different levels of α. The accuracy of our extreme quantile estimator was evaluated using the Average Root Mean Squared Error (ARMSE). The RMSE returns the Mean Squared Error (MSE) to the original scale of the sample. For k simulated sample paths of size n of the extreme quantiles, the average RMSE is given by
ARMSE = (1/k) Σ_{j=1}^k sqrt((1/n) Σ_{t=1}^n (μ̂_{t,θ,α} − μ_{t,θ,α})²),
where μ̂_{t,θ,α} is replaced by μ̂^m_{t,α} or μ̂^h_{t,θ,α} when considering extreme conditional quantiles or restricted regression quantiles, respectively. To check how our model behaves under different choices of the central quantile, we computed the ARMSE for the extreme conditional quantile at α = 0.95, where θ = 0.25, 0.5, and 0.75 were considered. 1000 sample paths, each of sizes 250, 500, 1000, 2000, and 4000, were used in the computation of ARMSE. Table 2 reports the obtained ARMSE values.
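The ARMSE criterion averages per-path RMSEs over replications. A compact sketch with a hand-checkable example:

```python
import numpy as np

def armse(estimates, truth):
    """Average Root Mean Squared Error over k replicated sample paths:
    estimates has shape (k, n); truth has shape (n,) or (k, n)."""
    mse = np.mean((estimates - truth) ** 2, axis=-1)  # per-path MSE
    return np.sqrt(mse).mean()                        # average RMSE over paths

# Two paths with constant errors 1 and 3 give RMSEs 1 and 3, so ARMSE = 2.
truth = np.zeros(4)
est = np.array([[1.0, 1.0, 1.0, 1.0],
                [3.0, 3.0, 3.0, 3.0]])
```

Taking the square root per path before averaging keeps each replication on the original scale of the quantiles, so one badly fitted path cannot dominate quadratically.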
We note that, for a large enough sample (2000 observations and above), ARMSE is lowest when θ = 0.5, and thus this choice of θ is maintained in investigating the accuracy of our estimator and forecasting the one-day-ahead VaR. Table 3 outlines the obtained ARMSE for the extreme conditional quantile at α = 0.95 under three different models. The sample sizes and number of replications are maintained as before.
Based on ARMSE, both RRQ and ECQ perform better than AECQ for small samples. However, as the sample size increases, AECQ outperforms both RRQ and ECQ. The decreasing ARMSE with increasing sample size for AECQ and ECQ confirms that both are consistent estimators of the extreme conditional quantile. Also, for sample sizes above 2000, the rate of convergence of the AECQ estimator is higher than that of the ECQ estimator. It was not possible to comment on the consistency of the RRQ estimator, since its ARMSE fluctuated with increasing sample size. The consistent reduction in ARMSE when the noncrossing constraint is added during estimation confirms that this constraint indeed increases the accuracy of the resulting estimators.

Evaluating VaR Forecasts
In Section 3, we evaluated the accuracy of our in-sample quantile estimates. In this section, we extend this by evaluating the out-of-sample VaR forecasts from our quantile estimator. To achieve this, we carry out backtests on 250 one-day-ahead VaR forecasts (as recommended in the Basel Accord) using the coverage tests in [4,31]. Coverage tests were adopted due to their popularity in the literature and in practice [26]. Consider the failure process I^α_t = I(X_t > VaR^α_t), where I is the indicator function and t = n + 1, n + 2, …, T. By Lemma 1 in [4], I^α_t is i.i.d. Bernoulli(α), which is tested using the conditional coverage test that combines both the unconditional coverage test in [19] and the test for independence in [4], under the null hypothesis E[I^α_t | I^α_{t−1}] = α. The likelihood under the null hypothesis is
L(α) = (1 − α)^{n_0} α^{n_1},
where n_1 is the number of VaR exceedances and n_0 = T − n − n_1. Now consider the first-order Markov chain generated by the transition probabilities of I^α_t, where π_{ij} = Pr(I^α_t = j | I^α_{t−1} = i). This has an approximate likelihood function
L(Π) = (1 − π_{01})^{n_{00}} π_{01}^{n_{01}} (1 − π_{11})^{n_{10}} π_{11}^{n_{11}}, (38)
where n_{ij} is the number of times observation i is followed by j in the failure process I^α_t. From equation (38), we obtain the maximum likelihood estimates π̂_{01} = n_{01}/(n_{00} + n_{01}) and π̂_{11} = n_{11}/(n_{10} + n_{11}). In the tables, RRQ denotes restricted regression quantiles in [11], ECQ denotes extreme conditional quantiles in [25], and AECQ denotes the proposed adjusted extreme conditional quantiles. Table 4 reports the obtained p values for the likelihood ratios of the three tests in [4]. The tests were conducted on 250 one-day-ahead 5% VaR forecasts from the three considered models.
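The three likelihood ratios can be assembled from the exceedance count and the one-step transition counts. The sketch below follows the standard Christoffersen construction (α is the expected exceedance probability; the failure process shown is illustrative, not the paper's data):

```python
import numpy as np
from scipy.stats import chi2

def coverage_tests(hits, alpha):
    """Likelihood-ratio backtests on a 0/1 VaR failure process: unconditional
    coverage (LR_uc), independence (LR_ind), and conditional coverage
    (LR_cc = LR_uc + LR_ind), each returned with its chi-square p value."""
    hits = np.asarray(hits, dtype=int)
    n1 = int(hits.sum())
    n0 = hits.size - n1
    pi_hat = n1 / hits.size

    def loglik(p, a, b):
        # Bernoulli log-likelihood with a zeros and b ones; 0*log(0) := 0.
        return (a * np.log1p(-p) if a else 0.0) + (b * np.log(p) if b else 0.0)

    lr_uc = -2.0 * (loglik(alpha, n0, n1) - loglik(pi_hat, n0, n1))

    # Transition counts n_ij: state i at t-1 followed by state j at t.
    n = {(i, j): int(np.sum((hits[:-1] == i) & (hits[1:] == j)))
         for i in (0, 1) for j in (0, 1)}
    pi01 = n[0, 1] / max(n[0, 0] + n[0, 1], 1)
    pi11 = n[1, 1] / max(n[1, 0] + n[1, 1], 1)
    pi2 = (n[0, 1] + n[1, 1]) / max(hits.size - 1, 1)
    lr_ind = -2.0 * (loglik(pi2, n[0, 0] + n[1, 0], n[0, 1] + n[1, 1])
                     - loglik(pi01, n[0, 0], n[0, 1])
                     - loglik(pi11, n[1, 0], n[1, 1]))
    lr_cc = lr_uc + lr_ind
    return {"LR_uc": (lr_uc, chi2.sf(lr_uc, 1)),
            "LR_ind": (lr_ind, chi2.sf(lr_ind, 1)),
            "LR_cc": (lr_cc, chi2.sf(lr_cc, 2))}

# Illustrative failure process: 13 well-spread exceedances in 250 days (~5%).
hits = [1 if t % 20 == 0 else 0 for t in range(250)]
res = coverage_tests(hits, alpha=0.05)
```

With a well-calibrated 5% VaR, the exceedance rate is close to α and no exceedances cluster, so all three p values stay above conventional significance levels.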
LR_uc denotes the likelihood ratio for the unconditional coverage test, LR_ind the likelihood ratio for the independence test, and LR_cc the likelihood ratio for the conditional coverage test. Models accepted at the 5% level of significance are highlighted in bold. Note that n is the size of the sample used in estimation, while testing was done using a sample of size 250 for all n.
Observe that, as a consequence of consistency, the accuracy of the ECQ and AECQ forecasts improves with increasing sample size. It can also be seen that all three models perform poorly under LR_ind and LR_cc due to dependence in the autoregression. Based on LR_uc, the RRQ model performed poorly when it comes to forecasting. This can be attributed to the failure to incorporate extreme value theory in estimating residual quantiles in the RRQ model.

Conclusions and Recommendations
We have derived the extreme conditional quantile estimator and used it to obtain the one-step-ahead conditional Value at Risk forecast for a simulated financial distribution. Consistency of our estimators has been proved and illustrated through Monte Carlo simulations. We noticed that adding the noncrossing restriction during estimation improves the accuracy of the resulting extreme conditional quantile estimator. Backtesting results from the one-step-ahead conditional Value at Risk forecasts indicate that the independence and conditional coverage tests in [4] are not appropriate for our estimators due to dependence in autoregressive models.

Proof of Lemma 1. Note that, by Assumption 1, ζ_n(μ_{t,θ}) does not depend on n. Since M_θ(X_t, μ_t) does not depend on μ_{t,θ}, then μ_{t,θ} ∈ argmin_{μ_t} (1/n) Σ_{t=1}^n ζ(μ_{t,θ}). We need to show that the objective function ζ(μ_{t,θ}) satisfies the conditions for application of Theorem 12.2 in [34], among them that (1) ζ(μ_{t,θ}) has a unique minimum at μ_{t,θ}. The functional form of ζ(μ_{t,θ}) and Assumption 3 guarantee measurability of M_θ(X_t, μ_{t,θ}). To prove condition (2), we first show that ζ(μ_{t,θ}) is Lipschitz continuous.
Proof of Lemma 2. The proof proceeds in a similar way to the proof of Lemma 1.
Data Availability
The data used in the article were simulated, and the data generating process (DGP) is included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.