A Simple R-Estimation Method for Semiparametric Duration Models

Modeling nonnegative financial variables (e.g., durations between trades, traded volumes, or asset volatilities) is central to a number of studies across financial econometrics and, despite considerable efforts, still poses several statistical challenges; among them are the efficiency aspects of semiparametric estimation. In this paper, we concentrate on estimation problems in Autoregressive Conditional Duration (ACD) models with unspecified innovation densities. Exponential quasi-maximum likelihood estimators (QMLEs) are the usual practice in that context. The efficiency of those QMLEs (the only Fisher-consistent QMLEs) unfortunately deteriorates rapidly away from the reference exponential density, a phenomenon emphasized earlier by Drost and Werker (2003), who propose various semiparametrically efficient procedures to palliate it. Those procedures rely on a general semiparametric approach which typically requires kernel estimation of the underlying innovation density. We propose rank-based estimators (R-estimators) as a substitute. Just as the QMLE, R-estimators remain root-n consistent irrespective of the underlying density; they rely on the choice of a reference density under which they achieve semiparametric efficiency, but that density need not be the exponential one. Contrary to the semiparametric estimators proposed by Drost and Werker (2003), R-estimators require neither tangent-space calculations nor kernel-based density estimation. Numerical results moreover indicate that R-estimators based on exponential reference densities uniformly outperform the exponential QMLE under such families of innovation densities as the Weibull or Burr. A real-data example on modeling the price range of the Swiss stock market index concludes the paper.


Rank-based inference in time series
Exact inference is a red thread running through Jean-Marie Dufour's entire scientific life, providing a strong coherence to his otherwise quite broad and diverse list of publications. That interest in exact inference can be traced back to his early career contributions to rank-based inference. Rank-based methods, in the early eighties, were essentially limited to linear models with independent observations (more precisely, to models driven by exchangeable noise), and Jean-Marie Dufour pioneered the idea of using them in a time series and econometric context: see Dufour [17], Dufour et al. [19] and Dufour and Roy [20] on rank-based testing against serial dependence, or Dufour and Hallin [18] on rank-based tests for AR(1) coefficients.
In this paper, we show how R-estimation is also possible in the context of autoregressive conditional duration (ACD) models or, more generally, in the setting of multiplicative error models. The main theoretical tools needed for the development of our R-estimators are related to the general theory of R-estimation in semiparametric dynamic location-scale models developed in Hallin and La Vecchia [29], to which we refer for further technical details. While the basic ideas of [29] still hold for ACD models, their application is not straightforward, since a key assumption on the structure of cross-information matrices fails; moreover, covariance and information matrices, in the ACD context, do not admit explicit forms and have to be evaluated numerically. The methods of [29] thus need to be adapted to the specific case of ACD models, which is our main objective here.
Numerical results (see Section 1.3) provide strong evidence that the traditional exponential QMLE, the daily practice in this context, is uniformly, and quite significantly, dominated by the R-estimator with exponential reference density under the widely used class of Weibull innovation densities. We conjecture that, more generally, the asymptotic relative efficiencies of that R-estimator with respect to the exponential QMLE are uniformly larger than one under any innovation density. Such a result would be the counterpart, in the context of positive-valued ACD processes and for exponential scores, of the celebrated classical Chernoff-Savage property (Chernoff and Savage [11], Hallin [26]) for ARMA processes and Gaussian scores. In the absence of an analytical form for (cross-)information matrices, however, the methods used in the classical setting are not available here, and we were able neither to prove nor to disprove the conjecture.

Duration models in high-frequency econometrics and finance
Duration is commonly defined as the time interval between consecutive events. The autoregressive conditional duration (ACD) model was proposed by Engle and Russell [23] to model irregularly spaced (ultra) high-frequency financial transaction data and, since then, duration models have been widely applied in economics and finance; see the seminal papers of Dufour and Engle [16], Engle [22] and related literature.
A short survey about possible applications in economics and finance is available in Pacurar [50], while Tsay [53] provides a book-length introduction to ACD models.
The growing interest in duration models is related to the availability of financial data recorded at high frequency, such as tick-by-tick data or records at 5-minute intervals. High-frequency data have inspired research across the fields of finance, econometrics, and statistics. For a book-length discussion of econometric tools for high-frequency data, we refer to Aït-Sahalia and Jacod [1] and Hautsch [39].
The triggers of this interdisciplinary research are the technological progress in trading systems and trade recording, as well as an increasing importance of optimal trade execution and (intraday) liquidity dynamics in the analysis of market microstructure; see, for instance, O'Hara [48] and Hasbrouck [38].
Besides intra-day time intervals between consecutive financial transactions, duration models more generally apply to (financial) time series with nonnegative observations. Duration models indeed are multiplicative error models (see Engle [21]), and they are widely used, e.g., for volatility and/or volume modeling. As an example, we mention the daily range of an asset, a measure commonly used as a proxy for the asset's price volatility (see, e.g., Parkinson [51], Kunitomo [42] and related literature). Chou [12] considers the class of conditional autoregressive range (CARR) models and shows that they provide accurate inference and forecasts for the range. Now, as stressed by Tsay [54], the CARR model is equivalent to the ACD(1,1) model, hence duration models can be effectively applied to conduct inference on asset volatility. Similar considerations apply to the autoregressive conditional volume process introduced by Manganelli [44].
Moreover, ACD models apply not only to different types of financial variables, but also to different types of sampling schemes. In the case of financial durations, ACD models are suitable when the process under study is observed in event time, so that observations are irregularly spaced in time. However, the same ACD models are also effective in the analysis of processes in calendar time, such as aggregated trading activities (e.g., daily volumes) based on (equidistant) time intervals.
Even though the financial literature has gained several insights into the determinants of trading times and into the dynamics of traded volumes (see, e.g., Chapter 1 of Hautsch [39], Biais et al. [8], O'Hara [49] and related papers), the statistical analysis of duration models still raises problems related, mainly, to the accuracy of model parameter estimators. The standard duration model (see Section 2.1) depends on an unknown Euclidean parameter of interest, and is driven by unobserved shocks ε_i (call them innovations) with innovation density g. That innovation density typically should remain unspecified, in which case the model is of a semiparametric nature. Durations being intrinsically nonnegative, innovation densities are supported on (0, ∞) rather than the real line; obvious candidates are the exponential, log-normal, Weibull, Burr, or Gamma densities, generally with increasing failure rates. Although it can be argued that exponential distributions, being typical of waiting times, are somehow "natural" in this context, empirical studies (including Engle and Russell [23] and Engle [22]) indicate that the assumption of exponential innovations is quite unlikely to hold in practice. Drost and Werker [15] nevertheless show that, in case the ε_i's are i.i.d. with density g in a broad class G of densities, quasi-likelihood estimators based on the exponential density remain root-n consistent irrespective of g ∈ G. In contrast, quasi-likelihood estimators based on log-normal, Burr, or Weibull densities fail to be root-n consistent under misspecification; see also Gouriéroux et al. [25]. Despite empirical evidence that actual innovation densities are not exponential, exponential quasi-likelihood estimators (henceforth, QMLEs) therefore have become standard practice in the context.
Since rate-optimal consistent QMLEs in semiparametric models are, as a rule, associated with "least favourable densities", this induces a potentially significant loss of efficiency, which is amply confirmed by simulations (Section 1.3).
In theory, the remedy to that major drawback of QML methods is the classical semiparametric approach associated with the names of Bickel, Klaassen, Wellner and Ritov (see Bickel et al. [9] and Chapter 25 of van der Vaart [55] for the case of independent observations, or Drost et al. [14] for the time-series case).
In that approach, estimation equations are based on semiparametric score functions resulting from the so-called tangent space projections, after plugging in a nonparametric estimator of the actual density g.
Building on Drost et al. [14], Drost and Werker [15] (see also Ranasinghe and Silvapulle [52]) apply this method in the context of ACD models, not only relaxing the specification of the innovation density g, but even giving up the assumption of i.i.d. innovations (replacing it with a full range of dependence structures yielding, for ε_i, a conditional density g_i itself depending on the past). In that challenging framework, Drost and Werker succeed in deriving semiparametrically efficient estimators by fully exploiting the resources of the Bickel et al. [9] methodology. That success, however, comes at a cost: besides its analytical sophistication, it requires the estimation, by means of kernel methods, of the density g or the conditional densities g_i, of their derivatives, and of some conditional expectations. This implies the delicate selection of tuning parameters which, while asymptotically immaterial, may have a strong impact, for finite n, on the actual value of the estimators; sample splitting is generally required as well.
In this paper, we avoid that cost by adopting another, somewhat simpler approach, based on residual ranks: no kernel density estimation, and no sample splitting, are required here. On the other hand, unlike Drost and Werker [15], we restrict ourselves to the case of i.i.d. (actually, exchangeable) ε_i's.
Just as quasi-likelihood estimators, R-estimators rely on the choice of some reference density f, under which they achieve semiparametric efficiency; f need not coincide with the actual unspecified g, nor is it restricted to be exponential, and the resulting R-estimators remain root-n consistent even under g ≠ f. Restricting f to the traditional exponential reference density f_exp, however, yields, in our numerical exercises, R-estimators that uniformly outperform the exponential QMLE under such commonly used families as the Weibull, while achieving parametric efficiency under g = f_exp. A data-driven choice f̂^(n) of the reference density f is also possible, provided that f̂^(n) only involves the order statistic of the residuals. Contrary to the classical semiparametric estimators, root-n consistency in such cases does not require f̂^(n) to converge to g.
The good finite-sample performance of R-estimators based on exponential densities, and the resulting efficiency gains over exponential QMLEs, are illustrated via simulations (Sections 1.3 and 5) and a real-data application (Section 6).

A motivating example
To further motivate our approach, let us consider the simple ACD(1,0) model Y_i = (1 + β Y_{i−1}) ε_i, where ε_i has a Weibull density g = g_ζ^W with scale one and shape parameter ζ ≥ 1. Our first objective is a numerical comparison of the efficiency, under such Weibull innovations, of the exponential QMLE with that of the Weibull maximum likelihood estimator (MLE) for the parameter β, as a function of ζ. Both estimators are root-n consistent and asymptotically normal, so that their efficiencies are captured by their asymptotic variances. We perform our comparison under different tailweights, as obtained by varying the shape parameter ζ, which produces different degrees of skewness and kurtosis in the innovation density g_ζ^W; note that, for ζ = 1, the Weibull density coincides with the exponential while, for ζ > 1, it belongs to the class of increasing failure rate (IFR) densities. To visualise the variety of shapes thus considered, we first plot in Figure 1 (left and middle panels) the Weibull densities g_ζ^W associated with some representative values of ζ, along with their failure rates.
Next, in order to highlight the poor performance of the exponential QMLE and the benefits to be expected from R-estimation, (1) for each value of ζ, we generated 1500 independent trajectories of the ACD(1,0) process; (2) for each trajectory, we computed the MLE and the QMLE (using the R routine acdFit) and, anticipating the results of Section 4, the R-estimator based on the exponential reference density; (3) for each value of ζ, we evaluated the variances, over the 1500 replications, of those three estimators, and the ratios of the QMLE and R-estimator variances, respectively, to the variance of the MLE (the parametric efficiency bound).
The resulting ratios are finite-sample evaluations of the asymptotic relative efficiencies ARE_ζ(MLE/QMLE) and ARE_ζ(MLE/R-Estim) of the MLE with respect to the QMLE and to the R-estimator, respectively, under the Weibull innovation density g_ζ^W. The results, shown in Figure 1, reveal that, whereas the QMLE performs as well as the MLE for ζ = 1, its performance rapidly deteriorates as ζ moves away from one, with an ARE_ζ(MLE/QMLE) value larger than 4 for ζ ≥ 2. This means that the QMLE requires about four times as many observations as the MLE to attain the same asymptotic variance: blindly relying on exponential QMLEs thus can be extremely costly, much more so than most practitioners would think. The R-estimator with exponential reference density appears to perform uniformly better than the exponential QMLE: at ζ = 1 they asymptotically coincide, yielding a mutual ARE value of one; for all other values ζ > 1, the variance of the R-estimator is about half the variance of the QMLE, indicating that R-estimation with exponential reference density and n observations achieves the same performance as the exponential QMLE with 2n observations. This example clearly shows that R-estimators can perform uniformly better than the QMLE. For simplicity, we restricted ourselves to the same exponential reference density as used for the exponential QMLE, but further improvements can be expected from other reference densities: a Weibull reference with estimated ζ, in this context, would perform even better.
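The Monte Carlo design just described can be sketched in a few lines of Python (a rough stand-in for the R routine acdFit; the function names, the grid optimizer, and the values β = 0.5, ζ = 1.5 are our illustrative choices, not the paper's). The sketch simulates the ACD(1,0) recursion with unit-mean Weibull innovations and computes the exponential QMLE by maximizing the exponential quasi-likelihood over a grid:

```python
import math
import random

def simulate_acd10(beta, zeta, n, rng):
    # Y_i = (1 + beta * Y_{i-1}) * eps_i, with Weibull(zeta) innovations
    # rescaled to unit mean (E eps = 1), so the exponential QMLE targets beta.
    scale = 1.0 / math.gamma(1.0 + 1.0 / zeta)
    y, y_prev = [], 0.0
    for _ in range(n):
        eps = scale * (-math.log(rng.random())) ** (1.0 / zeta)  # inverse-cdf draw
        y_prev = (1.0 + beta * y_prev) * eps
        y.append(y_prev)
    return y

def exp_qmle(y):
    # Exponential quasi-log-likelihood: -sum(log Psi_{i-1} + Y_i / Psi_{i-1}),
    # maximized over a grid (a crude stand-in for a numerical optimizer).
    def qll(beta):
        ll, psi = 0.0, 1.0                 # Psi_0 = 1
        for yi in y:
            ll -= math.log(psi) + yi / psi
            psi = 1.0 + beta * yi
        return ll
    grid = [k / 200.0 for k in range(1, 180)]
    return max(grid, key=qll)

rng = random.Random(42)
estimates = [exp_qmle(simulate_acd10(0.5, 1.5, 1000, rng)) for _ in range(20)]
mean_est = sum(estimates) / len(estimates)
```

A full replication of the figure would add the Weibull MLE and the R-estimator of Section 4 and report the ratios of variances over replications.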

Outline of the paper
The paper is organized as follows. The general setting-the definition of ACD models and the main assumptions we are making-is described in Section 2. Section 3 introduces the basic technical tools to be considered in the sequel: local asymptotic normality (LAN) in Section 3.1, and rank-based central sequences in Section 3.2. The construction of R-estimators is explained and commented in Sections 4.1 and 4.2. Section 5 provides some simulation results, while Section 6 deals with a real-life application to the Swiss Stock Index data.

The Autoregressive Conditional Duration model

General setting
As in the seminal paper of Engle and Russell [23], let Y_i denote the duration between the (i − 1)th and ith events (typically, the time elapsed between two successive transactions of some asset). We say that {Y_i | i ∈ Z} satisfies the assumptions of the accelerated time or autoregressive conditional duration (ACD) model if

Y_i = Ψ_{i−1} ε_i,  Ψ_i = 1 + Σ_{j=1}^p β_j Y_{i+1−j} + Σ_{k=1}^q γ_k Ψ_{i−k},  θ := (β_1, . . . , β_p, γ_1, . . . , γ_q)′,  (2)

where the ε_i's are i.i.d., with density g nonvanishing over (0, ∞), and ε_i independent of Ψ_{i−1}, Ψ_{i−2}, . . . .
Clearly, under that model and provided that µ_g := ∫_0^∞ z g(z) dz is finite, the conditional expectation of Y_i given the past is µ_g Ψ_{i−1}. Exogenous variables can also be included in the expression for Ψ_{i−1}, which makes it possible to capture the influence, on the distribution of future durations, of observable covariates. The theory developed in this paper can easily accommodate the presence of such covariates but, for the sake of notational simplicity, we do not pursue this further.
Let Y^(n) := (Y_1, . . . , Y_n) be the observed finite realization of some solution of (2), not necessarily a stationary one. Rather, let us assume we also observe initial values (Y_{−p+1}, . . . , Y_{−1}, Y_0) and (Ψ_{−q}, . . . , Ψ_{−1}), and base our (asymptotic) analysis on the likelihoods of Y^(n) conditional on those initial values. If θ is such that a stationary solution exists (see page 228 of Tsay [53] for positivity and stationarity conditions), the asymptotic influence of those starting values is nil, and we safely can put Y_{−p+1} = . . . = Y_0 = 0 along with Ψ_{−q} = . . . = Ψ_{−1} = 0 (so that Ψ_0 = 1). In practice, however, a more reasonable choice would be, for instance, starting values equal to the sample mean Ȳ := n^{−1} Σ_{i=1}^n Y_i, an estimator of the unconditional stationary mean EΨ_i of Ψ_i (up to the factor µ_g). The global influence of those starting values is asymptotically negligible, and so is the mild dependence induced by Ȳ.
Based on those initial values and in view of (2), Ψ_i can be expressed linearly (the coefficients being products and powers of the β's and γ's) as a function of the past observations and the parameter value θ. For any given values of the Y_i's, that function ψ_i clearly is differentiable with respect to θ, with gradient ψ̇_i := grad_θ ψ_i, say. Unless p and/or q are equal to one, obtaining analytical expressions for ψ_i and ψ̇_i seems difficult. However, both ψ_i and ψ̇_i can be computed recursively from the initial values: differentiating the second equation in (2) with respect to θ yields

ψ̇_i = (Y_i, . . . , Y_{i+1−p}, Ψ_{i−1}, . . . , Ψ_{i−q})′ + Σ_{k=1}^q γ_k ψ̇_{i−k}

(with ψ̇_i^k standing for ψ̇_i's kth component), thus providing the desired linear recursion. Again, the asymptotic influence of the initial values of that recursion is nil, hence they safely can be put to zero; a more reasonable choice, however, is an unbiased estimator (as follows from elementary computation) of the unconditional stationary expectation Eψ̇_i of ψ̇_i.
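The linear recursion above is straightforward to implement. The following Python sketch (our own naming; zero pre-sample values assumed, so Ψ_0 = 1) computes ψ_i and its gradient jointly, and can be checked against numerical differentiation:

```python
def psi_and_grad(y, beta, gamma):
    # Recursive computation of Psi_i and its gradient with respect to
    # theta = (beta_1, ..., beta_p, gamma_1, ..., gamma_q), for
    # Psi_i = 1 + sum_j beta_j Y_{i+1-j} + sum_k gamma_k Psi_{i-k},
    # with zero pre-sample values (so Psi_0 = 1); y[t-1] holds Y_t.
    p, q, n = len(beta), len(gamma), len(y)
    psi = [1.0] * (n + 1)                        # psi[i] = Psi_i, psi[0] = 1
    grad = [[0.0] * (p + q) for _ in range(n + 1)]
    for i in range(1, n + 1):
        val, g = 1.0, [0.0] * (p + q)
        for j in range(1, p + 1):                # observation lags
            yv = y[i - j] if i - j >= 0 else 0.0
            val += beta[j - 1] * yv
            g[j - 1] += yv
        for k in range(1, q + 1):                # Psi lags
            pv = psi[i - k] if i - k >= 0 else 0.0
            val += gamma[k - 1] * pv
            g[p + k - 1] += pv
            if i - k >= 0:                       # chain-rule term of the recursion
                for m in range(p + q):
                    g[m] += gamma[k - 1] * grad[i - k][m]
        psi[i], grad[i] = val, g
    return psi, grad
```

A finite-difference check on a short series confirms that the returned gradient matches the derivative of the returned ψ_i.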

Some special cases
For p = q = 1 (the ACD(1,1) model which is, by far, the ACD model most frequently considered in practice), the estimation problem (for θ = (β, γ)′) is quite similar to the GARCH(1,1) one considered in Bollerslev [10]. Equation (3) takes the form

ψ_i(θ) = Σ_{k=0}^{i−1} γ^k (1 + β Y_{i−k}) + γ^i Ψ_0

(positivity and stationarity conditions require nonnegative values of β and γ such that β + γ < 1, so that the O(γ^i) quantities below go to zero exponentially fast as i → ∞), and the gradient ψ̇_i satisfies the linear difference equation ψ̇_i = (Y_i, Ψ_{i−1})′ + γ ψ̇_{i−1}. Lengthy but elementary calculation then yields

∂ψ_i/∂β = Σ_{k=0}^{i−1} γ^k Y_{i−k} + O(γ^i)  and  ∂ψ_i/∂γ = Σ_{k=1}^{i−1} k γ^{k−1} (1 + β Y_{i−k}) + O(i γ^{i−1}),

where the O(·) quantities are uniform in i.
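As a sanity check on these unrolled expressions, one can verify numerically that they agree with the defining ACD(1,1) recursion. The Python sketch below (our own naming; zero pre-sample values, so Ψ_0 = 1, which makes the expansions exact rather than up to O(γ^i) terms) implements both forms:

```python
def acd11_psi_recursive(y, beta, gamma, i):
    # Psi_t = 1 + beta * Y_t + gamma * Psi_{t-1}, with Psi_0 = 1; y[t-1] = Y_t.
    psi = 1.0
    for t in range(1, i + 1):
        psi = 1.0 + beta * y[t - 1] + gamma * psi
    return psi

def acd11_psi_closed_form(y, beta, gamma, i):
    # Unrolled expressions (exact when Psi_0 = 1):
    #   Psi_i          = sum_{k=0}^{i-1} gamma^k (1 + beta Y_{i-k}) + gamma^i
    #   dPsi_i/dbeta   = sum_{k=0}^{i-1} gamma^k Y_{i-k}
    #   dPsi_i/dgamma  = sum_{k=1}^{i-1} k gamma^(k-1) (1 + beta Y_{i-k})
    #                    + i gamma^(i-1)
    psi = sum(gamma ** k * (1.0 + beta * y[i - k - 1]) for k in range(i)) + gamma ** i
    dbeta = sum(gamma ** k * y[i - k - 1] for k in range(i))
    dgamma = (sum(k * gamma ** (k - 1) * (1.0 + beta * y[i - k - 1])
                  for k in range(1, i)) + i * gamma ** (i - 1))
    return psi, dbeta, dgamma
```

The two ψ_i values coincide to machine precision, and the closed-form derivatives match finite differences of ψ_i.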

Assumptions
Our objective is to conduct inference about θ while avoiding both a complete parametric specification of the data-generating mechanism and a kernel estimate of the underlying innovation density. Throughout, we assume that (2) holds, with unspecified parameter value θ and innovation density g in the class G of densities that do not vanish on (0, ∞) and satisfy a few regularity assumptions described below. The resulting statistical models thus are semiparametric ones, with Euclidean parameter of interest θ, and infinite-dimensional nuisance g.
Denoting by P (n) θ,g the joint distribution, under (2), of Y (n) (conditionally on the initial values, as explained in Section 2.1), consider the semiparametric families P (n) = {P (n) θ,g : θ ∈ Θ, g ∈ G}, n ∈ N, where Θ and G are such that the following assumptions (Assumptions (A) and (B), but also Assumption (C) stated in Section 3.1) hold.
Write Z_i(θ) := Y_i/Ψ_{i−1}(θ), i = 1, . . . , n, for the residuals associated with the parameter value θ, and denote by H^(n)(θ) the hypothesis under which Z_1(θ), . . . , Z_n(θ) are i.i.d. with some density g ∈ G. Letting Z^i(θ) := (Z_i(θ), Z_{i−1}(θ), . . . , Z_1(θ)), write, with a slight abuse of notation, ψ_i(Z^i(θ); θ) and ψ̇_i(Z^i(θ); θ) for ψ_i and ψ̇_i, respectively, when the Y_i's are expressed as functions of the Z_i(θ)'s. In case the Z_i(θ)'s and the ε_i's coincide, the same quantities will be denoted, with ε^i := (ε_i, ε_{i−1}, . . . , ε_1), as ψ_i(ε^i; θ) and ψ̇_i(ε^i; θ).

Local asymptotic normality, semiparametric efficiency, and ranks
In this section, we introduce the main methodological tools to be used in the sequel. First, we restate the uniform local asymptotic normality (ULAN) result of Drost et al. [14] and Drost and Werker [15], with central sequence ∆^(n)(θ, g), for the parametric fixed-g submodels {P^(n)_{θ,g} : θ ∈ Θ}. Then, following Hallin and Werker [37], we project ∆^(n)(θ, g) onto the σ-algebra generated by the ranks of the residuals Z_i(θ).
The proofs of the following results follow along the general lines developed in Hallin and La Vecchia [29]. (See page 228 of Tsay [54] for sufficient conditions.)

Uniform local asymptotic normality
Define, with l̇_g := g′/g the log-derivative of g, the score function φ_g(z) := −1 − z l̇_g(z), the central sequence

∆^(n)(θ, g) := n^{−1/2} Σ_{i=1}^n φ_g(Z_i(θ)) ψ̇_{i−1}(Z^{i−1}(θ); θ)/ψ_{i−1}(Z^{i−1}(θ); θ),

and the information matrix Γ(θ, g) := E[φ_g²(ε_1)] E[ψ̇_{i−1}(ε^{i−1}; θ) ψ̇′_{i−1}(ε^{i−1}; θ)/ψ²_{i−1}(ε^{i−1}; θ)], along with the additional assumption

Assumption (C). For all θ ∈ Θ and g ∈ G, (i) the matrix Γ(θ, g) exists, is finite, and has full rank, and (ii) the mapping θ → Γ(θ, g) is continuous.
A recurrent difficulty with ACD models, however, is that no explicit form of Γ(θ, g) is available, not even for exponential g; that difficulty, as we shall see, extends to the cross-information matrices used in R-estimation, which have to be replaced with consistent estimators. Numerical values easily can be computed, though, either as estimators or via simulations. To be precise, denoting by ε_1, . . . , ε_N an i.i.d. sequence generated from the density g, since φ_g(ε_i) is centered (under g) and independent of ψ̇_{i−1}(ε^{i−1}; θ)/ψ_{i−1}(ε^{i−1}; θ), the quantity

[N^{−1} Σ_{i=1}^N φ_g²(ε_i)] [N^{−1} Σ_{i=1}^N ψ̇_{i−1}(ε^{i−1}; θ) ψ̇′_{i−1}(ε^{i−1}; θ)/ψ²_{i−1}(ε^{i−1}; θ)]

is a consistent estimator, as N → ∞, of Γ(θ, g).
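For the exponential density, for which the score reduces to φ_g(z) = z − 1, this simulation-based evaluation takes a particularly simple form in the ACD(1,0) case (where ψ̇_{i−1}/ψ_{i−1} = Y_{i−1}/(1 + βY_{i−1})). The following Python sketch (function name and design values are ours) implements the product-of-empirical-moments estimator:

```python
import math
import random

def gamma_hat_acd10(beta, n_sim, rng):
    # Monte Carlo evaluation of Gamma(theta, g) for the ACD(1,0) model
    # Y_i = (1 + beta * Y_{i-1}) * eps_i with standard exponential innovations:
    # the product of the empirical moments of phi_g(eps_i)^2 and of
    # (dPsi_{i-1}/dbeta / Psi_{i-1})^2, exploiting their independence.
    # For exponential g, phi_g(z) = z - 1.
    phi2, ratio2, y_prev = 0.0, 0.0, 0.0
    for _ in range(n_sim):
        psi = 1.0 + beta * y_prev            # Psi_{i-1}
        dpsi = y_prev                        # gradient of Psi_{i-1} (p = 1, q = 0)
        eps = -math.log(rng.random())        # standard exponential draw
        phi2 += (eps - 1.0) ** 2
        ratio2 += (dpsi / psi) ** 2
        y_prev = psi * eps
    return (phi2 / n_sim) * (ratio2 / n_sim)

gh1 = gamma_hat_acd10(0.5, 20000, random.Random(1))
gh2 = gamma_hat_acd10(0.5, 20000, random.Random(2))
```

Two independent runs of moderate length already agree closely, illustrating the N → ∞ consistency.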

Theoretical justification
In the classical semiparametric approach, the semiparametrically efficient central sequence is the tool that one needs to conduct inference and reach semiparametric efficiency bounds. That semiparametrically efficient central sequence, denoted as ∆ * (n) (θ, g), is obtained by projecting the central sequence ∆ (n) (θ, g) along the so-called tangent space; see Drost and Werker [15] for the case of ACD models.
Typically, the actual computation of semiparametrically efficient central sequences for time-series models is a painful case-by-case task. Moreover, once ∆*^(n)(θ, f) has been obtained for all f ∈ G, the actual density g still has to be estimated by some adequate ĝ^(n), to be plugged into ∆*^(n)(θ, f), yielding ∆*^(n)(θ, ĝ^(n)). Indeed, the asymptotic distribution of ∆*^(n)(θ, f) under P^(n)_{θ,g} is unknown whenever f ≠ g and, typically, E_{θ,g}[∆*^(n)(θ, f)] ≠ 0. As a consequence, the estimators based on ∆*^(n)(θ, f) reach the semiparametric efficiency bounds under P^(n)_{θ,f}, but are no longer root-n consistent under P^(n)_{θ,g} for f ≠ g. Finally, further precautions, such as sample splitting, in principle are required in order for ∆*^(n)(θ, ĝ^(n)) to be asymptotically equivalent, under P^(n)_{θ,g}, to ∆*^(n)(θ, g). It has been shown in Hallin and Werker [37] that, for a very broad class of models (including most time series models) where the density g of some underlying white noise remains completely unspecified, the maximal-invariance properties of residual ranks offer an attractive way to achieve semiparametric efficiency at f without resorting to kernel estimation or sample splitting. More precisely, projecting ∆^(n)(θ, f) onto the σ-field generated by the ranks of the residuals Z_1(θ), . . . , Z_n(θ) yields a rank-based, hence distribution-free, version ∆^(n)(θ, f), say, of the semiparametrically efficient central sequence ∆*^(n)(θ, f). Namely, under a very mild condition on score functions (Assumption (D) below), the two are asymptotically equivalent under P^(n)_{θ,f}, so that the estimators (and the tests) based on the rank-based ∆^(n)(θ, f) reach the semiparametric efficiency bounds under P^(n)_{θ,f} (that is, when g = f). However, the distribution-freeness of ranks ensures that those estimators remain root-n consistent under any P^(n)_{θ,g}, g ∈ G. Thus, they remain valid (i.e., rate-optimal) irrespective of the actual density g, the estimation of which therefore is not required.
Hallin and La Vecchia [29] show how the above arguments can be adapted to a fairly general class of dynamic location-scale models that includes the ACD model (2). Note that, for exponential f , semiparametric and parametric efficiencies coincide, so that R-estimators based on exponential reference density f achieve, under g = f , parametric efficiency. Moreover, as shown by the motivating example of Section 1.3, they uniformly outperform the exponential QMLE under IFR Weibull densities, offering a most attractive alternative to the existing procedures.

Computation of rank-based central sequences
Let us provide some details on the computation, for ACD models, of the rank-based central sequences ∆^(n)(θ, f) just described. Let f ∈ G be some reference density; typical candidates are the Gamma, Weibull, Generalized-F, or Burr densities. Denote by R^(n)(θ) the vector (R_1(θ), . . . , R_n(θ)) of the ranks of the residuals Z_1(θ), . . . , Z_n(θ); we write R, R_i, and Z_i for R^(n)(θ), R_i(θ), and Z_i(θ), dropping the dependence on θ and/or n when no confusion is possible.
To perform the rank-based construction, simply consider the central sequence ∆^(n)(θ, f) in (7), which (for given starting values) is entirely measurable with respect to the residuals Z_i(θ), and (i) replace those residuals Z_i(θ) with F^{−1}(R_i(θ)/(n + 1)), where F denotes the distribution function associated with f; (ii) recenter the resulting expression about its expected value (this recentering, for ACD models, can be dispensed with, as it only induces a negligible o_P(1) correction). This yields a vector of rank-based statistics which, being measurable with respect to the ranks, is distribution-free under parameter value θ. Provided that the score function φ_f associated with the reference density f satisfies the very mild Assumption (D) below, the general results of Hallin and La Vecchia [29] imply the asymptotic equivalence (under parameter value θ) of that vector (an approximate-score rank statistic) with its exact-score counterpart ∆^(n)(θ, f); we thus safely use the same notation for both.
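For the ACD(1,0) model with exponential reference density f (so that F^{−1}(u) = −log(1 − u) and φ_f(z) = z − 1), step (i) can be sketched as follows in Python (our own naming; the asymptotically negligible recentering of step (ii) is omitted). The sign pattern of the statistic as the trial value of β moves around the true value is what R-estimation exploits:

```python
import math
import random

def rank_central_sequence(y, beta):
    # Rank-based central sequence for the ACD(1,0) model, exponential reference:
    # residuals Z_i(beta) are replaced by F^{-1}(R_i/(n+1)) = -log(1 - R_i/(n+1)),
    # with scores phi_f(z) = z - 1 (sketch; recentering omitted).
    n = len(y)
    psi = [1.0] + [1.0 + beta * yi for yi in y[:-1]]   # Psi_{i-1}, i = 1..n
    z = [y[i] / psi[i] for i in range(n)]              # residuals Z_i(beta)
    rank = [0] * n
    for r, i in enumerate(sorted(range(n), key=z.__getitem__), start=1):
        rank[i] = r
    total = 0.0
    for i in range(n):
        zstar = -math.log(1.0 - rank[i] / (n + 1.0))   # F^{-1}(R_i/(n+1))
        dpsi = y[i - 1] if i > 0 else 0.0              # gradient of Psi_{i-1}
        total += (zstar - 1.0) * dpsi / psi[i]
    return total / math.sqrt(n)

# Simulate at true beta = 0.5 with exponential shocks, then evaluate the
# statistic at the true value and at two off values.
rng = random.Random(7)
y, y_prev = [], 0.0
for _ in range(2000):
    eps = -math.log(rng.random())
    y_prev = (1.0 + 0.5 * y_prev) * eps
    y.append(y_prev)
d_true, d_low, d_high = (rank_central_sequence(y, b) for b in (0.5, 0.2, 0.8))
```

At the true parameter value the statistic is centered near zero; it is large and positive for too-small trial values of β and large and negative for too-large ones.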
Assumption (D). The mapping (from R + to R) z → ϕ f (z) is monotone, or a linear combination of monotone mappings.
Let us illustrate the above construction with the simple example of the standard ACD(1,1) model, as in Engle and Russell [23]. Here Y_i = Ψ_{i−1} ε_i with Ψ_i = 1 + β Y_i + γ Ψ_{i−1}, where the shocks ε_i are i.i.d. with density g ∈ G. As in the GARCH(1,1) case discussed in Bollerslev [10], initial values have negligible asymptotic influence on the central sequence, hence can be put to zero (a more reasonable choice is the unbiased estimator described in Section 2.1). It then follows from Proposition 1 that the central sequence takes the form (7), with Z_i(θ) = Y_i/Ψ_{i−1}(θ) and, from the second part of (12), ψ̇_i as in (5). Here again, the asymptotic influence of the starting value is negligible, and Ψ_0 for simplicity can be put to one, a more reasonable choice being an estimator of Ψ_i's stationary unconditional expectation (1 + βȲ)/(1 − γ).
Turning ∆^(n)(θ, f) into a rank-based central sequence first requires expressing it in terms of the residuals Z_i(θ). Denote by R_1(θ), . . . , R_n(θ) the ranks of those residuals. Letting Ψ̂_0 = Ψ_0, run, for i = 1, . . . , n, the recurrence (14). This yields, for reference density f, the desired rank-based central sequence.

Method
We now turn to the specific problem of R-estimating θ. The basic idea consists in treating the rank-based central sequence ∆^(n)(θ, f) as we would, under density f, treat the original central sequence ∆^(n)(θ, f). The advantage of the rank-based version is that, due to distribution-freeness, its expectation under any g (and parameter value θ) is zero, the Fisher consistency condition that ∆^(n)(θ, f) typically only enjoys under g = f. As a consequence, the resulting estimators (called R-estimators) remain root-n consistent irrespective of the actual density g, a property that does not hold for the estimator based on ∆^(n)(θ, f) unless f is exponential or Gamma.
The technique, described in [29], basically consists in subjecting ∆^(n)(θ, f) to Le Cam's classical one-step method. It requires a preliminary estimator θ̂^(n) of θ that is root-n consistent under any P^(n)_{θ,g}, g ∈ G, along with a consistent (under any P^(n)_{θ,g}, g ∈ G) estimator of the cross-information matrix Γ(θ, f, g). That matrix is the one determining, via Le Cam's Third Lemma, the asymptotic behavior of ∆^(n)(θ, f) under local alternatives of the form P^(n)_{θ+n^{−1/2}τ,g}, as well as its asymptotic linearity (see Assumption (F) below).

Constructing such an estimator Γ̂^(n)_f of Γ(θ, f, g) is a delicate task, since Γ(θ, f, g) involves the expectation, under the actual density g, which is unknown, of quantities that themselves depend on g and f. The general method described in [29] does not apply here, as the factorization of Γ(θ, f, g) on which it relies does not hold. As an alternative, we propose estimating the kth column of Γ(θ, f, g) from the increment

∆^(n)(θ̂^(n), f) − ∆^(n)(θ̂^(n) + n^{−1/2} e_k, f),

where e_k, k = 1, . . . , p + q, stands for the kth vector of the canonical basis of R^{p+q}. Proposition 2 establishes under which assumptions Γ̂^(n)_f indeed is an adequate estimator of Γ(θ, f, g). The R-estimator of θ based on the reference density f then is defined as the one-step update

θ̂^(n)_f := θ̂^(n) + n^{−1/2} (Γ̂^(n)_f)^{−1} ∆^(n)(θ̂^(n), f).  (19)

Before characterizing the asymptotic behavior of θ̂^(n)_f, let us summarize the assumptions to be made.

Proposition 2 and (21) imply that θ̂^(n)_f remains root-n consistent and asymptotically normal for any (f, g) ∈ G × G such that Assumption (F2) holds, while (22) entails semiparametric efficiency under g = f: contrary to many M- and L-estimators, and unlike non-exponential quasi-likelihood estimators, our R-estimators thus are robust to innovation density misspecification.

Comments
The rank-based estimation procedure just described calls for some comments.
(i) As a rule, preliminary estimators should not aim at efficiency, but at robustness; a natural candidate for θ̂^(n) here is the exponential QMLE, the consistency of which (under any P^(n)_{θ,g}) has been established by Drost and Werker [15]. The only purpose of θ̂^(n) is to restrict subsequent estimation to a root-n neighborhood of the actual parameter value θ.
(ii) Assumption (F1) requires θ̂^(n) to be locally asymptotically discrete. This is easily achieved by discretizing any root-n consistent estimator θ̂^(n) along a grid of mesh c_0 n^{−1/2}, with c_0 > 0 arbitrarily small but fixed. It should be stressed, however, that such discretization is only required in the statement and proof of asymptotic properties; in applications, there is no point in discretizing, as θ̂^(n) in practice only involves a finite number of digits, and n does not go to infinity; see pages 125 and 188 of Le Cam and Yang [43].
(v) The one-step method can be considered as a single iteration, with starting value θ̂^(n), of a Newton-Raphson numerical solution of the estimating equation ∆^(n)(θ, f) = 0, which is the natural generalization to the present setting of the Hodges-Lehmann definition of R-estimators (Hodges and Lehmann [40] is restricted to univariate location parameters). In theory, one iteration is sufficient; in practice, though, further iterations can be performed, and should be, until numerical stabilization is achieved, which amounts to computing a Newton-Raphson solution of the Hodges-Lehmann estimation equations.
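A minimal Python sketch of this iterated one-step procedure, for the ACD(1,0) case, illustrates the numerical-stabilization recipe (our own naming and design choices throughout: exponential reference scores, a scalar cross-information estimated from a n^{−1/2} perturbation of the rank-based statistic, Weibull innovations with shape 1.5, and a deliberately off preliminary value):

```python
import math
import random

def delta_rank(y, beta):
    # Rank-based central sequence for ACD(1,0), exponential reference scores
    # phi_f(z) = z - 1 and quantiles F^{-1}(u) = -log(1 - u) (sketch).
    n = len(y)
    psi = [1.0] + [1.0 + beta * yi for yi in y[:-1]]   # Psi_{i-1}, i = 1..n
    z = [y[i] / psi[i] for i in range(n)]
    rank = [0] * n
    for r, i in enumerate(sorted(range(n), key=z.__getitem__), start=1):
        rank[i] = r
    total = sum((-math.log(1.0 - rank[i] / (n + 1.0)) - 1.0)
                * (y[i - 1] if i > 0 else 0.0) / psi[i] for i in range(n))
    return total / math.sqrt(n)

def r_estimator(y, beta0, n_iter=5):
    # Le Cam one-step update, iterated until stabilization, with the
    # (cross-)information scalar estimated from a n^{-1/2} perturbation
    # of the central sequence (our sketch of the canonical-basis device).
    n = len(y)
    beta = beta0
    for _ in range(n_iter):
        d0 = delta_rank(y, beta)
        gam = d0 - delta_rank(y, beta + 1.0 / math.sqrt(n))
        if gam <= 0:                    # safeguard: keep the current value
            break
        beta += d0 / (gam * math.sqrt(n))
    return beta

# Misspecified reference: Weibull(1.5) innovations, rescaled to unit mean.
rng = random.Random(11)
scale = 1.0 / math.gamma(1.0 + 1.0 / 1.5)
y, y_prev = [], 0.0
for _ in range(3000):
    eps = scale * (-math.log(rng.random())) ** (1.0 / 1.5)
    y_prev = (1.0 + 0.5 * y_prev) * eps        # true beta = 0.5
    y.append(y_prev)
beta_hat = r_estimator(y, beta0=0.4)           # crude preliminary value
```

Despite the misspecified exponential reference density and the off starting value, the iterated update settles near the true parameter value, illustrating the root-n consistency under g ≠ f.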

Simulation study
In this section, we illustrate the actual performance of our R-estimator. For the sake of simplicity, we work with the ACD(1,0) model of the motivating example. In Section 5.1, we concentrate on the Burr(̺_1 = 1, ̺_2 = 2) and Weibull(ζ) innovation densities g. The case of Weibull(ζ) innovations with ζ ≥ 1 has been treated in Section 1.3; Weibull densities with ζ < 1 are considered in Section 5.2.

Estimation under Burr densities
In a first Monte Carlo experiment, we simulate the ACD process (1) with Burr(̺_1 = 1, ̺_2 = 2) innovations and compute (a) the exponential QMLE, (b) the Weibull QMLE, (c) the (infeasible) Burr MLE, and (d) the R-estimator with exponential reference density. Under such innovations, Weibull QMLEs are not consistent M-estimators; see Gouriéroux et al. [25] or Meitz and Teräsvirta [46].
Nevertheless, the Weibull QMLE belongs to daily practice and, in some settings (such as the Monte Carlo design of this simulation), its bias can be relatively small.
Estimators (a)-(c) were obtained using the R routine acdFit, while the R-estimates are obtained from the recurrence (14), followed by iteration of the one-step update (19) until numerical stabilization.
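For reproducibility, generating an ACD(1,0) sample with unit-mean Weibull innovations can be sketched as follows (Python; the unit intercept and the specification psi_{i-1} = 1 + theta * Y_{i-1} are our assumptions, the exact form being fixed by the paper's equation (1), and the function name is ours):

```python
import math
import numpy as np

def simulate_acd10(theta, n, rng, zeta=1.0):
    """Simulate Y_i = psi_{i-1} * eps_i with psi_{i-1} = 1 + theta * Y_{i-1}
    and i.i.d. Weibull(zeta) innovations rescaled to unit mean.

    A standard Weibull(zeta) draw has mean Gamma(1 + 1/zeta), so dividing
    by that constant enforces E[eps] = 1, the usual ACD normalization.
    """
    eps = rng.weibull(zeta, size=n) / math.gamma(1.0 + 1.0 / zeta)
    y = np.empty(n)
    psi = 1.0
    for i in range(n):
        y[i] = psi * eps[i]
        psi = 1.0 + theta * y[i]
    return y
```

Burr innovations can be plugged in the same way by replacing the Weibull draws with unit-mean Burr draws.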
Results are displayed in Figure 2. They reveal that the R-estimator outperforms both QMLEs, for all sample sizes considered. When n = 1000 ("large" sample), the boxplot of the (infeasible) Burr MLE and that of the R-estimator are quite similar (even though the R-estimator is based on a misspecified reference density), whereas the QMLEs exhibit higher variability. When n = 250, the R-estimator still outperforms both QMLEs; inspection of the top left panel in Figure 2 indicates that (d) has the smallest bias and the smallest interquartile range when compared to (a) and (b). We conjecture that this is due to the robustness/stability features of rank-based procedures. In contrast, it is well known that QMLEs such as (a) and (b) typically are much more sensitive to small-sample effects and/or observed extreme values: the combination of these issues is most likely the cause of the large number of outliers visible in the boxplot of the exponential QMLE in Figure 2 for n = 250 and n = 500; their impact decreases as the sample size increases, though.

Performance for Weibull with ζ < 1
In Section 1.3, we compared the behavior of the Weibull MLE, the exponential QMLE, and our R-estimator (with exponential reference density) in the case of Weibull densities with ζ ≥ 1, which belong to the class of IFR densities. Here, we explore the performance of the same estimators under Weibull densities with ζ = 0.6 < 1, which no longer belong to the IFR class. Numerical results suggest that the efficiency gains of the R-estimator over the QMLE carry over to the non-IFR case. Our simulations are based on 1000 replications of samples of length n = 225 (moderate sample size) and n = 3500 (large sample size), respectively. Results are displayed in Figure 3.

Real Data
As highlighted by Tsay [54] and Yatigammana et al. [56], ACD models can be fitted to any time series with non-negative observations. In this section, we illustrate the use of our R-estimators in the context of stock volatility modeling based on the daily range (see Chou [12] and Martens and Van Dijk [45]), and consider the time series of daily ranges for the Swiss stock market index (SMI) over the period 02-Jan-2015 to 30-Dec-2015. 8  To validate this conjecture, we model the dynamics of a "standardized version" Y_i/α̂ (with α̂ = 0.126) of the range Y_i by an ACD(1,1) of the form (12), 9 and estimate the model parameter θ = (β, γ) using (i) the exponential QMLE and (ii) our R-estimator based on the exponential reference density.

9 More precisely, considering for Y_i a more general ACD(1,1) model with Ψ_{i−1} = α + βY_{i−1} + γΨ_{i−2}, the dynamics of Y_i/α clearly satisfy an ACD(1,1) of the form (12) studied here; we obtained (via exponential QMLE, over the period Jan-2013 to Dec-2014) an estimate α̂ = 0.126 (with variance 0.0081), which was used to rescale the observations into Y_i/0.126 over the period Jan-2015 to Dec-2015.
Some numerical details. The exponential QMLE is obtained using the R routine acdFit and the Nelder-Mead optimisation algorithm; we estimate its asymptotic variance matrix using the sandwich formula in Engle [22]. To define our R-estimator, we use the one-step method as in (19), implementing the recursion in (14) with Ψ_0 = 1 and Y_0 equal to the sample mean Ȳ, using an exponential reference density (with rate one) and the exponential QMLE of θ as a preliminary estimator. The estimated cross-information matrix Γ̂^(n)_f is obtained as in (18). The asymptotic variance matrix of the R-estimator as in (21) requires the computation of Γ(θ, f): this can be obtained either using permutations of the residual ranks (see the last paragraph of Section 3.2.2), or via simulation: since Γ(θ, f) for f = f_exp coincides with the covariance matrix, under f_exp, of the QMLE, one can indeed estimate it by simulating an ACD(1,1) with exponential innovation density. The latter method, which is computationally much lighter, was adopted; it would not be feasible for f ≠ f_exp, though.
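The ACD(1,1) recursion with the initializations just described, together with the standardized residuals it produces, can be sketched as follows (Python; the unit intercept matches the rescaled model (12), and the function name is ours):

```python
import numpy as np

def acd11_residuals(y, beta, gamma, psi0=1.0, y0=None):
    """Run the ACD(1,1) recursion psi_i = 1 + beta*y_{i-1} + gamma*psi_{i-1}
    (unit intercept, as in the rescaled model) with psi_0 = 1 and y_0 set
    to the sample mean, and return the standardized residuals y_i / psi_i.
    """
    y = np.asarray(y, dtype=float)
    if y0 is None:
        y0 = y.mean()
    psi = np.empty(len(y))
    prev_y, prev_psi = y0, psi0
    for i in range(len(y)):
        psi[i] = 1.0 + beta * prev_y + gamma * prev_psi
        prev_y, prev_psi = y[i], psi[i]
    return y / psi
```

Ranking these residuals is the input to the one-step R-estimation update; re-running the recursion at the updated parameter value and iterating yields the final estimate.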
As a diagnostic check, we compute the autocorrelation function of the corresponding standardized residuals. In Figure 5, we display the results for the R-estimator 10: the plot indicates that the ACD(1,1) model is appropriate, since no significant autocorrelation remains in the standardized residuals at any considered lag. We also display (left panel of Figure 5) the predicted range (as obtained using the R-estimates) versus the observed range: this plot shows that the model does capture the average behaviour of the observed series. Figure 6 provides graphical evidence of the advantages of R-estimation: since the standard errors of the R-estimates are smaller than those of the QML estimates, the R-estimation method yields a 90%-level asymptotic joint confidence region much smaller than (and, in line with the numerical findings of Section 1.3, strictly included in) the QMLE-based one. Moreover, we remark that the elliptical asymptotic QMLE-based confidence region includes many pairs (β, γ) such that β + γ > 1: these values violate the stationarity assumption of the ACD(1,1) model. The same problem affects, although to a much smaller extent, the confidence region based on the R-estimates. These considerations remain true for other confidence levels, such as 95% or 97.5%, which we do not report here.
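The residual autocorrelation diagnostic amounts to computing a sample ACF. A minimal sketch (Python; in R this is the standard acf routine, and the function name here is ours):

```python
import numpy as np

def acf(x, max_lag=20):
    """Sample autocorrelation function of a series, lags 0..max_lag,
    as used for the residual-based diagnostic check: demean the series,
    then correlate it with its own lagged copies.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom
                             for k in range(1, max_lag + 1)])
```

For correctly standardized residuals, all sample autocorrelations at positive lags should fall within the usual ±1.96/√n bands, which is what Figure 5 displays.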
Finally, to gain further understanding of the data-generating process, we work on the density of the standardized residuals associated with the R-estimates β̂ and γ̂. On those residuals, we first fit a kernel density estimate (with bandwidth selected via the Silverman rule), then, by minimizing the Cramér-von Mises distance, a Weibull distribution. In Figure 6, we compare those two fits. The estimated Weibull (with shape parameter value ζ̂ ≈ 3) is quite similar to the kernel density estimate in the left tail and close to the center of the distribution, but it does not capture the right-tail behavior well. This exercise of fitting the Weibull parameters gives further strength to our results: the ζ̂ and the corresponding efficiency gain of the R-estimator (as displayed in the left panel of Figure 6) indeed are perfectly in line with the simulation results of the motivating example; see Figure 1 for large values of ζ.

10 A similar plot is available for the QMLE standardized residuals.
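The Cramér-von Mises fit of the Weibull distribution can be sketched as follows (Python; we use a crude grid search as a stand-in for a proper numerical optimizer, and all function names are ours):

```python
import numpy as np

def weibull_cdf(x, shape, scale):
    """CDF of the two-parameter Weibull distribution."""
    return 1.0 - np.exp(-(np.asarray(x) / scale) ** shape)

def cvm_distance(sample, shape, scale):
    """Cramer-von Mises statistic between the empirical CDF of the
    sample and a Weibull(shape, scale) CDF, via the standard
    order-statistic formula."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    u = weibull_cdf(x, shape, scale)
    i = np.arange(1, n + 1)
    return np.sum((u - (2 * i - 1) / (2 * n)) ** 2) + 1.0 / (12 * n)

def fit_weibull_cvm(sample, shapes, scales):
    """Minimum-distance Weibull fit: pick the (shape, scale) pair on the
    supplied grids that minimizes the CvM distance."""
    best = min((cvm_distance(sample, sh, sc), sh, sc)
               for sh in shapes for sc in scales)
    return best[1], best[2]
```

On residuals resembling those of the SMI range series, such a fit returns a shape parameter in the region of ζ̂ ≈ 3, which is the value discussed above.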