Confidence sequences with composite likelihoods

In dominated parametric statistical models, confidence sequences provide conservatively valid frequentist inference directly from a likelihood ratio. They ensure a specific mode of replicability when inference is performed on accumulating data: inferential conclusions that are compatible, with a guaranteed probability, when the sample is enlarged, in the form of overlapping confidence regions. Here we consider both Robbins' mixture confidence sequences and the running maximum likelihood confidence sequences recently considered by Wasserman, Ramdas, and Balakrishnan. We compare through simulation the replicability properties of the two kinds of confidence sequences, evaluating, along a prospective enlargement of the sample, the frequency of incompatible estimation intervals and the frequency of failure of simultaneous coverage of the true parameter value. Moreover, we propose a shortcut to extend the application of mixture confidence sequences to pseudo-likelihoods, in particular to composite likelihood. The main assumption required is that normal asymptotic theory offers a good approximation to the density of the maximizer of the pseudo-likelihood. When inference is about a scalar parameter of interest, the computation of the proposed sequence of confidence intervals is straightforward. The method is illustrated by an example whose replicability properties are evaluated through simulation.


INTRODUCTION
Ritualistic application of frequentist inferential tools such as P-values, even from likelihood ratio tests, is often pointed out as a source of the replicability crisis in science famously denounced by Ioannidis (2005). Indeed, calibration established on hypothetical repetitions of the experiment that produced the data at hand, considered in isolation, is too vague a guarantee of replicability. Such a calibration gives rise to episodic inferences that are vulnerable, inter alia, to selection bias and interim analyses. See Benjamini (2020) on selective inference as a killer of replicability. To request that compatible, i.e., noncontradictory, conclusions be reached when the sample is enlarged is arguably a better basis than the repeated sampling principle for embedding a concept of replicability into statistical theory. This view emphasizes statistical models as models for sequential environments.
With confidence regions for the same parameter, calculated at various sample sizes, inferential conclusions are compatible if these regions overlap, and incompatible if their intersection is empty. When inference is performed on the basis of accumulating data, hasty announcement of provisional conclusions that later turn out to be incompatible with the final conclusions may cause considerable reputational damage.
A confidence sequence (Robbins, 1970; see also Darling & Robbins, 1967a,b) is a sequence of confidence regions constructed so as to be all compatible with a guaranteed probability. Research on confidence sequences seems to have been long neglected after the technical contributions in Lai (1976) and Csenki (1979). In recent years, however, there has been a renewed interest, with various aims. See, e.g., Wasserman, Ramdas & Balakrishnan (2020), Howard et al. (2021), Johari et al. (2021), Vovk & Wang (2021), and Howard & Ramdas (2022).
Outside proper sequential settings, use of confidence sequences enhances the replicability of the conclusions of a stand-alone study, but, of course, only actual replication in follow-up studies can give the experimental demonstration of a finding. In the more recent literature, confidence sequences are often termed "anytime-valid confidence regions" and the whole approach "safe inference". The mixture device, used to obtain confidence sequences by Robbins (1970) and other early contributors, has been superseded by the device of data splitting in Wasserman, Ramdas & Balakrishnan (2020). While computationally more convenient (no integration is required), data splitting infringes the weak likelihood principle and may look dubious unless applied to intractable models for which no other tool with a frequentist guarantee is available.
With high-dimensional data, often the full likelihood is difficult to specify and inference may be based on a misspecified likelihood such as composite likelihood. A composite likelihood combines dependent likelihoods from small portions of the data using convenient weights. See Varin, Reid & Firth (2011) for a review. See also Pace, Salvan & Sartori (2019) and Fraser & Reid (2020) for results on optimal weights. When composite likelihood is the basis for inference, it is of interest to construct confidence sequences whose estimation regions are all compatible with an at least approximate frequentist guarantee. One proposal with exact frequentist validity is in Nguyen (2020), which generalizes the data splitting device of Wasserman, Ramdas & Balakrishnan (2020) to composite likelihoods.
In this work, we propose a shortcut to construct confidence sequences à la Robbins from composite likelihood. The main assumption required is that normal asymptotic theory offers a good approximation to the density of the maximizer of the pseudo-likelihood. When inference is about a scalar parameter of interest, the computation of the proposed sequence of confidence intervals is straightforward.
The outline of the article is as follows. Section 2 offers a brief review of the rationale behind confidence sequences. The main devices to obtain confidence sequences, namely mixture as in Robbins (1970) and splitting as in Wasserman, Ramdas & Balakrishnan (2020), are recalled in Section 3, where simple examples are examined, including simulations. In Section 4, attention is devoted to confidence sequences associated with asymptotically normal estimators. The particular case of estimators from composite likelihoods is considered in detail in Section 5. Section 6 presents an example of confidence sequences with composite likelihoods, with simulations supporting the claim of approximate validity when the sample size is large enough. Section 7 concludes.

CONFIDENCE SEQUENCES
Let the potentially observable data be x^(n) = (x_1, …, x_n), a realization of the random vector X^(n) = (X_1, …, X_n), n = 1, 2, …. We denote by P_θ the joint probability distribution of the sequence X^(∞) = (X_1, X_2, …) and suppose that P_θ belongs to a statistical model with parameter space Θ ⊆ IR^d. Moreover, we assume that p(x^(n); θ) > 0 is the density of X^(n) under P_θ, whose support is independent of θ. Ideally, by observing the sequence x^(n) the statistician will eventually discover the truth, that is, the true value of θ in Θ, denoted by θ*.
An estimation region based on x^(n) is a subset of Θ, denoted by Θ̂_n = Θ̂_n(x^(n)) or similar symbols. A confidence sequence is a sequence of estimation regions. A confidence sequence offers compatible inferential conclusions about θ if there are conclusions that are common to all confidence statements, which are thus noncontradictory. Consistency of an estimator that is always contained in Θ̂_n often entails that the sequence Θ̂_n shrinks towards the true value of θ. A confidence sequence Θ̂_n has persistence level 1 − α, where 0 < α < 1, if, for every θ ∈ Θ,

P_θ( θ ∈ Θ̂_n for every n = 1, 2, … ) ≥ 1 − α.

This implies the frequentist guarantee that, for every θ ∈ Θ and every fixed n, P_θ( θ ∈ Θ̂_n ) ≥ 1 − α, so that the probability of observing incompatible conclusions from a sequence with persistence level 1 − α as evidence accumulates is as small as desired. Indeed, incompatible conclusions require that at least one region along the sequence fails to cover θ, an event whose probability is at most α.
Confidence sequences Θ̂_n with persistence level 1 − α provide conservatively valid frequentist inference. For any given n, Θ̂_n is an estimation region with confidence level at least 1 − α. As remarked in Wasserman, Ramdas & Balakrishnan (2020, page 16888), such regions are valid at arbitrary stopping times and at arbitrary data-dependent times that are chosen post hoc.

MIXTURE AND SPLIT CONFIDENCE SEQUENCES
Robbins' (1970) confidence sequences, hereafter called mixture confidence sequences, have the form (2) with the comparison density m(x^(n)) given by the mixture device

m(x^(n)) = ∫_Θ p(x^(n); u) w(u) du.    (3)

Therefore they have the form

Θ̂_{1−α}(x^(n)) = { θ ∈ Θ : p(x^(n); θ) ≥ α m(x^(n)) }.    (4)

The weight function w(u) is a preset probability density over Θ with w(u) > 0 for every u ∈ Θ and invites a Bayesian interpretation. Indeed, in the form (3), m(x^(n)) can incorporate prior information about θ. One advantage of the choice (3) in definition (2) is that the maximum likelihood estimate θ̂ = argmax_{θ∈Θ} p(x^(n); θ) is always a point in Θ̂_{1−α}(x^(n)), because p(x^(n); θ̂) ≥ ∫_Θ p(x^(n); u) w(u) du. More generally, mixture confidence sequences are likelihood-based: the set Θ̂_{1−α}(x^(n)) is the region of θ values whose likelihood p(x^(n); θ) is larger than the fraction α of the integrated likelihood (3). The ideal mixture confidence sequence has w(u) degenerate at θ*, that is, concentrated at the true θ. This choice produces the sequence

{ θ ∈ Θ : p(x^(n); θ) ≥ α p(x^(n); θ*) }.    (5)

Of course, sequence (5) has persistence level 1 − α, but it is not a feasible confidence sequence. However, θ* may be estimated using information besides x^(n). This fact is at the basis of the data splitting device introduced in Wasserman, Ramdas & Balakrishnan (2020). Considering for simplicity two groups with the same size n, n = 1, 2, …, the data x^(2n) are randomly split as x^(2n) = (x^(n)_0, x^(n)_1). These observations are a realization of the random vector X^(2n) = (X^(n)_0, X^(n)_1). The split estimation set, denoted by Θ̃_{1−α}(x^(2n)), is given by

Θ̃_{1−α}(x^(2n)) = { θ ∈ Θ : p(x^(n)_0; θ) ≥ α p(x^(n)_0; θ̂(x^(n)_1)) },    (6)

where θ̂(x^(n)_1) is any consistent estimator of θ depending on x^(n)_1, typically the maximum likelihood estimator or a regularized form of it. The set (6) is still of the form (2), computed from a sample of size 2n. While mixture confidence sequences agree with the strong likelihood principle, data splitting infringes the weak likelihood principle and consequently may lead to inferior inferences; see, however, Cox (1975).
To somehow accommodate this drawback, Wasserman, Ramdas & Balakrishnan (2020, Section 4) have proposed various de-randomized variants, such as k-fold splitting and subsampling, which we will not discuss.
Actually, in Wasserman, Ramdas & Balakrishnan (2020), the set (6) is proposed as a universal confidence set having, for every fixed sample size 2n, confidence level at least 1 − α, as implied by Markov's inequality. It is not proposed as a confidence sequence with persistence level 1 − α. The reason is, we guess, that it is not clear whether the martingale property holds for the corresponding sequence of split likelihood ratios.
We conjecture that the sequence (6) may be used as an approximate confidence sequence, with asymptotic persistence level 1 − α, where we say that a sequence Θ̂_n has asymptotic persistence level 1 − α if, for every θ ∈ Θ,

lim_{m→∞} P_θ( θ ∈ Θ̂_n for every n ≥ m ) ≥ 1 − α.

From this perspective, the sampling property of practical interest is that, for a sample size n_min large enough and a hypothetical observation horizon n_max much larger than n_min,

P_θ( θ ∈ Θ̂_n for every n with n_min ≤ n ≤ n_max ) ≥ 1 − α, at least approximately.

When, as for sequence (6), analytic tools do not provide any guidance, simulation may be used to ascertain whether the above simultaneous frequency guarantee holds. In the simulations in Examples 1 and 2 that follow, the confidence sequence is defined supposing that both datasets x^(n)_0 and x^(n)_1 are increased by one independent observation at a time and do not mix. Thus, new observations come in pairs and each of the two observations is randomly assigned to one of the two groups once and for all.
DOI: 10.1002/cjs.11749 The Canadian Journal of Statistics / La revue canadienne de statistique
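As a concrete illustration of the mixture device (our own Python sketch, not code from the paper), consider i.i.d. N(θ, σ²) data with known σ² and an N(μ0, σ0²) weight function (called `tau2` in the code), for which the mixture region (4) is an interval available in closed form; we believe this closed form is what the examples below call intervals (8). The helper `lik_ratio` checks the closed form against direct numerical integration of (3).

```python
import math

def mixture_interval(xbar, n, alpha=0.2, sigma2=1.0, mu0=0.0, tau2=1.0):
    """Robbins' mixture confidence interval for the mean of N(theta, sigma2)
    data with an N(mu0, tau2) weight function; simultaneously valid over n."""
    s2 = sigma2 / n                       # variance of the sample mean
    rad2 = s2 * (math.log((s2 + tau2) / (s2 * alpha ** 2))
                 + (xbar - mu0) ** 2 / (s2 + tau2))
    half = math.sqrt(rad2)
    return xbar - half, xbar + half

def lik_ratio(theta, xbar, n, sigma2=1.0, mu0=0.0, tau2=1.0):
    """Likelihood at theta divided by the integrated likelihood (3), with
    constant factors free of the parameter cancelled.  The integral is a
    plain Riemann sum over [mu0 - 8, mu0 + 8], adequate for the small
    values used here; no external libraries are needed."""
    lik = lambda u: math.exp(-n * (u - xbar) ** 2 / (2.0 * sigma2))
    wgt = lambda u: (math.exp(-(u - mu0) ** 2 / (2.0 * tau2))
                     / math.sqrt(2.0 * math.pi * tau2))
    h = 16.0 / 40000
    mix = sum(lik(mu0 - 8.0 + k * h) * wgt(mu0 - 8.0 + k * h)
              for k in range(40001)) * h
    return lik(theta) / mix

# At the endpoints of the mixture interval, the ratio equals alpha.
lo, hi = mixture_interval(xbar=0.3, n=50)   # half-width about 0.38 here
```

The half-width shrinks only at rate √(log n / n), which is the price paid for simultaneous validity over all n.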
With independent and identically distributed (i.i.d.) random variables X_1, X_2, …, Wasserman, Ramdas & Balakrishnan (2020) introduce the confidence sequence with (exact) persistence level 1 − α defined as

{ θ ∈ Θ : ∏_{i=1}^n p_1(x_i; θ̂(x^(i−1))) ≤ (1/α) ∏_{i=1}^n p_1(x_i; θ) },    (7)

where, for i ≥ 2, θ̂(x^(i−1)) is any estimate of θ depending on x^(i−1), for instance the maximum likelihood estimate or a regularized form of it, while θ̂(x^(0)) = θ, so that, on both sides of the inequality defining the sequence (7), the factor p_1(x_1; θ) cancels out. If the dimension of θ is d > 1, then p_1(x_1; θ) in (7) may be substituted by the density of a block of d or more observations. The sequence (7) is written supposing that the dataset is increased by one observation at a time. The definition is easily extended to cover cases when data are collected in groups. Wasserman, Ramdas & Balakrishnan (2020, Section 7) refer to the process giving rise to the sequence (7) as "running maximum likelihood ratio" and highlight that the idea originated in Wald (1947) and was further analyzed in Robbins & Siegmund (1972, 1974). In the following, a confidence sequence (6) will be referred to as split-naive, while a confidence sequence (7) will be referred to as split-exact. At any given n, split-naive intervals are computationally much more convenient than split-exact intervals. The next two examples compare the average length of confidence intervals from mixture and from split confidence sequences, both naive and exact. The empirical percentages of incompatible inferences and of noncoverage of the true parameter value pertaining to confidence intervals from naive and exact split confidence sequences are also compared. The results for mixture intervals are overall satisfactory.
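For a concrete feel of the running maximum likelihood device, the following sketch (our own code, not from the paper; it hardcodes the N(θ, 1) model, for which every step is available in closed form) computes the interval defined by (7) at a given n:

```python
import math

def running_mle_interval(xs, alpha=0.2):
    """Sketch of the split-exact interval from the running maximum
    likelihood ratio (7) for the mean theta of N(theta, 1) data.
    Setting the initial plug-in estimate equal to theta cancels the first
    factor on both sides, so only x_2, ..., x_n enter the comparison.
    Returns (lo, hi), or None if the set is empty or n < 2."""
    n = len(xs)
    if n < 2:
        return None
    # log of the prequential product of plug-in densities (i >= 2),
    # up to constants that cancel with the denominator
    log_num = 0.0
    run_sum = xs[0]
    for i in range(1, n):
        pred = run_sum / i              # running MLE from x_1, ..., x_i
        log_num += -(xs[i] - pred) ** 2 / 2.0
        run_sum += xs[i]
    # theta-dependent part of the log likelihood of x_2, ..., x_n:
    #   -(1/2) * [ ss + (n - 1) * (theta - m)^2 ],  m = mean(x_2, ..., x_n)
    tail = xs[1:]
    m = sum(tail) / (n - 1)
    ss = sum((x - m) ** 2 for x in tail)
    # (7) reduces to (n - 1)*(theta - m)^2 <= 2*log(1/alpha) - 2*log_num - ss
    disc = 2.0 * math.log(1.0 / alpha) - 2.0 * log_num - ss
    if disc < 0:
        return None
    half = math.sqrt(disc / (n - 1))
    return m - half, m + half
```

Because the first factor cancels, the interval is centred at the mean of x_2, …, x_n; with models richer than this toy one, the θ values satisfying (7) must in general be found numerically.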
At any given sample size 2n, mixture intervals depend on the data only through the minimal sufficient statistic, while split intervals, both naive and exact, do not. A small simulation experiment was conducted in order to compare the lengths of the three kinds of intervals. The persistence level selected is 1 − α = 0.80. As observed in Pace & Salvan (2020, Example 2), the corresponding confidence sequence gives fixed-n intervals, for n in the range [n_min, n_max] = [10, 4000], close to conventional confidence intervals with level 0.995. Various values of 2n from 20 to 1000 and several choices of the true θ in the range [0, 2.5] were considered, with σ² = 1. The standard normal weight function w(u) = e^{−u²/2}/√(2π) was used for mixture intervals, so that μ0 = 0 and σ0² = 1 in (8). The results, based on 10,000 Monte Carlo replications, are displayed in Table 1. When the sample size is small or moderate, and the true θ is not far away from μ0, mixture intervals are generally shorter. With θ > μ0 + σ0, for larger sample sizes, the split intervals of the naive kind are the shortest, whereas the split intervals of the exact kind seem to be slightly inferior even when 2n = 1000.
Next, we perform a simulation to investigate the conjecture that, for conveniently large sample sizes, the split-naive intervals have approximately persistence level 1 − α. For 10,000 replications, sequences of samples with even size from 2n_min to 2n_max were generated with n_min = 100 and n_max = 40,000. The behaviour along the sequence of mixture intervals (8) with μ0 = 0 and σ0² = 1, split-naive intervals (9), and split-exact intervals (10) was observed, with the aim of detecting sequences that give incompatible conclusions (incompatibilities) and sequences that do not always cover the true parameter value (uncoverages), at various nominal persistence levels 1 − α. The results are displayed in Table 2. Analogous results are displayed in Table 3 for sequences of samples with even size from 2(400) to 2(80,000). The mixture and split-exact intervals look very conservative, in the sense that the percentages of incompatibilities and uncoverages are both far below the bound 100 × α. The sequences of split-naive intervals with 1 − α = 0.80 are anti-conservative. However, they improve their closeness to the nominal bound as n_min moves from 100 to 400.
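In these simulations, a sequence is flagged as incompatible when two of its intervals are disjoint, i.e., when the running intersection of the intervals becomes empty, and as uncovering when at least one interval misses the true value. Both events can be detected in a single pass over the sequence; a small utility of our own devising (names are ours):

```python
def sequence_diagnostics(intervals, theta_true):
    """One pass over a sequence of intervals (lo, hi).

    incompatible: True if some pair of intervals along the sequence is
        disjoint, i.e. the running intersection becomes empty.
    uncovered: True if at least one interval fails to contain theta_true.
    """
    run_lo, run_hi = float("-inf"), float("inf")
    incompatible = uncovered = False
    for lo, hi in intervals:
        run_lo, run_hi = max(run_lo, lo), min(run_hi, hi)
        if run_lo > run_hi:
            incompatible = True
        if not (lo <= theta_true <= hi):
            uncovered = True
    return incompatible, uncovered
```

Note that uncoverage does not imply incompatibility: a sequence can miss the true value while all of its intervals still overlap one another.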
[Table 1: average lengths of intervals (8) with μ0 = 0 and σ0² = 1 (mixture), intervals (9) (split-naive), and intervals (10) (split-exact), all with nominal persistence level 1 − α = 0.80, in 10,000 samples of size 2n, for various true values of θ.]

Intervals (8) may provide a simple closed-form approximation to confidence sequences (4) for a scalar parameter θ when a normal weight function is used. Suppose that a normal approximation is available for the maximum likelihood estimator θ̂, that is, θ̂ ∼ N(θ, v̂²) approximately, with v̂² an estimate of the asymptotic variance of θ̂ (that is, of v²(θ) = σ²(θ)/n). If a N(μ0, σ0²) density is used as a weight function, where μ0 is the conjectured central value for θ, then, in analogy with (8), a closed-form confidence sequence for θ is

θ̂ ± v̂ { log((σ0² + v̂²)/(α² v̂²)) + (θ̂ − μ0)²/(σ0² + v̂²) }^{1/2},    (11)

for n ≥ n_min with n_min sufficiently large and v̂ = √v̂².
As Pace & Salvan (2020, Section 4) showed through simulation in some special models, for sequences starting from a moderate sample size n_min, this proposal seems to maintain approximately the persistence level 1 − α in the examples considered. Closed-form confidence intervals (11) have a Wald-type structure. Consequently, unlike intervals from a genuine likelihood, intervals (11) are not exactly equivariant under reparameterizations. On the other hand, intervals (11) rely only on the assumption that normal asymptotic theory offers a good approximation to the density of the estimator of θ. Therefore, any asymptotically normal estimator could be used in (11), such as a robust estimator or the maximizer of a pseudo-likelihood (for instance, a composite likelihood).

[Table 3: percentages of incompatibilities and uncoverages for intervals (8) with μ0 = 0 and σ0² = 1 (mixture), intervals (9) (split-naive), and intervals (10) (split-exact) at various nominal persistence levels 1 − α, in 10,000 sequences of samples with even size from 800 to 160,000; the true value of θ is zero.]
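Given any asymptotically normal estimate and its standard error, an interval of the form (11) is one line of algebra. The helper below is our own sketch; it assumes the closed form stated in (11), i.e., the standard Robbins normal-mixture expression with the estimated variance v̂² in place of the exact sampling variance:

```python
import math

def wald_type_interval(est, se, alpha=0.2, mu0=0.0, tau2=1.0):
    """Closed-form Wald-type confidence-sequence interval of the form (11):
    est is any asymptotically normal estimate (MLE, robust estimator,
    composite-likelihood maximizer), se its estimated standard error, and
    N(mu0, tau2) the weight function."""
    v2 = se ** 2
    half = math.sqrt(v2 * (math.log((v2 + tau2) / (v2 * alpha ** 2))
                           + (est - mu0) ** 2 / (v2 + tau2)))
    return est - half, est + half
```

As the standard error shrinks, the log term grows slowly, so these intervals are only modestly wider than fixed-n Wald intervals of high confidence level.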
… ∼ N(log ψ, 1∕(2n)), and using a normal weight function for log ψ with mean μ0 and variance σ0² matching the mean and variance of log ψ when ψ ∼ Gamma(a, b).

CONFIDENCE SEQUENCES ON A PARAMETER OF INTEREST
When d > 1 and the parameter is partitioned as θ = (ψ, λ), where ψ ∈ Ψ is a d0-dimensional component of interest and λ is a nuisance component, in special models safe inference about ψ can be based on a statistic t^(n) = t(x^(n)) inducing a marginal or conditional model free of λ. In full generality, anytime-valid inference about ψ may be obtained from the profile likelihood using the mixture or the data splitting device. The projection on the subspace Ψ of a confidence sequence Θ̂_{1−α}(x^(n)) for θ with persistence level 1 − α,

Ψ̂_{1−α}(x^(n)) = { ψ : (ψ, λ) ∈ Θ̂_{1−α}(x^(n)) for some λ },    (15)

is clearly a confidence sequence for ψ with persistence level 1 − α. The confidence sequence (15) turns out to be based on the profile likelihood. For instance, with mixture confidence sequences we obtain

Ψ̂_{1−α}(x^(n)) = { ψ : p(x^(n); ψ, λ̂_ψ) ≥ α ∫_Θ p(x^(n); u) w(u) du },    (16)

where λ̂_ψ is the maximum likelihood estimate of λ in the model for x^(n) with ψ fixed. If a normal approximation is available for the maximum likelihood estimator ψ̂ of a scalar ψ, that is, ψ̂ ∼ N(ψ, v̂²) approximately, with v̂² an estimate of the asymptotic variance of ψ̂, and a N(μ0, σ0²) density is used as a weight function, where μ0 is the conjectured central value for ψ, a closed-form confidence sequence for ψ, analogous to sequence (11), is

ψ̂ ± v̂ { log((σ0² + v̂²)/(α² v̂²)) + (ψ̂ − μ0)²/(σ0² + v̂²) }^{1/2}.    (17)

Like sequence (11), closed-form intervals (17) have a Wald-type structure and consequently are not exactly equivariant under interest-respecting reparameterizations, that is, reparameterizations ω = ω(θ) = (φ, χ), where φ = φ(ψ) and χ = χ(ψ, λ). Also, the splitting device may be applied to the profile likelihood, as we see substituting Θ̂_{1−α}(x^(n)) in (15) with the split-exact set or with the split-naive set Θ̃_{1−α}(x^(2n)), obtaining corresponding split-exact and split-naive confidence sequences for ψ. See Wasserman, Ramdas & Balakrishnan (2020, Section 5) for details on the split-exact version. Admittedly, the persistence level of the split-naive version will be 1 − α asymptotically at best.
However, confidence sequences (16) for ψ are likely to be much more conservative than the corresponding confidence sequences Θ̂_{1−α}(x^(n)) for the full parameter θ. The same remark holds for the analogous confidence sequences from the splitting device. Thus, conservativeness should ease the task of respecting the nominal persistence level. Unfortunately, the computational burden imposed by definition (16) may be very heavy. In this respect, things are a little better for the variants based on the splitting device. In regular models, asymptotic sufficiency and asymptotic normality of the maximum likelihood estimator of ψ offer a comfortable way out of the predicament.

CONFIDENCE SEQUENCES FROM COMPOSITE LIKELIHOODS: NO NUISANCE PARAMETERS
In this section, after a brief review of composite likelihood theory, we focus on confidence sequences from composite likelihoods when the parameter of the model is scalar and the maximizer of the composite likelihood is asymptotically normal. The more general case of a scalar parameter of interest in the presence of nuisance parameters will be treated in the next section.
In complex models, the full likelihood L(θ) = p(x; θ) is often computationally intractable, or even difficult to specify. It is then convenient to trade a certain loss of efficiency for material relief of the computational burden. In their simplest form, composite likelihoods are pseudo-likelihoods for θ formed by multiplying elemental contributions L_k(θ), k = 1, …, K. The contributions L_k(θ) are in turn genuine likelihoods for θ based on low-dimensional but dependent parts of the data. Dependence occurs when X has dependent components or when the same block of the data appears in more than one of the factors L_k(θ). For spatial data, Besag (1974) proposed to use elemental likelihoods L_k(θ) from conditional densities. More generally, likelihoods L_k(θ) from conditional or marginal densities were considered in Lindsay (1988). See Varin, Reid & Firth (2011) for a comprehensive review of composite likelihoods.
A composite log likelihood has the general form

cℓ(θ) = Σ_{k=1}^K w_k ℓ_k(θ),    (18)

where ℓ_k(θ) = log L_k(θ) and the w_k are convenient weights, often chosen all equal to 1 (Sang & Genton, 2014). When the contributions ℓ_k(θ) are based on independent random variables, unitary weights are optimal and make cℓ(θ) a genuine log likelihood. Under regularity conditions, the score function and the Fisher information matrix of ℓ_k(θ) are u_k(θ) = ∂ℓ_k(θ)/∂θ and i_k(θ) = var_θ{u_k(θ)}. The score function from the composite likelihood cℓ(θ) is then

cu(θ) = Σ_{k=1}^K w_k u_k(θ).

The estimating equation cu(θ) = 0 is unbiased, meaning that E_θ{cu(θ)} = 0. Under the usual regularity conditions, the resulting estimator θ̃ is asymptotically normal with mean θ and variance equal to the inverse of the Godambe information matrix, that is,

θ̃ ∼ N_d(θ, G(θ)^{−1}) approximately.    (19)

The Godambe information matrix is the d × d matrix

G(θ) = H(θ) J(θ)^{−1} H(θ),

where J(θ) and H(θ) are, respectively, the variability and sensitivity matrices of the estimating function cu(θ):

J(θ) = var_θ{cu(θ)},  H(θ) = E_θ{ −∂cu(θ)/∂θ⊤ }.

Both matrices J(θ) and H(θ) are symmetric and are assumed to be invertible. When J(θ) = H(θ), the Godambe information simplifies and the composite likelihood behaves more like a genuine likelihood. The estimating equation cu(θ) = 0 is then called information unbiased (Lindsay, 1982). As an illustrative example, we consider below inference about a scalar θ, with true value θ*, using a composite likelihood and even the optimal composite likelihood.
For a scalar θ, write the composite score as cu(θ) = w⊤u(θ), where u(θ) = (u_1(θ), …, u_K(θ))⊤ collects the elemental scores and u_F(θ) = ∂ log L(θ)/∂θ is the score from the full likelihood. Variability and sensitivity of cu(θ) are J(θ) = w⊤Σ_u(θ)w and H(θ) = w⊤c(θ), where Σ_u(θ) is the covariance matrix of u(θ) and c(θ) is the vector of sensitivities of the elemental scores; whence we see that, to have meaningful weights, we have to restrict w so that w⊤c(θ) > 0. The Godambe information of the estimating function cu(θ) is

G_w(θ) = {w⊤c(θ)}² / {w⊤Σ_u(θ)w}.

In the asymptotic variance, θ may be substituted by a consistent estimator, such as θ̃₁, the maximizer of the so-called independence likelihood corresponding to the widely used pseudo-log likelihood cℓ₁(θ) = Σ_{k=1}^K ℓ_k(θ) = 1⊤ℓ(θ), where 1⊤ = (1, …, 1). Following Fraser & Reid (2020) and Pace, Salvan & Sartori (2019), the asymptotically most efficient estimator from a composite log likelihood of the form (18), θ̃*, is obtained when the weights are w* = w(θ*) = Σ_u(θ*)^{−1} c(θ*); that is, w* are the regression coefficients of the multiple linear regression of the full score u_F(θ*) on u(θ*), because c(θ*) = cov_{θ*}{u(θ*), u_F(θ*)}. To use the optimal composite log likelihood in practice, the unknown θ* has, of course, to be replaced by a consistent estimator such as θ̃₁. When the maximizer of the composite likelihood is asymptotically normal and we use a N(μ0, σ0²) density as a weight function for θ, a confidence sequence consisting of closed-form intervals is obtained. From approximation (20) we get an interval of the form (11), with v̂² = w⊤Σ_u(θ̃)w / {w⊤c(θ̃)}² for a given vector of weights w. Refining the above formula, a sequence obtained from the composite likelihood that uses the optimal weights w* has v̂² = 1/{c(θ̃)⊤Σ_u(θ̃)^{−1}c(θ̃)}.
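To make the roles of Σ_u, c, and w concrete, here is a toy computation of our own (the numbers are invented purely for illustration): two genuine elemental scores with variances 1 and 4, covariance 1, and, by componentwise information unbiasedness, sensitivities c = (1, 4). It compares the Godambe information of unit weights with that of the optimal weights w* = Σ_u^{-1} c.

```python
def godambe(w, Sigma, c):
    """Godambe information G_w = (w'c)^2 / (w' Sigma w) of the weighted
    composite score cu(theta) = sum_k w_k u_k(theta), for scalar theta."""
    K = len(w)
    wSw = sum(w[i] * Sigma[i][j] * w[j] for i in range(K) for j in range(K))
    wc = sum(w[i] * c[i] for i in range(K))
    return wc ** 2 / wSw

def solve2(Sigma, c):
    """Optimal weights w* = Sigma^{-1} c for K = 2 (Cramer's rule)."""
    det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
    return [(Sigma[1][1] * c[0] - Sigma[0][1] * c[1]) / det,
            (Sigma[0][0] * c[1] - Sigma[1][0] * c[0]) / det]

# Two genuine elemental scores: sensitivity equals variance componentwise,
# variances 1 and 4, covariance 1 (invented toy values).
Sigma = [[1.0, 1.0], [1.0, 4.0]]
c = [1.0, 4.0]
w_star = solve2(Sigma, c)          # -> [0.0, 1.0] for this Sigma and c
```

Here the optimal weights put all mass on the second component, and G_{w*} = c⊤Σ_u^{-1}c = 4 exceeds the unit-weight value 25/7; with exchangeable components, as in the example of Section 6, the same computation returns equal weights.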

CONFIDENCE SEQUENCES FROM COMPOSITE LIKELIHOODS WITH NUISANCE PARAMETERS: AN EXAMPLE
When the d-dimensional parameter of the model is θ = (ψ, λ), where ψ is a scalar parameter of interest and λ is a nuisance parameter, suppose that a composite log likelihood cℓ(θ) provides the estimate θ̃⊤ = (ψ̃, λ̃⊤). If approximation (19) holds, the asymptotic sampling distribution of ψ̃ under P_θ is ψ̃ ∼ N(ψ, v²(θ)) approximately, where v²(θ) = [G(θ)^{−1}]_{11} is the entry of the inverse of the Godambe information matrix at the first row and first column. Obtaining optimal weights for profile inference about ψ is not straightforward: see Pace, Salvan & Sartori (2019, Section 3). Here, we suppose therefore that the composite likelihood is defined using a given vector of weights w, in general unrelated to optimality.
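The quantity v²(θ) = [G(θ)^{-1}]_{11} can be computed without forming G^{-1} from scratch, because with H symmetric the identity G^{-1} = H^{-1} J H^{-1} holds. A 2 × 2 sketch with invented toy matrices (our own, purely to illustrate the sandwich computation):

```python
def matmul2(A, B):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(A):
    """Inverse of a 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[ A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det,  A[0][0] / det]]

# Toy sensitivity H and variability J of the composite score (invented values)
H = [[2.0, 0.5], [0.5, 1.0]]
J = [[3.0, 1.0], [1.0, 2.0]]

G = matmul2(H, matmul2(inv2(J), H))   # Godambe information G = H J^{-1} H
v2_psi = inv2(G)[0][0]                # asymptotic variance of psi-tilde
```

The assertion that both routes give the same (1,1) entry is exactly the identity above, so either form can be used in practice.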
With the weight function w(ψ) corresponding to the N(μ0, σ0²) density, a confidence sequence for ψ is then given by formula (17) with ψ̂ replaced by ψ̃ and v̂² replaced by ṽ². We conjecture that this method provides a confidence sequence with asymptotic persistence level 1 − α. To support this conjecture, a simulation study has been performed, considering the following example.
A collection of genuine log likelihoods for μ, provided by the q marginal components of independent equicorrelated normal vectors Y_1, …, Y_n with E(Y_ij) = μ, var(Y_ij) = σ², and corr(Y_ij, Y_ik) = ρ for j ≠ k, is given by

ℓ_j(μ) = −(1/(2σ²)) Σ_{i=1}^n (y_ij − μ)²,  j = 1, …, q,

with corresponding score functions

u_j(μ) = (1/σ²) Σ_{i=1}^n (y_ij − μ).

By exchangeability, the estimating equation that is most efficient among the scores from the combined log likelihoods of the form (18) is obtained with w* = 1, that is, from

cℓ(μ) = Σ_{j=1}^q ℓ_j(μ),    (21)

with corresponding estimating function

cu(μ) = (1/σ²) Σ_{i=1}^n Σ_{j=1}^q (y_ij − μ).

Indeed, the maximizer of cℓ(μ) is μ̃ = Σ_{i=1}^n Σ_{j=1}^q y_ij/(nq), which is also the maximizer of the full likelihood.
The sampling distribution of μ̃ is exact and not merely asymptotic. Therefore, when σ² and ρ are known, using the weight function w(μ) corresponding to the N(μ0, σ0²) density, the mixture confidence sequence for μ with exact persistence level 1 − α is

μ̃ ± v { log((σ0² + v²)/(α² v²)) + (μ̃ − μ0)²/(σ0² + v²) }^{1/2},    (22)

where v = √(σ²{1 + (q − 1)ρ}/(nq)). The sequence is obtained by increasing n for a fixed q. The parameter μ is orthogonal to the block of parameters (σ², ρ). This entails that cℓ(μ), cf. (21), with σ² and ρ substituted by consistent estimates, has the same asymptotic properties as a profile likelihood. In particular, the optimality of the estimating function cu(μ) with estimated σ² and ρ is preserved. Moment estimates of σ² and ρ are based on the within-unit and between-unit sums of squares and have the expressions given in Searle, Casella & McCulloch (1992, Section 3.5).
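For this example, everything needed for sequence (22) is available in closed form, so the whole sequence is a few lines of code. A sketch of our own (function names are ours; replacing `v_exact` by its plug-in version with σ̃² and ρ̃ yields sequence (23)):

```python
import math

def v_exact(n, q, sigma2, rho):
    """Exact standard deviation of the grand mean under the equicorrelated
    model: v = sqrt(sigma2 * (1 + (q - 1) * rho) / (n * q))."""
    return math.sqrt(sigma2 * (1.0 + (q - 1) * rho) / (n * q))

def interval_22(mu_tilde, n, q, sigma2, rho, alpha=0.2, mu0=0.0, tau2=1.0):
    """Mixture interval (22) for mu, exact when sigma2 and rho are known,
    with an N(mu0, tau2) weight function."""
    v2 = v_exact(n, q, sigma2, rho) ** 2
    half = math.sqrt(v2 * (math.log((v2 + tau2) / (v2 * alpha ** 2))
                           + (mu_tilde - mu0) ** 2 / (v2 + tau2)))
    return mu_tilde - half, mu_tilde + half
```

Note that a positive within-unit correlation ρ inflates v relative to the value √(σ²/(nq)) obtained from nq independent observations, widening every interval of the sequence accordingly.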

Sequence (23) is obtained from (22) by replacing v with ṽ = √(σ̃²{1 + (q − 1)ρ̃}/(nq)). Finally, a simulation to investigate the conjecture that, for conveniently large sample sizes, the confidence sequence (23) has approximately persistence level 1 − α is in order. For 10,000 replications, sequences of samples with n from n_min to n_max have been generated with n_min = 25 and n_max = 10,000. The behaviour along the sequences of intervals (22) and (23) has been observed, with the aim of detecting incompatibilities and uncoverages, at various nominal persistence levels 1 − α and with q = 5. The results are displayed in Table 6. The exact intervals look very conservative. The intervals with estimated v show some more incompatibilities and uncoverages.

CONCLUSIONS
In this article, we have dealt with a concept of replicability according to which, under the assumed statistical model, the current confidence region and the regions from arbitrarily enlarged samples have a large enough probability of overlapping. The definition of the persistence level 1 − α makes this idea precise. We have emphasized, as a means to reach this end, the mixture confidence sequences described in Robbins (1970). An advantage of mixture confidence sequences is their justification under various views of inference. The price to pay for controlling the probability of sequence-wise overlapping of confidence regions is that wider regions are needed in comparison with the usual confidence regions with the same confidence level. These results are exact and refer to using the full likelihood as the grounds of parametric inference. Using an asymptotic normal approximation for the estimator of a scalar parameter of interest, approximate closed-form confidence sequences are easily calculated. This article has explored such a shortcut to extend the application of approximate confidence sequences to composite likelihoods. Simulation results support the conjecture that confidence sequences obtained in this way, with nominal persistence level 1 − α, approximately maintain the guarantee of sequence-wise compatibility.
Although all the examples in the article consider a scalar parameter of interest, the general construction based on (2) applies naturally to a multiparameter setting. Moreover, the approximate expression based on asymptotic normality of the estimator of the parameter extends easily to a vector parameter of interest using a Wald-type statistic.
Directions of future investigation include the efficient computation of confidence sequences when the estimator has no closed-form expression and is the solution of an estimating equation. Focusing on estimating equations might also overcome the lack of parameterization equivariance of the approximate solution considered here.