1 Introduction

In survival analysis, missing data are a central notion, as full observability of all units is rarely achievable. Consequently, statistical methods and approaches are designed to consider the various types of missing information to obtain valid inference. The situation we address in the following is at the intersection of survival analysis and retrospective studies, where truncated data result from sampling only failed units within a certain data collection period.

Example 1

Consider personal insolvencies in Germany (see also Sect. 5). When a person becomes insolvent in Germany, individual information on the person is made available due to legal requirements. Since the information is accessible only for a limited time through a public website, only units whose insolvency event occurs within the data collection period can be observed. We have collected data from this source from March 2018 to December 2018 and sampled data on insolvent persons born between 1920 and 2000 in all 16 federal states of Germany. Figure 1 illustrates this truncation mechanism.

Fig. 1

Illustration of the truncation mechanism. Only units that fail within the data collection period are sampled (solid circles). Units that fail before or after the data collection period (empty circles) are not sampled

This sampling design is a special case of random double truncation, as the lifetimes are truncated if they are either too short or too long. Random double truncation may occur as an unavoidable characteristic of data collection procedures or may be chosen deliberately as a cost-effective alternative to other sampling designs. In the example above, the stochastic process of births over time constitutes a specific component of the truncation setup, which we address here.

Random double truncation is relevant to many areas including medicine (see Lagakos et al. 1988; Scheike and Keiding 2006; Moreira et al. 2014), astronomy (see Efron and Petrosian 1999), economics and engineering (see Lawless and Kalbfleisch 1992; Ye and Tang 2016). In recent years, attention to this area of research has been growing (see Emura and Konno 2012; Zhang 2015; Emura et al. 2017; Dörre and Emura 2019; de Uña-Álvarez 2020). This is partly due to the fact that administrative and economically collected data are often observational and frequently subject to selective criteria or constraints which can be posed in a truncation framework.

Among the various existing models and methods for doubly truncated data that have been investigated, the nonparametric maximum likelihood estimator under random double truncation (NPMLE, see Shen 2010a; Moreira and Uña-Álvarez 2010; Emura et al. 2015) has perhaps been studied most extensively. The NPMLE is generally applicable under double truncation, but can be computationally demanding and requires caution regarding identifiability aspects (see Shen 2010a) and the existence of the estimator (see Xiao and Hudgens 2019).

Parametric models have been developed partly to address the drawbacks of the nonparametric approach (see Emura et al. 2017; Dörre and Emura 2019). In Dörre (2020), fully parametric models have been studied in a setting similar to the one studied here. Finally, semi-parametric models (see Wang 1989; Shen 2010b) have been proposed in the context of double truncation.

Most existing models for doubly truncated data are primarily designed for homogeneous populations and cannot readily be extended to heterogeneous populations in an efficient manner. However, heterogeneity may be an inherent feature of survival data. For instance, in the example described above, it cannot be assumed that the underlying lifetime distributions of people in different federal states are identical. While existing approaches such as the NPMLE could be applied to each subpopulation separately, this procedure would not be efficient if the subpopulations share common structural properties.

By incorporating available information on the structure of the study population, we derive a semiparametric method for survival data under double truncation. The survival data are modeled as heterogeneous in that the population may consist of separate subpopulations with potentially different lifetime distributions.

In Sect. 2, we specify the sampling setup and parameter space, and derive a corresponding likelihood model in order to define proper maximum likelihood estimates. By reconsidering the model structure, we address asymptotic properties and discuss notions of point processes and inverse probability weighting in Sect. 3. In Sect. 4, finite-sample properties of the proposed maximum likelihood estimates and numerical optimization methods are examined by means of simulation studies. We apply the methods to study personal insolvencies in Germany in Sect. 5.

2 Model and methods

2.1 General framework

We suppose that the units under investigation experience two subsequent events, generically called ’birth’ and ’failure’, throughout time. The meaning of these events may differ depending on the particular context; together they constitute a survival analysis setup, where the ’lifetime’ is defined as the duration between birth and failure. Furthermore, we suppose that the population of interest is composed of a finite number \(m\) of distinct subpopulations, which may exhibit different lifetime distributions.

Units emerge throughout time according to stochastic processes, yielding random subpopulation sizes \(N_j, j=1,\ldots ,m\). Each unit is born at a random time \(b\) and has a random lifetime \(y\). The corresponding failure event therefore occurs at \(b+ y\). The stochastic process describing the birth of units implicitly corresponds to a distribution \(G(\cdot )\) of births in the population.

Due to the sampling setup, only units that fail within the data collection interval \([\tau ^{\mathrm {L}}, \tau ^{\mathrm {R}}]\) are observed. Therefore, any unit fulfilling the selection criterion

$$\begin{aligned} \tau ^{\mathrm {L}}\leqslant b+ y\leqslant \tau ^{\mathrm {R}}\end{aligned}$$
(1)

is sampled, and for sampled units we assume that full information is available, i.e. birth \(b\), lifetime \(y\) and the variable \(x\in \{1,\ldots ,m\}\) indicating the subpopulation to which the unit belongs (see Fig. 2). Units not fulfilling this selection criterion are truncated in the sense that no information about those units is available.

The sample consists of \(n\leqslant N\) units, where \(N = N_1 + \cdots + N_{m}\) denotes the (unobserved) total population size, and is subdivided into random subsamples of sizes \(n_j, j=1, \ldots , m\), according to the considered subpopulations. Note that the subpopulation sizes \(N_j\) are unobservable, whereas the subsample sizes \(n_j\) are observed. We denote the data as \(D_{n}= \{(b_i, x_i, y_i): i =1, \ldots , n\}\).

Throughout, we make the following assumptions:

  1. (A)

    For each \(j=1,\ldots ,m\), the units in the jth subpopulation emerge independently of all other subpopulations and stochastically within a common fixed time interval \([s,t]\) according to a Poisson process with intensity function \(b\mapsto \lambda _j(b) > 0, \ b\in [s,t]\).

  2. (B)

    The intensity functions are proportional, i.e. there is a function \(b\mapsto \lambda _0(b)\) and \(a_j > 0, j= 1, \ldots , m\), such that \(\lambda _j(b) = a_j \lambda _0(b)\).

  3. (C)

    The lifetime distributions are continuous and parametric, i.e. for each \(j=1,\ldots ,m\), there is a true parameter \(\varvec{\theta }_j\) such that \(f_j(\cdot |\varvec{\theta }_j)\) is the density function of \(y\) conditional on \(x= j\).

  4. (D)

    The truncation mechanism is such that a unit \((b, x, y)\) from the population is sampled if and only if it fails within \([\tau ^{\mathrm {L}}, \tau ^{\mathrm {R}}]\), i.e. if \(\tau ^{\mathrm {L}}\leqslant b+ y\leqslant \tau ^{\mathrm {R}}\).

  5. (E)

    The lifetimes \(y\) and births \(b\) are independent.

The notation introduced in Assumption (C) reflects that different subpopulations may possess different distribution classes. Furthermore, we write \(\varvec{\theta }= (\varvec{\theta }_1, \ldots , \varvec{\theta }_{m})\), where needed. Regarding Assumption (B), we generally set \(a_1 = 1\), such that \(\lambda _0(\cdot ) \equiv \lambda _1(\cdot )\), to avoid non-identifiability of the proportionality factors \(a_j, j=2, \ldots ,m\).
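
The following sketch illustrates how data arise under Assumptions (A)–(E) for a small hypothetical example with \(m=2\) subpopulations, a constant baseline intensity and exponential lifetimes; all numerical values are illustrative assumptions and are not taken from the simulation study or the application.

```python
import numpy as np

rng = np.random.default_rng(1)

s, t = 0.0, 8.0                   # birth interval [s, t]
tau_L, tau_R = 8.0, 10.0          # data collection period [tau_L, tau_R]
a = np.array([1.0, 1.5])          # proportionality factors, a_1 = 1 (Assumption (B))
Lambda_0 = 300.0                  # cumulative baseline intensity over [s, t]
theta = np.array([0.4, 0.15])     # exponential lifetime rates (Assumption (C))

data = []
for j, (a_j, theta_j) in enumerate(zip(a, theta), start=1):
    # Assumption (A): Poisson number of births with cumulative intensity a_j * Lambda_0;
    # a constant intensity makes the birth times uniform on [s, t]
    N_j = rng.poisson(a_j * Lambda_0)
    b = rng.uniform(s, t, size=N_j)
    y = rng.exponential(1.0 / theta_j, size=N_j)       # Assumption (E): independent of b
    keep = (tau_L <= b + y) & (b + y <= tau_R)          # Assumption (D): criterion (1)
    data += [(b_i, j, y_i) for b_i, y_i in zip(b[keep], y[keep])]

print("observed sample size n =", len(data))
```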

Assumption (E) is necessary in some derivations below. In particular, this assumption requires that the lifetime distribution in any subpopulation does not change over the course of time. In specific applications, this assumption may be questionable. For testing the (quasi-)independence assumption, several approaches have been studied; see Tsai (1990) and Martin and Betensky (2005). Recent papers which address the aspect of the independence assumption within the truncation context are Chiou et al. (2018, 2019) and Moreira et al. (2021).

Fig. 2

The general framework of the sampling setup. Units emerge in subpopulations according to a Poisson process with proportional intensity functions. Depending on the individual time of birth \(b\) and lifetime \(y\), a unit is sampled if it fulfills the selection criterion \(\tau ^{\mathrm {L}}\leqslant b+ y\leqslant \tau ^{\mathrm {R}}\)

2.2 Proportional birth intensity model

The distribution function of a randomly chosen birth \(b\) generated by a Poisson process with intensity function \(\lambda (\cdot )\) is equal to

$$\begin{aligned} G(b|\lambda (\cdot )) = \frac{\int _{s}^{b} \lambda (c) \, \mathrm{d}c^{}}{\int _{s}^{t} \lambda (c) \, \mathrm{d}c^{}}, \quad b\in [s, t]. \end{aligned}$$
(2)

By Assumption (B) and Eq. (2), it follows that there is a common birth distribution for all considered subpopulations, such that the birth distribution functions of the subpopulations fulfill \(G_j \equiv G\) for all \(j=1, \ldots , m\).

By Assumptions (A) and (D), the observed units of any subpopulation j form a random point process with regard to the observed births \(b_i\) where \(x_i = j\). It can be shown that the observed point process of each subpopulation j is also a Poisson process. Furthermore, the sum of all observed birth processes is a Poisson process, too, by the independence stated in Assumption (A). For any subpopulation \(j=1,\ldots ,m\), we denote the cumulative intensity of the latent birth process as \(\varLambda _j := \int _{s}^{t} \lambda _j(b) \, \mathrm{d}b^{}\) and the cumulative intensity of the observed birth process as \(\varLambda ^{\mathrm {obs}}_j := \int _{s}^{t} \int _{\mathbb {R}}^{} \lambda _j(b) f_j(y|\varvec{\theta }_j)\mathrm {I}(\tau ^{\mathrm {L}}\leqslant b+ y\leqslant \tau ^{\mathrm {R}}) \, \mathrm{d}y^{} \, \mathrm{d}b^{}\).

The probability of observing one unit, conditional on it belonging to subpopulation j with given birth time \(b\), denoted as \(\pi _j(\varvec{\theta }_j|b)\), depends on the lifetime distribution of the corresponding subpopulation, and is equal to

$$\begin{aligned} \pi _j(\varvec{\theta }_j|b)= \int _{\tau ^{\mathrm {L}}- b}^{\tau ^{\mathrm {R}}- b} f_j(y|\varvec{\theta }_j) \, \mathrm{d}y^{}. \end{aligned}$$

The intensity function of the observed point process of subpopulation j is equal to

$$\begin{aligned} \lambda ^{\mathrm {obs}}_j(b) = \pi _j(\varvec{\theta }_j|b)\lambda _j(b), b\in [s,t]. \end{aligned}$$
(3)

Similarly, \(\varLambda ^{\mathrm {obs}}_j = \varLambda _j \overline{\pi }_j(\varvec{\theta }_j|G)\), where

$$\begin{aligned} \overline{\pi }_j(\varvec{\theta }_j|G)= \int _{s}^{t} \pi _j(\varvec{\theta }_j|b) \, \mathrm{d}G(b)^{} \end{aligned}$$
(4)

is the probability of observing a random unit from subpopulation j, given birth distribution function G. To see this, note that

$$\begin{aligned} \varLambda ^{\mathrm {obs}}_j&= \int _{s}^{t} \int _{\mathbb {R}}^{} \lambda _j(b) f_j(y|\varvec{\theta }_j)\mathrm {I}(\tau ^{\mathrm {L}}\leqslant b+ y\leqslant \tau ^{\mathrm {R}}) \, \mathrm{d}y^{} \, \mathrm{d}b^{} \\&= \int _{s}^{t} \lambda _j(b) \int _{\mathbb {R}}^{} f_j(y|\varvec{\theta }_j)\mathrm {I}(\tau ^{\mathrm {L}}\leqslant b+ y\leqslant \tau ^{\mathrm {R}}) \, \mathrm{d}y^{} \, \mathrm{d}b^{} \\&= \int _{s}^{t} \lambda _j(b) \int _{\tau ^{\mathrm {L}}- b}^{\tau ^{\mathrm {R}}-b} f_j(y|\varvec{\theta }_j) \, \mathrm{d}y^{} \, \mathrm{d}b^{} \\&= \int _{s}^{t} \lambda _j(b) \pi _j(\varvec{\theta }_j|b) \, \mathrm{d}b^{} \end{aligned}$$

Using Eq. (2), it follows that \(\lambda _j(b) \mathrm{d}b= \varLambda _j \mathrm{d}G(b)\), and therefore

$$\begin{aligned} \varLambda ^{\mathrm {obs}}_j&= \int _{s}^{t} \pi _j(\varvec{\theta }_j|b)\varLambda _j \, \mathrm{d}G(b)^{} = \varLambda _j \overline{\pi }_j(\varvec{\theta }_j|G). \end{aligned}$$
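
As a concrete illustration of \(\pi _j(\varvec{\theta }_j|b)\) and \(\overline{\pi }_j(\varvec{\theta }_j|G)\), the following sketch evaluates both quantities for exponential lifetimes and a discrete birth distribution; the distributional choice and all numerical values are illustrative assumptions.

```python
import numpy as np

def pi_j(theta_j, b, tau_L, tau_R):
    """P(tau_L <= b + y <= tau_R) for exponential lifetimes with rate theta_j,
    i.e. the integral of f_j over [tau_L - b, tau_R - b] intersected with [0, inf)."""
    lo = np.maximum(tau_L - b, 0.0)
    hi = np.maximum(tau_R - b, 0.0)
    return np.exp(-theta_j * lo) - np.exp(-theta_j * hi)

def pi_bar_j(theta_j, births, dG, tau_L, tau_R):
    """Eq. (4): average observation probability under a discrete birth distribution
    with probability weights dG at the given birth times."""
    return np.sum(pi_j(theta_j, births, tau_L, tau_R) * dG)

births = np.linspace(0.0, 8.0, 50)      # illustrative support points in [s, t]
dG = np.full(50, 1.0 / 50)              # uniform probability weights
print(pi_bar_j(0.4, births, dG, tau_L=8.0, tau_R=10.0))
```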

By \(h(\cdot )\), we denote the density function of \(x\), i.e. the probability of a unit from the population belonging to a specific subpopulation. It is a consequence of Assumption (B) that this density function only depends on \({\varvec{a}}= (a_1, \ldots , a_{m})\), representing a multinomial experiment with

$$\begin{aligned} h(j|{\varvec{a}}) = \frac{a_j}{\sum _{k=1}^m a_k}, \quad j = 1, \ldots , m. \end{aligned}$$
(5)

Furthermore, the sum of all latent birth processes itself forms a Poisson process, and has cumulative intensity \(\varLambda _1 \sum _{j=1}^m a_j\). Similarly, it holds that the sum of all observed birth processes is a Poisson process, too, for which the cumulative intensity is equal to \(\varLambda _1 \sum _{j=1}^m a_j \overline{\pi }_j(\varvec{\theta }_j|G)\). The probability of observing a single random unit from the population is equal to

$$\begin{aligned} \pi (G, {\varvec{a}}, \varvec{\theta })&= \sum _{j=1}^m h(j|{\varvec{a}}) \overline{\pi }_j(\varvec{\theta }_j|G)= \frac{\sum _{j=1}^m a_j\overline{\pi }_j(\varvec{\theta }_j|G)}{\sum _{j=1}^m a_j}, \end{aligned}$$
(6)

where \(\overline{\pi }_j(\varvec{\theta }_j|G)\) is defined as in Eq. (4). Finally, we state the following result, which is fundamental for constructing the likelihood function.

Lemma 1

Under Assumptions (A)–(E), the point process \(\eta (b, x, y) := \sum _{i=1}^{n} \mathrm {I}(b_i \leqslant b, x_i \leqslant x, y_i \leqslant y)\), which counts the observed units in the three-dimensional space \([s,t]\times \{1, \ldots , m\} \times \mathbb {R}_{\geqslant 0}\), is a Poisson process with intensity function

$$\begin{aligned} \mu (b, x, y) = \pi (G, {\varvec{a}}, \varvec{\theta })a_{x} \lambda _1(b) f_{x}(y|\varvec{\theta }_{x}) \mathrm {I}(\tau ^{\mathrm {L}}\leqslant b+ y\leqslant \tau ^{\mathrm {R}}). \end{aligned}$$

The cumulative intensity is equal to \(\pi (G, {\varvec{a}}, \varvec{\theta }) \varLambda _1 \sum _{j=1}^{m} a_j\).

Proof

It can be shown by elementary calculations that \(\eta (\cdot )\) is Poisson distributed with the corresponding intensity. Assumption (A) ensures independent increments of \(\eta (\cdot )\). A detailed proof is omitted. \(\square \)

2.3 Likelihood function

Under Assumptions (A)–(E), the observed Poisson process has the density function (see Snyder and Miller 1991 and Theorem 3.1.1 in Reiss 1993):

$$\begin{aligned} f(D_{n}|G, {\varvec{a}}, \varvec{\theta }) = \left( \prod _{i=1}^n a_{x_i} \lambda _1(b_i) f_{x_i}(y_i|\varvec{\theta }_{x_i})\right) \exp \left\{ - \pi (G, {\varvec{a}}, \varvec{\theta })\varLambda _1 \sum _{j=1}^m a_j\right\} \end{aligned}$$
(7)

Note that Eq. (7) does not consist of iid factors and is therefore not ideal for likelihood inference. By conditioning on the random sample size \(n\), however, we construct a likelihood function with proper iid terms. For more technical details in a similar model, see Dörre (2020).

To estimate the birth distribution function G nonparametrically, we place positive probability weights precisely on all observed births \(b_1, \ldots , b_n\). Specifically, we set \(\varDelta G_i = \mathrm {P}(b= b_i)\) for each \(i=1,\ldots ,n\), and treat \(\varDelta G_i\) as parameters of the unknown birth distribution. The corresponding birth distribution results as \(G(b) = \sum _{i=1}^{n} \varDelta G_i \cdot \mathrm {I}(b_i \leqslant b)\) for \(b\in [s,t]\). By conditioning Eq. (7) on the random sample size \(n\), we obtain the corresponding likelihood function

$$\begin{aligned} L(G, {\varvec{a}}, \varvec{\theta }|D_{n}) = \prod _{i=1}^{n} \frac{\varDelta G_i \cdot h(x_i|{\varvec{a}}) f_{x_i}(y_i|\varvec{\theta }_{x_i})}{\pi (G, {\varvec{a}}, \varvec{\theta })}, \end{aligned}$$
(8)

where \(\varDelta G= (\varDelta G_1, \ldots , \varDelta G_{n}) \in [0,1]^{n}\) with \(\sum _{i=1}^{n} \varDelta G_i = 1\), \({\varvec{a}}= (a_1, \ldots , a_{m}) \in \mathbb {R}_{> 0}^{m}\) with \(a_1 = 1\) and \(\varvec{\theta }= (\varvec{\theta }_1, \ldots , \varvec{\theta }_{m}) \in \varTheta _1 \times \cdots \times \varTheta _{m}\). We define the maximum likelihood estimator \((\widehat{G}^{(n)}, \varvec{\widehat{a}}, \varvec{\widehat{\varvec{\theta }}})\) as the maximizer of Eq. (8), i.e.

$$\begin{aligned} L(\widehat{G}^{(n)}, \varvec{\widehat{a}}, \varvec{\widehat{\varvec{\theta }}}|D_{n}) = \max L(G, {\varvec{a}}, \varvec{\theta }|D_{n}). \end{aligned}$$

Altogether, this model entails \(n+ m+ \sum _{j=1}^{m} \dim (\varTheta _j) - 2\) parameters, not counting \(a_1\) and \(\varDelta G_{n}\), which are fixed by the constraints. The model is semiparametric in the sense that the number of parameters increases linearly with growing sample size. The likelihood function Eq. (8) can be rearranged as

$$\begin{aligned} L(G, {\varvec{a}}, \varvec{\theta }|D_{n}) = \pi (G, {\varvec{a}}, \varvec{\theta })^{-n} \prod _{i=1}^{n} \varDelta G_i \prod _{j=1}^{m} \left\{ \left( \frac{a_j}{\sum _{k=1}^{m} a_k} \right) ^{n_j} \cdot \prod _{i: x_i = j} f_j(y_i|\varvec{\theta }_j)\right\} , \end{aligned}$$

so that the log-likelihood function can be written as

$$\begin{aligned} \log L(G, {\varvec{a}}, \varvec{\theta }|D_{n})&= - n\log \pi (G, {\varvec{a}}, \varvec{\theta })+ \sum _{i=1}^{n} \log \varDelta G_i - n\log \left( \sum _{j=1}^{m} a_j\right) \\&\quad + \sum _{j=1}^{m} \left\{ n_j \log a_j+ \sum _{i: x_i = j} \log f_j(y_i|\varvec{\theta }_j)\right\} , \end{aligned}$$

where \(\pi (G, {\varvec{a}}, \varvec{\theta })\) is obtained as in Eq. (6) and, following Eq. (4),

$$\begin{aligned} \overline{\pi }_j(\varvec{\theta }_j|G)&= \sum _{i=1}^{n} \pi _j(\varvec{\theta }_j|b_i)\varDelta G_i \end{aligned}$$

for any distribution G with discrete jumps \(\varDelta G_i\) at \(b_1, \ldots , b_n\).
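
For illustration, the log-likelihood above can be evaluated directly once a parametric lifetime family is fixed. The following sketch does so for exponential lifetimes; the function and its arguments are an illustrative assumption, not the implementation used for the results below.

```python
import numpy as np
from scipy.stats import expon

def pi_j(theta_j, b, tau_L, tau_R):
    # observation probability for exponential lifetimes (as in the sketch above)
    return (np.exp(-theta_j * np.maximum(tau_L - b, 0.0))
            - np.exp(-theta_j * np.maximum(tau_R - b, 0.0)))

def log_lik(dG, a, theta, b, x, y, tau_L, tau_R):
    """Log-likelihood of Eq. (8) in its rearranged form; dG sums to one, a[0] = 1."""
    n, m = len(y), len(a)
    pi_bar = np.array([np.sum(pi_j(theta[j], b, tau_L, tau_R) * dG) for j in range(m)])  # Eq. (4)
    pi_all = np.sum(a * pi_bar) / np.sum(a)                                              # Eq. (6)
    ll = -n * np.log(pi_all) + np.sum(np.log(dG)) - n * np.log(np.sum(a))
    for j in range(m):
        in_j = (x == j + 1)                      # units observed in subpopulation j + 1
        ll += in_j.sum() * np.log(a[j]) + np.sum(expon.logpdf(y[in_j], scale=1.0 / theta[j]))
    return ll
```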

Note that the likelihood function Eq. (8) represents an inverse probability weighting (IPW) approach, where the probability weights \(\pi (G, {\varvec{a}}, \varvec{\theta })\) are determined endogenously and equally for every observed unit. By conditioning on certain parts of the data, other inverse probability weighting likelihood functions are conceivable under the considered truncation scheme. For instance, for any \(j = 1, \ldots , m\), an estimator \(\widehat{\varvec{\theta }}{}_j^c\) of \(\varvec{\theta }_j\) may be defined as the maximizer of

$$\begin{aligned} {\widetilde{L}}_j(\varvec{\theta }_j|D_{n}) := \prod _{i: x_i = j} \frac{f_j(y_i|\varvec{\theta }_j)}{\pi _j(\varvec{\theta }_j|b_i)}, \end{aligned}$$
(9)

and \({\widetilde{L}}_j\) is valid in the sense that \(\widehat{\varvec{\theta }}{}_j^c\rightarrow \varvec{\theta }_j\) as \(n_j \rightarrow \infty \), which is not proved here. Using Eq. (8) has the advantage of leading to generally more precise estimates due to the stronger assumptions with regard to the birth distribution, while Eq. (9) has the advantage of not being vulnerable to misspecification with regard to the lifetime distributions in subpopulations other than j or the birth distribution G.
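
As an illustration, the conditional likelihood in Eq. (9) can be maximized separately for each subpopulation with a one-dimensional optimizer whenever the lifetime family has a single parameter. The following sketch assumes exponential lifetimes; the data arrays and optimization bounds are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_Ltilde_j(theta_j, b, y, tau_L, tau_R):
    """Negative logarithm of Eq. (9) for exponential lifetimes in one subpopulation."""
    pi = (np.exp(-theta_j * np.maximum(tau_L - b, 0.0))
          - np.exp(-theta_j * np.maximum(tau_R - b, 0.0)))
    log_f = np.log(theta_j) - theta_j * y
    return -np.sum(log_f - np.log(pi))

# illustrative observed births and lifetimes of subpopulation j (all satisfy criterion (1))
b_j = np.array([6.5, 7.2, 7.8, 5.9, 7.0])
y_j = np.array([2.1, 1.4, 0.9, 3.0, 1.8])
res = minimize_scalar(neg_log_Ltilde_j, bounds=(1e-3, 5.0), method="bounded",
                      args=(b_j, y_j, 8.0, 10.0))
print("conditional estimate of theta_j:", res.x)
```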

2.4 Birth intensity estimation

The proposed likelihood model Eq. (8) can be regarded as an inverse probability weighted approach, in which every sampled unit is weighted by the inverse probability of observing a single random unit from the population. Roughly speaking, every sampled unit corresponds to \(\pi (G, {\varvec{a}}, \varvec{\theta })^{-1}\) hypothetical units in the population, out of which one single unit fulfills the selection criterion in Eq. (1) and has thus been sampled. Similarly, a sampled unit from the jth subpopulation with birth \(b_i\) corresponds to \(\pi _j(\varvec{\theta }_j|b_i)^{-1}\) hypothetical units before the selection. Based on this perspective and following Eq. (3), we define

$$\begin{aligned} \widehat{\varLambda }_j := \sum _{i: x_i = j} \pi _j(\varvec{\theta }_j|b_i)^{-1} \end{aligned}$$
(10)

as an estimate of the cumulative birth intensity of the jth subpopulation.

The IPW perspective also offers an alternative approach to estimate the proportionality factors \(a_2, \ldots , a_{m}\). Due to the proportional intensity assumption, it holds that \(\varLambda _j = a_j \varLambda _1\) for \(j=2, \ldots , m\). One simple way to estimate the coefficients \(a_2, \ldots , a_{m}\) is therefore to use Eq. (10), by which

$$\begin{aligned} \widehat{a}_j^{\mathrm {simple}}= \frac{\widehat{\varLambda }_j}{\widehat{\varLambda }_1} = \frac{\sum _{i: x_i = j} \pi _j(\varvec{\widehat{\theta }}_j|b_i)^{-1}}{\sum _{i: x_i = 1} \pi _1(\varvec{\widehat{\theta }}_1|b_i)^{-1}} \end{aligned}$$
(11)

is obtained. Note also that this approach is feasible whenever a consistent estimate of \(\varvec{\theta }\) is available, and no estimate of G is required. We show in Sect. 4 that this method leads to valid, yet inefficient estimates.
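
The quantities in Eq. (10) and Eq. (11) are straightforward to compute once estimates of the lifetime parameters are available. The sketch below assumes exponential lifetimes and subpopulation-wise arrays of observed births; both are illustrative assumptions.

```python
import numpy as np

def pi_j(theta_j, b, tau_L, tau_R):
    # observation probability for exponential lifetimes (as in the earlier sketches)
    return (np.exp(-theta_j * np.maximum(tau_L - b, 0.0))
            - np.exp(-theta_j * np.maximum(tau_R - b, 0.0)))

def Lambda_hat(theta_hat_j, births_j, tau_L, tau_R):
    """Eq. (10): cumulative birth intensity estimate for one subpopulation."""
    return np.sum(1.0 / pi_j(theta_hat_j, births_j, tau_L, tau_R))

def a_hat_simple(theta_hat, births_by_subpop, tau_L, tau_R):
    """Eq. (11): proportionality factors a_2, ..., a_m relative to subpopulation 1."""
    Lam = [Lambda_hat(th, bj, tau_L, tau_R) for th, bj in zip(theta_hat, births_by_subpop)]
    return np.array(Lam[1:]) / Lam[0]

# illustrative usage with two subpopulations
births_by_subpop = [np.array([6.5, 7.2, 7.8]), np.array([5.9, 7.0, 7.6, 6.8])]
print(a_hat_simple([0.4, 0.15], births_by_subpop, 8.0, 10.0))
```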

3 Large-sample properties

The asymptotic behaviour of the proposed estimators is studied with the subsample sizes growing indefinitely. We note, however, that \(n_j \rightarrow \infty \) if and only if \(\varLambda _j \rightarrow \infty \), and thus asymptotics cannot be studied without changing the birth intensity function. As a consequence of this perspective, it is not possible to estimate \(\varLambda _j\) consistently in the sense that \(|\widehat{\varLambda }_j - \varLambda _j|\) converges to 0 for \(n\rightarrow \infty \). Nevertheless, Eq. (2) shows that the birth distribution G, which we estimate, is invariant with respect to scaling the birth intensity function \(\lambda \), as long as its shape is not changed.

Due to Assumptions (A)–(E), \(n_j\) is Poisson distributed with mean \(\varLambda ^{\mathrm {obs}}_j = \varLambda _j \overline{\pi }_j(\varvec{\theta }_j|G)\). Therefore, when \(\varLambda _j \rightarrow \infty \), it holds that \(n_j \rightarrow \infty \), almost surely, and the expression

$$\begin{aligned} n_j^{-1} \sum _{i: x_i = j} \pi _j(\varvec{\theta }_j|b_i)^{-1} \end{aligned}$$

converges to \(E(\pi _j(\varvec{\theta }_j|b)^{-1}|x= j)\) by the law of large numbers, where \(E(\cdot )\) is the conditional expectation with respect to the law of observed units in subpopulation j. It holds that

$$\begin{aligned} E(\pi _j(\varvec{\theta }_j|b)^{-1}|x= j) = \int _{s}^{t} \pi _j(\varvec{\theta }_j|b)^{-1} \, \mathrm{d}G^{\mathrm {obs}}_j(b)^{}, \end{aligned}$$

where

$$\begin{aligned} \mathrm{d}G^{\mathrm {obs}}_j(b) = \frac{\pi _j(\varvec{\theta }_j|b)\lambda _j(b)}{\int _{s}^{t} \pi _j(\varvec{\theta }_j|c)\lambda _j(c) \, \mathrm{d}c^{}} = \pi _j(\varvec{\theta }_j|b)\lambda _j(b)/\varLambda ^{\mathrm {obs}}_j \end{aligned}$$

is the density function of observed random births \(b\) in subpopulation j. Therefore,

$$\begin{aligned} E(\pi _j(\varvec{\theta }_j|b)^{-1}|x= j) = \frac{1}{\varLambda ^{\mathrm {obs}}_j} \int _{s}^{t} \lambda _j(b) \, \mathrm{d}b^{} = \frac{\varLambda _j}{\varLambda ^{\mathrm {obs}}_j}, \end{aligned}$$

and thus

$$\begin{aligned} \lim _{\varLambda _j \rightarrow \infty } \frac{\widehat{\varLambda }_j}{\varLambda _j} = \lim _{\varLambda _j \rightarrow \infty } \frac{n_j}{\varLambda ^{\mathrm {obs}}_j} = 1, \end{aligned}$$

since \(n_j\) is Poisson distributed with parameter \(\varLambda ^{\mathrm {obs}}_j\). This derivation shows that \(\widehat{\varLambda }_j\) as defined in Eq. (10) is a reasonable estimate of \(\varLambda _j\). Consequently, \(\widehat{a}_j^{\mathrm {simple}}\) as defined in Eq. (11) is expected to be a reasonable estimate of \(a_j\).

To study the large-sample properties of the proposed estimators based on Eq. (8), we consider the asymptotic behaviour of the log-likelihood function. Define

$$\begin{aligned} Q_{n}(G, {\varvec{a}}, \varvec{\theta }) := \frac{1}{n} \sum _{i=1}^{n} \left\{ \log f_{x_i}(y_i|\varvec{\theta }_{x_i})+ \log h(x_i|{\varvec{a}}) + \log \varDelta G_i - \log \pi (G, {\varvec{a}}, \varvec{\theta })\right\} \end{aligned}$$

as the stochastic criterion function, which is maximized by the maximum likelihood estimator \((\widehat{G}^{(n)}, \varvec{\widehat{a}}, \varvec{\widehat{\varvec{\theta }}})\). Suppose first that the true birth distribution G is known. A conditional criterion function can be defined as

$$\begin{aligned} Q_{n}({\varvec{a}}, \varvec{\theta }|G) := \frac{1}{n} \sum _{i=1}^{n} \left\{ \log f_{x_i}(y_i|\varvec{\theta }_{x_i})+ \log h(x_i|{\varvec{a}}) - \log \pi (G, {\varvec{a}}, \varvec{\theta }) \right\} \end{aligned}$$

and it holds due to the law of large numbers that

$$\begin{aligned} Q_{\infty }({\varvec{a}}, \varvec{\theta }|G) := \lim _{n\rightarrow \infty } Q_{n}({\varvec{a}}, \varvec{\theta }|G) = E_{{\varvec{a}}, \varvec{\theta }} \left[ \log \left( \frac{f_{x}(y|\varvec{\theta }_{x})h(x|{\varvec{a}})}{\pi (G, {\varvec{a}}, \varvec{\theta })} \right) \right] , \end{aligned}$$
(12)

almost surely, where \(E_{{\varvec{a}}, \varvec{\theta }}\) denotes the expectation with respect to the distribution conditional on \(\tau ^{\mathrm {L}}\leqslant b+ y\leqslant \tau ^{\mathrm {R}}\). For known G, the density function of an observed random pair \((x, y)\) under truncation is precisely \(f_{x}(y|\varvec{\theta }_{x})h(x|{\varvec{a}})/\pi (G, {\varvec{a}}, \varvec{\theta })\), such that Eq. (12) equals, up to a constant additive term, the negative Kullback–Leibler divergence with respect to \(({\varvec{a}}, \varvec{\theta })\). It is well known that this expression is uniquely maximized at the true parameter values \(({\varvec{a}}, \varvec{\theta })\) under appropriate regularity assumptions (see Lemma 5.35 in van der Vaart 1998).

Now, more generally, assume that some consistent estimator of G is available, i.e. an estimator \(\widehat{G}^{(n)}\) based on the data \(D_{n}\) such that

$$\begin{aligned} \Vert \widehat{G}^{(n)}- G\Vert _{\infty } = \sup _{b\in [s,t]} \left| \widehat{G}^{(n)}(b) - G(b)\right| \rightarrow 0, \end{aligned}$$

almost surely, for \(n\rightarrow \infty \). Reconsidering Eq. (4) and Eq. (6) and \(0 < \overline{\pi }_j(\varvec{\theta }_j|G)\leqslant 1\) for every \(j=1,\ldots ,m\), it follows that

$$\begin{aligned} \left| \pi (\widehat{G}^{(n)}, {\varvec{a}}, \varvec{\theta }) - \pi (G, {\varvec{a}}, \varvec{\theta })\right| \rightarrow 0, \ n\rightarrow \infty . \end{aligned}$$

Therefore, \(|Q_{n}({\varvec{a}}, \varvec{\theta }|\widehat{G}^{(n)}) - Q_{n}({\varvec{a}}, \varvec{\theta }|G)| \rightarrow 0\) for any \({\varvec{a}}\) and \(\varvec{\theta }\) and thus

$$\begin{aligned} \lim _{n\rightarrow \infty } Q_{n}({\varvec{a}}, \varvec{\theta }|\widehat{G}^{(n)}) = E_{{\varvec{a}}, \varvec{\theta }} \left[ \log \left( \frac{f_{x}(y|\varvec{\theta }_{x})h(x|{\varvec{a}})}{\pi (G, {\varvec{a}}, \varvec{\theta })} \right) \right] , \end{aligned}$$
(13)

as in Eq. (12). It follows that \(\varvec{\widehat{a}}\) and \(\varvec{\widehat{\varvec{\theta }}}\) are consistent under appropriate regularity assumptions, if \(\widehat{G}^{(n)}\) is a consistent estimator of G.

Assuming asymptotic normality of \(\varvec{\widehat{\gamma }}= (\varvec{\widehat{a}}, \varvec{\widehat{\varvec{\theta }}})\) with the variance given by the Fisher information, it is possible to derive asymptotic confidence intervals in a conventional way. To do so, we abbreviate \(\varvec{\gamma }= ({\varvec{a}}, \varvec{\theta })\) and define the matrix \(A_{n}\) of negative second-order partial derivatives of the log-likelihood function, evaluated at the maximum likelihood estimate,

$$\begin{aligned} A_{n,jk} = - \frac{\partial ^2 \log L(\widehat{G}^{(n)}, \varvec{\widehat{a}}, \varvec{\widehat{\varvec{\theta }}}|D_{n})}{\partial \gamma _j \partial \gamma _k}. \end{aligned}$$

Given that \(A_{n}\) is a regular matrix, the asymptotic variance matrix of \(\varvec{\widehat{\gamma }}\) can be estimated by the inverse \(\widehat{V}_{n} := A_{n}^{-1}\). The level \((1-\alpha )\) confidence interval for the jth component of \(\varvec{\gamma }\) is then obtained as

$$\begin{aligned} \widehat{\gamma }_j \pm z_{1-\alpha /2} \sqrt{\widehat{V}_{n,jj}}, \end{aligned}$$

where \(z_{1-\alpha /2}\) is the \((1-\alpha /2)\) quantile of the standard normal distribution.

Similarly, confidence intervals for aspects of the latent distribution, say the mean lifetime \(E(y|\varvec{\theta }_j)\) in the jth subpopulation, can be defined. To do so, we first generate a large number M of iid draws

$$\begin{aligned} \varvec{\gamma }^{1}, \ldots , \varvec{\gamma }^{M} \sim \mathrm {N}(\varvec{\widehat{\gamma }}, \widehat{V}_{n}). \end{aligned}$$

An estimated level \((1-\alpha )\) confidence interval of the mean lifetime is then obtained using the \(\alpha /2\) and \(1-\alpha /2\) quantiles of \(\{E(y|\varvec{\theta }_j^1), \ldots , E(y|\varvec{\theta }_j^M)\}\).
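
The following sketch illustrates this simulation-based construction for the mean lifetime, assuming exponential lifetimes (so that \(E(y|\theta _j) = 1/\theta _j\)); the fitted parameter vector and covariance matrix are placeholders standing in for \(\varvec{\widehat{\gamma }}\) and \(\widehat{V}_{n}\).

```python
import numpy as np

rng = np.random.default_rng(2)

# placeholders for the fitted parameters (a_2, theta_1, theta_2) and their estimated covariance
gamma_hat = np.array([1.4, 0.42, 0.16])
V_hat = np.diag([0.02, 1e-3, 5e-4])

M = 10_000
draws = rng.multivariate_normal(gamma_hat, V_hat, size=M)    # gamma^1, ..., gamma^M

# mean lifetime in subpopulation 1 under the exponential assumption
mean_life_1 = 1.0 / draws[:, 1]
lo, hi = np.quantile(mean_life_1, [0.05, 0.95])              # 90% confidence interval
print(f"90% CI for E(y | x = 1): [{lo:.2f}, {hi:.2f}]")
```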

4 Numerical implementation and simulation

4.1 Numerical implementation

To find the maximum likelihood estimate for given data \(D_{n}\), we propose a Newton–Raphson-type algorithm. In this procedure, the stopping criterion is defined with respect to a distance function measuring the average change per iteration across the different parameter types. We define the distance function

$$\begin{aligned} d(({\widetilde{G}}, \varvec{{{\widetilde{a}}}}, {\widetilde{\varvec{\theta }}}), (G, {\varvec{a}}, \varvec{\theta })) = \frac{1}{n} \sum _{i=1}^{n} | \varDelta {{\widetilde{G}}}_i - \varDelta G_i| + \frac{1}{m} \sum _{j=1}^m |{{\widetilde{a}}}_j- a_j|+ \frac{1}{m} \sum _{j=1}^m \Vert {\widetilde{\varvec{\theta }}}_j - \varvec{\theta }_j\Vert , \end{aligned}$$

in which the number \(n\) of support points is incorporated into measuring the distance of two distribution functions \({\widetilde{G}}\) and G. The following algorithm is proposed for determining the maximum likelihood estimate.

Algorithm MNS (pseudocode figure; Steps 1–5 are referenced below)

For the initial values \(\varvec{\theta }_j^0\), we use the maximum likelihood estimates of \(\varvec{\theta }_j\) in the model without truncation. The key step in this algorithm is that the initial value \(G^0\) is determined utilizing the initial values \(\varvec{\theta }_j^0\) and using Eq. (10). Intuitively, for every i, \(N_i^0\) is the hypothetical number of population units among which the single unit i has been observed. By standardizing these hypothetical population numbers as done in Step 4, the corresponding distribution function is derived. It turns out (see Sect. 4.2) that choosing \(\varDelta G^0\) in this way remarkably reduces the computational runtime in comparison to alternative optimization strategies.

The damping factor \(0 < \omega \leqslant 1\) is used to avoid obtaining invalid parameter values, such as \(\varDelta G_i \leqslant 0\) or \(a_j\leqslant 0\) for any \(j \in \{1, \ldots , m\}\). Depending on the sample size, \(\omega \) may need to be chosen close to zero initially. When \(\varvec{\gamma }^\nu \) is sufficiently close to the maximum likelihood estimator, the damping factor can be dropped, i.e. \(\omega = 1\). Similarly, the Newton–Raphson method in Step 3 of the algorithm MNS may yield inadmissible values \(a_j\) for numerical reasons. To avoid this issue, one may need to adjust the optimization to ensure robustness (e.g., using a damping factor).
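
To illustrate the kind of update used in the Newton–Raphson-type steps, the following is a minimal, generic sketch of a damped Newton–Raphson iteration with numerically approximated derivatives; the finite-difference scheme, the simplified stopping rule and the objective passed to it are illustrative assumptions, not the exact implementation of the algorithm MNS.

```python
import numpy as np

def grad_and_hessian(f, x, eps=1e-5):
    """Central-difference gradient and Hessian of a scalar function f at the point x."""
    k = len(x)
    grad, hess = np.zeros(k), np.zeros((k, k))
    eye = np.eye(k) * eps
    for i in range(k):
        grad[i] = (f(x + eye[i]) - f(x - eye[i])) / (2 * eps)
        for j in range(k):
            hess[i, j] = (f(x + eye[i] + eye[j]) - f(x + eye[i] - eye[j])
                          - f(x - eye[i] + eye[j]) + f(x - eye[i] - eye[j])) / (4 * eps ** 2)
    return grad, hess

def damped_newton(neg_log_lik, x0, omega=0.4, tol=1e-4, max_iter=100):
    """Damped Newton-Raphson minimization: each Newton step is shrunk by the factor omega."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        grad, hess = grad_and_hessian(neg_log_lik, x)
        x_new = x - omega * np.linalg.solve(hess, grad)
        if np.mean(np.abs(x_new - x)) < tol:     # simplified stand-in for the distance d above
            return x_new
        x = x_new
    return x
```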

4.2 Simulation studies

4.2.1 Simulation study on validity

We formulate the following simulation setup to study the finite-sample properties of the proposed method (see Table 1). Population units are born in the time interval \([s, t] = [0, 8]\). In all setups, we consider \(m=2\) subpopulations and set \(a_2 = 1.5\). The birth intensity function of subpopulation 1 is defined as a piecewise constant function \(\lambda _1(b) = \lambda _{11} \mathrm {I}(b\leqslant (t+s)/2) + \lambda _{12} \mathrm {I}(b> (t+s)/2)\). Lifetimes are exponentially distributed according to \(y\mapsto f_j(y|\theta _j) = \theta _j\exp (-\theta _jy)\) with parameters \(\theta _1\) and \(\theta _2\) in the respective subpopulations. For each simulation setup, we generate 1000 random datasets and determine the estimates using the algorithm MNS. The damping factor is set to \(\omega = 1\) and the tolerance to \(\epsilon = 10^{-4}\).

Table 1 Simulation setups

Following Table 1, comparisons between Setups 1–2 (\(\lambda _{11} > \lambda _{12}\)) and Setups 3–4 (\(\lambda _{11} = \lambda _{12}\)) are of primary interest, as the setups within each pair share the same birth distribution. Setups 5 and 6 differ from Setup 4 only in that \(\tau ^{\mathrm {R}}\) is increased. The probability of observing a random unit ranges from 0.138 to 0.261 in the considered setups. Note also that although the birth distribution is in fact parametric in these simulation setups, it is estimated nonparametrically.

Table 2 Simulation results for the estimated lifetime parameters \((\widehat{\theta }_1, \widehat{\theta }_2)\)

The simulation results (see Table 2) indicate that the proposed method yields valid estimates of the lifetime parameters \(\varvec{\theta }\). While the bias of the estimates is notable due to the relatively small sample sizes, the MSEs are adequately small and decrease with growing sample size (see Setup 1 vs. 2 and Setup 3 vs. 4). Comparing the Setups 5 and 6 with Setup 4, the MSE is not substantially decreased despite the increased sample size (and increased selection probability). This shows that the structure of the truncation mechanism, including the definition of the data collection period, may influence the precision of estimates in a nontrivial manner, and larger samples do not necessarily correspond to higher precision.

Table 3 Simulation results for the estimation of \(a_2\)

Table 3 summarises the performance of \(\widehat{a}_2\) and the comparison to the simple estimate \(\widehat{a}^{\mathrm {simple}}_2\). It turns out that the biases of both estimates are considerable, yet of a similar magnitude. While \(\widehat{a}^{\mathrm {simple}}_2\) is a valid estimator of \(a_2\), \(\widehat{a}_2\) is more precise in all considered setups.

Finally, we study the performance of the algorithm MNS and the estimated birth distribution. For the latter purpose, we define the distances

$$\begin{aligned} \Vert G^0 - \widehat{G}^{(n)}\Vert _{\infty }, \ \Vert G^0 - G\Vert _{\infty } \quad \text {and} \quad \Vert \widehat{G}^{(n)}- G\Vert _{\infty }. \end{aligned}$$
Table 4 Performance of the algorithm MNS (\(E(\kappa _3)\) and \(E(\kappa _5)\) denote the respective mean number of Newton–Raphson iterations in Step 3 and Step 5 of the algorithm MNS)

The results displayed in Table 4 demonstrate that the initial values as defined in Step 2 in the algorithm MNS are suitable for optimization. On average, the first Newton–Raphson-type search in Step 3 in the algorithm MNS required about 6 to 8 iterations before reaching the stopping criterion. Considering that the number of parameters in the studied simulation setups is \(n+3\), requiring on average less than 40 iterations in the final Newton–Raphson-type search in Step 5 in the algorithm MNS seems quite satisfactory.

Assessing the estimation of the birth distribution G, the mean distance of \(\widehat{G}^{(n)}\) to G, denoted as \(E(\Vert \widehat{G}^{(n)}- G\Vert _{\infty })\), is remarkably small in all considered setups and generally decreases with growing sample size, ceteris paribus. Note also that the distance of \(G^0\) to \(\widehat{G}^{(n)}\), denoted as \(E(\Vert G^0 - \widehat{G}^{(n)}\Vert _{\infty })\), is relatively small, again demonstrating that the suggested choice of initial values is appropriate. Nevertheless, \(\Vert G^0 - G\Vert _{\infty } \leqslant \Vert G^0 - \widehat{G}^{(n)}\Vert _{\infty } + \Vert \widehat{G}^{(n)}- G\Vert _{\infty }\), and the final Newton–Raphson method in Step 5 in the algorithm MNS with regard to all parameters is necessary to find the maximum likelihood estimate.

4.2.2 Simulation study on computational performance

We further evaluate the proposed initial parameter choice in the Newton–Raphson algorithm by comparing its performance to reasonable alternatives. A local random search (LRS) is a conceptually simple approach to finding local maxima of the likelihood function (see Zhigljavsky 1991). Given numerically permissible initial values, the algorithm successively finds parameters with higher likelihood value, until a defined termination criterion is met. Reasonable termination criteria are, e.g., that no improvements are found for a defined number of steps or that a preset number of iterations has been performed. The algorithm LRS has the advantages of using computationally inexpensive iterations and of being relatively robust in that the local step sizes can be chosen sufficiently conservatively to avoid numerically non-permissible parameters. The algorithm LRS does not utilize the local shape of the log-likelihood function, but only requires the function’s value at candidate points. See the Appendix for an implementation of the algorithm LRS.
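
The Appendix implementation is not reproduced here; the following is a minimal sketch of a local random search of the kind described, in which the step size, the Gaussian proposal and the termination rule are illustrative assumptions. Non-permissible candidates can be handled by letting the objective return \(-\infty \).

```python
import numpy as np

def local_random_search(log_lik, x0, step=0.05, patience=200, max_iter=10_000, seed=0):
    """Local random search: accept a random perturbation only if it increases log_lik."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    best = log_lik(x)
    stale = 0
    for _ in range(max_iter):
        candidate = x + rng.normal(scale=step, size=x.shape)
        value = log_lik(candidate)          # may return -inf for non-permissible parameters
        if value > best:
            x, best = candidate, value
            stale = 0
        else:
            stale += 1
            if stale >= patience:           # no improvement for `patience` consecutive steps
                break
    return x, best
```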

In addition, we consider a Newton–Raphson-type algorithm, which we call ’Naive Newton–Raphson Search’ (NNS, see Appendix). This algorithm addresses a drawback of the algorithm LRS by using the local shape of the log-likelihood function to update parameter values in each iteration. The algorithm NNS differs from the algorithm MNS in that the initial birth distribution is chosen naively, i.e. without utilizing the IPW relation expressed in Eq. (10).

We apply the three algorithms to 100 simulated datasets in each of the following three setups. Births are randomly drawn in the interval \([s,t] = [0,8]\) from \(m=2\) subpopulations with intensity \(\lambda _1(\cdot ) \equiv \lambda _1 \in \{100, 200, 500\}\) and proportion factor \(a_2 = 1\). Lifetimes are exponentially distributed according to \(f_j(y|\theta _j)= \theta _j\exp (-\theta _jy)\) with parameters \(\theta _1 = 0.4\) and \(\theta _2 = 0.15\). If a unit fails within \([\tau ^{\mathrm {L}}, \tau ^{\mathrm {R}}] = [8, 10]\), it is sampled.

For each algorithm and iteration, the distance of the current log-likelihood value to the maximal value is calculated. For the algorithms NNS and MNS, 10 iterations were performed, which took roughly the same runtime. Since an iteration in the algorithm LRS is computationally much cheaper than for the algorithms NNS and MNS, the number of iterations of the algorithm LRS has been adjusted to account for this difference. Therefore, the runtime of all three algorithms was approximately the same in each setup. The initial values are chosen as \(\theta _j^0 = n_j/\sum _{i: x_i = j} y_i\) for each \(j \in \{1, 2\}\), which would be the subpopulation-specific maximum likelihood estimate of \(\theta _j\) if the data were not subject to truncation, and \(a_2^0 = 1\).

Fig. 3

Distance to the maximum log-likelihood value for the algorithms NNS, LRS and MNS. (top panel: \(\lambda _1 = 100\), middle panel: \(\lambda _1 = 200\), bottom panel: \(\lambda _1 = 500\); dashed lines indicate the 80% pointwise confidence band; *the number of iterations of the algorithm LRS has been adjusted to yield a runtime comparable to the algorithms NNS and MNS)

Figure 3 shows that the three considered algorithms form a clear order with regard to performance. Due to having the same initial values, the algorithms LRS and NNS start with the same log-likelihood value. The algorithm LRS quickly improves over the algorithm NNS, and then progresses rather slowly towards the maximum. The algorithm MNS clearly has the best performance of the three considered algorithms in the considered setups. Interestingly, the initial value of the algorithm MNS has a higher log-likelihood value than the algorithms NNS or LRS achieve within the considered runtime. Moreover, after a few steps, the algorithm MNS essentially reaches the maximum of the log-likelihood. This simulation demonstrates that a model-driven initial choice of the parameters \(\varDelta G_i\) can remarkably reduce the runtime of the numerical optimization.

5 Application

Personal insolvency occurs when a person is unable to pay their debts to other entities. In Germany, personal insolvencies are reported publicly due to legal regulations. Reports on the individual insolvencies are available through a public website for 14 days, before public access is removed again. The sample, which results from continually tracking such public reports, includes information on all persons who become insolvent during the timespan of data collection, and is thus subject to the described double truncation framework. From March 2018 to December 2018, \(n= 73{,}370\) units that were born between 1920 and 2000 have been sampled (see Table 5 for basic summary statistics).

Table 5 Descriptive statistics of personal insolvencies for 16 federal states in Germany between March 2018 and December 2018

Individual information includes the date of birth \(b_i\), the age at insolvency \(y_i\) (adjusted to biological age 18, when legal age is reached in Germany) and the federal state \(x_i\) in which the insolvency is reported. There are \(m=16\) federal states, and they constitute the subpopulations in this analysis. For the lifetimes in any subpopulation \(j \in \{1, \ldots , 16\}\), we assume a Gamma distribution \(\mathrm {Ga}(\alpha _j, \beta _j)\), defined as

$$\begin{aligned} f_j(y|\alpha _j, \beta _j)&= \frac{\beta _j^{\alpha _j}}{\varGamma (\alpha _j)} y^{\alpha _j-1} \exp (-\beta _j y), \quad y> 0. \end{aligned}$$

In this notation, \(\varvec{\theta }_j = (\alpha _j, \beta _j)\) and \(\varvec{\theta }= ((\alpha _1, \beta _1), \ldots , (\alpha _{16}, \beta _{16}))\). The mean lifetime in the jth subpopulation is equal to \(\alpha _j/\beta _j\). As stated in the Assumptions (A)–(E), we implicitly assume that births occur independently across the \(m=16\) federal states, and that the corresponding intensity functions are proportional. We have assessed this assumption by inspecting the observed empirical birth distributions for each state in the dataset; although this inspection neglects the impact of the lifetime distributions, we found no apparent reason to discard the proportionality assumption. Note that Assumption (D) is fulfilled by the procedure by which the data have been collected. Since there is presumably a substantial percentage of people who never experience insolvency (i.e. ’immortal’ units), the population of interest is implicitly defined as the population of all people in Germany who eventually experience insolvency. Therefore, the birth distribution estimated under the truncation setup does not necessarily represent the actual birth distribution known for Germany. The model altogether contains 73,417 parameters. We apply the algorithm MNS to determine the maximum likelihood estimates.
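
For the Gamma model, the observation probability \(\pi _j(\varvec{\theta }_j|b)\) is a difference of Gamma distribution functions. A brief sketch, in which the time scale and all numerical values are illustrative assumptions:

```python
import numpy as np
from scipy.stats import gamma

def pi_j_gamma(alpha_j, beta_j, b, tau_L, tau_R):
    """P(tau_L <= b + y <= tau_R) for Gamma(alpha_j, beta_j) lifetimes given birth time b.
    scipy parameterizes the Gamma distribution by shape a and scale = 1 / rate."""
    lo = np.maximum(tau_L - b, 0.0)
    hi = np.maximum(tau_R - b, 0.0)
    return (gamma.cdf(hi, a=alpha_j, scale=1.0 / beta_j)
            - gamma.cdf(lo, a=alpha_j, scale=1.0 / beta_j))

# illustrative values: shape 9, rate 0.18 (mean lifetime 50); unit born at time 60,
# data collection period [98.2, 99.0] on the same time scale
print(pi_j_gamma(9.0, 0.18, b=60.0, tau_L=98.2, tau_R=99.0))
```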

As initial values in Step 1 of the algorithm MNS, we choose \(a_j^0 = n_j/n_1\). Reasonable initial values \((\alpha _j^0, \beta _j^0)\) were obtained by first fitting the model to a random subsample having size 2000. We choose \(\omega = 0.4\) as the damping factor. The resulting estimates are displayed in Table 6.

Table 6 Maximum likelihood estimates
Fig. 4

Estimates of the coefficients \(a_j\) and estimated mean biological ages at insolvency for the \(m=16\) federal states in Germany. Squares to the left of the federal state names indicate the estimated proportion factors \(\widehat{a}_j\), where darker squares indicate higher birth intensity. Numbers in parentheses indicate the mean ages at insolvency. Vertical bars display the 90% confidence interval of the mean lifetime for each subpopulation

Figure 4 summarizes the estimates of the proportion factors and the mean age at insolvency for each subpopulation. The results indicate distinct differences between the federal states, with the estimated proportion factors generally reflecting the proportions of the federal state sizes known for Germany (numerical values not displayed).

Similarly, the estimated mean age at insolvency varies across federal states, ranging from 56.4 in Saxony-Anhalt to 70.6 in Bavaria. In comparison to the observed mean ages at insolvency (see Table 5) in the sample, the estimates are considerably larger. Without accounting for the truncation mechanism, the mean ages are systematically underestimated. We note that larger subsample sizes correlate with smaller confidence intervals for the mean biological age at insolvency in the respective subpopulations, which is not unexpected.

Fig. 5

Estimated birth distribution for Germany for (biological) births between 1920 and 2000 (solid line: maximum likelihood estimate, dashed line: empirical distribution function)

Figure 5 indicates that the truncation mechanism substantially distorts the birth distribution, as the empirical distribution function differs markedly from the estimate. In particular, more recent births between 1985 and 2000 seem to be overrepresented in the sample in comparison to the population of interest, while people born between 1940 and 1980 are underrepresented.

6 Discussion

We have focused on a joint model for the birth distribution and the lifetime distribution for heterogeneous survival data under double truncation. In doing so, we have derived maximum likelihood estimates for all involved parameters.

The situation is cast as a high-dimensional problem, where the number of parameters increases linearly with sample size. The main advantage is that the model can be treated with well-known concepts for parametric models, and implemented in a standard fashion. One substantial drawback of the large number of parameters is that the optimization method for finding the maximum likelihood estimates needs to be carefully adjusted.

Concerning the implementation of the model, we have demonstrated that caution in the choice of initial values in a Newton–Raphson-type algorithm is important. Using a generic choice is not ideal and can lead to impractical runtime. Instead, we propose model-driven initial values, which greatly reduce computational runtime. In our calculations, we have found that with this approach, the model can be fitted to even medium-sized datasets such as the one considered in the application (\(n> 50{,}000, \ m>10\)) within reasonable runtime. However, we note that further improvements on the optimization method are certainly possible and worth studying. This also concerns the choice of the damping factor \(\omega \), which has been introduced to avoid obtaining invalid parameter values within the optimization algorithm. We have not included a general guideline on how to properly choose \(\omega \), because it may depend on many variables such as the sample size, the setting of the birth interval \([s, t]\), the data collection period \([\tau ^L, \tau ^R]\) and also on the parametric families used for modeling the lifetimes.

Similarly, alternative optimization methods such as the algorithm LRS discussed in Sect. 4, though conceptually basic, can be valuable in certain situations, particularly when the initial values of the Newton–Raphson methods are too far from the optimum. Owing to its relatively robust search strategy, the algorithm LRS can be used as a preparatory step for more sophisticated optimization methods such as the algorithm MNS.

All three considered algorithms require initial values \(\varvec{\theta }_j^0, j=1, \ldots , m\), for which we suggest using the conventional maximum likelihood estimate in the absence of truncation. Though it has not been further examined in the simulation study, this choice has generally led to surprisingly satisfactory results. Alternative initial values might yield better numerical performance under certain circumstances.

We have sketched how the asymptotic behaviour of the estimates \(\varvec{\widehat{a}}\) and \(\varvec{\widehat{\varvec{\theta }}}\) can be studied for given consistent estimator \(\widehat{G}^{(n)}\). It remains to prove that the proposed nonparametric estimator \(\widehat{G}^{(n)}\) is in fact consistent.