Testing Truncation Dependence: The Gumbel-Barnett Copula

In studies of lifetimes, the population occasionally contains statistical units that are born before the data collection has started. Left-truncated are units that died before this start. For all other units, the age at the study start is often recorded, and we aim at testing whether this second measurement is independent of the genuine measure of interest, the lifetime. Our basic model of dependence is the one-parameter Gumbel-Barnett copula. For simplicity, the marginal distribution of the lifetime is assumed to be exponential, and for the age-at-study-start, namely the distribution of birth dates, we assume a uniform distribution. Also for simplicity, and to fit our application, we assume that units that die after our study period are also truncated. As a result from point process theory, we can approximate the truncated sample by a Poisson process and thereby derive its likelihood. Identification, consistency and the asymptotic distribution of the maximum likelihood estimator are derived. Testing for positive truncation dependence must include the hypothetical independence, which coincides with the boundary of the copula's parameter space, so that non-standard asymptotic theory is needed for the maximum likelihood estimator of the exponential and the copula parameter.

Dependent single and double truncation have lately also been studied (Chiou et al., 2019; Emura and Pan, 2020; Rennert and Xie, 2022). Dependent truncation has some similarity with dependent censoring in that the identification of the dependency must be taken into account (see e.g. Czado and van Keilegom, 2023). Retrospective sampling of lifetimes is an example of potentially dependent DT, depicted in Figure 1. We specify the population as all units of a kind with the birth event in a period of length G.

Figure 1: Three cases of the date of 1st event (black bullet) and date of 2nd event (white circle): observed (solid) and truncated (dashed) lifetimes; calendar time with birth period (length G), age-at-death-event X, age-at-study-start T and observation period (length s)

We observe units affected by a death event in a period of length s, which for simplicity we assume to directly follow the birth period (see Figure 1). Also for analytic simplicity, the lifetime X is assumed to be exponentially distributed, with density f_E and parameter θ. The data must include the birthdate for each observed unit; we measure it backwards from the start of the observation period and denote this 'age when the study starts' as T (see Figure 1). For simplicity, we assume that births stem from a homogeneous Poisson process, which implies a uniform distribution f_T for T (see Dörre, 2020, Lemma 2). In this design, independence of the lifetime X and the birthdate-equivalent T contradicts the scientific consensus, at least for human mortality, of an increase in life expectancy (see e.g. Oeppen and Vaupel, 2002). We model the dependency with the one-dimensional parameter ϑ in the Gumbel-Barnett copula, so that, conditionally on the birthdate, the life expectancy trends upwards for non-negative ϑ, and is free of a trend for ϑ = 0.
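The claim that a homogeneous Poisson birth process implies a uniform distribution for the age-at-study-start T can be checked numerically. A minimal sketch, in which the birth intensity and the period length G are hypothetical example values:

```python
import random

random.seed(1)
G = 24.0      # length of the birth period (example value)
rate = 200.0  # hypothetical intensity of the homogeneous Poisson birth process

# Simulate birth dates on [0, G] via exponential inter-arrival times.
births, t = [], random.expovariate(rate)
while t < G:
    births.append(t)
    t += random.expovariate(rate)

# Age at the study start, measured backwards from the end of the birth period;
# by the order-statistics property of the Poisson process these are Uniform(0, G).
ages = [G - b for b in births]
mean_age = sum(ages) / len(ages)  # should be close to G/2 = 12
```

The sample mean of the ages settles near G/2, in line with the Uniform(0, G) marginal used for T.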
Our economic application aims at testing whether the established negative mortality trend in human demography also holds in Germany for business demography. As population we consider enterprises founded in the first quarter century after the German reunification. As data we use 55,279 enterprise lifetimes, double-truncated as a result of their reported closures in the period 2014 to 2016.
In Section 2 we define the bivariate distribution model of the population, formalize the sampling design, and especially derive the selection probability and finally the likelihood. Section 3 studies identification as a prerequisite for the asymptotic properties, especially with a focus on the unobserved sample size as an additional parameter. Section 4 studies the asymptotic distribution of the dependence parameter ϑ in the copula, with emphasis on the parameter space boundary. The section also includes the business demography application. Section 5 studies the test in a Monte Carlo simulation.
2 Population model, sampling, and likelihood

Population and latent sample
We consider as the population units born within a pre-defined time window going back G time units from the start of the study. The unit i of the latent sample carries, in addition to its lifetime X_i ∈ R_0^+, its birthday coded as 'age when the study starts' T_i ∈ [0, G]. Define S := R_0^+ × [0, G], with 0 < G < ∞, as the space for one latent outcome, and let all open subsets of S generate the σ-field B. Each unit is truncated at a different age. Let us collect notations and assumptions.
Testing the independence hypothesis H_0 will coincide with the 0 in the second dimension of Θ. The statistic that indicates a deviation from H_0 will be the point estimate for the second parameter. Hence, for deriving its distribution under H_0, Θ must include the boundary Θ_H. (Throughout, f_T and F_T denote the density and CDF of the uniform distribution of T.)
For two independent exponentially distributed random variables X and Y, the bivariate survival function is e^{−θ_x x − θ_y y}, so that a simple idea of E.J. Gumbel is to model dependence by a bivariate survival function e^{−θ_x x − θ_y y − ϑxy}.
Our field of application is the social sciences, where it is to be expected that societies in general make progress, e.g. in their life expectancy, be it of persons or of enterprises. Our main focus is a test for the hypothesis of stagnation against such progress. A suitable adaption of Gumbel's general idea, applied here to the uniform marginal distribution of T, is the Gumbel-Barnett copula (see Nelsen, 2006, Formula 2.3.5),

C_ϑ(u, v) := uv e^{−ϑ log(u) log(v)}.

(A3) SRS: The (X_i, T_i)′ are i.i.d. random variables (r.v.) mapping from the probability space (Ω, A, P_θ), with θ := (θ, ϑ)′ ∈ Θ, onto the measurable space (S, B). X_i and T_i are dependent with copula C_ϑ, applied to the marginal survival functions, so that P_θ{X_i > x, T_i > t} = e^{−θx}(1 − t/G)^{1+ϑθx}.

Note that ϑ = 0 represents independence. Scatter plots of simulated (X_i, T_i)′ for different ϑ in Appendix A visualize the degree of negative dependence that is modelled. For large ϑ, a small T, i.e. a late foundation, is associated with longer survival. The CDF of an (X_i, T_i)′ is

F_θ(x, t) = t/G − e^{−θx}[1 − (1 − t/G)^{1+ϑθx}] for (x, t) ∈ S.

The joint density with respect to P_θ, for x > 0 and 0 ≤ t < G, is

f_θ(x, t) = (θ/G) e^{−θx} (1 − t/G)^{ϑθx} [(1 + ϑθx)(1 − ϑ log(1 − t/G)) − ϑ],    (1)

and zero elsewhere. Kendall's tau, given as τ = 4 ∫∫ C_ϑ(u, v) dC_ϑ(u, v) − 1, is a univariate measure of dependence; it has no closed form here, but is easily seen to range from −0.361 to 0. The latter is associated with ϑ = 0, the independence.
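The stated range of Kendall's tau can be verified numerically. A small sketch, using the standard Gumbel-Barnett form C(u, v) = uv e^{−ϑ log(u) log(v)} and the equivalent representation τ = 1 − 4 ∫∫ (∂C/∂u)(∂C/∂v) du dv, valid for absolutely continuous copulas (grid size is an arbitrary choice):

```python
import math

def gb_tau(vartheta, n=400):
    """Kendall's tau of the Gumbel-Barnett copula
    C(u, v) = u * v * exp(-vartheta * log(u) * log(v)),
    via tau = 1 - 4 * int (dC/du)(dC/dv) du dv (midpoint rule on an n x n grid)."""
    h = 1.0 / n
    acc = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        lu = math.log(u)
        for j in range(n):
            v = (j + 0.5) * h
            lv = math.log(v)
            e = math.exp(-vartheta * lu * lv)
            # partial derivatives of C in u and in v
            acc += (v * e * (1.0 - vartheta * lv)) * (u * e * (1.0 - vartheta * lu))
    return 1.0 - 4.0 * acc * h * h

tau_at_0 = gb_tau(0.0)  # independence
tau_at_1 = gb_tau(1.0)  # strongest negative dependence, about -0.361
```

The value at ϑ = 1 reproduces the lower end of the range quoted above, and ϑ = 0 yields zero.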
For comparison later in the data analysis, and for application in other fields, a two-sided dependence might also be of interest. Gumbel (1960) proposed a respective copula.
(FGM) Farlie-Gumbel-Morgenstern copula: Let (X_i, T_i)′ be as in (A3), but with copula

C_{ϑ_FGM}(u, v) := uv[1 + ϑ_FGM (1 − u)(1 − v)].

Kendall's tau is given by 2ϑ_FGM/9 and ranges between −2/9 and 2/9, with independence at ϑ_FGM = 0 (see Nelsen, 2006, Example 5.2). Scatter plots in Appendix B illustrate that, at the 'extremes', dependence is weaker than for the Gumbel-Barnett copula, corresponding also to the smaller range of Kendall's tau.
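The closed-form tau of the FGM copula, τ = 2ϑ_FGM/9, can be checked with the same numerical device as for the Gumbel-Barnett copula; a minimal sketch:

```python
def fgm_tau(vartheta, n=200):
    """Kendall's tau of the FGM copula C(u, v) = u*v*(1 + vartheta*(1-u)*(1-v)),
    via tau = 1 - 4 * int (dC/du)(dC/dv) du dv (midpoint rule)."""
    h = 1.0 / n
    acc = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            dCdu = v + vartheta * v * (1 - v) * (1 - 2 * u)
            dCdv = u + vartheta * u * (1 - u) * (1 - 2 * v)
            acc += dCdu * dCdv
    return 1.0 - 4.0 * acc * h * h
```

At ϑ_FGM = ±1 the numerical value matches ±2/9, the endpoints of the (smaller) range noted above.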

Data
The data are a subset of the SRS in Assumption (A3) of n draws governed by f_{θ_0}, i.e. for the 'true' parameter θ_0 ∈ Θ. A parallelogram D formalizes that a sample unit is only observed when its death falls into the observation period (of length s).
(A4) Observation: For a known constant s > 0, the column vector (X_i, T_i)′ is observed if and only if it falls into the parallelogram D := {(x, t) ∈ S : t ≤ x ≤ t + s}.

Following up on (A4), we denote an observation by (X̃_j, T̃_j)′ and renumber the observed units with j = 1, . . ., M ≤ n (sorting the unobserved units to the end). The selection probability is

α_θ := P_θ{(X_1, T_1)′ ∈ D} = ∫_0^G ∫_t^{t+s} f_θ(x, t) dx dt,    (2)

with c_ϑ^u(v) := ∂C_ϑ(u, v)/∂u. The selection probability is not given in closed form, but note that the numerical calculation is easy, because D is bounded. Furthermore, the last expression of Equation (2) reduces to a univariate integral over a compact interval, similar to Emura and Pan (2020, Theorem 1). Note that, with the slight re-definition θ := (θ, ϑ_FGM)′, the selection probability based on C_{ϑ_FGM} is available in closed form. For the sake of brevity, the display of the theoretical analysis is limited here to the Gumbel-Barnett copula (i.e. Assumption (A3)), because the FGM copula is considerably easier. However, in the empirical example of Section 4.2 we consider both, and compare.
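The numerical calculation of α_θ over the bounded set D is indeed easy; a sketch, assuming the exponential-uniform Gumbel-Barnett model, for which the conditional survival function is P(X > x | T = t) = e^{−θx}(1 + ϑθx)(1 − t/G)^{ϑθx}, so that the selection probability becomes a univariate integral over [0, G] (parameter values are hypothetical):

```python
import math

def alpha(theta, vartheta, G, s, n=4000):
    """Selection probability alpha_theta = P(T <= X <= T + s), computed as a
    univariate integral of P(t <= X <= t + s | T = t) over t in [0, G]
    (midpoint rule; the midpoints avoid the boundary t = G)."""
    def surv(x, t):  # conditional survival P(X > x | T = t)
        return math.exp(-theta * x) * (1 + vartheta * theta * x) * (1 - t / G) ** (vartheta * theta * x)
    h = G / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += surv(t, t) - surv(t + s, t)
    return total * h / G  # f_T(t) = 1/G

a_indep = alpha(0.1, 0.0, 24.0, 2.0)  # independence case has a closed form
a_dep = alpha(0.1, 0.5, 24.0, 2.0)    # hypothetical dependent case
```

For ϑ = 0 the result agrees with the closed form (1 − e^{−θs})(1 − e^{−θG})/(θG) of the independent-truncation model.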
The selection probability will occur in the likelihood, so that for maximization, its first partial derivatives will be needed.The second and third partial derivatives of α θ will be needed for proving the asymptotic normality and calculating the standard error.The proof of the following and explicit representations of α's derivatives, both needed later, are similar to those of Weißbach and Wied (2022, Corollary 1) but are omitted here.
Lemma 1. For θ ∈ Θ, it holds that α_θ > 0. Furthermore, θ ↦ α_θ has first, second and third partial derivatives in the directions of θ and ϑ and combinations thereof. Those derivatives are continuous in θ.
We are now in a position to formulate the likelihood, maximize it and apply large sample theory.

Likelihood
The likelihood springs from standard results for point processes (see e.g. Reiss, 1993; Daley and Vere-Jones, 2003, Theorem 3.1.1., Section 7.1, respectively), and we maximize it later as a function of the generic θ and n.
(Distinguishing in notation between the true and a generic n is omitted.) The idea is roughly to decompose the likelihood according to

Pr{(X̃_1, T̃_1)′, . . ., (X̃_M, T̃_M)′, M} = Pr{(X̃_1, T̃_1)′, . . ., (X̃_M, T̃_M)′ | M} Pr{M}.

Note that by Pr, we cannot mean P_θ of Assumption (A3). Detailed definitions of the measures related to the probabilities are the same as for the model with independent truncation, i.e. ϑ = 0 (see Weißbach and Wied, 2022). The latter reference also proves that the (X̃_j, T̃_j)′ are stochastically independent, conditional on observation, so that Pr{(X̃_1, T̃_1)′, . . ., (X̃_M, T̃_M)′ | M} becomes a product over the conditional densities of each observation. With P_θ from Assumption (A3), (X̃_j, T̃_j)′ has the CDF of (X_1, T_1)′ conditional on D. Leaving out the proof, an explicit relation between the distribution of an observation and θ ∈ Θ holds under the Assumptions (A1)-(A4). The proof for the property of the remainder r is similar to that in Weißbach and Wied (2022).
Hence the density of (X̃_j, T̃_j)′ is f_θ(x, t)/α_θ for (x, t) ∈ D. The binomially distributed size of the observed sample, M, can be approximated by a Poisson-distributed M⋆ when the selection probability α_θ for each of the n i.i.d. Bernoulli experiments is small. This is especially the case when the observation period (of length s) is 'short' relative to the population period (of length G). The resulting density Pr{M⋆ = m⋆} = (µ^{m⋆}/m⋆!) e^{−µ}, with µ = nα_θ, is responsible not only for the very last (exponential) term in the following representation, but also contributes a factor n^{M⋆} to the leading product. With h_θ(x, t) := nf_θ(x, t)1_D(x, t) and using (4), the proof of Weißbach and Wied (2022, Theorem 3) extends to an approximate likelihood ℓ⋆; here the proximity is in the sense of a Hellinger distance. Note that α_θ, as denominator of the density of (X̃_j, T̃_j)′, cancels out the other α_θ in the density of M⋆. Because almost surely T̃_j < G and ℓ⋆ > 0, we may consider log ℓ⋆. In this approximation M⋆ can exceed n, and e.g. (X̃_{n+1}, T̃_{n+1})′ will not be defined. In order to guarantee that the observations fit the model (in the meaning of distributional models for point processes), we further approximate (see Weißbach and Wied, 2022, Sect. 3).

Lemma 2. For any
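The quality of the binomial-to-Poisson step can be quantified numerically; a small sketch with hypothetical n and α_θ, computing the total variation distance between Binomial(n, α) and Poisson(nα) in log space to avoid overflow:

```python
import math

n, p = 10000, 0.01  # hypothetical sample size and small selection probability
mu = n * p

def log_binom_pmf(m):
    return (math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
            + m * math.log(p) + (n - m) * math.log(1 - p))

def log_pois_pmf(m):
    return -mu + m * math.log(mu) - math.lgamma(m + 1)

# Total variation distance; the range [0, 300] carries essentially all mass
# (both distributions concentrate around mu = 100 with standard deviation ~10).
tv = 0.5 * sum(abs(math.exp(log_binom_pmf(m)) - math.exp(log_pois_pmf(m)))
               for m in range(0, 301))
```

For a small selection probability the distance is of the order of p itself, supporting the approximation of M by M⋆.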

Identification
Identification is necessary to ensure the consistency of a parameter estimator.
Consistency will then be necessary for asymptotic normality. The 'near-zero' estimator will be covered by the applied theory from van der Vaart (1998, Sect. 5.2). Specifically, insertion yields: with the definition ψ_θ := (ψ_θ,1, ψ_θ,2)′, the near-zero estimator θ̂_n for the true parameter θ_0 (see Figure 2) is the zero of

Ψ_n(θ) := n^{−1} Σ_{i=1}^{n} ψ_θ(X_i, T_i),    (7)

if in Θ, and the nearest boundary value else. Note that Ψ_n(θ) is observable after multiplication by n and has the same zero (see Weißbach and Wied, 2022, Sect. 2.2). A boundary value on Θ_H is likely under H_0 and will be shown to have a probability of 50%. A parameter-independent bound for ψ_θ will be needed for proving consistency and asymptotic normality.
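The 'nearest boundary value' rule, and the 50% boundary probability under H_0, can be illustrated in a stylized one-dimensional analogue (not the model of this paper: a normal mean constrained to be non-negative, with hypothetical sample sizes):

```python
import random

random.seed(7)

def near_zero_estimate(data):
    """One-dimensional analogue of the near-zero estimator: the unconstrained
    zero of the score sum(x_i - mu) is projected onto [0, inf)."""
    unconstrained = sum(data) / len(data)
    return max(0.0, unconstrained)  # nearest boundary value otherwise

# Under H0 (true mean 0) the unconstrained zero is negative in half the
# samples, so the estimator sits on the boundary about 50% of the time.
R, n = 2000, 100
at_boundary = sum(
    1 for _ in range(R)
    if near_zero_estimate([random.gauss(0.0, 1.0) for _ in range(n)]) == 0.0
)
frac = at_boundary / R
```

The simulated boundary fraction settles near 1/2, the probability claimed above for Θ_H under H_0.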
Proof. Essentially, bounding sup_{θ∈Θ} ψ_θ,j(X_i, T_i) (j = 1, 2) is enabled by the fact that a continuous function on a compact set attains its maximum, applied to α_θ and its derivatives. For the ψ_θ,j(X_i, T_i) themselves, the numerator is bounded, and the remaining log-term is not bounded but shown to be integrable. Specifically, 1/θ can be bounded by 1/ε.
Of course, concavity of the function of which Ψ_n is the gradient, at θ_0, will be important. Due to analytic intractability we note it as an assumption.
Throughout, a dot on top of a function R^2 → R signals a gradient. On top of a gradient R^2 → R^2, it signals its Jacobian matrix, i.e. the Hessian matrix of the function.
(A5) Concavity: Let E_{θ_0}[ψ̇_{θ_0}(X_1, T_1)] be negative definite, where here and throughout ∂ψ_{θ_0}/∂θ stands for the Jacobian matrix of θ ↦ ψ_θ, evaluated at θ_0.

Instead of a proof, Appendix D.2 plots the surface of the determinant of E_{θ_0}[ψ̇_{θ_0}(X_1, T_1)], i.e. only of the even principal minor, by θ_0 on a large subset of the parameter space Θ. Figure 6 (left) shows that the determinant is clearly positive for a large part of the parameter space, but also reveals that in the area θ_0 ∈ [0.01, 0.02] × [0.1, 1] the determinant is near zero. Figure 6 (right) explores the area and shows that for ϑ_0 > 0.14 the determinant increases again and that the minimum is attained in the range of θ_0 ∈ [0.012, 0.014]. Those latter values are inconceivable for our example of business demography, as then the life expectancy would be ≈ 0.01^{−1} = 100 years. Still, even when estimation in that area will be numerically more difficult and standard errors will be larger, Assumption (A5) obviously holds. Demonstrating the negativity, i.e. of the odd principal minor, is omitted here. We now argue that van der Vaart (1998, Theorem 5.9, Condition (ii)) is the relevant analogue of identification for a truncated sample. This 'anti-clockwise' design is not ours (defined by Assumptions (A1)-(A4)), but we will see that its identification enables a helpful result, also for the 'clockwise' design. The (Lebesgue) density of the two measurements is f̃_θ(x, t) = 1_D(x, t)f_θ(x, t)/α_θ and is also the density of (X̃_1, T̃_1)′. Then, a short calculation reveals that, surprisingly, ψ_θ is its score function, i.e.
ψ_θ(x, t) = ∇_θ log f̃_θ(x, t). (The gradient is signalled by ∇, instead of a dot, when the expression is too long, as log f̃_θ is here.) This justifies the name profile model, and profile score for ψ_θ. In order to stress the implication, note that the estimators are equal for both designs. Daley and Vere-Jones (2003, Section 7.1, Example 7.1(a) (continued)) reminds us that this is only the case when the data can be approximated by a Poisson process, which we are allowed to do, following Weißbach and Wied (2022), because the selection probability α_{θ_0} will be small enough in our application. (For that, interchange integration and differentiation similarly to the proof of Lemma 1; see Elstrodt, 2018, Theorem 5.7, Chapter IV, § 5.)

M-identification
The study of the profile model, as part of the 'anti-clockwise' design, was helpful, as we can now follow up on the unique solution for (8). It is easy to see that the solution carries over, with E_{θ_0} relating to our design. Because α_{θ_0} > 0, by Lemma 1, Ψ(θ) := E_{θ_0}[ψ_θ(X_1, T_1)] = 0 also has a unique solution. Additionally, Appendix D.4 proves a result needed now (and again when we prove asymptotic normality).
Lemma 4. Under Assumptions (A1)-(A4), E_{θ_0}[ψ_{θ_0}(X_1, T_1)] = 0.

This ends the proof of van der Vaart (1998, Theorem 5.9, Condition (ii)): for any ε > 0, the infimum of ‖Ψ(θ)‖ over θ with ‖θ − θ_0‖ ≥ ε exceeds 0 = ‖Ψ(θ_0)‖, according to van der Vaart (1998, Problem 5.27). We interpret this as an M-identification condition. The remaining Condition (i) for consistency is convergence of Ψ_n to Ψ uniformly in θ ∈ Θ, and will be proven in Section 4.1.

Wald-type test for independence
In order to test the hypothesis of independent truncation, i.e. H_0 : ϑ_0 = 0, for a parameter at the edge of the parameter space, a score test would be typical (see e.g. Voß and Weißbach, 2014). As an advantage, calculating the two-dimensional unrestricted estimate would not be necessary. Only the restricted one-dimensional estimator, i.e. for ϑ = 0, would be needed, reducing the numerical effort; and this estimator has already been derived in Weißbach and Wied (2022). However, the score asymptotically only depends on the unrestricted estimator, so that we simply use the Wald-type idea of rejecting for a ϑ̂_n being too large.
An important further element will be the Fisher information I(θ), which we will need for θ ∈ Θ.

Theory
We especially need to approximate the distribution of the point estimator, the vector of zeros of (7), by a Gaussian distribution. Our data in Section 4.2 will be sufficiently large to do so. We verify the classic conditions for asymptotic normality of M-estimation (see van der Vaart, 1998, Theorem 5.41) for Ψ_n(θ), given shortly after (7). The first condition is weak consistency. The method of proof in van der Vaart (1998, Theorem 5.9) even allows us to make a statement on Θ, including the boundary in ϑ-direction.
Theorem 1. Under Assumptions (A1)-(A5), θ_0 ∈ Θ and θ̂_n defined after (7), it holds that θ̂_n → θ_0 in probability.

Proof. As stated above, the second condition is the content of Section 3, when van der Vaart (1998, Prob. 5.27) is taken into account, due to Θ being compact. In order to show the first one, we use the uniform law of large numbers (see e.g. Newey and McFadden, 1994, p. 2129). Its smoothness requirements are all fulfilled by noting that the functions involved (including α_θ, by Lemma 1) are smooth, and compositions do not result in discontinuities due to division by zero. For example, for the involved term 1/θ, originating from the density of the exponential distribution, θ ≥ ε > 0 by Assumption (A1) avoids poles. Calculations for the denominator not to be zero are not presented here, for the sake of brevity. The main requirement is hence to show that the parameter-independent bound g for the profile score ψ_θ of Lemma 3 is integrable. This dominating condition is due to A. Wald (see e.g. Gouriéroux and Monfort, 1995, Sect. 24.2.3, Condition (D3)); integrability follows using the marginal distribution F_T of Assumption (A2).

Proving normality for the zeros of (7), for θ_0 in the open interior of Θ, follows van der Vaart (1998, Theorem 5.41). Consistency, given by Theorem 1, as well as Lemma 4, are requirements. The remaining arguments are given in the beginning of Appendix E.
Theorem 2. Under Assumptions (A1)-(A5), with θ̂_n defined after (7) and θ_0 in the interior of Θ, the standardized estimator converges in distribution to a normally distributed random variable with expectation (vector) 0 and a sandwich-form covariance matrix.

From the theorem, a Wald-type test can be performed with the respective confidence interval around θ̂_n for any hypothetical ϑ_0 > 0. A numerical simplification for such a confidence interval is yielded by the information matrix equality. The second half of Appendix E yields, under the Assumptions (A1)-(A4), the analogue thereof, so that the asymptotic covariance matrix in Theorem 2 reduces to I(θ_0)^{−1}.
A simple test to reject the independence hypothesis H_0 : ϑ_0 = 0, which is our main interest here, rejects for a too large ϑ̂_n. For the critical value, a distributional statement for ϑ_0 = 0, i.e. for a parameter θ_0 on the boundary Θ_H of the parameter space, is necessary.
Theorem 3. Under Assumptions (A1)-(A5), with θ̂_n defined after (7) and θ_0 ∈ Θ_H, the standardized estimator converges in distribution to F_1^{θ_0}, a two-dimensional distribution defined for a = (a_θ, a_ϑ)′ with −∞ < a_θ < ∞ and a_ϑ > 0, and having in this region the density equal to twice the density of N_2(0, I(θ_0)^{−1}). Furthermore, ϑ̂_n falls on the boundary, i.e. ϑ̂_n = 0, with probability converging to 50%.

While the original work of A. Wald uses a linear Taylor expansion of the score and excludes the boundary of the parameter space, additional arguments, given in Moran (1971, Theorem 1), allow to include the here important boundary. For our rather simple model, a quadratic Taylor expansion is readily available and van der Vaart (1998, Theorem 5.41) becomes applicable, with cases ϑ̂_n > 0 and ϑ̂_n = 0.
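The boundary phenomenon behind the critical value can be illustrated in a stylized one-dimensional analogue (standard normal in place of the estimator's limit, all numbers hypothetical): under H_0 the standardized estimate behaves like max(0, Z) with Z ~ N(0, 1), so rejecting when it exceeds the one-sided normal quantile keeps the nominal level.

```python
import random

random.seed(11)
# Under H0 the standardized estimator behaves like max(0, Z), Z ~ N(0, 1):
# half its mass is a point mass at 0, the rest has twice the N(0, 1) density.
z_95 = 1.6449  # standard normal 95% quantile
R = 100000
rejections = sum(1 for _ in range(R) if max(0.0, random.gauss(0.0, 1.0)) > z_95)
level = rejections / R
```

The simulated rejection rate is close to the nominal 5%, since P(max(0, Z) > z_{0.95}) = P(Z > z_{0.95}) = 0.05.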

Testing for a negative trend in life expectancy
Given the result in Section 4.2.1, even a negative trend in German business life expectancy seems conceivable, and we model it with the Farlie-Gumbel-Morgenstern copula introduced in Assumption (FGM) (replacing Assumption (A3)). Note that, in contrast to ordinary linear regression, it is not possible to accommodate a negative trend with the Gumbel-Barnett copula by transforming the dependent variable. Again, as in (6), we estimate the parameter θ with the first two coordinates of arg max_{θ∈Θ, n∈N} log ℓ_FGM(θ, n), after profiling out n. Here ℓ_FGM replaces ℓ (of (5)), with the FGM density instead of f_θ. An annual decrease in the life expectancy of ϑ̂_{FGM,n}/(θ̂G), i.e. of around 19 days, results.

Behaviour in finite samples
We conduct a Monte Carlo simulation, primarily to visualize the asymptotic results on consistency given by Theorem 1, measured in mean squared error (MSE), decomposed into bias and variance. In particular, we will find that the asymptotic approximation is rather precise regarding our statements on the basis of the business closure data in Section 4.2. Also the actual level and power of the test, given by Theorem 3, are studied.

Algorithm for simulating truncated sample
In order to generate the latent sample of n measurements (X i , T i ) ′ of Section 2.1, consider the conditional inversion method using a copula C (see Nelsen, 2006, Section 2.9).Note first that the inverse of c ϑ u , introduced shortly after (2), exists.
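A minimal sketch of the conditional inversion method for the Gumbel-Barnett copula, with the inverse of c_ϑ^u computed by bisection (since no closed form exists). The assignment of the marginals to the copula arguments, and the parameter values, are illustrative assumptions; the paper's Algorithm 1 fixes the exact convention.

```python
import math
import random

def sample_gb(theta_exp, vartheta, G, rng):
    """One draw (X, T) via conditional inversion from
    C(u, v) = u * v * exp(-vartheta * log(u) * log(v)):
    draw u, invert v -> dC/du(u, v) at an independent uniform w (bisection),
    then apply Exponential(theta_exp) and Uniform(0, G) quantile transforms."""
    u = 1.0 - rng.random()  # in (0, 1]
    w = rng.random()
    lu = math.log(u)

    def c_u(v):  # conditional CDF of V given U = u, increasing from 0 to 1
        lv = math.log(v)
        return v * (1.0 - vartheta * lv) * math.exp(-vartheta * lu * lv)

    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(80):  # bisection
        mid = 0.5 * (lo + hi)
        if c_u(mid) < w:
            lo = mid
        else:
            hi = mid
    v = 0.5 * (lo + hi)
    return -math.log(1.0 - u) / theta_exp, G * v

rng = random.Random(3)
draws = [sample_gb(0.1, 1.0, 24.0, rng) for _ in range(2000)]
mean_x = sum(d[0] for d in draws) / len(draws)
mean_t = sum(d[1] for d in draws) / len(draws)
cov_xt = sum((d[0] - mean_x) * (d[1] - mean_t) for d in draws) / len(draws)
```

The marginal means match the Exponential(0.1) and Uniform(0, 24) targets, and the sample covariance is negative, as the copula implies for ϑ > 0.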

Result
Each scenario consists of G, s, n, θ 0 and ϑ 0 .A first impression of the asymptotic fit can be gained for the test on independence given by Theorem 3.
Figure 3 (right) depicts simulated rejection rates of simulated datasets (as of Section 5.1). The rate is the actual level of the test at a nominal level of 5% for ϑ_0 = 0, and it approximates the power for ϑ_0 > 0. It can be seen that the test is slightly conservative, as the actual level is below 5% at the origin, but it quickly exceeds the nominal level and has a power of 25% already at a value as small as ϑ_0 = 0.01.
We now study the bias and variance of θ̂_n. The finite sample biases of the estimators θ̂ and ϑ̂, as zeros of the system of equations (7) (here and in Table 1 omitting the subscript n), are approximated from the R = 1000 data sets simulated according to Algorithm 1. Table 1 in Appendix F lists the results, and it can be seen that the bias of θ̂ decreases to virtually zero as a function of n, for all scenarios; all n are smaller than in our example of Section 4.2. The bias of ϑ̂ is markedly larger than that of θ̂ in general, but also decreases in n.
In order to conclude consistency in probability, consider the MSE as the sum of squared bias and variance Var(θ̂) (alike for ϑ̂). The simulated approximations are also listed in Table 1. Evident from Table 1 is the generally small variance of θ̂, quickly decreasing in n, and the quite large and also decreasing variance of ϑ̂. Hence the MSEs are approaching zero and consistency is visible for realistic sample sizes. In that respect, note that for ϑ_0 = 0, the number of observations m is around 1%-8% of the sample size n (see Weißbach and Wied, 2022, Table 1) for the chosen θ_0's, and similarly for ϑ_0 ∈ {0.001, 0.01}.
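The MSE decomposition used here can be sketched in a toy Monte Carlo (an untruncated exponential sample with hypothetical θ_0, n and R, not the paper's design):

```python
import random

random.seed(5)
# MSE = bias^2 + variance, illustrated for the exponential rate estimator
# theta_hat = 1 / mean(X) in a hypothetical untruncated setting.
theta0, n, R = 0.1, 500, 1000
estimates = []
for _ in range(R):
    xs = [random.expovariate(theta0) for _ in range(n)]
    estimates.append(n / sum(xs))
mean_est = sum(estimates) / R
bias = mean_est - theta0
variance = sum((e - mean_est) ** 2 for e in estimates) / R
mse = bias ** 2 + variance
```

As in Table 1, the bias is close to zero while the variance (here roughly θ_0^2/n) dominates the MSE.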
The influence of G and s is, as expected, that for larger s, i.e. by observing more units, the uncertainty about the parameter, i.e. its variance, decreases.
For instance, the scenarios with G = 24, combined with s = 2, s = 3 or s = 48, exhibit the tendency that Var(θ̂) is much smaller for s = 48. The effect of different G is mixed. Simulations not shown here exhibit that the estimation variances first increase as a function of G, then decrease and later increase again. Normality for small sample sizes, as indicated by Theorem 2, will be valid as in the case of independent left-truncation (see Weißbach et al., 2023, Appendix A.1.4) and is not studied in detail.
It is mainly evident that the dependence introduces a (higher) bias in the estimation of θ 0 .For instance, in the scenario with G = 24, s = 3 and θ 0 = 0.05, the bias is roughly ten times higher for all n.

Discussion
In some sense, our study follows up on Efron and Petrosian (1999), who assumed independence between units when sampling from the latent population and, at the same time, also for sampling from the truncated population. We describe how the two assumptions can both be true.
In view of applicability of the results, the presented model is, at least, larger than the model without dependence in Weißbach and Wied (2022).
However, the model is still very small, and especially assuming the distribution of the lifetime to be exponential can be inadequate, e.g. in human demography. On the other hand, a nonparametric model generates considerable algorithmic effort (see e.g. Efron and Petrosian, 1999; Shen, 2010), and the resulting functional estimator still does not allow statements about popular characteristics such as the expectation or any quantile. Against our assumption of a uniform truncation distribution stands a recent finding of Weißbach and Dörre (2022), who showed that business foundations became less and less frequent over the years 1990-2013, however assuming independent truncation. And of course the dependence model could be inadequate, and Chiou et al. (2019) call for a goodness-of-fit test. Furthermore, a covariate can be available and informative; it may, for instance, reduce or substitute the dependence. An important example is that truncation dependence can be interpreted as dependence of the lifetime on the date of birth as a cohort effect. And comparisons with methods which incorporate calendar time as a time-dependent covariate, found in Rennert and Xie (2018) and Frank et al. (2019), should be interesting. For any covariate, such as place of business, it should also be noted that, together with the parameter which relates the covariate to the hazard rate, the marginal distribution of the covariate introduces parameters that can be estimated, or conditioning must be studied in order to avoid a joint estimation (e.g. as in Weißbach and Dörre, 2022).
In view of theory, inference about the unknown sample size n would be interesting, if relevant applications can be found.
In view of our particular measurement, the elementary biological question of the time between a well-defined birth and a well-defined death appears to overly simplify business demography, even when we only wish to study business closure as analogous to human death. In fact, the data in Section 4.2 contain 'only' insolvencies, i.e. only closures for one particular cause. Data for a competing risk model would be needed. Richer data would probably then be left-truncated and right-censored (LTRC), rather than DT.
Our test can still be applied to LTRC data by dropping the right-censored observations.

A Visualization of Gumbel-Barnett copula

D.1 Identification of θ

Let f, g : (0, ∞) → R be two functions with f(x) = log(x) − (1 − x^{−1}) and g(x) = (x − 1) − log(x) for x ∈ (0, ∞). Obviously, it is f(1) = g(1) = 0. Furthermore, the functions are strictly increasing on the interval [1, ∞), because the derivatives f′(x) = x^{−1} − x^{−2} and g′(x) = 1 − x^{−1} are positive on the interval (1, ∞) and zero for x = 1. For all x > 1 it follows that f(x) > 0 and g(x) > 0. From f_{θ_2}(x, t)/f_{θ_1}(x, t) = 1, for all (x, t) ∈ D, we may conclude equality of the parameters. To this end, insert definition (1), to see the form of the density ratio. Taking derivatives in the direction of t, and setting T_θ := (ϑθx + 1)(ϑ log(1 − t/G) − 1) + ϑ, results in F_1 F_2 = 0. The first factor is only zero for t = G. Setting the second factor equal to zero shows that two bivariate polynomials of equal degree are equal. This is the case if (and only if) their coefficients coincide. A comparison of the coefficients of the absolute terms results in ϑ_1 = ϑ_2. Comparing the coefficients of the terms with the variable x then yields an equation from which θ_1 = θ_2 can be concluded for ϑ_2 ≠ 0. In the case of independence, i.e. for ϑ_1 = ϑ_2 = 0, the comparison allows to conclude θ_1 = θ_2 directly.

D.3 Truncated population model
For identification in the truncated model, assume f̃_{θ_1}(x, t) = f̃_{θ_2}(x, t) for all (x, t) ∈ D. By inserting (1), one arrives again at equality (D.1), only with a right-hand side of α_{θ_1}/α_{θ_2} instead of 1. Taking derivatives with respect to t has the same result as for equality (D.1), because α_θ does not depend on t, so that the remaining arguments for identification are as in Section D.1.

D.4 Proof of Lemma 4
One may perform the integrations, introduced by the expectations, and prove equality to zero (see e.g. Weißbach and Wied, 2022, Appendix A). Instead, here we prove equality to zero of the integrand. We verify each coordinate separately. For (7a), we write the integrand as a sum of three terms, s_1, s_2 and s_3. Represent now E_{θ_0} with (1) for s_1 and s_2. For s_3, use the representation (2) for α_{θ_0} and interchange partial differentiation and integration. Finally, the terms cancel. For the second coordinate (7b) of (7), the arguments are similar.

E Proof of Theorem 2
As functions in θ, the rational functions in (7) are twice continuously differentiable, because the denominator has no zeros for θ ∈ Θ and (X_i, T_i) ∈ D. Furthermore, by Lemma 1, the selection probability α_θ, as a function of θ, is three times partially differentiable. The function θ ↦ ψ_θ(x, t) is hence, as a composition of twice continuously differentiable functions, for almost all (x, t) ∈ S, also twice continuously differentiable. That E_{θ_0}[ψ̇_{θ_0}(X_1, T_1)] is invertible follows from Assumption (A5).
Lemma 1 ensures interchangeability of differentiation and integration, i.e. it is ∫_D ḟ_{θ_0}(x, t) d(x, t) = α̇_{θ_0} and ∫_D f̈_{θ_0}(x, t) d(x, t) = α̈_{θ_0}. The value of the integral for the second derivative of the density f̃_{θ_0} is hence zero. This can be shown similarly for the other three components of the above matrix.
As a result, one has the IME for the 'anti-clockwise' model and, as described earlier, thereof for the model of dependent truncation.

where again we assume strictly T̃_j < G and T_i < G. Profiling out n, we estimate the parameter θ with the first two coordinates of

arg max_{θ∈Θ, n∈N} log ℓ(θ, n).    (6)

Note that the maximum can be on the boundary of Θ in ϑ-direction, especially in Θ_H. For the score function, the necessary partial derivatives with respect to θ and ϑ are given in closed form in Appendix C.2.1. The latter derivatives depend on n. By Appendix C.2.2, log ℓ(θ, n) is maximized in n by the next smallest integer to n̂ = M/α_θ. The latter uses the fact that the logarithm can be bounded from above by a linear function and from below by a hyperbola (with proof in Appendix C.1, also used in the following section).
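The profiling step in n can be sketched numerically: up to terms free of n, the n-dependent part of the approximate log-likelihood is M log(nα_θ) − nα_θ, which the integers adjacent to M/α_θ maximize. The values of M and α_θ below are hypothetical:

```python
import math

# Hypothetical observed count and selection probability.
M, alpha_theta = 553, 0.05

def loglik_n(n):
    # n-dependent part of the approximate log-likelihood (up to terms free of n)
    return M * math.log(n * alpha_theta) - n * alpha_theta

n_star = M / alpha_theta  # real-valued maximizer M / alpha_theta
best_int = max(range(int(n_star) - 5, int(n_star) + 6), key=loglik_n)
```

A scan over the integers around M/α_θ confirms that the discrete maximizer is adjacent to the real-valued one.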
The classic definition of identification is tailored to an SRS. An SRS is only latent in the study at hand. It will still be useful to study SRS-identification jointly of the latent univariate model (see Assumption (A2)) and of the dependence model (see Assumption (A3)). Appendix D.1 proves identification of θ. Now in the truncated sample, with an interest in inference for θ, we profile out the parameter n. This reduces the three score estimating equations for (6) (see Appendix C.2), by solving for n and inserting into the remaining two estimating equations. Instead of inserting the natural-valued solution for n, we use the real-valued 'near-zero' M/α_θ (see again Appendix C.2.2).
Instead of truncating the sample by D, one can think of the data as drawing an SRS from a correspondingly truncated population, of sample size m. The thus defined sub-population Pop is depicted in Figure 2 (bottom left box).