Abstract
The cross ratio function (CRF) is a commonly used tool to describe local dependence between two correlated variables. Being a ratio of conditional hazards, the CRF can be rewritten in terms of (first and second derivatives of) the survival copula of these variables. Bernstein estimators for (the derivatives of) this survival copula are used to define a nonparametric estimator of the cross ratio, and asymptotic normality thereof is established. We consider simulations to study the finite sample performance of our estimator for copulas with different types of local dependency. A real dataset is used to investigate the dependence between food expenditure and net income. The estimated CRF reveals that families with a low net income relative to the mean net income will spend less money to buy food compared to families with larger net incomes. This dependence, however, disappears when the net income is large compared to the mean income.
References
Bouezmarni, T., Rombouts, J., Taamouti, A. (2009). Asymptotic properties of the Bernstein density copula estimator for \(\alpha \)-mixing data. Journal of Multivariate Analysis, 101, 1–10.
Bouezmarni, T., El Ghouch, A., Taamouti, A. (2013). Bernstein estimator for unbounded copula densities. Statistics and Risk Modeling, 30, 343–360.
Chen, M.-C., Bandeen-Roche, K. (2005). A diagnostic for association in bivariate survival models. Lifetime Data Analysis, 11, 245–264.
Clayton, D. G. (1978). A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika, 65, 141–151.
Duchateau, L., Janssen, P. (2008). The frailty model. New York: Springer.
Duchateau, L., Janssen, P., Kezic, I., Fortpied, C. (2003). Evolution of recurrent asthma event rate over time in frailty models. Journal of the Royal Statistical Society (Series C), 52, 355–363.
Gijbels, I., Mielniczuk, J. (1990). Estimating the density of a copula function. Communications in Statistics, Theory and Methods, 19, 445–464.
Glidden, D. V. (2007). Pairwise dependence diagnostics for clustered failure-time data. Biometrika, 94, 371–385.
Härdle, W. (1990). Applied nonparametric regression. Cambridge: Cambridge University Press.
Hsu, L., Prentice, R. L. (1996). On assessing the strength of dependency between failure time variables. Biometrika, 83, 491–506.
Hu, T., Nan, B., Lin, X., Robins, J. M. (2011). Time-dependent cross ratio estimation for bivariate failure times. Biometrika, 98, 341–354.
Janssen, P., Swanepoel, J., Veraverbeke, N. (2012). Large sample behavior of the Bernstein copula estimator. Journal of Statistical Planning and Inference, 142, 1189–1197.
Janssen, P., Swanepoel, J., Veraverbeke, N. (2014). A note on the asymptotic behavior of the Bernstein estimator of the copula density. Journal of Multivariate Analysis, 124, 480–487.
Janssen, P., Swanepoel, J., Veraverbeke, N. (2016). Bernstein estimation for a copula derivative with application to conditional distribution and regression functionals. Test, 25, 351–374.
Janssen, P., Swanepoel, J., Veraverbeke, N. (2017). Smooth copula-based estimation of the conditional density function with a single covariate. Journal of Multivariate Analysis, 159, 39–48.
Leblanc, A. (2012). On estimating distribution functions using Bernstein polynomials. Annals of the Institute of Statistical Mathematics, 64, 919–943.
Li, Y., Lin, X. (2006). Semiparametric normal transformation models for spatially correlated survival data. Journal of the American Statistical Association, 101, 591–603.
Li, Y., Prentice, R. L., Lin, X. (2008). Semiparametric maximum likelihood estimation in normal transformation models for bivariate survival data. Biometrika, 95, 947–960.
Müller, H. G., Wang, J.-L. (1994). Hazard rate estimation under random censoring with varying kernels and bandwidths. Biometrics, 50, 61–76.
Nan, B., Lin, X., Lisabeth, L. D., Harlow, S. D. (2006). Piecewise constant cross-ratio estimation for association of age at a marker event and age at menopause. Journal of the American Statistical Association, 101, 65–77.
Nelsen, R. B. (2006). An introduction to copulas. 2nd ed. New York: Springer.
Oakes, D. (1982). A model for association in bivariate survival data. Journal of the Royal Statistical Society Series B—Statistical Methodology, 44, 414–422.
Oakes, D. (1986). Semi-parametric inference in a model for association in bivariate survival data. Biometrika, 73, 353–361.
Oakes, D. (1989). Bivariate survival data induced by frailties. Journal of the American Statistical Association, 84, 487–493.
Omelka, M., Gijbels, I., Veraverbeke, N. (2009). Improved kernel estimation of copulas: Weak convergence and goodness-of-fit testing. Annals of Statistics, 37, 3023–3058.
Parzen, E. (1979). Nonparametric statistical data modeling. Journal of the American Statistical Association, 74, 105–121.
Ruppert, D., Cline, D. H. (1994). Bias reduction in kernel density estimation by smoothed empirical transformations. Annals of Statistics, 22, 185–210.
Sancetta, A., Satchell, S. (2004). The Bernstein copula and its applications to modeling and approximations of multivariate distributions. Econometric Theory, 20, 535–562.
Sen, B., Xu, G. (2015). Model based bootstrap methods for interval censored data. Computational Statistics and Data Analysis, 81, 121–129.
Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l’institut de statistique de l’Université de Paris, 8, 229–231.
Spierdijk, L. (2008). Nonparametric conditional hazard rate estimation: A local linear approach. Computational Statistics and Data Analysis, 52, 2419–2434.
Swanepoel, J. W. H., van Graan, F. C. (2005). A new kernel distribution function estimator based on a nonparametric transformation of the data. Scandinavian Journal of Statistics, 32, 551–562.
Van Keilegom, I., Veraverbeke, N. (2001). Hazard rate estimation in nonparametric regression with censored data. Annals of the Institute of Statistical Mathematics, 53, 730–745.
Viswanathan, B., Manatunga, A. (2001). Diagnostic plots for assessing the frailty distribution in multivariate survival data. Lifetime Data Analysis, 7, 143–155.
Wienke, A. (2010). Frailty models in survival analysis. Boca Raton: Chapman and Hall/CRC.
Acknowledgements
The authors thank the editor and two referees for their valuable comments that have led to an improved version of the manuscript. The work was supported by the IAP Research Network P7/13 of the Belgian State (Belgian Science Policy). The third author thanks the National Science Foundation of South Africa for financial support. The fourth author is also extraordinary professor at the North-West University, Potchefstroom, South Africa.
Appendix: Proofs of Theorems 1–3
In this appendix, we present the proofs of Theorems 1–3 in the main text.
Proof of Theorem 1
The non-random term (B) equals
where \(\mu _2(K_0) = \displaystyle \smallint t^2 K_0(t) {\mathrm {d}}t\). This is because \(\lambda (t_1 \mid T_2 = t_2)\) is twice continuously differentiable with respect to \(t_1\).
For the integrand in (A), we first note that
Indeed, for n sufficiently large,
because the maximal jump of \(\widehat{F}_{t_2}(\cdot )\) is \(O\left( \displaystyle \frac{m}{n}\right) \) a.s. (see Janssen et al. 2016) and because \(\widehat{F}_{t_2} (t_1)\) converges to \(F_{t_2}(t_1)\). Hence, the term (A) can be written as
By the mean value theorem, we obtain that
for some \(\theta _n(s)\) between \(F_{t_2}(s)\) and \(\widehat{F}_{t_2}(s)\). Hence,
where
From Theorem 3 of Janssen et al. (2016) and the first part of the proof of Lemma 7 in the Electronic Supplementary Material of Janssen et al. (2016), we conclude that
by applying the assumptions in condition (d) of the theorem.
Since we assume that \(K_0\) is a continuous density function of bounded variation, there exist two non-decreasing bounded and continuous functions \(K_{01}\) and \(K_{02}\) such that \(K_0(u) = K_{01}(u) - K_{02}(u)\). Assume that \(K_{01}\) and \(K_{02}\) are supported on \([-L,L_1]\) and \([L_1, L]\), respectively, for some \(-L \le L_1 \le L\). Hence, \(K_{01}(-L) = K_{02}(-L) = 0 = K_{01}(L) = K_{02}(L)\) and \(K_{01}(L_1) = -K_{02}(L_1)\). Therefore,
Furthermore, since \(\sup \limits _s |\widehat{F}_{t_2}(s) - F_{t_2} (s)| \rightarrow 0\) a.s., we have that \(\sup \limits _s |\theta _n(s) - F_{t_2}(s)| \rightarrow 0\) a.s.; hence, for some constant \(C > 0\),
by using (13). Therefore, under the conditions in (d) we conclude that
By the mean value theorem, the first term in the expression of (A) given in (12) becomes
for some \(\theta (u)\) between \(t_1\) and \(t_1 - b_n u\).
As above, we have that for some constant \(C > 0\),
Under the conditions in (d), we have
For \((A_{11})\) in Eq. (15), we write
For \((A_{112})\), which contributes to the bias, note that \(E[\widehat{F}_{t_2}(t_1 - b_n u)] - F_{t_2}(t_1 - b_n u) = -\{E[\widehat{S}_{t_2}(t_1 - b_n u)] - S_{t_2}(t_1 - b_n u)\}\). In line with Remark 3 in Janssen et al. (2016), we have
where
Using partial integration, we obtain that
where
with \(b^{(1)}(u,v) = \frac{\partial }{\partial u}b(u,v)\). For the first term we have, after partial integration,
where \(\widehat{f}_{t_2}(t_1)\) is precisely the Bernstein estimator for a conditional density function studied in Janssen et al. (2017).
The proof of the theorem follows directly from (11)–(18) and the Theorem in Janssen et al. (2017) by simply replacing Y by \(T_1\) and X by \(T_2\) in the aforementioned paper.
Also note that the term \(\frac{1}{2}m^{-1}\phi (t_1,t_2)\) in the bias vanishes after multiplication by \((n m^{-1/2} b_n)^{1/2}\). This is because \((n m^{-1/2} b_n)^{1/2} m^{-1} \le n^{1/2} m^{-5/4} b_n^{-1/2} \rightarrow 0\) by the first relation in (d). This proves Theorem 1. \(\square \)
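The rate comparison in the last step of the proof of Theorem 1 can be checked by spelling out the exponent arithmetic. The only assumption added here is \(0 < b_n \le 1\), which holds for n sufficiently large if the bandwidth \(b_n\) tends to zero:

\[
(n m^{-1/2} b_n)^{1/2} m^{-1} = n^{1/2} m^{-1/4} b_n^{1/2} m^{-1} = n^{1/2} m^{-5/4} b_n^{1/2} \le n^{1/2} m^{-5/4} b_n^{-1/2},
\]

since \(b_n^{1/2} \le b_n^{-1/2}\) whenever \(0 < b_n \le 1\).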
Proof of Theorem 2
Write
For the non-random term \((\widetilde{B})\) we have, similar to (11),
For \((\widetilde{A})\), we proceed as in the proof of Theorem 1. This gives, in analogy with (12),
where
for some \(\widetilde{\theta }_n(t_1-b_nu)\) between \(C_{m,n} [S_{1n} (t_1-b_nu), S_{2n}(t_2)]\) and \(C[S_1 (t_1- b_nu), S_2(t_2)]\). The \(O\left( \displaystyle \frac{m^{1/2}}{nb_n}\right) \) term in (20) comes from the replacement of \(S_{1n}(s-)\) by \(S_{1n}(s)\).
Indeed, for n sufficiently large, we have for some constant \(M >0\):
Using that \(C_{m,n}[S_{1n}(s), S_{2n}(t_2)] \rightarrow C[S_1(s), S_2(t_2)]\) a.s. and the fact that \(K_0\) is of bounded variation we can make an argument completely analogous to the one used for \(R_n(t_1,t_2)\) in (12). This gives the following bound for \(\widetilde{R}_n(t_1,t_2)\):
Now,
by the Lipschitz continuity of C (see Nelsen 2006).
The supremum of the first term on the right-hand side is \(O(n^{-1/2} (\ln \ln n)^{1/2} + m^{-1/2})\) a.s. (see the proof of Theorem 1 in Janssen et al. (2012)) and the supremum of the other two terms is \(O(n^{-1/2} (\ln \ln n)^{1/2})\) a.s. So the bound for \(\widetilde{R}_n(t_1,t_2)\) is
Combining this with (19) and (20), we obtain
For the first term on the right-hand side, we write
with \((\theta _{1n} (t_1), \theta _{2n} (t_2))\) denoting an intermediate point between \((S_{1n} (t_1), S_{2n}(t_2))\) and \((S_1(t_1), S_2(t_2))\). Now using similar ideas as in Lemma 3 of Janssen et al. (2012) and the convergence rate of the Bernstein approximation given in (5) of the same paper, we obtain
where the \(Y_{mi} (u_1,u_2)\) are independent zero mean random variables which are bounded. With this
where
Now
by the boundedness of the \(Y_{mi}\) and the fact that \(K_0\) is of bounded variation.
Hence,
and
The conditions imposed in (d) of Theorem 1, together with the extra condition \(m^{1/2} b_n \rightarrow \infty \), imply that all the terms on the right-hand side vanish after multiplication by \((nm^{-1/2} b_n)^{1/2}\). \(\square \)
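For readers who wish to experiment numerically with a quantity of the type \(C_{m,n}\) appearing in the proof above, the following is a minimal sketch of the standard empirical Bernstein copula of order m, that is, binomial weights applied to the empirical copula on the grid \(\{k/m\}\). It is an illustration under our own assumptions and not the authors' implementation: the function names `empirical_copula` and `bernstein_copula` are hypothetical, ties are assumed absent, and the estimator analyzed in the paper is built on the survival copula evaluated at \(S_{1n}\) and \(S_{2n}\), so details may differ.

```python
from math import comb

import numpy as np


def empirical_copula(u_ranks, v_ranks, a, b):
    """Empirical copula C_n(a, b) from pseudo-observations in (0, 1]."""
    return np.mean((u_ranks <= a) & (v_ranks <= b))


def bernstein_copula(x, y, m, u, v):
    """Empirical Bernstein copula of order m at (u, v) (illustrative sketch)."""
    n = len(x)
    # Pseudo-observations: ranks divided by n (assumes no ties in x or y).
    u_ranks = (np.argsort(np.argsort(x)) + 1) / n
    v_ranks = (np.argsort(np.argsort(y)) + 1) / n
    grid = np.arange(m + 1) / m
    # Binomial weights P_{m,k}(u) = C(m, k) u^k (1 - u)^(m - k).
    pu = np.array([comb(m, k) * u**k * (1 - u) ** (m - k) for k in range(m + 1)])
    pv = np.array([comb(m, l) * v**l * (1 - v) ** (m - l) for l in range(m + 1)])
    # Empirical copula evaluated on the (m + 1) x (m + 1) grid {k/m} x {l/m}.
    cn = np.array(
        [[empirical_copula(u_ranks, v_ranks, a, b) for b in grid] for a in grid]
    )
    # Double sum over k and l written as a bilinear form.
    return float(pu @ cn @ pv)
```

When there are no ties and n is a multiple of m, this construction reproduces the uniform margins up to rounding, for example `bernstein_copula(x, y, 10, 0.3, 1.0)` is numerically 0.3 for a sample of size 200.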
Proof of Theorem 3
Linearization of the ratio gives that \(\widehat{\theta }_m(t_1,t_2) - \theta (t_1, t_2)\) has the same limiting distribution as
Multiplication by \((nm^{-1/2} b_n)^{1/2}\) shows that the second term is \(o_P(1)\) (by Theorem 2) and that the first term is asymptotically normal (by Theorem 1). \(\square \)
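The linearization step can be illustrated with a generic algebraic identity for a ratio. The symbols \(a\), \(b\), \(\widehat{a}\), \(\widehat{b}\) are placeholders introduced here for the numerator and denominator of a ratio and their estimators (with \(b \ne 0\) and \(\widehat{b} \ne 0\)). They are not notation from the paper, and the display is a sketch of the standard argument rather than the authors' exact expression:

\[
\frac{\widehat{a}}{\widehat{b}} - \frac{a}{b} = \frac{1}{b}(\widehat{a} - a) - \frac{a}{b^2}(\widehat{b} - b) - \frac{(\widehat{a} - a)(\widehat{b} - b)}{b\,\widehat{b}} + \frac{a(\widehat{b} - b)^2}{b^2\,\widehat{b}}.
\]

The first two terms are linear in the estimation errors. The last two are products of estimation errors and are therefore of smaller order whenever both estimators are consistent, so the limiting distribution of the ratio is determined by the linear part.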
About this article
Cite this article
Abrams, S., Janssen, P., Swanepoel, J. et al. Nonparametric estimation of the cross ratio function. Ann Inst Stat Math 72, 771–801 (2020). https://doi.org/10.1007/s10463-019-00709-3
DOI: https://doi.org/10.1007/s10463-019-00709-3