Wavelet estimation of copula function based on censored data

In this paper, we consider wavelet analysis to obtain an estimator of a copula function based on censored data. We show that optimal convergence rates for the mean integrated squared error (MISE) of linear wavelet-based function estimators are exact under right censoring model. Moreover, we derive asymptotic formulae for MISE. Finally, the simulation results and the analysis of real data validate the proposed procedure.


Introduction
Recently, copulas and their applications in statistics have become a rather popular phenomenon. For a long time, statisticians have been interested in the relationship between a multivariate distribution function and its lower-dimensional margins. When it comes to analyzing the dependence between random variables, the copula function becomes a very useful tool. According to Sklar's theorem [22], if H is a bivariate distribution function with margins F(x) and G(y), then there exists a copula C such that H(x, y) = C(F(x), G(y)). Statisticians are interested in copulas for two reasons: (1) Inspecting how the measures of dependence are scale-free and (2) constructing families of bivariate distributions. For more information about this flexible tool and dependence modeling, we refer to [16] and [20]. A detailed survey on copula models can be found in [24]. Copula models have many applications in finance and insurance, some of which are considered in [2,7,10,11,13]. For more studies on copulas in econometrics, we refer to [3] and [21].
In recent years, the analysis of censored data has become very popular. In many sciences such as statistics, engineering, finances, and other areas, censoring is a condition in which observation is partially known, that is, censoring occurs when a value is outside the range of a measuring instrument, particularly in life-testing data in medicine, where observations often survive or fail at the end of the test (see, e.g., [12] for good examples of the censoring method). Let T 1 , T 2 , . . . , T n be i.i.d. survival times with probability density function f and common distribution function F. Typically, instead of observing the variables of interest, we are able to observe the i.i.d. censoring times C 1 , C 2 , . . . , C n . Let G be the common distribution function of the latter. Also, it is assumed that T i and C i are independent. Let Y i = min(T i , C i ) for i = 1, . . . , n, and define the indicator function This definition denotes the right-censoring. Several approaches have been proposed to deal with the estimation of copulas under uncensored or censored data. See [9] for nonparametric estimation and also [23]. In [24] consistent estimators are considered under some restrictions on the dependence structure of the censored data. For right-censored data, less work has been done on nonparametric copula estimators. Recently, in [15] a new class of nonparametric estimators of copula function for bivariate censoring is described. The aim of this paper is to estimate the copula function via wavelets based on censoring.
The theory of wavelets and their applications in statistics and other sciences have become an important technique. We can find several applications of wavelet estimators for copula functions in different contexts. In [14] smoothed empirical copula estimators considered using ranks and wavelets. See [1] for more information about the procedure to estimate copula functions using wavelets, where the problem of multivariate copula density estimation by wavelet methods has been described. In [19] the wavelet approaches were used, basically following the same route as in [14]. In [5] multiwavelets for estimating copula density were used, and in [6] the same work was prepared based on a Legendre multiwavelets. There are many different papers that considered wavelets, and we proposed some of the most important.
Currently, an extension of wavelets and copulas is increasingly popular as an alternative to many methods such as those briefly discussed. In this paper, we propose a linear wavelet-based estimation for copulas with right censored data in the observation T 1 or T 2 . Let (T 11 , T 21 ), . . . , (T 1n , T 2n ) be independent and have a joint distribution function (d.f.) F(t 1 , t 2 ). As usual, assume that T 1 and C 1 are independent, where C 1 is the censoring variable associated with the variable T 1 . Rather than observing the variables of interest (T 11 , T 21 ), . . . , (T 1n , T 2n ), in the randomly right censor model, (Y i , δ i , T 2i ) is observed. We apply the method of [17], which provides a MISE expansion similar to a density function over (-∞, T] for any fixed T < τ H(·,t) , where τ H(·,t) = inf{u : H(u, t) = 1} ≤ ∞ is the least upper bound for the support of H(·, t), the distribution function of (Z 1 , t).
The rest of the paper is as follows. In Sect. 2, we define the elements of copula density and wavelet transform and provide the wavelet estimators for the copula based on the censored data. The main results are described in Sect. 3, and the simulation study for the proposed estimator is provided in Sect. 4. The proofs are presented in the Appendix.

Preliminary notations
In this section, we provide some necessary concepts about the general framework of this paper. Section 2.1 presents preliminary assumptions and some notations about copula functions. All the major definitions and facts about the wavelets used in the paper are presented in Sect. 2.2. At the end of the section we introduce the estimators for the main result.

Copula function
Copula had been first established in [22] and improved very fast in many spaces. Here we give a brief definition of a copula function in the two-dimensional case. Any extension of the results to higher dimensions is straightforward. Consider a random vector (T 1 , T 2 ) with cumulative distribution function F(t 1 , t 2 ) = P(T 1 ≤ t 1 , T 2 ≤ t 2 ) and margin distribution functions F 1 and F 2 . The relation between these variables is of interest, so the copula function C can be formed as follows: The marginals F 1 and F 2 have uniform distribution on (0,1), and if they are continuous, then C is unique and coincides with the distribution function of the pair (U, V ) = (F 1 (T 1 ), F 2 (T 2 )). In practice, F is unknown. The advantage of using copula is that the joint distribution function (F(t 1 , t 2 )) can be constructed by using marginals (F 1 and F 2 ) when they are from different classes of distributions. Let (T 11 , T 21 ), . . . , (T 1n , T 2n ) be a random sample from the unknown distribution F. Denote by F 1n and F 2n the empirical distributions associated with F 1 and F 2 . A first step in selecting an appropriate class of copulas consists of plotting the pairs ( R i n , S i n ) = (F 1n (T 1i ), F 2n (T 2i )), i = {1, . . . , n}. Here R i is the rank of T 1i among T 11 , . . . , T 1n , and S i is the rank of T 2i among T 21 , . . . , T 2n . The motivation behind this approach is that the pseudo-observations (R i /n, S i /n) are close substitutes to the unobservable pairs ( It is obvious that in real analysis one of our variables (or even both of them) may be subject to be censored, and one observes a minimum between it and another (censoring) random variable, denoted by C j , so Y j = min(T j , C j ) and δ j = I T j ≤C j for j = 1, 2. The i.i.d. replication vector (Y 1i , Y 2i , δ 1i , δ 2i ) 1≤i≤n denotes the random sample of (Y 1 , Y 2 , δ 1 , δ 2 ). Clearly, many different estimators can be considered for a distribution function based on censoring. The one used in this paper has the form where W in are random weights designed to compensate asymptotically the bias caused by censoring and can be assumed in three different forms; see [15]. The weight considered in the present paper is .
In this form only T 1 is assumed to be censored, and then Y 2 = T 2 , δ 2 = 1 a.s., and C 1 is independent from T 1 . AlsoĜ, the Kaplan-Meier estimator of the censoring variable, is defined asĜ(t) = 1i:Y 1i ≤t (1 - , the weight W in can be seen as an approximation of W i = δ 1i 1-G(Y -1i ) . The other weights are discussed in [15]. They took W in = δ 1iĝ (Y 1i ), whereĝ is a consistent estimator of limit function g, estimated from the data, where g satisfies the condition

Wavelets
Wavelets and their applications are still an important subject in statistics. The term wavelet is used to refer to a set of orthonormal basis functions generated by dilation and translation of a compactly supported scaling function (father wavelet) φ and a mother wavelet ψ associated with an r-regular (r > 0) multiresolution analysis of L 2 (R), the space of square-integrable functions on the line.
It is assumed that j ≥ j o for some coarse scale j o ∈ N, which we take as l. We suppose that φ and ψ are bounded and compactly supported. For more on wavelets, see [8] and [18]. The wavelet expansion for f (x, y) can be written as where is a trend of an approximation, and For more details about D j 0 f (x, y) and the functions ψ j 0 k , see [14]. The coefficients α j 0 k and Some important cases of wavelets are the Haar, Daubechies, Shannon, Meyer, and Morlet wavelets (see [8]). We used the Haar and Daubechies wavelets in our simulation studies. Accordingly, by equation (1) the copula density c can be expanded with By the change of variables u = F 1 (t 1 ) and v = F 2 (t 2 ) we get Assuming that I = I(Y 1i ≤ T, T 2i ≤ T), when F 1 and F 2 (the marginal distributions) are known, a moment-based estimator of α j 0 k based on censored data is then given bŷ Then the wavelet-based estimator of c is given bŷ When F 1 and F 2 are unknown, their empirical distribution functions F 1n and F 2n are used. So the rank-based estimator is as follows: Now we can introduce the linear wavelet-based estimator of c based on the ranks as When we have no censoring, the definition reduces to that of the copula estimator introduced in [9]. We further denote by K any constant that may change from one line to another, does not depend on j, k, n, but depends on the wavelet basis and on c ∞ = sup (u,v)∈(0,1) |c(u, v)| and c 2 = c(u, v) 2 du dv.
Since we deal with the wavelet method, it is very common to consider Besov spaces as functional spaces because they are characterized in terms of wavelet coefficients as follows. Besov spaces depend on three parameters s > 0, 1 < p < ∞, and 1 < q < ∞ and are denoted by B s pq . Let f ∈ L 2 (R d ) (that is, L 2 (R 2 ) in our paper), and let s < r (wavelet regularity). Define the sequence norm of the wavelet coefficients of a function f ∈ B s pq by where (|β j,k | p ) 1/p = ( k∈Z 2 ∈S 2 |β j,k | p ) 1/p . We assume that the copula function c belongs to a Besov space.

Main results
In this section, we present the main results. The following theorems show that the waveletbased estimators based on censoring attain nearly optimal convergence rates over a large range of Besov function classes. We also show that our estimator obtains the optimal convergence rates under mean integrated squared error (MISE) by accepting some mild conditions. The mean integrated squared error (MISE) is defined by In view of decomposition (1) for c, it is obvious that The bias term in the equation can be bounded as in the proof of Lemma 1 in [4], Precisely, suppose that c belongs to the ball of radius M > 0 in the Besov space B s p,q (M), where the parameters s > 0 and p ≥ 2, 1 ≤ q ≤ ∞ or s > 2/p -1 and p ∈ [1, 2], 1 ≤ q ≤ ∞; also, s * can be defined as s * = s + 1 -2/p if p ∈ [1, 2] and s * = s otherwise. Also, by defininĝ c j 0 based on Eq. (3), we can easily show that We now look for an optimal upper bound for each of the above sequences. (3), and let j 0 be an arbitrary integer in N. Then

Lemma 1 Letα be as in
for some constant K 1 > 0 depending only on φ and either Also note that the proposed linear wavelet estimator of c j 0 isĉ j 0 (x, y) = k∈Z 2αj 0 k × φ j 0 k (x, y). Then Note that j 0 must be chosen so that 2 j 0 √ n.
Proof of Lemma 1 The proof is similar to optimality results in Sect. 5 of [14].
Theorem 1 Assume that the function φ is m-differentiable, and letc j0 be the copula density estimator of c defined in (5). Then there exists a constant K 1 > 0 such that, for a given (u, v) ∈ (0, 1) 2 and any level j 0 satisfying 2 j 0 (n/ log n) 1/2-1/2m , The error term associated with the use of ranks is negligible with respect to the usual error term as soon as 2 j 0 log(n). For the proof, see the Appendix. Now, combining the results of Lemma 1 and Theorem 1, we are in a position to express the main theorem of this section. Theorem 2 Let φ be a scaling function mentioned in Sect. 2.2 having m derivatives with m > 1 + 1/s * , and for arbitrary j 0 ∈ N, letc j 0 be the estimator in (5). Then there exists a constant K 2 > 0 such that, for all M ∈ (0, ∞), s > 2/p -1, and p, q ∈ [1, ∞), if j 0 satisfies 2 j 0 n 1/2+2s * , then sup c∈B s p,q (M) MISE(c j 0 , c) ≤ K 2 n -s * /1+s * .
Note that the procedure of estimation described is optimal on the Besov space. The simulation studies are applied in the next section. For the proof, see the Appendix.
Remark 1 This study can be regarded as an extension from complete data to randomly right censored data. If we assume that there is no censoring, that is, G ≡ 0 on (-∞, +∞), then δ 1i = 1 for all i = 1, 2, . . . , n, and the estimators defined by (4) and (3) are the same as those of [14] for the complete data case. Therefore our estimators can be regarded as an extension of those of [14] from complete data to randomly right censored data.

Simulation studies and analysis for real data
In this section, we conduct simulation studies to investigate the performance of the proposed estimator (5) for censored data and compare it with the kernel estimator by an average squared error. Wavelet estimators for the copula, for noncensored data, have been proposed in [14].

Simulation study
The simulation scheme consisted in three steps.
Step 1: We simulate a random sample of size n = 2000 to display some scatter and rankrank plots according to the following scheme.
(i) The marginal distribution of random variables T 1 and T 2 (F 1 and F 2 ) are simulated from three distributions: (a) F 1 is the B distribution with shape parameter α 1 = 0.5 and scale parameter α 2 = 0.7, F 2 is the exponential distribution with mean 1/3; N(0, 1), and F 2 is as the same as in part (a); and (c) both F 1 and F 2 are uniform distributions. (ii) For the dependence structure, we consider two copula families, Gumbel and Clayton with Kendall's tau τ = 0.5.  N(0, 1). In the bottom row the left one shows the same scatter plot, but the margins are considered as U (0, 1), and the right one shows the pairs of normalized ranks. It is worth mentioning that Fig. 1 displays scatter plots generated from the Gumbel copula, and Fig. 2 displays the same ones from the Clayton copula. As the number of observations is increased, the rank-rank plots are more unreadable. It is clear that for a random sample of size n = 2000 or more, the square in these plots could be completed filled, and all features of distribution had been lost. As a suggestion, we would consider the plots of the empirical copula function of the pairs (R i /n, S i /n). For the same data as in panel (d, bottom right) in Figs. 1 and 2, 3d-histograms of the relative frequencies of the pseudo-observations (R i /n, S i /n) are illustrated in Fig. 3. The plots show the relative frequency of n = 2000 pairs (R i /n, S i /n) in a 32 × 32 regular partitions of the unit square for Gumbel (left column) and Clayton (right column) copulas in two parts, full data (top row) and censored data (bottom row) with Kendall's tau τ = 1/2.
Step 3: We find the average squared error (ASE) calculated for several wavelet copula estimators, including (5) for different sample sizes n. The results for ASE and the total number of replications N = 100 for two different dependence copula parameters and for wavelet and kernel methods are shown in Table 1. The ASE criterion is defined as wherec (l) denotes an estimator of c at the lth replication. In this simulation study, we have used Daubechies's compactly supported wavelet symmlet8 and level j 0 = 5. The codes are in MATLAB environment using the Wavelab software. Our simulation study is done for sampling in three sizes n = 100, n = 300, and n = 500. The main result in the table is consistent with the conclusions of [14]. In both of these examples the sim- ulation results show that the wavelet estimator performs better than kernel estimators in terms of AMSE criterion. We can do the same work for some other copula functions like Gaussian, Frank, and Student copulas or for other wavelets such as Haar or Adelson wavelets.

Real data
Here we present an application of the proposed methodology for the data of [13], where the first observation is censored. The data consist of the indemnity payment (LOSS) and the allocated loss adjustment expense (ALAE) for 1500 general liability claims. The graphical representation of this data is considered in Fig. 5. Panels (a) and (b) show logarithm scale of the original data and the rank-rank plot, respectively. In panel (c) the 3D-histogram of Loss (censored) and Alae data is shown based on 16 × 16 grid. Panel (d) shows the wavelet-based estimator described in Sect. 2.
Due to our simulation results, the nonparametric procedure tries to provide a smooth copula estimate, and the wavelet-based estimator is a good estimator under the fact that a considerable copula family is Gumbel. Many authors who used this data in their researches claimed that the best representation of Loss and Alae data is the Gumbel copula.

Conclusions
Here we consider wavelet-based identification and estimation of a censored copula density function under a rank-based producer. We proposed a linear wavelet estimator and provided its asymptotic formulae for mean integrated square error. The wavelet methods The numerical performance of the proposed linear wavelet density estimators was illustrated on simulated datasets. Comparisons between full data and censored data for some different sample sizes were also given. Although using wavelet-based estimator of a copula function is very useful for underlying dependence structure, it does not cover the main conditions of parametric models. In future work we might also consider using a nonlinear wavelet-based copula density estimator for randomly censored data or using for goodness-of-fit testing.

A.1 Proof of Theorem 1
The proofs follow along with the lines of that in [14] for the density function. Compared to the uncensored case, the main difficulty here is to handle the weights W in . For this purpose, we use some assumptions of [15]. This is the direct approach to the copula function estimation problem under censoring. To complete the proof, we need some preparations.
In addition, according to Assumption 1, we have Similarly, the bound for (W in -W i )ξ k (Y 1i , T 2i ) follows: It remains to find the bound for the last part of T 2 : where the last equality follows from Assumption 1. At the end, the sharper bound for C 2 is (1 -Ĝ) 2 E Z 2 i ξ k (T 1i , T 2i )ξ k (T 1l , T 2l )