Statistical Considerations of Length Bias for Evaluating Diagnostic Tests in Screening Studies

A diagnostic test in a screening study detects a clinical condition of interest in its asymptomatic stage. Evaluating the diagnostic test in a screening study is a challenging task since a diagnostic test is evaluated based on its analytical and clinical performance. In order to evaluate the clinical performance of a diagnostic test in a screening study, it is crucial to investigate clinical outcome studies such as survival studies. Furthermore, there are important biases to be considered in a screening study. In this paper, statistical issues associated with screening studies are discussed, and a statistical adjustment for a screening-related bias, called length bias, is presented. Both Vardi’s bias-adjusted nonparametric maximum likelihood estimator and linear combination estimators are shown to adjust for length bias successfully and to generate bias-adjusted survival curves close to the true survival curve. Finally, some practical issues associated with early detection are also presented.
Journal of Biometrics & Biostatistics


Introduction
In general, a diagnostic test is evaluated based on its analytical and clinical performance. Clinical performance examines how diagnostic tests perform in clinical practice, which in turn determines how they are used in clinical practice. Diagnostic tests today have broad applications in terms of analytes as well as the utility of the test results. Diagnostic tests are used to determine the status of a condition of interest such as disease, infection, or any other specified condition of clinical interest.
The benefit of early detection of disease has been recognized in many different clinical conditions, especially when the treatment following early detection leads to successful outcomes, whereas late detection could lead to much worse outcomes. For this reason, early detection has gained a lot of interest in clinical practice. Screening is the process by which asymptomatic people are tested to determine whether they are likely to have a particular disease [1]. The goal of screening is to detect and treat the disease early to benefit the patients. Therefore, successful screening detects the condition early and also decreases cause-specific morbidity and mortality rates.
The phenomenon called length biased sampling is well known in screening studies for chronic diseases. Length biased sampling introduces a bias, called "length bias," into the study. In a screening study, diagnostic tests are applied to asymptomatic individuals to detect the condition of interest before symptoms appear. The longer the preclinical stage an individual has, the more likely the individual is to be detected in a screening study, leading to a length biased sample. Survival estimates based on length biased samples are overestimated, so the success of screening would be overstated. Therefore, the bias should be adjusted for in order to estimate the true survival curve.
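The selection effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the preclinical durations are drawn from a gamma distribution chosen for illustration, and the helper `length_biased_resample` is a hypothetical name, not something taken from the cited literature.

```python
import numpy as np

def length_biased_resample(durations, size, rng):
    """Draw `size` values with selection probability proportional to duration."""
    durations = np.asarray(durations, dtype=float)
    weights = durations / durations.sum()  # longer durations are more likely to be picked
    return rng.choice(durations, size=size, replace=True, p=weights)

rng = np.random.default_rng(0)
# Hypothetical preclinical durations for a whole population (illustrative gamma choice).
population = rng.gamma(shape=1.5, scale=1.0, size=100_000)
# Individuals "caught" by screening: chance of detection grows with duration.
screened = length_biased_resample(population, size=20_000, rng=rng)

print("population mean duration:", population.mean())
print("screen-detected mean duration:", screened.mean())
```

The screen-detected mean is expected to be close to E[X^2]/E[X] rather than E[X], which is why unadjusted summaries of screen-detected cases tend to look more favorable than the population they came from.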
In this paper, statistical issues associated with length biased sampling and bias-adjustment procedures are presented. Different bias-adjusted survival estimators are compared in the presence of length biasing. Additional issues associated with the use of a diagnostic test in a screening study are also discussed.

Length Biased Sampling
Length biased sampling occurs when the probability of selecting a sample is proportional to its lifelength. Suppose the original observation x has f(x) as the probability density function. Then the length biased distribution of f(x), denoted as g(x), is written as
$$g(x) = \frac{x f(x)}{\mu}, \qquad \mu = \int_0^\infty x f(x)\,dx,$$
where $\mu$ is the mean of the original distribution. Samples from g(x) are therefore called "length biased samples."
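As a worked example of this definition, take g(x) proportional to x f(x) and normalized by the mean of f. The derivation below assumes a gamma density with shape α and scale β; the parameterization is an assumption made here for illustration.

```latex
\begin{align*}
% Assumed parameterization: shape \alpha > 0, scale \beta > 0.
f(x) &= \frac{x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}},
\qquad
\mu = \int_0^\infty x f(x)\,dx = \alpha\beta, \\
g(x) &= \frac{x f(x)}{\mu}
      = \frac{x^{\alpha} e^{-x/\beta}}{\alpha\,\Gamma(\alpha)\,\beta^{\alpha+1}}
      = \frac{x^{(\alpha+1)-1} e^{-x/\beta}}{\Gamma(\alpha+1)\,\beta^{\alpha+1}} .
\end{align*}
```

Under this assumption g is again a gamma density, with shape α+1 and the same scale β, so its mean (α+1)β exceeds the original mean αβ by exactly β.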

Statistical Consideration for Length Bias and Bias Adjustment
A nonparametric maximum likelihood estimator (NPMLE) of survival in a length biased sample was given by Vardi [2]. This bias-adjusted NPMLE is based on a mixture of two independent samples: one from the original distribution and the other from the length biased distribution.
Suppose that the original observation x has f(x) as the probability density function, and the length biased observation y has g(y) as the probability density function. After pooling the samples, $\{x_1, x_2, \ldots, x_m, y_1, y_2, \ldots, y_n\}$, and ordering them, denote the ordered observations as $\{t_1, t_2, \ldots, t_h\}$, and let $\eta_{1i}$ be the multiplicity of the x's and $\eta_{2i}$ be the multiplicity of the y's at $t_i$. Note that $h \le m + n$, with strict inequality when there are tied observations. Then the probability of the data is written as
$$\prod_{i=1}^{h} \left[dF(t_i)\right]^{\eta_{1i}} \left[dG(t_i)\right]^{\eta_{2i}}.$$
Writing $p_i = dF(t_i)$, so that $dG(t_i) = t_i p_i / \mu$ with $\mu = \sum_{j=1}^{h} t_j p_j$, the likelihood function becomes
$$L(p) = \prod_{i=1}^{h} p_i^{\eta_{1i}} \left( \frac{t_i p_i}{\sum_{j=1}^{h} t_j p_j} \right)^{\eta_{2i}}. \tag{1}$$
Using a Lagrange multiplier and a routine maximization procedure [2], the unique solution of (1) is
$$\hat{p}_i = \frac{\eta_{1i} + \eta_{2i}}{m + n t_i / \hat{\mu}}, \tag{2}$$
where $\hat{\mu}$ is the solution of
$$\sum_{i=1}^{h} \frac{(\eta_{1i} + \eta_{2i})\, t_i}{m \mu + n t_i} = 1. \tag{3}$$
The unique Vardi's estimate is obtained by solving equation (3) for $\hat{\mu}$ first and then plugging the value of $\hat{\mu}$ into equation (2) to obtain $\hat{p}_i$; the bias-adjusted estimate of F places mass $\hat{p}_i$ at $t_i$.
We also consider the linear combination of two estimators (LCE) for mixture samples. Thus, compute the EDF (empirical distribution function) estimator based only on samples from F(x) and compute the Cox estimator based only on samples from G(y). If we only obtain m samples from F, then the resulting survival function is the standard EDF estimator, $\hat{S}_{EDF_F}$. The LCE for the mixture distribution is the weighted sum of these two estimators with weights $k_1$ and $k_2$, where $k_1$ and $k_2$ indicate the proportions of samples from the different distributions.
As a naïve estimator of F, ignoring length bias, we consider the empirical distribution function (EDF) estimate from the observed mixture samples,
$$\hat{F}_{EDF}(t) = \frac{1}{m+n} \sum_{i=1}^{h} (\eta_{1i} + \eta_{2i})\, I(t_i \le t),$$
where $h \le m + n$ (for tied observations).

Simulation Study
Suppose the observations are from a gamma distribution with parameters α and β. Then its length biased distribution is also a gamma distribution, with parameters α+1 and β.
In this simulation study, the parameters are set at α = 1.5 and β = 1.0. Therefore, a gamma distribution with shape parameter 1.5 and scale parameter 1.0 is considered as f(x), and its length biased distribution, g(y), becomes a gamma distribution with shape parameter 2.5 and scale parameter 1.0.
The difference between the unadjusted estimate and the true value becomes larger as length biasing is increased. With 60% length biasing, there are noticeable differences between $\hat{S}_{EDF_F}$ and the other estimates (Figure 3).
In summary, the simulation study shows that length bias in observed samples is successfully adjusted for using both Vardi's estimator and the linear combination estimator, which includes the Cox estimator. Both estimators generate a survival curve close to the true survival curve under different proportions of length biasing.

Discussion
Screening tests have a tendency to detect more slowly growing (less aggressive) cancers because they are in the asymptomatic population longer than the more rapidly growing ones, which quickly become symptomatic and no longer need screening to be detected. Cases detected in a screening study are likely to have a better prognosis, resulting in overestimated survival. These cases are called length biased samples.
Understanding the natural history of the disease is crucial for evaluating the performance. Biological heterogeneity of the disease is found to be associated with length bias [3], and it is discussed in the study of neuroblastoma [4]. Length bias results from biological heterogeneity of disease. For example, some patients have rapidly growing aggressive cancers and the others have slowly growing and less aggressive cancers. In order to account for different subtypes (heterogeneity) of disease in the asymptomatic population, the multiple-type heterogeneous model was presented and discussed in [3].
Extensive statistical work has been done in the area of length biased samples and length biased distributions [5,6,7,8,9], including methods for bias adjustment [2,10,11]. However, to date, these statistical approaches have not been widely implemented in study design and in evaluating screening studies in practice.
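To give a sense of how little machinery such an adjustment needs in practice, the following is a minimal sketch of a bias-adjusted NPMLE in the spirit of [2], together with a linear combination estimator. Several things here are assumptions rather than the authors' implementation: the closed form p_i = (η_1i + η_2i) / (m + n t_i / µ) with µ solving Σ p_i t_i = µ (derived from the mixture likelihood), the inverse-length weighting used for the Cox estimator, the LCE weights k_1 = m/(m+n) and k_2 = n/(m+n), and all function names. The gamma settings with 60% length biased observations are illustrative.

```python
import numpy as np

def vardi_npmle(x, y, tol=1e-12, max_iter=200):
    """Sketch of a bias-adjusted NPMLE of F from x ~ F and y ~ G (length biased).

    Returns distinct pooled values t, masses p (summing to 1), and mu_hat.
    Assumes positive observations, with at least one x and one y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = x.size, y.size
    # Distinct pooled values and total multiplicities eta_1i + eta_2i.
    t, counts = np.unique(np.concatenate([x, y]), return_counts=True)

    def score(mu):
        # Decreasing in mu; zero at mu_hat.
        return np.sum(counts * t / (m * mu + n * t)) - 1.0

    lo, hi = t[0], t[-1]  # mu_hat is a weighted mean of t, so it lies in this range
    for _ in range(max_iter):  # plain bisection
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * max(1.0, hi):
            break
    mu_hat = 0.5 * (lo + hi)
    p = counts / (m + n * t / mu_hat)
    return t, p / p.sum(), mu_hat  # renormalize to absorb rounding error

def survival_from_masses(t, p, grid):
    """Step-function survival estimate: sum of masses at points beyond each grid value."""
    return np.array([p[t > u].sum() for u in np.atleast_1d(grid)])

def lce_survival(x, y, grid):
    """Weighted sum of the EDF survival (x only) and the Cox survival (y only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = np.atleast_1d(grid)
    k1 = x.size / (x.size + y.size)  # assumed weights: sample proportions
    k2 = 1.0 - k1
    edf = np.array([(x > u).mean() for u in grid])
    w = 1.0 / y  # inverse-length weights undo the length bias
    cox = np.array([w[y > u].sum() for u in grid]) / w.sum()
    return k1 * edf + k2 * cox

rng = np.random.default_rng(1)
x = rng.gamma(shape=1.5, scale=1.0, size=800)    # sample from f
y = rng.gamma(shape=2.5, scale=1.0, size=1200)   # sample from g (60% of the pooled data)
grid = np.array([0.5, 1.0, 2.0, 4.0])

t, p, mu_hat = vardi_npmle(x, y)
s_vardi = survival_from_masses(t, p, grid)
s_lce = lce_survival(x, y, grid)
pooled = np.concatenate([x, y])
s_naive = np.array([(pooled > u).mean() for u in grid])  # ignores length bias

print("mu_hat:", mu_hat, "pooled mean:", pooled.mean())
print("Vardi:", s_vardi)
print("LCE:  ", s_lce)
print("naive:", s_naive)
```

In this setup the naive survival estimate is expected to sit above the two adjusted estimates, which should track each other closely; bisection is used only because the score is monotone in µ, and any one-dimensional root finder would do.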
Early detection capability is an important feature of screening. For example, [12] discussed the potential utility of CA-125 for early detection of ovarian cancer. For colorectal cancers, there are fecal occult blood tests [13] and fecal DNA tests [14]. For breast cancer, mammography has been widely used. However, early detection could lead to overdiagnosis of disease. Overdiagnosis is the detection of 'cases' that would not cause symptoms or increase morbidity/mortality. Jorgensen and Gotzsche [15] reported that there was a 30% overdiagnosis for lung cancer after long-term follow-up of patients screened by radiography. Witte [16] reported that in prostate cancer testing using PSA (prostate specific antigen), between 20% and 60% of early stage prostate cancers detected using PSA might be considered "overdiagnosis." Another important type of screening-related bias is lead-time bias. With lead-time bias, early detection seems to prolong survival even when there is no actual improvement in survival. This is because the cases are found at an earlier point in the natural history of the disease.
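The lead-time argument can be made concrete with a short simulation. This is a sketch with hypothetical numbers: the exponential durations, their scales, and the assumption that a screen detects the case at a uniformly random point in the preclinical phase are all illustrative choices, and the date of death is deliberately left unchanged by screening.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical natural history, in years.
preclinical = rng.exponential(scale=2.0, size=n)        # onset to symptoms
symptoms_to_death = rng.exponential(scale=5.0, size=n)  # not altered by screening here

# Lead time: how much earlier than symptoms the screen finds the case.
lead_time = rng.uniform(0.0, 1.0, size=n) * preclinical

survival_clinical = symptoms_to_death              # clock starts at symptomatic diagnosis
survival_screened = lead_time + symptoms_to_death  # clock starts at screen detection

print("mean survival, clinical diagnosis:", survival_clinical.mean())
print("mean survival, screen detection:  ", survival_screened.mean())
print("mean lead time:", lead_time.mean())
```

Every simulated patient dies on the same date under both scenarios, yet survival measured from screen detection is longer by exactly the lead time.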
In this paper, length bias, one of the major statistical challenges in a screening study, has been discussed, and bias adjustment procedures using statistical methods have been presented. The goal of screening is to improve patient outcomes by detecting conditions of interest early and treating/managing them appropriately. In a screening study, the basic requirement of the screening test (diagnostic test) is accuracy: it should detect cases without error. However, for the screening procedure to be successful, the test outcomes should lead to successful patient outcomes. It is crucial to identify and adjust for length bias in a screening study in order to successfully evaluate the clinical performance of the diagnostic tests.