A MULTIPLE IMPUTATION APPROACH TO EVALUATE THE ACCURACY OF DIAGNOSTIC TESTS IN PRESENCE OF MISSING VALUES
AHMED M. GAD, ASMAA A. M. ALI, RAMADAN H. MOHAMED

Diagnostic tests are used to determine the presence or absence of a disease. Diagnostic accuracy is the main tool to evaluate a test. Four accuracy measures are used to evaluate how well the results of the test under evaluation (index test) agree with the outcome of the reference test (gold standard). These measures are sensitivity, specificity, positive predictive value and negative predictive value. Some subjects are only measured by a subset of tests, which results in missing values and can lead to biased results. The mechanism of missing data could be missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). Various methods, such as the complete-case analysis (CCA) and the maximum likelihood (ML) method, are used to handle missing data. Imputation methods could also be used. The article aims to use a multiple imputation approach to evaluate binary diagnostic tests with missing data under the MCAR mechanism. The proposed approach is applied to a real data set. Also, a simulation study is conducted to evaluate the performance of the proposed approach.


INTRODUCTION
Clinicians use diagnostic tests to determine the presence or absence of a disease. Accurate diagnosis of a disease is often the first step toward its treatment and prevention. The aim of diagnostic accuracy studies is usually to find out the ability of a test to differentiate between patients with and without a disease. The presence or absence of a disease is determined by a gold standard test [3]. The gold standard is the best available test with known results. The gold standard test is often invasive or expensive. The results of a new non-invasive test (the index test) are compared with the results of the gold standard test. The basic structure of all diagnostic test studies is to select a series of patients to receive the index test(s), followed by the gold standard test.
Finally, the results of the index test and the gold standard are used to estimate the accuracy parameters. The accuracy measures express how well the results of the test under evaluation (index test) agree with the outcome of the gold standard test. These measures are the sensitivity, specificity, negative predictive value and positive predictive value. Sensitivity of a test is the probability of testing positive given the presence of disease. Specificity of a test is the probability of testing negative given the absence of the disease. Positive predictive value of a test is the probability of disease given testing positive. Negative predictive value of a test is the probability of no disease given testing negative [9].
Missing values are very common in medical studies and in diagnostic tests. Missing data can be caused by several mechanisms. The missing data mechanism is called missing completely at random (MCAR) if the probability that an observation is missing is not related to any patient characteristics. A mechanism is said to be missing at random (MAR) when the probability of missingness depends only on observed patient characteristics. If the probability that an observation is missing depends on information that is not observed, the missing data are called missing not at random (MNAR) [10].
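The difference between MCAR and MAR can be illustrated with a small sketch. The function name `make_missing`, the binary covariate and the drop probabilities are hypothetical choices made for illustration only; they are not taken from the article.

```python
import random

def make_missing(values, covariate, mechanism, rate=0.3, seed=0):
    """Return a copy of `values` with some entries replaced by None.

    mechanism="MCAR": every entry is dropped with the same probability `rate`.
    mechanism="MAR":  the drop probability depends only on the fully observed
                      binary `covariate` (assumed rates: 1.5*rate if the
                      covariate is 1, 0.5*rate otherwise).
    """
    rng = random.Random(seed)
    out = []
    for v, x in zip(values, covariate):
        if mechanism == "MCAR":
            p_drop = rate
        elif mechanism == "MAR":
            p_drop = min(1.0, 1.5 * rate) if x == 1 else 0.5 * rate
        else:
            raise ValueError("mechanism must be 'MCAR' or 'MAR'")
        out.append(None if rng.random() < p_drop else v)
    return out
```

An MNAR mechanism would instead let the drop probability depend on the value `v` itself, which is why it cannot be checked from the observed data alone.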
Various approaches are used to handle missing data. These methods range from the complete-case analysis (CCA) to imputation techniques. Multiple imputation (MI) is a very common imputation method. The MI method consists of three steps. In the first step, M (M > 1) complete (imputed) data sets are obtained by filling in each missing value M times using a suitable imputation model. In the second step, each of the M data sets is analyzed using standard complete-data techniques. In the third step, the results from the M imputed data sets are combined in an appropriate way to obtain the estimates [10]. ... the estimation is thus restricted to individuals with observed Y2. The process is repeated for all other variables with missing values in a cycle. In order to stabilize the results, the procedure is generally repeated for several cycles (e.g. 10 or 20) to produce a single imputed data set, and the whole procedure is repeated M times to give M imputed data sets [2].
The aim of this article is to use a multiple imputation approach to evaluate binary diagnostic tests with missing data under the MCAR mechanism. Section 2 describes the evaluation of the accuracy measures. In Section 3, the missing values issue is presented. The proposed MI technique is described in Section 4. Section 5 is devoted to applying the proposed techniques to a data set described in [9]. In Section 6, the performance of MICE in evaluating diagnostic tests is assessed using a simulation study. Finally, discussion and conclusion are presented in Section 7.

EVALUATION OF THE ACCURACY MEASURES
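There are four methods to evaluate the four accuracy measures. They include the simple proportion method, in which sensitivity $= TP/(TP+FN)$, specificity $= TN/(TN+FP)$, positive predictive value $= TP/(TP+FP)$ and negative predictive value $= TN/(TN+FN)$, where TP is the number of true positives, FP is the number of false positives, FN is the number of false negatives and TN is the number of true negatives [6].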
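As a minimal sketch of the simple proportion method, the four measures can be computed directly from the counts of the 2x2 table of index test against gold standard, using the usual definitions (for example, sensitivity = TP/(TP+FN)). The function name and the counts below are hypothetical and are not taken from the article's data.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Simple-proportion estimates from a 2x2 table of index test vs. gold standard."""
    return {
        "sensitivity": tp / (tp + fn),  # P(test + | disease)
        "specificity": tn / (tn + fp),  # P(test - | no disease)
        "ppv": tp / (tp + fp),          # P(disease | test +)
        "npv": tn / (tn + fn),          # P(no disease | test -)
    }

# Hypothetical counts, chosen only to illustrate the arithmetic.
measures = accuracy_measures(tp=40, fp=10, fn=5, tn=45)
```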
The logistic regression (LR) model may also be used to estimate the accuracy parameters. The dependent variable (Y) is defined as the dichotomous result of the test. The presence or absence of disease, as defined by the "gold standard", is included as a binary explanatory variable (D) [5], together with a vector of regression coefficients for all covariates (X) in the model.
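For the special case with no covariates, one possible way to write such a model and to recover sensitivity and specificity from it is sketched below. The symbols $\alpha$ and $\beta$ are illustrative assumptions and may not match the article's own notation.

```latex
\begin{align*}
\operatorname{logit} P(Y = 1 \mid D) &= \alpha + \beta D, \\
\text{sensitivity} = P(Y = 1 \mid D = 1) &= \frac{e^{\alpha + \beta}}{1 + e^{\alpha + \beta}}, \\
\text{specificity} = P(Y = 0 \mid D = 0) &= \frac{1}{1 + e^{\alpha}}.
\end{align*}
```

In this covariate-free case the fitted probabilities coincide with the simple proportions, so the two methods would be expected to give the same point estimates on complete data.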

DIAGNOSTIC TESTS WITH MISSING DATA
Barnhart and Kosinski [1] studied the use of subunit-level sensitivities and specificities for assessing the performance of a diagnostic test performed at the subunit level. They obtained an adjusted formula for estimates of the subunit sensitivities and specificities under the assumption that the subunit disease status is missing at random. They introduced a weighted least squares (WLS) approach for inference concerning correlated subunit-level sensitivities and specificities, especially for testing the equality of subunit-level sensitivities and the equality of subunit-level specificities.
Poleto et al. [9] presented data extracted from an observational study to diagnose endometriosis (D) by a laparoscopy procedure (gold standard) versus three diagnostic tests: ultrasonography (US), magnetic resonance (MR) and echo-colonoscopy (EC). They considered models that ignore the missing data mechanism, such as the complete-case analysis (CCA) method.
They also considered models that include the missing data mechanism, such as the maximum likelihood (ML) methods. The ML method showed better performance than the CCA under missing completely at random (MCAR) and with high rates of missingness.
Zhang et al. [16] developed an EM algorithm-based approach to evaluate the diagnostic accuracy of multiple imperfect tests in the absence of a gold standard under either an MAR assumption or an MNAR mechanism. The tests are assumed to be independent conditional on the true disease status. They applied the proposed methods to a real data set from the National Cancer Institute (NCI) colon cancer family registry on diagnosing microsatellite instability for hereditary non-polyposis colorectal cancer.

THE PROPOSED APPROACH
Multiple imputation is achieved by three steps. ... [14]. The interaction terms will be incomplete if the variables that make up the interaction are incomplete. A reasonable imputation method should take the interaction into account during the imputation process [13]. The interactions are included in the third analysis method, the GEE method [11].
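Then the variance estimate associated with $\bar{Q}$ is the total variance $T = \bar{U} + \left(1 + \frac{1}{M}\right) B$, where $\bar{U}$ is the average within-imputation variance and $B$ is the between-imputation variance of the M estimates.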
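The combining step can be sketched with Rubin's rules as follows. This is a minimal illustration, assuming one point estimate and one variance per imputed data set; the function name `pool_rubin` and the numbers in the usage line are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine M point estimates and their variances with Rubin's rules.

    Returns (q_bar, total_variance), where
    total_variance = u_bar + (1 + 1/M) * b.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need M > 1 estimates with matching variances")
    q_bar = mean(estimates)   # pooled point estimate
    u_bar = mean(variances)   # average within-imputation variance
    b = variance(estimates)   # between-imputation variance (divisor M - 1)
    return q_bar, u_bar + (1 + 1 / m) * b

# Hypothetical sensitivity estimates from M = 3 imputed data sets.
q_bar, total_var = pool_rubin([0.80, 0.82, 0.78], [0.0010, 0.0012, 0.0011])
```

The factor (1 + 1/M) inflates the between-imputation variance to account for using a finite number of imputations.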

APPLICATION TO THREE DIAGNOSTIC TESTS (US, MR AND EC) WITH MISSING VALUES
We apply the proposed technique to the data presented in Poleto et al. [9]. The data concern 219 patients who underwent a laparoscopy procedure (gold standard) to diagnose endometriosis (D) and were also evaluated with one or more non-invasive methods: ultrasonography (US), magnetic resonance (MR) and echo-colonoscopy (EC). The true status of the patients is determined by laparoscopy (gold standard). The frequencies of patients with positive (+) and negative (−) results are presented in Table 1. Analyses that depend on all the observed data are not generally implemented in statistical packages, so many users pragmatically decide to consider only some subset of the data that can be analyzed using the available software.
Therefore, MICE is considered a suitable choice for comparing the three tests (US, MR and EC), because multiple imputation generally assumes that the data are at least MAR, and the approach can also be used on data that are MCAR [2]. Since the partial response rate is above 10%, using a MICE framework to handle the missing data is appropriate [14]. After generating the imputations, we have 35 complete data sets for each of the simple proportion method and the LR model (because the percentage of missing values is 35%) and 5 for the GEE method. The results are tabulated, and those obtained using the ML approach under the MCAR mechanism in Poleto et al. [9] are presented in Table 5.

SIMULATION STUDY
The aim of this simulation study is to evaluate the performance of MICE for estimating diagnostic measures using the three proposed estimation methods.

Simulation Setting
A random reference test variable (D) is generated from the Bernoulli distribution with mean equal to the probability p. Then, three binary diagnostic tests t1, t2 and t3 are generated for each subject from the Bernoulli distribution. For the simple proportion method, the mean (p1) = sensitivity (true ...
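A data-generating step of this kind might be sketched as below. This is only an illustration under stated assumptions: the test results are taken to be conditionally independent given D, a positive result has probability equal to the sensitivity when D = 1 and to 1 minus the specificity when D = 0, and values are deleted completely at random. The function name and all parameter values are hypothetical, not the article's settings.

```python
import random

def simulate_tests(n, prevalence, sens, spec, missing_rate, seed=0):
    """Simulate a gold standard D and one binary index test per (sens, spec) pair.

    Test results are conditionally independent given D (an assumption of this
    sketch) and are set to None completely at random (MCAR).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        d = int(rng.random() < prevalence)       # D ~ Bernoulli(prevalence)
        tests = []
        for se, sp in zip(sens, spec):
            p_pos = se if d == 1 else 1 - sp      # P(test positive | D)
            t = int(rng.random() < p_pos)
            tests.append(None if rng.random() < missing_rate else t)
        rows.append((d, *tests))
    return rows

# Hypothetical settings: three tests, 35% of each test's results missing.
data = simulate_tests(n=5000, prevalence=0.3, sens=[0.80, 0.85, 0.90],
                      spec=[0.90, 0.85, 0.80], missing_rate=0.35)
```

Each row holds the gold standard result followed by the three test results, with `None` marking a missing value, so the rows can be passed to an imputation routine before the accuracy measures are estimated.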

Simulation Results
The relative bias (RB%) and the mean square errors (MSE) are presented in Table 6. For the EC test, the GEE method showed the best performance in the estimation of the four parameters according to the percent RB and the MSE. The LR model has lower RB% than the simple proportion method in the estimation of the four parameters, but the two methods have the same MSE.
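The two performance criteria can be computed across simulation replicates as sketched below. The definitions used here are the usual ones (percent relative bias as 100 times the difference between the mean estimate and the true value, divided by the true value; MSE as the mean squared deviation from the true value) and are assumed rather than quoted from the article. The function names are hypothetical.

```python
from statistics import mean

def relative_bias_pct(estimates, true_value):
    """Percent relative bias: 100 * (mean estimate - truth) / truth."""
    return 100 * (mean(estimates) - true_value) / true_value

def mse(estimates, true_value):
    """Mean squared error of the estimates around the true value."""
    return mean((e - true_value) ** 2 for e in estimates)

# Hypothetical sensitivity estimates from four replicates, true value 0.80.
replicates = [0.78, 0.82, 0.80, 0.84]
rb = relative_bias_pct(replicates, 0.80)
err = mse(replicates, 0.80)
```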

DISCUSSION AND CONCLUSION
Accurate diagnosis of a disease or classification of a sub-type of a disease is often the first step toward its treatment and prevention. Missing data are common in diagnostic medical settings where some subjects are only measured by a subset of tests [16]. A considerable number of methods have been developed to assess the diagnostic accuracy of the index tests under evaluation in the presence of missing values.
The missing data mechanisms can be classified according to the process causing missingness [10]. The missing data mechanisms are MCAR, MAR and MNAR. Both MCAR and MAR are considered 'ignorable' missing data mechanisms, as MCAR is a special case of MAR. MNAR is denoted as non-ignorable missingness. It is also important to determine the missing data pattern in order to select the proper imputation method. The missing data patterns are univariate, monotone and arbitrary. In this article we considered the multiple imputation by chained equations (MICE) approach to evaluate binary diagnostic tests with missing data under the MCAR assumption. The MICE approach is achieved through three steps. In the first step, M imputed data sets are created. In the second step, the M imputed data sets are analyzed using the simple proportion method, the LR model and the GEE method. Finally, the estimates from the second step are combined. The applications are conducted using the data analyzed by Poleto et al. [9].
Poleto et al. [9] introduced an ML approach to evaluate three binary diagnostic tests in the presence of missing values. The results obtained by the MICE approach with the simple proportion method are consistent with those obtained through the LR model, and they are also very close to those obtained by Poleto et al. [9] using the ML approach. The GEE approach shows an improvement in the estimates of the parameters, and it also has a drop.
The simulation results of the current study showed that the MICE approach with the simple proportion method and the MICE approach with the LR model have the same performance according to the MSE. The GEE method showed the best performance in the estimation of the four parameters according to the percent RB and the MSE for one of the three tests. The percent relative bias of the estimates of the four parameters of the three tests does not exceed 3% for the three estimation methods.

CONFLICT OF INTERESTS
The author(s) declare that there is no conflict of interests.