Estimation of Survival Time of HIV/AIDS Patients While Reducing the Number of Predictors Using a Partial Correlation/Association Technique

A number of techniques have been developed for variable reduction in survival analysis. However, none of these techniques takes into consideration the correlation among the predictors. This paper focuses on reducing the number of predictors using the concept of partial correlation/association and then estimating the survival time of HIV/AIDS patients. Partial correlation is preferred over pairwise and multiple correlation because pairwise correlation accounts only for the linear relationship between two predictors at a time, and multiple correlation may give misleading results if one variable is numerically related to another variable and both are taken simultaneously to find the association/correlation. This can be avoided by controlling for the confounding variable using the partial correlation coefficient. Partial correlation is used to assess the correlation between a continuous predictor and a categorical predictor when the controlling variable is also categorical. ANCOVA is used to determine the association between two categorical predictors when the controlling variable is also categorical. Accelerated failure time (AFT) models are used to estimate survival times with and without reduction in the number of predictors. It is observed that the estimated survival times are not affected by the reduction of predictors from 11 to 4. Also, the estimates obtained by the proposed technique are found to be more efficient than those obtained by existing methods.

Presently, human immunodeficiency virus (HIV) is regarded as an international epidemic 1, 2, and it is most striking in Asia 3. Worldwide, about 36 million persons are estimated to be infected with HIV. It has also resulted in the death of 1.2 million people globally 4. Of all HIV-infected people, 90% live in developing countries 1. India has the third largest HIV epidemic in the world. In India, 2.5 million individuals were estimated to be living with HIV infection up to 2014 4. These figures indicate the enormous scale of HIV and justify the need for in-depth research into the prognostic factors to which HIV patients are exposed.
Antiretroviral therapy (ART) plays a crucial role in treating patients infected with HIV/AIDS. It can reduce mortality and morbidity among HIV patients and thereby improve their quality of life. ART has been freely available to patients in India since 2004. At ART clinics, HIV patients have access to nutritional advice and to HIV testing and counselling (HTC) by experts. They are required to undergo a CD4 count test every six months 5. Moreover, the government is running several campaigns to remind people of their appointments and increase overall attendance 6.
With the use of ART and vast improvements in preventive facilities, the HIV epidemic in India has slowed considerably, with a 19% reduction in new HIV infections (130,000 in 2013) and a 38% reduction in AIDS-related deaths between 2005 and 2013 4. By the end of 2013, more than 700,000 people were receiving ART, which is the second largest number of people on treatment in any single country.
One of the most important predictors for assessing the severity of HIV infection in AIDS patients is the CD4 cell count. A healthy person is estimated to have 1200 CD4 cells per mm3 of blood. CD4 cells are depleted as HIV attacks the immune system.
The Centers for Disease Control and Prevention (CDC) recommended initiating antiretroviral therapy (ART) when the CD4 count falls to 200 cells per mm3. Various observational studies and two randomized controlled trials (RCTs) showed that initiation of ART at CD4 < 350 cells/mm3 results in a significant reduction of disease progression, mortality and the incidence of opportunistic diseases, especially TB and non-AIDS-defining conditions 6.
A 2016 study analyzed the correlation between AIDS restriction and metabolic pathway gene expression. It showed that HIV-1 postentry cellular viral from AIDS restriction genes can be coexpressed in human transcriptome microarray datasets 16.
In this paper, we propose a partial correlation/association screening procedure for reducing the number of predictors included in the model without affecting the estimated survival time, while also yielding lower standard errors. We used SPSS and R software to fit the models and to compute the associations/correlations among predictors.

Methods Used
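Let $n$ denote the number of HIV/AIDS patients under ART and $T_i$ the survival time of the $i$th patient, with survival function $S(t)$. Assuming a linear relationship between $\log_e(T_i)$ and the predictors, we have

$$\log_e(T_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \sigma \varepsilon_i \qquad (1)$$

where $\beta_0$ is the intercept, $\beta_1, \dots, \beta_p$ are the coefficients of the $p$ explanatory variables $x_{i1}, \dots, x_{ip}$ for the $i$th patient, $\sigma$ is the scale parameter, and $\varepsilon_i$ is a random variable used to model the deviation of the values of $\log_e(T_i)$ from the linear part of the model.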
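As an illustration of equation (1), the following minimal Python sketch evaluates the linear part of the model for one patient and the implied survival function. It assumes that the "logistic" AFT model referred to in this paper means a standard logistic error term, so that the survival time follows a log-logistic distribution; the function names and all numeric values are hypothetical and are not taken from the fitted models.

```python
import math

def linear_predictor(beta0, betas, x):
    """Linear part of equation (1): beta0 + sum_j beta_j * x_j for one patient."""
    if len(betas) != len(x):
        raise ValueError("betas and x must have the same length")
    return beta0 + sum(b * v for b, v in zip(betas, x))

def survival_logistic_aft(t, mu, sigma):
    """S(t) = P(T > t) when the error term in equation (1) is standard logistic.

    mu is the linear predictor and sigma the scale parameter (assumption:
    'logistic AFT' is read as a log-logistic survival time).
    """
    z = (math.log(t) - mu) / sigma
    return 1.0 / (1.0 + math.exp(z))

def median_survival_time(mu):
    """The standard logistic error has median 0, so the median of T is exp(mu)."""
    return math.exp(mu)

# Hypothetical coefficients and covariates, for illustration only.
mu = linear_predictor(beta0=3.0, betas=[-0.02, 0.5], x=[40.0, 1.0])
sigma = 0.8
t_median = median_survival_time(mu)
print(round(survival_logistic_aft(t_median, mu, sigma), 3))  # 0.5
```

Under this assumption the estimated median survival time depends only on the linear predictor, while the scale parameter controls how quickly the survival probability falls around that median.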
It may happen that two or more predictors are associated with a single predictor; in that case, we can replace these predictors with the single predictor with which they are associated. However, such a correlation can give misleading results if one variable is numerically related to another variable and both are taken simultaneously to find the association/correlation. This bias can be avoided by controlling for the confounding variable, which is done by partial correlation. Partial correlation measures the strength of the relationship between two variables while controlling for the effect of other variables. As a result, we can reduce the number of predictors included in the original model by excluding any spurious predictor, and then estimate the survival time of HIV/AIDS patients using the AFT model (AFTM).
To test the independence between two categorical predictors when the controlling predictor is also categorical, ANCOVA is used; on the basis of the p-value, the null hypothesis that these variables are partially independent is either rejected or not rejected.
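The idea of controlling for a third variable can be sketched with the first-order partial correlation coefficient, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The Python sketch below covers only the simplest case of three numeric variables; it is not the SPSS/R procedure used in this paper, does not handle categorical predictors, and uses a small constructed data set in which x and y are related only through z.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Constructed example (hypothetical data): x = 2z + e1 and y = 2z + e2,
# where e1 and e2 are uncorrelated with z and with each other.
z = [-1, -1, 1, 1]
x = [-3, -1, 1, 3]
y = [-3, -1, 3, 1]

print(round(pearson(x, y), 3))             # pairwise correlation, 0.8 by construction
print(abs(round(partial_corr(x, y, z), 3)))  # vanishes once z is controlled for
```

In this constructed example the pairwise correlation of 0.8 is entirely due to z, so one of the two predictors carries no additional information once z is in the model, which is the situation the screening procedure is meant to detect.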

Model Comparison
Akaike's Information Criterion (AIC) is used to compare the different models. It is given by

$$\mathrm{AIC} = 2k - 2\log_e(\hat{L}) \qquad (2)$$

where $k$ is the number of estimated parameters in the model and $\hat{L}$ is the maximum value of the likelihood function for the model. The model with the smaller AIC value is considered better than the other models under consideration.
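A minimal Python sketch of the comparison in equation (2) follows. The model names, log-likelihoods and parameter counts are hypothetical placeholders chosen only to show the arithmetic; they are not results from this study.

```python
def aic(log_likelihood, k):
    """Equation (2): AIC = 2k - 2*log(L_hat), where log_likelihood = log(L_hat)."""
    return 2 * k - 2 * log_likelihood

# Hypothetical (maximized log-likelihood, number of parameters) pairs,
# NOT values from the paper.
candidates = {
    "model_a": (-520.0, 12),
    "model_b": (-505.0, 13),
    "model_c": (-498.0, 13),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # smallest AIC is preferred
print(best, scores[best])  # model_c 1022.0
```

Because the penalty term 2k grows with the number of parameters, a model with fewer predictors can achieve a lower AIC even if its maximized likelihood is slightly smaller.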

Data Sources and Results
Data on 767 HIV/AIDS patients undergoing ART at Dr. Ram Manohar Lohia Hospital, New Delhi, India, during the period January 2004 to December 2014 were collected in this retrospective follow-up study, and an accelerated failure time model including all possible predictors was fitted. Different AFT models were fitted to determine the best-fitting model, and the logistic model was found to be the best. Results are shown in Tables 1 and 2. Two statistical criteria (the likelihood ratio test and AIC) are used to compare these models. The gamma model nests the exponential, Weibull and log-normal models (Table 1).
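The likelihood ratio (LR) comparison of nested models mentioned above can be sketched as follows. The statistic 2(log L_full - log L_reduced) is referred to a chi-square distribution whose degrees of freedom equal the number of extra parameters in the larger model. This Python sketch uses only the standard library, with the closed-form chi-square tail probability for integer degrees of freedom; the log-likelihood values in the example are hypothetical and are not taken from Table 1.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite correction series.
    term, total = 1.0, 0.0
    for i in range((df - 1) // 2):
        if i > 0:
            term *= x / (2 * i + 1)
        total += term
    correction = math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + correction

def lr_test(loglik_reduced, loglik_full, df):
    """Return (LR statistic, p-value) for a reduced model nested in a full model."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods for a nested pair differing by one parameter.
stat, p_value = lr_test(loglik_reduced=-505.0, loglik_full=-498.0, df=1)
print(stat, p_value < 0.05)  # 14.0 True
```

A small p-value indicates that the reduced model fits significantly worse than the model that nests it. The LR test applies only to nested pairs, which is why AIC is used alongside it for non-nested comparisons.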
It is observed that the AIC for the logistic model in the presence of all predictors is the lowest among all the fitted models. The LR test also showed that the logistic model fits best compared with the other models (Table 1). In addition, the Cox-Snell residuals indicate a good fit of the logistic accelerated failure time model (Figure 1). We therefore fit the survival data using the logistic AFT model; the results are shown in Table 4. Since the p-value for each group is less than 0.05, the alternative hypothesis can be accepted and it can be concluded that these predictors are jointly dependent.
Next, the partial correlation/association between each possible pair is determined using suitable methods. The results are shown in Table 5. Using this procedure, 4 predictors are selected. In the next step, survival times are estimated using the AFTM with only these selected predictors. For this model too, the logistic model is found to be the best AFT model by the LR test and AIC values. The Cox-Snell residuals were again plotted and fit well, as shown in Figure 2. Results are presented in Table 6.
To compare the proposed method with existing ones, variables are also selected using the LASSO and elastic-net variable selection methods. The LASSO method selected six variables, viz. Age, Occupation, Opportunistic Infections, State, Sex and Drugs, whereas the elastic-net method selected Age, Alcohol Status, Smoking Status, Spouse and Occupation. The standard errors of the coefficients of the predictors in the models fitted by the different methods are also compared. It is observed that the standard errors of the coefficients chosen by the proposed method are smaller than those in the true model and those chosen by the LASSO and elastic-net methods, as shown in Table 7. From Table 8, it can be observed that the AIC value is lowest for the proposed method when compared with the AIC values for the true, LASSO and elastic-net models and the methods discussed in 20 and 21.
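The Cox-Snell residual check referred to in this section can be sketched as follows. The Cox-Snell residual of an observation is r = -log S(t), and if the model fits well the residuals behave like a (possibly censored) sample from a unit exponential distribution, so their estimated cumulative hazard should lie close to the 45-degree line. The Python sketch below again assumes a standard logistic error term for the "logistic" AFT model and uses a simple Nelson-Aalen estimator; the function names and example values are hypothetical.

```python
import math

def cox_snell_residual(t, mu, sigma):
    """r = -log S(t), with S(t) = 1 / (1 + exp(z)) and z = (log t - mu) / sigma."""
    z = (math.log(t) - mu) / sigma
    return math.log1p(math.exp(z))

def nelson_aalen(residuals, events):
    """Cumulative hazard of the residuals at each event (tied values taken one at a time).

    events[i] is 1 (or True) if the survival time was observed, 0 if censored.
    Returns a list of (residual, cumulative hazard) points to plot.
    """
    order = sorted(range(len(residuals)), key=lambda i: residuals[i])
    at_risk = len(residuals)
    cumulative = 0.0
    points = []
    for i in order:
        if events[i]:
            cumulative += 1.0 / at_risk
            points.append((residuals[i], cumulative))
        at_risk -= 1
    return points

# Hypothetical residuals and event indicators, for illustration only.
example_points = nelson_aalen([0.2, 0.5, 0.9, 1.4], [1, 0, 1, 1])
print(example_points)  # [(0.2, 0.25), (0.9, 0.75), (1.4, 1.75)]
```

Plotting the second coordinate of each point against the first gives the residual plot; marked departures from the line through the origin with unit slope would suggest that the chosen error distribution is inadequate.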

Conclusion
Data on 767 patients diagnosed with HIV/AIDS were collected for this analysis. AFT modelling of the patients' survival times from the prognostic factors selected the logistic model as the best-fitting model based on AIC. The Cox-Snell residual plot further confirmed that the logistic model fits well for evaluating the survival time of these 767 HIV/AIDS patients. We then attempted to reduce the number of predictors by identifying partially correlated/associated predictors using partial correlation/ANCOVA. Using this technique, 4 predictors were selected. The model fitted with the proposed technique was found to fit better than the models fitted with all the other methods. Therefore, it can be concluded from this study that it is not necessary to include all the predictors in the model to estimate the survival time of the patients, as many prognostic factors may be partially correlated/associated with each other, and including only one of them may serve in place of the remaining correlated/associated predictors.

Acknowledgment
We sincerely acknowledge and thank Dr. Ram Manohar Lohia Hospital for permitting us to use the HIV/AIDS database.