ScholarWorks @ UTRGV
Bayesian Survival Analysis of Genetic Variants in PTPRN2 Gene for Age at Onset of Cancer
ISSN: 2469-5831

Abstract
Background: The protein tyrosine phosphatase, receptor type, N polypeptide 2 (PTPRN2) gene may play a role in cancer; however, no study has focused on the associations of genetic variants within the PTPRN2 gene with age at onset (AAO) of cancer.
Methods: This study examined 220 single nucleotide polymorphisms (SNPs) within the PTPRN2 gene in the Marshfield sample of 716 cancer cases (any diagnosed cancer, excluding minor skin cancer) and 2,848 non-cancer controls. Multiple logistic and linear regression models in PLINK were used to examine the association of each SNP with the risk of cancer and with AAO, respectively. For survival analysis of AAO, both classic Cox regression and Bayesian survival analysis using the Cox proportional hazards model in SAS v. 9.4 were applied to detect the association of each SNP with AAO. Hazard ratios (HRs) with 95% confidence intervals (CIs) were estimated.
Results: Single-marker analysis identified 10 SNPs associated with the risk of cancer and 9 SNPs associated with AAO (p < 0.05). SNP rs7783909 showed the strongest association with cancer (p = 6.52 × 10^-3), while the best signal for AAO was rs4909140 (p = 6.18 × 10^-4), which was also associated with the risk of cancer (p = 0.0157). The classic Cox regression model showed that 11 SNPs were associated with AAO (top SNP rs4909140: HR = 1.38, 95% CI = 1.11-1.71, p = 3.3 × 10^-3). The Bayesian Cox regression model gave similar results (top SNP rs4909140: HR = 1.39, 95% HPD interval = 1.1-1.69).
Conclusions: This study provides evidence that several genetic variants within the PTPRN2 gene influence the risk of cancer and AAO, and it will serve as a resource for replication in other populations.


Introduction
Cancer continues to be a significant public health issue globally. It is the leading cause of death in both developed and emerging economies [1]. In 2012, there were 14.1 million new cancer cases, 8.2 million cancer deaths, and 32.6 million people living with cancer worldwide [2]. Cancers are caused by a complex interplay between genetic predisposition and environment. Family and twin studies have estimated the heritability of colorectal cancer (35%) [3], breast cancer (25-30%) [3-5], prostate cancer (42-58%) [3,6], ...

Methods
... Personalized Medicine Research Project Cohort (dbGaP Study Accession: phs000170.v1.p1). Details about these subjects have been described elsewhere [18,19]. Cancer cases were defined as any diagnosed cancer excluding minor skin cancer, and AAO of cancer was defined by the date of the earliest cancer diagnosis in the registry. Covariates used in this study were age, sex, alcohol use in the past month (yes or no), and smoking status (never, current, or past smoking). Obesity was defined as a body mass index (BMI) ≥ 30. Genotyping data from the ILLUMINA Human660W-Quad_v1_A array are available for 3,564 Caucasian individuals (716 cancer cases and 2,848 controls). All 220 SNPs available within the PTPRN2 gene were included in the analysis.

Linear and logistic regression models in PLINK software:
Categorical variables are presented as frequencies and percentages, and continuous variables as mean ± standard deviation. Quality control and association analyses were implemented in PLINK v1.07 [20]. First, Hardy-Weinberg equilibrium (HWE) was tested for all SNPs in the controls, and the minor allele frequency (MAF) of each SNP was determined. Multiple logistic regression of each SNP on cancer status (a binary trait), adjusted for sex, age*age, alcohol use, smoking status, and obesity, was performed in PLINK, yielding asymptotic p-values, odds ratios (ORs), and 95% confidence intervals (CIs). Next, AAO values were log-transformed, and multiple linear regression of each SNP on log-transformed AAO, adjusted for sex, alcohol use, smoking status, and obesity, was performed, yielding asymptotic p-values, regression coefficients (β), and 95% CIs. To control the type I error arising from multiple hypothesis testing, the false discovery rate (FDR), defined by Benjamini and Hochberg [21] as the expected proportion of false discoveries among the rejected hypotheses, was computed. In addition, empirical p-values were generated from 100,000 permutations using the max(T) permutation procedure, from which a pointwise empirical p-value was calculated for each individual SNP.
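The permutation p-values in this study came from PLINK's own max(T) procedure (its `--mperm` option). The Python sketch below only illustrates the idea on toy data. It assumes a hypothetical per-SNP statistic, n·r² between allele dosage and case status (roughly the 1-df trend test), instead of the adjusted logistic-regression statistic used in the study; `trend_stat` and `maxt_permutation` are illustrative names. EMP1 is the pointwise empirical p-value for each SNP, and EMP2 is the max(T) family-wise adjusted value.

```python
import random


def trend_stat(dosages, status):
    """Illustrative per-SNP statistic: n * r^2 between allele dosage (0/1/2)
    and case status (1 = case, 0 = control), roughly the 1-df trend test."""
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(status) / n
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, status))
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((y - mean_y) ** 2 for y in status)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or a single phenotype class
    return n * sxy * sxy / (sxx * syy)


def maxt_permutation(snps, status, n_perm=1000, seed=1):
    """Return (pointwise EMP1, max(T)-adjusted EMP2) empirical p-values."""
    rng = random.Random(seed)
    observed = [trend_stat(g, status) for g in snps]
    hits_point = [0] * len(snps)
    hits_max = [0] * len(snps)
    perm = list(status)
    for _ in range(n_perm):
        # Shuffle phenotype labels only: genotypes (and hence LD) stay intact.
        rng.shuffle(perm)
        stats = [trend_stat(g, perm) for g in snps]
        max_stat = max(stats)
        for j, obs in enumerate(observed):
            if stats[j] >= obs:
                hits_point[j] += 1  # this SNP's own null distribution
            if max_stat >= obs:
                hits_max[j] += 1    # null distribution of the maximum over SNPs
    emp1 = [(h + 1) / (n_perm + 1) for h in hits_point]
    emp2 = [(h + 1) / (n_perm + 1) for h in hits_max]
    return emp1, emp2


# Hypothetical toy data: 10 cases followed by 10 controls.
status = [1] * 10 + [0] * 10
snp_a = [2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
snp_b = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
emp1, emp2 = maxt_permutation([snp_a, snp_b], status, n_perm=2000)
print(emp1, emp2)
```

Because the labels are permuted jointly across all SNPs, correlation between SNPs in linkage disequilibrium is preserved, which is why max(T) is less conservative than a Bonferroni correction while still controlling the family-wise error rate.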

Cox proportional hazards models in PROC PHREG:
The proportional hazards (Cox regression) model is widely used in the analysis of time-to-event data to describe the effect of explanatory variables on hazard rates. The PHREG procedure fits the Cox model by maximizing the partial likelihood function, which eliminates the unknown baseline hazard and accounts for censored survival times. In the Bayesian approach, the partial likelihood function is used as the likelihood function in the posterior distribution [22]. In the non-Bayesian analysis, the Akaike information criterion (AIC) was used as a measure of goodness of fit that balances model fit against model simplicity [23,24]. Bayesian Cox regression can be requested with the BAYES statement in the PHREG procedure. A Markov chain Monte Carlo (MCMC) method based on Gibbs sampling was used to generate a chain of samples from the posterior distribution of the model parameters. For each parameter, summary statistics (mean, standard deviation, quartiles, and highest posterior density (HPD) and credible intervals) and convergence diagnostics (Geweke diagnostic, effective sample size, and Monte Carlo standard error) were computed, along with the correlation and covariance matrices of the posterior sample. Trace plots, posterior density plots, and autocorrelation function plots were created for each parameter [17]. Hazard ratios (HRs) with 95% CIs were estimated.
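To make the partial-likelihood step concrete, the Python sketch below fits a Cox model with a single binary genotype indicator by Newton-Raphson and reports the HR, a Wald 95% CI, and AIC = -2 log L + 2k. It is not PROC PHREG: the covariate adjustment used in the study is omitted, tied event times use Breslow's approximation (which we believe is PHREG's default), and the data and function names (`cox_partial`, `fit_cox`) are hypothetical.

```python
import math


def cox_partial(beta, times, events, x):
    """Log partial likelihood, score and information for one covariate.

    Ties use the Breslow approximation: every subject with time >= t_i
    is in the risk set of event i."""
    loglik = score = info = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        s0 = s1 = s2 = 0.0
        for t_j, x_j in zip(times, x):
            if t_j >= t_i:
                w = math.exp(beta * x_j)
                s0 += w
                s1 += w * x_j
                s2 += w * x_j * x_j
        loglik += beta * x_i - math.log(s0)
        score += x_i - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return loglik, score, info


def fit_cox(times, events, x, tol=1e-10, max_iter=100):
    """Maximize the partial likelihood by Newton-Raphson with step halving."""
    beta = 0.0
    loglik, score, info = cox_partial(beta, times, events, x)
    for _ in range(max_iter):
        step = score / info
        new = cox_partial(beta + step, times, events, x)
        while new[0] < loglik and abs(step) > tol:
            step /= 2.0  # back off if the full Newton step lowers the likelihood
            new = cox_partial(beta + step, times, events, x)
        beta += step
        loglik, score, info = new
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    return {
        "beta": beta,
        "hr": math.exp(beta),
        "ci95": (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)),
        "aic": -2.0 * loglik + 2 * 1,  # one estimated coefficient
        "score": score,
    }


# Hypothetical toy data: AAO in years, 1 = cancer diagnosed, 0 = censored,
# geno = 1 for the tested genotype (e.g. G_G) and 0 for the reference (T_T).
aao = [23, 35, 41, 50, 52, 60, 61, 67, 70, 75, 80, 88]
event = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0]
geno = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0]
fit = fit_cox(aao, event, geno)
print(fit)
```

The baseline hazard never appears in `cox_partial`: each event contributes only the ratio of its own risk score to the summed risk scores of its risk set, which is what lets the partial likelihood ignore it.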
For the present study of AAO, a normal prior was specified for the regression coefficients. In Bayesian analysis, the deviance information criterion (DIC) is used for model comparison instead of AIC; DIC is a generalization of AIC to hierarchical models [25]. The analysis program modeled the AAO of cancer as a function of SNP rs4909140, sex, alcohol use, smoking status, and obesity. SNP rs4909140 has three genotypes (G_G, G_T, and T_T), and the T_T genotype was used as the reference.
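The Bayesian set-up above (partial likelihood as likelihood, normal prior, DIC for comparison) can be sketched for a single coefficient in Python. This is an illustration under stated assumptions, not the PHREG BAYES implementation: it swaps in a random-walk Metropolis sampler for PHREG's Gibbs sampler, assumes a diffuse N(0, 10^6) prior, omits the other covariates, and uses hypothetical names and toy data. DIC is computed as D̄ + p_D with deviance D(β) = -2 log partial likelihood and p_D = D̄ - D(β̄).

```python
import math
import random


def log_partial_lik(beta, times, events, x):
    """Cox log partial likelihood for one covariate (Breslow ties)."""
    ll = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if d_i:
            s0 = sum(math.exp(beta * x_j) for t_j, x_j in zip(times, x) if t_j >= t_i)
            ll += beta * x_i - math.log(s0)
    return ll


def bayes_cox(times, events, x, prior_var=1e6, n_iter=10000, burn=1000,
              step=0.5, seed=2024):
    """Random-walk Metropolis on beta with an assumed N(0, prior_var) prior."""
    rng = random.Random(seed)

    def log_post(b):
        # The partial likelihood plays the role of the likelihood.
        return log_partial_lik(b, times, events, x) - b * b / (2.0 * prior_var)

    beta, lp = 0.0, log_post(0.0)
    draws = []
    for it in range(n_iter):
        prop = beta + rng.gauss(0.0, step)
        lp_prop = log_post(prop)
        # Accept with probability min(1, posterior ratio); 1 - random() avoids log(0).
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            beta, lp = prop, lp_prop
        if it >= burn:
            draws.append(beta)
    # DIC = D_bar + p_D, where D(b) = -2 * log partial likelihood.
    dev = [-2.0 * log_partial_lik(b, times, events, x) for b in draws]
    d_bar = sum(dev) / len(dev)
    beta_bar = sum(draws) / len(draws)
    p_d = d_bar + 2.0 * log_partial_lik(beta_bar, times, events, x)
    return {
        "beta_mean": beta_bar,
        "hr_mean": sum(math.exp(b) for b in draws) / len(draws),
        "p_d": p_d,
        "dic": d_bar + p_d,
        "draws": draws,
    }


# Hypothetical toy data (AAO, event indicator, genotype indicator).
aao = [23, 35, 41, 50, 52, 60, 61, 67, 70, 75, 80, 88]
event = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0]
geno = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0]
res = bayes_cox(aao, event, geno)
print(res["beta_mean"], res["hr_mean"], res["dic"], res["p_d"])
```

With one coefficient and a nearly flat prior, p_D should come out close to 1, and DIC then sits close to the classic AIC, which is the kind of agreement between DIC and AIC reported for the 11 SNPs.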

Results
Genotype quality control and descriptive statistics
We removed 1 SNP with HWE p < 10^-4. The remaining 219 SNPs were in HWE and had MAF > 1% in the controls. The demographic characteristics of the study subjects are presented in Table 1. There were slightly more females than males among both cases and controls. Age ranged from 46 to 90 years, and AAO of cancer ranged from 23 to 90 years.
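The quality-control rule above (drop a SNP if its control genotypes give HWE p < 10^-4; keep SNPs with MAF > 1%) can be sketched in Python. PLINK applied the filter in the study, and its HWE test is, to our understanding, an exact test of this kind; the sketch follows the recursive exact test published by Wigginton, Cutler and Abecasis. The function names and the genotype counts in the usage lines are hypothetical.

```python
def hwe_exact_p(n_het, n_hom1, n_hom2):
    """Exact HWE p-value for one biallelic SNP from genotype counts."""
    n_homr, n_homc = sorted((n_hom1, n_hom2))   # rare / common homozygotes
    n = n_het + n_homr + n_homc                  # genotyped individuals
    rare = 2 * n_homr + n_het                    # copies of the rarer allele
    probs = [0.0] * (rare + 1)
    # Start near the mode of the heterozygote-count distribution.
    mid = rare * (2 * n - rare) // (2 * n)
    if (rare % 2) != (mid % 2):
        mid += 1                                 # het count shares the parity of `rare`
    probs[mid] = 1.0
    homr0 = (rare - mid) // 2
    hets, homr, homc = mid, homr0, n - mid - homr0
    while hets >= 2:                             # walk down: fewer heterozygotes
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (homr + 1) * (homc + 1))
        hets, homr, homc = hets - 2, homr + 1, homc + 1
    hets, homr, homc = mid, homr0, n - mid - homr0
    while hets <= rare - 2:                      # walk up: more heterozygotes
        probs[hets + 2] = probs[hets] * 4.0 * homr * homc / ((hets + 2) * (hets + 1))
        hets, homr, homc = hets + 2, homr - 1, homc - 1
    total = sum(probs)
    p_obs = probs[n_het] / total
    # Sum the probabilities of all configurations no more likely than observed.
    return min(1.0, sum(p / total for p in probs if p / total <= p_obs))


def minor_allele_freq(n_het, n_hom1, n_hom2):
    n_alleles = 2 * (n_het + n_hom1 + n_hom2)
    return (n_het + 2 * min(n_hom1, n_hom2)) / n_alleles


def passes_qc(n_het, n_hom1, n_hom2, hwe_cutoff=1e-4, maf_cutoff=0.01):
    """Keep a SNP if control genotypes give HWE p >= 1e-4 and MAF > 1%."""
    return (hwe_exact_p(n_het, n_hom1, n_hom2) >= hwe_cutoff
            and minor_allele_freq(n_het, n_hom1, n_hom2) > maf_cutoff)


print(hwe_exact_p(50, 25, 25))  # counts exactly at HWE proportions
print(passes_qc(0, 50, 50))     # no heterozygotes at all
```

The test conditions on the observed allele counts, so only the heterozygote count varies; this is why it remains valid for rare SNPs where the chi-square approximation breaks down.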

Multiple linear and logistic regression analyses using PLINK
Using single-marker analysis, we identified 10 SNPs associated with the risk of cancer and 9 SNPs associated with AAO (p < 0.05) in the Marshfield sample (Table 2). SNP rs7783909 showed the strongest association with cancer (p = 6.52 × 10^-3), while the best signal for AAO was rs4909140 (p = 6.18 × 10^-4), which was also associated with the risk of cancer (p = 0.0157). For the 10 SNPs associated with the risk of cancer, the FDR was 90%, while the FDRs for the two SNPs most strongly associated with AAO (rs4909140 and rs1670340) were 21% and 39%, respectively. Furthermore, a permutation test in PLINK showed that all SNPs associated with cancer and/or AAO had empirical pointwise p-values < 0.05 (Table 2).
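The FDR values above come from the Benjamini-Hochberg procedure. A minimal Python sketch of the step-up adjustment follows; PLINK's `--adjust` option reports BH-adjusted values, but the text does not say which PLINK option was used, and `bh_fdr` and the four p-values are hypothetical.

```python
def bh_fdr(pvals):
    """Benjamini-Hochberg step-up adjusted p-values (FDR q-values)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical p-values for four tests
print(bh_fdr([0.01, 0.04, 0.03, 0.2]))
```

Each q-value is m·p/rank, made monotone by taking the running minimum from the largest p-value downward. With 219 SNPs tested, a nominal p-value just below 0.05 can therefore carry a large FDR, as seen for the cancer-risk SNPs.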

Classic and Bayesian Cox regression analyses using PROC PHREG
The classic Cox regression model showed that 11 SNPs were associated with AAO (top SNP rs4909140: HR = 1.38, 95% CI = 1.11-1.71, p = 3.3 × 10^-3). The HRs from the Bayesian survival analyses were similar to those from the non-Bayesian analyses (Table 3), and the DIC values for the 11 SNPs in the Bayesian analyses were similar to the corresponding AIC values from the classic Cox model.
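As a consistency check on the classic Cox result for rs4909140, the standard error and Wald p-value can be back-calculated from the published HR and 95% CI, assuming the CI is the usual Wald interval exp(β ± 1.96·SE) on the log-hazard scale. This is our own worked check, not a calculation reported by the study:

```latex
\hat\beta = \ln(1.38) \approx 0.322, \qquad
\widehat{\mathrm{SE}} \approx \frac{\ln(1.71) - \ln(1.11)}{2 \times 1.96}
  = \frac{0.536 - 0.104}{3.92} \approx 0.110,
\\
z = \frac{\hat\beta}{\widehat{\mathrm{SE}}} \approx 2.92, \qquad
p = 2\,[1 - \Phi(z)] \approx 3.5 \times 10^{-3}.
```

This agrees with the reported p = 3.3 × 10^-3 up to the rounding of the published HR and CI limits.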
The trace plot, posterior density plot, and autocorrelation function plot from the Bayesian analysis (Figure 1) indicated that the Markov chain had stabilized with good mixing for rs4909140. The posterior density plots, which estimate the posterior marginal distributions of the 7 regression coefficients, showed a smooth, unimodal shape (Figure 2). Table 4 shows the posterior summary for rs4909140 (HR = 1.39, 95% HPD interval = 1.1-1.69).
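Table 4 reports 95% HPD limits. Given MCMC draws, an empirical HPD interval is the shortest interval containing 95% of the sorted draws; the Python sketch below shows that computation with a hypothetical helper `hpd_interval` and made-up draws. HPD intervals are not invariant under nonlinear transformations, so exponentiating the β-scale limits gives an HR interval with the same 95% posterior probability that need not be the shortest one on the HR scale; the source does not state which of the two Table 4 reports.

```python
import math


def hpd_interval(draws, mass=0.95):
    """Empirical HPD interval: the shortest interval covering `mass` of the draws."""
    s = sorted(draws)
    n = len(s)
    k = max(1, min(n, math.ceil(mass * n)))  # number of draws the interval must cover
    best_lo, best_hi = s[0], s[k - 1]
    for i in range(1, n - k + 1):
        if s[i + k - 1] - s[i] < best_hi - best_lo:
            best_lo, best_hi = s[i], s[i + k - 1]
    return best_lo, best_hi


# Hypothetical posterior draws of a log-hazard ratio (skewed to the right).
beta_draws = [0.3 + 0.01 * i + 0.0005 * i * i for i in range(100)]
lo, hi = hpd_interval(beta_draws)
hr_lo, hr_hi = math.exp(lo), math.exp(hi)  # beta-scale HPD mapped to the HR scale
print(lo, hi, hr_lo, hr_hi)
```

For a skewed posterior the HPD interval shifts toward the dense region of the draws, unlike an equal-tailed credible interval, which always cuts 2.5% from each end.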

Discussion
In this study, we identified 10 SNPs associated with the risk of cancer and 9 SNPs associated with AAO using PLINK, and 11 SNPs associated with AAO using classic Cox regression in PROC PHREG. Previous studies have shown that PTPRN2 is an autoantigen in type 1 diabetes, an insulin-dependent, autoimmune form of diabetes mellitus; PTPRN2 is reactive with sera from patients with type 1 diabetes and is likely to be an islet cell antigen useful in preclinical screening for the risk of type 1 diabetes [10,26-28]. Animal model studies revealed that this gene may function in the regulation of insulin secretion [11,29-32].
Recently, another study suggested that PTPRN2 (IA-2beta) is one of the genes potentially relevant to insulin and neurotransmitter release [33]. Furthermore, several studies have reported that the PTPRN2 gene may be involved in squamous cell lung cancer [12], metastatic prostate cancer [13], and breast cancer [14]. However, the mechanism is not clear. It has been hypothesized that recurrent or clonal somatic mutation underlies the initiation of autoimmune diseases such as type 1 diabetes [34], and many cancers elicit antibodies that are also found in autoimmune diseases [35]. Therefore, PTPRN2 may be one of the mechanisms linking autoimmune diseases to cancers. In addition, insulin, insulin-like growth factor 1, and insulin-like growth factor 2, signaling through the insulin receptor and the insulin-like growth factor 1 receptor, could induce tumorigenesis, which may partly account for the link between diabetes, metabolic syndrome, and cancers [36,37].

However, no association study of genetic polymorphisms within the PTPRN2 gene with the risk or AAO of cancer has been conducted. Using multiple logistic and linear regression models, the present study provides the first evidence that several genetic variants within the PTPRN2 gene are associated with the risk and AAO of cancer. We identified the main effects and permutation p-values for single SNPs. Furthermore, we conducted a Bayesian survival analysis of genetic variants with AAO. Bayesian methods may offer advantages in flexibility and in incorporating information from previous studies. For example, a Bayesian method may provide an alternative approach to assessing associations that alleviates the limitations of p-values, at the cost of some additional modelling. Bayesian methods have recently made great inroads into many areas of science, including the assessment of associations between genetic variants and diseases or related phenotypes [15]. This study has some limitations. First, the definition of cancer status in the Marshfield sample was broad (any diagnosed cancer, excluding minor skin cancer); it would be more informative to investigate the association of the PTPRN2 gene with particular types of cancer. Second, our current findings might be subject to type I error and need to be replicated in additional samples.

Conclusion
This study provides evidence that several genetic variants within the PTPRN2 gene influence the risk and AAO of cancer. Future functional studies of this gene may help to better characterize the genetic architecture of cancers.

Table 1: Descriptive characteristics of cases and controls

Table 2: SNPs within the PTPRN2 gene associated with risk and age at onset of cancer using PLINK (p < 0.05). (a) Physical position (bp); (b) minor allele; (c) minor allele frequency; (d) Hardy-Weinberg equilibrium test p-value; (e) odds ratio for the risk of cancer based on logistic regression using PLINK; (f) p-value for the risk of cancer based on logistic regression; (g) empirical p-value for the risk of cancer generated by 100,000 permutations using the max(T) permutation procedure implemented in PLINK; (h) regression coefficient for AAO of cancer based on linear regression using PLINK; (i) p-value for AAO of cancer based on linear regression; (j) empirical p-value for AAO of cancer generated by 100,000 permutations using the max(T) permutation procedure implemented in PLINK.

Table 3: SNPs within the PTPRN2 gene associated with AAO of cancer using PROC PHREG (p < 0.05). (a) Physical position (bp); (b) minor allele; (c) tested genotype compared with the reference; (d) hazard ratio for the tested genotype based on classic Cox regression using PROC PHREG; (e) p-value for the tested genotype based on classic Cox regression; (f) Akaike information criterion (AIC) value based on classic Cox regression; (g) hazard ratio for the tested genotype based on Bayesian Cox regression; (h) deviance information criterion (DIC) value based on Bayesian Cox regression.

Table 4: Posterior summary and hazard ratio for rs4909140. (a) Genotype compared with the reference; (b) regression coefficient for the G_G genotype compared with T_T; (c) standard error of the regression coefficient; (d) lower 95% HPD limit of the regression coefficient; (e) upper 95% HPD limit of the regression coefficient; (f) hazard ratio for the G_G genotype compared with T_T; (g) lower 95% HPD limit of the hazard ratio; (h) upper 95% HPD limit of the hazard ratio.