The genetic component of human longevity: New insights from the analysis of pathway‐based SNP‐SNP interactions

Summary In human longevity studies, single nucleotide polymorphism (SNP) analysis identified a large number of genetic variants with small effects, yet not easily replicable in different populations. New insights may come from the combined analysis of different SNPs, especially when grouped by metabolic pathway. We applied this approach to study the joint effect on longevity of SNPs belonging to three candidate pathways, the insulin/insulin‐like growth factor signalling (IIS), DNA repair and pro/antioxidant. We analysed data from 1,058 tagging SNPs in 140 genes, collected in 1825 subjects (1,089 unrelated nonagenarians from the Danish 1905 Birth Cohort Study and 736 Danish controls aged 46–55 years) for evaluating synergic interactions by SNPsyn. Synergies were further tested by the multidimensional reduction (MDR) approach, both intra‐ and interpathways. The best combinations (FDR<0.0001) resulted those encompassing IGF1R‐rs12437963 and PTPN1‐rs6067484, TP53‐rs2078486 and ERCC2‐rs50871, TXNRD1‐rs17202060 and TP53‐rs2078486, the latter two supporting a central role of TP53 in mediating the concerted activation of the DNA repair and pro‐antioxidant pathways in human longevity. Results were consistently replicated with both approaches, as well as a significant effect on longevity was found for the GHSR gene, which also interacts with partners belonging to both IIS and DNA repair pathways (PAPPA,PTPN1,PARK7, MRE11A). The combination GHSR‐MREA11, positively associated with longevity by MDR, was further found influencing longitudinal survival in nonagenarian females (p = .026). Results here presented highlight the validity of SNP‐SNP interactions analyses for investigating the genetics of human longevity, confirming previously identified markers but also pointing to novel genes as central nodes of additional networks involved in human longevity.


| INTRODUCTION
Association studies can identify the association of individual gene variants to a given phenotype. Nevertheless, such analysis is unable to explain the biological complexity of several diseases and complex phenotypes such as human aging and longevity.
Recent advances in DNA technology and association studies have uncovered a very large number of common susceptibility variants for complex traits; however, the translation into practice of their role is complicated by the evidence that such variants operate together to influence the final phenotype. The major challenge is represented by the relatively small genetic contribution to the variation in phenotype, which in human lifespan was estimated to be approximately 25% (Herskind et al., 1996) determined by many, mostly still uncharacterized, genes (Deelen et al., 2013), possibly belonging to conserved pathways (Johnson, Dong, Vijg & Suh, 2015). Single-SNP analyses may miss such a complexity, primarily because if a genetic factor operates through a mechanism involving multiple genes, and is also affected by environmental factors, the single investigation may not examine statistical interactions between loci that are informative about the biological and biochemical pathways underpinning the phenotype. On the contrary, SNP interactions may carry more information about the phenotype than those observed from individual SNPs alone (Gerke, Lorenz & Cohen, 2009;Su et al., 2015). Assessing SNP-SNP interactions at the gene level co-occurring within a specific phenotype and not due to linkage disequilibrium (LD) can overcome this problem, possibly finding specific subprocesses more strongly associated with the phenotype than single-SNP analysis (Cordell, 2009).
However, the search of all possible SNP-SNP interactions is challenging from an analytical point of view, because of the number of pairwise statistical tests to be performed. Only recently, significant advances in statistical approaches make it possible to analyse multiple SNPs in large genetic data sets, considering the main pathway to which the gene belongs and possible covariates, as required in the analysis of complex traits (Curk, Rot & Zupan, 2011).
Several SNPs in genes belonging to distinct pathways have been associated with the longevity phenotype (Soerensen et al., 2012;Dato et al., 2013 and references therein, Rose et al., 2015;Crocco, Montesanto, Passarino & Rose, 2016;De Luca, Crocco, De Rango, Passarino & Rose, 2016). GWAS of human longevity in worldwide samples (North America, Europe and very recently China) generally failed to give new insights into genetic determinants of human longevity: only the TOMM40/APOE/APOC1 locus, associated with longevity, was replicated in different populations (Deelen et al., 2014;Lin et al., 2016;Newman et al., 2010;Sebastiani et al., 2012), while rs2149954 on 5q33.3 (Deelen et al., 2014;Zeng et al., 2016) and the FOXO3A locus (Broer et al., 2015 and references therein) are the other signals showing population-specific associations. Some authors attempted to analyse epistatic intragenic (Tan, Soerensen, Kruse, Christensen & Christiansen, 2013;Zeng et al., 2010) or intergenic effects (Napolioni, Giann ı, Carpi, Predazzi & Lucarini, 2011a;Napolioni et al., 2011b), however comprising a limited number of genetic variants. Recently, the analysis of the joint effect of a SNP set grouped by pathway or gene region (pathway/group-based analysis) on human longevity was carried out by Deelen and co-authors (Deelen et al., 2013), who analysed GWAS data of 1,021 SNPs in 68 genes from the insulin-Igf1 signalling (IIS) pathway and 88 SNPs in 13 telomere maintenance (TM) pathway genes. The comparison of genotype data between 403 unrelated nonagenarians from the Leiden Longevity Study and 1,670 younger population controls confirmed the role of genetic variation in IIS and TM pathways in the predisposition to longevity. However, while IIS had many genes associated with the trait, the association of TM with survival was mainly determined by the POT1 gene. Similarly, we previously performed a pathway analysis of 592 SNPs in the DNA repair pathway in the same in the same study population as investigated in the present study indicating association of subprocesses of this pathway in human longevity (Debrabant et al., 2014). Here, we wanted to explore synergies inside a large SNP data set, by applying statistical methodologies allowing us to test the interaction both inside and among SNP variants belonging to three main candidate pathways of human longevity such as IIS, DNA damage signalling and repair, and pro/antioxidant response. To this aim, we analysed 1,058 SNPs from 140 genes in 1,825 subjects (1,089 cases aged 92-93, 736 controls aged 46-55) of Danish origin, previously analysed for single-SNP associations with longevity (Soerensen et al., 2012).

| RESULTS
In this work, we used a combination of different approaches for understanding the relationships between SNP variants in the predisposition to become long-lived. First, we computed case-control based SNP-SNP interaction without pathway constraints using the whole data set, and we plotted interaction networks from significant SNP combinations. Then, we performed pathway-based case-control analysis, looking for epistatic interactions inside original assigned pathways and among different pathways by applying a MDR approach. Finally, we analysed those pairs of SNPs significantly enriched in cases respect to controls for their influence on survival in the oldest old.

| SNPsyn analysis in the entire data set
By testing all possible combinations among the 1,058 available variants, assuming a significance level p < .0001 and correcting for multiple comparisons (FDR <0.005), we found a set of 73 best-interacting SNP pairs (Table S1). Figure 1 shows the interaction network, composed by 33 intrapathway network (11 IIS; 17 DNA repair; 5 pro-antioxidant) and 40 interpathway network (22 IIS-DNA repair; 11 proantioxidant-DNA repair; 7 pro-antioxidant-IIS). Sixteen of 73 SNP-SNP interactions were inside the same gene: these interactions could be due to noncasual association among markers (LD), however modest (r 2 < 0.8), as indicated by the positive values of synergy (Syn) among variants (all interactions reported significant showed a Syn >0).
In any case, the single contributions (I1 and I2) of SNP1 and SNP2 to the synergic interaction can give information about the more influencing SNP of the pairs and thus help in prioritizing the SNP in future analysis, so that no one SNP was a priori removed (Tables 1 and S1).
Further reducing the significance level, by selecting a p < .0001, we prioritized 22 top-ranked synergies among SNPs, significantly enriched in longevity (Table 1). The best combination (FDR <0.0001) between SNPs in different genes was between TXNRD1-rs17202060 and TP53-rs2078486, respectively, belonging to the pro/antioxidant and DNA repair pathway. The second-best interaction was found among IGF1R-rs12437963 and PTPN1-rs6067484, both belonging to the IIS pathway, and the third among genes TP53-rs2078486 and ERCC2-rs50871, both belonging to the DNA repair pathway. Less significant interactions were found between TP53 and two genes from the same DNA repair pathway: TP53 (rs2078486)-PON2

| MDR analysis
SNPsyn gave us a whole picture of the significantly interacting SNPs in the whole data set available for our study; however, that approach does not include covariates in the analysis, such as sex or age, simply dividing the data in cases and controls. Thus, to improve the casecontrol analysis of SNP-SNP interactions, a MDR approach was applied. Table 2 reports the list of two-and three-loci interactions prioritized by MDR analysis as the most significant for consistency value (minimum 2/10 replicates) and accuracy level (>0.5), between genes belonging to the same pathway and in the whole data set (last rows).

| MDR: Analysis intrapathways
First, the MDR confirmed the SNPsyn approach for what concerns the most significant SNP-SNP interactions. The SNP-SNP combination rs572169-rs512692 inside the gene GHSR significantly interact with rs4837525 in the PAPPA gene inside the IIS pathway. rs572169-GHSR also showed an interesting interaction with rs606748-PTPN1.
As indicated by the interaction graph (a), interaction dendrogram (b) and histograms (c) in Figure 2, these SNP-SNP combinations jointly explain the most entropy, with the rs572169 SNP having the largest univariate effect (1.53% of explained entropy in the network), and the GG genotype driving the high-risk combinations for longevity.
The performance evaluation of each model (Table 2) shows that the two-locus epistatic interaction performs better in terms of modelling (higher CV) compared to three-locus interactions in these pathways.
In the DNA repair sub-data set ( Figure 3), two high-level epistatic interactions were found: one between rs50871 (ERCC2 gene) and F I G U R E 1 Interaction network obtained by the analysis of SNP-SNP interactions by SNPsyn in the whole data set, comprising the all three analysed pathways (IIS, DNA repair and proantioxidant). Each dot represents a SNP, indicated by the corresponding gene, when univocally assigned or by dbSNP notation, when present in a regulatory region, and not assigned by the SNPsyn output DATO ET AL.
| 3 of 12 rs2078486 (TP53), and another between rs170548 (ATM) and rs2436514 (LONP1). Although the two indicated SNP-SNP interactions reported low CV (Table 2), they were both significant by permutation analysis (p < .002). Interestingly, the ERCC2-TP53 interaction was highly significant also in SNPsyn analyses (FDR <.001 and p < .001), thus indicating a consistent association across methods of this SNP-SNP combination with longevity. The largest univariate effect on longevity for the DNA repair pathway was recognized to rs13251813 (WRN gene), with 1.44% of explained entropy in the network ( Figure 3). In the pro-antioxidant data set, a three-level SNP combination was found significantly associated with longevity among rs2236270 (GSS gene)-rs162557 (CYP1B1 gene)-rs3756704-GLRX (p < .001) ( Table 2 and Figure 4). Figure 5 shows the interaction graph (a) and interaction dendrogram (b) for the whole genotypic data set, as resulting from MDR analysis.

| MDR: Analysis interpathways
In this case, we can see whether there are significant epistatic interactions between genes belonging to different metabolic biological pathways, among those investigated in this article. A significant epistasis was found between rs572169 (GHSR gene) belonging to the IIS pathway, which still has the largest univariate effect on the longevity phenotype (1.53% explained entropy), and rs533984 (MRE11A gene), from the DNA repair pathway. rs533984 positively interacts with rs225119 (PARK7), which belongs to the pro-antioxidant pathway. Both the two-locus interaction GHSR-MRE11A and the three-locus combination with PARK7 were found to be significant (p = .001, Table 2); this may still depend on the higher univariate effect on longevity of the rs572169 SNP at the GHSR gene, with homozygotes GG favoured for longevity compared to the other genotypic combinations both in three-and two-locus models.

| Survival analysis
Based on the information about postsurvey mortality available for the oldest cohort, the Danish 1905, we tested two-and three-locus interactions significantly associated with longevity status in the T A B L E 1 List of top-ranked 22 SNP-SNP interactions, sorted by FDR <= 0.001, organized according to the synergy value, as resulting from SNPsyn analysis on the whole genotypic data set [1825 samples (1089 long-lived, 736 younger controls), 1059 SNPs]. SNP-SNP interactions inside the same gene captured by the algorithm could be due to a modest LD between markers, below the r 2 = 0.8 used as cutoff. Syn: synergy value, FDR: false discovery rate; I: information, I1 and I2: contribution of SNP1 and SNP2 to the synergic interaction, respectively  .096 (N = 717) High-risk genetic combinations for longevity resulting from MDR approach was tested for involvement in survival (in the oldest cohort), by analysing carriers of specific candidate combinations, respect to the other genotypes, for the sample stratified by sex. Odds ratio with 95% confidence interval (C.I.) was determined by logistic regression applied to candidate SNP combinations that were reported. p-value of MDR was calculated by permutation analysis, while those reported for survival are referred to the log-rank (Mantel-Cox) test.
previous analyses. With this analysis, we aimed to test whether those combinations, associated with an advantage in older individuals compared to younger subjects, were also favourable for survival at very old ages. Thus, we classified the subjects from the 1905 Danish cohort as carriers or noncarriers of a candidate risk combination for longevity and tested their survival as number of months survived from the recruitment. We performed survival analysis separately in male and female cohorts, considering the great importance of sex in survival differences inside a human population (reviewed in Alberts et al., 2014). As indicated in Table 2, among all relevant combinations tested, only that between rs572169 (GHSR) and rs533984 (MREA11) was found significantly associated with extreme survival in females ( Figure 6) (p = .026, log-rank Mantel-Cox test). In males, no effect on survival was observed for any of the candidate risk combinations of different SNPs.

F I G U R E 2 Interaction graph (a) and interaction dendrogram (b) for the IIS data set, resulting from MDR analysis. In a, for each SNP is reported in per cent the value of information gain (IG), while numbers in the connections indicate the entropy-based IG for the SNP pairs. Red bar and orange bar indicate the high-level synergies on the phenotype, while the brown indicate a medium-level interaction, green and blue connections with negative IG values indicate redundancy or lack of synergistic interactions between the markers. Histograms in (c) reports the distributions of cases (left bars) and controls (right bars) for three-locus and two-locus genotype combinations of SNPs. The effect of rs572169
(GHSR gene) is showed alone, because of its univariate effect. Dark-shaded cells are considered "high risk," while light-shaded cells are considered "low risk" for longevity. White cells indicate no subjects with those genotype combinations that were observed in the data set ERCC2-DNA repair pathway). Furthermore, while the interaction analysis confirmed the association of rs572169 of GHSR reported in a previous study using single-SNP-analysis (Soerensen et al., 2012), single variants in TP53, TXNRD1 or ERCC2 here identified were not previously found. In particular, a central role of TP53 might not seem surprising as TP53 is a very well-known tumour suppressor in DNA damage response which balances tumour surveillance and maintenance of stem cell pools, finally resulting in beneficial effects both for cancer protection and longevity (Reinhardt & Schumacher, 2012).
In addition, the strong interaction with TXNRD1, recently found associated with the quality of aging (Dato, De Rango, Crocco, Passarino & Rose, 2015), suggests a central role of TP53 in mediating the concerted activation of DNA repair and the pro-antioxidant pathway. TP53 has been suggested to exert longevity assurance functions by impacting on the conserved IGF-1 system (Feng & Levine, 2010), yet we did not find interactions at genetic level among TP53 and genes belonging to the IIS pathway in this study population.  (Neel & Tonks, 2016). Experimental data report that the protein interaction IGF1R-PTPN1 is regulated by insulin levels (Bandyopadhyay et al., 1997). Furthermore, in experimental organisms, PTPN1 is involved in inflammatory mechanisms and insulin resistance associated with diabetes and obesity during aging  The genotype data set and genotyping methodology by the Illumina GoldenGate platform has been described in previous papers (Soerensen et al., 2010(Soerensen et al., , 2012. To exclude the rare variants, a minimum allele frequency (MAF) level of 5% was chosen for performing the analyses, leaving 1,058 SNPs from the original data set to test SNP-SNP associations with longevity. A table reporting all SNPs investigated can be found in the Supplementary material (Table S3), classified by gene and pathway. In detail, 317 SNPs were belonging to the IIS pathway, 260 to pro-antioxidant response, 481 to DNA damage signalling and repair. Furthermore, functional prediction of most significant SNPs in SNP pairs found to be associated with F I G U R E 4 Interaction graph (a) and interaction dendrogram (b) for the pro/antioxidant data set, resulting from MDR analysis. In a, for each SNP is reported in per cent the value of information gain (IG), while numbers in the connections indicate the entropy-based IG for the SNP pairs. Red bar and orange bar indicate the high-level synergies on the phenotype, while the brown indicate a medium-level interaction, green and blue connections with negative IG values indicate redundancy or lack of synergistic interactions between the markers. Histograms in (c) reports the distributions of cases (left bars) and controls (right bars) for three-locus and two-locus genotype combinations of SNPs. Darkshaded cells are considered "high-risk" while light-shaded cells are considered "low risk." White cells indicate no subjects with those genotype combinations that were observed in the data set longevity, based on Genome Browser information (Haploreg, http:// archive.broadinstitute.org/mammals/haploreg/haploreg.php) was reported in Table S4.

| Statistical methodology
Genotype counts for testing HWE were performed in the control group sample (Li & Li, 2008;Schaid, Batzler, Jenkins & Hildebrandt, 2006) by Plink (Purcell et al., 2007), choosing a cutoff of 0.001 for excluding SNPs not in HWE in the control sample. Minor allele frequency (MAF) was computed for each SNP, and differences in genotype frequency between cases controls were calculated by RobustSNP (So & Sham, 2011) in the R environment (https://wwwrproject.org/). As reported in Soerensen et al., 2012;tagging SNPs were prioritized by establishing a cutoff of r 2 = .8, LOD = 3, minor allele frequency (MAF)>5% and a minimum distance between SNPs = 60 bp criteria. The selection of common variants (MAF ≥ 5) and a LD r 2 < .8 is widely used in genomewide association studies (Fadista, Manning, Florez & Groop, 2016). Furthermore, we tested LD levels by genomic region in the control sample (data not shown). Obtained significance scores were corrected for multiple testing using the false discovery rate (FDR) method described by Benjamini and Yekutieli (2001). In this paper, significant associations were considered those reporting a FDR< 0.005 (p-value <.0001). Because Interaction analyses were computed online, at the website http://snpsyn.biolab.si/.

| SNPsyn analysis of SNP-SNP interactions
F I G U R E 5 Interaction graph (a) and interaction dendrogram (b) for the DNA whole genetic data set, resulting from MDR analysis. In a, for each SNP is reported in per cent the value of Information Gain (IG), while numbers in the connections indicate the entropy-based IG for the SNP pairs. Red bar and orange bar indicate the high-level synergies on the phenotype, while the brown indicate a medium-level interaction, green and blue connections with negative IG values indicate redundancy or lack of synergistic interactions between the markers. Histograms in (c) reports the distributions of cases (left bars) and controls (right bars) for three-locus and two-locus genotype combinations of SNPs. Darkshaded cells are considered "high-risk" while light-shaded cells are considered "low risk." White cells indicate no subjects with those genotype combinations that were observed in the data set

| Gene set enrichment analysis
Gene set enrichment analysis was performed by SNPsyn on the whole data set to find biological processes in which represented genes were involved in human longevity (based on the known information about their gene products (data driven by Gene Ontology)).
Which gene sets to test were determined by SNPsyn based on significant association with the phenotype (as previously explained).
Subsequently, the gene sets were used as a cluster (query) set in this gene enrichment analysis, with the goal to determine whether members of a gene set tend to occur towards the top (or bottom) of the list. In such case, the gene set was correlated with the phenotypic class distinction. Fold enrichment score reflects the degree to which a set is overrepresented at the extremes (top or bottom) of the entire ranked list.

| MDR analysis of epistatic interactions
For testing the epistatic interaction between pairs of SNPs, multifactor dimensionality reduction (MDR) was applied (Moore, 2004;Ritchie et al., 2001). This methodology can reveal high-order interactions among genes collaborating with respect to a given phe- As for colours, connections in red and orange indicate nonlinear interaction, connections in green and brown indicate independence or additivity and redundancy (blue lines). In a data set of more than two interacting factors, the best model is considered the one found more consistent in different replicates (expressed as CV, consistency values and accuracy, i.e. training balanced accuracy level). For calculating significance, permutation testing is applied, dividing the data set into 10 portions, and using nine portions as a training data set, and the remaining as a testing data set. One thousand permutations were performed, to determine a cutoff threshold for an

| Survival analysis of relevant SNP-SNP interactions
Kaplan-Meier survival curves with the Mantel-Cox log-rank test were used to compare survival of oldest-old subjects with different SNP-SNP combinations (Therneau & Grambsch, 2000). Survival analyses were performed using the survival package in R, as described in Therneau (2005). A significance level of 0.05 was chosen in all tests.