Introduction

The APOE-ε4 allele is the strongest known genetic risk factor for mild cognitive impairment (MCI) and Alzheimer’s disease (AD)1. However, its effect on cognitive outcomes in Hispanics/Latinos is weaker and less consistent than in non-Hispanic Whites2,3. The genomes of Hispanics/Latinos are admixed, consisting of three predominant continental ancestries: Amerindian, African, and European4. In a recent publication, we performed an association study of APOE alleles and neurocognitive traits in middle-aged and older U.S. Hispanics/Latinos from the SOL-INCA study5. We found that the effects of the APOE alleles were modified by continental global genetic ancestry; for example, Amerindian genetic ancestry was protective against the risk conferred by APOE-ε4 on cognitive decline. Other studies conducted in admixed populations have also concluded that ancestry-specific genetic differences, either genome-wide or in the region surrounding the APOE gene, modify the effect of APOE alleles on Alzheimer’s disease and related dementias (ADRD)6,7,8,9. For example, a recent study explored missense variants in the APOE region and identified one variant coinherited with APOE-ε4 that mitigates the AD risk effect, and another variant coinherited with APOE-ε3 that has a protective effect10. Both variants were assessed in Europeans, owing to the paucity of variant carriers in non-European ancestries. Another recent study identified a variant proximal to the APOE region that reduces the APOE-ε4 risk for AD in individuals of African ancestry9.

We recently developed a computationally efficient method for ancestry-specific frequency estimation of bi-allelic genetic variants in a multi-way admixed population11, and published a database of ancestry-specific allele frequencies estimated from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) population. This unique dataset enables us to focus on genetic variants enriched in African and Amerindian ancestries, which have been understudied compared to European-enriched variants. We hypothesize that African and/or Amerindian ancestry-enriched genetic variants interact with the APOE alleles in their associations with AD and related cognitive outcomes, potentially explaining how global genetic ancestry modifies the effect of the APOE alleles on cognitive outcomes5. Here, we study whether African and Amerindian ancestry-enriched genetic variants in the APOE region modify the effect of the APOE-ε4 allele on MCI and MCI+, where MCI+ denotes a subset of the MCI group with suspected severe cognitive impairment, in the Study of Latinos-Investigation of Neurocognitive Aging (SOL-INCA) population. We focus on Amerindian- and African-enriched variants in a 6 Mbp region encompassing the APOE gene, and further consider only variants with a predicted moderate impact based on SnpEff annotation. We conduct interaction analyses between the identified variants and the APOE-ε4 allele in their effect on MCI in SOL-INCA. We then attempt to replicate one suggestive interaction between an African-enriched variant and the APOE-ε4 allele on MCI in African Americans from the Atherosclerosis Risk in Communities (ARIC) study.

Methods

Study population

The HCHS/SOL is a population-based, longitudinal, multi-site cohort study of Hispanic/Latino adults in the U.S. that primarily enrolled participants from six self-identified backgrounds: Cuban, Central American, Dominican, Mexican, Puerto Rican, and South American12,13. A total of 16,415 adults aged 18–74 years were enrolled at the baseline visit (2008–2011) at four field centers (Bronx, NY; Chicago, IL; Miami, FL; and San Diego, CA). The baseline visit assessed cognitive function in 9,714 individuals aged 45 years or older. SOL-INCA is an ancillary study of HCHS/SOL focusing on the middle-aged and older adults who underwent cognitive assessment at visit 1. Overall, 6,377 individuals aged 50 years or older with baseline cognitive testing participated in the SOL-INCA examination at or after HCHS/SOL visit 2, on average 7 years after the baseline exam. Of these 6,377 participants, 2,140 were excluded from analyses (1,701 did not consent to the use of genetic data, 210 had no APOE data, 76 had missing cognitive outcomes, and 153 had missing covariates and/or genetic variant reads), yielding an analytic sample of 4,237 individuals.
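The participant flow above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the dictionary keys are illustrative labels, not variable names from the study.

```python
# Participant flow as reported in the text. Keys are illustrative labels only.
sol_inca_participants = 6377

exclusions = {
    "no_consent_for_genetic_data": 1701,
    "no_apoe_data": 210,
    "missing_cognitive_outcome": 76,
    "missing_covariates_or_variant_reads": 153,
}

total_excluded = sum(exclusions.values())
analytic_sample = sol_inca_participants - total_excluded

print(f"excluded: {total_excluded}, analytic sample: {analytic_sample}")
```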

The HCHS/SOL was approved by the institutional review boards (IRBs) at each field center, where all participants gave written informed consent in their preferred language (Spanish/English) to use their genetic and non-genetic data, and by the Non-Biomedical IRB at the University of North Carolina at Chapel Hill, to the HCHS/SOL Data Coordinating Center. The approving IRBs are: the Non-Biomedical IRB at the University of North Carolina at Chapel Hill, Chapel Hill, NC; the Einstein IRB at the Albert Einstein College of Medicine of Yeshiva University, Bronx, NY; the IRB at the Office for the Protection of Research Subjects (OPRS), University of Illinois at Chicago, Chicago, IL; the Human Subject Research Office, University of Miami, Miami, FL; and the Institutional Review Board of San Diego State University, San Diego, CA. The study reported here was approved by the Mass General Brigham IRB under protocol #2019P000057. All methods and analyses of HCHS/SOL participants’ materials and data were carried out in accordance with human subject research guidelines and regulations.

Neurocognitive outcome

Individuals were classified as having MCI according to National Institute on Aging-Alzheimer’s Association criteria, based on cognitive tests and self-reports14. Details of the SOL-INCA MCI diagnostic operational procedures have been published previously15,16. MCI was defined by three criteria, all of which had to be satisfied: (a) for any of the cognitive tests performed at the SOL-INCA exam, the score was more than 1 standard deviation (SD) below the mean, with means and SDs defined by SOL-INCA robust internal norms; (b) the yearly rate of change in a global cognitive measure, estimated between the HCHS/SOL baseline and the SOL-INCA exam, was faster than −0.055 SD; (c) the participant self-reported subjective cognitive decline on the Everyday Cognition 12-item (E-Cog12) questionnaire17. The MCI group also included individuals classified as MCI+, who satisfied two conditions: (a) a score more than 2 SD below the mean on any cognitive test performed at the SOL-INCA exam (with means and SDs based on SOL-INCA internal norms); (b) more than minimal impairment in self-reported instrumental activities of daily living (IADL)18. Individuals with MCI+ were pooled with the MCI individuals, and together they defined the MCI group.
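The classification rules above can be summarized in a short sketch. This is an illustration of the stated criteria only, not the SOL-INCA operational code: the `Participant` fields are hypothetical names, test scores are assumed to be already standardized against the internal norms, and the E-Cog12 and IADL assessments are assumed to be reduced to yes/no flags upstream.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Participant:
    # All field names are hypothetical and used for illustration only.
    test_z_scores: Sequence[float]           # scores standardized to internal norms
    yearly_global_change_sd: float           # SD units per year; negative = decline
    reports_subjective_decline: bool         # derived from E-Cog12 (assumed flag)
    more_than_minimal_iadl_impairment: bool  # derived from self-reported IADL


def classify(p: Participant) -> str:
    """Return 'MCI+', 'MCI', or 'non-MCI' following the criteria in the text."""
    lowest = min(p.test_z_scores)
    # MCI+: any test more than 2 SD below the mean AND functional impairment.
    if lowest < -2.0 and p.more_than_minimal_iadl_impairment:
        return "MCI+"
    # MCI: all three criteria (a), (b), and (c) must hold.
    if (
        lowest < -1.0
        and p.yearly_global_change_sd < -0.055
        and p.reports_subjective_decline
    ):
        return "MCI"
    return "non-MCI"


def in_mci_group(p: Participant) -> bool:
    """MCI+ individuals are pooled with MCI to form the analysis outcome."""
    return classify(p) != "non-MCI"
```

For example, a participant with one test at −1.2 SD, a yearly change of −0.08 SD, and self-reported decline would be classified as MCI under this sketch.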

Genetic data

APOE genotyping was performed using commercial TaqMan assays, as previously described19. For individuals with missing APOE genotypes, we derived the genotypes from phased whole-genome sequencing (WGS) data from the NHLBI Trans-Omics for Precision Medicine (TOPMed) Freeze 8. All other genetic data were based on genotyping (rather than WGS) using a custom Illumina array, as previously reported4. Genome-wide imputation was conducted using the multi-ethnic TOPMed Freeze 8 reference panel20. Principal components (PCs) were previously computed using PC-Relate21, and the kinship matrix was computed using the genetic data. ‘Genetic analysis groups’ were constructed based on a combination of self-identified Hispanic/Latino backgrounds and genetic similarity, and are classified as Central American, Cuban, Dominican, Mexican, Puerto Rican, and South American4.
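To illustrate why phased data are useful for deriving APOE genotypes, the sketch below maps phased haplotypes to APOE alleles. It assumes the standard definition of the APOE isoforms by the two SNPs rs429358 and rs7412 (forward-strand alleles); the text does not state which variants or procedure were used in this study, and the function names are hypothetical.

```python
# Each haplotype is (allele at rs429358, allele at rs7412), forward strand.
# Assumption: standard APOE isoform definition; not stated in the source text.
HAPLOTYPE_TO_ALLELE = {
    ("T", "T"): "e2",
    ("T", "C"): "e3",
    ("C", "C"): "e4",
    ("C", "T"): "e1",  # very rare haplotype
}


def apoe_genotype(hap1, hap2):
    """Return the APOE genotype (e.g. 'e3/e4') from two phased haplotypes."""
    alleles = sorted(HAPLOTYPE_TO_ALLELE[hap] for hap in (hap1, hap2))
    return "/".join(alleles)


def e4_count(genotype):
    """Number of e4 alleles (0, 1, or 2), i.e. an additive coding."""
    return genotype.split("/").count("e4")


# Unphased double heterozygotes are ambiguous (e2/e4 vs. e1/e3);
# phasing resolves which alleles sit on the same chromosome.
print(apoe_genotype(("T", "T"), ("C", "C")))
print(apoe_genotype(("T", "C"), ("C", "T")))
```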

Ancestry enriched variants

We focused on a 6 Mbp region encompassing the APOE gene (~3 Mbp on each side, mimicking the approach of Rajabli et al.9), chr19:42–48 Mb (GRCh38/hg38). Ancestry-specific frequencies of variants located in this region were calculated using GAFA11, a method we previously developed to estimate the frequencies of bi-allelic variants in admixed populations based on global proportions of genetic ancestries. The average global ancestry proportions in the total dataset are 55% European, 30.5% Amerindian, and 14.5% African. Frequency estimation was based on n = 8,933 HCHS/SOL individuals who consented to genetic data sharing with the broad scientific community. We defined Amerindian-enriched variants as those with both European and African minor allele frequency (MAF) < 0.01 and Amerindian frequency between 0.05 and 0.95. African-enriched variants were defined analogously, with both European and Amerindian MAF < 0.01 and African frequency between 0.05 and 0.95. The MAF of a variant is the frequency of its less common allele in a population, and therefore indicates how likely an individual from that population is to carry the allele. A variant with a very low MAF (MAF < 0.01 is often considered rare) is likely to be observed in only a few individuals in that population, whereas a variant with a high MAF (the largest possible value is 0.5) is likely to be observed in many individuals. Thus, an Amerindian (African) enriched variant is likely to be observed almost exclusively in individuals who inherited the corresponding genomic region from an Amerindian (African) ancestor.
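The enrichment definition can be expressed as a simple filter on ancestry-specific allele frequencies. The sketch below is a literal reading of the thresholds stated above; the function and argument names are hypothetical, the 0.05 and 0.95 bounds are assumed to be inclusive, and the input frequencies are assumed to refer to the same allele in all three ancestries.

```python
def maf(freq):
    """Minor allele frequency for a bi-allelic variant."""
    return min(freq, 1.0 - freq)


def enrichment(freq_eur, freq_afr, freq_amr, rare=0.01, low=0.05, high=0.95):
    """Label a variant 'Amerindian', 'African', or None from its
    ancestry-specific frequencies (same allele in all three ancestries).

    Assumption: the 0.05-0.95 bounds are inclusive; the text does not say.
    """
    if maf(freq_eur) < rare and maf(freq_afr) < rare and low <= freq_amr <= high:
        return "Amerindian"
    if maf(freq_eur) < rare and maf(freq_amr) < rare and low <= freq_afr <= high:
        return "African"
    return None


# Hypothetical frequencies, for illustration only.
print(enrichment(freq_eur=0.002, freq_afr=0.004, freq_amr=0.12))
print(enrichment(freq_eur=0.001, freq_afr=0.20, freq_amr=0.003))
print(enrichment(freq_eur=0.10, freq_afr=0.10, freq_amr=0.10))
```

The two labels are mutually exclusive under these thresholds, since a variant cannot be both rare (MAF < 0.01) and common (frequency of at least 0.05 and at most 0.95) in the same ancestry.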

Bioinformatics

We performed bioinformatics analyses of the ancestry-enriched variants using publicly available databases and tools. Variants were annotated using SnpEff (V.4.3). SnpEff takes as input a genetic data file (variant calls in a VCF file) and annotates each variant with sequence ontology terms describing its predicted effects on known genes (e.g., codon deletion, exon duplication). It also provides an impact prediction with four categories: high, moderate, low, and modifier. High impact means that the variant is assumed to have a highly disruptive effect on the relevant protein, and moderate impact means that the variant might change protein effectiveness. Low impact is assigned to variants that are assumed to be mostly harmless, and modifier impact is usually assigned to non-coding variants or variants affecting non-coding genes, where there is no evidence of impact. Variants with a predicted moderate impact were selected for further analyses. We further annotated the variants using additional tools: RegulomeDB22, GTEx Portal, GWAS catalog23, and PheGenI24. Finally, we computed the MAF and minor allele counts (MAC) of each of the selected variants in the six Hispanic genetic analysis groups.
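As an illustration of the selection step, the sketch below extracts predicted impacts from the `ANN` field that SnpEff writes into the VCF INFO column, where each comma-separated annotation is pipe-delimited and the third sub-field is the impact. How the study handled variants with several annotations is not stated; here a variant is kept if any annotation is MODERATE, which is an assumption. The example INFO string and gene names are made up.

```python
def snpeff_impacts(info):
    """Return the set of predicted impacts found in a VCF INFO string's ANN field."""
    for field in info.split(";"):
        if field.startswith("ANN="):
            impacts = set()
            # One annotation per transcript/effect, separated by commas.
            for annotation in field[len("ANN="):].split(","):
                parts = annotation.split("|")
                if len(parts) > 2:
                    impacts.add(parts[2])  # Allele | Annotation | Impact | ...
            return impacts
    return set()


def has_moderate_impact(info):
    # Assumption: keep the variant if ANY annotation is MODERATE.
    return "MODERATE" in snpeff_impacts(info)


# Made-up INFO string with two annotations for one variant.
example_info = (
    "AC=3;ANN=A|missense_variant|MODERATE|GENE1|ID1|transcript|T1|protein_coding,"
    "A|upstream_gene_variant|MODIFIER|GENE2|ID2|transcript|T2|protein_coding"
)
print(sorted(snpeff_impacts(example_info)))
print(has_moderate_impact(example_info))
```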

Statistical analysis

We computed descriptive statistics to characterize the distributions of demographics, cognitive outcomes, and APOE alleles in the analytic dataset of n = 4,237 individuals. For each ancestry-enriched variant with a SnpEff-predicted moderate impact, we tested the interaction between the variant and the APOE-ε4 allele on MCI. Models included the variant, the APOE-ε4 allele (additive model), and the interaction term between the variant and the APOE-ε4 allele. Models were fitted under the complex survey design using the R ‘survey’ package25, with a “quasipoisson” family for binary traits. This approach accounts for the stratification, clustering, and probability weighting in HCHS/SOL, allowing correct generalization to the HCHS/SOL target population. Models were adjusted for age, sex, education, center, the first 5 PCs of genetic data, and genetic analysis group. Significance was evaluated in two ways, to protect against potentially inflated type 1 error due to the low frequencies of the APOE-ε4 and enriched-variant alleles. First, using 5,000 permutations, we performed multivariant Wald tests to jointly test the significance of the variant and the variant-by-APOE-ε4 interaction. We also performed multivariant Wald tests to jointly test the variant, the variant-by-APOE-ε4 interaction, and the APOE-ε4 allele. Second, we used mixed models and the BinomiRare test for low-count variants to test the association of the variant and the variant-by-APOE-ε4 interaction26. Mixed models used correlation matrices to account for genetic relatedness (kinship), household, and block unit sharing as random effects, and were implemented, along with BinomiRare, in the GENESIS R/Bioconductor package27, version 3.15.
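The two ingredients of the permutation-based test can be sketched as follows: a joint Wald statistic for a set of coefficients, and an empirical p-value obtained by comparing the observed statistic with statistics from permuted datasets. This is a generic illustration in plain Python, not the survey-weighted R analysis used in the study; the text does not specify which quantity was permuted, and the function names and numbers are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def joint_wald_statistic(beta, cov):
    """W = beta' * inverse(cov) * beta for the coefficients tested jointly,
    e.g. the variant effect and the variant-by-APOE-e4 interaction."""
    x = solve(cov, beta)
    return sum(bi * xi for bi, xi in zip(beta, x))


def permutation_p_value(observed, null_statistics):
    """Empirical p-value; the +1 terms avoid reporting a p-value of zero."""
    exceed = sum(1 for s in null_statistics if s >= observed)
    return (1 + exceed) / (1 + len(null_statistics))


# Hypothetical estimates for two coefficients and their covariance matrix.
w = joint_wald_statistic([2.0, 1.0], [[2.0, 0.0], [0.0, 0.5]])
p = permutation_p_value(3.0, [1.0, 2.0, 3.0, 4.0])
print(w, p)
```

With 5,000 permutations, the smallest p-value this estimator can return is 1/5001, which bounds the resolution of the permutation test.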

Estimation of interaction associations with MCI in the ARIC study

We further evaluated the interaction between the African-enriched variant and APOE-ε4 in African Americans from the Atherosclerosis Risk in Communities (ARIC) study. ARIC is a longitudinal cohort study with genetic and cognitive measures28,29. Data from ARIC Visit 5 were used in this analysis. The protocol for MCI/dementia diagnosis in ARIC has been previously described30; it is provided, together with further details, in Supplementary Note 1.

As in SOL-INCA, the statistical model in ARIC was adjusted for age, sex, education, 5 PCs, and study site, and tested the interaction between the variant and the APOE-ε4 allele on a neurocognitive outcome. We also performed multivariant Wald tests as described above for SOL-INCA.

Results

Table 1 characterizes the demographic, health, and lifestyle characteristics of the SOL-INCA analytic dataset (n = 4,237). Overall, around 52% of the participants are female, with a weighted mean age of 62 years at the SOL-INCA visit. MCI prevalence is ~11.3%. APOE-ε3 is the most frequent allele, with an allele frequency of 0.83, while APOE-ε4 and APOE-ε2 are relatively rare (frequencies of 0.12 and 0.049, respectively). Fifty-eight of the individuals with MCI (1.4% of the sample) were classified as having suspected severe impairment (MCI+).

Table 1 Demographics, genetic and neurocognitive characteristics of SOL-INCA analytic sample.

Ancestry enriched variants

In the 6 Mbp region encompassing the APOE gene (~3 Mbp on each side), chr19:42–48 Mb (GRCh38/hg38), we identified 260 Amerindian-enriched variants in the HCHS/SOL study population, with ≥ 5% frequency in Amerindian ancestry and ≤ 1% frequency in African and European ancestries (Supplementary Table 1). Similarly, we identified 798 African-enriched variants in the HCHS/SOL study population (Supplementary Table 2). Using SnpEff (V.4.3) variant annotation, we selected 5 Amerindian- and 14 African-enriched variants with a predicted moderate impact for further interaction analyses with APOE-ε4 on MCI. Annotations of the variants and their estimated ancestry-specific frequencies are presented in Table 2. Eighteen of the 19 selected enriched variants are missense variants. According to the GTEx Portal, 11 of the 14 selected African-enriched variants have previously been associated with gene expression in various tissues, i.e., they are expression quantitative trait loci (eQTLs) in these tissues, including brain cerebellum tissue.

Table 2 Annotation of the ancestry-enriched genetic variants used for interaction analysis in SOL-INCA analytic dataset.

Supplementary Table 3 provides the computed MAF and MAC of each of the selected variants across the genetic analysis groups corresponding to the six Hispanic backgrounds. As expected, Amerindian-enriched variants tended to be more common in groups with high Amerindian ancestry (Central American, South American, and Mexican individuals), while African-enriched variants were more common in the Caribbean groups, which have higher African ancestry (Dominican, Cuban, and Puerto Rican).

Interaction association between ancestry enriched variants and APOE alleles on MCI

Table 3 reports the results of the interaction analyses between the Amerindian- or African-enriched variants and APOE-ε4 on MCI, based on multivariant Wald tests with 5,000 permutations. No statistically significant interaction associations were identified. However, the BinomiRare test identified one nominally significant interaction between an African-enriched variant, rs8112679, and APOE-ε4 on MCI (p-value = 0.017) (Table 4).

Table 3 Permutation results (n = 5,000 permutations) for the ancestry-enriched genetic variants and interaction associations between the variants and APOE alleles on MCI in SOL-INCA analytic dataset.
Table 4 BinomiRare tests for the ancestry-enriched genetic variants and interaction associations between the variants and APOE-ε4 allele on MCI in SOL-INCA analytic dataset.

Estimation of interaction associations with MCI in the ARIC study and meta-analysis

Supplementary Table 4 characterizes the distributions of demographics, cognitive outcomes, and APOE alleles in the ARIC African American analytic dataset. The interaction of the African-enriched variant rs8112679 with APOE-ε4 on MCI was not significant in the ARIC African American analytic dataset (Supplementary Table 5).

We also attempted to replicate, in the SOL-INCA analytic dataset, the interaction between the African variant previously reported by Rajabli et al.9 and the APOE-ε4 allele on MCI; the result was not significant (Supplementary Table 6).

Discussion

In this study, we leveraged recently published data on ancestry-specific genetic variant frequencies in the Hispanic/Latino population11 to explore the interaction effects of African and Amerindian ancestry-enriched genetic variants with APOE-ε4 on MCI in the U.S. Hispanic/Latino population. We found suggestive evidence for an interaction effect of an African-enriched variant, rs8112679, with APOE-ε4 on MCI, with the minor allele A having a protective effect on MCI. This result did not replicate in the ARIC African American analytic sample. The variant rs8112679 is a missense variant located in exon 4 of the ZNF222 gene. ZNF222 is predicted to be involved in the regulation of transcription by RNA polymerase II, and a previous study suggested its involvement in late-onset Alzheimer’s disease31. Future studies should follow up on this variant and examine its presence in affected individuals, particularly among African and African American individuals, in whom the variant is common, and ideally in family-based studies including individuals both affected and unaffected by MCI or AD.

Our study is based on relatively small sample sizes; its statistical power is therefore limited for association analyses of low-frequency and rare variants, and all the more so for interaction analyses. Only large effect sizes could have been discovered. Our results suggest that there are no large effects of ancestry-enriched variants interacting with APOE-ε4 on MCI in the APOE region in the Hispanic/Latino population. Further analyses with larger sample sizes, and meta-analyses with additional studies, are needed to identify ancestry-enriched variants interacting with APOE-ε4 on MCI. It would also be of interest to expand the region in which variants are considered, potentially to other known AD genes or genome-wide. The major limitation of such an expansion is power, which is reduced by a high multiple testing burden. Collaboration between multiple studies of diverse individuals with Amerindian and/or African ancestry will be critical. Another limitation of our study is that a subset of the MCI individuals was classified as MCI+, falling into a gray zone between MCI and dementia. That is, their cognitive scores or functional abilities (i.e., IADLs) did not meet strict criteria for MCI or dementia. As the SOL-INCA study population grows older, we will learn whether the MCI and MCI+ individuals convert to dementia. The cognitive aging trajectory of MCI is an important and unanswered research question facing the field. It is possible that the genetic determinants underlying mild and severe cognitive impairment are different. However, larger datasets are required to study this hypothesis, to assess trajectories of mild and severe cognitive impairment, and to identify the trait-specific genetic basis.

Identification of variants interacting with APOE-ε4 may further delineate the role of APOE in the pathogenesis of MCI and AD and advance novel therapeutics. It may also lead to population-specific risk predictions and help reduce health disparities in the general population.