Abstract
Purpose Genetic variants in complement genes are associated with age-related macular degeneration (AMD). However, many rare variants identified in these genes are of unknown significance, and their impact on protein function and structure remains unclear. We set out to address this issue by developing an analytical pipeline that evaluates the spatial placement of these variants and their impact on protein structure, and by applying it to the International AMD Genomics Consortium (IAMDGC) dataset (16,144 AMD cases, 17,832 controls).
Methods The IAMDGC dataset was imputed using the Haplotype Reference Consortium (HRC) panel, yielding over 30% more imputed variants than the original 1000 Genomes imputation. Variants were extracted for the CFH, CFI, CFB, C9, and C3 genes and filtered for missense variants in solved protein structures. We evaluated these variants for their placement in the three-dimensional structure of the protein (i.e., spatial proximity within the protein) and for AMD association. We applied several pipelines to a) calculate spatial proximity to known AMD variants versus gnomAD variants, b) assess a variant’s likelihood of causing protein destabilization by calculating the predicted free energy change (ddG) using Rosetta, and c) perform gene-based tests of statistical association. Gene-based testing with seqMeta was performed using a) all variants, b) variants near known AMD variants, or c) variants with |ddG| > 2. Further, we applied a structural kernel adaptation of SKAT testing (POKEMON) to confirm the association of the spatial distribution of missense variants with AMD. Finally, to assess the pipeline’s robustness in identifying AMD-relevant variants, we used logistic regression on known AMD variants in CFI to identify variants leading to a >50% reduction in protein expression in known AMD patient carriers of CFI variants compared to wild type (as determined by in vitro experiments). These results were compared to a genome-wide functional impact score (CADD values > 10, which indicate that a variant may have a large functional impact) to determine whether our metrics have better discriminative power than existing variant assessment methods. Once the pipeline had been validated, we performed a priori selection of variants using this methodology and tested AMD patient cell lines from the EUGENDA cohort (n=34) that carried the selected variants.
We investigated complement pathway protein expression in vitro, looking at multiple components of the complement factor pathway in patient carriers of bioinformatically identified variants.
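The two variant filters described in the Methods (a ddG cutoff of magnitude 2 and proximity to known AMD variants) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the variant labels, coordinates, ddG values, and the 10 Å distance cutoff are all hypothetical, and a simple nearest-site distance stands in for the more elaborate spatial statistics of PathProx and POKEMON.

```python
import math

# Hypothetical toy records: (label, residue coordinate in angstroms, predicted ddG).
# None of these values come from the study; they only illustrate the filtering logic.
variants = [
    ("var_A", (0.0, 0.0, 0.0), 2.6),
    ("var_B", (3.0, 4.0, 0.0), -0.4),
    ("var_C", (30.0, 0.0, 0.0), -2.3),
    ("var_D", (40.0, 40.0, 0.0), 0.9),
]

# Hypothetical coordinates of residues carrying known AMD variants.
known_amd_sites = [(1.0, 0.0, 0.0)]


def is_destabilizing(ddg, threshold=2.0):
    """Flag a variant whose predicted ddG magnitude exceeds the threshold."""
    return abs(ddg) > threshold


def min_distance(coord, sites):
    """Euclidean distance from a residue to the nearest known AMD site."""
    return min(math.dist(coord, site) for site in sites)


def select_variants(records, sites, ddg_threshold=2.0, max_distance=10.0):
    """Return labels passing the ddG filter and labels passing the proximity filter."""
    destabilizing = [
        name for name, _, ddg in records if is_destabilizing(ddg, ddg_threshold)
    ]
    proximal = [
        name for name, coord, _ in records
        if min_distance(coord, sites) <= max_distance  # assumed cutoff
    ]
    return destabilizing, proximal


destabilizing, proximal = select_variants(variants, known_amd_sites)
print("ddG filter:", destabilizing)
print("proximity filter:", proximal)
```

Either list could then be passed as the variant set for a gene-based test, mirroring the three variant sets (all, proximal, destabilizing) named in the Methods.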
Results Multiple variants with |ddG| > 2 were found in each complement gene investigated. Gene-based tests using known and novel missense variants identified significant associations of the C3, C9, CFB, and CFH genes with AMD risk after controlling for age and sex (P = 3.22×10^-5, 7.58×10^-6, 2.1×10^-3, and 1.2×10^-31, respectively). ddG filtering and SKAT-O tests indicate that missense variants predicted to destabilize the protein, in both CFI and CFH, are associated with AMD (CFH: P = 0.05; CFI: P = 0.01; significance threshold of 0.05). Our structural kernel approach identified spatial associations with AMD risk within the protein structures of C3, C9, CFB, CFH, and CFI at a nominal p-value of 0.05. Both ddG and CADD scores were predictive of reduced CFI protein expression, with ROC curve analyses indicating that ddG is the better predictor (AUCs of 0.76 and 0.69, respectively). In vitro analysis of variants selected a priori across all complement factor genes indicated that several variants identified by the bioinformatics programs PathProx and POKEMON in our pipeline, and not previously known to contribute to AMD pathogenesis, caused a significant change in complement protein expression (P=0.04) in patient carriers of those variants, as measured by ELISA of proteins in the complement factor pathway.
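To make the ROC comparison of two predictors concrete, the following Python sketch computes the area under the ROC curve from its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The labels and scores are invented toy data, not the study's measurements, and the variable names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = reduced protein expression, 0 = not reduced.
reduced_expression = [1, 1, 1, 0, 0, 0]
score_a = [3.1, 2.4, 0.8, 1.0, 0.5, 0.2]    # stands in for a |ddG|-like score
score_b = [25.0, 12.0, 9.0, 15.0, 11.0, 5.0]  # stands in for a CADD-like score

auc_a = roc_auc(reduced_expression, score_a)
auc_b = roc_auc(reduced_expression, score_b)
print(f"score_a AUC = {auc_a:.3f}")
print(f"score_b AUC = {auc_b:.3f}")
```

A higher AUC means the score ranks carriers with reduced expression above the others more consistently, which is the sense in which one predictor is described as better than another.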
Conclusion We demonstrate for the first time that missense variants in complement genes cluster together spatially and are associated with AMD case/control status. Using this method, we can identify CFI and CFH variants of previously unknown significance that are predicted to destabilize the proteins. These variants, both inside and outside spatial clusters, can predict CFI protein expression changes tested in vitro, and we hypothesize that the same is true for CFH. A priori identification of variants that impact gene expression allows for classification of variants previously classified as variants of unknown significance (VUS). Further investigation is needed to validate the models for additional variants and to apply them to all AMD-associated genes.
Competing Interest Statement
AdH is employed by AbbVie Pharmaceuticals. There are no other competing interests to declare.
Funding Statement
IAMDGC: NIH 1X01HG006934-01 and RO1 EY022310. Michelle Grunin: M2021006F from the Bright Focus Fellowship for Macular Degeneration.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The IRB of Case Western Reserve University – University Hospitals (IRB Number EM-14-04) gave ethical approval for this work. The study participants were previously ascertained by IAMDGC cohorts as described in Fritsche et al., 2016, Nature Genetics. All participants provided informed consent, and the study was approved by institutional review boards as previously described.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The genotype data analyzed during the current study were generated by the IAMDGC and are available through the database of Genotypes and Phenotypes (dbGaP; Accession: phs001039.v1.p1). Summary statistics for the IAMDGC data are currently available at http://amdgenetics.org.