Rare Variants Imputation in Admixed Populations: Comparison Across Reference Panels and Bioinformatics Tools

Background Imputation has become a standard approach in genome-wide association studies (GWAS) to infer in silico untyped markers. Although feasibility for common variants imputation is well established, we aimed to assess rare and ultra-rare variants’ imputation in an admixed Caribbean Hispanic population (CH). Methods We evaluated imputation accuracy in CH (N = 1,000), focusing on rare (0.1% ≤ minor allele frequency (MAF) ≤ 1%) and ultra-rare (MAF < 0.1%) variants. We used two reference panels, the Haplotype Reference Consortium (HRC; N = 27,165) and 1000 Genome Project (1000G phase 3; N = 2,504) and multiple phasing (SHAPEIT, Eagle2) and imputation algorithms (IMPUTE2, MACH-Admix). To assess imputation quality, we reported: (a) high-quality variant counts according to imputation tools’ internal indexes (e.g., IMPUTE2 “Info” ≥ 80%). (b) Wilcoxon Signed-Rank Test comparing imputation quality for genotyped variants that were masked and imputed; (c) Cohen’s kappa coefficient to test agreement between imputed and whole-exome sequencing (WES) variants; (d) imputation of G206A mutation in the PSEN1 (ultra-rare in the general population an more frequent in CH) followed by confirmation genotyping. We also tested ancestry proportion (European, African and Native American) against WES-imputation mismatches in a Poisson regression fashion. Results SHAPEIT2 retrieved higher percentage of imputed high-quality variants than Eagle2 (rare: 51.02% vs. 48.60%; ultra-rare 0.66% vs. 0.65%, Wilcoxon p-value < 0.001). SHAPEIT-IMPUTE2 employing HRC outperformed 1000G (64.50% vs. 59.17%; 1.69% vs. 0.75% for high-quality rare and ultra-rare variants, respectively, Wilcoxon p-value < 0.001). SHAPEIT-IMPUTE2 outperformed MaCH-Admix. Compared to 1000G, HRC-imputation retrieved a higher number of high-quality rare and ultra-rare variants, despite showing lower agreement between imputed and WES variants (e.g., rare: 98.86% for HRC vs. 99.02% for 1000G). High Kappa (K = 0.99) was observed for both reference panels. Twelve G206A mutation carriers were imputed and all validated by confirmation genotyping. African ancestry was associated with higher imputation errors for uncommon and rare variants (p-value < 1e-05). Conclusion Reference panels with larger numbers of haplotypes can improve imputation quality for rare and ultra-rare variants in admixed populations such as CH. Ethnic composition is an important predictor of imputation accuracy, with higher African ancestry associated with poorer imputation accuracy.


INTRODUCTION
Genome-wide association studies (GWASs) are a major tool to identify common variants associated with complex diseases. GWAS can include 550 K to over 2 M Single Nucleotide Polymorphisms (SNPs) (Ha et al., 2014) to cover the human genome evenly. Although GWAS has shown to be a robust method to identify disease loci of interest, they rarely point to a causal coding variant. In fact, microarray SNP chips for GWAS are optimally designed to uncover common variants, often associated with small effect sizes mostly located in intronic and intergenic regions. The focus of genetic investigations has since shifted toward rarer alleles with larger effect sizes (Gibson, 2012). With the changing paradigm, imputation of rare variants has become an important topic to enhance the genome coverage in GWAS. Imputation is a process of inferring untyped SNP markers in the discovery population by using densely typed SNPs in external reference panel(s). These 'in silico' markers increase the coverage of association tests while conducting genome-wide association analysis. In addition, large number of SNPs facilitate meta-analysis when merging data from different study cohorts.
The quality of imputation essentially depends on two parameters: available reference datasets and algorithms that employ those reference datasets. Previous studies have shown that imputation quality depends on how well reference panels reflect the study population. To respond to the needs, the 1000 Genome project (1000G), now in its third phase release, has proven to be one of the most frequently used reference panels (Genomes Project et al., 2015). Using these composite reference panels, a number of studies (Pei et al., 2010;Howie et al., 2012;Verma et al., 2014;Liu et al., 2015) have compared imputation accuracy using different imputation tools and algorithms, although the results are equivocal. Few studies (Browning and Browning, 2009;Zheng et al., 2012Zheng et al., , 2015 assessed the impact of reference panel size and input data's featuressuch as density of SNPs -to impute rare variants, suggesting larger size of reference panels work better. Surakka et al. (2016) assessed accuracy of imputed SNPs by evaluating rate of false polymorphisms in a Finnish population using global reference panels -Haplotype Reference Consortium (HRC) release 1, 1000G phase 1 and a local reference panel. They concluded that higher false positive rate was observed in imputation from global reference panels compared to imputation performed using a local panel. Other studies (Huang et al., 2015;Das et al., 2016) found imputation accuracy increases with higher number of haplotypes, specifically for variants with MAF ≤ 0.5%. For Hispanic populations, Nelson et al. (2016) compared imputation performances with 1000G phase 1 (N = 1,092) vs. 1000G phase 3 (N = 2,504), concluding that phase 3 improved accuracy for variants with MAF < 1% by. Further, Nagy et al. (2017) showed that HRC reference panel provides new insight for novel variants particularly for rare variants in a family-based Scottish study cohort. Aforementioned studies highlighted the need of a larger sized reference panel to improve imputation quality. Herzig et al. (2018) assessed tools for haplotype phasing and their impact on imputation in a population isolate of Campora in southern Italy, and showed that SHAPEIT2, SHAPEIT3 and EAGLE2 were highly accurate in phasing; MINIMAC3, IMPUTE4 and IMPUTE2 were found to be reliable for imputation. Roshyara et al. (2014) compared MaCH-Admix, IMPUTE2, MACH, MACH-Minimac in different ethnicities by evaluating accuracy of correctly imputed SNPs; MaCH-Minimac outperformed SHAPEIT-IMPUTE2 in subsamples of different ethnic groups. These studies demonstrated how employed imputation algorithm determines quality of inferred SNPs.
However, no study to our knowledge has evaluated reference panels in tandem with different imputation algorithms to assess imputation quality of inferred SNPs based on MAF in a threeway admixed population. Based on these findings, we assessed imputation quality, focusing on rare and ultra-rare variants, in a large dataset of Caribbean Hispanics (CH) leveraging available GWAS and sequencing data available for our cohort.

GWAS Samples and Genotyping
We selected randomly 1,000 Caribbean Hispanics as part of an original genotyped cohort of 3,138 individuals: genotyped data can be downloaded at dbGaP Study Accession: phs000496.v1.p1. 719 individuals were derived from Estudio Familiar Investigar Genetica de Alzheimer (EFIGA), a study of familial LOAD; and 281 individuals from the multiethnic longitudinal cohort, Washington Heights, Inwood, Columbia Aging Project (WHICAP). The information on study design, recruitment and GWAS methods for the EFIGA and WHICAP study was previously described in Tosto et al. (2015).

Global Ancestry Estimation and Selection of "True Hispanics"
Prior to imputation, we estimated global ancestry using the ADMIXTURE (v.1.3.0) software (Alexander et al., 2009;Zhou et al., 2011). We conducted supervised admixture analyses using three reference populations: African Yoruba (YRI) and non-Hispanic white of European Ancestry (CEU) from the HAPMAP project as representative of African and European ancestral populations; and eight Surui, 21 Maya, 14 Karitiana, 14 Pima and seven Colombian individuals from the Human Genome Diversity Project (HGDP) were used to represent native American ancestry (Li et al., 2008). We used ∼80,000 autosomal SNPs that were: (I) genotyped in all three datasets (Caribbean Hispanics, 1000G and HGDP); (II) common (i.e., MAF > 5 %); and III) in linkage equilibrium. Supervised admixture analyses with the three reference populations (YRI, CEU, and Native Americans) revealed that European lineage accounted for most of the ancestral origins (59%), followed by African (33%) and native American ancestry (8%). We then selected only individuals with at least 1% of all three ancestral populations.

Reference Panels
HRC reference panel contained over 39M SNPs from 27,165 individuals who participated in 17 different studies ( Table 1). The data were downloaded from the Wellcome Trust Sanger Institute (WTSI).

Phasing and Imputation Procedures
We compared SHAPEIT2 (Delaneau et al., 2013) and Eagle2 (Loh et al., 2016) by phasing and then imputing (see next section) a single chromosome (Chromosome 21), using both reference panels. We refer to SHAPEIT2 as SHAPEIT when used in tandem with IMPUTE2 for the remainder of paper.
Imputation was carried out using two bioinformatics tools: IMPUTE2 (Howie et al., 2009) and MaCH-Admix (Liu et al., 2013). For both, imputation quality ranged from 0 to 1, with 0 indicating complete uncertainty in imputed genotypes, and 1 indicating no uncertainty in imputed genotypes.

IMPUTE2 (Version 2.3.2)
IMPUTE2 uses an MCMC algorithm to integrate over the space of possible phase reconstructions for genotypes data. We conducted imputation in non-overlapping 1MB chunk regions; chunk coordinates were specified using the "-int" option. Other options were used with default parameters (Supplementary Section S1). Briefly, we used a default 250KB buffer region to avoid quality deterioration on the ends of chunk region. "-Ne" value as 2000 suggested for robust imputation which scales linkage disequilibrium and recombination error rate.

MaCH-Admix
We used MaCH-Admix because it uses a method based on IBS matching in a piecewise manner. The method breaks genomic region under investigation into small pieces and finds reference haplotypes that best represent every small piece, for each target individual separately. MaCH-Admix imputes in three steps: phasing, estimation of model parameter that includes error rare and recombination rate and lastly, haplotype-based imputation. MaCH-Admix (version Beta 2.0.185) was run on default parameters of 30 rounds, 100 states (-autoFlip flag). Details can be found in Supplementary Section S1. We initially compared performance between MaCH-Admix and IMPUTE2 using the 1000G reference panel for Chromosome 21 only. We then proceeded to impute all remaining chromosomes with the tool that performed better.

Imputation Performance Metrics
IMPUTE2 uses "Info" parameter to report imputation quality that measures relative statistical information about SNP allele frequency from imputed data. It reflects the information in imputed genotypes relative to the information if only the allele frequency were known. "Info" metric is used to filter poorly imputed SNPs from IMPUTE2 and is reported for all imputed SNPs. In addition, IMPUTE2 uses an internal metric known as R 2 , reported for genotyped SNPs only: it measures squared correlation between genotyped SNPs and the same SNPs that have been first masked internally and then imputed. MaCH-Admix uses Rsq to report imputation quality. The R 2 metric is also known as variance ratio, calculated as proportion of empirically observed variance (based on the imputation) to the expected binomial variance p(1-p), where p is the minor allele frequency. A threshold of 0.30 is recommended to filter out poorly imputed SNPs.
Despite quality measures from IMPUTE2 and MaCH-Admix being highly correlated (Marchini and Howie, 2010), we calculated a r2hat score to generate a single common metric to assess imputation quality across the software (Hancock et al., 2012) We compared performance of MaCH-Admix and SHAPEIT-IMPUTE2 by: (a) Reporting raw SNP counts based on quality (MaCH-Admix "Rsq" and IMPUTE2 "Info"); (b) Comparing r2hat for overlapping imputed SNPs from both tools; (c) Conducting a Wilcoxon Signed-Rank Test (R v3.4.2) on r2hat value of overlapping SNPs.
We compared performance of Eagle2 and SHAPEIT2 phasing tools in tandem with IMPUTE2 as imputation tools across reference panels by: (a) Comparing their respective IMPUTE2 R 2 : (b) Conducting a Wilcoxon Signed-Rank Test on R 2 value; (c) Reporting raw counts of imputed SNPs based on IMPUTE2 "Info" metric and stratified by MAF bins (e.g., common, rare, ultra-rare).
In all comparisons, the MAFs are estimated from imputed data according to the reference panel employed. We retained monomorphic SNPs in our analyses for several reasons. A monomorphic SNP in one study might not be monomorphic in other cohorts. This has profound affects, for example, when performing meta-analysis across different studies. In addition, monomorphic SNPs provide information about MAF across studies. Without the information it is difficult to tell, for instance, if a SNP is monomorphic or failed quality control in that study.

Agreement Between Imputed and Sequence Data
To further test the quality of imputation -without relying on software's internal metrics (i.e., "Info" and R 2 ) -we calculated genotyped concordance between imputed and WES data using the VCF-compare tool (v0.1.14-12-gcdb80b8) (Danecek et al., 2011). First, we converted posterior probabilities obtained from imputation into genotype data using the PLINK software (v1.90b4.9) by applying a threshold of 0.9 (Supplementary Section S1), such that SNPs that failed on this criterion were left uncalled. For example, an imputed SNP with P(G = 0,1,2) = (0.01,0.9,0.09) would be called as a '1' (heterozygous), whereas an imputed SNP with P(G = 0,1,2) = (0.2, 0.6, 0.2) would be left uncalled. We restricted the comparison to overlapping SNPs between HRC, 1000G reference panels and whole-exome sequencing (WES) data for Chromosome 14 only, on SNPs with 0% missingness (plink -missing flag) in WES data. We also assessed variants' agreement according to different MAF bins for "high-quality" ("Info" ≥ 0.8) SNPs. The output resulted in number of variant "mismatches, " i.e., the count of 2 http://csg.sph.umich.edu/yli/r2_hat.v107.tgz allele not matching between imputed and sequenced variants per individual. Work-flow for VCF-compare can be found in Supplementary Figure S1. To measure interrater reliability we computed Cohen's kappa coefficient (McHugh, 2012) for both the reference panels against WES data. Kappa coefficient ≤ 0 indicates no agreement, 0.01-0.20 as none to slight, 0.21-0.40 as fair, 0.41-0.60 as moderate, 0.61-0.80 as substantial, and 0.81-1.00 as almost perfect agreement. Work-flow for Cohen's kappa coefficient calculation can be found in Supplementary Figure S2.

Effects of Ancestry on Imputation Quality
To assess how ancestry affected imputation quality, we conducted a Poisson regression using R. We used percentage of global ancestry (European (CEU), Native (NAT) and African (YRI) as predictors, and total number of mismatches as the outcome; analyses were restricted to "high-quality" SNPs, only.

Imputation of G206A Mutation in PSEN1
To evaluate imputation performance of a specific rare variant, we examined a founder mutation, p.Gly206Ala (G206A -rs63750082) in the PSEN1 gene (PSEN1-G206A) (Athan et al., 2001;Lee et al., 2015). The PSEN1-G206A mutation is a rare variant observed primarily in Puerto Ricans with familial early onset Alzheimer's disease (EOAD), but it is rare in Puerto Ricans and other populations with late-onset Alzheimer's disease (LOAD) (Arnold et al., 2013). The mutation was present in the 1000G phase 3 reference panel with an allele frequency of 0.001, but was absent in the HRC reference panel. To verify whether individuals who were found to carry the PSEN1-G206A mutation based on 1000G-imputation, they were genotyped using the KASP genotyping technology by LGC genomics 3 , which uses allele-specific PCR for SNP calling. Agreement between imputed and genotype data for the PSEN1-G206A mutation was then assessed. We also tested the effect on imputation quality based on different IMPUTE2-parameters settings, more specifically by modifying the chunk size (i.e., 1 MB vs. 5 MB).

Comparison of Phasing Tools: Eagle2 vs. SHAPEIT2
To select the optimal tool for phasing, we compared SHAPEIT2 with Eagle2 using Chromosome 21 with 13,066 genotyped SNPs by performing subsequent imputation with IMPUTE2 on phased outputs, and using both reference panels. We found SHAPEIT2 better than Eagle2 when evaluated based on mean R 2 and "Info" metric using either the reference panels. For instance, using the 1000G, we observed higher mean R 2 for data phased with SHAPEIT2 as compared to Eagle2 (0.92 vs. 0.91; Wilcoxon p-value < 0.001). Similarly, when HRC panel was employed, mean R 2 of 0.89 was observed for SHAPEIT2 against 0.85 for Eagle2 (Wilcoxon Signed-Rank test p-value < 0.001).
SNP count comparison details can be found in Supplementary Tables S1, S2. Regardless of the reference panel employed, we observed higher percentage of "high-quality" rare and ultra-rare SNPs for SHAPEIT-IMPUTE2 than Eagle2-IMPUTE2. For instance, 1000G-imputation retrieved 51.02% of "high-quality" rare SNPs using SHAPEIT-IMPUTE2 vs. 48.38% with Eagle2-IMPUTE2. Detailed comparisons for different MAF bins and quality threshold can be found in Supplementary Section S2. Nevertheless, we found Eagle2 faster than SHAPEIT2 when computation times were compared; for instance, with HRC Eagle2 was ∼6 times faster than SHAPEIT2 (Supplementary Table S3). We therefore imputed the remaining chromosomes on phased output from SHAPEIT2. Comparison of phasing tools by assessing switch error rate was beyond the scope of this paper due to limited resources, for e.g., availability of phased reference panel for an admixed population.

MaCH-Admix vs. IMPUTE2
We found that SHAPEIT-IMPUTE2 performed better than MaCH-Admix. For Chromosome 21, we imputed 1,104,648 and 646,594 SNPs for SHAPEIT-IMPUTE2 and MaCH-Admix, respectively, 549,091 SNPs were overlapping. For SHAPEIT-IMPUTE2 we observed 446,591 bi-allelic SNPs with "Info" ≥ 0.40, in contrast with 598,943 SNPs with Rsq ≥ 0.30 from MaCH-Admix (Supplementary Table S4). SNP counts for different MAF bins based on platform-specific quality index can be found in Supplementary Table S5. When the two outputs were compared in terms of r2hat, SHAPEIT-IMPUTE2 showed a higheraverage r2hat of 0.62 against 0.36 from MaCH-Admix (Wilcoxon Signed-Rank test p-value < 0.001). Also, MaCH-Admix was 109 times slower than IMPUTE2 (Supplementary  Table S6), thus, comparison between different panels using MaCH-Admix were excluded due to limited resources. For the remaining of this manuscript, we focused on imputation employing SHAPEIT-IMPUTE2, only.
Next, we restricted our analyses to overlapping SNPs across the two reference panels only, based on their chromosome and position mapping, reference and non-reference alleles. For "good-"and "high-quality" SNPs, imputation in both panels performed similarly ( Table 2). When restricted to uncommon, rare and ultra-rare SNPs, we observed higher percentage of "good-" and "high-quality" SNPs for HRC panel as compared to 1000G reference panel (Table 3). For example, 7.44% of HRC-imputed ultra-rare SNPs were "good-quality" vs. 4.95% with the 1000G. 1.69% of HRCimputed ultra-rare SNPs were "high-quality" vs. 0.75% with the 1000G. Further, Wilcoxon test on "Info" value of "high-quality" ultra-rare SNPs (2,972) again showed better performances when HRC was employed vs. 1000G (Pvalue < 0.001). Complete list of counts and percentages across reference panels, MAF bins and quality score can be found in Table 3.
The Case of G206A and the Effect of Chromosomal Chunk Size on Imputation Quality SNP rs63750082 is absent from HRC panel therefore no imputation was achieved. Using 1000G reference panel, 12 individuals were imputed as G206A carriers. SNP rs63750082 was imputed with an IMPUTE2 "Info" score of 0.48 using 1MB as chromosomal region parameter. When we increased the chunk size to 5MB, IMPUTE-Info score drastically improved to 0.94 (Figure 2). Those patients labeled as mutationcarriers according to imputation were then genotyped: all 12 were confirmed to be G206A carriers, therefore achieving a perfect imputation prediction (100% agreement) for that specific SNP.

Genotype Concordance and Kappa Coefficient
Out of the 1,000 individuals included in our study, 262 had whole exome sequencing (WES) data available (Raghavan et al., 2018). We had 14,157 overlapping SNPs in WES, HRC and 1000G reference panels with 0% missingness in WES data on Chromosome 14; SNPs imputed with each reference panel were compared against WES data separately. When concordance was evaluated, HRC panel performed slightly poorer, despite showing higher number of "high-quality" variants as compared to 1000G (Table 4). Using 1000G, we observed 3,542 rare and 35 ultra-rare "high-quality" SNPs; across 262 samples, we counted 1,245 {[(1,245/(3,542 × 262)] × 100 = 0.13%} and 10 (0.10%) mismatches for rare and ultra-rare, respectively. Using HRC, we retrieved 3,759 rare and 93 ultra-rare "high-quality" variants; we observed 2,439 (0.24%) and 32 (0.13%) mismatches for rare and ultra-rare variants, respectively. Details about pipeline can be found in Supplementary Section S3.
Next, we computed Cohen's kappa coefficient (K) for 14,157 imputed SNPs common in WES and the two reference panels. For both HRC and 1000G-imputation, we observed Kappa (K) of ∼0.99 for both rare and ultra-rare "high-quality" variants (Table 4). Details about pipeline can be found in Supplementary Section S4.

Effects of Ancestry on Imputation Quality
We evaluated the effect of individual ancestral component separately on SNP mismatches for Chromosome 14 on 262 individuals. For both reference panels we found that higher African ancestry (YRI) was associated with higher number of mismatches (Supplementary Table S7). For instance, with 1000G reference panel, for rare variants ("Info" ≥ 0.80), we observed an estimate of 1.46 (P-value < 0.001) for YRI component (indicating that for each unit increase in YRI ancestry, it results in 1.46 additional mismatches). Details on confidence intervals and robust standard errors can be found in Supplementary Table S7 and Supplementary Section S5). We did not observe significant effect of ancestry on "high-quality" ultra-rare variants in both panels.

DISCUSSION
This study examined imputation performances in a cohort Caribbean Hispanics, focusing on uncommon, rare and ultrarare variant, by comparing different phasing and imputation tools, as well as evaluating the effects of different reference panels. Overall, uncommon and rare variants can be well imputed in this population, characterized by a unique genetic background. Caribbean Hispanics are admixed with 59% of their genetic component from European, 32% African, and 8% Native American ancestry (Tosto et al., 2015). Due to their genetic makeup and unique linkage disequilibrium patterns, admixed populations offer unique opportunity in studying complex diseases. First, disease prevalence varies across ethnic groups (Igartua et al., 2015) and certain admixed populations show higher incidence rates and prevalence (e.g., Alzheimer's disease, diabetes etc.) or lower ones (e.g., multiple sclerosis). Second, variants that are ethnic-specific may explain a higher prevalence of the disease of interest in admixed groups.
In the present study, we examined multiple parameters of imputation using the Caribbean Hispanics population. First, we found that imputation using SHAPEIT-IMPUTE2 phasing generated better results than Eagle2-IMPUTE2, and SHAPEIT-IMPUTE2 is superior to MaCH-Admix in terms of imputation performances and process time.  Using SHAPEIT-IMPUTE2, 1000G SNPs outnumbered HRC panel because of the higher number of SNPs included in the reference panel itself. However, when we restricted our analyses to overlapping "good-" and "highquality" SNPs (i.e., those variants that most likely would be included in association analyses), HRC-imputation outperformed 1000G with higher. The superior performance of HRC over 1000G was confirmed also when we focused on uncommon, rare and ultra-rare SNPs only. Our findings confirm data in literature, i.e., reference panels with higher number haplotypes perform better in different scenarios. Additional investigations are needed in order to apply our findings to other admixed and non-admixed populations.
Overall, higher quality of imputation for rare and ultra-rare variants was also confirmed when we tested results against sequencing data. Finally, higher YRI global ancestry was found to significantly impair SNP imputation, suggesting that imputation quality decreases with increased African ancestry.
Lastly, SHAPEIT-IMPUTE2 with 1000G reference panel was successful in identifying G206A mutation carriers. We also noticed that imputation quality drastically improved when imputation was conducted using large (5MB) chunk size as compared to small (1MB) chunks. This seems to contradict previous observation: Zhang et al. (2011) studied the effect of window size on imputation in an African-American. They concluded that window size of 1MB could be considered acceptable. Possible explanations for these different results might be the more complex admixture of CH compare to AA (threeway vs. two-way admixed) and a more complex LD pattern for the G206A region. Ultimately, we recommend to consider a wider window size to achieve high-quality imputation in specific variants that fail under default settings.
This work has limitations. First, we could carry out the comparison between the two reference panels restricting the analyses to overlapping variants only, limiting our observation to a subset of the variants included in the 1000G panel. This is a result of the HRC composition, which is composed by several studies and ended up including only a consensus number of variants. Second, we tested the agreement between imputed and sequenced variants in a smaller subset of individuals that had both GWAS and WES data available.

DATA AVAILABILITY
The datasets for this manuscript are not publicly available because data will be available soon through dbgap website.
Requests to access the datasets should be directed to gt2260@cumc.columbia.edu.

ETHICS STATEMENT
All participants provided written informed consent. Ethical approval for this study was obtained from the Columbia University committee.