Multivariate adaptive shrinkage improves cross-population transcriptome prediction and association studies in underrepresented populations

Summary Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods that leverage shared regulatory effects across different conditions, in this case, across different populations, may improve cross-population transcriptome prediction. To test this hypothesis, we made transcriptome prediction models for use in transcriptome-wide association studies (TWASs) using different methods (elastic net, joint-tissue imputation [JTI], matrix expression quantitative trait loci [Matrix eQTL], multivariate adaptive shrinkage in R [MASHR], and transcriptome-integrated genetic association resource [TIGAR]) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWASs, we integrated publicly available multiethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study and Pan-ancestry genetic analysis of the UK Biobank (PanUKBB) with our developed transcriptome prediction models. In regard to transcriptome prediction accuracy, MASHR models performed better or the same as other methods in both population-matched and cross-population transcriptome predictions. Furthermore, in multiethnic TWASs, MASHR models yielded more discoveries that replicate in both PAGE and PanUKBB across all methods analyzed, including loci previously mapped in GWASs and loci previously not found in GWASs. Overall, our study demonstrates the importance of using methods that benefit from different populations’ effect size estimates in order to improve TWASs for multiethnic or underrepresented populations.


Introduction
Through genome-wide association studies (GWASs), many associations between single-nucleotide polymorphisms (SNPs) and diverse phenotypes have been uncovered. 1owever, most GWASs to date have been conducted on individuals of European descent, even though they make up less than one-fifth of the total global population. 2,3Ancestry diversity in human genetic studies is important because linkage disequilibrium and allele frequencies differ among populations and thus associations found within European ancestry individuals may not reflect associations for individuals of other ancestries, and vice versa. 3Some efforts to increase ancestry diversity in human genetics studies include the NHLBI Trans-Omics for Precision Medicine (TOPMed) consortium, 4 the Population Architecture using Genomics and Epide-miology (PAGE) study, 5 the Human Heredity and Health in Africa (H3Africa) initiative, 6 and the Pan-ancestry genetic analysis of the UK Biobank (PanUKBB 7 ).
Alongside GWASs, transcriptome-wide association studies (TWASs) test predicted gene expression levels for association with complex traits of interest, identifying gene-trait associated pairs. 8Different TWAS methods, such as PrediXcan and FUSION, work by estimating gene expression through genotype data using transcriptomic prediction models built on expression quantitative trait locus (eQTL) data. 9,10Similarly to GWASs, TWASs are also negatively affected by ancestry underrepresentation, as gene expression prediction models for use in TWASs are often trained in European descent datasets, which reduces the power of studies conducted with individuals of other ancestries. 11,12Still, we expect the underlying biological mechanisms of complex traits to be shared across human populations, 11,13 and thus prediction methods that account for allelic heterogeneity and better estimate effect sizes can improve the discovery rate and interpretation of TWASs across populations.
Here, we used genomic and transcriptomic data from the Multi-Ethnic Study of Atherosclerosis (MESA) 14 multiomics pilot study of TOPMed to build TWAS prediction models (Figure 1).Using five different methods to estimate effect sizes, elastic net, 15,16 joint-tissue imputation (JTI), 17 Matrix eQTL, 18 multivariate adaptive shrinkage in R (MASHR), 19 and transcriptome-integrated genetic association resource (TIGAR), 20 we built population-specific transcriptomic prediction models for four MESA-defined populations-African American, Chinese, European, and Hispanic/Latinoacross three blood cell types and evaluated their prediction performance in the Geuvadis 21 cohort using PrediXcan. 9rom there, we used S-PrediXcan 22 to apply our models to GWAS summary statistics of complex traits from the multiethnic PAGE 5 study and the PanUKBB. 7We hypothesized that MASHR and JTI were most likely to improve transcriptome prediction and increase the number of TWAS hits compared with the other methods, as they both leverage similar effect size estimates across different conditions-in this case, different populations-to adjust effect sizes.In agreement with that, our results indicated that in cross-population predictions, MASHR models have a higher transcriptome prediction accuracy than other methods.Furthermore, in our TWASs, MASHR models discovered the highest number of associated gene-trait pairs across all population models.These findings illustrate that leveraging genetic diversity and effect size estimates across populations can help improve current transcriptome prediction models, which may increase discovery and replication in association studies in underrepresented populations or multiethnic cohorts.

Training dataset
This study was approved by the Loyola University Chicago institutional review board (project #2014).
To build our transcriptome prediction models, we used data from the MESA 14

Genotype and RNA-seq QC
We performed quality control (QC) on each MESA tissue-population pair separately.For the genotype data 4 (Freeze 8, phs001416.v2.p1), we excluded insertions or deletions (indels), multiallelic SNPs, and ambiguous-strand SNPs (A/T, T/A, C/G, G/C) and removed the remaining variants with minor allele frequencies (MAFs) <0.01 and Hardy-Weinberg Equilibrium (HWE) p <1 3 10 À6 using PLINK 23 v.1.9.For chromosome X, filtering by HWE was only applied in variants found within the pseudoautosomal regions based on GRCh38 positions.Furthermore, for the non-pseudoautosomal region of X, male dosages were assigned either 0 or 2. After QC, the average numbers of non-ambiguous SNPs remaining per population across all cell types were as follows: AFA ¼ 15.7 M, CHN ¼ 8.4 M, EUR ¼ 9.7 M, and HIS ¼ 13.2 M.
For the RNA-seq data, we also performed QC separately by tissue population.First, we removed genes with average TPM values <0.1.For some individuals, RNA expression levels were measured at two different time points (exam 1 and exam 5); thus, after log transforming each measurement and adjusting for age and sex as ).With these transcriptome models, we evaluated their out-of-sample transcriptome prediction accuracy using the GEUVADIS dataset.Additionally, we assessed their applicability in multiethnic TWASs using GWAS summary statistics from the PAGE Study and the PanUKBB.AFA, African American; CHN, Chinese; EUR, European; HIS, Hispanic/Latino; Mono, CD14þ monocytes; PBMC, peripheral blood mononuclear cells; Tcell, CD4þ T cells.covariates using linear regression and extracting the residuals, we took the mean of the two time points (or the single adjusted logtransformed value if expression levels were only measured once), performed rank-based inverse normal transformation, and adjusted for the first 10 genotype and 10 expression principal components (PCs).To estimate PCs, we used PC-AiR 24 with a kinship threshold of $0.022, which corresponds to 4 th -degree relatives.No individuals were removed.For each tissue, we removed genes absent in at least one population.After QC, we had 17,585 genes in PBMCs, 14,503 in Monos, and 16,647 in T cells.We used GENCODE 25 annotation v.38 to annotate gene types (e.g., protein coding, long non-coding RNA [lncRNA], etc.) and gene transcription start and end sites.

Gene expression cis-heritability estimation
We estimated gene expression heritability (h 2 ) using cis-SNPs within the 1 Mb region upstream of the transcription start site and the 1 Mb region downstream of the transcription end site.Using the genotype data filtered only by HWE p <1 3 10 À6 , for each tissue-population pair, we first performed linkage disequilibrium (LD) pruning with a 500 variant count window, a 50 variant count step, and a 0.2 r 2 threshold using PLINK 23 v.1.9.Then, for each gene, we extracted cis-SNPs and excluded SNPs with MAFs <0.01.Finally, to assess cis-SNP expression h 2 , we estimated the genetic relationship matrix and h 2 using GCTA-GREML 26 with the ''-reml-no-constrain'' option.We considered a gene heritable if it had a positive h 2 estimate (h 2 À 2*SE > 0.01 and p < 0.05) in at least one MESA population.In total, 9,206 genes were heritable in PBMCs, 3,804 in Monos, and 4,053 in T cells.We only built transcriptome prediction models for these heritable genes across all populations in their respective cell types.

Transcriptome prediction models
With the aforementioned genotype and gene expression data, we built transcriptome prediction models for each MESA tissuepopulation pair, and for each gene, we considered cis-SNPs as defined in the previous section.Additionally, we only considered SNPs present in the GWAS summary statistics of the PAGE study 5 to build our prediction models to make sure that there would be a high overlap between SNPs in the transcriptome models and SNPs in the GWAS summary statistics.After merging with PAGE SNPs, the average numbers of SNPs left in our dataset were as follows: AFA ¼ 12.8 M, CHN ¼ 6.2 M, EUR ¼ 7.4 M, and HIS ¼ 10.5 M.
We built our population-based models using five different approaches.The first was elastic net (EN) regression using the glmnet package in R, 15,16 with mixing parameter a ¼ 0.5.We considered EN our baseline model, as it has been previously used to make transcriptome prediction models for the TOPMed MESA data. 27he second method implemented was MASHR. 19Unlike EN, MASHR does not estimate weights by itself; rather, it takes Z score (or weight and SE) matrices as input and adjusts them based on correlation patterns present in the data in an empirical Bayes algorithm, allowing for both shared and condition-specific effects.By doing so, MASHR increases power and effect size estimation accuracy. 19Originally, MASHR applicability was demonstrated by leveraging effect size estimates across different tissues; 19 however, herein, we sought to assess its potential to leverage effect sizes across populations.We ran MASHR for each gene at a time, using cis-SNPs weights (effect sizes) estimated by Matrix eQTL 18 and MESA populations as different conditions (Figure 2A).Then, we split MASHR-adjusted weights according to their respective popu-lations and selected the top SNP (lowest local false sign rate) per gene to determine which SNPs would end up in the final models (Figure 2B).Local false sign rate is similar to false discovery rate but is more rigorous, as it also takes into account the direction of effect. 19Thus, by selecting one top SNP per population, the maximum number of SNPs per gene in the final model is 4, which corresponds to the number of populations in our study.If two or more populations had the same variant as the top SNP, it was only included once.To make population-based models, we used population-specific effect sizes taken from the corresponding MASHR output matrices.
The third method was based on the unadjusted effect sizes estimated by Matrix eQTL 18 using the linear regression model.We used the same approach taken to build the MASHR models, including the SNP with the lowest p value from each population, but the key difference is that we made the models using the unadjusted effect sizes.
The fourth method we used was TIGAR, which trains transcriptome imputation models using either EN or non-parametric Bayesian Dirichlet process regression (DPR). 20As we already used EN to make a set of transcriptome prediction models, we opted to make DPR-based models.We used TIGAR's default parameters to train our models, such as using the variational Bayesian algorithm and outputting fixed effect sizes.However, by default, TIGAR performs 5-fold cross-validation (CV) during training and only outputs results if the final average CV R 2 is equal or greater than 0.005; thus, since we did not implement CV for any of the aforementioned methods and instead tested performance in an independent sample, we opted to skip this step of TIGAR's pipeline and generate outputs for all genes.Most gene models generated by TIGAR had hundreds of SNPs with near-zero effect sizes.To reduce memory requirements for storage of these models, we removed SNPs with effect sizes smaller than 1 3 10 À4 .
The fifth and last method we implemented was JTI. 17 JTI was designed to leverage similarity in gene expression and DNase 1 hypersensitive sites across different tissues to possibly improve prediction performance.Thus, similarly to MASHR, we sought to assess whether the method could be adapted to use populations instead of tissues.To assess gene expression similarity between MESA populations, we computed transcriptome-wide pairwise correlations between populations using the median TPM value per gene.Additionally, we did not have population DNase 1 hypersensitivity site data, so we set column five to 1 in our input files.By default, JTI performs 5-fold CV and only produces outputs for genes with an average CV R greater than 0.1.Thus, similarly to TIGAR, we removed this filtering step of the pipeline to generate output for all genes regardless of CV performance.
To perform TWASs using GWAS summary statistics data, it is necessary to have information about the correlation between the SNPs used to predict gene expression levels. 22Thus, for all our transcriptome prediction models previously mentioned, we computed pairwise covariances for the SNPs within each TOPMed MESA population model using the respective population dosage data.All model files are freely available for anyone to use (see data and code availability section).

Assessing transcriptome prediction performance
To evaluate the gene expression prediction performance of all our transcriptome prediction models, we used DNA and lymphoblastoid cell lines RNA-seq data from 449 individuals in the Human Genetics and Genomics Advances , n ¼ 89), which we analyzed both separately and together (ALL).Similarly to our training dataset, we performed rank-based inverse normal transformation on the gene expression levels and adjusted for the first 10 genotype and 10 expression PCs using the residuals as observed expression levels.With the Geuvadis genotype data and our transcriptome prediction models, we used PrediXcan 9 to estimate gene expression levels.PrediXcan is a two-step TWAS method in which the first step is to estimate genetically regulated expression levels (GReXs).Thus, to assess transcriptome prediction performance, we compared GReXs with the adjusted, measured expression levels using Spearman correlation.

Assessing performance in TWASs
To test the applicability of our transcriptome prediction models in multiethnic association studies, we applied S-PrediXcan 22 to GWAS summary statistics from the PAGE study. 5The PAGE study consists of 28 different phenotypes tested for association with variants within a multiethnic, non-European cohort of 49,839 individuals (Hispanic/Latino, n ¼ 22,216; African American, n ¼ 17,299; Asian, n ¼ 4,680; Native Hawaiian, n ¼ 3,940; Native American, n ¼ 652; or other, n ¼ 1,052).Since we tested multiple phenotypes and transcriptome prediction models in our TWASs, we used a conservative approach and considered genes as significantly associated with a phenotype if the association p value was less than the standard Bonferroni-corrected GWAS significance threshold of 5 3 10 À8 .
To replicate the associations found in PAGE, we also applied S-PrediXcan 19 to PanUKBB 7 GWAS summary statistics (total n ¼ 441,331; European, n ¼ 420,531; Central/South Asian, n ¼ 8,876; African, n ¼ 6,636; East Asian, n ¼ 2,709; Middle Eastern, n ¼ 1,599; or admixed American, n ¼ 980).For similarity purposes, we selected summary statistics of phenotypes that overlap with the ones tested in PAGE (Table S1).As previously described, a gene-trait pair association was considered significant if its p value was less than the Bonferroni-corrected GWAS significance threshold of 5 3 10 À8 .Furthermore, we deemed significant gene-trait pair associations as replicated if they were detected by the same MESA tissue-population model and had the same direction of effect in PAGE and the PanUKBB.To assess if the genetrait association pairs found in our study had been previously reported, we compared them with studies found in the GWAS Catalog 1 (all associations v.1.0.2 file was downloaded on November 9, 2022).

Results
Increased sample sizes improve gene expression cis-h 2 estimation With the goal of improving transcriptome prediction in diverse populations, we first determined which gene expression traits were heritable and thus amenable to genetic prediction using genome-wide genotype and RNA-seq data from three blood cell types (PBMCs, Monos, T cells) in TOPMed MESA.We estimated cis-h 2 using data from four different populations (AFA, CHN, EUR, and HIS).Variation in h 2 estimation between populations is expected due to differences in allele frequencies and LD patterns; however, we show that larger population sample sizes yield more significant (p < 0.05) h 2 estimates (Figure 3).Using the PBMC dataset as an example, with the EUR dataset (n ¼ 528), we assessed h 2 for 10,228 genes; however, we estimated h 2 for 8,765 genes using the AFA dataset (n ¼ 334) (Figure 3A).Moreover, we see a great impact on the CHN population, which has the smallest sample size.For that population, we managed to estimate h 2 for only 3,448 genes.The same pattern repeats when analyzing only the heritable genes (h 2 lower bound > 0.01).In EUR, 6,902 genes were deemed heritable, whereas in AFA and CHN, the amounts of heritable genes are 5,537 and 1,367, respectively (Figure 3B).Thus, larger sample sizes are needed to better pinpoint h 2 estimates, especially in non-European populations.In total, analyzing the union across all populations' results, we detected 9,206 heritable genes in PBMCs, 3,804 in Monos, and 4,053 in T cells.

MASHR models improve cross-population transcriptome prediction performance
To improve TWAS power for discovery and replication across all populations, we sought to improve cross-population transcriptome prediction accuracy.For this, we used data from four different populations and built gene expression prediction models using five different methods (EN, TIGAR, Matrix eQTL, MASHR, and JTI).We chose EN as a baseline approach for comparison in our analysis as it has been previously shown to have better performance than other common machine-learning methods such as random forest, K-nearest neighbor, and support vector regression. 28urthermore, we trained gene expression prediction models by applying TIGAR's non-parametric Bayesian DPR pipeline. 20Using Matrix eQTL, we estimated univariate effect sizes for each cis-SNP-gene relationship, and we developed an algorithm to include top SNPs from each population but population-estimated effect sizes in each population's model (Figure 2).Matrix eQTL effect sizes are the input for MASHR, which we hypothesized might better estimate cross-population effect sizes due to its flexibility in allowing both shared and population-specific effects. 19,29Similarly, JTI was designed to leverage correlation across different tissues to improve gene expression prediction; 17 thus, we also adapted its pipeline to perform cross-population leveraging.
By filtering our models to include only genes with positive h 2 (h 2 lower bound > 0.01) in at least one population, we saw that among all methods used, we obtained more gene models in Matrix eQTL and MASHR (Figure 4A).The difference is especially greater in the CHN population model.
To evaluate model performance in population-matched and cross-population transcriptome predictions, we used data from the Geuvadis study, which comprises individuals of West African or European descent.We defined ''population-matched predictions'' as the scenarios in which the transcriptome model MESA training data and Geuvadis test data have the closest genetic distance with available data, and we defined ''cross-population predictions'' as any other pairs (Figure S1).Overall, across all Geuvadis populations, the methods tested show distinct performances (Figure S2).This result, however, may be influenced by the fact that different transcriptome models have a different number of genes in them (Figure 4A).Thus, we sought to compare performances considering the intersection of genes with expression predicted by all methods.Focusing on Geuvadis GBR and YRI populations, which have similar sample sizes and are of distinct continental ancestries, we observed that MASHR models significantly outperform the other methods in cross-population transcriptome predictions, as seen in the AFA-GBR and EUR-YRI MESA-Geuvadis population pairs (Figure 4B; Table S2).The only exception is in AFA-GBR, in which MASHR and Matrix eQTL have similar performances.Additionally, in population-matched scenarios (AFA-YRI and EUR-GBR), prediction performance does not significantly differ between MASHR, Matrix eQTL, and EN.All three aforementioned methods significantly outperform JTI and TIGAR in population-matched predictions (Table S2).Moreover, we also performed pairwise comparisons between all methods using all Geuvadis populations, taking into account the intersection of genes with expression predicted in each case.Overall, across all MESA transcriptome models and Geuvadis populations, MASHR models either performed better or the same as other methods in both populationmatched and cross-population transcriptome prediction scenarios (Table S3).

Leveraging effect sizes across different populations improves discovery rate in multiethnic TWASs
In order to investigate the applicability of the models we built in multiethnic TWASs, we used S-PrediXcan with GWAS summary statistics of complex traits from PAGE and the PanUKBB.We show that across all tissue-population models, MASHR identified the highest number of gene-trait pair associations (208) that replicated in both PAGE and the PanUKBB (p < 5 3 10 À8 ), followed by Matrix eQTL (173), JTI (131), EN (94), and TIGAR (91) (Table S3).When analyzing the total number of discoveries separately for each population, MASHR had the highest number of gene-trait pairs in most population models (Figure 5A).The only exception is with HIS models, in which both MASHR and Matrix eQTL had the same number of discoveries.The discovery rate improvement by MASHR is exceptionally high in CHN models, as it had almost twice the number of discoveries as the second-highest method (27 by MASHR vs. 14 by Matrix eQTL).
Additionally, when comparing gene-trait pairs, we saw that most MASHR hits were shared between population models, whereas other methods have higher populationspecific discoveries (Figure 5B).Most Matrix eQTL hits were also shared by many population models but not to the same degree as MASHR.Altogether, these findings indicate that MASHR models show high consistency and also suggest that TWAS results are not as affected by the MASHR population model used compared with other methods.
To contextualize our models' findings, we investigated whether the discovered gene-trait pairs had been previously reported in any studies in the GWAS Catalog (https://www.ebi.ac.uk/gwas/home).We saw that across 105 distinct gene-trait pairs associations found (totaling 697 across all models), 38 (36.19%) have not been reported in the GWAS Catalog and therefore may be unconfirmed associations that require further investigation (Table S4).Out of those unreported biological associations, most of them (13) were discovered with MASHR AFA models (Table S4).Furthermore, out of the 67 distinct known GWAS Catalog associations discovered, MASHR models identified most of  S2 for all pairwise comparisons).them (Table S3).For instance, MASHR EUR models found 34 known associations, followed by MASHR AFA with 33, and Matrix eQTL EUR with 32 (Figure S3).

Discussion
In this work, we sought to build population-based transcriptome prediction models for TWASs using data from the TOPMed MESA cohort using five distinct approaches.We saw that although the AFA and HIS populations' datasets contained the highest numbers of SNPs after quality control, EUR yielded the highest number of gene expression traits with significant h 2 estimates across all tissues analyzed.This is most likely due to the higher sample size in EUR compared with AFA and HIS, as larger sample sizes provide higher statistical power to detect eQTLs with smaller effects. 30Furthermore, we saw that the number of genes in each population transcriptome model is not the same across all methods tested.Some transcriptome prediction models, such as the ones built using EN or JTI, only contain genes for which the SNPs effect sizes converged during training, which is not a limiting factor for MASHR, Matrix eQTL, and TIGAR.One of the factors that impacts the number of genes for which SNP effect sizes converge during training is sample size, which explains the lower number of genes in the EN and JTI CHN models compared with other population models.Furthermore, although sample size does not impact the number of gene models trained for TIGAR to the same degree as EN and JTI, it influences SNP effect size estimation. 31hus, when we removed SNPs with near-zero effects, there was a drop in the number of genes in the final population transcriptome models for TIGAR.Test data sample size has also been shown to positively correlate with gene expression prediction accuracy. 32n addition to sample size, gene expression prediction accuracy is known to be greater when the training and testing datasets have similar ancestries 11,12,32,33 ; however, non-European ancestries are vastly underrepresented in human genetics studies, 2,3 which compromises the ability to build accurate TWAS models for them.Thus, using data from the Geuvadis cohort, we evaluated the transcriptome prediction performance of our models and found that MASHR models either significantly outperformed all other methods tested or had similar performance.Previous studies have shown that by borrowing information across different conditions, such as tissues 19 or cell types, 34 MASHR identifies shared or condition-specific eQTLs, which can enhance causal gene identification 29 as well as improve effect size estimation accuracy. 19Similarly, by leveraging effect size estimates across multiple populations, MASHR improved cross-population transcriptome prediction without compromising population-matched prediction accuracy.Interestingly, another method we tested, JTI, was also originally designed to leverage similarity in gene expression and DNase 1 hypersensitive sites across tissues in order to improve transcriptome prediction accuracy. 17However, our results showed that it performed worse than MASHR and the same as EN in cross-population transcriptome prediction.This suggests that distinct cross-condition leveraging frameworks may have different performances when applied across populations.One possible reason for differences in performance is that JTI uses EN weighted by condition similarity to estimate effect sizes and select SNPs to be included in the final models, whereas for MASHR, our pipeline selects one SNP per condition.Since more SNPs with less significant effect sizes were included in our EN and JTI models, greater uncertainty in effect sizes likely led to lower transcriptome prediction accuracy compared with MASHR.Furthermore, among the methods evaluated, TIGAR had the lowest prediction performance.Originally, TIGAR was benchmarked against EN and showed better transcriptome prediction accuracy; however, unlike in our analysis, their analysis included only genes whose expression h 2 was equal or lower than 0.2. 20iscovery and replication of TWAS associations are also related to the ancestries of the transcriptome prediction model training dataset and ancestries of the TWAS sample dataset. 11Thus, we assessed the applicability of our models in TWASs using S-PrediXcan on PAGE and PanUKBB GWAS summary statistics and found that across all tissues and populations, MASHR models yielded the highest number of total gene-trait pairs associations, with MASHR AFA reporting the highest number.6][37][38] Our results also showed that although JTI transcriptome prediction was not as accurate as baseline EN, JTI models had more TWAS discoveries than EN.This exemplifies how integrating data from different genetic ancestries may improve TWASs.
By investigating which associations had been previously reported in the GWAS Catalog, we saw that most unreported discoveries were found by MASHR models.Some of these discoveries are unique to MASHR models and have been corroborated previously, such as YJEFN3 (also known as AIBP2) and triglycerides, whose low expression in zebrafish increases cellular unesterified cholesterol levels, 39 consistent with our S-PrediXcan effect size directions (PAGE effect size ¼ À0.52, p ¼ 6.1 3 10 À16 ; PanUKBB effect size ¼ À0.86, p ¼ 7.1 3 10 À86 ).Additionally, we also saw that MASHR models showed higher consistency across the different population transcriptome prediction models, which means that TWAS results are not as affected by the population model used as other methods.
One limitation of our TWAS is that we used transcriptome prediction models trained in PBMCs, monocytes, and T cells, and those tissues might not be the most appropriate for some phenotypes in PAGE or the PanUKBB.Additionally, because of the smaller sample sizes for some populations in our training dataset, h 2 and eQTL effect size estimates have large standard errors, which may affect the ability of MASHR to adjust effect sizes across different conditions based on correlation patterns present in the data.Regardless of that, our results mainly demonstrate that we can implement cross-population effect size leveraging using a method first applied to do cross-tissue effect size leveraging-and improve cross-population transcriptome prediction accuracy in doing so.Thus, increasing sample size for underrepresented populations will improve current MASHR TWAS models' performances as well as increase genetic diversity in the data.Another TWAS method, multi-ancestry transcriptome-wide analysis (METRO), which implements a likelihood-based inference framework to incorporate transcriptome prediction models built on datasets of two different genetic ancestries, has also shown enhanced TWAS power. 40ETRO jointly models gene expression and the phenotype of interest 40 and thus was not directly comparable with the five methods we tested here, which all separate the transcriptome prediction step from the association test.Given that this traditional two-stage TWAS procedure ignores uncertainty in the expression prediction, the joint approach of METRO across more than two populations is an area of future TWAS method research.Furthermore, while our study focused on transcriptome prediction, MASHR could also be adapted to possibly improve cross-population polygenic risk scores (PRSs).Indeed, other methods like PRS-CSx jointly model complex traits effects across populations in order to improve PRSs. 41MASHR is most useful when population effects are shared, as demonstrated by the more consistent S-PrediXcan results, but population-specific effects are also relevant.For instance, a study in a large African American and Latino cohort discovered eQTLs only present at appreciable allele frequencies in African ancestry populations. 38oreover, since our MASHR models focus on the top SNPs, we might not be including enough eQTLs in the models, especially for those genes whose expression is genetically regulated by multiple eQTLs with small effects.A small number of SNPs in the models may also contribute to a reduced degree of SNP overlap between the transcriptome prediction model and the test dataset.Thus, it is important to maximize SNP overlap in the test dataset, such as by performing SNP imputation with proper reference panels.
In conclusion, our results demonstrate the importance and the benefits of increasing ancestry diversity in the field of human genetics, especially regarding association studies.As shown, sample size is valuable for assessing gene expression h 2 and for accurately estimating eQTL effect sizes, and thus some populations are negatively affected due to the lack of data.However, by making transcriptome prediction models that leverage effect size estimates across different populations using multivariate adaptive shrinkage, we were able to increase gene expression prediction performance for scenarios in which the training data and test data have distant (''cross-population'') genetic distances with available data.Additionally, when applied to multiethnic TWASs, the aforementioned models yielded more discoveries across all methods analyzed, even detecting well-known associations that were not detected by other methods.Thus, in order to further improve TWASs in multiethnic or underrepresented populations and possibly reduce healthcare disparities, it is necessary to use methods that consider shared and population-specific effect sizes, as well as increase available data of underrepresented populations.

Figure 1 .
Figure 1.Overall study methodology Using TOPMed MESA as a training dataset, we built population-based transcriptome prediction models using five different methods (elastic net [EN], joint-tissue imputation [JTI], multivariate adaptive shrinkage in R [MASHR], Matrix eQTL, and transcriptome-integrated genetic association resource [TIGAR]).With these transcriptome models, we evaluated their out-of-sample transcriptome prediction accuracy using the GEUVADIS dataset.Additionally, we assessed their applicability in multiethnic TWASs using GWAS summary statistics from the PAGE Study and the PanUKBB.AFA, African American; CHN, Chinese; EUR, European; HIS, Hispanic/Latino; Mono, CD14þ monocytes; PBMC, peripheral blood mononuclear cells; Tcell, CD4þ T cells.

Figure 2 .
Figure2.Design of the methodology implemented to make MASHR models (A) Using effect sizes estimated using Matrix eQTL within each population dataset, we combined them across genes, with the different populations as conditions, to use as input for MASHR.The output matrixes contain adjusted effect sizes.(B) For each population, we selected the top SNP (lowest local false sign rate) per gene.Then, we concatenated the gene-top SNP pairs across populations to determine which SNPs would end up in the final models.Lastly, to make our population-based transcriptome prediction models, we used population-specific effect sizes taken from the corresponding MASHR output matrices.AFA, African American; CHN, Chinese; EUR, European; HIS, Hispanic/Latino.

Figure 3 .
Figure 3. PBMC gene expression cis-heritability estimates across MESA populations (A) Gene expression cis-heritability (h 2 ) estimated for different genes across different MESA population datasets in PBMCs.Only genes with significant estimated h 2 (p < 0.05) are shown.Gray bars represent the standard errors (2*SE).Genes are ordered on the x axis in ascending h 2 order and colored according to the h 2 lower bound (h 2 À 2*SE).(B) Number of significant heritable genes (p < 0.05 and h 2 lower bound > 0.01) within each PBMC population dataset by sample size.AFA, African American; CHN, Chinese; EUR, European; HIS, Hispanic/Latino.

Figure 4 .
Figure 4. Comparison of MESA population transcriptome prediction models (A) The number of genes in each MESA population model by method and tissue.(B) Prediction performance (Spearman's rho) of EN, JTI, MASHR, Matrix eQTL, and TIGAR PBMC MESA population models in Geuvadis GBR and YRI populations.Only the intersections of genes with expression predicted by all methods for each MESA-Geuvadis population pair are shown.MASHR performed better than or the same as all other methods (see TableS2for all pairwise comparisons).

Figure 5 .
Figure 5. Number of significant S-PrediXcan gene-trait pairs in PAGE and PanUKBB GWAS summary statistics (A) Total number of significant gene-trait pairs discovered by each MESA population model (considering the union of the three tissues) by method.(B) Number of significant gene-trait pairs discovered with individual or multiple MESA populations colored by method (considering the union of the three tissues).Population set intersections are indicated on the x axis in color.