Transcriptome-wide association study of HIV-1 acquisition identifies HERC1 as a susceptibility gene

Summary The host genetic factors conferring protection against HIV type 1 (HIV-1) acquisition remain elusive, and in particular the contributions of common genetic variants. Here, we performed the largest genome-wide association meta-analysis of HIV-1 acquisition, which included 7,303 HIV-1-positive individuals and 587,343 population controls. We identified 25 independent genetic loci with suggestive association, of which one was genome-wide significant within the major histocompatibility complex (MHC) locus. After exclusion of the MHC signal, linkage disequilibrium score regression analyses revealed a SNP heritability of 21% and genetic correlations with behavioral factors. A transcriptome-wide association study identified 15 susceptibility genes, including HERC1, UEVLD, and HIST1H4K. Convergent evidence from conditional analyses and fine-mapping identified HERC1 downregulation in immune cells as a robust mechanism associated with HIV-1 acquisition. Functional studies on HERC1 and other identified candidates, as well as larger genetic studies, have the potential to further our understanding of the host mechanisms associated with protection against HIV-1.


INTRODUCTION
HIV type 1 (HIV-1) acquisition is a complex trait that depends on environmental and genetic factors, including dose of viral inoculum (Rodger et al., 2019) and host behavioral, cellular, and immune parameters moderating susceptibility to infection, viral control, and a systemic spread (Powell et al., 2020). Although there have been many advances in the development of successful prevention options (i.e., pre-and post-exposure prophylaxis [PrEP/PEP] [Donnell et al., 2014;Grant et al., 2010;Sultan et al., 2014]), there is no effective vaccine to prevent a systemic infection. Ultimately, advances in prevention science require a far greater understanding of the molecular mechanisms underlying susceptibility to HIV-1.
There are several host factors associated with HIV-1 acquisition, including cytokines, CCR5, APOBEC3G, TRIM5, human leukocyte antigen (HLA) class I genes, and others (Lama and Planelles, 2007;Martin and Carrington, 2013). However, most studies so far have been unable to distinguish causation from consequence of infection or correlation with co-infections. The best established genetic marker of protection against infection is the CCR5 D32 homozygous mutation (Alkhatib, 2009), which results in a faulty coreceptor that stops HIV-1 from entering its target cells (Lopalco, 2010). Nevertheless, homozygotes are rare, representing 1.2% of Europeans and less than 0.1% of individuals from other populations (Auton et al., 2015). Because studies prior to antiretroviral therapy have showed that up to two-thirds of individuals exposed to HIV-1 may not become infected despite exposure (Fowke et al., 1996; The Working Group on Mother-To-Child Transmission of HIV, 1995), it is plausible that there are common genetic variants associated with protection against infection. Yet, the molecular mechanisms underlying host susceptibility to infection, particularly those associated with common genetic variants, remain largely unknown.
Genome-wide association studies (GWAS) have the potential to fast-track the identification of common genetic variants and, consequently, genes and biological processes involved in HIV-1 acquisition. Although there was a lack of large genetic studies monitoring at-risk populations prior to the development of antiretroviral therapy, cross-sectional studies analyzing genetic differences between HIV-1-positive and HIV-1-negative individuals are likely to highlight relevant genetic mechanisms of acquisition. However, because associated variants are often noncoding and located in regions of complex linkage disequilibrium, it is challenging to pinpoint the genes directly affected by the risk variants identified in GWAS. In addition, the genetic architecture underlying complex traits originates from multiple tissues, and therefore, GWAS results alone cannot directly inform how or where the risk signal imparts its effects on gene expression regulation. In this context, transcriptome-wide association studies (TWAS) are effective tools to establish how genetic susceptibility exerts its effect (i.e., up-or downregulation of susceptibility genes) in a tissue-informed manner (Gusev et al., 2016Mancuso et al., 2017Mancuso et al., , 2018. Because genetic variants could impinge upon the regulation of key host molecules moderating viral entry or replication (e.g., proteins that act as viral receptors or DNA repair enzymes that degrade foreign nucleic acid material), identifying such regulatory mechanisms associated with HIV-1 acquisition could provide insight into future research on drug and vaccine development. Furthermore, considering that HIV-1 acquisition genetics likely encompasses immune and behavioral factors (Powell et al., 2020), TWAS can help isolate immune processes by identifying cis-regulatory mechanisms that more strongly explain the genetic association signal in lymphocytes or whole blood relative to other tissues (e.g., brain tissue). Finally, TWAS aggregate SNP associations for each gene in a functionally informed manner. Because they comprise of gene-level analyses, TWAS reduce the multiple testing burden typically associated with GWAS, and thus, they can be useful in identifying susceptibility genes in less well-powered genetic association studies (Dall'Aglio et al., 2021;Gusev et al., 2016).
The largest GWAS of HIV-1 acquisition to date has identified no common genome-wide significant variants outside of the major histocompatibility complex (MHC) locus (McLaren et al., 2013). However, considering the high estimates of SNP heritability (h 2 SNP ) observed for HIV-1 acquisition in that study after removing the MHC signal (h 2 SNP = 28-42%) (Powell et al., 2020) and that larger sample sizes typically provide increased power to detect genetic associations (Ziyatdinov et al., 2021), we meta-analyzed their results with findings from Johnson et al. (2015), FinnGen (2021, and the UK Biobank (Neale Lab, 2018). We also performed the first multi-tissue transcriptome-wide association study of HIV-1 acquisition and identified a robust, and arguably the first, common genetic risk mechanism associated with this trait.

Genome-wide association meta-analysis of HIV-1 acquisition
Our genome-wide association meta-analysis of HIV-1 acquisition included 7,303 cases and 587,343 controls (total N = 594,646, effective N = 17,014.55). We did not expect any sample overlap across the studies included in the meta-analysis, but we confirmed this through a genetic correlation analysis performed using linkage disequilibrium score (LDSC) regression (Bulik-Sullivan et al., 2015b) (adjusted p > 0.05 for all pairwise comparisons, indicating absence of sample overlap; Table S1).
Our study assessed the effects of 5,347,926 genetic variants in relation to HIV-1 acquisition. We identified 25 independent loci across the genome with suggestive association (p < 5 3 10 À6 ), of which one, at the MHC, surpassed genome-wide significance (p < 5 3 10 À8 ) ( Figure 1A). Demonstrating the consistency of the signal across the individual studies included in the meta-analysis, we observed that of the 313 variants with suggestive association that were assessed in all four studies, 301 (96%) showed concordant directions of effect across all studies. The distribution of peaks in the Manhattan plot and the deviation of the observed p values from the null hypothesis in the quantile-quantile (Q-Q) plot suggest a high degree of polygenicity associated with HIV-1 acquisition susceptibility ( Figure 1B; see more about genomic inflation analyses below, in the sensitivity analyses section). LDSC regression estimated the SNP heritability (h 2 SNP ) as 0.2105 (0.0369), corroborating a polygenic signal for HIV-1 acquisition.

OPEN ACCESS
The top associated SNP identified was rs41557415 (p = 1.96 3 10 À8 ), located at the MHC locus, more specifically in the HLA-B gene. As expected, this SNP is in linkage disequilibrium with the top association signal from McLaren et al. (2013), rs4418214 (R 2 = 0.43, D' = 0.98 in Europeans), which the authors suggest tags HLA-B*57:01 and 27:05. However, because our meta-analysis was performed using summary statistics only, we were unable to map the MHC signal to a specific HLA haplotype. Further, and as expected, variants working under a recessive genetic model like the CCR5 D32 mutation (or a proxy for it, rs113010081 [R 2 = 0.94, D' = 0.98 in Europeans]) did not show an association with HIV-1 acquisition (p > 0.05).
We calculated the genetic causality proportion (GCP) (O'Connor and Price, 2018) of HIV-1 acquisition on each of the correlated traits separately in an attempt to clarify these relationships (Table S2). Although these analyses produced GCP estimates indicating a low probability of shared causality (| GCP| < 0.60), the top trait pair identified with nominal confidence suggested that ''alcohol usually taken with meals'' (a sociodemographic indicator) is a protective factor for HIV-1 acquisition (GCP = À0.597, p = 0.036). However, the low-confidence GCP estimates observed for this and other correlated traits after multiple testing correction (Bonferroni p > 0.05) indicate a lack of sufficient evidence to support or dismiss shared genetic causality.

Functional characterization of the GWAS results
To understand whether there were functional genomic categories enriched within the GWAS results, we performed partitioned heritability using LDSC regression (Table S3). We found that the GWAS results were negatively enriched within a category named ''MAF_Adj_LLD_AFR'' (coefficient = À185. 15 [36.6], p = 4.09 3 10 À4 , Bonferroni p = 0.04). This genomic annotation corresponds to variants whose levels of linkage disequilibrium (LLD) are similar in a reference African population, after adjusting for minor allele frequency (MAF) differences. Thus, a negative enrichment suggests that the associated loci in Europeans will likely have a different LD structure in Africans, thus warranting studies in more diverse populations.
We also tested the GWAS results for enrichment with gene sets involved in known biological pathways and cell types, considering 14,462 biological terms and 209 cell and tissue types. The top gene sets associated with HIV-1 acquisition were ''melanoma-associated antigen 1 (MAGEA1) subnetwork'' (p = 4.52 3 10 À4 ) and ''T Lymphocytes Regulatory'' (p = 0.024), respectively, but these did not survive multiple testing correction (Bonferroni p > 0.05).

No significant genetic overlap between HIV-1 acquisition and HIV-1 viral control
To investigate whether the genetic mechanisms underlying susceptibility to HIV-1 acquisition were shared with those regulating viral control in HIV-1-positive individuals, we tested whether PRS for HIV-1 acquisition was associated with viral load in blood, using a cohort of 288 Europeans. We found a nominal genetic overlap between HIV-1 acquisition and viral control, with the most significant effect at the p-value threshold (P T ) of 0.1 (b = À1297.07, S.E. = 601.37, p = 0.03), which explained 1.6% of the variance in viral load in that cohort. However, this effect did not survive multiple testing correction for the number of thresholds tested (n = 8; adjusted p > 0.05), suggesting a nonsignificant and minimal overlap between polygenic risk for HIV-1 acquisition and viral control in people living with HIV-1.

Transcriptome-wide association study of HIV-1 acquisition
To identify gene expression profiles associated with susceptibility to HIV-1 acquisition, we performed a multi-tissue transcriptome-wide association study using FUSION (Gusev et al., 2016) (Figure 2A; see TWAS expression weights tested in Table S4). The p-value distribution and quantile-quantile plot suggest that there are many genes whose expression are regulated in association with susceptibility to HIV-1  Table S5). The most significant feature was the gene UEVLD, from chromosome 11p15.1, in one of the cross-tissue modules (Z = À5.60, p = 2.18 3 10 À8 , Bonferroni p = 6.00 3 10 À4 ). Another significant gene was HERC1, on chromosome 15q22.31, whose downregulation in EBV-transformed B lymphocytes was associated with HIV-1 acquisition (Z = À4.85, p = 1.23 3 10 À6 , Bonferroni p = 0.04). Notably, the other 24 association signals corresponded to genes located within the MHC locus. These signals must be interpreted with caution due to the complex LD structure in this region, which can result in associations driven by correlation with other cis-regulatory mechanisms. We performed conditional analyses in FUSION, where GWAS associations were re-evaluated after controlling for the expression of TWAS-significant genes within their respective loci. Out of the 26 TWAS association signals, only three represented independent associations with HIV-1 acquisition (HERC1, UEVLD, HIST1H4K), whereas the remainder correlated with the expression of these three genes or were unable to explain the GWAS signal in their respective locations. For instance, we observed that the top GWAS signal on chromosome 15, rs6494412 (GWAS p = 3.30 3 10 À6 ), was no longer significantly associated with HIV-1 acquisition after controlling for the expression of HERC1 in EBV-transformed B lymphocytes (conditional p = 0.06; Figure 3A). Similarly, the top GWAS SNP on chromosome 11, rs7942526 (GWAS p = 9.40 3 10 À7 ) was not associated with HIV-1 acquisition after controlling for the expression of UEVLD in the cross-tissue weights (conditional p = 0.59; Figure 3B). For the MHC region, we observed that the top GWAS hit at the HIST1H4K locus, rs13218875 (GWAS p = 1.80 3 10 À7 ), was also no longer significantly associated with HIV-1 acquisition after controlling for the expression of that gene (conditional We observed 15 genes whose expression correlated with HIV-1 acquisition susceptibility, including genes located outside of the MHC complex, such as UEVLD (p = 2.18 3 10 À8 , Bonferroni p = 6.00 3 10 À4 ), and HERC1 (p = 1.23 3 10 À6 , Bonferroni p = 0.04). iScience Article p = 0.0504; Figure 3C). On the other hand, the top GWAS hit at a neighboring MHC location, rs1265099 (GWAS p = 9.70 3 10 À6 ), remained significant after controlling for the expression of MICB in the esophagus muscularis and PSORS1C1 in the anterior cingulate cortex (conditional p = 0.03; Figure 3D), which suggests that other genes or regulatory mechanisms (e.g., trans-regulatory effects) at this locus may be involved in susceptibility. Ultimately, these findings corroborate a role for HERC1, UEVLD, and HIST1H4K in HIV-1 acquisition.

TWAS fine-mapping
We used FOCUS to calculate 90%-credible gene sets (i.e., sets likely to contain causal genes) and probability estimates of causality (PIP) for each gene. We observed that HERC1 expression in EBV-transformed B lymphocytes (PIP = 0.917) and RTP4 in the aorta (PIP = 0.688) were associated with high estimates of causality (i.e., PIP >0.50). The only gene expression mechanism robustly associated with HIV-1 acquisition according to the conditional analyses from FUSION and the FOCUS fine-mapping analysis was the downregulation of HERC1, as identified in EBV-transformed B lymphocytes.

HERC1 expression
Analysis of HERC1 expression in the Human Protein Atlas  showed that HERC1 was detected in virtually all cells and tissues examined. A clustering analysis suggests, with maximum statistical confidence, that HERC1 expression is associated with genes involved in transcription regulation ( Figure 4A). Among blood cell types, HERC1 clustered, also with maximum confidence, with genes associated with basophils ( Figure 4B).

Sensitivity analyses
To ensure that there were no systematic biases in our GWAS meta-analysis, we analyzed it using LDSC regression (Bulik-Sullivan et al., 2015b). This analysis produced a genomic inflation factor (l GC ) estimate Figure 3. Regional association plots of the TWAS hits Conditional analyses corroborate independent TWAS associations for (A) HERC1 in EBV-transformed lymphocytes (rs6494412 association before conditioning, p = 3.30 x 10 -6 ; after, p = 0.06), (B) UEVLD in one of the cross-tissue panels (rs7942526 association before conditioning, p = 9.40 x 10 -7 ; after, p = 0.59), and (C) HIST1H4K in the colon transverse (rs13218875 association before conditioning, p = 1.80 x 10 -7 ; after, p = 0.0504). On the other hand, (D) MICB expression in the esophagus muscularis and PSORS1C1 in the anterior cingulate cortex explain the GWAS association signal at their locus only in part (rs1265099 association before conditioning, p = 9.70 x 10 -6 ; after, p = 0.03). The dots colored in gray and blue correspond to the degree of association of individual SNPs with HIV-1 acquisition, before and after conditioning their association on the predicted expression of the gene(s) highlighted in green at each locus, respectively. The genes highlighted in blue correspond to marginally associated genes.

OPEN ACCESS
iScience 25, 104854, September 16, 2022 5 iScience Article of 1.14 with an intercept of 1.07 (0.0077), indicating that there could be an inflation signal due to population stratification in one of the studies included. In fact, LDSC regression of the McLaren study, the largest study included in the meta-analysis, corresponding to approximately 80% of the final effective sample size, produced a l GC estimate of 1.1491 with an intercept of 1.0833 (0.0076). Although the authors did not find evidence of population stratification in their study, we corrected their summary statistics using the intercept genomic control (intercept-GC) method (Bulik-Sullivan et al., 2015b). We meta-analyzed the corrected summary statistics with the other three studies (i.e., Johnson et al. (2015), FinnGen (2021), UK Biobank [Neale Lab, 2018]). This new meta-analysis, when analyzed by LDSC regression, produced a l GC estimate of 1.0649 with an intercept of 1.0031 (0.0072), suggesting that applying the intercept-GC method to the McLaren study alone was sufficient to remove any potential bias in the meta-analysis iScience Article (i.e., intercept closer to 1). Even though heritability estimates are downward biased after genomic control, we observed that the h 2 SNP of this new meta-analysis remained high (h 2 SNP = 0.197 [0.0337]), corroborating the high degree of polygenicity associated with HIV-1 acquisition. A new TWAS was performed, where we replicated the TWAS associations observed previously: UEVLD in the cross-tissue module (Z = À5.40, p = 6.53 3 10 À8 ), HERC1 in EBV-transformed B lymphocytes (Z = À4.70, p = 2.54 3 10 À6 ), and HIST1H4K in the colon transverse (Z = 5.35, p = 8.77 3 10 À8 ) (Table S5). In the FOCUS analysis, the agnostic TWAS finemapping identified HERC1 expression in EBV-transformed B lymphocytes (PIP = 0.858) and RTP4 in the aorta (PIP = 0.568) as robustly associated with HIV-1 acquisition.

DISCUSSION
Exposure to infectious agents does not always lead to a systemic infection. For instance, epidemiological studies prior to antiretroviral therapy indicated that up to two-thirds of individuals exposed to HIV-1 do not become infected (Fowke et al., 1996; The Working Group on Mother-To-Child Transmission of HIV, 1995). Although dose of viral inoculum (Pedraza et al., 1999) and route (Patel et al., 2014) of exposure are strong predictors of a systemic infection, it has been hypothesized that host genetic differences also moderate susceptibility to viral entry, replication, and a systemic spread (Limou et al., 2009;Liu et al., 1996;McLaren et al., 2013McLaren et al., , 2015Powell et al., 2020). However, aside from CCR5, the host genetic factors involved in susceptibility to HIV-1 acquisition, particularly those related to common genetic variants, remain elusive. Here, we performed the largest genome-wide association meta-analysis and the first multi-tissue TWAS of HIV-1 acquisition to advance our understanding of the genetic factors involved in this trait.
The GWAS identified 25 independent loci with suggestive association, of which one, at the MHC, surpassed genome-wide significance. Although the MHC remains a fundamental target for HIV-1 research (e.g., Pereyra et al., 2010), we were unable to confidently infer protective MHC haplotypes, given the current lack of methods to do this using GWAS summary statistics alone. This is a challenging task considering the large linkage disequilibrium block sizes located in this highly polymorphic region (Dawkins and Lloyd, 2019). However, in our study, we explored the polygenic architecture of HIV-1 acquisition using genetic correlations, partitioned heritability, gene-set enrichment analyses, and a TWAS. This genome-wide characterization of HIV-1 acquisition was needed considering the high h 2 SNP estimated for this trait using the LD score regression method, which disregards SNPs located at the MHC to avoid inflated estimates (Bulik-Sullivan et al., 2015b). Thus, our study represents an important step toward a better understanding of the risk genes involved in HIV-1 acquisition, beyond those explored due to historical relevance alone, which could reveal genuine mechanisms associated with susceptibility, identified through hypothesis-free analyses, as observed in other fields (e.g., psychiatry, see Duarte et al., 2021).
We found that HIV-1 acquisition was associated with a h 2 SNP estimate of 0.21 (0.04), which represented a 25% reduction in heritability relative to what we observed previously in the McLaren study alone (h 2 SNP = 0.28 [0.05]) (Powell et al., 2020). Although increases in sample size improve the power to detect the core genes associated with a GWAS trait, it is plausible that the individual studies included in the meta-analysis also increased the complexity of the phenotype analyzed because of the different case-control selection criteria used in each study (increased phenotype complexity typically decreases heritability estimates; see Tropf et al., 2017). For instance, it is plausible that the Johnson et al. (2015) study, in which cases and controls were matched injecting drug users, may provide insight into the genetic signals that more strongly reflect immune processes. On the other hand, studies analyzing cases versus population controls, such as the McLaren et al. (2013) study, may highlight genetic signals that reflect more socioeconomic or behavioral factors. The observed reduction in heritability is also consistent with the lower estimates of h 2 SNP calculated for susceptibility to infections in general ($2%-7% [Nudel et al., 2019]).
We observed genetic correlations between HIV-1 acquisition and multiple traits that were previously identified in a study from our group, where we performed LD score regression analyses on the McLaren study alone (Powell et al., 2020). For example, we confirmed an association with smoking status, which is consistent with epidemiological findings showing that smoking is more prevalent among HIV-1-positive individuals relative to the general population (Batista et al., 2013). Another correlated trait was ''alcohol usually taken with meals,'' which is a proxy for socioeconomic status (Zhou et al., 2021). It is plausible that this is a reflection of the fact that infection is typically higher among individuals with lower income and fewer years ll OPEN ACCESS iScience 25, 104854, September 16, 2022 7 iScience Article of education (Bunyasi and Coetzee, 2017). As a reflection of the more extensive GWAS catalog used in the present screening, we further identified new traits associated with HIV-1 acquisition, including ''ever had same-sex intercourse'' and ''substances taken for depression.'' These results corroborate known risk factors associated with HIV-1 acquisition, i.e., there is an increased prevalence of HIV-1 among men who have sex with men (Center for Disease Control and Prevention, 2021) and an increased prevalence of depression in people living with HIV-1 (Tran et al., 2019).
To identify the gene expression profiles associated with HIV-1 acquisition, we performed a multi-tissue TWAS. The TWAS identified 15 genes regulated in association with HIV-1 acquisition, of which three genes had their expression considered independent from other genes in their loci (UEVLD, HERC1, and HIST1H4K). None of these genes were identified in a recent TWAS of HIV-1 viral control . Using an agnostic TWAS fine-mapping approach, we found evidence corroborating an association between HIV-1 acquisition and HERC1 expression in EBV-transformed B lymphocytes and RTP4 expression in the aorta. Our findings pertaining to HERC1, UEVLD, HIST1H4K, and RTP4, suggest these genes are important candidates for future research. In fact, the RTP4 protein is a potent inhibitor of pathogenic viruses from the Flaviviridae family (Boys et al., 2020), whereas UEVLD has been found in urinary extracellular vesicles from HIV-1-positive individuals (Anyanwu et al., 2018), and HIST1H4K is a histone shown to bind to HIV-1 Tat (Deng et al., 2001). However, HERC1, which encodes a large ubiquitin ligase protein, was the most interesting candidate identified in our study.
We obtained converging evidence from FUSION's conditional analysis and the FOCUS fine-mapping approach suggesting HERC1 downregulation in EBV-transformed B lymphocytes is robustly associated with HIV-1 acquisition. However, the role of HERC1 in HIV-1 biology remains unclear, particularly in the context of B lymphocytes. Abnormal levels of B lymphocytes are a characteristic of HIV-1 infection (Liechti et al., 2019), but it is plausible that we detected this cis-regulatory effect in EBV-transformed B lymphocytes merely because these cells comprised the only expression dataset in the TWAS consisting of a single immune cell type. Considering that typically 50% of common variants are associated with gene expression regulation in any tissue (GTEx Consortium et al., 2017), it is likely that this regulatory effect also extends to other immune cell types, such as basophils, where HERC1 appears to be more highly expressed. HERC1 is known to regulate the formation of infectious synapses transferring viruses from basophils to T cells, which are crucial for the establishment of a systemic infection (Jiang et al., 2015). Downregulation of this gene is also associated with activation of ERK signaling (Schneider et al., 2018), which is known to decrease the production of the antiviral cytokine IFN-g (Zhang et al., 2018) and to increase HIV-1 infectivity in vitro due to differential phosphorylation of HIV-1 components Vif, Rev, and Tat (Yang and Gabuzda, 1999). The exact function of HERC1 in relation to HIV-1 susceptibility, however, requires further study using functional approaches.
Our study highlights regulatory mechanisms associated with HIV-1 acquisition and one in particular pertaining to the downregulation of HERC1 in immune cells. Analysis of larger genetic cohorts and functional studies of the genes highlighted here are likely to help uncover host mechanisms moderating HIV-1 acquisition, which could ultimately help us to identify novel approaches to tackle and eradicate HIV-1.

Limitations of the study
There are limitations to our study that should be acknowledged. First, the effective sample size of our current study is still limited and represents individuals of European ancestry only, and therefore the analysis of larger and more diverse cohorts is likely to provide additional insight; this is particularly relevant, as we found that the associated loci in Europeans likely have a different LD structure in Africans, and the functional implications of this observation remain unclear. Second, we explored HIV-1 acquisition genetics using a cross-sectional analysis of cases and controls defined based on self-report or immunoreactivity, as opposed to studying at-risk individuals longitudinally using immunoassay tests. Although the latter study design could be better at identifying genetic variants regulating specific immunological mechanisms associated with HIV-1 acquisition, it would also represent a more challenging approach (cohorts likely to be smaller, study likely to be more time-consuming), which is also ethically questionable (following at-risk individuals without ensuring they receive PrEP). Although cross-sectional studies may be influenced by misclassification bias (i.e., population controls are still susceptible to infection) (McLaren and Fellay, 2021), we chose this study design and case-control selection because it would allow us to draw power from large population genetic studies to detect more genetic associations. Ultimately, it is plausible that revisiting historical samples from at-risk individuals, collected prior to the development of PrEP, iScience Article may be a useful way to advance our understanding of HIV-1 acquisition genetics. Third, our TWAS approach assesses only the cis-genetic component of gene expression, and future studies should investigate other regulatory mechanisms (e.g., trans-eQTL effects) associated with HIV-1 acquisition. Finally, although we shed light on the relationship between susceptibility to HIV-1 acquisition and regulated genes, we cannot infer a causal mechanism yet. Functional studies investigating the expression of genes identified here, particularly HERC1 levels in lymphocytes and basophils, in relation to viral susceptibility in vitro, will be crucial to understanding this relationship.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following: iScience Article statistics alongside the other three studies (i.e., Johnson et al., 2015;FinnGen, 2021;UK Biobank [Neale Lab, 2018]), where we repeated our quality control procedure and the downstream processes (e.g., TWAS, conditional analyses, and TWAS fine-mapping).

GWAS functional mapping
We performed a partitioned heritability analysis using LDSC regression (Bulik-Sullivan et al., 2015b;Finucane et al., 2015), according to the author's manual, where only HapMap3 SNPs were analyzed after excluding the MHC locus. LD score weights (baseline model v2.2) and the files corresponding to the European subset of the 1000 Genomes project, used as the reference population, were downloaded from the authors' website. The baseline model used included 97 genomic annotations corresponding to promoter regions, DNA and histone modification sites, DNase I hypersensitivity sites, conserved regions, and others (Finucane et al., 2015). We also analyzed the GWAS summary statistics using The Genetics of Complex Traits Browser (Cué llar-Partida et al., 2019), where independent association signals (variants with p < 1 3 10 À5 , clumped using an R 2 threshold of 0.05 and a 1 Mb window) were tested for enrichment of biological terms using DEPICT and a catalog of 14,462 gene sets (Pers et al., 2015). These sets corresponded to biological annotations reflecting molecular pathways from protein-protein interaction studies, manually curated pathways, and gene sets from mouse knock-out studies. Another enrichment analysis was performed to test the GWAS results for enrichment of genes expressed in specific cell types or tissues, according to 209 Medical Subject Heading (MeSH) annotations derived from 37,427 microarrays (Pers et al., 2015). Only enrichments surviving correction for the number of gene sets tested in each analysis were considered significant (Bonferroni p < 0.05).

Transcriptome-wide association study (TWAS)
We ran a TWAS on the autosomes using default settings in FUSION (Gusev et al., 2016), and the TWAS weights calculated by the authors. We used TWAS weights corresponding to 48 tissues available within GTEx (GTEx Consortium et al., 2017) (including whole blood, EBV-transformed B lymphocytes, colon, multiple brain regions, liver, stomach, lungs, ovary, and others; see details in Table S4). We chose this approach since the impact of susceptibility to HIV-1 in terms of gene expression features across tissues and organs had not yet been explored. We also analyzed two additional blood cohorts (the Netherlands Twin Registry (Wright et al., 2014) and the Young Finns Study (Gusev et al., 2016;Raitakari et al., 2008)), which are better powered to detect heritable expression components, relative to the GTEx whole blood sample. Finally, we included three gene expression models corresponding to cross-tissue expression weights combining all heritable gene expression information from across tissues and individuals within GTEx, which improve the detection of heritable cis expression mechanisms by TWAS (Feng et al., 2021). We used the 1000 Genomes Phase 3 European panel as LD reference for the TWAS and fine-mapping analysis, downloaded from the FUSION website. Association signals were corrected using the Bonferroni method, considering the number of unique genes tested across all weight panels (P cut-off = 0.05/27,540, or 1.82 3 10 À6 ). Plots were generated using the FUSION pipeline and scripts adapted from https://opain.github.io/MDD-TWAS/ (Dall'Aglio et al., 2021). We performed conditional analyses in FUSION to determine whether the significant TWAS associations were independent from the expression of other genes in their respective loci.

TWAS fine-mapping
We used FOCUS (Mancuso et al., 2019) to perform TWAS fine-mapping and to narrow down the genes most likely to be causally related to HIV-1 acquisition. FOCUS uses a standard Bayesian approach to define 90%-credible gene sets (gene sets likely to contain causal genes at susceptibility loci) and calculates the posterior inclusion probability (PIP) for each gene in the region to be causal given the observed TWAS statistics. Genes with PIP >0.50 are more likely to be causal for the association than any other gene, or the null model (i.e., when the causal gene is not present in the TWAS). A FOCUS database was created using the TWAS FUSION weights mentioned above (except for the cross-tissue weights, since the objective of the fine-mapping approach was to define relevant gene-tissue pairs). We applied FOCUS across all tissue-specific SNP weight panels simultaneously using an agnostic analysis to identify genes and tissues more likely to be involved in the trait, without potentially biasing the analysis by prioritizing specific tissues.

Genetic correlations
We