Identification and Interpretation of eQTL and eGenes for Hodgkin Lymphoma Susceptibility

Genome-wide association studies (GWAS) have revealed approximately 100 genomic signals associated with Hodgkin lymphoma (HL); however, their target genes and underlying mechanisms causing HL susceptibility remain unclear. In this study, transcriptome-wide analysis of expression quantitative trait loci (eQTL) was conducted to identify target genes associated with HL GWAS signals. A mixed model, which explains polygenic regulatory effects by the genomic covariance among individuals, was implemented to discover expression genes (eGenes) using genotype data from 462 European/African individuals. Overall, 80 eGenes were identified to be associated with 20 HL GWAS signals. Enrichment analysis identified apoptosis, immune responses, and cytoskeletal processes as functions of these eGenes. The eGene of rs27524 encodes ERAP1 that can cleave peptides attached to human leukocyte antigen in immune responses; its minor allele may help Reed–Sternberg cells to escape the immune response. The eGene of rs7745098 encodes ALDH8A1 that can oxidize the precursor of acetyl-CoA for the production of ATP; its minor allele may increase oxidization activity to evade apoptosis of pre-apoptotic germinal center B cells. Thus, these minor alleles may be genetic risk factors for HL susceptibility. Experimental studies on genetic risk factors are needed to elucidate the underlying mechanisms of HL susceptibility and improve the accuracy of precision oncology.


Introduction
Hodgkin lymphoma (HL), distinguished from non-Hodgkin lymphoma (non-HL) by the presence of Reed-Sternberg (RS) cells, is one of the most common lymphomas. Research efforts have been made to discover the corresponding risk factors for susceptibility to HL, such as age [1], sex [2], family history [3], viral infection (Epstein-Barr virus [4], and human immunodeficiency virus [5]), autoimmune disease [6], pollution exposure [7], and cigarette smoking [8]. In particular, genetic factors have been greatly concerned because of the large concordance within the family. For example, a cohort study for 54 years with 57,475 first-degree relatives of 13,922 HL patients in Denmark, Finland, Iceland, Norway, and Sweden showed that a risk estimate of HL in first-degree relatives of a patient was 3.3-fold greater (95% confidence interval = 2.8-3.9) than that in the general population [3]. A case-control study with 128 childhood HL cases and 848 controls in France reported a significant association of HL with its family history (odds ratio = 5.4; 95% confidence interval = 1.3-22.0) [9]. Nevertheless, knowledge regarding genetic risk factors is limited, especially when compared with the extensive understanding of other cancers, including non-HL.
Early studies have demonstrated that the RS cells in patients with HL may have crippling mutations in genes encoding immunoglobulin (Ig) [10,11]. These studies revealed that RS cells originate from pre-apoptotic germinal center B cells that evade apoptosis. Differential gene expression was reported between HL cases and controls, including in genes

Statistical Analysis for Identifying eQTL
Transcriptome-wide association analysis was conducted to identify eGenes associated with previously reported HL GWAS signals. The association between each eQTL and eGene was statistically tested in a mixed model framework that explains the polygenic effect, thus reducing spurious eGenes [46]. This polygenic effect is explained by the genomic covariance structure of individuals and can be included as a random effect in the corresponding analytical model [47]. The mixed model was produced under the assumption of an additive genetic model, as follows: where y is the n × 1 vector with elements of standardized gene expression levels, n is the number of individuals with gene expression levels, β is the scalar of the fixed candidate nucleotide variant effect, x is the n × 1 design vector for the fixed effect, g is the n × 1 vector with random polygenic effects, and ε is the n × 1 vector with random residual elements. The design vector x consists of 0, 1, and 2 according to the number of minor alleles in the corresponding individual. The random variables in the mixed model were assumed to have normal distributions, with the following parameters: where G is the n × n genomic similarity matrix, σ 2 g is the polygenic variance component, I is the n × n identity matrix, and σ 2 ε is the residual variance component. Elements (g jk ) of the genomic similarity matrix G consist of all pairwise genomic similarity coefficients [47] and were calculated using genotypic information of individuals, as follows: where g jk is the genomic similarity coefficient between individuals j and k, n v is the number of all variants that contribute to the genomic similarity, τ ij and τ ik are the numbers of minor alleles at nucleotide variant i of individuals j and k, respectively, and f i is the frequency of the minor allele [48]. The fixed nucleotide variant effect can be underestimated through a partial redundancy in random polygenic effects via nucleotide variants in linkage to the candidate variant. To avoid this problem in the current eQTL analysis, the genomic similarity coefficient was estimated by excluding all nucleotide variants located on the same chromosome as the candidate variant. Polygenic and residual variance components were estimated using restricted maximum likelihood (REML). The average information algorithm was used to obtain REML estimates of the variance components. Subsequently, the fixed candidate nucleotide variant effect was obtained with variance component estimates by solving Henderson's mixed model equations [48]. The fixed nucleotide variant effect was tested for eQTL. All processes were replicated for each potential eGene-eQTL pair. Mixed model analysis was conducted using GCTA software at https://gump.qimr.edu.au/gcta (accessed on 16 July 2021) [49].
Each eGene-eQTL association was determined by multiple testing with a significance threshold value of 4 × 10 −5 , which was slightly more conservative than the Bonferroni correction. Linkage disequilibrium (LD) blocks were constructed using Haploview [50], with the algorithm established by Gabriel et al. [51], to determine the significantly associated variant as an independent eQTL signal.

Functional Analysis of eQTL
The regulatory functions of eQTL in eGene expression were predicted using deep learning-based artificial intelligence methods. We utilized the ExPecto program to predict tissue-specific transcriptional effects of sequence variants within the LD blocks of cis-eQTL. Transcriptional functions were predicted by the convolutional neural network model trained with all 2002 available profile data containing histone marks, transcription factors, and DNA accessibility information (https://github.com/FunctionLab/ExPecto (accessed on 20 October 2022)) [57].
The roles of the identified functional variants were also predicted using the Sei programs (https://github.com/FunctionLab/sei-framework (accessed on 20 October 2022)) [58]. These programs used 21,907 peak calls of transcription factor binding (9471), histone marks (10,064), and chromatin accessibility (2372); this information was taken from the international consortia of the Cristrom Project [59], ENCODE [60], and Roadmap Epigenomics [61]. A convolutional neural network was employed to predict specific regulatory functions. To specify regulators binding to functional variants, atSNP (http://atsnp.biostat.wisc.edu/ search (accessed on 3 January 2023)) was used [62]. This tool predicts specific transcription factor binding to functional variants via statistical testing of the maximum binding affinity score using subsequences encompassing the variant position within a position weight matrix that is preliminarily assigned from the ENCODE [60] and JASPAR [63] motif libraries. Differential binding affinity by allele was determined using the likelihood ratio test.

Functions of Cis-Regulatory eQTL
Functional analysis of 11 cis-eQTL revealed 12 functional variants within the LD blocks of 8 eQTL in Epstein-Barr virus-transformed lymphocytes ( Table 2). Seven of the 12 functional variants were identified as enhancers. In particular, rs2858870 was determined to serve as an enhancer of four genes (HLA-DRB5, HLA-DQB1, HLA-DQA1, and TAP2). Alternatively, rs27524 was found to be a transcriptional regulator of both ERAP1 and ERAP2.

Functions of Cis-Regulatory eQTL
Functional analysis of 11 cis-eQTL revealed 12 functional variants within the LD blocks of 8 eQTL in Epstein-Barr virus-transformed lymphocytes ( Table 2). Seven of the 12 functional variants were identified as e nhancers. In particular, rs2858870 was determined to serve as an enhancer of four genes (HLA-DRB5, HLA-DQB1, HLA-DQA1, and TAP2). Alternatively, rs27524 was found to be a transcriptional regulator of both ERAP1 and ERAP2.    4 Linkage disequilibrium (r 2 ); 5 Regulatory functions were predicted using deep learning sequence models in the Sei framework. The parentheses for "regulatory function" indicate the tissues in which the fSNP was reported to regulate [58].

Discussion
This study identified 80 eGenes associated with HL-specific GWAS signals (p < 4.00 × 10 −5 ), belonging to functional groups involved in abnormal cell growth and having the potential to invade other parts of the body. Specifically, these functions include cytoskeletal processes, cell cycle, apoptosis, cell differentiation, and cell proliferation. In particular, apoptosis is more critical in the pathogenesis of HL than in other cancers because apoptotic evasion by the pre-apoptotic germinal center B cells leads to malignant transformation, thereby producing RS cells, the pathological signature of HL [10,11]. Some eGenes are directly associated with apoptosis. TNFRSF10C functions as an antagonistic receptor that protects cells from apoptosis induced by TNF-related apoptosis-inducing ligand (TRAIL) activity [64]. In contrast, KHDC1 activates cysteine-type endopeptidase to induce apoptosis by cleaving caspase-8 [65]. Further, PUS10, cleaved by caspases 8 and 3 during TRAILinduced apoptosis, may be involved in the release of cytochrome c, thereby amplifying mitochondrial apoptosis signaling [66]. Enrichment analysis using eGene sets enabled us to reveal the specific functions of these genes involved in apoptosis. In particular, terms associated with "oxidoreductase activity" were enriched in this analysis. Oxidoreductases consist of oxidase, dehydrogenase, and reductase enzymes. The current analysis resulted in the "negative regulation of oxidoreductase activity" for oxidase and "oxidoreductase activity with aldehyde/ketone as a donor and NAD/NADP as an acceptor" for aldehyde dehydrogenase (ALDH). These results are plausible considering the control of oxidoreductase: oxidase can increase reactive oxygen species (ROS) to enhance apoptosis [67], whereas ALDH can inhibit the apoptosis-inducing pathway [68]. These results aligned with a previous experimental HL study in which a decrease in ROS and a high activation of ALDH were observed (p < 0.05) in RS cells (L1236 and L428) [69]. Another study demonstrated a reduced expression of genes composing the NADPH complex in HL cell lines [70], suggesting that the decreased ROS in HL cells might be caused by the downregulation of NADPH oxidase. High activation of ALDH can increase glycolysis [71,72]; additionally, acetyl-CoA, required for oxidative phosphorylation (OXPHOS), has also been observed to increase in expression with enhanced glycolysis in RS cells [73,74]. Overall, the upregulated activity of OXPHOS enables the production of ATP for survival, thus evading apoptosis [74,75]. Moreover, highly activated ALDH has been reported to protect corneal cells from apoptosis by oxidizing cytotoxic 4-hydroxynonenal (4-HNE) [76].
These corresponding changes in oxidase and ALDH levels in RS cells enabled us to strongly postulate that these genes possess important roles in the pathogenesis of HL. This hypothesis may be strengthened by the eGenes identified in the present study, considering their direct regulation of oxidase and ALDH. For example, MT3, which is associated with the SNP rs112998813, can bind to free zinc and reduce zinc levels in cells [77,78]. Such a reduction in zinc may downregulate NADPH oxidase in a protein kinase C (PKC)dependent manner [79]. PKC has two zinc-binding motifs and is activated in a zincabundant environment [80]. PKC is critical for phosphorylation-dependent assembly and activation of NOX, a key component of NADPH oxidase [81]. OXA1L, associated with the rs6439924 SNP, is an essential factor for translocating the N-terminal of cytochrome c oxidase subunit 2; therefore, OXA1L affects the efficiency of OXPHOS, which involves cytochrome c oxidase as complex IV [82]. Moreover, some ALDH family genes (ALDH3A1 and ALDH8A1) were associated with rs7745098. ALDH3A1 can remove the presence of 4-HNE by oxidation [76]; alternatively, ALDH8A1 can oxidize a precursor to acetyl-CoA in the kynurenine pathway [83,84].
Since spurious eGenes might result from the strong linkage of the HLA genomic region, we excluded terms related to immune response (e.g., aminopeptidase and MHC) in the enrichment analysis. Nevertheless, the immune response might be important for the pathogenesis of HL since RS cells evade the immune response, thereby expanding the growth and proliferation of the RS cells within the tumor microenvironment [85].
This study showed the most significant association between ERAP1 expression and its intronic variant rs27524 (p = 3.60 × 10 −38 ). This strong association was supported by the GTEx study, in which the transcriptional association of ERAP1 was found with this variant (GTEx v8, lymphocytes, p = 9.40 × 10 −25 [86]; eQTLGen, whole blood, p = 1.78 × 10 −23 [87]). ExPecto and Sei analyses in the current study determined that the rs27524 variant was a very functional regulator in the transcription of ERAP1. This nucleotide variant has also been detected in B cells via histone ChIP-seq data obtained from the Roadmap Epigenomics Consortium [61]. Moreover, the rs27524 variant may bind to 37 transcription factors that were predicted to have differential allelic binding affinity using atSNP (p < 0.05) [62]. In particular, NF-κB is a well-known transcription factor regulating ERAP1 expression [88] and was discovered by its motif (NFKB_disc3) in B lymphocytes (GM10847, ENCODE) [60]. Furthermore, NF-κB activity is constitutive and strong in RS cells [89][90][91]. In the GM10847 cell line, NF-κB showed a higher binding affinity to the minor (A, adenine) allele at rs27524 (p = 4.71 × 10 −2 ) than the major (G, guanine) allele (p = 5.36 × 10 −1 ). Moreover, BCL3, which acts as a regulator together with NF-κB in the transcription of genes, has also been discovered in B lymphocytes (K562, ENCODE) [60]. Our results indicated that BCL3 might be a repressor since it had a lower binding affinity to the A allele (p = 1.72 × 10 −1 ) of rs27524 than the G allele (p = 1.80 × 10 −2 ). This potential repression concurred with previous studies [92][93][94][95][96] where BCL3 bound to NF-κB to inhibit the corresponding function of the NF-κB pathway.
Overall, these findings suggested that individuals homozygous for the minor allele A of rs27524 were more susceptible to HL than those homozygous for the major allele G ( Figure 4A). The A allele may decrease the repressor activity of BCL3, thus inducing stronger binding of this protein to NF-κB. As a result, this NF-κB repression increases the transcriptional activity of ERAP1, as shown in the current study, in which ERAP1 mRNA was observed at a higher level in those with the A allele than those with the G allele. The high ERAP1 expression with the A allele was also observed with 97 healthy individuals in a previous study [97]. On the other hand, we found that the A allele decreased the mRNA of another eGene, ERAP2 (p = 1.52 × 10 −7 , Table 1). This decrease in ERAP2 transcription may produce an undesirable imbalance between the two endoplasmic reticulum aminopeptidases, ERAP1 and ERAP2, resulting in a reduction in the supply of tumor-associated peptide antigens to HLA class 1 [98]. The attenuated HLA class 1 activity may decrease antigen presentation in RS cells and hinder the recognition of these RS cells by cytotoxic T cells [99,100]. Therefore, functional weakness of HLA class 1 may lead to immune escape and development of the tumor microenvironment in HL. Overall, this hypothesis suggests an underlying mechanism for the association signal of rs27524 which was identified from a European-based GWAS with 1200 HL cases and 6417 controls (GCST001387 [32]). Moreover, this study may explain the imbalance between ERAP1 and ERAP2 observed in various tumor cell lines, such as leukemia, lymphoma, carcinoma, and melanoma [101], and in tumor-transformed tissues of the breast, ovary, lung, colon, and thyroid [102]. The immune response escape of tumor cells by the imbalance of ERAP2/ERAP1 was further supported by prior studies [101,102], in which a decreased HLA class 1 surface expression was observed together with this imbalance, thereby reducing the tumor-associated peptide antigen presentation. was identified from a European-based GWAS with 1200 HL cases and 6417 controls (GCST001387 [32]). Moreover, this study may explain the imbalance between ERAP1 and ERAP2 observed in various tumor cell lines, such as leukemia, lymphoma, carcinoma, and melanoma [101], and in tumor-transformed tissues of the breast, ovary, lung, colon, and thyroid [102]. The immune response escape of tumor cells by the imbalance of ERAP2/ERAP1 was further supported by prior studies [101,102], in which a decreased HLA class 1 surface expression was observed together with this imbalance, thereby reducing the tumor-associated peptide antigen presentation.  We propose another hypothesis corresponding to the second strongest eQTL signal (p = 9.36 × 10 −19 ) of rs7745098 associated with ALDH8A1: individuals homozygous for its minor allele C were more susceptible to HL than those homozygous for its major allele T ( Figure 4B). This eQTL was also found in previous studies (GTEx v8, lymphocytes, p = 5.20 × 10 −9 [86]; eQTLGen, whole blood, p = 2.12 × 10 −250 [87]). In the current study, it was determined that these associations may be attributed to a functional variant, rs6930223, in linkage with rs7745098 (LD = 0.85) because rs6930223 was detected as an enhancer variant in the transcription of ALDH8A1 using deep learning tools (ExPecto [57]; Sei [58]), enhancer databases (Genehancer, GH06J135100 [103]; Ensembl, ENSR00001116792 [52]), histone ChIP-seq data (Roadmap Epigenomics Consortium, H3K4me1 and H3K27ac [61]), and chromatin states in B cells from the Roadmap Epigenomics Consortium (Core 15-state model [59]; 25-state model with 12 imputed marks [61]). In particular, using atSNP [62], activator protein 1 (AP1), which controls apoptosis [104] and is constitutively expressed in RS cell lines [105], was predicted, by its motif (AP1_disc10), to bind to rs6930223 in B lymphocytes (ENCODE, GM12878 [60]). Differential binding affinity was significantly observed between its alleles, G (p = 1.32 × 10 −5 ) and T (p = 1.98 × 10 −1 ). This suggests that the stronger binding of the enhancer variant (G allele) may increase ALDH8A1 transcription. Increased ALDH8A1 expression enhances the oxidation of 2-aminomuconate semialdehyde to 2-aminomuconic acid, the precursor of acetyl-CoA in the kynurenine pathway [81,82]. Subsequently, acetyl-CoA upregulates OXPHOS, producing more ATP for improved survival [74,75]. As a result, this process helps these cells to evade apoptosis, thus producing RS cells 75,76]. This hypothesis explains the underlying mechanism of the HL GWAS signal rs7745098, identified from a European-based meta-GWAS with 1465 HL cases and 6417 controls (GCST002237 [33]).
In addition to these findings, we suspected that the enhanced RS cells also had weakened cytoskeletal function [106]. The current eQTL analysis revealed cytoskeleton-related eGenes ( Figure 2B) with cytokinetic functions during cell division. For example, the corresponding protein of CDC42EP5, identified as an eGene of rs2069757, activates SEPT9, a septin protein critical for actomyosin contractility and abscission of the cell during late cytokinesis [107]. CEP76, an eGene of rs3806624, inhibits centriole duplication by repressing CP110 and inducing cell division failure via multipolar spindles [108]. Therefore, the failure of cytokinesis in the cell division process may be attributed to these genes, resulting in the production of RS cells with multiple nuclei [106].

Conclusions
In this study, we identified 80 eGenes from 20 HL GWAS signals. These eGenes were enriched in functions critical to HL susceptibilities, such as apoptosis, immune response, and cytoskeletal function. In particular, ERAP1 trims peptides to load onto HLA for immune responses; the minor allele at its eQTL, rs27524, may help RS cells to escape this immune response. Further, ALDH8A1 oxidizes the acetyl-CoA precursor to produce ATP for cell survival; the minor allele at rs6930223 may upregulate oxidization in pre-apoptotic germinal center B cells, resulting in apoptosis evasion. The roles of these eGenes may reflect RS cell-associated pathogenesis, a signature characteristic unique to HL. The variants, rs27524 and rs6930223, are both common over various populations, with minor allele frequency ranging from 0.27 (Latin American) to 0.43 (African) and from 0.27 (African) to 0.50 (European), respectively [109]. These minor alleles have been proposed as genetic risk factors for susceptibility to HL. Odds ratio estimates have been reported in previous European GWAS for HL; 1.22 (95% CI = 1.11-1.33) for rs27524 [32] and 1.21 (95% CI = 1.14-1.29) for rs7745098 in a strong linkage with rs6930223 (r 2 = 0.85) [33]. Accurate risk estimates of the functional variants should be specifically obtained for different ethnic populations and/or considering different environmental exposures. Further research on the various genetic risk factors is needed to understand the genetic architecture and to improve the accuracy of precision oncology. Although this study is based on GWAS signals, the pathological scenarios were inferred largely through gene expression associated with genetic variation in normal individuals. Experimental studies on genetic risk factors for HL susceptibility, especially in patients or animal models, would elucidate the scenarios on the underlying pathogenic mechanism suggested in this study.  Institutional Review Board Statement: This study was exempt from IRB review because we used publicly available population data and subjects could not be identified.

Informed Consent Statement: Not applicable.
Data Availability Statement: The data used in this study are publicly available in the GEO DataSets (Accession No. GSE61742).