Integrating GWAS and proteome data to identify novel drug targets for MU

Mouth ulcers (MU) have been associated with numerous loci in genome-wide association studies (GWAS). Nonetheless, it remains unclear which mechanisms drive the pathogenesis of mouth ulcers at these loci and which drugs would be most effective. We therefore aimed to screen for hub genes responsible for mouth ulcer pathogenesis. We conducted an in-silico (imputed) proteome-wide association study (PWAS) to discover candidate genes that influence the development of mouth ulcers through the expression and concentration of their proteins in the bloodstream. The integrative analysis revealed 35 genes that play a significant role in the development of mouth ulcers at both the protein and transcriptional levels. Following this analysis, we identified 6 key genes, namely BTN3A3, IL12B, BPI, FAM213A, PLXNB2, and IL22RA2, that were related to the onset of mouth ulcers. By combining multidimensional data, we found six genes that correlate with mouth ulcer pathogenesis, which can be useful for further biological and therapeutic research.


Scientific Reports | (2023) 13:10437 | https://doi.org/10.1038/s41598-023-37177-y

data from oral ulcers and a protein quantitative trait locus (pQTL) dataset obtained from blood. Second, Mendelian randomization (MR) analysis was performed to validate the PWAS-significant genes. Third, the COLOC method was used to combine the GWAS data and blood pQTL data via Bayesian colocalization analysis to determine whether the two association signals share a common causal variant.

PWAS identified 35 genes correlated with MU.
By combining MU GWAS data with blood proteomes, the FUSION pipeline was used to conduct the PWAS of mouth ulcers 17. Based on the Bonferroni correction threshold of P < 0.05/number of genes analyzed, PWAS identified 35 genes with protein abundances related to oral ulcers (Fig. 1 and Supplementary Table 1). These genetic instruments all had F statistics exceeding 10, which indicates strong instruments 18. We constructed a PPI network (Fig. 2A and B) and found that TLR1, IL1RN, and CTSB were the central genes in the protein interaction network. The GO enrichment analysis showed that 1,170 terms were enriched, including 1,065 BP terms, 70 CC terms, and 34 MF terms. Among these GO categories, defense response to bacterium, cytokine-mediated signaling pathway, and toll-like receptor signaling pathway were identified (Fig. 3A). The KEGG enrichment analysis 19 revealed four enriched KEGG terms: Toll-like receptor signaling pathway, Cytokine-cytokine receptor interaction, Tuberculosis, and Pertussis (Fig. 3B). Together, these findings indicate that the identified genes are involved in inflammation.

Figure 1.
Manhattan plot for the discovery mouth ulcers PWAS integrating the mouth ulcer GWAS with human blood proteomes. Each point represents a single association test between a gene and mouth ulcers ordered by genomic position on the x axis and the association strength on the y axis as the − log10(P) of a z-score test.

Discussion
Based on MU GWAS data and blood-derived proteome data, a combination of PWAS, MR, and Bayesian colocalization analysis was used to screen crucial genes for mouth ulcers. PWAS revealed 35 genes whose protein abundance was associated with mouth ulcers. In the MR analysis, 30 genes showed an association with mouth ulcers. Ultimately, we identified 6 potential risk genes for mouth ulcers (BTN3A3, IL12B, BPI, FAM213A, PLXNB2, and IL22RA2) with altered protein abundances in the blood. Research into these genes may provide mechanistic insights and therapeutic targets. Human genetics research aims to identify therapeutic targets for diseases, which is especially essential for mouth ulcer research. Of the identified genes, IL12B has been associated with the risk of recurrent oral ulcers 20 and peptic ulcer disease 21. IL12B is also closely related to the development of inflammation 22,23. It has been reported that overexpression of BPI inhibits Treg differentiation and triggers exosome-mediated inflammatory responses in systemic lupus erythematosus 24. Oh et al. demonstrated that FAM213A has prognostic significance in tumor development through the regulation of oxidative stress, in contexts such as myelopoiesis 25 and oral carcinoma 26. Zhang et al. 27 reported that the synergistic effect of CD100 and PlxnB2 promotes the inflammatory response of keratinocytes through the activation of NF-κB and NLRP3 inflammasomes and is involved in the pathogenesis of psoriasis. Several studies demonstrated that IL22RA2 is involved in the inflammatory process 28-30. The expression of several butyrophilin (BTN) and butyrophilin-like (BTNL) molecules, including BTN1A1, BTN2A2, BTN3A3, and BTNL8, is significantly altered by inflammation 31 and associated with tumor development 32-34. Several advantages can be drawn from our study.
First, our PWAS for mouth ulcers included the largest and most comprehensive human proteome and GWAS data. By integrating multidimensional QTL data, we were able to gain a comprehensive understanding of the complex biology of MU in blood. Second, Bayesian colocalization allowed us to estimate whether two association signals at a given locus share a common causal variant, and the candidate causative proteins of oral ulcers (BTN3A3, IL12B, BPI, FAM213A, PLXNB2, and IL22RA2) were validated. Finally, this study analyzed protein levels associated with oral ulcers using PWAS. Blood protein screening for mouth ulcers may help provide greater insight into those at high risk for MU recurrence. There are also some limitations in our study. First, pQTL mapping does not resolve all GWAS signals. It is difficult to explain how genes are involved in the biological development of oral ulcers at a single level, such as the protein level. More studies, such as mQTL analysis, single-cell sequencing, and whole-genome sequencing, are needed to design tailored treatments and fully understand the molecular mechanisms of mouth ulcers. Second, a larger MU GWAS dataset will be necessary to validate our analysis, since we analyzed only one mouth ulcer dataset. Third, protein- and transcript-level data alone are not sufficient to explain the numerous loci identified by MU GWAS. To gain a deeper understanding of disease progression, methylation data can be integrated into the analysis. Fourth, other ancestries should be taken into consideration when extending our findings. Additionally, this research examines predicted protein levels using an in-silico investigation. To strengthen the validation of the results, it would be preferable to have an independent sample with measured rather than predicted proteomics.
Functional genomics and biological experiments must be conducted to elucidate the molecular mechanisms underlying mouth ulcers. Fifth, functional studies and/or genetic evidence suggest that although the identified genetic variations are directly involved in the pathogenesis of mouth ulcers, the underlying mechanisms of the disease are multifactorial, which needs to be taken into consideration.
In conclusion, we found strong evidence supporting six novel blood proteins (BTN3A3, IL12B, BPI, FAM213A, PLXNB2, and IL22RA2) associated with mouth ulcers. In our study, we provided suggestions for future biological and therapeutic studies to verify their potential roles in MU.

Method
The present MR analysis was based on summary data from previous studies 35,36 that had gained written informed consent and ethics approval. No ethical permit is required for the secondary analysis of summary data.

GWAS data for mouth ulcers. A GWAS summary dataset for mouth ulcers was collected from the UK Biobank (UKB) participants of European ancestry for the present study 7. By completing questionnaires and interviews, undergoing physical measurements, and donating biological samples, participants provided information pertinent to health outcomes in adulthood and later life (data showcase available at http://www.ukbiobank.ac.uk) 37. The UKB was used for the GWAS on mouth ulcers, in which all participants completed a baseline questionnaire regarding oral health. The term "Mouth ulcers (yes/no)" was defined as having had mouth ulcers within the past year.
Human blood proteomic data. The serum proteomic data were obtained from a large population-based study (Atherosclerosis Risk in Communities (ARIC) study; N ~ 9000) 36. Between 1987 and 1989, 15,792 participants were recruited from four communities in the U.S. for the ARIC study: Forsyth County, North Carolina; suburban Minneapolis, Minnesota; Washington County, Maryland; and Jackson, Mississippi. Blood samples for proteomic measurements were acquired during the third visit (v3) in 1993-1995. After excluding participants without genotype data, the current study retained 9084 participants with plasma protein data. Protein levels were determined with a modified-aptamer proteomics platform ('SOMAmer Reagent', hereafter referred to as SOMAmers), which measures the levels of 4,657 human proteins.
FUSION was used to estimate the effect of SNPs on protein abundance for proteins with significant heritability (heritability P < 0.01). Several predictive models were used in the analysis, including top1, blup, lasso, enet, and bslmm 38. Protein weights were selected from among these predictive models. We then integrated the genetic effect on oral ulcers (mouth ulcer GWAS z-scores) with the protein weights to perform the PWAS of oral ulcers using FUSION. The test statistic is a linear sum of the GWAS z-scores of independent SNPs at the locus, each multiplied by its weight. To reduce false positives, Bonferroni-corrected P values were used. The Benjamini-Hochberg (BH) method was also used to adjust P values for the false discovery rate.
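As a rough illustration of the weighted linear sum and the multiple-testing corrections described above, the following Python sketch computes a summary-based association z-score from per-SNP weights, GWAS z-scores, and an LD correlation matrix, and then applies Bonferroni and Benjamini-Hochberg adjustments. The scaling by the LD matrix follows the commonly cited form of the FUSION statistic, z = w'z / sqrt(w'Rw). All function names and inputs here are hypothetical and are not taken from the study's pipeline.

```python
import math
import numpy as np

def pwas_z(weights, gwas_z, ld):
    """Weighted linear sum of GWAS z-scores, scaled by its LD-adjusted variance.

    weights: per-SNP protein prediction weights (hypothetical input)
    gwas_z:  per-SNP GWAS z-scores for the trait
    ld:      SNP-by-SNP LD correlation matrix for the same SNPs
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    r = np.asarray(ld, dtype=float)
    var = w @ r @ w
    if var <= 0:
        raise ValueError("weights give a non-positive variance")
    return float(w @ z / np.sqrt(var))

def two_sided_p(z):
    """Two-sided P value of a z-score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def bonferroni_threshold(n_tests, alpha=0.05):
    """Significance threshold alpha / number of tests."""
    return alpha / n_tests

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest P value.
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adj = np.empty(n)
    adj[order] = np.clip(adj_sorted, 0.0, 1.0)
    return adj
```

With an identity LD matrix the statistic reduces to the weighted sum divided by the Euclidean norm of the weights, which is a convenient sanity check before plugging in a real LD reference panel.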
Mendelian Randomization (MR) analysis. MR was used to test whether the PWAS-significant genes obtained from FUSION were related to mouth ulcers through their cis-regulated protein abundance. The most significant SNPs (P < 5 × 10⁻⁶) were selected, and LD clumping (r2 threshold of 0.01) was used to determine independent SNPs. Data from the QTLs and the MU GWAS were harmonized so that effects referred to the same effect alleles. When only one independent QTL was available, the Wald ratio was used to estimate the association between the gene and mouth ulcers. When multiple SNPs were available, the ratios of SNP-outcome to SNP-exposure effects were combined using the inverse variance weighted (IVW) method with random-effects meta-analysis. In addition, when the number of genes exceeded three, horizontal pleiotropy was tested using the MR-Egger method 39,40. Bonferroni correction thresholds were set at P < 0.05/number of genes analyzed. The two-sample Mendelian randomization analysis was performed in R version 4.0 using the "TwoSampleMR" package, version 0.5.5.
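The two estimators named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the TwoSampleMR implementation: the Wald ratio uses a first-order standard error that ignores uncertainty in the SNP-exposure effect, and the IVW estimate uses a multiplicative random-effects adjustment in which the standard error is inflated only when the residual heterogeneity exceeds 1. Function names and inputs are hypothetical.

```python
import math

def wald_ratio(beta_exp, beta_out, se_out):
    """Single-SNP causal estimate: outcome effect divided by exposure effect."""
    est = beta_out / beta_exp
    se = se_out / abs(beta_exp)  # first-order approximation
    return est, se

def ivw(beta_exp, beta_out, se_out):
    """Inverse variance weighted estimate with multiplicative random effects.

    Equivalent to a weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_out**2.
    """
    n = len(beta_exp)
    if n < 2:
        raise ValueError("IVW needs at least two SNPs; use wald_ratio for one")
    w = [1.0 / s ** 2 for s in se_out]
    denom = sum(wi * bx * bx for wi, bx in zip(w, beta_exp))
    est = sum(wi * bx * by for wi, bx, by in zip(w, beta_exp, beta_out)) / denom
    se_fixed = math.sqrt(1.0 / denom)
    # Residual standard error measures heterogeneity between SNP estimates.
    rss = sum(wi * (by - est * bx) ** 2 for wi, bx, by in zip(w, beta_exp, beta_out))
    sigma = math.sqrt(rss / (n - 1))
    return est, se_fixed * max(1.0, sigma)

def two_sided_p(est, se):
    """Two-sided P value of est / se under the standard normal."""
    return math.erfc(abs(est / se) / math.sqrt(2.0))
```

A resulting P value would then be compared against the Bonferroni threshold of 0.05 divided by the number of genes analyzed.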

Bayesian colocalization analysis.
To determine whether the MU risk loci and pQTLs shared the same causal signal, we used the Coloc Bayesian test for colocalization 17. The default COLOC priors were used: P1 = 10⁻⁴, P2 = 10⁻⁴, and P12 = 10⁻⁵, where P1 is the prior probability that a given variant is associated with mouth ulcers, P2 is the prior probability that a given variant is associated with protein abundance (a pQTL), and P12 is the prior probability that a given variant is associated with both mouth ulcers and protein abundance. Five mutually exclusive hypotheses were tested: H0, no relationship with either GWAS or pQTL; H1, relationship with GWAS and no relationship