Associations of Genetic Variants Contributing to Gut Microbiota Composition in Immunoglobin A Nephropathy

The gut microbiota and host genetics are implicated in the pathogenesis of IgAN. Recent studies have confirmed that microbial compositions are heritable (microbiome quantitative trait loci [QTL]).

I mmunoglobulin A nephropathy (IgAN) is the most common form of primary glomerulonephritis, but its pathogenesis is not well understood. Current knowledge indicates that defects in IgA1 glycosylation that lead to the formation of immune complexes are at the center of its pathogenesis (1,2). Genome-wide association studies (GWASs) in IgAN support that galactose-deficient IgA1 (Gd-IgA1) is heritable (3,4), and they also suggest that some GWAS loci, such as CARD9, TNFSF13, and PSMB8, are shared among IgAN, inflammatory bowel disease (IBD), and bacterial infections (5). These findings highlight an important role for the intestinal immune response to mucosal pathogens in IgAN.
More recent studies directly checking the role of the gut microbiota indeed supported the role of the gut-kidney axis in IgAN, evidenced by studies in both human samples and mouse models (6)(7)(8). Functional studies suggested that bacterial infections, as well as some of their metabolites, could induce immune hyperresponsiveness, resulting in the overproduction of IgA and proinflammatory cytokines (9,10). Moreover, the use of immunosuppressants targeting excessive mucosal immune responses and broad-spectrum antibiotics targeting gut microbes had shown effectiveness in treating IgAN (8,11).
Similar to IBD, while it is known that both host genetics and the microbiome influence the development of the disease, how they precisely interact is less well understood (6,12,13). Recent studies have proven that the composition of the gut microbiota is heritable, and host-microbe interactions have a role in shaping the genetic architecture of IBD (14). Thus, in this pilot study, we aim to check the role of the microbiota in the etiology of IgAN in terms of host genetic susceptibility. The flowchart of the present study is shown in Fig. 1.

RESULTS
Genetic associations between genetic variants affecting the gut microbiome and susceptibility to IgAN. As described in Materials and Methods, 136 genetic variants were left for genetic associations in this large cohort after quality control. Although we did not observe associations with genome-wide significance (5 Â 10 28 ), nine variants were associated with IgAN with suggestive P values ( Table 1). As two single nucleotide polymorphisms (SNPs) (rs1889714 and rs1248290) belonged to the same gene region and were in high linkage disequilibrium (r 2 = 1.0), rs1889714, which had a lower P value, was selected for further analysis.
Genotype and subphenotype associations at enrollment. Next, we analyzed the associations between the risk variants and clinical subphenotypes of IgAN ( Fig. 2; see also Table S1 in the supplemental material) under additive models unless genotypic counts were ,10. The risk genotypes of LYZL1 rs1889714-AA/AG were associated with higher serum levels of Gd-IgA1 and lower body mass index (BMI) ( Fig. 2A and B). The risk genotype of SIPA1L3 rs148330122-CC was associated with a younger age at onset (Fig. 2C). The risk genotype of TTLL2 rs3010562-TT was associated with male predominance (Fig. 2D).
Risk genotypes of PLTP rs6065904-AA/AG were associated with worse kidney function and more severe microscopic hematuria (MH) (Fig. 2E to G). The risk genotypes of AL365503.1 rs9363741-GG/GA were associated with a younger age at onset and more severe hematuria ( Fig. 2H and I). Of note, the associated clinical subphenotypes, such as age, sex, and BMI, had already been suggested to be strongly associated with the composition of the gut microbiota (15,16).
Prognosis associations. In the current cohort, 409 patients with IgAN were regularly monitored. The median duration of follow-up was 6.5 years. We observed a significant association between genotypes of AL392086.3 and the risk of end-stage renal disease (ESRD), which occurred in 22 out of 103 patients (21.36%) with rs7083345-CT/TT genotypes and 35 out of 291 patients (12.03%) with the CC genotype. Kaplan-Meier analysis showed that the cumulative renal survival rate was significantly lower in patients with rs7083345-CT/TT genotypes (Fig. 3).
Consistent with previous reports (17,18), the univariate analysis indicated that renal function, Gd-IgA1, and proteinuria were the prognostic factors in IgAN. Moreover, the risk genotype of rs7083345 was observed to be an additional prognostic factor. In the multivariate Cox regression analysis, rs7083345-CT/TT genotypes, lower estimated glomerular filtration rate (eGFR), and higher time-averaged proteinuria were independently associated with poorer renal prognosis (Table 2). Microbiome QTL annotations. By data mining of the previous results on microbiome quantitative trait loci (QTL) ( Table 3), it was shown that the risk genotypes of rs1889714 (LYZL1) in IgAN were negatively correlated with the abundance of Dialister, which has been reported to be significantly decreased in patients with IgA vasculitis or Crohn's disease (19,20). The risk genotypes of rs148330122 (SIPA1L3) and rs7083345 (AL392086.3) in IgAN were reported to be negatively correlated with the abundance of Bacilli, and the risk genotype of rs3010562 (TTLL2) in IgAN was reported to be negatively associated with the abundance of Anaerofilum. These microbiota members were suggested to be beneficial in maintaining intestinal homeostasis (21,22). On the other hand, the risk genotype of rs6065904 (PLTP) in IgAN was positively associated with the abundance of Erysipelotrichaceae, and risk genotypes of rs9363741 (AL365503.1) and rs12541437 (RAD21-AS1) in IgAN were reported to be positively associated with the abundance of Lachnobacterium (14). Both Erysipelotrichaceae and Lachnobacterium were suggested to be detrimental to gut disease (23,24).
Gut microbiota in patients with IgAN. We included two independent data sets to check the diversity and structure of the microbial community using the 16S rRNA technique. After quality control by removing the abnormal samples due to low connectivity (,2.5), we had 87 cases with IgAN (age, 38 The alpha diversity of bacterial communities was evaluated according to the Chao1 diversity index ( Fig. 4A and B). Generally, there was no statistical significance between patients with IgAN and controls in terms of alpha diversity. Partial least-squares discriminant analysis (PLS-DA), a supervised learning method, showed a distinct clustering pattern between samples from cases with IgAN and healthy controls in both data sets ( Fig. 4C and D). For both data sets, the top one microbe, which contributed most significantly to differentiate cases from controls, was the same. The variable importance in projection (VIP) scores for Bacteroides were 5.30 and 3.11, respectively ( Fig. 4E and F).
In confirmation of clues suggested by the genetic associations, Dialister was also observed to play a role in group separation (VIP = 1.45 and 1.23, respectively). To check the difference at the genus level using Wilcoxon rank sum tests, as shown in Fig. 4G and H, the relative abundance of Dialister showed a decreased tendency in IgAN patients, and the relative abundance of Erysipelotrichaceae showed an increased tendency ( Fig. 4I and J). For the microbe with the top VIP score, Bacteroides tended to increase in both data sets ( Fig. 4K and L). For Bacilli, Anaerofilum, and Lachnobacterium, which were associated with the former genetic associations, no output data were available due to the limited richness of 16S rRNA gene sequencing. The complete results about the relative abundances at the genus level in two independent data sets are shown in Tables S2 and S3 in the supplemental material.
To further confirm the differential abundance, we conducted a meta-analysis for the two data sets (Fig. 5). There was moderate statistical heterogeneity (30% to 60%), Kaplan-Meier analysis showed that the cumulative renal survival rate was significantly lower in patients with rs7083345-CT/TT genotypes (the P value was determined by a log rank test). HR, hazard ratio.
but the forest plots showed consistent associations, with a decreased tendency for Dialister and an increased tendency for Bacteroides in IgAN.

DISCUSSION
The microbial composition was associated with multiple traits (e.g., age, sex, and BMI) and diseases (e.g., cardiovascular disease and IBD) (15,25,26). Meanwhile, microbial compositions could be affected by multiple factors such as the environment and host genetics. Genetic variants associated with the microbiome are defined as microbiome QTL. Recent studies have identified microbiome QTL in human diseases, including IBD, cancer, heart disease, and meningitis (27,28). These microbiome QTL can be regulated by genes involved in microbiome-related pathways, including the immune system, food metabolism, and drug-related systems.
In the current study, we systemically checked the microbiome QTL in reported GWASs. We further investigated associations between reported microbiome QTL and the susceptibility to and severity of IgAN. In a large cohort with 1,511 patients with IgAN and 4,469 controls, we observed that 9 SNPs were associated with susceptibility to IgAN. Genotype-phenotype associations between risk alleles and disease subtypes may provide insight into disease etiology and mechanisms. Intriguingly, these risk variants were observed to be associated with subphenotypes of IgAN, i.e., early age at onset, elevated Gd-IgA1 levels, severe hematuria, and advanced chronic kidney disease (CKD) stage., and in a concordant way, specific risk genotypes in IgAN were associated with decreased abundances of potentially beneficial microbes and increased abundances of potentially harmful microbes, as reported for IBD. Furthermore, one microbiome-  associated SNP was also independently associated with renal outcome in IgAN. Along with a gene-centered study, we further checked the validity of the candidate microbiota members by 16S rRNA gene sequencing in two independent data sets to increase statistical power. G*Power 3.1.9.6 software was used to calculate the posterior effect size index using the asymptotic relative efficiency (ARE) method (29). The effect sizes were between 0.25 and 0.30. To estimate the power of the Wilcoxon test for a given F value with the ARE method, we scaled the sample size with the corresponding ARE value. The calculated powers in cohort 1 and cohort 2 were 0.22 and 0.14, respectively. After meta-analysis, the power was 0.34 for the combined samples. According to the results from the two data sets, we confirmed that both microbiota richness/evenness and the abundance of Dialister were decreased in IgAN. However, this candidate microbe was not the most significant in IgAN. In addition, according to the VIP score, we observed that Bacteroides contributed the most in differentiating IgAN from controls. It has been proven that Bacteroides can impact fecal IgA levels (30). However, as Bacteroides was not included in our candidate microbiome QTL, we cannot rule out the role of host genetic impacts. Therefore, by a multistage association strategy in this pilot study, we tentatively confirmed that host genetics had impacts on the gut microbiota, which played a role in both disease susceptibility and severity.
For the function and pathogenesis of the microbiota in IgAN, recent studies suggested that changes of the gut microbiota could alter IgA-mediated immunity, and differences in IgA binding to bacteria have been linked to IBD (31), which shares some  similarities in etiology and is also a common comorbidity with IgAN. It is widely recognized that IgA plays a pivotal and special role in IgAN, and our data supported the hypothesis that individuals with IgAN have an altered gut microbiota, which was somewhat genetically determined. Interestingly, the LYZL1 risk genotypes, which were Dialister related, were associated with an elevated level of Gd-IgA1. The relative abundance of the bacterial genus Dialister was reported to be significantly decreased in children with IgA vasculitis (19), in which an increased level of Gd-IgA1 was a common pathogenic mechanism, as in IgAN. Besides, in mouse studies, the c-type lysozyme gene Lyzl1 exhibited antibacterial activity (32), and mouse models targeting LYZL1 and Dialister will be of interest. Studies have highlighted the role of the gut microbiome in IgA or Gd-IgA1 immunity. Specific microbes or metabolites are known to be involved in the class switch of B cells and the production of IgA (7,30). This study may have significance in future translational medicine. First, we observed that host genetics might affect gut microbiota composition. This genetic information might contribute to risk stratification. Since genetics, instead of diet, were unchanged, we might check whether the tested people were genetically susceptible to abnormal bacterial composition, IgA immunity, and IgAN, especially in risk populations or risk families. Second, a precise link between genetics and the gut microbiota would shed new light on disease pathogenesis. It was speculated that both the mucosal immune response and the gut microbiome participated in the pathogenesis of IgAN. But the precise role of the gut microbiota in mediating the gut-kidney dialog in IgAN has not been clarified. Last but not least, targeting specific bacteria or certain metabolites would have therapeutic significance in IgAN. Thus, future studies integrating host genetics and microbiota may shed some light on precision medicine, possibly by targeting IgA immunity and infections.
Despite a large-scale genetic study on microbiome QTL in IgAN, we should note some limitations. First, there were still no widespread microbiota-related studies in IgAN, not to mention an IgAN disease-specific microbiome QTL study. To guarantee $80% power to detect a 1.2-fold-increased risk, we may need at least 2,366 cases and 7,000 controls (see Table S4 in the supplemental material), and we cannot rule out the possibility that other genetic variants had a higher magnitude of risk effects in IgAN, i.e., Bacteroides-associated variants in IgAN. Second, fecal 16S rRNA gene sequencing was limited in precision in microbiota differentiation. Meanwhile, our study did not differentiate between progressor and nonprogressor IgAN patients, whose gut microbiome profiles were different (6). However, consistent with our results, their study also observed decreased alpha diversity and key microbe changes (Bacteroides coprocola, Bacteroides faecis, and Dialister sp.) in IgAN. Thus, a prospective study using metagenomic sequencing to check the relationship between the gut microbiota and disease progression is warranted. Third, we tried our best to minimize the impact of confounding factors, including body weight, BMI, gender, alcohol intake, tobacco use, and diet. Those who were vegetarians or had dietary bias were excluded before enrollment. However, we had to acknowledge that the proportions of some constituents of the diet, such as salt or fatty acids, were not controlled in our cohorts (33)(34)(35). Fourth, all the associations may not bear multiple corrections. With 8 independent variants and 13 clinical variables, a conservative Bonferroni threshold to declare a significant association would be 4.81 Â 10 24 (0.05/8/13), whereas it may indicate that the functional impact of the gut microbiota was not very high since the occurrence and progression of IgAN were the products of the complex interactions between genetics and the environment. Fifth, both genetic association analysis and 16S rRNA sequencing were conducted in participants with the same genetic background from the same living district (Han Chinese ethnicity). Replication from different ethnicities was the gold standard for evaluating the size and reliability of a genetic finding. As a first attempt to explore potential associations with any available trait, the current data were tempting, but more widespread replications focusing on these eight SNPs would be needed in the future.
In conclusion, the first pilot microbiome QTL genetic study in IgAN showed that eight independent genetic variants were associated with both the susceptibility to and severity of IgAN. Along with feces microbe confirmation, our data indicated the decreased abundance of potentially beneficial microbes in IgAN, which might shed some light on future interventions.

MATERIALS AND METHODS
SNP selection. We selected the potentially gut-microbiome-associated genetic variants identified by GWASs by searching the NHGRI GWAS Catalog database as of 5 March 2019 using the keyword "gut microbiome" (36). Finally, we selected 175 genetic variants associated with the gut microbiome (P , 1.00 Â 10 26 ). These genetic variants were associated with alpha/beta diversity, gut microbiota taxa and their relative abundance (37)(38)(39). Here, we provide all the related information about these genetic variants in Table S5 in the supplemental material.
Genotyping and study populations. For genetic associations, a total of 5,980 participants, comprising 1,511 cases with IgAN and 4,469 healthy controls of Chinese ancestry, were recruited from Peking University First Hospital. The diagnosis of IgAN was confirmed by kidney biopsy with immunofluorescence studies for IgA deposits. Cases with secondary IgAN, such as systematic lupus erythematosus, rheumatic disease, or IgA vasculitis, were excluded. Ethnically and geographically matched healthy controls were voluntarily recruited.
Peripheral blood samples were collected from participants using the anticoagulant EDTA. We obtained the genomic DNA from peripheral blood leukocytes using the salting-out method. Genotyping was performed by using the MassArray system from Sequenom (Compass Biotech, Beijing, China). SNPs with low call rates (,95%), low minor allele frequencies (MAFs) (,0.01), or deviation from Hardy-Weinberg equilibrium (P , 0.001) were excluded.
Among the cases enrolled, 409 patients were regularly monitored. Clinical outcome of ESRD was defined as an eGFR of ,15 ml/min/1.73 m 2 or the application of renal replacement therapies, including hemodialysis, peritoneal dialysis, or renal transplantation.
16S rRNA gene sequencing. 16S rRNA gene sequencing was conducted to confirm the likely involved microbes suggested by the genetic associations. In total, we recruited 119 cases with IgAN and 45 healthy controls. Patients with ESRD, secondary IgAN, IBD, and type 2 diabetes mellitus were excluded. For both patients and healthy controls, those who reported the use of antibiotics, microbial agents, or immunosuppressants within 8 weeks before entry were excluded. Besides, those who were vegetarians or had dietary bias were also excluded. Healthy controls had no gastrointestinal diseases.
The propensity score method was used to adjust for known confounding factors, including age, sex, and BMI. Thus, the enrolled participants were divided into two groups. The first data set consisted of 90 cases with IgAN and 25 age/sex/BMI-matched healthy controls, and the second data set consisted of the rest of the 29 cases with IgAN and 20 healthy controls without selection. Fecal samples were deemed biologically representative specimens because of their noninvasion and convenience. The fecal samples from each participant were collected in the hospital and immediately stored at 280°C until 16S rRNA gene sequencing.
Fecal genomic DNA was extracted using the QIAamp Fast DNA stool minikit (catalog number 51604; Qiagen). The targeted V3-V4 hypervariable region of the bacterial 16S rRNA genes was amplified by PCR using the primers 341F (59-CCTACGGGRSGCAGCAG-39) and 806R (59-GGACTACVVGGGTATCTAATC-39). Amplicons were extracted from 2% agarose gels and purified using the AxyPrep DNA gel extraction kit (catalog number AP-GX-50; Axygen Biosciences) and quantified using Qubit2.0 (Invitrogen, MA, USA). After the preparation of the library, sequencing was performed on a HiSeq platform to generate pairedend reads of 250 bp (Illumina, CA, USA) for the first data set. The same protocol as the one described above was adopted for the second data set, and the tags were sequenced on the MiSeqDx platform (Illumina, CA, USA).
Data preprocessing. Consistent methods were used to process all qualified data from the two data sets. The output reads were processed using USEARCH v.10 and in-house scripts (43,44). The quality of the paired-end Illumina reads was checked by FastQC v.0.11.5 (45) and processed in the following steps by VSEARCH: joining of paired-end reads and relabeling of sequencing names, removal of barcodes and primers and filtering of low-quality reads (Q , 20), and finding of nonredundant reads. Unique reads were denoised into amplicon sequence variants (ASVs)/operational taxonomic units (OTUs). The representative sequences were picked by UPARSEH (46). The OTU table was generated by USEARCH. The taxonomy of the representative sequences was classified with the RDP classifier (47). Analysis of the differential OTU abundance and taxa was performed using the edgeR v3.26.8 package in R v.3.6.1. We subsample ("rarefy") an OTU table to a fixed number of reads per sample using random subsampling without replacement. Besides, the abnormal observations were filtered if the connectivity was ,2.5, using the Weighted Correlation Network Analysis (WGCNA) package (48).
Alpha diversity was evaluated by the Chao1 diversity index. PLS-DA was used to reveal taxonomic changes in different groups, and VIP scores were used to rank the abilities of different taxa to discriminate groups (49). Wilcoxon rank sum tests (two tailed) were conducted to detect differences in relative abundances between the two groups.
Ethics statement. Statistical analysis. A two-tailed P value of ,0.05 was considered statistically significant. Genetic association analysis was performed using PLINK-1.9 (50). Subphenotype associations were taken under the additive models at the genotypic level. To ensure the stability of the analysis, only if the sample count for the risk homozygous genotype was ,10, a dominant model was considered.
Continuous variables in this study were compared using an unpaired t test or analysis of variance (ANOVA) between groups if the variables were normally distributed; otherwise, a Mann-Whitney U test or a Kruskal-Wallis test was performed. Categorical variables were compared using the chi-square test or Fisher's exact test. Cumulative renal survival rates were calculated according to the Kaplan-Meier method. Univariate and multivariate Cox regression analyses were used to evaluate the risk of ESRD. The statistical analysis was performed with SPSS 26.0 software (SPSS Inc., USA).
As two different sequencing platforms were adopted in this study, we conducted a meta-analysis based on group differences rather than directly combining demultiplexed sequences. Raw data were processed into a normal distribution by calculating the square root of the relative abundance of the specific microbe. Also, further measures of the group differences were processed by Review Manager (version 5.4; Cochrane Library) under the fixed-effect model, and the degree of statistical heterogeneity was quantified by an I 2 test.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.