Association between Human Genetic Variants and the Vaginal Bacteriome of Pregnant Women

ABSTRACT The influence of human genetic variants on the vaginal bacterial traits (VBTs) of pregnant women is still unknown. Using a genome-wide association approach based on the 16S rRNA bacteriome analysis, a total of 72 host genetic variant (single nucleotide polymorphisms [SNPs], indels, or copy number variations [CNVs])-VBT associations were found that reached the genome-wide significance level (P < 5 × 10−8) with an acceptable genomic inflation factor λ of <1.1. The majority of these SNPs that reached the genome-wide significance level had a relatively low minor allele frequency (MAF), and only seven of them had MAFs greater than 0.05. rs303212, located at the IFIT1 gene on chromosome 10, was the most eye-catching variant, which had a genome-wide association with the relative abundance (RAB) of Actinobacteria and Bifidobacteriaceae and also had a suggestive association with the RAB of a few common vaginal bacteria including Actinobacteriota, Firmicutes, Lactobacillus, and Gardnerella vaginalis and the beta diversity weighted UniFrac (P < 1 × 10−5). The findings of the study suggest that the vaginal bacteriome may be influenced by a number of genetic variants across the human genome and that interferon signaling may have an important influence on vaginal bacterial communities during pregnancy. IMPORTANCE Knowledge about the influence of host genetics on the vaginal bacteriome in pregnancy is still limited. Although a number of environmental and behavioral factors may exert influences on the structure of vaginal bacterial communities, the vaginal bacteriome often undergoes a relatively fixed transition to a more stable and less diverse state as the menstrual cycle stops, which raises questions on the effects of human genetics. We utilized a genome-wide approach to identify the associations between genetic variants and multiple VBTs and performed enrichment analyses. The human genetics during pregnancy may be involved in multiple pathways. The results may disclose innate functional factors involved in shaping the vaginal bacteriome during pregnancy and provide insight into the establishment of specific strategies for prevention and clinical treatment of pregnancy complications.

immune response (1,2). The same is true for the vaginal bacteriome. The bacterial ecosystem and multiple predisposing factors shaping its structure are critical to the biological pathways leading to internal homeostasis or disorders (3,4). Disturbances in the vaginal bacteriome may contribute to a variety of common obstetric or gynecological disorders including bacterial vaginosis (BV), preterm birth, and other infectious diseases (5)(6)(7)(8).
Pregnancy is a special phase in a woman's life that involves many physiological changes from physical to hormonal (9). The vaginal bacteriome often undergoes a pronounced and relatively fixed change during pregnancy to a more stable and less diverse state in response to these changes as the menstrual cycle stops (8). Previous studies have suggested that the composition of the commensal vaginal bacteriome can be determined by multiple behavioral and environmental factors including sexual behavior, pregnancy, smoking, and diet (10)(11)(12). Although these studies have uncovered several lines of host-bacteriome interactions, the reasons for the relatively fixed transition of the vaginal bacteriome upon pregnancy remain unclear (13)(14)(15). In light of the complex etiology of microbial traits, questions about host genetics are raised to fill the paucity of intrinsic determinants of vaginal bacterial ecosystems.
Humans harbor bacteria that may acquire beneficial functions in response to host adaptation to local environments under the pressure of natural selection and become heritable in the host population when specific genetic variants evolve to replace or recruit these beneficial bacteria (13,16). Empirical genome-wide association studies (GWAS) in recent years have suggested that the influence of host genetic variants on the human bacteriome can be body site specific and vary by ethnicity (14,(17)(18)(19)(20)(21). However, the influence of host genetic variants on the vaginal bacteriome in pregnancy has scarcely been addressed among these studies. As vaginal microbiota can aid maternal receptivity and promote a tolerogenic maternal immune system for a successful pregnancy, candidate gene-based association studies have often concentrated on the role of immune-related genes like interleukin-1 receptor antagonist gene and Toll-like receptor 4 (TLR4) gene in its colonization (9,(22)(23)(24). Evidence has confirmed the functional roles of several immunomodulators in the molecular mode of pathogen-associated molecular patterns (PAMP) in response to the vaginal bacterial communities, but genome-wide evidence of the effects of genetic variants on the structure of the vaginal bacteriome in pregnancy is still limited.
Given that the potential function of the host-vaginal bacteriome relationship could help stabilize the vaginal bacterial ecosystem and the pathogenesis of pregnancy complications, we performed a GWAS based on the 16S rRNA gene-targeted bacteriome analysis to determine whether host genetic variants are associated with multiple vaginal bacterial traits (VBTs) with the baseline data in a cohort of pregnant Chinese women.

RESULTS
Overview of VBTs in pregnant women. The participants with available 16S rRNA data (n = 359) were clustered into five community state type (CST) groups and termed CST-I (134, 37.3%), CST-II (13, 3.6%), CST-IIIa (107, 29.8%), CST-IIIb (45,12.5%), and CST-IV (60, 16.7%), respectively (Fig. 1). CST-I was the most prevalent group, dominated by Lactobacillus crispatus, and then followed by CST-IIIa, which contained a high abundance of a single Lactobacillus iners. CST-IIIb was also dominated by Lactobacillus iners but had a mixture with Gardnerella vaginalis, Lactobacillus crispatus, and other non-Lactobacillus spp. CST-II was the group with the lowest prevalence and was primarily dominated by Lactobacillus paragasseri/gasseri. CST-IV had the most diverse bacterial communities among all the groups and was mostly characterized by Gardnerella vaginalis and Atopobium vaginae. Between-group comparisons suggested that the CST groups had a slight difference in baseline gestational age and first-time pregnancy status after pairwise comparisons and multiple corrections (q , 0.05; see Table S2   available microscopic examinations did not show evidence of vaginitis, as suggested by white blood cell counts and Trichomonas and yeast tests (Table S3 at https://doi .org/10.5281/zenodo.5054434).
The 16S rRNA read summary results at the genus level suggested Lactobacillus topped all the 16S reads in this study, next followed by Gardnerella, Atopobium, Prevotella, Dialister, Megasphaera, and Shuttleworthia (Table S4 at (Table S4 at https://doi.org/10.5281/zenodo.5054434). The fixation index (FST) was also estimated to identify whether the SNPs in the population were differentiated with respect to each binary VBT. However, all the estimated FST values were quite close to zero, suggesting a high homogeneity in the genetic background of the study participants (Table S8 at  Genome-wide association analysis between genetic variants and VBTs. Totally, a sum of 72 host genetic variant (SNP, indel, or CNV)-VBT associations were found that reached the genome-wide significance level (P , 5 Â 10 28 ), and 3,415 variant-VBT associations reached the suggestive significance level (P , 1 Â 10 25 ) with a genomic inflation factor (GIF) l of ,1.1 (Tables S12, S13, and S14 at https://doi.org/10.5281/zenodo .5054434). The Manhattan plots for the individual associations between genetic variants and VBTs were archived in the supplemental materials (Doc S4, Doc S5, and Doc S6 at https://doi.org/10.5281/zenodo.5054434).
The analyses on the associations between indels and VBTs (Table S13 at https://doi .org/10.5281/zenodo.5054434) identified only the C deletion in rs201053982 at the KATNAL2 gene on chromosome 18 as having a genome-wide significant association with P105-PWY (P = 4.37 Â 10 28 ). rs201053982 also had a suggestive association with some other MetaCyc pathways and the RAB of Gammaproteobacteria [Class] (P = 5.11 Â 10 27 ). Additionally, the AT deletion in rs146695045 at the ZNF28 gene had a suggestive association with the P/A and RAB of Fusobacteriales [Order] (P = 9.46 Â 10 26 and 2.67 Â 10 26 , respectively). These indels did not show any suggestive significant association with the alpha or beta diversity metrics (P . 1 Â 10 25 ).
Joint analyses for multiple VBTs. Joint association analyses for the RAB of multiple species-level taxa were performed to investigate the potential pleiotropy of the FIG 2 The association between single genetic polymorphisms (SNPs) and the vaginal bacterial taxa in pregnant women (l , 1.1). All the SNPs that reached the suggestive significance level (P , 1 Â 10 25 ) were shown in the plots, but only those with P values lower than 1 Â 10 26 were highlighted with a relatively low transparency in filled colors and annotated with variant labels. The left panel shows the significant SNPs (P , 1 Â 10 25 ) and their corresponding positions on different chromosomes, whereas the right panel shows the number of suggestively significant markers on each chromosome accordingly. The background color represents different taxonomic levels of each bacterial taxon. Covariates including the first-time pregnancy status, baseline GA, and 10 SNP-PCs were adjusted. (a) The associations between host SNPs and the binary VBTs including P/A and D/ND of a taxon given its distribution in study samples. Text in bold red indicates that the dominance of the taxon was studied, whereas the P/A trait was studied for the remaining taxa. (b) The associations between host SNPs and the RAB of vaginal bacterial taxa. genetic variants. The results showed that none of the CNVs or indels had a suggestive association with all the 16 or the top three prevalent species-level bacterial taxa. However, a total of 20 SNPs located across the human genome were identified that had a suggestive association with the RAB of the top three species-level bacterial taxa Lactobacillus crispatus, Lactobacillus iners, and Gardnerella vaginalis (Table S15 at https://doi.org/10.5281/zenodo.5054434). These three taxa accounted for over 80% of all the bacterial reads in the participants with available 16S rRNA data. Among these SNPs, rs2843019, rs303212, and rs184236 had the highest MAF among the SNPs identified, which were 0.486, 0.394, and 0.438 and located at the HMGCS2, IFIT1, and IFIT5 genes, respectively. There were some other SNPs that also had a suggestive association with all the 16 species-level bacterial taxa. However, the estimated l for genomic inflation suggested the results were inflated (l = 1.108; Table S16 at https://doi.org/10 .5281/zenodo.5054434).
Gene mapping and enrichment analyses for all the mapped genes and individual VBTs. The GTEx expression quantitative trait locus (eQTL) mapping identified a total of 882 eQTL-related genes that were expressed in 47 tissues for the SNPs that reached the suggestive significance level (Table S17 at https://doi.org/10.5281/zenodo.5054434) and one gene in three tissues for indels (Table S18 at https://doi.org/10.5281/zenodo.5054434). The vagina-specific eQTL-related genes included NBPF26, RP11-92G12.3, MIR6891, FIG 3 The association between single genetic polymorphisms (SNPs) and the bacterial diversity metrics in pregnant women (l , 1.1). All the SNPs that reached the suggestive significance level (P , 1 Â 10 25 ) were shown in the plots, but only those with P values lower than 1 Â 10 26 were highlighted with a relatively low transparency in filled colors and annotated with variant labels. The left panel shows the significant SNPs (P , 1 Â 10 25 ) and their corresponding positions on different chromosomes, whereas the right panel shows the number of suggestively significant markers on each chromosome accordingly. Covariates including the first-time pregnancy status, baseline GA, and 10 SNP-PCs were adjusted. (a) The associations between host SNPs and the alpha diversity metrics. (b) The associations between host SNPs and the beta diversity metrics by MANOVA.
Finally, a total of 1,145 unique mapped genes with available Entrez ID (Table S21 at https://doi.org/10.5281/zenodo.5054434), which were related to at least one VBT (the diversity metrics, the bacterial taxa, or MetaCyc pathways), were submitted to FUMA GENE2FUNC for overall enrichment analysis (Doc S2 at https://doi.org/10.5281/zenodo .5054434). The differentially expressed gene (DEG) sets for these genes were found to have upregulated expression in tissues like Skin Sun Exposed Lower Leg, Esophagus Mucosa, Vagina, Skin not Sun Exposed Suprapubic, Artery Coronary, Lung, Adipose Subcutaneous, Nerve Tibial, and Esophagus Gastroesophageal Junction (P bonferroni , 0.05; Doc S2 p5-6 at https://doi.org/10.5281/zenodo.5054434). Joint gene set clustering suggested that these genes were involved with multiple biological pathways like metal ion homeostasis, host immunity, lipid metabolism, hormone signaling, and so forth, given 19 different databases (Table S22 and Doc S2 at https://doi.org/10.5281/zenodo .5054434). For instance, the Reactome database identified six pathways including REACTOME METALLOTHIONEINS BIND METALS, REACTOME SYNTHESIS OF LEUKOTRIENES LT AND EOXINS EX, REACTOME RESPONSE TO METAL IONS, REACTOME PHOSPHOLIPID METABOLISM, REACTOME GLYCEROPHOSPHOLIPID BIOSYNTHESIS, and REACTOME INNATE IMMUNE SYSTEM, whereas the Wikipathways database identified three pathways including Zinc homeostasis, Allograft Rejection, and Oxidation by Cytochrome P450 (q , 0.05).
Enrichment analysis results for individual VBTs were shown in Table S23 at https:// doi.org/10.5281/zenodo.5054434. Among these gene sets, seven KEGG pathways (including hsa00072;Synthesis and degradation of ketone bodies, hsa00260;Glycine, serine and threonine metabolism, hsa00280;Valine, leucine and isoleucine degradation, hsa00650;Butanoate metabolism, hsa00900;Terpenoid backbone biosynthesis, hsa01230; Biosynthesis of amino acids, and hsa03320;PPAR signaling pathway) and four Reactome pathways including Amino acid synthesis and interconversion (transamination), Ketone body metabolism, Serine biosynthesis, and Synthesis of Ketone Bodies were significantly enriched for the RAB of Gardnerella vaginalis [Species] (q , 0.05). Except for hsa01230; Biosynthesis of amino acids, hsa03320;PPAR signaling pathway, and Amino acid synthesis and interconversion (transamination), these pathways were also significantly enriched for the RAB of Bifidobacteriaceae. For Lactobacillus, there were one KEGG pathway (hsa05168;Herpes simplex infection) and two Reactome pathways (Interferon Signaling and Interferon alpha/beta signaling) significantly enriched for its relative abundance (q , 0.05). The Reactome pathways Interferon Signaling and Interferon alpha/beta signaling, which were generally involved with IFIT gene families like IFIT1 and IFIT2, were also related to the MetaCyc pathways like ARO-PWY [

DISCUSSION
In this study, a total of 72 host genetic variant (SNPs, indels, or CNVs)-VBT associations were found that reached the genome-wide significance level (P , 5 Â 10 28 ), and 3,415 associations reached the suggestive significance level (P , 1 Â 10 25 ) with a GIF l of ,1.1. The majority of these SNPs that reached the genome-wide significance level had a relatively low MAF, and only seven of them had MAFs greater than 0.05, including rs7903692, rs303212 on chromosome 10, rs7842439 on chromosome 8, rs10806401 on chromosome 6, rs13004173 on chromosome 2, rs11831423 on chromosome 12, and rs75042393 on chromosome 4. Of these variants, the SNP rs303212 located at the IFIT1 gene was the most eye-catching variant and had a relatively high MAF in the study samples (Fig. 5  [Order]. rs75042393 (225,257 bp to DKK2) was significantly associated with the RAB of Lactobacillus mulieris, which was a recently identified Gram-positive, nonmotile, nonspore-forming, catalase-negative, coccobacillus-shaped bacterium (25,26). The remaining SNPs with a MAF greater than 0.05 had a genome-wide association with the MetaCyc pathways which were not abundant and prevalent in the samples with available 16S amplicon data. During pregnancy, the vaginal bacteriome is generally characterized by dominant Lactobacillus, which accounted for the majority of all the sequenced reads in this study. Only two SNPs including rs4670222 and rs62096874, which are located at LINC00211 and NFATC1, respectively, were identified that had a suggestive association with its dominance status. These findings suggest that the overall structure of the vaginal bacteriome in pregnancy may not be simply influenced by a small fraction of genetic variants but by a broad range of variants across the host genome. Joint analyses for multiple traits identified few SNPs associated with the RAB of the top three species-level bacterial taxa (Lactobacillus crispatus, Lactobacillus iners, and Gardnerella vaginalis) at a suggestive significance level with a subtle population structure (l , 1.1), suggesting the potential pleiotropy of several SNPs on the colonization of vaginal bacterial communities.
An increase in the RAB of one taxon often brings a decrease in one another. Unlike the bacteria colonizing other human body sites, the polarization of bacterial communities is more common in the vagina, as it is often colonized by a few dominant bacteria, especially in pregnant women (18,19,27,28). The bacterial taxa Actinobacteria and Bifidobacteriaceae are on the same lineage and are the higher-level taxa of Gardnerella vaginalis. As Gardnerella vaginalis made up the majority of the reads of Bifidobacteriaceae and its genus-level taxon Gardnerella was negatively associated with the most common genus-level taxon Lactobacillus, rs303212 and some adjacent SNPs like rs184236 and rs10887959 were also identified that had either a suggestive or genome-wide association with the RAB of Gardnerella vaginalis, Lactobacillus, and their higher-level taxa in this study. For instance, the T allele of rs303212 showed a positive association with the RAB of Lactobacillus but had a negative association with the RAB of Gardnerella vaginalis, which often competes with the Lactobacillus spp. for an ecological niche in the female lower genital tract (29,30). As the major lower-level bacterial taxon of Bifidobacteriaceae, Gardnerella vaginalis was also often reported as a potentially risky bacterium for women's reproductive health. It can form a specific biofilm on vaginal epithelium and recruit other bacteria seen in BV, leading to a polymicrobial environment in women's vaginas (31). Therefore, the associations between rs303212 and the RAB of primary vaginal bacterial taxa identified may be possibly driven by the negative correlation between Gardnerella vaginalis and Lactobacillus spp.
rs303212 and its adjacent SNPs are located at or near other IFIT gene family members like IFIT1 and IFIT5, which belong to the interferon (IFN)-stimulated genes (ISGs). The human IFIT gene family is generally comprised of four members (IFIT1, IFIT2, IFIT3, and IFIT5) and plays a pivotal role in host antiviral defense (32,33). This gene family encode the IFN-induced proteins with tetratricopeptide repeats and can be strongly induced by several IFNs like type I IFNs and type III IFNs (34)(35)(36). The relationships between the IFIT gene family and the colonization by vaginal bacteria were not frequently reported in previous studies or in the GIMICA platform (37) (Table S23 at https://doi.org/10.5281/zenodo .5054434). However, recently a study has found that the lactic acid bacteria, which are generally considered beneficial bacteria in the vagina, can activate type I interferon production and induce IFIT1 activation via the intracellular cytosolic sensor STING (38). The interferon signaling or interferon alpha/beta signaling pathways were also enriched for a variety of VBTs in this study like the P/A of Actinobacteria [Class] and the RAB of Actinobacteriota [Phylum] and Lactobacillus [Genus], suggesting a potential effect of interferon signaling on the colonization by these vaginal bacteria. These findings may support the point that IFIT1 may function as a modulator in influencing the susceptibility to bacteria through the interferon response (39). Additionally, Interferon Signaling and Interferon alpha/beta signaling were also enriched for some abundant MetaCyc pathways, indicating that the interferon-related signaling might also play an important role in the vaginal bacterial community functions.
Generally, bacterial associations with host genetic variants are considered to be body site specific. However, the associated variants of the vaginal bacteriome were not as widely considered as those of the gut bacteriome (14,(18)(19)(20). The immune-related gene-targeted association studies have revealed the potential influence of polymorphisms of genes such as TLR4 and interleukin-1 receptor antagonist on the colonization by specific pathogenic bacteria in the vagina (Table S23 at https://doi.org/10 .5281/zenodo.5054434), but an early genomic study based on whole-genome sequencing with 80 vaginal-posterior fornix samples identified only a few taxon-related single nucleotide variants after a stringent Bonferroni correction (20,22,(40)(41)(42)(43). These studies provided some evidence regarding the potential influences of host genetic factors on vaginal bacterial communities.
Another related GWAS was recently reported by Mehta et al. (21), where the association between host SNPs and the vaginal bacterial composition was investigated in Kenyan women. Several SNPs have been identified that have a suggestive significant association with the RAB of Lactobacillus crispatus, Lactobacillus iners, and Gardnerella vaginalis and the Shannon index. These suggestively significant associations were not replicated in this study. However, the association between rs527430 and the RAB of Lactobacillus iners shared the same direction as what was found in the Kenyan GWAS though the P value was not statistically significant in this study. Several reasons might account for the differences. First, the Kenya GWAS recruited 171 women of reproductive age with a high proportion of participants being infected with herpes simplex virus or HIV, whereas women in pregnancy were the primary interest in this study. Second, both of the studies excluded many SNPs during the quality control (QC) process due to the small sample sizes, possibly leading to relatively sparse and divergent markers being captured in the association analysis. Last, the genetic background of the study populations and the designed chip loci might also result in the different genetic variants remaining to be tested. Despite these differences, other similarities were also found regarding the genes mapped and identified biological pathways. For instance, the GO term GO:0001106;RNA polymerase II transcription corepressor activity was also enriched for Shannon index in this study. Additionally, despite that the results were not always statistically significant, the G-protein-coupled receptor (GPCR)-related pathways were also clustered by different genes with respect to multiple VBTs including the P/A of Bifidobacteriaceae, which is the higher-level taxon of Gardnerella vaginalis, whereas it was identified for the major vaginal bacteria like Lactobacillus crispatus and Gardnerella vaginalis in the Kenyan GWAS.
The immunosuppressed host-graft model, which regards the immune cells at the maternal-fetal interface as an immune response to the semiallogeneic fetus, was once considered the paradigm in the immunology of pregnancy. However, a recent review suggested that the response instead begins with a proinflammatory stage to facilitate blastocyst implantation, then shifts to the anti-inflammatory stage to allow fetal growth, and finally shifts back to the proinflammatory stage for labor and delivery (9). As most study participants were enrolled at the early second trimester, immune-related genetic associations may mostly represent an anti-inflammatory stage of the immune response to the maternal vaginal bacteria. Furthermore, trophoblast cells that attach the zygote to the uterine wall can express TLRs and are responsible for sensing and responding to bacterial products (9). The response to TLR4 ligation by lipopolysaccharide (LPS) can also induce the production of type I interferons that are involved in the modulation of inflammatory responses via their effects on TLR signaling pathways (39,44,45). Although a direct single-marker association on TLR4 polymorphisms was not captured (no SNP that passed QC was located at this gene in this study), the polymorphisms of other ISGs like the IFIT gene family may be involved in the modulation of the TLR4 signaling pathway by LPS and act as a regulator of both proinflammatory genes and interferon-stimulated genes so as to alter the susceptibility of bacteria as shown by previous evidence (39). In addition, genes like ATF1 and S100A9 were also enriched in some TLR-related pathways for the P/A of Peptoniphilus vaginalis and the RAB of Finegoldia magna, suggesting their potential role in influencing the colonization by these two bacteria. Previous evidence also showed the functionality of these genes in influencing the response to the vaginal infections by pathogens. For instance, S100A9 alarmins can be produced from the vaginal epithelium in response to Candida infection (46). These findings reiterated the functional roles of human immunology in shaping the vaginal bacteriome.
In addition, similar to the heritability estimation for the transmission of the gut bacteriome, a relatively high heritability estimate with statistical significance for the vaginal bacterial taxa was not detected in this study. Besides the lack of statistical power caused by the small sample size, a few other reasons might explain the lack of statistical significance. The statistical narrow-sense heritability defines the proportion of phenotypic variation in a population that is explained by genetic variation (47). However, the estimates can be affected by strong purifying selection, environmental noise, and nonadditive genetic variance (16). As only relatively healthy pregnant women at a single site were enrolled, a strong purifying selection was carried out in this study, possibly leading to a fixation of genetic variants and a reduction in the heritability estimates. Moreover, as suggested by previous population genetic theories and empirical studies, heritable components of the microbiota are more likely to be involved in genetic adaptations, and the associated fitness may not provide the relative contributions from genetic and environmental variance to a trait, which may also lower the heritability estimation (48)(49)(50)(51)(52). Suzuki and Ley also demonstrated that the heritable components of the microbiota in the host population could be fixed long term for genetic adaptation to replace or recruit the microbes with beneficial effects on hosts (16). Therefore, heritable genetic variants in this study may provide merely a starting point to identify the microbes that are undergoing host genetic adaptations to the vaginal environment in pregnancy. Nonetheless, several MetaCyc pathways had statistically significant heritability estimates, suggesting the potential heritability of microbial functionalities in adaptation to the selection by the host-environment interactions, although most of these MetaCyc pathways were not abundant in this study.
In summary, this study adds to a limited body of knowledge on the association between host genetic variants and the vaginal bacteriome in pregnancy. However, as clinical measurements regarding sexual transmitted diseases (STDs) were involved with too many missing values, a thorough assessment of infection status for the participants was not applicable in this study, which might weaken the comparability of the results from other studies which assessed the STDs. Additionally, as a sample of pregnant Chinese women with a relatively high homogeneity in genetic background and demographics was studied, the results from the association and enrichment analyses may underscore the variants with high specificity. Nevertheless, the power of the study results may still be subject to the small sample size. Further studies with larger sample sizes are still preferable to confirm these associations. Understanding the genetic background of the vaginal bacteriome in pregnancy may disclose the innate factors that function to guarantee a favorable pregnancy outcome so as to facilitate the establishment of specific strategies for prevention and clinical treatment of pregnancy complications.
Conclusion. The vaginal bacteriome may be influenced by a number of genetic variants across the host genome. The polymorphisms in the IFIT gene family and the related interferon signaling pathway may be important in determining the colonization by the common vaginal bacteria during pregnancy.

MATERIALS AND METHODS
Study population and sampling. This study was conducted among a cohort of pregnant women from Anqing Municipal Hospital in Anhui Province, China. During their prenatal visits, pregnant women who gave informed consent and who met the following criteria were included: (i) aged 18 or older, (ii) Fan et al.
July/August 2021 Volume 6 Issue 4 e00158-21 msystems.asm.org 12 baseline gestational age (GA) in the first or second trimester, (iii) having not taken any antibiotics in the previous 4 weeks, and (iv) absent of serious organic or systemic diseases (like coronary heart disease, stroke, leukemia, and so forth). Initially, a total of 1,561 pregnant women were enrolled from 22 February 2018 to 22 January 2020 prior to the COVID-19 pandemic and were followed up for their pregnancy outcomes. Then, a sample of 390 participants was drawn from the cohort baseline population using a systematic random sampling method with a sampling ratio of 1:4. After excluding those who (i) had missing or unqualified vaginal samples for 16S rRNA gene amplicon sequencing (n = 13), (ii) had sexual intercourse in the previous 48 h (n = 14), (iii) had multiple pregnancies (n = 3), or (iv) had reported inconsistent key data between the baseline and follow-up in medical records (n = 1), a final sample of 315 out of 359 eligible participants with the 16S rRNA data had corresponding SNP array data for the association analysis between host genetic variants and VBTs (see Fig. S1 at https://doi.org/10.5281/ zenodo.5054434). The participants with both host genome and vaginal bacteriome data had a median age of 28 years (interquartile range [IQR], 25 to 30). Most pregnant women were enrolled during their second trimester with a median 16.4 weeks of pregnancy (IQR, 15.7 to 17.0). These participants did not exhibit significant differences in age, baseline GA, age at menarche, prepregnancy body mass index, first-time pregnancy (primigravida or not), smoking, drinking habits, colporrhagia within 2 weeks, diarrhea or constipation within 2 weeks, vaginal douching habit, family history of hypertension and diabetes, or self-reported gestational diabetes at baseline compared with the original cohort participants (P . 0.05; Table S5 at https://doi.org/10.5281/zenodo.5054434). The study protocol was reviewed and approved by the ethical committee of the School of Public Health, Fudan University (IRB#2017-09-0636). Sample collection. At baseline, midvaginal samples were drawn by experienced obstetricians using sterile swabs. The vagina was scraped with the swab in a clockwise rotation for five spins. Paired swabs were collected for each subject. One swab was placed in a collection tube with preservation solution and stored in a low-temperature freezer at 280°C before genome sequencing, and the other was inspected by wet mount for clinical purposes. A 5.0-ml blood sample was taken from each participant by the hospital laboratory staff and also stored at 280°C before genotyping.
DNA extraction, 16S rRNA amplicon sequencing, and host DNA genotyping. The vaginal bacterial genomic DNA was extracted from frozen swabs using a mixture of cetyltrimethylammonium bromide and sodium dodecyl sulfate methods. The V3-V4 hypervariable regions of 16S rRNA genes were amplified with specific barcodes on the primers for PCR and then sequenced on an Ion S5 XL instrument (Novogene Co., Ltd., Beijing, China). The primers were as follows: 341F, 59-CCTAYGGGRBGCASCAG-39, and 806R, 59-GGACTACNNGGGTATCTAAT-39. PCR amplifications were conducted in a final reaction volume of 30 ml mixture containing 15 ml of 2Â Phusion Master Mix, 3 ml of primer (2 mM), 10 ml of the genomic DNA (gDNA) (1 ng/ml), and 2 ml H 2 O. The reaction consisted of one cycle of initial denaturation at 98°C for 1 min, followed by 30 cycles at 98°C for 10 s, annealing at 50°C for 30 s, and extension at 72°C for 30 s, and a final cycle at 72°C for 5 min. The GeneJET gel extraction kit (Thermo Fisher Scientific, Waltham, MA, USA) was used for PCR product purification.
Host DNA was isolated from blood samples with a CWE9600 automatic nucleic acid extractor using a CWE9600 blood DNA kit following the manufacturer's instructions and was genotyped on an Illumina Infinium Asian screening array (ASA-750K) following the Infinium HTS array protocol. The chip was developed by Novogene Bioinformatics Technology Co., Ltd., containing 738,980 sites in total, including the Illumina Asian screening array with an additional 50,000 SNP sites. All the samples with a call rate of .95% were reserved for further genotyping.
Vaginal bacteriome data processing. The original demultiplexed single-end 16S rRNA gene sequence data were preprocessed with the QIIME 2 Core 2020.8 distribution (53). All the reads were truncated to 370 nucleotides based on the quality scores that dropped at this number determined by reviewing the Interactive Quality Plot. All the truncated reads were then denoised and clustered by the DADA2 algorithm for quality control (QC) (54). Pretrained compatible classifiers based on the SILVA database (version 138) were downloaded from the official QIIME 2 website (https://docs.qiime2.org/2020.8/ data-resources/) for the taxonomic assignments of denoised amplicon sequence variants (ASVs) (55,56). To improve the taxonomic profiling, sequences that were resolved at the genus level but lacked a good resolution at the species level were queried through the BLAST database ref_prok_rep_genomes with local BLAST1 command-line applications (57,58). BLAST best matches were considered hits with E values of ,1 Â 10 250 , percentage of identical matches of .95%, and highest bit-score among each queried list. The lineages of queried taxa were extracted with the program from the "dib-lab/2018-ncbi-lineages" project on GitHub (https://github.com/dib-lab/2018-ncbi-lineages). Multiple or incompatible matches were left blank, except for Lactobacillus paragasseri/gasseri given its biological meaning in the classification of the vaginal bacterial community mentioned in previous publications (59,60). The feature table was rarefied to the lowest number of read counts among the selected vaginal samples (3,128 reads). Taxa that were not assigned biologically meaningful names at any taxonomic level, had overall proportions below 0.01%, or were not bacteria were removed from the analysis. The databases used for the taxonomic analyses were deposited in Zenodo (http://doi.org/10.5281/zenodo.4480400). The MetaCyc pathway abundance was extrapolated from the original feature table by PICRUSt2 and then rarefied to the lowest level among the study samples (118,013, returns the largest integers not greater than 118,013.5; Fig. S2 at https://doi.org/10.5281/zenodo.5054434).
Host genome data processing and QC. All the genetic variants returned from the ASA chip were considered for the association analysis. Multiple duplicate loci with the same annotations were merged in SAS 9.4 (SAS Institute Inc.) prioritized by the locus with the highest Illumina GenTrain score. A comprehensive QC on SNPs and insertions/deletions (indels) was then performed with PLINK 1.9 (Fig. S3 at https://doi.org/10.5281/zenodo.5054434). Specifically, markers that met the following conditions were removed from the study: (i) missing genotypes for all samples, (ii) missingness rate across the individuals of .0.02, (iii) Hardy-Weinberg equilibrium (HWE) of P , 1 Â 10 25 , or (iv) minor allele frequency (MAF) of ,0.01. The original chromosomal locations are based on chromosome build GRCh37/hg19. Sporadically missing genotypes of the SNPs across the host genome were imputed with BEAGLE 5.1 (61).
The copy number variations (CNVs) were detected by PennCNV 1.0.5 with the exclusion of loci with unrecognized genotypes (62). Sample QCs were performed with preset criteria including (i) the absolute value of waviness factor of .0.035, (ii) the standard deviation of logR ratio of .0.35, (iii) the number of CNV calls for a sample of .200, and (iv) the drift of B allele frequency of .0.01. The CNV QC process was performed for the remaining 311 participants who passed the sample QC with the filtration criteria including (i) the length of CNV calls of ,50 bp and (ii) the fraction of gaps for merging two neighboring CNV calls of ,0.2.
The QC process for host genetic variants generated 491,912 SNPs, 669 indels, and 6,859 CNVs across host genomes (Fig. S3 at https://doi.org/10.5281/zenodo.5054434). The density plots showed that most cleaned genotyped SNPs and indels were uniformly distributed on multiple chromosomes except for chromosome 6, where dense SNP markers were located in a region between the locations of 28 Mb and 56 Mb (Fig. S6 at https://doi.org/10.5281/zenodo.5054434).
Bacteriome trait preparation and statistical analyses. The overview of the vaginal bacteriome in pregnant women was described by the composition of vaginal community state types (CSTs) clustered by the partitioning around medoids (pam) algorithm based on a Bray-Curtis distance matrix. Pairwise taxonomic correlations at the genus and species levels were estimated by the SparCC algorithm using FastSpar 0.0.10 (63). The correlation coefficients were estimated as the average of 50 inference iterations refined by 10 exclusion iterations with the strength threshold of 0.1, and a sum of 2,000 bootstrap data sets was generated to calculate the empirical P values. Demographic covariates were compared among the CST groups, and different statistical tests were used with respect to their statistical types (Kruskal-Wallis test for continuous variables and Fisher's exact tests for categorical variables). Pairwise comparisons were performed for those variables exhibiting a statistically significant global difference (P , 0.05). The two-sided P values were calculated for all the statistical tests.
The VBTs for the association analyses included the alpha diversity metrics, the beta diversity metrics, the binary and continuous traits (relative abundance [RAB]) of vaginal bacterial taxa, and the MetaCyc pathway abundance. All the VBTs were first processed with all the samples with available 16S amplicon data (n = 359) in order to make the most of the available information. Then, a sample with both host genome and 16S amplicon data (n = 315) was subsetted for the subsequent association analyses. For continuous (the alpha diversity metrics, the RAB of vaginal bacterial taxa, and the MetaCyc pathways) and multivariate (beta diversity) traits, the rank transformation (RNT) to normality was performed for the subsetted data with the rntransform() function within the R GenABEL package before fitting a linear model or a multivariate analysis of variance (MANOVA) (64).
All the alpha/beta diversity metrics were estimated by QIIME 2 using rarefaction data for the original ASV counts. The default alpha diversity metrics generated from the QIIME 2 core-metrics-phylogenetic pipeline were analyzed, including (i) ASV richness, as defined by the number of nonzero ASV counts observed; (ii) the Shannon index, a metric that measures both the number of ASVs and the inequality between ASV abundances; (iii) Faith phylogenetic diversity (Faith PD), a measure that uses phylogenetic distance to calculate diversity; and (iv) Pielou's evenness, an index that measures diversity along with species richness. Beta diversity metrics including (i) Bray-Curtis dissimilarity, (ii) Unweighted UniFrac, and (iii) Weighted UniFrac were exported by the QIIME 2 diversity-lib plugin, and a 2-axis nonmetric multidimensional scaling (NMDS) analysis was performed for the exported beta diversity metrics with the metaMDS() function from the vegan package. The stresses of each NMDS beta diversity were 0.181 (Bray-Curtis dissimilarity), 0.162 (Unweighted UniFrac), and 0.086 (Weighted UniFrac), respectively.
For genetic association analysis with individual taxa, taxa were retained if they met the following criteria: (i) $5% of reads for at least one individual and (ii) $15% of participants with nonzero data (19). Additionally, the higher-level taxa that had an extremely high correlation (Spearman r . 0.985) with corresponding lower-level taxa were excluded from the association analyses to reduce the redundancy of phenotypic information (Fig. S4 at https://doi.org/10.5281/zenodo.5054434). This process generated 40 taxa for the subsequent association analyses with host genetic variants. Considering the unique nature of vaginal 16S rRNA gene data, both continuous and binary bacterial traits were of interest in this study (Fig. S5 at https://doi.org/10.5281/zenodo.5054434). The binary bacterial traits were considered either the presence or dominance of a retained taxon given its distribution across the study samples for 16S amplicon sequencing. Specifically, a taxon with zero counts in more than 5% of study samples was transformed to the presence/absence (P/A) trait and the analytic estimate reflects the association between host genetic variants and their presence; the dominance/nondominance (D/ND) of a given taxon was of interest for those taxa that had a dominant level of RAB (.90%) in more than 5% of study samples. Detailed information on phenotypic taxa and their phylogenetic structure is provided in the supplemental materials (Fig. S5 at https://doi.org/10.5281/zenodo.5054434).
For MetaCyc pathways, we retained those pathways that had (i) $5% of inferred abundances for at least one individual and (ii) $15% of participants with nonzero data. A total of 291 MetaCyc pathways were obtained for the association analysis with microbial function.
Primary association analyses for genetic variants with respect to multiple prepared VBTs were performed using either PLINK 1.9 (for SNPs and indels) or R software with linear regressions fitted for continuous traits (including all the alpha diversity metrics, the RAB of bacterial taxa, and the inferred MetaCyc pathways) and logistic regression fitted for binary traits (including P/A or D/ND of taxa given their distribution in study samples). The associations between host genetic variants and beta diversity metrics were performed using the manova() from the built-in R stat packages with dosage data processed by QCtools 2.0.8. An additive genetic model was assumed for the SNPs. Two types of covariates were considered for adjustment in the regression models: (i) demographic variables showing heterogeneity among the CST groups (global and pairwise comparison), first-time pregnancy status and baseline GA; (ii) variant-specific 10 genomic PCs for controlling the population structure in ancestry. Missing covariates were randomly imputed with the R mice package based on the observed data. Trait-specific associated loci with P of ,1 Â 10 26 are highlighted in the summary plots, and all the loci that reached a suggestive significance level (P , 1 Â 10 25 ) are annotated with the closest genes using ANNOVAR in the supplemental tables (65). Further regional associations were assessed for those SNPs that reached the genome-wide significance level (P , 5 Â 10 28 ) and had the lowest P values within a 6250-kb window by locuszoom and imputed 1000 Genomes Project (Phase 3) reference panel data from BEAGLE 5.1 (http:// bochet.gcc.biostat.washington.edu/beagle/1000_Genomes_phase3_v5a/). Only the results with genomic inflation factor (GIF) l of ,1.1 were considered a subtle population structure and shown in the study (66,67). The estimates for l were either extracted from the PLINK log files or calculated by the estlambda() function from the R GenABEL package with the method = 'median' option.
The narrow-sense genetic heritability using either SNPs or indels was estimated with covariates corrected for each VBT (including alpha diversity metrics, RAB, P/A and D/ND of bacterial taxa, and MetaCyc pathway abundances) by GCTA 1.93.1 using a genome-based restricted maximum-likelihood (GREML) method with a relatedness cutoff of 0.25 (68). The fixation index for SNPs was also estimated by the snpgdsFst() function from the R SNPRelate package for assessing the population differentiation with respect to each binary bacterial trait (69). Genetic structural correlations with VBTs were conducted using the extracted 10 principal components (PCs) for different types of genetic variants and were analyzed with a linear or logistic regression model or multivariate analysis of variance in line with the type of VBTs. The host genetic PCs were extracted by PLINK 1.9 with pruned genetic variants (SNPs and indels) in linkage disequilibrium (LD) as estimated by r 2 . 0.8 in a 10-kb window, whereas the CNV-PCs were extracted by the psych package in the R program. Finally, joint association analyses for multiple VBTs were also performed with score tests from multivariate linear mixed models (mvLMM) using GEMMA (release 0.98.4) for all the genetic variants where the centered genotype relatedness matrix was estimated. Considering the computational efficiency and the biological implications of VBTs, two sets of multiple VBTs were considered for the joint association analyses: (i) a combination of the RAB of the top three bacterial taxa including Lactobacillus crispatus, Lactobacillus iners, and Gardnerella vaginalis and (ii) all the RAB of species-level taxa (16 taxa). The overall analytic approaches used for the association analyses were summarized in Table  S1 at https://doi.org/10.5281/zenodo.5054434.
Functional mapping and enrichment analyses. Gene prioritization was based on a combination strategy of positional mapping and expression quantitative trait locus (eQTL) mapping in this study (70). For trait-specific associated genetic variants including SNPs and indels with a suggestive significant association (P , 1 Â 10 25 ), the GTEx portal (version 8, dbGaP accession phs000424.v8.p2) was used to determine whether these associated variants were also eQTLs (71). Except for the prostate and testis, all the available tissues and SNP-gene-tissue pair results with a nominal P value less than 0.05 were considered eligible sites for eQTL mapping. Positionally associated genes with respect to each VBT were identified by MAGMA 1.09 (72). The SNPs or indels were annotated by positional mapping and with a generous gene definition that included 35 kb upstream and 10 kb downstream of each gene (73)(74)(75). The gene table was downloaded from the NCBI Entrez Gene database on 18 June 2020. Both the updated gene list and the GTEx data use the GRCh38/hg38 assembly with an incompatible coordinate for gene mapping. Therefore, the original coordinates were converted to the corresponding coordinates in GRCh38/hg38 using the UCSC liftOver (76). Summary statistics of SNPs/indels, irrespective of their significance level, were integrated into the MAGMA gene-based association analysis where both the mean and top models were implemented. In summary, the mapped genes for the subsequent enrichment analyses were considered those that (i) had an (eGene) q of ,0.05 in GTEx eQTL mapping, (ii) had a raw P of ,1 Â 10 25 and q of ,0.05 in MAGMA gene-based association analyses, (iii) were the nearest gene to a significant CNV (P , 1 Â 10 25 and q , 0.05) within a 65-Mb sliding window in the association analyses for individual VBTs, or (iv) were the closest gene of the SNPs/indels that reached the genome-wide significance level within a 6250-kb sliding window. Only the results with l for genomic inflation of ,1.1 were considered for the enrichment analyses.
All the mapped genes with available Entrez gene IDs were submitted to the GENE2FUNC pipeline with all the background genes and default parameters used in the FUMA platform (http://fuma.ctglab .nl/) to obtain an overview of the functionality of mapped genes in the biological context (70). The enrichment analysis for individual VBTs was locally analyzed by geneSCF v1.1 with the utilization of the GO_MF (molecular function), NCG, KEGG, and REACTOME databases (77). Differentially expressed genes (DEGs) against different databases were searched to find whether the obtained genes in relation to specific VBTs have any enrichment in molecular functions, different cancer types, or KEGG or REACTOME biological pathways.
Data availability. As this is not an international collaboration project, the data from human genetic resources are not publicly available due to the restrictions from the Regulation of the People's Republic of China on the Administration of Human Genetic Resources. Other summary data in support of the findings of the study will be available upon request from the indicated corresponding author in compliance with the local laws and regulations (Yingjie Zheng, yjzheng@fudan.edu.cn). The databases used for the taxonomic analyses were deposited in Zenodo (http://doi.org/10.5281/zenodo.4480400).