The cervical microbiota of Hispanics living in Puerto Rico is nonoptimal regardless of HPV status

ABSTRACT The cervicovaginal microbiota is influenced by host physiology, immunology, lifestyle, and ethnicity. We hypothesized that there would be differences in the cervicovaginal microbiota among pregnant, nonpregnant, and menopausal women living in Puerto Rico (PR) with and without human papillomavirus (HPV) infection and cervical cancer. We specifically wanted to determine if the microbiota is associated with variations in cervical cytology. A total of 294 women, including reproductive-age nonpregnant (N = 196), pregnant (N = 37), and menopausal (N = 61) women, were enrolled. The cervicovaginal bacteria were characterized by 16S rRNA amplicon sequencing, the HPV was genotyped with SPF10-LiPA, and cervical cytology was quantified. High-risk HPV (HR-HPV, 67.3%) was prevalent, including genotypes not covered by the 9vt HPV vaccine. Cervical lesions (34%) were also common. The cervical microbiota was dominated by Lactobacillus iners. Pregnant women in the second and third trimesters exhibited a decrease in diversity and abundance of microbes associated with bacterial vaginosis. Women in menopause had greater alpha diversity, a greater proportion of facultative and strictly anaerobic bacteria, and higher cervicovaginal pH than premenopausal women. Cervical lesions were associated with greater alpha diversity. However, no significant associations between the microbiota and HPV infection (HR or LR-HPV types) were found. The cervicovaginal microbiota of women living in Puerto Rican were either dominated by L. iners or diverse microbial communities regardless of a woman’s physiological stage. We postulate that the microbiota and the high prevalence of HR-HPV increase the risk of cervical lesions among women living in PR. IMPORTANCE In the enclosed manuscript, we provide the first in-depth characterization of the cervicovaginal microbiota of Hispanic women living in Puerto Rico (PR), using a 16S rRNA approach, and include women of different physiological stages. Surprisingly we found that high-risk HPV was ubiquitous with a prevalence of 67.3%, including types not covered by the 9vt HPV vaccine. We also found highly diverse microbial communities across women groups—with a reduction in pregnant women, but dominated by nonoptimal Lactobacillus iners. Additionally, we found vaginosis-associated bacteria as Dialister spp., Gardnerella spp., Clostridium, or Prevotella among most women. We believe this is a relevant and timely article expanding knowledge on the cervicovaginal microbiome of PR women, where we postulate that these highly diverse communities are conducive to cervical disease.

Tobacco smoking was less common (11.9%) and similarly distributed among groups. Most subjects were self-declared heterosexuals (98.0%) and reported a sexual history of three sexual partners, which was higher in nonpregnant women (median: four sexual partners). The use of antibiotics for the last two months was reported only in 7.1% of women, and 1.0% did not answer. These women were retained in the study since they were uniformly distributed among women's groups. However, this variable was always included in the models as covariable. Pregnant women were mostly in their second pregnancy trimester (54.1%), followed by third (24.3%) and first (21.6%, Table 1).

Microbial community composition
The differences in composition and structure of cervicovaginal microbial communities were evaluated using PERMANOVA models. To avoid collinearity between cervicovaginal pH and women's group, cervicovaginal pH was only included in models when stratified by women groups. We found that cervicovaginal microbiota composition significantly differed among women's group (significant for unweighted, P = 0.003; generalized, P = 0.007; and weighted UniFrac, P < 0.030; R 2 = 0.029-0.032). We observed significant differences in the microbiota between menopausal and pregnant subjects (Padj <0.014) as well as between menopausal and nonpregnant subjects [Padj <0.038, for unweighted, and weighted UniFrac (Fig. 1a), and generalized UniFrac (not shown)]. The presence of cervical lesions or HPV infections did not significantly influence microbial composition among all women (P > 0.050, R 2 <0.01). For the stratified analysis among nonpregnant or menopausal subjects, cervicovaginal pH was significantly different (P < 0.01 among nonpregnant, for all metrics; P = 0.046 only for unweighted UniFrac). Finally, for pregnant women, the microbial composition also differed by pregnancy trimester (for all distances, P < 0.034, R 2 >0.13), where first and second trimesters significantly differed (P < 0.007, R 2 = 0.15, pairwise PERMANOVA, for all distances, Fig. 2a and b). We also did a higher rarefaction of 5,000 reads per sample and found similar results (Fig. S2a through e). To provide a comprehensive analysis, we included data analyzed using both SILVA 138 and the GreenGenes extended database. A comparison of the results is presented in Fig. S67, with similar outcomes observed with both databases. However, we found no significant reduction of alpha diversity with the pregnancy trimesters when using SILVA compared to the significant result using GreenGenes. Apart from this, all diversity metrics were consistent between the two databases. In addition, we observed that GreenGenes extended database proved to be better for resolving Lactobacillus species in the cervicovaginal environment.

Association of bacterial taxa with women's group, cervicovaginal pH, HPV infections, and cervical lesions
We also analyzed the taxa association, including in the model the covariables of interest using Maaslin2 algorithm, which relies on linear models. We found that overall, Prevotella, Parvimonas spp., Sneathia, Clostridium, Dialister, Prevotella buccalis, Porphyromonas uenonis, Megasphaera oral taxon841, Peptoniphilus lacrimalis, Gardnerella vaginalis, Atopobium vaginae, and Prevotella bivia had significantly greater proportions in less acidic cervicovaginal pH (Padj <0.050, Fig. 1b; Table S3). Also, being pregnant or younger was associated with lower Atopobium vaginae ( Fig. 1b; Table S3). A separate analysis was performed for pregnant women by pregnancy trimester and observed that Aerococcus christensenii, Dialister spp., Gardnerella spp., Clostridium, Megasphaera oral taxon841 decreased in proportion consistently in second and third when compared to the first trimester, and Prevotella spp. decreased in the second trimester (Padj >0.046, Fig. 2e; Fig  S2i). In general, anaerobic opportunistic taxa were positively associated with higher pH, and negatively associated with age and pregnancy. We did not detect any bacterial taxa significantly associated with HPV infection or cervical lesions (Padj >0.104, Maaslin2). However, before P-value adjustment, we a Fisher's exact test among total prevalence per cervical lesion group. P value adjustment with "False Discovery Rate" method. b HR = high risk HPV types, LR = low risk HPV types.
Research Article mSystems observed that two taxa, Corynebacterium spp. and Methanobrevibacter spp., showed increased proportions in low-risk HPV infections compared to negative samples (P = 0.006, and P = 0.030, respectively). Also, the two taxa were associated with HR-HPV infections before the P-value adjustment for multiple comparisons: Ureaplasma Research Article mSystems decreased while Clostridium spp. increased in proportion (P = 0.035, and P = 0.043, respectively), and, for cervical lesions Escherichia coli, increased with LGSIL (P = 0.007; Maaslin2). Although marginally, HPV infections and cervical lesions were associated with an increase of the relative abundance of anaerobic-opportunistic taxa, and the signifi cance of these associations may increase with a larger cohort size.

Alpha diversity varies by group and HGSIL
We used linear mixed models (LMM), including the same covariables of interest as we employed for the beta diversity and taxa association analyses. We found that women's group (P < 6.2e-5) and cervicovaginal pH (P < 0.002) were significantly associated with alpha diversity (Fig. S1c). Within women's group, menopausal subjects showed greater Shannon diversity than nonpregnant [Padj <0.010, least-squares means (EMM) for

Research Article mSystems
Shannon and Chao1] and then pregnant women (Padj = 0.027, only for Shannon, EMM) with no differences between nonpregnant and pregnant subjects (Fig. 1c). However, no significant alpha diversity differences were observed between cervical lesion severity levels.
In the stratified analyses for each women's group, we found that for nonpregnant group, alpha diversity was greater at higher cervicovaginal pH or age, as well as in the presence of cervical lesions (HGSIL or LGSIL) compared with healthy subjects [P < 0.050, pairwise Padj <0.050, for Shannon ( Fig. S1b), and Chao1, P = 0.004, Padj <0.003, EMM]. The later comparisons within pregnant (Padj >0.390, Fig. S1c) or menopause women groups were not different (P > 0.450, Fig. S1d). Instead, HPV infection did not significantly influence alpha diversity among all women or nonpregnant women (P > 0.050, LMM).
A similar analysis was performed per trimester of pregnancy, initially showing no significant association (P > 0.050, for Shannon and Chao1). Nonetheless, when the two outliers were top-coded (that means when data points > 2 x standard deviation away from the mean within each trimester were assigned the last highest value), the pregnancy trimester variable showed a significant trend of decreasing late in pregnancy (P < 0.049, for both alpha metrics). Pairwise analysis showed that first trimester was significantly greater than the second and third trimester (P < 0.010, for Shannon diversity, not significant for Chao1 estimator, Fig. 2c and d). Indeed, when considering a 5,000 read rarefaction level we found that only the Chao1 estimator evidenced a significant decrease in richness from the first to the third pregnancy trimesters (P = 0.01, Fig S2f and  g).
In general, alpha diversity analyses showed the greatest values in women with cervical lesions or menopause, and was lowest in pregnant women, particularly in the last trimester of pregnancy.

Women living in PR are dominated by diverse microbial Community State genotypes
We also evaluated the Community State genotypes (CST) distribution among women's groups. CSTs are categories that allow classifying each woman's cervicovaginal microbial composition. In general, they are defined by the dominance of L. crispatus (CST-I), L. gasseri (CST-II), L. iners (CST-III), L. jensenii (CST-V), or nonLactobacillus dominated (CST-IV) (12). A finer classification by CST subgroups has also been proposed and used in this analysis (18). We used Fisher's exact test to test the prevalence of these CSTs among women's groups. As a side note, when analyzing the sequences, only a small percentage (0.61%) of the Lactobacillus ASVs were unable to be identified at the species level, indicating that our data for this analysis are reliable. We observed that most women had Lactobacillus-dominated CSTs (64.9%), prevailing CST-III (39.5%). No significant differen ces were observed among women's groups (P = 0.125). However, when stratifying by women's group, nonpregnant women had a higher prevalence of CST-III (42.3%) followed by CST-IV (34.7%) than any other CST (Padj <0.010, Fig. 1e). Instead, pregnant women showed higher CST-III (40.5%) and CST-I (32.4%, Padj <0.010, Fig. 1e), with a higher prevalence of the protective L. crispatus (Fig. 1d). Nonpregnant and menopause women had mostly L. iners and CSTIII and CST-IV (Fig. 1d). Indeed, for menopausal women, CST-IV (45.9%) followed by CST-III (29.5%) were the most prevalent (Padj <0.010, Fig. 1e). Among the CST-IV, the subtype CST IV-C was strongly higher in menopausal women (24.6%) than in the other women's groups (<4.1%, Padj = 0.002, Table 4). This profile is characterized by a low relative abundance of Lactobacillus spp., G. vaginalis, A. vaginae, and BVAB1. Instead, a diverse array of facultative and strictly anaerobic bacteria exists. Among the different CST IV-C subtypes, menopausal women showed a higher prevalence of CST IV-C0 (11.5%) which is an even microbial profile with a moderate amount of Prevotella, and CST IV-C1 (9.8%) which is a Streptococcus-dominated profile (Table 4; Fig. S1d).
In terms of cervical lesions (presence and types), analyses performed with all women by CST had no significant association (P < 0.393, Fisher's exact test, Table 2), nor when stratifying by pregnant, nonpregnant, or menopausal women ( Fig. 3; Fig. S4). However, the prevalence of CSTs was found to be significantly different among cervical lesions within nonpregnant women (positive/negative, P = 0.033, Table S4), where the presence of cervical lesion showed lower CST-I prevalence compared to its absence, although not significantly (8.7% vs 22.3%, Fisher's exact test, Padj >0.137, data not shown). Analyses of subCSTs within nonpregnant women suggest that women negative for lesion showed higher prevalence of Lactobacillus-dominant groups, specially CST I-A (L. crispatus-domi nated) and CST-III-A (L. iners-dominated), although not significantly (Fig. S4). Instead, women with HGSIL showed higher prevalence of CST IV-B (characterized by having a high to the moderate relative abundance of G. vaginalis with some Atopobium vaginae). However, this difference was also not statistically significant. CST-IV is characterized by markers of cervicovaginal dysbiosis (Fig. S4).
We found no significant differences (P < 0.05, Fisher's exact test, data not shown) in CSTs with HPV infections (presence and types) among all women. However, when considering only nonpregnant women, we did find significant differences between HR-HPV/Only_LR-HPV/negative (P = 0.039, Fisher's exact test, Table S5), which shows that women with LGSIL had a particularly low prevalence of CST-IV compared to HGSIL and negative samples (11.8% vs 35.5% and 43%, respectively, Table S5). In general, being menopausal, having cervical lesions, and being infected with exclusively HR-HPV infections are traits associated with diverse microbial profiles. We also aimed to compare CSTs from nonpregnant Puerto Rican women with those of U.S. Hispanic women (selfdeclared) reported by Ravel et al. (12). We found no significant differences between Hispanics from the U.S. vs Puerto Rican Hispanics (P > 0.132, Fisher's exact test, Table S6). For example, for CST IV, 38.1% and 34.7%, respectively (P = 0.881).

Most abundant L. iners-ASVs were not associated with cervical lesions or HPV infection
We also tested if the top15 most abundant ASVs classified as L. iners were associated with cervical lesions or their types. We did not find significance for any L. iners-ASVs by cervical lesion or HPV infection, not even by the severity of lesion or HPV risk genotype (P > 0.050, Fisher's exact test, data not shown).

Longitudinal changes of the cervicovaginal microbiota in a subset of women
To explore whether microbial profiles in this population vary over time, we analyzed two-time points longitudinal data. Thirty-nine women (nonpregnant, n = 29; pregnant, n = 1; menopausal, n = 9) who had had a second sampling between 4 to 16 mo after their first visit were analyzed. Approximately half of women (51.3%, 20/39) kept the same CST between visits, and only 12.8% (5/39) changed from a Lactobacillus-dominated (all CST-III) to a diverse profile (IV). The same proportion, 12.8% (5/39), changed from a diverse (CST-IV) to Lactobacillus-dominated profile (CST-I and II). A total of 35.9% (14/39) of women changed between different combinations of Lactobacillus-dominated CSTs (CST-I, II, and III), with no particular trends and with no differences among women's groups ( Fig. S5a and b). Cervical lesion phenotype was maintained between the two visits for 56.4% (22/39) of women. Meanwhile, HPV status coincided in 67.6% (25/37) of women from the first to the second visit ( Fig. S5c and d). Changes in CST based on cervical lesions or HPV infection were also explored; however, sample sizes were too small to perform a reliable comparison. Transitions from Lactobacillus-dominated to nonLactobacillus-dominated profiles or vise-versa did not show any significant trend for cervical lesions or HPV status (P < 0.225, Fisher's exact test, Table S7 and S8).

DISCUSSION
This study revealed not only an abundance and ubiquity of CST III and IV of the cervicovaginal bacterial microbiota but also a different distribution of oncogenic HPVs in this Puerto Rican cohort. The most frequent genotypes were HPV51, 16, 33, 52, and 56, mostly in HGSIL lesions, with few to no HPV18. This seems to indicate that both the quadrivalent HPV vaccine (4vHPV) (targeting HPV genotypes 6, 11, 16, and 18) and the 9-valent HPV vaccine may have limited effectivity as they lack coverage of HPV51 and 56, HPV genotypes that are prevalent in our cohort. The role of multiple HR-HPV infections, specifically HPV51 and 56, in the aggravation of cervical lesions has not been studied in PR and deserves further attention. We found that women in menopause had the lowest vaccination prevalence (4.9%), and this is expected as vaccination in younger women has better promotion and adherence. In PR, HPV vaccination started at time of HPV vaccine approval in USA back in 2006. The vaccination focusses at the time on the younger population, and many were vaccinated after being sexually active. Thus, the low rate of vaccination in this cohort. The prevalence of HPV in this study was similarly high (74.8%; HR-HPV, 67.3%) when compared to that reported in a previous study in the Puerto Rican population by our team (84%; HR-HPV, 79%) (16), as well as in Hispanic and Amerindian populations (77%; HR-HPV, 65.9%) (19). The prevalence of HPV in PR was higher than that observed in other Hispanic populations, including Costa Ricans (carcinogenic, 35%) (20), Europeans (~20%) (21), Japanese (~20%) (22), and Nigerian (carcinogenic, 40%) (23), using the same SPF10/LiPA25 HPV detection method. The high prevalence of HR-HPV genotypes in our cohort is however not generalizable to the population of PR, as clinics cover general gynecology and colposcopy, that is, some patients are referred for colposcopy, hence the higher prevalence of HR-HPVs. In the past, a population-based study in PR showed that the prevalence of HR-HPV genotypes was lower; however, they included only HPV types 6, 11, 16, and 18 and used a different detection method (24).
Although PR is a Caribbean country, it differs from other Afro-Caribbean women from Barbados, where the cervicovaginal microbial community is mostly dominated by nonLactobacillus genera (72%), including Prevotella (25). This is possible due to differences in African ancestry between Barbados and PR, as well as disparities in their lifestyle. The cervicovaginal microbiota of women living in PR was dominated by L. iners (39.5%) or diverse nonLactobacillus communities (35.4%). This is consistent with previous work from our lab for Puerto Rican population (16), U.S. Hispanics (12), and Venezuelans (26).
Both L. iners-dominant profiles and high diversity profiles (i.e., CSTs III and IV in ~77% of nonpregnant women and in menopausal women) have been associated, in previous studies, with bacterial vaginosis (27), STI infections (28,29), and adverse pregnancy outcomes (30,31). Although L. iners is a Lactobacillus species, its metabolic and ecological characteristics make it the least protective cervicovaginal Lactobacillus. Unlike other health-associated Lactobacillus, L. iners can cohabit in a diverse environment and act as a vaginal symbiont, or as an opportunistic pathogen if surrounded by other anaerobic pathogens (32). Its H 2 O 2 production is null or very limited (33,34). It also does not promote a sufficiently acidic environment that inhibits undesirable bacteria (35). Additionally, it generates only L-lactic acid instead of the protective D-lactic acid isomer. This promotes inflammation (36) and facilitates invasion of the cervix by opportunistic bacteria (37). Diverse cervical microbial profiles have also been associated with high levels of proinflammatory genital cytokines. This damages the endocervix's columnar epithelial barrier, exposing it to the colonization of other microbial or pathogenic agents (38,39). Hence, the existence of these two microbial profiles in the Puerto Rican population may promote higher susceptibility toward disorders in the cervicovaginal area and inflammation conditions (40).
Diverse microbial profiles (CST-IV) in our nonpregnant population were mostly represented by CST-IV-B (30.1%), like Hispanic women in the United States (18). This profile is considered of higher risk for bacterial vaginosis since it includes taxa such as

Research Article mSystems
Gardnerella vaginalis (41) that are associated with an increased risk of viral infections including HPV (28,(42)(43)(44)(45). Menopausal women, besides having this profile (21.3%), also had a high prevalence of CST-IV-C (24.6%), with some women being high in Prevotella (CST-IV-C0) and others in Streptococcus (CST-IV-C1), as previously seen in other meno pause studies in U.S. population (18,46). Although CST-IV-C is a nonLactobacillus-domi nated profile, it has been associated with the lowest Nugent scores (Gram stain scoring system to diagnose bacterial vaginosis, where lower values suggest no B.V.). However, CST-IV-C high in Prevotella has been shown to induce a significantly greater proinflam matory signaling (38) and is considered unfavorable for menopausal women, particularly those with cervical lesions or infected by HPV. During menopause, significant hormonal changes occur. The decline in estrogen level can lead to a decrease in the cervical mucus (vaginal dryness) and glycogen deposition, the primary source of Lactobacillus (47). Low Lactobacillus induces high cervicovaginal pH leading to an environment more susceptible to infections (48). Contrary to menopause, pregnant women showed a decrease in microbial alpha diversity (particularly later in pregnancy) and an increase in Lactobacillus-dominated profiles (CST-III and CST-I), a phenomenon already reported by other groups (49). This conversion of high to low alpha diversity is accentuated in pregnant Hispanic and Black populations (50), two racial groups with high cervicovaginal microbial diversity. Variation in cervicovaginal community composition during pregnancy may also correspond with hormonal changes. The increase in estrogen levels promotes higher glycogen deposition in cervicovaginal epithelial cells (51). Lactobacillus spp. utilize glycogen as the primary carbohydrate source to produce lactic acid, contributing to the protective effect of a low cervicovaginal pH (lysing bacteria other than Lactobacillus). Additionally, the dominance of Lactobacillus produces significant amounts of bacteriocins, contributing to coloniza tion resistance (11,(52)(53)(54). This defensive strategy explains the reduction of Gardnerella spp., Atopobium vaginae, and Prevotella spp. observed in our study, which has also been reported in other pregnant Hispanics (50). Moreover, our study identified a reduction in other BV-associated taxa, such as Aerococcus christensenii, Megasphaera spp., Dialister spp., and Clostridium. The lower abundance of Atopobium vaginae observed in pregnant women, as compared to women in menopause or the nonpregnant group, can be associated with the lower pH found during pregnancy which reduces the colonization of these BV-associated species. To note, in our pregnant cohort, no women had the CST-III-B type, only CST-III-A (40.5%). CST-III-A is associated with a lower Nugent score and cervicovaginal pH (18), indicating greater protection against infection.
Previous studies have shown that greater microbial diversity in cervicovaginal samples and a community profile dominated by L. inersare both associated with an increased risk of HPV infection and persistence (46,(55)(56)(57). However, in our study, very little or no association was detected. Differences with other studies may be explained by ethnicity or the HPV detection method. However, this lack of a clear HPV-microbial association has been reported in other Hispanic populations with similar profiles and using the same HPV detection method (16,26). This might be due to the high prevalence of HPV and a higher dominance of diverse and L. iners-dominated profiles in our cohort, as compared to Caucasian women with L. crispatus-dominated profiles. Additionally, differences between transient and persistent HPV infection may influence this associ ation since only persistent infections have been associated with immune response and cervicovaginal dysbiosis (58). Chronic cervical inflammation caused particularly by diverse bacterial populations, in addition to HPV persistence, creates a favorable context for neoplasia and later cancer development (59,60). However, longitudinal methods to define HPV persistence, especially using HPV typing methods with higher analytical sensitivity, are expensive and time-consuming, and therefore studies are scarce, but very relevant.
We found greater Shannon diversity in HGSIL and LGSIL and a lower prevalence of L. crispatus-dominated profiles (higher of CST III, IV-B, and IV-C) when compared to healthy subjects, which supports previous findings (16, 61, 62). Additionally, women with LGSIL tend to have coinfections with both HPV genotypes (HR and LR-HPV), while those with HGSIL showed a greater prevalence of infections with exclusive HR-HPV types. The coexistence of HR and LR-HPV risk genotypes was similarly prevalent in HR-HPV or LR-HPV exclusive infections, a finding that has also been observed in Hispanic-Vene zuelan populations (19). This association has previously been linked to sexual partner turnover (63) and a lower risk of cervical cancer development (64)(65)(66).
We also explored the temporal stability between cervical lesions, HPV status, and cervicovaginal microbiota in a small subset of women. Changes observed between positive and negative lesions or HPV infections may be due to treatment (loop electro surgical excision procedure, LEEP, or endocervical curettage, ECC) or natural clearance. For the cervicovaginal microbiota, we found no specific trends between CST changes over time nor concerning HPV infection and cervical lesion. However, previous studies have demonstrated that CST-IV-B (with high to moderate G. vaginalis and Atopobium vaginae) and CST-III are more likely to transition to other CSTs (32,67) or that CST-IV-B often transitioned to CST-III (68). Limitation of this analysis includes the small sample size and the long period between study visits for these patients. CST transition can be very rapid, for example, switching for only a single day (67). Furthermore, our community has a deficit in L. crispatus-dominated profiles, which restricts the examination of transitional trends from, or toward, this health-related microbial pattern.
In conclusion, women living in PR, regardless of their physiological stage, predom inantly maintained cervicovaginal communities dominated by L. iners (CST III) or a diverse microbial profile (e.g., CSTs IV-B and IV-C). Even though these profiles have been previously associated with reduced cervicovaginal health, they were not found to be significantly linked to cervical lesion in our cohort. However, the high prevalence of HR-HPV and the presence of these less stable, nonoptimal bacterial profiles may be associated with the greater risk of cervical cancer observed in our populations. As a result, Puerto Rican women may be considered a susceptible population for cervical lesion development.

Study design and participant sample collection
This cross-sectional study of adult women coming for gynecology and colposcopy evaluation at the UPR and San Juan City clinics (San Juan Puerto Rico, Metropolitan area), who did not meet the exclusion criteria, were recruited to participate in this study. Exclusion criteria included: (1) history of regular urinary incontinence; (2) treatment for or suspicion of prior toxic shock syndrome; (3) candidiasis; (4) active urinary tract infections; (5) active STIs; and (6) cervicovaginal irritation at the time of screening. Only asymptomatic women were included. Clinicians do not test for STIs in the asymptomatic population as these are not routine. We selected these exclusion criteria based on the indications from the Manual of Procedures of the Human Microbiota Project protocol (69). However, we included women who took antibiotics during the last 2 mo, since they were few (7.1%) and were distributed similarly among the women groups.
The date range in which human subjects' data/samples were collected was between November 2017 and February 2020. The study and its procedures were conducted from March 2020 to September 2022. The study was approved by the Ethics Committees of the UPR-Medical Sciences Campus IRB (Protocol ref. 1050114/June 2014), San Juan City Hospital and has biosafety approval protocol # 94620. All subjects were informed (both verbally and in writing) of the sampling procedure, risks, and benefits of the study, gave written informed consent and signed HIPAA forms, following the Declaration of Helsinki. All staff involved in the project were certified by: CITI RCR, Social and Behavioral Research Best Practices for Clinical Research, HIPAA certifications and the NIH training on Protection of Human Subjects. Patients completed a metadata questionnaire with demographic characteristics (age, place of birth, employment, educational attainment), assessment of sexual risk (including the age of onset, current sexual partners), health history, antibiotic use, vitamins, and BMI.
A total of 367 samples were collected from which 337 were successfully sequenced and after rarefaction 4 samples were removed for low sequence count, for a total of 333 samples which corresponds to 294 women. Samples were collected using sterile Catch-All Specimen Collection Swabs (Epicentre Biotechnologies, Madison, WI), and placed in MoBio bead tubes with buffer (MoBio PowerSoil kit, MoBio, Carlsbad, CA) (69). Swabs were then swirled for ~20 s in 750 µL of MoBio buffer in the labeled specimen collection tube. For the sampling, a speculum was inserted for access and visualization of the cervix. The sterile swab was placed in the posterior fornix (cervix) and rotated along the lumen with a circular motion and swabs were immediately placed in the MoBio tubes. The area of the posterior fornix was used as a sample site to improve the detection of HPV and as a reliable indicator of the overall cervicovaginal environment, which has been demonstrated in previous research (70). Besides swabs, we collected approximately 10 mL of cervical lavages by injecting PCR-grade sterile water into the vaginal canal for future studies. Collected lavage was transferred to a clean 15 mL collection tube and pH was measured with hydrion wide range pH paper strips. All samples were coded and placed in ice up to 4 h. Subsequently, the samples were transported to the laboratory and stored at −80°C until nucleic acid extraction and PCRs were performed at a single laboratory (FGV) to minimize processing variation.

DNA extraction and 16s rRNA sequencing
Genomic DNA extraction was done on posterior fornix (cervical) swabs using Qiagen Power Soil Kit (QIAGEN LLC, Germantown Road, Maryland, USA). No human DNA sequence depletion or enrichment of microbial or viral DNA was performed. A detailed description of the optimized extraction protocol is available in a previously published study (16). In short, Powerbead tubes were homogenized for 10 min at 3,200 rpm, using the Vortex-Genie 2, G560 (Scientific Industries, Inc. NY). We combined 100 µL of solution C2 and 100 µL of solution C3 and vortexed for 5 s for cell lysis, and the elution solution was 100 µL sterile PCR water, which was warmed to 55°C, and to increase DNA yield, allowed to remain on the filter for 5 min before the final centrifugation step. DNA concentration was measured using the Qubit dsDNA H.S. (High Sensitivity) Assay (Waltham, Massachusetts, U.S.) (ranging from 5 to 100 ng/µL). Genomic DNA from samples was shipped to an outsourced sequencing facility with an average genomic DNA concentration of 10-30 ng/µL. No human DNA sequence depletion or enrichment of microbial DNA was performed.
The DNA obtained from cervical samples was normalized to 4 nM during 16S rRNA gene library preparation. Universal bacterial primers, 515F (5'GTGCCAGCMGCCGCGG TAA3') and 806R (5'GGACTACHVGGGTWTCTAAT3') as in the Earth Microbiota Project (http://www.earthmicrobiota.org/emp-standard-protocols/16s/), were used to amplify the hypervariable V4 region of the 16S ribosomal RNA marker gene (~291 bp) (71). The methodology applied for this process was identical to the one implemented in previous reports (16,(72)(73)(74). Amplicons were quantified using PicoGreen (Invitrogen) and a plate reader (Infinite 200 PRO, Tecan). Once quantified, volumes of each of the products were pooled into a single tube so that each amplicon is represented in equimolar amounts. This pool was then cleaned up using AMPure XP Beads (Beckman Coulter), and quantified using a fluorometer (Qubit, Invitrogen). Customized sequencing was outsourced at Argonne National Laboratory (Illinois, USA) using Illumina MiSeq with the 2 × 250 bp paired-end sequencing kit. Amplicons were sequenced with the MiSeq Illumina platform in three different runs. The facility adds a negative control, and nothing is reported if they don't produce sequence reads above 500 total reads. Internal positive controls are analyzed and aligned to the sequencer in real-time.
The reads obtained from the 16S-rRNA sequencing and corresponding metadata were uploaded in QIITA (75) study ID 12871 (https://qiita.ucsd.edu/study/description/ 12871). In addition, the resulting raw sequences were made available at the European Nucleotide Archive Project (ENA) under the access number ERP136546.

HPV genotyping and cytologic diagnoses
The kit used to complete the HPV genotyping consists of a highly sensitive shortpolymerase chain reaction-fragment assay (Labo Biomedical Products, Rijswijk, The Netherlands, licensed Innogenetics technology). This process takes advantage of SPF10 primers to amplify a 65 bp fragment contained at the L1 open reading frame of HPV genotypes, followed by a reverse-hybridization reaction to determine which HPV genotypes are present in the sample by comparing the results to standardized controls provided by the kit. This assay facilitates the identification of the following mucosal HPV genotypes: 6,11,16,18,31,33,34,35,39,40,42,43,44,45,51,52,53,54,56,58,59,66,68,73,70, and 74 and classified as either high-risk (HR-HPV) or low-risk (LR-HPV). This methodology is explained thoroughly in a previously published study (16). HPV genotype groups categorized samples as (1) only HR-HPV (exclusively HR types), (2) only LR-HPV exclusively low-risk HPVs, (3) both (if a sample contained both HR and low-risk HPV genotypes, (4) HPV negatives or (5) missing (no HPV genotyping done on these samples) ( Table 1).
Data from the medical records and questionnaires were obtained at the time of patient visit and reviewed for results of cervical cytology and pathology reports. Data regarding abnormal cervical cytology were classified according to Bethesda system (76) in atypical squamous cells of undetermined significance (ASCUS), atypical squamous cells, cannot exclude high-grade lesion (ASCH-H), low-grade squamous intraepithelial lesion (LGSIL), high-grade intraepithelial lesion (HGSIL), or squamous cell carcinoma (SCC). Missing values in the different categories corresponded to the fact that no information for a given category was retrieved for that specific participant.

Sequence processing
Three Illumina demultiplexed paired-end sequence runs were uploaded and run independently in QIIME2. DADA2 algorithm was used for truncating sequences based on quality, merging sequences, amplicon error correction, chimera identification, and obtaining representative sequences (features). Sequence length truncation was performed separately for each run, just before the drop of the base qualities. The maximum length truncation was 15 bases (reverse sequence). Taxonomy classification was performed using a Naïve Bayes classifier implemented in the q2featureclassifier plugin (77) against a custom GreenGenes modified database that contains sequences from GreenGenes, the Human Oral Microbiota Database, and cervicovaginal vaginal reference sequences from the Human Microbiome Database (78) used previously (79)(80)(81). Feature tables for the three runs were finally merged. Samples showed and average of 26,308 sequences/sample [min. 187 sequences, max. 80,253 sequences]. Rarefaction was run at 1,773 sequences per sample to include most of the samples, and four samples with a lower sequence number were eliminated (Table S1). All analyses were run with the rarefied table at the species level. An additional rarefaction of 5,000 sequences per sample was also run and feature tables were analyzed in parallel. For this threshold, a total of 16 samples were lost. Results between the two rarefaction cutoffs were similar. We also re-analyzed our data using the SILVA 138 database following the exact same parameters and metrics (see Fig. S6 and S7), also with similar results. The effect of the "Sequencing run" was also included in all models of the analysis to account for the possible batch variability; however, no significant differences were observed.

Statistical analysis
Population description was performed comparing the three main groups: nonpregnant, pregnant, and menopausal women. Categorical variables were compared with Fisher's exact test and continuous variables with the Kruskal-Wallis test.
Cytology smears were performed for cervical lesion diagnosis; however, some women required biopsy. Therefore, a consensus result was used for the analyses. Only one sample was diagnosed as squamous cell carcinoma (SCC) and it was included as HGSIL to facilitate the analysis. Additionally, diagnosis with ASCUS (5.4%, 14/258) was top-coded as LGSIL if they were HPV positive and negative when HPV negative. Then, the variable "Type of cervical lesions" only included LGSIL, HGSIL, and negative samples.
Comparison among HPV genotype categories first explored the distribution of women with the presences of any HR-HPV, as well as those with only LR-HPV geno types.Subsequently, the distribution of women infected exclusively with HR-HPV or LR-HPV or mixed infections, indicating the presences of HR-and LR-HPV genotypes (called "both" on Tables 1 and 2), was examined. Another comparison was also per formed for mixed infections, although this one includes "Number of HR-HPV genotypes" or "Number of LR-HPV genotypes, " referring to the actual number of different HPV genotypes detected in a particular sample.

Beta diversity
Beta diversity was evaluated using unweighted, generalized, or weighted UniFrac distances. Collinearity among variables was checked with Spearman's correlation using "cor.test" R function before including variables to the model. These included the following: distance ~Group (nonpregnant/pregnant/menopause) +Type of cervical lesion (LGSIL/HGSIL/negative) +HPV genotype (only HR-HPV/ only LR-HPV/ both HPV types/ negative) +BMI (numerical variable) +Antibiotics last 2 mo (yes/no) +Sequencing run (Run A/B/C).
The trimester of pregnancy was also included only among pregnant women. For the stratified analysis for each of the women groups, the Age variable was also included. The variable "Sequencing run" was set as a random variable to account for the variability among runs.
Variables included in the model were analyzed using nonparametric permutational multivariate analysis of variance [PERMANOVA (82)] with "adonis2" function from "vegan" R package (83). Permutation was set to 1000 and seed was set to 711. PERMANOVA model allows comparing variance between groups to the variance within groups (spatial location differences). "Run" was set as a random variable using "setBlocks" function from the "permute" R package (84).
Pairwise analyses for the significant categorical variables were performed with "pairwise.adonis2" function from the "pairwiseAdonis" R package (Martinez Arbizu 2017).

Taxa association with study variables
Microbial taxa and their association with all the above variables were assessed using Maaslin2's linear model in R (85), where we also set "Sequencing run" as random variable. Maaslin2 is a bioinformatic tool that helps to identify taxa-variables associations. The output table (Table S3) shows the specific taxa associated with a particular variable. The table also provides the model coefficient value (effect size), the P-value, and the P-value adjusted for false discovery rate ("fdr"), among other parameters. If the variable is categorical, a reference is needed; negative samples were always set as the reference for our analysis. A positive coefficient indicates a positive association (increase in a taxon abundance) with a particular variable. A heatmap is also generated where the plus sign in the red squares suggests an increase in the taxa abundance for a specific variable, while a negative sign in the blue squares indicates a decrease of abundance, always based on the reference (Fig. 1E). The model included women's group, HPV types, type of cervical lesions, and significant variables in the beta diversity analysis (pH, age).

Alpha diversity
Microbial alpha diversity was measured with the Shannon index calculated using "diversity" function from "vegan" R package (83) and the Chao1 estimator calculated using the "apply" function from the "OTUtable" library (86). Shannon index provides information about richness and evenness of a microbial community by sample, while Chao1 estimator provides only the microbial richness estimated in an exhaustive sampling scenario. Linear mixedeffect model (LMM), "lmer" function from "lmerTest" R package (87) were fitted, setting "Sequencing run" as the random variable to account for the differences in sequencing facilities. Fixed (explanatory) variables included were the same as those used in the beta diversity analysis. To obtain the bestfitted model we utilized the "step" function from "lmerTest" R package (87).
The trimester of pregnancy was also included only among pregnant women. For the stratified analysis for each of the women groups, "age" variable was also included. Collinearity among variables was checked with Spearman's correlation using "cor.test" R function, before including variables to the model. We also checked the normality of the linear regression residuals and did logarithmic transformation to alpha diversity metrics to reach residual normal distribution when necessary. Models were run for between 164 and 184 samples due to missing samples. R library "emmeans" (88) was used to run pairwise analyses for mixed model.

Community State types
Each women's sample was classified into CST following the protocol of Valencia program (18) in Python 3 (https://github.com/ravel-lab/VALENCIA). Input data had to be formatted using local scripts. To compare the proportion of CST in women in PR from this study, with a similar Hispanic population but living in the USA, metadata from a very known study of cervicovaginal microbiota were downloaded and compared (12). Prevalence of CSTs was also compared among women groups and among cervical lesions. A 95% confidence interval for these values was calculated using "BinomCI" function from "DescTools" R package (89).

Longitudinal changes of the cervicovaginal microbiota in a subset of women
To explore whether microbial profiles in this population vary over time, we analyzed two-time points longitudinal data. Thirty-nine women who had had a second sampling between 4 to 16 mo after their first visit were analyzed. We first used the CST classifica tion to group women with CSTs dominated by Lactobacillus (CST-I, II, III, and V) or diverse (CST-IV). We then built a table to observe the coincidences, first among type of cervical lesions and then among the positive and the negative HPV infection. Finally, we ran a Fisher's exact test for each table (Table S7 and S8).
As supplementary materials, we provide the updated STORMS checklist (The Organization and Reporting of Microbiome Studies), which we updated based on our study's criteria to provide complete reporting of our study and ensure reproducibility.

ACKNOWLEDGMENTS
We extend our heartfelt gratitude to the volunteer students who took part in this study. Additionally, we acknowledge Rafael López Martinez building scripts for data analysis.

DATA AVAILABILITY
Data are available. The reads obtained from the 16S-rRNA sequencing and its corre sponding metadata were uploaded in QIITA [76] study ID 12871 (https://qiita.ucsd.edu/ study/description/12871). In addition, the resulting raw sequences were made available at the European Nucleotide Archive Project (ENA) under the access number ERP136546 (https://www.ebi.ac.uk/ena/browser/view/PRJEB51893?show=reads). Supplement data are also available.