Cervicovaginal microbiome and natural history of HPV in a longitudinal study

Background Human papillomavirus (HPV) infection is one of the most common sexually transmitted infections. However, only a small percentage of high-risk (HR) HPV infections progress to cervical precancer and cancer. In this study, we investigated the role of the cervicovaginal microbiome (CVM) in the natural history of HR-HPV. Methods This study was nested within the placebo arm of the Costa Rica HPV Vaccine Trial that included women aged 18–25 years of age. Cervical samples from two visits of women with an incident HR-HPV infection (n = 273 women) were used to evaluate the prospective role of the CVM on the natural history of HR-HPV. We focus specifically on infection clearance, persistence, and progression to cervical intraepithelial neoplasia grade 2 and 3 (CIN2+). The CVM was characterized by amplification and sequencing the bacterial 16S V4 rRNA gene region and the fungal ITS1 region using an Illumina MiSeq platform. OTU clustering was performed using QIIME2. Functional groups were imputed using PICRUSt and statistical analyses were performed using R. Results At Visit 1 (V1) abundance of Lactobacillus iners was associated with clearance of incident HR-HPV infections (Linear Discriminant Analysis (LDA)>4.0), whereas V1 Gardnerella was the dominant biomarker for HR-HPV progression (LDA>4.0). At visit 2 (V2), increased microbial Shannon diversity was significantly associated with progression to CIN2+ (p = 0.027). Multivariate mediation analysis revealed that the positive association of V1 Gardnerella with CIN2+ progression was due to the increased cervicovaginal diversity at V2 (p = 0.040). A full multivariate model of key components of the CVM showed significant protective effects via V1 genus Lactobacillus, OR = 0.41 (0.22–0.79), V1 fungal diversity, OR = 0.90 (0.82–1.00) and V1 functional Cell Motility pathway, OR = 0.75 (0.62–0.92), whereas V2 bacterial diversity, OR = 1.19 (1.03–1.38) was shown to be predictive of progression to CIN2+. Conclusion This study demonstrates that features of the cervicovaginal microbiome are associated with HR-HPV progression in a prospective longitudinal cohort. The analyses indicated that the association of Gardnerella and progression to CIN2+ may actually be mediated by subsequent elevation of microbial diversity. Identified features of the microbiome associated with HR-HPV progression may be targets for therapeutic manipulation to prevent CIN2+. Trial registration ClinicalTrials.gov NCT00128661.


Conclusion
This study demonstrates that features of the cervicovaginal microbiome are associated with HR-HPV progression in a prospective longitudinal cohort. The analyses indicated that the association of Gardnerella and progression to CIN2+ may actually be mediated by subsequent elevation of microbial diversity. Identified features of the microbiome associated with HR-HPV progression may be targets for therapeutic manipulation to prevent CIN2+.

Introduction
Persistent cervical infections by high-risk (HR) human papillomavirus (HPV) cause virtually all cervical cancers and their immediate precursor lesions [1]. Most sexually active women have been infected with HPV at some point in their lives and in the vast majority the infection is cleared within a few months [2]. However, a subset of women develop a persistent HPV infection that places them at high risk for cervical precancer and cancer [2][3][4][5]. with funding support from the National Institutes of Health Office of Research on Women's Health. GlaxoSmithKline Biologicals (GSK) provided vaccine and support for aspects of the trial associated with regulatory submission needs of the company under a Clinical Trials Agreement (FDA BB-IND 7920) during the four-year, randomized blinded phase of our study. MU, RDB, CPZ and AG were funded by the National Cancer Institute (NCI) (U01 CA78527). The original Costa Rica vaccine trial was sponsored and funded by the NCI (contract N01-CP-11005). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: I have read the journal's policy and have the following conflicts: John T. Schiller and Douglas R. Lowy report that they are named inventors on US Government-owned HPV vaccine patents that are licensed to GlaxoSmithKline and Merck and for which the National Cancer Institute receives licensing fees. They are entitled to limited royalties as specified by federal law.
of cervical neoplasia [37][38][39]. Additionally, most studies looking at the natural history of HPV and the microbiome are cross-sectional and therefore lack the ability to draw potential causal links.
For the current study, we leveraged longitudinal data and specimens from the placebo arm of a large randomized HPV vaccine trial [40] to examine the impact of the CVM on the natural history of incident HR-HPV infections to study: 1) progression to cervical precancer, 2) viral persistence, and 3) viral clearance.

Subject characteristics and cervicovaginal microbiome features
A total of 273 women with an incident HR-HPV infection were included in the analyses, of whom 266 had a second sample with an average sampling interval of 1.5±0.9 years. Table 1 presents the sample demographic information and summarizes the bacterial and fungal sequencing results after taxonomic assignment for each infection outcome at baseline. There were no significant differences between groups in age (p = 0.13), 16S rRNA gene OTU clustered read counts (p = 0.33), or ITS1 OTU clustered read counts (p = 0.53).  1) within the CVT cohort at V1 and V2. Alpha diversity analysis did not reveal any significant differences at V1 in terms of bacterial Shannon alpha diversity (trend p = 0.52). At V2 there was a significant trend of rising diversity based on the Shannon diversity index (trend p = 0.024).

PLOS PATHOGENS
Microbiome and HR-HPV Progression

Fig 3B
shows results of hierarchical clustering based on the detected fungal species. Candida albicans was the dominant fungal taxa. In terms of fungal clusters, there appears to be a single clade dominated by C. albicans (43/208, 20.7%), one dominated by an unidentified fungal species (12/208, 5.8%) and one that is composed of a diverse fungal community (153/208, 73.6%).

Microbiome and HR-HPV Progression
Analysis of bacterial taxonomic categories of the microbiome associated with persistence/ progression vs. clearance revealed a total of 24 taxa that were significant (LDA>2.0) at V1. G.

PLOS PATHOGENS
Microbiome and HR-HPV Progression vaginalis was the bacterial species with the highest positive correlation to progression (Fig 4A), while L. iners was the most positively associated taxon with clearance based on relative abundance. Amongst V2 samples there were a total of 13 significant taxa identified (three taxa associated with clearance and 10 with progression) (Fig 4B). At V2, bacteria that are commonly associated with bacterial vaginosis, such as Prevotella amnii and Anaerococcus prevotii, were significantly correlated with progression. We used a generalized linear model (GLM) to validate LEfSe biomarkers with adjustments for key covariates (age, smoking status, HPV16 and visit CST) (S1 Table).
Fungal biomarker discovery revealed five fungal taxa associated with HPV progression (S1A Fig). The average of the five combined fungal taxa were detected at rates of 0.59%, 3.55% and 3.76% for the clearance, persistence and progression outcomes, respectively (p = 0.0080, S1B Fig). However in the adjusted GLM analysis, none of the fungal biomarkers were determined to be significant at either visit (S2 Table).
To evaluate whether some common function of bacteria might be associated with HR-HPV outcomes, we used a functional analysis of gene groups imputed from the microbiome data as described in the methods. This analysis revealed 8 functional pathways at KEGG Level 2 significantly associated with progression to CIN2+ (S3 Table). Of the 8 identified pathways, only 2 had a mean read coverage of >1% and were considered for further analysis. The two identified pathways were "Xenobiotics Biodegradation and Metabolism" pathway, which was positively associated with progression (p = 0.0020) and the Cell Motility pathway, which was negatively associated with progression (p = 0.019). These two pathways were significantly correlated (Pearson correlation = -0.80, S2 Fig), and we chose to use Cell Motility in multivariate modeling since it produced a more stable GLM estimate (S3 Table).

Microbiome and HR-HPV natural history: GLM modeling
To investigate the contribution of components of the microbiome over time, we used a GLM in order to adjust for known covariates of CIN2+ progression that may influence the relationship of the CVM and progression to CIN2+ (e.g. age, smoking, HPV16 and CST). We utilized a GLM with a binary outcome (clearance/progression) and the significant microbial features identified in preceding sections as predictors. Specifically, we used the abundance of Gardnerella at V1, the abundance of Lactobacillus at V1, the Observed fungal species diversity at V1, the imputed Cell Motility pathway at V1 and the microbial diversity at V2. The model was adjusted for age, CST, smoking and HPV16 infection status. In addition, the model showed that the V2 microbial diversity was a significant risk factor for CIN2 + progression, OR = 1.17 (1.02-1.29).
Following the multivariate analysis, we wanted to explore the reason for V1 Gardnerella being insignificant despite being the top microbial risk factor in differential abundance in all previous analyses (Fig 4, S3 Fig and S1 Table). Thus, we performed a mediation analysis to determine if V1 Gardnerella was acting by inducing the elevated diversity at V2 (Fig 6A and  6B). This analysis showed that after adjustment for V1 Gardnerella there was a significant association of V2 Shannon diversity with progression to CIN2+, p = 0.04. The Average Direct Effect (ADE) showed that V1 Gardnerella wasn't significant after adjustment for the V2 Shannon diversity, p = 0.23 supporting our mediation hypothesis (Fig 6A).
Power for detection of the effects of each microbiome component was performed using the lmSupport package. Specifically each of the microbiome components (i.e., V1 Gardnerella, V1

Microbiome and HR-HPV Progression
Lactobacillus, V1 Fungal Observed OTUs, V1 Cell Motility, and V2 Shannon diversity) was tested separately to assess power with adjustment for age, CST, HPV16 status and smoking. S4 Table shows the results of the power calculation. Given large effect sizes for V1 Lactobacillus, V1 Fungal Observed OTUs, V1 Cell Motility, and V2 Shannon diversity we calculated that these could be detected with a power >98%, while the V1 Gardnerella had a power of 82% at the 0.05 alpha level.

Discussion
Previous cross-sectional studies analyzing the association between the cervicovaginal microbiome and HPV infection outcomes have consistently identified Gardnerella as a key biomarker associated with CIN2+. This has been reported in studies that utilized both nextgeneration sequencing [39,41] and other methods of microbiome analyses [52,43]. We present data that Gardnerella is in fact associated with CIN2+ lesions, but rather than directly causing the CIN2+ lesion, appears to induce a higher diversity CVM over time as measured at V2,

PLOS PATHOGENS
Microbiome and HR-HPV Progression which in turn mediates the observed effect of Gardnerella in HR-HPV disease progression. Although it is not clear how a state of polymicrobialism in the presence of a persistent HR-HPV infection leads to the development of epithelial dysplasia, recent studies on the microbiome's role in other cancers suggests that the answer lies in the establishment of a microbial microenvironment, perhaps a biofilm. For instance, it has been shown that certain cancers (e.g., colorectal cancer and prostate cancer) have distinct microbial communities at the tumor site that are associated with tumor development [44,45]. In fact, other data from our lab using cervical biopsy tissue samples indicates that there are distinct microbial differences as cervical cancer progresses to advanced FIGO stages (manuscript in preparation). This idea is further supported by data indicating that cervical precancerous lesions that regress, compared to those that progress to cancer, harbor a different immune microenvironment [46]. The local interplay between the microbiome and the local host immune system may be important to understanding the progression of HR-HPV infection to cervical cancer.
The protective microbial biomarkers identified at V1 also suggest an association of the microbiome and host innate and acquired immunity in progression to CIN2+. Specifically, the

PLOS PATHOGENS
Microbiome and HR-HPV Progression protective effect of bacterial Cell Motility may be due to the known phenomena of bacterial flagella activating host immunity [47][48][49]. This local activation may facilitate the innate immune system's ability to clear an active HPV infection. Such stimulation may be critical in HPV control since cervical lesions have been shown to be associated with local immunosuppression through the reduction of factors such as IL-17 [50]. Despite the presence of studies to support this conjecture it should be noted that this type of immune activation needs to be confirmed with rigorous experimental precision.
Gardnerella, as discussed above, continuously emerges as a risk factor for CIN2+ development and progression. Based on our findings and published data, the association may be tied to the ability of Gardnerella to be immunosuppressive in the cervicovaginal region [51]. Whereas, it seems that the presence of commensal bacteria (e.g. Lactobacillus) with the ability to stimulate a local immune response may be contributing factors to the clearance of incident HR-HPV infections. Moreover, the presence of bacteria with immunosuppressive attributes, such as Gardnerella, may promote viral persistence and progression.
Alternatively, there may be other explanations for the observed associations between the cervicovaginal microbiome and HPV's natural history. One possible explanation is a host developed or inherited immune deficiency that is a common cofactor for both cervical cancer progression and microbial diversity. For example, elevation of a particular inflammatory cytokine may be both necessary for successful tumor growth and be a causal factor in increasing vaginal microbial diversity. Such a factor may also explain the consistent identification of Gardnerella, which is commonly identified as a biomarker for increased diversity in the CVM [52] and a risk factor for CIN2+.
We have identified distinct microbial biomarkers that either protect, or promote the progression of a HR-HPV infection to CIN2+ lesions. In the context of what is known about the cervicovaginal microbiome, it may be that these factors act to suppress (in the case of progression) or activate (in the case of clearance) a localized immune response, which in turn influences the natural history of a HR-HPV infection. However, additional prospective studies are needed to establish a causal link between the cervicovaginal microbiome, the immune system and the natural history of HPV. Nevertheless, our results suggest a marker for identifying women with persistent HR-HPV infection at risk for progression by monitoring the presence of Gardnerella and subsequent elevation in microbial diversity. If future studies support a causal role of the cervicovaginal microbiome and disease progression, then it might be possible to manipulate the CVM in a manner to activate a local immune response. It is possible that HPV vaccination might influence the CVM and future research will be needed to evaluate such potential changes.
The strength of this study includes the prospective design and availability of a longitudinal cohort. In addition we used advanced epidemiological methods in a novel way to investigate potential causative factors in cervical intraepithelial neoplasia. Potential weaknesses in this study include the relatively small sample size, homogeneity of the population and the use of only two time points.
In summary, through the use of longitudinal samples from the CVT cohort we investigated and identified key features of the cervicovaginal microbiome potentially associated with progression of HR-HPV infection [28, 53-56] (e.g., Gardnerella and subsequent increase vaginal microbial diversity). Additional studies are required to validate the model proposed in this report.

Clinical trial information
The study of cervicovaginal microbiome and HR-HPV natural history was a nested analysis within the previously reported CVT [57] (clinical trials registration NCT00128661). Written

PLOS PATHOGENS
Microbiome and HR-HPV Progression informed consent was obtained from all participants in CVT. The trial protocol can be obtained from the original trial publication [57].

Ethics statement
All CVT participants were adult women between the ages of 18-25 years. All participants were shown a video describing the study design and were then required to provide written consent to continue participating in the trial. Institutional review board approval was obtained for the informed consent forms at both the NCI and in Costa Rica. Registered with Clinicaltrials.gov NCT00128661

Study population and case definitions
Subjects for this nested study were selected from the placebo arm of a community-based clinical trial of the HPV 16/18 vaccine in Costa Rica that had enrolled women 18 to 25 years of age in 2004-2005 [57]. Women with an incident HR-HPV infection (HPV16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, or 59) were selected for analysis. Incident infections were classified based on outcomes from the well-established model of HR-HPV natural history including outcomes of clearance, persistence and progression [58]. Specifically, outcomes related to the incident HR-HPV infection included women who developed a CIN2 or CIN3 (CIN2+) lesion (progression), women with an infection for 2 or more years with the same type in the absence of a CIN2+ diagnosis (persistent), or women who cleared their incident HR-HPV infections within 1 year (clearance). This analysis included 273 women of whom all had available samples at V1 (first visit positive for the studied HPV type) and 266 who had a second sample at V2 (for persistent, visit that was positive for the same type and at least 305 days after V1; progression, closest visit before diagnosis of CIN2+; clearance, following visit that was negative for that type); all had clinical follow-up data. Seven women, one with clearance, and six with persistence either did not have an available V2 sample or the sample failed in lab testing.

Cervical microbiome characterization
DNA samples [59] were shipped to the Burk Lab on dry ice where the microbiome analysis was performed. DNA had been extracted from cervical brush samples by DDL Diagnostic Laboratory (Voorberg, The Netherlands) where they had been tested for HPV as previously described [60]. Laboratory procedures for the microbiome analyses were performed within a hood (AirClean Systems, Creedmoor, NC) in an isolated room to minimize environmental contamination and water-blank negative controls were used throughout the testing.
For both amplicon experiments, 20 negative controls were randomly mixed amongst samples. Negative controls were created using nuclease-free PCR-grade water (Lonza) as described above instead of extracted DNA.
Barcoded-PCR products were combined for each amplicon type and the DNA fragments (~356 bp for 16S V4 and 400-600 for ITS1) were isolated by gel purification using a QIAquick Gel Extraction Kit (Qiagen, Hilden, Germany). Purified PCR products were quantified using a Qubit 2.0 Fluorometic High Sensitivity dsDNA Assay (Life Technologies, Carlsbad, CA) prior to library construction using a KAPA LTP Library Preparation Kit (Kapa Biosystems, Wilmington, MA). Size integrity of the amplicons was validated with a 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA). High-throughput amplicon sequencing of 2x300 paired-end reads was conducted on an Illumina MiSeq (Illumina, San Diego, CA).

Bioinformatics
Illumina reads were trimmed to remove bases that had a PHRED score of <25 using prinseqlite V0.0.4 [64]. Quality trimmed reads were then demultiplexed using Novobarcode [65]. Paired-end reads were joined using PANDAseq with default settings [66]. The merged reads were processed through the VSEARCH quality-filtering pipeline [67] to dereplicate the sequences, reduce noise and remove chimeric reads.
PICRUSt was used to impute microbial functional gene content and to collapse identified genes into functional pathways [72]. Pathways that were associated with HR-HPV clearance were identified through the use of a generalized linear model (GLM) based on statistical significance (p<0.05) and relative abundance (1% or higher across all reads).
For fungal ITS1 reads, open-reference OTU picking was performed using VSEARCH [67] and the UNITE database [73]. Taxonomy of representative fungal sequences was assigned using BLAST [74].
Phyloseq [75] was used to import BIOM data for 16S and ITS assays into R separately, followed by the determination of Shannon and Chao1 alpha diversity. For all analyses, bacterial data was subsampled for 2,500 reads. For fungal analyses subsampling was performed at 500 reads. Biomarker discovery analysis was performed using the LEfSe tool [76]. Linear discriminant analysis (LDA) scores greater than 2.0 are considered to be significant [76]. Microbial community state types (CSTs) were assigned on the basis of hierarchical clustering of the 20 most abundant OTUs. Prior to clustering, OTUs were agglomerated at the species level or the

PLOS PATHOGENS
Microbiome and HR-HPV Progression lowest identified taxonomic level. Clustering was performed using the wardD2 algorithm using Euclidian distances.

Statistical analysis
R v3.4.2 [77] was used for statistical analyses. The Kruskal-Wallis test was used to assess significance of continuous data. Linear regression was used to assess the significance of variables associated with the ordered outcome states of a HR-HPV infection (1). Logistic regression was performed using the GLM function and a binomial family generalized linear model in R. For categorical data, dummy variables were created and each individual factor level was tested in a univariate GLM analysis. Models were adjusted for age, smoking, HPV16 and CSTs. Smoking status was determined through a questionnaire and incorporated into the model as ordered categories: never smoked, former smoker and current smoker [40]. Power of GLM results was computed using the lmSupport package [78].
We performed a statistical mediation analysis to test whether V1 Gardnerella (an independent variable) could be acting by inducing a subsequent elevated microbiome diversity at V2 (mediator variable) that influences the outcome of HR-HPV progression using the package mediation [79]. The outcome model we used was binary clearance/progression. Models were adjusted for age, CST, smoking status and HPV16 infection status. In the results we present the mediation effect (average causal mediation effects (ACME)), which is the total effect of V2 Shannon diversity and V1 Gardnerella minus the direct effect of V1 Gardnerella. Additionally, we estimate the direct of effect (presented using the average direct effect (ADE)) of V1 Gardnerella on the binary outcome clearance/progression, minus the effect of the V2 Shannon diversity mediator; the total effect, which is the sum between the indirect effect of the V2 Shannon diversity and the direct effect of the V1 Gardnerella; and the proportion mediated which is the ratio of the ACME and total effect estimates. Supporting information S1