Statistical Models for Vaginal Microflora: Identifying Women at Risk for Group B Streptococcus Colonization as a Test of Concept

Objective: The purpose of this study was to formulate a statistical model that relates human microflora to probabilities for vaginal colonization by group B Streptococcus (GBS). Methods: Longitudinal observations of total bacterial concentrations at various times during the menstrual cycle were obtained from overtly healthy, non-pregnant, menarcheal women. During each menstrual period and at appropriate intermenstrual times, the duplicate swab technique was used to sample the vaginal vault to obtain microbiologic samples. Women were identified as being colonized with GBS if their samples contained facultative gram-positive cocci. The method of generalized estimating equations (GEE) was used to model the longitudinal data set. Results: Concentrations of Corynebacterium sp., Streptococcus spp., and total anaerobic bacteria were found to be risk factors for GBS colonization. The sensitivity of the predictive model is 84% and the specificity is 79%. Conclusions: Although vaginal cultures for GBS are routinely performed to detect colonization, the statistical model described identifies associated risk factors which may be important determinants for GBS colonization.

Group B Streptococcus (GBS) is a gram-positive organism that forms diplococci or chains, groups within Lancefield group B, and produces a polysaccharide capsule. Antigenic variation of the capsular polysaccharide forms the basis for the serotyping system within group B. GBS are considered normal vaginal microflora under most circumstances. The role of the normal microflora for both health and disease has received increased attention in recent years as the impact of various interventions on the microflora has been identified. Microbiologic surveillance of human ecosystems often includes qualitative and, on occasion, quantitative bacteriologic culture for organisms of interest. Such methods are most often akinetic and do not identify relationships among microorganisms which may be important to the colonization of a particular environment. For diseases caused by members of the normal microflora, understanding such relationships may be significant in preventing both colonization and subsequent infection. During the past several years, we have developed statistical methods that help identify possible relationships between various members of the normal microflora. Several relationships identified in this manner have subsequently been elucidated in greater detail. The real test of these methods is, however, application of the predictive model to an actual in vivo situation.
Clinicians focused attention on Lancefield GBS during the 1970s. GBS is a major cause of neonatal sepsis and meningitis in the United States with an incidence rate of 18/10,000 live births and a mortality rate of 6% [1,2]. GBS, or Streptococcus agalactiae, can be isolated from cultures of the rectum, vagina, cervix, urethra, skin, and pharynx. As is the case for most streptococci, GBS is considered normal microflora, even though it can be an invasive pathogen in a variety of clinical settings. For neonates, however, exposure to GBS during birth can lead to development of disease. Most strategies for decreasing the risk of neonatal GBS disease involve the use of antibiotic prophylaxis. This approach requires prenatal screening for GBS, which is problematic because colonization by this organism can be transient or intermittent and may be missed by ordinary culture methods.
In this study, the GBS colonization status of healthy, asymptomatic women was used as a way to test predictive modeling methods using a statistical model developed to identify women at risk for carriage of GBS. A statistical model was fitted to an in vivo data set describing the vaginal ecosystem. The model defines a statistical relationship among microorganisms which allows for the identification of women who have an increased relative risk of carriage of GBS.

MATERIALS AND METHODS

Data Set
Over the past 10 years, a large data set containing both quantitative and qualitative microbiologic information was assembled from in vivo studies describing the healthy vaginal environment [3]. For the recovery of anaerobic bacteria, the following media were used: prereduced Brucella base agar with 5% sheep blood containing hemin and vitamin K1, each at 10 mg/l (BMB); BMB with 150 mg neomycin sulfate/l; and prereduced Brucella base agar with 5% laked sheep blood, 100 mg kanamycin/l, 7.5 mg vancomycin/l, and hemin and vitamin K1, each at 10 mg/l. The media used for the recovery of facultative organisms were 5% sheep blood in tryptic soy agar, mannitol salt agar, chocolate agar, and MacConkey agar. All colony types were isolated and identified by established criteria as described previously [4-6]. Women were identified as being colonized with GBS if their samples contained facultative gram-positive cocci that were catalase negative, did not grow on bile esculin azide agar medium, and produced the CAMP factor. Verification of GBS isolation was performed using long chain fatty acid analysis and ...

Observations with missing information on Corynebacterium sp., Streptococcus spp., and total anaerobes were excluded from the analysis. Accordingly, the data set used in the model comprises 536 repeated measurements taken from 58 women. Because repeated measurements from the same woman are not independent samples, conventional logistic regression methods cannot be applied to the data. Therefore, we used the method of generalized estimating equations (GEE), which generalizes logistic regression by taking into account the within-person dependence between samplings [7], as part of the analysis (SAS macro software, Johns Hopkins University, Baltimore, MD).
For each woman, let $Y(t)$ denote the dichotomous response variable for GBS colonization at the $t$-th visit. $Y(t) = 1$ indicates that a woman has GBS colonization at the $t$-th visit, and $Y(t) = 0$ indicates a negative result. Corresponding to the outcome variable $Y(t)$ is a set of independent variables $X_1(t), X_2(t), \ldots, X_k(t)$ containing different bacterial concentrations which are used as risk factors in the equation to predict the occurrence of GBS. Let $P(t)$ denote the probability that $Y(t) = 1$ at the $t$-th visit.
Define the logit transformation as $\operatorname{logit} P(t) = \ln[P(t)/(1 - P(t))]$. In the regression model, $e$ denotes random sampling error. Stepwise forward regressions of the GEE models were then conducted to evaluate possible risk factors.
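The logit transformation and its inverse can be illustrated with a short Python sketch. The function names are illustrative and not part of the original analysis, which used SAS macro software.

```python
import math


def logit(p: float) -> float:
    """Return the log-odds ln[p / (1 - p)] of a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logit(x: float) -> float:
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# A probability of 0.5 corresponds to log-odds of zero.
print(logit(0.5))  # 0.0
# Applying the inverse recovers the original probability (up to rounding).
print(round(inverse_logit(logit(0.22)), 2))  # 0.22
```

The inverse function is what converts a fitted linear predictor back into a colonization probability.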

Regression Model
The results of correlation analysis indicated that Corynebacterium sp. (CORYNE) has a correlation of -0.27 (P 0.0001) with Streptococcus spp. (STREP). Although not significantly correlated with Streptococcus spp., the total count for obligate anaerobes (TOTALAN) had a correlation of 0.137 (P 0.001) with Corynebacterium sp., suggesting a linkage between these two risk factors. The statistics for the bacterial counts of these variables are summarized in Table 1.
Of the 536 observations included in this study, 117 (22%) were identified by culture as GBS positive. These culture results were used as the outcome variable in the analysis. Applying the GEE method with an exchangeable correlation structure for the between-visit association and a stepwise regression method, we obtained the following model for the data set:

$$\operatorname{logit} P(t) = -4.653 + 0.713\,\mathrm{STREP} + 0.054\,\mathrm{CORYNE} - 0.156\,\mathrm{TOTALAN}.$$
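A minimal Python sketch of how the fitted equation could be turned into a predicted probability for a single visit. The coefficient magnitudes are those reported in the text; the TOTALAN coefficient is treated as negative, which is an assumption. The example counts are hypothetical, and their units are assumed to match whatever scale the model was fitted on.

```python
import math

# Coefficients as reported in the text.
# ASSUMPTION: the TOTALAN term is taken to be negative.
INTERCEPT = -4.653
COEFFICIENTS = {"STREP": 0.713, "CORYNE": 0.054, "TOTALAN": -0.156}


def predicted_probability(counts: dict) -> float:
    """Evaluate the linear predictor and map it to a probability with the inverse logit."""
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * counts[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical bacterial counts for one visit (illustrative values only).
example_visit = {"STREP": 6.0, "CORYNE": 4.0, "TOTALAN": 7.0}
p = predicted_probability(example_visit)
print(f"predicted probability of GBS colonization: {p:.3f}")
```

With all three counts at zero, the prediction reduces to the inverse logit of the intercept alone.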
If we assume that the logit of the probability that a woman is colonized by GBS on the $t$-th visit is a linear function of the various bacterial concentrations obtained during sampling, then the GEE regression model takes the form

$$\operatorname{logit} P(t) = \beta_0 + \beta_1 X_1(t) + \beta_2 X_2(t) + \cdots + \beta_k X_k(t) + e.$$

With the above model, a threshold value $P_c$ can be selected as the cutoff probability for predicting GBS colonization. A predictive decision rule was made as follows: a woman is predicted to be colonized by GBS if her predicted probability value, $P(t)$, at the $t$-th visit is greater than the threshold value; otherwise she is not predicted to be colonized by GBS. We consider the threshold value $P_c = 0.24$, which is greater than the observed probability (0.22) of GBS colonization.
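The decision rule can be sketched in a few lines of Python. The function name is illustrative; the default cutoff is the 0.24 threshold given in the text, and the input probabilities are hypothetical.

```python
THRESHOLD = 0.24  # cutoff probability used in the text


def predict_colonized(probability: float, threshold: float = THRESHOLD) -> bool:
    """Return True when the predicted probability strictly exceeds the cutoff."""
    return probability > threshold


# Hypothetical predicted probabilities for three visits.
for p in (0.10, 0.24, 0.31):
    print(p, predict_colonized(p))
```

Because the rule uses a strict inequality, a visit whose predicted probability equals the cutoff exactly is classified as not colonized.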
As a measure of accuracy of the GEE model for predicting GBS colonization, we computed the sensitivity and specificity for the model. Sensitivity is the proportion of GBS culture positive observations that the GEE model correctly predicts to be GBS positive. For the threshold value $P_c = 0.24$, 84% of the 117 GBS culture positive observations contained in the data set were successfully predicted by the model as having GBS colonization. Specificity is a measure of accuracy for predicting non-GBS observations. It is the proportion of GBS negative observations that the GEE model predicts to be GBS negative. Of the 419 GBS culture negative observations, 79% were predicted by the model as not having GBS colonization.
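The two accuracy measures can be computed from paired culture results and model predictions, as in the following Python sketch. The toy data are hypothetical and are not the study observations.

```python
def sensitivity_specificity(observed, predicted):
    """Return (sensitivity, specificity) for paired boolean outcomes.

    observed:  True where culture was GBS positive.
    predicted: True where the model predicted colonization.
    """
    pairs = list(zip(observed, predicted))
    true_pos = sum(1 for o, p in pairs if o and p)
    false_neg = sum(1 for o, p in pairs if o and not p)
    true_neg = sum(1 for o, p in pairs if not o and not p)
    false_pos = sum(1 for o, p in pairs if not o and p)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical toy data: 4 culture-positive and 6 culture-negative observations.
observed = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, True, False]

sens, spec = sensitivity_specificity(observed, predicted)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Raising the cutoff probability generally trades sensitivity for specificity, which is why both values are reported for a stated threshold.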

DISCUSSION
The bacterial populations residing within the human vaginal vault are members of a complex ecosystem. The mechanisms that control bacterial populations in this environment are not yet understood. The concept of statistical modeling applied to an ecosystem provides a new tool for microbiological analysis [8,9]. The model described in this research is important in that it defines a statistical relationship among the complex interactions of the various microbial species that make up the vaginal microflora such that a linear predictive equation is obtained for identifying possible GBS cases. Of the many different microbial species present among the vaginal microflora, the obtained model is able to give a reasonable prediction of the occurrence of GBS based upon the Streptococcus spp., Corynebacterium sp., and total anaerobic bacteria counts. The predictive accuracy for this statistical model is likely understated because the microbiologic cultures used to assemble the database were not designed to specifically detect GBS, and it is possible that low levels of colonization were not detected. Although it is easier to simply culture a given subject for GBS than to perform quantitative cultures for modeling purposes, the relationships between microbial populations cannot be evaluated from such clinical studies. Because the model resulting from this research is in a simple form with only three risk factors (CORYNE, STREP, and TOTALAN), it could be easily adapted for clinical studies to identify women prone to GBS colonization, even if GBS is not isolated. Indeed, the microbiologic data used to establish this model were not set up to specifically isolate GBS, but rather to identify the dominant vaginal microbiologic components. Moreover, the three identified factors for predicting GBS may be important for future study and evaluation.
The analysis of sensitivity and specificity of the model demonstrates the goodness-of-fit of the model by comparing observed GBS cases to predicted results according to a range of cutoff probability threshold values. In addition, using the coefficients of the obtained model, we can approximate the relative risk of a woman having GBS infection with respect to the increases or decreases of the bacterial counts for STREP, CORYNE, or TOTALAN. Although the data presented represent a single application, the statistical model may provide a technique for future preventive epidemiological studies. Understanding the relationship between microbial populations may aid in understanding vaginal colonization risk factors for women.
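One way to read the coefficients as approximate relative risks is to exponentiate them, as in the Python sketch below. Strictly, the exponentiated coefficient of a logit model is an odds ratio, which approximates the relative risk only when the outcome is uncommon. The TOTALAN coefficient is treated as negative here, which is an assumption, and the function name is illustrative.

```python
import math

# Coefficient magnitudes as reported in the text.
# ASSUMPTION: the TOTALAN term is taken to be negative.
COEFFICIENTS = {"STREP": 0.713, "CORYNE": 0.054, "TOTALAN": -0.156}


def odds_ratio(coefficient: float, delta: float = 1.0) -> float:
    """Odds ratio for a change of `delta` units in one predictor, others held fixed."""
    return math.exp(coefficient * delta)


for name, beta in COEFFICIENTS.items():
    print(f"{name}: odds ratio per unit increase = {odds_ratio(beta):.2f}")
# STREP: odds ratio per unit increase = 2.04
# CORYNE: odds ratio per unit increase = 1.06
# TOTALAN: odds ratio per unit increase = 0.86
```

A value above 1 indicates higher odds of GBS colonization as the count increases, and a value below 1 indicates lower odds.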
We believe that the application of statistical modeling strategies, such as those described, represents an important new approach to understanding the underlying mechanisms responsible for microbial colonization. While GBS is only a convenient example of this strategy, it may be possible to apply similar methods to a variety of other components of the vaginal microflora. Such studies are ongoing at the present time.