Staphylococcus aureus CC30 Lineage and Absence of sed,j,r-Harboring Plasmid Predict Embolism in Infective Endocarditis

Staphylococcus aureus causes severe infective endocarditis (IE), in which embolic complications are a major cause of death. Reported risk factors for embolism include a younger age and larger IE vegetations, while methicillin resistance conferred by the mecA gene appeared to be a protective factor. It is unclear, however, whether embolism is influenced by other S. aureus characteristics such as clonal complex (CC) or virulence pattern. We examined clinical and microbiological predictors of embolism in a prospective multicentric cohort of 98 French patients with monomicrobial S. aureus IE. The genomic contents of causative isolates were characterized using DNA arrays. To preserve statistical power, genotypic predictors were restricted to CC, secreted virulence factors and virulence regulators. Multivariate regularized logistic regression identified three independent predictors of embolism. Patients at higher risk were younger than the cohort median age of 62.5 y (adjusted odds ratio [OR] 0.14; 95% confidence interval [CI] 0.05–0.36). S. aureus characteristics predicting embolism were a CC30 genetic background (adjusted OR 9.734; 95% CI 1.53–192.8) and the absence of pIB485-like plasmid-borne enterotoxin-encoding genes sed, sej, and ser (sedjr; adjusted OR 0.07; 95% CI 0.004–0.457). CC30 S. aureus has been repeatedly reported to exhibit enhanced fitness in bloodstream infections, which might impact its ability to cause embolism. sedjr-encoded enterotoxins, whose superantigenic activity is unlikely to protect against embolism, possibly acted as a proxy for other genes of the pIB485-like plasmid found in genetically unrelated isolates from mostly embolism-free patients. mecA did not independently predict embolism but was strongly associated with sedjr. This mecA-sedjr association might have driven previous reports of a negative association of mecA and embolism. Collectively, our results suggest that the influence of S. aureus genotypic features on the risk of embolism may be stronger than previously suspected and independent of clinical risk factors.

INTRODUCTION
Infective endocarditis (IE) is a severe disease with ∼20% in-hospital mortality (Duval et al., 2012). Staphylococcus aureus, the major causative agent of IE (Selton-Suty et al., 2012), induces severe forms that cause patient death twice as frequently as other microorganisms (Thuny et al., 2005, 2007). Frequent causes of IE-related mortality include congestive heart failure, multiorgan failure, and embolism, which occurs in 13-51% of cases (Millaire et al., 1997; Vilacosta et al., 2002; Durante Mangoni et al., 2003; Thuny et al., 2007). IE-related embolism results from the release into the bloodstream of fragments from vegetations, which are masses of thrombotic, infected tissues attached to heart valves. The known risk factors for IE-related embolism are mostly linked to patient or disease characteristics such as injection drug use (IDU) or a larger vegetation size on echocardiography (Fowler et al., 2005; Thuny et al., 2007). However, S. aureus IE per se has also been repeatedly reported as an independent risk factor (Vilacosta et al., 2002; Thuny et al., 2005; Hubert et al., 2013; Rizzi et al., 2014), suggesting that microbiological, species-specific factors might be involved in embolus development and release (Durante Mangoni et al., 2003).
Indeed, most S. aureus isolates secrete toxins that might influence the course of IE and the risk of embolism (Vandenesch et al., 2012). Determining whether specific virulence factors are involved in this threatening complication would help predict embolism and guide preventive strategies in IE patients. Experimental studies have pinpointed the potential role in vegetation development of several staphylococcal factors, including superantigens which have been proposed to facilitate local bacterial growth by promoting immune dysfunction and chronic inflammation (Stach et al., 2016), or cytotoxic exotoxins able to kill immune cells recruited at the site of infection (Salgado-Pabón et al., 2014;Dupieux et al., 2015). These experimental findings, however, have not been confirmed in clinical studies. No epidemiological study of an association between embolism and virulence factors has been conducted so far. Strikingly, the only S. aureus characteristic shown to influence embolism risk is methicillin resistance (MRSA) encoded by the mecA gene, which emerged as a protective factor (Thuny et al., 2005;Hsu and Lin, 2007;Hill et al., 2008). However, the negative association of mecA with embolism has no clear biological explanation and mecA-positive isolates have been involved in severe IE with embolism in several case reports (Zheng et al., 2015). This current knowledge gap is possibly linked to the difficulty of conducting large-scale cohort studies of S. aureus IE combining clinical and microbiological molecular data. IE is a rare disease: while large cohorts have been successfully analyzed (Fowler et al., 2005), the necessity to collect and characterize S. aureus isolates has likely limited the sample size of genetic association studies. Moreover, the modest statistical power achievable in such small-size cohorts would counteract the benefits of current high-resolution molecular techniques such as whole-genome sequencing. 
It is doubtful that even cohorts of several hundred patients would allow reliable detection of predictors of embolism among thousands of potential genetic markers (Hong and Park, 2012), especially since these yet-unknown predictors should be controlled for confounding by known clinical risk factors.
These observations prompted us to examine potential associations of S. aureus characteristics with embolism during IE using a carefully selected set of candidate genes rather than a genome-wide approach. From a previous prospective population-based IE cohort in which embolic events were well-documented (Selton-Suty et al., 2012), we identified 98 patients with S. aureus IE whose causative isolate could be analyzed for its genetic background and for the presence of 26 alleles involved in virulence or virulence regulation. This approach allowed testing associations with sufficient statistical power in models controlling for clinical confounders, at the expense of forgoing the discovery of unexpected markers in other parts of the S. aureus genome.

Patient Population and Collection of Data
Patients with IE were identified from a 1-year prospective population-based observational study conducted in 2008 in seven French regions comprising one-third of the adult French population (Selton-Suty et al., 2012). The study was approved by the institutional review board of Besançon (Comité de Protection des Personnes). In accordance with French regulations, patients were informed of the study but they did not have to provide written consent. In this cohort, 497 patients had definite IE according to Duke-Li criteria (Li et al., 2000). One hundred and thirty-two patients (26.6%) had monomicrobial S. aureus IE. Patients in whom S. aureus had been isolated upon IE diagnosis, but whose isolate could not be recovered from the frozen strain collection were excluded (n = 34; 25.6%). Ninety-eight patients and S. aureus isolates were included in the final analysis. The clinical characteristics of the 34 excluded patients were compared to those in the final cohort to detect possible biases related to strain availability.
The study endpoint was the occurrence of at least one embolic event from the onset of IE symptoms to hospital discharge. Embolism based on clinical and/or imaging diagnosis was reported among a pre-established list of complications (Selton-Suty et al., 2012). Because the date of occurrence of embolism was not consistently reported, we did not consider its delay from IE onset. The other collected variables are summarized in Table 1.

S. aureus Typing by DNA Array Technique
Given the small sample size (n = 98) and the need to limit the number of model covariates to avoid type I error inflation while preserving statistical power, the genetic analysis was purposely focused on biologically relevant genetic features selected a priori for their potential role in IE. A subset of non-constant alleles involved in virulence and expression regulation (n = 22 and 4, respectively; Supplementary Table 2) was identified from the 332 target sequences probed by the StaphyType DNA array (Alere Technologies GmbH, Jena, Germany). Targets related to species identification, molecular typing, surface-expressed proteins and resistance determinants were excluded, with the exception of the methicillin resistance-conferring mecA gene, considered as a control covariate. DNA extraction and hybridization were performed as described elsewhere (Tristan et al., 2012). To examine the influence of S. aureus genetic background on the occurrence of embolism, isolates were assigned to multilocus sequence types (STs) and clonal complexes (CCs) by comparing whole-array hybridization profiles to those of previously MLST-typed reference strains in a dedicated database as described elsewhere (Monecke et al., 2008).
The genotypic relatedness of isolates was visualized by means of a minimum spanning tree (MSTree) based on StaphyType hybridization profiles using the complete set of targets. Briefly, an MSTree is a connected undirected graph selected to minimize the sum of marker differences over all links between genotypes (Rasigade et al., 2017). The MSTree was computed using R software version 3.2.1 (the R Foundation for Statistical Computing, Vienna, Austria) and igraph package version 1.1.2 (Csardi and Nepusz, 2006), and visualized using Gephi software version 0.9.2 (Bastian et al., 2009).
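As an illustration of the MSTree definition above, the following minimal sketch builds a minimum spanning tree from binary hybridization profiles. It is not the igraph-based procedure used in the study: it assumes that the number of marker differences between two isolates is the Hamming distance between their profiles, it uses Prim's algorithm in plain Python, and the isolate names and profiles are hypothetical.

```python
def hamming(a, b):
    """Number of array targets at which two binary profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Prim's algorithm on the complete graph of isolates.

    profiles: dict mapping isolate name -> tuple of 0/1 hybridization calls.
    Returns a list of (isolate_u, isolate_v, n_differences) links.
    """
    names = sorted(profiles)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the cheapest link from the current tree to an outside isolate;
        # the link itself is used as a tie-breaker to keep the result stable.
        u, v = min(
            ((u, v) for u in in_tree for v in names if v not in in_tree),
            key=lambda e: (hamming(profiles[e[0]], profiles[e[1]]), e),
        )
        edges.append((u, v, hamming(profiles[u], profiles[v])))
        in_tree.add(v)
    return edges


# Hypothetical profiles over six targets (not data from the study).
profiles = {
    "iso_A": (1, 0, 1, 1, 0, 0),
    "iso_B": (1, 0, 1, 0, 0, 0),
    "iso_C": (0, 1, 0, 0, 1, 1),
    "iso_D": (0, 1, 0, 0, 1, 0),
}
tree = minimum_spanning_tree(profiles)
total_differences = sum(w for _, _, w in tree)
```

With these toy profiles the tree has three links and a total of six marker differences: two pairs of near-identical isolates joined by a single longer link, which is the kind of clustering by genetic background that such a tree is meant to reveal.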

Statistical Analysis
Our aim was to detect associations of S. aureus genetic background and virulence-related markers with embolism, taking into account clinical risk factors and preserving statistical power. Models were based on logistic regression taking embolism as the outcome. Univariate analysis was used first to examine each microbiological and clinical potential predictor individually. Predictors with a Wald P-value < 0.1 were considered candidates for inclusion in multivariate logistic regression analysis. To account for the substantial number of variables relative to sample size and the high degree of collinearity between microbiological variables, the logistic model was regularized using a ridge regression procedure with automatic regularization parameter optimization as described elsewhere (Cule et al., 2011;Cule and De Iorio, 2013), using R software and ridge package version 2.2. Predictors with logistic coefficient t-test P-value < 0.1 were considered for inclusion in the final model. The final set of predictors was determined using analysis of deviance where predictors were sequentially tested for inclusion by increasing order of their P-value in the ridge regression model.
Predictors were discarded if their inclusion did not reach significance at the 0.05 level in a likelihood ratio test; a final, non-regularized logistic regression model was then constructed. The effect of each predictor was reported as an adjusted odds ratio with 95% confidence interval. The classification accuracy of the final model was assessed using the C-statistic (area under the receiver operating characteristic curve) with a 95% confidence interval based on DeLong's method (Sun and Xu, 2014), computed using the pROC R package.
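To make the C-statistic concrete, the sketch below computes it as the proportion of (embolism, no embolism) patient pairs in which the patient with embolism receives the higher predicted probability, with ties counted as one half. This is a plain-Python illustration and not the pROC implementation; it does not compute the DeLong confidence interval, and the outcomes and scores are hypothetical.

```python
def c_statistic(outcomes, scores):
    """Fraction of (event, non-event) pairs ranked correctly; ties count 0.5."""
    events = [s for y, s in zip(outcomes, scores) if y == 1]
    non_events = [s for y, s in zip(outcomes, scores) if y == 0]
    if not events or not non_events:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical outcomes (1 = embolism) and model-predicted probabilities.
outcomes = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.4]
auc = c_statistic(outcomes, scores)
```

In this toy example 7.5 of the 9 pairs are concordant, giving a C-statistic of about 0.83. A value of 0.5 would indicate no discrimination and 1.0 perfect discrimination.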
Additionally, we verified that the exclusion of non-virulence-related alleles from regression analyses did not lead to discarding important predictors of embolism. To this aim, we conducted random forest analyses on the complete microarray data combined with relevant clinical predictors identified in multivariate logistic regression. We used the Boruta feature selection algorithm (Kursa and Rudnicki, 2010), which was repeatedly demonstrated to exhibit maximal performance for selecting predictors in random forest models (Kursa, 2014; Kumar and Shaikh, 2017). In the Boruta algorithm, permuted copies of each predictor (the so-called shadow variables) are added to the dataset before estimating the random forest model. After several repetitions of this step, the distribution of the Z-score (importance measure) of each predictor is compared to the distribution of the maximal Z-scores among the shadow variables. Predictors whose Z-scores are significantly higher or lower than the maximal shadow variable Z-score are classified as important or unimportant, respectively, while the importance of other predictors is left undetermined. Of 332 alleles, 118 were non-constant and included in the analysis. The Boruta technique was performed with the R package Boruta using 500 repetitions of random forests of 1,000 trees.
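The shadow-variable idea can be sketched in a few lines. The example below is a deliberately simplified illustration and not the Boruta package: as an assumption made for brevity, the random forest Z-score is replaced by a stand-in importance measure (the absolute difference in outcome rate between carriers and non-carriers of a binary marker), and the final statistical test on hit counts is omitted. The marker names and data are hypothetical.

```python
import random


def importance(feature, outcome):
    """Stand-in importance (NOT a random forest Z-score): absolute difference
    in outcome rate between carriers (1) and non-carriers (0) of a marker."""
    pos = [y for x, y in zip(feature, outcome) if x == 1]
    neg = [y for x, y in zip(feature, outcome) if x == 0]
    if not pos or not neg:
        return 0.0
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def shadow_hits(features, outcome, rounds=100, seed=0):
    """Count, per predictor, the rounds in which it beats the best shadow."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(rounds):
        # Shadow variables: permuted copies that keep each marker's frequency
        # but break any link with the outcome.
        shadows = []
        for values in features.values():
            shuffled = list(values)
            rng.shuffle(shuffled)
            shadows.append(shuffled)
        best_shadow = max(importance(s, outcome) for s in shadows)
        for name, values in features.items():
            if importance(values, outcome) > best_shadow:
                hits[name] += 1
    return hits


# Hypothetical toy data: 'marker_x' tracks the outcome exactly, while
# 'marker_y' is balanced across outcome groups and carries no information.
outcome = [1] * 20 + [0] * 20
features = {
    "marker_x": list(outcome),
    "marker_y": [1, 0] * 20,
}
hits = shadow_hits(features, outcome)
```

In the actual algorithm, the hit counts accumulated over repetitions are tested against what would be expected by chance, which yields the three-way classification into important, unimportant and undetermined predictors described above.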

Patient Characteristics
The clinical characteristics of the 34 patients with S. aureus IE whose bacterial strain was not available did not differ from those of the 98 patients included in the final analysis (Supplementary Table 1). Patients had a median age of 62.5 y; 75 of them (76.5%) were male and 42 (42.9%) had at least one comorbidity (Table 1). Embolism was diagnosed in 54 patients (55.1%), of whom 31 (57.4%) had symptomatic embolism while the remaining diagnoses were based on imaging tests. Embolism was mostly cerebral (n = 25, 46.3%) and pulmonary (n = 20, 37.0%). Other locations including spleen, kidney and peripheral arteries were found in 29 patients (53.7%). Eighteen patients (33.3%) had multiple embolic events in several locations. In-hospital death occurred in 21 patients with embolism (38.9%). This mortality rate was comparable to that of patients without embolism (47.7%, P = 0.38).

Clinical and Bacterial Predictors of Embolism
In univariate analysis, the clinical factors possibly associated (P < 0.1) with embolism were, by increasing P-value, younger age (less than the 62.5 y cohort median), mode of IE acquisition including IDU, tricuspid IE location, CRP level at admission, history of a procedure at risk of IE in the previous 3 months, diabetes, Charlson comorbidity index, septic shock and vegetation size (Table 1). S. aureus methicillin resistance based on mecA gene presence was negatively associated with embolism (Table 2), consistent with previous findings (Thuny et al., 2005; Hsu and Lin, 2007; Hill et al., 2008). Associations between genetic background and embolism were moderately significant, with an enrichment of CC30 and agr III (to which CC30 belongs) groups in embolism-associated isolates. Among the five virulence-related predictors with Wald P-value < 0.1, four were superantigens while one predictor, namely lukE, was a component of the LukED cytotoxic exotoxin known to provoke host cell lysis by targeting chemokine receptors (Tam et al., 2016). One of the superantigen-related genetic markers was the tst1 gene encoding toxic shock syndrome toxin 1 (TSST-1), previously suspected to aggravate vegetation growth in a rabbit IE model (Stach et al., 2016). The three other superantigen determinants were sed, sej, and ser, which encode enterotoxins involved in food poisoning and are harbored by the same pIB485-like penicillinase plasmid (Bayles and Iandolo, 1989; Shearer et al., 2011). Because sed, sej, and ser were always harbored together in the same isolates, they were hereafter referred to as a single predictor sedjr. The sedjr markers were negatively associated with embolism, with an odds ratio of 0.05 (95% CI, 0.01-0.41; Table 2). Interestingly, sedjr and mecA were strongly associated (P = 1.16 × 10⁻⁶, Fisher's exact test) and both were only found in CC8 and CC5 isolates (Figure 1).
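For readers unfamiliar with the test used for the sedjr-mecA association, the sketch below shows how a two-sided Fisher's exact P-value and a sample odds ratio are obtained from a 2 × 2 table. The counts are hypothetical and are not taken from the study; note also that the odds ratios reported in the tables come from logistic regression, whereas the function here returns the simpler cross-product ratio.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def odds_ratio(a, b, c, d):
    """Sample (cross-product) odds ratio; undefined when b or c is zero."""
    return (a * d) / (b * c)


# Hypothetical counts, NOT from the study:
# rows = first marker present / absent, columns = second marker present / absent.
table = (8, 2, 1, 5)
p_value = fisher_exact_two_sided(*table)
sample_or = odds_ratio(*table)
```

For this toy table the sample odds ratio is 20 and the two-sided P-value is 400/11440, or about 0.035.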
To better delineate the clinical and microbiological predictors of embolism, variables with Wald P < 0.1 (n = 15) in univariate analysis were jointly analyzed using regularized regression (Table 3). In this multivariate model, the previously suspected predictor mecA (Hill et al., 2008) was not independently associated with embolism (P = 0.19). Only four factor coefficients had P < 0.1, namely (by increasing P-value order): sedjr, age < 62.5 y, IDU and CC30 genetic background. Upon applying analysis of deviance to these four predictors, only IDU failed to reach significance (P > 0.05) and was discarded (Table 4). The final, non-regularized model included sedjr, age < 62.5 y and CC30 as predictors, with a C-statistic of 0.81 (95% CI, 0.73-0.89). Notably, the inclusion of mecA in this model did not bring additional information, indicating that mecA significance in univariate analysis was due to its association with sedjr. In our cohort, thus, the major independent predictors of embolism in IE patients were a younger age and a causative isolate belonging to CC30 and/or not harboring sedjr, while mecA had no measurable independent influence. The accumulation of predictive factors steadily increased the risk of embolism, from 0% in older patients with sedjr-positive, non-CC30 S. aureus, to 100% in younger patients with sedjr-negative, CC30 S. aureus (Figure 2).
Random forest analyses with Boruta feature selection confirmed the results of regression analyses. In a random forest model using 118 microbiological predictors along with the 2 major clinical predictors identified by regularized logistic regression (Table 3), the first four predictors by decreasing importance were age, IDU, sedjr and CC30 (see Supplementary Table 3 for a ranking of all predictors). The Boruta feature selection algorithm classified the same 4 predictors as significantly important (Figure 3), while all other predictors including mecA were classified as either unimportant or of undetermined importance.
Frontiers in Cellular and Infection Microbiology | www.frontiersin.org
FIGURE 1 | Genotypic relationships and characteristics of 98 S. aureus isolates from endocarditis patients with and without embolism. Shown is a minimum spanning tree where connections between isolates are selected so as to minimize the total number of genotypic differences in the tree, based on DNA arrays targeting 332 genes and alleles. Colored marks are used to indicate embolism-associated isolates and those harboring sedjr, a set of plasmid-borne enterotoxin-coding genes negatively associated with embolism in the cohort. Gray marks denote isolates belonging to rare clonal complexes (CCs). MRSA, methicillin-resistant S. aureus.

DISCUSSION
This combined analysis of clinical and microbiological characteristics in 98 patients with S. aureus IE identified two bacterial genetic factors, namely, the absence of plasmid-borne genes encoding enterotoxins D, J, and R and the CC30 genetic background, as major predictors of the risk of embolism independent of host-related factors. The clinical predictors of embolism included in multivariate analysis, such as a younger age, IDU, Charlson comorbidity index or vegetation size, were comparable to those repeatedly identified in cohorts from several countries from 2003 to 2014 (Durante Mangoni et al., 2003; Fowler et al., 2005; Thuny et al., 2005, 2007; Rizzi et al., 2014). This suggests that our cohort, although investigated in 2008-2009, remained representative in terms of embolism risk factors. It is intriguing that vegetation size, which was previously reported as an independent risk factor of embolism (Thuny et al., 2005), was discarded from the multivariate model. Similarly, the superantigen-encoding tst1 gene was not significantly associated with embolism in spite of an OR estimate of 6.4 in univariate analysis (Table 2). We suspect that the limited sample size (n = 98) did not afford sufficient statistical power, resulting in the detection of only the strongest effects. Thus, we do not rule out a potential role of tst1, vegetation size or other clinical predictors such as IDU in our setting.
Regularization parameter Lambda = 0.026, selected automatically based on 8 principal components (Cule and De Iorio, 2013).
FIGURE 2 | Stratification of the risk of embolism in S. aureus endocarditis patients based on independent clinical and microbiological predictors. Three predictors of embolism were identified using regularized logistic regression followed by analysis of deviance. Shown are the proportions of patients with embolism in each risk group with 95% binomial confidence interval. Mark size is proportional to group sample size.
Our observation that a CC30 genetic background was an independent risk factor of embolism (adjusted OR, 9.7) brings further clinical support to the hypothesis of a peculiar pathogenic potential of CC30 in hematogenous infections (Messina et al., 2016). CC30 was overrepresented in IE compared to skin infections in the US, Europe and the Middle East (Fowler et al., 2007; Nienaber et al., 2011) but not in Australia (Nethercott et al., 2013). CC30 prevalence was moderate (∼10%) in French cohorts of IE patients including the present cohort (Tristan et al., 2012; Bouchiat et al., 2015); however, it is unknown whether this prevalence differs from that of CC30 in other infections. In a recent study of IE patients from Spain, CC30 was associated with persistent bacteremia (≥5 days) and negatively associated with death (Fernández-Hidalgo et al., 2017). Although the mechanism underlying the association of CC30 with IE is unclear, it has been suggested that several characteristics of CC30 contribute to a lower immune response, allowing in turn CC30 isolates to reach the bloodstream and survive within it (Spaulding et al., 2012). These immune escape mechanisms include an overall lower toxic and pro-inflammatory potential compared to other CCs, due to hampered alpha-toxin production (McGavin et al., 2012) and the expression of a phenol-soluble modulin variant with attenuated toxic and chemotactic activities (Cheung et al., 2014). Interestingly, a lower toxic potential was shown to strongly enhance S. aureus fitness in the bloodstream (Laabei et al., 2015). Whether and how the lower toxic potential of CC30 influences embolus release remains an open question.
Patients infected with strains harboring the plasmid-borne sedjr genes had a reduced risk of developing embolism, independent of clinical factors and other S. aureus characteristics. The protective effect size was large (adjusted OR, 0.07); however, the biological interpretation of sedjr involvement is not straightforward. Enterotoxins D, J, and R have been involved in food poisoning (Zhang et al., 1998). Interestingly, sed was previously found to belong to a group of enterotoxin genes (with tst1, sea, see and sei) overrepresented in IE isolates compared to soft tissue infection isolates in an international cohort (Nienaber et al., 2011), suggesting a possible involvement in IE development; however, embolism was not considered in the study. The superantigenic activity of sedjr-encoded toxins is unlikely to protect from embolism because other superantigens, such as TSST-1 or egc-harbored enterotoxins, have been shown to promote vegetation growth in animal models (Stach et al., 2016), which in turn favors embolism (Thuny et al., 2005). Moreover, other superantigen-encoding markers were widely distributed in our collection (Supplementary Table 2) but not associated with embolism. Collectively, these observations indicate that the negative association of sedjr with embolism is either due to a specific and yet-unknown function of enterotoxins D, J, or R, or to a spurious association with a causative gene or allele co-localized with sedjr on the same pIB485-like plasmid (Shearer et al., 2011). We favor the second interpretation for two reasons. First, sedjr-positive isolates belonged to two different CCs (Figure 1), suggesting that genetic features common to these isolates were plasmid-borne rather than chromosomal. Second, the sedjr-harboring pIB485-like plasmids contain numerous additional genes of unknown function (Shearer et al., 2011), including a marR family transcriptional regulator protein that could play a role in virulence by regulating chromosome-encoded genes (Alekshun and Levy, 1999; McCallum et al., 2004). Future research, possibly using isogenic variants harboring the pIB485-like plasmid in an experimental endocarditis model, should seek to determine whether plasmid-borne determinants influence the risk of IE-related embolism.
To conclude, genomic features of S. aureus, namely a CC30 genetic background and the absence of the sedjr-harboring plasmid present in some CC5 and CC8 strains, predicted embolism in S. aureus endocarditis. We also hypothesize that the mecA determinant of methicillin resistance, previously reported to be negatively associated with embolism (Hill et al., 2008), could have been so due to its association with sedjr markers. Collectively, our results suggest that the influence of S. aureus genotypic features on the risk of embolism may be stronger than previously suspected and independent of clinical risk factors. Determining the biological underpinnings of how S. aureus lineage- and plasmid-specific genes influence embolism would help to develop prognostic tools and preventive strategies.
FIGURE 3 | Identification of predictors of embolism using random forests. Shown are the distributions of random forest importance measures for 118 microbiological predictors and 2 clinical covariates. Importance measures were obtained from 1,000-tree random forests and Boruta feature selection algorithm with 500 repetitions. Predictors whose importance was significantly higher than the maximum importance of shadow variables were classified as significantly important. For readability, labels are shown only for important predictors and shadow variables.