Phase Variation in HMW1A Controls a Phenotypic Switch in Haemophilus influenzae Associated with Pathoadaptation during Persistent Infection

ABSTRACT Genetic variants arising from within-patient evolution shed light on bacterial adaptation during chronic infection. Contingency loci generate high levels of genetic variation in bacterial genomes, enabling adaptation to the stringent selective pressures exerted by the host. A significant gap in our understanding of phase-variable contingency loci is the extent of their contribution to natural infections. The human-adapted pathogen nontypeable Haemophilus influenzae (NTHi) causes persistent infections, which contribute to underlying disease progression. The phase-variable high-molecular-weight (HMW) adhesins located on the NTHi surface mediate adherence to respiratory epithelial cells and, depending on the allelic variant, can also confer high epithelial invasiveness or hyperinvasion. In this study, we characterize the dynamics of HMW-mediated hyperinvasion in living cells and identify a specific HMW binding domain shared by hyperinvasive NTHi isolates of distinct pathological origins. Moreover, we observed that HMW expression decreased over time by using a longitudinal set of persistent NTHi strains collected from chronic obstructive pulmonary disease (COPD) patients, resulting from increased numbers of simple-sequence repeats (SSRs) downstream of the functional P2hmw1A promoter, which is the one primarily driving HMW expression. Notably, the increased SSR numbers at the hmw1 promoter region also control a phenotypic switch toward lower bacterial intracellular invasion and higher biofilm formation, likely conferring adaptive advantages during chronic airway infection by NTHi. Overall, we reveal novel molecular mechanisms of NTHi pathoadaptation based on within-patient lifestyle switching controlled by phase variation.

Human-adapted pathogens take up long-term residence in or on the human body as part of the normal human microbiome but can also cause acute and chronic infections depending on conditions. This is the case in chronic obstructive pulmonary disease (COPD), a leading cause of death globally whose primary risk factor is smoking and which is characterized by irreversible airflow obstruction with emphysema, fibrosis, neutrophil airway infiltration, mucus hypersecretion, inflammation, and long-term lower airway colonization by pathogens (1-3). Observing the within-host evolution of pathogens over the course of long-term infection allows us to witness the emergence of adaptations that enable pathogens to persist, which could indicate the molecular mechanisms that drive chronic disease (4-7).
Like many other bacteria, NTHi strains show high genomic and phenotypic diversity, and individuals tend to be colonized by many strains over time (3, 8, 25). Whole-genome sequencing of longitudinally collected NTHi strains from COPD patients revealed rapid genetic changes in clonally related strains due to natural transformation events (bringing recombination tracts from relatives into the chromosome), point mutations and chromosomal rearrangements, and phase-variable changes in simple-sequence repeats (SSRs) at contingency loci, which produce reversible and high-frequency genetic changes in specific genomic loci (26). Recurrent genomic changes affecting specific NTHi genes indicate the underlying selection pressures that shape pathogen adaptation within the COPD lung environment (4, 27-29).
We recently found evidence that the high-molecular-weight (HMW) adhesin HMW1A is implicated in NTHi pathoadaptation, identifying recurrent genetic changes arising in COPD patients over time (4). In a previous study using natural transformation to map NTHi genes involved in intracellular invasion (30), we had found that specific alleles of the hmw1A gene conferred novel high airway epithelial cell invasiveness, i.e., a hyperinvasive phenotype: natural transformation of the whole hmw1 locus from the clinical isolate 86-028NP into low-invasion laboratory isolate RdKW20 (or of the hmw1A allele into a moderately invasive clinical isolate, Hi375, substituting for its native hmw2A allele) resulted in recombinants with ~500- to 1,000-fold increased invasion into airway epithelial cells.
In this study, we tackled open questions regarding HMW-mediated NTHi hyperinvasion of airway epithelial cells, including examining in vivo bacterial entry dynamics and intracellular fate after hyperinvasion. Moreover, we hypothesized that HMW variants conferring hyperinvasion might contribute to successful chronic infection or be modulated by phase variation over time. Therefore, we examined the association of HMW-mediated hyperinvasion with specific HMW allelic variants in the previously sequenced longitudinal set of COPD strains (4), including its regulation by phase variation in the hmw1 promoter. Together, the data from this work revealed a within-patient switch during persistent infection, from high cell invasiveness to biofilm growth, controlled by phase variation-reduced expression of the HMW1 adhesin.

RESULTS
Dynamics of NTHi epithelial hyperinvasion mediated by HMW1A 86-028NP and intracellular fate of bacterial aggregates. The HMW1A 86-028NP adhesin allele was previously isolated as a key factor for epithelial hyperinvasion by the H. influenzae 86-028NP strain (originally isolated from a child with severe otitis media [OM] [40]) using natural transformation. Fluorescence microscopy showed Lamp-1 vesicle reorganization around large groups of internalized hyperinvasive hmw1A (hmw1A hyper ) bacteria after cell invasion, contrasting with the vesicles containing singlet bacteria seen in lowly or moderately invasive strains (30). However, the in vivo dynamics of bacterial entry had not been observed. To monitor H. influenzae entry within airway epithelial cells, we performed time-lapse fluorescence microscopy during intracellular invasion into A549 cells after first loading acidic compartments with Lysotracker red DN99 (21). Next, cells were infected with a green fluorescent protein (GFP)-expressing 86-028NP strain, and imaging was initiated at 15 min postinfection (set at 0 min for simplicity).
Image analysis revealed that once bacterial cells or cell aggregates adhered to the epithelial cell surface, they entered rapidly (between 0 and 6 min) (for representative images, see Fig. 1; for a video, see Movie S1 in the supplemental material). Internalization of both single bacterial cells and aggregated groups was observed, with the latter event being frequent. Once inside, colocalization with Lysotracker-loaded acidic compartments was seen from 18 min onwards. When observing internalized groups of bacteria, colocalization with Lysotracker and reorganization toward a compact bacterial aggregate increased over time. Intracellular bacterial CFU titers remained stable for more than 4 h postinfection (hpi) and then decreased over time, but viable intracellular bacteria remained present at 24 hpi (Fig. 2B, white bars, and Fig. S1). Consistent with previous observations (21), intracellular bacteria appeared to remain metabolically active inside acidic endosome-like compartments throughout the course of infection (Fig. S1).
Results from live imaging of the dynamics of NTHi invasion into airway epithelial cells support our previous speculation that HMW1 86-028NP -mediated aggregates of bacterial cells invade as groups, different from invasion as single cells seen in strains with other alleles or no hmw loci (21,30). Furthermore, a significant load of intracellular metabolically active bacterial cells continues to reside in acidified compartments with late endosome features after 24 hpi. This differs from previous results with moderately invasive otitis media isolate Hi375, in which intracellular bacteria after invasion typically occupy vesicles as single cells (21). This might indicate that intracellular bacterial cell aggregates survive longer than single-cell invaders, but direct comparison is complicated by the substantially fewer bacteria of Hi375 that are initially taken up by host cells.
HMW1C and phase-variable expression of the hmw1A gene are required for epithelial hyperinvasion mediated by HMW1 86-028NP . The HMW1 adhesin is encoded by hmw1A, as part of the hmw1ABC locus (34). A prerequisite for HMW1 adhesin activity is its glycosylation at asparagine (Asn) residues by the glycosyltransferase HMW1C (41), and Asn glycosylation levels vary with changes in the ratio of HMW1A to HMW1C (42). However, genetic analysis is complicated because proteins encoded by the paralogous hmw2BC normally found in HMW-positive strains can substitute for those of hmw1BC (43). As a surrogate strain to dissect the hyperinvasive hmw1A 86-028NP allele, we used a previously generated recombinant strain, rRdS, in which hmw1ABC 86-028NP was inserted into the poorly invasive laboratory strain RdKW20 by natural transformation, thereby conferring an invasion and intracellular life phenotype comparable to that of the hyperinvasive 86-028NP strain ( Fig. 2A and B) (30). Since RdKW20 naturally lacks both the hmw1 and hmw2 loci and is also highly naturally transformable (in contrast to 86-028NP [44]), rRdS allowed us to uncouple hmw1 from hmw2 and provide for easier genetic manipulations and simpler construct design (engineered strains shown in Table 1).
Confirming the role of HMW1 and the requirement for glycosylation by HMW1C for the expression of mature protein, we generated and tested rRdS Δhmw1A and rRdS Δhmw1C mutants. Both mutants lost the hyperinvasion phenotype in two cell types, behaving similarly to the negative control lacking hmw loci, RdS (Fig. 2A). Detection of HMW by immunoblotting showed that deletion of hmw1C led to undetectable HMW1A protein, although hmw1A transcript levels were only partially diminished (comparable to the transcript levels in 86-028NP, which expressed HMW1A protein and had invasion rates similar to those of rRdS) (Fig. 2C). These results confirm that HMW1A 86-028NP mediates hyperinvasion and that HMW1C-mediated glycosylation protects HMW1A against premature degradation (41).
FIG 2 (legend, partial) In all cases, a reduction of intracellular bacterial counts was detected at the assay endpoint compared to their respective initial titers. Results of three independent experiments (n = 3) in triplicate are shown as the mean log10 CFU per well ± SD. Statistical comparisons of means were performed by two-way ANOVA and Sidak's multiple-comparison test. (C) Effect of hmw1A and hmw1C gene inactivation on the expression of the hmw1A gene. The expression of the hmw1A gene was undetectable in the rRdS Δhmw1A and RdS strains (**, P < 0.0001) and lower in rRdS Δhmw1C (*, P < 0.005) than in the rRdS WT strain. The expression level of the hmw1A gene was lower in 86-028NP than in the rRdS strain (*, P < 0.005). Results of at least two independent experiments (n ≥ 2) in triplicate are shown as the mean RQ (relative quantification, 2^(−ΔCT) × 10) values ± standard errors of the means (SEM). Statistical comparisons of means were performed by one-way ANOVA and Dunnett's multiple-comparison test. The bottom panel shows detection of HMW1 86-028NP by Western blotting; the SSR number at the hmw1A promoter regions is indicated (green indicates that HMW1A is immunodetected, and red indicates that HMW1A is not immunodetected). (D) Diagram illustrating the generated rRdS derivative strains, where the hmw1A promoter region presents a range of SSRs from 20 to 24. (E and F) Assays performed with rRdS derivative strains containing a variable number of SSRs in the hmw1A promoter region. RdS was used as a negative control. (E) Increased number of 7-bp tandem repeats reduced hmw1A gene expression (*, P < 0.005; **, P < 0.001). Results of at least two independent experiments (n ≥ 2) in triplicate are shown as the mean RQ (2^(−ΔCT) × 10) values ± SEM. The bottom panel shows that an increased number of SSRs reduced HMW1A protein to a nondetectable level. The number of SSRs in the hmw1 promoter is indicated in green (positive protein detection) or red (negative protein detection). (F) An increased number of SSRs eliminates the rRdS epithelial hyperinvasion phenotype (*, P < 0.0001). Results of at least three independent experiments (n ≥ 3) in triplicate are shown as the mean log10 CFU per well ± SD. In panels E and F, statistical comparisons of means were performed by one-way ANOVA and Dunnett's multiple-comparison test. At the bottom of panels C and E, immunoblots were performed by using primary guinea pig anti-HMW (gp85 antibody) and secondary goat anti-guinea pig-horseradish peroxidase (HRP) antibodies. Three independent experiments were performed (n = 3), and a representative image is shown. The corresponding Coomassie-stained gel portion is shown as a loading control (LC).
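The transcript levels compared here are reported in the Fig. 2 legend as relative quantification (RQ) values. A minimal sketch of that calculation follows, reading the legend's formula as 2^(−ΔCT) × 10 with ΔCT = CT(target) − CT(reference). The function name, the choice of reference gene, and the CT values are illustrative assumptions, not data from this study.

```python
def relative_quantity(ct_target: float, ct_reference: float, scale: float = 10.0) -> float:
    """Return RQ = 2^(-dCT) * scale, where dCT = CT(target) - CT(reference).

    `scale` defaults to 10 to match the "x 10" factor quoted in the legend.
    The reference (housekeeping) gene is not named in this section, so
    `ct_reference` is a placeholder for whatever normalizer was used.
    """
    delta_ct = ct_target - ct_reference
    return (2.0 ** -delta_ct) * scale


# Hypothetical CT values, chosen only to illustrate the arithmetic.
print(relative_quantity(ct_target=22.0, ct_reference=20.0))  # dCT = 2 -> 2.5
```

Because RQ falls twofold for every one-cycle increase in ΔCT, a partial reduction in transcript (as in rRdS Δhmw1C) and a complete loss (as in rRdS Δhmw1A) sit far apart on this scale.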
Notably, HMW1A transcript and protein expression levels were significantly higher in the rRdS recombinant than in the 86-028NP parent strain (Fig. 2C). Although this expression difference might be a consequence of other differences in the genetic backgrounds or the absence of the paralogous hmw2 loci, Sanger sequencing identified a difference in the counts of heptameric repeats in the SSRs upstream of hmw1 in the two strains (between the P2 and P1 promoters) (Fig. 3A), with rRdS carrying 14 heptamers, compared to the parent 86-028NP strain's 15 heptamers (and in contrast to the sequence reference for 86-028NP, which contains 17 heptamers [40]). This result is consistent with previous reports of phase-variable increases in copy numbers leading to decreased HMW1 adhesin expression (36,38,39).
To confirm that hmw1A expression, and also the hyperinvasion phenotype conferred by the HMW1A 86-028NP protein allele, decreases with increased upstream SSR lengths, we generated a panel of rRdS derivative clones with SSR counts from 20 to 24 (Fig. 2D). Clones with 20 to 24 repeats had no detectable HMW1A protein expression and had low invasion levels comparable to those of the RdS control lacking hmw1, in contrast to the high HMW1A expression levels seen in the hyperinvasive rRdS recombinant with 14 repeats in the heptameric repeat upstream of its hmw1 locus ( Fig. 2E and F).
HMW1 expression is driven by the P2 promoter and sensitive to increases in its heptameric SSR. Two distinct transcriptional start sites have previously been identified at hmw1 by primer extension (promoters P2 and P1, separated by a heptameric SSR, 5′-ATCTTTC-3′, that overlaps the −10 box of P2 and the −35 box of P1 on each end) (Fig. 3A) (36). Transcripts from P2 contain the heptameric SSR in the 5′ untranslated region (UTR), while those from P1 do not; also, the SSR count could affect transcriptional activity from either promoter. To clarify the role of each promoter and the effect of changes in the SSR length, we generated a set of transcriptional fusion reporter plasmids coupling variants of the hmw1 promoter region to GFP, which varied the heptameric SSR count and configuration of P2 and P1 (Fig. 3B). As controls, we used the full hmw1A promoter region including both P2 and P1, separated by either 13 or 23 repeats.
Reporter plasmids were introduced into the RdKW20 strain, and heterologous GFP protein levels were detected by immunoblotting (Fig. 3C). Expression levels from control plasmids were substantially higher with only 13 repeats than with 23 (plasmid 1 [pP2-SSR 13 -P1] and plasmid 2 [pP2-SSR 23 -P1]), consistent with hmw1A chromosomal expression in rRdS derivatives. Expression from mutant constructs with only the P1 or P2 elements indicates that most or all HMW1A protein expression is driven by transcripts initiated at P2 (plasmid 3 [pP2] and plasmid 4 [pP1]). GFP protein levels were strongly decreased from P2-only fusion plasmids, with 24 versus 14 repeats (plasmid 6 [pP2-SSR 24 ] and plasmid 5 [pP2-SSR 14 ]), underlining the effect of increased SSR copies on the repression of HMW1A expression. No GFP was detected in P1-only reporter plasmids with either 13 or 24 repeats (plasmid 7 [pSSR 13 -P1] and plasmid 8 [pSSR 24 -P1]). Further supporting P2 as the transcriptional start for expressed HMW1A, alignment of the hmw1 upstream promoter region showed more variation in P1, mainly due to variation in AGGG repeats in the putative P1 −10 box (Fig. S2). Nevertheless, when analyzing the repeat number effect on GFP expression from P2 by including 14 or 24 repeats, we observed lower GFP levels than those in the control plasmids including the entire region (plasmids 5 and 6 versus plasmids 1 and 2, respectively), suggesting the potential presence of posttranscriptional regulatory elements downstream of the SSR in the long 5′ UTR generated from P2.
In summary, the expression of HMW1 is primarily driven by the P2 promoter and strongly affected by SSR length, although whether this effect is transcriptional or posttranscriptional remains unknown.
Natural variation in HMW adhesins among longitudinally collected persistent NTHi isolates. To test for genetic changes at hmw1 during long-term infection and whether the hyperinvasive HMW1A allele can be found in NTHi persistent isolates, we examined the distribution and sequence divergence of the hmw1 and hmw2 loci across a well-characterized genome-sequenced set of 92 strains collected from COPD sputum samples over time, 72 of which were grouped into 20 "persistent" clonal types (CTs) that consisted of at least 2 nearly identical strains isolated at different times from the same subject. We previously used this collection to identify NTHi within-lung pathoadaptation traits, searching for genes recurrently affected by mutations in distinct CTs and in different patients (4). As references for the HMW adhesin loci, we used the prototype hmw1A and hmw2A gene sequences from NTHi strain R2846, where the HMW adhesins were originally identified and have been most extensively characterized (33). BLASTN analysis against the 92 genome assemblies (4) identified both hmw loci in 40 isolates (43.5%), comparable to previous clinical isolate surveys that found between 40 and 75% of strains with HMW adhesins (32, 45). To focus on HMW-positive strains from persistent CTs with multiple isolates, we restricted further analysis to a set of 14 isolates of 4 CTs for which there was a finished genome assembly available (see Table 2) (numbers of single nucleotide polymorphisms [SNPs] in noncoding regions were 0 for CT 3, 10 for CT 18, 298 for CT 44, and 1 for CT 73; numbers of SNPs in coding regions, including those of high, moderate, and low impacts, were 11 for CT 3, 103 for CT 18, 3,120 for CT 44, and 6 for CT 73).
HMW1/2A consists of a signal peptide (SP), the propiece (PP) (containing the secretion domain that mediates interaction with the HMW1/2B outer membrane translocator, before cleavage and release at secretion), the mature HMW1/2A adhesin that contains the binding domain, and a small C-terminal anchor (46) (Fig. 4A). To distinguish between paralogous hmw1 and hmw2 loci in each of the 4 CTs, we compared their binding domains to those of R2846 (amino acids 555 to 914 in HMW1A and amino acids 553 to 916 in HMW2A [47]). To assign HMW adhesins as either HMW1 or HMW2, the percent identity was calculated between the 12 HMW binding domains of the six isolates with fully assembled genomes, and each adhesin was assigned to the R2846 prototype (HMW1 or HMW2) with which it shared the higher identity. Assignment based on synteny is expected to be unreliable: although all hmw loci observed are adjacent to either the yrbI (NTHI1982) or radA (NTHI1453) gene, previous observations suggest that gene conversion can swap binding domains between loci (30). HMW binding domain sequences from hyperinvasive strain 86-028NP were included to test for similarity to protein alleles naturally occurring in COPD. Although HMW binding domains were highly diverse, as expected, most comparisons yielded a clear distinction between paralogs, with putative orthologs having >40 to 50% pairwise amino acid identity and putative paralogs having <40% pairwise identity, and their chromosomal locations were frequently shuffled (Fig. 4B). Clear exceptions are the putative HMW1 sequences from 86-028NP and four isolates of CT 73 with identical alleles (strains P651, P652, P653, and P654), which are 99.5% identical to each other but only slightly more similar to the prototype protein HMW1 R2846 than to HMW2 R2846 (41.5% versus 37.9%). These results suggest that the hyperinvasive HMW1 allele in strain 86-028NP (an otitis media isolate) can also be found in NTHi strains that have persisted in COPD infections.
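The assignment rule described above (label each binding domain by whichever R2846 prototype it resembles more) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the sequences are assumed to be pre-aligned to equal length, gapped columns are skipped when computing identity (the text does not say how gaps were treated), and both function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def assign_paralog(identity_to_hmw1: float, identity_to_hmw2: float) -> str:
    """Label a binding domain by the prototype with the higher percent identity."""
    if identity_to_hmw1 > identity_to_hmw2:
        return "HMW1"
    if identity_to_hmw2 > identity_to_hmw1:
        return "HMW2"
    return "ambiguous"


# Identities reported in the text for the 86-028NP / CT 73 allele versus the R2846 prototypes.
print(assign_paralog(41.5, 37.9))  # HMW1
```

The reported 41.5% versus 37.9% case shows why the rule is fragile for highly diverged alleles: the label is decided by a margin of only a few percentage points.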
When comparing the HMW1 variants present in 86-028NP and P651 to P654 (CT 73) to the rest of the tested strains, conservation was found across the signal peptide and propiece (Fig. S3, pink and green, respectively), with the adhesins themselves showing high divergence, particularly in the binding domain, as previously observed (47). Although the PP is well conserved, some differences were specific to HMW1A 86-028NP and HMW1A CT73 . Since the crystal structure of the HMW1-PP from R2846 has been solved (46) (unlike the rest of the protein), we used structural homology modeling to visualize HMW1-PP differences. Figure S4 shows changes common and specific to the HMW1-PP from 86-028NP and CT 73: in the last turn of the β-helix (D424Y, F426S, K428G, D429N, N430D, I432A, and D434E) and in the length of the loop connecting the last two turns of the superhelix (G409del). When considering mature protein variants of HMW1 only, substantial divergence was observed due to SNPs and indels, rendering size variants ranging from 1,434 to 1,661 amino acids (Fig. S3). Furthermore, HMW1C Asn glycosylation across HMW1A R2846 adhesins has been experimentally demonstrated at NXS/T motifs (41, 42, 48), but the distinct HMW1A alleles show relatively poor conservation of these sites (70 to 91% across all variants [CT 3, 90.9%; CT 18, 75.8%; CT 44, 81.8%; CT 73 and 86-028NP, 69.7%]) (Fig. S3 [Asn positions that could be glycosylated are shown in red boldface type]).
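To make the notion of glycosylation site conservation concrete, the sketch below scans a reference protein for NXS/T motifs and reports the percentage retained at the same positions in a variant. It is a simplified illustration: the toy sequences are invented, the two sequences are assumed to be aligned without gaps, X is taken as any residue (excluding proline is a common refinement that the text does not specify), and the study's own scoring of experimentally demonstrated sites may differ.

```python
import re
from typing import List


def sequon_positions(protein: str) -> List[int]:
    """0-based positions of Asn residues that start an N-X-S/T motif.

    A lookahead is used so that overlapping motifs (e.g. "NNSS") are all reported.
    """
    return [m.start() for m in re.finditer(r"(?=N[A-Z][ST])", protein.upper())]


def percent_conserved(reference: str, variant: str) -> float:
    """Percentage of reference N-X-S/T sites still present at the same index in `variant`.

    Assumes the two sequences are the same length and aligned without gaps.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length (gap-free alignment assumed)")
    ref_sites = sequon_positions(reference)
    if not ref_sites:
        return 0.0
    var_sites = set(sequon_positions(variant))
    kept = sum(1 for pos in ref_sites if pos in var_sites)
    return 100.0 * kept / len(ref_sites)


# Invented toy peptides: the variant loses the middle site (N -> D) and keeps the other two.
reference = "MNGSAANVTKKNASQ"
variant = "MNGSAADVTKKNATQ"
print(sequon_positions(reference))                      # [1, 6, 11]
print(round(percent_conserved(reference, variant), 1))  # 66.7
```

Note that an S-to-T change within a motif (as at the last site of the toy variant) still counts as conserved, because either residue satisfies the NXS/T pattern.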
In sum, HMW adhesins in longitudinally collected persistent NTHi isolates had high variability in chromosomal location and amino acid sequence, with evidence that strains of CT 73 could be hyperinvasive, like 86-028NP, due to their distinct hmw1 allele.
Serial CT 73 isolates from the same COPD patient lost expression of an epithelial hyperinvasion phenotype after SSR expansion downstream of the P2 promoter. Since the four serial isolates of CT 73 (collected over ~3 months from the same subject) had identical HMW1A protein sequences that were highly similar to that of hyperinvasion-inducing HMW1A 86-028NP , we tested their epithelial invasion phenotypes in A549 and NCI-H292 cells by gentamicin protection assays, comparing them to 86-028NP and the 10 additional strains from HMW-positive multi-isolate CTs (Fig. 4C). Assays were performed using bacterial cultures normalized to the same CFU per milliliter, based on correlations with the optical density at 600 nm (OD 600 ), thereby ensuring comparable multiplicities of infection (MOIs) among strains. Notably, CT 73 isolates from May 2013 (P651 and P652) had invasion rates comparable to those of hyperinvasive 86-028NP, whereas those collected later in June and July 2013 (see Table 2) had significantly lower invasion rates, comparable to those of the other HMW-positive strains from multi-isolate CTs collected from COPD patients.
All four CT 73 isolates had 100% identical HMW1 protein sequences, so we tested for correlations with the number of phase-variable heptameric SSRs downstream of the P2 site at the hmw1 promoter, predicting increased copy numbers in the later-collected isolates (36,38,39). Sanger sequencing was used to genotype the SSRs at hmw1 from all 14 strains in the four CTs since short-read sequencing had failed to confidently resolve differences in the long heptameric SSRs and often failed to distinguish reads from paralogous hmw1 and hmw2 loci. Strikingly, this revealed an expansion of the SSR from 11 to 18 heptamers at hmw1 over time in the four CT 73 strains, which was associated with both a loss of hyperinvasion and decreased HMW1 mRNA and protein levels ( Fig. 4C and D). SSR counts also varied within the other three CTs at both hmw1 and hmw2, and increased SSR counts strongly correlated with decreased mRNA and protein levels at each adhesin paralog ( Fig. S5A and B).
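Genotyping the SSR amounts to counting consecutive copies of the heptamer in a sequenced promoter read. A minimal sketch follows, assuming the read is on the same strand as the 5′-ATCTTTC-3′ unit given earlier and contains no sequencing errors inside the repeat tract. The flanking bases and the function name are made up for illustration; only the repeat unit and the 11 and 18 copy counts come from the text.

```python
import re

HEPTAMER = "ATCTTTC"  # repeat unit reported for the hmw1 promoter SSR


def longest_tandem_run(seq: str, unit: str = HEPTAMER) -> int:
    """Return the largest number of back-to-back copies of `unit` found in `seq`."""
    # Each match is one uninterrupted tract of whole repeat units.
    tracts = re.findall(f"(?:{re.escape(unit)})+", seq.upper())
    return max((len(tract) // len(unit) for tract in tracts), default=0)


# Toy reads with invented flanks, mimicking the early and late CT 73 genotypes.
early_read = "GGTACA" + HEPTAMER * 11 + "TTGACC"
late_read = "GGTACA" + HEPTAMER * 18 + "TTGACC"
print(longest_tandem_run(early_read), longest_tandem_run(late_read))  # 11 18
```

A read from the opposite strand would need to be reverse complemented first, and a single miscalled base would split the tract into two shorter runs, which is one reason long SSRs are hard to resolve from short reads.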
We aimed to confirm that hmw1A was responsible for the high invasiveness observed in CT 73 isolates P651 and P652, but natural transformation was negligible in these strains, preventing us from making the knockout. Thus, we sought evidence of an allele like hmw1 86-028NP in another strain and tested for hyperinvasiveness. We performed tBLASTn analysis using the HMW1 86-028NP binding domain as a query against all H. influenzae genomes available in the NCBI database, identifying the HMW1 binding domain of strain PittEE (HMW1 PittEE ) (49) as being 99.5% identical (Fig. 5A and Table S3). Strain PittEE (obtained from a child suffering from chronic otitis media with effusion) and the COPD isolate P652 were closely related ( Fig. 5B and C). Further sequence analysis showed that HMW1 CT73 and HMW1 PittEE are nearly identical, differing by only two polymorphisms outside the binding domain (N540S and K1475R). HMW1 86-028NP , HMW1 CT73 , and HMW1 PittEE nicely clustered together, with up to 331 commonly shared and exclusive polymorphisms, 181 of them located in the binding domain compared to HMW1A R2846 (Fig. S3, shown in gray). Their predicted level of glycosylation conservation was only 70%, with eight Asn residues compatible with being glycosylated lost (Fig. S3). Although PittEE was also poorly transformable, epithelial invasion assays showed levels comparable to those of 86-028NP and P651/P652 ( Fig. 4C and Fig. 5D). The hmw1A PittEE and hmw1A P652 promoter regions contained 14 and 13 SSRs, respectively, and mRNA and protein expression levels showed lower hmw1A expression levels in PittEE (Fig. 5E).
Together, these data support the association not only between a particular HMW1 variant and the hyperinvasive phenotype but also between the increased number of 7-bp repeats in the hmw1A promoter region and the decreased gene transcript and HMW protein levels, as observed in both the laboratory rRdS strain (Fig. 2) and the clinical CT 73 strain series (Fig. 4D).
Phase-variable expression of HMW1 hyper mediates a switch between intracellular invasion and biofilm formation. Previous evidence pointed to serum antibodies as a selective pressure against NTHi strains with high HMW expression levels, leading to the downregulation of HMW1 by phase-variable changes in the promoter (39). Whether a high expression level of HMW1 serves an early role in infection is unknown, but here, we found that hmw1A phase variation regulates the hyperinvasive phenotype of strains with a particular HMW1 allele. We reasoned that the loss of HMW1 expression and hyperinvasion could lead to a lifestyle switch to favor extracellular survival. Given that biofilm formation relates to the extracellular persistence of NTHi (8), we measured biofilm formation on polystyrene microtiter plates using crystal violet staining of 24-h static cultures of the rRdS strain set with various SSRs at hmw1 and of the four serially collected CT 73 strains. In the rRdS series, with 14, 20, 21, 22, 23, or 24 SSRs in the hmw1A promoter region, increased repeat numbers enhanced biofilm formation to levels comparable to those of the RdS negative control that lacks the hmw1 locus; no significant differences were seen among rRdS clones with 20 to 24 repeats. Furthermore, inactivation of the hmw1A or hmw1C gene also increased biofilm formation compared to the rRdS wild-type (WT) strain (Fig. 6A). We also found a strong correlation between SSRs and biofilm formation within CT 73, where the later isolates, P653 and P654, produced higher biofilm biomass than the early isolates P651 and P652 (Fig. 6B). Biofilm measurements of isolates within the other CTs carrying HMW adhesins (but not the hyperinvasive allele [CTs 18, 3, and 44]) also showed a negative correlation between biofilm formation and phase-variable HMW expression (Fig. S5C), which may indicate a more general trade-off between HMW expression and biofilm formation that does not depend solely on having the hyperinvasive allele.
Two independent experiments were performed (n = 2); a representative image is shown. Immunoblots were performed by using primary guinea pig anti-HMW (gp85 antibody) and secondary goat anti-guinea pig-HRP antibodies. Three independent experiments were performed (n = 3); a representative image is shown. The corresponding Coomassie-stained gel portion is shown as a loading control (LC).

Finally, we compared the biofilm architectures of high-invasion P652 and low-invasion P653 by using scanning confocal and atomic force microscopy. Spatial distribution (two-dimensional [2D]) and three-dimensional (3D) biofilm images were obtained after superimposing each z-stack series. Low-biofilm strain P652 had a less compact biofilm with patchy surface coverage (average thickness of 33.58 ± 1.87 μm), whereas strain P653 produced a compact biofilm structure covering the entire area (average thickness of 46.85 ± 1.65 μm) (Fig. 6C). Topography analysis of the shape, structure, and surface roughness of bacteria found no significant differences between P652 and P653. However, gaps could be observed in most images of the P652 strain since the biofilm formed was not uniform and did not fully cover the mica surface, unlike images of the P653 strain (Fig. 6D). These results support the conclusion that SSR-mediated phase variation at the hmw1 promoter regulates a switch between epithelial adherence/invasion and extracellular biofilm lifestyles.

DISCUSSION
The well-characterized adhesive glycoproteins HMW1 and HMW2 are expressed by NTHi to mediate adherence to human respiratory epithelial cells, and they enhance the ability of NTHi to colonize the nasopharynx and oropharynx of rhesus macaques (33-35, 41, 50). We had previously found that a particular allele of HMW1A (here HMW1A hyper ) conferred NTHi with highly elevated rates of intracellular invasion, in which self-aggregated clumps of bacteria were seen invading host cell endosomes (30). Here, we extend our investigations into this hyperinvasive allele using experiments with clinical isolates of distinct origins as well as genetically engineered laboratory strains to reach our main conclusions: (i) we showed that NTHi hyperinvasion depends on HMW1A expression and glycosylation by HMW1C, extending our observations to two cultured epithelial cell types; (ii) using time-lapse microscopy, we confirmed that NTHi bacteria with HMW1A hyper invade airway epithelial cells as bacterial aggregates that can persist and survive at least 24 h; (iii) we showed that although the HMW1A hyper allele is relatively rare, its binding domain and predicted glycosylation pattern are highly diverged from those of both the canonical HMW1 and HMW2 proteins; (iv) we demonstrated that heptameric repeat expansion downstream of the P2 promoter decreases HMW1 hyper expression; and (v) we identified the presence of HMW1 hyper in a longitudinally collected series of persistent isolates that revealed a phase-variable switch from hyperinvasiveness to biofilm formation concomitant with hmw1 heptameric repeat expansion and decreased hmw1 expression. These findings suggest a potential lifestyle switch during NTHi pathoadaptation toward biofilm formation, which is mediated by phase-variable decreased expression of the HMW1 adhesin.
FIG 6 (legend, partial) Gaps where the mica surface was exposed are marked with asterisks.
The hmw loci were present in 43.5% of the longitudinal isolates in our collection, consistent with other independent strain sets (32). Clinical isolates lacking the two hmw loci instead encode the adhesin Hia at another locus (51). Notably, as expected, pairwise alignment of HMW binding domains found extremely high diversity within putative HMW1 and HMW2 groups (32), and the hyperinvasion-associated binding domains were the most diverged. We found only six sequenced genomes with highly similar HMW hyper alleles: the four COPD isolates of CT 73 from Spain (strains P651 to P654) and two pediatric OM isolates from the United States, the closely related strain PittEE and the distantly related strain 86-028NP (see Table S3 in the supplemental material). Two particular features distinguished the HMW1 hyper variants in these strains from other HMW1 proteins: an exclusive amino acid signature in their binding domains and the absence of eight residues known to be glycosylated in canonical HMW1 R2846 . We attempted to model the HMW1 protein structure using homology and fold recognition methods, but the absence of solved structures with high sequence identity and coverage yielded no reliable outcome. However, all models predicted that the adhesin domain, like the propiece, will adopt a β-solenoid superhelix, as does the hemopexin binding domain of the two-partner secretion system HxuB/HxuA of H. influenzae (data not shown) (52). The observed distribution of HMW protein sequences, including the "swapping" of binding domains between loci, suggests that deeper sequence analysis of HMW variants could be of use for predictive purposes, e.g., screening genomes for hyperinvasive allele types.
Phase variation of the heptameric repeat upstream of HMW-encoding genes is known to affect HMW expression levels, with increasing copy numbers leading to decreased RNA and protein expression levels. Hyperinvasion conferred by HMW1 hyper is likewise controlled by phase variation in the upstream promoter region, and moreover, we found that the SSR number affects HMW expression from the P2 promoter only. Our reporter fusions and sequence analyses indicated that P1 was not functional under the conditions tested. The putative transcription start site at P1 may instead correspond to an mRNA processing site, since previous primer extension analyses could not discriminate transcriptional starts from processing sites (36). Whether such an mRNA processing site modulates HMW expression is unclear. Thus, SSR variation affects the length of the 5′ UTR, but whether this directly affects transcription, modifies mRNA stability, and/or impacts translation requires further investigation. In addition, hmw expression undergoes epigenetic regulation as part of the phase-variable ModA methyltransferase phasevarion (53,54). Considering hmw epigenetic regulation alongside phase variation adds a further level of complexity that is beyond the scope of this study and will require further work.
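As an illustration of how the SSR copy number discussed above could be read from a promoter sequence, the following minimal Python sketch counts the longest uninterrupted run of a 7-bp unit. The function name, the repeat unit, and the toy sequences are placeholders chosen for illustration and are not taken from the isolates analyzed here.

```python
import re


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    sequence = sequence.upper()
    unit = unit.upper()
    # One or more consecutive, exact copies of the repeat unit.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# Hypothetical 7-bp unit and toy promoter fragments (assumptions, not real data).
UNIT = "ATCTTTC"
early_isolate = "GGTTA" + UNIT * 15 + "CCATG"
late_isolate = "GGTTA" + UNIT * 23 + "CCATG"

print(longest_tandem_run(early_isolate, UNIT))
print(longest_tandem_run(late_isolate, UNIT))
```

Comparing the counts across serial isolates from one patient would flag repeat expansion or contraction; imperfect repeats and sequencing errors within the tract are not handled by this exact-match sketch.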
SSR phase variation within the human host has been observed in the hmw and IgA protease igaB2 loci from a large set of prospectively collected NTHi strains from COPD patients (27,28) and also during experimental human nasopharyngeal colonization (although the hmw promoter regions were not specifically analyzed) (55). Also, a previous analysis of serial COPD isolates showed that HMW expression typically decreases over time, likely in response to selective pressure from the high titer of anti-HMW antibodies in COPD patient sera tested against purified HMW1 protein from strain R2846 (39). This is compatible with phase-variable genes also lowering expression during long-term persistent meningococcal carriage, in part due to antibody-mediated selection (56). The fact that high antibody titers are present even at the time of initial lower airway infection makes it unclear what advantage high levels of HMW may confer at early stages of infection. One possibility could relate to (at least for some alleles) HMW1 showing preferential binding to 2-3-linked sialic acid glycans that are predominant in the lower respiratory tract (57). Here, we show a phase-variable decrease of HMW levels over time in independent sets of serial COPD isolates, supporting this as being common during NTHi persistence within the COPD lung environment.
Shifting from a planktonic lifestyle to a biofilm community has been observed for several chronic bacterial infections (58). For example, within the COPD lung, Pseudomonas aeruginosa evolves toward increased mutation rates and antibiotic resistance, reduced production of proteases and motility, and production of biofilms (59). We speculated that phase-variable downregulation of HMW1A hyper might be coupled to a shift from one chronicity-associated phenotype (intracellular invasion) to another (biofilm formation), both of which could allow NTHi to persist, for example, by increasing the survival of genetically sensitive bacteria during antibiotic treatment. Phase-variable expression of HMW1 was not only correlated with increased hyperinvasiveness when the HMW1 hyper allele was present but also inversely correlated with biofilm formation. This negative correlation with biofilm levels held both for laboratory-engineered strains in which the hmw1 hyper locus was isolated in a distinct strain background (rRdS) and for multiple clinical isolate sets collected from COPD patients. These data suggest that a phase-variable decrease in HMW expression over time during persistent lung colonization, likely driven by antibody selective pressure, leads to a switch in NTHi's lifestyle from high adherence/invasiveness to higher biofilm formation. These data further suggest that HMW expression may generally inhibit biofilm formation. Although we originally speculated that the self-aggregation of high-expressing HMW1A hyper strains could be responsible for the patchiness of their biofilms, the inverse correlation was not restricted to HMW1A hyper strains. Instead, HMW expression in general may favor host cell interactions over biofilm formation.
We had previously identified recurrent loss-of-function mutations in the NTHi ompP1 (fadL) gene using the same set of longitudinally collected isolates (4). These mutations led to decreased adhesion/invasion but also gave resistance to arachidonic acid, which is abundant in COPD lungs. We speculated that the pathoadaptive loss of ompP1 might restrict these strains to the COPD lung environment since nearly all nonlung isolates had an intact gene. Here, in contrast, the phase-variable loss of HMW1 adhesin expression is more readily reversible, allowing for a future switch back to high HMW expression and low biofilm formation through contraction of the heptamer repeat.
In conclusion, HMW-mediated hyperinvasion is associated with specific hmw1A allelic variants, which may facilitate early stages of airway infection. However, HMW expression may often be downregulated by phase variation over time, driving a phenotypic switch from bacteria living as intracellular groups to an extracellular biofilm lifestyle. This could serve as an adaptive strategy during NTHi persistence (Fig. 7). The mutability of SSRs at phase-variable loci is determined by a combination of environmental, population, and molecular drivers that will affect the evolution of these tracts (60). A better understanding of which drivers affect the mutability of the tandem repeats in the hmw promoter regions, and how they do so, will shed light on the origins of these reversible pathoadaptations.

MATERIALS AND METHODS
The generation of bacterial strains and plasmids; bacterial growth on agar plates and liquid cultures; bacterial biofilm formation, monitored by crystal violet staining, confocal microscopy, and atomic force microscopy; cultured cell procedures involving bacterial infection, live imaging, and immunofluorescence microscopy on fixed samples; and gene expression and protein immunodetection procedures are detailed in Text S1 in the supplemental material.
FIG 7
Model illustrating HMW SSR phase variation and its potential regulation of the H. influenzae lifestyle during persistence. The HMW adhesin, whose expression is regulated by phase variation consisting of changes in the number of 7-bp tandem repeats in its promoter region, binds to host cell receptor(s) through its highly variable binding domain. We provide evidence that phase variation likely regulates a bacterial lifestyle switch between host cell invasion (subcellular location) and extracellular biofilm growth during NTHi persistence. In addition to identifying shared features in the binding domains of HMW variants associated with epithelial hyperinvasion (HMW1 allelic variation), we show that reduced hmw1A expression and HMW1A protein levels due to increased (SSR)n lower NTHi's ability to hyperinvade epithelia but increase its ability to form biofilms. HMW-mediated cell infection may be essential for virulence at early stages of infection, but persistence may be favored by limiting HMW levels, which would not only overcome antibody selective pressure (*, previously suggested by Cholon and coauthors [39]) but also strike a balance between lifestyles favoring chronicity.

Bacterial strains, plasmids, media, and growth conditions. Strains and plasmids used in this study are listed in Table 1 and Table S1, respectively. Clinical strains belong to a genome-sequenced longitudinal collection recovered from respiratory samples of COPD patients (BioProject accession number PRJNA282520) (4). NTHi strains were grown at 37°C with 5% CO2 on chocolate agar PolyViteX (PVX; bioMérieux) or Haemophilus test medium (HTM) base agar (Oxoid) supplemented with 10 μg/ml hemin and 10 μg/ml NAD, referred to as sHTM agar. NTHi liquid cultures were grown at 37°C with 5% CO2 in supplemented brain heart infusion (sBHI) medium. Erythromycin at 11 μg/ml (Erm11), kanamycin at 30 μg/ml (Km30), or spectinomycin at 30 μg/ml (Spec30) was used when required. Escherichia coli was grown on Luria-Bertani (LB) medium or LB agar at 37°C, with ampicillin at 100 μg/ml (Amp100), chloramphenicol at 20 μg/ml (Cm20), erythromycin at 150 μg/ml (Erm150), or kanamycin at 50 μg/ml (Km50), when necessary.

Protein sequence alignments and molecular modeling. To perform HMW1/2 assignments, the binding domains defined by R2846 sequences as amino acids 555 to 914 in HMW1A and amino acids 553 to 916 in HMW2A were used to calculate the percent amino acid identities of the 12 HMW binding domains of the six isolates with fully assembled genomes using Clustal Omega (Table 2). Homology models of the HMW1 propiece [(HMW1-PP) NTHi ] from strains 86-028NP, PittEE, and P652 (representative of CT 73) were built automatically using the SWISS-MODEL server (61), which previously identified the crystal structure of HMW1-PP R2846 from NTHi strain R2846 (1.92-Å resolution; PDB accession number 2ODL) (46) as the best possible template (100% coverage; 95.43 to 97.04% identity). Structural and energetic evaluations of the models and template compared well. Structure visualization and figure preparation were performed with Edu PyMOL version 1.7.4 software.
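For readers unfamiliar with how a percent amino acid identity is derived from a pairwise alignment, the minimal Python sketch below shows one common convention. It is an assumption for illustration only: identical residues are divided by the number of alignment columns in which neither sequence has a gap, which may differ from the denominator Clustal Omega uses for its identity matrix. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned fragments (not real HMW binding domain sequences).
domain_a = "MKT-AIL"
domain_b = "MKTQAVL"
print(round(percent_identity(domain_a, domain_b), 2))
```

Applying such a function to every pair of aligned binding domains would produce an identity matrix of the kind used to group sequences into putative HMW1 and HMW2 types.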
SNP-based phylogeny. The genomic relationship between strains 86-028NP, R2846, PittEE, and P652 (used as a reference genome) was analyzed with CSI Phylogeny version 1.4 (62) according to single nucleotide polymorphisms (SNPs) between core genomes (92.2% of the reference genome covered by all strains). The minimum distance between SNPs was set at 10 bp. For visualization and figure preparation, the iTOL online tool was used (https://itol.embl.de/).
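The effect of a minimum distance between SNPs can be sketched in Python as a simple greedy filter. This is one possible interpretation, not the CSI Phylogeny implementation: the function name, the left-to-right rule, and the choice to keep a SNP lying exactly 10 bp from the previous retained one are all assumptions.

```python
def prune_close_snps(positions, min_distance=10):
    """Keep SNP positions that lie at least `min_distance` bp from the last kept one."""
    kept = []
    for pos in sorted(set(positions)):
        # Retain the first SNP, then any SNP far enough from the last retained SNP.
        if not kept or pos - kept[-1] >= min_distance:
            kept.append(pos)
    return kept


# Toy reference-genome coordinates (hypothetical).
snp_positions = [100, 105, 118, 120, 300]
print(prune_close_snps(snp_positions))
```

Filtering of this kind reduces the weight of SNP clusters, which can arise from recombination or local misalignment, in the resulting phylogeny.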
Statistical analysis. In all cases, a P value of <0.05 was considered statistically significant. Analyses were performed using the Prism software, version 7, statistical package for Mac (GraphPad Software). Each analysis and its corresponding results are detailed in each figure legend.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. TEXT S1, DOCX file, 0.1 MB.