Genomic epidemiology and antibiotic susceptibility profiling of uropathogenic Escherichia coli among children in the United States

ABSTRACT Escherichia coli is the most common cause of urinary tract infections (UTIs) in children, and yet the underlying mechanisms of virulence and antibiotic resistance and the overall population structure of the species are poorly understood within this age group. To investigate whether uropathogenic E. coli (UPEC) from children who developed pyelonephritis carried specific genetic markers, we generated whole-genome sequence data for 96 isolates from children with UTIs. This included 57 isolates from children with either radiologically confirmed pyelonephritis or cystitis and 27 isolates belonging to the well-known multidrug-resistant sequence type ST131, selected to investigate their population structure and antibiotic resistance characteristics. We observed a UPEC population structure that is similar to those reported in adults. In comparison with prior investigations, we found that the full pap operon was more common among UPEC from pediatric cases of pyelonephritis. Further, in contrast with recent reports that the P-fimbriae adhesin-encoding papGII allele is substantially more prevalent in invasive UPEC from adults, we found papGII was common to both invasive and non-invasive UPEC from children. Among the set of ST131 isolates from children with UTIs, we found antibiotic resistance was correlated with known genetic markers of resistance, as in adults. Unexpectedly, we observed that fimH30, an allele of the fimbrial gene fimH often used as a proxy to type ST131 isolates into the most drug-resistant subclade C, was carried by some of the subclade A and subclade B isolates, suggesting that the fimH30 allele could confer a selective advantage for UPEC.
IMPORTANCE Urinary tract infections (UTIs), which are most often caused by Escherichia coli, are not well studied in children. Here, we examine genetic characteristics that differentiate UTI-causing bacteria in children that either remain localized to the bladder or are involved in more serious kidney infections. We also examine patterns of antibiotic resistance among strains from children that are part of E. coli sequence type 131, a group of bacteria that commonly cause UTIs and are known to have high levels of drug resistance. This work provides new insight into the virulence and antibiotic resistance characteristics of the bacteria that cause UTIs in children.

fimbriae, which can be regulated by phase variation (6, 10-12). Early studies highlighted the carriage of a specific fimbria encoded by the pyelonephritis-associated pili (pap) operon in UPEC, as well as specific alleles of key genes within the operon, such as papGII, as decisive for infections developing into pyelonephritis (11,13,14). In one of the few studies leveraging whole-genome sequencing to investigate the genetic traits that enhance E. coli's ability to cause pyelonephritis or invasive infections, the papGII allele was significantly associated with invasive UPEC in adults (9), consistent with earlier findings that papGII exhibits preferential binding to globosides, which are dominant in kidneys (15). This meta-study of 722 UPEC genomes from across multiple independent investigations, of which at least 80% were from adults, also showed that invasive UPEC are phylogenetically clustered within specific sublineages that tend to encode greater numbers of virulence genes, in particular those related to iron acquisition.
UPEC are becoming increasingly resistant to agents commonly used to treat UTI, such as fluoroquinolones, trimethoprim-sulfamethoxazole, and cephalosporins. While the use of fluoroquinolones is much less common among children, fluoroquinolone-resistant E. coli has been detected in this age group (16). UPEC are also exhibiting increased multidrug resistance (MDR), defined as exhibiting resistance to at least one antibiotic in three or more drug classes (17), in large part due to the emergence and dissemination of specific clonal lineages of MDR E. coli. Of these clonal lineages, sequence type (ST) 131 is considered the most successful worldwide and exhibits MDR (18). However, little is known about the population structure of ST131 E. coli causing UTI in children. Although previous reports suggested that ST131 isolates tend to lack papG (19), one study identified specific sublineages of ST131 that encode papGII and associate with more severe UTIs (9). The objectives of this study were to identify alleles associated with pyelonephritis in children with UTI and to examine the population structure and antibiotic resistance of UTI-causing ST131 E. coli in children.

Isolates from children with pyelonephritis and cystitis are phylogenetically diverse
Of the 61 children with UTI who underwent a renal scan, 57 E. coli isolates yielded sufficient, high-quality sequencing data for inclusion in downstream genomic analyses (Table 1). To determine the evolutionary relationship among these isolates, we called single-nucleotide polymorphisms (SNPs) against a common reference using Pilon (20), removed recombined regions predicted using ClonalFrameML (21), and constructed a maximum likelihood, SNP-based whole-genome phylogeny using RAxML (22). The majority of isolates (84%) belonged to phylogroup B2. To put the phylogenetic tree into context with known E. coli STs, we determined ST designations using ARIBA and the PubMLST database. Combined, the results of these analyses showed that isolates within this collection were phylogenetically diverse, representing at least 22 STs, a large fraction of which fell within ST95 (n = 19), ST73 (n = 8), ST69 (n = 6), and ST131 (n = 5) (Fig. 1; Fig. S1; Table S1), which are the same four STs noted previously to predominate among UPEC infections in adults (9). Although ST designations tended to track well with the placement of isolates within the phylogeny, we observed four isolates, labeled as ST2614 (n = 1), ST421 (n = 1), or "novel ST" (n = 2), closely clustered with other ST95 isolates within the SNP-based phylogeny. Due to their close phylogenetic relationship, we included these four isolates among ST95 members in downstream analyses (Fig. S1; Table S2).
Layering on information about the disease status of the children from whom these E. coli isolates were obtained revealed no clear pattern between placement within the phylogeny and whether the isolate caused any specific disease. Isolates associated with a diagnosis of pyelonephritis (n = 18) or cystitis (n = 39) were distributed across the E. coli phylogeny (Fig. 1A). In addition, there was no significant overrepresentation of pyelonephritis or cystitis cases within any of the four highly represented clades, though none of the ST131 isolates were associated with pyelonephritis, and only slightly over half (55.6%; 10 of 18) of the pyelonephritis-causing isolates belonged to ST95.

Association between carriage of the pap operon and pyelonephritis
Though the evolutionary relationship among isolates did not reveal any clear pattern to suggest an association with cystitis or pyelonephritis, the production of P-fimbriae encoded by the pap operon in E. coli has been associated with the development of pyelonephritis and renal scarring (11,13,14). To determine whether pap genes or any other virulence factors were significantly associated with pyelonephritis in children from this study, we first used ARIBA (23) together with the Escherichia coli virulence-associated gene database (EcVGDB) (9) to identify 639 distinct and functionally diverse virulence-associated genes in one or more of the 57 UPEC isolates (Table S3). We then applied a mixed effects model (FaST-LMM) implemented in pyseer (24,25), which accounts for the population structure across our isolate set, to infer significant associations of virulence genes with UPEC from pyelonephritis cases (Table S4). Individually, no virulence genes were significantly associated after accounting for multiple tests, including previously identified UPEC-associated virulence genes (6, 26) (Fig. S2). Because invasive UPEC were previously found to carry greater numbers of distinct virulence genes for certain functions, such as iron acquisition (9), we investigated whether virulence gene load was greater in pyelonephritis relative to cystitis-associated UPEC isolates across 14 classes of virulence factors (Fig. S3; Table S5). We found no significant differences in gene counts for any of the virulence classes between pyelonephritis and cystitis-related isolates.
While none of the pap genes were significantly associated with pyelonephritis individually, we found that the carriage of the full operon, consisting of 12 genes, was significantly associated with pyelonephritis (P-value = 0.0395). The full pap operon was found in 14 of 18 pyelonephritis-associated isolates (78%), while it was only found in 17 of 39 cystitis-associated isolates (44%) (Fig. 1A and B). It was carried by all ST69 isolates and nearly all isolates within the ST95 clade, in accordance with recent genomic analysis suggesting that the pap operon was obtained via a single pathogenicity island lateral transfer event in an ancestor of these clades (9). In contrast, most ST131 isolates lacked eight or nine of the twelve pap genes, including papG. The lack of papG was previously reported as a defining feature of ST131 (19); however, it was recently reported that papGII was associated with specific sublineages of invasive ST131 UPEC (9). Among the five ST131 UPEC in our study with diagnostic information for the UTI, all were related to incidents of cystitis. All but three of the isolates from children with pyelonephritis (15/18; 83%) featured the papGII allele, which has previously been strongly associated with pyelonephritis development (9,11) and shown to be important for binding to globoside glycosphingolipids, abundant in kidney cells (11,15,27) (Fig. 1C). However, papGII was also found in 62% of the isolates from children with cystitis, resulting in our inability to identify a significant association between papGII and pyelonephritis development within this data set. This is in contrast to recent findings from a genome-wide association study (GWAS), which found papGII as the most significant virulence factor associated with invasive UPEC (9). To further investigate, we extracted information for papG allele carriage among 574 UPEC from cases of invasive or non-invasive UPEC in adults from the study by Biggel et al. Only six of the remaining 148 UPEC in their study were clearly designated as being isolated from children. Our investigation confirmed that papGII carriage is higher in non-invasive cystitis cases in children than in non-invasive cases in adults (P-value = 2.599e-9; one-sided Fisher's exact test) (Fig. 1C). Of note, papGII presence was not always concordant with carriage of the full pap operon.
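To make the kind of 2x2 comparison used in this section concrete, the following minimal sketch computes a one-sided Fisher's exact test from the full pap operon counts stated above (14 of 18 pyelonephritis isolates versus 17 of 39 cystitis isolates). It is a plain hypergeometric calculation that ignores population structure, so it is not the targeted test used in this study and is not expected to reproduce the reported P-value; the function name is our own.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where X is the
    top-left cell count with all row and column totals held fixed.
    """
    row1 = a + b          # size of the first group
    col1 = a + c          # total number of "present" calls
    n = a + b + c + d     # total number of isolates
    upper = min(row1, col1)
    total = comb(n, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / total


# Rows: pyelonephritis, cystitis. Columns: full pap operon present, absent.
p = fisher_exact_greater(14, 4, 17, 22)
print(f"one-sided P = {p:.4f}")
```

A naive test of this kind treats isolates as independent, which closely related isolates are not; this is the reason a population-structure-aware model is preferable for the actual association analysis.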

Extensive drug resistance within pediatric E. coli ST131
ST131 E. coli are notorious for their ability to resist multiple antimicrobials as well as for their rapid global spread (28). ST131 has further been subdivided into the major subclades A, B, and C (not to be confused with species-wide phylogroup designations of E. coli) (29). Subclade C in particular has been shown to exhibit intrinsic resistance to several clinically relevant drugs, including β-lactams and fluoroquinolones (18,28).
Of the total set of 48 E. coli isolates suspected of belonging to the ST131 lineage, 27 were confirmed by genomic typing (Table 1). To construct a more highly resolved phylogeny to examine the population structure of ST131 in children and its relationship to antibiotic resistance, we constructed a maximum likelihood, SNP-based whole-genome phylogeny of only ST131 isolates (Fig. 2). To define subclade membership, we also searched sequencing reads from each isolate for subclade-specific primer sequences (30). As expected, clade membership corresponded to deep divisions within the ST131-specific phylogeny (Fig. 2); 7 isolates (26%) were part of ST131 subclade A, 10 (37%) belonged to subclade B, and 10 (37%) belonged to subclade C. Despite some relatively close relationships, there was little evidence for clonal spread among these ST131 isolates; on average, isolates were separated from one another by 112 SNPs, with nearest neighbors separated by an average of 51 SNPs. Unexpectedly, we observed fimH30, an allele of the fimbrial gene fimH often used as a proxy to type ST131 isolates into subclade C (31,32), carried by three members of subclade A and four members of subclade B (Fig. S4), suggesting that this typing allele lacks specificity and may be subject to horizontal exchange across subclade boundaries. We confirmed this unusual subclade distribution of fimH by further checking that ST131 subclade alleles for parC and gyrA (28) aligned with our phylogenomic model as well as confirming the absence of ambiguity in base calls within fimH30 (Fig. S4).
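The two summary statistics quoted above (mean pairwise SNP distance and mean nearest-neighbor distance) can be derived from a core SNP alignment roughly as sketched below. This is an illustrative calculation on a toy alignment, not the pipeline used in this study; the isolate names and sequences are invented, and the choice to ignore sites with non-ACGT characters is an assumption.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count differing sites, skipping sites where either base is not A/C/G/T."""
    return sum(
        1
        for x, y in zip(seq_a, seq_b)
        if x in VALID_BASES and y in VALID_BASES and x != y
    )


def distance_summary(alignment):
    """Return (mean pairwise distance, mean nearest-neighbor distance).

    `alignment` maps isolate name -> aligned sequence; needs >= 2 isolates.
    """
    names = list(alignment)
    dist = {}
    for a, b in combinations(names, 2):
        d = snp_distance(alignment[a], alignment[b])
        dist[(a, b)] = dist[(b, a)] = d
    pairs = list(combinations(names, 2))
    mean_pairwise = sum(dist[p] for p in pairs) / len(pairs)
    # For each isolate, the distance to its closest other isolate.
    nearest = [min(dist[(a, b)] for b in names if b != a) for a in names]
    return mean_pairwise, sum(nearest) / len(nearest)


# Toy alignment with invented isolates (hypothetical data).
toy = {"iso1": "ACGTACGT", "iso2": "ACGTACGA", "iso3": "TCGAACGA"}
print(distance_summary(toy))
```

Large nearest-neighbor distances, as reported here, argue against recent transmission between the sampled patients, whereas clonal spread would appear as pairs separated by very few SNPs.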
Understanding the extent of antibiotic resistance among ST131 UPEC in children could better inform which treatments are necessary and appropriate, in particular because early antibiotic exposure has been associated with potential long-term effects (33,34). To examine the relationship between phenotypic drug resistance among ST131 isolates and the underlying genetic features determining resistance, we first examined the susceptibility results for 23 drugs or combinations of agents (Fig. 2; Table S6). We focused on resistance to six antibiotics, which are commonly prescribed for UTIs at clinics, including two last-resort drugs, representative of four antibiotic classes: fluoroquinolones, cephalosporins, carbapenems, and antifolates. Among the set of 27 ST131 UPEC, three isolates were identified as MDR, exhibiting resistance to at least one antibiotic from three of the four antibiotic classes. As expected, all three multidrug-resistant isolates were within subclade C, showing resistance to fluoroquinolones, cephalosporins, and antifolates. Further, all subclade C isolates exhibited resistance to ciprofloxacin (n = 10), 80% of subclade C isolates demonstrated resistance to levofloxacin (n = 8), and 40% of subclade C isolates had resistance to ceftriaxone (n = 4). Resistance to trimethoprim-sulfamethoxazole was distributed across all three subclades, but also had the highest incidence within subclade C (n = 3 [43%] of subclade A; n = 4 [40%] of subclade B; n = 7 [70%] of subclade C). No ST131 isolates were resistant to carbapenems (meropenem and imipenem), which are last-resort antibiotics for the treatment of drug-resistant infection.
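The MDR call described above (resistance to at least one agent in three or more classes) reduces to a small lookup, sketched here for the six focal antibiotics. The drug-to-class mapping follows the classes named in the text; the "R"/"S" encoding, the function names, and the example profiles are illustrative assumptions rather than the study's data.

```python
# Mapping of the six focal antibiotics to the four classes named in the text.
DRUG_CLASS = {
    "ciprofloxacin": "fluoroquinolone",
    "levofloxacin": "fluoroquinolone",
    "ceftriaxone": "cephalosporin",
    "meropenem": "carbapenem",
    "imipenem": "carbapenem",
    "trimethoprim-sulfamethoxazole": "antifolate",
}


def resistant_classes(profile):
    """Return the set of drug classes with at least one resistant ("R") call."""
    return {
        DRUG_CLASS[drug]
        for drug, call in profile.items()
        if call == "R" and drug in DRUG_CLASS
    }


def is_mdr(profile, min_classes=3):
    """MDR: resistant to at least one agent in `min_classes` or more classes."""
    return len(resistant_classes(profile)) >= min_classes


# Hypothetical susceptibility profiles ("R" = resistant, "S" = susceptible).
example_mdr = {
    "ciprofloxacin": "R", "levofloxacin": "R", "ceftriaxone": "R",
    "trimethoprim-sulfamethoxazole": "R", "meropenem": "S", "imipenem": "S",
}
example_not_mdr = {"ciprofloxacin": "R", "levofloxacin": "R", "ceftriaxone": "S"}
print(is_mdr(example_mdr), is_mdr(example_not_mdr))
```

Counting classes rather than drugs is the key point: resistance to both ciprofloxacin and levofloxacin contributes only one class toward the threshold.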
For six of the most clinically relevant antibiotics or combinations, we compared the phenotypic results to the carriage of antimicrobial resistance (AMR) genes identified in our genomic sequencing of isolates in order to understand whether known genetic determinants explain the observed phenotypes (Fig. 2; Tables S7 and S8). For each of these drugs, we observed full concordance between genotype and phenotype. All 10 strains exhibiting resistance to a fluoroquinolone (levofloxacin or ciprofloxacin) carried the parC_1aAB and gyrA_1AB mutations in their quinolone resistance-determining regions (QRDRs), which have been shown previously to lead to fluoroquinolone resistance and have been reported as a core feature in the emergence of ST131 subclade C (28). Of the four strains exhibiting resistance to ceftriaxone, an expanded-spectrum cephalosporin, all four carried the extended-spectrum β-lactamase gene blaCTX-M-15. Of the 14 isolates that exhibited resistance to trimethoprim-sulfamethoxazole, all carried the sul1 gene encoding a dihydropteroate synthase, which confers sulfamethoxazole resistance, together with either dfrA12 or dfrA17, which encode dihydrofolate reductases previously shown to confer trimethoprim resistance (Table S8).

DISCUSSION
We found that UPEC from children with pyelonephritis were more likely to have the full suite of coding genes in the pap operon responsible for the production of P-fimbriae, which has long been associated with pyelonephritis in UPEC (9,11,14). Many studies have found that P-fimbriae contribute to UPEC virulence and the development of pyelonephritis and related diseases (14, 35-37). Here, we provide additional support for these viewpoints by using genomic analysis to show that the carriage of the full pap machinery is more prevalent among UPEC isolates collected from children who were diagnosed with pyelonephritis as compared to UPEC collected from cases of more acute cystitis. Additionally, while the rates of UPEC from pyelonephritis cases with the full suite of pap machinery (78%) were roughly equivalent to the rate of UPEC with P-fimbriae observed in a study by Kallenius et al. (91%) (14), we observed a substantially higher rate of UPEC from cystitis cases carrying the pap machinery (44%), when compared to the percentage of UPEC from such cases reported to feature P-fimbriae in the aforementioned study (19%). This discrepancy could potentially be explained by host factors influencing immunity to infection (38), differences in sampling and diagnosis between the two studies, or epigenetic phase variation in the regulation of the pap operon among the UPEC isolates (39-41), given that Kallenius et al. assessed the presence of P-fimbriae in UPEC from children using agglutination assays, which would detect only actively expressed P-fimbriae. Specifically, pyelonephritis and cystitis cannot be accurately differentiated based on the child's signs or symptoms or any laboratory tests; many children with cystitis have fever, abdominal/back pain, and/or an elevated inflammatory marker. As such, previous studies may have incorrectly classified children being studied. A strength of our study is that we used renal scans, the current gold standard for the diagnosis of pyelonephritis, to categorize the children in the study.
Investigations around P-fimbriae have also identified several classes of the gene coding for their tip adhesin, papG, which have been shown to differentially bind to specific glycolipids found on host cells (11,15). In particular, the papGII allele has been closely associated with pyelonephritis development and shown to bind to globoside molecules, which are abundantly found on mammalian kidney cells (11,15,27,42). papGII has also been shown to induce changes in host gene expression (43). In a recent GWAS, Biggel et al. associated genomic markers with invasive UPEC, mostly from adults, and once again highlighted papGII as a key factor associated with the diagnosis of more severe UTIs (9). However, we did not observe this same association in our data set; we found that a majority of the isolates in our study that carried the papG gene featuring the papGII allele were associated with cystitis. While these results suggest potential differences in papGII prevalence in UPEC from children compared to UPEC from adults, they could also be confounded by differences in sampling location and differences in how strains were associated with UTI severity between the studies. Our results may have also differed because we strictly compared UPEC associated with pyelonephritis with those associated with cystitis based on radiological diagnosis, unlike Biggel et al., who investigated differences between the broader categorizations of invasive and non-invasive UPEC.
Within the data set of E. coli gathered from the 57 children who were diagnosed with either pyelonephritis or cystitis as well as the set of 27 ST131 isolates, we also examined the population structure of UTI-causing E. coli. Although specific STs were highly represented, we found that the E. coli strains were diverse in children. The high count of SNPs separating isolates suggested that most UPEC isolates did not share recent ancestry and were thus not related to clonal outbreaks in hospitals but rather originated from diverse E. coli reservoirs outside the hospital, such as in food or the host's own gastrointestinal tract. This suggests ongoing selection for P-fimbriae in UPEC involved in pyelonephritis, as others have speculated (9,44,45).
In one of the only studies of pediatric ST131 UPEC isolates to date, we found patterns of resistance similar to those observed in other studies and high concordance between microbiological resistance and known resistance determinants. We found that 56% of isolates were resistant to trimethoprim-sulfamethoxazole and 37% were resistant to ciprofloxacin, in accordance with previous reports of 30-60% of ST131 strains resistant to fluoroquinolones (46). As expected, all 10 ST131 subclade C isolates were resistant to ciprofloxacin, a hallmark of this epidemic subclade due to its QRDR mutations within gyrA and parC. Sequence-based profiling confirmed the presence of these resistance-contributing mutations in subclade C isolates, and we were similarly able to relate known resistance genotypes of ST131 isolates with susceptibility phenotyping for other antibiotics such as ceftriaxone and trimethoprim-sulfamethoxazole. Given that fluoroquinolones are rarely used in children, this suggests that ST131 subclade C isolates may be spreading among children through either household transmission or yet unidentified collateral selective pressure.
We found that fimH30, the fimH allele that is tightly associated with ST131 subclade C (31), was carried by three subclade A and four subclade B isolates, suggesting the effects of recombination, and possibly a biological advantage for the carriage of this allele outside of clade C. Recombination has previously been reported to affect fimH, which encodes a fimbrial adhesin (47), but is likely underappreciated due to the common usage of fimH for typing ST131 strains into clades (32) (Fig. S2). Importantly, we did not observe the clade A-associated fimH41 and the clade B-associated fimH22 alleles to occur outside of their associated clades. In alignment with recent experimental findings that fimH30 can enhance biofilm formation and host cell adhesion relative to other variants of the fimH gene (48), our observations that the allele has likely been laterally acquired in strains belonging to ST131 subclades A and B further build support that it offers a selective advantage for UPEC.
Limitations of our study included convenience sampling and a relatively small sample size. Strengths include the prospective design, detailed clinical and laboratory characterization of each patient, use of a dimercaptosuccinic acid (DMSA) renal scan to differentiate pyelonephritis versus cystitis, and a small timescale (2012-2016) in which samples were collected, limiting confounding effects from evolutionary trends.
In conclusion, using whole-genome sequencing, we provide further support for the association between the carriage of the full, 12-gene pap operon and pyelonephritis. In agreement with previous work, we observed that the papGII allele was found in most isolates resulting in pyelonephritis; however, in our study, papGII was also found in a majority of cystitis-causing isolates. papGII was also observed within multiple incomplete pap operons, highlighting the importance of considering context and full operon presence. Additionally, we observed a diverse representation of ST131 UPEC isolates associated with infections in children, for which antibiotic susceptibility phenotyping largely corresponded to expectations based on literature and genotypic markers. Interestingly, we observed evidence of inter-clade recombination of fimH30, thought to be a key marker of subclades within ST131, suggesting that care must be taken when using fimH30 to infer clade membership. We also found that UPEC in childhood UTIs exhibit a population structure similar to that of UPEC from adult infections (9) and that 37% of ST131 isolates from UPEC in children belonged to the highly drug-resistant subclade C.

Selection of samples for comparisons between strains isolated from pyelonephritis versus cystitis cases
Urine cultures were obtained from children presenting with UTI, who were enrolled in two previously described studies (49,50) conducted in the emergency room or outpatient clinical offices at five centers in the United States. The study was approved by the Institutional Review Boards at the participating centers. E. coli was identified from urine cultures at the clinical microbiology laboratories, and a single colony was stored in glycerol at −80°C and subsequently used for this study. All children participating in these studies presented with signs and symptoms of a UTI, had pyuria on urinalysis, and were eventually diagnosed with a UTI based on their urine culture results. Occurrence of fever during the UTI episode was recorded. In one of the studies (49), we offered a renal scan within 2 wk of the diagnosis of UTI to all children whose parents consented to this procedure. Of the 111 children with UTI in that study, 82 parents agreed to an early renal scan. Of these, 61 had isolates of E. coli that were stored. DNA from these 61 isolates, which represent a convenience sample from children with UTI, was sent to the Broad Institute for whole-genome sequencing (Table S1).

Selection of samples for analysis of population structure of ST131
We screened all 359 E. coli isolates in the aforementioned two studies for ST131 using a published multiplex PCR method, which collectively captured subclades A, B, and C within ST131 (51). Of the 359 E. coli isolates in both studies, 47 were suspected of belonging to ST131 as they screened positive for O-antigen types 16 or 25 by PCR, serotypes often associated with ST131 E. coli (19). One additional isolate that was not predicted to belong to ST131 based on serotype but exhibited resistance to multiple antimicrobials (gentamicin, ciprofloxacin, and trimethoprim-sulfamethoxazole), common for ST131, was also selected. DNA from these 48 isolates was sent to the Broad Institute for whole-genome sequencing (WGS; Table S1). Of note, six isolates were selected for both aims (i.e., they were suspected to belong to ST131 and also had an early renal scan to localize the site of UTI). Thus, a total of 103 isolates were sequenced between the two aims. For the 48 suspected ST131 isolates, we centrally determined the minimum inhibitory concentrations (MICs) of 23 agents by the broth microdilution method using commercially available dry plates (Sensititre GN4F, Thermo) (Table S6). Susceptibility was interpreted using the CLSI guidelines (M100-S28).

Extraction of genomic DNA and WGS
Genomic DNA was purified using DNeasy Blood and Tissue kits (Qiagen) and sent to the Broad Institute. Sequencing libraries were generated from 2 ng of input DNA using the Nextera XT DNA Library Preparation kit (Illumina) according to the manufacturer's recommended protocol. Libraries were sequenced on an Illumina HiSeq 2500 with 151 bp paired-end reads.

Quality control and processing of sequencing data
Sequencing data were processed using Picard tools (https://broadinstitute.github.io/picard/). All but one of the samples had sufficient reads; the remaining sample (P6076) was removed from downstream analysis. After demultiplexing, adapters were removed using TrimGalore with default "--nextera" settings (52), and quality filtering was performed using Trimmomatic with lenient settings (leading:3, trailing:3, sliding-window:4:15) (53). After adapter removal and quality filtering, the average chromosome-wide sequencing coverage ranged from 56X to 514X, with a median of 174X, based on alignment to the reference E. coli Eco889 genome (GCF_001663475.1).
To assess the presence of sequence contamination from other genera, sequencing data were then aligned to a Centrifuge (54) index of archaeal, bacterial, and viral sequences from RefSeq (built in April 2018). Two isolates that had >10% of their reads aligned to a genus other than Escherichia or Shigella were excluded from analysis. To assess the presence of contamination due to multiple strains from within the same genus, we constructed draft assemblies using SPAdes (v 3.12.0) (55) and examined the length of the total genome assembled. Four samples had genome lengths exceeding the size of a typical E. coli genome (>6 Mbp), thus likely indicating additional contamination.
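The two contamination screens described above amount to simple threshold checks, sketched below. The thresholds (10% of reads, 6 Mbp) are taken from the text; the input layout, the function name, and the interpretation that any single off-target genus exceeding 10% triggers exclusion are assumptions made for illustration, and the example values are invented.

```python
ALLOWED_GENERA = {"Escherichia", "Shigella"}


def qc_flags(genus_read_fractions, assembly_length_bp,
             max_offtarget=0.10, max_length_bp=6_000_000):
    """Return a list of reasons an isolate would fail contamination QC.

    genus_read_fractions: dict mapping genus name -> fraction of classified
    reads (0-1), e.g. summarized from a read classifier's report.
    assembly_length_bp: total length of the draft assembly in base pairs.
    """
    flags = []
    # Assumption: a single non-Escherichia/Shigella genus above the cutoff
    # is enough to flag cross-genus contamination.
    if any(
        frac > max_offtarget
        for genus, frac in genus_read_fractions.items()
        if genus not in ALLOWED_GENERA
    ):
        flags.append("cross-genus contamination")
    # An assembly much larger than a typical E. coli genome suggests a
    # mixture of strains from the same genus.
    if assembly_length_bp > max_length_bp:
        flags.append("oversized assembly")
    return flags


# Hypothetical isolates.
print(qc_flags({"Escherichia": 0.97, "Klebsiella": 0.02}, 5_100_000))
print(qc_flags({"Escherichia": 0.80, "Klebsiella": 0.15}, 6_400_000))
```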
All seven isolates excluded from analysis are noted in Table S1, along with their reasons for exclusion. Four of these excluded isolates were among those selected for analysis of pyelonephritis versus cystitis, and four were among those selected for analysis of ST131, including one excluded isolate that was shared between the two studies. This left 57 isolates remaining for the cystitis versus pyelonephritis analysis and 44 for the ST131 analysis.

Sequence typing
ARIBA (23) was used with default settings to determine STs for isolates using the PubMLST/EnteroBase MLST database (56). Isolates were further identified as being members of ST clades by examination of the phylogeny (Fig. S1). Phylogroups were assigned based on available mappings of STs to phylogroups (9) together with phylogenomic inference. Computational predictions of serotype were made using the EcOH database (57) with SRST2 (58).
ST131 isolates were further categorized into their respective subclades by searching each sample's sequencing reads for previously published, subclade-specific primer sequences (30,59,60). Of the 44 isolates with high-quality sequencing data originally predicted to represent ST131 based on O antigen or an MDR profile, 19 were not predicted to be ST131 from MLST analysis and were excluded, leaving 25 isolates for downstream analyses of ST131. We additionally included two isolates originally selected for comparisons of cystitis versus pyelonephritis, which were identified by MLST analysis as ST131, for a total of 27 isolates for analysis of the ST131 clade.
Because fimH30, an allelic variant used as a marker of ST131 subclade C, was also observed outside of subclade C, we performed additional analysis to confirm this unexpected distribution, including (i) verifying that alleles of alternative marker genes used for subclade typing of ST131, such as gyrA and parC, were in agreement with subtype designations based on whole-genome phylogeny (gyrA-1A for subclade A, gyrA-1a for subclade B, gyrA-1AB for subclade C, parC-1b for subclade A, parC-1 for subclade B, and parC-1aAB for subclade C) and (ii) confirming that any base call ambiguity did not affect fimH30 by performing alignment of sequencing reads to a fimH reference and examining variants using Pilon (20).

Detection of AMR and virulence genes
ARIBA (23) was used with default settings to identify known AMR genes cataloged in the CARD database (downloaded August 2018) (61) and virulence factors from EcVGDB (9). Because EcVGDB does not exist as an integrated database for ARIBA, we manually created an ARIBA database, which resulted in 70 of 1,368 entries being removed due to either length or because the sequences did not resemble a coding gene as expected. ARIBA was run directly on processed, unassembled sequencing data. AMR and virulence genes were considered present in a sample if the local, targeted assembly of reads homologous to the gene produced by ARIBA had at least 90% identity and 80% coverage of the reference gene.
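The presence criterion above (at least 90% identity and 80% coverage of the reference gene) can be expressed as a simple filter, sketched below. The dictionary keys are hypothetical field names chosen for this illustration and do not correspond to any particular ARIBA report column; the example hits are invented.

```python
def gene_present(hit, min_identity=90.0, min_coverage=80.0):
    """Apply identity and reference-coverage cutoffs to one gene hit.

    `hit` uses hypothetical keys:
      percent_identity  - identity of the local assembly to the reference (%)
      aligned_ref_bases - number of reference bases covered by the assembly
      ref_length        - length of the reference gene in bases
    """
    coverage = 100.0 * hit["aligned_ref_bases"] / hit["ref_length"]
    return hit["percent_identity"] >= min_identity and coverage >= min_coverage


# Hypothetical hits for illustration.
full_hit = {"percent_identity": 99.2, "aligned_ref_bases": 1008, "ref_length": 1008}
partial_hit = {"percent_identity": 98.0, "aligned_ref_bases": 500, "ref_length": 1008}
divergent_hit = {"percent_identity": 85.0, "aligned_ref_bases": 1008, "ref_length": 1008}
print(gene_present(full_hit), gene_present(partial_hit), gene_present(divergent_hit))
```

Requiring both cutoffs guards against two distinct failure modes: short fragments of a gene that align well, and full-length but divergent homologs.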

Allelic typing of papG
The EcVGDB database (9), which featured five distinct alleles of papG, was used with ARIBA to directly type papG alleles in our isolates as described earlier. Supplementary tables from Biggel et al., 2020 (9), which used and introduced the EcVGDB, were queried for papG carriage and alleles in UPEC from adults.

Phylogenetic analysis
Reads were mapped to the ST131 reference strain E. coli Eco889 (GCF_001663475.1) using bwa version 0.7.17 (62). This reference was selected because its genome is complete, it belongs to ST131, and it was isolated from a urine sample (63). Samtools and Picard were used to sort, index, and mark duplicates in the resulting alignment files (64). Pilon version 1.22 (20) was used to identify variants using default settings. Variant loci with a mean mapping quality <10 were excluded. ClonalFrameML was used to predict and remove recombined regions (21), resulting in a core alignment with 12,004 SNP sites. RAxML was used with the GTRCAT model to construct a maximum-likelihood phylogeny with 1,000 bootstrap replicates (22). From this larger phylogeny, a subset was extracted using PareTree (65), corresponding to the 57 isolates from patients who underwent an early scan.
Applying this same protocol, we generated a separate maximum-likelihood phylogeny for the ST131 clade. Based on the larger tree, we selected all strains in the clade containing ST131, which comprised the 27 expected ST131 isolates plus one additional ST2279 isolate nested among them. The core alignment used to construct this phylogeny contained 2,790 SNP sites. The ST2279 strain was removed from downstream visualization.
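To make the notion of "SNP sites" in a core alignment concrete, the following minimal sketch counts the variable columns in a set of equal-length aligned sequences. It is an illustration only: the function name and toy sequences are hypothetical, and the choice to ignore gaps and ambiguous bases when deciding whether a column is variable is an assumption, not a description of how the tools cited above count sites.

```python
def count_snp_sites(alignment: dict[str, str]) -> int:
    """Count alignment columns with more than one distinct unambiguous base.

    Assumption: gaps ('-') and ambiguous calls (e.g. 'N') are ignored, so a
    column is variable only if at least two of A, C, G, T are observed.
    """
    seqs = list(alignment.values())
    if not seqs:
        return 0
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    n_sites = 0
    for column in zip(*seqs):
        bases = {b.upper() for b in column} & set("ACGT")
        if len(bases) > 1:
            n_sites += 1
    return n_sites


# Toy alignment for illustration only (not data from the study).
toy = {
    "iso1": "ACGTAC",
    "iso2": "ACGTAT",
    "iso3": "ANGT-C",
    "iso4": "ACCTAC",
}
n_snps = count_snp_sites(toy)
print(n_snps)
```

In the toy alignment, the column containing `N` and the column containing a gap are not counted, because only one unambiguous base is observed in each.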

Testing for genotypic association with pyelonephritis development
The carriage of specific virulence genes was tested for enrichment or depletion within the isolates collected from patients diagnosed with pyelonephritis versus cystitis using the pyseer package (24). The presence or absence of virulence genes was encoded as binary genotypic traits, and the diagnosis of pyelonephritis was treated as a binary phenotype. Association analysis was carried out using a mixed effects model (FaST-LMM) (25) with a kinship matrix constructed from the whole-genome phylogeny after recombination removal to account for the population structure of our sample set and to mitigate the influence of lineage effects. Multiple hypothesis testing correction was performed across all individual virulence genes using the count_patterns.py script provided within pyseer. A separate, targeted test was performed for the association of the full pap operon with pyelonephritis development.
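The idea behind a pattern-based multiple testing correction can be sketched as follows. This is a simplified illustration, not the pyseer script itself: it assumes that genes sharing an identical presence/absence pattern across isolates contribute a single effective test and that a Bonferroni-style threshold is obtained by dividing the significance level by the number of distinct patterns. The function name, gene names, and toy matrix are hypothetical.

```python
def pattern_threshold(presence: dict[str, list[int]], alpha: float = 0.05) -> tuple[int, float]:
    """Return the number of distinct presence/absence patterns and alpha divided by it.

    `presence` maps each gene to its 0/1 calls across the same ordered set of isolates.
    """
    patterns = {tuple(calls) for calls in presence.values()}
    if not patterns:
        raise ValueError("at least one gene is required")
    return len(patterns), alpha / len(patterns)


# Toy presence/absence matrix for illustration only (not data from the study).
presence = {
    "geneA": [1, 0, 1, 1, 0],
    "geneB": [1, 0, 1, 1, 0],  # same pattern as geneA, so counted once
    "geneC": [0, 0, 1, 0, 0],
    "geneD": [1, 1, 1, 1, 1],
}
n_patterns, threshold = pattern_threshold(presence)
print(n_patterns, threshold)
```

Counting patterns rather than genes avoids over-penalizing genes that are always co-inherited, such as genes within one operon, because they yield identical test statistics.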
To assess differences in virulence gene load for different virulence classes between UPEC from pyelonephritis versus cystitis, we used a permutation-based test. Briefly, for each virulence class (Table S4), we performed 10,000 simulations in which the designations of UPEC as cystitis- or pyelonephritis-associated were shuffled. The difference in median gene counts between pyelonephritis and cystitis isolates in each simulation was compared to the actual observed difference. If the difference for a simulation was greater than or equal to the observed difference, a counter was incremented. An empirical P-value was then calculated for the virulence class by dividing the counter by the number of simulations, with a pseudocount of one added to the numerator and denominator to avoid assigning a P-value of zero. The false discovery rate across virulence classes was controlled using the Benjamini-Hochberg procedure.
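The permutation procedure and the subsequent false discovery rate adjustment can be sketched as below. This is a minimal illustration under stated assumptions rather than the code used in the study: the difference is taken as the signed value (pyelonephritis median minus cystitis median), which the text does not specify, and the function names, random seed, and toy gene counts are hypothetical.

```python
import random
from statistics import median


def permutation_p_value(counts, is_pyelo, n_sim=10_000, seed=0):
    """Empirical P-value for the difference in median gene count between groups.

    Assumption: the statistic is median(pyelonephritis) - median(cystitis).
    """
    rng = random.Random(seed)

    def median_diff(labels):
        pyelo = [c for c, flag in zip(counts, labels) if flag]
        cystitis = [c for c, flag in zip(counts, labels) if not flag]
        return median(pyelo) - median(cystitis)

    observed = median_diff(is_pyelo)
    labels = list(is_pyelo)
    exceed = 0
    for _ in range(n_sim):
        rng.shuffle(labels)  # shuffle diagnoses, keeping group sizes fixed
        if median_diff(labels) >= observed:
            exceed += 1
    # Pseudocount of one in numerator and denominator avoids a P-value of zero.
    return (exceed + 1) / (n_sim + 1)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy gene counts for one virulence class (illustration only, not study data).
counts = [5, 6, 7, 6, 5, 1, 2, 1, 2, 1]
is_pyelo = [True] * 5 + [False] * 5
p = permutation_p_value(counts, is_pyelo)
adj = benjamini_hochberg([0.01, 0.04, 0.03])
print(p, adj)
```

In practice, `permutation_p_value` would be called once per virulence class and the resulting list of P-values passed to `benjamini_hochberg`.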

FIG 1 The presence of the full pap operon associates with pyelonephritis versus cystitis. (A) Maximum-likelihood phylogeny of the 57 isolates from patients with early scan diagnosis (Materials and Methods). Nodes in the phylogeny are colored by the four common sequence types we observed in the sample set. Phylogenetic clades corresponding to the common phylogroups B2 and D are marked. The colored track displays whether isolates were gathered from children diagnosed with cystitis (pink) or pyelonephritis (dark red). (B) Heatmap showing the presence (gray) or absence (white) of 12 pap operon genes in the sequencing data of each isolate. (C) papG carriage and alleles are illustrated for the 57 isolates from our study alongside UPEC from adults, extracted from the supplementary files of Biggel et al. (9).

FIG 2 Antibiotic resistance among ST131 isolates. (A) Maximum-likelihood phylogeny of 27 ST131 isolates. Asterisks (*) denote MDR isolates. (B) Colored tracks depict the presence of fluoroquinolone resistance-associated alleles for the core genes gyrA and parC, as well as the carriage of blaCTX-M-15, sul, and dfrA. (C) Heatmap depicting the degree of resistance exhibited by each isolate to six clinically relevant antibiotics using the Sensititre Gram Negative GN4F assay. Lighter-colored cells indicate that the isolate is susceptible to a particular antibiotic at the lowest dosage tested, whereas a black cell indicates that the isolate is resistant to even the highest dosage tested. Antibiotic abbreviations: CIP, ciprofloxacin; LVX, levofloxacin; CRO, ceftriaxone; MEM, meropenem; IPM, imipenem; SXT, trimethoprim-sulfamethoxazole.

TABLE 1
Demographic and clinical characteristics of included children a