Transcriptome architecture of the three main lineages of agrobacteria

ABSTRACT Agrobacteria are a diverse, polyphyletic group of prokaryotes with multipartite genomes capable of transferring DNA into the genomes of host plants, making them an essential tool in plant biotechnology. Despite their utility in plant transformation, genome-wide transcriptional regulation is not well understood across the three main lineages of agrobacteria. Transcription start sites (TSSs) are a necessary component of gene expression and regulation. In this study, we used differential RNA-seq and a TSS identification algorithm optimized on manually annotated TSSs, then validated with existing TSSs, to identify thousands of TSSs with nucleotide resolution for representatives of each lineage. We extend upon the 356 TSSs previously reported in Agrobacterium fabrum C58 by identifying 1,916 TSSs. In addition, we completed genomes and phenotyping of Rhizobium rhizogenes C16/80 and Allorhizobium vitis T60/94, identifying 2,650 and 2,432 TSSs, respectively. Parameter optimization was crucial for an accurate, high-resolution view of genome and transcriptional dynamics, highlighting the importance of algorithm optimization in genome-wide TSS identification and genomics at large. The optimized algorithm reduced the number of TSSs identified internal and antisense to the coding sequence on average by 90.5% and 91.9%, respectively. Comparison of TSS conservation between orthologs of the three lineages revealed differences in cell cycle regulation of ctrA as well as divergence of transcriptional regulation of chemotaxis-related genes when grown in conditions that simulate the plant environment. These results provide a framework to elucidate the mechanistic basis and evolution of pathology across the three main lineages of agrobacteria.

IMPORTANCE Transcription start sites (TSSs) are fundamental for understanding gene expression and regulation. Agrobacteria, a group of prokaryotes with the ability to transfer DNA into the genomes of host plants, are widely used in plant biotechnology. However, the genome-wide transcriptional regulation of agrobacteria is not well understood, especially in less-studied lineages. Differential RNA-seq and an optimized algorithm enabled identification of thousands of TSSs with nucleotide resolution for representatives of each lineage. The results of this study provide a framework for elucidating the mechanistic basis and evolution of pathology across the three main lineages of agrobacteria. The optimized algorithm also highlights the importance of parameter optimization in genome-wide TSS identification and genomics at large.

Transcription start sites are a primary feature of the 5′ UTR and an essential determinant of gene expression. For example, bacteria have been shown to use multiple TSSs depending on stress conditions to enable multiple layers of gene expression control (15). Although the biology of agrobacteria has been well studied in Sinorhizobium meliloti and a few reference strains in BV1, the promoter and 5′ UTR structure across their genomes have not been well explored. This understanding is especially limited in BV2 and BV3, where most genome sequences are incomplete. Comparing TSSs on the genomes of these strains will highlight conserved and divergent gene regulation features. Genome-wide TSS identification across the three main lineages will provide a comparative understanding of transcriptional regulation and evolutionary differences in agrobacteria-plant interactions.
Differential RNA sequencing (dRNA-seq) has emerged as a high-throughput method for genome-wide TSS identification (16). This approach utilizes a terminator exonuclease (TEX) to enrich for 5′ triphosphorylated primary transcripts relative to 5′ monophosphorylated processed transcripts. Comparative analysis between TEX-treated and untreated samples enables accurate TSS identification based on next-generation sequencing technologies. This method has been used for genome-wide TSS identification in diverse bacterial species, including Helicobacter pylori (16), Escherichia coli (17), and Bacillus subtilis (18). In particular, dRNA-seq has been used to identify TSSs in plant symbionts such as Bacillus amyloliquefaciens (18) and Sinorhizobium meliloti (19). Biotechnologically relevant bacteria have also been analyzed by dRNA-seq, and the results may be exploited to optimize transcriptional regulation of metabolic pathways (20,21). For example, the distance of the TSSs relative to RNA polymerase binding sites or the sequence identity of the 5′ UTR has been used to modulate gene expression in synthetic circuits.
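The comparison between TEX-treated and untreated libraries can be illustrated with a minimal sketch. It is a deliberate simplification and not the algorithm used in this study: it flags a position as a candidate TSS when coverage in the TEX-treated library rises sharply and is enriched relative to the untreated library. The function name and the thresholds `min_height` and `min_enrichment` are hypothetical and chosen only for illustration.

```python
def candidate_tss(tex_plus, tex_minus, min_height=2.0, min_enrichment=2.0):
    """Return 0-based positions that look like TSS candidates on one strand.

    tex_plus / tex_minus: per-base read coverage from the TEX-treated and
    untreated libraries (equal-length lists). Thresholds are illustrative.
    """
    hits = []
    for i in range(1, len(tex_plus)):
        # A primary 5' end shows up as an abrupt rise in coverage.
        step = tex_plus[i] - tex_plus[i - 1]
        if step < min_height:
            continue
        # Primary transcripts survive TEX treatment, so they are enriched
        # in the treated library; floor the denominator to avoid dividing by 0.
        enrichment = tex_plus[i] / max(tex_minus[i], 1.0)
        if enrichment >= min_enrichment:
            hits.append(i)
    return hits


# Toy coverage: a sharp, TEX-enriched rise at index 2.
print(candidate_tss([0, 0, 10, 9, 8, 8], [0, 0, 2, 2, 8, 8]))
```

Raising either threshold trades sensitivity for fewer spurious calls, which is why tuning against manually annotated sites matters.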
Several bioinformatics tools are available for TSS identification from dRNA-seq data (22)(23)(24), permitting fast and unbiased detection as well as rapid reanalysis after inclusion of new sequencing data. However, these algorithms still require supervision with respect to their sensitivity and specificity. Parameter optimization on manually annotated data is important to minimize false positives, yet this is often overlooked. Previous work provided a comprehensive approach for genome-wide TSS identification by performing manual curation of TSSs and comparing algorithm results to previous manually annotated TSSs across the genome of Helicobacter pylori under five different growth conditions (24). Previous work in Clostridium species (21) also incorporated optimization, accounting for expression bias from highly expressed genes by using RNA-seq from the same libraries to normalize the TSS data set. These results are a reminder that algorithms, especially in genomics, have limited accuracy without parameter optimization followed by validation on separate subsets of manually annotated data, and that omitting these steps can lead to an inaccurate representation of biological phenomena.
A systematic understanding of transcriptional regulation at a genome scale remains to be elucidated for agrobacteria. We identified genome-wide TSS with nucleotide resolution to understand genome and transcriptional dynamics in the three main lineages of agrobacteria. We found differences in transcriptional regulation of cell cycle and cell wall functions across lineages. We also identified divergent transcriptional regulation during induction conditions that might explain differences in pathologies during plant infection.

Characterization of strains representing BV2 and BV3
Complete, finished genome sequences facilitate accurate TSS identification. The type BV1 strain A. fabrum C58 was among the first bacteria ever sequenced and has been extensively studied (8); however, for both the BV2 and BV3 clades, there are no representative strains that have been characterized as thoroughly as A. fabrum C58. To identify reference strains for the BV2 and BV3 clades, we characterized the growth in defined media of three isolates from each of the BV2 and BV3 clades that have previously been phylogenetically characterized (Fig. S1) (2). Based on our results, BV2 strain Rhizobium rhizogenes C16/80 and BV3 strain Allorhizobium vitis T60/94 grew well in a rich medium, mannitol glutamate yeast extract salts medium (MGYS), and in a minimal medium based on morpholinopropanesulfonate (MOPS); however, neither grew well in Reasoner's 2A (R2A) or lysogeny broth (LB) media (Fig. S1). R. rhizogenes C16/80 has an Ri type II plasmid and was isolated from Malus domestica, while A. vitis T60/94 has a Ti type IVa plasmid and was isolated from Vitis vinifera (Fig. 1). Both R. rhizogenes C16/80 and A. vitis T60/94 were tested for antibiotic susceptibility against carbenicillin, gentamicin, kanamycin, chloramphenicol, spectinomycin, tetracycline, hygromycin, and apramycin (Table S1). R. rhizogenes C16/80 is susceptible to all tested antibiotics except spectinomycin, whereas A. vitis T60/94 is resistant only to carbenicillin. Although both R. rhizogenes C16/80 and A. vitis T60/94 have previously been sequenced (2), both genome assemblies were drafts, so we used Oxford Nanopore technology for long-read sequencing to close their sequences (Table 1). The genome of R. rhizogenes C16/80 is closed and consists of a chromosome, a chromid, two plasmids, and an Ri type II plasmid. The genome of A. vitis T60/94 consists of a chromosome, a chromid, two plasmids, and a Ti type IVa plasmid; one of the non-oncogenic plasmids failed to close and is represented by a large contig of 135.7 kb and several small contigs of less than 2 kb. Completion of both the R. rhizogenes C16/80 and A. vitis T60/94 genomes enables comprehensive genome-wide TSS identification across each multipartite genome.

dRNA-seq reveals the transcriptome architecture of agrobacteria
The study of agrobacteria has largely focused on their unique process of infection, which has led to a detailed understanding of their virulence plasmids. However, the impact of chromosomes and chromids on virulence remains less understood. We aimed to explore the broader transcriptome organization across replicons by identifying TSSs and corresponding promoter structure to inform how agrobacteria virulence has evolved across the three main lineages. Using dRNA-seq data, we identified a total of 6,998 TSSs across the three lineages. Consistent with other Alphaproteobacteria, we observed a conserved promoter structure. Furthermore, we found lineage-specific divergence in TSSs associated with cell cycle regulation, suggesting adaptations in cell cycle control.
Genome-scale TSS identification requires expression from operons across replicons (25). To this end, we isolated RNA from cultures grown in rich media (MGYS) sampled from both logarithmic and stationary phases, which have been shown across multiple bacteria to yield dramatically different transcriptomes (26). Additionally, we isolated RNA from each strain grown in minimal media supplemented with glucose or succinate as a sole carbon source. A minimal medium yields different gene expression patterns than that of bacteria grown in rich media (27). As glucose is metabolized through glycolysis and succinate is metabolized through the TCA cycle, we expected different transcriptional dynamics. Finally, as expression of virulence genes is extremely important to the study of agrobacteria, we isolated RNA from cultures that were resuspended in virulence-inducing buffer (10 mM MES, pH 5.5, 200 µM acetosyringone) for 2 hours (28).

Optimization of the TSS identification algorithm
TSSs are often classified as primary, secondary, antisense, internal, or orphan based on proximity to annotated coding sequence (CDS) (24) as illustrated in Fig. 2a. The primary TSS was defined based on proximity to dRNA-seq reads mapped within 300 bp upstream of the annotated start codon of the CDS. TSSs further upstream of the CDS were classified as secondary TSS. Secondary TSSs are hypothesized to regulate gene expression in specific conditions where polymerases bind to alternative sigma factors (21). Internal TSSs are located inside a CDS on the sense strand. Antisense TSSs are located internal or within 100 bp of a CDS in the antisense strand. A TSS not assigned to any of these categories (i.e., more than 300 bp upstream or more than 100 bp downstream of an annotated CDS) was categorized as an orphan TSS. A TSS can fall into more than one category depending on its location relative to surrounding CDS annotations.
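The position-based rules above can be sketched as a small classifier. This is an illustration under stated assumptions, not the ANNOgesic implementation: coordinates are 1-based and inclusive, the `CDS` record and function name are hypothetical, and a sense-strand TSS within the 300 bp window is labeled only `upstream`, because splitting it into primary or secondary requires comparing all TSSs assigned to the same gene.

```python
from dataclasses import dataclass

UPSTREAM_WINDOW = 300   # bp upstream of the start codon (value from the text)
ANTISENSE_WINDOW = 100  # bp flanking a CDS on the opposite strand (from the text)


@dataclass(frozen=True)
class CDS:
    start: int   # leftmost coordinate, 1-based inclusive
    end: int     # rightmost coordinate, 1-based inclusive
    strand: str  # "+" or "-"


def classify_tss(pos, strand, cds_list):
    """Return the set of position-based labels for one TSS."""
    labels = set()
    for cds in cds_list:
        if strand == cds.strand:
            if cds.start <= pos <= cds.end:
                labels.add("internal")
                continue
            # Strand-aware distance from the TSS to the start codon.
            dist = cds.start - pos if strand == "+" else pos - cds.end
            if 0 < dist <= UPSTREAM_WINDOW:
                labels.add("upstream")
        elif cds.start - ANTISENSE_WINDOW <= pos <= cds.end + ANTISENSE_WINDOW:
            labels.add("antisense")
    # A TSS matching no rule for any CDS is an orphan.
    return labels or {"orphan"}


genes = [CDS(1000, 2000, "+"), CDS(2100, 3000, "+")]
print(classify_tss(1950, "+", genes))  # one TSS can carry two labels
```

Returning a set rather than a single label mirrors the point that a TSS can fall into more than one category depending on the surrounding annotations.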
Accurate TSS identification by computational methods requires parameter optimization for improved sensitivity (29). TSS identification with default parameters has been shown to have lower sensitivity resulting in false positives (29). We used ANNOgesic (29) to select optimal parameters on a manually annotated subset of TSSs across each of the replicons of A. fabrum C58 with variable read depths. Parameter optimization reduced the number of detected TSSs relative to those predicted when using default parameters, especially for antisense and internal TSSs. Examining how parameter optimization affected TSS identification across TSS classes in A. fabrum C58 revealed the greatest reduction of TSSs within the antisense (90.4% reduction), internal (91.3% reduction), secondary (87.2% reduction), and orphan (73% reduction) classes, while the primary TSS (45.5% reduction) class was less depleted (Fig. 2b).
We assessed the consistency of the TSS class distribution in A. fabrum C58 with that of R. rhizogenes C16/80 and A. vitis T60/94. In R. rhizogenes C16/80, optimization reduced internal TSSs by 91.5% and antisense TSSs by 92.9%. In A. vitis T60/94, optimization reduced internal TSSs by 89.6% and antisense TSSs by 91.6% (Fig. S3). Some antisense and internal TSSs drive expression of truncated proteins or non-coding RNAs (ncRNAs); however, such transcripts have been observed in other genome-wide TSS studies and have been shown to result mainly from transcriptional noise arising at spurious promoters throughout the genome (30). A previous genome-wide TSS identification in E. coli used this algorithm to identify 5,574 internal and 5,495 antisense TSSs (17); in light of our results, this calls into question the widespread use of computational methods without adequate parameter tuning. Using the default parameters, we identified 8,606 TSSs in A. fabrum C58. Parameter optimization reduced the total number of detected TSSs to 1,916 in A. fabrum C58. Comparing the overlap of identified TSSs, 1,842 of the optimized TSSs are also found in the default-parameter data set for A. fabrum C58 (Fig. 2c).
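As a worked check of the A. fabrum C58 totals quoted above, the overall reduction and the overlap between the two data sets can be computed directly from the reported counts. The variable names are illustrative; only the three counts come from the text.

```python
# Counts reported in the text for A. fabrum C58.
default_total = 8606     # TSSs called with default parameters
optimized_total = 1916   # TSSs called with optimized parameters
shared = 1842            # optimized TSSs also present in the default set

# Percentage of default calls removed by optimization.
reduction_pct = 100 * (default_total - optimized_total) / default_total
# Percentage of optimized calls that were already in the default set.
overlap_pct = 100 * shared / optimized_total
# Optimized calls absent from the default set.
optimized_only = optimized_total - shared

print(f"overall reduction: {reduction_pct:.1f}%")
print(f"optimized TSSs shared with default set: {overlap_pct:.1f}%")
print(f"optimized-only TSSs: {optimized_only}")
```

The overall reduction is smaller than the per-class reductions for antisense and internal TSSs because primary TSSs, the largest retained class, were depleted far less.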

Validating the performance of the optimized TSS algorithm
Next, we compared the predicted TSSs to those previously reported. Previous efforts at identifying TSSs of sRNA in A. fabrum C58 also found 356 primary TSSs corresponding to CDS across its replicons (31). We extended these findings by identifying 1,302 primary TSSs, 99 secondary TSSs, 192 internal TSSs, 262 antisense TSSs, and 61 orphan TSSs for A. fabrum C58 in the optimized data set (Fig. 2d). Of the 356 primary TSSs previously reported in A. fabrum C58, 227 (64.3%) were also found within our data set, and we identified an additional 1,687 novel TSSs (Fig. 2e). There are 129 previously reported primary TSSs that we did not identify (Fig. 2e). A total of 82 of these previously reported TSSs correspond to genes that encode hypothetical proteins, 21 of which are incorrectly annotated within the CDS. For 44 of the remaining 47, we identified TSSs at a different position in the corresponding gene. The three remaining genes in which we failed to identify a TSS are fliQ, fliR, and tnp. Both fliQ and fliR are predicted to be involved in motility. Motility has been reported to be important for attachment to plant hosts, and ΔflaABC mutants were non-motile and attenuated for virulence (32). However, the role of fliQ and fliR has not been subject to further experimentation (33). The TSS identification algorithm may not have identified a TSS for the flagellar export genes as a result of overfitting of the optimized parameters, or the genes may have low expression in the growth conditions we selected. The tnp gene is predicted to encode a transposase that may have been lost or mobilized and is located within a portion of the pAt plasmid that can be deleted in A. fabrum C58 (34). This deletion has been shown to increase expression of virulence genes in A. fabrum C58 while also reducing the burden of the plasmid on the host cell (34). [...] 3a). Many of these primary TSSs are expressed in only one condition, which might be an indication of condition-dependent transcription (Fig. 3b).
Next, we sought to understand positional contributions to gene expression across replicons (Fig. 3c). In addition to the TSS position on replicons, we examined the distribution of condition-dependent TSSs across replicons (Fig. 3d). The number of primary and secondary TSSs expressed exclusively in one condition was 425 (30.3%) in A. fabrum C58, 549 (29.9%) in R. rhizogenes C16/80, and 397 (23.7%) in A. vitis T60/94 (Fig. S6). In addition to gene-level TSS identification, our study provides comprehensive coverage of TSSs at the operon level throughout the entire genome. We used an operon prediction pipeline (35) (Fig. S5). A comprehensive summary of TSS counts by bacterial species, replicon, and TSS class in the data sets using default and optimized parameters can be found in Table 2.
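Counting TSSs expressed exclusively in one condition amounts to a simple filter over a detection table. The sketch below assumes a hypothetical input, a mapping from TSS identifier to the set of conditions in which it was detected; the condition labels paraphrase the five growth conditions described earlier, and the TSS identifiers and data are invented for illustration.

```python
from collections import Counter

# Labels for the five conditions described in the text (names are illustrative).
CONDITIONS = ("MGYS_log", "MGYS_stationary", "MOPS_glucose",
              "MOPS_succinate", "induction")


def condition_specific(detected):
    """Map each TSS seen in exactly one condition to that condition.

    detected: dict of TSS id -> set of condition labels where it was called.
    """
    return {tss: next(iter(conds))
            for tss, conds in detected.items()
            if len(conds) == 1}


# Hypothetical toy table, not data from the study.
toy = {
    "tss_001": {"MGYS_log", "MGYS_stationary", "MOPS_glucose"},
    "tss_002": {"induction"},
    "tss_003": {"MOPS_succinate"},
    "tss_004": set(CONDITIONS),
}
specific = condition_specific(toy)
per_condition = Counter(specific.values())
fraction = len(specific) / len(toy)
print(specific, dict(per_condition), f"{100 * fraction:.1f}%")
```

Tallying the result per condition, as `per_condition` does, gives the kind of breakdown needed to ask which condition contributes most of the exclusive TSSs.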

Transcriptional regulation of cell cycle control across agrobacteria
Alternative regulation programs in specific conditions may be driven by transcription initiation using alternative sigma factors at secondary TSSs. We determined the number of primary and secondary TSSs corresponding to individual genes (Fig. 4a). Most genes have only a primary TSS, although there were instances where one gene had several corresponding secondary TSSs. In A. fabrum C58, we identified 99 genes with a primary TSS and at least one secondary TSS, and 112 genes for both R. rhizogenes C16/80 and A. vitis T60/94. In A. fabrum C58, R. rhizogenes C16/80, and A. vitis T60/94, there was a maximum of four TSSs (one primary and three secondary TSSs) for a single gene (Fig. 4a). Across the three lineages, there were a total of 305 genes that were found to have a primary TSS and at least one secondary TSS (Fig. 4a). The identification of primary TSSs enables the analysis of the 5′ untranslated region and promoter structure for each lineage. Of the 4,590 primary TSSs identified, 92.3% were purines (2,294 A and 1,943 G) and 7.7% were pyrimidines (205 T and 148 C). UTR length appears to be consistent across biovars, with a median length of 61 nucleotides (Fig. 4b).
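The purine fraction follows directly from the nucleotide counts given above, and the 5′ UTR length is a strand-aware difference between the TSS and the start codon. The sketch below assumes 1-based inclusive coordinates and uses invented toy records; only the four nucleotide counts come from the text.

```python
from statistics import median

# Initiating-nucleotide counts for primary TSSs, as reported in the text.
counts = {"A": 2294, "G": 1943, "T": 205, "C": 148}
total = sum(counts.values())
purine_pct = 100 * (counts["A"] + counts["G"]) / total


def utr_length(tss, cds_start, cds_end, strand):
    """5' UTR length: bases from the TSS up to, not including, the start codon.

    On the minus strand the start codon sits at the higher coordinate.
    """
    return cds_start - tss if strand == "+" else tss - cds_end


# Hypothetical (tss, cds_start, cds_end, strand) records for illustration.
toy = [(940, 1000, 2000, "+"), (2061, 1000, 2000, "-"), (3470, 3500, 4200, "+")]
lengths = [utr_length(*record) for record in toy]

print(total, f"{purine_pct:.1f}% purines")
print(lengths, "median:", median(lengths))
```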
Comparing promoter conservation across species can highlight different or similar transcriptional features and regulation. We compared promoter sequences in the three lineages by extracting the 50 nucleotides upstream from the primary TSS. Sequence logos for promoters of each strain show enrichment of conserved −10 and −35 motifs upstream of detected TSSs, consistent with these regions being bona fide promoters (Fig. 4c). The promoter structures we identified are consistent with those predicted for Alphaproteobacteria representatives as well as the identified promoter of repABC2 (36,37). An ortholog clustering analysis recovered a total of 2,362 gene orthologs across lineages with at least one primary TSS and 300 orthologs with primary TSSs in all three agrobacterial genomes (Fig. 4d). Functional enrichment analysis for orthologs with a primary TSS identified in all three lineages shows that promoters driving expression of genes involved in energy production and conversion, post-translational modification, and cell wall biogenesis are highly conserved (Fig. S7). For example, the promoters of cold shock, phosphate starvation, and ribosomal orthologs had high pairwise similarity. We would expect these promoters to be conserved since they are under high selective pressure (38). Orthologous clustering of genes with a primary and at least one secondary TSS expressed exclusively in one condition showed that this subset of TSSs is only conserved in the virulence-inducing conditions, where 27 orthologs share condition-specific TSSs in at least two species (Fig. S8). Of these 27 genes, we found only one, the orthologs of atu2173 (a methyl-accepting chemotaxis protein), that had condition-specific primary and secondary TSSs conserved across all lineages. Chemotaxis toward plant signals is the first step in the attachment of agrobacteria to plant surfaces, which might explain the presence of this TSS in conditions that simulate the plant environment.
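Extracting the 50 nucleotides upstream of a TSS has to account for strand, since upstream of a minus-strand TSS lies at higher genomic coordinates and must be reverse-complemented. The following is a minimal sketch assuming a replicon held as a plain string and 1-based TSS coordinates; the function names are hypothetical, and circular wrap-around at replicon ends is not handled.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome, tss, strand, length=50):
    """Return up to `length` nt immediately upstream of a TSS, read 5'->3'.

    genome: replicon sequence as a string; tss: 1-based coordinate.
    """
    if strand == "+":
        start = max(tss - 1 - length, 0)   # clip at the replicon start
        return genome[start:tss - 1]
    # Minus strand: upstream bases follow the TSS in forward coordinates.
    return reverse_complement(genome[tss:tss + length])


toy_genome = "ACGTTGCAAT"
print(upstream_region(toy_genome, 5, "+", length=3))
print(upstream_region(toy_genome, 5, "-", length=3))
```

Sequences extracted this way share a common orientation, so the −10 and −35 regions line up across strands when building a sequence logo.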
Noticeably, of the 17 induction-specific primary and secondary TSSs shared only between R. rhizogenes [...] annotated as altronate dehydrogenases (orthologs of atu2822 and atu3817) and an ABC transporter (ortholog of atu4661) whose homolog in S. meliloti was shown to be involved in raffinose transport (40). This suggests that specific sugar metabolism may also be uniquely turned on in R. rhizogenes C16/80 under traditional virulence-inducing conditions. Of the 39 induction-specific primary and secondary TSSs identified within A. vitis T60/94, seven are upstream of genes predicted to encode components of the flagellum, which may explain some of the unique motility (41) and systemic pathology phenotypes (5) observed in A. vitis isolates. In addition to flagellar genes, A. vitis T60/94 has three unique primary and secondary TSSs upstream of predicted non-heme chloroperoxidases (orthologs of atu3493, atu4778, and atu5389), all located on its second plasmid. As homologs of these genes are also found on both the pSymA and pSymB plasmids of S. meliloti (42), this may reflect a conserved defense against halogenated compounds among some Rhizobium.

Research Article mSystems

Figure caption (partial): [...] without (red) an identified primary TSS. There is incomplete coverage of the pAt plasmid due to a common deletion.

Ortholog clustering analysis revealed 27 genes in which at least one member has one or more secondary TSSs. Of these 27 genes, 10 are predicted to function in the cell cycle, including the major cell cycle regulators ctrA, cpdR, mraZ, and rcdA, or in modifying the cell wall. Previous work in Caulobacter crescentus has shown that important cell cycle regulators often have two independently regulated promoters to ensure robust control of cell division (43). All members of three ortholog clusters (represented by atu1164, atu1363, and atu3742 of A. fabrum C58) have multiple TSSs (Fig. 5a). Two of these genes have known essential cellular functions, with atu1363 encoding proteolytic complex member clpAS1 (44) and atu3742 encoding rcdA, which controls the cell cycle regulator ctrA (45). The remaining gene, atu1164, is predicted to encode an L,D-transpeptidase shown in A. fabrum C58 to localize heterogeneously along the cell membrane and play a role in polar growth strategies (46).

To evaluate whether these genes could be regulated via different regulatory mechanisms across the three lineages, we identified sequence motifs for SciP and CtrA binding sites from C. crescentus (43). We mapped these binding sites onto the 10 genes predicted to function in the cell cycle and in modifying the cell wall. We identified possible SciP binding sites upstream of orthologs for atu1164 and possible CtrA binding sites upstream of atu2434 and atu3742 (Fig. 5b). We also propagated the TSS distributions and CtrA binding site predictions to orthologs previously reported to be regulated by CtrA in A. fabrum C58 (47,48) (Fig. S9). Our findings suggest that these binding sites are further upstream relative to their location in C. crescentus. However, this is consistent with previous studies in S. meliloti, which is more closely related to A. fabrum C58, R. rhizogenes C16/80, and A. vitis T60/94 than C. crescentus (49,50). The distance of primary and secondary TSSs relative to the CDS start is plotted for the remaining seven genes with predicted cell cycle and cell wall modifying functions in Fig. S10.
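Mapping a binding-site motif relative to a TSS can be sketched with a regular-expression scan over the upstream sequence. The pattern below, TTAA-N7-TTAA, is the commonly cited full CtrA consensus and is used here purely as an assumption for illustration; the motifs actually used in this study were derived from reference 43 and may differ. The function name and the toy sequence are hypothetical, and only the sense strand is scanned.

```python
import re

# Assumed consensus for illustration only: TTAA, any 7 nt, TTAA.
# A lookahead is used so that overlapping matches are all reported.
CTRA_PATTERN = re.compile(r"(?=(TTAA[ACGT]{7}TTAA))")


def find_motif_offsets(upstream, pattern=CTRA_PATTERN):
    """Locate motif hits in a sequence that ends just before the TSS.

    upstream: 5'->3' sequence whose last base is position -1 relative to the TSS.
    Returns (offset, sequence) pairs, where offset is the position of the
    first motif base relative to the TSS (negative means upstream).
    """
    n = len(upstream)
    return [(m.start() - n, m.group(1))
            for m in pattern.finditer(upstream.upper())]


# Hypothetical upstream sequence with one planted site.
toy_upstream = "GG" + "TTAA" + "C" * 7 + "TTAA" + "ACGTACGTAC"
print(find_motif_offsets(toy_upstream))
```

Reporting offsets relative to the TSS, rather than to the start codon, is what allows binding-site positions to be compared between organisms whose 5′ UTR lengths differ.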

DISCUSSION
In this study, we identified thousands of TSSs with nucleotide resolution to significantly expand the characterization of promoters and genome-wide transcriptional regulation across the three main lineages of agrobacteria. We extended upon the 356 TSSs associated with CDS previously reported by Wilms (24). These studies used manual curation or validation to optimize the accuracy of their TSS identification. In studies with less optimization or validation, the number of reported TSSs is significantly higher. For example, the 17,001 TSSs (2,541 TSS/Mbp) identified in S. meliloti 1021 (19) and 14,868 TSSs (3,204 TSS/Mbp) identified in E. coli MG1655 (17) are consistent with the TSS counts in the agrobacterial strains using default parameters. This likely indicates the use of TSS identification methods that are underfitting, thereby resulting in false positives. These results are a reminder that algorithms, especially as they apply toward high-throughput analysis of biological data sets, have limited accuracy without parameter optimization on manually annotated data. This lack of parameter optimization and stringent reproducibility standards in genomics can lead to inaccurate representations and interpretations in understanding transcriptional regulation as well as biological phenomena as a whole.
Full coverage of TSSs for all genes in these organisms will require a better understanding of genome-wide operon induction conditions, especially on plasmids. Genome-wide TSS identification enabled analysis of promoter structure and 5′ UTR features that are consistent with other Alphaproteobacteria representatives. Our TSS identification can also be used toward promoter engineering for synthetic circuits across lineages. These data can contribute to improved identification of transcription factor binding sites and understanding of 5′ UTR contributions to conditional gene expression and regulation in agrobacteria.
In the plant environment, agrobacteria perceive host signals by driving complex transcriptional regulation programs. Some of these natural circuits have been engineered for plant biotechnology purposes, and genome-wide TSS identification of agrobacteria is a crucial step toward mapping global transcriptional changes that occur when the circuits are activated during host infection. In this study, we identify conserved regulation of housekeeping, cell cycle, and cell wall functions. We specifically focus on CtrA- and SciP-mediated transcriptional regulation of agrobacteria promoters and predict their binding sites relative to TSSs in agrobacteria. We identified multiple promoters in A. fabrum C58, R. rhizogenes C16/80, and A. vitis T60/94 upstream of putative cell cycle-regulated genes. These findings are consistent with the differential transcriptional regulation programs found in C. crescentus (43) and S. meliloti (49,50). The bacterial cell cycle has largely been studied in Alphaproteobacteria, especially Caulobacter species, and the regulation of the cell cycle in the close relative of agrobacteria, S. meliloti, shows adaptive contributions to a host-associated lifestyle (51). Differences in cell cycle regulation of the three main lineages of agrobacteria may contribute to their unique pathologies. Future studies should focus on mapping the complete regulatory networks of cell cycle control by performing binding site analysis based on our TSS data set in these organisms to better understand the evolution of cell cycle control.
Understanding agrobacteria in the context of plant infection can be used toward improving plant biotechnology efforts (52). Our TSS data set enables identification of conserved and divergent transcriptional regulation during virulence-inducing conditions. Expression of motility genes in these conditions suggests that chemotaxis is a hallmark of infection by R. rhizogenes C16/80 and A. vitis T60/94. The exclusive expression of motility genes in A. vitis T60/94 during virulence induction conditions is consistent with previous studies (5,41) and highlights the importance of chemotaxis during plant infection by A. vitis T60/94. Expression of putative sugar transporters in R. rhizogenes C16/80 suggests that specific sugar metabolism may also be uniquely turned on during plant infection. We identified numerous LysR-type transcriptional regulators expressed exclusively in R. rhizogenes C16/80 that might be important for sensing plant metabolites. Future studies should consider making strains with deletions of, as well as constitutive expression of, LysR-type transcriptional regulator-controlled genes to further understand the contribution of metabolite sensing to disease severity.
In addition to biotechnology applications, the genetic contributions to host range and the unique pathologies across lineages of agrobacteria remain to be elucidated. Understanding the genome-wide transcriptional regulation can facilitate a more complete representation of the evolution of natural genetic circuits. Our completed genomes and optimized TSSs provide a step toward completely mapping the transcriptional regulation architecture of agrobacteria.

Growth curve measurements
Growth kinetics of agrobacterial strains were measured as previously described with slight modification (54). Briefly, cultures were grown in 200 µL volumes in 96-well plates in either MGYS, MOPS minimal media with 0.1% glucose as a carbon source, MOPS minimal media with 0.1% succinate as a carbon source, R2A, or LB media, and optical density at 600 nm (OD600) was monitored for 60 hours in a BioTek Synergy 4 plate reader (Agilent Technologies, USA) at 28°C with fast continuous shaking (Fig. S1).

Antibiotic susceptibility testing
Agrobacterial strains were tested for antibiotic susceptibility by performing 11 twofold dilutions of each antibiotic (1,000 ng/mL carbenicillin, 200 ng/mL gentamicin, 1,000 ng/mL kanamycin, 500 ng/mL chloramphenicol, 1,000 ng/mL spectinomycin, 50 ng/mL tetracycline, 1,000 ng/mL hygromycin, and 1,000 ng/mL apramycin) in MGYS across 11 columns of a 96-well plate. The 12th column contained MGYS with no antibiotics. Bacteria were grown in MGYS at 28°C to exponential phase, then diluted 50-fold into wells of the 96-well plate. The plate was incubated at 28°C, and OD600 measurements were collected in a BioTek Synergy 4 plate reader (Agilent Technologies, USA) with fast continuous shaking for 48 hours. Antibiotic susceptibility was determined by normalizing the OD600 with a blank MGYS well, then selecting the lowest antibiotic concentration at which the normalized OD600 fell below 50% of the OD600 of the strain grown in MGYS without antibiotics.
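The susceptibility call described above can be sketched as follows. Two details are assumptions rather than statements of the protocol: normalization is taken to mean subtracting the blank-well OD600, and the first column is taken to hold the listed starting concentration, with each subsequent column halving it. Function names and the toy readings are hypothetical.

```python
def dilution_series(top_concentration, n_wells=11):
    """Concentrations for a twofold series, assuming the first well holds the top."""
    return [top_concentration / 2 ** i for i in range(n_wells)]


def inhibitory_concentration(concs, od_wells, od_no_drug, od_blank, threshold=0.5):
    """Lowest concentration whose blank-corrected OD600 is below
    `threshold` times the blank-corrected no-antibiotic growth.

    Returns None if no tested concentration inhibits growth that far.
    """
    reference = od_no_drug - od_blank          # growth without antibiotic
    inhibited = [c for c, od in zip(concs, od_wells)
                 if (od - od_blank) < threshold * reference]
    return min(inhibited) if inhibited else None


# Hypothetical endpoint readings: growth is suppressed in the three
# highest-concentration wells only.
concs = dilution_series(1000)
od = [0.05, 0.05, 0.05] + [0.80] * 8
print(inhibitory_concentration(concs, od, od_no_drug=0.85, od_blank=0.05))
```

A result of None corresponds to a strain that grows at every tested concentration, that is, one scored as resistant within the tested range.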

dRNA-seq library preparation
To map TSSs to genome sequences, we developed a custom dRNA-seq protocol based on previous methods that use conditional treatment with TEX (20). Agrobacterial strains were streaked out on MGYS agar plates and grown overnight at 28°C. Three biological replicates were selected from each plate and inoculated individually into separate tubes with 5 mL of MGYS, then grown overnight at 28°C. The following morning, each culture was spun down and washed twice in PBS buffer, then inoculated at a 50-fold dilution into MGYS, MOPS + glucose, and MOPS + succinate. After 18 hours, OD600 measurements were taken every hour. The final OD600 for each biological replicate is shown in Fig. S2. Cell pellets were spun down and frozen at −80°C. Nucleic acid isolation was carried out using bead beating and Qiagen RNeasy Kits (Qiagen, Germany), followed by DNA depletion using the DNA-free Kit (Invitrogen, USA

TSS data analysis
Illumina adapter sequences were removed from sequencing reads using Trimmomatic (60), and reads were then filtered based on the presence of the 5′ adapter using seqtk (61), leaving an average of 15,083,319 reads per sample after adapter trimming. Segemehl (62) was used to align filtered reads to their corresponding reference genome. We observed high alignment between dRNA-seq reads and the reference genomes, with an average overall alignment rate of 97.44% per library. A custom Perl script was used to convert alignment files into the WIG files that are required by the TSS identification algorithm in the ANNOgesic package (29). TSSs from the linear chromosome of A. fabrum C58 for parameter optimization were manually curated as previously described (16). The ANNOgesic docker image was used to create project directories and perform TSS optimization and prediction.
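The alignment-to-WIG conversion can be illustrated with a short sketch. This is not the custom Perl script used in the study: it assumes alignments have already been reduced to 1-based inclusive (start, end) intervals on a single strand, and it writes per-base read depth in the variableStep wiggle layout. The function name, track name, and replicon name are hypothetical.

```python
from collections import Counter


def coverage_to_wig(alignments, chrom, track_name="coverage"):
    """Build variableStep WIG text from aligned intervals on one strand.

    alignments: iterable of (start, end) pairs, 1-based and inclusive.
    Positions with zero coverage are simply omitted.
    """
    depth = Counter()
    for start, end in alignments:
        for pos in range(start, end + 1):
            depth[pos] += 1
    lines = [f'track type=wiggle_0 name="{track_name}"',
             f"variableStep chrom={chrom} span=1"]
    lines += [f"{pos} {depth[pos]}" for pos in sorted(depth)]
    return "\n".join(lines) + "\n"


# Two overlapping toy reads on a hypothetical replicon name.
print(coverage_to_wig([(1, 3), (2, 4)], chrom="replicon_1"), end="")
```

Keeping one such track per strand and per library preserves the TEX-treated versus untreated contrast that TSS calling depends on.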

Ortholog clustering
We performed ortholog clustering of protein sequences across the three BV genomes using PIRATE v.1.0.4 (63). We selected a 50% amino acid sequence similarity threshold to compare the conservation and divergence of orthologs as previously described (21). TSS data sets from ANNOgesic and ortholog clustering from PIRATE were analyzed in Python v.3.10. All code is publicly available on GitHub (https://github.com/shih-lab/agrobacteria_TSS).
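Combining the two data sets reduces to asking, for each ortholog cluster, in which strains at least one member gene has a primary TSS. The sketch below uses hypothetical in-memory structures and invented gene identifiers; it does not parse PIRATE or ANNOgesic output files, and it is not the code in the repository above.

```python
def tss_conservation(clusters, genes_with_primary_tss):
    """Split ortholog clusters by how widely a primary TSS is observed.

    clusters: dict of cluster id -> dict of strain -> list of gene ids.
    genes_with_primary_tss: dict of strain -> set of gene ids with a primary TSS.
    Returns (clusters with a TSS in any strain, clusters with a TSS in all strains).
    """
    strains = set(genes_with_primary_tss)
    in_any, in_all = [], []
    for cluster_id, members in clusters.items():
        covered = {strain for strain, genes in members.items()
                   if any(g in genes_with_primary_tss.get(strain, set())
                          for g in genes)}
        if covered:
            in_any.append(cluster_id)
        if covered == strains:
            in_all.append(cluster_id)
    return in_any, in_all


# Hypothetical toy input with invented gene identifiers.
clusters = {
    "g001": {"C58": ["c58_a"], "C16/80": ["c16_a"], "T60/94": ["t60_a"]},
    "g002": {"C58": ["c58_b"], "C16/80": ["c16_b"]},
    "g003": {"C58": ["c58_c"], "C16/80": ["c16_c"], "T60/94": ["t60_c"]},
}
primary = {"C58": {"c58_a", "c58_b"}, "C16/80": {"c16_a"}, "T60/94": {"t60_a"}}
print(tss_conservation(clusters, primary))
```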