Identification and characterization of putative Aeromonas spp. T3SS effectors

The genetic determinants of bacterial pathogenicity are highly variable between species and strains. However, a factor that is commonly associated with virulent Gram-negative bacteria, including many Aeromonas spp., is the type 3 secretion system (T3SS), which is used to inject effector proteins into target eukaryotic cells. In this study, we developed a bioinformatics pipeline to identify T3SS effector proteins, applied this approach to the genomes of 105 Aeromonas strains isolated from environmental, mutualistic, or pathogenic contexts and evaluated the cytotoxicity of the identified effectors through their heterologous expression in yeast. The developed pipeline uses a two-step approach, where candidate Aeromonas gene families are initially selected using Hidden Markov Model (HMM) profile searches against the Virulence Factors DataBase (VFDB), followed by strict comparisons against positive and negative control datasets, greatly reducing the number of false positives. This approach identified 21 Aeromonas T3SS likely effector families, of which 8 represent known or characterized effectors, while the remaining 13 have not previously been described in Aeromonas. We experimentally validated our in silico findings by assessing the cytotoxicity of representative effectors in Saccharomyces cerevisiae BY4741, with 15 out of 21 assayed proteins eliciting a cytotoxic effect in yeast. The results of this study demonstrate the utility of our approach, combining a novel in silico search method with in vivo experimental validation, and will be useful in future research aimed at identifying and authenticating bacterial effector proteins from other genera.

119 family combinations were performed, and the significance of each correlation was assessed 120 through Mantel tests and p-values corrected for multiple testing using Benjamini-121 Hochberg's False Discovery Rate (FDR) correction. The correlation-weighted network ( and 122 ) was submitted to the Markov Clustering Algorithm (MCL) with 5.5 inflation parameter 123 [51,52]. The unpartitioned concatenation of the 1,678 gene families present in the largest 124 cluster was submitted to RAxML [53] for phylogenetic reconstruction using the 125 GTR+GAMMA substitution model, and support was assessed using the aLRT SH-like 126 method [54].

127
Gene phylogenies and reconciliations 128 Nucleotide sequences from all putative effector gene families were aligned using MAFFT, 129 and phylogenetic trees were generated using RAxML (GTR+GAMMA+I model). The 130 obtained trees were refined using the TreeFixDTL tool [55] and reconciled to the species 131 tree using RangerDTL [56]. S2 Fig shows cladograms of putative effector phylogenies after 132 applying the TreeFixDTL tool and rooting the gene trees through RangerDTL.

133
Addition of non-Aeromonas representatives 134 Amino acid sequences from the putative T3SS effectors were used to query against 135 GenBank to identify non-Aeromonas homologs. The top non-Aeromonas hit was 136 downloaded if the aligned region spanned through at least 60% of the query sequence and 137 had an identity of at least 30%. Once a non-Aeromonas hit fulfilling the requirements was 138 identified, it was downloaded together with all other non-Aeromonas hits with bitscores of 139 at least 85% of the top hit.

140
Strains and growth conditions 141 The bacterial and yeast strains and plasmids generated in this study are listed in S1 142 Table. The E. coli strain DH5α λ-pir was used to clone the plasmids and was cultured in LB 143 broth or on agar solidified plates containing 100 µg/ml ampicillin as needed. The yeast 144 strain S. cerevisiae BY4741 was cultured using yeast extract-peptone-dextrose (YPD) 145 medium for non-selective growth [57], and selective growth was performed using synthetic 146 defined medium lacking histidine (SDM-His) and containing either 2% glucose (SDM-His-147 Glu), or 2% galactose and 1% raffinose (SDM-His-Gal). Some strains were additionally 148 evaluated on SDM-His-Gal medium supplemented with 7 mM caffeine or 0.5 M NaCl.

149
Strain and plasmid construction 150 All primers used in the present study are listed in Table S4. Putative T3SS effector genes 151 were PCR amplified from Aeromonas genomic DNA using Phusion DNA polymerase (New 152 England Biolabs; NEB) and appropriate primers. Primers were tailed to allow subsequent 153 Gibson Assembly (NEB) cloning of amplicons at the EcoRI and XhoI sites of the shuttle 154 vector pGREG533 (Euroscarf), a plasmid which allows for galactose inducible expression of 155 cloned genes from the GAL1 promoter. Amplicons were purified using a Wizard SV Gel and 156 PCR Clean up System kit (Promega) and EcoRI/XhoI digested pGREG533 was gel purified 199 conjugated goat anti-mouse IgG H&L HRP secondary antibody (ab49969 and ab205719, 200 respectively; Abcam). The blots were developed using an ECL Plus Western Blotting 201 Substrate kit (Pierce) and imaged using a FluorChem HD2 (ProteinSimple).

202
203 Results and discussion 204 Reference phylogeny 205 A phylogenomic study of 56 high-quality genomes from Aeromonas strains was 206 published together with an MLSA tree of 16 housekeeping genes from the same genomes 207 [40]. This study identified incongruities between gene, MLSA, and core genome trees [58], 208 which is expected since different sets of genes were used to construct each tree. In our 209 current study, we expanded on this set with an additional 49 Aeromonas genomes. To 210 produce a reference phylogeny that reflects a significant share of Aeromonas genes and 211 minimize conflicting evolutionary signals among genes, we constructed a phylogenomic 212 tree using 1,678 genes with compatible evolutionary histories. Building phylogenies from 213 concatenated genes is a widely used approach to reconstruct the representative history of a 214 genome. However, the concatenation of genes with different trajectories can lead to 215 unresolved trees and/or a phylogeny that represents neither the history of the organism 216 nor that of the individual genes used in the concatenation [59-62]. One approach used to 217 obtain better phylogenies is to group gene families with compatible evolutionary histories 218 using tree distance metrics [63,64]. Therefore, to assess the similarity between the 219 evolutionary histories of different genes we performed pairwise Pearson's correlation tests 220 between Maximum Likelihood distance matrices of all gene families present in at least 10 221 genomes. It has been discussed that metrics accounting for both topology and branch 222 length do not perform significantly better than metrics accounting only for the latter [64]. 223 For comparison, we also measured the distances between gene families using the Fitch-224 Margoliash criterion [65], i.e., in calculating the sum of squares of the differences in 225 distance between two matrices, the square of each difference was multiplied by the inverse 226 of the distance, thereby increasing emphasis on the difference in distance between more 227 similar sequences. The two approaches used to compare distance matrices showed a strong 228 correlation (). Significant correlation coefficients, , were submitted to a Markov Clustering 229 (MCL) process [52], and a weighted network of connected gene families were constructed 230 as discussed by Dongen and Abreu-Goodger (2012). From the 168 gene family clusters 231 obtained through MCL clustering, we considered the largest cluster, which contains 1,678 232 gene families, to be the primary phylogenetic signal among Aeromonas strains. This gene 233 family cluster comprises more than twice the number of gene families present in the 234 second largest cluster, which contains 738 gene families. By using the largest gene cluster, 235 containing gene families with different functions and genomic locations but with 236 compatible evolutionary histories, we ensured that our phylogeny is representative of the 237 evolutionary history of the genomes.

238
On average, each Aeromonas genome has homologs from 1,637 out of the 1,678 families 239 used in the reference phylogeny (), representing 39% of the average number of coding 240 sequences in the Aeromonas species assayed. The resulting phylogenomic tree (Fig 1)  In silico identification of putative T3SS effectors 255 For the identification of putative effectors, initially all coding sequences from sampled 256 genomes were compared to each other to cluster homolog genes. Hidden Markov Model 257 (HMM) profiles generated from each of the 25,518 identified Aeromonas spp. homolog gene 258 families were queried against T3SS sequences from the VFDB, which yielded 633 positive 259 hits. Although some of these positive hits were clearly not T3SS related they displayed 260 alignments with e-values as low as due to a common origin of other systems shared with 261 T3SS components. Protein sequences from Vibrio fisherii ES114 [46] and Escherichia coli 262 K12 [48] were used as negative controls for bona fide T3SS sequences, as both species are 263 known to not have a T3SS. The combination of BLAST searches of sequences from the 633 264 initial candidates against both VFDB and the negative control genomes identified 127 gene 265 families significantly more similar to T3SS sequences. Through subsequent manual 266 curation, we assessed the domains present in each putative family and their genomic 267 contexts (e.g., whether they are adjacent to chaperones) and classified 21 gene families as 268 encoding likely effectors (Table 1). . We also 283 identified aopN, but did not consider it further in our study as it was shown in Bordetella 284 bronchiseptica to have a dual role in controlling the secretion of translocator proteins and 285 suppressing host immunity but was not cytotoxic [73].

286
The remaining 13 likely Aeromonas T3SS effectors had not been studied in any detail 287 and were designated as putative T3SS effectors, which included pteA, pteB, pteC, pteD, 288 pteD.1, pteE, pteF, pteG, pteH, pteI, pteJ, pteK, and pteL. pteD.1 was combined with the pteD 289 family despite being originally classified into distinct homolog groups since they share 290 significant sequence similarity, as assessed through PRSS [74] using the PAM250 scoring 291 matrix. The comparison between PteD.1 from A. jandaei Ho603 and PteD from A. sp. MDS8 292 sequences resulted in a Z-score of 65.7. Besides significant sequence similarity, homologs 293 from both pteD and pteD.1 are likely related to the same chaperone family, whose 294 members were automatically grouped within a single homolog protein cluster. The thirteen 295 newly described putative T3SS effectors and the 8 previously described effectors were 296 submitted to InterProScan [75] for domain identification (Table 1), and were subsequently 297 assessed for cytotoxic effects through expression in S. cerevisiae strain BY4741.

298
Screening of putative T3SS effector proteins in yeast 299 To assess the cytotoxicity of putative T3SS effectors identified in Aeromonas spp. 300 genomes, we expressed representative proteins from each putative group in the yeast 301 strain S. cerevisiae BY4741 and monitored for growth inhibition (Table 1). Although eight 302 of the identified Aeromonas effector families have previously been identified or studied, to 303 the best of our knowledge, none have been assessed for causing cytotoxicity or growth 304 inhibition in yeast. Expression of bacterial toxins in yeast is a common means of assessing 305 the deleterious impact of these proteins on eukaryotic host cell processes [76][77][78]. The 306 serial dilutions allow one to assess better how severe the cytotoxicity of an effector is. 307 While all of the strains grew on medium containing glucose (Fig 2), those expressing pteA 308 and pteF were somewhat inhibited for growth under non-inducing conditions, yielding 309 colonies with reduced size relative to cells carrying the pGREG533 plasmid alone. 310 Presumably, these effectors are so cytotoxic that even uninduced, basal protein expression 311 is sufficient to inhibit growth. Interestingly, yeast cells transformed with pteF yielded 312 colonies with different sizes, possibly due to the acquisition of suppressor mutations in the 313 larger colonies.

329
If the cellular process targeted by a bacterial effector does not typically limit yeast 330 growth, the presence of stressors (e.g., elevated salt or caffeine) can promote the inhibition 331 phenotype to be observed [79]. The addition of NaCl to SDM-His-Gal medium inhibited the 332 growth of strains expressing aopP and pteJ (slight growth on the 10 0 dilution) compared to 333 that observed on unsupplemented SDM-His-Gal medium (growth on the 10 4 dilution) (Fig  334 3). On SDM-His-Gal plates containing caffeine, no further reduction in growth was observed 335 for the aopP-expressing strain, whereas that of pteJ-expressing strain was observed on the 336 10 0 and 10 -1 dilutions (Fig 3). In addition, the strains expressing pteH and pteL produced 337 small colonies at the 10 -3 and 10 -2 dilutions, respectively, when grown on SDM-His-Gal 338 medium containing NaCl compared to that observed on SDM-His-Gal medium alone 339 (growth on the 10 -3 and 10 -2 dilutions, respectively) (Fig 3). On plates containing caffeine, 340 the strains expressing pteH and pteL grew on the 10 -3 and 10 -1 dilutions, respectively (Fig  341 3). For the constructs that did not produce a growth inhibition phenotype (aopH, aopO, 342 pteD, pteD.1, pteE, and pteK), we assessed whether bacterial proteins of the expected fusion 343 protein size were expressed by western blot and all of proteins were expressed at the 344 predicted size except PteD for which no product was detected (S3 Fig). 345 Fig 3. Yeast growth inhibition assay under stress conditions. Strains carrying putative 346 T3SS effectors cloned into pGREG533 were grown overnight in SDM-His-Glu, washed and 347 10-fold serially diluted. Aliquots from each dilution (10 µl) were spotted onto SDM-His-Glu 348 or on SDM-His-Gal plates containing either 0.5 M NaCl or 7 mM caffeine. The strain 349 containing pGREG533 was used as a negative control and showed no growth inhibition. The 350 plates were incubated at 30°C for 2-3 days.

351
Both AopH and PteK have similar domain structures (Table 1), and neither protein 352 elicited a growth phenotype when expressed in yeast. The Yersinia homologs, YopH and 353 YpkA, of these two Aeromonas proteins have been previously assayed in yeast for 354 cytotoxicity [80]. Although YopH similarly did not cause a growth inhibition phenotype, 355 YpkA strongly inhibit yeast growth. Several explanations could account for the lack of 356 observed phenotypes in strains expressing AopH, AopO, PteD, PteD.1, PteE, and PteK, 357 including: a lack of activity due to the adjoined 7-HA tag; the lack of a target protein for the 358 effector in the yeast strain assayed; the effector does not produce a cytotoxic effect but 359 interacts with host cells in another manner; the protein was not expressed at a high enough 360 level; or the putative effector may not be a bacterial toxin.

361
These assays allowed the putative T3SS effectors identified in our bioinformatics 362 analysis to be rapidly evaluated. Our findings showed that 15 out of the 21 tested proteins 363 inhibited growth of S. cerevisiae BY4741, and that addition of NaCl, and to a lesser extent 364 caffeine, to SDM-His-Gal plates increased the sensitivity of yeast cells to four of these likely 365 effectors. Future assessments of the effectors with no observed phenotypes should focus on 366 generating C-terminally tagged proteins, since the N-terminal tags generated in the current 367 study could be the cause of aberrant activity or localization that may potentially mask a 368 growth or cytotoxicity phenotype. Additionally, alternate yeast strains could be tested.

370
The scattered occurrence of the 21 predicted effectors described in this study 371 throughout the Aeromonas phylogenomic tree is evidence of the impact of HGT during their 372 evolution. Since only 15 out of the 21 putative effectors are present in at least four taxa, our 373 HGT inferences using phylogenetic reconciliation are restricted to aexT, aexU, aopX, aopO, 374 aopP, ati2, aopH, aopS, pteA, pteB, pteC, pteD, pteE, pteG, and pteK. In effectors present in 375 less than four genomes, we only considered their presence/absence to infer gene transfer 376 and loss events. The presence of a putative effector among Aeromonads has an origin in 377 two possible scenarios: [1] vertical inheritance, or [2] HGT from a non-Aeromonas lineage. 378 In a reconciliation scenario where no HGT is allowed among Aeromonads, 1,177 gene loss 379 events and 105 gene duplication events are required to reconcile the histories of 15 380 putative effectors across Aeromonas phylogeny. In the reconciliation of the same 15 381 putative effectors allowing HGT and using default reconciliation penalties (loss = 1, 382 duplication = 2, and transfer = 3), only 85 gene losses and 115 HGTs are required. Of the 383 115 predicted HGT events, 44 took place between terminal nodes of the tree. There is a 384 significantly larger number of inferred HGTs between genomes from distinct species (30) 385 than between members of the same species (14). Such small number of inferred within-386 species transfers may be due to a lack of resolution in the gene trees leading to inferred 387 events with low confidence values. If a strong phylogenetic signal is absent from 388 bipartitions present in the putative effector tree, our reconciliation approach assumes it to 389 be equivalent to the genome phylogeny, since the gene tree bipartition does not strongly 390 support the incongruence. Reconciliations also revealed a large variance of HGT events 391 inferred among T3SS effector gene families, ranging from 0 to 32 horizontal transfers 392 within Aeromonas spp.

393
In order to evaluate the gene exchange between Aeromonas spp. and other lineages, we 394 recruited non-Aeromonas homologs of the 15 putative effectors present in more than four 395 Aeromonas genomes. The inclusion of non-Aeromonas homologs revealed that Aeromonas 396 T3SS effectors are not frequently shared outside the genus boundaries. In all extended 397 putative effector trees, Aeromonas homologs grouped into clans [81], excluding homologs 398 from other taxa to separate clades. This result suggests that each of the putative effectors 399 has a single origin within Aeromonas strains, i.e., they entered the genus only once, either 400 from its common ancestor or through a single HGT event. The disparity between within-401 genus and between-genera transfers is well described in the literature [20,82-84]. The 402 high number of inferred gene transfers within Aeromonas genomes is evidence of the 403 interchangeability of T3SS effectors within the genus, while the paucity of HGT events 404 between genera may either reflect decreasing transfer rate with increasing evolutionary 405 distance [83][84][85] or the high degree of specificity between effectors and the secretion 406 system apparatus. We recognize the potential for undetected transfers with unsampled 407 lineages not represented in public databases, although one would expect to observe at least 408 some number of paraphyletic aeromonad clades if between-genera exchanges involving 409 Aeromonas were common.

410
Five putative effector families exhibited more than ten HGT events within Aeromonas 411 spp. according to phylogenetic reconciliations. The most exchanged effector is pteE, with 32 412 inferred transfers within the genus. The reconciliation analysis inferred 18 HGT events 413 during the evolution of pteB in Aeromonas. Two of the identified effectors, aopP and ati2, 414 did not require horizontal transfers during its reconciliations with the genome phylogeny. 415 Both effector families are exclusively present in A. salmonicida genomes, although aopP is 416 absent from A. salmonicida A449, and ati2 is absent from A. salmonicida CIP103209T, and 417 their closest non-Aeromonas homologs are present in Yersinia enterocolitica and Vibrio 418 harveyi, respectively. The genomes of A. salmonicida strains are very similar to each other, 419 which is reflected in the short branch lengths present in the genome phylogeny (Fig 1). 420 Thus, we would not expect a significant number of different HGT events among A. 421 salmonicida strains. We hypothesize that an A. salmonicida common ancestor acquired 422 these effector genes through horizontal transfer, likely from Yersinia spp. or Vibrio spp., 423 which were then vertically inherited by most of the descendants.

424
The great variation in effector occurrence among Aeromonas strains, from zero to nine 425 effectors per genome, may reflect the diversity of lifestyles observed in this genus. For 426 example, A. schubertii possesses nine likely effectors in its genome, more than any other 427 assessed strain, and it is also among the earliest branching aeromonads (Fig 1). We were 428 unable to determine if this large number of predicted T3SS effectors is related to the niche 429 A. schubertii occupies (cultured from a clinical forehead abscess), since no other sampled 430 strain was isolated from a similar source. On the other side of the spectrum, 19 strains have 431 no putative effector, and they are found across all clades of the Aeromonas phylogeny. 432 Among putative effectors, some exhibit a wide distribution (pteA and pteB), and others 433 have a very restricted distribution (pteJ, pteI, and pteL).

434
The majority of the putative effectors display significant co-occurrence with other 435 putative effectors in Aeromonas genomes (). One cluster of five putative effectors displays 436 strong co-occurrence (aexT, aopH, aopO, aopP and ati2) (S1 Fig), and their occurrence is 437 most prevalent in the A. salmonicida branch (Fig 1). Interestingly, one effector in this co-438 occurring cluster, aexT, is also present in the A. veronii strains. Another co-occurring cluster 439 comprises five putative effectors (aexU, aopX, pteA, pteB and pteK) (S1 Fig), and their 440 occurrence is mainly related to two different clades of the phylogeny, being identified 441 among members of the A. hydrophila, A. dhakensis, and A. veronii clades. Despite the co-442 occurring clusters, we are unable to identify links between presence/absence of putative 443 effectors and isolation sources of their respective genomes. The resemblance of 444 phylogenetic signal observed in co-occurrence of putative effectors is probably due to the 445 higher within-species HGT frequency, as once the effector is acquired by a member of the 446 species it is easily spread among closely related genomes [85,86]. Due to the low frequency 447 of putative effectors in the remaining two clusters displayed in S1 Fig, we  In this study, we identified likely T3SS effectors present in the genomes of 105 452 Aeromonas strains and assessed their cytotoxicity in S. cerevisiae. The in silico identification 453 of T3SS effector sequences has been considered to be a complex task given that their short 454 sequences and shared homology with proteins associated with different cellular systems 455 constitute a barrier to accurate analysis [87-89]. Our two-step comparisons against 456 positive and negative data sets greatly reduced the number of false positives and resulted 457 in the identification of 13 new likely Aeromonas effector families and eight that were 458 previously described. The expression of these proteins in S. cerevisiae provided strong 459 evidence for the cytotoxicity of most of the identified effectors.

460
Based on the in silico and in vitro evidence obtained in this study, we propose naming 461 the 9 newly described effectors with validated cytotoxic activity with the prefix "ate", for 462 Aeromonas T3SS effectors. Consequently, the newly described Aeromonas effector families 463 should be referred as ateA, ateB, ateC, ateF, ateG, ateH, ateI, ateJ, and ateL, whereas pteD, 464 pteD.1 pteE and pteK should keep the putative designation since no yeast growth inhibition 465 phenotype was detected.

466
The high frequency of horizontal transfer events of effectors within Aeromonas is 467 reflected in their scattered distribution throughout the phylogenomic tree of the genus and 468 reconciliations of each effector gene tree with the phylogenomic tree. Members