Research Paper
A bioinformatic approach to identify core genome difference between Salmonella Pullorum and Salmonella Enteritidis

https://doi.org/10.1016/j.meegid.2020.104446

Highlights

  • Identified core genome difference between S. Pullorum and S. Enteritidis.

  • Identified putative host-specificity factors in S. Pullorum.

  • Provided a novel bioinformatics workflow to identify core genome differences between user-defined bacterial clades.

Abstract

S. Pullorum and S. Enteritidis are closely related genetically, but they differ markedly in pathogenicity and host range. S. Enteritidis infects many different hosts, usually causing acute gastroenteritis, while S. Pullorum is restricted to avian hosts, where it causes systemic disease in young animals. The reason why they differ in host range and pathogenicity is unknown. The core-genome denotes those genes that are present in all strains within a clade. In the present work, an automated bioinformatics workflow was developed and applied to identify core-genome differences between these two serovars, with the aim of identifying genome features associated with the host specificity of S. Pullorum. Results showed that S. Pullorum unique coding sequences (CDS) were mainly concentrated in three regions not present in S. Enteritidis, suggesting that these CDS were probably acquired during the separation of the two serovars from their common ancestor. One of the unique regions corresponded to Salmonella Pathogenicity Island 19 (SPI-19), which encodes a type VI secretion system (T6SS). Single-nucleotide polymorphism (SNP) analysis identified 1791 conserved SNPs in coding sequences between the two serovars, including several SNPs located in a type IV secretion system (T4SS). Analysis of the 100 bp regions upstream of coding sequences identified 443 conserved SNPs between the two serovars, including SNP variations upstream of a type III secretion system effector (T3SE). In conclusion, this analysis has identified genetic features encoding putative factors controlling host specificity in S. Pullorum. The novel bioinformatic workflow and associated scripts can be applied directly to other bacteria to uncover genome differences between clades.

Introduction

The majority of Salmonella serovars are broad host range pathogens; however, a few serovars infect only one or a few host species (Uzzau et al., 2000). The underlying mechanism behind the host specificity of these serovars remains unknown. Salmonella Pullorum (formal name S. enterica subspecies enterica serovar Gallinarum biovar Pullorum (S. Pullorum)) belongs to the host-specific group, as it infects only avian hosts, where it causes systemic disease with high mortality in young chickens (Barrow and Freitas Neto, 2011; Shivaprasad, 2000). Studies have revealed that S. Pullorum is closely related to S. Enteritidis (formal name S. enterica serovar Enteritidis) (Langridge et al., 2015; Thomson et al., 2008), another Salmonella serovar, which commonly infects avian hosts but has a broad host range and mainly causes gastroenteritis (Rodrigue et al., 1990). The close evolutionary relationship, combined with the different host range and pathogenicity, makes comparison of the genomes and traits of S. Pullorum and S. Enteritidis a suitable approach to identify genome features that are associated with host specificity in Salmonella.

The development towards host specificity of Salmonella serovars has likely been driven by several mechanisms. Comparative genomics has already uncovered important genome features of host-specific serovars, most notably that each of them contains a number of specific genes, and that host adaptation has been accompanied by pseudogene formation in genes that apparently are not needed for the host-specific infection (Langridge et al., 2015; Thomson et al., 2008). Studies in other bacteria have also indicated the importance of horizontal gene transfer for the evolution of microbial genomes (Pal et al., 2005; Popa et al., 2011; Treangen and Rocha, 2011), leading to regions that are specific to particular bacterial (sub)species or strains, termed the ‘mobilome’ (Dobrindt et al., 2004; Ou et al., 2007; Siguier et al., 2006).

A pan-genome is the entire set of genes for all strains within a clade, and the core-genome represents genes present in all strains (Medini et al., 2005; Tettelin et al., 2005; Vernikos et al., 2015). As the number of available genomes increases, so does the pan-genome of each serovar (Baddam et al., 2014; Laing et al., 2017), and pan-genome analysis tools have been established to increase the power of genome comparisons (Xiao et al., 2015). In practice, the core-genome is often defined less strictly, as the set of genes present in more than 90% of the members of a clade. Pan- and core-genome analysis can provide a useful framework to determine the genomic diversity of the dataset at hand (Vernikos et al., 2015).
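The frequency-based definitions above can be illustrated with a minimal sketch that bins genes by the fraction of strains carrying them. The function name, the input layout and the cut-offs (99%, 95% and 15%) are assumptions made for illustration; they are not taken from the text and should be adjusted to whichever definition of "core" is in use.

```python
from typing import Dict, Set

# Assumed frequency cut-offs (not stated in the text); adjust as needed.
CORE_MIN = 0.99       # gene present in at least 99% of strains
SOFT_CORE_MIN = 0.95  # at least 95% but below the core cut-off
SHELL_MIN = 0.15      # at least 15% but below the soft-core cut-off


def classify_genes(presence: Dict[str, Set[str]], n_strains: int) -> Dict[str, str]:
    """Label each gene as 'core', 'soft core', 'shell' or 'cloud'.

    `presence` maps a gene name to the set of strain names that carry it.
    """
    if n_strains <= 0:
        raise ValueError("n_strains must be positive")
    categories = {}
    for gene, strains in presence.items():
        freq = len(strains) / n_strains
        if freq >= CORE_MIN:
            categories[gene] = "core"
        elif freq >= SOFT_CORE_MIN:
            categories[gene] = "soft core"
        elif freq >= SHELL_MIN:
            categories[gene] = "shell"
        else:
            categories[gene] = "cloud"
    return categories


if __name__ == "__main__":
    # Toy data: 100 hypothetical strains named s0 .. s99.
    strains = [f"s{i}" for i in range(100)]
    toy = {
        "geneA": set(strains),       # in all 100 strains
        "geneB": set(strains[:97]),  # in 97 strains
        "geneC": set(strains[:50]),  # in 50 strains
        "geneD": set(strains[:3]),   # in 3 strains
    }
    print(classify_genes(toy, len(strains)))
```

Keeping the cut-offs as named constants makes it easy to test how sensitive the core-genome size is to the chosen definition.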

Apart from pathotype diversity associated with gene gain or loss, single-nucleotide polymorphisms (SNPs) may also contribute to host specificity (Bekal et al., 2016; Yue et al., 2015; Yue and Schifferli, 2014). Previous research has reported that SNPs in coding sequences cause phenotypic differences in Salmonella (Hopkins and Threlfall, 2004; Thornbrough and Worley, 2012). Functionally important SNPs can also exist in non-coding regions (Hammarlof et al., 2018; Zaunbrecher et al., 2009), such as promoter regions or recognition motifs for regulators, which are usually located upstream of the gene start codon (Haugen et al., 2008; McClure, 1985). Thus, it is important to be able to combine searches for unique genes with searches for SNPs that are likely to influence phenotype.
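As an illustration of how an upstream region can be pulled out for such a search, the following minimal sketch extracts the bases immediately preceding a CDS on either strand. The function names, the 0-based half-open coordinate convention and the default window of 100 bp are assumptions made for this example; the sketch treats the sequence as linear and does not wrap around a circular chromosome.

```python
# Translation table for complementing DNA bases (other characters pass through).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(genome: str, start: int, end: int, strand: str,
                    length: int = 100) -> str:
    """Return up to `length` bases immediately upstream of a CDS.

    `start` and `end` are 0-based, half-open coordinates on the forward
    strand. The result is given in the orientation of the gene, so the
    last base returned is the one directly before the start codon.
    """
    if strand == "+":
        # Upstream lies to the left of the CDS on the forward strand.
        return genome[max(0, start - length):start]
    if strand == "-":
        # Upstream lies to the right of the CDS on the forward strand;
        # reverse-complement it to read it in the gene's orientation.
        return reverse_complement(genome[end:end + length])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    toy_genome = "ACGTTTGCAA"
    print(upstream_region(toy_genome, 4, 7, "+", length=3))  # CGT
    print(upstream_region(toy_genome, 4, 7, "-", length=3))  # TTG
```

Returning the region in gene orientation means that positions relative to the start codon are comparable across genes on both strands.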

In the current study, we determined the pan-genome of S. Pullorum and S. Enteritidis and, based on this, identified the shared core-genome of the two serovars, as well as the unique parts of the core-genome in each serovar. The aim was to discover putative host-specificity associated genes and SNPs in S. Pullorum. To do so, we designed a novel workflow to identify core SNPs and core upstream SNPs between groups of strains. Our new genome-based analysis workflow (Corevar), with its related scripts, can also be applied to studies of other bacterial clades without additional adjustments.
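One way to read "conserved SNPs between groups" is as alignment columns where every strain in one group carries the same base, every strain in the other group carries a different base, and neither base is ambiguous. The sketch below implements that reading only; it is not the Corevar implementation, and the function name, input layout and treatment of gaps and ambiguous bases are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

VALID_BASES = set("ACGT")


def conserved_group_snps(alignment: Dict[str, str],
                         group_a: List[str],
                         group_b: List[str]) -> List[Tuple[int, str, str]]:
    """Find sites fixed within each group but different between groups.

    `alignment` maps strain name to an aligned sequence; all sequences
    must have equal length. Returns (column, base_in_a, base_in_b)
    tuples, with 0-based alignment columns.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one strain")
    lengths = {len(alignment[s]) for s in group_a + group_b}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    (length,) = lengths

    snps = []
    for pos in range(length):
        alleles_a = {alignment[s][pos].upper() for s in group_a}
        alleles_b = {alignment[s][pos].upper() for s in group_b}
        # Require a single allele per group (conserved within each group).
        if len(alleles_a) != 1 or len(alleles_b) != 1:
            continue
        (a,) = alleles_a
        (b,) = alleles_b
        # Skip gaps and ambiguous bases; keep only true base differences.
        if a != b and a in VALID_BASES and b in VALID_BASES:
            snps.append((pos, a, b))
    return snps


if __name__ == "__main__":
    toy = {
        "p1": "ACGTA", "p2": "ACGTA",  # hypothetical group A strains
        "e1": "ACTTA", "e2": "ACTTC",  # hypothetical group B strains
    }
    print(conserved_group_snps(toy, ["p1", "p2"], ["e1", "e2"]))
    # [(2, 'G', 'T')]
```

In the toy data, column 4 is not reported because the second group is polymorphic there, which is the property that separates a conserved between-group SNP from ordinary within-group variation.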

Section snippets

Read dataset

In this study, a total of 144 read datasets were analysed. These comprised 97 S. Pullorum genome sequences (Hu et al., 2019) and 47 S. Enteritidis strains, which were sequenced de novo in the current study following the same protocol applied for sequencing of S. Pullorum (Hu et al., 2019). Briefly, genomic DNA was fragmented to an insert size of ~500 bp to prepare the library. Paired-end sequencing (2 × 150 bp) was then performed on a HiSeq 2500 system (Illumina, USA). Reads with <90% Q30

Features of the genomes analysed

The pan-genome analysis of 97 S. Pullorum genomes and 47 S. Enteritidis genomes identified 6449 CDS, which were divided into 3997 core CDS, 116 soft core CDS, 844 shell CDS and 1492 cloud CDS. The high identity of core CDS between the two serovars supported the view that S. Pullorum, and its close relative S. Gallinarum biovar Gallinarum, are descendants of S. Enteritidis (Langridge et al., 2015; Thomson et al., 2008). Among the core-CDS of each serovar, 145 and 127 were unique to S. Pullorum

Discussion

Pathogenicity Island (PAI) CDS and prophage elements were common among the unique CDS according to the VRprofile prediction results. Prophages are known to drive diversity in S. enterica (Cooke et al., 2007; Thomson et al., 2004), and thus it is not surprising that prophage-related CDS are relatively highly enriched among the unique CDS. PAIs play a pivotal role in the virulence of bacterial pathogens including Salmonella (Schmidt and Hensel, 2004), and the relatively high hit number of

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (31730094, 31920103015); National Key Research and Development Program of China (2017YFD0500705; 2017YFD0500100); Jiangsu province agricultural science and technology independent innovation funds (CX(16)1028); and The Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD); Xiao Fei was supported by funding from the China Scholarship Council.

References (65)

  • R. Baddam et al.

    Genome dynamics and evolution of Salmonella Typhi strains from the typhoid-endemic zones

    Sci. Rep.

    (2014)
  • A. Bankevich et al.

    SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing

    J. Comput. Biol.

    (2012)
  • P.A. Barrow et al.

    Pullorum disease and fowl typhoid--new thoughts on old diseases: a review

    Avian Pathol.

    (2011)
  • P.A. Barrow et al.

    Functional homology of virulence plasmids in Salmonella gallinarum, S. pullorum, and S. typhimurium

    Infect. Immun.

    (1989)
  • P.A. Barrow et al.

    Contribution of Salmonella-Gallinarum large plasmid toward virulence in fowl typhoid

    Infect. Immun.

    (1987)
  • S. Bekal et al.

    Usefulness of high-quality core genome single-nucleotide variant analysis for subtyping the highly clonal and the most prevalent Salmonella enterica Serovar Heidelberg clone in the context of outbreak investigations

    J. Clin. Microbiol.

    (2016)
  • C.J. Blondel et al.

    Comparative genomic analysis uncovers 3 novel loci encoding type six secretion systems differentially distributed in Salmonella serotypes

    BMC Genomics

    (2009)
  • C.J. Blondel et al.

    The type VI secretion system encoded in Salmonella pathogenicity island 19 is required for Salmonella enterica serotype Gallinarum survival within infected macrophages

    Infect. Immun.

    (2013)
  • C. Chu et al.

    Comparative physical and genetic maps of the virulence plasmids of Salmonella enterica serovars typhimurium, enteritidis, choleraesuis, and Dublin

    Infect. Immun.

    (1999)
  • F.J. Cooke et al.

    Prophage sequences defining hot spots of genome variation in Salmonella enterica serovar typhimurium can be used to discriminate between field isolates

    J. Clin. Microbiol.

    (2007)
  • S. Das et al.

    Identification of a novel gene in ROD9 island of Salmonella Enteritidis involved in the alteration of virulence-associated genes expression

    Virulence

    (2018)
  • U. Dobrindt et al.

    Genomic islands in pathogenic and environmental microorganisms

    Nat. Rev. Microbiol.

    (2004)
  • R.M.A. Graham et al.

    Comparative genomics identifies distinct lineages of S. Enteritidis from Queensland, Australia

    PLoS One

    (2018)
  • D.G. Guiney et al.

    Biology and clinical significance of virulence plasmids in Salmonella serovars

    Clin. Infect. Dis.

    (1995)
  • D.L. Hammarlof et al.

    Role of a single noncoding nucleotide in the evolution of an epidemic African clade of Salmonella

    Proc. Natl. Acad. Sci. U. S. A.

    (2018)
  • S.P. Haugen et al.

    Advances in bacterial promoter recognition and its control by factors that do not bind DNA

    Nat. Rev. Microbiol.

    (2008)
  • K.L. Hopkins et al.

    Frequency and polymorphism of sopE in isolates of Salmonella enterica belonging to the ten most prevalent serotypes in England and Wales

    J. Med. Microbiol.

    (2004)
  • R.B. Hornick et al.

    Typhoid fever: pathogenesis and immunologic control

    N. Engl. J. Med.

    (1970)
  • Y. Hu et al.

    Loss and gain in the evolution of the Salmonella enterica serovar gallinarum biovar pullorum genome

    mSphere

    (2019)
  • J. Huerta-Cepas et al.

    eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences

    Nucleic Acids Res.

    (2016)
  • K. Katoh et al.

    MAFFT multiple sequence alignment software version 7: improvements in performance and usability

    Mol. Biol. Evol.

    (2013)
  • C.R. Laing et al.

    Pan-genome analyses of the species Salmonella enterica, and identification of genomic markers predictive for species, subspecies, and Serovar

    Front. Microbiol.

    (2017)