Genomic analysis of four strains of Corynebacterium pseudotuberculosis bv. equi isolated from horses showing distinct signs of infection

The genomes of four strains (MB11, MB14, MB30, and MB66) of the species Corynebacterium pseudotuberculosis biovar equi were sequenced on the Ion Torrent PGM platform and completely assembled, and their gene content and structure were analyzed. The strains were isolated from horses with distinct signs of infection, including ulcerative lymphangitis, external abscesses on the chest, or internal abscesses on the liver, kidneys, and lungs. The average size of the genomes was 2.3 Mbp, with 2169 (strain MB11) to 2235 (strain MB14) predicted coding sequences (CDSs). An optical map of the MB11 strain generated using the KpnI restriction enzyme showed that the approach used to assemble the genome was satisfactory, producing good alignment between the restriction pattern observed in vitro and that obtained in silico. In a Neighbor-Joining dendrogram based on the rpoB gene, the C. pseudotuberculosis strains sequenced in this study were clustered into a single clade supported by a high bootstrap value. The structural analysis showed that the genomes of the MB11 and MB14 strains were very similar, while the MB30 and MB66 strains had several inverted regions. The observed genomic characteristics were similar to those described for other strains of the same species, despite the number of inversions found. These genomes will serve as a basis for determining the relationship between the genotype of the pathogen and the type of infection that it causes.


Introduction
As of February 2016, thirty-three genomes of the species Corynebacterium pseudotuberculosis had been deposited into the National Center for Biotechnology Information (NCBI) database. This species is an animal pathogen that infects goats and sheep, causing caseous lymphadenitis (CL), as well as horses, which can show distinct signs and symptoms. C. pseudotuberculosis can be classified into two biovars based on its ability to reduce nitrate to nitrite [1]. Nonreducing, i.e., nitrate-negative, strains are grouped into the ovis biovar and are responsible for CL. The reducing, i.e., nitrate-positive, strains are grouped into the equi biovar and mainly infect horses.
Recent increases in the number of infections in horses have led to C. pseudotuberculosis bv. equi being classified as a re-emerging pathogen. In Texas, USA, the number of cases increased 10-fold between 2005 and 2011, with a cumulative increase in annual incidence from 9.3 to 99.5 infections per 100,000 horses over the same period [2]. Kilcoyne et al. [3] analyzed the number of cultures positive for C. pseudotuberculosis in samples isolated from infected horses in 23 states in the USA. The proportion of positive cultures was higher for the most recent years, 2011 and 2012 (54% of the total number of samples), than for the period spanning 2003 to 2010 (46% of the total number of samples). These current data show the growing numbers of infections caused by this bacterium and emphasize the need for new studies on the genotypic characteristics of the biovar.
C. pseudotuberculosis bv. equi infection is commonly known as "pigeon fever" because it leads to the formation of external abscesses on the chest of the animal, causing swelling that resembles a pigeon's breast. Despite its common name, the bacterium can also cause other types of infections with distinct signs and symptoms, such as the formation of internal abscesses or ulcerative lymphangitis, which is characterized by infection of the limbs and compromises the lymphatic system [4]. It is currently believed that the major vectors of the disease are flies of the species Haematobia irritans, Stomoxys calcitrans, and Musca domestica [5].
The pathogenesis of C. pseudotuberculosis is intrinsically linked to its genetic content. Several virulence factors that strongly influence the ability of the bacterium to interact with the host and cause infection have previously been described in the literature. Phospholipase D, the iron uptake system, and pili proteins are examples of these factors [6]. Characterization of these and novel virulence factors depends on the sequencing of new genomes from the equi biovar, as the vast majority of the genomes in databases belong to the ovis biovar. Therefore, to generate data that allow for a more robust genotypic analysis of the equi biovar, four genomes from strains isolated from horses with distinct signs of infection by C. pseudotuberculosis were sequenced using the next-generation Ion Torrent PGM platform.

Organism information
Classification and features
C. pseudotuberculosis bv. equi is a facultative intracellular, beta-hemolytic, pleomorphic (Fig. 1), non-sporulating, unencapsulated, non-motile, facultatively anaerobic, Gram-positive pathogen [6]. The main characteristics of the species are shown in Table 1. C. pseudotuberculosis is taxonomically classified in the phylum Actinobacteria, class Actinobacteria, order Corynebacteriales, family Corynebacteriaceae, and genus Corynebacterium. The strains included in this study were isolated from horses in the state of California, USA. The animals had distinct signs and symptoms of infection. Strain MB11 was isolated from a 6-month-old American Paint horse with ulcerative lymphangitis. Strain MB14 was isolated from an Arab/Saddle horse with abscess formation in internal organs (liver and kidney). The animal also presented hepatic lipidosis and myocardial fibrosis. Strain MB30 was isolated from the pectoral abscess of a 2-year-old Quarter horse. Finally, strain MB66 was isolated from a 20-year-old Polish Arab mare with metastatic melanoma and multiple external and internal abscesses. These distinct signs, such as pectoral abscesses ("pigeon fever"), abscesses on the internal organs, or abscesses on the limbs (ulcerative lymphangitis), suggest that the equi biovar can interact in several ways with the host animal to cause infection. All strains were isolated between October 1996 and June 2002.
A dendrogram was calculated with the Neighbor-Joining statistical method using a bootstrap analysis with 1000 replicates. The rpoB gene, which codes for the beta subunit of the RNA polymerase enzyme, was used as a marker when constructing the dendrogram. The analysis was performed using the NCBI reference sequence for the species, retrieving from the database at least one representative from each genus in the Corynebacterium, Mycobacterium, Nocardia, and Rhodococcus (CMNR) group (Fig. 2). This group is composed of species that share cellular characteristics, such as a cell wall composed of peptidoglycan, arabinogalactan, and mycolic acids, as well as a genome with a high GC content [6]. The first phylogenetic studies on the CMNR group used the 16S rRNA gene as a marker. These studies demonstrated that the genera in the family Corynebacteriaceae form a monophyletic clade composed of four groups, in which C. pseudotuberculosis is phylogenetically closest to the species C. ulcerans and C. diphtheriae [7]. More recently, Khamis et al. [8] proposed that the rpoB gene could be used as a marker to identify clinical isolates of the genus Corynebacterium. The rate of positive identification using the rpoB gene was higher than that obtained with the 16S rRNA gene, indicating that rpoB is useful for taxonomic classification of the family Corynebacteriaceae [8]. The dendrogram in Fig. 2 shows the phylogenetic proximity between the sequenced biovars of the species C. pseudotuberculosis. In addition, it corroborates the analyses performed with the 16S rRNA gene, which designated C. diphtheriae as the species most closely related to C. pseudotuberculosis. The results show that each genus in the CMNR group is divided into clades supported by high bootstrap values.
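The bootstrap step behind such a dendrogram can be illustrated with a minimal Python sketch. It assumes the rpoB sequences are already aligned to equal length and uses a simple p-distance; the text does not state which distance model or software was used, so both choices, along with all function names, are illustrative assumptions. Each replicate resamples alignment columns with replacement and yields a distance matrix that a Neighbor-Joining routine could then turn into a tree.

```python
import random

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites between the two sequences")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(alignment):
    """Pairwise p-distances for a dict {name: aligned_sequence}."""
    names = sorted(alignment)
    return {(m, n): p_distance(alignment[m], alignment[n])
            for m in names for n in names}

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}

def bootstrap_matrices(alignment, replicates=1000, seed=0):
    """Distance matrices for a number of bootstrap replicates.

    The default of 1000 replicates mirrors the number reported in the text;
    the seed is a hypothetical parameter added for reproducibility.
    """
    rng = random.Random(seed)
    return [distance_matrix(bootstrap_replicate(alignment, rng))
            for _ in range(replicates)]
```

The bootstrap support of a clade is then the fraction of replicate trees in which that clade appears; tree construction itself is left to a dedicated phylogenetics package.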

Genome project history
The four C. pseudotuberculosis genomes in this short report are part of a collaboration between the University of California, Davis, USA, and the Federal Universities of Minas Gerais and Pará, Brazil. The project seeks to determine the genomic characteristics of 12 strains of the equi biovar isolated from horses in California showing distinct signs and symptoms of infection. Isolation was performed over several years from different horse breeds (Table 2). One of the major aims of the project is to determine whether a relationship exists between the genetic content of the strains and the type of infection that they cause (i.e., ulcerative lymphangitis, external abscesses, or internal abscesses). In parallel, the project seeks to increase the amount of genomic data for the species C. pseudotuberculosis in databases, which will form the basis for broader functional studies. The genomes obtained in this study have been deposited into the NCBI database under accession numbers CP013260, CP013261, CP013262, and CP013263. The project information is also presented in Table 2.

Growth conditions and genomic DNA preparation
After isolation, the bacteria were maintained in 25% glycerol at −80°C, and the medium was refreshed routinely. To extract genomic DNA, the bacteria were first cultured in liquid brain heart infusion (BHI) medium at 37°C with shaking. DNA was extracted during the log-phase of cell growth according to the protocol described by Pacheco et al. [9] for clinical isolates. The extracted DNA was subjected to electrophoresis on a 1% agarose gel to determine the quality of the material.

Genome sequencing and assembly
Genomic DNA was sequenced on the Ion Torrent PGM (Thermo Scientific) platform using the 318 chip v2 in accordance with the manufacturer's instructions. The quality of the reads was analyzed using FastQC software [10]. The reads were then trimmed and filtered to remove those with a Phred-scaled quality score less than 20. Next, the reads were assembled using Mira 4 software [11]. Redundancy within the assembled contigs was eliminated using the SeqMan Pro tool in the Lasergene software package (DNASTAR). The few gaps remaining after redundancy removal were manually closed using local BLAST or GapBlaster [12], a program developed by our research group that uses a reference genome to recruit similar sequences from the sequencing reads and close the gap. For this analysis, we used C. pseudotuberculosis biovar equi strain 316 as a reference. An optical map using KpnI restriction sites was generated to evaluate the quality of the genome assembly for the MB11 strain (Fig. 3). The optical map was analyzed using MapSolver v.3.2.0 (OpGen). Figure 3 shows that the in silico assembly for strain MB11 was very satisfactory; the positions of the restriction sites were corroborated by the optical map analysis.
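The in silico side of the optical-map comparison amounts to a virtual KpnI digest of the assembled sequence. The following Python sketch shows the idea under simplifying assumptions: it is not the MapSolver procedure, the function names are hypothetical, and sites that span the origin of the circular sequence are not detected. KpnI recognizes the palindromic site GGTACC and cuts after GGTAC on the top strand, so scanning one strand is sufficient.

```python
KPNI_SITE = "GGTACC"     # palindromic KpnI recognition sequence
KPNI_CUT_OFFSET = 5      # top-strand cut falls after GGTAC (GGTAC^C)

def cut_positions(seq, site=KPNI_SITE, offset=KPNI_CUT_OFFSET):
    """Return 0-based cut coordinates for every occurrence of the site."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions

def circular_fragments(seq, site=KPNI_SITE, offset=KPNI_CUT_OFFSET):
    """Fragment lengths from digesting a circular sequence.

    Simplification: a site that straddles the end/start junction of the
    linearized sequence is not found.
    """
    cuts = cut_positions(seq, site, offset)
    if not cuts:
        return [len(seq)]  # uncut circle
    fragments = [b - a for a, b in zip(cuts, cuts[1:])]
    # The last fragment wraps around the origin of the circle.
    fragments.append(len(seq) - cuts[-1] + cuts[0])
    return fragments
```

The ordered list of fragment lengths is what would be compared with the fragment sizes measured on the optical map.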

Genome annotation
An automatic annotation was first conducted using the online software Pannotator [13], which was provided with the .fasta files for the assembled genomes and a reference .embl file for C. pseudotuberculosis 316. The results were then manually curated to meet the gene annotation standards set by UniProt [14], using Artemis software [15] to visualize the coding sequences. Next, pseudogenes were also manually curated to resolve mismatches using CLC Genomics Workbench 5 (CLC Bio) and Artemis. Predicted genes for the four genomes were classified by Clusters of Orthologous Groups (COG) functional category, as shown in Table 3.

Genome properties
All of the genomes were completely closed, with an average size of 2.3 Mbp (Table 4). A circular map was generated using the CGView web tool [20] that shows the relationship of the predicted proteins in the MB14, MB30, and MB66 genomes compared to strain MB11, for which the in silico assembly was corroborated by the optical map (Fig. 4). All of the genomes had similar sizes and a similar number of CDSs, with few differences between the coding regions of the genomes. Structural analyses were conducted by comparing the four genomes with a local database using blastn, and the results were analyzed using the Artemis Comparison Tool [21]. The MB11 and MB14 strains showed extensive structural similarity, while MB30 had a large inversion of approximately 1.2 Mbp compared to MB14 (Fig. 5). MB66, however, had the largest number of structural rearrangements (Fig. 5). It is worth noting that two [...]
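To illustrate how inverted regions show up in a blastn comparison, the sketch below parses BLAST tabular output and flags hits on the reverse strand. It assumes the default 12-column tabular format (`-outfmt 6`), in which a subject start greater than the subject end indicates a minus-strand match; the text does not state which output format was used, and the `min_length` threshold and function names are hypothetical.

```python
def parse_hits(lines, min_length=1000):
    """Parse blastn tabular lines and mark reverse-strand hits.

    Expected columns (default -outfmt 6): qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    Hits shorter than min_length (an illustrative cutoff) are skipped.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[3]) < min_length:
            continue
        qstart, qend, sstart, send = (int(x) for x in fields[6:10])
        hits.append({
            "query": fields[0],
            "subject": fields[1],
            "qstart": qstart,
            "qend": qend,
            "sstart": sstart,
            "send": send,
            # sstart > send means the match lies on the subject's minus strand
            "inverted": sstart > send,
        })
    return hits

def inverted_span(hits):
    """Total subject length (bp) covered by reverse-strand hits.

    Overlapping hits are counted more than once in this simple sum.
    """
    return sum(abs(h["sstart"] - h["send"]) + 1
               for h in hits if h["inverted"])
```

A long run of adjacent reverse-strand hits between two genomes is the kind of signal that appears as crossing bands in a comparison viewer such as the Artemis Comparison Tool.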

Conclusions
Because of the large number of infections reported for C. pseudotuberculosis biovar equi in recent years, sequencing and analyzing genomes for this biovar is an essential step towards new perspectives that will improve our understanding of pathogen-host interactions and facilitate the development of vaccines to eradicate the disease. The four genomes presented in this study showed structural differences, except for strains MB11 and MB14. The phylogenetic relationship is closer to other strains of the equi biovar, and other genomic characteristics, such as the GC content, number of CDSs, and tRNA and rRNA clusters, are similar to those described for other strains of the same species. Virulence factors that were previously described in the literature were identified in the analyzed genomes. In addition, in silico assembly of the MB11 genome was validated by an optical map of the KpnI restriction sites. These initial data suggest that differences between types of infection should be analyzed using a reductionist approach, taking into account factors such as pathogenicity islands in each strain, the transmission method, and the entry point of the pathogen for each case, as well as expression levels and use of virulence factors specific to the bacteria, among other factors. Phylogenetic studies and the detection of small genetic changes such as SNPs and INDELs should then be performed because the bacteria have a very high gene density, and therefore, point mutations can strongly affect the biological response of the pathogen.

Fig. 4 Circular map of the genome for the sequenced Corynebacterium pseudotuberculosis strains. The outermost ring in blue shows the features extracted from the MB11 genome using a .gbk file. The next ring shows the CDSs predicted on the forward strand of MB11 in red, followed by the CDSs on the reverse strand with their features in blue. The other three rings in red, green, and blue show proteins predicted by blastx for the MB14, MB30, and MB66 genomes, respectively, compared to the MB11 genome. The two innermost rings show the GC content and GC skew, followed by the size of the genome in base pairs.
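The GC content and GC skew rings described for Fig. 4 are standard sliding-window statistics. The following Python sketch shows how such values are commonly computed; it is not the CGView implementation, and the window and step sizes and all function names are illustrative assumptions. GC skew is taken here as (G − C)/(G + C).

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def gc_skew(seq):
    """GC skew, (G - C) / (G + C); 0.0 when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)

def sliding_windows(seq, window, step):
    """Yield (start, subsequence) for each full window along the sequence."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]

def gc_profile(seq, window=10000, step=10000):
    """Per-window (start, GC content, GC skew) tuples.

    The default window and step of 10 kbp are hypothetical values chosen
    only for illustration.
    """
    return [(start, gc_content(w), gc_skew(w))
            for start, w in sliding_windows(seq, window, step)]
```

In many bacterial chromosomes the sign of the skew changes near the origin and terminus of replication, which is why this ring is often inspected alongside structural rearrangements.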