Comparative genome analysis of Pseudogymnoascus spp. reveals primarily clonal evolution with small genome fragments exchanged between lineages

Pseudogymnoascus spp. is a wide group of fungi lineages in the family Pseudorotiaceae including an aggressive pathogen of bats P. destructans. Although several lineages of P. spp. were shown to produce ascospores in culture, the vast majority of P. spp. demonstrates no evidence of sexual reproduction. P. spp. can tolerate a wide range of different temperatures and salinities and can survive even in permafrost layer. Adaptability of P. spp. to different environments is accompanied by extremely variable morphology and physiology. We sequenced genotypes of 14 strains of P. spp., 5 of which were extracted from permafrost, 1 from a cryopeg, a layer of unfrozen ground in permafrost, and 8 from temperate surface environments. All sequenced genotypes are haploid. Nucleotide diversity among these genomes is very high, with a typical evolutionary distance at synonymous sites dS ≈ 0.5, suggesting that the last common ancestor of these strains lived >50Mya. The strains extracted from permafrost do not form a separate clade. Instead, each permafrost strain has close relatives from temperate environments. We observed a strictly clonal population structure with no conflicting topologies for ~99% of genome sequences. However, there is a number of short (~100–10,000 nt) genomic segments with the total length of 67.6 Kb which possess phylogenetic patterns strikingly different from the rest of the genome. The most remarkable case is a MAT-locus, which has 2 distinct alleles interspersed along the whole-genome phylogenetic tree. Predominantly clonal structure of genome sequences is consistent with the observations that sexual reproduction is rare in P. spp. Small number of regions with noncanonical phylogenies seem to arise due to some recombination events between derived lineages of P. spp., with MAT-locus being transferred on multiple occasions. All sequenced strains have heterothallic configuration of MAT-locus.


Background
Pseudogymnoascus spp. is a group of fungi species which phylogenetically belongs to the phylum Ascomycota, family Pseudeurotiaceae. Many of the P. spp. including P. destructants were known as Geomyces spp. until reclassification based on phylogenetic analysis conducted in [1]. Species boundaries in Pseudogymnoascus still remain uncertain [1] recalling an overall problem in fungal taxonomy [2]. P. spp were long time believed to be anamorphic based on the absence of the evidence of sexual reproduction [3][4][5][6], P. destructants was shown to spread clonaly in North America [7]. However, several homothallic lineages of P. spp were shown to produce ascospores in culture [1,8], while P. destructants was proposed to have a heterothallic sexual reproduction pathway.
Morphology of P. spp. varies dramatically dependent on the growing conditions [9]. P. spp. are widespread in soils and can be found almost everywhere from Arctica to Antarctica [10]. P. spp. can tolerate low temperatures and high salinity, although they are not truly psychrophilic or halophilic [11][12][13][14]. P. spp. can degrade keratin and cause skin infections [15], and P. destructans causes white nose syndrome in bats [16].
Strictly asexual reproduction should result in clonal structure of population. However, sex is often hard to detect in experimental studies on Ascyomycota species [17]. Also many Ascomycota species are capable of parasexual process, which consists of fusion of cells followed by chromosome loss which eventually restores the normal caryotype, but does not involve meiosis. Parasexual process is often accompanied by recombination, although its rate is lower than that of meiotic recombination and it affects only short chromosome segments [18,19]. Horizontal gene transfer (HGT) can also occur in fungi. The most common type of HGT involves homologous recombination between genome sequences [20]. Although most of the cases reported so far involve HGT between different species [21], one can expect that within-population HGT which involves homologous recombination is even more common [20,22]. Thus, even if P. spp. truly lack meiosis, there still could be some genetic exchanges between strains in its populations.
Whole-genome analysis of P. spp. enables us to investigate such recombination events and detect genes associated with recombination activity. It also reveals relation between strains extracted from permafrost and temperate environments, which are considered isolated. Here, we report data on the genetic structure of P. spp. strains.

Results
Genome assembly, annotation, and key characteristics of P. spp. genomes We performed whole-genome sequencing and analysis of 14 P. spp. strains. These strains were collected from different habitats: temperate environment and Arctic active layers (contemporary samples), permafrost (age is 1.8-3.0 myr) and cryopeg, a layer of unfrozen ground in permafrost, (age is 120,000-200,000 years), and from different geographic locations ( Table 1). None of the strains was seen to produce ascospores. Sequencing was performed on HiSeq2000 machine using paired-end libraries with average insert size~350 nt. The sequenced reads were assembled, independently for each individual, with SOAPdenovo (v. 1.05). Assembly statistics for each strains are listed in Table 2. Whole-genome alignments of the assembled genotypes was created with LASTZ and CLUSTAL (see Materials and Methods). Mapping reads to their assembly reveals that all studied P. spp. isolates are haploid.
Annotation of genomes of the sequenced strains was performed with Augustus [23] v.2.7. Number of annotated genes within a genome varies from 9516 to 12470 (Table 3). The vast majority of genes is present in all or almost all assemblies (Additional file: 1 Figure S1), e.g. out of 11305 genes in strain VKM F-3808, 8495 genes were identified in at least 10 other assemblies and 487 were not found in any other assembly. Using CEGMA  (Table 3). Considerable variation of the number of annotated genes among genomes could be due to difference in assembly quality. However, separate analysis of genes pseudogenized or deleted on specific branches of the phylogenetic tree indicates asymmetric loss of genes among P. spp. strains ( Figure 1A). Strains F-4281, F-4246, and F-4513 have the lowest numbers of genes and the highest rates of gene loss (1.0-2.4 × 10 −5 per silent nucleotide substitution), whereas strains F-4518 and F-4520 have the highest number of genes and the lowest rates of gene loss (1.4-1.5 × 10 −6 per silent nucleotide substitution) ( Figure 1A). Overall we detected 282 lost genes (145 deleted and 137 pseudogenes). The GC-content varies from 49.1% to 51.1% (Table 3)   average intron lengths are 102-111 bp, median intron length is 58-60 bp (Table 3).
We also compared sequences obtained in our study to sequences of P. spp. obtained previously in other studies. Genotype sequence of strain F-4281 is very similar (id = 99%) to genotype sequence of P. spp. strain sequenced in [25]. We also combined our data with [1] (based on ITS region, LSU, MCM7, RPB2, and TEF1) and attributed our strains to different clades of P. spp. obtained in that study (Additional file: 1 Figure S2). Our strains correspond to 7 different clades of P. spp. from [25].

Relationships between 14 P. spp. genotypes
Comparison of the genomes of P. spp. strains reveals their very high nucleotide diversity. A typical genetic distance between two sequences at synonymous sites, dS, is~0.5, although some strains form compact clades ( Figure 1A) and are much closer to each other. For strains from different clades, a typical distance at nonsynonymous sites dN is~0.04 ( Figure 1B). Synteny between all genomes is extensive, and even within the most distant genome pairs over 90% of orthologous gene pairs are followed by another pair of orthologous genes ( Figure 2, Additional file: 2  Table S1, see also Materials and methods). There are no traces of either geographical or geological structure of the global population of P. spp. in the phylogenetic relationships among the analyzed genomes. Thus, these structures, if they exist, must be much younger than divergence of the ancestral lineages of these genomes.
Topology of the phylogeny shown on Figure 1 holds throughout almost the entire genome. Only 0.47%, 0.31%, 0.05%, 1.27%, and 0.58% of whole genome alignments do not support the 5 clades, ( (Table 4). This implies that regular recombination does not take place between the P. spp. strains and supports the observations of primarily asexual reproduction in P. spp. Clade (VKM F-3808, VKM F-3557, VKM F-4514, VKM F-4516), the only clade with more than two strains, demonstrates a strong linkage disequilibrium among genotypes from the same clade ( Figure 3A). No linkage disequilibrium was observed at distances over 20 nucleotides for genotypes from different clades (even at nonsynonymous sites) ( Figure 3BC), which is likely due to homoplasy between highly diverged (dS~0.5) sequences and little time intervals between lineage splits. Strains VKM F-3557, VKM F-4515, VKM F-4246 were used to demonstrate relations between distant clades, however the results are similar to that observed on Figure 3BC for any combination of distant strains.

Search for meiotic genes and mating pathway genes
We searched for the genes orthologous to those which are responsible for meiosis or mating in S. cerevisiae. In P. spp. genomes we found orthologs for 17 out of 31 genes involved in different steps of meiosis in S. cerevisiae (Table 5), implying that 14 out of these 31 genes were lost in P. spp.. 11 out of 14 lost genes are involved in early phases of meiosis in S. cerevisiae: ime1 and rec12 are meiosis-inducing protein [26,27], mum2 is required for premeiotic DNA synthesis [28], red1 is required for segregation of chromosomes in meiosis I [29], zip1, zip2, zip3 and zip4 are required for initiation of chromosome synapsis [30,31]; the rest 3 of these genes, Figure 2 Genome synteny across P. spp. strains. Each square corresponds to fraction of adjacent gene pairs in strain from vertical row with orthrologs in strain from horizontal row, which are also adjacent in strain from horizontal row.  dit1, isc10 and mum3 are involved in sporulation in S. cerevisiae [32,33].
In contrast to meiotic genes we observed only 1 lost gene out of 21 which are responsible for mating in S. cerevisiae (Table 5), notably all STE genes responsible for mating factor sensitivity in S. cerevisiae are also present in P. spp. strains [34]. A putative mating-type (MAT) locus with highly-conserved apn2 and sla2 genes was also found in P. spp. (Figure 4, Additional file: 3 Table S2).
We sequenced MAT-locus in 16 additional strains of P. spp. to study it in more detail. Two distinct idiomorphs of MAT-locus were observed: MAT1 idiomorph includes homolog of MAT1-1-1 α-box transcription factor, homolog of MAT1-1-3 high-mobility group (HMG) transcription factor, and an unknown gene which corresponds to MAT1-1-6 in [8] ( Figure 4A); MAT2 idiomorph includes MAT1-2-1 HMG-box gene and an unknown gene which corresponds to MAT1-2-5 in [8] ( Figure 4B). Phylogenetic configuration at MAT-locus ( Figure 5A) is strikingly different from the rest of the genome (Figure 1, Figure 5B). The boundaries of the segment with altered phylogeny reside at the ends of MAT1-1-3 and MAT1-1-1 genes for MAT1 idiomorph and MAT1-2-1 and MAT1-2-5 for MAT2 idiomorph, so that flanking regions have canonical phylogenetic configuration ( Figure 5B). The last~150 nucleotides of MAT1-1-1 and MAT1-2-5 are homologous to each other and unlike the rest of MAT-locus have canonical phylogenetic configuration. Multiple clades with both variants of MAT-locus and slightly variable boundaries of such segments in different strains indicate multiple recombination events within the MAT-locus ( Figure 5).
No paralogs of MAT-locus were found across P. spp. genomes, indicating that the observed pattern could not arise due to intragenomic conversion and, instead, implying multiple intergenomic recombination events at MAT-locus. Analysis of the MAT-locus indicates that all sequenced strains are heterothallic. P. spp. strain sequenced by [25] and P. destructants sequenced by "Geomyces destructans Sequencing Project" (http://www.broadinstitute.org/ annotation/genome/Geomyces_destructans/MultiHome.html) also heterothallic and both have MAT1 configuration. According to [8] homothallic configuration with two idiomorphs combined also occurs in P. spp., however no homothallic strain was detected among 14 fully-sequenced strains and 16 strains with only MAT-locus sequenced suggesting that homothallism is rare in P.spp.  Table S3). Average nucleotide divergence between VKM F-3557 and VKM F-4514 in such regions is 0.115 compared to genome average 0.015 ( Figure 6B, Additional file: 4 Table S3). Figure 7 describes one of such regions. VKM F-4514 becomes an outgroup to VKM F-3557 and VKM F-3808 inside the recombined region ( Figure 7B) in contrast to the flanking regions which maintain the canonical phylogenetic configuration ( Figure 7AC). The genetic distances from recombined strain to strains from outside clades are not increased in this example as well as in the other regions with noncanonical phylogenetic configuration (see F-4515 vs. F-3557 and F-4515 vs. F-4514 in Additional file: 4 Table S3). Thus, such regions did not arise due to hypermutation and, instead, were likely generated through some sort of the recombination events. In one case (Figure 8), a genomic region which supported (VKM F-3557, (VKM F-3808, VKM F-4514)) topology was marked by a 5.3 kb inversion present in VKM F-3808 and VKM F-4514 genotypes but not in any other P. spp. genotypes. This inversion was preceded by~100 nt noninverted segment which also supported (VKM F-3557, (VKM F-3808, VKM F-4514)) topology. Such a complex situation is very unlikely to arise through independent reversing mutations.  Sequence reads mapped back to assemblies ensure that regions with altered phylogenetic topologies could not be assembly artifacts as reads map normally on such regions and on their boundaries, with average coverage for this regions being the same to the rest of the genome. We considered a possibility of the intragenomic nonallelelic recombination. For 3 of the 77 regions we identified paralogs inside P. spp. using BLAST against the entire genome. However, none of these 3 paralogs could explain the pattern we observed.

Analysis of genomic regions with altered phylogenies
The most plausible explanation for the regions with altered topology is the weak recombination activity between the distant P. spp. lineages. In the first example (Figure 7), VKM F-4514 likely recombined with some genotype outside of (VKM F-3808, (VKM F-3557, VKM F-4514)) clade, in the second example the inversion took place before the (VKM F-3808, (VKM F-3557, VKM F-4514)) branching, but was eventually eliminated in VKM F-3557 by recombination with some distant genotype ( Figure 8).
Exon sequences comprise 50.1% of the P. spp. genome, but only 11,345 nt in 23 regions out of the total 67,577 nt in 77 recombination regions (16.8%) overlap with exon sequences. The lack of coding sequences in recombination regions is likely due to a negative selection on high-distant recombination events at coding sequences.

Discussion and conclusions
We sequenced and independently assembled genotypes of 14 haploid P. spp. strains. Thus, we did not perform standard genotyping procedures including read mapping and SNP calling but, instead, aligned the contigs which were obtained independently. We believe this method to be preferable to read mapping, because longer sequences are aligned and more robust alignments are obtained. Longer sequences are particularly important in case of high nucleotide diversity within the aligned genotypes.
Genome comparison of the sequenced strains reveals predominantly clonal structure of P. spp. lineages ( Figure 3A, Table 4) which is consistent with the multiple observations of asexual-only reproduction of P. spp. strains [1,[3][4][5][6]. The sequenced genomes are also very diverse with typical distance between strains from different clades dS ≈ 0.5. Assuming that P. spp. produce no more than 10 generation per year [9] and mutation rate is less than 10 −8 per nucleotide per generation (similar to other S. cerevisiae [35,36]), we can estimate that the last common ancestor of P. spp. lived more than 50 Mya. However, the strains are still very similar in functional sites as dN between distant lineages is~0.04, synteny of genes is >0.9 between different clades ( Figure 2).
Complete absence of genetic exchanges between strains would lead to a strict clonality of the population. However, there are evidences of recombination within a number of  genes in anamorphic Candida albicans and Aspergillus fumigatus [37,38]. We also observed such evidence in P. spp. Phylogenetic structure of MAT-locus and other regions with noncanonical topology indicates the exchange of this genome fragments between P. spp. lineages (Figures 4, 5, 6, 7 and 8, Additional file: 4 Table S3). Recombination rate estimated from these regions is low: 1 recombination event per~2500 single-nucleotide substitutions at synonymous sites, and only short genome regions are affected (average length is 878 nt) ( Figure 6). However, it is enough to cover an entire genome for a period of time passed since last common ancestor of P. spp. lineages, and thus, also contributed to the loss of linkage disequilibrium ( Figure 3).
As in an asexual fungi Candida glabarta [39,40], in P. spp. we observed MAT locus and other genes responsible for mating and meiosis in S. cerevisiae (Table 5, S2, Figure 4). Interestingly, MAT locus in P. spp. has phylogenetic configuration very different from the rest of the genome, indicating multiple transmissions between distant lineages at MAT locus. However, in both Candida glabarta and in vast majority of P. spp. sexual reproduction has never been observed, suggesting that either sexual processes are extremely rare, and thus are hard to detect, but are still important in these species, or that these genes have some functions other than sexual reproduction. Evidence of recombination at MAT locus and in other genome regions could  also indicate parasexual activity which is known to be a substitute of sex for many Ascomycota [19]. The other explanation could be horizontal gene transfer (HGT) across P. spp. HGT better fits the pattern observed for MAT-locus phylogeny and could indicate presence of a vector which carries and transmits MAT-locus across the population.
There are many economically significant species among Ascomycota, including aggressive pathogens of plants and animals. Recently P. destructans was shown to spread rapidly in North America and decimate bats populations [7,16]. However, population genetics and evolution of Ascomycota species remain poorly understood due to low number of whole-genome data. Our analysis reveals predominantly clonal evolution of P. spp. lineages. But despite a very long time passed since their last common ancestor, these strains still have very similar morphological traits and evidently occupy the same ecological niche. Indeed, strains VKM F-4513, VKM F-4514 and VKM F-4517, which belong to 3 distant clades ( Figure 1), were all extracted from the permafrost samples of the same age (1.8-3.0 Myr) where no other organism could survive. Furthermore, sequenced genomes indicate some sort of genetic recombination between diverged lineages. Therefore we believe that P. spp. should be treated as the entity of lineages interacting with each other rather than an ensemble of independent species. This approach could also be useful in understanding evolution of the other Ascomycota species with little or unknown sexual reproduction.

Extraction and cultivation of samples from permafrost
Methods of sampling, storage, transportation and control were chosen, and specialized tests were performed, to make sure that the microorganisms found in samples were indigenous and not contaminants. The cores (diameter 5-10 cm, length 15-30 cm) were collected using a dry drilling technique developed specifically for microbiological studies of permafrost [41,42]. The dry  Table S3.
drilling and sampling prevent down-hole contamination caused by drilling fluids. The sampling is achieved by dry shaving of the core back to native ice-cemented sediment. Possible contamination during the drilling was monitored by several tests. Previous studies have employed fluidless drilling techniques combined with an exogenous bacterial tracer such as a pure culture of Serratia marcescens. In tests using the isolation techniques, S. marcescens bacteria were found only on the surface of the frozen sample, never inside the frozen cores [42].
To recover fungi, 0.5-g portions of a core sample were placed in test tubes with 5 ml of water heated to room temperature (20°C), as well as to 35 and 52°C. Following one minute, the suspension was shaken at room temperature for 10 min. The tenfold dilutions of this suspension were inoculated, in triplicate, on Czapek agar (Cz), Malt Agar (MA), Starch ammonium agar (SAA) to which lactic acid was added at a concentration of 4 ml/1 to suppress the unwanted growth of bacterial cells. The inoculated plates were incubated at 4 and 25°C. The grown colonies were examined and enumerated on the 21st and 30th days, respectively [43].

Genome sequencing
Before DNA extraction, all samples were grown on Malt Agar for 10 days. Total genomic DNA was extracted using modified CTAB-method [44]. To construct the libraries for whole genome sequencing DNA was processed as described in the TruSeq DNA Sample Preparation Guide (Illumina). Libraries with average length of 350 bp were selected for sequencing. Libraries were quantified using fluorimetry with Qubit (Invitrogen, USA) and real-time PCR and diluted up to final concentration of 8 pM. Diluted libraries were clustered on a paired-end flowcell using cBot instrument and sequenced in 101 cycles using HiSeq2000 sequencer with TruSeq SBS Kit v3-HS (Illumina, USA). After trimming of adapter-derived and low (Q-score below 30) quality sequences reads were assembled using SOAP de novo assembler application (k-mer size 57). GapCloser for SOAP de novo was used to determine sequences of the gaps in scaffolds [45].

RNA sequencing
RNA-seq was performed for strains F-3808 and F-4515 grown in control conditions (malt agar, temperature 25 C) and under low temperature and high salinity (). Prior to RNA extraction, samples were collected in RNAlater solution (Ambion, USA), then homogenized using liquid nitrogen. Extraction was carried out using RNeasy Mini Kit (Qiagen, Germany) following manufacturer's instruction. The only modification was the addition of 10% Plant RNA Isolation Aid (Ambion, USA) to the lysis buffer. RNA quality was assessed using capillary electrophoresis on Bioanalyzer 2100 (Agilent, USA), only RNA with integrity number (RIN, [46]) greater or equal to 8 were taken for library preparation. For library preparation, TruSeq RNA Sample Prep Kit v2 (Illumina, USA) was used following manufacturer's instructions. After preparation libraries were quantified using Qubit fluorometer and quantitative PCR and sequenced on HiSeq2000 with read length 51 nucleotide.  Table S3.

MAT locus sequencing
MAT locus was amplified using primers Geo-MAT1-2-F (5′-ATG GCT CAA AGC ACR TTG CAR GGC TTC-3′) and Geo-MAT1-2-R (5′-CTT CTT TAT CTG GAC GTC ACT TCT CAC A-3′) that encompass the region between genes sla2 and apn2. PCR products were run on agarose gel and bands between 3 and 9 Kb were cut and purified. Libraries were prepared using Nextera XT DNA sample prep kit (Illumina, USA) and sequenced using Miseq sequencer with read length equal to 250 from each end. Libraries were 200-800 bp in length.

Genome annotation
Gene predictions for 14 P. spp. strains were done as described further. Each genotype assembly file was masked using RepeatMasker 3.3.0. To find exons and introns, RNAseq data we had for strains F-3808 and F-4515 were mapped on the masked scaffolds of each strain using Tophat2 [47] (version 2.0.8) and the results were used to generate intron hints for AUGUSTUS gene predictor (with bam2hits and filterBam programs from AUGUSTUS pipeline, included in distributive, and samtools package for sorting and filtering). AUGUSTUS extrinsic.cfg file was adjusted for considering information about potential intron boundaries from RNAseq data (larger bonus for intron confirmed by RNA mapping, tiny penalty if not). Final gene prediction was done by AUGUSTUS [23] (version 2.7.) with intron hints and species parameter was set to "botrytis_cinerea".

Whole genome alignment
Whole-genome alignment of the assembled contigs was performed in 2 steps. First, we used LASTZ [48], the program which identifies the regions of local similarity, to match the contigs from different samples. Single_cov2 from TBA package [49] was used to filter out the lower-scored alignments in regions with more than one significant alignment. Then, to increase the length of the alignment blocks, we performed global alignment of contig groups obtained on stage 1 using CLUSTAL [50]. For the analysis of the genomic regions with the conflicting phylogenetic configuration we only used the alignment blocks of length >20 kbp. The entire length of such blocks is 5.6 Mbp.

Identifying regions with noncanonical phylogeny
We considered a nucleotide site to support phylogenetic configuration (strain A, (strain B, strain C)), if nucleotides in strain B and strain C are identical and distinct from nucleotide in strain A, also we required nucleotide in strain A to be carried by at least 6 of the rest 11 sequenced G. spp. strains. Phylogenetic configuration (VKM F-3808, (VKM F-3557, VKM F-4514)) was name canonical as it stands for the vast majority of the genome, whereas phylogenetic configuration (VKM F-3557, (VKM F-3808, VKM F-4514)) and (VKM F-4514, (VKM F-3808, VKM F-3557) were named non-canonical. The nucleotide frequency of sites with noncanonical phylogenetic configuration is 0.002.
We considered a window of length 200 nt to have a noncanonical phylogenetic configuration, if the number of nucleotide sites supporting a noncanonical phylogenetic configuration exceeds the number of sites with canonical phylogenetic configuration by at least 8 nucleotides. The threshold of 8 guaranties that less than 0.01 such windows would be found at random. The overlapping windows were combined into the resulting regions with the boundaries set at nucleotide sites supporting noncanonical phylogenetic configuration. PAML implementation of Kishino-Hasegawa test was run to compare phylogenetic configurations and calculate bootstrap values [51], pRELL threshold was set at 0.95.
To ensure the regions with altered phylogenetic configuration are not assembly artifacts, we mapped the original sequence reads using bwa [52] program on the regions with noncanonical phylogenetic configuration, overlapping the boundaries of the region to ensure that these region are not the assembly artifacts. Regions with noncanonical phylogenetic configuration show coverage similar to the rest of the genome.

Calculating phylogenetic distances, number of gene losses and synteny
To identify gene orthologs we searched bidirectional best hits for each pair of P. spp. strains. We obtained 7524 groups of homologous genes, which are present in each of these 14 strains. Then, each group of homologous genes was aligned with MACSE [53]. Finally, the concatenate of alignments was used to calculate synonymous and nonsynonymous distances with codeml program from PAMLpackage. Only codon columns present in all 14 strains were used in the analysis. Dendroscope (v. 3.2.10) [54] was used for visualizations of phylogenies. We evaluate number of genes lost on each branch from sets of orthologs which have no blast hits to exon sequences in certain lineages. The lost gene is considered pseudogene if the significant blast hit to genome is observed but gene structure is disrupted, the gene is considered deleted if there is no significant blast hit to genome.
Gene orthologs were also used to estimate synteny across P. spp. strains. The pair of two orthologous genes was considered syntenic if those genes were adjacent in each strain. The pair of two orthologous genes where genes were adjacent only in one strain was considered nonsyntenic. Total numbers of syntenic orthologous pairs out of all orthologous pairs are shown in Additional file: 2 Table S1.