Genome sequences of 12 isolates of the EU1 lineage of Phytophthora ramorum, a fungus-like pathogen that causes extensive damage and mortality to a wide range of trees and other plants

Here we present genome sequences for twelve isolates of the invasive pathogen Phytophthora ramorum EU1. The assembled genome sequences and raw sequence data are available via BioProject accession number PRJNA177509. These data will be useful in developing molecular tools for specific detection and identification of this pathogen.


Experimental design, materials and methods
Fungus-like pathogens belonging to the oomycete genus Phytophthora pose significant threats to a wide range of plants [1]. Recent studies have generated whole-genome sequence data for Phytophthora species that cause disease in trees [2][3][4][5][6]. Phytophthora ramorum is an exotic pathogen whose geographical origin is unknown. In North America, P. ramorum is responsible for Sudden Oak Death while in Europe it causes Sudden Larch Death and Ramorum Blight [7][8][9][10].
Four distinct lineages are known, which have been isolated from each other for hundreds of millennia [11][12][13]. For more than a decade, a reference genome sequence was available [14] for NA1, the lineage that has established itself in the wild (i.e. outside of the nursery trade) in North America. No genome sequence was available for lineage EU1, the first lineage to be discovered in Europe and which has subsequently been detected in North America [15] [16].
We previously reported genome sequences [3] for one of the two lineages found in Europe, namely EU2. Here we present the first genome sequences for lineage EU1 isolates, which were collected from several host species in several counties of England (see Table 1). The availability of genome sequences from multiple lineages will help to identify the genetic differences that underlie observed phenotypic differences [17] among the lineages, to resolve the evolutionary relationships among lineages, and to identify lineage-specific molecular markers. Availability of sequence data from multiple isolates within a single lineage may further offer insights into the recent evolutionary events following colonization of a new geographical range and new host populations [18]. In the absence of sexual recombination in these diploid pathogens, one mechanism for rapid adaptation may be aneuploidy and/or loss of heterozygosity (LOH) [19][20][21][22][23].
Paired-end reads were generated from genomic sequence libraries, following the manufacturer's instructions, on the Illumina HiSeq 2000 or Illumina GA IIx massively parallel sequencing platforms. Numbers of reads, read lengths and database accession numbers for the raw reads are listed in Table 1.

Genomics Data, journal homepage: www.elsevier.com/locate/gdata
read1-filtered.fq -pe1-2 read1-filtered.fq -o output-directory". During submission of the assemblies to GenBank [27], we removed sequences identified by the NCBI curators as contamination from vectors, mitochondria, bacteria etc. Assembly statistics are summarized in Table 2.
We assessed the completeness of the genome assemblies using BUSCO (Benchmarking Universal Single-Copy Orthologs) [28], which checks for the presence of single-copy orthologous genes commonly conserved across eukaryotes. BUSCO denotes each gene as "complete single copy", "complete duplicated", "fragmented", or "missing" in the assembly. Table 2 shows the percentage of these 429 genes that are "complete single copy" in each genome assembly. The levels of completeness (83.22% to 84.15%) are comparable to those of six recently published Phytophthora genomes [5], which had up to 82.8% completeness, as assessed by the same method (Table 3).
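As a worked illustration of how the completeness percentages relate to the 429 BUSCO genes, the following minimal Python sketch converts a count of "complete single copy" genes into a percentage. The function name is hypothetical, and the counts 357 and 361 are not stated in the text; they are inferred here as the integer counts that reproduce the reported range of 83.22% to 84.15%.

```python
def percent_complete(n_complete_single_copy: int, n_total: int = 429) -> float:
    """Percentage of BUSCO genes scored as 'complete single copy'.

    n_total defaults to the 429 genes mentioned in the text.
    """
    return round(100.0 * n_complete_single_copy / n_total, 2)


# Inferred counts (an assumption, not taken from Table 2):
# 357 and 361 of 429 genes reproduce the reported range.
low = percent_complete(357)
high = percent_complete(361)
print(low, high)
```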
Average nucleotide identities (ANI) were calculated, using the dnadiff tool in MUMmer [29,30], between EU1 and previously published assemblies of closely related genomes [3][4][5][6][14]. The Pr EU1 assembly shared 99.2% ANI with Pr NA1 and 98.7% ANI with Pr EU2, suggesting a more ancient divergence between EU1 and EU2 than between EU1 and NA1. Between Pr EU1 and its sister species P. lateralis, there was 91.5% ANI. The dnadiff analysis also revealed that 1.5% of the EU1 genome is not alignable against the previously published genomes of EU2 and NA1, suggesting that there is a significant complement of lineage-specific genome content, including genes encoding effector proteins.
Heterozygosity has previously been observed in P. ramorum lineage NA1 [14] and is apparent in the newly presented data here for lineage EU1. We surveyed the distribution of heterozygosity across the genome by aligning sequence reads against the previously published genome sequence assembly of NA1 [14], which we downloaded from the Joint Genome Institute at http://genome.jgi.doe.gov/ramorum1/ramorum1.download.ftp.html. Prior to alignment using BWA-mem [31,32], the reads were first filtered using TrimGalore as described above. The resulting alignment was converted to mpileup format using SAMtools [33]. By parsing the mpileup file, it was possible to count the number of sites that were probably homozygous (>95% consensus among aligned reads) and those that were probably heterozygous (>45% and <55% consensus). Fig. 1 and Fig. 2 show plots of rates of heterozygosity over scaffold 7 and scaffold 24, respectively, of the reference genome. On scaffold 7, there are large stretches with little or no heterozygosity in isolates CC2168, CC2176, CC2184, CC2186, CC2275 and CC12475, while the same regions show normal levels of heterozygosity in the other isolates. This suggests that CC2168, CC2176, CC2184, CC2186, CC2275 and CC12475 have undergone LOH in these regions of scaffold 7. The depths of sequencing coverage are normal (see panel B in Fig. 1) across the LOH regions, indicating that this is copy-number-neutral LOH rather than hemizygosity. Similarly, isolate CC2184 appears to have undergone copy-number-neutral LOH on scaffold 24 (Fig. 2); similar patterns can be observed on several other genomic scaffolds including scaffolds 11, 14, 16 and 33. It is not clear whether these putative LOH events occurred during growth on the host plant or whether they occurred subsequently in the laboratory after collection. However, a recent study of phenotypic and genetic variation in lineage NA1 concluded that partial aneuploidy and copy-neutral LOH were induced by the host.
The most distinctive pattern of LOH among the EU1 isolates was observed for isolate CC2184 from yew (Taxus sp.); it would be interesting to survey additional isolates from this host and check whether they display the same distinctive LOH profile across their genomes.
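The classification of sites from the mpileup file can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in the study: the function names, the category labels and the minimum-depth cutoff of 10 reads are hypothetical, indels and the reference-skip symbols are ignored, and a site is called heterozygous when the second-most abundant nucleotide accounts for 45-55% of the aligned reads, following the wording of the figure legend.

```python
import re
from collections import Counter


def count_bases(ref_base, read_bases):
    """Count A/C/G/T observations in the read-bases column of one mpileup line."""
    counts = Counter()
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":
            # Read-start marker is followed by one mapping-quality character.
            i += 2
            continue
        if c in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            digits = re.match(r"\d+", read_bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)
            continue
        if c in ".,":
            counts[ref_base.upper()] += 1   # match to the reference base
        elif c.upper() in "ACGT":
            counts[c.upper()] += 1          # mismatch on either strand
        # '$' (read end), '*' (deletion placeholder) and 'N' are skipped.
        i += 1
    return counts


def classify_site(counts, min_depth=10):
    """Label a site; min_depth is an assumed cutoff, not from the source."""
    depth = sum(counts.values())
    if depth < min_depth:
        return "low_depth"
    ranked = [n for _, n in counts.most_common()]
    top = ranked[0] / depth
    second = ranked[1] / depth if len(ranked) > 1 else 0.0
    if top > 0.95:
        return "homozygous"
    if 0.45 <= second <= 0.55:
        return "heterozygous"
    return "ambiguous"


def classify_mpileup_line(line, min_depth=10):
    """Return (scaffold, position, label) for one tab-separated mpileup line."""
    fields = line.rstrip("\n").split("\t")
    scaffold, pos, ref, bases = fields[0], int(fields[1]), fields[2], fields[4]
    return scaffold, pos, classify_site(count_bases(ref, bases), min_depth)
```

Applied line by line to the output of SAMtools mpileup, this yields one label per reference position, which can then be tallied per scaffold or per window.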
Whole-genome sequence data are now available for multiple isolates of both of the P. ramorum lineages found in Europe, that is EU1 (this study) and EU2 [3,6], as well as for the NA1 lineage found in North America [14]. As well as being a resource for biological and evolutionary research on this important invasive species, these data also allow the identification of genomic sequences that could be targeted in new molecular tools for detection and identification of the species and lineages. Furthermore, identification of loci that are polymorphic among different isolates within the single lineage offers opportunities to track the spread of the pathogen in time and space at high resolution.

Fig. 1. Heterozygosity profiles of twelve Phytophthora ramorum EU1 isolates over scaffold 7. The previously published P. ramorum NA1 genome sequence [14] was downloaded from the Joint Genome Institute at http://genome.jgi.doe.gov/ramorum1/ramorum1.download.ftp.html and used as a reference sequence, against which genomic sequence reads from each of the 12 isolates were aligned with BWA-mem [31,32]. Panel A: we used a sliding window of 1000 nucleotides to calculate the rate of heterozygosity. The proportion of single-nucleotide positions at which 45-55% of the aligned reads contain the second-most abundant nucleotide was expressed as a percentage; that is, the vertical axis represents percentage heterozygosity. Panel B: we used a sliding window of 1000 nucleotides to calculate average depth of coverage by aligned reads. The vertical axis represents depth of coverage, normalized so that the median depth over the whole genome is one. In both panels, the horizontal axis represents position on the scaffold and regions of zero heterozygosity are highlighted in yellow.
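The sliding-window calculations described in the legend of Fig. 1 can be sketched in Python as below. This is an illustrative sketch rather than the code used to draw the figures. The function names are hypothetical, the step size is not stated in the text and is assumed here to equal the window size (non-overlapping windows), and the denominator for percentage heterozygosity is assumed to be the window length.

```python
def window_percent_het(is_het, window=1000, step=1000):
    """Percentage of heterozygous positions in each window along a scaffold.

    is_het: one boolean per scaffold position (True = heterozygous site).
    Returns a list of (window_start, percent_heterozygosity) tuples.
    """
    out = []
    for start in range(0, len(is_het) - window + 1, step):
        chunk = is_het[start:start + window]
        out.append((start, 100.0 * sum(chunk) / window))
    return out


def window_normalized_depth(depths, genome_median, window=1000, step=1000):
    """Mean depth per window, scaled so the genome-wide median depth is one.

    depths: one read-depth value per scaffold position.
    genome_median: median depth over the whole genome, computed beforehand.
    """
    out = []
    for start in range(0, len(depths) - window + 1, step):
        chunk = depths[start:start + window]
        out.append((start, (sum(chunk) / window) / genome_median))
    return out


# Toy scaffold of 2000 positions: no heterozygous sites in the first
# window, 10 in the second; depth doubles in the second window.
het = [False] * 1000 + [True] * 10 + [False] * 990
depth = [30] * 1000 + [60] * 1000
print(window_percent_het(het))
print(window_normalized_depth(depth, genome_median=30))
```

Under these assumptions, a window with zero heterozygosity but a normalized depth close to one corresponds to the copy-number-neutral LOH pattern, whereas a normalized depth close to 0.5 would instead point to hemizygosity.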