Draft Genome Sequence of a Diploid and Hybrid Candida Strain, Candida sanyaensis UCD423, Isolated from Compost in Ireland

ABSTRACT Candida sanyaensis is a CUG-Ser1 clade yeast that is associated with soil. Assembly of short-read and long-read data shows that C. sanyaensis has a diploid and hybrid genome, with approximately 97% identity between the haplotypes. The haploid genome size is approximately 15.4 Mb.

The yeast Candida sanyaensis is a member of the subphylum Saccharomycotina and the phylum Ascomycota and is closely related to Candida sojae and Candida tropicalis (1). The species was originally isolated from soil samples from Hainan Island in southern China and from Taiwan. C. sanyaensis UCD423 was isolated from a wormery in Dublin using a method similar to one reported previously (2): compost material was passaged twice in 9 ml of liquid yeast extract-peptone-dextrose (YPD) medium containing chloramphenicol (30 µg/ml) and ampicillin (100 µg/ml) and then cultured on YPD plates at room temperature. The species was identified from its internal transcribed spacer (ITS) sequence, which is 99% identical to that of C. sanyaensis (1).
For Illumina sequencing, total genomic DNA was extracted from a YPD culture and purified by extraction with phenol-chloroform-isoamyl alcohol. Libraries were generated from 1 µg of genomic DNA and sequenced by BGI Tech Solutions Co. (Hong Kong) as described by Morio et al. (3). Reads of 150 bases were sequenced from each end on an Illumina HiSeq 4000 instrument, yielding 6.5 million spots. For long-read sequencing, genomic DNA was extracted using the Qiagen Genomic-tip 100/G kit. The sequencing library was generated from 400 ng of DNA using a rapid barcoding kit (SQK-RBK004) from Oxford Nanopore Technologies (ONT) according to the manufacturer's instructions. This library was pooled with three libraries from unrelated projects, purified with AMPure XP magnetic beads (Beckman Coulter), and eluted in 12 µl of Tris-EDTA (TE) buffer, of which 10 µl was used for sequencing on a FLO-MIN106 flow cell primed with kit EXP-FLP002 on a MinION Mk1B sequencer using MinKNOW software v4.1.22 (ONT). Base calling (using the fast model [dna_r9.4.1_450bps_fast.cfg]) and demultiplexing were performed using Guppy v4.2.2 (ONT). In total, 561,441 MinION reads were obtained, with a read N50 of 11,647 bp.
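The read N50 is the length L such that reads of length L or longer together contain at least half of all sequenced bases. The minimal Python sketch below implements that definition; the FASTQ reader assumes unwrapped four-line records, and the helper names are illustrative rather than part of any tool used in this study.

```python
import gzip


def fastq_read_lengths(path):
    """Yield read lengths from a FASTQ file (plain or .gz).

    Assumes unwrapped four-line records: header, sequence, '+', qualities.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence line of each record
                yield len(line.rstrip("\r\n"))


def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For example, `n50(fastq_read_lengths("minion_reads.fastq.gz"))` would report the read N50 of a (hypothetical) combined read file. The same function applies unchanged to contig or scaffold lengths when comparing assemblies.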
Using the diploid parameters with Canu resulted in an assembly that largely kept the two haplotypes separate (Fig. 1). The first haplotype (haplotype A) is represented by the 10 largest scaffolds, ranging from 2.94 Mb to 369 kb (Fig. 1). The second haplotype (haplotype B) is fragmented into much smaller contigs, each of which matches a contig from haplotype A (Fig. 1). C. sanyaensis therefore has a diploid genome, and the haplotypes differ by ~3.3%, suggesting that the genome results from hybridization between related but not identical parents (9). The final diploid assembly of the MinION data was error corrected with the Illumina data, using nine rounds of Pilon v1.23 (10).
The SPAdes assembly of the short-read data alone is highly fragmented, whereas the haploid Canu assembly of the long-read data has the smallest number of contigs. For the long-read data alone, the haploid assembly has a smaller genome size and a greater N50 value than the diploid assembly. The average sequence identity between the haplotypes is 96.72%, as calculated from scaffolds of >100 kb using the average nucleotide identity (ANI) calculator described by Rodriguez and Konstantinidis (14) with default parameters.
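Iterative polishing means that each round's corrected assembly becomes the reference for the next round. The Python sketch below only builds the shell commands for nine such rounds. The read aligner (BWA-MEM), the samtools sorting step, the Java memory setting, and all file names are assumptions for illustration, since the text above does not state how the Illumina reads were mapped.

```python
def pilon_round_commands(draft, reads1, reads2, rounds=9,
                         pilon_jar="pilon-1.23.jar", threads=8):
    """Return (commands, final_fasta) for iterative Pilon polishing.

    Each round maps the paired Illumina reads to the current assembly,
    sorts and indexes the alignments, and runs Pilon. Pilon writes
    <outdir>/<output>.fasta, which becomes the next round's reference.
    """
    commands = []
    reference = draft
    for i in range(1, rounds + 1):
        outdir = f"pilon_round{i}"
        bam = f"{outdir}/illumina.sorted.bam"
        commands.append(f"mkdir -p {outdir}")
        # Index the current reference so BWA can map reads against it.
        commands.append(f"bwa index {reference}")
        # Map read pairs and pipe the SAM output straight into a coordinate sort.
        commands.append(
            f"bwa mem -t {threads} {reference} {reads1} {reads2}"
            f" | samtools sort -@ {threads} -o {bam} -"
        )
        commands.append(f"samtools index {bam}")
        # Pilon uses the paired-end alignments to correct the current assembly.
        commands.append(
            f"java -Xmx16g -jar {pilon_jar} --genome {reference}"
            f" --frags {bam} --output polished --outdir {outdir}"
        )
        reference = f"{outdir}/polished.fasta"
    return commands, reference
```

Calling `pilon_round_commands("canu.contigs.fasta", "illumina_1.fq.gz", "illumina_2.fq.gz")` returns 45 commands and the path of the ninth-round FASTA; joining the commands with newlines gives a shell script that requires BWA, samtools, Java, and the Pilon jar to be installed. Generating commands rather than running them keeps the round-to-round chaining easy to inspect.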
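The haplotype difference quoted above follows directly from the ANI value: 100% - 96.72% = 3.28%, or ~3.3%. The Python sketch below illustrates only the averaging idea behind ANI (identity per aligned fragment, averaged over fragments) together with the >100 kb scaffold filter. It is not the Rodriguez and Konstantinidis calculator, which performs its own fragment alignment and filtering; here the fragment alignments are assumed to be given, gap columns are simply skipped, and all function names are hypothetical.

```python
def filter_scaffolds(scaffolds, min_length=100_000):
    """Keep only scaffolds longer than min_length bases."""
    return {name: seq for name, seq in scaffolds.items() if len(seq) > min_length}


def fragment_identity(aligned_a, aligned_b):
    """Percent identity of two gapped, equal-length aligned sequences.

    Columns containing a gap ('-') in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y  # True counts as 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def mean_identity(fragment_alignments):
    """Average the per-fragment identities over all aligned fragments."""
    identities = [fragment_identity(a, b) for a, b in fragment_alignments]
    return sum(identities) / len(identities)


def divergence(identity_percent):
    """Percentage of compared positions that differ."""
    return 100.0 - identity_percent
```

With the reported value, `divergence(96.72)` gives 3.28 (up to floating-point rounding), matching the ~3.3% difference between haplotypes A and B.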
Data availability. This whole-genome shotgun project has been deposited in DDBJ/ENA/GenBank under accession number CAJVQF00000000. The raw reads from Illumina sequencing are available under SRA accession number ERR6313261, and those from MinION sequencing are available under SRA accession number ERR6310792. These data are also available under project PRJEB46370. The ITS sequence is available under accession number MZ507576.

ACKNOWLEDGMENTS
This work was supported by undergraduate teaching resources from University College Dublin. The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Thanks go to Rebeca Clavero for isolating the yeast.