Draft Genome Assembly of Colletotrichum chlorophyti, a Pathogen of Herbaceous Plants

ABSTRACT Colletotrichum chlorophyti is a fungal pathogen that infects various herbaceous plants, including crops such as legumes, tomato, and soybean. Here, we present the genome of C. chlorophyti NTL11, isolated from tomato. Analysis of this genome will allow a clearer understanding of the molecular mechanisms underlying fungal host range and pathogenicity.

Colletotrichum spp. comprise a group of diverse fungi, many of which are pathogens of agriculturally important plants. Among these, C. chlorophyti has been reported to associate with a variety of herbaceous plant species, including important crop plants such as legumes (1), tomato, and soybean (2). Infections have been reported on leaves as well as in seeds. Phylogenetic analysis has shown that C. chlorophyti does not belong to any of the major Colletotrichum species complexes from which genomes have previously been sequenced (3), although it is closely related to C. phaseolorum, which is also a known pathogen of soybean. The genome sequence of C. chlorophyti will therefore be useful both as a resource on an agricultural pathogen and for genus-wide studies of Colletotrichum diversity and host range. Here, we present the draft genome sequence of C. chlorophyti strain NTL11, which was isolated from infected tomato leaves.
Genomic DNA was isolated from hyphae grown in vitro and purified using the Genomic-tip 100/G kit (Qiagen), following the protocol described for the 1000 Fungal Genomes Project. Two paired-end libraries, with insert sizes of approximately 150 bp and 500 bp, were prepared using the TruSeq DNA PCR-Free library preparation kit and sequenced as 100-bp paired-end reads on the Illumina HiSeq 2500 platform (RIKEN OSC) to 54× coverage. Reads were trimmed using Trimmomatic version 0.33 (4), and the resulting reads were assembled using SOAPdenovo version 2.21 (5).
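Fold coverage is the total number of sequenced bases divided by the genome size. The minimal sketch below illustrates that relationship, assuming that the reported 54× refers to 2 × 100-bp read pairs and using the 52.4-Mb assembly length as a stand-in for genome size. The read-pair count it produces is back-calculated from those assumptions, not a figure reported in this study, and the function names are hypothetical.

```python
def estimate_coverage(num_read_pairs: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Fold coverage: total sequenced bases divided by genome size."""
    total_bases = num_read_pairs * 2 * read_length_bp  # two reads per pair
    return total_bases / genome_size_bp


def read_pairs_for_coverage(target_coverage: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Invert the formula: read pairs needed to reach a target fold coverage."""
    bases_needed = target_coverage * genome_size_bp
    return round(bases_needed / (2 * read_length_bp))


if __name__ == "__main__":
    # Assumption: assembly length (52.4 Mb) used as a proxy for genome size.
    genome_size = 52_400_000
    pairs = read_pairs_for_coverage(54, 100, genome_size)
    print(pairs)
    print(round(estimate_coverage(pairs, 100, genome_size), 1))
```

Under these assumptions, 54× over 52.4 Mb corresponds to roughly 2.83 Gb of sequence, or about 14.1 million 2 × 100-bp pairs; trimming shortens reads, so post-trimming coverage would be somewhat lower.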
The draft assembly of C. chlorophyti consists of 558 scaffolds with a total length of 52.4 Mb (N50, 644,295 bp; N75, 313,035 bp; L50, 26; L75, 56) and a G+C content of 50.06%. Assembly completeness was assessed with BUSCO version 1.1b1 (6) against a set of 1,438 conserved fungal benchmarking universal single-copy orthologs. By this analysis, the assembly was estimated to include 99.9% of the assessed loci (98.5% complete, 1.3% fragmented).
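The contiguity statistics above follow the usual definitions: sort scaffolds from longest to shortest and accumulate their lengths; N50 is the length of the scaffold at which the running total first reaches half of the assembly length, and L50 is the number of scaffolds needed to get there (N75 and L75 use 75%). The Python sketch below computes these, plus G+C content, on toy input. It assumes that N (gap) characters are excluded from the G+C denominator, which may differ from the convention of the tools actually used; the function names are hypothetical.

```python
def nx_lx(lengths: list[int], fraction: float) -> tuple[int, int]:
    """Return (Nx, Lx) for a fraction such as 0.5 (N50/L50) or 0.75 (N75/L75).

    Nx: length of the scaffold at which the cumulative length of scaffolds,
    sorted longest first, first reaches `fraction` of the total length.
    Lx: number of scaffolds needed to reach that point.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    cumulative = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        cumulative += length
        if cumulative >= fraction * total:
            return length, count
    raise ValueError("fraction must be in (0, 1]")


def gc_content(sequence: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); Ns and other codes are ignored."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


if __name__ == "__main__":
    toy_scaffolds = [100, 80, 60, 40, 20]  # toy lengths, not real data
    print(nx_lx(toy_scaffolds, 0.5))
    print(nx_lx(toy_scaffolds, 0.75))
    print(round(gc_content("ACGTNNGC"), 2))
```

For the toy lengths [100, 80, 60, 40, 20] (total 300), the running sum first reaches 150 at the second scaffold, so N50 is 80 and L50 is 2; it first reaches 225 at the third, so N75 is 60 and L75 is 3.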
Protein-coding genes were predicted using the MAKER annotation pipeline release 2.31.8 (5) with Augustus version 3.1 (7), GeneMark-ES version 4.21 (8), and SNAP (9), with conserved proteins from the genome of C. incanum (10) as a training set. Augustus was trained on C. chlorophyti genes identified with CEGMA version 2.5 (11), which uses a set of conserved eukaryotic genes. A total of 10,419 protein-coding genes were predicted in the genome. Predicted proteins were classified as secreted if they were predicted to have a signal peptide by SignalP version 4.1 (12), no transmembrane domains by TMHMM version 2.0 (13), and no GPI anchor by the big-PI Fungal Predictor (14). The predicted genes were functionally annotated with Trinotate version 3.0.0 (https://trinotate.github.io), integrating information from the Swiss-Prot (15) and Pfam (16) databases. A total of 851 proteins were predicted to be secreted, 279 of which had no match in the Swiss-Prot database (15).
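The three-part secretion criterion amounts to a simple filter. This sketch assumes the SignalP, TMHMM, and big-PI results have already been parsed into one record per protein; the record type and its field names are hypothetical, and parsing of each tool's output format is not shown. The text does not state how a predicted transmembrane helix overlapping the signal peptide was handled, so this sketch simply requires zero helices.

```python
from dataclasses import dataclass


@dataclass
class ProteinPrediction:
    """Hypothetical per-protein record combining the three predictors' results."""
    protein_id: str
    has_signal_peptide: bool    # assumed parsed from SignalP output
    transmembrane_helices: int  # assumed parsed from TMHMM output
    has_gpi_anchor: bool        # assumed parsed from big-PI output


def is_secreted(p: ProteinPrediction) -> bool:
    """Signal peptide present, no transmembrane helices, and no GPI anchor."""
    return p.has_signal_peptide and p.transmembrane_helices == 0 and not p.has_gpi_anchor


def secretome(predictions: list[ProteinPrediction]) -> list[str]:
    """IDs of proteins that pass all three criteria, in input order."""
    return [p.protein_id for p in predictions if is_secreted(p)]


if __name__ == "__main__":
    toy = [
        ProteinPrediction("prot1", True, 0, False),   # passes all criteria
        ProteinPrediction("prot2", True, 1, False),   # excluded: TM helix
        ProteinPrediction("prot3", True, 0, True),    # excluded: GPI anchor
        ProteinPrediction("prot4", False, 0, False),  # excluded: no signal peptide
    ]
    print(secretome(toy))
```

Keeping the criterion in a single predicate makes it easy to swap in a different rule, for example ignoring helices that fall within the signal peptide, without touching the rest of the pipeline.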
Accession number(s). The sequences were deposited in DDBJ/EMBL/GenBank under the accession number MPGH00000000. The version described in this paper is the first version, MPGH01000000. Files are also available at https://sites.google.com/site/colletotrichumgenome.

ACKNOWLEDGMENTS
This work was supported in part by the Council for Science, Technology and Innovation (CSTI), Cross-Ministerial Strategic Innovation Promotion Program (SIP), "Technologies for Creating Next-Generation Agriculture, Forestry and Fisheries" (funding agency: Bio-Oriented Technology Research Advancement Institution, NARO), by the Science and Technology Research Promotion Program for the Agriculture, Forestry, Fisheries, and Food Industries to Y.N., Y.T., and K.S., and by Grants-in-Aid for Scientific Research (KAKENHI) (24228008 and 15H05959 to K.S., 15H04457 to Y.T.). A.T. was funded by the Junior Research Associate Program of RIKEN. Computations were partially performed on the NIG supercomputer at the ROIS National Institute of Genetics.