Draft genome assembly of Colletotrichum musae, the pathogen of banana fruit

Colletotrichum musae is an important cosmopolitan pathogenic fungus that causes anthracnose in banana fruit. The entire genome of C. musae isolate GM20 (CMM 4420), originally isolated from infected banana fruit from Alagoas State, Brazil, was sequenced and annotated. The genomic DNA of the pathogen was sequenced on the Illumina HiSeq platform. The C. musae GM20 genome comprises 50,635,197 bp with a G + C content of 53.74%; the present assembly has 2763 scaffolds harboring 13,451 putative genes with an average length of 1626 bp. Gene prediction and annotation were performed with the Funannotate pipeline, using BUSCO-based training for gene identification.


Subject area
Biology

More specific subject area
Microbiology, Agricultural, Genomics.

Type of data
Genome sequence data

How data was acquired
Illumina HiSeq 2500 next-generation sequencing platform

Data format
Assembled genome sequence.

Experimental factors
Genomic DNA was extracted from mycelium grown in culture medium.

Experimental features
Genome of Colletotrichum musae strain GM20 was sequenced and assembled.

Data source location
Colletotrichum musae strain GM20 was isolated from banana lesions in Maceió, Alagoas, Brazil.

Value of the Data
Colletotrichum musae is the causal agent of anthracnose in banana fruits, the main postharvest disease worldwide.
This is the first genome sequence of Colletotrichum musae obtained by next-generation sequencing that is available in a public database.
The genome data published herein will facilitate studies of the biology, pathogenicity, evolution and host-pathogen interaction of Colletotrichum musae through comparative genomic studies of Colletotrichum spp. and related species.

Data
Fungal infection of plants is the most frequent cause of extensive losses in agriculture. The fact that many endophytic fungi can cause infection adds further complexity to the study of fungal plant pathogens. Banana (Musa sp.) is one of the world's most important food crops and a staple food for more than 400 million people [1]. Over 100 million tons are produced worldwide on some 5 million hectares, and the cultivated area is expected to increase in the future [2]. However, banana fruits are highly susceptible to pathogens, and anthracnose caused by fungi of the genus Colletotrichum is among the most frequent diseases. Colletotrichum comprises over 100 species that are able to infect and damage diverse crops around the world [3].
Owing to their ubiquity, substantial destructive capacity and scientific importance as model pathosystems, Colletotrichum spp. rank among the 10 most important plant pathogens according to the international community of plant pathology researchers [4]. Colletotrichum musae (Berk. and M. A. Curtis), the causative agent of anthracnose, is a major post-harvest pathogen of banana fruits and causes severe global crop losses [5]. The disease develops from a latent fungal infection established preharvest, originating from spores present on immature fruits in the field. Symptoms, such as brown to black patches on the peel and depressed lesions, appear as the fruits ripen. Furthermore, under high humidity, the formation of salmon-colored acervuli can be observed [6]. The infection thus reduces fruit viability during maturation, transport and storage [7], leading to commercial depreciation and a shorter fruit shelf life.
To circumvent post-harvest losses, chemical fungicides are usually adopted, but other methods (e.g., radiation treatment, hot water treatment, refrigeration, induced resistance and biological control agents) have also been applied [8]. However, chemical fungicide usage has been limited by potential harmful effects on human health and the environment. In addition, fungal pathogens are known to quickly develop resistance to chemical pesticides [9].
Furthermore, the absence of available genomic sequences from C. musae is one of the main limitations to better characterization of fungal virulence determinants and to the development of improved management strategies. Here we report, for the first time, the whole genome sequence of C. musae strain GM20 (CMM 4420), isolated from infected banana fruit from Alagoas, a state in northeastern Brazil.
In recent years, several phytopathogenic fungal genomes have been published, boosting the discovery of virulence determinants in these species. We expect that our analysis will encourage further studies of C. musae biology, which should provide better details about the host-pathogen interaction and lead to new management measures.

DNA extraction and genome sequence
The GM20 isolate of C. musae was cultured, and DNA was extracted as previously described [10]. The whole-genome shotgun sequence of C. musae GM20 was generated using the Illumina HiSeq 2500 platform (Illumina, San Diego, CA) at the Center for Functional Genomics, Universidade de São Paulo (Piracicaba, Brazil). The libraries were prepared with the Illumina Nextera XT DNA Library Prep Kit (Illumina, San Diego, CA), and sequencing was performed on a HiSeq Flow Cell v4 with the HiSeq SBS Kit v4 (Illumina, San Diego, CA), yielding 2 × 100 bp paired-end reads.
Assembly statistics were generated by QUAST 3.9 (Table 1) [16]. Gene prediction and annotation were carried out with the Funannotate pipeline [17], which used BUSCO 2.0 [18] (parameters: Sordariomycetes database, with Verticillium longisporum selected as the closely related species) to generate the training files for two gene predictors: GeneMark-ES [19] and AUGUSTUS [20]. Moreover, BUSCO 2.0 was employed to evaluate genome completeness, based on the conservation of benchmarking universal single-copy orthologs (BUSCOs). The final assembly of the C. musae GM20 genome was determined to be 50,635,197 bp with a G + C content of 53.74% in 2763 scaffolds (maximum 208,119 bp; N50 32,818 bp), and 13,451 genes were predicted. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession number NWMS00000000. The version described in this paper is version NWMS01000000.
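For readers unfamiliar with the summary statistics reported above, the following minimal Python sketch shows how N50 and G + C content can be derived from a set of scaffold sequences. It is an illustration under stated assumptions, not the QUAST implementation: the function names `n50` and `gc_percent` and the toy scaffolds are hypothetical, and G + C content is computed here over unambiguous bases (A, C, G, T) only.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no scaffold lengths given")


def gc_percent(sequences):
    """Return G + C content (%) over unambiguous bases (A, C, G, T)."""
    gc = at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    return 100.0 * gc / (gc + at)


# Toy scaffolds (hypothetical; these are not C. musae sequences).
toy = ["GGCCGGCCAT", "ATGCGC", "ATAT", "GC"]
toy_n50 = n50(len(s) for s in toy)
toy_gc = gc_percent(toy)
print(toy_n50, round(toy_gc, 2))
```

Applied to the scaffold sequences of the deposited assembly, the same two functions would be expected to give values close to the reported N50 and G + C content, although small differences may arise from how ambiguous bases are handled.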
BUSCO analysis showed a high degree of completeness, with a BUSCO score of 96.3%: of the 1315 BUSCO groups searched, 1263 were complete single-copy BUSCOs, four were complete duplicated BUSCOs, 23 were fragmented BUSCOs, and 25 were missing.
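The reported BUSCO figures can be cross-checked with simple arithmetic. The sketch below assumes that the 1263 complete BUSCOs are single-copy and are counted separately from the four duplicated ones, and that the completeness score is the sum of both divided by the number of groups searched; the class name `BuscoCounts` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuscoCounts:
    single: int       # complete, single-copy
    duplicated: int   # complete, duplicated
    fragmented: int
    missing: int

    @property
    def complete(self) -> int:
        return self.single + self.duplicated

    @property
    def total(self) -> int:
        return self.complete + self.fragmented + self.missing

    def completeness(self) -> float:
        """Percentage of searched BUSCO groups found complete."""
        return 100.0 * self.complete / self.total


# Counts as reported in the text above.
counts = BuscoCounts(single=1263, duplicated=4, fragmented=23, missing=25)
score = round(counts.completeness(), 1)
print(counts.total, counts.complete, score)
```

Under this assumption the four categories sum to the 1315 groups searched, and 1267 complete BUSCOs out of 1315 round to the reported 96.3%.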