Whole-genome comparisons of Penicillium spp. reveal secondary metabolic gene clusters and candidate genes associated with fungal aggressiveness during apple fruit decay

Blue mold is a postharvest rot of pomaceous fruits caused by Penicillium expansum and a number of other Penicillium species. The genome of the highly aggressive P. expansum strain R19 was re-sequenced and analyzed together with the genome of the less aggressive P. solitum strain RS1, and whole-genome similarities and differences were examined. A phylogenetic analysis of P. expansum, P. solitum, and several closely related Penicillium species revealed that the two pathogens, both isolated from decayed apples with blue mold symptoms, are not each other's closest relatives. Among the 10,560 and 10,672 protein-coding sequences of P. expansum R19 and P. solitum RS1, respectively, a comparative genomics analysis revealed 41 genes in P. expansum R19 and 43 genes in P. solitum RS1 that are unique to these two species. These genes may be associated with pome fruit–fungal interactions, subsequent decay processes, and mycotoxin accumulation. An intact patulin gene cluster consisting of 15 biosynthetic genes was identified in the patulin-producing P. expansum strain R19, while only a remnant seven-gene cluster was identified in the patulin-deficient P. solitum strain. However, P. solitum contained more secondary metabolite gene clusters than P. expansum, indicating that this species has the potential to produce an array of known as well as not-yet-identified products of possible toxicological or biotechnological interest.


Fungal growth, genomic DNA extraction, genome sequencing, assembly, and annotation
Growth of P. expansum R19 in culture and in vivo (including inoculation of apple fruit), as well as genomic DNA extraction, was conducted using previously published approaches (Conway, 1982; Yu et al., 2014). The genome of P. expansum R19 was sequenced, assembled, and annotated with the pipeline previously described in our study of P. solitum RS1 (Yu et al., 2016). Briefly, the assembly was conducted with HGAP3 under default settings, which applies long-read error correction and the Celera assembler to produce high-quality contigs (unitigs) from reads generated with PacBio Single Molecule Real Time (SMRT) sequencing technology (Ricker et al., 2016). The annotation software MAKER (Holt & Yandell, 2011) was run for four iterations, starting with gene predictions from CEGMA (Parra, Bradnam & Korf, 2007). In the annotation step, P. solitum proteins, rather than P. expansum proteins, were used as part of the protein evidence set. The newly sequenced P. expansum genome was 32,356,049 bp, which is 97.3% of the size of the currently published P. expansum MD-8 genome. The BUSCO (v. 3.0.2) analysis found 289 complete BUSCOs and one fragmented BUSCO out of 290 (dikarya odb9), corresponding to a 99.7% complete assembly (Waterhouse et al., 2018).
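The 99.7% completeness figure follows directly from the BUSCO counts reported above. As a minimal illustration, the sketch below recomputes it; `BuscoCounts` is a hypothetical helper, and its one-line summary only imitates the style of BUSCO's short summary (the single-copy versus duplicated split is omitted because it is not reported here).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BuscoCounts:
    complete: int    # complete BUSCOs (single-copy + duplicated)
    fragmented: int  # fragmented BUSCOs
    total: int       # groups in the lineage dataset, e.g. 290 for dikarya_odb9

    @property
    def missing(self) -> int:
        # Everything that is neither complete nor fragmented counts as missing.
        return self.total - self.complete - self.fragmented

    def percentages(self) -> Dict[str, float]:
        if self.total <= 0:
            raise ValueError("total must be positive")
        return {
            "C": 100.0 * self.complete / self.total,
            "F": 100.0 * self.fragmented / self.total,
            "M": 100.0 * self.missing / self.total,
        }

    def summary(self) -> str:
        # Hypothetical BUSCO-style one-liner, rounded to one decimal place.
        p = self.percentages()
        return f"C:{p['C']:.1f}%,F:{p['F']:.1f}%,M:{p['M']:.1f}%,n:{self.total}"


counts = BuscoCounts(complete=289, fragmented=1, total=290)
print(counts.summary())  # C:99.7%,F:0.3%,M:0.0%,n:290
```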

Markov cluster algorithm (MCL) clustering
The protein sequences of the seven selected Penicillium species were subjected to BLAST comparison against each other (Altschul et al., 1990). The bitscores were then used to generate clusters of genes via the TRIBE-MCL algorithm (Enright, Van Dongen & Ouzounis, 2002) using default settings.
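The clustering step can be illustrated with a minimal, pure-Python sketch of the Markov cluster algorithm on a small similarity matrix. It assumes symmetric edge weights such as pairwise bitscores; the function names, the self-loop convention, the inflation value of 2.0, and the convergence thresholds are illustrative assumptions rather than the settings of the TRIBE-MCL/MCL software used in the study, which is far more efficient on proteome-scale sparse graphs.

```python
from typing import List

Matrix = List[List[float]]


def _normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 so it is a probability distribution."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(weights: Matrix, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[List[int]]:
    """Cluster the nodes of a symmetric similarity graph (e.g. protein bitscores)."""
    n = len(weights)
    m = [row[:] for row in weights]
    # Assumed convention: each node gets a self-loop as strong as its strongest edge.
    for i in range(n):
        m[i][i] = max(max(weights[i]), 1.0)
    m = _normalize_columns(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)  # expansion: random-walk flow over two steps
        # Inflation: raise entries to a power, then renormalize, which
        # strengthens strong flows and starves weak ones.
        inflated = _normalize_columns(
            [[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row with non-negligible entries is an attractor; its nonzero
    # columns are the members of one cluster.
    clusters: List[List[int]] = []
    seen = set()
    for i in range(n):
        members = tuple(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in seen:
            seen.add(members)
            clusters.append(list(members))
    return clusters


# Toy graph: proteins 0, 1, 2 are strongly similar, 3 and 4 form a second
# pair, and a single weak hit links protein 2 to protein 3.
toy = [
    [0, 100, 100, 0, 0],
    [100, 0, 100, 0, 0],
    [100, 100, 0, 1, 0],
    [0, 0, 1, 0, 80],
    [0, 0, 0, 80, 0],
]
print(sorted(mcl(toy)))
```

In this toy graph the single weak edge between proteins 2 and 3 is starved by repeated inflation, so two families emerge, {0, 1, 2} and {3, 4}; raising the inflation parameter generally yields smaller, tighter clusters.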

Gene Ontology (GO) annotation and Fisher's exact tests
P. expansum R19 protein sequences were compared by BLAST against the NCBI NR protein database, and the results were used to predict GO annotations with Blast2GO under default settings (http://www.blast2go.com). GO enrichment in selected groups of genes was then tested using Fisher's exact tests. The same analysis was performed on P. solitum RS1 protein sequences.
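The enrichment test can be sketched as a one-sided Fisher's exact test on a 2x2 table of GO-term counts, computed here directly from the hypergeometric distribution. This is an illustrative sketch, assuming the reference set consists of annotated proteins outside the test group; Blast2GO's own enrichment module (which may use two-sided tests and multiple-testing correction) is not reproduced, and the counts in the example are hypothetical.

```python
from math import comb


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for over-representation of a GO term.

    2x2 table (hypothetical naming):
        a = test-group proteins annotated with the term
        b = test-group proteins without the term
        c = reference proteins with the term
        d = reference proteins without the term
    Returns P(X >= a), where X is hypergeometric with the table's margins fixed.
    """
    n = a + b + c + d
    group_size = a + b
    term_total = a + c
    denom = comb(n, group_size)
    upper = min(group_size, term_total)
    # Sum the probabilities of tables at least as extreme as the observed one.
    tail = sum(comb(term_total, k) * comb(n - term_total, group_size - k)
               for k in range(a, upper + 1))
    return tail / denom


# Hypothetical counts: 8 of 10 test proteins carry the term vs 1 of 6 reference proteins.
print(round(fisher_enrichment_p(8, 2, 1, 5), 4))  # 0.0245
```

In a genome-wide analysis, such p-values would normally be corrected for testing many GO terms at once, for example with the Benjamini-Hochberg false discovery rate.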

Secondary metabolite gene clusters
Secondary metabolite gene clusters were identified using the antiSMASH program for fungi at default settings (Weber et al., 2015).

RESULTS
The growth of Penicillium expansum and P. solitum was compared on agar plates, and their aggressiveness was compared on apples incubated at different temperatures. On potato dextrose agar, both fungi grew at all temperatures from 0 to 20 °C, although P. solitum grew slightly more slowly than P. expansum (Figs. 1A, 1B). When inoculated onto apple fruit, however, the differences in aggressiveness were much more pronounced (Figs. 1C, 1D). Both pathogens are necrotrophic and require a wound for infection: they lack an appressorium, are therefore unable to penetrate the fruit skin, and rely on the production of pectin-degrading enzymes for growth (Yao, Conway & Sams, 1996; Jurick et al., 2009). Decay was evident 7 days post inoculation (DPI) on apples inoculated with P. expansum at 10 and 20 °C, whereas apples inoculated with P. solitum displayed only a small lesion around the inoculation point at 20 °C (Figs. 1C, 1D).
To obtain a high-quality genome sequence and ensure reliable downstream bioinformatic comparisons, P. expansum R19 was re-sequenced on the PacBio Single Molecule Real Time (SMRT) sequencing platform to 65-fold coverage. The assembly comprised 16 unitigs with an N50 of 8.17 Mbp and was deposited under BioSample SUB4649056. Gene annotation revealed 10,560 putative protein-coding genes. CEGMA predicted 99.2% of core eukaryotic genes (CEGs) as complete and 99.6% as partial (including complete) in the genome. A genome-wide multi-gene phylogenetic analysis based on 399 CEGs was then performed using P. expansum, P. solitum, and several other related Penicillium species, with Aspergillus flavus as the outgroup (Fig. 2). The closest relative of P. solitum among the Penicillium spp. was P. camemberti, whereas P. expansum was most closely related to the two citrus pathogens P. digitatum and P. italicum (Fig. 2). P. roqueforti, a cheese-making species associated with blue cheeses, and P. chrysogenum, the penicillin-producing species, formed another distant but distinct clade (Fig. 2).
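The two assembly statistics quoted above, N50 and fold coverage, can be made concrete with a short sketch. The function names are hypothetical and the contig lengths are toy values, not the R19 unitigs; coverage is computed here as sequenced bases divided by genome size, which is the usual definition but an assumption about how the 65-fold figure was derived.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the total length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def fold_coverage(total_read_bases: int, genome_size: int) -> float:
    """Mean sequencing depth: sequenced bases divided by genome size."""
    return total_read_bases / genome_size


# Toy contig lengths: total 54, half is 27, reached after 10 + 9 + 8.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
print(fold_coverage(65_000, 1_000))       # 65.0
```

Applied to the lengths of the 16 R19 unitigs, the same function would return the reported N50; those individual lengths are not listed here.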
For P. expansum R19, 7,921 proteins (75% of all proteins) were annotated with a total of 37,962 GO entries (4.8 GO entries per protein), representing 6,000 unique GO categories. For P. solitum RS1, 7,920 proteins (74.2% of all proteins) were annotated with a total of 37,858 GO entries (4.8 GO entries per protein), representing 5,993 unique GO categories. We analyzed GO enrichment in a group of 877 proteins over-represented in P. expansum, defined as having ≥ 2X more copies than detected in P. solitum (Table 1). We also tested GO enrichment in 550 P. expansum proteins over-represented in both blue mold fungi, defined as having on average ≥ 2X more copies in P. expansum and P. solitum than in the other five Penicillium species included in this study (Table 2). An analysis using P. solitum proteins over-represented in the two pome fruit blue mold species gave similar results. Finally, we examined GO enrichment in a group of 1,096 proteins represented more frequently in P. solitum, defined as having ≥ 2X more copies in P. solitum than in P. expansum (Table 3).
Across the seven Penicillium species included in this study, 9,960 protein families encompassing 78,479 proteins were identified. All species shared 5,308 protein families, encompassing 64,598 proteins, or 82.3% of the total proteins (the proteome). For P. expansum 86.4% (9,119 of 10,560), and for P. solitum 85.3% (9,099 of 10,672), of the proteins were shared with the other species. However, a small set (36) … and 43 proteins in P. solitum (Table S1). Amino acid sequences from the two fungi that cause blue mold of pome fruits were also compared directly. The more aggressive species, P. expansum R19, contained 222 gene families, encompassing 261 proteins, that were not present in the less aggressive species P. solitum RS1. Similarly, 299 gene families, accounting for 375 proteins, were present in P. solitum but not in P. expansum (Table S1). In addition, P. expansum R19 contained many proteins whose copy numbers were one- to several-fold higher than in P. solitum RS1 (Table S1).
In the P. expansum R19 genome, we identified the fifteen-gene patulin cluster (Fig. 3). In the P. solitum genome, a sequence similarity search (BLAST) identified a partial patulin gene cluster spanning 31 kb (Fig. 3). Only seven of the patulin biosynthetic genes were present within this partial cluster, with amino acid sequence identity >91% and e-value = 0: patC, patD, patG, patH, patL, patM, and patN (Fig. 3). Five other genes (patA, patB, patE, patK, and patO) were found dispersed elsewhere in the genome, with much lower amino acid sequence identity (29% to 49%). Three genes, patF, patI, and patJ, were not found anywhere in the genome (e-value cutoff of 1e-5).
Using the antiSMASH platform, all seven Penicillium genomes were analyzed to identify their secondary metabolism genes. Within the P. solitum genome, we identified 66 gene clusters putatively involved in secondary metabolism (SM), more than in any other Penicillium species analyzed in this study (Fig. 4). In comparison, 58 gene clusters were found in P. expansum, 21 in P. italicum, and 35 in P. digitatum.
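The presence/absence and copy-number comparisons described above reduce to simple set and ratio tests over a family-by-species count table. The sketch below assumes such a table has already been derived from the TRIBE-MCL clusters; the data layout, function names, species labels, and toy counts are hypothetical, and treating families absent from the comparison species separately (rather than as over-represented) is an assumption about how the 2X criterion was applied.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical layout: family ID -> {species label: number of member proteins}.
Families = Dict[str, Dict[str, int]]


def shared_by_all(families: Families, species: Sequence[str]) -> List[str]:
    """Families with at least one protein in every listed species."""
    return sorted(f for f, counts in families.items()
                  if all(counts.get(s, 0) > 0 for s in species))


def exclusive_to(families: Families, group: Set[str]) -> List[str]:
    """Families present in every species of `group` and in no other species."""
    return sorted(f for f, counts in families.items()
                  if {s for s, c in counts.items() if c > 0} == group)


def over_represented(families: Families, sp_a: str, sp_b: str,
                     fold: float = 2.0) -> List[str]:
    """Families in which sp_a has at least `fold` times the copies of sp_b.

    Assumption: sp_b must carry at least one copy; families missing from
    sp_b belong to the presence/absence comparison instead.
    """
    return sorted(f for f, counts in families.items()
                  if counts.get(sp_b, 0) > 0
                  and counts.get(sp_a, 0) >= fold * counts[sp_b])


def group_over_represented(families: Families, group: Sequence[str],
                           others: Sequence[str], fold: float = 2.0) -> List[str]:
    """Families whose mean copy number in `group` is >= `fold` times the mean in `others`."""
    hits = []
    for f, counts in families.items():
        g = sum(counts.get(s, 0) for s in group) / len(group)
        o = sum(counts.get(s, 0) for s in others) / len(others)
        if o > 0 and g >= fold * o:  # same assumption: skip families absent from `others`
            hits.append(f)
    return sorted(hits)


toy: Families = {
    "fam1": {"P_expansum": 1, "P_solitum": 1, "P_italicum": 1},
    "fam2": {"P_expansum": 1, "P_solitum": 1},
    "fam3": {"P_expansum": 4, "P_solitum": 2, "P_italicum": 1},
    "fam4": {"P_solitum": 3},
}
species = ["P_expansum", "P_solitum", "P_italicum"]
print(shared_by_all(toy, species))                        # ['fam1', 'fam3']
print(exclusive_to(toy, {"P_expansum", "P_solitum"}))     # ['fam2']
print(exclusive_to(toy, {"P_solitum"}))                   # ['fam4']
print(over_represented(toy, "P_expansum", "P_solitum"))   # ['fam3']
```

Counting proteins rather than families only requires summing the per-species copy numbers of the selected families.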

DISCUSSION
The first aim of this study was to provide a high-quality, annotated, and assembled genome sequence of P. expansum R19 for further downstream bioinformatic analyses. The longer reads produced by the PacBio platform enabled a more contiguous assembly with fewer gaps than Illumina sequencing had produced: 16 unitigs with an N50 of 8.17 Mbp, a major improvement over the 48.5 kbp previously achieved using Illumina (Yu et al., 2014). Although both P. expansum and P. solitum cause blue mold decay of pome fruits (Frisvad, 2004; Jurick et al., 2010), the closest relative of P. solitum was determined to be P. camemberti, a species named after the distinctive soft cheese with a white rind (Cheeseman et al., 2014). In addition to causing blue mold on apple, P. solitum has been isolated from spoiled processed meats, cheeses, and margarine, which, like apples, are commonly stored at 4 °C (Pitt et al., 1991; Hocking et al., 1998). Moreover, P. solitum has been isolated from several very cold, high-salt environments and has been termed an "extremophile" (Stierle et al., 2012). In contrast, the closest relatives of P. expansum in our analyses were P. digitatum (Marcet-Houben et al., 2012) and P. italicum (Ballester et al., 2015), suggesting that the ancestors of these three species may have occupied the same carbohydrate-rich niches associated with fruit decay.
By comparing the two blue mold fungi at the whole-genome scale, we aimed to identify genetic factors linked to differences in pathogen aggressiveness during pome fruit rot, as well as genes specific to the apple fruit pathogens. Thirty-six protein families were present only in the two pome fruit pathogens (P. expansum and P. solitum) and absent from the other five Penicillium species examined. P. expansum contained 222 gene families not found in P. solitum, of which 44 protein families were present only in P. expansum R19 and absent from all other species analyzed in this study. These protein families are of primary interest because they may represent specific genes involved in pome fruit maceration or account for the greater aggressiveness of P. expansum during apple fruit decay. Here, we discuss the potential roles of several P. expansum- and P. solitum-specific enzymes.
To examine similarities and differences in gene repertoires among the seven Penicillium species, we identified protein families using TRIBE-MCL, a Markov cluster algorithm that detects and categorizes eukaryotic protein families (Enright, Van Dongen & Ouzounis, 2002). Gene Ontology (GO) enrichment analyses were performed on protein sequences annotated with Blast2GO (Conesa et al., 2005). Several enriched categories were related to countering plant defense mechanisms, including phenylpropanoid catabolic process, cinnamic acid catabolism, and response to iron ion sequestration (siderophores). Flavonoid metabolism, a gene category involved in plant defense, was also enriched, which may help the fungus overcome the phenolic-rich (quercetin) environment of apple host tissue, known to be important in the basal defense of apple fruits (Sun et al., 2017). The enriched categories also included lyase activity. Lyases are well established to degrade cell wall components such as pectin, and they have been shown to be involved in the maceration of pome fruit tissues by P. expansum and P. solitum (Yao, Conway & Sams, 1996; Jurick et al., 2009).
Single copies of SnoaL-like polyketide cyclases (PF07366) are found in both P. expansum and P. solitum (Table S1). A more detailed analysis of PF07366 on http://pfam.xfam.org/ revealed its presence in a number of prokaryotes, in two other fungal species, and in a few other eukaryotes (Table 4). These eukaryotic members are either aquatic or plant-associated (Table 4), suggesting that this gene may have been laterally transferred between organisms living in close proximity or occupying the same ecological niche. These cyclases are backbone genes for polyketide secondary metabolite biosynthesis and are thus involved in fungal secondary metabolism. A glucose-6-phosphate isomerase (PF10432) is found in a number of prokaryotes but in only two eukaryotes, P. expansum and P. solitum (http://pfam.xfam.org/, Table S1), suggesting that this gene, too, may have been laterally transferred. Its exact role in pome fruit decay warrants further exploration through RNAi and/or gene deletion studies.
A thermolysin metallopeptidase (PF01447) was found only in P. expansum, and not in P. solitum or any of the other five Penicillium species included in this study. Peptidases are known virulence factors for fungal plant pathogens such as Fusarium graminearum; they break down host plant proteins to provide nutrients for fungal growth, reproduction, and colonization (Lowe et al., 2015). This P. expansum-specific metallopeptidase may therefore play a role in pome fruit decay and should be explored further. Beyond these examples, several putative transcription factors, hydrolases, and other enzymes unique to one or both blue mold Penicillium species were uncovered (Table S1). They provide a guide for future research aimed at pinpointing the genes essential for postharvest apple fruit decay.
Patulin is a well-studied, polyketide-derived mycotoxin commonly found in apple products. Because of its toxicity, the European Union (EU) has set maximum levels for patulin of 50 µg/kg in fruit juices and fruit-derived products, 25 µg/kg in solid apple products, and 10 µg/kg in apple juices and foods for babies and infants (European Union, 2003). The U.S. Food and Drug Administration (FDA), as well as regulatory agencies in other countries, have limited the permissible amount of patulin in fruit juices and processed pome fruit products for human consumption to 50 µg/kg (Puel, Galtier & Oswald, 2010). Fifteen genes involved in patulin biosynthesis are found within a well-defined gene cluster (Tannous et al., 2014; Li et al., 2015; Ballester et al., 2015). The gene order of this cluster in our P. expansum R19 strain is the same as reported earlier (Tannous et al., 2014). The arrangement of the partial patulin gene cluster in P. solitum was very similar to that found in P. camemberti (Ballester et al., 2015), with the same seven patulin biosynthetic genes present in the same configuration. The incomplete patulin gene cluster in both P. solitum and P. camemberti may explain why these species are unable to produce patulin in either liquid or solid culture (Frisvad, 2004). Curiously, some of the patulin genes were dispersed outside the cluster, likely as a result of chromosomal rearrangements. Our genomic data suggest that the ability to produce patulin was lost in the ancestor of these two species.
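The homology search used to sort the P. solitum patulin genes into clustered, dispersed, and missing categories (e-value cutoff 1e-5) can be sketched as a filter over BLAST tabular output. This is an illustrative sketch, not the authors' script: it assumes the default 12-column `-outfmt 6` layout, keeps the best hit per query by bitscore, and classifies a gene by whether that hit falls in a hypothetical set of genes located in the partial cluster. All subject IDs and hit values below are invented for illustration.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Set


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse BLAST tabular output (-outfmt 6, default 12 columns):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield Hit(f[0], f[1], float(f[2]), float(f[10]), float(f[11]))


def best_hits(hits: Iterable[Hit], evalue_cutoff: float = 1e-5) -> Dict[str, Hit]:
    """Keep the highest-bitscore hit per query among hits passing the e-value cutoff."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.evalue > evalue_cutoff:
            continue
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best


def classify(queries: List[str], best: Dict[str, Hit],
             cluster_genes: Set[str]) -> Dict[str, str]:
    """Label each query gene by where (if anywhere) its best hit lies."""
    status = {}
    for q in queries:
        h = best.get(q)
        if h is None:
            status[q] = "absent"       # no hit passed the e-value cutoff
        elif h.subject in cluster_genes:
            status[q] = "in_cluster"   # best hit lies in the partial cluster
        else:
            status[q] = "elsewhere"    # best hit lies elsewhere in the genome
    return status


# Hypothetical hits of P. expansum pat proteins against a P. solitum proteome.
blast_lines = [
    "patC\tsol_g001\t95.2\t500\t24\t0\t1\t500\t1\t500\t0.0\t980",
    "patA\tsol_g777\t35.0\t300\t190\t5\t1\t300\t10\t310\t1e-20\t110",
    "patF\tsol_g999\t28.0\t80\t55\t2\t1\t80\t5\t85\t0.5\t30",
]
best = best_hits(parse_outfmt6(blast_lines))
result = classify(["patC", "patA", "patF", "patI"], best, {"sol_g001"})
print(result)
```

The sequence identity of each best hit (`pident`) remains available for reporting, as in the >91% versus 29% to 49% contrast described in the Results.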
While patulin is the mycotoxin of primary concern to apple producers, processors, and packers in the U.S. and EU, it is only one of many secondary metabolites known to be produced by Penicillium species (Puel, Galtier & Oswald, 2010; Wright, 2015). Genes responsible for the synthesis of secondary metabolites are generally organized in clusters in fungal genomes. Despite the large number of putative SM gene clusters in these Penicillium species, only a few of their products, such as patulin, penicillic acid, and citrinin, are well studied in agriculture and food science. In addition, several interesting secondary metabolites have been reported from P. solitum strains isolated from extreme environments, and these merit further study to determine whether they are also produced in apples with blue mold symptoms. For example, a strain of P. solitum isolated from the Berkeley Pit, a former copper mine in Butte, Montana, produces two drimane sesquiterpene lactones named Berkedrimanes A and B (Stierle et al., 2012). These secondary metabolites inhibit the signal transduction enzymes caspase-1 and caspase-3 and reduce the production of interleukin-1β in a leukemia cell line (Stierle et al., 2012). Another biologically active metabolite from P. solitum is solistatin, a phenolic compactin analogue related to mevastatin; mevastatin belongs to the statins, which inhibit HMG-CoA reductase and are widely prescribed as cholesterol-lowering agents (Sørensen et al., 1999).
Across the fungal kingdom, however, the products of bioinformatically discovered SM gene clusters remain largely unknown. Some clusters are functional but expressed only under specific conditions (e.g., during competition for nutrients or under stress), and not under routine laboratory conditions (Tannous et al., 2014; Li et al., 2015). In light of our findings, it remains unexplored whether the large number of SM gene clusters detected in the P. solitum genome translates into a greater capacity to produce diverse secondary metabolites on apple, and whether any of these metabolites might pose hitherto undiscovered risks to human health. Further characterization of the identified gene clusters, for example through functional gene deletion studies, will improve our understanding of SM gene cluster function and the mechanisms of secondary metabolite biosynthesis.

CONCLUSIONS
Comparison of the genomes of P. expansum and P. solitum has provided a unique opportunity to explore the genetic basis of the differential ability of these two species to cause blue mold decay. Key genes identified here can now be analyzed functionally to confirm their involvement in apple fruit decay, and fungal gene networks, pathways, and regulators can be exploited to design specific control strategies that block decay. Our comparative genomic data have also clarified the phylogenetic relationships among Penicillium species that occupy different ecological niches. In addition, the SM gene cluster repertoires of both P. expansum and P. solitum have been characterized, providing evidence that explains the lack of patulin production in P. solitum RS1 while also revealing the potential for discovering hitherto unknown secondary metabolites of possible biotechnological use. In agriculture, these genomic findings will help guide apple fruit producers in maintaining safe, high-quality apples during long-term storage, and they should also be of interest to the food processing industry, plant pathologists, and the broader scientific community.