Prediction of the Diplocarpon rosae secretome reveals candidate genes for effectors and virulence factors

Rose black spot is one of the most severe diseases of field-grown roses. Though R-genes have been characterised, little is known about the molecular details of the interaction between pathogen and host. Based on the recently published genome sequence of the black spot fungus, we analysed gene models with various bioinformatic tools utilising the expression data of infected host tissues, which led to the prediction of 827 secreted proteins. A significant proportion of the predicted secretome comprises enzymes for the degradation of cell wall components, several of which were highly expressed during the first infection stages. As the secretome comprises major factors determining the ability of the fungus to colonise its host, we focused our further analyses on predicted effector candidates. In total, 52 sequences of 251 effector candidates matched several bioinformatic criteria of effectors, contained a Y/F/WxC motif


Introduction
Black spot disease is one of the most severe and damaging diseases of field-grown roses. It is caused by the hemibiotrophic ascomycete Diplocarpon rosae (anamorph: Marssonina rosae). Infections with this pathogen lead to chlorotic and necrotic spots on the leaves, drastically reducing the ornamental value of these plants. In severe infections, the disease can even lead to the defoliation or death of highly susceptible genotypes. The fungus spreads mainly through asexual conidia by splash water or direct contact (Horst and Cloyd, 2007). After the germination of bicellular conidia, the fungus penetrates the cuticle via an appressorium within the first 12 h post-inoculation (hpi) and develops intercellular haustoria to extract nutrients from the plant. This process takes two to three days and marks the early biotrophic stage of the pathogen, which is followed by a mixed biotrophic/necrotrophic stage that leads to some tissue damage and results (after five to seven days) in the development of acervuli, where new conidia are formed (Aronescu, 1934; Frick, 1943; Gachomo and Kotchoni, 2007).
Due to the high economic importance of roses and the severe damage the fungus causes to them, the interaction of this particular fungus with cultivated roses is one of the best studied (Debener and Byrne, 2014). Many single spore isolates are available, and various authors have used different sets of host plants to differentiate up to 11 pathogenic races (Debener et al., 1998; Whitaker et al., 2007a; Whitaker et al., 2010b). The genetic diversity of D. rosae has also been studied with different types of molecular markers (Werlemark et al., 2006; Whitaker et al., 2007b; Münnekhoff et al., 2017), and its interaction with its host has been studied by histological and biochemical methods (Blechert and Debener, 2005; Gachomo et al., 2006, 2010). The main research goal on the host side is to gain resistance against D. rosae, and several R-loci (resistance loci) are already characterised (von Malek and Debener, 1998; Hattendorf et al., 2004; Whitaker et al., 2010a). The best-studied locus is Rdr1, which mediates broad-spectrum resistance against different races and is encoded by a TNL-type resistance gene that recognises an effector of D. rosae, which leads to a defence response (Kaufmann et al., 2003; Terefe-Ayana et al., 2011; Menz et al., 2017).
The fungal secretome is defined as the entirety of proteins that are secreted outside the plasma membrane of a cell (Girard et al., 2013; Meinken et al., 2016). This definition differs slightly from the original definition made for Bacillus subtilis, where the secretory pathway machinery was also included in the secretome (Tjalsma et al., 2000). Most secreted proteins carry a so-called signal peptide (SP), a signal sequence at the N-terminus containing three parts: a positively charged n-region, a neutral or polar c-region and, between the n- and c-regions, a hydrophobic region of approximately 5-16 amino acids with a tendency to form an alpha helix (h-region) (Heijne, 1990). This SP leads to the translocation of proteins by the conventional secretory pathway through the endoplasmic reticulum and the Golgi compartment. This pathway allows proteins to reach the extracellular space, be incorporated into the plasma membrane or be targeted to an intracellular compartment such as the vacuole (Conesa et al., 2001; Shoji et al., 2008; Kubicek et al., 2014). Not all secreted proteins contain a SP. Unconventional protein secretion pathways exist, translocating the proteins by either vesicles of an origin other than the Golgi compartment or direct translocation via transporter proteins (for more details, see the following reviews: Nickel, 2010; Ding et al., 2012; Girard et al., 2013; Rodrigues et al., 2013). The main function of extracellular proteins is to interact with the environment of a fungus, i.e., breaking down and acquiring nutrients, and to interact with other organisms, which is of particular importance for the interaction of a pathogen with its host (McCotter et al., 2016; Krijger et al., 2014).
The most important class of secreted proteins for the interaction of a pathogen with its host are effectors, which are secreted into the cytoplasm of the host or the apoplast to suppress the host's immune response and manipulate its metabolism. In necrotrophic and hemibiotrophic fungi, effectors can also have toxic functions (Ciuffetti et al., 2010; Bent and Mackey, 2007; Lo Presti et al., 2015; Sperschneider et al., 2015a). Most of the known effectors interact with the plant immune system either as a virulence (vir) factor or an avirulence (avr) factor. The plant immune response is initially triggered by a pathogen-associated molecular pattern (PAMP, for fungi usually chitin) on the cell surface. These PAMPs are recognised by membrane-localised receptors, leading to a defence response called PAMP-triggered immunity (PTI). For example, the apoplastic LysM domain-containing effector Slp1 of Magnaporthe oryzae is required for virulence: it binds fungal chitin oligosaccharides, and this binding suppresses their detection by plant receptors, which would otherwise lead to a chitin-induced PTI reaction (Mentlak et al., 2012). A similar mechanism was also shown for the Ecp6 effector of the ascomycete Cladosporium fulvum (Jonge et al., 2010). A mechanism other than PTI, called effector-triggered immunity (ETI), involves the recognition of the secreted effectors by resistance-mediating receptor proteins (R-proteins), which trigger a strong defence response. In this case, the effector has an avirulence function. Most known effectors are characterized by their avr function; for example, the Avr2 protein of C. fulvum is secreted from the fungus to inhibit the tomato cysteine protease Rcr3. Such an interaction is detected by the plant's Cf receptor, which leads to an ETI reaction in the form of a hypersensitive response mediating resistance (Rooney et al., 2005; de Wit, 2016). This co-evolution between the host and the pathogen leads to the formation of pathogenic races that differ in their effector repertory. These differences are due to the loss or modification of effectors, which allows the pathogen to avoid detection by the corresponding arsenal of R-genes in the host plant. Meanwhile, the modification of the effector repertory exerts selection pressure on the host for the formation of new R-gene variants (Bent and Mackey, 2007). This tight co-evolution between host and pathogen is one reason why the prediction of effectors in genomic sequences is challenging. Because a pathogen adapts to one particular host, effectors rarely share sequence similarity with each other, but they do share some structural similarities. Effectors are often small, cysteine-rich, secreted proteins that are induced during infection (Lo Presti et al., 2015; Sperschneider et al., 2015a, 2016).
Even though effectors often lack sequence similarity, a few motifs are shared by particular classes of effectors. An RxLR motif, which is needed for translocation into the plant cell, is very commonly found after the SP in effectors of oomycetes and is often used for their identification in newly sequenced genomes (Whisson et al., 2007; Anderson et al., 2015). Unfortunately, no such highly conserved motif has been identified in fungi. Only the Y/F/WxC motif, previously found in the barley powdery mildew fungus, was also reported in effectors of rust fungi but exhibited less positional conservation (Duplessis et al., 2011; Godfrey et al., 2010).
In this study, we use the newly sequenced draft genome of the D. rosae isolate DortE4 for an in silico prediction and analysis of its secretome with the goal of identifying possible key virulence factors. Based on this analysis, we predict effector candidates and attempt to define a set of sequences that will serve as the starting point for further analysis, with the overall goal being to identify new virulence and avirulence factors that can aid in understanding the broad-spectrum resistance mediated by the Rdr1 locus and to identify new R-genes.

Sequence information
The gene models of the draft genome sequence (NCBI accession number: MVNX00000000) (Neu et al., 2017) of the isolate DortE4 were used for the prediction of the secretome and effector candidates.

Prediction of the secretome
The secretome was predicted in a manner similar to the guidelines of the Fungal Secretome KnowledgeBase (FunSecKB) (Meinken et al., 2016). For the prediction of the SP, the standalone version of SignalP 4.1 (Organism group: Eukaryotes, D-cutoff values: default [SignalP-TM networks: 0.51, SignalP-noTM networks: 0.57], both SignalP-noTM and SignalP-TM were used) (Petersen et al., 2011) was used in combination with the Phobius web server (Kall et al., 2004). Sequences were predicted to contain a SP if both programs predicted the sequences to be secreted. To exclude membrane proteins, the TMHMM 2.0 web server (max. 1 PredHel within first 60 amino acids) (Krogh et al., 2001) was used in combination with transmembrane domain (TM) prediction by Phobius (no TM predicted). Because the helical structure of the SP is often predicted to be a part of a TM, a protein was still assumed to be secreted when a TM was predicted in the first 40 amino acids of the N-terminus. To exclude sequences that contain a SP but remain in the endoplasmic reticulum, all predicted secreted sequences were scanned for retention motifs from the PROSITE database (PS00014 ER_Targeting) with the ScanProsite web server (de Castro et al., 2006). Web versions of WolfPSort (organism type: fungi) (Horton et al., 2007) and TargetP 1.1 (Organism group: Plant; Cut-off: no-cutoff) (Emanuelsson et al., 2007) were used to predict the subcellular localisation of sequences, but this information was not used to disqualify sequences. The PredGPI prediction server (General model) (Pierleoni et al., 2008) was used to predict secreted proteins that contain a GPI anchor.
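The consensus rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scripts used in the study: the `Prediction` record, its field names and the function `is_secreted` are hypothetical, the tool outputs are assumed to have been parsed beforehand, and the regular expression is our transcription of the PROSITE ER-retention pattern, which should be checked against the PS00014 entry before use. The N-terminal window is a parameter because the text mentions both 60 and 40 amino acids.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumed regex form of the PROSITE ER-retention pattern (C-terminal KDEL-like motif).
# Verify against the PS00014 entry before relying on it.
ER_RETENTION = re.compile(r"[KRHQSA][DENQ]EL$")


@dataclass
class Prediction:
    """Hypothetical per-protein record holding already-parsed tool results."""
    sequence: str
    signalp_sp: bool                 # SignalP reports a signal peptide
    phobius_sp: bool                 # Phobius reports a signal peptide
    phobius_tm: int = 0              # number of TM helices reported by Phobius
    tmhmm_helix_starts: List[int] = field(default_factory=list)  # 1-based helix starts


def is_secreted(p: Prediction, n_term_window: int = 60) -> bool:
    """Apply the consensus filter: SP by both tools, no TM, no ER retention."""
    # Both signal-peptide predictors must agree.
    if not (p.signalp_sp and p.phobius_sp):
        return False
    # Phobius must not report any transmembrane helix.
    if p.phobius_tm > 0:
        return False
    # TMHMM: at most one helix, tolerated only near the N-terminus,
    # where the hydrophobic core of the SP is often mistaken for a TM.
    if len(p.tmhmm_helix_starts) > 1:
        return False
    if any(start > n_term_window for start in p.tmhmm_helix_starts):
        return False
    # Proteins carrying a C-terminal ER-retention motif are not secreted.
    if ER_RETENTION.search(p.sequence):
        return False
    return True
```

Localisation predictors such as WolfPSort and TargetP are deliberately left out of the filter, mirroring the statement that their results were recorded but not used to disqualify sequences.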

Functional annotation of secretome sequences
The functional annotation of the predicted secretome was extracted from the whole genome annotation.

Effector prediction pipeline
For the prediction of effector candidates, three different strategies were combined by manual comparison of the results of the different approaches. The basis for all analyses was the previously predicted secretome. CLC Genomics Workbench 9.0 (Qiagen, Hilden, Germany) was used to select sequences with a maximum length of 200 amino acids and a minimum cysteine content of 3 %. The standalone software EffectorP 1.0 (Sperschneider et al., 2016) was used in parallel to predict effector candidates. In a third approach, the web server of ScanProsite (de Castro et al., 2006) was used to scan for the user-defined motif [YFW]-x-[C]. A sequence was included in the set of effector candidates when this motif was found within the first 90 amino acids of the protein. In addition to these predictions, sequences matching known effectors from the PHI-base were also included in the set of effector candidates.
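The two sequence-based criteria (small and cysteine-rich; [YFW]-x-[C] within the first 90 residues) are simple enough to express directly. The following Python sketch is illustrative only: the function names are hypothetical, the study used CLC Genomics Workbench and ScanProsite rather than this code, and it is an assumption here that positions are counted on the full-length sequence (signal peptide included) and that the whole motif must lie inside the window.

```python
import re

# [YFW]-x-[C]: aromatic residue, any residue, cysteine.
YFWXC = re.compile(r"[YFW].C")


def is_small_cysteine_rich(seq: str, max_len: int = 200, min_cys_pct: float = 3.0) -> bool:
    """True if the protein is at most max_len residues long and at least min_cys_pct % cysteine."""
    if not seq or len(seq) > max_len:
        return False
    cys_pct = 100.0 * seq.count("C") / len(seq)
    return cys_pct >= min_cys_pct


def has_yfwxc_motif(seq: str, window: int = 90) -> bool:
    """True if a complete [YFW]-x-C motif lies within the first `window` residues."""
    return YFWXC.search(seq[:window]) is not None
```

For example, `is_small_cysteine_rich(seq) or has_yfwxc_motif(seq)` would flag a sequence that satisfies either criterion; EffectorP predictions would have to be added from that program's own output.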

Expression data
The analysed expression data were obtained from previously reported data (Neu et al., 2017). The raw reads are deposited in the Sequence Read Archive (SRA, SRX2494485-SRX2494496).

Analysis of duplicated effector candidates
The BLASTP algorithm (Altschul et al., 1997) was used to identify paralogs of the predicted effector candidates in the whole proteome. To calculate identity scores between the candidates and their best matching sequences, multiple global sequence alignments were performed with the MAFFT 7.3 tool (Katoh and Standley, 2013) using the FFT-NS-1 alignment algorithm, and the software package trimAl 1.2 was used to calculate the identity scores based on the alignment (Capella-Gutierrez et al., 2009).
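To make the notion of an identity score concrete, the sketch below computes percent identity for two rows of an existing alignment. It is a simplified stand-in, not trimAl's implementation: the function name is hypothetical, and the choice to ignore columns where both rows are gaps while counting residue-versus-gap columns as mismatches is an assumption that may differ from the definition trimAl applies.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned rows of equal length.

    Columns that are gaps in both rows are skipped; a residue opposite
    a gap counts as a mismatch (an assumed convention).
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```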

Prediction of the whole secretome
The 14,004 predicted genes of the draft genome sequence of the D. rosae isolate DortE4 were the basis for the prediction of the secretome. A combination of different programs was used in this prediction (Fig. 1). SignalP 4.1 (Petersen et al., 2011) and Phobius (Kall et al., 2004) were used for the prediction of SPs. Phobius, TMHMM 2.0 (Krogh et al., 2001) and ScanProsite (PS00014 ER_TARGET) were used to exclude membrane proteins and proteins that remain in the endoplasmic reticulum. Because TargetP uses the same algorithm for the prediction of SPs as SignalP and WolfPSort has a low sensitivity (Sperschneider et al., 2015b), these data are only reported as additional information and not as disqualifiers.
With this pipeline, 827 proteins were predicted to be secreted (Fig. 1, Supp. 1), which is 6 % of the whole predicted proteome of D. rosae. Zhu et al. (2012) predicted a similar fraction (5.97 %) of secreted proteins in the genome of Marssonina brunnea, which is closely related to D. rosae, by using a combination of SignalP 3.0 and TMHMM 2.0. Krijger et al. (2014) used a combination of SignalP 2.0, 3.0, TargetP and TMHMM in a comparative study, resulting in a mean value of 7.6 % for the fraction of proteins that are secreted from plant pathogenic fungi, which is slightly higher than the value found in D. rosae. However, the related fungi Botrytis cinerea and Sclerotinia sclerotiorum exhibited values of 5.6 % and 5.5 %, respectively, similar to the value of D. rosae. Different results were obtained by Sperschneider et al. (2015b) in a broad analysis of 48 fungal genomes, including nine from hemibiotrophic fungi, with SignalP 3.0, TargetP and TMHMM. None of these nine fungi secreted less than 8 % of their proteins.
Combining prediction programs increased specificity but decreased sensitivity, meaning that false positives were excluded but some true positives were lost due to the stringency of the prediction (Min, 2010). Due to this fact, we report the results of WolfPSort (cut-off 17) (Horton et al., 2007) and TargetP (Emanuelsson et al., 2007) as additional information for predictions of greater stringency. These programs predicted that 615 and 817 of the 827 proteins, respectively, are extracellularly localised (Fig. 1, Supp. 1). Combining all programs resulted in predicting the secretion of 605 of the 827 proteins previously predicted to be secreted. The different strategies of the programs that were combined suggest that this set of sequences might be the most accurate but also the smallest.
Fig. 1 (caption, partially recovered): To exclude proteins that remain in the endoplasmic reticulum, we scanned the predicted proteins with ScanProsite for an ER-targeting signal. WolfPSort and TargetP 1.1 provided additional information for a more stringent prediction but were not used to disqualify candidates. PredGPI was used to find proteins that are secreted but associated with the outer membrane by a GPI anchor. The functional annotation of the secretome was performed using Blast2GO and the dbCAN web server with the CAZy database for carbohydrate-active enzymes and the PHI-base.
PredGPI was used to predict GPI-anchored proteins and thus distinguish between those secreted proteins that become attached to the outer membrane and those that remain soluble. In total, 110 of the 827 secreted proteins were predicted to contain a GPI anchor, and the anchoring of 56 of them was classified as highly probable.
The computational prediction of secreted proteins should not be seen as a method that can identify all proteins of this class. In silico predictions produce false positives as well as false negatives. Proteins secreted by unconventional protein secretion pathways can be translocated without a SP (Rodrigues et al., 2013). On the other hand, some proteins containing a SP can enter the secretory pathway but never reach the extracellular space because they are targeted to other compartments such as the vacuole (Conesa et al., 2001). Experimental approaches such as proteomic analysis of the protein content of growth media or secretion trap assays are needed to validate these predictions and to identify proteins translocated by unconventional secretion pathways (Yang et al., 2012; Lee and Rose, 2012).

Annotation of the secretome
The Blast2GO annotation (Gotz et al., 2008) assigned functional descriptions based on the top 20 BLAST matches for 696 of the 827 (84.16 %) proteins of the secretome (Supp. 2), and 85 were annotated as hypothetical proteins, indicating that they are similar to other fungal proteins but that functional knowledge of them is lacking. The number of sequences with BLAST matches was comparatively small considering that 96.84 % of the whole proteome was matched with other fungal proteins.
In addition to the functional descriptions assigned based on the BLAST matches, GO terms (Harris et al., 2004) were assigned to 566 sequences, and InterProScan results were assigned to 538 (Supp. 2). This additional information was used for enrichment analyses using Fisher's exact test implemented in the Blast2GO software to compare the composition of the secretome with that of the whole proteome. Both analyses showed that hydrolases are overrepresented in the secretome. In the GO enrichment, 35.7 % of the annotated sequences were assigned the GO term "hydrolases," and most of these sequences were annotated as involved in the hydrolysis of glycosyl bonds and proteolysis (Supp. 2). The secretome also contained sequences involved in the degradation of major components of the plant cell wall. All sequences of the proteome that were assigned the GO term "cutinase activity" were part of the secretome. More than half of the sequences with the GO term "pectate lyase activity" were predicted to be secreted. In addition, the GO terms "cellulose binding" and "cellulose catabolic process" were overrepresented, indicating that cellulases are part of the secretome. The presence of enriched GO terms such as "fungal-type cell wall" or "chitin binding" also showed that the secretome contains proteins involved in the organization of the fungal cell wall.
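The logic of such an enrichment test can be illustrated with a one-sided Fisher's exact test, which for a 2x2 table reduces to a hypergeometric tail probability. The sketch below uses only the Python standard library; the function name and argument names are hypothetical, and the one-sided form is a simplification, since the exact settings and multiple-testing correction applied within Blast2GO are not described here.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test for over-representation.

    N: proteins in the whole proteome
    K: proteins in the proteome annotated with the GO term
    n: proteins in the secretome
    k: secretome proteins annotated with the GO term
    Returns P(X >= k) for X drawn from the hypergeometric distribution.
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # math.comb returns 0 when more items are requested than available,
    # so impossible table configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total
```

With the proteome and secretome sizes reported in this study (N = 14,004 and n = 827), a call such as `enrichment_p_value(k, 827, K, 14004)` would test a single GO term once its counts k and K are taken from the annotation; those counts are not restated here.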
This functional description of the D. rosae secretome shows strong similarities to the latest analysis of all secretomes registered at the FunSecKB (Meinken et al., 2016). A functional analysis of these data showed that a major part of the fungal secretome consists of hydrolases and that hydrolases involved in proteolysis and cell wall degradation are overrepresented, indicating that degradation of the plant cell wall is one of the major functions of the fungal secretome.

Expression of the secretome
Transcriptomic data of three early time points of the fungal infection (0, 24, 72 hpi) were used to generate an overview of the expressed portion of the secretome (Fig. 2, Supp. 3). In total, 650 (78.7 %) sequences of the secretome were expressed during at least one of the three time points, which is smaller than the percentage found for the whole proteome, where 88.5 % of all sequences were expressed. Despite that difference, the expression of the secretome shows the same tendencies as the whole genome. Most sequences are expressed at 72 hpi, and almost one-third of the sequences were only detected with the RNAseq technique. In contrast to the sequences that were exclusively expressed at 72 hpi, most sequences were expressed either at 24 and 72 hpi or at all three time points. Very few sequences were exclusively detected at 0 or 24 hpi. As discussed in previous work, the reasons for these patterns might be both technical and biological. The amount of fungal biomass increases over time, as does the number of sequenced reads originating from the fungus. Many genes might thus not be detected at the early time points, even if they are expressed. However, drastic changes that substantially affect the transcriptome also occur in the biology of the fungus during these stages of its development. During the first 24 h, the spores germinate and develop appressoria to penetrate the plant cell. Afterward, haustoria are formed to acquire nutrients and interact with the host cell. The fungus is in a phase of rapid growth during the development of these specific structures. All these processes are accompanied by transcriptional changes. A transcriptomic study of a S. sclerotiorum infection of Brassica napus demonstrated that between 2.6 and 8.8 % of all expressed fungal genes were up-regulated in the first 48 h (Seifbarghi et al., 2017).

Potential virulence factors
One of the crucial steps during the development of a fungal pathogen is its entry into the plant cell. Many enzymes involved in the breakdown of the plant cell wall are thus virulence factors. To better understand the capacity of the secretome to degrade the plant cell wall, we combined different sources of functional information such as Blast2GO functional descriptions, the assigned EC numbers, GO terms and InterPro IDs as well as the results of an annotation with the CAZy database, which was used to find plant cell wall-degrading enzymes (CWDEs) (Supp. 4). The results are summarised in Table 1. The secretome contains 111 enzymes that collectively target the degradation of all the main components of the plant cell wall as well as 17 lipases, indicating that the secretome plays a crucial role in decomposing the plant cell wall and in the consequent penetration process. Most of these enzymes are involved in the degradation of hemicellulose. Hemicellulose is a mixture of polysaccharides containing different sugars, including xylose, arabinose, galactose, mannose and rhamnose. Different enzymes are thus needed to break down this cell wall component. In addition to this mixture of enzymes, most of the remaining CWDEs are involved in the degradation of cellulose and pectin, which are the main components of the primary cell wall and the middle lamella. The secretome also contains 12 cutinases and only four callose- and lignin-degrading enzymes.
Members of all classes of CWDEs are expressed during the early stages of plant entry (Supp. 3 and 4). Almost all potential hemicellulose-, cellulose- and pectin-degrading enzymes are expressed during at least one of the analysed time points. Some of the pectin-degrading enzymes derive from the genes with the highest levels of expression at 24 and 72 hpi, indicating that these genes might be of special importance for this process. Only 6 of the 12 cutin-degrading enzymes and one of the four lignin-degrading enzymes were expressed, generally only at 72 hpi and with comparably low expression values. Brown et al. (2012) analysed the secretome of the hemibiotrophic pathogen Fusarium graminearum grown on different cereals in a manner similar to that presented here. The amounts of CWDEs affecting the different cell wall components were similar to those in the present data except that fewer cutinases and pectin/pectate-degrading enzymes were present in F. graminearum than in the black spot secretome, which is not surprising because lower levels of pectin-degrading enzymes are often reported for pathogens of monocots due to differences in cell wall composition (Kubicek et al., 2014).
A BLASTP search (E-value 1E-10) against the PHI-base (Winnenburg et al., 2006, 2008) was performed to more closely examine the pathogenic features of the secretome (Supp. 4). In total, 239 proteins of the predicted secretome matched sequences from the PHI-base, and 81 of these matches are CWDEs. Accessions in the database are classified into different categories according to the results of mutation experiments. The categories indicating virulence factors, which are the most relevant, are "loss of pathogenicity" and "reduced virulence". Only 18 sequences of the black spot secretome matched proteins classified with "loss of pathogenicity," and 115 of the sequences were associated with the class "reduced virulence".
Six of the 18 sequences with the classification "loss of pathogenicity" are involved in the degradation of the cuticle, and most of them were expressed. These sequences might be important virulence factors, similar to the cutinases of other plant pathogenic fungi (Kikot et al., 2009; Skamnioti and Gurr, 2007; Wang et al., 2017). In addition, two autophagy lipases matched a protein from M. oryzae (PHI:2081), and two EF-domain-containing proteins matched an EF-domain-containing protein from S. sclerotiorum (PHI:3936); both are required for appressorium formation and are crucial for entry of the pathogens into plants (Kershaw and Talbot, 2009; Xiao et al., 2014). These similarities hint that orthologs in D. rosae may also be involved in appressorium formation. The expression data support this hypothesis because the expression of three of these four sequences was detected at 24 hpi, when the first appressoria had already formed, but the expression values were too low to substantively indicate induction at this stage. Nevertheless, the described sequences are interesting starting points for the functional analysis of potential virulence factors.
In addition, the sequences of enzymes involved in the degradation of pectin, callose, hemicellulose and lignin were homologous to sequences in the category "reduced virulence" from the PHI-base. In addition to plant cell wall degradation, putative virulence factors are involved in various other processes such as fungal cell wall organisation, proteolysis and stress responses.

Prediction of effectors
The prediction of fungal effectors is a challenging task since effectors of different species share very little sequence similarity, which is due to the co-evolution of a pathogen and its host. This is why prediction approaches normally use the structural characteristics of a protein or a conserved sequence motif as an indicator for potential effector proteins. However, the definition of these characteristics is not consistent within the scientific community, and not all effector proteins share all of these characteristics. For this reason, we combined three different approaches in our prediction pipeline. Fig. 3 gives an overview of the prediction pipeline and its results. Because many known effectors belong to the class of small secreted cysteine-rich proteins (SSP), we used the characteristics of this class as indicators for effector proteins.
The definition of these characteristics differs depending on the author, sometimes quite significantly. Saunders et al. (2012), for example, used 150 amino acids as a cut-off for small proteins and defined those with more than 3 % cysteine as enriched. In contrast, other authors used different cut-offs between 200 and 400 amino acids (Rep, 2005; Bowen et al., 2009; Hacquard et al., 2012) to define a protein as small. Here, we used a definition similar to that used by Saunders et al. (2012), specifically a maximal protein size of 200 amino acids and a minimum cysteine content of 3 %.
In total, 114 secreted proteins fulfilled these criteria. However, some known effectors do not fulfil these criteria (Sperschneider et al., 2015a). To include other criteria, we used the prediction program EffectorP (Sperschneider et al., 2016), which uses a machine learning approach to build a model for a prediction based on a variety of features that discriminate a set of known effector proteins from other secreted proteins. This program predicted 171 potential effector proteins. We also used an approach based on the sequence motif Y/F/WxC, which was first discovered in the barley powdery mildew fungus Blumeria graminis (Godfrey et al., 2010) and later in a less-conserved position in rust fungi (Duplessis et al., 2011; Saunders et al., 2012). We identified 122 proteins carrying this motif in their N-terminal regions in the secretome.
With the described pipeline, a total of 244 effector candidates in the secretome of D. rosae were predicted by at least one of the three approaches, while 52 were predicted by all three approaches (Fig. 3, Supp. 5), making these 52 predictions the most reliable and the best subjects for initial detailed analyses. This set is of particular interest because of the occurrence of the Y/F/WxC motif, indicating that there is a group of effectors shared by fungi of different lifestyles. So far, the motif has only been reported for obligate biotrophic fungi, whereas D. rosae has a hemibiotrophic lifestyle. These effectors might therefore function in its biotrophic stage, in which it forms haustoria. The results of EffectorP and of the SSP criteria overlap greatly, which might be due to the fact that several features EffectorP uses for model building are closely related to the criteria defined for SSPs. The most discriminative features used by EffectorP are the molecular weight, sequence length, protein net charge and percentages of cysteines, serines and tryptophans in the sequence (Sperschneider et al., 2016). In addition, the majority of the proteins that EffectorP uses as a training dataset were identified by using the characteristics of SSPs. It is therefore not surprising that almost all potential SSPs were included in the EffectorP prediction results. However, the program predicted 72 additional effector candidates. Almost the same number of additional candidates were identified by searching for the Y/F/WxC motif in the N-terminal regions of proteins, but only a small fraction of these candidates were also predicted with one of the two other approaches. Only 12 were also predicted by EffectorP or shared characteristics of SSPs.
In addition to the sequences identified by these prediction approaches, seven sequences from the secretome matched known effectors in the PHI-base (Supp. 4), but none of them were predicted using any of the applied prediction approaches. Nevertheless, a BLAST match is a valuable hint for the function of a protein, so these sequences were included in the set of effector candidates. With these seven additional candidates, the total number of effector candidates was 251.
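The way the three prediction sets and the PHI-base matches are combined corresponds to plain set operations: the union of the three approaches gives the candidates predicted by at least one of them (244 in this study), their intersection gives the candidates supported by all three (52), and adding the PHI-base matches extends the union (251). A minimal Python sketch follows; the function name and the identifiers in the usage example are made up for illustration and are not taken from the supplementary data.

```python
from typing import Dict, Set


def combine_candidates(
    ssp: Set[str],
    effectorp: Set[str],
    motif: Set[str],
    phi_hits: Set[str],
) -> Dict[str, Set[str]]:
    """Combine per-approach candidate ID sets into the summary sets."""
    any_approach = ssp | effectorp | motif   # predicted by at least one approach
    all_three = ssp & effectorp & motif      # predicted by every approach
    final_set = any_approach | phi_hits      # plus matches to known effectors
    return {"any": any_approach, "all_three": all_three, "final": final_set}


# Toy usage with made-up identifiers:
result = combine_candidates(
    ssp={"g1", "g2", "g3"},
    effectorp={"g2", "g3", "g4"},
    motif={"g3", "g5"},
    phi_hits={"g6"},
)
print(sorted(result["all_three"]))  # ['g3']
print(len(result["final"]))         # 6
```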

Annotation of the effector candidates
The BLAST annotation did not yield any matches for 108 of the 251 effector candidates, indicating that these sequences are specific to the black spot genome. This is not surprising because the evolution of effectors is tightly linked to the evolution of the host immune response. Every pathogen thus needs a specific set of effectors that are highly specialized for interactions with its host plant. The absence of BLAST hits is sometimes taken as a prediction parameter for possible effector proteins (Syme et al., 2013; Bowen et al., 2009), indicating that these unannotated sequences might be the most interesting sequences for further analysis. In addition to the candidates without BLAST matches, 40 sequences match those of proteins of other fungi that were not functionally annotated ("hypothetical protein") or had annotations with little functional information such as "small secreted protein", "extracellular serine-rich protein" or "signal peptide-containing protein". The majority of the 52 candidates predicted by all three prediction approaches were not matched to other fungal proteins by BLAST annotation. As mentioned before, this result is a hint that these proteins are highly specific for D. rosae, supporting the value of this dataset as a promising starting point for further analyses.
Nevertheless, 108 sequences are functionally annotated. Many of them are involved in cell wall degradation or cell adhesion, indicating that these proteins might be virulence factors but not effectors in the sense that they are involved in interactions with the plant immune response. The three candidates containing a chitin-binding domain might be of particular interest because the interaction of effectors with chitin to suppress the chitin-induced defence response is a mechanism used by known effector proteins such as Avr4 or Ecp6 (van den Burg et al., 2006; Jonge et al., 2010). Surprisingly, none of these chitin-binding proteins contains a LysM domain, which is found in the Ecp6 effector of C. fulvum and effector candidates of the related species M. brunnea (Jiang et al., 2014). Additional candidates were found by a BLAST search against the PHI-base. Four of them were annotated as serine endopeptidases (DR_004344, DR_012785, DR_004799, and DR_006904), which is consistent with their PHI-base matches, GIP1 (PHI:652) and GIP2 (PHI:653) of the oomycete Phytophthora sojae (Rose et al., 2002). These proteins are inactive serine endopeptidases acting as inhibitors of the endo-beta-1,3-glucanases of the host to suppress its elicitor-mediated defence response, which is normally triggered by glucan oligosaccharides from the fungal cell wall. The corresponding sequences in the black spot secretome might function similarly in the interaction of the fungus with its host. Two other effector candidates shared similarity with an effector of the PHI-base. DR_008872 and DR_010809 were both annotated as ricin B lectins and were matched by BLAST to the apoplastic effector MoCDIP1 (PHI:3213) of M. oryzae, which induces cell death (Chen et al., 2013). Because both pathogens have a hemibiotrophic lifestyle, such a cell death-inducing gene might have an important role during the necrotrophic stage of the pathogens.

Expression of effector candidates
Effector genes are often induced during the interaction with plants. The presented expression data are a useful tool to differentiate which candidates are the most promising for further analysis.
In total, 204 of the 251 effector candidates were expressed during at least one of the three time points (0, 24, and 72 hpi) in the early stages of infection (Supp. 3). Only one of the 52 effector candidates that had been predicted with all three approaches was not detected in the transcriptomic data. The majority of effector candidates were not detected at 0 hpi. As discussed earlier, this might be due to a lack of biomass at this time point. Nevertheless, many candidates show outstanding expression values at 24 or 72 hpi. Fifteen effector candidates are among the 100 most strongly expressed sequences at 24 hpi, and six are among those at 72 hpi. All of these highly expressed candidates lack any functional annotation, and eight of them belong to the 52 sequences predicted by all approaches (Supp. 5). Of particular interest are the expression values of the candidates DR_002828, DR_003618, DR_009552, DR_006285, and DR_003215 (Table 2). DR_002828 is the most strongly upregulated effector candidate at 24 hpi and has the fourth highest expression level in the whole proteome at this time point. Its expression value drops drastically at 72 hpi, indicating that it might be more important at the beginning of the interaction.
At 72 hpi, the effector candidates DR_006285, DR_003215, DR_003618 and DR_009552 are among the most strongly expressed genes in the whole D. rosae proteome in both the MACE and the RNAseq data. All of them have a log2 fold change of more than 3.5 in comparison to 0 hpi. Additionally, all of them are among the 100 most strongly expressed genes at 24 hpi. These high expression levels might indicate an induction of these genes in haustoria, which start their development at 24 hpi, making them the most promising candidates for further analysis.
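As a reading aid for these values, a log2 fold change of 3.5 corresponds to a roughly 11-fold increase (2^3.5 ≈ 11.3). The sketch below shows one common way to compute such a value from normalised counts. The pseudocount, which avoids division by zero when a gene is not detected at 0 hpi, is an assumption made here and is not stated in the text; the function name is hypothetical.

```python
import math


def log2_fold_change(later, baseline, pseudocount=1.0):
    """log2 ratio of two normalised expression values (e.g. TPM).

    The pseudocount is an illustrative assumption so that undetected
    genes (value 0) do not cause a division by zero.
    """
    return math.log2((later + pseudocount) / (baseline + pseudocount))


# Fold increase implied by the threshold of 3.5 used in the text.
fold_at_threshold = 2 ** 3.5
```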
In addition to these candidates, the two ricin-B-like lectins DR_008872 and DR_010809, which match the effector MoCDIP1 in the PHI-base, are of interest. Their expression values are less remarkable than those of the other mentioned candidates, but in combination with their sequence similarity to a known effector, they might be further interesting targets for functional analysis.

Influence of genome duplication
In previous analyses of the D. rosae genome, we found that a large portion of the genome had been duplicated (Neu et al., 2017). Because gene duplication and diversification is a typical mechanism in the evolution of effectors (Stergiopoulos et al., 2012; Lo Presti et al., 2015; Selin et al., 2016), we analysed the degree of gene duplication within the set of effector candidates.
A BLASTP search of all effector candidates against the whole proteome resulted in 223 (88.8 %) sequences with at least one match (Supp. 6). Multiple global sequence alignments allowed us to calculate the sequence identities shared by the effector candidates and their potential paralogs. Only 49 % of the predicted proteins and 46 % of their corresponding mRNA sequences share more than 70 % identity with their most similar match. This proportion is surprisingly low compared to that of the duplicated benchmarking universal single-copy orthologs (BUSCOs) that were used for the analysis of genome duplication, of which more than 84 % shared at least 70 % identity. This result indicates that diversity within duplicated effector genes is much higher than that in other classes of proteins, which is a hint that diversification is already in progress. That duplication and diversification is one of the mechanisms shaping the effector composition of a pathogen is also shown by the fact that 65 % of the effector candidates share the most identity with other candidates and that an additional 10.7 % are best matched with another protein of the secretome that might be an effector but does not match the previously defined criteria. Some of the candidates that are of particular interest due to their induction during penetration and haustorium development or their similarity to known effectors also occur in pairs. DR_006285 and DR_003215 share approximately 70 % identity. The sequences of DR_008872 and DR_010809 match that of the MoCDIP1 effector, DR_012785 and DR_004799 match the GIP1 effector, and DR_006904 and DR_004344 match the GIP2 effector; the members of each of these gene pairs share 85.9 % to 97.7 % identity.
On the other hand, DR_009552 and the other 27 candidates that do not match any other sequences in the proteome, even with BLAST, are quite interesting because this lack of matching might indicate a loss of one of the duplicates, which we have already observed for some of the analysed BUSCOs. This loss differed between different isolates of D. rosae (Neu et al., 2017). A comparison of duplicated effector candidates in different isolates can provide more insight into the evolution of these effectors and, if duplication is involved, into the formation of pathogenic races.
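The identity values discussed above depend on how identity is counted over an alignment. A minimal sketch of one possible definition follows: identical columns divided by all alignment columns, with gap columns counted in the denominator. This definition and the function name are assumptions for illustration; the alignment software used in the study may count differently.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two sequences taken from a global alignment.

    Both inputs must have equal length, with '-' marking gaps.
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```

With such a function, a candidate and its best match would be counted as a highly similar pair when the returned value exceeds 70.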

Conclusion
The secretome is the main interface for the interaction of a pathogen with its host and includes the most important virulence factors and effectors. We combined different approaches and used the newly published draft genome sequence of D. rosae isolate DortE4 to predict the secretion of 827 proteins, including many enzymes involved in the degradation of all major components of the plant cell wall. Some pectin-degrading enzymes were noticeable due to their high expression levels.
The most important class of secreted proteins are the effector proteins, which are secreted from a pathogen into the apoplast or the cytoplasm of the host cell to influence the defence response of the plant. We identified 251 effector protein candidates. A subset of 52 candidates is of particular interest because these proteins were predicted by the EffectorP software, share all the characteristics of SSPs and contain a Y/F/WxC motif in their N-terminal region. The occurrence of this sequence motif is particularly striking because so far it has only been reported in obligate biotrophic fungi (Duplessis et al., 2011; Godfrey et al., 2010), indicating that this class of effector proteins is shared by biotrophic and hemibiotrophic fungi. Further analysis of this set of effector candidates could validate whether this motif can be used as an identifier in fungi, similar to the RxLR motif of oomycetes. Furthermore, we pointed out 11 additional candidates, five of which showed an outstandingly strong transcriptional induction during penetration and haustorium formation and six of which share sequence similarity with known effector proteins of other fungi. Extensive analysis of roses with the sequenced black spot isolate DortE4 revealed resistant genotypes with R-genes other than Rdr1 (unpublished data). The presented set of effector candidates can now be used to screen for new R-genes and to find the avirulence factors recognised by the Rdr1 gene and other R-genes by techniques such as knock-outs, transient expression, interaction analysis and comparisons of the effector content of different D. rosae isolates (approaches reviewed in Dalio et al., 2018).

Fig. 1. Secretome prediction pipeline. Proteins were classified as secreted when SignalP 4.1 and Phobius predicted a signal peptide (SP) and no transmembrane domain (TM) was predicted by Phobius or TMHMM to be past the first 40 amino acids of the protein. To exclude proteins that remain in the endoplasmic reticulum, we scanned the predicted proteins with ScanProsite for an ER-targeting signal. WolfPSort and TargetP 1.1 provided additional information for a more stringent prediction but were not used to disqualify candidates. PredGPI was used to find proteins that are secreted but associated with the outer membrane by a GPI anchor. The functional annotation of the secretome was performed using Blast2GO and the dbCAN web server with the CAZy database for carbohydrate-active enzymes and the PHI-base.
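The decision logic of this pipeline can be sketched as a simple filter over per-protein tool results. The record type, its field names and the reading of "past the first 40 amino acids" as "a predicted TM region starting after position 40" are assumptions made for illustration; the sketch does not call the tools named in the caption and does not parse their output.

```python
from dataclasses import dataclass, field


@dataclass
class ToolResults:
    """Hypothetical per-protein summary of already parsed tool outputs."""
    signalp_sp: bool                 # signal peptide predicted by SignalP
    phobius_sp: bool                 # signal peptide predicted by Phobius
    phobius_tm_starts: list = field(default_factory=list)  # TM start positions
    tmhmm_tm_starts: list = field(default_factory=list)    # TM start positions
    er_targeting_signal: bool = False  # ER-targeting signal found
    gpi_anchor: bool = False           # GPI anchor predicted


def classify(r, sp_region=40):
    """Apply the caption's rules in order and return a label."""
    if not (r.signalp_sp and r.phobius_sp):
        return "not secreted"
    tm_starts = r.phobius_tm_starts + r.tmhmm_tm_starts
    # TM predictions inside the first residues may be the signal peptide itself.
    if any(start > sp_region for start in tm_starts):
        return "not secreted"
    if r.er_targeting_signal:
        return "not secreted"
    return "secreted, GPI-anchored" if r.gpi_anchor else "secreted"
```

Keeping the GPI check as a label rather than an exclusion mirrors the caption, which treats GPI-anchored proteins as secreted but membrane-associated.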

Fig. 2. Venn diagram of the expression of the secretome sequences during the early stages of infection (0, 24, and 72 hpi). Data were generated by means of the massive analysis of cDNA ends (MACE) and conventional RNAseq.

Fig. 3. Prediction pipeline for effector proteins of D. rosae. Three approaches were combined for the prediction of effector candidates: characteristics of SSPs, prediction with EffectorP software and the presence of a Y/F/WxC motif at the N-terminal region of a protein. In addition to these approaches, a BLASTX against sequences of known effectors from the PHI-base was performed.

Table 1
Overview of the number of CWDEs in the secretome.

Table 2
Expression data of the most interesting effector candidates. The table contains the normalized read counts (TPM = tags per million) of the MACE data generated for the time points 0, 24 and 72 hpi, as well as the log2 fold changes derived from these expression data. Additionally, the last column contains RPKM-normalized expression data for the 72 hpi time point generated with the RNAseq approach.