Infection mechanisms and putative effector repertoire of the mosquito pathogenic oomycete Pythium guiyangense uncovered by genomic analysis

Pythium guiyangense, an oomycete from a genus of mostly plant pathogens, is an effective biological control agent that has wide potential to manage diverse mosquitoes. However, its mosquito-killing mechanisms are almost unknown. In this study, we observed that P. guiyangense could utilize cuticle penetration and ingestion of mycelia into the digestive system to infect mosquito larvae. To explore pathogenic mechanisms, a high-quality genome sequence with 239 contigs and an N50 contig length of 1,009 kb was generated. The genome assembly is approximately 110 Mb, which is almost twice the size of other sequenced Pythium genomes. Further genome analysis suggests that P. guiyangense may arise from a hybridization of two related but distinct parental species. Phylogenetic analysis demonstrated that P. guiyangense likely evolved from common ancestors shared with plant pathogens. Comparative genome analysis coupled with transcriptome sequencing data suggested that P. guiyangense may employ multiple virulence mechanisms to infect mosquitoes, including secreted proteases and kazal-type protease inhibitors. It also shares intracellular Crinkler (CRN) effectors used by plant pathogenic oomycetes to facilitate the colonization of plant hosts. Our experimental evidence demonstrates that CRN effectors of P. guiyangense can be toxic to insect cells. The infection mechanisms and putative virulence effectors of P. guiyangense uncovered by this study provide the basis to develop improved mosquito control strategies. These data also provide useful knowledge on host adaptation and evolution of the entomopathogenic lifestyle within the oomycete lineage. A deeper understanding of the biology of P. guiyangense effectors might also be useful for management of other important agricultural pests.


Introduction
Mosquitoes are a major threat to global health since they are vectors of numerous devastating diseases, including malaria, dengue fever, Zika virus and other arboviruses, which together result in hundreds of millions of cases and several million deaths annually [1]. Existing commonly used control methods for reducing disease rely on the application of residual synthetic pesticides. However, intensive and repeated use of pesticides leads to ongoing development of resistance, environmental pollution and toxicity to human and non-target organisms [2]. Control strategies utilizing naturally occurring microbial pathogens have therefore emerged as a promising alternative. A particular focus has been on biological control agents [2]. Among them, the oomycete Lagenidium giganteum and the fungal pathogens, Beauveria bassiana and Metarhizum anisopliae, are well characterized, promising agents for mosquito larvae control, and have been produced commercially for field tests [2][3][4]. However, so far, available agents for mosquito control are rather limited.
Recently, a new mosquito-pathogenic oomycete, Pythium guiyangense X.Q. Su was isolated from infected larvae of Aedes albopictus from Guizhou, China [5]. It is a virulent pathogen of a wide range of mosquito larvae and is safe to non-target organisms [6,7]. The pathogen also shows robust adaptability to a variety of natural environments and can be easily mass-produced [7]. All these properties of P. guiyangense make it of interest for practical applications as a potential mosquito control agent. However, little is known about the molecular mechanisms underlying the pathological processes on its mosquito hosts. It belongs to the genus Pythium (kingdom Stramenopila; phylum Oomycota, class Peronosporomycetes) [8]. Within the oomycetes, the genus Pythium is a genetically diverse group with a broad host range. For example, many Pythium species are important plant pathogens causing a variety of diseases [9,10]. On the other hand, P. undulatum can infect fish and P. insidiosum is a well-known pathogen that is capable of infecting human and other mammals [11]. To date, only P. guiyangense has been proposed for mosquito control [5].
The availability of genome sequences of a variety of pathogenic oomycetes, including Pythium species, provides a unique opportunity for comparative analysis between P. guiyangense and other oomycetes with respect to the evolution of pathogenicity. A number of genome sequences of plant pathogenic oomycetes are now available [12][13][14]. In addition, the genomes of mycoparasitic Pythium species that infect fungi, the human pathogen P. insidiosum, and the fish pathogens S. parasitica and S. diclina have also been sequenced [15,16]. Oomycetes secrete an arsenal of effectors into the host to manipulate the host immune system and enable parasitic infection [17]. These effectors have been a central question in the study of plant-oomycete interactions, including extracellular proteins such as toxins and hydrolases, and cell-entering proteins such as the RxLR (Arg-X-Leu-Arg) and Crinkler (CRN) effectors [18]. P. guiyangense may share conserved virulence effectors with other pathogenic oomycetes because of their close evolutionary relationships.
Here, we determined the infection cycle of P. guiyangense and demonstrated an unusual process, in which mycelia were devoured by the mosquito larvae and the mycelia inside digestive system could effectively initialize infection. Then we produced a high-quality genome sequence assembly, which represents the first draft genome from an insect pathogenic oomycete. Based on the transcriptome, effector prediction, and comparative genome analyses with other Pythium species, we investigated its insect-killing mechanisms and finally identified several cytoplasmic effectors having virulence functions in insect cells, thus expanding the roles of oomycete effectors.

Two invasion pathways accelerate mosquito larvae mortality
The P. guiyangense isolate was reported to be highly virulent on mosquito larvae, but infection was not quantitated [5,19]. We created a quantitative virulence assay by inoculating early second-instar larvae of Aedes albopictus or Culex pipiens pallens with P. guiyangense mycelia. The cumulative survival curves revealed that daily survival of A. albopictus and Cx. pipiens pallens larvae quickly declined from 3~4 days post-inoculation (dpi) onwards, reaching 76% mortality for A. albopictus and 69% for Cx. pipiens pallens by 10 dpi (S1A Fig). A. albopictus larvae died faster than Cx. pipiens pallens larvae, reaching 50% survival by day 6 compared to day 8 (S1A Fig). Furthermore, P. guiyangense could infect all the tested stages of Cx. pipiens pallens, including eggs, larvae, pupae and adults, resulting a visible accumulation of mycelia by 3-4 dpi (S1 Video) and all tissues were fully covered by mycelia (S1B Fig). Together, the results confirmed that P. guiyangense is highly efficacious in killing all life stages of Aedes albopictus and Cx. pipiens pallens.
To investigate the infection process of P. guiyangense, early second-instar larvae of Cx. pipiens pallens were incubated with mycelia or swimming zoospores. Our results showed that zoospores attached to almost any part of the mosquito larval cuticle (Fig 1A showing zoospore attachments on the thorax and abdomen). Then, germination of the cysts occurred and appressorium-like swellings appeared at the tip of the germ tubes as visualized using scanning electron microscopy (SEM) (Fig 1B). Penetration hyphae emerging from the appressorium traversed the insect integument ( Fig 1C) and invaded the hemocoel of larvae. Eventually the mycelia filled the whole body, then emerged through the inner cuticle and formed sporangia ( Fig 1D). We also observed that Cx. pipiens pallens larvae readily ingested P. guiyangense mycelia even in an adequate food environment ( Fig 1E, S1C Fig, S2 Video). A thick section of a moribund larva visualized with SEM showed that, following feeding, the midgut was completely packed with mycelia that could initialize infection (Fig 1F). After fixation, embedding in paraffin, and sectioning, microscopic observations showed that the midgut epithelium, muscles, and connective tissues appeared disrupted in the P. guiyangense-infected larvae ( Fig  1G and 1H). After 48 hpi, infected larvae appeared almost devoid of internal organs or tissues and the whole body was permeated with mycelia ( Fig 1I). Thus invasion through the digestive tract is an effective route of infection by P. guiyangense. Taken together, these observations define two routes of invasion, namely infection through the exterior cuticle and through the digestive tract.

General features of the genome and transcriptome sequences
A high quality genome sequence of P. guiyangense was generated using a hybrid strategy that combined sequences from Pacific Biosciences long reads and Illumina short reads. The genome assembly indicated in an estimated P. guiyangense genome size of 110 Mb and annotation predicted 30,943 protein-coding genes ( Table 1). The assembled genome consisted of 239 contigs with an N50 contig length of 1,009 kb. To assess the completeness of the genome assembly, CEGMA analysis, which identifies orthologs of 248 ultra-conserved core eukaryotic genes (CEGs), was used to identify core genes in the P. guiyangense genome. The results revealed complete matches to 97.6% of CEGs and at least partial matches to 98.4% of CEGs within the P. guiyangense assembly; these results compared to only 78.2-94.4% of complete CEGs and 91.5-95.6% of partial CEGs in other Pythium genomes (S1 Table). Taken together, the comparison with other assembled Pythium genomes including P. insidiosum, P. ultimum, P. aphanidermatum, P. arrhenomanes, P. irregulare and P. iwayamai, revealed that the P. guiyangense assembly represented the best quality among the sequenced Pythium genomes so far.
Whole transcriptome sequencing (RNA-seq) was performed using Illumina sequencing of RNAs from P. guiyangense mycelia and from early second-instar larvae 24 hr after inoculation with P. guiyangense mycelia. Transcripts from each individual library matched approximately 69% of the genes, and together matched approximately 74% of the genes (S2 Table). A total of 3,354 genes (10.8%) were differentially expressed (>4-fold expression difference and statistical GFOLD value > 1 or < -1) between the two samples; 1,654 genes were up-regulated in the infection stage while 1,700 genes were down-regulated. Functional enrichment analysis revealed that genes encoding tyrosine kinase-like (TKL) kinases, subtilisin proteases, kazaltype protease inhibitors, and elicitin proteins, were over-represented among the differentially expressed genes (S3 Table). To validate the differentially expressed genes, we selected 18 genes that belonged to the above mentioned over-represented gene families and that were up-or down-regulated based on RNA-Seq data, and then measured their transcript levels by the qRT-PCR assay. The qRT-PCR results showed that the transcriptional patterns of 17 among the 18 genes were consistent with RNA-Seq results (S2 Fig), which further supported the general reliability of the RNA-Seq data.

Interspecies hybridization may contribute to the large size of the P. guiyangense genome
Our comparative genome analysis revealed that the genome size (110 Mb) and predicted gene number of P. guiyangense (30,943 genes) were approximately twice those of the other sequenced Pythium species (Table 1). To explore the potential mechanisms underlying such a large genome size, we initially analyzed the repetitive DNA content within the P. guiyangense genome and found that the repeat sequence content was 6%, similar to that of P. ultimum (7%) which has a genome size of only 43 Mb, thus excluding the possibility that high repeat content was responsible for the large genome size, which was observed in Ph. infestans [14]. Previously, hybridization between two parental species has been reported in yeast and Phytophthora evolution [20,21]. Most (84%) of the Pythium core genes (present in all the 7 Pythium genomes) were present in two copies in the P. guiyangense genome, consistent with a hybrid origin, which is similar to the yeast [20]. Typically, the copied core genes shared the highest sequence similarity with genes derived from the P. irregulare genome, however, the two copied genes were about 8% different in nucleotide sequence. To further confirm the hypothesis of hybridization, internal synteny is selected as a useful indicator, which could be used to evaluate hybridization between two parental genomes [22,23]. Therefore, we characterized internal synteny across the P. guiyangense genome using the MCScanX program [24]. In the whole genome, 468 conserved synteny blocks with an average size of 192 kb were identified (Fig 2A, S4 Table). These synteny blocks covered 74% of all the contigs, and together spanned 84% of the genome, suggesting that the P. guiyangense genome could be classified into two subgenomes.
To further compare the genetic relatedness of the two parental subgenomes, the average nucleotide identity (ANI) was calculated, and the ANI between the two subgenomes revealed approximately 91% identity. Notably, a total of 11,068 pairs of homologous genes were identified in these synteny blocks (Fig 2B, S4 Table), consistent with a hybrid origin. We estimated the rates of synonymous substitutions per synonymous site (Ks) of 11,068 pairs of homologous genes. This analysis showed a synonymous site divergence peak of Ks = 0.35 (Fig 2C), indicating that the two subgenomes were relatively diverse. Taken together, we inferred that P. guiyangense was a hybrid genome derived from two distinct parental species. To further investigate the parental species of P. guiyangense, we systematically analyzed CEGs in the 7 sequenced Pythium genomes. The majority of CEGs were present as two copies in the P. guiyangense genome but only one copy in each of the other Pythium genomes (Fig 2D  and 2E). A total of 167 CEGs that contained 2 copies in P. guiyangense and also had orthologs in other Pythium species were utilized for phylogenetic analysis. For each phylogenetic tree, the two copies of the P. guiyangense CEG always clustered together most closely, and then clustered with the orthologs from the other Pythium species (one tree based on the KOG1439 protein is shown in Fig 2F as an example), indicating that the parental species of P. guiyangense were not represented in the data set. In addition, cytochrome oxidase II (cox II) and β-tubulin genes, which have been widely used as phylogenetic maker genes, were available in 35 Pythium species and contained 2 copies in P. guiyangense. Phylogenetic analyses of the two genes showed that the two P. guiyangense orthologs were more similar to one another than the nearest known species (P. orthopogon) (S3A and S3B Fig), indicating that the parental species of P. guiyangense were not represented based on current information. We speculate that the parental species of P. guiyangense are more closely related to each other than to the known Pythium species.
In parallel with the genome analysis, we noticed that an unusual high percentage of P. guiyangense zoospores contained two nuclei rather than one (S3C Fig). Among 500 observed zoospores, nearly 22% of them contained two nuclei in P. guiyangense while in P. aphanidermatum and Ph. capsici, all the spores had only one nucleus (S3C Fig). We then found that the P. guiyangense zoospores containing only one nucleus could also breed similar percent of zoospores containing two nuclei, and PCR amplifications resulted in presence of both of the two copied genes. This observation suggested that P. guiyangense might be a dikaryon, and its relationship with the complex genome is still under investigation.

Evolutionary relationships and species-specific gene families
To establish the phylogenetic relationship of P. guiyangense among oomycetes, a phylogenetic tree was constructed based on 248 CEGs from P. guiyangense and other 12 oomycetes, with the diatoms as outgroups (Fig 3A). The tree clearly showed that P. guiyangense was clustered within the clade formed by the plant pathogenic Pythium species, and was distantly related to other genera, including Hyaloperonospora and Phytophthora. This phylogeny was consistent with that in previous publications [15,25]. These results imply that P. guiyangense, along with the mammalian pathogen P. insidiosum share common ancestors with the plant pathogenic Pythium species (Fig 3A).
To identify the genes responsible for host adaptation in P. guiyangense, the OrthoMCL tool was used to cluster the seven Pythium proteomes on the basis of protein sequence similarity. A total of 25,602 (83%) P. guiyangense genes had orthologs in other Pythium species. Among these, P. guiyangense shared 13,000 core genes with the other Pythium species. In addition, 5,341 genes were identified to be specific to P. guiyangense (S4 Fig). To gain insights into the features of species-specific genes in P. guiyangense, we compared the frequency of occurrence of protein family domains and identified highly over-represented domains included kinase (PF00433), kazal inhibitor (PF00050), elicitin (PF00964), and protease (PF02902) (Fig 3B, S5  Table). These gene families were also enriched among genes differentially expressed during infection.

Expanded protein kinases involved in the infection process
By searching with the HMM profiles of kinase domains derived from KinBase, 471 unique protein kinases (943 kinases in total) were identified in the P. guiyangense genome, greatly surpassing the numbers in plant pathogenic Pythium genomes, which range from 152 to 192 (Table 2). Intriguingly, other two animal pathogenic oomycetes, P. insidiosum and S. parasitica, also have expanded kinomes, coding for 286 and 538 kinases, respectively [15,16]. We further classified the kinases into 9 families defined by Hanks and Hunter [26]. Five families, including TKL (tyrosine kinase-like), CAMK (calcium/calmodulin-dependent kinase), CMGC  [including cyclin-dependent kinases (CDKs), mitogen-activated protein kinases (MAP kinases), glycogen synthase kinases (GSK) and CDK-like kinases], AGC (cAMP-dependent, cGMP-dependent and protein kinase C) and "other" were noticeably expanded in P. guiyangense ( Fig 4A). Among the 5 expanded families, a total of 220 unique TKL genes were identified in P. guiyangense kinome. A comparison of the locations of TKL genes in the P. guiyangense and P. ultimum genomes revealed extensive rearrangements, which resulted from species-specific expansions at the locations of these genes ( Fig 4B). Forty-six unique Infection mechanisms uncovered by Pythium guiyangense genome kinases belonging to AGC family and 50 unique members of the CAMK family were identified from P. guiyangense. Based on the RNA-Seq data, a total of 92 kinase genes were differentially expressed at the infection stage, including 52 TKL kinase genes (S6 Table). These results suggest that many of the protein kinases may be involved in regulation of infection processes and adaptation to the mosquito hosts.

A large number of proteases may facilitate cuticle penetration
Since major structural and physiological differences were observed between plant cell walls and insect cuticles, we compared the repertoire of plant cell wall and cuticle degrading enzymes encoded in the P. guiyangense genome to other oomycete genomes. Several groups of plant cell wall degrading enzymes, such as GH53, GH78, CE5, GH10 and GH11, and GH12 were completely absent in P. guiyangense (S7 Table). Genes encoding 12 unique pectin/pectate lyases (PL1, PL3 and PL4), two unique GH28 and 1 unique GH43 involved in pectin backbone degradation were identified in the P. guiyangense genome; however, RNA-Seq data showed that none of these genes exhibited up-regulation during mosquito infection processes. P. guiyangense had more genes encoding proteases potentially involved in insect cuticle degradation than plant pathogenic Pythium species (Fig 4C). A total of 307 unique proteases (615 genes in total) were encoded in the P. guiyangense genome, compared to an average of 260 proteases in the plant pathogenic Pythium species ( Table 2). The two animal pathogen genomes also had large numbers of proteases ( Table 2, Fig 4C). Among them, genes encoding cysteine-, metallo-and serine-proteases were particularly highly expanded in P. guiyangense (Fig 4C). The subtilisin serine-protease family had the highest relative expansion with 32 unique genes in P. guiyangense ( Table 2, Fig 4C). Phylogenetic analysis revealed that over half of the subtilisin proteases were recently expanded in P. guiyangense due to lineage-specific gene duplications (S5 Fig). Based on the RNA-Seq data, 31% of the total subtilisins were significantly up-regulated during mosquito infection. The peptidase_C1 and carboxypeptidases also exhibited significant expansion in P. guiyangense ( Table 2, Fig 4C).

Large family of kazal-type serine protease inhibitors (KPIs) potentially associated with infection
Protease inhibitors regulate various biological and physiological processes in all living systems as modulators of protease activity [27]. Among them, the kazal-type protease inhibitor (KPI) family is one of the best characterized [27]. A total of 19 unique kazal inhibitors were identified in the P. guiyangense genome, which exceeded those in plant pathogenic Pythium species ( Table 2). The animal pathogen, P. insidiosum also encoded larger numbers of kazal inhibitors. A phylogenetic tree was constructed using the Pythium kazal inhibitors, and the majority of genes derived from P. guiyangense and P. insidiosum formed clusters that were species-specific (S6A Fig), implying that these genes were retained and diversified independently in these two animal pathogens. During mosquito infection, 25% of the P. guiyangense kazal inhibitors were up-regulated, and four of these exhibited transcript levels over 40

Elicitor genes and host detection
A common feature of many plant pathogenic oomycetes is the secretion of a variety of apoplastic (extracellular) proteins to promote infection, some of which can be detected by the host immune system. These include elicitins (ELIs; lipid-binding proteins), elicitin-like (ELL) proteins and Nep1-like proteins (NLP) [28]. Ten unique ELI genes were identified in the P. guiyangense genome. In contrast, only 2 ELI genes were found in the P. irregulare genome, and none were identified in the other Pythium and S. parasitica genomes (Fig 5A, Table 2). Based on phylogenetic analysis of the elicitin domains, P. guiyangense ELIs were distributed into two clades, and one clade included genes from diverse species while the second clade only contained P. guiyangense ELIs (Fig 5B). Moreover, 19 of the 20 ELI genes were physically clustered in the P. guiyangense genome, suggesting that ELIs were expanded in a species-specific manner. In contrast to the ELI genes, ELL genes were widely distributed in all the detected Pythium genomes. Both P. guiyangense and P. insidiosum had more ELL genes (45 and 50 unique genes) than the plant pathogenic Pythium species (23-40 genes) (Fig 5A, Table 2). Further phylogenetic analysis showed that over half of the ELL genes were distributed in nine clades which were specific to P. guiyangense and contained at least four members; thus many ELL genes were specifically expanded in P. guiyangense. Based on the P. guiyangense transcriptome analysis, 45% of the ELI genes were differentially expressed, and all were down-regulated during infection (Fig 5C). In contrast, 31% of the ELL genes were up-regulated while 15% were down-regulated during infection (Fig 5C). This observation suggested that the diverse ELIs and ELLs had a variety of different functions relative to growth and infection.
Another common apoplastic effector family is the necrosis and ethylene-inducing-like proteins (NLP) genes. Many NLPs, but not all, can trigger cell death and defense responses in plants [29]. Only 1 unique NLP gene was found in P. guiyangense and none were found in P. insidiosum (Table 2). This NLP protein belonged to type 1 NLP subfamily with two conserved cysteine residues. Transcriptional analysis revealed no significant change during infection. These observations suggest that NLP proteins may not participate in oomycete-animal interactions.

Some CRN effectors can induce insect cell death
Crinkler (CRN), a large class of cytoplasmic effectors, was first identified in Ph. infestans as a family of proteins that could cause plant cell death and defense responses [30]. A total of 38 CRN candidates were predicted in P. guiyangense (S8 Table), compared to 10-46 predicted CRN proteins in the other Pythium species using the same method ( Table 2). Examination of protein alignments of P. guiyangense CRN effectors revealed considerable conservation of the characteristic LxLFLAR/K and HVLVxxP motifs, which were similar to those observed in plant pathogenic Pythium species [13,25]. Based on five secretion signal predictors, 74% of CRN candidates in P. guiyangense contained a potential signal peptide or non-classical secretion signal (S8 Table), suggesting that the majority of P. guiyangense CRN proteins might be secreted into mosquito hosts.
A homology network of the oomycete CRN proteins was generated to investigate the evolutionary relationships between P. guiyangense and other oomycetes. The network is composed of 633 nodes in which each node represents an individual CRN protein. The network contains 34,149 edges that link nodes if the node proteins are homologous based on an all-versus-all BlastP search with an E-value cutoff of 10 −10 . As shown in Fig 6A, the network was comprised of a crowd of disconnected clusters and a small number of singletons. P. guiyangense CRN proteins were mainly distributed in 3 large and 3 small clusters (cluster I-VI represented by red dotted circles). Cluster I and III were composed primarily of P. guiyangense CRN proteins, with some of these proteins having homology to Pythium proteins. Notably, cluster II, IV and V only contained CRN proteins derived from P. guiyangense, revealing that these CRN proteins did not share significant sequence similarity with other oomycete CRN proteins. Moreover, all of the P. guiyangense CRN proteins showed sequence divergence of at least 50% with the most similar CRN protein in any plant pathogenic Pythium species, indicating that the  Infection mechanisms uncovered by Pythium guiyangense genome CRN proteins are highly divergent between insect pathogenic Pythium species and plant pathogenic Pythium species.
To explore the possible functions of P. guiyangense CRN proteins in insect cells, twenty-six CRN genes were expressed in Spodoptera frugiperda cell (Sf9) lines; successful expression of the proteins was confirmed with western blots or by detecting fluorescence signals under the fluorescence microscope (S7A and S7B Fig). The cell counting Kit-8 assay was used to determine protein toxicity to cells, with the Bacillus thuringiensis Delta-Endotoxin Cry1C as a positive control [31]. The results showed that 7 CRN proteins (CRN31, 33, 34, 36, 37, 38, and 28) significantly decreased the viability of Sf9 cells while the remaining CRN proteins produced responses similar to the negative control ( Fig 6B). Notably, CRN31 appeared to be the most toxic to Sf9 cells. To further validate the toxicity of these 7 CRN proteins, a prokaryotic expression system was used to obtain recombinant CRN proteins (S7C Fig). E. coli crude extracts containing the expressed proteins were then incubated with Sf9 cells and with mosquito Aedes albopictus C6/36 cells, respectively, to determine the toxicity using the cell counting Kit-8 assay. The results showed that CRN31 and CRN28 significantly reduced the viability of Sf9 cells and C6/36 cells (Fig 6C and 6D). Since the transcript levels of the CRN31 and CRN28 genes were not elevated during infection at 24 hpi as measured by RNA-seq (S8 Table), we used qRT-PCR to test whether the two CRN genes were significantly up-regulated during earlier infection stages (1-4 hpi) (S7D and S7E Fig).

Discussion
In this study, we have determined the mode of infection of P. guiyangense on mosquito larvae. In our experiments, P. guiyangense caused up to 76% mortality for A. albopictus and 69% for Cx. pipiens pallens larvae. Analogous to most of the entomopathogenic fungi, P. guiyangense hyphae emerging from germinating zoospore cysts entered their host directly through the exterior cuticle, propagated inside hosts, and produced sporangia to start a new cycle of infection. Another infection route of P. guiyangense was through the ingestion of mycelia by larvae. Mycelia in the digestive tract progressively destroyed internal tissues of the larval midgut, leading to host death. The most common invasion route for aquatic insect pathogens, including Metarhizium anisopliae, Aspergillus clavatus and Beauveria Bassiana, was through ingestion of spores to infect their host [4,32,33]. P. guiyangense has evolved a similar strategy to initiate infection in the digestive system. Overall, P. guiyangense utilizes cuticle penetration and ingestion of mycelia into the digestive system to infect mosquito larvae. We speculate that firm adhesion of zoospores to the mosquito larvae epicuticle is critical for the success of the P. guiyangense pathogen which involves a combination of passive hydrophobic and electrostatic forces as well as protein interaction. Hydrophobins found in the outer layer of the spore cell wall of Beauveria Bassiana, mediate adhesion to the arthropod cuticle [34,35]. Hydrolytic enzymes, Mad1 and Mad2, also assisted in attachment of the fungi to insects [36]. To identify the factors that promote attachment and ingestion of P. guiyangense by the mosquitoes would be interesting to further explore in the future.
To probe the molecular basis underlying the interactions of P. guiyangense with insects, a high-quality genome assembly and transcriptome sequences were generated for P. guiyangense. Our results reveal that P. guiyangense is probably a hybrid genome derived from two parental species. Natural interspecies hybridization events have been described in the genus Phytophthora such as Ph. andina, Ph. nicotianae and Ph. cactorum [37,38]. It is believed that interspecies hybridization has the potential to create new strains that have a new or expanded host range [21]. Considering the distinct hosts, we speculate that the hybrid feature of P. guiyangense contributes to its adaptation of the mosquito host. The two parental subgenomes of P. guiyangense are approximately 9% different in nucleotide sequence, suggesting that the two parents are relatively diverse, however, the potential parents are still mysterious based on limited Pythium data. We will pay close attention to the new information of Pythium and update the concerns in future study.
The phylogenetic analysis of the currently sequenced oomycete pathogens together with two diatoms demonstrated P. guiyangense is closely related to three plant pathogenic Pythium species (P. irregulare, P. iwayamai and P. ultimum) but has a slightly more distant relationship with the mammalian pathogen, P. insidiosum. This finding suggested that as a facultative mosquito pathogen, P. guiyangense, may have evolved from a common ancestor with the plant pathogens. This result is highly concordant with recent analysis indicating the mosquito oomycete pathogen, L. giganteum has also evolved from a plant pathogen [3].
In conjunction with the transcriptome analysis, oomycete genome comparisons identified several gene families that might contribute to P. guiyangense virulence. In this study, 471 putative unique kinases (943 kinases in total) were identified in the P. guiyangense genome. Comparison with other sequenced oomycete genomes revealed that the genomes of the animal pathogens, P. guiyangense, P. insidiosum and S. parasitica, also encoded significantly more kinases than the plant pathogenic Pythium genomes [15,16]. Transcriptome analysis revealed that a total of 92 kinases were differentially expressed during infection of P. guiyangense against mosquito larvae, implying that protein kinases may be involved in regulating virulence. We also found that genes involved in insect cuticle degradation were expanded in P. guiyangense while proteins for plant cell wall penetration were absent or lost functions. The P. guiyangense genome encoded a significantly larger number of proteases than plant pathogenic Pythium species, including cysteine-, metallo-, and serine-proteases. Transcript levels of 31% of the total subtilisin-like serine proteases were significantly elevated when P. guiyangense invaded mosquito larvae. Some of these proteases were reported as key virulence determinants in entomopathogenic fungi [39], supporting a potential role of these proteases in P. guiyangense infection.
A large number of Kazal proteinase inhibitors (KPIs) were characterized from P. guiyangense and 25% of these genes were up-regulated during infection of mosquito larvae, suggesting KPIs may be involved in pathogenicity. Our study demonstrated that one invasion route of P. guiyangense was through ingestion of mycelia in the digestive system. The mosquito midgut contains an abundant array of secreted serine proteases for digestion, providing nutrition for development [40,41]. To aid in colonization in its hosts, P. guiyangense may secrete protease inhibitors, such as KPIs for protection from these proteolytic enzymes. This is consistent with previous studies which show that the animal parasite, Toxoplasma gondii secretes TgPI-1 and TgPI-2, and Hookworm, Ancylostoma ceylanicum, secretes a 8-kDa broad spectrum serine protease inhibitor of the Kunitz family into the host digestive tract to aid in infection [42][43][44]. Serine proteases are also key components of immune responses and KPIs may manipulate host immune defenses for pathogenicity [45]. A kazal-like serine protease inhibitor was characterized from the plant pathogenic oomycete, Ph. infestans, and it targeted protease P69B to counteract tomato defense responses [46]. Another oomycete pathogen, Ph. palmivora also produced a KPI, PpEP to suppress plant defense [47]. Therefore, we speculate that the large number of KPIs secreted by P. guiyangense may suppress mosquito immune defenses by targeting serine proteases.
In addition to hydrolytic enzymes, plant pathogenic oomycetes deliver a diverse battery of other secreted proteins into host tissue to support infection and interfere with host immune responses, including lipid-binding proteins (elicitins), toxins (e.g. NLPs), and host-cell-entering RxLR and CRN effectors [48,49]. We found distinct sequence and evolutionary features of these proteins in P. guiyangense. Firstly, no statistically significant evidence for RxLR effectors encoded in the P. guiyangense genome was found, in agreement with previous reports [13,25]. Secondly, 10 unique ELI genes were present in the P. guiyangense genome whereas these genes were largely absent from the other Pythium species, including P. insidiosum. The P. guiyangense ELI genes appear to have expanded relatively recently to form two species-specific clades, and one clade appears to share a common origin with the Ph. sojae genes. In contrast to ELIs, ELLs have been widely found in all sequenced Pythium species. RNA-Seq data showed that ELI and ELL genes show differential expression patterns in P. guiyangense. ELI genes were typically highly expressed in the mycelia stage while a large number of ELL genes were up-regulated in the infection stage. These results suggested that the two subclasses of elicitins may be involved in different functions. Elicitins have also been reported in another mosquito pathogenic oomycete, L. giganteum [3], suggesting that elicitins in the two aquatic insect oomycetes may be linked to pathogenicity towards the insect hosts.
CRN effectors are considered more ancient cytoplasmic effectors than RxLRs, as they are distributed across a wide range of oomycetes [14,25] and have also been reported in the fungal animal pathogen, Batrachochytrium dendrobatidis [50] and in arbuscular mycorrhizal fungi [51]. Interestingly, CRN proteins were also detected in the mosquito pathogenic oomycete L. giganteum [3,52]. They are presumed to enter the host cytoplasm and manipulate cell death and defense responses [30,53]. It has been widely reported that only a handful of oomycete CRN proteins were predicted to contain canonical signal peptides [13,14]. In this study, four different signal peptide predictors and one non-classical secretion signal predictor were used, and the majority (74%) of P. guiyangense CRN proteins were predicted to contain potential secretion signals, suggesting that these CRN proteins very likely were secreted into mosquito hosts. Once inside host tissue, the roles of these putative effectors in animal pathogenic oomycetes remains unclear. One investigation detected CRN effectors in an entomopathogenic oomycete, Lagenidium giganteum, but their roles in the mosquito pathogenicity process remained unclear [52]. In this study, twenty-six CRN candidates were characterized in P. guiyangense and insect cell line transformation experiments revealed that CRN31 and CRN28 were toxic to Spodoptera frugiperda (Sf9) cells and to a lesser extent to Aedes albopictus (C6/36) cells. Therefore, we speculate that P. guiyangense has evolved distinct lineages of CRN effectors that are secreted into mosquito cells as virulence factors to induce host cell death.
Overall, we have demonstrated that two infection routes are available for infection of mosquitoes by P. guiyangense. The high-quality genome sequence of P. guiyangense provides new insights into study oomycete evolution and host adaptation because it is an oomycete pathogen that has adapted to mosquitoes. Genome comparisons suggest adaptations to a mosquito-pathogenic lifestyle include loss of plant cell wall degrading enzymes and NLP proteins, and expansions of kinases, proteases, and kazal-type protease inhibitors. Oomycete intracellular CRN effectors were identified and insect cell toxicity was identified in at least one of them, which could serve as a new resource to control agricultural important pests.

Strain and mosquito source
The P. guiyangense strain Su was kindly provided by Dr. Xiaoqing Su from Guiyang Medical University, Guiyang, China and was maintained on 10% vegetable juice (V8) medium in the dark at 25 ± 1˚C. The Nanjing laboratory strains of Aedes albopictus and Cx. pipiens pallens were obtained from Jiangsu Provincial Center for Disease Control and Prevention, Nanjing, China, and were kept at a temperature of 25 ± 1˚C in a 16L: 8D photoperiod.

Pathogenicity assays and infection processes of P. guiyangense
For mycelia infection assays, tests were carried out in plastic cups (capacity of 200 mL), each containing 25 early second-instar larvae and 4 agar plugs (10 mm × 10 mm in size) of P. guiyangense mycelia in 100 mL of deionized distilled water. The numbers of dead larvae were recorded every 24 hours for 10 days and each treatment was replicated at least three times. For zoospore infection assays, zoospores were prepared according to the method previously described [54], and then batches of 25 early second-instar larvae were exposed to a concentration of 10 7 zoospores ml -1 in individual cups to examine the cuticle penetration process. The progress of infection in the larvae was initially documented using light microscope every 2 h for 48 h. For scanning electron microscopy (SEM), representative larvae were collected at 0.5, 2, 4, and 48 hpi and fixed in 2.5% glutaraldehyde solution. The fixed larvae were then rinsed three times in 0.1 M PBS, dehydrated sequentially in 30%, 50%, 70%, 80%, 95% and 100% ethanol, subjected to critical point drying, mounted, and finally gold coated for viewing. To investigate the digestive system infection, larvae were inoculated with 4 agar plugs (10 mm × 10 mm in size) of P. guiyangense mycelia. After different times post-inoculation, larvae were examined with SEM as described above or else embedded in paraffin, sectioned, and stained with haematoxylin and eosin for light microscope observation.

Genome assembly
High-quality genomic DNA of P. guiyangense was prepared and submitted for genome sequencing using the PacBio and Illumina NGS platforms by Berry Genomics Corporation. The 450-bp paired-end libraries were constructed and sequenced on the Illumina HiSeq 2500 platform. The resultant short reads were processed to remove adapter sequences and low-quality sequences, resulting in 10.32 Gb of clean data (approximately 100-fold coverage). Two PacBio 20-kb SMRTcell libraries were constructed and sequenced on the Sequel platform. The generated raw reads were filtered by trimming the adapter sequences and low-quality sequences. This produced 7.55 Gb of cleaned sequences, with an average cleaned read length of 7.12 kb (approximately 70-fold coverage). Both the Illumina and PacBio SMRT sequencing data were used for the genome assembly. The de novo assembly was produced using the PacBio Hierarchical Genome Assembly Process HGAP 3.0 program [55]. First, the PacBio SMRT sequence data were error-corrected using the long filtered read and sub-read cyclic consensus sequences using HGAP error correction. The error-corrected long reads were then assembled using HGAP with default parameters. The Illumina paired-end reads were aligned to the Pac-Bio assembly with BWA [56], and paired-end reads with concordant alignments were selected with SAMtools view for error correction. A final genome assembly error correction was conducted using the Pilon tool [57]. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession QXDM00000000.1. The CEMGA pipeline was used to evaluate the completeness and continuity of the genome on the basis of 248 core eukaryotic genes [58].

Gene prediction and functional annotation
The P. guiyangense assembly was masked for low complexity, as well as known transposable elements using RepeatMasker (www.repeatmasker.org). Genes in the repeat-masked genome were predicted using two predictors, AUGUSTUS [59] and SNAP [60]. The P. guiyangense core eukaryotic genes identified by CEGMA analysis were used to train the gene predictor SNAP. The AUGUSTUS predictor was trained using P. ultimum proteins. SNAP and AUGUS-TUS were then used as a part of the MAKER software to conduct the gene prediction. Protein sequences from six sequenced Pythium species, plus Ph. sojae, Ph. infestans, H. arabidopsidis and S. parasitica were submitted to MAKER as extrinsic sources of gene structure evidence to improve sensitivity of gene predictions. The transcripts discovered based on the RNA-Seq data were also submitted to MAKER as EST evidence. MAKER was set to filter out short gene models that produce proteins with fewer than 30 amino acids.
All the protein sequences from P. guiyangense were searched against themselves using the BlastP program with the E-value setting to 10− 10 . Then, the BlastP result file and the GFF file of the P. guiyangense genome were inputted into the MCScanX program to analyze the synteny blocks and homologous genes located in synteny blocks [24]. Circular representations of these homologous genes were produced using Circos program [61]. The Ks values of each pair of homologous genes were calculated using KaKs_Calculator 2.0 [62].
Whole genome protein families were classified by Pfam analysis [63]. The proteomes were screened for CAZymes (carbohydrate active enzymes) using Hmmscan from the HMMER package and the dbCAN HMM profile database [64]. Putative proteases and protease inhibitors were identified by using batch BLAST at the MEROPS server [65]. Protein kinases were classified by Hmmsearch against KinBase (www.kinase.com).

Transcriptome analysis
A sample of 30 early second-instar larvae inoculated with P. guiyangense mycelia was collected at 24 hpi as the infection stage, and another sample of P. guiyangense mycelia was harvested as the control. There were no biological replicates. The total RNAs of the two samples were extracted according to the method previously described [54], and then sequenced by Berry Genomics Corporation using the Illumina 2500 platform. The 150 bp paired-end reads were filtered for quality as described above and aligned to the P. guiyangense genome assembly using Tophat with a maximum of two mismatches [66]. The mapped reads were quantified using the Cufflinks program [67], and the transcript level of each gene was quantified as RPKM (reads per kilobase transcript length per million reads mapped). Differentially expressed genes were identified using the GFOLD algorithm [68]. GFOLD was developed for unreplicated RNA-Seq data and assigns statistics for expression changes based on the posterior distribution of log fold change. Genes with four-fold change and GFOLD > 1 or < -1 were considered differentially expressed between two samples. The reads were also assembled de novo using the Trinity package [69] with default settings to serve as additional evidence for gene prediction.
To detect the transcript levels of particular P. guiyangense genes during infection stages, qRT-PCR assays were performed. The samples of early second-instar larvae inoculated with P. guiyangense mycelia at different infection time points were collected, and another sample of P. guiyangense mycelia was harvested as the control. Then the total RNAs of the above samples were extracted for qRT-PCR assay. qRT-PCR was performed using an ABI Prism 7500 Real-Time PCR system (Applied Biosystems) with SYBR Premix Ex Tag according to the manufacturer's instructions. The comparative threshold cycle (Ct) method was used to determine relative transcript levels through ABI 7500 System Sequence Detection Software. The relative transcript levels of particular P. guiyangense genes were normalized to the mycelia data using the actin gene as internal standard. At least three biologically independent replicates of the qRT-PCR experiments were carried out.

Orthology and phylogenetic analysis
A phylogenetic analysis was conducted on the core eukaryotic genes identified using the CEGMA pipeline. Multiplex sequence alignments of proteins were made with ClustalW [70] and subsequently concatenated. A neighbor-joining tree was built using MEGA5 with 1000 fold bootstrapping for distance estimation [71]. Orthologous and paralogous groups among the seven Pythium genomes were determined using OrthoMCL with default parameters: BLASTp E-value cutoff of 10 −5 and inflation index of 1.5 [72]. The output of OrthoMCL was parsed to separate core, conserved and specific clusters. Pfam domain enrichment analysis was undertaken on genes that were specific to P. guiyangense. The fold-enrichment corresponds to the frequency of a given PFAM domain within a specific gene set divided by the frequency in the rest of the P. guiyangense proteome; a chi-square test with p-value <0.05 was used for significance tests.

Identification of putative elicitins and effectors
The elicitin domain (PF00964) was retrieved from the PFAM database [63], and then used to search against P. guiyangense proteome. Hits with E-value less than 10 −5 were considered to be elicitin candidates. To distinguish elicitin (ELI) and elicitin-like (ELL) proteins, the previously known sequence features of the elicitin domain were used [28]. ELIs contain a highly conserved 98-amino acid domain with six cysteine residues and a typical cysteine spacing pattern. ELLs possess shorter or longer elicitin domains and sequences are more diverse.
For CRN effector prediction, well-characterized CRN proteins from Ph. sojae and Ph. infestans were obtained from a previous publication [14], and then were used to construct HMM profiles based on the LFLAK and HVLVVVP motifs. The HMM profiles were used to search against all possible proteins derived from six open reading frames of the genome. The resulting CRN candidates were manually examined for the presence of LFLAK and HVLVVVP motifs. After that, the validated CRN proteins were used to update the HMM profile, which was then used to search the protein database again for new candidates. To determine whether CRN candidates were full length or pseudogenes, we aligned the CRN candidates with previously characterized CRN proteins. If the CRN candidates shared similar sequence with known CRNs at the DNA level, but had an obvious frameshift mutation or earlier stop codon, they were considered to be pseudogenes. To predict secretion signals for CRN proteins, four signal peptide predictors including SignalP 3.0 (http://www.cbs.dtu.dk/services/SignalP-3.0/), SignalP 4.1 (http://www.cbs.dtu.dk/services/SignalP/), iPSORT (http://ipsort.hgc.jp/), PrediSi (http:// www.predisi.de/), and one non-classical secretion signal predictor named SecretomeP 2.0 (http://www.cbs.dtu.dk/services/SecretomeP/), were utilized.

Insect cell bioassays
Spodoptera frugiperda sf9 cell lines were cultured with sf-900™ III SFM medium (Gibco) at 27˚C and 140 rpm in suspension flasks until they reached 2×10 6 cells/mL. Sf9 cells were subcultured using fresh medium every 3 days. The mosquito C6/36 cell lines were cultured in Schneider's Drosophila Medium (Gibco) with 10% Fetal Bovine Serum (Gibco) at 27˚C and the culture medium was renewed every 3 days. Cell density was determined using a Countess Automated Cell Counter (Invitrogen) and cell viability was evaluated by staining with trypan blue exclusion dye.
To evaluate CRN protein toxicity, Sf9 cells were transfected with DNAs encoding CRN proteins. The ORFs of CRNs excluding the signal peptide were amplified, and CRNs without the signal peptide predicted by SignalP were amplified by excluding the N-terminal twenty-five amino acids. The PCR products were cloned into pIB/V5-His vector (Invitrogen) using Clo-nExpress II One Step Cloning Kit (Vazyme). Sf9 cell suspensions were seeded into 96-well plates (100 μL/well). After 24 h of incubation, cells were transfected with 0.2 ug plasmid DNA using the FuGENE HD Transfection Reagent (Promega) as described by the manufacturer, and six parallel wells were used in each group. Cell toxicity was detected by cell counting with the Cell Counting Kit-8 (CCK-8, Dojindo Laboratories Kumamoto, Japan) according to the manufacturer's instructions. Briefly, after transfection of the Sf9 cells for 60 hours, 10 μL CCK8 solution was added to the cells. After the cells were incubated for 24 hours at 27˚C, the absorbance was analyzed at 450 nm with a reference wavelength of 600 nm using SpectraMax M5 microplate reader. The cells receiving empty vector DNA were considered as 100% viable. Then, the cell viability rate was calculated as follows: Cell viability (%) = [(As-Ab)/(Ac-Ab)]×100%, where As represents the test well reading, Ac represents the empty vector well reading, and Ab represents a blank well reading. The data are expressed as the means ± SE based on at least three independent experiments. CRN constructs were compared to empty vector DNA using Student's t-test. A difference with P < 0.05 was considered to be statistically significant.
To validate the toxicity of specific CRN proteins, the relevant CRN genes excluding the signal peptide were inserted in frame into the pET32a (+) vector (Novagen) by directional cloning between the BamHI and HindIII sites. The pET32a empty vector was used as a negative control. Cell growth and induction of expression were carried out as described in the pET system manual. Briefly, Escherichia coli BL21(DE3) strains were grown at 37˚C to an OD600 of 0.6. At that time, 1 mM IPTG was added in order to induce protein expression at 18˚C for 18 h. Induced cells were harvested by centrifugation at 4˚C and washed three times with PBS buffer. The cells were resuspended in PBS buffer and lysed using short (3 s) ultrasonic bursts separated by 6 s intervals for 6 min. Crude protein extracts were centrifuged for 10 min at 12,000 rpm. Then the extracts were filtered with a 0.22 um filter. Adherent sf9 or C6/36 cell monolayers in 96-well plates were incubated with the protein extracts for 4 hours. Cell viability was assayed using the methods in the above paragraph, and at least three independent repeats were performed.

Western blot analysis
Cells were lysed in ice-cold lysis buffer (Solarbio) for 10 min. Following this, samples were centrifuged at 12,000 g at 4˚C for 5 min. The supernatants were collected and boiled with loading buffer at 100˚C for 10 min. The samples were separated by 10% SDS-PAGE and transferred onto a polyvinylidene difluoride membrane (Millipore, Billerica, MA, USA). Membranes were blocked with 5% non-fat milk then incubated with anti-His primary antibody for 2 h. The membranes were washed with 0.1% Tween 20 in PBS and probed with IRDye 800CW-conjugated goat (polyclonal) anti-mouse IgG secondary antibodies for 1 h at room temperature. PVDF membranes were visualized using a scanner (LI-COR Odyssey) with excitation at 700 and 800 nm.  Fig 3A. The letter "A" with blue background represents animal host, and the letter "P" with green background represents plant host. The colored bars represent the numbers of core (present in all the seven genomes), conserved (present in two to six genomes) or species-specific (present only in own genome) genes for each species, which were determined using OrthoMCL.