ExVe: The knowledge base of orthologous proteins identified in fungal extracellular vesicles

Introduction
Eukaryotic and prokaryotic organisms release small bilayer extracellular vesicles (EVs) ranging from 20 to 5,000 nm under distinct biological or environmental conditions [1][2][3]. In eukaryotes, EV is a generic term for bilayered membrane structures that are released by many cell types. EVs are classified into exosomes, microvesicles, and apoptotic bodies based on size, cargo and biogenesis pathways [4]. Exosomes are small structures, 30 to 100 nm in diameter, released as the luminal vesicles of late endosomes when these fuse with the plasma membrane [1,5]. Microvesicles range from 100 nm to 1,000 nm in size and are formed by budding off the plasma membrane [5,6]. Apoptotic bodies are the largest vesicle structures (1 μm to 5 μm) [1] and are observed only in cells that undergo apoptosis [1,5].
The functions of EVs are consistently associated with cellular communication. EVs are known to transport several molecules to distant organs or tissues [4,7]. In fungi, Rodrigues et al. (2007) described a role for EVs in trans-cell wall transport [8]. Monguió-Tortajada and colleagues (2017) identified an immunomodulatory potential of EVs derived from umbilical cord mesenchymal stem cells [9]. Cancer cells also produce EVs, which are related to tumor development and progression [10], invasion [11], angiogenesis [12] and metastasis [13]. In pathogens, EVs are vehicles for the export of several antigens [14][15][16]. Pathogen-derived EVs also participate in the host-pathogen interaction [8,14,17], since they concentrate virulence factors [4][18][19][20][21][22]. The role of EVs as vehicles of virulence factors for many taxa, including fungi, protozoa and bacteria, has been comprehensively reviewed by Campos and colleagues [7]. The seminal discovery of EV production in the fungal pathogen Cryptococcus neoformans revealed a secretion mechanism in fungal cells that was later extended to other fungi [4,8]. For instance, Ikeda and colleagues (2018) associated Sporothrix brasiliensis EVs with adaptation to the host environment and increased fungal pathogenicity. Sporothrix EVs contained heat shock proteins, major facilitator superfamily transporters, and other enzymes that could increase fungal virulence [23]. Candida albicans EVs were described as playing a role in biofilm drug resistance. Additionally, iRNA sequences in C. albicans EVs have the potential to modulate gene expression in host cells [24,25]. In Cryptococcus, EVs contain several molecules associated with virulence and survival in the host environment, including laccase, glucosylceramides and urease [26].
Several other examples of the biological functions of fungal EVs are available in the current literature, as recently reviewed by Rizzo and colleagues [17].
Investigating the molecular composition of fungal EVs is the first step towards understanding their functional role in pathogenic and nonpathogenic species. In particular, the description of EV proteins is currently the focus of several research groups. These studies can benefit from new proteomic technologies and simplified EV isolation methods [27][28][29]. An accurate method of EV isolation combined with an efficient shotgun proteomics approach may contribute to the identification of EV-associated proteins in a large number of pathogenic and nonpathogenic fungal species [4,8,20,30].
The integration of publicly available datasets is a trend in modern science to accelerate novel findings. Given the large amount of data, databases that gather and organize this information may contribute to data sharing and encourage other studies [18,31,32]. ExoCarta [33], Vesiclepedia [34], and EVpedia [35] are the databases currently available to investigate EV proteomics data of several species. These databases may assist researchers in mining published data and in further analyzing their own datasets [18][33][34][35]. However, there is no publicly available database dedicated to EV data of fungal species [35].
To integrate gene and protein data of distinct fungal species, orthology databases have been created [36,37]. Orthology analysis may contribute to understanding the biological function of different proteins and biological pathways under many conditions and even to comprehending the evolutionary history of a group or species [31,37]. To date, there is no publicly available database to integrate proteomics data from pathogenic fungal EVs. Here, we present ExVe, a publicly available database that integrates EV proteomic data from nine fungal species focusing on orthology, which can be freely accessed at http://exve.icc.fiocruz.br.

Fungal EV proteomics data
To build ExVe, we first selected articles with a full description of EV isolation and proteomics methods. From these previously published fungal EV research articles, we used the lists of proteins identified by shotgun proteomics experiments (Fig. 1). We retrieved data from the following eight fungal species with clinical or medical relevance for humans: Aspergillus fumigatus, Candida albicans, Cryptococcus deuterogattii, Cryptococcus neoformans, Histoplasma capsulatum, Paracoccidioides brasiliensis, Sporothrix brasiliensis, and Sporothrix schenckii. Additionally, we included Saccharomyces cerevisiae as a representative nonpathogenic fungus (Table 1).
The aim of ExVe is to offer researchers in the field a way to access qualitative fungal EV shotgun proteomics identifications with a focus on orthology. As ExVe relies on third-party analysis, we highlight that each article applied its own statistical threshold, including false positive rates. Proteins with any peptide detected in such experiments are listed as present in EVs from fungal species. We created a computer program written in Perl to retrieve all required data and gather all the information in a single file. First, we retrieved UniProt accession numbers for all proteins based on gene symbols, protein name descriptions and species (Fig. 1) [32]. Thereafter, we used each UniProt accession number to obtain a unique protein name and biological function (protein_description) from the UniProt TrEMBL and Swiss-Prot databases. For protein names, we retrieved the name recommended by the UniProt consortium ("RecName"); in its absence, the first submitted name ("SubName") for that protein accession was chosen [32].
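As an illustration only, the name-selection rule described above can be sketched in a few lines of Python. The original program was written in Perl and is not reproduced here; the dictionary layout (a "RecName" string and a "SubName" list) and the function name are assumptions made for this example.

```python
def pick_protein_name(names):
    """Pick a display name for one UniProt entry.

    `names` is a hypothetical dict with an optional "RecName" string and an
    optional "SubName" list (in submission order). This layout is an
    assumption for illustration, not the UniProt file format itself.
    """
    recommended = names.get("RecName")
    if recommended:
        # The recommended name takes priority whenever it exists.
        return recommended
    submitted = names.get("SubName") or []
    # Otherwise fall back to the first submitted name, if any.
    return submitted[0] if submitted else "not available"


# Toy entries, not real UniProt records.
print(pick_protein_name({"RecName": "Heat shock protein 70"}))
print(pick_protein_name({"SubName": ["Uncharacterized protein", "Other name"]}))
```

Returning "not available" for entries with no name mirrors the label that ExVe uses for information that could not be recovered.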

Protein orthologous groups
The orthologous group for each protein in our database was retrieved from the OrthoMCL assignments available in the FungiDB database, release 48 [46] (Fig. 1). We downloaded the data for all fungal species available at FungiDB [46] and assigned an orthologous group to each fungal EV protein using a dictionary keyed by UniProt accession number or, alternatively, by the gene symbol together with the species. Along with the orthologous group information, we retrieved the most frequent Pfam domain of each group for functional annotation.
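The lookup described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Perl program used to build ExVe: the two dictionaries, the record keys, and the accession and gene names are hypothetical toy values.

```python
from collections import Counter


def find_group(protein, by_accession, by_symbol):
    """Return the orthologous group of a protein record.

    The accession-keyed dictionary is tried first; the (gene symbol, species)
    pair is the fallback. Both dictionaries are assumed to have been built
    beforehand from the FungiDB download.
    """
    accession = protein.get("accession")
    if accession in by_accession:
        return by_accession[accession]
    key = (protein.get("gene_symbol"), protein.get("species"))
    return by_symbol.get(key, "not available")


def most_frequent_pfam(domains):
    """Return the Pfam ID that occurs most often among a group's members."""
    if not domains:
        return "not available"
    return Counter(domains).most_common(1)[0][0]


# Toy dictionaries with hypothetical accession and gene identifiers.
by_accession = {"ACC001": "OG6_100083"}
by_symbol = {("GENE1", "Candida albicans"): "OG6_100083"}

print(find_group({"accession": "ACC001"}, by_accession, by_symbol))
print(most_frequent_pfam(["PF00012", "PF00012", "PF00076"]))
```

When two Pfam domains are equally frequent, `Counter.most_common` returns the one encountered first; how ties were resolved in the original pipeline is not stated.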

ExVe integrated data
In ExVe, users can visualize information about proteins identified in EVs as follows: gene name, protein accession number, protein name, protein description, species, strain, PubMed number, and orthologous group (Table 2). Information about proteins that could not be recovered was named "not available", and it is available in the downloadable ExVe flat file.
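A record with the fields listed above could be assembled as in the following Python sketch. The field identifiers and the function are hypothetical; the actual column names of the ExVe flat file are not given in the text.

```python
# Field names are assumptions chosen to match the list in the text.
FIELDS = (
    "gene_name",
    "accession",
    "protein_name",
    "protein_description",
    "species",
    "strain",
    "pubmed_id",
    "orthologous_group",
)
MISSING = "not available"


def make_record(raw):
    """Build one record, labelling absent or empty fields as "not available"."""
    return {field: (raw.get(field) or MISSING) for field in FIELDS}


# Toy input with a hypothetical accession; most fields are missing on purpose.
record = make_record({"accession": "ACC001", "species": "Candida albicans"})
for field, value in record.items():
    print(f"{field}: {value}")
```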

Enrichment analysis
The enrichment analysis of groups of species that shared at least 50 orthologous groups (Fig. 2) was performed using Gene Ontology annotations. Briefly, we retrieved the orthologous groups shared by each group of species and recovered the genes associated with these orthologous groups for a selected representative species. We chose A. fumigatus as the reference fungus due to its presence in all groups of species. However, for the group composed exclusively of species of the Cryptococcus genus, the selected species was C. neoformans.
To retrieve Gene Ontology annotations, we used the FungiDB web tools following their guidelines [46]. We retrieved biological process, molecular function, and cellular component terms, considering both computed and curated evidence, with a p-value < 0.05. The full list of identifications is available as Supplementary Table 1 (Table S1).
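The selection of species groups can be illustrated with a short Python sketch. It assumes that "shared" means an orthologous group detected in exactly that combination of species, which is one possible reading of Fig. 2; the function names and toy data are hypothetical.

```python
from collections import defaultdict


def groups_by_combination(detections):
    """Map each exact combination of species to the groups found only there.

    `detections` is an iterable of (species, orthologous_group) pairs.
    """
    species_per_group = defaultdict(set)
    for species, group in detections:
        species_per_group[group].add(species)

    combos = defaultdict(set)
    for group, species in species_per_group.items():
        combos[frozenset(species)].add(group)
    return dict(combos)


def combinations_above(combos, min_groups=50):
    """Keep multi-species combinations sharing at least `min_groups` groups."""
    return {
        species: groups
        for species, groups in combos.items()
        if len(species) > 1 and len(groups) >= min_groups
    }


# Toy data with a lowered threshold so that the example returns something.
toy = [("A", "g1"), ("B", "g1"), ("A", "g2"), ("B", "g2"), ("A", "g3"), ("C", "g4")]
selected = combinations_above(groups_by_combination(toy), min_groups=2)
for species, groups in selected.items():
    print(sorted(species), sorted(groups))
```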
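For readers unfamiliar with enrichment testing, the p-value of a single GO term can be approximated with a one-sided hypergeometric test, sketched below in Python. This is a generic illustration and not a description of the FungiDB implementation, whose exact statistics are not detailed here; the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """Probability of seeing at least k annotated genes in a sample of n.

    N: genes in the background; K: background genes annotated with the term;
    n: genes in the sample (e.g. those in the shared orthologous groups);
    k: sampled genes annotated with the term.
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy numbers only: 5 of 10 background genes carry the term, and all 5
# sampled genes carry it, giving p = 1/252.
p_value = hypergeom_upper_tail(k=5, n=5, K=5, N=10)
print(p_value < 0.05)
```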

Implementation
ExVe was implemented in PHP using the Laravel 5.5 framework and Docker. A PostgreSQL version 12.3 database, running in Docker containers, was used to store our datasets.
The EV isolation methods of eleven out of the thirteen articles selected to populate ExVe were based on the original study on fungal EVs using liquid media [8,50]. The exceptions were the C. albicans study by Zarnowski and colleagues (2018) using biofilms [24] and the study by Rizzo et al. on C. neoformans and C. deuterogattii performing EV isolation from solid media [29].
ExVe integrates data for 11,433 hits detected in fungal EVs. Proteins identified in a study but lacking a UniProt accession number were purged from ExVe. We identified 5,367 proteins with a Swiss-Prot [32] accession number, of which 1,879 have an annotated biological function. Of the remaining proteins, 6,043 were associated with a TrEMBL accession number, of which 614 have computationally inferred biological functions [32].
We noticed that different sets of equipment were used to generate the shotgun proteomics data. The NanoLC 1D Plus (Eksigent) coupled to the LTQ XL linear ion trap (Thermo Fisher) was the most frequent combination, which was applied to S. cerevisiae, P. brasiliensis, H. capsulatum, C. neoformans and C. albicans. The Orbitrap Q Exactive Plus spectrometer (Thermo Fisher) coupled to the EASY-nLC (1000/1200) chromatograph (Thermo Fisher) was used to generate shotgun proteomics data for A. fumigatus, C. albicans and C. neoformans. Interestingly, the 5600 TripleTOF spectrometer (AB Sciex) in conjunction with the Ekspert nanoLC 400 chromatograph (Eksigent) enabled the identification of more than 1,100 proteins in each H. capsulatum study, the highest number of detected proteins.

Protein orthologous groups
ExVe has 11,189 proteins clustered in 3,834 different ortholog groups (Table 4), whereas 292 were defined as orphan proteins, which are unavailable for website visualization.
Our findings indicate that A. fumigatus and C. neoformans presented the highest number of unique orthologous groups: 1,253 and 590, respectively. The orthologous group OG6_100083 was the only group shared by all nine species in ExVe. This orthologous group comprises a set of chaperone-encoding genes with the Hsp70 protein (PF00012) as the most frequent Pfam protein domain. The comparison of all nine fungi revealed that C. neoformans and C. deuterogattii shared 296 orthologous groups, which was the highest number among all possible pairs of species (Fig. 2). However, C. albicans, H. capsulatum, C. deuterogattii, C. neoformans and A. fumigatus formed the group of different genera that shared the highest number of orthologous groups (231 groups).
Fig. 1. Graphical representation of the ExVe workflow. We retrieved shotgun proteomics data from 9 fungal species (8 pathogenic and 1 nonpathogenic). We performed an association of these data with UniProt [49], Pfam [48] and FungiDB [46] to provide a web interface with orthology information about proteomic content from fungal EVs.
We used FungiDB tools to retrieve Gene Ontology (GO) annotations to investigate the enrichment of GO terms for species that shared more than 50 orthologous groups (Table S1). With the highest number of orthologous groups shared, the Cryptococcus genus had an "extracellular region" term (GO:0005576) enriched for the "cellular component" category (Table S1). Additionally, "hydrolase activity" terms (GO:0016810, GO:0004553, GO:0016798) were enriched in the "molecular function" category (Table S1). The "biological process" category in the Cryptococcus genus contained terms associated with "mRNA processing" and "mRNA splicing". The second group of species with the highest number of orthologous groups shared (C. albicans, H. capsulatum, C. deuterogattii, C. neoformans and A. fumigatus) revealed the "cytoplasm" term (GO:0005737) as the most enriched GO term for the "cellular component" category. In this group, the "organonitrogen compound biosynthetic process" term (GO:1901566) was the most enriched "biological process" term. Finally, our analysis revealed the enrichment of "structural" terms (GO:0005198 and GO:0003735) in the "molecular function" category for this group of species (Table S1).
Next, we investigated the occurrence of orthologous groups related to fungal pathogenicity. According to our analysis, no orthologous group was exclusively present in all eight pathogenic fungi (Table 5). However, when a single pathogenic species was excluded from the analysis at a time, some orthologous groups were pinpointed. Specifically, excluding either Sporothrix species from the comparison revealed orthologous groups exclusive to the remaining pathogens. Excluding S. brasiliensis permitted the identification of the pathogen-exclusive orthologous groups OG6_100304 and OG6_100832, which represent proteins associated with the nucleoside diphosphate kinase (PF00334) and ribosomal S17 (PF00833) Pfam domains, respectively. When S. schenckii was excluded, the orthologous groups OG6_100082 and OG6_100425 were identified, which are associated with the core histone H2A/H2B/H3/H4 (PF00125) and RNA recognition motif (PF00076) Pfam protein domains, respectively. The investigation of orthologous groups exclusively detected in the nonpathogenic fungus S. cerevisiae unveiled the following orthologous groups: OG6_100674, OG6_102300, OG6_142972, OG6_500194, and OG6_222591. These orthologous groups are related to the dihydroorotate dehydrogenase (PF01180), phosphoadenosine phosphosulfate reductase family (PF01507), glucanosyltransferase (PF03198), and glycolipid 2-alpha-mannosyltransferase (PF01793) Pfam protein domains. The orthologous group OG6_222591 was identified as exclusive to S. cerevisiae, but no Pfam protein domain is available for it yet.
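The leave-one-out comparison described above can be sketched in Python with plain set operations. This is an illustrative reconstruction under stated assumptions rather than the code used for the analysis; the function names and the toy species labels are hypothetical.

```python
def exclusive_to_pathogens(groups_by_species, pathogens, nonpathogens):
    """Groups detected in every listed pathogen and in no nonpathogen.

    At least one pathogen must be listed.
    """
    shared = set.intersection(*(groups_by_species[s] for s in pathogens))
    excluded = set().union(*(groups_by_species[s] for s in nonpathogens))
    return shared - excluded


def leave_one_out(groups_by_species, pathogens, nonpathogens):
    """For each pathogen, list the groups that become pathogen-exclusive
    only when that species is left out of the comparison.

    At least two pathogens are required.
    """
    full = exclusive_to_pathogens(groups_by_species, pathogens, nonpathogens)
    result = {}
    for left_out in pathogens:
        kept = [s for s in pathogens if s != left_out]
        groups = exclusive_to_pathogens(groups_by_species, kept, nonpathogens)
        result[left_out] = groups - full
    return result


# Toy data: three pathogens (P1-P3) and one nonpathogen (N).
toy = {"P1": {"a", "b", "c"}, "P2": {"a", "b"}, "P3": {"a", "c"}, "N": {"a"}}
print(leave_one_out(toy, ["P1", "P2", "P3"], ["N"]))
```

In this toy case no group is exclusive to all three pathogens, but leaving out P2 or P3 uncovers one group each, which is the same pattern reported for the two Sporothrix species.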

Online application, visualization module and functionalities
The online application is available at http://exve.icc.fiocruz.br. ExVe contains six menus, named "About", "Contact Us", "Download", "Species", "Orthologous Groups", and "Proteins" (Fig. 3). The "About" menu contains a brief description of the ExVe database and lists some features of the web system. The "Contact Us" menu displays the e-mail contact for questions, error reports, feature requests and dataset proposals. The "Download" menu allows the user to download the ExVe database.
In the "Species" menu (Fig. 4), the user can select the species and strain from the complete list (currently including 9 species and 28 strains, as listed in Table 3). For a user-specified species and/or strain, all available gene symbols, UniProt protein IDs and names, strains, orthologous groups and PubMed links [47] are displayed in a table.
The "Orthologous Groups" menu allows users to visualize all available orthologous groups and the Pfam domains with which they are associated (see Fig. 5). Users can submit one or more species and receive all groups in which every selected species has orthologous proteins. When a protein is not assigned to any group of orthologs, it is grouped in a cluster termed "not available".
The "Proteins" menu provides two principal functionalities (Fig. 6). First, users can search for a given protein, and ExVe returns all orthologous groups with which the selected protein is associated, the gene symbol when available, the species and strain from which each protein was isolated, the protein name, the PMID, and the UniProt ID [32]. The UniProt ID field links to the UniProtKB page [32], where users can access additional protein information such as the name and description of the protein. Additionally, clicking on the PMID field redirects the user to the research article that described the protein [47].
The second main feature of this menu is filtering by protein, species name and type of fungal culture (liquid or solid media); ExVe returns the same information listed above (Fig. 7).

Discussion
Fungal EVs have been recently proposed as vaccine candidates [51,52], but the molecules inducing protection of different hosts remain unknown [41,51]. Recently, we identified an EV peptide inducing protection in an invertebrate host [53], but the immunological roles of vesicle-associated proteins are still unclear. This scenario might be linked to one major gap in the field of fungal EVs: the lack of well-characterized biomarkers, as well as molecules with potential to be vaccine candidates. In this context, a systematic analysis of components of fungal EVs produced by different species will likely benefit the whole field. Moreover, an orthology analysis of EV components produced by pathogenic species may contribute to guide future research on the functions of vesicular molecules.
Reference EV databases such as Vesiclepedia [34] and ExoCarta [33] have limited data regarding fungal species, while other eukaryotic and prokaryotic organisms have abundant information. We are launching ExVe as a free web repository that integrates EV proteins previously identified by shotgun proteomics in nine fungal species using orthology data, with the possibility of expansion as new data become available in the literature. The use of orthology information can improve comparative studies and assist in protein function prediction [54]. Although there are orthology databases such as FungiDB [46], OrthoMCL [37], and AYbRAH [31], none of them provide a subset of fungal EV proteomics data. Additionally, the association of ExVe with FungiDB allows users to access a large amount of biological data and bioinformatic resources to improve data analysis [46].
Distinct protocols for EV protein analysis will influence the number of identified proteins [27,30]. The variability in EV isolation methods, mostly based on differential ultracentrifugation of samples obtained from liquid media, has the same potential to affect protein detection [8,27,50]. Alternative protocols for the isolation of fungal EVs, including chromatographic separation and the use of solid media, have been recently described [9,27,29,30]. All articles used to populate the ExVe database analyzed fungal EVs isolated by differential ultracentrifugation from liquid media [8,24,50], except for the recent study of Rizzo and colleagues [41]. The peptide identification criteria may also affect the quantity and quality of proteins identified in fungal EVs [28,55]. Nevertheless, only a restricted number of studies in ExVe surpassed 1,000 identified proteins. Moreover, with the exceptions of A. fumigatus [38] and C. deuterogattii [41], which are each represented by a single study, the species were covered by different studies using different proteomics approaches to detect EV proteins (Table 3). Therefore, our data suggest that the different proteomics approaches had a limited impact on the identification of orthologous groups in fungal species.
According to our analysis, OG6_100083 is the only orthologous group present in all species available at ExVe. This orthologous group gathers sequences of heat shock proteins, with a focus on the Hsp70 family Pfam protein domain [46]. Several studies point to the high sequence conservation of this protein family among different groups and species [56,57]. The intracellular role of Hsp proteins as chaperones and in the modulation of the stress response was previously described [58,59]. Hsp70 proteins have also been identified in the extracellular environment, where they play multiple roles [59,60]. Notably, Hsp70 was characterized as a component of fungal EVs [59]. The functions of extracellular Hsp70 can be numerous, but they could be associated with cell signaling, mainly through the modulation of macrophage activation and the attenuation of the immune response [58,61].
In fungi, heat shock proteins have been implicated in the stress response and in survival under adverse environmental conditions such as temperature changes, starvation and antifungal stress [56,57,62]. Heat shock proteins are also involved in morphological changes, including the transition from mycelium to yeast, and have an important role in the interaction with host immune cells [56,63]. Indeed, heat shock proteins are apparently essential for the formation of fungal EVs. In H. capsulatum, binding of antibodies raised against heat shock proteins to the fungal surface resulted in EVs with altered protein loading [44]. In this fungus and in C. neoformans, these proteins were recognized by antibodies produced by infected patients [14,26]. In summary, our results and the recent literature strongly suggest heat shock proteins as main components of fungal EVs.
Four other orthologous groups were identified in 7 of the 8 pathogenic fungal species, but not in S. cerevisiae (Table 5). Nucleoside diphosphate kinase (Pfam ID PF00334) is a Pfam protein domain exclusively identified in most pathogenic fungi. This protein domain has already been reported in EVs derived from breast cancer [64]. In fungi, this protein domain is potentially associated with resistance to oxidative and thermal stress [65]. For instance, the gene encoding the nucleoside diphosphate kinase was shown to be essential in A. flavus, with a role in spore production and sclerotia formation [66,67]. Interestingly, disruption of this gene in S. cerevisiae resulted in unaltered growth rate and spore production levels [68]. Because this orthologous group was identified only in EVs produced by pathogenic species, we speculate that it plays a role in pathogenicity.
Histones have a central function in eukaryotes, controlling chromatin accessibility and chromosome segregation during mitosis. In C. albicans, histone H2A was described to regulate aneuploidy, which has been argued to be a strategy to acquire tolerance to antifungal therapy [69]. H3 and a variant histone protein were described to play a role in biofilm and planktonic forms of C. albicans, depicting the relevance of such proteins in the fungal cell cycle [70]. Regarding the role of histones in fungal pathogenicity, the histone acetyltransferase Gcn5 was shown to work in chromatin remodeling in response to stress induced by the human host environment during C. neoformans invasion [71]. Although the biological function of histones is known to be related to chromatin accessibility, these proteins could have additional functions. An H2B-like protein was detected on the H. capsulatum cell surface, and its role in the immune response was investigated using a mouse model [72]. Macrophage histones were detected at the surface of EVs, which was associated with proinflammatory responses [73]. Although the biological function of fungal histones in EVs is not yet known, we hypothesize that these proteins could be associated with virulence strategies, since they were not detected in S. cerevisiae. The fact that a histone-like protein was detected on the cell surface of one of the pathogenic fungi studied here sheds light on why such proteins were identified in EVs.
EVs contain distinct types of molecules in their lumen, including nucleic acids and proteins. Therefore, proteins bearing an RNA-binding motif, which could carry RNA, an unstable molecule, are expected to be found in EVs. mRNA trafficking occurs extensively in the cytosol of several fungi [74,75]. In EVs, short RNAs were detected in C. neoformans, P. brasiliensis, C. albicans, and S. cerevisiae [76]. Surprisingly, according to our analysis, the RNA recognition motif (Pfam ID PF00076) was identified only in pathogenic species, not in S. cerevisiae. This observation should be investigated in depth in further studies to test whether such proteins act as carriers of specific pathogenic RNAs. Some of the orthologous groups were exclusively identified in S. cerevisiae, and not in any pathogenic fungal species. This observation points to the existence of still unknown differential mechanisms of protein loading into pathogenic and nonpathogenic fungal EVs. The dihydroorotate dehydrogenase Pfam protein domain (Pfam ID PF01180), exclusively found in S. cerevisiae EVs, is associated with key proteins related to fungal pyrimidine biosynthesis, and its structure and mechanism of action have been investigated in S. cerevisiae [77]. The phosphoadenosine phosphosulfate reductase family (Pfam ID PF01507), also exclusive to S. cerevisiae EVs, is another Pfam protein domain associated with essential fungal metabolism. These proteins are vital to sulfur uptake through the action of 3′-phosphoadenosine-5′-phosphosulfate (PAPS) reductases. This class of fungal enzymes was proposed as a target of new molecules for antifungal therapy, with a focus on Aspergillus species [78,79].
Fungal EVs have been recently linked to the formation of the cell wall [38,80], a key component of the fungal cell. These reports agree with our findings showing an association between fungal EVs and the cell wall. Glycolipid 2-alpha-mannosyltransferase (Pfam ID PF01793) is responsible for the mannosylation of the lipid-linked oligosaccharide, which is required for the formation of O-linked saccharides during cell wall synthesis [81]. Fungal proteins with the glucanosyltransferase Pfam protein domain (Pfam ID PF03198), such as the β-1,3-glucanosyltransferase Gas1, operate in cell wall synthesis, silencing of rDNA expression, and the stress response [82,83]. Since Gas1 was detected in EVs produced by yeast cells and not in A. fumigatus, morphology-related functions for this EV protein are expected.

Conclusions
The current purpose of ExVe was to integrate available EV proteins identified by shotgun proteomics data for medically relevant fungal species. However, a database that includes data for different molecules could contribute more to the elucidation of additional questions [35]. Therefore, we plan to implement data recovery from other molecules such as lipids and RNAs, aiming for ExVe improvement. Additionally, we aim to enable sequence search, gene ontology and network analysis as future perspectives. Our proposal is that ExVe will be updated annually with newly available data. ExVe is open to continuously integrating proteins identified by shotgun proteomics data from the scientific community.