Evolutionary analysis of p38 stress-activated kinases in unicellular relatives of animals suggests an ancestral function in osmotic stress

p38 kinases are key elements of the cellular stress response in animals. They mediate the cell response to a multitude of stress stimuli, from osmotic shock to inflammation and oncogenes. However, it is unknown how such diversity of function in stress evolved in this kinase subfamily. Here, we show that the p38 kinase was already present in a common ancestor of animals and fungi. Later, in animals, it diversified into three JNK kinases and four p38 kinases. Moreover, we identified a fifth p38 paralog in fishes and amphibians. Our analysis shows that each p38 paralog has specific amino acid substitutions around the hinge point, a region between the N-terminal and C-terminal protein domains. We showed that this region can be used to distinguish between individual paralogs and predict their specificity. Finally, we showed that the response to hyperosmotic stress in Capsaspora owczarzaki, a close unicellular relative of animals, follows a phosphorylation-dephosphorylation pattern typical of p38 kinases. At the same time, Capsaspora's cells upregulate the expression of GPD1 protein resembling an osmotic stress response in yeasts. Overall, our results show that the ancestral p38 stress pathway originated in the root of opisthokonts, most likely as a cell's reaction to salinity change in the environment. In animals, the pathway became more complex and incorporated more stimuli and downstream targets due to the p38 sequence evolution in the docking and substrate binding sites around the hinge region. This study improves our understanding of p38 evolution and opens new perspectives for p38 research.


Introduction
Stress response in multicellular organisms is a complex process essential for cell survival. The p38 family of stress-activated kinases is a major player in this process in animals and is considered to be animal-specific [1]. p38 kinases belong to the larger group of mitogen-activated protein kinases (MAPKs). In animals, they are activated in response to almost all stress factors, from osmotic shock and UV radiation to cytokine-mediated inflammation and oncogene activation [2]. They also regulate apoptosis, cell differentiation, proliferation and migration [3]. Not surprisingly, dysregulation of p38-mediated pathways can lead to a number of pathologies, including inflammatory, cardiovascular and neurodegenerative diseases and cancer.
Most invertebrates have one gene coding for p38 kinase [4,5] with some exceptions [6,7], while vertebrates are known to have four paralogs: p38α (MAPK14), p38β (MAPK11), p38γ (MAPK12) and p38δ (MAPK13). The vertebrate paralogs share high sequence similarity, and their functional redundancy was suggested in several studies [8,9]. However, there is also evidence advocating individuality and specific biological roles of each paralog. For example, they may differ in substrate specificity and specificity to their activating kinases (MAPK kinases or MAPKKs). Different p38 paralogs can also exhibit different sensitivity to kinase inhibitors and tissue expression [10][11][12][13][14][15]. Therefore, it is important to understand the role of each p38 in the cell's stress response. However, still little is known about the molecular mechanisms behind the paralogs' individuality.
The four p38 vertebrate paralogs originated by a tandem duplication that led to the presence of p38β and p38γ genes. This latter was followed by segmental duplication that resulted in two more paralogs, p38α and p38δ [6]. Two other closely related groups of kinases, also known for their role in the stress response, are the c-Jun N-terminal kinase (JNK) in animals and Hog1 in fungi. p38, JNK and Hog1 kinases share a similar domain structure and are thought to have originated from the same ancestral protein [16,17]. The most conserved parts of these proteins correspond to (1) an ATP binding site, (2) the docking site and the common docking (CD) motif, (3) an activating loop and (4) a substrate-binding site [18]. The docking site and the activating loop determine the specificity of a MAPK to its substrates and activators. The activating loop also contains a dual phosphorylation motif that is highly conserved in p38, Hog1 (TGY) and JNK (TPY). The dual phosphorylation of threonine and tyrosine by MAPKK is characteristic of the stress-activated kinases and is essential for their enzymatic activity [13,19,20]. Additionally, there are other motifs that are specific for certain kinases. For example, p38γ contains a PDZ domain-binding sequence at its C-terminus [21]. Less studied are the lipid binding sites that have been described, for example, in p38α. They are thought to be involved in binding specific lipid molecules and modulating the kinase's activity [22].
Despite its importance, the origin and the evolutionary history of the p38 protein family remain unclear. Existing phylogenetic studies of MAPKs are limited to a narrow set of species, rarely outside of the vertebrate group. Moreover, the prevalence of p38α studies over studies that include the other p38 paralogs further limits our knowledge of the overall network of interactions between different players in the p38-mediated stress response. To fill these gaps, we performed a taxon-rich bioinformatics survey, followed by comprehensive phylogenetic and sequence evolution analyses of the p38 kinase family. Our data show that p38 stress signalling originated before Metazoa. In particular, we identified p38/Hog1 homologs in the genomes of several unicellular relatives of animals, including Choanoflagellata, Filasterea and Teretosporea. We further characterized expression and phosphorylation patterns of the p38 homolog in the filasterean Capsaspora owczarzaki under osmotic stress conditions. Our work shows that p38-mediated stress response in animals might have originated from a relatively simple mechanism activated in the osmotic stress and later, in animals, highly diversified to react to a multitude of stimuli. We hypothesize that this diversification likely happened due to the evolution within a specific 'hotspot' in the protein's sequence that we here describe. This work improves our understanding of the molecular basis behind the complex network of p38 signalling and the role of paralog individuality in it.

Stress kinases in unicellular relatives of animals
To elucidate the origin of p38, we first performed a bioinformatic survey across all major eukaryotic lineages and then inferred a protein phylogenetic tree (electronic supplementary material, table S1). Our data indicate that p38 and JNK are present in most Metazoa, including the early branching phyla Porifera, Placozoa and Cnidaria. We did not find any sequences in seven species of ctenophores (see Methods). It might indicate the loss of p38-like kinases in this lineage; however, more thorough sequencing efforts and higher quality genome assemblies are required for making any final conclusions. Moreover, we also retrieved homologs in several unicellular relatives of animals, namely Choanoflagellata, Filasterea and Teretosporea (Ichthyosporea + Corallochytrea) (figure 1). Interestingly, these lineages are the only nonmetazoan eukaryotes whose genomes encode p38 homologs. The corresponding sequences branch in an intermediate position between the animal p38s and JNKs and the fungal Hog1 kinase. Interestingly, the genomes of choanoflagellates encode two p38-like paralogs (named here as choano-1 and choano-2) that makes them the only lineage outside Metazoa with two p38-like proteins. The fact that the choano-1 paralog clusters outside of p38/JNK and Hog1 groups may suggest an ancestral duplication before the divergence of Holozoa and Holomycota. However, we cannot exclude the scenario of a lineage-specific duplication, for example, choanoflagellates were shown to undergo lineage-specific gene loss and gain after the split with the animal ancestor [23]. The homolog from Parvularia atlantis, a representative of a recently described nucleariids lineage [24], clusters within the fungal Hog1 group. Based on this tree, we hypothesize that the last common ancestor of Opisthokonta (Holozoa and Holomycota) possessed an ancestral protein Hog1/p38 and that it diverged into the Hog1 gene (in Nucleariida and Fungi) and an ancestral p38like protein in the root of Holozoa (i.e. the clade comprising animals, choanoflagellates, filastereans and teretosporeans). The latter evolved into p38 and JNK kinases within the animal clade.
We also retrieved several MAPKs from unicellular relatives of animals (orange and turquoise branches in figure 1). In particular, choanoflagellates, filastereans and teretosporeans have MAPKs from all major groups (electronic supplementary material, table S2) except MAPK4 and MAPK6. The choanoflagellates seem to have the highest content and diversity of MAPKs among unicellular relatives of animals.
Despite the observation that many invertebrates possess only one gene coding for the stress-induced MAPK, there are several cases of lineage-specific duplication. The two well-known examples are Drosophila melanogaster and Caenorhabditis elegans. The genome of D. melanogaster encodes three p38-like paralogs. One of them, p38c, is characterized royalsocietypublishing.org/journal/rsob Open Biol. 13: 220314 by a very long branch on the tree, reflecting a specific role of p38c kinase in fly physiology. Indeed, p38c is known to have a dual functionality: it participates in the stress response typical to p38 group and has a novel function in lipid metabolism [25]. PMK-3 from C. elegans is another paralog with a long branch, and it was shown to be a part of a novel mechanism of survival [26,27]. In this study, we found five more examples of highly diverged p38 paralogs in invertebrates (table 1).
Interestingly, p38γ-p38δ and JNK show longer branches, indicating a higher level of divergence, which is in agreement with Brewster et al. [28], who suggested that p38 kinases and Hog1 retained more similarity with the ancestral protein than JNK. Furthermore, in this study, we have identified a fifth p38 paralog in vertebrates that clusters together with p38γ and p38δ. We will call it p38γ/δ here.

p38 evolution in vertebrates
To better understand the p38 paralogs' fate, we inferred a phylogenetic tree with an extended dataset for vertebrates.
We included species relevant for our study such as whales and elephants due to a link between large body size and cancer resistance [29][30][31] and bats or naked mole rat as animals with a special immune system [32,33]. Altogether, we collected 294 protein sequences from 63 species (electronic supplementary material, table S3). The tree (figure 2) shows that additional rounds of whole genome duplication (WGD) in bony fishes and Xenopus resulted in the higher number of p38 paralogs in these lineages, with a diverse loss and retention pattern of the duplicated genes (table 2). For example, we observed a distinct trend in bony fishes, where only additional copies of p38α and p38γ were retained, but not of p38β and p38δ. The genomic location of the corresponding genes (they are distributed over two chromosomes in pairs p38α-p38δ and p38β-p38γ [2]) suggests that this is a selective retention rather than a random loss of the chromosomal segment. Salmo salar has four p38γ genes probably due to the additional round of WGD in salmonids. Cyclostomata is an evolutionarily interesting lineage-it is thought to undergo only one round of WGD and split from the royalsocietypublishing.org/journal/rsob Open Biol. 13: 220314 vertebrate ancestor before the second round of WGD took place [34]. This group is represented by lampreys and hagfish. We did not find any p38 coding genes in the currently available sequencing data of hagfish. Lampreys seem to have two p38 paralogs: Petromyzon marinus has one paralog that is a sister branch to p38α-p38β cluster, and another is a sister branch to p38γ-p38δ cluster. Lethenteron camtschaticum has two paralogs branching within the p38α-p38β cluster. It is likely that lamprey's p38 genes evolved from the ancestral p38β-p38γ gene unit before it duplicated in the ancestor of vertebrates. Petromyzon marinus retained both paralogs and L. camtschaticum probably underwent a lineage-specific loss of the p38γ-like gene and a duplication of the p38β-like gene (electronic supplementary material, figure S1). Macrostomum lignano Platyhelminthes 2 PAA60852.1 a Only shown when the branch length for the diverged paralog is at least twice longer than for the more conserved paralog.  royalsocietypublishing.org/journal/rsob Open Biol. 13: 220314 Another interesting finding is the new fifth paralog p38γ/ δ present in amphibians, bony fishes, sharks, and a representative of ray-finned fishes Lepisosteus oculatus, but absent in mammals, birds and reptiles (table 2). p38γ/δ has most of the kinase active sites conserved (electronic supplementary material, figure S2) and likely participates in p38-mediated stress response as other p38 paralogs. However, its absence from mammals, the most popular and studied group of animals, made it unnoticed until now.
It is possible that p38γ/δ may be a lineage-specific duplication. However, considering its diverse distribution in distinct animal taxa, it is most likely that p38γ/δ was lost from mammals, birds and reptiles. Interestingly, synteny analysis of all five existing paralogs indicated similarity between p38β-p38γ pair and p38γ/δ genes: the position of plexin-b2 gene upstream of p38β is conserved in animals [6] and similarly plexin-b1 gene is located upstream of p38γ/δ, although in the reverse direction (electronic supplementary material, figure S3). Plexin b1 and b2 genes are close paralogs and their evolution is intertwined, like that in the p38 family. Taking into consideration the specifics of p38 gene family evolution, i.e. their paired localization on two chromosomes ( p38β with p38γ and p38α with p38δ) [2], we suggest that p38γ/δ emerged after duplication of the p38β-p38γ gene unit within the second WGD event in vertebrate lineage (see details in electronic supplementary material, figure S1).
As discussed in the previous section, p38γ and p38δ tend to have longer branches (figure 1), which suggests that these p38 paralogs underwent evolution with a higher substitution rate. To further test this hypothesis, we performed an analysis for positive selection using the codeml program from the PAML package. We used a tree with only mammalian sequences to avoid any possible artefacts due to the dataset bias. The results indicated the possibility for positive selection for p38γ ( p-value based on χ 2 distribution is 0.02) and p38δ ( p-value based on χ 2 distribution is 2.65 × 10 −6 ), but neither for p38β nor for p38α. Strikingly, amino acids that contributed to positive selection play a role in the p38δ substrate binding site and in the hinge site of domain rotation during kinase activation (numbers 5 and 7, figure 3). This might indicate that p38γ and p38δ were retained in the mammalian genomes not only as the 'back-up' for p38α, but also acquired individual features that could allow binding to specific partners.
In general, all four p38 kinases demonstrated high sequence conservation. Functionally important sites are also conserved in the species where we expected to see differences (naked mole rat, bats, mammals with large body size). Nonetheless, we noticed a trend of N-terminal truncation in p38β and p38δ in the common minke whale Balaenoptera acutorostrata and in p38γ of the African bush elephant Loxodonta africana. In addition, the genome of L. africana lacks the gene coding for p38β. This might be a result of the genome annotation inaccuracies; however, the p38α sequence in both species is complete and conserved, including its N-terminal end. This latter fact might suggest that high-weight animals have, indeed, undergone p38 paralog-specific sequence diversification and paralog loss. However, this hypothesis should be tested in further studies.

Paralog diversification and evolution in mammalian p38s
To understand whether different p38 paralogs could exhibit distinct activities in the cell, we studied paralog sequence individuality in mammals. For this, we searched for paralog-specific amino acid (AA) substitutions that could indicate specific functions. Our data show that most of the functional sites are highly conserved in all four paralogs, except the ATP-binding and the docking regions around the hinge point (after activation by dual phosphorylation the N-and C-terminal domains of p38 rotate around the hinge point). Interestingly, this region seems to be a 'hotspot' for paralog diversification and likely affects each paralog's substrate specificity and ATP binding efficiency. We highlighted 12 amino acid substitutions as the most probable to affect protein conformation and/or function (figure 3). Half of these sites are conserved between p38 subgroups, i.e. p38α and p38β share one variant and p38γ and p38δ share a second variant. The other half of the sites differ between all paralogs. The most intriguing sites lie within the hinge point of p38δ (number 5: G → Q) and a nearby site involved in royalsocietypublishing.org/journal/rsob Open Biol. 13: 220314 substrate binding (number 7: N → Q), both of which were detected as sites of positive selection by the codeml analysis. Although these amino acid changes are semi-conservative in chemical nature and should not disrupt protein folding, the substitution of the conserved glycine can result in changes in electrostatic interactions, for example, interactions with phosphate groups [35]. In addition, due to its conformational flexibility, glycine is often present in structural turns where its substitution to any other amino acid is disfavoured. Similarly, N → Q change can alter kinase activity in specific structural three-dimensional contexts [35]. p38γ also carries a substitution in substrate binding site (number 7: N → G) that is generally considered neutral but can influence the activity in some kinases. Thus, we can suggest that these AA substitutions within the hotspot region may be the basis of p38 paralog-specific roles in the cell. Possible effects of the 12 highlighted AA substitutions are summarized in table 3.
Overall, the variation in the hotspot region can be defined as mild, i.e. substitutions happen between AA with similar biochemical characteristics. Therefore, such substitutions are not expected to inhibit or alter the kinase's function. At the same time, they have a potential to modulate efficiency of ATP binding, substrate binding specificity and affinity in each p38 paralogous protein. In this way, each paralog would be tuned to its specific target or regulator in particular cellular pathways or contexts. Future experiments using sitedirected mutagenesis could show if the differences in the near-hinge site region can be indeed used to characterize each p38 paralog's individuality. At least one such experiment has been already done, where changing the threonine within the ATP-binding site (number 3 in figure 3a) of p38α and p38β altered the sensitivity of these kinases to an ATP-binding inhibitor [50].
Next, we searched for conserved substitutions in the fifth p38γ/δ paralog, which is absent in mammals. Sequence comparison of p38γ/δ to mammalian p38 paralogs revealed that amino acids within the ATP binding site are conserved between p38γ/δ and p38γ and p38δ (electronic supplementary material, figure S2). At the same time, it has unique substitutions within the hotspot site: in the docking and substrate binding regions (numbers 4-9 in electronic supplementary material, figure  S2). This shows that all five paralogs can be distinguished based on the comparison within this region. For example, the A → Q substitution in the site associated with the docking function (number 5 in electronic supplementary material, figure S2, homologous to number 6 in figure 3a) where alanine, generally non-reactive and hydrophobic, is changed to polar glutamine with a large side chain. Similarly, histidine (positively charged) is changed to isoleucine (uncharged, hydrophobic), which is a   royalsocietypublishing.org/journal/rsob Open Biol. 13: 220314 non-conservative change and can likely affect docking specificity or strength (number 9 in electronic supplementary material, figure S2, and in figure 3a). p38γ/δ also has conserved substitutions of the two docking AAs outside of the hotspot, unlike other p38 paralogs (numbers 12 and 15, electronic supplementary material, figure S2). This suggests that p38γ/δ may have evolved the capacity to interact with new substrates and/or participate in different pathways and respond to new stimuli. Further experiments in understanding the cellular functions of p38γ/δ and its role in stress response will provide interesting insights into the role of the p38 family in animal physiology.
Our consensus comparison approach can be used for analysing sequence evolution in any other p38 homologs. For example, we looked at the invertebrate p38s. Surprisingly, they, too, exhibit high sequence conservation, especially in the docking site, in the amino acids surrounding the TGY motif, and among amino acids involved in the structural interaction with the phosphorylated threonine of the TGY motif. However, invertebrate p38s show higher diversification within the ATP-binding site, especially the p38 proteins that cluster with p38γ and p38δ.
Sequence comparison analysis of animal p38 kinases shows that all paralogs have lineage-specific AA substitutions. Many of these substitutions are positioned at the surface of the folded p38s as predicted by three-dimensional structure analysis in I-TASSER (electronic supplementary material, table S4). Therefore, these substitutions can be potential candidates for further functional studies of the p38 kinases. A particular case is the PDZ binding region that is conserved in p38γ kinase in all the vertebrates analysed here but not present in any other paralogs. This domain is involved in protein interactions and might be important for p38γ functions by forming different protein complexes. Interestingly, lipid binding sites that have been described based on the crystal structure of p38α kinase [22] are well preserved in all five p38 kinases in all lineages with a rare exception (in vertebrates, out of 16 positions 9 are identical and 5 are changed to the same chemical AA type). This suggests the importance of the lipid binding in Table 3. Paralog-specific amino acid substitutions in p38 functional sites. figure 3 substitution p38 paralog a associated function possible effect reference royalsocietypublishing.org/journal/rsob Open Biol. 13: 220314 p38 regulation and, therefore, highlights the need of future functional studies in this direction.

Paralog-specific amino acid motifs of p38/Hog1-like proteins in unicellular relatives of animals
The uncertain position of the p38-like proteins of unicellular relatives of animals in the phylogenetic tree does not allow us to discern whether those homologs are 'animal-like' p38 kinases or 'fungi-like' Hog1. Since the binding activity of proteins is often defined by just one or a few amino acids, changes in active sites are difficult to trace in a phylogenetic tree. Therefore, we performed additional sequence comparison analysis, now including sequences from choanoflagellates, filastereans, corallochytreans and ichthyosporeans in order to determine whether they can be assigned to one of the p38, JNK or Hog1 groups ( figure 4). Strikingly, all homologs from the unicellular organisms preserve the TGY phosphorylation motif characteristic to p38 and Hog1. This confirms that these proteins belong to p38/Hog1 kinases and distinguishes them from other MAPKs. Interestingly, p38-like proteins from unicellular relatives of animals share high similarity with both animal and fungal stress kinases in several key functional sites ( figure 4). Lipid binding sites show low level of conservation as compared to animals, with only half of the sites conserved. Overall, the sequence similarity analysis shows mosaic conservation between p38, JNK, Hog1, choanoflagellate choano-1 and choano-2 and filastereans-ichthyosporeans-Corallochytrium (f-i-C). That is, homologs from the unicellular relatives of animals share similarities with p38 in some sites and with JNK or Hog1 in others (blue ovals in figure 4). Such pattern implies that all stress-activated kinases evolved from an ancestral protein that originated before the split between Holozoa (metazoans and their closest unicellular relatives) and Holomycota (fungi and their closest unicellular relatives). Moreover, functional diversification of all p38 kinases possibly occurred due to amino acid substitutions in the substrate binding and docking sites in the near-hinge site (the hotspot), since the variation within this region is increased in the p38 kinases from the unicellular holozoans, as well as in animals.
It is likely that the f-i-C group, the animal p38, and the fungal Hog1 group diverged less from the ancestral kinase than the JNK and both choanoflagellate clusters. The evolutionary distance between the choanoflagellate choano-1 and animal p38 kinases is the longest among all comparisons, except with Hog1 (table 4). In agreement with the tree in figure 1, sequence analysis also indicates that choanoflagellate choano-2 shares more similarities with p38 rather than with Hog1. However, sequencing of more species within the lineages of the unicellular relatives of animals and the basal metazoans is needed to understand the evolutionary history of these paralogs in choanoflagellates. Because JNK stands out from the whole group as the most diverged paralog, we propose to call the orthologues from the unicellular holozoans p38/Hog1-like kinases.

p38/Hog1 protein of Capsaspora owczarzaki reveals functional similarity to the fungal Hog1
To better understand the origin and evolution of the stressactivated kinases, we performed stress studies on C. owczarzaki cultures. The filasterean C. owczarzaki is one of the most studied unicellular relatives of animals. Its p38-like homolog clusters within the f-i-C group on the tree (figure 1) and shares sequence similarities with p38, as well with Hog1. It is known that all three types of kinases-p38, JNK, and Hog1-can be activated by osmotic shock and it has also been shown that expression of p38 and JNK can rescue Hog1 knockout in yeasts [5]. Therefore, we subjected aggregative C. owczarzaki cells to osmotic shock by exposing them to 0.1 M NaCl concentration in the culture medium. Next, we used the qPCR technique to measure mRNA expression levels of six C. owczarzaki genes, whose homologs in animals or fungi are known to regulate different types of stress response. In particular, we selected glycerol-3-phosphate dehydrogenase (GPD1), heat shock protein 90 (Hsp90), MYC, apoptosis-inducing factor (AIF), NF-κB and the p38/Hog1-like homolog identified in this study (further called COp38/Hog1). In this stress condition, C. owczarzaki cells seem to activate a pro-survival mechanism by downregulating the expression of AIF mRNA 45 min after the stress induction and upregulating the expression of Myc and NF-κB mRNA after 3.5 h (figure 5a). We also observed an increase in the expression of GAPDH mRNA after 45 min post-induction. GAPDH gene is commonly used as a house-keeping gene, but its regulation was affected by stress, which is in agreement with previously published studies [51][52][53].
Intriguingly, COp38/Hog1 and GDP1 mRNA expression was also upregulated after 45 min of incubation under the osmotic stress conditions and decreased at 85 min (figure 5a). Upregulation of GPD1 downstream of Hog1 is specific to fungi in the osmotic stress conditions [28]. GPD1 is involved in Sln1-regulated pathway, which in yeasts mediates osmotic stress response and results in synthesis of glycerol. By accumulating glycerol, fungal cells compensate for the increased salinity in the environment. Accordingly, C. owczarzaki seems to have a fully conserved Sln1-regulated pathway (electronic supplementary material, table S5). Together with our results, this suggests that the species might exhibit an osmotic stress response using a mechanism similar to that described in fungi. Although upregulation of p38 mRNA expression has not been shown in animal stressed cells, we cannot rule out that this mechanism is present in C. owczarzaki either as species specific or ancestral feature. Cellular stress response in eukaryotes is mediated by signalling cascades that are activated by the sequential phosphorylation of several kinases. In order to investigate whether phosphorylation of COp38/Hog1 is also involved in the regulation of osmotic stress, we stimulated C. owczarzaki adherent cells with 0.1 M NaCl. At different times postinduction, cells were collected and protein extracts obtained. The phosphorylation of COp38/Hog1 was evaluated by western blot, using phospho-specific antibodies. These antibodies recognize the conserved TGY motif when phosphorylated, i.e. they recognize p38 kinases in their activated state. As shown in figure 5b,c, a phosphorylated 37 kD protein, corresponding to COp38/Hog1, was detected. Phosphorylation of p38/Hog1 was significantly increased at 15 min post-induction compared to control cells, but decreased after 30 min. This pattern of quick phosphorylation and subsequent dephosphorylation of p38 or Hog1 is crucial and typical of the stress response in other organisms and suggests that COp38/Hog1 is, indeed, mediating such a function in C. owczarzaki [54,55].
royalsocietypublishing.org/journal/rsob Open Biol. 13: 220314 None of the available antibodies against total p38α, p38γ, p38δ, or phospho-JNK detected any protein of the expected size in C. owczarzaki cell extracts. These antibodies were raised against mammalian proteins and it is possible that they failed to recognize the distant p38 homolog from C. owczarzaki. By contrast, the phospho-specific p38 antibody  royalsocietypublishing.org/journal/rsob Open Biol. 13: 220314 recognizes the small phosphorylated TGY motif, which is very well conserved across species. It is important to note that for the analysis of COp38/Hog1-like kinase phosphorylation under osmotic stress, we first used aggregative C. owczarzaki cells. However, aggregative cells showed a high level of COp38/Hog1 phosphorylation even in unstressed conditions (control cells not exposed to increased NaCl concentration), which is in agreement with the results of the phospho-proteomics analysis of C. owczarzaki's life stages described in [56]. The results described here show that the phosphorylated p38/Hog1 kinase in C. owczarzaki can be detected with antibodies raised against the mammalian phosphorylated protein and that its activation by osmotic stress follows a pattern of phosphorylation-dephosphorylation similar to other eukaryotic cells [13]. In combination with qPCR expression studies, we demonstrate that C. owczarzaki is likely to exploit fungi-like stress response to osmotic shock. In addition, structural comparison of the modelled three-dimensional structure of COp38/Hog1 to Hog1 and p38α as well indicates its higher similarity to yeast Hog1 (RMSD value is 1.9 and Z-score is 45) rather than to p38α (RMSD value is 2.4 and Z-score is 41.1). Similarly, it was suggested previously that some physiological and morphological features of the unicellular relatives of animals could be acquired from the common ancestor of animals and fungi [57].

Discussion
In this study, we pinpoint the origin of p38 stress kinases and their consequent evolution in animals. We also describe the molecular basis behind p38 paralogs' individuality. In particular, we demonstrate that stress-activated kinases originated in the common ancestor of opisthokonts (the group comprising Metazoa, Fungi, and their unicellular relatives). The ancestral p38/Hog1 protein evolved into p38 and JNK in animals and into Hog1 in fungi. The homologs from extant unicellular relatives of animals (choanoflagellates, filastereans and teretosporeans) show similarity to both p38 and Hog1 proteins, likely mirroring the features of that ancestral p38/Hog1 protein. Interestingly, the genomes of choanoflagellates contain two paralogs, which might be a unique feature of these organisms. Strikingly, the proteins from the unicellular relatives of animals share high similarity with animal p38s and fungal Hog1 within functionally important regions (TGY phosphorylation motif, ATP binding site, activation loop with the associated substrate binding sites). They also preserve five amino acids (lysine and four arginines) that have been shown to pull the kinase's structure into the active conformation upon the dual phosphorylation event [58]. Thus, we can assume that these newly identified p38/Hog1-like orthologues perform functions similar to those of the animal and/or fungal stress kinases.
Indeed, we showed that the dynamics of activation of p38/Hog1-like kinase in C. owczarzaki, a unicellular relative of animals, follow the pattern observed in animals and were quantified using the Odyssey infrared imaging system and represented as histograms. Values are given as mean ± SEM from two representative gels in duplicates. a.u.: arbitrary units. fungi under hyperosmotic stress. Increased mRNA of GPD1 is, however, resembling the signalling pathway in yeasts. C. owczarzaki is, therefore, an interesting example of an organism from an intermediate taxonomic position between Metazoa and Fungi, likely preserving physiological features from both the fungal ancestor and the unicellular ancestor of animals. Answering whether p38/Hog1-like kinases in C. owczarzaki and other unicellular relatives of animals indeed participate in stress response and whether they are involved in pathways more related to fungi or animals, will unravel how the stress response network evolved in animals. Therefore, based on this and supported by previous studies (see review [28]), we hypothesize that the common ancestor of Holozoa (animals and their unicellular relatives) possessed at least one MAPK-like stress kinase with functions involved in response to osmotic stress. However, it could also participate in the general stress resistance of the cell. Detection of and response to environmental changes in salinity is an ancient function serving the cell's integrity. Therefore, it is not surprising that the elements involved in its regulation are conserved between fungi and holozoans. For example, it has been shown that p38 from animals can regulate osmotic physiological response in yeasts [28]. During evolution in the animal lineage, rounds of WGD would allow expansion of gene families involved in the stress response and diversification of cells' reaction to a multitude of stress stimuli in different cell types. As a result, vertebrates have 4-5 p38 kinases and 3 JNKs. While these are very similar paralogs, they still demonstrate specificity towards inhibitors and substrates at least to some extent [13]. Our analysis identified the probable evolutionary hotspot where mutations can lead to paralog-specific activity. While substrate binding sites in the activating loop are conserved in all animal paralogs and also in p38/Hog1-like orthologues from unicellular holozoans, the amino acids involved in substrate docking and substrate binding around the hinge site show variation between the paralogs. We showed that amino acid substitutions within the hotspot are semi-conservative and could only be responsible for mild changes in the local charge or structure. This would allow keeping the overall function of the enzyme while modulating substrate specificity or binding efficiency since protein binding sites are usually defined by only a few amino acids and small allosteric changes [59]. Indeed, substrate docking in p38s was shown to be responsible for their specificity and efficiency in different cellular cascades [20]. Accordingly, it was experimentally shown that mutation of just two amino acids within the docking site could change substrate specificity of p38 to that of ERK and vice versa [60]. Therefore, the sequence evolution of p38 paralogs probably enabled the cell to enrich in a variety of responses to different stresses and in different cell types in animals. This would also enhance precision in the regulation of the kinase activity and increase the number of interacting proteins in the pathway, as similarly discussed in [2,6,28,61,62]. Such functional diversity could also have led to a very complex crosstalk between p38 and JNK kinases in different conditions and different cell types, as was shown in some studies [63].
Paralog diversification in the p38-JNK group in animals can be seen as a classical example of the phenomena where one copy ( p38α) preserves the original function and other copies have more flexibility and undergo further divergence ( p38γ, p38δ, JNK and p38γ/δ). The evolution of p38γ, in particular, resulted in the acquisition of a new binding site for PDZ-containing proteins, such as α1-syntrophin, SAP90/PSD95 and SAP97/hDlg. This binding site is essential for the phosphorylation of these proteins by p38γ under stress conditions [21,64,65]. We also described a new, fifth p38 paralog, p38γ/δ, present in several animal lineages. This protein is characterized by specific amino acid substitutions within the hotspot region, indicating that it might have a specific set of targets. This finding further demonstrates the complexity of p38 signalling in animals.
A similar strategy of paralog duplication and diversification can be seen in some invertebrates. For example, C. elegans, D. melanogaster and a few other species underwent lineage-specific duplication, and one of their paralogs is characterized by higher sequence divergence. Drosophila's p38c, for example, even lost the TGY motif typical to p38. Experiments with other long-branched paralogs described in this study will bring more understanding of p38 evolutionary history and the capacity of paralogs to acquire new characteristics. Our detailed evolutionary scenario presented here is a valuable resource for further research on the p38 family. In particular, it can help in identifying amino acids that might be relevant to understanding p38 paralog individual roles in diseases and to discovering new functionally important sites that novel therapeutic inhibitors can target.

Data collection and phylogenetic analysis
Protein sequences were collected by running blastp and tblastn search against protein database at NCBI and GenBank nucleotide database, respectively, with human p38 proteins as query (NP_002745.1, NP_001306.1, NP_002960.2, NP_002742.3). Choanoflagellate sequences were obtained from the dataset generated in [23]. Sequences from Breviatea were obtained from [66], CRuMs from [67], and S. destruens from [57]. Sequences from Ctenophora Euplokamis dunlapae and Vallicula multiformis were pulled from the Traces database at NCBI and for Pleurobrachia bachei from the Neurobase (https://neurobase.rc.ufl.edu/pleurobrachia/download), and sequences from Mnemiopsis leidyi, Hormiphora californensis, Beroe ovata and Beroe forskalii were taken from the Genome database at NCBI. Hits with more than 30% identity and 75% query coverage were selected. Incomplete and misassembled sequences were manually corrected. A multiple sequence alignment was built in MAFFT [68] using BLOSUM62 matrix and default gap penalties settings, and the alignment was manually checked and trimmed using Ugene program [69]. The phylogenetic tree of figure 1 was inferred using IQtree [70] with the model LG + G4 + F and 1000 ultrafast bootstrap replicates to test nodal support. The tree of figure 2 was inferred using RAxML [71] with the PROTGAMMALG model as chosen by performing a RAxML parameter test. Tree visualization was done in iTOLtree [72]. All trees in Newick format and all multiple sequence alignments can be found in the electronic supplementary material, files S1-S3 (can be accessed at Figshare To test whether any of the p38 paralogs in mammalian lineage underwent positive selection, we first inferred an ML tree with mammalian sequences using RAxML as described above. Next, we created an amino acid-nucleotide alignment using PAL2NAL software [73]. The test for positive selection was done using the codeml tool in PAML software [74]. We used site-branch model analysis to test each of the four branches with the following parameters: model = 2, NSsites = 2, fix_omega = 0 and omega = 1. The null hypothesis in this case was estimated under the following parameters: model = 2, NSsites = 2, fix_omega = 1 and omega = 1. For statistical assessment, twice the difference between the log likelihood of the alternative and the null (the ratio is the same for all branches) hypotheses were compared with the χ 2 distribution. The χ 2 distribution test for statistical support was used. Each estimation of alternative hypotheses was run twice by codeml with varying parameter 'omega' to test for contingency. Parameter 'cleandata' was set to 0, allowing for retaining gaps in the alignment. The input phylogenetic tree was built using RAxML as described earlier.

Comparative sequence analysis
To identify paralog specific substitutions, we first built a consensus sequence using EMBOSS Cons using 7 protein sequences for each lineage. Multiple sequence alignment was done and manually corrected in Ugene. AA substitutions were highlighted based on the analysis of AA biochemical qualities and literature search. Functionally important sites were mapped according to the conserved domains map at Conserved Domain database at NCBI [75] and previous studies [22,[76][77][78][79][80][81][82].

Synteny analysis
Gene surroundings of p38γ/δ were assessed by analysing gene maps in the genomic viewer at NCBI available at Gene database within the region of 15 kbp in the following species: Xenopus laevis, Danio rerio, Lepisosteus oculatus, Rhincodon typus.

Calculation of the evolutionary distances
Between-group mean distances were calculated in MEGAX [83], v. 10.1.7, based on the tree T1 and the corresponding multiple sequence alignment using default parameters.

Cell culture, RNA extraction and qPCR experiment
Capsaspora owczarzaki cells (strain ATCC 30864) were grown axenically in ATCC medium 1034 (modified PYNFH medium) with the addition of 10% (v/v) fetal bovine serum (Sigma-Aldrich, F9665) at 23°C. Cells were seeded in 12well plates at a density of 2.4 × 10 6 cells ml −1 and placed at agitation at 50 rpm for aggregate formation as described in [84]. At the same time point, osmotic shock was induced by adding 0.1 M NaCl (Sigma-Aldrich, S1619). NaCl solution was prepared in the culturing medium. Whole RNA was extracted from the cells in triplicate for each condition after 0 min (control), 45 min, 1 h 25 min, 3 h 45 min of incubation using Trizol (Invitrogen/Thermo Fisher Scientific, 15596026) as described in [85]. Subsequently, the samples were treated with DNAseI (Roche, 4716728001) and purified using the RNeasy minikit (Qiagen, 74104). RT-PCR was done using a SuperScript III First-strand synthesis system for RT-PCR (Invitrogen, 18080-051) with random hexamer primers, following the manufacturer's protocol. The resulting cDNA was treated with RNaseH (Invitrogen, AM2293).
Primers for each gene for the qPCR reaction were designed in Primer3 Plus [86]. The primer sequences can be found in the electronic supplementary material, table S6. Primers' specificity was tested by PCR amplification using cDNA and by generating a standard curve with cDNA dilutions 1; 0.1; 0.01; 0.001; 0.0001; 0.00001 with the iTaq Universal SYBR Green Supermix kit (Bio-Rad, 172-5121). Using the same kit, qPCR was carried out with the iQ5 multicolor Real-Time PCR detection system with iCycler™ Thermal Cycler (Bio-RAD) using Bio-RAD iQ5 software. All the conditions were tested in triplicate in addition to no-template control and a negative control with no-RT RNA. PGK1 was used as a housekeeping gene.
Data analysis and statistics test were carried with the program REST 2009 [87]. Fold change was calculated using the 2 −ΔΔCt method [88,89]. PGK1 was used as a housekeeping gene.

Defining NaCl working concentration
Capsaspora owczarzaki cell viability in hyperosmotic conditions was assessed by using the resazurin (Sigma, R7017) assay as follows. We subjected the cells to an array of NaCl

Western blot
Adherent cells were cultured as described earlier, and osmotic shock was induced by adding 0.1 M NaCl solution in the culturing media to the cells. At the indicated time points, the medium was aspirated, and the lysis buffer was added directly to the cells. The cells were collected by pipetting up-and-down and the samples were immediately frozen in liquid nitrogen and stored at −70°C until ready to proceed. Total protein extraction was performed as follows: samples were thawed on ice and sonicated on a Digital Sonifier (Branson, model 102C) at an amplitude of 15% with 15 s pulse and 30 s pause for 8 cycles on ice. The samples were then incubated on ice for 15 minutes more and centrifugated at 20 000g, 20 min, 4°C. The supernatants containing proteins were immediately frozen in liquid nitrogen and stored at − 70°C. Before freezing, aliquots were taken to measure protein concentration by the Bradford method (BioRad, 5000205) according to the manufacturer's protocol.