Genomic evidence of adaptive evolution in the reptilian SOCS gene family

The suppressor of the cytokine signaling (SOCS) family of proteins play an essential role in inhibiting cytokine receptor signaling by regulating immune signal pathways. Although SOCS gene functions have been examined extensively, no comprehensive study has been performed on this gene family’s molecular evolution in reptiles. In this study, we identified eight canonical SOCS genes using recently-published reptilian genomes. We used phylogenetic analysis to determine that the SOCS genes had highly conserved evolutionary dynamics that we classified into two types. We identified positive SOCS4 selection signals in whole reptile lineages and SOCS2 selection signals in the crocodilian lineage. Selective pressure analyses using the branch model and Z-test revealed that these genes were under different negative selection pressures compared to reptile lineages. We also concluded that the nature of selection pressure varies across different reptile lineages on SOCS3, and the crocodilian lineage has experienced rapid evolution. Our results may provide a theoretical foundation for further analyses of reptilian SOCS genes’ functional and molecular mechanisms, as well as their roles in reptile growth and development.


INTRODUCTION
Cytokines are multifunctional proteins and essential intercellular regulators that are involved in innate and adaptive inflammatory organism defense, cell development, and repair processes via different signaling mechanisms (Oppenheim, 2001). Suppressors of cytokine signaling (SOCS) are some of the most crucial feedback inhibitors in the prevention of excessive cytokine signaling and maintenance of homeostasis and normal cellular functions (Hong-Jian et al., 2008). The SOCS proteins function as negative feedback inhibitors, controlling particular cytokine signals in order to regulate cellular responses and maintain a stable environment (Linossi, Calleja & Nicholson, 2018). During the phosphorylation of signal transducer and activator of transcription (STAT) proteins, SOCS proteins helps regulate various cytokines by combining the kinase inhibitory region (KIR) and members of the Janus kinase (JAK) family (Tannahill et al., 2005). The SOCS gene family was initially identified in mammals and was comprised of eight members, including SOCS1-7 and the cytokine-inducible SH2-containing protein (CISH) (Linossi & Nicholson, 2015). Additional studies found that all SOCS family members shared a conserved structure: an N-terminal domain, a highly conserved C-terminal motif (called the SOCS box), and a central SH2 domain (Hao & Sun, 2016; and structural characteristics of reptile SOCS genes has become more attainable. The SOCS sequences extracted from reptilian genomes and those from public resources provided us with an excellent opportunity to explore reptile evolutionary selection diversity. Reptiles, the only ectothermic amniotes, have wide ranges of habitat, modes of diet, behaviors, lifespans, and reproduction (Zimmerman, 2020). It has been demonstrated that reptile body temperature cannot be kept constant and will undergo seasonal shifts with environmental temperature, and infection is strongly related to body temperature (Zimmerman, Vogel & Bowden, 2010). A previous study found that the SOCS family might have adapted to natural environmental changes (Tian et al., 2020). In this study, we investigated the evolution of SOCS genes in reptiles and detected the evolutionary selection diversity in reptile lineage types. We aimed to test our hypothesis that these SOCS genes are under adaptive evolution across reptiles to determine if different reptilian clades experienced different selective regimes.

Species and sequences
In this study, we chose 23 reptilian genomes to extract eight SOCS genes downloaded from the National Center for Biotechnology Information (NCBI). The reptilian genome information is summarized in Table S1. We obtained and directly downloaded the previously sequenced SOCS genes for several reptilian species using an online Web BLAST search and the NCBI database. We set these sequences as queries to explore reptile genomes that have not been annotated. We created a local database for each reptilian genome and used the BLASTN and TBLASTN in BLAST v2.7.1 to search SOCS encoding sequences. We used an e-value of 10 −5 as the default cut-off to confirm significant matches against the genome. SOCS sequences confirmed by the BLAST searches were applied in reciprocal balstx searches of human proteomes to improve the accuracy of the ortholog matches. The SOCS gene sequences and accession numbers are shown in Table S2.

Phylogenetic analysis of reptile SOCS genes
The 260 reptilian SOCS gene sequences were aligned based on their amino acid translations using Multiple Sequence Comparison by Log-Expectation (MUSCLE v3.8.31) (Edgar, 2004). The phylogenetic relationship of the reptile SOCS genes was established using RaxML v8.2.12 with 1,000 bootstrap replications. We applied jModelTest (Posada, 2008) with Akaike Information Criterion (AIC) to test the most suitable nucleotide substitution model, and determined that the GTR + Γ model was the most appropriate for detecting the evolutionary relationship across the eight SOCS gene family members. Finally, we used the iTOL online software (http://itol.embl.de) to visualize and beautify the phylogenetic tree.

Evolutionary pressure analysis
Positive Darwinian selection pressures acting on genes are usually determined by calculating the nonsynonymous (dN)/synonymous (dS) substitution ratio (ω) between homologous protein-coding gene sequences. Very simply, ω (evolutionary rate) >1, <1, and =1 represent positive selection, negative selection, and neutral evolution, respectively. The estimated that the ω ratio between homologous protein-coding sequences is a powerful symbol of positive selection at the molecular level. Phylogenetic Analysis by Maximum likelihood (PAML) is a program package used for phylogeny-based analysis to estimate molecular evolution, and its program CODEML determines positive and purifying selection sites or branches (Yang, 2007). Based on the sequence alignments and phylogenetic trees downloaded from TimeTree (http://www.timetree.org/), we carried out the selective force imposed on reptilian SOCS genes using a codon-based codeml PAML 4.9 d program (Yang, 2007). Several analyses were implemented to test the hypothesis that SOCS genes experienced natural selection across reptile species.
To determine the signatures of natural selection on SOCS genes in extant reptiles, we used the site model in codeml to explain the different functions and structure constraints undergone by amino acid sites. The M7 (model = 0, NSsites = 7)/M8 (model = 0, NSsites = 8) pair of codon-based models, which allowed the ω to vary across sites but not across lineages, was included in the site model. This pair had twice the difference compared to the log-likelihood values, and we applied a Chi-squared distribution to estimate the significance. The posterior probabilities (PP) were calculated using empirical Bayes analysis of positive selection sites in the M8 model. Additionally, we used the fixed-effect likelihood (FEL), single likelihood ancestor counting (SLAC), and mixed-effects model of evolution (MEME) in HyPhy  to detect the positive selection sites in the Datamonkey web server. Sites with PP > 0.95 for the M8 model, and a P-value < 0.1 for the FEL, SLAC, and MEME models were considered to have undergone positive selection. HyPhy software packages provided better analysis power and additional advantages for our study compared to PAML (Bulmer & Crozier, 2006;Kosakovsky Pond & Frost, 2005).
We calculated the entire mean rate of dN and dS substitutions in the SOCS coding sequences using the Z-test of selection in MEGA 5.2, the Nei and Gojobori method with Jukes-Cantor correction, 95% site coverage cut-off, and 1,000 bootstrap replicates. We then utilized branch models in the codeml program to determine whether there were differences in the selective forces acting on SOCS genes in diverse reptilian lineages. The branch model permits variable ω ratios across lineages, but changeless ω ratios in the sites, and it could be applied to detect changes in selective pressures in specific branches (Yang & Nielsen, 2002). Several targeted branches were set as one foreground branch, and the others were assigned as background branches. For this, we processed a null one-ratio model (model = 0, NSsites = 0) that estimated the same ω for all branches against the two-ratio model (model = 2, NSsites = 0) that estimated a variable ω in a specific branch using a likelihood ratio test (LRT). A P-value < 0.05 was selected to reject the null one-ratio model and evaluate the significance of the alternative hypothesis. In order to further compare the evolutionary rates of the eight SOCS genes in response to divergent reptile clades, we used the Clade model C (CmC, model = 3, NSsites = 2), which allows codon sites to evolve discrepantly along with the clade (Baker et al., 2016). The intensity of CmC selection was permitted to differ across clades through the use of a different ω for each clade. SOCS gene sites that had undergone positive selection along each branch were further identified using the branch-site model. The branch-site model identified positive selection at specific sites along the specific lineages (Zhang, Nielsen & Yang, 2005) and on a few sites in a few branches. Finally, the likelihood ratio test was utilized to contrast a null model Ma0 (model = 2, NSsites = 2, fix-omega = 1, omega = 1) and a model Ma (model = 2, NSsites = 2, fix-omega = 0, omega = 1) of positive selection pressures on the foreground branch.

Recombination and motif composition analysis
Recombination analysis of eight SOCS gene encoding sequences was performed using GARD in the Datamonkey web server. The recombination of genes may mislead the phylogenetic estimation process and distort following inferences based on inferred phylogenesis, and there will be a high false-positive rate when the sequence being analyzed undergoes recombination (Kosakovsky Pond et al., 2006). The maximum χ 2 method was employed to estimate the likelihood of recombination events and to explore putative break-points within SOCS genes. Conserved motifs analysis was performed using the Multiple Expectation Maximization for Motif Elucidation (MEME) online program (http://meme-suite.org/) to obtain information about the similarity and motif distribution of SOCS genes. The parameters applied in the analysis were as follows: minimum motif width = 6 and maximum motif width = 200.

Phylogenetic analysis
We integrated the SOCS gene family sequences from different species and reptilian orders for phylogenetic analysis, and constructed an ML phylogenetic tree using RAxML with the full length of the SOCS gene sequences. Results showed a distinct and characteristic gene category classification pattern in which the orthologs clustered closer together compared to other closely-related members of the SOCS gene family (Fig. 1). SOCS gene family trees were rooted with the CISH gene, and their topologies were categorized into two groups, SOCS types I and II, indicating a high similarity across each classification. We also examined the phylogenetic relationship across SOCS genes and found that the evolutionary tree could be generally divided into four main clades: Serpentes, Sauria, Crocodilia, and Testudines (Fig. 2).

Identification of SOCS gene selection pressure
The sites' selection pressure in their codon alignments were estimated by comparing the M7 and M8 models using the codeml program. Three site-based selection measures on the Datamonkey webserver (FEL, SLAC, FUBAR) were used to detect these sites. Considering each method's randomness, only sites estimated by at least two methods were regarded as significant. FEL was effective at capturing rate variation and contained less false positive selection in small datasets . SLAC is a conservative method suitable for large data sets, but had flaws in its substitution rate estimation . FUBAR was faster at detecting positive selection and relaxed the restrictions of other models (Murrell et al., 2013). Positive signals in SOCS4 were detected by SLAC, FEL, and FUBAR methods in Datamonkey, indicating reliable adaptive evolution signals for SOCS4. However, positive signals were not strong for SOCS1, SOCS3, SOCS5, SOCS6, or SOCS7, and all the sites were only detected using one method (Table 1). We estimated that episodic selection or provisionally changed the bouts of selection in  reptilian lineages using MEME on the Datamonkey web server at 0.05 significance.
Reptilian phylogeny detection showed that SOCS4 possessed the maximum sites and underwent episodic selection, followed by SOCS5, SOCS1, CISH, SOCS2, and SOCS3. SOCS6 and SOCS7 showed no episodic selective sites. The Z-test indicated that for each SOCS gene, the mean dN was smaller than the mean dS with a P-value < 0.01, implying significant purifying selection (Table S3). To further explore the prevailing selective action of the different reptile lineages across the eight SOCS members, we used the SOCS gene subsets Serpentes, Sauria, Crocodylia, and Testudines to estimate positive selection with branch and branch-site models in PAML. The branch model was performed by testing the one-ratio model (M0) vs. two-ratio model (M2) hypothesis in order to confirm lineage-specific adaptive events. The ω value (evolutionary rates) for all branches under M0 was less than 1, confirming that all SOCS genes underwent purifying selection, which was identical to the Z-test result. The M2 was implemented to examine whether the diverse foreground reptilian order lineages underwent dissimilar selection pressures compared to the background lineages. LRT indicated that M2 was more suitable for several lineages in each SOCS compared to M0 (P < 0.05) ( Table 2). The ω values for the Serpentes lineages of CISH, SOCS4, and SOCS6; Sauria lineages of SOCS2, SOCS4, and SOCS7; Crocodylia lineage of SOCS3; and Testudines lineages of SOCS1, SOCS5, and SOCS7 were all significantly less than one, revealing that these SOCS genes had undergone forceful purifying selection. Furthermore, the Clade model C (CmC) was implemented to estimate if the Squamata, Crocodylia, and Testudines clades underwent different selection. The results showed that the SOCS genes (except SOCS3 (ω squamata = 0.05807; ω Crocodylia = 0.36627; ω Testudines = 0.07415, P = 0.0270)) were not significantly better when compared to the M2a_rel model (P < 0.05) ( Table 3). Because the branch model simply compares mean ω values for whole gene sequences rather than for one specific site, we used the branch-site model to estimate whether positive selection was acting on particular sites in different reptilian lineages. On the SOCS phylogenetic trees, we recognized several (foreground) branches with positive selection sites, but only the Crocodilian lineage in SOCS2 had significant levels (P < 0.01) ( Table 4).
GARD was used to estimate putative recombination events, and the results showed no sequence exchanges or putative recombination events between the studied SOCS genes. The recombination results are shown in Table S3. The conserved SOCS protein motifs in reptiles were analyzed using MEME online software suite to detect the similarities and diversity in motif composition. We identified the conserved region across eight SOCS genes, and the results showed that all SOCS genes shared two of the same motifs (Fig. S1).

DISCUSSION
An organism's innate immunity acts as the first line of defense against infection (Dalpke et al., 2008). Over recent years, a series of studies on SOCS-deficient mice proved the significance of SOCS-mediated regulation of immunological and other crucial cellular   (Banks et al., 2005;Lukasz et al., 2014;Metcalf et al., 2000;Naka et al., 1998).
Previous studies also suggested that SOCS proteins are essential physiological regulators of both adaptive and innate immunity (Akihiko, Tetsuji & Masato, 2007). Reptiles share a common ancestor with mammals and hold an important amniote phylogeny position (Deakin & Ezaz, 2019). SOCS genes are important for reptile immune systems by enabling their adaptation to life in different environments. Although functional studies have explored the SOCS family's crucial role in mammals, current knowledge of the SOCS gene repertoire in reptiles and its evolution is limited. The first reptile whole-genome sequence was the green anole lizard (Anolis carolensis) (Alföldi et al., 2011). The genomes of multiple species of turtles, snakes, lizards, and crocodiles have also been sequenced over the past decades, providing us with convenient conditions for analyzing the molecular evolution of reptiles using bioinformatics. In this study, we used genome-wide analysis to explore reptilian SOCS genes, as well as their phylogenetic relationship and adaptive evolution. To our knowledge, our study is the first comprehensive overview of the SOCS gene family within the reptilian genome. Previous studies found that the SCOS family is comprised of eight members in mammals, and a second classification, type II (CISH, SOCS1-SOCS3), was added through two rounds of whole-genome duplication from a single precursor (Wang et al., 2019). In our study, we identified a total of 260 SOCS gene family sequences based on reptile genomic analyses. We identified eight intact SOCS family members in reptiles, the same  number found in mammals, suggesting that the SOCS family expanded during the two rounds of whole-genome duplication (Dehal & Boore, 2005). This proved our hypothesis that reptile and mammal SOCS family members are conserved and also that reptiles and mammals shared similar common components in their immune systems (Zimmerman, 2020). We constructed an extensive phylogenetic tree from the SOCS gene coding sequences across the examined reptiles in order to analyze the evolutionary relationships. We found close relationships between SOCS4-SOCS7, which was consistent with previous studies (Linossi, Calleja & Nicholson, 2018). We examined the phylogenetic relationship across eight SOCS genes and found that the reptile SOCS genes could be classified into two groups: type I (SOCS4-SOCS7) and type II (CISH, SOCS1-SOCS3). When the SOCS7 gene tree was compared to the traditional species tree we found that genes' phylogenetic proximity coincided with the morphological taxonomy (Fig. 2). The SOCS gene family evolved following the phylogeny of reptiles, confirming the crucial role of the SOCS genes. We performed motif identification using MEME analysis and found that diverse reptilian lineages shared high conservation in a two-motif structure of SOCS gene family members, which was consistent with the previously conserved SOCS gene structure (Hao & Sun, 2016). Previous studies confirmed that these two domains are necessary for the routine functions of the SOCS family (Fujimoto & Naka, 2003). These similarities indicate that SOCS genes are conserved in reptiles and mammals. Therefore, we considered that the SOCS gene sequences were highly conserved due to their essential role in regulating cytokine and growth factor signaling, and that reptile differentiation may be a significant driving force in the evolution of the reptilian SOCS gene family. Positive selection is a crucial driving force in gene evolution in both function and structure, and the authentication of positive selection at the molecular level is important to the field of evolutionary biology (Vitti, Grossman & Sabeti, 2013). In this study, we focused on the selection test of SOCS genes on the whole reptilian phylogeny in order to estimate the adaptive evolution pressures acting on reptiles. Positive selection in SOCS4 was detected by at least two methods, indicating SOCS4's strong adaptive evolutionary signals. No significant positive signals were detected in the remaining SOCS genes. SOCS4 has been reported to participate in HIF-1a regulation and acts as an adaptive mechanism to hypoxia (Kamura et al., 2000). Moreover, SOCS4's convergent evolution among yak and Tibetan antelope was detected by a convergent signature and phylogenetic analysis, which might explain their high-altitude adaptations (Wang et al., 2015). Reptiles are ectotherms distributed across various environments such as marine, fresh-water, mountains, and flatlands (Zimmerman, Vogel & Bowden, 2010). Our results support the possibility of rapid SOCS4 evolutionary rates of reptiles when adapting to diverse environments.
Given reptiles' wide distribution, we used branch model analyses to further explore whether positive selection acted on specific reptile lineages. The ω values for all of the target reptile branches were less than one, indicating that the primary force shaping SOCS gene evolution was purifying selection. Our study estimated the SOCS genes' evolutionary tendencies under diverse selective pressures with the evolution of reptiles. SOCS genes in whole reptilian lineages maintained the protein structure in purifying selection (Mukherjee et al., 2009), and their crucial function acts as intracellular negative physiological regulators on cytokine and growth factor signaling (Croker, Kiu & Nicholson, 2008;Kershaw et al., 2013;Shuai & Liu, 2003). Although strong purifying selection pressures on SOCS genes have been detected in reptile lineages, the branch-site model analysis showed that two sites in the Crocodilian lineage on SOCS2 were under positive selection pressure. This demonstrated that there were discrepant selection pressures at different sites. SOCS2 is involved in cell growth and several inflammatory disorders (Letellier & Haan, 2016), and SOCS2 in Eriocheir sinensis has been shown to be associated with immune defense responses (Dalla Valle et al., 2009). SOCS2 has been found to undergo natural selection in Egyptian chickens when compared to Sri Lankan chickens, which might be due to the adaptation of Egyptian chickens to the arid, hot, and dry habitat (Walugembe et al., 2019). Moreover, SOCS2 has also been detected in selection signals of different high-altitude sheep, which might support Tibetan sheep when adapting to extreme environments of high altitude and ultraviolet radiation (Wei et al., 2016). These studies suggest that SOCS2 plays an essential role in the adaptation to a specific environment. Crocodilians are large, semiaquatic reptiles with strong tails and thecodont dentition used for hunting (Vickaryous & Gilbert, 2019). We speculate that the positive selection sites in Crocodilia might be linked with their semiaquatic habitat, which contains more environmental pathogens compared to that of their terrestrial relatives. However, the branch-site model analysis did not find any significant positive selection evidence for the remaining seven SOCS genes, which might be due strong purifying selection's masking effect (Zhi-Yi et al., 2018). Differences in SOCS genes across reptiles may also reflect differences in selection pressure. The CmC model results suggested that SOCS3's selection pressures were significantly different (P < 0.05) within divergent clades of reptiles, with different evolutionary rates identified in different lineages. This proves our hypothesis that SOCS family members are subject to different selection pressures in different reptilian clades. SOCS3 plays an essential role in modulating the outcomes of infections and autoimmune diseases by binding to both JAK kinase and cytokine receptors. Previous studies found that both the cetacean and reptilian TLRs evolved in response to environmental adaptations and rapid diversification (Shang et al., 2018;Shen et al., 2012). It has been shown that SOCS3 can be induced in innate immune cells, and SOCS proteins can act as direct inhibitors of TLR signaling, suggesting that they play an essential role in innate immunity (Baetz et al., 2004). Different reptile orders' immune systems differ in terms of their evolutionary history, pathogen exposure, and other potential factors. They also vary greatly in terms of habitat, size, and life history (Pincheira-Donoso et al., 2013;Zimmerman, 2020). Thus, SOCS3's diverse selection pressures in different reptilian clades may suggest a relationship between reptilian adaptation and a diverse living environment. Notably, we found that the evolutionary rate of SOCS3 in Crocodylia was extremely greater than that of Squamata and Testudines, suggesting that the SOCS3 in Crocodylia underwent rapid evolution. A recent study found that SOCS3 plays an important role in the regulation of glucose homeostasis during high-intensity exercise, which is necessary to maintain performance (Pedroso, Ramos-Lobo & Donato, 2019). Crocodilians are remarkably stealthy predators that stalk and ambush their prey (Erickson et al., 2012). We guessed that the rapid evolution of SOCS3 in crocodiles might be related to their unique predation mode, which requires greater energy at the moment of predation. However, all the clades showed similar ω values using the CmC model in the remaining seven SOCS genes, indicating identical selective pressures on these SOCS genes across different orders of reptiles. Despite the different lineages of reptiles acting on SOCS3, a strong purifying selection signature was detected across the eight SOCS members, which corresponded to the high sequence conservation in these genes. However, our speculations are based on evidence of selective pressure acting on sequence differences so additional functional studies are still needed to confirm our hypothesis.

CONCLUSION
This study is the first comprehensive analysis of SOCS gene family evolution in reptiles. A total of 260 SOCS sequences were identified in reptiles, and this detailed phylogenetic analysis offers a great basis for further functional studies. Eight intact SOCS family members were identified in reptiles, suggesting that the SOCS family has expanded during two rounds of whole-genome duplication. Our results suggest that SOCS genes are under purifying selection in reptiles, indicating that the SOCS gene family has stabilization and significant functional constraints. However, we identified evidence of positive selection in SOCS4 across reptiles, suggesting their adaptation to different types of habitats. Meanwhile, we determined that SOCS2 and SOCS3 had undergone rapid evolution in Crocodylia, which might be related to their environment and predation behavior. In summary, through multiple analysis and comparisons, we provided novel insights into the SOCS family's molecular evolution in reptiles.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
This work was supported by the National Natural Science Foundation of China (Nos. 31872242 and 31672313) and the Special Fund for Forest Scientific Research in the Public Welfare (No. 201404420). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.