In silico structural homology modeling and characterization of multiple N-terminal domains of selected bacterial Tcps

Several bacterial pathogens produce Toll/interleukin-1 receptor (TIR) domain-containing protein homologs that are important for subverting the Toll-like receptor (TLR) signaling cascades in hosts, thereby promoting the persistence and survival of the bacterial pathogens. However, the exact molecular mechanisms underlying the functions of these bacterial proteins are not clear. Physicochemical and homology modeling characterization studies have been conducted to predict the conditions suitable for the stability and purification of these proteins and to predict their structural properties. The outcomes of these studies have provided important preliminary data for drug discovery pipeline projects. Here, using in silico physicochemical and homology modeling tools, we report the primary, secondary and tertiary structural characteristics of multiple N-terminal domains of selected bacterial TIR domain-containing proteins (Tcps). The results show variations between the primary amino acid sequences, secondary structural components and three-dimensional models of the proteins, suggesting that different molecular mechanisms are involved in how these proteins subvert the host immune system. This study could form the basis of future experimental studies advancing our understanding of the molecular basis of the inhibition of the host immune response by the bacterial Tcps.


INTRODUCTION
In mammalian hosts, the innate immune system works as the first line of defense against microbial invasion. An immune response is induced upon recognition of conserved bacterial pathogen-associated molecular patterns (PAMPs) by the host's pattern recognition receptors (PRRs). TLRs are predominant PRRs that recognize various bacterial PAMPs. Upon PAMP recognition, TLRs oligomerize, initiating an intracellular immune signaling cascade. TLR oligomerization brings both the ligand-binding and the cytoplasmic domains into close proximity, which is followed by the recruitment of various cytoplasmic adaptor proteins. The cytoplasmic protein interactions are known to be mediated by the conserved cytoplasmic TIR domains present in the TLRs and cytoplasmic adaptor proteins. These interactions activate downstream transcription factors, including nuclear factor-κB (NF-κB), that upregulate the expression of multiple inflammatory mediators (Kawai & Akira, 2010; Ve, Williams & Kobe, 2015).
Microbial pathogens have been shown to counter host innate immune defense pathways through molecular mimicry and evasion of the immune response (Elde & Malik, 2009). Several bacterial proteins with immune evasive properties have been detected in a range of Gram-negative and Gram-positive bacteria. These proteins are TIR domain-containing proteins (Tcps) that are structurally similar to several mammalian host Tcps and are crucial for bacterial subversion of the TLR signaling cascades. The TIR domains have been reported to be involved in protein-protein interactions with TLRs and/or the cytoplasmic adaptor proteins (Rana et al., 2013; Ve et al., 2012). In a previous study, more than 200 TIR homologues were identified in a wide range of bacterial species, including Brucella species, Escherichia coli and Salmonella enterica serovar Enteritidis (Newman et al., 2006). A subsequent study identified a Tcp in the non-pathogenic Paracoccus denitrificans (Low et al., 2007). However, the role of Tcps in non-pathogenic bacteria remains poorly understood. More recently, bacterial Tcps have been detected in Yersinia pestis, Staphylococcus aureus, Helicobacter pylori, Yersinia pseudotuberculosis, Enterococcus faecalis and Pseudomonas aeruginosa (Askarian et al., 2014; Imbert et al., 2017; Kaplan-Türköz, 2017; Kraemer et al., 2014; Nörenberg et al., 2013; Patterson et al., 2014; Rana et al., 2011). Bacterial Tcps are approximately 230-310 amino acids long and contain TIR domains whose primary sequences vary between 150 and 200 amino acids. In bacterial Tcps, the TIR domain can be located in either the N-terminal or the C-terminal region, while the remaining regions are highly variable (Patterson & Werling, 2013). Understanding the exact molecular mechanism of Tcp-dependent bacterial evasion strategies for subverting the host immune system is important as the number of reported bacterial Tcps is rising.
The molecular functions and structural characteristics of various TIR domains from mammals, bacteria and plants have been the focus of several studies (Ve, Williams & Kobe, 2015). The available data suggest that microbial Tcps may function as dimers. However, the molecular mechanism of Tcp dimerization is not clear (Alaidarous et al., 2014; Kaplan-Türköz et al., 2013; Ve et al., 2012; Ve, Williams & Kobe, 2015). Several studies suggested that domains other than the TIR domain (including the N-terminal domains), from either microbial or host Tcps, are involved in Tcp dimerization, protein-protein interactions and/or binding to phosphoinositides in the plasma membrane (Alaidarous et al., 2014; Askarian et al., 2014; Kaplan-Türköz et al., 2013; Ve, Williams & Kobe, 2015; Xiong et al., 2019). Therefore, it is important not to focus on microbial TIR domains as the sole players in microbial subversion of host immunity. Investigating the molecular mechanism of the full-length proteins and their N-terminal domains (NTDs) will provide a clearer understanding of the mechanisms involved. However, studies have shown that the solubility and stability of full-length Tcps decrease in solution (Alaidarous et al., 2014; Patterson et al., 2014; Salcedo et al., 2013). In this study, we use in silico approaches to determine

Secondary structure prediction
Secondary structure components were predicted using two web servers, SOPMA (Geourjon & Deléage, 1995) and GOR IV (Garnier, Gibrat & Robson, 1996), with default parameters. The percentages of secondary structure components were predicted from the relative frequencies of each amino acid in the helices, sheets and turns present in the X-ray crystallographic templates of the proteins.
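As an illustration of how the reported percentages can be derived, the following minimal sketch summarizes a per-residue prediction string into percentages of each state. It assumes the server output has already been reduced to one letter per residue (here H for helix, E for strand, C for coil); the function name and the example string are hypothetical, and the actual SOPMA and GOR IV output formats are not reproduced.

```python
from collections import Counter


def secondary_structure_percentages(states: str) -> dict[str, float]:
    """Return the percentage of each one-letter state in a prediction string."""
    states = states.strip().upper()
    if not states:
        raise ValueError("empty prediction string")
    counts = Counter(states)
    # Percentage of residues assigned to each state, rounded to one decimal
    return {state: round(100.0 * n / len(states), 1) for state, n in counts.items()}


# Hypothetical 20-residue prediction (not taken from the study)
example_states = "CCHHHHHHHHHHCCEEEECC"
print(secondary_structure_percentages(example_states))
```

The same summary can be applied to the output of each server separately, which makes differences between the two predictors easy to tabulate.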

Multiple sequence alignment
Multiple sequence alignment for the NTDs of the selected bacterial Tcps was performed using the Clustal Omega server (Madeira et al., 2019) and viewed using the Jalview server (Waterhouse et al., 2009).
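Once an alignment is available, pairwise identity between two aligned NTD sequences can be estimated as sketched below. This is only an illustration under stated assumptions: the two rows are taken from the same alignment (equal length, gaps as "-"), identity is computed over columns where neither sequence has a gap (other definitions use a different denominator), and the example sequences are invented.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned fragments (not taken from the study)
print(percent_identity("MKT-LLA", "MRTALL-"))
```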

Construction and evaluation of protein models
The 3D models of the NTDs of the selected bacterial Tcps were constructed using three homology modeling servers: Phyre2 (Kelley et al., 2015), SWISS-MODEL (Waterhouse et al., 2018) and I-TASSER (Yang & Zhang, 2015). The available default and/or automated options were used. After optimization, the 3D models were verified using the RAMPAGE (DePristo, De Bakker & Blundell, 2004; Lovell et al., 2003) and ProSA-web (Sippl, 1993; Wiederstein & Sippl, 2007) servers. RAMPAGE validates 3D models by generating a Ramachandran plot. Generally, the best models have a high percentage of residues in the most favored and additional allowed regions and a low percentage of residues in the disallowed or outlier regions of the Ramachandran plot. The ProSA-web server validates the quality of protein models against available protein structures from the PDB using a z-scoring system. Models were visualized using PyMOL (The PyMOL molecular graphics system, Version 2.0 Schrödinger, LLC).
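A Ramachandran plot is built from the backbone torsion angles phi and psi of each residue. The sketch below shows, under stated assumptions, how a single torsion angle can be computed from four atom positions; it is not the RAMPAGE implementation, the function names are hypothetical, and the coordinates are synthetic. For residue i, phi uses the atoms C(i-1), N(i), CA(i), C(i) and psi uses N(i), CA(i), C(i), N(i+1).

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral_degrees(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle (degrees, range -180 to 180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))
    # Components of the outer bonds perpendicular to the central bond
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Synthetic coordinates: the fourth point is rotated a quarter turn
# about the central bond relative to the first
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Each residue's (phi, psi) pair is then compared against reference regions to classify it as favored, allowed or outlier; the region boundaries themselves come from the validation server and are not reproduced here.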

Prediction and characterization of primary protein sequences of NTDs of the selected bacterial Tcps
The amino acid sequences of the NTDs of the selected bacterial Tcps were retrieved from UniProt (http://www.uniprot.org). The unique UniProt IDs, amino acid sequence boundaries and bacterial species of the NTDs of the selected proteins are provided in Table 1. We used the ExPASy-ProtParam tool (Gasteiger et al., 2005) to analyze the proteins' primary structures and compute different parameters for their physicochemical properties (Table 2 and Table 3). All 20 amino acids were detected, of which the percentages of alanine, isoleucine, leucine, lysine and serine were the highest, while those of tryptophan and cysteine were the lowest (Table 2). In this study, the molecular weight (Mwt) of the NTDs varied from 9.30 kDa (Yersinia pseudotuberculosis NTD) to 24.57 kDa (Staphylococcus aureus NTD) (Table 3). The ExPASy-ProtParam tool computes the extinction coefficient (EC) at a wavelength of 280 nm.

D (Asp)  7 (4.1)  9 (5.7)  5 (4.2)  11 (7.1)  6 (3.6)  9 (6.6)  6 (4.2)  8 (4.0)  6 (7.2)  0 (0)  0 (0)
Q (Gln)  7 (4.1)  14 (8.8)  12 (10.2)  7 (4.5)  11 (6.7)  5 (3.7)  11 (7.7)  10 (5.0)  4 (4.8)  3 (3.3)  0 (0)
E (Glu)  10 (5.9)  11 (6.9)  11 (9.3)  12 (7.7)  13 (7.9)  14 (10.3)  14 (9.9)  18 (8.9)  4 (4.8)

[...] Table 2). Knowing the EC value of a protein might help in optimizing the purification procedure for the protein of interest (Gasteiger et al., 2005). The instability index (II) values of the NTDs of the selected proteins are shown in Table 3. The II provides an estimate of the stability of the protein of interest in a test tube. A protein with an II below 40 is considered stable, whereas a protein with an II above 40 is considered unstable (Gasteiger et al., 2005). Therefore, the results (Table 3) show that the NTDs from PdTLP, BtpB and HP1437 are predicted to be stable in a test tube. The isoelectric point (pI) is the pH of the solution at which the net charge of the surface amino acids of a protein equals zero (Gasteiger et al., 2005). The computed pI values of the NTDs are shown in Table 3. The pI of the NTDs varies from acidic, as for N-YpTdp (pI = 4.74), to alkaline, as for N-PdTLP (pI = 10.15) and N-TcpYI (pI = 10.15). The computed pI value is useful for selecting a suitable buffering system for the purification of the protein of interest, which is important for ensuring the stability of the purified protein (Gasteiger et al., 2005). In addition, Table 3 shows the total number of negatively charged residues (-R (Asp + Glu)) and the total number of positively charged residues (+R (Arg + Lys)). All NTDs contained fewer negatively charged residues than positively charged residues except for N-TcpC, N-YpTdp and N-BtpB. Negatively charged amino acid residues (i.e., Asp and Glu) are polar and hydrophilic in nature, and they are accessible to the surrounding environment as parts of proteins.
When a protein has fewer negatively charged residues than positively charged residues, this may indicate that the protein is involved in intercellular protein-protein interactions (Bhagavan & Ha, 2011). The aliphatic index (AI) of a protein is defined as the relative volume occupied by aliphatic side chains (Ala, Val, Ile and Leu), which contribute to protein thermostability (Gasteiger et al., 2005). The AIs for the NTDs of the selected proteins are shown in Table 3. A high AI indicates that a protein is predicted to be thermally stable and hydrophobic in nature (i.e., it contains a large number of hydrophobic amino acids, including Met, Ala, Leu, Gly, Pro, Phe, Ile, Val and Trp) (Gurskaia, 1968). Table 3 shows that, in this study, the majority of NTDs had a high AI; the exceptions were N-BtpA, N-TcpYI and N-TcpYIII, whose lower AI indicates low thermal stability.
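The charged-residue totals and the aliphatic index described above can be reproduced from a sequence with a few lines of code. The sketch below is an illustration, not the ExPASy-ProtParam source: the function names and the example sequence are hypothetical, and the aliphatic index uses the commonly cited formula AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue.

```python
def charged_residue_counts(sequence: str) -> tuple[int, int]:
    """Return (negatively charged, positively charged) residue counts.

    Negative = Asp (D) + Glu (E); positive = Arg (R) + Lys (K).
    """
    seq = sequence.upper()
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    return negative, positive


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    # Weights 2.9 and 3.9 reflect the relative side-chain volumes
    # of Val and of Ile/Leu compared with Ala
    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical 10-residue peptide (not taken from the study)
peptide = "AVILDEKRGG"
print(charged_residue_counts(peptide))
print(aliphatic_index(peptide))
```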
The grand average of hydropathy (GRAVY) for a peptide or protein is the sum of the hydropathy values of all amino acids divided by the number of amino acids in the sequence (Gasteiger et al., 2005). A more negative GRAVY value reflects the hydrophilic nature of the protein and the possibility of better interaction between the protein and water (Gasteiger et al., 2005). In this study, several NTDs, including N-TlpA, N-BtpA, N-TirS and N-SaTlp1, were found to be more hydrophilic than the other NTDs (Table 3). The ExPASy-ProtParam tool was also used to predict the half-life of the proteins, that is, the time it takes for half of the amount of protein in a cell to degrade after the protein has been synthesized (Bojkowska et al., 2011). The tool predicts the half-life of proteins in three organisms: human, Escherichia coli and yeast. However, it can be used to estimate the half-life in similar organisms (Gasteiger et al., 2005). In our study, the predicted half-life of all NTDs was the same (30 h), suggesting that the NTDs have a long half-life. According to Moran et al. (2013), a typical bacterial protein has a half-life of 20 h. Therefore, the half-life of the NTDs in this study needs further investigation in vivo and/or in vitro (Bojkowska et al., 2011; Reder, Michalik & Gerth, 2018).
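The GRAVY definition above translates directly into code. The following minimal sketch assumes the Kyte-Doolittle hydropathy scale, which is the scale commonly used for GRAVY; the function name and the example peptides are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Sum of hydropathy values divided by the number of residues."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    # A KeyError here signals a non-standard residue code in the input
    return sum(KYTE_DOOLITTLE[residue] for residue in seq) / len(seq)


# Hypothetical peptides: one rich in charged residues, one rich in aliphatic residues
print(gravy("DEKR"))   # negative value: hydrophilic
print(gravy("AVIL"))   # positive value: hydrophobic
```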

Prediction and characterization of the secondary structures of NTDs of the selected bacterial Tcps
The secondary structures of the NTDs of the selected bacterial Tcps were predicted using the SOPMA (Geourjon & Deléage, 1995) and GOR IV (Kouza et al., 2017) servers. Both tools showed that the percentages of the secondary structure components, including alpha helices, beta sheets and random coils, vary between the NTDs (Table 4). This could be explained by the low sequence similarity and identity between the NTDs of the selected bacterial Tcps (Fig. 1). The predicted secondary structure components show that N-TlpA, N-BtpA, N-PdTLP, N-TirS and N-SaTlp1 contain high percentages of alpha helices. Previous studies suggested that the coiled-coil structures of the NTDs from BtpA and TlpA facilitate Tcp dimerization and/or colocalization to the plasma membrane (Alaidarous et al., 2014; Xiong et al., 2019). This allows the bacterial Tcps to mimic the function of adaptor proteins by binding to TLRs, resulting in the suppression of the host proinflammatory response (Radhakrishnan et al., 2009; Rana et al., 2013; Ve et al., 2012). In addition, we suggest that the NTDs of the bacterial Tcps N-TcpC, N-BtpB, N-YpTdp, N-HP1437, N-TcpYI and N-TcpYIII are likely to be involved in the interaction with the adaptor proteins, as their secondary structure predictions showed high percentages of coiled turns. The protein-protein interaction between bacterial Tcps and host adaptor proteins such as the myeloid differentiation factor 88 (MyD88) and the MyD88 adaptor-like (MAL) proteins is another suggested molecular function of bacterial Tcps (Ve, Williams & Kobe, 2015). However, no evidence of a direct interaction between these proteins has been reported yet. The variations in the secondary structure components between the NTDs of the selected bacterial Tcps suggest that these proteins might have different strategies for suppressing the TLR signaling pathways. However, the exact molecular mechanism of bacterial Tcp-dependent immune suppression is still unclear.

Three-dimensional structural modeling and validation of the NTDs of the selected bacterial Tcps
Although the structures of the TIR domains of several bacterial Tcps have been determined, the structures of the full-length proteins or of other domains in the Tcps have not yet been determined (Alaidarous et al., 2014; Chan et al., 2009; Kaplan-Türköz et al., 2013). Here, we have constructed models of the NTDs of several bacterial Tcps. The 3D models of the NTDs were constructed using three homology modeling servers: Phyre2 (Kelley et al., 2015), SWISS-MODEL (Waterhouse et al., 2018) and I-TASSER (Roy, Kucukural & Zhang, 2010). The three modeling servers produced similar models for each individual NTD (Table 5). For example, the servers predicted coiled-coil structures with a high alpha-helix content for N-TlpA, N-BtpA, N-PdTLP, N-TirS and N-SaTlp1 (Table 5). Interestingly, this is consistent with the secondary structure prediction for these proteins, which are predicted to have an alpha-helix content of more than 50% (Table 4). In addition, the secondary structure predictions for N-TcpC, N-BtpB, N-YpTdp, N-HP1437, N-TcpYI and N-TcpYIII (Table 4) are consistent with the models generated for these proteins, which contain fewer alpha helices and more of the other secondary structural components (Table 5).
As part of the virulence mechanisms, studies have suggested that dimerization of bacterial Tcps is required for binding to signaling and/or adaptor proteins or to the phosphoinositides in the plasma membrane (Alaidarous et al., 2014; Fekonja, Benčina & Jerala, 2012; Radhakrishnan et al., 2009; Rana et al., 2013; Ve, Williams & Kobe, 2015). Therefore, based on the structural prediction in this study, we suggest that the NTDs with predicted coiled-coil structures are likely to be involved in Tcp dimerization, facilitating the binding to signaling and/or adaptor proteins or to the phosphoinositides in the plasma membrane. In addition, NTDs predicted to have a low alpha-helix content (including N-TcpC, N-BtpB, N-YpTdp, N-HP1437, N-TcpYI and N-TcpYIII) are likely to be involved only in protein-protein interactions. Future biochemical and experimental structural studies will bring insights into the exact molecular mechanism behind bacterial Tcp-dependent signaling inhibition (Elde & Malik, 2009; Patterson et al., 2014). RAMPAGE was used in this study to validate the constructed 3D models of the proteins. RAMPAGE uses the Ramachandran plot, which presents the backbone conformation of proteins based on the position of non-Gly residues in the disallowed regions and the phi/psi distribution in the model. The RAMPAGE score is an estimate of the absolute quality of a model, determined by comparing the model to similar-sized, experimentally solved reference structures present in the Protein Data Bank (PDB) (Lovell et al., 2003). The models of the NTDs were analyzed by RAMPAGE, which revealed that the majority of the residues fell in the favored and allowed regions of the Ramachandran plot (Table 6). This indicated the good quality of the models constructed using the three modeling servers.
N-BtpB, N-TcpYI, and N-TcpYIII showed a high percentage of outliers for all models except the one generated using the SWISS-MODEL server. The modeled structures of the NTDs were also validated using the ProSA-web server (Wiederstein & Sippl, 2007). ProSA-web gives a score, called the z-score, that indicates the "degree of nativeness" of the modeled protein structures. The value of the z-score indicates the quality of the modeled protein structure: a large negative z-score indicates a native-like fold, while scores closer to positive values indicate problematic or erroneous models (Wiederstein & Sippl, 2007). In this study, the z-scores for all three models of N-TlpA, N-BtpA, N-BtpB, N-PdTLP, and N-HP1437 were found to be highly negative (Table 7). This indicates that the models are of reasonable quality and exhibit a high degree of native fold. For the other NTDs, the z-scores of the three models differ (Table 7), and further experimental studies are required to determine the consistency of these models. In addition, the servers available for constructing protein models use available protein sequence data and known protein structures to generate protein structural models (Kelley et al., 2015; Roy, Kucukural & Zhang, 2010; Waterhouse et al., 2018). This suggests that more structural studies on bacterial Tcps are needed in order to improve the computational structural modeling tools, with the aim of producing quality protein models that highlight the mechanism behind the molecular function of bacterial Tcps.
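The idea behind a z-score can be illustrated with a generic calculation: the score of a model is expressed as the number of standard deviations by which it departs from the mean of a reference distribution. The sketch below is only a conceptual illustration with invented energy values; it does not reproduce the knowledge-based potentials or the reference set used by ProSA-web, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def z_score(model_energy: float, reference_energies: list[float]) -> float:
    """Deviation of a model's energy from a reference distribution, in standard deviations."""
    mu = mean(reference_energies)
    sigma = pstdev(reference_energies)
    if sigma == 0:
        raise ValueError("reference energies have zero spread")
    return (model_energy - mu) / sigma


# Invented energies (arbitrary units): the model is well below the reference mean,
# so its z-score is clearly negative
reference = [-2.0, 2.0, -2.0, 2.0]
print(z_score(-6.0, reference))
```

Under this convention, a more negative value means the model's energy is lower than that of typical reference conformations, which is why highly negative z-scores are read as a sign of a native-like fold.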

CONCLUSIONS
Understanding the molecular mechanism of the bacterial Tcp-dependent evasion strategy employed to suppress the host immune response using experimental approaches is challenging. Most studies have reported low protein solubility when the bacterial Tcp is expressed at full length or as protein domains (Patterson et al., 2014; Rana et al., 2013; Salcedo et al., 2013; Ve et al., 2012). In silico homology modeling studies provide an opportunity to establish a pipeline for structural modeling and analysis of any protein as part of understanding the molecular mechanism of the protein function and determining therapeutic targets (Ke et al., 2016; Sliwoski et al., 2014; Ve et al., 2012). In this study, the NTDs of selected bacterial Tcps were subjected to physicochemical characterization and homology structural modeling using in silico approaches. The study presents the physicochemical characteristics of the selected NTDs that are vital for protein stability during purification. In addition, the study provides the characteristics of the secondary structures and 3D models (of acceptable quality) of the NTDs. This study could form the basis of future biochemical and experimental structural studies, which may aid in elucidating the functional molecular mechanism of the pathogenic bacterial Tcps.