The evolutionary characteristics and structural biology of Gallus toll‐like receptor 21

Abstract Toll‐like receptors (TLRs) are an important part of the innate immune system, acting as a first line of defense against many invading pathogens. The ligand known to bind Gallus toll‐like receptor 21 (gTLR21) is the unmethylated cytosine phosphate guanine dideoxy nucleotide motif; however, the evolutionary characteristics and structural biology of gTLR21 remain poorly characterized. Our results suggest that gTLR21 is phylogenetically and evolutionarily related to the TLR11 family and is perhaps a close ortholog of Mus TLR13. Homology modeling of the gTLR21 ectodomain suggests that it has no Z‐loop like that seen in Mus TLR9. The cytosolic toll‐IL‐1 receptor region of gTLR21 contains a central 4‐stranded parallel β‐sheet (βA‐βD) surrounded by 5 α‐helices (αA‐αE) on both sides, a highly conserved structure also seen in other TLRs. Molecular docking analysis suggests that the gTLR21 ectodomain has the potential to distinguish between different ligands. Homodimer analysis also suggests that Phe842 and Pro844 of the BB loop and Cys876 of the αC helix in gTLR21 are conserved in the cytosolic toll‐IL‐1 receptor domains of other TLRs and may contribute to the docking of homodimers. Our study of the evolutionary characteristics and structural biology of gTLR21 suggests that the molecule may have a broader role to play in the innate immune system; however, further experimental validation is required to confirm our findings.

patterns of TLRs 10-13 appear to be species specific. 4 TLRs 1-9 are conserved in humans and mice, and their immunological functions have received much attention and in-depth study. On the basis of their localization, these TLRs are largely divided into 2 subfamilies: cell-surface TLRs and intracellular TLRs. Cell-surface TLRs include TLR1, TLR2, TLR4, TLR5, TLR6, and TLR10, which mainly recognize microbial membrane components such as lipids, lipoproteins, and proteins. TLR2 can heterodimerize with TLR1 or TLR6 and recognize lipoproteins and peptidoglycans from Gram-positive bacteria. TLR4 and TLR5 recognize the lipopolysaccharides of Gram-negative bacteria and bacterial flagellin, respectively.
TLR10 is a pseudogene in mice, but human TLR10 can collaborate with TLR2 to recognize ligands from Listeria and is involved in sensing influenza A viral infections. 5 Intracellular TLRs (including TLR3, TLR7, TLR8, TLR9, TLR11, TLR12, and TLR13) are localized in the endosomal compartments and recognize nucleic acids originating from bacteria and viruses, as well as self-nucleic acids in disease conditions such as autoimmunity. 1 TLR3 has been found to recognize viral double-stranded RNA. 6 Murine TLR7 and human TLR8 predominantly function in detecting GU-rich single-stranded RNA (ssRNA) from viruses. 7 TLR9 and TLR13 recognize unmethylated cytosine phosphate guanine dideoxy nucleotide (CpG-DNA) motifs and bacterial 23S ribosomal RNA, respectively. 8,9 Interestingly, TLR11 responds to flagellin, much like TLR5. 10 In a recent study, TLR12 was found to be highly similar to TLR11 and to function in recognizing profilin from Toxoplasma gondii. 11 Several types of TLRs have been identified in other vertebrates, including fish and birds. Our current knowledge of Gallus TLRs (gTLRs) has been greatly advanced by the assembly of the genome sequence of the chicken (Gallus gallus). Previous studies have demonstrated the presence of 10 TLRs in Gallus. The gTLRs 3, 4, 5, and 7 are close orthologs of the corresponding TLRs found in other vertebrates and have similar immunological functions. 4 In Gallus, the mammalian TLRs 1, 6, and 10 are replaced by TLR1La and TLR1Lb from an evolutionary point of view. 12 The duplicated genes (TLRs 2a and 2b) of Gallus are both orthologs of the single TLR2 found in mammals. Gallus TLR21 is an ortholog of the TLR21 in fish and amphibians. 13 It appears that TLR15 is unique to birds and some reptilian species. TLR15 responds to Salmonella enterica infections and is reported to have a unique auto-activation mechanism. 14
The avian TLR21 is an ortholog of the TLR21 proteins from teleost and amphibian species, and clusters into the TLR11 subfamily. Surprisingly, gTLR21 has been shown to localize to the endoplasmic reticulum in cells transfected with the gTLR21 gene and can recognize unmethylated CpG-DNA; therefore, it functions much like TLR9 in mice. 15-17 To date, no 3D structure of gTLRs has been obtained by X-ray crystallography with a high level of confidence. Furthermore, no experimental data on the detailed molecular structure of gTLR21 complexes are as yet available. Computational methods may reveal information that is not easy to obtain by experimental means and can therefore facilitate biological research. In this study, we use computational methods to investigate the genetic evolution of gTLR21 and to predict the different binding patterns that the gTLR21 protein may have with potential ligands.
The purpose of our study is to improve our understanding of gTLR21 protein biophysics; the techniques we describe here may be used as tools for identifying potential binding locations and for exploring novel competitive inhibitors, biosensors, and network components, as well as in vaccine development.

| Phylogenetic analysis
In this study, sequences of all full-length TLR proteins (not including the protein whose structure is being predicted) derived from known vertebrate species belonging to mammals, reptiles, avians, and teleosts were downloaded from the UniProt database (http://www.uniprot.org/). First, the TLR protein sequences were aligned with MAFFT (L-INS-i). 18 A phylogenetic tree was then constructed using the neighbor-joining method with the JTT + I + G substitution model selected by ProtTest 3.4, 19 and bootstrap sampling was performed 1000 times. The display, annotation, and management of phylogenetic trees were performed in iTOL v3. 20
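The joining step of the neighbor-joining method named above can be sketched in a few lines of Python. This is only an illustration: the function name `neighbor_joining`, the taxon labels, and the toy distance matrix are assumptions made for the example, and the distances are not JTT + I + G distances computed from the TLR alignment. Bootstrap resampling is not included.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Build an unrooted tree by neighbor joining.

    names  : list of taxon labels
    matrix : symmetric list-of-lists of pairwise distances
    Returns a list of joins (node_a, node_b, length_a, length_b, new_node).
    """
    # Store distances in a dict keyed by unordered pairs of node labels.
    d = {frozenset((a, b)): matrix[i][j]
         for (i, a), (j, b) in combinations(enumerate(names), 2)}
    nodes = list(names)
    joins = []

    def dist(a, b):
        return 0.0 if a == b else d[frozenset((a, b))]

    while len(nodes) > 2:
        n = len(nodes)
        total = {a: sum(dist(a, b) for b in nodes) for a in nodes}
        # Q-criterion: join the pair minimising (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * dist(*p) - total[p[0]] - total[p[1]])
        len_a = 0.5 * dist(a, b) + (total[a] - total[b]) / (2 * (n - 2))
        len_b = dist(a, b) - len_a
        new = f"({a},{b})"
        # Distances from the new internal node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((new, c))] = 0.5 * (dist(a, c) + dist(b, c) - dist(a, b))
        nodes = [c for c in nodes if c not in (a, b)] + [new]
        joins.append((a, b, len_a, len_b, new))

    # The last two nodes are connected by a single edge carrying the
    # remaining distance; it is recorded entirely on the first node.
    a, b = nodes
    joins.append((a, b, dist(a, b), 0.0, f"({a},{b})"))
    return joins


# Toy example (hypothetical taxa and distances, for illustration only).
example_names = ["a", "b", "c", "d", "e"]
example_matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
example_joins = neighbor_joining(example_names, example_matrix)
print(example_joins[0])  # first join: a and b with branch lengths 2.0 and 3.0
```

In practice the distance matrix would come from the aligned sequences under the chosen substitution model, and the whole procedure would be repeated on resampled alignment columns to obtain bootstrap support.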

| Analysis of residue conservation and secondary structural elements
The alignment results obtained in the first step were submitted to the ConSurf algorithm for the evaluation of evolutionarily conserved amino acid residue positions. 21 The conservation scale ranged from grade 1 to grade 9, representing different degrees of conservation at the residue positions in gTLR21. The secondary structures of TLR proteins were identified using LRRfinder. 22

2.3 | Template searching, homology modeling, and interfacing analysis for the LRR region of gTLR21 with differential potential ligands

Because of the low sequence identity between the target and template proteins (<40%), we chose to use the best available templates for multiple-template homology modeling of the LRR region of gTLR21. For this, human TLR3 (PDB ID: 1ziw), monkey TLR7 (PDB ID: 5gmh), human TLR8 (PDB ID: 3wn4), horse TLR9 (PDB ID: 3wpc), and mouse TLR13 (PDB ID:

We used known sequences of full-length TLR proteins from vertebrates to construct phylogenetic relationships based on the neighbor-joining method (Figure 1A). In the phylogenetic tree obtained, all TLR protein sequences were divided into 6 families. TLR21 was found to cluster within the TLR11 family and to be an ortholog of Mus TLR13. This phylogenetic analysis was consistent with earlier findings. 32 In addition to building the phylogenetic tree, we analyzed the conserved amino acid residues in the complete TLR21 protein sequences of reptilian, avian, and teleost species (Figure 1B). gTLR21 shares high amino acid identity with Anser TLR21; it is therefore possible that avian TLR21s are highly orthologous to one another and closely related to reptilian TLR21s. Furthermore, TLR21 is not found in humans or other mammals. Although TLR21 appears to have been lost in most vertebrates, it is retained in a minority of vertebrates, such as some reptiles, birds, and several fish.
These data indicate that the evolution of TLR21 follows the phylogeny of species and is probably subject to species-specific constraints. 13,33 The TLR11 family includes 2 TLR subfamilies, TLRs 11-13 and TLRs 20-22; these members of the TLR11 family are probably derived from the TLR1 lineage. 33,34 The results of our phylogenetic relationship analysis indicate that the TLR11 family can be split into 3 subfamilies (including TLRs 11, 13, and 22). Consistent with previous studies, the clade with the TLR subfamilies 4, 11, and 15 subsequently clusters into the TLR1 family, along with the TLR2 subfamilies. 34,35 The results of the phylogenetic analysis of TLR21 indicate that gTLR21 is highly orthologous to other avian TLR21s; TLR21 proteins are not unique to birds but are also widespread in reptilian and teleost species. 13 The analysis of conserved amino acid residues using complete sequences of TLR21 proteins from different species indicates that the sequences share high identity with each other. This result provides further support for the phylogenetic relationships between the avian, reptilian, and teleost TLR21 proteins.

FIGURE 1 A, This phylogenetic analysis shows that TLRs can be divided into 6 subfamilies in vertebrates and that gTLR21 is assigned to family 11. Each subfamily has its own color. B, gTLR21 is highly orthologous to Anser cygnoides TLR21, and the TLR21 subfamily is conserved in different species
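The idea of grading alignment positions on a 1 to 9 conservation scale can be illustrated with a deliberately simplified stand-in. ConSurf itself estimates position-specific evolutionary rates using the phylogeny; the sketch below instead uses per-column Shannon entropy, which is a different and cruder measure. The function name `column_grades` and the four toy sequences are hypothetical and are not taken from the TLR21 alignment.

```python
import math
from collections import Counter


def column_grades(alignment, n_grades=9):
    """Grade each alignment column from 1 (variable) to n_grades (conserved).

    Uses normalised Shannon entropy per column; gaps ('-') are ignored.
    This is an entropy-based approximation, not the ConSurf rate model.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    max_entropy = math.log2(20)  # upper bound for 20 amino acid types
    grades = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            grades.append(1)  # all-gap column: treat as least informative
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        conservation = 1.0 - entropy / max_entropy  # 1.0 = fully conserved
        grades.append(1 + round(conservation * (n_grades - 1)))
    return grades


# Toy alignment (hypothetical sequences, for illustration only).
toy_alignment = ["ACDE", "ACDF", "ACEG", "ACFH"]
print(column_grades(toy_alignment))
```

With only four sequences, even a column in which every residue differs does not reach grade 1, because the entropy is normalised against all 20 amino acid types; a real analysis would use many more sequences and a rate-based model.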

| Structural analysis of the ECD and intracellular domains of gTLR21
The structure of the gTLR21 ECD was modeled and optimized with Modeller 9.18. The stereochemical quality scores calculated by SAVES for the candidate structures show that the derived structures of the gTLR21 ECD are reasonable (Tables S1A, S2A, and S3A). In addition, the score of the target model is −6.02 in ProSA-web, which indicates that its distribution of residue energies is also acceptable (Figure S4A).

"L" represents Leu, Ile, Val, or Phe; "N" represents Asn, Thr, Ser, or Cys; and "x" represents any amino acid. 32 The LRRhs of gTLR21 are similar to conserved LRR subtypes and can form β-sheets packing the concave surface of the gTLR21 ECD; the remaining "irregular" LRR motifs bear variable similarities to different subtypes and form a convex surface structure (Figure 3A). gTLR21 appears to form a noncanonical horseshoe-shaped structure in which the N- and C-terminal ends of the protein interact extensively with each other; this is similar to the "closure" of the oval-shaped structure formed by Mus TLR13. 9 When comparing the LRR modules of gTLR21, Mus TLR13, and Mus TLR9, we observed that the 14th LRR motif in gTLR21 lacks the long insertion known as a "Z-loop" (Figure 3A). The Z-loop is necessary for TLR9 dimerization and is involved in recognizing ligands for TLRs 7 and 8. 36 These results indicate that gTLR21 may have its own unique patterns for the recognition of ligands.
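The L/N/x notation described above lends itself to a simple pattern scan. The sketch below assumes the commonly cited 11-residue LRR consensus LxxLxLxxNxL, which is not spelled out in the surviving text, and reads "N" as Asn, Thr, Ser, or Cys following the usual one-letter code; both points should be treated as assumptions. The function name `find_lrr_segments` and the test sequence are hypothetical, and this is not a substitute for LRRfinder.

```python
import re

# Assumed consensus LxxLxLxxNxL, with
#   L = Leu, Ile, Val, or Phe  -> [LIVF]
#   N = Asn, Thr, Ser, or Cys  -> [NTSC]
#   x = any amino acid         -> .
# The lookahead lets overlapping candidate segments be reported.
LRR_CONSENSUS = re.compile(r"(?=([LIVF]..[LIVF].[LIVF]..[NTSC].[LIVF]))")


def find_lrr_segments(sequence):
    """Return (1-based start position, matched 11-mer) for each consensus hit."""
    return [(m.start() + 1, m.group(1))
            for m in LRR_CONSENSUS.finditer(sequence.upper())]


# Synthetic sequence (not gTLR21), for illustration only.
print(find_lrr_segments("MKLRELDLSGNALTT"))
```

A strict regular expression like this will miss the "irregular" LRR motifs mentioned above, which is one reason dedicated tools score candidate segments against position-specific matrices instead.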
In this study, we have shown that TLR21 is more likely to form an oval-shaped structure like TLR13, rather than a canonical horseshoe-shaped structure (Figure 3A). We could not detect the pres-

3.3 | Interaction analysis for potential ligand-ECD complexes formed by the gTLR21 protein

| Potential pockets analysis for LRR domains of gTLR21
The Meta Pocket 2.0 server predicted the 5 most likely binding sites in the LRR domains of gTLR21 (Figure 4). The prediction shows that binding site A is a large hole in the center of the LRR domains (Figure 4A). A similarly positioned pocket has previously been well studied in many TLRs. In the TLR1/TLR2 complex, this binding pocket is not only involved in ligand (lipopeptide) recognition but is also necessary for dimerization of TLRs 1 and 2. 40 Protein-protein interactions are also observed at a similar site in TLRs 7 and 8, where they are required for homodimerization. 7 These results suggest that binding site A may be involved in the biological functions of the gTLR21 ECD. Binding site E, which can form a relatively large surface pocket (Figure 4B), is also found in similar regions of the Mus TLR9 ECD and may play a role in recognizing the unmethylated CpG-DNA motif. 8 A similar large hole in the inner concave face of the gTLR21 ECD also constitutes binding site C, which was previously identified in Mus TLR13 (Figure 4C). Binding site C has been shown to be used for specific recognition of ssRNA. 9 The remaining prediction results indicate the presence of small slit structures that may participate in binding other small potential ligands.

FIGURE 4 Mapping of predicted binding sites on the surface of the Gallus TLR21 (gTLR21) ectodomain (ECD). The five most likely binding sites predicted by Meta Pocket 2.0 are highlighted using red dots on the surface of the LRR model of gTLR21. The surfaces of the modeled structure of gTLR21 ECD are colored green, and the potential binding atoms and residues are indicated by yellow and magenta hatches, respectively. The potential binding clusters are indicated by cyan spheres

| Structural biology of the predicted CpG-DNA interface within the ECD of gTLR21 complex
The gTLR21 protein can recognize unmethylated CpG-DNA as a "danger" ligand to alert the innate and adaptive immune systems; gTLR21 can activate downstream pathways affecting the immune system much like mammalian TLR9. 15,16 However, the mechanisms by which this specific recognition occurs have not yet been elaborated. Our results suggest that CpG-DNA (Figure 5A) acts as a "molecular bridge" that penetrates through the concave face of the gTLR21 LRR domains (Figure 5B).
Our modeling results indicate that the interface comprising the LRRNT and the LRR 1-3, 5, and 11 motifs can enhance the affinity of gTLR21 binding to CpG-DNA (Figure 5C). Recent studies have also shown that the binding region spans from the LRRNT to the LRR10 motif of TLR9. 8 We also tried to identify the amino acid residues potentially involved in the interactions at the binding surface. The results show that the base C1 in the CpG-DNA motif forms direct hydrogen bonds with Arg678, Glu680, and Pro704 (Figure 5D). We have also found that Ser630 in LRR23 interfaces with the base A2 via a hydrogen bond (Figure 5D). In the interaction domain, the bases G4, A5, and C6 potentially form salt bridges with Lys702, Arg701, and Arg55, respectively (Figure 5E). Three amino acid residues are found to interface with the G7 base via multiple intermolecular forces: Asn79 and Ser81, located in LRR1, and Asp103, located in LRR2. Thr101 of LRR2 and Asp127 of LRR3 also interact with the T8 base. The backbone phosphates of T9 are recognized by Arg173 of LRR5 (Figure 5F). In addition, the side chains of Arg177 and Arg326 form an interface with the bases C10-T12 of the 3′ arm of the CpG-DNA (Figure 5G). The structures of the agonistic CpG-DNA bound to gTLR21 described in this study suggest the structural basis of CpG-DNA recognition by gTLR21.
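Residue-level contacts of the kind listed above are typically read off a docked model by applying a distance cutoff between receptor and ligand atoms. The sketch below shows that bookkeeping only. The source does not state which criteria were used, so the 3.5 Å cutoff is an assumption (a commonly used heavy-atom distance for hydrogen bonds), the function name `find_contacts` is hypothetical, and the coordinates are invented placeholders rather than values from the gTLR21 model.

```python
import math


def find_contacts(receptor_atoms, ligand_atoms, cutoff=3.5):
    """Return sorted (residue, base) pairs with an atom pair within `cutoff` Å.

    receptor_atoms : iterable of (residue_label, (x, y, z))
    ligand_atoms   : iterable of (base_label, (x, y, z))
    The default cutoff is an assumed value, not one taken from the study.
    """
    pairs = set()
    for residue, r_xyz in receptor_atoms:
        for base, l_xyz in ligand_atoms:
            if math.dist(r_xyz, l_xyz) <= cutoff:
                pairs.add((residue, base))
    return sorted(pairs)


# Invented coordinates, for illustration only; the labels reuse names
# from the text but the geometry is not from the docked complex.
toy_receptor = [
    ("Arg678", (0.0, 0.0, 0.0)),
    ("Glu680", (3.0, 0.0, 0.0)),
    ("Arg173", (20.0, 0.0, 0.0)),
]
toy_ligand = [
    ("C1", (1.5, 2.0, 0.0)),
    ("T9", (30.0, 0.0, 0.0)),
]
print(find_contacts(toy_receptor, toy_ligand))
```

A distance cutoff alone does not distinguish hydrogen bonds from salt bridges or π-cation interactions; classifying the interaction type additionally requires atom types and geometry.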

| Structural characteristics of the interaction of the potential ligand, ssRNA, with the LRR modules of gTLR21
We have also investigated the potential binding mechanism of gTLR21 with ssRNA. The ssRNA molecule (Figure 6A) likely fits along the inner concave surface of gTLR21, with its 5′ and 3′ arms binding to the C- and N-terminal ends of the gTLR21 ECD, respectively (Figure 6B). This is consistent with how ssRNA molecules bind TLR13, but opposite to the binding orientation in the TLR3-double-stranded RNA complex. 6 Structural features suggest that ssRNA also forms a stem-loop-like structure that is highly similar to that observed in the TLR13-ssRNA complex (Figure 6C); the stem-loop-like structure is also essential for TLR13 recognition of ssRNA. 9,41 Our structural analysis also indicates that the predicted interaction surface is surrounded by the LRRNT, LRR1-2, LRR17-22, and LRR24-26 motifs of the gTLR21 ECD, which fix the ssRNA in position.

FIGURE 6 The sequence-specific recognition mechanism of single-stranded RNA (ssRNA) motif by Gallus TLR21 (gTLR21). A, The sequence and structure of ssRNA. B, C, The ssRNA could form a stem-loop-like structure so that its 5′ and 3′ arms mainly fit along the inner concave surface formed by the C- and N-terminal ends of the gTLR21 ECD, respectively. D, The bases A2054 and G2056-2058 of the ssRNA can bind to the interaction surface surrounded by residues located in LRR17-22 via potential π-cation interaction, hydrogen bonds, and salt bridges. E, The backbone phosphates of base A2059 are recognized by Lys49 and form hydrogen bonds with Asp47 and Arg55, whereas the base A2060 interfaces with Arg55, Glu653, and Arg701 via salt bridges. F, The base G2061 in the ssRNA motif interfaces with Glu653-Asn654 of LRR24, whereas Arg678 of LRR25 and Arg701 and Lys702 of LRR26 contribute to a combined interface with the base A2062 via multiple intermolecular forces. G, Tyr82 of LRR1 and Tyr106 of LRR2 are also involved in maintaining the structure of the gTLR21-ssRNA complex via hydrogen bonds. The side chains of gTLR21 are colored blue (in the ball-and-stick model), and the bases of the ssRNA motif are green. H-bonds, π-cation interactions, and salt bridges are indicated by blue, yellow, and wheat dashed lines, respectively
The amino acid residues that could potentially participate in the interactions at the binding surface are discussed here. Our results indicate that there may be a π-cation interaction between the base A2054 and Arg474 of LRR17. Hydrogen bonds are likely provided by the side chains of Asp496, Arg547, Gln576, and Ser603. In addition, the bases G2056 and G2057 may form salt bridges with Asp522 and Arg547, respectively, in the combined domain (Figure 6D). Further along the ssRNA motif, A2059 is recognized by Asp47, Arg55, and Lys49 via multiple intermolecular interactions. The backbone phosphates of the base A2060 are recognized by Arg55, Glu653, and Arg701 (Figure 6E). Glu653 and Asn654 of LRR24, Arg678 of LRR25, and Arg701 and Lys702 of LRR26 also contribute to the maintenance of this structure via different intermolecular forces (Figure 6F). The docking results indicate that Tyr82 of LRR1 and Tyr106 of LRR2 also contribute to binding the bases of the 3′ arm of the ssRNA motif via hydrogen bonds (Figure 6G). Recently, a series of studies has demonstrated that Mus TLR13 is a receptor for the vesicular stomatitis virus 42 and can detect sequence-specific areas of 23S ribosomal RNA from bacteria. 41 The results of our study suggest that gTLR21 may have the potential to specifically recognize ssRNA like Mus TLR13 9 and that this could contribute to the development of a strong immune mechanism in Gallus.
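The residues named in the two docking descriptions (CpG-DNA and ssRNA) can be compared directly as sets. The sketch below transcribes the residue labels from the text; the variable and function names are hypothetical, and the overlap is only a bookkeeping observation about predicted contacts, not evidence of a shared binding mechanism.

```python
import re

# Residues named in the text as contacting CpG-DNA in the docked model.
CPG_DNA_CONTACTS = {
    "Arg678", "Glu680", "Pro704", "Ser630", "Lys702", "Arg701", "Arg55",
    "Asn79", "Ser81", "Asp103", "Thr101", "Asp127", "Arg173", "Arg177",
    "Arg326",
}

# Residues named in the text as contacting ssRNA in the docked model.
SSRNA_CONTACTS = {
    "Arg474", "Asp496", "Arg547", "Gln576", "Ser603", "Asp522", "Asp47",
    "Arg55", "Lys49", "Glu653", "Asn654", "Arg701", "Arg678", "Lys702",
    "Tyr82", "Tyr106",
}


def residue_number(label):
    """Extract the sequence position from a label such as 'Arg678'."""
    return int(re.search(r"\d+", label).group())


shared = sorted(CPG_DNA_CONTACTS & SSRNA_CONTACTS, key=residue_number)
cpg_only = sorted(CPG_DNA_CONTACTS - SSRNA_CONTACTS, key=residue_number)
ssrna_only = sorted(SSRNA_CONTACTS - CPG_DNA_CONTACTS, key=residue_number)

print("shared:", shared)
print("CpG-DNA only:", cpg_only)
print("ssRNA only:", ssrna_only)
```

On these lists, the shared residues are Arg55, Arg678, Arg701, and Lys702, all of which lie near the terminal LRR modules described in the text.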
The TLR ECD can form a horseshoe-shaped structure with a concave surface that participates in the recognition of various pathogens.

| Homodimer analysis for the TIR domains of gTLR21
The docking results indicate that TIR monomers of gTLR21 are able to form homodimer complexes with one another (Figure 7A). The three-dimensional structures of the gTLR21 TIR domains display a crystallographic asymmetric dimer arrangement. Interaction analyses suggest that 12 amino acid residues (Phe842, Pro844, Gly845, Ser847, Ile848, Ile849, Arg868, Arg872, Cys876, Glu907, Ser909, and Tyr911) can contribute to the formation of the dimeric interface (Figure 7B). The results of the alignment analysis (Figure 7C) show that Phe842, Pro844, and Gly845, which are located in the BB loop, are highly conserved in other TLRs and that residues in the DD loop (Glu907, Ser909, and Tyr911) have been reported to play an important role in the dimerization interface. 40 We also find that Ile848, located in the αB helix, and Cys876, located in the αC helix, are highly conserved; both residues are likely to participate in forming TIR homodimers similar to those formed by TLR10. 39

This study represents an attempt to apply computational methods such as protein-protein docking analysis to explore the interactions of TIR domains and to investigate the mechanisms of signaling induced by gTLR21. The homodimer analysis results suggest that the gTLR21 TIR region has the potential to form homodimers much like those formed by TLR6 and TLR10. 38,39 We have also found 3 amino acid residues (Phe842, Pro844, and Cys876) at the homodimer interface that are highly conserved in other TLRs; these residues may play an essential role in the signal transduction triggered by TLRs and interleukin-1 (IL-1). 38,39,47 TLR9 was unable to elicit an immune response to unmethylated CpG-DNA in MyD88 knockout mice, 48 probably because the induction of type I interferons, particularly interferon-α, by TLR9 depends on the MyD88-IRF7 pathway in pDCs. 49 TLR13 also appears to induce a MyD88-dependent signaling pathway to trigger the activation of NF-κB, and TLR13 is also dependent on IRF7 for activating type I interferon pathways. 42 Results of recent studies indicate that gTLR21 ectopically expressed in HEK-293 cells could regulate NF-κB and, furthermore, could mediate the expression of cytokines in HD11 cells upon exogenous CpG-DNA stimulation. 15,16 Taken together, these data indicate that the TIR domain participates in protein-protein interactions with intracellular adaptor proteins for signaling processes.

| CONCLUSIONS
By performing a phylogenetic and evolutionary analysis of TLR21 proteins from the majority of animals, we report that TLR21 is phylogenetically related to the TLR11 family and is perhaps a close ortholog of Mus

SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.