Low compositions of human toll-like receptor 7/8-stimulating RNA motifs in the MERS-CoV, SARS-CoV and SARS-CoV-2 genomes imply a substantial ability to evade human innate immunity
PeerJ, DOI 10.7717/peerj.11008

Background. The innate immune system, especially Toll-like receptor (TLR) 7/8 and the interferon pathway, constitutes an important first line of defense against single-stranded RNA viruses. However, large-scale, systematic comparisons of the TLR 7/8-stimulating potential of genomic RNAs of single-stranded RNA viruses are rare. In this study, a computational method to evaluate the human TLR 7/8-stimulating ability of single-stranded RNA virus genomes based on their human TLR 7/8-stimulating trimer compositions was used to analyze 1,002 human coronavirus genomes. Results. The human TLR 7/8-stimulating potential of coronavirus genomic (positive strand) RNAs followed the order NL63-CoV > HKU1-CoV > 229E-CoV ≅ OC43-CoV > SARS-CoV-2 > MERS-CoV > SARS-CoV. These results suggest that, among these coronaviruses, MERS-CoV, SARS-CoV and SARS-CoV-2 may have a higher ability to evade the human TLR 7/8-mediated innate immune response. Analysis with a logistic regression equation derived from human coronavirus data revealed that most of the 1,762 coronavirus genomic (positive strand) RNAs isolated from bats, camels, cats, civets, dogs and birds exhibited weak human TLR 7/8-stimulating potential equivalent to that of the MERS-CoV, SARS-CoV and SARS-CoV-2 genomic RNAs. Conclusions. Prediction of the human TLR 7/8-stimulating potential of viral genomic RNAs may be useful for surveillance of emerging coronaviruses from nonhuman mammalian hosts.


INTRODUCTION
The novel coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has developed into a global pandemic (Zheng, 2020). Understanding the virology of SARS-CoV-2 and the development of […] (Forsbach et al., 2008). Moreover, certain GU- or AU-rich RNA sequences have been described to induce human TLR7- and TLR8-mediated immune responses (Forsbach et al., 2011; Krüger et al., 2015; Zhang et al., 2018). Kosuge et al. (2020) found that the mutations occurring in SARS-CoV-2 variants are biased, with a preference for cytosine (C) to uracil (U) mutations. The degree of the increase in U nucleotides in SARS-CoV-2 variants correlates with enhanced production of cytokines, such as TNF-α and IL-6, in cell lines. Overall, these results indicate that genome sequence variations in RNA viruses (such as coronaviruses) may induce different degrees of human TLR7- and TLR8-mediated immune responses and, as a consequence, result in different degrees of disease severity. Therefore, genome sequence diversity may endow single-stranded RNA viruses with different abilities to evade the host TLR 7/8-mediated innate immune responses.
Yang & Chen (2012) developed a computational method to evaluate the human TLR 7/8-stimulating ability of single-stranded RNA virus genomes based on their human TLR 7/8-stimulating triribonucleotide compositions. In this study, the method was applied to analyze the RNA genomes of coronaviruses infecting humans. A logistic regression model was proposed for predicting which coronaviruses from nonhuman animals have low human TLR 7/8-stimulating activity and, as a consequence, a higher potential to evade the human TLR 7/8-mediated innate immune responses.
Ninety-five oligoribonucleotides (ORNs) and 39 ribonucleotide tetramers with experimentally validated human TLR 7/8-stimulating activity were identified from 17 research reports (Table S3). The sequences of TLR 7 proteins vary between organisms. For example, the sequence identity of TLR 7 proteins from humans and mice is 81%. TLRs from different organisms might therefore differ in their preferences for ligand nucleotide compositions. Since the experiments validating the TLR 7/8-stimulating activity of these ORN sequences were conducted using human cells, the TLR-stimulating triribonucleotide composition and TLR-stimulating scores described in this study should be considered specific for human TLR 7/8.

Weighted triribonucleotide compositions of single-stranded RNA virus genomes
A method to evaluate the human TLR 7/8-stimulating ability of single-stranded RNA virus genomes based on their human TLR 7/8-stimulating triribonucleotide compositions was developed by Yang & Chen (2012). The $4^3 = 64$ possible trimers are labeled $X_1, X_2, \ldots, X_{64}$. Each trimer frequency $f_{X_i}$ is defined as

$$f_{X_i} = \frac{c_{X_i}}{l}$$

where $c_{X_i}$ is the number of occurrences of trimer $X_i$ and $l$ is the total number of trimers. The trimer weights $w_{X_i}$ were computed from these frequencies, with $w^{+}_{X_i}$ and $w^{-}_{X_i}$ denoting the weights of overrepresented and underrepresented human TLR 7/8-stimulating trimers, respectively. If the relative frequency of a trimer in the human TLR 7/8-stimulating ORN sequences is greater than 1/64 (the expected value for a random distribution), the trimer is considered to be human TLR 7/8 stimulatory. Otherwise, the trimer is considered to be human TLR 7/8 nonstimulatory. Each trimer is assigned a weight based on the logarithm of its relative frequency in the human TLR 7/8-stimulating ORN sequences (Fig. 1).
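The frequency and weighting steps can be illustrated with a minimal Python sketch. It assumes that trimers are counted with an overlapping sliding window and that the weight takes the form log2(f / (1/64)); the text states only that the weight is based on the logarithm of the relative frequency, so the exact form and the log base are assumptions. The function names and the two example sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# All 4**3 = 64 possible RNA trimers.
TRIMERS = ["".join(p) for p in product("ACGU", repeat=3)]


def trimer_frequencies(sequences):
    """Return f = c / l for each of the 64 trimers, pooled over all sequences.

    Assumption: trimers are counted with an overlapping sliding window (step 1).
    """
    counts = Counter()
    for seq in sequences:
        rna = seq.upper().replace("T", "U")
        counts.update(rna[i:i + 3] for i in range(len(rna) - 2))
    # Keep only unambiguous trimers (windows containing e.g. N are ignored).
    valid = {t: counts[t] for t in TRIMERS}
    total = sum(valid.values())
    if total == 0:
        raise ValueError("no unambiguous trimers found")
    return {t: c / total for t, c in valid.items()}


def trimer_weights(freqs, expected=1 / 64):
    """Assumed weight: log2(f / expected); trimers with f == 0 are left out."""
    return {t: math.log2(f / expected) for t, f in freqs.items() if f > 0}


# Hypothetical ORN sequences, for illustration only.
example_orns = ["GUUGUGU", "UUGUUG"]
freqs = trimer_frequencies(example_orns)
weights = trimer_weights(freqs)
# Trimers whose frequency exceeds 1/64 get a positive weight (stimulatory).
stimulatory = sorted(t for t, w in weights.items() if w > 0)
```

Under this assumed weight, a positive value corresponds to a trimer that is overrepresented relative to 1/64 and a negative value to an underrepresented one.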
For any individual RNA virus genome, the positive and negative weighted trimer compositions were calculated and are referred to as Score S and Score N, respectively; these scores are collectively referred to as the human TLR 7/8-stimulating scores. Score S for stimulating trimers was calculated as

$$\mathrm{Score\ S} = \sum_{i=1}^{64} w^{+}_{X_i} \frac{c_{X_i}}{l}$$

and Score N for nonstimulating trimers was calculated as

$$\mathrm{Score\ N} = \sum_{i=1}^{64} w^{-}_{X_i} \frac{c_{X_i}}{l}$$

where $c_{X_i}$ is the number of occurrences of trimer $X_i$ in the viral genomic RNA ($i = 1, \ldots, 64$) and $l$ is the total number of trimers in the viral genomic RNA. Higher Score S and lower Score N values indicate greater numbers of human TLR 7/8-stimulating triribonucleotides in the viral RNA genome and imply that stronger human TLR 7/8-mediated innate immunity may be induced by this viral RNA. Conversely, lower Score S and higher Score N values indicate greater numbers of human TLR 7/8 nonstimulating triribonucleotides in the viral RNA genome and, as a consequence, a higher potential for evasion of the human TLR 7/8-mediated innate immune responses (Fig. 1).
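As a rough illustration of the two scores, the sketch below treats each score as a weighted sum of trimer frequencies in a genome. The weights are signed toy values, and Score N is reported as a magnitude so that a higher value means more nonstimulating trimers; both the toy weights and this sign convention are assumptions made for the example, not values from the study.

```python
from collections import Counter


def trimer_composition(genome):
    """Return c / l for every trimer observed in the genome (overlapping windows)."""
    rna = genome.upper().replace("T", "U")
    counts = Counter(rna[i:i + 3] for i in range(len(rna) - 2))
    total = sum(counts.values())  # l: total number of trimers in the genome
    return {t: c / total for t, c in counts.items()}


def tlr_scores(genome, weights):
    """Return (Score S, Score N) as weighted trimer compositions.

    Assumption: `weights` holds signed values; positive weights mark
    stimulating trimers and negative weights mark nonstimulating trimers.
    """
    comp = trimer_composition(genome)
    score_s = sum(w * comp.get(t, 0.0) for t, w in weights.items() if w > 0)
    score_n = sum(-w * comp.get(t, 0.0) for t, w in weights.items() if w < 0)
    return score_s, score_n


# Hypothetical weights and a hypothetical 11-nt sequence (9 trimers).
toy_weights = {"GUU": 2.0, "UGU": 1.5, "CCC": -1.0, "ACG": -0.5}
score_s, score_n = tlr_scores("GUUGUCCCACG", toy_weights)
```

In this toy case each of the nine trimers occurs once, so Score S is (2.0 + 1.5)/9 and Score N is (1.0 + 0.5)/9.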

Data analysis
Data manipulation was performed with Perl scripts written by the author. The heatmap, the strip chart of Logit P values and the scatter plots of Score S and Score N values were plotted using the ggplot2 package of R (the R environment for statistical computing). Logistic regression was conducted with the glm function of R. Linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) were performed with the lda and qda functions, respectively, in the MASS package of R. Naive Bayes and support vector machine (svmLinear, svmPoly and svmRadial) classifiers in the caret package of R were used.

Tenfold cross-validation
The cross-validation test is well known and has been used in many computation-based studies (Le, Ho & Ou, 2017). The weighted triribonucleotide compositions of coronavirus genomes were randomly split into 10 subsets. Ten pairs of training sets (90% of the data, for model building) and test sets (10% of the data, for model evaluation) were assembled from the 10 subsets and used to perform 10-fold cross-validation. For each test run, a training set was used to train a model, and the model was then tested using the corresponding test set. All methods (logistic regression, LDA, QDA, naive Bayes, svmLinear, svmPoly and svmRadial) were evaluated by 10-fold cross-validation. Sensitivity was computed as true positive/(true positive + false negative). Specificity was computed as true negative/(true negative + false positive). Accuracy was computed as (true positive + true negative)/total number of samples.
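The cross-validation scheme and the three performance measures can be sketched in a few lines of Python. This is an illustrative stand-in, not the Perl and R code used in the study; the function names, the random seed and the example labels are hypothetical.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal subsets
    for i, test in enumerate(folds):
        train = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield train, test


def classification_metrics(y_true, y_pred, positive=1):
    """Return sensitivity, specificity and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        # A ratio is undefined (NaN) when its denominator is zero.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical example: 25 samples split into 10 folds, plus toy labels.
splits = list(k_fold_splits(25))
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1],
                                 [1, 1, 0, 0, 0, 1, 0, 1])
```

Each sample appears in exactly one test set, so every genome is used once for evaluation and nine times for training.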

Analysis of coronaviruses from nonhuman animals
Seven models derived from the seven methods (logistic regression, LDA, QDA, naive Bayes, svmLinear, svmPoly and svmRadial) were used to analyze data of coronaviruses from nonhuman animals. All seven methods were used for binary classification, distinguishing human coronaviruses that cause common colds from those that cause severe acute respiratory syndromes. Using the results of logistic regression as the standard, the overall agreement between the results of the logistic regression model and those of each of the other six models was computed by the following formula: (true positive + true negative)/total number of samples.
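When one model's predictions serve as the standard, the overall agreement reduces to the fraction of samples on which the two models assign the same class. A minimal Python sketch, with hypothetical prediction vectors, is:

```python
def overall_agreement(reference, other):
    """Fraction of samples on which `other` matches the reference predictions.

    With the reference taken as the standard, matches on the positive class
    are the true positives and matches on the negative class the true negatives.
    """
    if len(reference) != len(other):
        raise ValueError("prediction lists must have the same length")
    if not reference:
        raise ValueError("prediction lists must not be empty")
    matches = sum(1 for r, o in zip(reference, other) if r == o)
    return matches / len(reference)


# Hypothetical class predictions (1 = highly stimulating, 0 = poorly stimulating).
logistic_pred = [1, 0, 0, 1, 0, 0, 1, 0]
other_preds = {
    "lda": [1, 0, 0, 1, 0, 0, 1, 0],
    "svm_radial": [1, 1, 0, 1, 0, 1, 1, 0],
}
agreement = {name: overall_agreement(logistic_pred, pred)
             for name, pred in other_preds.items()}
```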

Compositions of triribonucleotides in genomes of coronaviruses infecting humans
The similarity of triribonucleotide compositions was not consistent with the similarity of genome sequences (phylogenetic analysis in Fig. 2A). The overall triribonucleotide compositions of genomic (plus) strand and complementary (minus) strand RNAs of coronaviruses infecting humans are shown in Fig. 2B. These results indicate that triribonucleotide compositions provide novel information that cannot be revealed by phylogenetic analysis.
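To compare the two strands, the complementary (minus) strand has to be derived from the genomic (plus) strand before trimers are counted. The sketch below assumes the minus strand is the reverse complement of the plus strand, read 5' to 3'; the function name and the short example sequence are hypothetical.

```python
# Translation table for RNA base pairing: A<->U and C<->G.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def minus_strand(plus_strand):
    """Return the reverse complement of an RNA sequence (T is read as U)."""
    rna = plus_strand.upper().replace("T", "U")
    return rna.translate(_COMPLEMENT)[::-1]


def trimers(seq):
    """List the overlapping trimers of a sequence, in order."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


plus = "AUGGC"  # hypothetical fragment
minus = minus_strand(plus)
plus_trimers = trimers(plus)
minus_trimers = trimers(minus)
```

Because the two strands contain different trimers, their weighted compositions can differ even though one sequence fully determines the other.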

Human TLR 7/8-stimulating potential of human coronavirus genomes
To predict the human TLR 7/8-stimulating potential of coronaviruses, a logistic regression model was constructed in which S and N are the Score S and Score N values for the positive strand of a viral genome sequence, and Sr and Nr are the Score S and Score N values for the negative strand. 229E-CoV, OC43-CoV, NL63-CoV and HKU1-CoV were used as the highly stimulating group, and MERS-CoV, SARS-CoV and SARS-CoV-2 were used as the poorly stimulating group. A final model was chosen by a model selection procedure. The results of 10-fold cross-validation are shown in Table 1. The averages of the intercepts and coefficients of the 10 cross-validation models were used to construct a logistic regression model, and a further logistic regression model was fitted using all data.
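The way fold-wise models can be averaged and then applied to a new genome is sketched below in Python. The fitted intercepts and coefficients are not reproduced here, so every number in the example is a hypothetical placeholder, as is the choice of Score S as the only predictor.

```python
import math


def logistic_probability(intercept, coefs, features):
    """Return P = 1 / (1 + exp(-z)) with z = intercept + sum(coef * feature)."""
    z = intercept + sum(coefs[name] * features[name] for name in coefs)
    # Numerically stable evaluation for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


def average_models(models):
    """Average the intercepts and coefficients of several fitted models."""
    n = len(models)
    names = models[0]["coefs"].keys()
    return {
        "intercept": sum(m["intercept"] for m in models) / n,
        "coefs": {k: sum(m["coefs"][k] for m in models) / n for k in names},
    }


# Hypothetical fold models; real values would come from the regression fits.
fold_models = [
    {"intercept": -10.0, "coefs": {"S": 40.0}},
    {"intercept": -12.0, "coefs": {"S": 44.0}},
]
avg = average_models(fold_models)
p_high = logistic_probability(avg["intercept"], avg["coefs"], {"S": 0.30})
p_low = logistic_probability(avg["intercept"], avg["coefs"], {"S": 0.20})
```

With these placeholder values a genome with the higher Score S receives a probability above 0.5 and the one with the lower Score S a probability below 0.5.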

Comparison with other methods
Six methods were used to validate the results of logistic regression. The results of 10-fold cross-validations using linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), naive Bayes and three support vector machine classifiers (linear, polynomial and radial) were the same as those using the logistic regression model (Table 1). The high values (≅1) of sensitivity, specificity and accuracy may be due to the almost complete separation of the Score S values of the two groups. These results are consistent with the data shown in Fig. 3.

DISCUSSION
Table 1 note: Set01-Set10, 9/10 of the data were used as training data for the regression and 1/10 as test data; All, 10/10 of the data were used for the regression.
The innate immune system, especially the TLR 7/8-interferon pathway, constitutes an important first line of defense against single-stranded RNA viruses (Vierbuchen, Stein & Heine, 2019; Kikkert, 2020; Nelemans & Kikkert, 2019). Conversely, RNA viruses have evolved multiple strategies to evade host innate immune responses and thereby increase the success rate of infection. Several molecular mechanisms by which positive-sense single-stranded RNA viruses evade innate immune responses have been identified (Ye et al., 2020). Since interferons have great protective effects during early viral infection, evasion of the immune response may have differential effects on the clinical outcome of viral disease (De Marcken et al., 2019). Therefore, prediction of the human TLR 7/8-stimulating activities of genomic RNAs of single-stranded RNA viruses provides a method to evaluate the risk potential posed by emerging single-stranded RNA viruses.
The results of this study suggest that the genomic (positive strand) RNAs of MERS-CoV, SARS-CoV and SARS-CoV-2 are composed of low proportions of human TLR 7/8-stimulating triribonucleotides. The weak human TLR 7/8-stimulating potential of the genomic (positive strand) RNAs of MERS-CoV, SARS-CoV and SARS-CoV-2 may lead to a high ability to evade the human TLR 7/8-mediated innate immune responses during the initial stage of viral infection. In contrast, the strong human TLR 7/8-stimulating potential of the genomic (positive strand) RNAs of 229E-CoV, NL63-CoV, OC43-CoV and HKU1-CoV may confer a high probability of triggering strong human TLR 7/8-mediated innate immune responses. Different strengths of TLR 7/8-mediated innate immune responses during the initial stage of viral infection may lead to different clinical outcomes of the disease (Frieman & Baric, 2008; Wong, Lui & Jin, 2016; Yokota, Okabayashi & Fujii, 2010). Evaluation of the human TLR 7/8-stimulating potential of viral genomic RNAs may be useful for surveillance of emerging coronaviruses.
Coronavirus infections have been considered novel emerging zoonotic diseases (Streicher & Jouvenet, 2019; Salata et al., 2019; Menachery et al., 2020). Evaluating the risk potential posed by zoonotic coronaviruses is necessary. The logistic regression model constructed in this study can be used to evaluate the human TLR 7/8-stimulating potential of genomic RNAs of coronaviruses from other mammals and birds. For example, the human TLR 7/8-stimulating potential of 1,361 coronavirus genomic (positive strand) RNAs from six mammalian (bat, camel, cat, civet, dog and pig) and avian hosts was computed using the logistic regression model (Eq. 7). Logit P ≅ 1 indicates a human TLR 7/8-stimulating ability equivalent to that of general human coronavirus (229E-CoV, NL63-CoV, OC43-CoV and HKU1-CoV) genomic RNAs. Logit P ≅ 0 indicates weak human TLR 7/8-stimulating potential equivalent to that of the highly pathogenic coronavirus (MERS-CoV, SARS-CoV and SARS-CoV-2) genomic RNAs. As shown in Fig. 5, many of the coronavirus genomic (positive strand) RNAs from bats, camels, cats, civets and pigs exhibit weak human TLR 7/8-stimulating potential equivalent to that of highly pathogenic coronavirus (MERS-CoV, SARS-CoV and SARS-CoV-2) genomic RNAs. Six methods were used for comparison with the logistic regression model. Using the results of logistic regression as the standard, the overall agreements between the results of the logistic regression model and those of the other six models are shown in Table 2. Most of the analysis results (except those from the SVM radial classifier) were consistent with the predictions of the logistic regression model. The predictions obtained using the logistic regression model proposed in this study suggest that the routes and risks of contact with those animals (the natural reservoirs of animal coronaviruses) should be addressed. This observation may be important for preventing outbreaks of emerging infectious diseases.
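One way to summarize such predictions per host is to report the fraction of genomes whose predicted value lies near 0. The Python sketch below uses a cutoff of 0.5, which is an assumption made for illustration; the host labels and values are hypothetical and are not data from this study.

```python
from collections import defaultdict


def weak_fraction_by_host(records, threshold=0.5):
    """Return, per host, the fraction of genomes with predicted P below threshold.

    `records` is an iterable of (host, predicted_probability) pairs.
    """
    totals = defaultdict(int)
    weak = defaultdict(int)
    for host, p in records:
        totals[host] += 1
        if p < threshold:
            weak[host] += 1
    return {host: weak[host] / totals[host] for host in totals}


# Hypothetical predictions for genomes from three hosts.
example_records = [
    ("bat", 0.02), ("bat", 0.10), ("bat", 0.97),
    ("camel", 0.01),
    ("bird", 0.99),
]
weak_fractions = weak_fraction_by_host(example_records)
```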

CONCLUSIONS
The results of this study suggest that MERS-CoV, SARS-CoV and SARS-CoV-2 may have a relatively low human TLR 7/8-stimulating potential and relatively high ability to evade human TLR 7/8-mediated innate immune responses. Prediction of the human TLR 7/8-stimulating potential of viral genomic RNAs may be useful for surveillance of emerging coronaviruses from nonhuman animal hosts.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
The authors received no funding for this work.