Brief Report

Use of the informational spectrum methodology for rapid biological analysis of the novel coronavirus 2019-nCoV: prediction of potential receptor, natural reservoir, tropism and therapeutic/vaccine target

[version 1; peer review: awaiting peer review]
PUBLISHED 27 Jan 2020

This article is included in the Emerging Diseases and Outbreaks gateway.

This article is included in the Coronavirus collection.

Abstract

A novel coronavirus recently identified in Wuhan, China (2019-nCoV) has expanded the number of highly pathogenic coronaviruses affecting humans. The 2019-nCoV represents a potential epidemic or pandemic threat, which requires a quick response for preparedness against this infection. The present report uses the informational spectrum methodology to identify the possible origin and natural host of the new virus, as well as putative therapeutic and vaccine targets. The performed in silico analysis indicates that the newly emerging 2019-nCoV is closely related to severe acute respiratory syndrome (SARS)-CoV and, to a lesser degree, Middle East respiratory syndrome (MERS)-CoV. Moreover, the well-known SARS-CoV receptor (ACE2) might be a putative receptor for the novel virus as well. Additional results indicate that civets and poultry are potential candidates for the natural reservoir of the 2019-nCoV, and that domain 288-330 of the S1 protein from the 2019-nCoV represents a promising therapeutic and/or vaccine target.

Keywords

2019-nCoV, Wuhan coronavirus, SARS, MERS

Introduction

Fears are mounting worldwide over the cross-border spread of a new strain of coronavirus (denoted 2019-nCoV) that originated in Wuhan, the largest city in central China, after its spread to Thailand and Japan. The newly emerging pathogen belongs to the same virus family as the deadly severe acute respiratory syndrome and Middle East respiratory syndrome coronaviruses (SARS-CoV and MERS-CoV, respectively). The World Health Organization (WHO) has recently published surveillance recommendations for a possible “large epidemic or even pandemic” of the novel coronavirus and has issued guidelines for hospitals across the world. However, many questions about 2019-nCoV remain unanswered: (i) what is the origin and/or natural reservoir of the virus? (ii) is it easily transmitted from human to human? and (iii) what are the potential diagnostic, therapeutic and vaccine targets? Currently, only the nucleotide sequences of eight human 2019-nCoV isolates are available, without any additional information about the biological properties of the virus beyond confirmation of virion morphology by electron microscopy. This is likely not enough information to answer the important questions above.

The informational spectrum method (ISM), a virtual spectroscopy method for the analysis of proteins, is based on the fundamental electronic properties of amino acids and requires only the nucleotide sequence to investigate the encoded proteins1. For this reason, ISM has previously been used to analyze novel viruses for which little or no information was available2–5. Here, 2019-nCoV was analyzed with ISM to identify its possible origin and natural host, as well as putative therapeutic and vaccine targets.

Methods

Sequences

The S1 surface protein sequences from eight human 2019-nCoV isolates, deposited in the publicly available GISAID database (accessed on January 19, 2020), were analyzed by ISM. The studied sequences were BetaCoV/Wuhan/IVDC-HB-04/2020, BetaCoV/Wuhan/IVDC-HB-01/2019, BetaCoV/Wuhan/IVDC-HB-05/2019, BetaCoV/Wuhan/IPBCAMS-WH-01/2019, BetaCoV/Wuhan/WIV04/2019, BetaCoV/Wuhan-Hu-1/2019, BetaCoV/Nonthaburi/61/2020, and BetaCoV/Nonthaburi/74/2020.

In the phylogenetic analysis, amino acid sequences of other coronaviruses were also included: (i) S1 proteins from the viruses AVP78042, AVP78031, AY304486, AY559093, JX163927, YN2018B and KY417146, which other authors had already used to study the phylogenetic relationship between 2019-nCoV and the nearest bat and SARS-like CoVs (GISAID database); and (ii) S1 proteins from the first three isolated human MERS-CoV strains, AGG22542, AFS88936 and AFY13307, deposited in the GISAID database.

The ISM

A detailed description of sequence analysis based on ISM has been published elsewhere2. In this approach, sequences (protein or DNA) are transformed into numerical signals by assigning a numerical value to each element (amino acid or nucleotide). These values correspond to the electron-ion interaction potential (EIIP)6, which determines the electronic properties of amino acids and nucleotides that are essential for their intermolecular interactions. The resulting signal is then decomposed into periodic components by the Fourier transform, yielding a series of frequencies and their amplitudes. The obtained frequencies correspond to the distribution of structural motifs (primary structure) with defined physico-chemical characteristics responsible for the biological function of the protein encoded by the analyzed sequence. When proteins that share the same biological or biochemical function are compared, the technique allows detection of code/frequency pairs that are specific to their common biological properties. The method is insensitive to the location of the motifs and therefore does not require prior alignment of the sequences. In addition, this is the only method that allows immediate functional analysis.
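The encoding and Fourier-transform steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the table holds commonly tabulated electron-ion interaction potential (EIIP) values for amino acids, in Rydberg units, which should be checked against ref. 6; mean-centring the signal, zero-padding it to a common length `n_points`, and reporting the amplitude |X(k)| rather than the power |X(k)|² are assumptions of this sketch, and the function names are hypothetical.

```python
import cmath

# Commonly tabulated EIIP values (Ry) for the 20 standard amino acids.
# Assumption: verify these numbers against the original EIIP reference.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def encode(protein: str) -> list[float]:
    """Map each residue to its EIIP value, giving a numerical signal."""
    return [EIIP[aa] for aa in protein.upper()]


def informational_spectrum(protein: str, n_points: int = 512) -> list[tuple[float, float]]:
    """Return (frequency, amplitude) pairs for k = 1 .. n_points // 2.

    The signal is mean-centred (removing the zero-frequency term) and
    implicitly zero-padded to n_points, so spectra of proteins with different
    lengths share the frequency grid f = k / n_points, running up to 0.5.
    """
    signal = encode(protein)
    if len(signal) > n_points:
        raise ValueError("n_points must be at least the sequence length")
    mean = sum(signal) / len(signal)
    centred = [x - mean for x in signal]  # positions beyond len(signal) count as 0
    spectrum = []
    for k in range(1, n_points // 2 + 1):
        # Direct DFT term: X(k) = sum_m x(m) * exp(-2*pi*i*k*m / N)
        xk = sum(x * cmath.exp(-2j * cmath.pi * k * m / n_points)
                 for m, x in enumerate(centred))
        spectrum.append((k / n_points, abs(xk)))
    return spectrum
```

For a 20-residue toy peptide, `informational_spectrum(peptide, n_points=64)` returns 32 (frequency, amplitude) pairs at f = 1/64, 2/64, ..., 0.5; a full S1 sequence needs `n_points` at least as large as its length. The direct DFT keeps the sketch dependency-free; an FFT routine would be the practical choice for long sequences.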

Phylogenetic analysis

The phylogenetic tree of S1 proteins from coronaviruses was generated with the ISM-based phylogenetic algorithm ISTREE, previously described in detail elsewhere7. In the presented analysis, we calculated the distance matrix using the amplitude at the frequency F(0.257) as the distance measure between sequences.
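The distance computation can be illustrated with a small, hypothetical sketch. ISTREE itself is described in ref. 7 and is not reproduced here: in this version, the distance between two sequences is assumed to be the absolute difference of their amplitudes at F(0.257), and the tree is built with UPGMA, a standard agglomerative clustering method substituted for ISTREE's own tree-building step. The amplitudes are assumed to have been computed beforehand.

```python
from itertools import combinations


def distance_matrix(amplitudes: dict[str, float]) -> dict[frozenset, float]:
    """Pairwise distances: absolute difference of the amplitudes at F(0.257)."""
    return {
        frozenset((a, b)): abs(amplitudes[a] - amplitudes[b])
        for a, b in combinations(amplitudes, 2)
    }


def upgma(amplitudes: dict[str, float]):
    """Build a rooted tree (as nested tuples) by UPGMA clustering.

    UPGMA is used here only as a stand-in for the ISTREE tree-building step.
    """
    clusters = {name: (name, 1) for name in amplitudes}  # key -> (subtree, size)
    dist = distance_matrix(amplitudes)
    next_id = 0
    while len(clusters) > 1:
        # Merge the closest pair; ties are broken by sorted labels for determinism.
        pair = min(dist, key=lambda p: (dist[p], sorted(p)))
        a, b = sorted(pair)
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        new = f"#{next_id}"  # internal-node label, assumed not to clash with input names
        next_id += 1
        # Size-weighted average distance from the merged cluster to every other cluster.
        for c in clusters:
            dist[frozenset((new, c))] = (
                n_a * dist[frozenset((a, c))] + n_b * dist[frozenset((b, c))]
            ) / (n_a + n_b)
        dist = {p: d for p, d in dist.items() if a not in p and b not in p}
        clusters[new] = ((tree_a, tree_b), n_a + n_b)
    ((tree, _),) = clusters.values()
    return tree
```

Because the distance uses a single scalar per sequence, sequences with similar amplitudes at F(0.257) cluster together regardless of their sequence homology, which is why such a tree can differ from a homology-based one.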

Results and discussion

To compare the informational similarity between 2019-nCoV, SARS-CoV, MERS-CoV and Bat SARS-like CoV, the cross-spectra (CS) of S1 proteins from these viruses were calculated. Figure 1a shows the CS of 2019-nCoV, SARS-CoV and MERS-CoV. This CS contains only one dominant peak, corresponding to the frequency F(0.257). Figure 1b displays the CS of S1 proteins from 2019-nCoV and Bat SARS-like CoV. Amplitudes in this latter CS are markedly lower than in the CS presented in Figure 1a. These results show that (i) S1 proteins from 2019-nCoV, SARS-CoV, MERS-CoV and Bat SARS-like CoV encode common information, represented by the frequency F(0.257), and (ii) S1 proteins from 2019-nCoV are remarkably more informationally similar to S1 from SARS-CoV and MERS-CoV than to S1 from Bat SARS-like CoV. This suggests that the biological properties of 2019-nCoV are more similar to those of SARS-CoV and MERS-CoV than to those of Bat SARS-like CoV.
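The cross-spectrum and the signal-to-noise (S/N) normalization used for Figure 1 can be illustrated with a short sketch. It assumes that each input is a list of amplitudes on the same frequency grid k/N (k = 1, ..., N/2), takes the cross-spectrum of several proteins as the element-wise product of their amplitude spectra, and divides by the mean of the whole spectrum to obtain S/N; these choices and the function names are illustrative assumptions, not the published ISM code.

```python
from math import prod


def cross_spectrum(spectra: list[list[float]]) -> list[float]:
    """Element-wise product of amplitude spectra sampled on one frequency grid."""
    length = len(spectra[0])
    if any(len(s) != length for s in spectra):
        raise ValueError("all spectra must share the same frequency grid")
    return [prod(values) for values in zip(*spectra)]


def signal_to_noise(cs: list[float]) -> list[float]:
    """S/N: each value divided by the mean value of the whole spectrum."""
    mean = sum(cs) / len(cs)
    return [v / mean for v in cs]


def dominant_peak(cs: list[float], n_points: int) -> tuple[float, float]:
    """Return (frequency, S/N) of the highest peak; index i maps to (i + 1) / n_points."""
    sn = signal_to_noise(cs)
    i = max(range(len(sn)), key=sn.__getitem__)
    return (i + 1) / n_points, sn[i]
```

Multiplying spectra keeps a peak only where every protein has appreciable amplitude, so a single dominant peak marks a frequency shared by all inputs. On a grid of N = 512 points, the grid frequency closest to F(0.257) is 132/512 ≈ 0.258.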


Figure 1. Cross-spectrum (CS) of S1 proteins.

(a) CS of S1 from SARS-CoV, MERS-CoV and 2019-nCoV; (b) CS of S1 from Bat SARS-like CoV and 2019-nCoV. The abscissa represents the frequencies from the Fourier transform of the sequence of electron-ion interaction potential values corresponding to the amino acid sequence of the proteins. The lowest frequency is 0.0 and the highest is 0.5. The ordinate represents the signal-to-noise ratio (S/N), i.e. the ratio between the signal intensity at a particular IS frequency and the mean value of the whole spectrum.

To confirm this conclusion, an ISM-based phylogenetic tree for S1 proteins was calculated (Figure 2). In this calculation, the amplitude at the frequency F(0.257) was used as the distance measure. As observed in Figure 2, all analyzed 2019-nCoV S1 amino acid sequences group with SARS-CoV and MERS-CoV and are separated from Bat SARS-like CoV. This indicates that 2019-nCoV is more phylogenetically similar to SARS-CoV and MERS-CoV than to Bat SARS-like CoV. This result differs from that obtained with homology-based phylogenetic analysis, which showed that 2019-nCoV is closely related to Bat SARS-like CoV (https://platform.gisaid.org/epi3/frontend#lightbox1296857287).


Figure 2. Informational spectrum method-based phylogenetic tree for S1 proteins from SARS-CoV, MERS-CoV, Bat SARS-like CoV and 2019-nCoV.

The amplitude at the frequency F(0.257) was used as the distance measure.

It has been previously shown that the dominant frequency in the informational spectrum of viral envelope proteins corresponds to the interaction between the virus and its receptor2,3,7,8. The ISM analysis showed that the frequency component F(0.257) is present in the CS of SARS-CoV S1 and its receptor angiotensin-converting enzyme 2 (ACE2)9, but not in the CS of MERS-CoV S1 and its main receptor dipeptidyl peptidase 4 (DPP4)10. Of note, both ACE2 and DPP4 are expressed in airway epithelia. The presence of F(0.257) in the informational spectrum of MERS-CoV (Figure 1) also suggests a possible interaction between this virus and ACE2. The dominant peak at the frequency F(0.257) in the CS of S1 from SARS-CoV and MERS-CoV and ACE2 supports this possibility (Figure 3), although this interaction has not been formally proven for MERS-CoV11.


Figure 3. Cross-spectrum of ACE2 and S1 proteins from SARS-CoV and MERS-CoV.

The abscissa and the ordinate are as described in Figure 1.

As shown in Figure 1a, the frequency F(0.257) is also present in the informational spectrum of 2019-nCoV, suggesting that ACE2 might be the receptor for this novel coronavirus too. Calculation of the CS for the S1 protein from 2019-nCoV and each ACE2 sequence available in the UniProt database revealed that the highest amplitudes at the frequency F(0.257) correspond to ACE2 from civet and chicken. This result indicates that these species can be considered potential candidates for the natural reservoir of 2019-nCoV. However, it is possible that 2019-nCoV uses different receptors in its natural host(s), rather than only ACE2 as is putatively the case in humans.
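The ranking of candidate hosts can be sketched with a hypothetical helper. It assumes that amplitude spectra for the S1 protein and for each ACE2 sequence have already been computed on a common grid k/N (k = 1, ..., N/2), scores each receptor by the product of the two amplitudes at the grid point nearest F(0.257), and sorts candidates by that score. The dictionary keys (for example, species names taken from UniProt entries) are placeholders.

```python
def bin_index(freq: float, n_points: int) -> int:
    """List index of the grid frequency k / n_points nearest to freq (k = 1 .. n_points // 2)."""
    k = max(1, min(n_points // 2, round(freq * n_points)))
    return k - 1


def rank_by_frequency(
    s1_spectrum: list[float],
    receptor_spectra: dict[str, list[float]],
    n_points: int,
    freq: float = 0.257,
) -> list[tuple[str, float]]:
    """Rank candidate receptors by the S1 x receptor cross-spectrum amplitude at freq."""
    i = bin_index(freq, n_points)
    scores = {name: s1_spectrum[i] * spec[i] for name, spec in receptor_spectra.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Only the amplitude at one frequency is compared, so the ranking reflects how strongly each receptor shares the F(0.257) component with S1, not overall sequence similarity.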

Finally, the S1 amino acid sequence from 2019-nCoV was scanned to identify the domain that contributes most to the information represented by the frequency F(0.257) (Figure 4a). This analysis revealed that domain 266–330 (numbering refers to the mature protein) is essential for the interaction of 2019-nCoV with ACE2. Of note is the striking homology between these domains of S1 proteins from 2019-nCoV and SARS-CoV, but not MERS-CoV, for which ACE2 is not the main receptor (Figure 4b).
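A simple way to look for such a domain is a sliding-window scan, sketched below. This is a simplified stand-in that may differ from the published ISM domain-mapping procedure: it scores each window of a fixed, user-chosen length by the amplitude of its mean-centred EIIP signal at the target frequency, evaluated with a single DFT term at exactly that frequency, and returns windows best first. The EIIP values are commonly tabulated figures that should be checked against ref. 6; `window` and `step` are hypothetical parameters.

```python
import cmath

# Commonly tabulated EIIP values (Ry); an assumption to verify against ref. 6.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def amplitude_at(signal: list[float], freq: float) -> float:
    """|sum_m (x(m) - mean) * exp(-2*pi*i*freq*m)| for one window of the signal."""
    mean = sum(signal) / len(signal)
    return abs(sum((x - mean) * cmath.exp(-2j * cmath.pi * freq * m)
                   for m, x in enumerate(signal)))


def scan_domains(protein: str, window: int, freq: float = 0.257, step: int = 1):
    """Score every window by its amplitude at freq; best first.

    Returns (start, end, amplitude) tuples with 1-based, inclusive residue positions.
    """
    signal = [EIIP[aa] for aa in protein.upper()]
    if not 0 < window <= len(signal):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [
        (start + 1, start + window, amplitude_at(signal[start:start + window], freq))
        for start in range(0, len(signal) - window + 1, step)
    ]
    return sorted(scores, key=lambda s: s[2], reverse=True)
```

Scanning with several window lengths and comparing the top-scoring regions is one way to judge how robust a predicted domain boundary is.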


Figure 4. Domain of the S1 protein that is important for 2019-nCoV/ACE2 interaction.

(a) Mapping of the domain of the S1 protein from 2019-nCoV (BetaCoV/Wuhan/IVDC-HB-01/2019) that gives the dominant contribution to the information represented by the frequency F(0.257). (b) Sequence homology between the domains of S1 proteins from SARS-CoV and 2019-nCoV that contribute essentially to the information corresponding to the frequency F(0.257).

In conclusion, the results of the presented in silico analysis suggest the following: (i) the newly emerging 2019-nCoV is closely related to SARS-CoV and, to a lesser degree, MERS-CoV; (ii) civets and poultry are potential candidates for the natural reservoir of 2019-nCoV; and (iii) domain 288–330 of the S1 protein from 2019-nCoV represents a promising therapeutic and/or vaccine target. Further research on these issues is needed, including the development of reverse genetics systems and animal models to study the biology of 2019-nCoV.

Data availability

Underlying data

Sequence data of the viruses were obtained from the GISAID EpiFlu™ Database. To access the database, each individual user should complete the “Registration Form For Individual Users”, which is available alongside detailed instructions. After submission of the registration form, the user will receive a password. There are no other restrictions on access to GISAID. Conditions of access to, and use of, the GISAID EpiFlu™ Database and Data are defined by the Terms of Use.

Version 4
VERSION 4 PUBLISHED 27 Jan 2020
how to cite this article
Veljkovic V, Vergara-Alert J, Segalés J and Paessler S. Use of the informational spectrum methodology for rapid biological analysis of the novel coronavirus 2019-nCoV: prediction of potential receptor, natural reservoir, tropism and therapeutic/vaccine target [version 1; peer review: awaiting peer review] F1000Research 2020, 9:52 (https://doi.org/10.12688/f1000research.22149.1)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.