Multiple Sequence Alignment and Evolutionary Analysis of Angiotensin-Converting Enzyme 2 (ACE2) Across Diverse Species

The sudden outbreak of the novel coronavirus (2019-nCoV) in Wuhan has had a major impact in China and has drawn wide attention to its origin, intermediate host, transmission pathway, and pathogenic mechanism. 2019-nCoV is very similar to SARS-CoV, and the two viruses may use a similar receptor to recognize and attack host cells. Angiotensin-converting enzyme 2 (ACE2) is a major receptor of both 2019-nCoV and SARS-CoV and interacts with the surface spike glycoproteins (S proteins) of these coronaviruses. However, it is still unknown how 2019-nCoV came to infect humans and which species is the intermediate host. In this study, we analyzed ACE2 amino acid sequences and constructed a phylogenetic tree. The tree showed that human ACE2 is highly similar to that of Pongo abelii and Pan troglodytes. Our results therefore suggest that Pongo abelii and Pan troglodytes may be additional hosts and that these animals should be protected from infection. These results may benefit the prevention of the global pneumonia epidemic.


Introduction
Scientists discovered a new coronavirus in Wuhan, China, in December 2019; it has caused a major public health crisis, affecting the health of tens of thousands of people [1]. The new virus, 2019-nCoV, is very similar to SARS-CoV. In clinical cases, 2019-nCoV infection leads to fever and cough, followed by increasingly severe lung infection and respiratory symptoms that can endanger the lives of infected patients [2]. Recent research has elucidated how the virus recognizes host cells, identifying the essential protein angiotensin-converting enzyme 2 (ACE2), which is also crucial for host-cell recognition by the SARS virus [3][4][5].
Since the global SARS outbreak in 2003, a large number of studies have shown that ACE2 on the host cell surface is the receptor for SARS-CoV [6][7][8][9]. ACE2 interacts with the surface spike glycoproteins (S proteins) of coronaviruses, is widely expressed in vivo (especially in the kidney and testis), and may play an essential role in regulating cardiovascular function, renal function, and fertility; a search of NCBI returns more than 600 complete coding sequences and about 3400 protein sequences for ACE2 [10][11][12]. Meanwhile, the three-dimensional structure of full-length ACE2 has recently been elucidated [13,14]. Together, these studies lay the foundation for further research. Exploring the differences in ACE2 among species is also important, because it helps assess whether ACE2 from other species could recognize these viruses and interact with their S proteins. Most importantly, it may help identify the intermediate host. Because sequence determines structure, which in turn influences physiological function, studying the protein sequence provides valuable information for understanding ACE2 [15].
In this study, we compare the amino acid sequences of ACE2 from different species and suggest possible intermediate hosts based on sequence similarity and diversity. Starting from protein sequences downloaded from NCBI, we used multiple sequence alignment and phylogenetic tree construction to explore conserved and variable regions, as well as the distances between ACE2 protein sequences across diverse species.

Method
The sequence dataset was downloaded from the NCBI database using the keywords "angiotensin I converting enzyme 2" and included more than 600 reference protein sequences; the detailed species distribution is shown in Table 1. To assess sequence similarity, ClustalW was used for multiple sequence alignment, with all sites selected for calculation and all other settings left at their default values [16]. The alignment was then used to build a phylogenetic tree in MEGA X with the maximum parsimony method, a criterion that selects the tree requiring the fewest amino acid changes [17]. To assess the reliability of the tree, bootstrap resampling with 1000 replicates was applied. The preliminary results showed that more than 300 of the sequences were redundant or belonged to other ACE protein isoforms. Therefore, only 314 ACE2 protein sequences were used for the final phylogenetic tree construction and analysis. For clear presentation, the phylogenetic tree was visualized with the online tool iTOL [18].
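Maximum parsimony scores a candidate tree by the minimum number of amino acid substitutions it requires, and the most parsimonious tree is the one with the lowest score. The following is a minimal Python sketch of that scoring step using Fitch's algorithm on one fixed, rooted binary tree. It is not MEGA X's implementation, which also searches heuristically over many candidate trees. The function names `fitch_column` and `parsimony_score`, the nested-tuple tree encoding, the treatment of gaps ('-') as an ordinary state, and the four-taxon toy alignment are all illustrative assumptions, not data from this study.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name; an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_column(tree: Tree, column: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's small-parsimony pass for one alignment column.

    Returns the candidate state set at the root of `tree` and the number
    of state changes needed in the subtree below it.
    """
    if isinstance(tree, str):
        # Leaf: the only possible state is the residue observed in this column.
        return {column[tree]}, 0
    left_set, left_cost = fitch_column(tree[0], column)
    right_set, right_cost = fitch_column(tree[1], column)
    shared = left_set & right_set
    if shared:
        # Children can agree on a state: no extra change is needed here.
        return shared, left_cost + right_cost
    # Children disagree: keep the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Parsimony length of `tree`: Fitch costs summed over all columns.

    Gaps ('-') are treated as a fifth state here, which is an assumption.
    """
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_column(tree, column)[1]
    return total


# Toy aligned sequences (invented for illustration, not ACE2 data).
toy = {"A": "MSSA", "B": "MSSA", "C": "MTSG", "D": "LTSG"}
print(parsimony_score((("A", "B"), ("C", "D")), toy))  # 3
print(parsimony_score((("A", "C"), ("B", "D")), toy))  # 5
```

Under this criterion the first topology, with 3 changes, would be preferred over the second, with 5.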

Collection of ACE2 Protein Sequences
ACE2 protein sequences were downloaded from the NCBI database using "angiotensin I converting enzyme 2" as the search keywords. The search returned 3472 protein sequences, of which 671 were reviewed reference sequences from reference organisms. The species mainly include mammals, birds, lizards, turtles, insects, anthozoans, molluscs, and bacteria, as detailed in Table 1.
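Before alignment, the downloaded records can be parsed and exact duplicate sequences collapsed, which addresses part of the redundancy noted in the Method section. The Python sketch below assumes the NCBI download was saved as FASTA text; `read_fasta` and `deduplicate` are hypothetical helper names, and the toy records are invented. It removes only identical sequences. In this study, ACE1, ACE3, and other ACE-like isoforms were separated from ACE2 using the alignment and tree, not by code like this.

```python
from typing import List, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            # Sequence lines may be wrapped; join them later.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def deduplicate(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = {}
    for header, seq in records:
        seen.setdefault(seq.upper(), (header, seq))
    # Dicts preserve insertion order, so the original record order is kept.
    return list(seen.values())


# Toy FASTA text (invented): s1 and s2 carry the same sequence.
fasta = ">s1\nMSSA\nKL\n>s2\nMSSAKL\n>s3\nMTSG\n"
print(deduplicate(read_fasta(fasta)))  # [('s1', 'MSSAKL'), ('s3', 'MTSG')]
```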

Multiple Sequence Alignment of ACE2 Protein
To identify the similarity and diversity among the 671 downloaded ACE2 protein sequences, we performed multiple sequence alignment with ClustalW. The results showed that the downloaded sequences were highly diverse because the set also contained ACE1, ACE3, and other ACE-like isoforms. In Figure 1, the green clade represents ACE2 proteins; the remaining redundant data, shown in the orange clade, were excluded from further analysis.
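The similarity revealed by the alignment can also be expressed numerically as pairwise percent identity between aligned rows. The Python sketch below is an illustrative complement, not part of the ClustalW and MEGA X workflow used here. The function names, the convention of skipping gap-gap columns while counting gap-residue columns as mismatches, and the toy rows are all assumptions.

```python
from typing import Dict, List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two rows taken from the same multiple alignment.

    Columns where both rows have a gap ('-') are skipped; a gap facing a
    residue counts as a mismatch (one of several common conventions).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def rank_by_identity(query: str, alignment: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank all other taxa by percent identity to `query`, highest first."""
    ref = alignment[query]
    scores = [(name, percent_identity(ref, seq))
              for name, seq in alignment.items() if name != query]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy aligned rows (invented for illustration, not ACE2 data).
aln = {"A": "MSSA", "B": "MSSA", "C": "MTSG"}
print(rank_by_identity("A", aln))  # [('B', 100.0), ('C', 50.0)]
```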

Phylogenetic Tree Analysis of ACE2 Protein
Phylogenetic trees are widely used to analyze the evolutionary relationships of biomolecules. Many tree-building approaches have been developed, such as the neighbor-joining, maximum likelihood, maximum parsimony, and Bayesian methods. Here, we used the maximum parsimony method to build the phylogenetic tree because of its efficiency. The partial results for the primate clade are shown in Figure 2: human ACE2 was most similar to that of Pongo abelii and Pan troglodytes, whereas Macaca nemestrina was positioned furthest from Homo sapiens. Because the ACE2 protein sequences of Homo sapiens, Pongo abelii, and Pan troglodytes are very similar, 2019-nCoV may also cause disease in Pongo abelii and Pan troglodytes. For the same reason, these animals may be the most suitable experimental models: their responses to coronaviruses may resemble those of humans, which also implies possible transmission among these species. The bootstrap support for this branch is 0.979, which is very high and indicates that this grouping is reliable. Because ACE2 in Macaca mulatta differs more from human ACE2 than that of Pongo abelii and Pan troglodytes does, the virus is less likely to spread to humans from Macaca mulatta.
The epidemic also caused panic: news reports described families who feared that their pets (such as cats and dogs) could transmit the virus and killed them. We therefore examined ACE2 similarity among humans, dogs, and cats. In the tree, cats (Felis catus, marked with a red frame) and dogs (Canis lupus, marked with a blue frame) lie far from humans (Figure 3). Therefore, transmission of the virus from pets to humans appears unlikely.
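The branch support of 0.979 comes from the bootstrap procedure with 1000 replicates described in the Method section. Assuming it is reported as a fraction, it would mean that roughly 979 of the 1000 replicate trees recovered this grouping. The Python sketch below illustrates the two ingredients of that procedure: resampling alignment columns with replacement, and counting how often a clade appears among replicate trees. Rebuilding a tree for each replicate (done by MEGA X in this study) is omitted; trees are treated as rooted nested tuples, whereas unrooted trees would require bipartitions instead; and all names and toy trees are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Set, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name; an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def resample_columns(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One bootstrap replicate: draw alignment columns with replacement.

    The same column indices are used for every row, so columns stay intact.
    """
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of all internal nodes of a rooted binary tree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = walk(node[0]) | walk(node[1])
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def clade_support(target: Set[str], replicate_trees: List[Tree]) -> float:
    """Fraction of replicate trees that contain `target` as a clade."""
    wanted = frozenset(target)
    hits = sum(1 for t in replicate_trees if wanted in clades(t))
    return hits / len(replicate_trees)


# Toy replicate trees (invented): {A, B} is recovered in 3 of 4.
replicates = [(("A", "B"), ("C", "D")), (("A", "B"), ("C", "D")),
              (("A", "C"), ("B", "D")), ((("A", "B"), "C"), "D")]
print(clade_support({"A", "B"}, replicates))  # 0.75
```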

Conclusion
As 2019-nCoV has spread around the world, it is now critical to understand this coronavirus fully. ACE2 plays an essential role in virus entry into the host, so analyzing ACE2 protein sequences can provide information on virus transmission among species. To protect ourselves from this virus, we must identify its potential hosts. In this study, we collected ACE2 protein sequence data and built a phylogenetic tree to analyze the similarities and differences among species, which could help in suggesting preventive measures. In particular, we found that the ACE2 protein sequences of Pongo abelii and Pan troglodytes are similar to that of humans, so these animals may be infected by 2019-nCoV. In the future, more virus testing should be performed on them. In addition, pets (such as cats and dogs) appear to have a low probability of being infected. However, ACE2 is common across species and highly conserved, so 2019-nCoV may be transmitted between different species. Wild animals may be possible virus reservoirs, and susceptible populations must be protected from them.