Comparative analysis of SARS-CoV-2 and its receptor ACE2 with evolutionarily related coronaviruses

The COVID-19 pandemic is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which is spreading very rapidly worldwide. To date, the origin and intermediate hosts of SARS-CoV-2 remain unclear. In this study, we conducted a comparative analysis of SARS-CoV-2 and non-SARS-CoV-2 coronavirus strains to elucidate their phylogenetic relationships. We found that: 1, the SARS-CoV-2 strains analyzed could be divided into 3 clades with regional aggregation; 2, the non-SARS-CoV-2 common coronaviruses that infect humans or other organisms and cause respiratory syndrome and epizootic catarrhal gastroenteritis could also be divided into 3 clades; 3, the hosts of the common coronaviruses closest to SARS-CoV-2 were Apodemus chevrieri (a rodent), Delphinapterus leucas (beluga whale), Hypsugo savii (bat), Camelus bactrianus (camel) and Mustela vison (mink); and 4, the gene sequences of the receptor ACE2 from different hosts could also be divided into 3 clades. The ACE2 gene sequences closest to the human sequence in evolution include those from Nannospalax galili (Upper Galilee mountains blind mole rat), Phyllostomus discolor (pale spear-nosed bat), Mus musculus (house mouse), Delphinapterus leucas (beluga whale), and Catharus ustulatus (Swainson's thrush). We conclude that SARS-CoV-2 may have evolved from a distant common ancestor shared with the common coronaviruses rather than from a branch of any of them, implying that SARS-CoV-2, the agent of the current COVID-19 pandemic, may have existed in a yet-to-be-identified primary host for a long time.

AGING
SARS-CoV-2 is a member of the Coronaviridae family, Betacoronavirus genus and Sarbecovirus subgenus, with a genome of approximately 30 kb [5,6]. Currently, the bat coronavirus RaTG13 (GenBank No.: MN996532) is the known virus most closely related to SARS-CoV-2 based on whole-genome comparisons [7,8], and pangolins, minks, snakes and turtles have been proposed as intermediate hosts of this virus [1,9,10]. However, to date the origin and the intermediate hosts of SARS-CoV-2 remain unclear.
Here, we analyzed the complete genome sequences of 200 SARS-CoV-2 strains, including 176 from the United States (USA), 17 from China (CHN), 2 from Spain (ESP), 2 from Hungary (HUN), 1 from Peru (PER), 1 from Colombia (COL) and 1 from Pakistan (PAK), using the MEGA-X software [11]. As shown in Figure 1, the SARS-CoV-2 strains could be grouped into 3 clades, CI, CII and CIII, and the viral genomes showed regional aggregation. The SARS-CoV-2 strains from China
In order to elucidate the relationships between SARS-CoV-2 and the common coronaviruses that also infect humans, we chose the genome sequences of six SARS-CoV-2 strains: MT263395 (furthest) and MT263421 (nearest) from clade CI, MT251973 (furthest) and MT263420 (nearest) from clade CII, and MT259229 (furthest) and MT263389 (nearest) from clade CIII, i.e., within each clade the strains furthest from and nearest to the root of the evolutionary tree. We then combined these six SARS-CoV-2 strains with 293 common coronavirus strains that infect humans for comparative sequence analysis. As shown in Figure 2, the 293 common coronaviruses that infect humans were divided into 3 clades, and 12 of them were particularly close to the SARS-CoV-2 strains in evolution (Figure 2 and Table 1). Interestingly, the disease caused by all 12 of these common coronaviruses was exclusively respiratory syndrome (Table 1), and they were identified in 2013, 2014 and 2015 (Table 1).
Note to Figure 2: 1 "Fu or Ne", the SARS-CoV-2 strains in clades CI, CII and CIII, respectively, that are furthest (Fu) from or nearest (Ne) to the root of the evolutionary tree; 2 "Near with Fu or Ne", the common coronaviruses that infect humans and are nearest to the "Fu or Ne" strains.
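The choice of the furthest (Fu) and nearest (Ne) strain within each clade amounts to comparing root-to-tip distances on a rooted tree. The following is a minimal Python sketch of that selection, not the procedure implemented in MEGA-X: the toy tree, its branch lengths and the strain names are hypothetical, and a "clade" is simplified to mean the subtree below one child of the root.

```python
"""Pick the strains furthest from and nearest to the root within each clade.

Minimal sketch: the tree, branch lengths and strain labels are hypothetical
toy values, not the tree reconstructed in this study.
"""
from typing import Dict, List, Tuple

# Rooted tree stored as child -> (parent, branch length); "root" has no entry.
# Hypothetical topology: three clades (CI, CII, CIII) hang directly off the root.
TREE: Dict[str, Tuple[str, float]] = {
    "CI": ("root", 0.010),
    "CII": ("root", 0.020),
    "CIII": ("root", 0.015),
    "strain_a": ("CI", 0.004),
    "strain_b": ("CI", 0.001),
    "strain_c": ("CII", 0.002),
    "strain_d": ("CII", 0.006),
    "strain_e": ("CIII", 0.003),
    "strain_f": ("CIII", 0.008),
}


def root_to_tip(node: str) -> float:
    """Sum the branch lengths on the path from node up to the root."""
    total = 0.0
    while node in TREE:
        parent, length = TREE[node]
        total += length
        node = parent
    return total


def clade_of(node: str) -> str:
    """Return the child of the root whose subtree contains node."""
    while TREE[node][0] != "root":
        node = TREE[node][0]
    return node


def furthest_and_nearest(tips: List[str]) -> Dict[str, Tuple[str, str]]:
    """Map each clade to (furthest tip, nearest tip) by root-to-tip distance."""
    by_clade: Dict[str, List[str]] = {}
    for tip in tips:
        by_clade.setdefault(clade_of(tip), []).append(tip)
    return {
        clade: (max(members, key=root_to_tip), min(members, key=root_to_tip))
        for clade, members in by_clade.items()
    }


tips = [name for name in TREE if name.startswith("strain_")]
for clade, (fu, ne) in sorted(furthest_and_nearest(tips).items()):
    print(f"{clade}: Fu={fu} ({root_to_tip(fu):.3f}), Ne={ne} ({root_to_tip(ne):.3f})")
```

In practice the tree would first be exported from the phylogenetics software (for example in Newick format) and parsed; that step is omitted here so the max/min selection itself stays visible.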
So far, bats, pangolins, minks, snakes and turtles have been proposed as intermediate hosts of SARS-CoV-2 [1,7-10]. Researchers have also found many coronaviruses in other organisms [1,9,10]. To identify the intermediate hosts of SARS-CoV-2, we compared the genome sequences of the six selected SARS-CoV-2 strains with those of 53 common coronaviruses that infect other organisms. As shown in Figure 3, these common coronaviruses were divided into 3 clades, with six of them being particularly close to the SARS-CoV-2 strains in evolution (Figure 3 and Table 2). The diseases caused by these six common coronaviruses were respiratory syndrome and epizootic catarrhal gastroenteritis (Table 2). The hosts of the common coronaviruses closest to SARS-CoV-2 were Apodemus chevrieri (a rodent), Delphinapterus leucas (beluga whale), Hypsugo savii (bat), Camelus bactrianus (camel) and Mustela vison (mink) (Table 2). These common coronaviruses were identified in 1998, 2006, 2011 and 2015 (Table 2).
Figure 3. The evolutionary tree of common coronaviruses that infect other organisms and their phylogenetic comparisons with SARS-CoV-2. These common coronavirus strains could be grouped into 3 clades, with 6 of the strains being particularly close to SARS-CoV-2 in evolution. Note: 1 "Fu or Ne", the SARS-CoV-2 strains in clades CI, CII and CIII, respectively, that are furthest (Fu) from or nearest (Ne) to the root of the evolutionary tree; 2 "Near with Fu or Ne", the common coronaviruses that infect other organisms and are nearest to the "Fu or Ne" strains.
The gene sequences of the receptor ACE2 from different hosts could be divided into 3 clades, with those closest to the human ACE2 sequence in evolution being from Nannospalax galili (Upper Galilee mountains blind mole rat), Phyllostomus discolor (pale spear-nosed bat), Mus musculus (house mouse), Delphinapterus leucas (beluga whale), and Catharus ustulatus (Swainson's thrush).
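Finding which common coronaviruses, or which hosts' ACE2 genes, lie closest to a query sequence can be approximated by ranking pairwise distances. The Python sketch below assumes the sequences are already aligned and uses the uncorrected p-distance with pairwise deletion of gapped sites; it is an illustration under those assumptions, not the distance model behind Figure 3 or the ACE2 tree, and all sequences and labels are hypothetical.

```python
"""Rank reference sequences by p-distance to a query sequence.

Minimal sketch: sequences are assumed pre-aligned to equal length, gapped
sites ("-") are skipped pairwise, and all data below are hypothetical.
"""
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites among sites where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # pairwise deletion of gapped sites
        compared += 1
        differences += int(x != y)
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def closest(query: str, refs: Dict[str, str], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k references with the smallest p-distance to the query."""
    ranked = sorted(
        ((name, p_distance(query, seq)) for name, seq in refs.items()),
        key=lambda item: item[1],
    )
    return ranked[:k]


query = "ATGCTAGCTA"  # hypothetical aligned query fragment
refs = {
    "cov_host_1": "ATGCTAGCTT",  # 1 difference over 10 sites -> 0.1
    "cov_host_2": "ATGGTACCTT",  # 3 differences over 10 sites -> 0.3
    "cov_host_3": "ATGCTA-CTA",  # 0 differences over 9 sites -> 0.0
}
print(closest(query, refs))
```

For divergent sequences a substitution-corrected distance (for example Jukes-Cantor) or a likelihood-based tree would usually be preferred; the p-distance is used here only because it is the simplest to show.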
In summary, in this work we found that: 1, the SARS-CoV-2 strains analyzed could be divided into 3 clades with regional aggregation; 2, the common coronaviruses most similar to SARS-CoV-2 were those that infect humans or other organisms and cause respiratory syndrome or epizootic catarrhal gastroenteritis; these common coronaviruses could be divided into 3 clades, with SARS-CoV-2 clearly separated from them in evolution; 3, the hosts of the common coronaviruses closest to SARS-CoV-2 were Apodemus chevrieri (a rodent), Delphinapterus leucas (beluga whale), Hypsugo savii (bat), Camelus bactrianus (camel) and Mustela vison (mink); and 4, the gene sequences of the receptor ACE2 from different hosts could be divided into 3 clades. The ACE2 gene sequences closest to the human sequence in evolution include those from Nannospalax galili (Upper Galilee mountains blind mole rat), Phyllostomus discolor (pale spear-nosed bat), Mus musculus (house mouse), Delphinapterus leucas (beluga whale), and Catharus ustulatus (Swainson's thrush).
Based on these analyses, we conclude that SARS-CoV-2 may have evolved from a relatively distant common ancestor shared with the other coronaviruses rather than from a branch of any of them, implying that SARS-CoV-2, the agent of the current COVID-19 pandemic, may have existed in a yet-to-be-identified primary host for a long time.

AUTHOR CONTRIBUTIONS
Study concept or design: FFL, SLL; Data collection: QZ, GYW; Funding: FFL, SLL; Drafting/revising of manuscript: all authors.