Molecular Evolution of Human Coronavirus 229E in Hong Kong and a Fatal COVID-19 Case Involving Coinfection with a Novel Human Coronavirus 229E Genogroup

ABSTRACT Compared to other human coronaviruses, the genetic diversity and evolution of human coronavirus 229E (HCoV-229E) are relatively understudied. We report a fatal case of COVID-19 pneumonia coinfected with HCoV-229E in Hong Kong. Genome sequencing of SARS-CoV-2 and HCoV-229E from a nasopharyngeal sample of the patient showed that the SARS-CoV-2 strain HK13 was most closely related to SARS-CoV-2 type strain Wuhan-Hu-1 (99.99% nucleotide identity), compatible with his recent history of travel to Wuhan. The HCoV-229E strain HK20-42 was most closely related to HCoV-229E strain SC0865 from the United States (99.86% nucleotide identity). To investigate if it may represent a newly emerged HCoV-229E genotype in Hong Kong, we retrieved 41 archived respiratory samples that tested positive for HCoV-229E from 2004 to 2019. Pneumonia and exacerbations of chronic airway diseases were common among infected patients. Complete RdRp, S, and N gene sequencing of the 41 HCoV-229E strains revealed that our contemporary HCoV-229E strains have undergone significant genetic drift with clustering of strains in chronological order. Two novel genogroups were identified, in addition to previously described genogroups 1 to 4, with recent circulating strains including strain HK20-42 belonging to novel genogroup 6. Positive selection was detected in the spike protein and receptor-binding domain, which may be important for viral evolution at the receptor-binding interphase. Molecular dating analysis showed that HCoV-229E shared the most recent common ancestor with bat and camel/alpaca 229E-related viruses at ∼1884, while camel/alpaca viruses had a relatively recent common ancestor at ∼1999. Further studies are required to ascertain the evolutionary origin and path of HCoV-229E. IMPORTANCE Since its first appearance in the 1960s, the genetic diversity and evolution of human coronavirus 229E (HCoV-229E) have been relatively understudied. In this study, we report a fatal case of COVID-19 coinfected with HCoV-229E in Hong Kong. Genome sequencing revealed that our SARS-CoV-2 strain is highly identical to the SARS-CoV-2 strain from Wuhan, compatible with the patient’s recent travel history, whereas our HCoV-229E strain in this study is highly identical to a recent strain in the United States. We also retrieved 41 archived HCoV-229E strains from 2004 to 2019 in Hong Kong for sequence analysis. Pneumonia and exacerbations of chronic airway diseases were common diagnoses among the 41 patients. The results showed that HCoV-229E was evolving in chronological order. Two novel genogroups were identified in addition to the four preexisting HCoV-229E genogroups, with recent circulating strains belonging to novel genogroup 6. Molecular clock analysis dated bat-to-human and bat-to-camelid transmission to as early as 1884.

addition to the four preexisting HCoV-229E genogroups, with recent circulating strains belonging to novel genogroup 6. Molecular clock analysis dated bat-to-human and batto-camelid transmission to as early as 1884.
Compared to other human CoVs, relatively little is known about the genetic diversity and evolution of HCoV-229E. Since the first isolation of HCoV-229E in 1965 as a novel common cold virus from organ culture (12), only 24 complete genome sequences of HCoV-229E have been available in GenBank, including one from the cDNA clone Inf-1 of the laboratory-adapted strain VR-740 (13) and 23 from clinical isolates in the United States, Italy, Netherlands, Germany, and Haiti (14)(15)(16). In one early study, sequencing of the spike genes of three geographically and chronologically distinct HCoV-229E strains showed rather limited variations (17). However, a subsequent study demonstrated genetic drift in the spike and nucleoprotein genes between 25 chronologically distinct HCoV-229E strains circulating in Australia between 1979 and 2004, which were clustered into four groups, group 1 to 4 (18).
In this study, we report a fatal case of coronavirus disease 2019 (COVID-19) coinfected with HCoV-229E. To investigate if the HCoV-229E strain may represent a newly emerged genotype, we also retrieved archived respiratory samples that tested positive for HCoV-229E in Hong Kong and performed complete RdRp, S, and N gene sequencing. The clinical characteristics of patients were analyzed in relation to molecular epidemiology data. Molecular dating was performed to understand the evolution of HCoV-229E.

RESULTS
A fatal case of COVID-19 coinfected with HCoV-229E. A 39-year-old Chinese man with a history of diabetes mellitus was admitted to hospital in late January 2020 with progressive shortness of breath 1 week after returning from Wuhan. On admission, his temperature was 38.2°C with 100% oxygen saturation at room air. Complete blood count showed normal leukocyte count but lymphopenia (0.9 Â 10 9 /liter). High-resolution computed tomography of thorax showed bilateral multifocal ground-glass opacities, compatible with COVID-19. His nasopharyngeal aspirate (NPA) for SARS-CoV-2 quantitative reverse transcription PCR (qRT-PCR) and COVID-19 loop-mediated isothermal amplification (LAMP) (19) were positive. HCoV-229E was also detected by multiplex PCR for respiratory pathogens (BioFire FilmArray). He was initially treated with amoxicillin-clavulanate and doxycycline and later lopinavir/ritonavir and interferon beta-1b. Despite treatment, he developed hypothermia and cardiac arrest 4 days after admission and failed resuscitation.
Clinical characteristics of patients with HCoV-229E infections in Hong Kong. To investigate the genetic relatedness of HCoV-229E strain HK20-42 with circulating HCoV-229E strains in Hong Kong, we retrieved archived nasopharyngeal samples from two regional hospitals that tested positive for HCoV-229E by RT-PCR from 2004 to 2019. A total of 41 HCoV-229E-positive samples were identified, and the clinical characteristics of the 41 patients are summarized in Table 1. Nineteen were males, and 22 were females. Twenty-six were adults, and 15 were children (median age 43 years, range from 1 month to 96 years). Notably, most patients were at the extremes of age (14 patients #8 years old and 15 patients $70 years old, among which four patients were $90 years old).
Upper respiratory tract infection (URTI) and pneumonia were the most common diagnoses. Exacerbations of asthma (patients 15, 25, and 39) or chronic obstructive pulmonary disease (COPD; patients 8, 13, 18, and 41) were also common. Sixteen patients (38%) presented with symptoms of URTI, with two patients complicated by asthmatic exacerbation, one by febrile convulsion, and one by tonsillitis. Of the other 16 patients with pneumonia, 10 were elderly, $70 years old. Yet, the 39-year-old male patient (patient 26) did not have underlying disease and presented with fever and cough after recent travel to mainland China. His NPA was also positive for influenza A virus. The 57year-old male patient (patient 11) with pneumonia had also just returned from a trip to mainland China. The pneumonia was complicated by exacerbation of asthma (patient 39) or COPD (patient 18) in two patients.
Four other patients also had recent travel history before symptom onset, with three (patients 3, 19, and 39) having returned from mainland China and one (patient 20) from Indonesia. The father and grandmother of the 1-month-old neonate (patient 19) were noted to have recent respiratory illnesses. Four other patients (patients 2, 7, 32, and 34) also had a history of recent contact with household members or friends with respiratory illnesses. The 14-year-old boy (patient 2) reported members of his dragon boat racing team having similar respiratory symptoms.
Besides patient 26 with influenza A virus coinfection, three other patients (patients 13, 21, and 31) had coinfection by Haemophilus influenzae, Streptococcus pneumoniae, and Pseudomonas aeruginosa, respectively. Except for the 90-year-old man with pneumonia who died of secondary bacterial pneumonia caused by Klebsiella pneumoniae 1 month later, all 40 other patients survived.
Phylogenetic analysis of complete RdRp, S, and N genes of HCoV-229E strains in Hong Kong. To study the evolutionary relationship of the HCoV-229E strain HK20-42 with other local strains, the complete RdRp, S, and N genes of HCoV-229E from the 41 NPAs were amplified and sequenced. Strain HK20-42 and the other 41 HCoV-229E strains possessed 98.71 to 98.92%, 96.05 to 96.62%, and 97.52 to 98.21% nucleotide identities to the RdRp, S, and N sequence of HCoV-229E Inf-1 reference sequence, respectively. Phylogenetic analysis of the RdRp sequences showed that older HK strains from 2004 to 2006 (n = 29) and five HK strains from 2009 to 2011 were closely related, while eight HK strains from 2011 to 2020, including HK20-42, formed a distinct cluster with a high bootstrap value of 82% ( Fig. 2A).
Phylogenetic analysis of S sequences demonstrated six distinct genogroups with different circulating periods. In line with a previous study (18) (Fig. 2B). The human aminopeptidase N (hAPN) is known to be the receptor for S protein binding in HCoV-229E, with the receptor-binding domain (RBD) located at amino acid positions 293 to 435 of the S1 region as determined by crystal structure analysis (21). The predicted RBD region of the 42 HK strains shared 89.04% to 90.44% nucleotide identity (78.32% to 79.72% amino acid identity) to the HCoV-229E Inf-1 reference sequence.
Phylogenetic analysis of N sequences showed tree topology similar to that of S genes, with the older Australian strains forming distinct clusters, though with some strains exhibiting slightly different phylogenetic positions. Five HK strains from year 2004 to 2005 were closely related to Australian genogroup 4 strains from 2001 to 2003, with nucleotide identities of 99.32% to 100%. Twenty-nine HK strains were distantly related to strains from genogroups 1 to 4 and formed a distinct cluster with a bootstrap value of 92%. Eight other HK strains from 2011 to 2020, including HK20-42, were closely related to each other to form another new cluster (genogroup 6) (Fig. 2C).
Selective pressure analysis. Using the 42 RdRp, S, and N gene sequences, the ratios of nonsynonymous substitutions per nonsynonymous site to synonymous substitutions per synonymous site (K a /K s ) were calculated ( Table 2). The highest K a /K s ratios were observed in the S gene (0.412), while RdRp (0.111) and N (0.143) genes showed lower K a /K s ratios. When the S gene of different genogroups was analyzed, the highest K a /K s ratio was observed in genogroup 6 (0.933), while the other genogroups showed much lower K a /K s ratios. K a /K s ratios (v ) in the S gene were calculated on a codon-by-  (Table 2). Nevertheless, three codons were predicted to have a v of .1 by at least two different methods with statistical significance, including codons 26 and 288 by fixedeffects likelihood (FEL) and random-effects likelihood (REL) methods, and codon 314 by REL and mixed-effects models of evolution (MEME) methods, indicating possible functional constraints at these positions. Codon 288 was predicted as a positively selected site among genogroup 6 by at least two methods, which accounts for the relatively high K a /K s ratio in genogroup 6. While residue 288 (Val) is conserved among the prototype and genogroups 1 to 5, V288A, V288M, and V288E were observed in genogroup 6. All three codons were distributed within the S1 domain, and codon 314 was situated within the RBD, indicating strong selective pressure at the receptor-binding interphase during viral evolution. Estimation of divergence time. To understand the evolution of HCoV-229E and its divergence time from 229E-related CoVs in animals, their complete ORF1ab sequences were subjected to molecular clock analysis using the Bayesian uncorrelated exponential relaxed molecular clock and exponential growth coalescent model (Fig. 3). The estimated mean substitution rate was 3.03 Â 10 24 (1.9 Â 10 24 to 4.3 Â 10 24 ) substitutions per site per year. The time of the most recent common ancestor (tMRCA) of all 229Erelated CoVs was dated back to 1765 (95% highest-posterior-density region [HPD], 1722 to 1962), while that of bat 229E-related CoV AT1A-F1, camel/alpaca 229E-related CoVs, and HCoV-229E was estimated at 1884 (95% HPD, 1884 to 1885), suggesting that 229E-related CoVs may have jumped from bats to humans and other animals around 130 years ago. Interestingly, the tMRCA of HCoV-229E and camel/alpaca 229E-related CoVs was estimated at 1920 (95% HPD, 1915 to 1926), while that of HCoV-229E was estimated at 1953 (95% HPD, 1949 to 1958) and that of camel/alpaca 229E-related CoVs at 1999 (95% HPD, 2000 to 2000). This suggests that existing camel/alpaca 229Erelated CoVs may have evolved from a relatively recent common ancestor, while HCoV-229E may have emerged in humans from other camels/alpacas or yet-unidenti-

DISCUSSION
We report a fatal case of COVID-19 coinfected with HCoV-229E, the latter belonging to a new genogroup 6 arisen as early as 2011. The genome of the SARS-CoV-2 strain  (22), suggesting that routine testing for other respiratory pathogens during the COVID-19 pandemic is unreliable to rule out SARS-CoV-2 infection. The present study described the molecular diversity and evolutionary dynamics of HCoV-229E in Hong Kong. Although HCoV-229E has been known for more than half a century (12), evolutionary studies of HCoV-229E have been scarce. In this study, we showed that the contemporary HCoV-229E strains from Hong Kong have undergone significant genetic drift. Phylogenetic analysis of the S gene showed that HCoV-229E has continued to evolve with time to generate new genogroups, with clustering of strains in chronological order. The earliest group, genogroup 1, comprises strains  (29), and SARS-CoV-2 (9.9 Â 10 24 ) (30). The mutation rates of HCoVs are lower than some other RNA viruses, which may be explained by the possession of nonstructural protein 14 (nsp14) which encodes a proofreading RNase called ExoN and is crucial for CoV RNA synthesis and maintaining CoV replication fidelity (31,32). Positive selection, especially in the spike protein and RBD, may have been important during HCoV-229E evolution at the receptor-binding interphase, probably to adapt to new environmental changes or evade the immune system. This is in contrast to the belief that human CoV is relatively stable as it has been circulating in humans for decades. Previous HCoV-229E spike-receptor structural analysis focused only on the interaction of the RBD in complex with hAPN (21,33), but the role of positively selected residues outside the RBD remains unclear. Further investigations are warranted to understand the possible effects of mutations of these two codons outside the RBD in spike-receptor binding and HCoV-229E evolution.
The present study revealed the estimation of divergence times for human-and animal-associated 229E viruses. Similarly to SARS-CoV, HCoV-229E may have also originated from bats. A previous study has reported CoVs closely related to HCoV-229E in bats of the genus Hipposideros in Africa (34,35). These bat viruses shared .90% amino acid sequence identities in the seven conserved replicase domains for CoV species demarcation, suggesting that they belong to the same CoV species as HCoV-229E (35). Moreover, CoVs even closer to HCoV-229E have been identified in alpacas in the United States and dromedary camels in Africa and the Arabian Peninsula (15,36). However, it remains to be determined if dromedaries or related animals have served as an intermediate host for bat-to-human transmission, or if both humans and dromedaries have acquired 229E-related CoVs independently from a common ancestor. Alpaca and dromedary camel viruses showed a similar deletion in the S1 domain of spike protein as HCoV-229E, which was not found in bat viruses. An additional putative ORF8 downstream of the nucleocapsid gene was present in bat and dromedary viruses but not in HCoV-229E. The only exception was a small in-frame deletion found in a dromedary strain, KCSP1, and a partial deletion in alpaca virus (15,35). Although these features may suggest bat-to-camelid-to-human transmission, our molecular dating analysis showed a relatively recent time of divergence (tMRCA 1999) among dromedary/alpaca viruses. Therefore, it is possible that yet-unidentified animals may have served as the intermediate hosts for bat-to-human and bat-to-camelid transmission. This is also supported by the monophyletic clades formed by human and camelid viruses, respectively, in previous phylogenetic studies (15). Yet, one limitation of the molecular clock analysis is that dromedary/alpaca viruses available for analysis were detected only after 2008, which may affect the accuracy of tMRCA estimation. Moreover, frequent recombination among coronaviruses may pose problem for divergence time estimation analysis. Multiple recombination events occurred involving HCoV-229E, bat 229E-related CoVs, and alpaca 229E-related CoVs, with major recombination breakpoints occurring within ORF1ab and the beginning of the S gene (35). Phylogenetic analysis using specific nsp regions such as RdRp, 3CLpro, and helicase demonstrated that the topologies of nsp regions were incongruent because of multiple recombination events as opposed to that of ORF1ab (data not shown). This scenario was also observed in the work of Corman et al. (35), where a monophyletic clade was formed by human and alpaca 229E-related CoV in ORF1ab and S genes but alpaca 229E-related CoV was clustered with bat 229E-related CoVs in M, E, N, and ORF4 genes. Therefore, the diverse topologies and frequent recombination events among coronaviruses may affect the accuracy of tMRCA estimation.
Since human CoVs are seldom included in routine respiratory virus detection panels in clinical laboratories, the prevalence and importance of HCoV-229E may have been underestimated. The causative role of HCoV-229E in respiratory tract infections was first confirmed in the 1960s when healthy volunteers who received inoculations of cultured HCoV-229E developed symptoms of the common cold (37). While immunocompetent adults can be infected with HCoV-229E, immunocompromised patients as well as infants and the elderly are particularly susceptible, with higher chance of severe disease (38,39). The prevalence of HCoV-229E varies widely among different countries and study periods, ranging from 4.6 to 17.0% among all coronaviruses detected in respiratory specimens (40)(41)(42)(43)(44). While other human coronaviruses are known to exhibit seasonal patterns, the seasonality of HCoV-229E has been less clearly understood, partly due to the low detection rate and limited epidemiological data. While a winter and early spring seasonality has been suggested, detection of HCoV-229E has been found to disperse in all four seasons in other studies (40,42,43). In the present study, the clinical spectrum of diseases of HCoV-229E infection is similar to that of HCoV-HKU1 and HCoV-OC43 (10,25,40,41), with most patients in the extremes of age or with underlying diseases, although healthy, young patients were occasionally affected. In contrast to the traditional belief that human coronaviruses are mostly associated with mild URTI or common cold, a significant proportion of patients in the present study had severe infections such as pneumonia or complications such as exacerbation of COPD. While no particular genogroup of strains was associated with more severe infections, further studies with inclusion of more cases may allow better understanding of the pathogenicity of the different genogroups. RNA extraction. Viral RNA was extracted from the NPAs of the corresponding patients using the QIAamp viral RNA minikit (Qiagen, Hilden, Germany). The RNA pellet was eluted in 60 ml of DNase-free, RNase-free double-distilled water and was used as the template for RT-PCR.

MATERIALS AND METHODS
Genome sequencing of SARS-CoV-2 and HCoV-229E detected from the fatal case and phylogenetic analysis. RNA was converted to cDNA by a combined random-priming and oligo(dT) priming strategy and specific primers for SARS-CoV-2 and HCoV-229E, respectively. The complete genomes of SARS-CoV-2 and HCoV-229E from the fatal COVID-19 case were amplified and sequenced using primers shown in Tables S1 and S2 in the supplemental material. Both forward and reverse strands of the PCR products were sequenced twice with an ABI Prism 3700 DNA analyzer (Applied Biosystems) using the respective PCR primers. Sequences were assembled and manually edited to produce the final sequences of the viral genomes using Geneious R11 (Biomatters, Auckland, New Zealand). A phylogenetic tree using nucleotide sequences was constructed using the maximum likelihood method in MEGA 7.0 with the GTR 1 G 1 I substitution model. Bootstrap replicates of 1,000 were selected for assessment of the robustness of branches.
RT-PCR and sequencing of complete RdRp, S, and N genes of HCoV-229E and phylogenetic analysis. The RNA was converted to cDNA by a combined random-priming and oligo(dT) priming strategy. The complete RdRp, S, and N genes of HCoV-229E from 41 NPAs were amplified and sequenced using the primers shown in Table S3 and strategies described previously (10,40,41). The nucleotide and deduced amino acid sequences of the RdRp, S, and N genes were compared to those of HCoV-229E sequences available in GenBank. Phylogenetic trees using nucleotide sequences were constructed using the maximum likelihood method in MEGA 7.0 with TN93 1 G model for RdRp, TN93 1 G model for S, and T92 1 G model for N genes. The substitution models were selected according to the Akaike information criterion (AIC) implemented in MEGA 7.0 (45). The robustness of branches was assessed by bootstrap analysis with 1,000 replicates.
Selective pressure analysis. The number of synonymous substitutions per synonymous site, K s , and the number of nonsynonymous substitutions per nonsynonymous site, K a , for RdRp, S, and N genes were calculated using the Nei and Gojobori substitution model with Jukes-Cantor correction in MEGA 7.0 as described previously (46). Sites under positive selection were inferred using single-likelihood ancestor counting (SLAC), fixed-effects likelihood (FEL), and random-effects likelihood (REL) methods as implemented in the DataMonkey server (available online at https://www.datamonkey.org/). Positive selection for a site was considered to be statistically significant if the P value was ,0.05 for SLAC and FEL methods and the Bayes factor was .50 for the REL method. The mixed-effects model of evolution (MEME) was used to detect positively selected sites under episodic diversifying selection in particular positions among different clades within a phylogenetic tree, even when positive selection was not evident across the entire tree. Positively selected sites with a P value of ,0.05 were reported.
Estimation of divergence time. Divergence time was calculated using complete ORF1ab sequence data of bat 229E-related CoVs, camel 229E-related CoVs, alpaca-related CoV, and HCoV-229E, with the Bayesian Markov chain Monte Carlo (MCMC) approach as implemented in BEAST (version 1.8.2) as described previously (25,47). Analyses were performed using the GTR 1 G model with coding sequences partitioned into the first plus second positions versus the third position, and rate variations between sites were described by a four-category discrete gamma distribution using the uncorrelated exponential relaxed molecular clock and exponential growth coalescent model. The MCMC run was 2 Â 10 8 steps in length, with sampling every 1,000 steps. Convergence was assessed on the basis of the effective sampling size after a 10% burn-in with Tracer software version 1.7.1 (available online at http://tree.bio.ed.ac .uk/software/tracer/). The mean time of the most recent common ancestor (tMRCA) and the highest posterior density regions at 95% (HPD) were calculated. The tree was summarized in a target tree by using the Tree Annotator program (version 1.8.2) included in the BEAST package by choosing the tree with the maximum sum of posterior probabilities (maximum clade credibility) after a 10% burn-in.
Accession number(s). The complete genome sequences of SARS-CoV-2 and HCoV-229E and the complete RdRp, S, and N nucleotide sequences of the 41 HCoV-229E strains were deposited in GenBank under accession no. MT786446 and MT797634 to MT797757.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.