Quantifying the effect of government interventions and virus mutations on transmission advantage during COVID-19 pandemic

Background: The coronavirus disease 2019 (COVID-19) pandemic has become a major public health threat. This study aims to evaluate the effect of virus mutation activities and policy interventions on COVID-19 transmissibility in Hong Kong. Methods: We integrated the genetic activities of multiple proteins and quantified the effect of government interventions and mutation activities on the time-varying effective reproduction number Rt. Findings: We found a significantly positive relationship between Rt and mutation activities and a significantly negative relationship between Rt and government interventions. The mutations that contributed most to the increase of Rt were from the spike, nucleocapsid and ORF1b genes. The prohibition on group gatherings was estimated to have the largest impact on mitigating virus transmissibility. The model explained 63.2% of the variability in Rt, as measured by R2. Conclusion: Our study provides a convenient framework to estimate the effect of genetic contribution and government interventions on pathogen transmissibility. We showed that mutations in the S, N and ORF1b proteins contributed significantly to the increase in transmissibility of SARS-CoV-2 in Hong Kong, while restrictions on public gatherings and suspension of face-to-face classes were the most effective government intervention strategies.


Introduction
Coronavirus disease 2019 (COVID-19), a respiratory infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was declared a public health emergency of international concern by the World Health Organization (WHO) in January 2020 [1]. As of November 23, 2021, there were over 256 million cumulative confirmed cases of COVID-19, with more than 5.1 million deaths globally [2].
Previous studies identified a clear relationship between molecular-level mutation activity in the Spike (S) protein and the transmission advantage of SARS-CoV-2 [3][4][5][6][7], and the evolution of SARS-CoV-2 poses huge challenges to the continued control of the outbreak [6,8]. However, most evaluations of the epidemiological impact of mutations were based on modelling substitutions in the S protein. Multiple mutations, or mutations in other proteins, have not been systematically considered.
On the other hand, public health intervention policy could also modify infectious disease transmissibility. Studies in many countries and regions indicated that a series of control measures, such as cordon sanitaire and social distancing, could effectively mitigate the spread of COVID-19 [7,9-14]. In response to widespread community transmission of COVID-19, the Hong Kong government has implemented various public health interventions. Though both virus evolution and government interventions critically determine COVID-19 transmissibility, there have been few statistical methods to jointly analyze multiple mutations and policy measures [15]. In this study, we proposed a statistical framework to assess the genetic activities of multiple proteins in SARS-CoV-2, and quantified the effect of public health interventions and integrated mutation activities on COVID-19 transmissibility. We used the publicly available COVID-19 surveillance data and the human SARS-CoV-2 strains in Hong Kong as a demonstrating example.

Materials and methods
The human SARS-CoV-2 strains were retrieved from the Global Initiative on Sharing All Influenza Data (GISAID) [16]. All available strains in Hong Kong SAR with collection dates ranging from January 21 to December 16, 2020 amounted to a total of 412 full-length sequences and 298 Spike protein sequences. Three full-length sequences without a clear collection date were then excluded. Multiple sequence alignment was performed using MAFFT (version 7) [17], and the Wuhan-Hu-1 genome (GISAID: EPI_ISL_402125) was used as the reference strain.
The time-varying effective reproduction number Rt was employed to measure the instantaneous transmissibility of the infectious disease [5,18-20]; it is defined as the expected number of secondary cases arising from a primary case infected at and before time t. The data on Rt and the number of confirmed cases were collected from the website of the Centre for Health Protection (CHP) in Hong Kong [21]. The information on government interventions was collected from the Hong Kong Government news page [22]. The interventions included closures of schools, orders for government employees to work from home, restrictions on restaurant dining, restrictions on group gatherings, closures of entertainment places and suspensions of non-essential public services. All the interventions were coded as binary variables; see Supplementary Material S1 for details.
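The binary coding of interventions can be illustrated with a minimal Python sketch. The intervention names and date ranges below are hypothetical placeholders chosen for illustration; the actual schedule is the one given in Supplementary Material S1, and the function name `code_interventions` is likewise an assumption.

```python
from datetime import date, timedelta

# Hypothetical intervention periods (start, end inclusive). These are NOT
# the real Hong Kong dates; see Supplementary Material S1 for those.
PERIODS = {
    "school_closure": [(date(2020, 2, 1), date(2020, 5, 26))],
    "gathering_ban": [
        (date(2020, 3, 29), date(2020, 6, 18)),
        (date(2020, 7, 15), date(2020, 9, 10)),
    ],
}


def code_interventions(days, periods):
    """Return {intervention: [0/1 per day]}, 1 when the measure is in force."""
    coded = {}
    for name, spans in periods.items():
        coded[name] = [
            int(any(start <= d <= end for start, end in spans)) for d in days
        ]
    return coded


# Five consecutive days straddling the (hypothetical) start of the gathering ban.
days = [date(2020, 3, 27) + timedelta(days=i) for i in range(5)]
coded = code_interventions(days, PERIODS)
```

Each intervention becomes one 0/1 column aligned with the daily Rt series, so it can enter a regression directly as an indicator variable.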

Quantifying the genetic activity by g-measure
Most mutations in the SARS-CoV-2 genome were expected to be deleterious and purged swiftly [23]. In previous studies [7,24,25], we proposed a computational framework to detect key mutations whose prevalence reached dominance and was maintained for a period of time. The key mutations were expected to be associated with epidemiological intensity and mutation advantage at population scale. The time-varying activities of key mutations can then be measured by the summation of their prevalence in a certain period of time, namely the g-measure. Thus, the g-measure reflects the overall level of genetic activities of key mutations in the sample sequences, and is denoted as $g = [g_t]$ for time interval t. In this study, we employed the g-measure to quantify all key mutations in the genome of SARS-CoV-2.
A sliding window was applied to the investigated period. Let w denote the window size, a constant period length, and s denote the step length between two consecutive windows. Hence, for $g = [g_t]$ at time t, the g-measure was computed based on sample sequences collected from t − w/2 to t + w/2. We set w to 15 days and s to 3 days.
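The sliding-window computation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the published implementation: the list of key mutations is taken as given (its detection follows the earlier framework [7,24,25]), the 15-day window is taken to cover the 7 days on each side of t, and the function name, input layout and the `N:toy` mutation label are hypothetical.

```python
from datetime import date, timedelta


def g_measure(samples, key_mutations, start, end, w=15, s=3):
    """Sliding-window g-measure.

    samples: list of (collection_date, set_of_mutations), one per sequence.
    Returns a list of (window_centre, g); g is None when the window is empty.
    """
    half = timedelta(days=w // 2)  # assumption: w = 15 -> 7 days either side
    out = []
    t = start
    while t <= end:
        in_window = [muts for d, muts in samples if t - half <= d <= t + half]
        if in_window:
            n = len(in_window)
            # Sum, over key mutations, of the fraction of sequences carrying each.
            g = sum(
                sum(m in muts for muts in in_window) / n for m in key_mutations
            )
        else:
            g = None
        out.append((t, g))
        t += timedelta(days=s)  # advance the window centre by the step length
    return out


# Toy data: four sequences and two key mutations (labels are illustrative).
samples = [
    (date(2020, 7, 1), {"S:D614G"}),
    (date(2020, 7, 3), {"S:D614G", "N:toy"}),
    (date(2020, 7, 20), set()),
    (date(2020, 7, 22), {"S:D614G"}),
]
series = g_measure(samples, ["S:D614G", "N:toy"], date(2020, 7, 1), date(2020, 7, 16))
```

With these toy inputs the window centres fall every 3 days from July 1 to July 16, and each g value is the sum of the two mutation prevalences among the sequences collected inside that window.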

Integrating analysis of multiple proteins
Besides mutations in the S protein [5,15], substitutions in other proteins may also be involved in intra-viral protein interactions and further affect viral fitness [26]. We therefore integrated the genetic risk variants under the framework of the Polygenic Risk Score (PRS) [27][28][29]. Let $\beta_p$ be the effect size of the g-measure calculated for protein p. A linear model is applied to fit the relationship,

$$R_t = \beta_0 + \beta_p \, g_t(p) + \varepsilon_t, \qquad (1)$$

where g(p) is the g-measure of protein p during the investigated period. Denoting $\hat{\beta}_p$ as the sample estimate, the integrative g-measure of multiple proteins is

$$g_t = \sum_p \hat{\beta}_p \, g_t(p). \qquad (2)$$

The advantage of this combination is that it avoids the collinearity between genetic activities on multiple proteins that arises when coefficients are estimated simultaneously [27]. Following the same framework, we also integrated the effect of government interventions $I_t$, weighting each intervention by its effect size on Rt. Generalized linear regression was applied to estimate the effect of government interventions and g-measures on the transmissibility Rt. Alternative fitting functions can be chosen flexibly according to sample size. All statistical analysis was conducted in R (version 3.6.3), and a two-sided p-value < 0.01 was considered statistically significant.
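The PRS-style integration described above can be sketched as a two-step procedure: estimate one effect size per protein, then sum the per-protein g-measures weighted by those estimates. The Python sketch below assumes each effect size comes from a simple univariate least-squares fit against Rt; the paper's analysis was run in R and may have included further covariates, so the function names and toy numbers here are purely illustrative.

```python
def ols_slope(x, y):
    """Slope b of the simple least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def integrate(series_by_name, rt):
    """Weight each series by its own univariate slope on Rt and sum them."""
    betas = {name: ols_slope(x, rt) for name, x in series_by_name.items()}
    combined = [
        sum(betas[name] * series_by_name[name][i] for name in betas)
        for i in range(len(rt))
    ]
    return betas, combined


# Toy per-protein g-measures and Rt values (not data from the study).
g_by_protein = {"S": [0.0, 1.0, 2.0, 3.0], "N": [1.0, 0.0, 1.0, 0.0]}
rt = [1.0, 1.5, 2.0, 2.5]
betas, g_integrated = integrate(g_by_protein, rt)
```

The same `integrate` call can be applied to the binary intervention indicators to form an integrated intervention variable, after which Rt is regressed on the two summary measures. Because each weight is estimated separately, correlated predictors do not destabilise one another's coefficients at this stage.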

Results
First, we summarized the mutation activity of each protein of the SARS-CoV-2 virus by the g-measure. Of the 10 proteins in the genome, five contained dominant substitutions for constructing the g-measure: the spike (S), nucleocapsid (N), open reading frame 1 (ORF1), ORF3a and ORF8. Next, we tested the association between the g-measure of these proteins and the transmissibility variable Rt. In univariate analysis, four of the proteins were significantly associated with Rt (Table S2.1). After controlling for government interventions, the g-measures of three proteins continued to exhibit a positive and significant association with virus transmissibility: the S protein (coefficient = 0.34, p-value < 0.001), ORF1b (coefficient = 0.24, p-value < 0.01) and the N protein (coefficient = 0.22, p-value < 0.01) (Table S3.1). The S protein exhibited the strongest impact on increasing virus transmissibility. Our estimates showed that Rt would increase by 0.34 for a one-unit increase in the g-measure of the S protein. Direct comparison of the effect sizes of different genes showed that the influence of mutations in ORF1b is 70.6% of that of the S protein, and the effect size of the N protein is 64.7% of that of the S protein.
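The relative effect sizes quoted above follow from the ratios of the reported coefficients; the $\hat{\beta}$ notation is introduced here only for this check:

```latex
\frac{\hat{\beta}_{\mathrm{ORF1b}}}{\hat{\beta}_{S}} = \frac{0.24}{0.34} \approx 0.706,
\qquad
\frac{\hat{\beta}_{N}}{\hat{\beta}_{S}} = \frac{0.22}{0.34} \approx 0.647 .
```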
The effect of government interventions was first examined by univariate analysis. Three out of six types of government interventions were significantly related to the reduction of Rt (Table S2.2): the suspension of face-to-face classes, the ban on gatherings in public places and the provision of limited public services. After controlling for the mutation summary statistics, only the first two interventions remained significant, of which the ban on gatherings had the largest effect on reducing Rt (coefficient = −2.21, p-value < 0.001).
Next, we examined how well the genetic measure and government interventions could explain the trend of Rt. Following Eq. 2, an integrated g-measure was constructed to summarize the overall mutation activities from S, N and ORF1b, and an integrated government intervention variable was formed likewise to account for the suspension of face-to-face classes and the ban on gatherings. A generalized linear model was applied to evaluate the genetic and policy intervention contributions to Rt (Fig. 1a). These two summary measures explained 63.2% (R-squared) of the variability of virus transmissibility. The intercept of 2.40 in the model indicates the reproduction number when no genetic mutation or intervention occurred, which was very close to the value estimated in January 2020, when the virus had just begun to spread in China (R0 = 2.68, 95% CI 2.47-2.86) [30].
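For readers who want to reproduce the goodness-of-fit summary, the R-squared of a fitted Rt series can be computed as sketched below. This assumes the usual coefficient of determination for a Gaussian model with identity link; the source does not state which R-squared variant was used, and the numbers are toy values rather than study data.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy observed and fitted series (not data from the study).
observed_rt = [1.0, 2.0, 3.0, 4.0]
fitted_rt = [1.5, 1.5, 3.5, 3.5]
r2 = r_squared(observed_rt, fitted_rt)
```

A value of 0.632 from such a calculation would correspond to the 63.2% of Rt variability reported for the two summary measures.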

Discussion and conclusion
In this study, we quantitatively assessed the effect of genetic activities in multiple proteins and public health interventions on the transmissibility of SARS-CoV-2 in Hong Kong SAR.
The S, ORF1b and N protein mutations were shown to be positively and significantly associated with virus transmissibility after controlling for government interventions. This finding is consistent with the biological evidence. The S protein is responsible for attachment of the virus to the host cell-surface receptor [31] and is the principal target of neutralizing antibodies [32], and mutations in the S protein affect its functional properties, thereby increasing virus infectivity and escape from monoclonal antibodies [23]. For instance, experimental studies have shown that variants containing 614G on the S protein were significantly more infectious than the variant carrying 614D. Genetic variants carrying A475V, L452R, V483A and F490L, which are located in the receptor-binding domain (RBD), became resistant to neutralizing antibodies [33,34]. In addition, the N protein is also a major target of the antibody response and contains T cell epitopes [35], and the RNA-dependent RNA polymerase (RdRp) in ORF1b plays a central role in the replication and transcription cycle of SARS-CoV-2 [36]. These proteins are critical in determining the course of transmission, infection and reproduction of the virus.
Moreover, previous studies found strong evidence of an association between non-pharmaceutical interventions and the reduction of Rt in multiple regions [10,11,14]. In this study, the genetic aspect of the virus was further included in the model to control for the effect of mutations. Public gathering restrictions and suspension of face-to-face teaching were estimated to have a significant effect on mitigating Rt. In addition, our fitted model accurately captured the three waves of the COVID-19 outbreak in Hong Kong (Fig. 1, red curve), which occurred in March, June-July and November 2020. During these periods, the estimated Rt rose above 1.0 for over 10 successive days (Fig. 1a, blue curve).
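The wave criterion mentioned above, Rt above 1.0 for over 10 successive days, can be checked on a daily series with a simple run-length scan. The sketch below is an illustration under the assumption of one Rt value per day; the function name and the toy series are hypothetical.

```python
def runs_above(rt, threshold=1.0, min_days=10):
    """Return (start, end) index pairs of runs where rt > threshold
    for more than min_days consecutive days."""
    runs, start = [], None
    for i, value in enumerate(rt):
        if value > threshold:
            if start is None:
                start = i  # a new run begins
        elif start is not None:
            if i - start > min_days:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the series.
    if start is not None and len(rt) - start > min_days:
        runs.append((start, len(rt) - 1))
    return runs


# Toy daily series: runs of 11, 10 and 12 days above 1.0.
toy_rt = [0.8] * 3 + [1.2] * 11 + [0.9] * 2 + [1.1] * 10 + [0.7] + [1.3] * 12
waves = runs_above(toy_rt)
```

In this toy series the 10-day run is excluded because the criterion requires more than 10 successive days, so only the 11-day and 12-day runs are returned.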
Several limitations of this study should also be noted. First, the COVID-19 cases and SARS-CoV-2 strains were mapped to the timeline according to their reporting time and sequence collection date, whereas a temporal lag might exist in reality [37]. Second, the genetic activities of imported cases may not contribute to Rt, since imported cases did not seed transmission owing to quarantine measures in Hong Kong. In this study, the g-measure only accounted for the activities of mutations whose prevalence reached dominance and was maintained for a period of time. Occasional increases in mutation prevalence were not included in the g-measure, which minimized the bias caused by sequences from imported cases. Moreover, during the first wave in March 2020, when most of the cases in Hong Kong were imported [38], the scale of the g-measure was small, and the second and third waves were mainly driven by local transmission (Fig. 1b). Therefore, the limitation due to importation is minor. Third, the statistical association between the g-measure and Rt, and any interpretation of it as a causative relationship, should be considered together with biological mechanisms and evidence.

Fig. 1. Panel a shows the observed Rt (red curve) and fitted Rt (blue curve) in Hong Kong; the grey dashed line marks Rt = 1.0. Panel b shows the number of confirmed cases, classified as imported cases (light red bars), cases epidemiologically linked with imported cases (dark red bars), local cases (light blue bars) and cases epidemiologically linked with local cases (dark blue bars).
To conclude, this study provided a convenient statistical framework to evaluate the effect of genetic contribution and non-pharmaceutical interventions on modifying pathogen transmissibility. We showed that mutations in the S, ORF1b and N proteins contributed significantly to the increase of Rt during the first three waves of COVID-19 in Hong Kong, while restrictions on public gatherings and suspension of face-to-face classes were the most effective government intervention strategies.

Ethics approval and consent to participate
Ethical approval and individual consent were not applicable.