Forecast of the COVID-19 trend in India: a simple modelling approach

By February 2021, the overall impact of the COVID-19 pandemic in India had been relatively mild in terms of total reported cases and deaths. Surprisingly, the second wave in early April becomes devastating and attracts worldwide attention. On April 30, 2021, India became the first country reporting over 400,000 daily new cases. Multiple factors drove the rapid growth of the epidemic in India and caused a large number of deaths within a very short period. These factors include a new variant with increased transmissibility, a lack of preparations exists national wide, and health and safety precautions poorly implemented or enforced during festivals, sporting events, and state/local elections. Moreover, India's cases and deaths are vastly underreported due to poor infrastructure, and low testing rates. In this paper, we use the COVID-19 mortality data in India and a mathematical model to calculate the effective reproduction number and to model the wave pattern in India. We propose a new approach to forecast the epidemic size and peak timing in India with the aim to inform mitigation in India. Our model simulation matched the reported deaths accurately and is reasonably close to results of serological study. We forecast that the IAR could reach 43% by June 13, 2021 under the current trend, which means 532,629 reported deaths with a 95% CI (552,445, 513,194) ie., double the current total deaths. Our approach is readily applicable in other countries and with other type of data (e.g. excess


Introduction
The COVID-19 pandemic has lasted for more than a year now. The severity of the impact of COVID-19 varied wildly across nations. While the pandemic started to slow down in most developed countries due to the combined effects of high infection attack rate and high vaccination coverage, the impact in India had been mild in terms of the total number of deaths and cases, until April 2021. An on-going second wave of the COVID-19 devastates India 'unexpectedly' since early April 2021. The unfold of the COVID-19 in India is as follows. On January 30, 2020, the first case of COVID-19 was reported in Thrissur, India(1). On March 12, 2020, the first COVID-19 fatality in India was reported (2). On March 31, 2020, a Tablighi Jamaat religious congregation event in Delhi resulted in a quarantine of 22,000 people. In March 2020, the cluster-containment campaign was carried out. However, by March 15, 2020, only 10% of the testing capacity had been used per day in India. By early July 2020(3), the seroprevalence reached 57% or 16% among inhabitants in three slums or other areas of Mumbai, which is the commercial capital of India. These high infection attack rate (IAR, proportion of population got infected) caught wide media attention. A supermodel for COVID-19 progression developed by a government-appointed committee (4) estimated that the COVID-19 (first wave) peaked in October 2020 and it would be under control by February 2021. Unexpectedly, a new variant, named Lineage B.1.617, was detected in India in October 2020 (5,6) and is blamed as one of the factors for the second wave. In October 2020, B. The early stage of COVID-19 epidemic mitigation in India was promising. As reported in (9), it had the highest number of daily tests in the world by the third quarter of the year 2020. The vaccination program was also initiated on 16 January 2021. By February 2021, the confirmed cases of COVID-19 had dropped to 9,000 people per day. However, the pandemic changed dramatically thereafter. A major second wave of the COVID-19 emerged in India in early April 2021. By the end of April, an average of 300,000 daily new cases and 2000 daily deaths were reported. For instance, on April 30, a total of 400,000 daily new cases and 3500 daily deaths were reported. Multiple factors drove the rapid expansion of the epidemic in India and caused a large number of deaths, which include the appearance of a new SARS-CoV-2 variant (Lineage B.1.617), the lack of preparation, poorly-implemented or enforced health and safety precautions during festivals, sporting events, state/local elections. As of May 3, 2021, nearly 20 million confirmed cases and 218,959 deaths were reported in total. During the period of the second wave in India, test positivity has increased dramatically from 2% on March 1 to 22% on May 1.
Seroprevalence estimated that the prevalence of infection among people ten years or older was 6.6% and a cumulative 74.3 million people had be infected with COVID-19 by August 2020 (10). However, only 3,621,245 cases and 64,469 deaths were reported by August 2020 (9). A survey (7) conducted by the Indian Council of Medical Research (ICMR) found that 21.4% of 28,589 surveyed people above 18 years old had been infected by Feb 4, 2021.These serological studies found a much higher IAR than reported cases, namely a very low ascertainment rate or reporting rate. Therefore, one needs to take into account these serological studies into modelling to yield reasonable IAR estimation and useful forecast for pandemic mitigation.
Mathematical modelling can be successfully used to explain and predict the spread of infectious diseases (11,12). In this work, we adopt a simple compartmental epidemic model. The model assumes a time-dependent transmission rate due to the changes in human behaviour and the implementation of control measures. Thus models for forecast need to incorporate time-varying transmission rate. Other epidemic parameters, including the generation interval, the reporting rate, and the infection fatality rate may vary as well. But for the sake of simplicity of the model and identification issues (whether the change can be detected via model fitting), we focus on time-varying transmission rate and assume other parameters to be constant.

Methods
We download daily reported COVID-19 cases and deaths from (9). To have a better understanding of the historical seasonality of respiratory diseases, we also download weekly reported influenza cases from (13). We adopt a susceptible-exposed-infectious-hospitalized-death-recovered (SEIHDR) model with a time-varying transmission rate: Here, the compartments S, E, I, H, D, and R denote susceptible, exposed, infectious, hospitalized, total death, and recovered individuals, respectively. Parameter ( ) is the transmission rate,  is the infectiousness emergence rate (0.5 per day),  is the infectiousness disappearance rate (1/3 per day), is the removed rate (1/14 per day),  is the ratio of hospitalized cases out of all infected cases, and  is the proportion of deaths out of hospitalized cases. All parameters are constant except ( ) being time-varying. Since we did not fit model to the daily hospitalized cases, the exact definition of hospitalized class is not relevant. Namely, the H class can be viewed as severe cases or symptomatic cases. We consider that ≪ 1 and ≪ 1. The product equals the infection fatality rate. These two cannot be disentangled with purely death data and without additional reliable data (e.g. hospitalized data). Based on previous works (17)(18)(19), ( ) = exp(cubic spline with nodes) as an exponential cubic spline with nodes evenly distributed over the study period. Using standard model selection technique, we found =8 attains the smallest the second order Akaike Information Criterion (AICc). Thus =8 is used in this work. The time step size is one day and D t is the daily deaths. The reported deaths were defined as t C with ~(mean D , variance D (1 D )).
Here, the parameter denotes the overdispersion and accounts for the measurement noise due to surveillance and heterogeneity among individuals. The basic reproductive number 0 ( ) = ( )/ is a function of time. We fit the model to reported deaths with state-of-the-art iterated filtering (20,21), and assume that death report is relatively insensitive to testing policy, testing effort, and testing availability. Although an even better way would be to use the excess deaths (22), our modelling approach is readily applicable to excess deaths. Our study period (training part) is from March 12, 2020, to April 28, 2021. We append 45 days of N.A. value (not applicable) to the end of the training data. Then we fit our model to the whole time series. The last 45 days (from April 29 to June 13, forecast part) play no role in the fitting since the data for that period are N.A. Our cubic spline spans from the whole time series anyway (both training part and forecast part). Thus the transmission rate over the training part is estimated from the data, the transmission rate over the forecast part is a natural extension of that over the training part. Thus we can simulate our model and yield both the training part and the forecast, for we have the initial state (end state of the training part) and transmission rate (natural extension) for the forecast part.    The daily death per capita in India is still at a relatively low level compared to the other three countries. However, from the beginning of April 2021, the rapid elevation of the second wave of COVID-19 in India is devastating with 300,000 daily new cases for14 consecutive days and 3,000 daily deaths for 5 consecutive days. Such a dramatic increase in daily cases attracted worldwide attention and global concern. Figure 2 also shows that the vaccination coverage rate is 44.79% in the United States of America while 9.42% coverage in India. The vaccination program was launched on January 16, 2021, in India, and 4 million doses per day were administered by April 2021. However, the overall coverage of vaccines at the population level is still low. Some states in India were unable to begin vaccination due to the shortage of vaccination supplies. The shortage of vaccination supplies makes the epidemic of COVID-19 in India even worse.  Our estimated IAR is about 10% by Feb 2021, which is lower than the estimates of serological study of ICMR 21.4% (23). However, our estimated IAR is about 5.4% by August 2020, which is very close to the serological study of 6.6% (10).

Discussion and Conclusion
We proposed a simple model approach for modeling and forecast the COVID-19 epidemic in India. The method can be readily applied to other countries. Given the daily reported death data with a length of 13 months, we forecast the trend of COVID-19 in the next 45 days (April 29 to June 13, 2021), by fitting a model with a cubic spline type of transmission rate function to the whole study period including both the 13 months observed data and the 1.5 months N.A. deaths. In calculating the fitting performance, each N.A. will yield a likelihood of 1, thus zero contribution to the log-likelihood. In other words, if an N.A. data point appears in the middle of the training data part, it will yield a conditional likelihood 1, thus zero contribution to log-likelihood, whereas the transmission rate is not only continuous at that data point but also continuous on the second order derivative. Here we add N.A. data at the end of the observed data, but the logic is similar. When the forecast part is relatively short, this method should work well. Hence, the reconstructed transmission rate naturally contains a part over the forecast period, which shows a decreasing trend. This decreasing trend mimics the decreasing trend in last March-April. This decreasing trend means that the current spreading of COVID-19 in India has already started to slow down (e.g., on the second order derivative). The objective of this study is to model and forecast the transmission of COVID-19 in India. The estimated infection attack rate (IAR) could reach 43% by June 13, 2021. Huang et al. (24) predicted that 33,470,999 (upper bound) confirmed cases would be reported by May 31, 2021, which means a 2.4% infection attack rate (case/population). This is obviously low. If a 1/10 (or 1/20) reporting rate is considered, then the IAR becomes 24% (or 48%). This is largely reasonable. Namely, a reporting rate needs to be assumed. In our case, we explicitly modeled infection fatality rate (IFR) and assumed that the death data is relatively reliable. The estimated infection fatality rate is 0.1%, which is significantly lower than a biologically reasonable level 0.6%. For instance, Russell et al. (17) estimated an IFR of 0.6% in China. This implies that the COVID-19 death is under-reported by a factor of 1/5 or 1/6 (25). Thus, these estimates of IFR relied on the confidence of the COVID-19 death data. If death data are under reported by a factor, then the IFR could be under estimated by the same factor.
The long-term prediction is very difficult, due to the ever-changing reality like human behavioral change and governmental action. Here we proposed a simple modeling approach. We assumed that the transmission rate in the forecast part is a natural extension of the training part. Both parts constitute a whole (exp-) cubic spline function. Ranjan et al. (26) estimated the peak for the second wave to occur in mid-May 2021 with a daily count exceeding 0.35 million. However, on April 30, 2021, a total of 400,000 daily new cases and 3,500 daily deaths were reported, and there are 300,000 daily new cases for14 consecutive days and 3,000 daily deaths for 5 consecutive days. The prediction in (26) is apparently lower than reality. According to the seroprevalence results (6), Murhekar et al. estimated a cumulative 74.3 million infection while 3,621,245 cases were reported in India by August 2020.Then mortality data of COVID-19 in India was used to predict the transmission of COVID-19 in India, which is largely reasonable.
The strength of this work includes that we fit a simple model to the reported death which is believed insensitive to COVID-19 testing efforts compared to reported cases. Mild cases will less likely to be tested. Thus fluctuation in the reported cases likely contains variation in testing efforts. We assume the transmission rate in the forecast period is a natural extension of that over the training part. Both the first and second derivatives are continuous in the transition point between training and forecast. To our knowledge, this approach is novel. We compared our estimated IAR to the reported serological studies. To our knowledge, very few studies have done such a comparison. We argue that it is important that the model output is largely in line with the serological study at different time points.
The limitation of this work includes: we assume all parameters are constant except for the transmission rate; the model is for the whole country while we ignore the heterogeneity across regions. We only relied on the reported data and adopted a non-mechanistic cubic spline function type of transmission rate. Alternatively, one could consider explicitly incorporate all kinds of control measures (e.g. google mobility matrix, etc.) We ignored the effects of vaccination since the two-dose coverage only reached a lecvel <10% by now.
We conclude that multiple factors drove the rapid growth of the second wave and caused a large number of deaths within a very short period in India. Under current trend, the IAR will likely reach a level of 43% by June 13, 2021. Stronger control measure than what we forecasted could lower the estimated IAR by June 13, 2021. However, considering that our estimated IAR by Feb, 2021 was lower than the ICMR estimate, an IAR at 43% by June 13, 2021 should be reasonably close to the reality.