Taiwan ended third COVID-19 community outbreak as forecasted

Accurate forecasting of community outbreaks is crucial for governments to allocate healthcare resources correctly and implement suitable non-pharmaceutical interventions. Additionally, companies must address critical questions about stock and staff management. Society's key concern is when businesses and organizations can resume normal operations. Between December 31st, 2019 and 2021, Taiwan experienced three separate COVID-19 community outbreaks with significant time intervals in between, suggesting that each outbreak eventually came to an end. We identified the ratio of the 7-day average of local & unknown confirmed cases to suspected cases as the key control variable and forecasted the end of the third outbreak using an exponential model. We forecasted the end of the third outbreak on Aug. 16th with a threshold ratio of $1.2 \times 10^{-4}$. The real observations crossed the threshold on Aug. 27th, eleven days later than forecasted, with the last case of the third outbreak confirmed and quarantined on Sept. 20th. This demonstrated the accuracy of the proposed forecasting method in predicting the end of a local outbreak. Furthermore, we highlight that the ratio reflects the effectiveness of contact tracing. Effective contact tracing together with testing and isolation of infected individuals is crucial for ending community outbreaks.


Data cleaning and preprocessing
According to Taiwan CDC News Bulletins, cases 16821, 16841, 16851, 16856, 16862, and 16865 were wrongly reported as abroad positive cases, and the CDC updated their local positive confirmed date on 2021-12-17. However, 2021-12-17 is not their real positive test date. Therefore, we used the dates on which they were reported as abroad positive cases as their local positive dates and removed the 7 confirmed cases on 2021-12-17.
We removed the first CDC News Bulletin for the days when more than one was issued, thus keeping the later report with the larger cumulative numbers. For consecutive bulletins more than one day apart, we then distributed the change in the cumulative number evenly over the days in between, ensuring that the cumulative sum equals the reported one. Due to the change in the way of counting the number of suspected and excluded cases on 2020-03-06, the number of suspected cases is unreasonably low before that date, and a jump correcting for the change occurs on that day. These artefacts are clearly seen in Figure 1 as a dip in the number of suspected cases before this day and a value slightly above 10 000 on it. They clearly influence the 7-day moving average of the number of suspected cases, which we need to obtain a stable estimate of the ratio of confirmed infected to suspected cases. The total daily number of suspected cases in the CDC Open Data Portals dataset does not contain this error, so we decided to use it instead.
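The even redistribution over gaps between bulletins can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `spread_cumulative`, the input layout, and the choice to put the integer remainder on the last days of a gap are all assumptions, and the example numbers are invented.

```python
from datetime import date, timedelta

def spread_cumulative(reports):
    """Turn sparse cumulative reports into daily increments.

    `reports` is a list of (date, cumulative_count) pairs sorted by date,
    with at most one report per day (duplicates already removed).
    The increase between two consecutive reports is split as evenly as
    possible over the days in between, using whole numbers whose sum
    equals the reported increase.
    """
    daily = {}
    for (d0, c0), (d1, c1) in zip(reports, reports[1:]):
        gap = (d1 - d0).days
        base, extra = divmod(c1 - c0, gap)
        for i in range(1, gap + 1):
            # Assumption: the last `extra` days of the gap get one more case.
            daily[d0 + timedelta(days=i)] = base + (1 if i > gap - extra else 0)
    return daily

# Invented example: one normal day, then a three-day gap.
reports = [(date(2021, 5, 1), 100), (date(2021, 5, 2), 110), (date(2021, 5, 5), 120)]
daily = spread_cumulative(reports)
```

In this example the increase of 10 cases reported on May 5 is spread over May 3 to May 5, and the daily values still add up to the reported cumulative change.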
Since the number of excluded cases was only available in the CDC News Bulletins and contains a similar change in the way of counting on 2020-03-06, we assumed that the ratio of excluded to suspected cases was correct and used this ratio to estimate the daily number of excluded cases from the total daily number of suspected cases in the CDC Open Data dataset. To ensure that the cumulative sum of daily excluded cases matched the values reported in the News Bulletins after March 6th, we added or removed excluded cases in proportion to the daily suspected cases on the days between two reports, working iteratively from the later report day to the former to avoid later errors from rounding to whole integers. This resulted in an estimate of the daily total number of excluded cases that matches the reported number exactly on each day with a CDC News Bulletin report. The result of our preprocessing is visualised in Figure 1, where the numbers of suspected cases from the corrected News Bulletins and the Open Data closely match after 2020-03-06. Note that a red circle without a dot in it indicates that more than one day had passed between consecutive News Bulletins, so the change is larger than expected from one day to the next. We have verified that the cumulative number of excluded cases always remains smaller than the cumulative number of suspected cases.
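One way to split a reported total over several days in proportion to the daily suspected cases, while keeping whole numbers that sum exactly to the reported value, is cumulative rounding. The sketch below uses that technique as a stand-in; it is not the authors' backward iterative procedure, and the function name `allocate_proportionally` and the example values are hypothetical.

```python
def allocate_proportionally(total, weights):
    """Split integer `total` across days in proportion to `weights`.

    Cumulative rounding is used: the running target is rounded at each
    day, so the integer parts always sum to `total`.
    """
    weight_sum = sum(weights)
    if weight_sum == 0:
        # Assumption: with no suspected cases at all, put everything on the last day.
        return [0] * (len(weights) - 1) + [total]
    allocation, running, assigned = [], 0, 0
    for w in weights:
        running += w
        target = round(total * running / weight_sum)
        allocation.append(target - assigned)
        assigned = target
    return allocation

# Invented example: 10 excluded cases over three days with 100, 300 and 600 suspected.
excluded = allocate_proportionally(10, [100, 300, 600])
```

Because the last running target equals the full total, the allocation matches the reported cumulative number on the report day regardless of how the intermediate days are rounded.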
Since our analysis depends on the 7-day moving averages of suspected and excluded cases, we also evaluated the relative difference between the 7-day moving average daily values obtained from the CDC News Bulletins and from the Open Data, relative to the latter. The relative difference in the number of suspected cases remains below 41% on every day after 2020-03-06, below 10% on 81% of the days, and below 1% on 33% of the days. The large relative differences occur on isolated days with small relative differences in between and are caused by differences in the day on which extreme numbers of suspected cases are reported, see Figure 2. This is to be expected, since the Open Data most probably contains all suspected cases on each day, while the News Bulletins contain the cases added since the last bulletin. Thus these differences may shift values by one day, but no more. We cannot tell which of these two data sources is more reliable, but considering that the Open Data at least is corrected for the change in the way of calculating the number of suspected cases, we decided to use it together with our estimate of the total daily number of excluded cases based on it. This does not affect the ratio of excluded to suspected cases, but it affects the ratio of local & unknown confirmed to suspected cases by up to 41% on individual days, which in practice means that specific values may be shifted in time by one day. Since our forecast of when the third outbreak will end is based on 19 time points, any effects of this difference are well within the estimated confidence intervals.
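The effect of a one-day reporting shift on the 7-day moving average can be illustrated with a small sketch. It assumes a trailing (not centred) window, which the text does not specify, and uses invented numbers; `moving_average` and `relative_difference` are hypothetical helper names.

```python
def moving_average(values, window=7):
    """Trailing moving average; days without a full window are skipped."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

def relative_difference(candidate, reference):
    """|candidate - reference| / reference, element by element."""
    return [abs(c - r) / r for c, r in zip(candidate, reference)]

# Invented data: the same spike of 500 suspected cases is reported one day
# later in the bulletin series than in the open-data series.
open_data = [100] * 7 + [500, 100, 100]
bulletin = [100] * 8 + [500, 100]

rel = relative_difference(moving_average(bulletin), moving_average(open_data))
```

Here the two averaged series differ by about 36% on the single day where only one of them already contains the spike, and agree on the neighbouring days, which mirrors the isolated large differences described above.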

Refitting to the updated data, 2023-03-08
The Taiwan CDC has made minor changes to the data, so we repeated the data cleaning and refitted the exponential model to the cleaned data. All results are similar to our previous prediction, and the forecasting result is shown in Figure 3. The forecasted end date of the third outbreak is Aug. 13th with a threshold ratio of $1.2 \times 10^{-4}$. The predicted half-time is 10 days (95% confidence interval: 8.9-10.4). We fitted this model to the ratio of the 7-day moving average of local & unknown confirmed cases to the 7-day moving average of suspected cases from the peak on 2021-05-27 to 2021-06-16, i.e. 21 time points. The estimated parameters are a = 0.030 (95% confidence bounds 0.029-0.031) and b = 0.10 (95% confidence bounds 0.096-0.11).
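As a rough check of how the fitted parameters relate to the forecasted end date, the sketch below assumes the model has the form ratio(t) = a · 2^(−b·t), with t in days since the peak. This form is an assumption: the model equation is not given in this section, but it is consistent with a half-time of about 1/b = 10 days for b = 0.10. Only the rounded values quoted above are used.

```python
import math
from datetime import date, timedelta

# Rounded fitted parameters and threshold quoted in the text.
a, b = 0.030, 0.10
threshold = 1.2e-4
peak = date(2021, 5, 27)

# Assumed model form: ratio(t) = a * 2 ** (-b * t), t in days since the peak.
half_time = 1 / b

# Solve a * 2 ** (-b * t) = threshold for t.
days_to_threshold = math.log2(a / threshold) / b
crossing = peak + timedelta(days=round(days_to_threshold))
```

With these rounded inputs the crossing falls roughly 80 days after the peak, in mid-August 2021, close to the reported forecast of Aug. 13th; the remaining difference is presumably due to rounding of a and b.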

Multiple prediction cycles
Since we operate on the ratio of local confirmed to suspected cases and assume the policy, i.e. the NPIs, to remain the same, our forecasts are only valid as long as this assumption holds. During each outbreak in Taiwan, the contact tracing and testing effort was ramped up, i.e. a change in policy happened during each outbreak. Since we did not know at what pace this ramp-up was going to occur, we considered it best to make each forecast under the assumption of unchanged policy. Our forecasts therefore show a continued outbreak until the moment when the ramp-up is sufficient to end it. We consider it important to maintain a sense of urgency, so that there is motivation to keep changing the policy until it is sufficient to end the outbreak and the forecasts contribute to the right actions being taken. This implies that the forecast needs to be updated every few days and that the latest forecast is always the most accurate. Using a shorter training interval means that the model picks up changes in policy more quickly but also becomes more sensitive to noise in the data and to random events.
We fit the exponential model with a fixed 14-day moving interval with a stride of 3 days and forecast the ratio for the following 3 weeks. The results are shown in Figure 4. At the start of the third outbreak the ratio is still increasing, so the fitted curves slope upward and the forecasts rise. Only from the peak on May 27th, 2021, do the forecasts show a negative slope and give a strong indication of a decreasing trend.
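The moving-window procedure can be sketched as below. The sketch fits a straight line to the logarithm of the ratio by ordinary least squares, which is a simple stand-in for whatever exponential fitting routine the authors used; the function names and the synthetic ratio series are illustrative only.

```python
import math

def fit_log_linear(ts, ys):
    """Least-squares line through (t, ln y); returns (intercept, slope)."""
    n = len(ts)
    logs = [math.log(y) for y in ys]
    t_mean, l_mean = sum(ts) / n, sum(logs) / n
    slope = (sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
             / sum((t - t_mean) ** 2 for t in ts))
    return l_mean - slope * t_mean, slope

def rolling_forecasts(ratio, window=14, stride=3, horizon=21):
    """Fit each window and extrapolate `horizon` days past its end."""
    out = []
    for start in range(0, len(ratio) - window + 1, stride):
        ts = list(range(start, start + window))
        intercept, slope = fit_log_linear(ts, ratio[start:start + window])
        future = range(start + window, start + window + horizon)
        out.append((start, slope, [math.exp(intercept + slope * t) for t in future]))
    return out

# Synthetic ratio (not real data): rises for 20 days, then decays.
ratio = [0.001 * math.exp(0.15 * t) for t in range(20)]
ratio += [ratio[-1] * math.exp(-0.07 * t) for t in range(1, 31)]
forecasts = rolling_forecasts(ratio)
```

Windows that lie entirely before the peak give a positive slope and rising forecasts, while windows after the peak give a negative slope, matching the behaviour described for Figure 4.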
To determine the minimum fitting period, we start the fitting process when the ratio begins to decline. Effectively mitigating the outbreak requires an increase in contact tracing capacity. This capacity can be assessed by examining the ratio of excluded to suspected cases, as depicted in Figure 1d. The ratio initially dips below 1 at the onset of the outbreak and rises above 1 again on June 6, 2021. As this ratio approaches 1, contact tracing capacity is nearing a normal level. We investigated the influence of adjusting the data fitting period by forecasting the ratio and determining the date on which the fitted line crosses the threshold. The comparison between the forecasted end dates for the third outbreak and the actual threshold passing date of August 27th, 2021 is presented in Table 1. The $R^2$ consistently remains above 0.95. The absolute difference in days between the true end date and our forecasted end date remains below 14 days when the fitting period runs from May 27th to June 4th or beyond. The ratio of excluded to suspected cases remains close to 1, indicating a normal state of contact tracing capacity. Furthermore, we also forecasted the end of the first and second outbreaks by changing the fitting period. The results are presented in Tables 2 and 3.
In the Richards model, C(t) is the daily cumulative number of confirmed cases, r is the growth rate, k is the carrying capacity, and α governs the shape of the growth curve. The parameter values obtained from the fit are $r = 2.9 \times 10^{-5}$, $k = 1.24 \times 10^{4}$, and $\alpha = 5.3 \times 10^{-5}$.
To facilitate a comparison with our exponential model, we converted our forecast of the ratio of local confirmed to suspected cases into daily confirmed cases. This conversion involved multiplying the ratio by the average number of suspected cases in our fitting period (May 27 to June 16). The comparison is shown in Figure 5. For the Richards model, we fitted the data from April 20 to June 16, marked as the magenta-shaded area. Our results align better with the actual data (red line) than those of the Richards model.
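The conversion from a forecasted ratio to daily confirmed cases is a single multiplication by the mean number of suspected cases over the fitting period. A minimal sketch with invented numbers (the function name `ratio_to_daily_cases` is hypothetical):

```python
def ratio_to_daily_cases(forecast_ratio, suspected_in_fit_period):
    """Scale a forecasted confirmed/suspected ratio to daily confirmed cases."""
    mean_suspected = sum(suspected_in_fit_period) / len(suspected_in_fit_period)
    return [r * mean_suspected for r in forecast_ratio]

# Invented values: mean suspected of 20 000 per day during the fitting period.
cases = ratio_to_daily_cases([0.01, 0.005], [10000, 20000, 30000])
```

This implicitly treats the number of suspected cases as constant after the fitting period, so the converted curve inherits the shape of the ratio forecast.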

Figure 1. Comparison of the change in number of suspected between CDC News Bulletins (red circles), the total daily number of suspected in the CDC Open Data (blue x), and the change in number of suspected between CDC News Bulletins corrected for the number of days in between (red dot), and the 7-day moving average of the latter two (blue and red lines), and the 7-day moving average of the change in number of excluded between CDC News Bulletins corrected for the number of days in between (magenta line), and the 7-day moving average of our estimate of the daily number of excluded based on the total daily number of suspected in the CDC Open Data (cyan line).

Figure 3. Refit of the forecast of the ratio of the 7-day moving average of local & unknown confirmed to the 7-day moving average of suspected cases and our forecast of the end of the third outbreak. The three community outbreaks are marked in blue (1st), green (2nd), and red (3rd). The cases in between, in grey, are not connected.

Figure 4. Forecasts of the exponential model (black solid lines) with a 14-day moving window with a stride of 3 days. The three community outbreaks are marked in blue (1st), green (2nd), and red (3rd). The cases in between, in grey, are not connected.

Figure 5. Comparison of the forecast results of the exponential model (black solid line) and the Richards model (magenta dash-dotted line). The three community outbreaks are marked in blue (1st), green (2nd), and red (3rd). The cases in grey are not connected. The magenta-shaded area is the time period for fitting the Richards model.

Table 1. Forecasting the end of the third outbreak keeping the start date fixed on May 27th and varying the end date of the fitting interval from May 30, 2021, to June 15, 2021. The actual data crossed the threshold we used to define the outbreak end on August 27th, 2021.
During the escalation of the third outbreak, Taiwan's CDC ceased the daily reporting of recovery numbers. Specifically, there is a gap in the recovery data from May 20th, 2021, to June 13th, 2021. Unfortunately, the 7-day moving average data used for fitting our exponential model spans May 27th, 2021, to June 16th, 2021, which largely overlaps the gap in the recovery data and makes it difficult to compare an SIR-type model with our exponential model. We thus implement the classic Richards model. The Richards model is described by the ordinary differential equation (ODE):
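In its commonly used form, the classic Richards model reads as follows. This standard form is assumed here and the fitted variant may differ in parameterization; C(t), r, k, and α denote the cumulative number of confirmed cases, the growth rate, the carrying capacity, and the shape parameter:

```latex
\frac{dC(t)}{dt} = r\,C(t)\left[1 - \left(\frac{C(t)}{k}\right)^{\alpha}\right]
```

For α = 1 this reduces to the logistic growth equation, and C(t) saturates at k as the bracket goes to zero.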

Table 2. Forecasting the end of the first outbreak keeping the starting date fixed on February 28th and varying the end date of the fitting interval from March 3rd to April 13th, 2020. The actual data crossed the threshold ratio on April 15th, 2020.

Table 3. Forecasting the end of the second outbreak keeping the starting date fixed on January 22nd and varying the end date of the fitting interval from January 25th to February 5th, 2021. The actual data crossed the threshold ratio on February 5th, 2021.

Table 4. The data used for forecasting

Table 5. The data used for forecasting (continued)

Table 6. The data used for forecasting (continued)

Table 7. The data used for forecasting (continued)

Table 12. The updated data used for refitting (continued)