Shigellosis seasonality and transmission characteristics in different areas of China: A modelling study

Objective In China, the burden of shigellosis is unevenly distributed, notably across various ages and geographical areas. Shigellosis temporal trends appear to be seasonal. We should clarify seasonal warnings and regional transmission patterns. Method This study adopted a Logistic model to assess the seasonality and a dynamics model to compare the transmission in different areas. The next-generation matrix was used to calculate the effective reproduction number (Reff) to quantify the transmissibility. Results In China, the rate of shigellosis fell from 35.12 cases per 100,000 people in 2005 to 7.85 cases per 100,000 people in 2017, peaking in June and August. After simulation by the Logistic model, the ‘peak time’ is mainly concentrated from mid-June to mid-July. China's ‘early warning time’ is primarily focused on from April to May. We predict the ‘peak time’ of shigellosis is the 6.30th month and the ‘early warning time’ is 3.87th month in 2021. According to the dynamics model results, the water/food transfer pathway has been mostly blocked off. The transmissibility of different regions varies greatly, such as the mean Reff of Longde County (3.76) is higher than Xiamen City (3.15), higher than Chuxiong City (2.52), and higher than Yichang City (1.70). Conclusion The ‘early warning time’ for shigellosis in China is from April to May every year, and it may continue to advance in the future, such as the early warning time in 2021 is in mid-March. Furthermore, we should focus on preventing and controlling the person-to-person route of shigellosis and stratified deploy prevention and control measures according to the regional transmission.


Introduction
Shigellosis, also known as bacterial dysentery, leads to a high incidence in children under five years old in the middleand low-income countries and is commonly found among travelers and men who have sex with a man (MSM) in the high-income country (Kotloff et al., 2018). Although incidence keeps decreasing in China, the disease burden is unevenly distributed (Zhang, Ding, et al., 2016). For example, the average incidence in Sichuan Province was 22.12 per 100,000 from 2004 to 2014, but the incidence in Hunan Province declined from 15.8 in 2005 to 6.0 per 100,000 in 2010 (Ma et al., 2015;Xu et al., 2017). The peak of incidence and the duration varied widely between Northern and Southern China (Zhang, Ding, et al., 2016).
The seasonality of epidemics of shigellosis is known. Two peaks of shigellosis incidence were reported in Vietnam, with the central peak occurring during the rainy season and a secondary peak in the early or late rainy season (Lee et al., 2017). In China, the peak of shigellosis incidence was from June to August (Zhang, Ding, et al., 2016). The peak in Zhejiang Province occurred between July and October from 2005 to 2010 (Yan et al., 2018). The Logistic model has been used to analyze the relationship between meteorological factors and shigellosis incidence but not to estimate the seasonal epidemic (Zhang, Ding, et al., 2016, Zhang, Si, et al., 2017. However, the Logistic model could evaluate the seasonality of infectious diseases Xie et al., 2015, Zhang, Si, et al., 2017. It is thus possible to build an early warning process for shigellosis based on the Logistic model. Most studies adopted the Autoregressive Integrated Moving Average model (ARIMA) to predict the trend of shigellosis incidence (Tang et al., 2014;Wang et al., 2016, Zhang, Ding, et al., 2016. However, they did not estimate the transmissibility. The SusceptibleeExposedeInfectious/AsymptomaticeRecoveredeWater/Food (SEIARW) model was successfully employed for a shigellosis school epidemic in Changsha City in 2014 . Another study indicated that the SEIARW model could estimate the transmission of shigellosis in Hubei Province and predicted it would be interrupted in 2029 . Then, to estimate transmission for various genders, a sex-based SusceptibleeExposedeInfectious/Asympto-maticeRecovered (SEIAR) model was used . None of them, however, compared transmission in four locations.
This study calculated the seasonality of reported cases each month in China from 2005 to 2017. We then chose four locations in China to investigate variations in transmissibility.

Ethics statement
This disease monitoring and control endeavor was part of the Chinese Center for Disease Control and Prevention's (CDC) normal responsibilities in Yichang City, Xiamen City, Chuxiong City, and Longde County, thus no institutional review or informed consent was necessary. All data analyzed were anonymized.

Data collection
The provincial dataset of reported shigellosis from all over China (except Hong Kong, Macao, and Taiwan) from 2005 to 2017 was collected from the Public Health Science Data Center of the Chinese Center for Disease Control and Prevention (CDC). It included the monthly reports of cases and incidence of shigellosis in the whole population and separated age groups from 2005 to 2017. The study collected the shigellosis data of each patient from four areas' CDC, i.e., Yichang City from 2005 to 2018, Xiamen City from 2005 to 2018, and Chuxiong Yi Autonomous Prefecture (also known as Chuxiong City) from 2008 to 2018, and Longde County from 2005 to 2018. In the 2005 Shigellosis incidence map, the geographic coordinates of four chosen sites were determined (Fig. 1). The variables collected in the study, including sex, age, and date of onset and the demographic data were collected from the Statistical Yearbook.

Logistic model
The Logistic model is an ordinary differential equation (ODE) model used to describe the self-growth characteristics of biological populations (Tsoularis & Wallace, 2002). The equation is as follows: The left side of the equation shows the instantaneous rate of cumulative cases (n). k is the correlation coefficient, and N is the upper limit of cumulative cases. After the illness has become saturated in the population, the mean of 1À n N symbolizes the epidemic's growth limit and the creation of the immunological barrier. The onset cases gradually decrease, and the epidemic ends. The Logistic model shows a type of 'S' curve, and the disease process leads 'slow-fast-slow'. The general solution to equation (1) is: c represents a constant. According to the characteristics of the Logistic model, it calculates the third derivative of equation (2) and makes it equal to 0, then: The t 1 ¼ ÀcÀ1:317 k is the turning point of the curve in the model changing from slow to fast (epidemic acceleration). The t 2 ¼ À c k is the turning point of the curve changing fastest (the peak of illness onset). The t 3 ¼ Àcþ1:317 k is the turning point of the curve in the model, changing from fast to slow. Z. Zhao, M. Yang, J. Lv et al. Infectious Disease Modelling 7 (2022) 161e178 Any pair of fitted data is substituted into equation (2), which can calculate the parameter c, and the month of 'epidemic acceleration' per year was calculated by equation (4). We can further calculate the mean and standard deviation (SD) for the month of 'epidemic acceleration'. Considering that implementation of health decisions and interventions could take a while, we add a SD to the month of 'epidemic acceleration', which is defined as 'early warning time'.

SEIARW model
Two transmission routes are considered in this study. The first one is from person-to-person and the second one is water/ food-to-person. In the SEIARW model, the compartment S, E, I, A, and R represent susceptible, exposed, infectious, asymptomatic, and recovered individuals, respectively Zhao, Chen, et al., 2020). The compartment W refers to water or food in the environment. The definition of the six compartments is provided in Table 1. In the model (Fig. 2), we assume that: a) The disease does not spread vertically. All persons are equally susceptible. The natural birth rate is br, and the mortality rate is dr. b) The susceptible people will be infected after contact with an infected person or contaminated water/food. The transmission rate was b and b w , respectively. Meanwhile, we hypothesized that symptomatically infected persons are transmissible at k times the rate of asymptomatically infected people. c) A proportion (1-p) of exposed individuals will change to symptomatically infected people (I) following an incubation period. In contrast, the rest (p) of exposed individuals will become asymptomatic (A) following a latency period (the period during which the exposed individuals become an asymptomatic person). d) Both symptomatic and asymptomatic infected people will recover at the rate g and g 0 , respectively, with 1/g and 1/g 0 representing their average infectious periods.
e) The infected people shed pathogens into W at rates m and m 0 respectively for symptomatic and asymptomatic people. We assumed m' ¼ cm.
f) Shigella will die in the water/food after a given time, and the daily decreasing rate of the pathogen is εW.
The differential equation is expressed as follows: The left side of the equation shows the instantaneous rate of change of S, E, I, A, R, and W at time t. In the model adapted the mass action incidence rate ½bSðI þkAÞ which is better suited to outbreaks with smaller populations . Standard incidence rate ½bSðI þkAÞ=N is more suitable to simulating epidemics with the large populations according to some studies (Anderson & May 1986;Li et al., 2003). Although we should model based on huge populations using standard incidence rates, we must unify the units between person and pathogen in the reservoir (water/food) Tien & Earn, 2010). Actually, because the normalized total population can be taken to equal 1, the results produced by normalization are essentially the same as those obtained by using standard incidence rate. The normalized equations are as follows: The equation is as follows: ds dt

Estimation transmissibility
The basic reproductive number (R 0 ) was always defined as the expected number of secondary infections that result from introducing a single infected individual into an otherwise susceptible population (Chen et al., 2014a(Chen et al., , 2019Zhang et al., 2020). However, we cannot estimate R 0 in this study because we used surveillance data from four regions. Local government intervention measures always influence the wave of surveillance data. We defined the reproduction number estimated after the intervention as an effective reproductive number (R eff ) . Therefore, we adopt R eff to quantify the transmissibility.
In the study, we used next-generation matrix approach to calculate R eff . According to equation (5), we can get the disease-free equilibrium point as: ðN; 0; 0; 0; 0; 0Þ (7) In the matrix: By the next generation matrix approach, we can get the next generation matrix and R eff for the SEAIRW model: The 'knock-out' simulation, which involves blocking the various shigellosis transmission channels, is another approach for estimating transmissibility (Zhao et al., 2020a(Zhao et al., , 2020b. In this study, we simulated four 'knock-out' situations in this investigation, each shutting off one or two transmission routes:

Parameter estimation
The definition of parameters b, b, b w , b w , k, u, p, g, g 0 , ε, c, m, m 0 , br and dr refer to Table 2. The following information or assumptions were used to estimate the parameters: a) We set parameters k, g 0 , ε and c as 0.3125 (range: 0e1), 0.0286 (range: 0e0.0357), 0.6931 (range: 0e1) and 0.3125 (range: 0e1), respectively, based on the epidemiological characteristics of shigellosis and our previous studies Zhao, Chen, et al., 2020). b) The proportions of asymptomatic individuals were reported to range from 0.0037 to 0.27 (Bovee et al., 2012;Khan et al., 2006;Qadri et al., 1995). We set p ¼ 0.1 in the SEIARW model. c) The incubation of shigellosis was reported to range from 1 to 3 days (Debnath et al., 2018;Organization, 2008;Xiao et al., 2012). We assumed the incubation period was equal to the latency period. In the model, we set u as 1 (range: 0.3333e1). d) The symptoms of shigellosis usually last for a week, but some individuals may experience them for several weeks (Lampel et al., 2018, Prevention, 2020). In the model, we assume that the course of the disease is up to three weeks. We set g to 0.0741 (range: 0.0477e0.1428).
e) The br and dr were 0.00605 and 0.00524 in Yichang City, 0.01027 and 0.00456 in Xiamen City, 0.0108 and 0.00643 in Chuxiong City, 0.01651 and 0.00758 in Longde County. According to the local Statistical Yearbook, the total population of each year was determined differently in four districts.

Simulation method and statistical analysis
Berkeley Madonna 8.3.18 (developed by Robert Macey and George Oster of the University of California at Berkeley; Copyright ©1993e2001 Robert I. Macey & George F. Oster, University of California, Berkeley, CA) was employed for the model simulation. The simulation methods (RungeeKutta method of order four with tolerance set to 0.001) were the same as those previously published research (Chen et al., 2016Liu et al., 2015). Berkeley Madonna adopted the curve fitting of the least root-mean-square deviation. GraphPad Prism 7.0 (GraphPad Software, La Jolla, CA) and Microsoft Office Excel 2019 (Microsoft, Redmond, WA, USA) were employed for the figure development and data analysis. All the maps presented in this study were created with Pyecharts 1.2.1 (2017e2019. Powered by docsify), a package from Python software, version 3.6.1 (Copyright 2001e2017; Python Software Foundation, Powered by Heroku). The coefficient of determination (R 2 ) calculated SPSS 21.0 (IBM Corp, Armonk, NY, USA) was adopted to judge the model goodness of fit. The Linear model in SPSS 21.0 was used to fit and predict the data of peak and early warning time.

Sensitivity analysis
In our model, the seven parameters were split into 1,000 values according to their range. The value of mean and mean ± SD was exited after the sensitivity analysis in the model. As the simulation method was the same in each year with different areas, the sensitivity analysis was just performed for 2009 data of Longde County.

Epidemiological characteristics of shigellosis
In China (excluding Hong Kong, Macao, and Taiwan), the incidence of shigellosis has fallen from 35.12 cases per 100,000 in 2005 to 7.85 cases per 100,000 in 2017 (Fig. 3). The reported incidence varied in different areas. The top medians of reported incidence were 94.35/100,000 in Beijing City, 65.50/100,000 in Tianjin City, 42.87/100,000 in Gansu Province, 41.08/100,000 in Xinjiang Uygur Autonomous Region, 40.56/100,000 in Tibet Autonomous Region, and 33.69/100,000 in Ningxia Hui Autonomous Region.
The reported cases and incidence per month seasonally decreased from 2005 to 2017, in the whole population and among infants under 1 (Fig. 4). The reported incidence in the total population (median: 1.05/100,000) showed a decreasing trend. Individuals < 5 years old displayed a high incidence, especially those under 1 (median: 13.01/100,000), from 2005 to 2017 (Fig. 5). In Fig. 6, the reported incidence displayed a decreasing trend in Yichang City, Xiamen City, and Long County. However, it showed an increasing trend before 2012 in Chuxiong City. The medians of reported incidence were 52.20/100,000, 22.78/ 100,000, 13.03/100,000, and 6.83/100,000 in Longde County, Yichang City, Chuxiong City, and Xiamen City, respectively.

Seasonal characteristics of shigellosis
There is a seasonal transmission characteristic of shigellosis in all provinces in China (Fig. 11), which is most common from May to October (Fig. 11B). The incidence for each age group also shows a seasonality. The incidence of each age group slowly increases from January to May; the peak is from June to August, then gradually decreases from September to December (Fig. 12B).
From 2005 to 2017, the result of the Logistic model in the total population (Table 3) located the mean of 'epidemic acceleration' at 4.50 months (SD: 0.27) and of "peak incidence" corresponded to is 6.76 months (range: 5.49e6.43 months). Meanwhile, the 'epidemic acceleration month' was gradually shortened year after year. Thus, the 'warning time' was 4.23 months in the total population. In children < 1 years old (Table 4), the mean month of 'epidemic acceleration' was 4.55 months (SD: 0.17), and the 'peak incidence' was at 6.78 months (range: 5.51e6.23). Therefore, the 'early warning time' should be 4.38 months. The results of Linear model (Fig. 13) show the decreasing trend of 'peak time' (R 2 ¼ 0.638, P ¼ 0.001) and 'early warning time' (R 2 ¼ 0.812, P < 0.001) in total population. We predict the 'peak time' is 6.30 months and 'early warning time' is 3.87 months in the total population of 2021. There is slowly rising trend in the 'peak time' (R 2 ¼ 0.066, P ¼ 0.397) and slowly decreasing trend in the 'early warning time' (R 2 ¼ 0.031, P ¼ 0.566) in < 1 years old. We predict the 'peak time' is 6.90 months and 'early warning time' is 4.47 months in < 1 years old of 2021.

Transmissibility of shigellosis in different areas
In Yichang City and Xiamen City, the 'knock-out' simulation ( Fig. 14) showed that the number of cases in scenario B (b w ¼ 0) was consistent with that in scenario D (control). Meanwhile, the number of cases in scenario A (b ¼ 0) was consistent with scenario C (b and b w ¼ 0). However, in Chuxiong City and Longde County, the number of cases in scenario B (b w ¼ 0) was slightly different from scenario D (control).

Model validity
The Logistic model estimated seasonal influenza and established the early warning mechanism (Cheng et al., 2018;Míguez et al., 2016). According to R 2 of the linear regression, the Logistic model exhibited high goodness of fit with the reported data. This finding suggests that the model was suitable for estimating the seasonal characteristics of shigellosis. The method of the Logistic model could also be used to estimate the different epidemic stages.
The chi-square results showed that the SEIARW model was suitable for estimating shigellosis transmission, which was consistent with the results of previous research . The model was more sensitive to parameter g. This finding suggests that we should collect the parameter from actual data instead of the literature.

Epidemiological characteristics
From 2005 to 2017, the descriptive data revealed a declining trend in shigellosis. Beijing City, Tianjin City, Gansu Province, Xinjiang Uygur Autonomous Region, Tibet Autonomous Region, and Ningxia Hui Autonomous Region had the most significant occurrences. Except in Beijing and Tianjin, this incidence feature is directly tied to economic situations. Shigellosis has a lower incidence in areas with better economic conditions (von Seidlein et al., 2006). However, the high incidence of shigellosis in Beijing City and Tianjin City may be related to a high population density and low traveling population. Among the four selected areas, the incidence in Longde City is higher than in Yichang City, which was higher than in Chuxiong City and higher than in Xiamen City. The economic conditions in Longde County and Chuxiong City areas are much lower than those found in Xiamen City and Yichang City. However, the frequency of flood disasters in Yichang City is higher than in other areas, which leads to a relatively higher incidence. For example, the rainfall in Yichang City mainly increased and there were 12 times of rainstorms in 2007, and the rainstorm caused heavy disasters in 6 counties of Yichang City in 2020 (Organization, 2017).
Most studies have indicated that shigellosis remains a high disease burden in children 4 years old (Kotloff et al., 2013(Kotloff et al., , 2018Liu et al., 2016). This is consistent with our results showing that the disease occurs mainly in children under five years old. Meanwhile, the results showed that the various areas had reported different incidences between genders and several age groups.

Seasonal characteristics of shigellosis
In this study, the Logistic model showed that the peak of incidence was at 6.76 months (range: 5.49e6.43 months), and the 'early warning time' was 4.23 months. Although the shigellosis incidence is decreasing in China, it is essential to build an early warning process, especially for children under five (Kotloff et al., 2018). Our results suggest that we might implement the prevention and control measures in early April every year. Meanwhile, in summer and autumn, the factors such as reduced air pressure, abundant rainfall, and easy breeding of bacteria, provide suitable conditions for the transmission of Shigella (Das et al., 2018;Farag et al., 2013;Xu et al., 2014). A shigellosis research in Chongqing City found that the incidence was  highest from May to October (Meng et al., 2019), while another study in Zhejiang Province found that the incidence was highest in the summer and fall from July to October (Yan et al., 2018). Indeed, according to our description, there was heterogeneity of seasonality in different provinces. We assume that it is related to the difference in meteorological, economic, and demographic factors. There exist differences in rainfall, sunshine hours, and temperature between southern and northern China. The economic conditions in the south are better than those in the north, and the former has a relatively high population density. In addition, we observed the seasonal heterogeneity in different age groups, suggesting that we should re-build the early warning mechanism for the age group with a heavy disease burden. Furthermore, we found a decreasing trend of 'peak time' and 'early warning time' in the total population, but a stable trend in < 1 years old, which may relate to the decrease in shigellosis incidence or the reproduction of shigella. Meanwhile, this finding suggests that we should establish the early warning mechanism at the end of March to control the transmission in the total population and in the early of April to prevent the transmission in < 1 years old.

Transmissibility of shigellosis in different areas
The 'knock-out' simulation revealed that water and food have a little role in the transmission of shigellosis, suggesting that the disease is predominantly transmitted from person to person. This is congruent with the results of other studies , Kotloff et al., 2018Zhao, Chen, et al., 2020). However, it contradicts another study that concluded that water  and food had a high contribution to shigellosis transmission in a school outbreak . Furthermore, our model was constructed on the complete population and has been applied to surveillance datasets for many years. It is very different from shigellosis transmission features in a specific outbreak. Our work further indicated that the transmissibility in Longde County was higher than in Xiamen City, followed by Chuxiong City and Yichang City, which might be due to the lower economic situation in Longde County and Chuxiong City than in Yichang City and Xiamen City. According to the Statistical Yearbook, Xiamen City, a Chinese special economic zone, had 4,110,000 population in 2018 over a surface of 1,700.61 km 2 . However, Yichang City reported 4,135,900 residents for 21,084 km 2 . The contact frequency and the travel population in Xiamen City should be much higher than Yichang City. Our finding suggests that decreasing the travel population and contact frequency might reduce the transmission of shigellosis.

Limitations
Because this study, like many modeling studies, relies on assumptions, the truth may be more complex. The logistic model was used at the national level, but more specific evaluations of individual provinces or cities are required. Another restriction is that, while we studied transmissibility in different places, aspects such as weather, economics, and demography were not included in our datasets and could not be analyzed in this study. We need to explore further the economic, environmental, and cultural influences which could correlate with shigellosis transmission.

Conclusions
Shigellosis remains a high disease burden in China, especially for children under 1. The month of the 'peak time' is 6.76 months, 'epidemic acceleration' is 4.50 months, and the recommended 'early warning time' is 4.23 months. The seasonality and transmission features are different in several areas. The prevention and control measures should be applied, especially in low-income provinces in China.

Ethics approval and consent to participate
Institutional review and informed consent were not required for this study. All data analyzed were anonymized.

Consent for publication
Not applicable.

Availability of data and materials
Data supporting the conclusions of this article are included within the article.