Identification of Distribution Characteristics and Epidemic Trends of Hepatitis E in Zhejiang Province, China from 2007 to 2012

Hepatitis E virus (HEV) is a common hepatotropic virus that causes serious gastrointestinal symptoms. Data on reported HEV cases in Zhejiang Province were collected for 2007 to 2012. Descriptive and spatial-temporal epidemiological methods were used to investigate the epidemiological trends and identify high-risk regions of hepatitis E infection. The average morbidity of hepatitis E infection was 4.03 per 100,000 in Zhejiang Province, peaking in winter and spring. The male-to-female ratio was 2.39:1, and the high-risk population was aged between 40 and 60. Trend surface analysis and IDW maps revealed higher incidences in the northwestern counties. The spatial-temporal analysis showed comparable incidences in the counties in the basins of three rivers, mostly under the administration of Hangzhou Municipality. In addition, the seasonal exponential smoothing method was found to be the better model for the retrieved data. The epidemiological characteristics of HEV suggest a need for strengthened supervision and surveillance of sanitary water, sewage treatment and food in high-risk areas, especially around the Spring Festival. Additionally, time series models could be useful for forecasting future HEV epidemics. All these findings may contribute to the prevention and control of HEV epidemics.

HEV has attracted considerable attention as a significant public health issue in China over the past decade 25. Thanks to improved sanitation and living conditions, hepatitis E is under better control, with new infections coming primarily from sporadic cases and occasional food-borne outbreaks 26. However, its morbidity in economically developed Zhejiang Province was found to be high in 2011, and the specific transmission route remains controversial.
This study aimed to identify the epidemic features of HEV using spatial-temporal analysis techniques. Geographic information systems (GIS) offer important advantages over conventional epidemiological techniques in the initial surveillance of communicable diseases, rapid quantification of the susceptible population, effective allocation of health resources and timely formulation of prevention strategies. Spatial differentiation of HEV infection provides a map visualizing how infection varies across regions, and in-depth statistics with spatial autocorrelation can recognize geographical clusters of HEV infection, which helps in identifying potential risk factors and evaluating the efficiency of intervention measures 27. In our study, time series models were also used to provide scientific clues for the control of HEV infection in the Province. The autoregressive integrated moving average (ARIMA) model and the exponential smoothing method (ESM) were compared by error testing indexes to select the better one. The ARIMA model, widely used for non-stationary time series, can take into account changing trends, periodic changes and random disturbances, and can remove seasonal patterns 28. For instance, FY Tang et al. established an ARIMA model based on the shigella cases of Jiangsu Province from 2001 to 2011 to predict the potential cases from August to December in 2011, and the prediction was almost identical to the observed data 28. ESM can also be used for medium- and long-term prediction 29.
To our knowledge, this study is the first to identify the characteristics of HEV epidemics in Zhejiang Province using spatial-temporal statistical methods. The results should be helpful in detecting potentially high-risk regions for necessary interventions.

Materials and Methods
Ethics Statement. This study was approved by the Ethics Committee of Zhejiang Provincial Center for Disease Control and Prevention. All personal information was kept confidential as required.
Profile of Zhejiang Province. Zhejiang Province, an economically developed Province, is located in southeast China between longitudes 118°E-123°E and latitudes 27°N-32°N. It has a land area of 101,800 square kilometers, accounting for 1.06% of the whole country and making it one of China's smallest Provinces. It is composed of two sub-provincial cities (Hangzhou and Ningbo) and nine prefecture-level cities (Wenzhou, Huzhou, Jiaxing, Shaoxing, Jinhua, Quzhou, Zhoushan, Taizhou and Lishui) covering 90 counties. As a coastal Province, Zhejiang features a subtropical monsoon climate, with an annual mean temperature between 16.3 and 18.4 °C and annual precipitation ranging from 1489.5 to 1903.3 mm in 2012.
Data Collection. In China, uniform diagnostic criteria are set for notifiable diseases. Hospitalized patients diagnosed with hepatitis E should be reported immediately by local physicians through the China information system of disease prevention and control. Data on HEV cases in Zhejiang Province from 2007 to 2013 were exported from this system 30. Two staff members (Xuhui Zhu and Zhaofan Wu) independently extracted the key details of the cases as of the day of onset, including the year of the report and the patient's gender, age, address and occupation. The population data were also retrieved from the same system.
Analysis of Epidemiological Characteristics. Valid data were abstracted from the exported documents, and the distribution and epidemiological characteristics were described by age, gender, season and occupation.
IDW for Interpolation Maps. Inverse distance weighted (IDW) interpolation was employed to predict the incidence of hepatitis E at individual geographic locations below the county level 31. The IDW results were used to generate maps of estimated HEV infection from 2007 to 2012.
Spatial Variation Analysis. The spatial analysis aimed to detect geographic variation related to HEV outbreaks and to explore potential clustering regions 32. First, trend surface analysis was performed to frame the overall tendencies and to identify outliers of HEV incidence across geographical locations 33. In the trend surface model, Z is the dependent variable (incidence rate), and X and Y are the independent variables 28. Autocorrelation analysis was then carried out, including general and local spatial autocorrelation analysis. The general autocorrelation analysis used the Global Moran's I Index 34. Moran's I was defined as follows:
$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} W_{i,j}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} W_{i,j}\,(X_i - \bar{X})(X_j - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
where n was the number of counties, X_i and X_j were the observed values of the indicator in units i and j, \bar{X} was their mean over all units, and W_{i,j} was the matrix of spatial weights. If unit i adjoined unit j, W_{i,j} = 1; if not, W_{i,j} = 0. Moran's I ranges from −1 to 1. Moran's I > 0 implied clustering in the spatial distribution, Moran's I < 0 implied dispersion, and Moran's I = 0 implied a random spatial distribution. If the Global Moran's I was statistically significant, local autocorrelation analysis with both Local Moran's I and Local Getis-Ord G was performed to determine positive autocorrelation (High-High or Low-Low) or negative autocorrelation (High-Low or Low-High), thus revealing hotspots [35][36][37].
Pure Temporal Clustering Analysis and Spatio-temporal Analysis. A cylindrical window scanning across Zhejiang Province revealed the time and regions of clustering, with the base diameter of the moving window representing the potential area of clustering and the height representing the time of clustering 39. In our study, the maximum spatial cluster size and the maximum temporal cluster size were both set to 50%. The log-likelihood ratio (LLR) was employed to identify clusters by comparing the observed incidence with the expected one 40. Finally, a Monte Carlo test was conducted to determine the most likely clusters 41.
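The LLR and Monte Carlo steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in the study: it uses the standard discrete Poisson likelihood ratio for high-rate clusters, represents each candidate cylinder simply as a list of cell indices (a cell standing for one county in one time period), and the names `poisson_llr`, `max_llr` and `monte_carlo_p` are hypothetical.

```python
import math
import random


def poisson_llr(cases_in, expected_in, total_cases):
    """Log-likelihood ratio of one candidate window under the Poisson model.

    Only windows with more cases than expected count as clusters;
    otherwise the ratio is defined as 0.
    """
    c, e, total = cases_in, expected_in, total_cases
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    # When every case falls inside the window the outside term vanishes.
    outside = (total - c) * math.log((total - c) / (total - e)) if total > c else 0.0
    return inside + outside


def max_llr(cases, pops, windows):
    """Largest LLR over all candidate windows (each a list of cell indices)."""
    total_cases, total_pop = sum(cases), sum(pops)
    best = 0.0
    for window in windows:
        c = sum(cases[i] for i in window)
        # Expected cases are proportional to the population inside the window.
        e = total_cases * sum(pops[i] for i in window) / total_pop
        best = max(best, poisson_llr(c, e, total_cases))
    return best


def monte_carlo_p(cases, pops, windows, n_sim=999, seed=0):
    """Monte Carlo p-value of the most likely cluster.

    Each replicate redistributes all cases over the cells in proportion
    to population (the null hypothesis) and recomputes the maximum LLR.
    """
    rng = random.Random(seed)
    observed = max_llr(cases, pops, windows)
    cells = list(range(len(pops)))
    exceed = 0
    for _ in range(n_sim):
        simulated = [0] * len(pops)
        for i in rng.choices(cells, weights=pops, k=sum(cases)):
            simulated[i] += 1
        if max_llr(simulated, pops, windows) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)
```

For example, `monte_carlo_p([30, 5, 5, 5], [100, 100, 100, 100], [[0], [1], [2], [3], [0, 1]])` returns the maximum LLR together with its simulated p-value. The 50% caps on spatial and temporal cluster size mentioned above would be enforced when the list of candidate windows is built, which this sketch leaves to the caller.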

Time-series Analysis. The exponential smoothing method (ESM), a frequently used time series model, can be used for short-term prediction as well as for medium- and long-term prediction 42,43. In ESM, the initial predicted value was taken as the average of the values in the first few periods. As each new observed value appeared, the earliest one was removed and the new one entered the group. Thus, each new predicted value was calculated from the new observed value, the preceding predicted value and the weight of the latest observed value 44. Another widely used time series model is ARIMA (p, d, q), defined by three parameters: the order of autoregression (p), the degree of differencing (d) and the order of the moving average (q) 45. If the original series was not stationary, appropriate differencing and/or an exponential transform was applied to make it stationary 46. Both the autocorrelation function (ACF) and the partial autocorrelation function (PACF) were examined to confirm the optimal values of p and q. Finally, the Ljung-Box test for white noise and the Bayesian Information Criterion (BIC) for goodness-of-fit were used to decide on the final model 47,48. SPSS was used to establish the optimal ESM model and SAS to establish the optimal ARIMA model. The better model was the one with the smaller error testing indexes, including Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Mean Square Error (MSE) and Root Mean Square Error (RMSE), which can depict system errors 49. Finally, the predicted values (predicted cases with 95% confidence interval) were calculated with the optimal model and compared with the actual HEV data reported in 2013, and the prediction accuracy rate was evaluated.

Results
Besides, the ages of the selected population ranged from 1 to 98 years, with the high-risk ages being mainly between 40 and 60.
The occupational distribution of all reported cases was, from highest to lowest, peasants (53%), workers (12%), retirees (7%), house workers (5%), cadres (4%), traders (4%) and others (15%). These results are shown in Fig. 1. Further analysis examined the association between gender and occupation (Pearson χ2(5) = 432.1290, P < 0.001). The results showed that males were more susceptible to HEV infection than females in every occupation except house workers (Table 1).

Incidence Maps. The incidences of HEV infection in Zhejiang Province from 2007 to 2012 were mapped (Fig. 2), and IDW interpolation was performed to predict incidences across the map (Fig. 3). In 2007, the high-risk regions were in the northwest and eastern coastal areas of Zhejiang Province. Afterwards, the high-incidence areas in both the northwest corner and the eastern coast shrank gradually by 2012. However, the incidence in 2011 presented the highest contour line in the northwest, suggesting clusters there. In addition, comparable incidence rates were observed in the northwest counties, whereas incidence rates were low in other areas.
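The IDW interpolation behind these maps estimates the incidence at an unsampled location as a distance-weighted average of nearby observed incidences. The sketch below is a minimal illustration rather than the GIS procedure used in the study; the function name `idw_estimate`, the planar coordinates and the power parameter of 2 are all assumptions, since the text does not report them.

```python
import math


def idw_estimate(x, y, samples, power=2.0):
    """Inverse distance weighted estimate at location (x, y).

    samples is a list of (xi, yi, value) tuples, e.g. county centroids
    with their observed incidence. power=2.0 is an assumed setting.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, value in samples:
        distance = math.hypot(x - xi, y - yi)
        if distance == 0.0:
            # The query point coincides with a sample: return it exactly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

A point equidistant from two samples receives their plain average, and the estimate always stays between the smallest and largest observed values, which is why IDW surfaces cannot show peaks higher than the data.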
Trend Surface Analysis. Trend surface analysis was employed to identify the geographic trends of the incidence (Fig. 4). A coordinate system was created with one axis for each direction (X for West-East and Y for South-North). The projections of the incidence rates (Z axis) showed how incidence varied along the West-East and South-North directions and along the Southwest-Northeast and Southeast-Northwest directions. The trend surface analysis implied a higher incidence of HEV infection in northwest Zhejiang Province (Table 2).
Spatial Distribution. In the spatial clustering analysis, the general autocorrelation suggested that HEV cases were not randomly distributed (Table 3). The results of Local Moran's I and Local Getis-Ord G showed 29 high-high clusters, 8 low-low clusters and 1 low-high cluster at the county level (Table 4). Lin'an City was found
Temporal Clustering Analysis and Spatio-temporal Analysis. In the temporal clustering analysis, the discrete Poisson model identified the principal temporal cluster in 2011. In addition, the combined spatio-temporal analysis showed three clustering regions from 2007 to 2012 (Fig. 5).
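The general autocorrelation statistic reported under Spatial Distribution is the Global Moran's I defined in the Methods. A minimal sketch of that calculation is given below, assuming a binary adjacency matrix as described there; the function name `morans_i` and the toy four-county chain in the usage note are illustrative only, and the significance test is not reproduced.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed in n areal units.

    weights[i][j] is 1 if units i and j adjoin and 0 otherwise,
    with zeros on the diagonal.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only for adjoining pairs.
    cross = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    variance_term = sum(d * d for d in deviations)
    return (n / weight_sum) * (cross / variance_term)
```

For four counties arranged in a chain, incidences of 1, 1, 5, 5 give a positive value (similar neighbours, clustering), whereas the alternating pattern 1, 5, 1, 5 gives a negative value (dissimilar neighbours, dispersion).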
Time-series analysis. The optimal time series models, simple seasonal ESM and ARIMA (6,1,0), were created using SPSS and SAS, respectively. Further screening was based on the indexes MAE, MAPE, MSE and RMSE. Simple seasonal ESM was selected as the final model (Table 5 and Fig. 6). Additionally, the prediction accuracy rate of simple seasonal ESM was 75% (9/12), as shown in Table 6.
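To make the model comparison concrete, the sketch below computes the four error indexes and produces forecasts from a simple seasonal exponential smoothing model (a level plus an additive seasonal term, with no trend). It is an illustration under assumptions, not the SPSS procedure: the function names, the initialization from the first season, and the fixed smoothing weights `alpha` and `delta` are hypothetical, whereas a statistical package would estimate the weights from the data.

```python
import math


def error_indexes(actual, predicted):
    """MAE, MAPE (in percent), MSE and RMSE; actual values must be nonzero."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }


def simple_seasonal_forecast(series, period, alpha, delta, horizon):
    """Forecast `horizon` steps ahead with level + additive seasonal smoothing.

    Assumed initialization: the level is the mean of the first season and
    the seasonal terms are the first season's deviations from that mean.
    """
    first_season = series[:period]
    level = sum(first_season) / period
    seasonal = [y - level for y in first_season]
    for t in range(period, len(series)):
        old_seasonal = seasonal[t % period]
        # Update the level from the seasonally adjusted observation.
        new_level = alpha * (series[t] - old_seasonal) + (1 - alpha) * level
        # Update the seasonal term for this position in the cycle.
        seasonal[t % period] = delta * (series[t] - new_level) + (1 - delta) * old_seasonal
        level = new_level
    n = len(series)
    return [level + seasonal[(n + h) % period] for h in range(horizon)]
```

With monthly case counts, `period` would be 12 and `horizon=12` would give one year of forecasts, which could then be passed to `error_indexes` together with the observed counts for that year.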


Discussion
This study aimed to identify the epidemiological characteristics, the spatial-temporal trend and the regular pattern of HEV infection in Zhejiang Province. The findings showed that men were more susceptible to HEV infection than women, probably because men had more chances of contact with HEV through occupational exposure. The time distribution suggested that the epidemics frequently clustered from January to April, the period in which the Spring Festival falls in China. During this period, enormous population flows occur and visits to relatives and friends surge, which makes public health surveillance more difficult, particularly the supervision of drinking water and food. This warrants strengthened monitoring of drinking water and food, as well as health education on hand washing, during this period. Also, two major prevention strategies, reducing exposure to HEV and inducing immunity by vaccination, have been proven effective globally in the prevention and control of HEV infection 50,51. As its clinical application matures, the HEV vaccine should be considered for an annual immunization plan for specific populations before January in China. Additionally, whereas the high-risk population in a previous study was aged 20 to 40, the vulnerable population in our study was aged between 40 and 60 52. Accordingly, the target population for health education and vaccination strategies should be adjusted. ArcGIS and scan statistics have been widely applied to detect abnormal patterns of temporal and spatial clusters in epidemiological studies 41,[53][54][55]. In the present study, the incidence maps with IDW revealed a higher incidence of HEV infection in the northwest counties from 2007 to 2012, which agreed with the findings of the trend surface analysis.
The significant clustering pattern of the HEV epidemics from 2007 to 2012 was first demonstrated by the general spatial autocorrelation statistics. Then, a total of 29 high-high clusters were discerned by local autocorrelation analysis in the spatial dimension. Based on these findings, the identified locations should pay more attention to the prevention and control of HEV infection, and more clinical resources should be allocated to high-risk locations. Interestingly, nearly 73 percent of the high-high clusters were located in the administrative areas of Hangzhou Municipality in the northwest of Zhejiang Province. The spatio-temporal analysis identified three clusters, the most likely of which was also in Hangzhou, with a relative risk of 2.39. We suspected that such a distribution pattern might be attributable to a lack of clean water sources and an inadequate sewage system. To examine this hypothesis and further explore contributing factors, we investigated the main rivers in these clustering areas. We found that the clusters lay along three river systems: the Qiantang River, Jiao River and Ou River. Meanwhile, Lin'an City was a husbandry base for sheep and cattle, and Yuhang County served as the main pork and poultry base in Hangzhou 56. Therefore, the inadequate sewage system and the improper disposal of animal waste in these areas might have contributed to HEV infection along the Qiantang River, partly explaining the high incidence in Hangzhou. Additionally, the Jiao River and Ou River, both of which reach the East China Sea at their estuaries, have abundant fishing resources, and some infected marine products such as oysters might have led to the epidemics in these clustering areas. Although in-depth field investigations were not performed to verify our hypothesis, the findings could help with the control and prevention of HEV infection in Zhejiang Province.
Because forecasting models differ in their mechanisms and preconditions, each has its own limitations in forecasting epidemics. Given the obvious seasonal trend of HEV infection, time series models were created. Both models fit our data well, and the simple seasonal ESM was chosen as the optimal time series model because of its smaller errors. In addition, the actual number of reported cases in 2013 further validated its reliability. These findings could provide clues for model selection in the prediction of HEV infection.
Limitations. Several limitations of this study should be mentioned. 1) Some potential carriers or asymptomatic cases might not have been captured by the available reporting system, and the study itself was retrospective, which might have led to bias. In addition, anti-HEV testing was more readily available and/or more widely used in large counties than in rural areas. Hence, the apparent clusters might reflect not only the epidemic but also detection capacity; 2) Local Moran's I could reflect the seriousness of the epidemic in its spatial pattern, but it might be disturbed by population fluctuation; the power of circular scan statistics could be limited in some irregularly shaped counties; 3) Because information on risk factors and on geographic, climatic and socioeconomic conditions was inadequate, a more intensive study with correlation analysis and regression models was impossible; 4) Some hypotheses have not been verified because further field epidemiologic investigations were not carried out; 5) With only two models tested in our study, better models might have been found among artificial neural network models, Markov models and other models.

Conclusions
This study investigated the epidemiological characteristics, spatial distribution, temporal distribution and spatial-temporal distribution of HEV infection in Zhejiang Province from 2007 to 2012. The disease tended to cluster from January to April, in winter and spring, suggesting an underlying seasonal variation. The age range of the susceptible population was found to be 40-60, against the previously reported 20-40. Geographically, HEV infection was scattered across the Province, and clustering was most evident in the northwest areas. The spatial distribution analysis and spatial-temporal distribution analysis identified a total of 29 high-high spots in three different clusters, all associated with river systems. Thus, strengthening food and water supervision is critical in controlling and preventing the epidemics, particularly around the Spring Festival. The seasonal ESM fit better than the ARIMA model in forecasting the reported epidemics of HEV infection. Despite its limitations, this study may contribute to the allocation of health resources, surveillance of high-risk regions, population immunization and identification of possible influential factors.