Spatio-temporal distribution of tuberculosis and the effects of geographical environmental factors in China

Background: Although the World Health Organization reports that the incidence of tuberculosis (TB) in China is decreasing year by year, the burden of tuberculosis in China remains very heavy. Understanding the spatial and temporal distribution pattern of tuberculosis in China and its influencing factors will provide an effective reference for the prevention and treatment of tuberculosis. Methods: Data on TB incidence from 2005 to 2017 were collected. Time series analysis and global spatial autocorrelation were used to analyze the temporal and spatial distribution pattern of tuberculosis incidence in China, and a new method, Geodetector, was used to analyze the geographical environmental factors affecting TB incidence. Results: Except in 2007 and 2008, TB incidence generally decreased. TB shows strong spatial aggregation, and western China is the high-value aggregation area of TB incidence. Socio-economic factors failed to pass the significance test. Meteorological factors and air pollution were found to be related to TB incidence. The interactions between these factors produced mutually reinforcing effects. Conclusion: The spatial and temporal distribution pattern of tuberculosis in China and the geographical environmental factors affecting it were studied in order to provide a reference for formulating tuberculosis prevention and control policies and for increasing investment.

Many studies have shown that the temporal and spatial distribution of TB has complex dynamic characteristics and that its distribution patterns differ across scales. A review of spatial analysis in TB epidemiology found that methods such as spatial scan statistics and Moran's I spatial autocorrelation analysis are widely used in spatial distribution studies [1]. For example, Rao analyzed the spatial pattern of TB in Qinghai Province, China, using Moran's I spatial autocorrelation analysis and the spatial scan statistic [2,3]. The distribution of TB was found to be non-random, with obvious clustering. The same result was found in studies of the spatial distribution of TB in Linyi and Zhejiang, China [4,5]. TB also has distinct characteristics on the time scale. In a long time series study, Sadeq found that the incidence of TB in Morocco declined from 2005 to 2014 [6].
Although the incidence of TB had been decreasing, it also showed seasonal characteristics within the same year. A study of TB in Lahore, Pakistan, found distinct seasonality, with high incidence in summer and low incidence in winter. A more detailed study of China also showed this feature [7]: the highest incidence of TB was in April in 2009 and in June and July in 2010 and 2011, and the lowest was in winter. The seasonality of TB incidence may be associated with environmental factors [5]. In You's research on TB in Beijing and Hong Kong, descriptive analysis and Poisson regression analysis showed that an increase in the number of TB cases notified in the current month was significantly related to increased monthly PM2.5 concentrations [8].
Climate change in recent years has led to extreme temperatures, air pollution and other disasters. The incidence of TB has been found to be significantly correlated with regional geographical environmental factors such as temperature, wind speed and air pollution [9]. Keerqinfu established a time series model of TB and meteorological factors in Beijing. A correlation was found between the number of TB cases and meteorological factors, and the future number of TB cases was predicted by combining seasonal meteorological factors with a SARIMAX model [10]. Fernandes studied the relationship of climatic factors and air quality with TB in the Brazilian Federal District [11]. The results showed that meteorological factors such as temperature, humidity and precipitation, and changes in the concentrations of air pollutants such as CO, SO2 and NO2, were risk factors for TB. Zhu explored the relationship between the incidence of TB and air pollutants in Chengdu, China, and concluded that PM10, NO2 and SO2 were positively correlated with the incidence of TB, with a lagged effect [12]. A recent study on the relationship between pulmonary TB and SO2 in Ningbo found that short-term exposure to low SO2 levels may reduce the risk of TB [13].
In this work, time series analysis and Moran's I spatial autocorrelation analysis were used to measure the temporal and spatial distribution pattern of TB in China. A new method, Geodetector, was used to explore the impact of socio-economic factors, meteorological factors and air pollutants on TB incidence, in order to provide a basis for preventing and reducing TB incidence.

Data sources
The 2005~2017 TB incidence in China was collected from the Public Health Science Data

Spatial autocorrelation method
The spatial autocorrelation method is used to measure the degree of interdependence between the geographic data of a location and other data of the same kind. Spatial autocorrelation can be divided into global spatial autocorrelation and regional spatial autocorrelation. Global spatial autocorrelation represents the overall degree of spatial aggregation and correlation, and regional spatial autocorrelation represents hot spots in space. Moran's I is used to measure the spatial interdependence between data and to characterize their spatial distribution type. Moran's I ranges from -1 to 1. I > 0 indicates positive spatial correlation: the larger the value, the stronger the spatial agglomeration. I < 0 indicates negative spatial correlation: the smaller the value, the greater the spatial difference. I close to 0 indicates spatial randomness.
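As an illustration of the global statistic described above, the following is a minimal sketch of Moran's I in its usual textbook form, I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all weights. The function name morans_i, the four-region chain and the incidence values are hypothetical and are not taken from the study data; the paper does not state which weight matrix was used, so binary contiguity weights are assumed here.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I for one attribute and a spatial weight matrix.

    values  : 1-D sequence of n observations (e.g. incidence per region).
    weights : n x n matrix, weights[i][j] > 0 if regions i and j are neighbours.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    z = x - x.mean()                      # deviations from the mean
    cross = (w * np.outer(z, z)).sum()    # sum_ij w_ij * z_i * z_j
    return (n / w.sum()) * cross / (z ** 2).sum()


# Hypothetical example: four regions in a row, neighbours share a border.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

clustered = morans_i([10, 9, 2, 1], w)      # similar values are adjacent
alternating = morans_i([10, 1, 10, 1], w)   # neighbours always differ

print(clustered > 0, alternating < 0)
```

In this toy case the clustered layout gives a positive I and the alternating layout gives a negative I, which matches the interpretation of the sign given above. A significance test (for example by permutation) would be needed before drawing conclusions from real data.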

Geodetector method
Spatial stratified heterogeneity refers to the difference in an attribute between different types or regions. It can be identified and tested with Geodetector [14]. The basic principle of Geodetector is that if a variable is an influencing factor of an attribute, the spatial distributions of the two should be similar [15].
The factor detector is used to detect the spatial differentiation of TB incidence and the explanatory power of influencing factors. It is measured by the q value:
$$q = 1 - \frac{\sum_{m=1}^{L} N_m \sigma_m^2}{N \sigma^2} \qquad (1)$$
where q is the explanatory power of the influencing factor; m = 1, 2, ..., L indexes the categories (layers) of the factor, and L is the number of categories; $N_m$ and $N$ are the numbers of units in layer m and in the whole region; $\sigma_m^2$ and $\sigma^2$ are the variances of TB incidence in layer m and in the whole region, respectively. The range of q is [0,1]. The larger the q value, the stronger the explanatory power of the factor for the spatial differentiation of TB.
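The factor detector described above can be sketched in a few lines. This is a minimal illustration of the q value as one minus the ratio of the size-weighted within-layer variance to the total variance. It is not the Geodetector software itself, the function name q_statistic and the toy data are hypothetical, and the significance test of q is omitted.

```python
import numpy as np


def q_statistic(y, strata):
    """Geodetector-style q value of one categorical factor.

    y      : 1-D sequence of the attribute (e.g. TB incidence per unit).
    strata : 1-D sequence of category labels of the factor, same length.
    """
    y = np.asarray(y, dtype=float)
    strata = np.asarray(strata)
    total = y.size * y.var()              # N * sigma^2 (population variance)
    if total == 0:
        raise ValueError("attribute has zero variance; q is undefined")
    within = 0.0
    for label in np.unique(strata):
        layer = y[strata == label]
        within += layer.size * layer.var()  # N_m * sigma_m^2
    return 1.0 - within / total


# Hypothetical toy data: the factor separates low from high values perfectly ...
q_perfect = q_statistic([1, 1, 5, 5], ["a", "a", "b", "b"])
# ... or not at all (both layers have the same spread as the whole region).
q_none = q_statistic([1, 5, 1, 5], ["a", "a", "b", "b"])

print(q_perfect, q_none)
```

A factor whose layers are internally homogeneous gives q close to 1, and a factor whose layers look like the region as a whole gives q close to 0.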
The occurrence of disease is a comprehensive process. Different influencing factors may act on a disease synergistically or antagonistically, so the interaction between two factors can weaken or enhance their explanatory power for the disease. Interaction detection first calculates the determinant power of each of two influencing factors x and y, namely q(x) and q(y), then calculates q(x, y) for the layers formed by overlaying the spatial distributions of the two factors, and finally compares q(x), q(y) and q(x, y) to determine the type of interaction.

Time pattern of TB incidence
On a seasonal scale, TB incidence differed markedly between seasons. It was highest in spring, followed by winter, and lowest in autumn. According to the monthly incidence rate curve, TB incidence within a year followed a very clear temporal pattern. It was highest in January and lowest in December, and it decreased significantly in February. After a short increase in March, it decreased gradually. Figure 2 shows the geographical distribution and change of TB incidence at city level in China. Incidence fluctuated greatly, and there was a trend of growth in some regions. This may be related to reporting: statistical methods were constantly improving and the population covered by the statistics became more complete, which may have caused an abnormal increase in recorded TB incidence. Overall, TB incidence in the eastern coastal and northern areas was relatively low. In northwest and southern China, TB incidence was high and the disease burden was heavy. TB incidence showed a decreasing trend from west to east and from south to north.

Spatial patterns of TB incidence
TB incidence is high in western China. The higher incidence is generally found in western regions such as Xinjiang, Tibet, Qinghai and Guizhou. From Figure 2, it can be concluded that the western region is the key area for strengthening the prevention and control of tuberculosis in the future. In order to quantitatively evaluate the spatial distribution and aggregation of tuberculosis in China, global spatial autocorrelation analysis and Getis-Ord Gi* statistics were used to calculate the global Moran's I index and the incidence rate hotspots, reflecting the clustering characteristics of the spatial distribution of TB in China.
TB incidence in China shows clear spatial aggregation. The Moran's I index is used to quantify the spatial clustering of disease. Building on the global spatial autocorrelation analysis, Getis-Ord Gi* statistics are used to identify the hotspots and coldspots of TB incidence. Figure 3 shows the coldspots and hotspots of TB incidence in China in 2007~2017 at the 90%, 95% and 99% confidence levels. According to the Z value of the Getis-Ord Gi* statistic, areas can be divided into four levels by the thresholds ± 1.65, ± 1.96 and ± 2.58, which indicate whether a coldspot or hotspot is significant. The hotspots of tuberculosis incidence in China are mainly in the southern part of Xinjiang, Qinghai, Guizhou, Hunan, Chongqing and Guangxi. The coldspot areas are located in the eastern coastal area, and the southwest of Yunnan also showed coldspot characteristics in 2013 and 2015. It can be clearly observed from the figure that the coldspot areas of TB incidence are gradually increasing and, in recent years, have extended from the eastern coast to the inland. The hotspot areas of TB incidence are still located in the northwest and south of China. However, the hotspots of tuberculosis in southern China tend to expand further south.
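The mapping from a Gi* Z value to the confidence levels shown in the hotspot maps can be sketched as follows. The thresholds are the conventional two-sided standard normal critical values (about 1.65, 1.96 and 2.58 for 90%, 95% and 99%), which is an assumption about how the levels were assigned; the function name classify_gi_z and the example Z values are hypothetical.

```python
def classify_gi_z(z):
    """Label a Getis-Ord Gi* Z value as hot spot, cold spot or not significant.

    Uses two-sided standard normal critical values:
    |z| >= 1.65 -> 90%, |z| >= 1.96 -> 95%, |z| >= 2.58 -> 99%.
    """
    magnitude = abs(z)
    if magnitude >= 2.58:
        level = 99
    elif magnitude >= 1.96:
        level = 95
    elif magnitude >= 1.65:
        level = 90
    else:
        return "not significant"
    kind = "hot spot" if z > 0 else "cold spot"   # sign gives the direction
    return f"{kind} ({level}% confidence)"


# Hypothetical Z values for four areas.
labels = [classify_gi_z(z) for z in (2.7, -2.0, -1.7, 1.0)]
print(labels)
```

Positive Z values mark clusters of high incidence and negative values mark clusters of low incidence, so the sign and the magnitude together give the four levels used on each side of the map legend.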

Correlation of environmental factors and TB
In order to study the spatial heterogeneity of TB incidence and explore the driving factors that affect its spatial distribution, three types of geographical environmental factors were selected. Table 4 shows the specific factors. The factors were processed in ArcGIS 10.2. Since each factor was available only as meteorological station or point data, the Kriging interpolation method was used to carry out optimal linear unbiased interpolation and obtain geographical environmental factor data covering China.
The cut-off values were determined through natural breaks. The natural breaks method is a statistical method that classifies values according to their numerical statistical distribution.
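To make the idea of natural breaks concrete, the sketch below finds class boundaries that minimize the within-class sum of squared deviations using dynamic programming, in the spirit of the Fisher-Jenks approach. It is an assumption that this corresponds to the natural breaks option used in the GIS software; the function name natural_breaks and the sample values are hypothetical, and the cubic-time loop is only suitable for small inputs.

```python
import numpy as np


def natural_breaks(values, k):
    """Upper bounds of k classes minimizing within-class squared deviations."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")
    # Prefix sums give the squared deviation of any slice x[i:j] in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best total squared deviation for x[:j] split into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):          # last class is x[i:j]
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i
    # Walk back through the stored split points to recover the class bounds.
    bounds, j = [], n
    for c in range(k, 0, -1):
        bounds.append(float(x[j - 1]))
        j = split[c][j]
    return bounds[::-1]


# Hypothetical values with three obvious groups.
breaks = natural_breaks([1, 2, 3, 10, 11, 12, 50, 51, 52], 3)
print(breaks)
```

Each returned number is the largest value in its class, so a new observation is assigned to the first class whose upper bound it does not exceed.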
Finally, each geographical environment factor was divided into six categories. Based on the Geodetector method, the TB incidence in 2017 was selected as a sample to explore the similarity between the spatial distribution of TB incidence and geographical environment factors. According to the TB incidence and the classification results of selected factors, Formula 1 was used to calculate the determinant power of the factors and the p value. Table 5 shows the factor detection results of three kinds of geographical environment factors.
The results show that the q values of population density and GDP were 0.011 and 0.009 respectively, and neither was significant. Other socio-economic factors also failed to pass the significance test. The effects of annual mean wind speed, annual mean air pressure and NDVI on the spatial distribution of TB incidence were not significant. The meteorological factors annual mean relative humidity and annual mean temperature have obvious determinant power on the spatial distribution of TB incidence: annual mean relative humidity is the main controlling factor, followed by annual mean temperature.
Table 6 shows the detection results for the air pollutant factors. For the six air pollutants, the determinant power on TB incidence was, in descending order, PM10 (0.202), O3 (0.199), NO2 (0.172), PM2.5 (0.147), SO2 (0.070) and CO (0.060). The q-statistic of each air pollutant passed the significance test, so all six air pollutants were associated with TB incidence. PM10 is the main controlling factor of the spatial differentiation of TB incidence, followed by O3.
Interaction detection is used to explore whether the interaction between single influencing factors has an impact on TB incidence, and what type of impact it has. Table 7 shows that the q-statistics of the interactions were larger than those of the single factors. All interaction q-statistics also passed the significance test (P < 0.05). X7∩X8 had the highest value (0.402), which means that the geographical distribution of the combination of annual mean relative humidity and annual mean pressure is more similar to that of TB incidence. The interaction types of X7∩X12 and X7∩X15 were double factor enhancement, and the others were nonlinear enhancement.
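The interaction types named above follow from comparing q(x), q(y) and q(x, y). The sketch below uses the comparison rules commonly given for Geodetector (weakening below the single-factor values, double factor enhancement above the larger one, nonlinear enhancement above the sum); the paper does not list these rules explicitly, so they are an assumption here. The function name interaction_type, the tolerance and all q values in the example are hypothetical and are not taken from Table 7.

```python
def interaction_type(qx, qy, qxy, tol=1e-9):
    """Classify the interaction of two factors from their q values.

    qx, qy : q values of the single factors x and y.
    qxy    : q value of the overlay of x and y.
    tol    : tolerance for treating qxy as equal to qx + qy.
    """
    low, high, total = min(qx, qy), max(qx, qy), qx + qy
    if abs(qxy - total) <= tol:
        return "independent"
    if qxy > total:
        return "nonlinear enhancement"
    if qxy > high:
        return "double factor enhancement"
    if qxy > low:
        return "single factor nonlinear weakening"
    return "nonlinear weakening"


# Illustrative q values only.
examples = [
    interaction_type(0.20, 0.10, 0.40),  # above the sum
    interaction_type(0.20, 0.15, 0.30),  # above the larger, below the sum
    interaction_type(0.20, 0.10, 0.30),  # equal to the sum
    interaction_type(0.20, 0.10, 0.15),  # between the two single values
    interaction_type(0.20, 0.10, 0.05),  # below both
]
print(examples)
```

Under these rules, an interaction reported as nonlinear enhancement means that the overlay explains more of the spatial differentiation than the two factors would if their q values were simply added.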

Discussion
TB incidence differs in its geographical distribution because meteorological factors vary between regions, social and economic factors are unevenly distributed in China, and air pollution in some areas has been aggravated by industrialization and urbanization. Studying the spatial and temporal distribution pattern and influencing factors of TB incidence helps to improve our understanding of the disease and provides an important reference for the effective prevention and treatment of TB and for national policy formulation. The study found that China's TB incidence decreased significantly from 2005 to 2017 and has obvious temporal characteristics. It also showed a clear spatial pattern. TB incidence was higher in northwest and south China and gradually decreased from west to east and from south to north. By combining TB incidence with meteorological factors, socio-economic factors and air pollutants, it was found that the correlation between socio-economic factors and TB incidence was not obvious, whereas the correlations of humidity, air temperature and air pollutants with TB incidence were obvious.
This study concluded that TB incidence decreased year by year. After the WHO put forward the goal of ending TB, China responded positively by implementing measures such as strengthening TB publicity and printing and distributing the action plan to Stop TB (2019-2022), and achieved remarkable results. The study found that TB incidence had obvious seasonal characteristics, in agreement with the conclusions of regional studies in China. TB incidence was highest in January and decreased in February. It increased in March and then continued to decline, and TB incidence in December was the lowest. Other regional studies have found that the temporal pattern of TB incidence in China shows lags between regions [12,16].
Seasonality of TB incidence has also been observed in other countries [17,18].
Medical geographic mapping is not only a research method but also an important means of presenting the research results of medical geography. A medical geographic map can directly express the relationship between disease and geographical environment. According to the geographical distribution map of TB incidence, our research found that the high-value areas of TB incidence are in the western and southern parts of China. These areas are also hot spots, whereas the eastern coastal areas and the northern areas are low-value areas of TB incidence. The study found that TB incidence had an obvious aggregation trend. Similar studies also showed that TB cases were clustered [19,20]. TB is transmitted by air and is greatly affected by population density and living conditions, which leads to a strongly aggregated distribution of TB [21].
This study used the Geodetector method to study the driving factors of TB incidence. Geodetector is a new method developed in recent years to explore the spatial variability of variables. It has been successfully applied to the study of the influencing factors of many diseases [22,23]. This study found that the association between socio-economic factors and TB incidence was not obvious.
However, studies have shown that TB is a poverty-related disease [24]. GDP represents the economic development of a region, and GDP and medical and health expenditure indirectly reflect the health level of the region. The detection and treatment of TB in developed areas are more convenient and accurate than in less developed areas. Medical conditions in economically developed areas are also better, and the treatment of TB is more timely and efficient [25]. No corresponding conclusion was reached in this paper, which may be related to the method and the data used. The scale effect of the data may conceal the correlation. Geodetector showed that meteorological factors and air pollutants also have impacts on TB incidence. The study also considered the interactions between different factors in their impact on TB incidence. Other studies on TB have shown similar results [19,20,26]. Temperature is a protective factor for the onset of TB [19,27]. Increasing temperature will inhibit the growth of TB bacteria. Humidity affects the immune system. During the rainy season in Cameroon, high TB incidence was also observed [28]. Air pollutants can affect the human respiratory system and also the transmission vector of TB bacteria. Various studies on air pollutants have shown their impacts on TB [13,29,30]. This study used the Geodetector method to study the relationship between the spatial distribution pattern of TB incidence and the influencing factors, which again demonstrates the applicability of the Geodetector method in public health.
There are still some shortcomings in this study. As a medical geography study, it uses the Geodetector method, which can only quantify the statistical relationship between manually selected factors and disease. It cannot strictly assess the internal relationship between socio-economic factors, meteorological factors and air pollutants on the one hand and TB incidence on the other. Moreover, because all TB cases are aggregated within an area (administrative boundary), the scale effect may enlarge the error, and some correlations might be concealed. The influencing factors of TB are very complex.
Therefore, in the selection of influencing factors, this study carefully selected factors used in other medical and medical geography research. These factors have been shown to be related to TB in medicine. At present, the situation of TB prevention and control is still very serious. Research on the spatial and temporal distribution patterns of TB and its influencing factors can provide a basis for focused, efficient and targeted prevention and treatment of TB.

Conclusion
TB incidence in China decreased over 2005-2017. There were significant hot spots in western and southern China. TB incidence in the eastern coastal areas and northern areas was low. Meteorological factors and air pollutants were found to be associated with TB incidence. The interaction between the various factors produces a mutual enhancement effect, which needs further study. The results showed strong spatial aggregation of TB, which suggests that more attention should be paid to detection and that targeted investment should be increased in the aggregation areas.