Spatial and space-time clustering of tuberculosis in Gurage Zone, Southern Ethiopia

Introduction Spatial targeting is advocated as an effective method that contributes to achieving tuberculosis control in high-burden countries. However, few studies have clarified the spatial nature of the disease in these countries. This study aims to identify the location, size and risk of purely spatial and space-time clusters for high occurrence of tuberculosis in Gurage Zone, Southern Ethiopia during 2007 to 2016. Materials and methods Data on a total of 15,805 patients, retrieved from unit TB registers, were included in the final analyses. The spatial and space-time cluster analyses were performed using the global Moran's I, Getis-Ord Gi* and Kulldorff's scan statistics. Results Eleven purely spatial and three space-time clusters were detected (P < 0.001). The clusters were concentrated in border areas of the Gurage Zone. There was considerable spatial variation in the risk of tuberculosis by year during the study period. Conclusions This study showed that tuberculosis clusters were mainly concentrated in border areas of the Gurage Zone during the study period, suggesting sustained transmission of the disease within these locations. The findings may help intensify the implementation of tuberculosis control activities in these locations. Further study is warranted to explore the roles of various ecological factors in the observed spatial distribution of tuberculosis.

Introduction Ethiopia remains among the high tuberculosis (TB)-burden countries in the world, with an estimated annual incidence of 177 per 100,000 population [1]. TB places an extraordinary public health, financial and social burden on the country [2,3]. It is one of the most important infectious diseases, ranking as a leading cause of death and the second leading cause of hospital admission [4,5]. Patients with TB face various levels of isolation and rejection, including loss of employment, reduced educational opportunities, vulnerability to disability, and divorce or spoiled marriage prospects [6]. Moreover, co-infection with human immunodeficiency virus (HIV) and the emergence of resistance to numerous anti-TB agents are recognized as growing problems in the country [7,1].
During the past two decades, Geographic Information Systems (GIS) and spatial statistics have been used to detect spatial and space-time clustering of TB in many developing countries [8][9][10][11][12]. Several studies have been conducted at the national, regional and district levels to investigate the epidemiology of TB in Ethiopia [13][14][15][16]. Although these studies provide helpful information for TB control programs, spatial context such as the clustering patterns of the disease has rarely been taken into account. Spatial and space-time cluster analyses of TB may help public health officials identify the high-risk geographical areas and population groups that require targeted interventions [8][9][10][11][12]. Lack of such information may contribute to the partial effectiveness of TB control programs in Ethiopia, where the national TB control program applies a uniform approach to allocating resources across the regions.
This study aims to identify the location, size and risk of purely spatial and space-time clusters for high occurrence of TB in Gurage Zone, Southern Ethiopia during 2007 to 2016. The information may contribute to more effective budget allocation, active search for symptomatic patients, drug distribution, recruitment of skilled human resources, guiding the design of vaccination programs, community awareness creation through public health advocacy, and identifying factors behind the spread of the disease in high-risk areas.

Study area
The study was conducted in the Gurage Zone, Southern Ethiopia, which is located between 7˚6' and 8˚45' N latitude and 37˚46' and 38˚71' E longitude (Fig 1). The zone is divided into 13 districts and two town administrations (Butajira and Wolkite). There are 403 rural and 20 urban kebeles (the smallest administrative units, with an average population of 5,000) in the zone. About 84% of the population lives in rural areas [17]. The zone had an average population of 1,453,531 during 2007 to 2016 [18].
The zone has a total of 6 hospitals, 70 health centers, 414 health posts and 92 clinics involved in the prevention and control of TB. Through trained health extension workers from the community, the health posts and clinics provide health education, identify suspected cases, refer patients to health centers and support treatment. Health centers perform sputum microscopy and treatment, and refer smear-negative and extra-pulmonary cases to hospitals for further management, while hospitals provide diagnosis, treatment and inpatient care services [19].
The zone was selected for this study for two main reasons. First, the zone is geographically diverse, with mountain ranges and valleys, which may affect geographic access to TB diagnosis and treatment centers [20][21]. Hence, TB cases were assumed to cluster in specific geographic locations, in which case uniform TB control measures may not be effective in mitigating the problem. Second, according to the 2016 report of the Southern Nations, Nationalities, and Peoples' Regional State Health Department, the zone is among the hardest-hit areas by TB in the region [19].

Data sources
The study was conducted from June to September 2017. The list of DOTS-providing health facilities was obtained from the Gurage Zone Health Department database. Trained data collectors retrieved patient information on sex, age, address, TB type, patient category and date of treatment initiation from unit TB registers at DOTS-providing health facilities in the Gurage Zone for 2007 to 2016. Patients' addresses with similar names but different geo-locations were linked to their correct geo-locations to prevent duplication. TB cases were diagnosed using pathogen detection, X-ray and pathologic diagnosis according to the diagnostic criteria recommended by the national TB diagnosis and treatment guideline of Ethiopia [22]. The population and geo-location data for each kebele in the zone were obtained from the Central Statistical Agency of Ethiopia (CSA).
Data quality control. The training of data collectors and supervisors emphasized issues such as the data extraction format, field methods and record keeping. The data were double entered and checked page by page, by year, district, kebele and health facility, against the unit TB registers for consistency and completeness throughout data collection.
Data management and processing. Data entry, validation, cleaning and coding were performed in MS Excel (Microsoft, Redmond, WA, USA). In the study area, TB data were reported by Basic Management Units. These reports might include cases from outside the administrative catchment, or miss cases from the catchment who enrolled at neighboring health facilities, because patients may cross administrative boundaries when seeking health services for reasons of access, preference and quality of care. Aggregating data by each patient's correct address could help reveal the true spatial nature of the TB burden in the study area. Therefore, the patient data were aggregated at kebele level for the spatial analyses in this study. Kebele centroids were used as coordinates representing a geographically weighted central location.
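The aggregation step above can be illustrated with a minimal Python sketch. The study itself did this in MS Excel; the function name `kebele_prevalence`, the kebele identifiers and the input layout (one `(kebele_id, year)` tuple per patient, already linked to the correct home kebele) are hypothetical assumptions made for illustration. The rate follows the definition used later in the methods: cases divided by the population of a given year, multiplied by 100,000.

```python
from collections import Counter

def kebele_prevalence(records, population):
    """Aggregate patient records to kebele-year counts and rates per 100,000.

    records: iterable of (kebele_id, year) tuples, one per registered patient,
             already linked to the patient's correct home kebele (assumption).
    population: dict mapping (kebele_id, year) -> projected population.
    Records whose (kebele_id, year) has no population entry are ignored.
    """
    counts = Counter((kebele, year) for kebele, year in records)
    rates = {}
    for key, pop in population.items():
        cases = counts.get(key, 0)  # kebeles with no registered case get 0
        rates[key] = cases * 100_000 / pop
    return counts, rates

# Hypothetical example: three patients from two kebeles in 2010.
records = [("K001", 2010), ("K001", 2010), ("K002", 2010)]
population = {("K001", 2010): 5000, ("K002", 2010): 4000}
counts, rates = kebele_prevalence(records, population)
print(rates)  # {('K001', 2010): 40.0, ('K002', 2010): 25.0}
```

Iterating over the population table rather than over the records keeps kebeles with zero cases in the output, which matters for spatial statistics that need a value for every unit.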

Spatial autocorrelation analyses
The global Moran's I statistic was computed in ArcGIS 10.2 to examine the presence of spatial clustering of TB across the whole study area. Moran's I is based on the deviations of neighboring values from the overall mean. The following equation is used to calculate the Moran's I statistic [8]:

$$I = \frac{n}{S_0}\,\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} \omega_{i,j}\, z_i z_j}{\sum_{i=1}^{n} z_i^2}$$

where $z_i$ is the deviation of the TB prevalence for kebele $i$ from its mean $(X_i - \bar{X})$, $z_j$ is the deviation of the TB prevalence for kebele $j$ from its mean $(X_j - \bar{X})$, $\omega_{i,j}$ is the spatial weight between kebeles $i$ and $j$, $n$ is the total number of kebeles, and $S_0$ is the aggregate of all the spatial weights:

$$S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} \omega_{i,j}$$

The $z_I$-score for the statistic is computed as:

$$z_I = \frac{I - E[I]}{\sqrt{V[I]}}$$

where

$$E[I] = -\frac{1}{n-1}, \qquad V[I] = E[I^2] - E[I]^2$$

The spatial relationships among kebeles were conceptualized by calculating the spatial weights from the input file containing the TB prevalence rate for each kebele (the number of TB cases divided by the population of a given year, multiplied by 100,000) and the geo-coordinates data. A first-order queen polygon contiguity weights matrix, which defines neighbors as kebeles sharing either a border or a vertex, was used for the spatial weights [10]. The spatial weights matrix was constructed in the Geographical data analysis tool (GeoDa) using the kebele-level polygon shapefile. Statistical significance for high occurrence of TB was declared when the Z-score was ≥ 1.96 and the P-value ≤ 0.05.
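As a minimal, self-contained Python sketch of the global Moran's I calculation above, the code below computes $I$, $E[I]$ and $z_I$ from a dense weight matrix. The study used ArcGIS 10.2 and GeoDa, not this code. Three assumptions apply. First, $V[I]$ uses the Cliff-Ord variance under the randomization assumption, which is our understanding of what ArcGIS documents. Second, the hypothetical helper `queen_weights_grid` builds queen contiguity on a regular grid of cells, standing in for real kebele polygons, which would need shapefile geometry. Third, the variance formula requires at least four units.

```python
import math

def queen_weights_grid(rows, cols):
    """Binary first-order queen contiguity for a rows x cols grid of cells.

    Cells are neighbors if they share an edge or a corner (vertex).
    Returns a dense n x n list-of-lists weight matrix with a zero diagonal.
    """
    n = rows * cols
    w = [[0.0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not its own neighbor
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        w[i][rr * cols + cc] = 1.0
    return w

def morans_i(x, w):
    """Global Moran's I, E[I] and z-score (randomization variance, n >= 4)."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]                      # deviations from the mean
    s0 = sum(sum(row) for row in w)                  # aggregate of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    m2 = sum(zi * zi for zi in z)
    moran = (n / s0) * cross / m2

    expected = -1.0 / (n - 1)
    # Cliff-Ord randomization moments: V[I] = E[I^2] - E[I]^2.
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum((sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2 for i in range(n))
    b2 = n * sum(zi ** 4 for zi in z) / m2 ** 2      # sample kurtosis
    a = n * ((n * n - 3 * n + 3) * s1 - n * s2 + 3 * s0 ** 2)
    d = b2 * ((n * n - n) * s1 - 2 * n * s2 + 6 * s0 ** 2)
    c = (n - 1) * (n - 2) * (n - 3) * s0 ** 2
    variance = (a - d) / c - expected ** 2
    z_score = (moran - expected) / math.sqrt(variance)
    return moran, expected, z_score

# Usage: four kebeles in a chain (0-1-2-3) with rising prevalence.
chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
i_stat, e_i, z_i = morans_i([1.0, 2.0, 3.0, 4.0], chain)
# Here I = 1/3, above E[I] = -1/3, so z is positive (similar values adjoin).
```

A positive $z_I$ above 1.96 would indicate that kebeles with similar prevalence tend to be neighbors, which is how the text interprets a significant result.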

Purely spatial and space-time cluster analyses
Kulldorff's scan statistic was applied in SaTScan 9.2 to identify the location, size and severity of purely spatial and space-time clusters, using the number of TB cases, the population of each kebele, the year of TB diagnosis and the geo-coordinates data as input files (S1 and S2 Tables) [23]. The discrete Poisson model was used, assuming that the number of TB cases at each location was Poisson distributed with a known population at risk. Scanning circles of varying size were used to identify purely spatial clusters for high occurrence of TB. The upper limit for the cluster size was set to 50% of the population at risk, which allowed both small and large clusters to be detected. To identify space-time clusters for high occurrence of TB, a cylindrical window was used, with a circular geographic base representing space and a height representing time. The size of the window was limited to 50% of the expected number of TB cases, and the time window was set to the study period from 2007 to 2016. In both analyses, the likelihood ratio was computed for each window, the relative risk (RR) of TB occurrence within the cluster compared with outside it was estimated, and statistical significance was assessed by Monte Carlo simulation with the maximum number of replications set to 99,999. The cluster with the maximum log likelihood ratio (LLR) was defined as the most likely cluster. The P-value was obtained using a combination of approximations [23]. A 'no geographical overlap' criterion was selected for reporting secondary clusters. Statistical significance was declared when the P-value was ≤ 0.05.

The Getis-Ord Gi* statistic was also computed in ArcGIS 10.2 to identify the locations of clusters for high occurrence of TB. The Gi* statistic evaluates each kebele within the context of its neighboring kebeles: the local sum for a kebele and its neighbors is compared proportionally with the sum over all kebeles.
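The purely spatial discrete Poisson scan described above can be outlined in a simplified Python sketch. It is not SaTScan's implementation. Circles are centered on each kebele centroid and grown by nearest neighbors up to 50% of the total population. Each circle with more cases than expected ($c > E$, out of $C$ cases overall) is scored with the Kulldorff log likelihood ratio $c\ln(c/E) + (C-c)\ln\big((C-c)/(C-E)\big)$. Significance comes from a standard Monte Carlo test that redistributes the observed total across kebeles in proportion to population. All function names are hypothetical. The sketch also assumes planar coordinates, does not group tied distances, and reports only the most likely cluster. It omits secondary clusters, space-time cylinders and SaTScan's own P-value options.

```python
import math
import random

def poisson_llr(c, e, total):
    """Kulldorff LLR for c observed and e expected cases out of `total`;
    only windows with excess risk (c > e) are scored."""
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total > c:
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr

def candidate_windows(coords, pop, max_frac=0.5):
    """Circles centered on each centroid, grown by nearest neighbors until the
    window would exceed `max_frac` of the total population."""
    limit = max_frac * sum(pop)
    windows = set()
    for center in coords:
        order = sorted(range(len(coords)), key=lambda j: math.dist(center, coords[j]))
        members, p = [], 0.0
        for j in order:
            if p + pop[j] > limit:
                break  # larger circles would only add more population
            members.append(j)
            p += pop[j]
            windows.add(frozenset(members))
    return windows

def most_likely_cluster(cases, pop, windows):
    """Return the window with the highest LLR and that LLR."""
    total, pop_total = sum(cases), sum(pop)
    best, best_llr = None, 0.0
    for win in windows:
        c = sum(cases[j] for j in win)
        e = total * sum(pop[j] for j in win) / pop_total  # expected under H0
        llr = poisson_llr(c, e, total)
        if llr > best_llr:
            best, best_llr = win, llr
    return best, best_llr

def scan(cases, pop, coords, n_sim=999, seed=1):
    windows = candidate_windows(coords, pop)
    best, best_llr = most_likely_cluster(cases, pop, windows)
    if best is None:
        return None  # no window has more cases than expected
    # Monte Carlo: place the same total number of cases at random, with
    # probability proportional to population, re-scan, and rank the observed
    # maximum LLR among the simulated maxima.
    rng = random.Random(seed)
    total = sum(cases)
    exceed = 0
    for _ in range(n_sim):
        sim = [0] * len(pop)
        for j in rng.choices(range(len(pop)), weights=pop, k=total):
            sim[j] += 1
        if most_likely_cluster(sim, pop, windows)[1] >= best_llr:
            exceed += 1
    p_value = (exceed + 1) / (n_sim + 1)
    c = sum(cases[j] for j in best)
    e = total * sum(pop[j] for j in best) / sum(pop)
    # Relative risk inside the cluster versus outside it.
    rr = math.inf if c == total else (c / e) / ((total - c) / (total - e))
    return sorted(best), best_llr, rr, p_value

# Usage: six hypothetical kebeles on a line, equal populations, excess at one end.
cluster, llr, rr, p = scan([30, 28, 5, 4, 6, 5], [1000] * 6,
                           [(float(k), 0.0) for k in range(6)])
```

In this toy input, the two kebeles at the left end form the most likely cluster. The relative risk compares the case rate inside the circle with the rate outside it, which matches the inside-versus-outside comparison in the text.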
When the local sum differs substantially from the expected local sum, and the difference is too large to be the result of random chance, a statistically significant Z-score results. The following equation is used to compute the Gi* statistic [8]:

$$G_i^* = \frac{\sum_{j=1}^{n} \omega_{i,j}\, x_j - \bar{X} \sum_{j=1}^{n} \omega_{i,j}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} \omega_{i,j}^2 - \left(\sum_{j=1}^{n} \omega_{i,j}\right)^2}{n-1}}}$$

where $x_j$ is the TB prevalence for kebele $j$, $\omega_{i,j}$ is the spatial weight between kebeles $i$ and $j$, $n$ is the total number of kebeles, and

$$\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n}, \qquad S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{X}^2}$$

The Gi* statistic is therefore itself a Z-score. As for Moran's I, the spatial weights were calculated from the input file containing the TB prevalence rate for each kebele and the geo-coordinates data, using a first-order queen polygon contiguity matrix, which defines neighbors as kebeles sharing either a border or a vertex [10]. Statistical significance for high occurrence of TB was declared when Gi* ≥ 1.96 and the P-value ≤ 0.05.
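The Gi* formula above can be checked with a minimal Python sketch. It assumes binary weights built from neighbor sets, and it includes each kebele in its own neighborhood ($\omega_{i,i} = 1$), which is the standard feature distinguishing Gi* from Gi. The function name and input layout are hypothetical and are not the ArcGIS tool used in the study.

```python
import math

def getis_ord_gi_star(x, neighbors):
    """Gi* z-scores for each kebele with binary weights.

    x: list of TB prevalence values, one per kebele.
    neighbors: list of sets; neighbors[i] holds the contiguous neighbors of
               kebele i (excluding i). Kebele i is added to its own
               neighborhood, so w_ii = 1 as in Gi*.
    A neighborhood containing every kebele gives a zero denominator.
    """
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - mean ** 2)
    scores = []
    for i in range(n):
        local = set(neighbors[i]) | {i}
        w_sum = len(local)   # sum of binary weights
        w_sq = len(local)    # sum of squared binary weights (1^2 = 1)
        num = sum(x[j] for j in local) - mean * w_sum
        den = s * math.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
        scores.append(num / den)
    return scores

# Usage: five hypothetical kebeles in a chain, with high prevalence at one end.
scores = getis_ord_gi_star([1, 1, 1, 10, 10], [{1}, {0, 2}, {1, 3}, {2, 4}, {3}])
# The last kebele scores approximately 2.0, just above the 1.96 threshold.
```

Because the kebele's own value enters the local sum, a high-prevalence kebele surrounded by low values can still score highly. This is one reason Gi* tends to flag more localized clusters than the circular scan.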

Ethical considerations
The study protocol was reviewed and approved by the Research and Ethical Committee (REC) of the School of Public Health and by the Institutional Review Board (IRB) of the College of Health Sciences, Addis Ababa University. Because the study used data from a retrospective review of unit TB registers for 2007 to 2016, both the REC and the IRB waived the requirement for informed consent, so informed consent was not obtained from the patients. A letter of support was obtained from the Gurage Zone Department of Health to obtain information from all districts and health facilities. The anonymity of cases was maintained by using pseudo-identifiers. Medical records were stored in a secure place to maintain the confidentiality of the cases' clinical information.  Fig 2).

Spatial autocorrelation
The global Moran's I statistic was significant for each year, implying that there were clusters in the distribution of TB in Gurage Zone, Southern Ethiopia (Table 1).

Purely spatial clusters
The purely spatial cluster analyses identified clusters for high occurrence of TB in the peripheral areas of the zone. The most likely cluster, with 291 TB cases (71.19 expected cases), was detected in the southwest of Abeshege district. The size of the cluster was within a   Fig 3).  The nature of the clusters for high occurrence of TB in the study area was evaluated for each year from 2007 to 2016. There was considerable spatial variation in the risk of TB by year (Table 3, Fig 4). Broadly consistent results were obtained with the Gi* statistic (Fig 5).  Fig 6).

Discussion
This study aimed to identify the location, size and risk of purely spatial and space-time clusters for high occurrence of TB in Gurage Zone, Southern Ethiopia during 2007 to 2016. Clusters with a high likelihood of TB occurrence were detected in border areas of the zone. A possible explanation is frequent cross-border population movement from the neighboring border areas of Jimma, Yem, Hadiya, Silte, West Shewa and East Shewa zones for economic and social reasons, which could favor disease transmission in these areas. Similar observations have been reported in other studies [24]. Therefore, future TB prevention and control efforts in these areas should include strengthening health infrastructure, building staff capacity, ensuring early diagnosis and treatment of symptomatic cases, and increasing community awareness. Furthermore, establishing a cross-border collaboration network may also help reduce the disease burden in these areas. Three space-time clusters that persisted for five years were detected in the zone. This might be due to the uniform implementation of TB prevention and control activities in the zone without targeting high-risk geographical areas [10]. Thus, applying GIS and spatial statistical techniques to identify purely spatial and space-time clusters for high occurrence of TB can be recommended for optimal use of TB resources [25].
This study used Kulldorff's scan and Getis-Ord Gi* statistics to detect statistically significant clusters for high occurrence of TB in the Gurage Zone, Southern Ethiopia. Kulldorff's scan statistic is widely used in public health to detect purely spatial and space-time clusters of infectious diseases, including TB [9][10][11]. The method searches for disease clusters without a prior hypothesis about their location, size or time period. Moreover, the Monte Carlo randomization technique used for hypothesis testing gives the empirical joint distribution of the statistics and hence accounts for the correlation among them, providing a P-value that adjusts for multiple testing [26]. In this and previous studies, Kulldorff's scan and Getis-Ord Gi* statistics have generated comparable results in identifying geographic areas with unusually high rates of TB [10,27]. However, some inconsistencies were observed, which may be because the methods rely on different assumptions in determining clusters. Kulldorff's scan statistic maximizes the likelihood ratio of observed to expected TB cases relative to the underlying population in candidate windows and tends to detect larger clusters [23], whereas the Getis-Ord Gi* statistic examines each kebele together with its neighbors and tends to identify more localized clusters [8]. Since there is no single gold-standard method for detecting disease clusters in spatial analyses, it is advisable to use more than one method to cross-validate the results. This study has some limitations. It did not include TB patients who remained undiagnosed, or residents who were diagnosed and treated at health facilities outside the Gurage Zone. These omissions may have led to underestimation of prevalence and could affect the observed distribution of TB.
Another limitation of this study was that Kulldorff's scan statistic used circular spatial scanning windows and space-time cylinders with circular spatial bases, which cannot detect irregularly shaped clusters and may include a few non-significant locations. The annual projected population based on the 2007 census was used to provide up-to-date denominators, which could be affected by uneven population growth across kebeles. In addition, the confounding effects of covariates such as age and sex were not controlled for in the spatial analyses, since geo-coordinates for individual patients could not be obtained from the unit TB registers. This could also bias the results.
A strength of this study was that multiple methods were used to detect spatial clusters for high occurrence of TB in the study area. High-resolution spatial data were used to examine spatial and space-time clustering of TB at the smallest administrative unit, which may be the most useful level for TB control planning. Data covering a long study period (2007 to 2016) were used to evaluate changes in the spatial and space-time clustering of TB. The study covered a wide geographical area containing both urban and rural settings. Errors related to the geocoding of cases were avoided by linking each case to the correct home address using geo-codes from the CSA. The spatial data included all forms of TB in order to identify high-risk geographical locations requiring more focused public health attention.

Conclusions
This study showed that TB clusters were mainly concentrated at border areas of the Gurage Zone, suggesting that there has been sustained transmission of the disease within these locations. The findings may help intensify the implementation of TB control activities in these locations. Further study is warranted to explore the roles of various ecological factors on the observed spatial distribution of TB.
Supporting information S1