The status of geo-environmental health in Mississippi: Application of spatiotemporal statistics to improve health and air quality

Data-enabled research with a spatial perspective may help to combat human diseases in an informed and cost-effective manner. Understanding the changing patterns of environmental degradation is essential for determining community health outcomes such as asthma. In this research, Mississippi asthma-related prevalence data for 2003–2011 were analyzed using spatial statistical techniques in Geographic Information Systems. Geocoding by ZIP code, choropleth mapping, and hotspot analysis were applied to map the spatial data. Disease rates were calculated for every ZIP code region from 2009 to 2011. The highest rates (4–5.5%) were found in Prairie in Monroe County for three consecutive years. Statistically significant hotspots were observed in the urban regions of Jackson and Gulfport, with a steady increase near urban Jackson and in the area between Jackson and the Meridian metropolitan area. For 2009–2011, spatial signatures of urban risk factors were found in densely populated areas, which was confirmed by regression analysis of asthma patients against population data (R² increased linearly to 0.648 for populations of up to 35,000 per ZIP code, and the relationship decreased to 59% as the population increased above 35,000 to a maximum of 47,000 per ZIP code). The correlation coefficient (r) between monthly mean O3 and asthma prevalence was moderately positive during 2009–2011 (r = 0.57). The regression model also indicated that 2011 annual PM2.5 had a statistically significant influence on the aggravation of asthma cases (adjusted R² = 0.93) and that 2011 PM2.5 depended on asthma per capita and the poverty rate as well. The present study indicates that the Jackson urban area and coastal Mississippi should be monitored for disease prevalence in the future. The current results and GIS disease maps may be used by federal and state health authorities to identify at-risk populations and to issue health advisories.

has laid a path to understanding the spatiotemporal dimensions of asthma in MS. Understanding its prevalence and outbreaks may help to make better decisions, save lives, and design healthy communities.
It is hypothesized that the spatiotemporal extent of asthma-related health problems is associated with prevailing air pollution and that the causes of asthma differ between urban and rural areas. The objectives of this research are to: (1) assess and map asthma rates in urban and rural MS, (2) show that asthma is a spatiotemporally significant disease whose distribution does not arise from random events, (3) interpolate and model air quality data for particulate matter (PM2.5) and ground-level ozone (O3), and (4) analyze the statistical association of asthma with air pollution and poverty.

Study area
According to the 2010 census, MS has a population of about 3 million people and a density of 63 persons per square mile [26]. The primary economic activities of Mississippians are agriculture, fishing, mining, and timber. With a land area of 46,923.3 square miles, MS has 82 counties, 5 urbanized areas, 64 urban clusters, 69 urban areas, and 424 ZIP Code Tabulation Areas (see Figure 1) [26].
The flowchart in Figure 2 outlines the methods used to process and analyze the air quality and asthma health data. Geospatial statistical techniques were applied to the air quality data and to data on asthma patient visits. Asthma-related patient data (inpatient, outpatient, and emergency visits) were geocoded to ZIP code boundaries, and hospital network data containing patient bed information were geocoded to street line data (Figure 2). The data were then mapped using quantitative choropleth techniques in ArcGIS. Asthma per capita was calculated using 2010 census population data to understand the differences between urban and rural prevalence. Annual levels of PM2.5 and O3 were spatially interpolated by ordinary kriging.
Asthma per capita data were further explored with spatial statistical techniques (hot spot analysis) to identify hot and cold spots in urban and rural areas (Figure 2). Additionally, census population and poverty data were taken as independent variables and statistically analyzed to understand their association with asthma. Time series (seasonal cycles) of asthma-related visits were also generated to reveal temporal patterns of prevalence. Viable sources of air quality metrics include measurements from routine regulatory and deposition networks, intensive aircraft and ground-based field studies, radiosonde programs, satellite measurements, ground-based remote sensing networks, and focused, fixed-site, and special-purpose networks [27]. Table 1 lists the data types used in the study, their sources, and the spatial resolution at which the data were obtained.
These estimates reflect the economic characteristics of a geographic area over the entire five-year period, and data are available for all geographic areas down to the census block group level [28]. The Census Bureau uses a set of dollar-value thresholds that vary by family size and composition to determine who is in poverty. If a family's total income is less than the appropriate threshold, then that family and every individual in it are considered to be in poverty. Similarly, if an unrelated individual's total income is less than the appropriate threshold, then that individual is considered to be in poverty [29].
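The poverty rule described above can be sketched in Python. This is a minimal illustration, assuming a simple lookup of thresholds by family size; the values in `HYPOTHETICAL_THRESHOLDS` and the names `Family`, `count_in_poverty`, and `poverty_rate` are made up for the example and are not the Census Bureau's published thresholds, which also depend on family composition.

```python
from dataclasses import dataclass

# HYPOTHETICAL thresholds by family size (US dollars). Real Census Bureau
# thresholds also vary by the number of related children and change yearly.
HYPOTHETICAL_THRESHOLDS = {1: 11_000, 2: 14_000, 3: 17_000, 4: 22_000}


@dataclass
class Family:
    members: int          # people in the family (1 = unrelated individual)
    total_income: float   # combined income of all members, in dollars


def count_in_poverty(families, thresholds=HYPOTHETICAL_THRESHOLDS):
    """Count people in poverty under the family-income rule.

    If a family's total income is below the threshold for its size, the
    family and every member in it are counted; a one-person unit models an
    unrelated individual compared with the single-person threshold.
    """
    largest = max(thresholds)
    in_poverty = 0
    for fam in families:
        threshold = thresholds[min(fam.members, largest)]  # cap at largest size listed
        if fam.total_income < threshold:
            in_poverty += fam.members
    return in_poverty


def poverty_rate(families, thresholds=HYPOTHETICAL_THRESHOLDS):
    """Share of all people (0-1) who are in poverty."""
    total = sum(f.members for f in families)
    return count_in_poverty(families, thresholds) / total if total else 0.0


area = [Family(4, 20_000), Family(1, 12_000), Family(3, 16_500)]
print(count_in_poverty(area), poverty_rate(area))
```

Here the four-person and three-person families fall below their thresholds, so all seven of their members are counted, while the single individual is not.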

Geocoding of patient data and hospital addresses
Data were preprocessed in Microsoft Access and split by year for further analysis in ArcGIS. The first step in the analysis is geocoding, which assigns patient counts to the corresponding ZIP code. An address locator was created for the ZIP code polygons; the address locator tells ArcMap which dataset to use as reference data [30]. Once the address locator was created, patient data were geocoded by ZIP code. On average, 96% of the patient records were geocoded successfully; the remaining 4% could not be matched because the ZIP code information was missing. To visualize patient counts by ZIP code, the patient and ZIP code layers were spatially joined, and a quantitative choropleth mapping technique was applied to show the number of patients per ZIP code. Data were classified into classes of increasing value using the equal interval method, so that every class spans a range of patient numbers of the same width, and a color was assigned to each class. Street line geocoding uses a foundation address database to match the addresses of health events or healthcare facilities. This method was chosen for hospitals because the hospital database contained complete addresses and because the method places points at accurate locations on the ground.
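Equal-interval classification, as used for the choropleth maps, divides the range of the data into classes of identical width. The Python sketch below illustrates the idea; the function names, ZIP labels, and sample counts are hypothetical, and ArcGIS's own classifier may treat values that fall exactly on a class boundary differently.

```python
def equal_interval_breaks(values, n_classes):
    """Upper bound of each class when the data range is split into equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * (k + 1) for k in range(n_classes)]


def classify(value, breaks):
    """0-based index of the first class whose upper bound is >= value."""
    for k, upper in enumerate(breaks):
        if value <= upper:
            return k
    return len(breaks) - 1  # guard against floating-point round-off at the maximum


# Hypothetical patient counts per ZIP code (ZIP labels are placeholders).
patients_by_zip = {"ZIP_A": 0, "ZIP_B": 10, "ZIP_C": 25, "ZIP_D": 50, "ZIP_E": 100}
breaks = equal_interval_breaks(list(patients_by_zip.values()), 4)
classes = {z: classify(n, breaks) for z, n in patients_by_zip.items()}
print(breaks)  # [25.0, 50.0, 75.0, 100.0]
print(classes)
```

Each class index would then be mapped to one color of the chosen color ramp.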
An address locator for street lines was created, the hospitals were geocoded to streets, and fields containing latitude and longitude values were added to the database. The output contained a few unmatched addresses caused by errors in either the hospital address database or the street line database. The unmatched addresses were examined and fixed case by case through interactive rematching.
2.1.1. Finding asthma rates by spatial aggregation
Once patient data were geocoded by ZIP code, the patient counts were joined to the ZIP code polygons. Disease rates were then calculated by dividing the counts by the total population of each ZIP code area.
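The rate calculation amounts to counting geocoded records per ZIP code and dividing by that ZIP code's population. The sketch below assumes each patient record carries one ZIP code (or `None` when the ZIP code is missing) and expresses rates as percentages; the function name `asthma_rates` and the ZIP labels are hypothetical.

```python
from collections import Counter


def asthma_rates(patient_zips, population_by_zip):
    """Percentage of each ZIP code's population with an asthma-related record.

    patient_zips: one ZIP code per geocoded patient record (None if missing).
    population_by_zip: ZIP code -> total population (e.g., 2010 census).
    """
    # Records without a usable ZIP code are dropped, like unmatched geocodes.
    counts = Counter(z for z in patient_zips if z in population_by_zip)
    return {
        zip_code: 100.0 * counts[zip_code] / pop
        for zip_code, pop in population_by_zip.items()
        if pop > 0  # skip unpopulated areas to avoid dividing by zero
    }


records = ["ZIP_A", "ZIP_A", "ZIP_B", None]
population = {"ZIP_A": 100, "ZIP_B": 50, "ZIP_C": 0}
print(asthma_rates(records, population))  # {'ZIP_A': 2.0, 'ZIP_B': 2.0}
```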
2.1.2. Investigation of asthma patients using hotspot analysis
In the hot spot analysis, a powerful set of spatial statistical tools was used to examine the distribution of values associated with geographic features. The tools used in this part of the analysis were (1) Data Management tools (Project and Copy Features), (2) Spatial Statistics tools (Collect Events and Hot Spot Analysis), and (3) the kriging tool in the Spatial Analyst extension. The patient data were originally in a geographic coordinate system rather than a projected coordinate system, so the Project tool was applied to project them to the NAD_1983_UTM_Zone_15N projected coordinate system to preserve distances. The data falling within each ZIP code were aggregated with the Collect Events tool, and the resulting feature class contained an "ICount" field giving the number of patients in each ZIP code area. This aggregated feature class was used as the input, and the ICount field as the analysis field, for hot spot analysis.
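The aggregation step can be illustrated with a small Python sketch that merges coincident points and records how many were merged, analogous to the ICount field. It assumes that all patients geocoded to the same ZIP code share one projected point location (in metres); the function name `collect_events` and the rounding tolerance `precision` are hypothetical and are not settings of the ArcGIS tool.

```python
from collections import Counter


def collect_events(points, precision=0):
    """Merge coincident points and count them.

    points: iterable of (x, y) projected coordinates in metres.
    precision: decimal places used to decide that two points coincide
        (a hypothetical tolerance, not an ArcGIS setting).
    Returns a sorted list of (x, y, icount) tuples.
    """
    counts = Counter((round(x, precision), round(y, precision)) for x, y in points)
    return [(x, y, n) for (x, y), n in sorted(counts.items())]


# Hypothetical UTM coordinates: two patients share one location, one is elsewhere.
events = [(500000.2, 3600000.4), (500000.1, 3600000.3), (510000.0, 3610000.0)]
print(collect_events(events))
# [(500000.0, 3600000.0, 2), (510000.0, 3610000.0, 1)]
```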

2.1.3. Establishing the spatial relationships
The Hot Spot Analysis tool looks for spatial relationships in the data of interest. A feature with a high value surrounded by other features with high values is a statistically significant hot spot (red areas), and a feature with a low value surrounded by other features with low values is a statistically significant cold spot (blue areas). The default spatial relationship is "Fixed Distance Band", which means that neighboring features within a critical distance are included in the computation for each feature, whereas features beyond the critical distance have no influence; the distance method is "Euclidean Distance". The distance band was chosen by finding the appropriate scale of analysis. It is difficult to predict in advance the optimal distance band, that is, the distance at which the spatial processes promoting clusters of asthma rates are most active. To estimate it, the Incremental Spatial Autocorrelation tool was used to find the distance band that reflects the maximum spatial autocorrelation. This tool runs spatial autocorrelation at increasing distances and assigns a Z-score to the observed spatial autocorrelation at each distance; the Z-scores were then plotted against distance. The peaks were examined, and distance bands were selected with reference to the ZIP code areas. The optimal distance bands were then used in hot spot analysis. The output of hot spot analysis is a feature class in which each feature is assigned a Z-score and a P-value. Finally, kriging was applied to the hot spot feature class to generate a continuous raster heat-map surface.
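ArcGIS's Hot Spot Analysis tool computes the Getis-Ord Gi* statistic. The following Python sketch shows how Gi* Z-scores can be computed with a fixed distance band and Euclidean distance, where features inside the band (including the feature itself) receive a weight of 1 and all others a weight of 0. It illustrates the statistic rather than reproducing the ArcGIS implementation: it omits P-values and any multiple-testing correction, and the function name, coordinates, and values are hypothetical.

```python
import math


def getis_ord_gi_star(coords, values, distance_band):
    """Gi* z-score for every feature using a fixed Euclidean distance band.

    Features within `distance_band` of feature i (including i itself) get
    weight 1; all other features get weight 0 and have no influence.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(max(sum(v * v for v in values) / n - mean * mean, 0.0))
    z_scores = []
    for xi, yi in coords:
        weights = [1.0 if math.hypot(xi - xj, yi - yj) <= distance_band else 0.0
                   for xj, yj in coords]
        w_sum = sum(weights)
        w_sq_sum = sum(w * w for w in weights)
        # Local weighted sum compared with what the global mean would predict.
        numerator = sum(w * v for w, v in zip(weights, values)) - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        z_scores.append(numerator / denominator if denominator > 0 else 0.0)
    return z_scores


# Hypothetical layout: three high-rate features clustered near the origin,
# three low-rate features clustered far away.
xy = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
rates = [10.0, 10.0, 10.0, 1.0, 1.0, 1.0]
print(getis_ord_gi_star(xy, rates, distance_band=2.0))
# about +2.24 for the three high-value features, about -2.24 for the low ones
```

A Z-score above about 1.96 (or below about -1.96) corresponds to a hot (or cold) spot at roughly the 95% confidence level before any correction for multiple testing.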

Analysis of air quality
Air quality data for 2007–2011 were analyzed in relation to asthma rates. The two environmental factors studied in this research were (1) ground-level O3 and (2) PM2.5. MS has monitored air quality for many years with a sparse network of stationary ground monitoring stations (Supplementary Figure S1). The Mississippi Department of Environmental Quality (MDEQ) is the only agency that operates the ambient air quality network in the state [31]. O3 monitoring begins in March and continues until the end of October each year, whereas PM2.5 is sampled every third day throughout the year. The lack of monitoring stations at the required spatial and temporal intervals limited the amount of data available, so to enrich the spatial data and improve model accuracy, data from the neighboring states of Louisiana, Arkansas, Tennessee, and Alabama were integrated into the geospatial statistical analysis. The number of available sampling stations varied by year and pollutant: from 2007 to 2011, there were 141, 89, 89, 95, and 97 O3 stations and 124, 119, 135, 138, and 131 PM2.5 stations, respectively.

Spatial interpolation
In this study, a geostatistical method called ordinary kriging was applied to estimate values at unmeasured locations. Geostatistics rests on two key tasks: (1) uncovering the rules of spatial dependence and (2) making predictions. Kriging uses semivariogram and covariance functions to model spatial dependence and then predicts unknown values [32]. The method not only predicts values at unmeasured locations but also provides a measure of the accuracy of each prediction [33,34]. The kriging prediction is a weighted sum of the measured values:

$$\hat{Z}(s_0) = \sum_{i=1}^{N} \lambda_i \, Z(s_i)$$

where $\hat{Z}(s_0)$ is the value to be predicted for location $s_0$; $N$ is the number of measured values; $\lambda_i$ are the weights assigned to each measured point; and $Z(s_i)$ is the observed value at location $s_i$. In ordinary kriging, the weights are constrained to sum to one, $\sum_{i=1}^{N} \lambda_i = 1$, so that the prediction is unbiased.
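The ordinary kriging predictor described above can be sketched in Python by solving the kriging system, in which semivariances between sample points are bordered by the constraint that the weights sum to one (through a Lagrange multiplier). This is a minimal sketch, assuming an exponential semivariogram with no nugget and hypothetical sill and range values; in practice the semivariogram is fitted to the monitoring data, and GIS software such as ArcGIS handles that step. Function names, monitor locations, and PM2.5 values are illustrative only.

```python
import math


def exponential_variogram(h, sill=1.0, vrange=50.0):
    """Exponential semivariogram with no nugget; sill and range are hypothetical."""
    return sill * (1.0 - math.exp(-3.0 * h / vrange))


def solve_linear(a, b):
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(coords, values, target, variogram=exponential_variogram):
    """Return (prediction, kriging variance) at `target`."""
    n = len(values)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Semivariances among samples, bordered by the constraint sum(weights) = 1,
    # whose Lagrange multiplier occupies the last row and column.
    a = [[variogram(dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(dist(c, target)) for c in coords] + [1.0]
    solution = solve_linear(a, b)
    weights, mu = solution[:n], solution[n]
    prediction = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return prediction, variance


# Hypothetical monitor locations (km) and annual PM2.5 means.
monitors = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0), (20.0, 20.0)]
pm25 = [9.0, 11.0, 10.0, 12.0]
print(ordinary_kriging(monitors, pm25, (10.0, 10.0)))
```

Because the target is equidistant from the four monitors, each receives a weight of 0.25 and the prediction equals their mean; at a monitor location, the predictor returns the observed value with zero kriging variance, which is why kriging without a nugget is called an exact interpolator.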

The geography of asthma prevalence
Spatial analysis indicated that asthma-related patients increased in number and geographic extent over the decade. The largest numbers of patients were observed in urban regions, whereas the highest asthma rates were found in rural regions (Supplementary Figure S2). Statistically significant hot spots of asthma rates (in the northwest, southwest, northeast, and southcentral regions) indicated that the disease took a major turn between 2009 and 2011. The highest asthma rates (4–5.5%) were observed in Prairie, Monroe County (the red spot on the map), which had the highest observed prevalence for three consecutive years, 2009–2011 (Supplementary Figure S2). The northwest and southcentral areas of MS are also significant for asthma. Although higher asthma prevalence is not seen in the coastal populations (Supplementary Figure S2), the number of patients near Gulfport-Biloxi-Pascagoula increased every year because of population growth [38].
The maps of statistically significant hot spots and cold spots of asthma rates from 2009 to 2011 are presented in Figure 3, and the Z-scores and the distances at which maximum autocorrelation was observed are summarized in Table 2. A high positive Z-score with a small P-value indicates a significant hot spot, and a low negative Z-score with a small P-value indicates a significant cold spot [39]. Asthma is not randomly distributed in the population; it is a significant spatial phenomenon, as shown by the spatial clustering in Figure 3. The high Z-scores in Table 2 are statistically significant and reject the null hypothesis that asthma rates are spatiotemporally random and not spatially autocorrelated.
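The Incremental Spatial Autocorrelation step and the Z-scores in Table 2 rest on Global Moran's I computed over a series of distance bands. The Python sketch below computes Moran's I with binary fixed-distance weights and its Z-score under the normality assumption for each candidate band, and returns the band where the Z-score peaks. It is an illustration only: ArcGIS may differ in details such as the variance assumption and row standardization of weights, so its values would not match Table 2 exactly. The function names and example data are hypothetical.

```python
import math


def morans_i(coords, values, band):
    """Global Moran's I and its z-score (normality assumption) for one band.

    Binary weights: w_ij = 1 if 0 < distance <= band, else 0.
    Returns (I, z), or (I, nan) when the z-score is undefined.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w = [[1.0 if i != j and math.dist(coords[i], coords[j]) <= band else 0.0
          for j in range(n)] for i in range(n)]
    s0 = sum(map(sum, w))
    if s0 == 0:  # no feature has a neighbour at this distance
        return float("nan"), float("nan")
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    i_stat = (n / s0) * num / sum(d * d for d in dev)
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum((sum(w[i]) + sum(row[i] for row in w)) ** 2 for i in range(n))
    expected = -1.0 / (n - 1)
    variance = ((n * n * s1 - n * s2 + 3 * s0 * s0)
                / ((n * n - 1) * s0 * s0) - expected ** 2)
    if variance <= 0:
        return i_stat, float("nan")
    return i_stat, (i_stat - expected) / math.sqrt(variance)


def incremental_spatial_autocorrelation(coords, values, bands):
    """Z-score for each band, plus the band where the z-score peaks."""
    results = [(band, morans_i(coords, values, band)[1]) for band in bands]
    valid = [(band, z) for band, z in results if not math.isnan(z)]
    best_band = max(valid, key=lambda item: item[1])[0] if valid else None
    return results, best_band


xy = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
rates = [10.0, 10.0, 10.0, 1.0, 1.0, 1.0]
profile, peak = incremental_spatial_autocorrelation(xy, rates, [0.5, 1.2, 2.0])
print(profile, peak)
```

In this toy layout the smallest band has no neighbors and is skipped, and the Z-score peaks at the band that captures each whole cluster, mirroring how a peak in the Z-score profile was used to choose the distance band.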

Temporal pattern mining on asthma patient data
Asthma-related visits decreased steadily from 2005 to 2009 and began to increase again in 2010 (Figure 5). The highest daily average rates were observed in 2005 and the lowest in 2009 (Figure 4); rates for the remaining years fell mostly between these two extremes. Average numbers of visits were lowest in June and July in all analyzed years, increased beginning in August, and peaked between October and November.
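A seasonal cycle like the one described above can be derived by pooling daily visit counts by calendar month across years. The Python sketch below assumes the visits are available as a mapping from dates to daily counts; the function name `monthly_cycle` and the sample counts are hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_cycle(daily_visits):
    """Mean daily visits for each calendar month, pooled across all years.

    daily_visits: mapping datetime.date -> asthma-related visits that day.
    Returns {month number (1-12): mean visits per day}.
    """
    totals, days = defaultdict(int), defaultdict(int)
    for day, visits in daily_visits.items():
        totals[day.month] += visits
        days[day.month] += 1
    return {month: totals[month] / days[month] for month in sorted(totals)}


# Hypothetical daily counts spanning two years.
visits = {date(2009, 6, 1): 2, date(2009, 6, 2): 4,
          date(2010, 10, 1): 10, date(2010, 10, 2): 12}
cycle = monthly_cycle(visits)
peak_month = max(cycle, key=cycle.get)
print(cycle, peak_month)  # {6: 3.0, 10: 11.0} 10
```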
The observed temporal pattern of asthma inpatient, outpatient, and emergency visits from 2003 to 2011 is presented in Figure 4. The visit cycle, its timing, and its peaks were consistent across all analyzed years, which suggests that the factors responsible for asthma exacerbations might be constant. Such frequent exacerbations may indicate greater disease severity [42]. Asthma exacerbations peaked between September and October in each yearly cycle in MS. A study of a similar phenomenon in North America concluded that the increase in consultations for childhood asthma every September was uniquely related to the return to school [43]. Children returning to school have closer personal contact with many more children, which increases their exposure to viral and bacterial infections that can trigger an asthma attack [44]. A graph of total asthma-related patient visits is presented in Figure 5.

Statistical interpretation of asthma and population size
Between 2009 and 2011 (Figure 8), approximately 59% of the variation in asthma-related inpatient, outpatient, and emergency visits could be explained by population size (coefficient of determination R² = 0.588; correlation coefficient r = 0.77). The relationship increased linearly to a maximum of 65% (R² = 0.648) for ZIP codes with populations of up to 35,000 and decreased to 59% as the population increased to a maximum of 47,000 per ZIP code. This pattern suggests that other variables contribute to asthma in densely populated areas (>35,000 per ZIP code). Because higher populations are generally found in urban regions, this points to the effects of urban risk factors on asthma prevalence. Figure 8 shows the dependence of asthma-related visits on population size at the ZIP code scale. Although asthma prevalence was higher in rural MS, visits were proportionately distributed across ZIP codes with populations of up to 35,000, suggesting that the risk factors there were relatively simple and consistent. Above 35,000, the relationship became more complex, as evidenced by the scattered data points in Figure 8 and by the population regression. This implies that more complex risk factors operate in the larger-population ZIP codes of urban areas, and that geography and urban settings play important roles. Studies from many cities have documented urban heat island effects ranging from decreased air quality, increased energy consumption, and altered regional climate to direct effects on human health [41].
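The R² and r values above come from a simple linear regression of visits on population. The Python sketch below fits such a model by ordinary least squares and reports both statistics; the function name `simple_regression` and the four data points are hypothetical, and the reported analysis was run in dedicated software rather than with this code.

```python
import math


def simple_regression(x, y):
    """Least-squares fit y = a + b*x; returns (a, b, R^2, r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_res / syy      # share of variance explained
    r = sxy / math.sqrt(sxx * syy)      # Pearson correlation coefficient
    return a, b, r_squared, r


# Hypothetical ZIP-level data: population (thousands) and asthma-related visits.
population = [1.0, 2.0, 3.0, 4.0]
visits = [2.0, 4.0, 5.0, 8.0]
a, b, r2, r = simple_regression(population, visits)
print(round(b, 3), round(r2, 3), round(r, 3))
```

With a single predictor, R² equals r², so the reported R² = 0.588 corresponds to r = √0.588 ≈ 0.77, consistent with the correlation coefficient stated above. Fitting the same function only to ZIP codes with populations of up to 35,000 is one way to examine the subset relationship.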
Based on the results in supplementary Table S1, the fit improved substantially (adjusted R² = 0.723, 0.934, and 0.930 for 2009, 2010, and 2011, respectively) compared with the model that used only population data (R² = 0.588; Figure 8). Within the population, the regression results explain the dependence of asthma on children and older adults (Figure 9); probabilities marked with an asterisk denote statistical significance. A positive coefficient [a] indicates a positive relationship with asthma, and a negative value indicates a negative relationship. Figure 9 shows the relationship of each variable with asthma count. The Jarque-Bera statistic was statistically significant; because this test assesses whether the regression residuals are normally distributed, a significant result indicates that the residuals deviate from normality, which can signal a missing explanatory variable or a nonlinear relationship in the model. Asthma counts in adjacent years were well correlated with those of the studied year, suggesting that the condition persists from year to year within a consistent population of people with asthma.
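Two diagnostics mentioned above can be made concrete. Adjusted R² penalizes R² for the number of explanatory variables, and the Jarque-Bera statistic combines the skewness and kurtosis of the residuals to test whether they are normally distributed. The Python sketch below implements the standard textbook formulas; the function names and inputs are hypothetical, and regression software may report these values with minor differences in convention.

```python
import math


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """R^2 penalised for the number of explanatory variables."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


def jarque_bera(residuals):
    """Jarque-Bera statistic and p-value for normality of residuals.

    Under normality the statistic follows a chi-square distribution with
    2 degrees of freedom, whose survival function is exp(-x / 2).
    """
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


print(adjusted_r_squared(0.5, n_obs=11, n_predictors=2))  # 0.375
print(jarque_bera([-1.0, 1.0, -1.0, 1.0]))
```

A small p-value (large statistic) means the residuals are unlikely to be normal; whether residuals are spatially clustered is a separate question, usually checked by running a spatial autocorrelation test on the residuals.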
For 2011, PM2.5 was significantly associated with poverty and asthma per capita, as shown in Figure 10 and by coefficient [a] in supplementary Table S2. The relationship is positive, as indicated by the positive sign of coefficient [a], and its significance is shown in the probability and robust probability columns, where an asterisk denotes statistical significance. The Jarque-Bera statistic is also statistically significant, which indicates that the residuals of the fitted model are not normally distributed. There is overwhelming evidence that exacerbations of asthma, in terms of casualty attendances, hospital admissions, and deaths, are related to poverty or to groups that are prevalent in poorer sectors of society [46]. Many published articles have discussed poverty in the MS Delta [37,47–49].
Correlation analysis revealed an overall correlation coefficient (r) of 0.186 between monthly average ozone levels and asthma counts (Figure 11). The two series appeared to be synchronized from the second half of 2009 onward; over that period, r increased to 0.572, indicating a moderate positive correlation [35].
Correlation analysis did not reveal any significant association between PM2.5 and asthma (Figure 12); the correlation coefficient was −0.125, a negligible negative correlation [35].

Discussion
The comprehensive geospatial approach undertaken in this research is the first of its kind to address the asthma problem in MS, and the derivation of asthma statistics at the ZIP code level is unique to this research. Stakeholders, the public, and administrators can use the maps and data produced here to understand the spread of the disease and the regions it affects. During 2009–2011, northwest MS saw a continuous increase in asthma prevalence (Supplementary Figure S2 and Figure 3). This region is a floodplain of fertile black alluvial soils known for its agricultural heritage. Although poverty persists, farming remains the backbone of the region's economy [36]. Many urban living patterns, such as limited access to health care, pollution, and other environmental factors, were seen in the rural Delta region of MS [37]. Large-scale interventions to reduce morbidity and mortality among rural patients with asthma in the U.S. have not been designed or implemented, even though rural Americans are among the most disadvantaged populations in the U.S. [40]. The higher prevalence of asthma in rural MS may reflect this disadvantage. Asthma exacerbations were increasingly observed in the Jackson County region from 2006 onward; urbanization might be an important risk factor there, contributing to the accumulation of air pollutants by trapping heat.
Moderately significant associations were observed between monthly O3 levels and asthma exacerbations, whereas the monthly associations with PM2.5 were not significant. This suggests that asthma populations are more susceptible to seasonally varying environmental variables such as O3; consistent with this, asthma exacerbations followed an almost identical seasonal trend in every year studied. For 2011, a positive relationship was observed, with annual PM2.5 levels depending on asthma per capita and the poverty rate (Figure 10 and supplementary Table S2). A lag analysis might help to understand the health effects of short-term air quality-asthma events, but the current study used data from multiple years. The relationships with O3 were established only for the O3 sampling season, March to October of each year, because data were available only during this period.

Conclusion
Asthma is a complex health problem in MS, and distinctive spatial and temporal dimensions were observed in the asthma data. As hypothesized, its prevalence and risk factors differed between rural and urban areas: the risk factors were relatively simple and consistent in rural areas and more complex in urban areas. The highest rates (4–5.5%) were found in Prairie, Monroe County, for three consecutive years. Statistically significant hotspots were observed in the urban regions of Jackson and Gulfport, that is, in areas where high numbers of patients are found in neighboring ZIP codes. Hot spot analysis showed that asthma rates steadily increased near urban Jackson and in the area between Jackson and the Meridian metropolitan area. It is recommended that the Jackson urban area be monitored closely for disease prevalence in the future. The highest concentration levels [...] The resulting spatial data and information provide new insight into the management of environmental health data. The visualizations, infographics, and pictograms could be useful for decision making and asthma-related healthcare delivery for Mississippians. Statistically significant hotspots indicate that causative spatial processes are at work. Spatial analysis of asthma also showed significant clusters in the Delta region; geographic clusters of poverty-asthma associations therefore merit further study and may be better visualized through regional-scale analysis. Every year, the U.S. Census Bureau generates vast amounts of population data and makes them available through its public portals.
The environmental health GIS results and disease maps generated in this research could be useful to federal and state health authorities in treating at-risk populations and in developing and implementing health advisories. Educational and awareness programs could be initiated, and proactive health services could be delivered in targeted regions of MS (the Delta, coastal, southcentral, and Prairie regions).

Limitations
The sparse network of air quality observations is a limiting factor in obtaining trustworthy data at specific spatial intervals, and this is a drawback for MS. Not all data sources are available at ZIP code resolution. Intercensal population estimates were not available for all ZIP codes; for this reason, disease rates for 2009, 2010, and 2011 were calculated by holding the 2010 population constant for all three years. A change of support problem (COSP) may arise when linking exposure data to health outcome information, because the two variables have inherently different scales: the disease is specific to an individual, whereas air quality varies over a continuum. Hence, these two types of data may not always be related in a way that permits valid inference [51].

Recommendations
The contributions of other confounding variables (agriculture, mobile-source pollution, socioeconomic, and racial) must be investigated to address the problem at its roots. Once these risk factors are identified, the public could be informed, and proactive measures could be taken to avoid triggers and exposure. Implementing health education programs and health advisories for the MS Delta population may be a good decision. Street line data with updated routes and travel speed limits may be required to analyze patient drive times; this technique, called network analysis, can better identify underserved patients and help deliver appropriate health care [52]. MDEQ may need to review its plans for the observational network. A disease rate of around 4–6% in some ZIP codes is a significant issue to be addressed. Analysis of more recent health data might reveal further insights into the potential conditions and geospatial distribution of people with asthma.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.