Quantifying the land and population risk of sewage spills overland using a fine-scale, DEM-based GIS model

Accidental releases of untreated sewage into the environment, known as sewage spills, may cause gastrointestinal distress in exposed populations, especially among young, elderly, or immunocompromised individuals. In addition to human pathogens, untreated sewage contains high levels of micropollutants, organic matter, nitrogen, and phosphorus, potentially resulting in aquatic ecosystem impacts such as algal blooms, depleted oxygen, and fish kills in spill-impacted waterways. Our Geographic Information System (GIS) model, Spill Footprint Exposure Risk (SFER), integrates fine-scale elevation data (1/3 arc-second) with flowpath tracing methods to estimate the expected overland pathways of sewage spills and the locations where they are likely to pool. The SFER model can be integrated with secondary measures tailored to the needs of decision-makers so that they can spatially assess potential exposure risk. To illustrate avenues for assessing risk, we developed risk measures for land and population health. The land risk of sewage spills is calculated for subwatershed regions by computing the proportion of the subwatershed's area that is affected by one modeled footprint. The population health risk is assessed by computing the estimated number of individuals within the modeled footprint using fine-scale (approximately 90-meter resolution) population estimates from LandScan USA. Focusing on the Atlanta metropolitan region, the results outline potential strategies for combining these risk measures with the SFER model to identify specific areas for intervention.


II. Data Cleaning
The sewage spill records contain an overflow address, which was geocoded using the ESRI World Geocoder in ArcGIS Pro 3.0.3. The way the addresses were written varied considerably because the records were collected from a variety of sources. The geocoder was able to parse most addresses despite these inconsistencies: of the 1909 records, it matched 1727 and found ties for 41. The average geocoding score was 97.03. Four records were geocoded outside the state of Georgia and were removed from further analysis. Later in the process, to mitigate geocoding errors, any record meeting either of the following criteria was dropped: the geocode score was under 90, or the geocode's city/state did not match the spill record's city/state. This left 1600 records. A further 57 records were dropped because they recorded 0 liters of sewage spilled, meaning that the persons notifying the Georgia Environmental Protection Division were unable to quantify the amount of sewage spilled or the spill was less than 1 gallon.
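The filtering rules described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the script used in the study: the field names (`score`, `geocode_city`, `geocode_state`, `spill_city`, `spill_state`, `liters`) and the sample records are hypothetical, and only the score threshold of 90, the Georgia restriction, the city/state match, and the zero-volume rule come from the text.

```python
MIN_SCORE = 90  # geocode score threshold stated in the text


def keep_record(rec):
    """Return True if a geocoded spill record passes every filter."""
    geocoded = (rec["geocode_city"].strip().lower(), rec["geocode_state"].strip().upper())
    reported = (rec["spill_city"].strip().lower(), rec["spill_state"].strip().upper())
    if geocoded[1] != "GA":          # geocoded outside Georgia
        return False
    if rec["score"] < MIN_SCORE:     # low-confidence geocode
        return False
    if geocoded != reported:         # geocode and spill record disagree on city/state
        return False
    return rec["liters"] > 0         # spill volume was not quantified


# Hypothetical sample records, one per filter outcome.
records = [
    {"id": 1, "score": 97.0, "geocode_city": "Atlanta", "geocode_state": "GA",
     "spill_city": "atlanta", "spill_state": "GA", "liters": 3785.0},
    {"id": 2, "score": 85.5, "geocode_city": "Atlanta", "geocode_state": "GA",
     "spill_city": "Atlanta", "spill_state": "GA", "liters": 500.0},
    {"id": 3, "score": 99.0, "geocode_city": "Columbus", "geocode_state": "AL",
     "spill_city": "Columbus", "spill_state": "GA", "liters": 900.0},
    {"id": 4, "score": 95.0, "geocode_city": "Marietta", "geocode_state": "GA",
     "spill_city": "Atlanta", "spill_state": "GA", "liters": 1200.0},
    {"id": 5, "score": 100.0, "geocode_city": "Savannah", "geocode_state": "GA",
     "spill_city": "Savannah", "spill_state": "GA", "liters": 0.0},
]

kept_ids = [rec["id"] for rec in records if keep_record(rec)]
```

Only the first sample record survives; each of the others trips exactly one rule, which makes it easy to check the rules one at a time.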
Of the other six datasets, four had minor data cleaning efforts applied. The 29 3DEP Digital Elevation Model (DEM) tiles that cover the state of Georgia were merged using gdalwarp (see supplementary materials for the specific tiles and the command line arguments at https://osf.io/hs6g9/). For both the elevation raster and the LandScan USA 2021: day raster, we wanted to minimize the number of records imported into the database, so we used the ArcGIS Pro 3.0.3 Extract by Mask tool (ESRI 2023a) to clip each raster to the 400-meter buffer surrounding the traceline. The subwatershed layer was extracted from the NHDPlus High Resolution dataset, and all polygons within the TIGER/Line Georgia state shapefile (U.S. Census Bureau, 2021c) were kept. Some of the subwatershed geometries were invalid because of self-intersections, so we used the PostGIS function ST_MakeValid() (PostGIS, 2023c) with default parameters to repair these polygons. Lastly, in the ACS table, we removed excess columns and ensured that the formatting of the "Geo FIPS" column would join with the TIGER/Line census tracts "geoid" column.
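The text does not say what reformatting the "Geo FIPS" column needed. One common case, sketched here as an assumption, is an identifier stored as a number or with stray whitespace that must become an 11-character string (2-digit state, 3-digit county, 6-digit tract) before it will match a text "geoid" key. The function name, the sample identifiers, and the sample values are all hypothetical.

```python
def normalize_tract_geoid(value):
    """Return a census tract identifier as an 11-character, zero-padded string."""
    if isinstance(value, float):
        value = int(value)  # e.g. 13121001100.0 read from a spreadsheet
    text = str(value).strip()
    if not text.isdigit() or len(text) > 11:
        raise ValueError(f"cannot interpret {value!r} as a tract GEOID")
    return text.zfill(11)


# Hypothetical ACS rows keyed by a numeric identifier, and tract keys stored as text.
acs_rows = {13121001100: {"under5": 120}, 13121001200.0: {"under5": 85}}
tract_geoids = ["13121001100", "13121001200", "13121001300"]

acs_by_geoid = {normalize_tract_geoid(k): v for k, v in acs_rows.items()}

# Left join: tracts without an ACS row get None.
joined = {geoid: acs_by_geoid.get(geoid) for geoid in tract_geoids}
```

Normalizing both sides to the same string form before joining avoids silent mismatches between numeric and text keys.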

III. Data Integration
To handle the differences in support of the spatial data, we performed spatial data transformations on two datasets. The LandScan USA 2021: day raster and the merged elevation raster of the USGS DEMs were transformed into points using the ArcGIS Pro 3.0.3 "Raster to Point (Conversion)" tool (ESRI, 2023d). This tool creates a point at the center of each raster cell. A buffer was then added to each point: 35 meters for the LandScan points and 4.5 meters for the DEM points. These buffer sizes were intentionally chosen so that each buffer is slightly smaller than its raster cell; a polygon is therefore counted as intersecting a cell only when it overlaps a substantial portion of the cell, not when it touches only a small sliver of the cell area. A consequence of this choice is that the buffers around the start and end points may be slightly under-grouped and that the population counts may be lower than they would be if joined directly with the raster. Changing the support of these raster data also allows for easier spatial joins within PostGIS. The support for the NLCD 2019 Percent Developed Imperviousness raster (U.S. Geological Survey, 2019) also had to be changed to polygons. To do this, Zonal Statistics as Table (ESRI, 2023f) was performed on the raster for each of the subwatershed regions in the Atlanta region. This tool extracted the mean value of imperviousness for each subwatershed region.
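To make the effect of the buffer sizes concrete, the sketch below computes how much of a square cell a centered circular buffer covers and how far the buffer edge sits from the cell edge. The cell sizes are assumptions rather than values given in this section: roughly 90 meters for LandScan USA and roughly 10 meters for the 1/3 arc-second DEM. The function names are illustrative.

```python
import math


def buffer_cell_coverage(radius_m, cell_size_m):
    """Fraction of a square cell covered by a circular buffer centered on the cell."""
    if 2 * radius_m > cell_size_m:
        raise ValueError("buffer extends past the cell edge; formula assumes it fits inside")
    return math.pi * radius_m ** 2 / cell_size_m ** 2


def edge_margin(radius_m, cell_size_m):
    """Distance from the buffer edge to the nearest cell edge, in meters."""
    return cell_size_m / 2 - radius_m


# Assumed approximate cell sizes (not stated in this section).
LANDSCAN_CELL_M = 90.0
DEM_CELL_M = 10.0

landscan_coverage = buffer_cell_coverage(35.0, LANDSCAN_CELL_M)
dem_coverage = buffer_cell_coverage(4.5, DEM_CELL_M)
landscan_margin = edge_margin(35.0, LANDSCAN_CELL_M)
dem_margin = edge_margin(4.5, DEM_CELL_M)
```

Under these assumed cell sizes, a polygon has to reach at least 10 meters into a LandScan cell, or 0.5 meters into a DEM cell, before it touches the buffer. This is the mechanism by which slivers are excluded and by which counts can come out lower than a direct raster join.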

C. Development of Figures
All maps in the figures were produced with ArcGIS Pro 3.0.3 (ESRI (Environmental Systems Research Institute, Inc.), 2022). For Figures 2 and 3, we created fake spill locations in high-elevation regions of Georgia to best illustrate these components of the model. For Figures 4 and 5, we used real spill information. Figure 6a, Public Health Exposure Score for Atlanta, was created by clipping out all census tracts that were outside the bounds of the Atlanta Place Boundary (U.S. Census Bureau, 2021d). The choropleth map is classified by quantiles (evenly split among the 5 classes). The top quintile of Figure 6a was used to create Figure 6b. These census tracts were joined with the ACS 2021 columns Total: Male: Under 5 Years ("ACS21 5yr B01001003") and Total: Female: Under 5 Years ("ACS21 5yr B01001027"). Figure 6c was created by clipping all the subwatershed region boundaries by the Atlanta Place Boundary (U.S. Census Bureau, 2021d). The choropleth map is again classified by quantiles (evenly split among the 5 classes). To create the map of imperviousness percentage for subwatersheds, Figure 6d, the raster underneath the polygons was summarized by subwatershed region using ArcGIS Pro's Zonal Statistics tool, which computes the mean of the raster cells for each of the subwatershed regions in the Atlanta region. The classes (seen in the legend) are split by the quantile method (evenly split). Figure B1ab was created in the same way as Figure 6ac, but with the Day LandScan. Figure 7a was created by intersecting the top quintiles from Figures 6a and 6c. The pins are the centroids of the intersection geometries. Figure 7b was created by overlaying text giving the summed quantity of the 5 spills that occurred at this location (662,123). Alongside this model output, we include screenshots from Google Street View (Google Maps, 2022) of the small urban farm across the street. We anonymized the location so as not to impact the community farm.
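The quantile classification and top-quintile selection used for Figures 6 and 7 can be illustrated with a rank-based sketch. It is an approximation for illustration only: ArcGIS Pro's quantile classifier may place class breaks and handle ties differently, and the tract identifiers and scores below are hypothetical.

```python
def quantile_classes(values, n_classes=5):
    """Assign each value a class from 1 to n_classes with (nearly) equal counts per class."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # indices by ascending value
    classes = [0] * len(values)
    for rank, idx in enumerate(order):
        classes[idx] = rank * n_classes // len(values) + 1
    return classes


# Hypothetical exposure scores per census tract.
scores = {"A": 0.0, "B": 1.2, "C": 3.4, "D": 0.5, "E": 9.9,
          "F": 2.2, "G": 7.1, "H": 0.1, "I": 4.4, "J": 5.0}

ids = list(scores)
classes = quantile_classes([scores[i] for i in ids])

# Tracts in the highest class form the top quintile.
top_quintile = sorted(i for i, c in zip(ids, classes) if c == 5)
```

With ten tracts and five classes, each class holds two tracts, and the top quintile contains the two highest-scoring tracts. Applying the same selection to a second layer and intersecting the two result sets mirrors the logic described for Figure 7a.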

Figure B1: Top figure: Choropleth map of census tracts by population risk metric using the night LandScan raster; Bottom figure: Top quintile of the population risk map by population under 5 from ACS 2021.

Table 1: Datasets Information

Note: Two differing examples of the overflow address are: "Manhole on the West Side of the Northernmost section of Burnt Hickory Road just south of the railroad at the intersection of Burnt Hickory Road and Hwy 293" and "2414 Stone Road Between Manhole ID C-195-075 and C-195-073."