Using geographical information systems and cartograms as a health service quality improvement tool

Introduction: Disease prevalence can be spatially analysed to provide support for service implementation and health care planning; these analyses often display geographic variation. A key challenge is to communicate these results to decision makers, who have variable levels of Geographic Information Systems (GIS) knowledge, in a way that represents the data and allows for comprehension. The present research describes the combination of established GIS methods and software tools to produce a novel technique for visualising disease admissions and to help prevent misinterpretation of data and suboptimal decision making. The aim of this paper is to provide a tool that supports the ability of decision makers and service teams within health care settings to develop services more efficiently and better cater to the population; this tool has the advantage of providing information on the position of populations, the size of populations and the severity of disease. Methods: A standard choropleth of the study region, London, is used to visualise total emergency admission values for Chronic Obstructive Pulmonary Disease and bronchiectasis using ESRI's ArcGIS software. Population estimates of the Lower Super Output Areas (LSOAs) are then used with the ScapeToad cartogram software tool, with the aim of visualising geography at uniform population density. An interpolation surface, in this case ArcGIS's spline tool, allows the creation of a smooth surface over the LSOA centroids for admission values on both standard and cartogram geographies. The final product of this research is the novel Cartogram Interpolation Surface (CartIS). Results: The method provides a series of outputs culminating in the CartIS, applying an interpolation surface to a uniform population density. The cartogram effectively equalises the population density to remove visual bias from areas with a smaller population, while maintaining contiguous borders. CartIS decreases the number of extreme positive values that are not present in the underlying data, as can be found in interpolation surfaces. Discussion: This methodology provides a technique for combining simple GIS tools to create a novel output, CartIS, in a health service context, with the key aim of improving visualisation and communication techniques that highlight variation in small scale geographies across large regions. CartIS more faithfully represents the data than interpolation, and visually highlights areas of extreme value more than cartograms, when either is used in isolation.


Introduction
Geographical Information Systems (GIS) have a long history of use within health care, particularly public health (Wennberg and Gittelsohn, 1973; Twigg, 1990; Bullen et al., 1996; Higgs and Gould, 2001; Nacul et al., 2011), to visualise the epidemiology of disease, delivery of health services and allocation of resources. The use of small-area data to describe variations in healthcare was developed in the 1970s by Wennberg and Gittelsohn (1973); over recent years the increased use and capabilities of technology, combined with improved access to data, have allowed the development of atlases that visualise geocoded health data. The production of atlases in the United Kingdom (UK) (NHS Right Care, 2010), building on similar American atlases produced by the Dartmouth Institute (Dartmouth Medical School, 1998), has raised questions about why there are variations in clinical outcomes at regional and local levels. Subsequently, significant strides have been made to reduce unwarranted variation in clinical outcomes through improving the quality and consistency of care processes, e.g. a Chronic Obstructive Pulmonary Disease (COPD) Care Bundle (Hopkinson et al., 2012). Whilst atlases have been instrumental in highlighting the disparities in health outcomes nationally, there still remains some concern about how these data are translated into strategies to improve outcomes (Joyce, 2009). With the potential for more and more data to be available for such analysis, including patient reported outcome measures (PROMs) and patient reported experience measures (PREMs), the question arises of how technologies can be used to "intelligently" interpret the data to improve the quality of services in ways that will translate into improved outcomes and patient experience.
GIS can often be perceived as simple visualisation of data through mapping; this belies the complexity of data and the interaction with the geographical or spatial plane used for display. Communicating this complexity effectively and accurately provides opportunities for the "viewer" to interpret the data and identify patterns. When visualising datasets, the resolution of display is critical to the information portrayed, with larger geographic aggregations masking heterogeneity at finer scales, and potentially causing incorrect interpretation of the data. However, operating at finer resolutions can result in a loss of data in areas of relatively small geographies and, when using choropleths, importance can mistakenly be assigned to areas of larger extent, rather than areas of higher value (Pickle and Carr, 2010). Different methodologies demonstrate varying strengths and weaknesses for data visualisation; decision makers must utilise the most appropriate method for their purpose.
A key challenge is how to present service activity data in such a way as to be helpful to commissioners, clinicians, healthcare managers and policy makers to support initiatives to improve the quality of services (e.g. Future Hospital Commission, 2013; Green et al., 2012; Noble et al., 2012; Welch and Allen, 2006). In this paper, we explore the use of a combination of methods as a possible solution to this visualisation problem, using health service utilisation data derived from Hospital Episode Statistics (HES) (www.hscic.gov.uk/hes) as an example.

Geographic visualisation: problems and solutions
There are two common problems in geographic visualisations that are often overcome using methods that are simple to apply and interpret: cartograms and interpolation surfaces, which are discussed in the following sections.

Problem 1
When working with real data, differences in the geographic sizes of the units in question can hinder interpretation of underlying patterns within the data values of interest, because larger areas have a bigger visual impact. A solution to this is the use of cartograms.

Cartograms
Modern cartograms stem from Émile Levasseur's work in value-by-area maps (Tobler, 2004), with the aim of creating variable sized rectangles to show a value such as population, and grouping them to correspond to their geographical position. Dorling maps (Dorling, 1995) convert each region into a circle whose size represents the chosen variable, producing a visual output without retaining topography, which is the reason they were not chosen for this study. The first computer generated cartograms, named pseudo cartograms, were created by Waldo Tobler; in these, areas are expanded or compressed along the latitude/longitude grid to achieve equal value density. Gastner & Newman's method (2004), a variation of Tobler's termed density-equalising maps, is based on a diffusion process where geographies are said to "flow" until uniform density of the chosen variable is achieved, resulting in a lack of real geography. This method provides the functionality to modify the parameters to adjust the balance between geographical preservation and the achievement of uniform density. The density-equalising visualisation technique was developed to solve problems associated with good spatial resolution, small sample sizes and variable population density. The Gastner & Newman algorithm was developed with the aim of simplifying the complexity of cartogram creation and improving the usability of the outputs, and has been incorporated into a number of pieces of software including ESRI's ArcGIS software (ESRI, 2014) and the open source product ScapeToad (Chôros Laboratory, 2014). However, when using a large area, high resolution choropleth with high data heterogeneity, a cartogram can become visually problematic (Kaspar et al., 2011).

Problem 2
When data have been attributed to specific points across a study region, discretisation of values can occur, with no knowledge about the values in between the points. Spatial point analysis has a history in disease and epidemiology research, dating back to John Snow's seminal mapping that linked cholera to a local water pump in Victorian London. The pattern analysis of such point geographies has been in use since the 1950s, originally in geography and plant ecology (Gatrell et al., 1996). A solution is to use an interpolation surface.

Interpolation surfaces
Modern software and databases have expanded the usage and abilities of geographical point analysis to create surface layers from data points. Interpolation surfaces, as with density-equalising maps, are a visualisation tool that can provide insight into the connected spatial variation of data. The general concept is to connect a series of points with recorded values to create a surface over the region they cover, which predicts the value in the unrecorded space between them. In health data visualisation these are predominantly used for 'hotspot' analysis to identify areas of highest occurrence. The output can be purely related to the density of points, or linked to a value of that point, e.g. number of hospital attendances. However, this estimation of the values in unrecorded space is reliant on the density of the recorded values, and can create fallacious signals rather than representations of underlying data, due to mathematical approximations between data points.

Health data
The National Health Service (NHS) in the UK collates a large amount of data generated from each interaction a patient makes with the health service. Data are reported to the Health and Social Care Information Centre, which processes 125 million individual episodes of care each year, in the form of HES. The most common activity data collected are outpatient department attendances, elective admissions to hospital and emergency department (ED) attendances and subsequent admissions. Although anonymised to protect the privacy of patients, the data include demographic and clinical information that can be used by researchers to analyse temporal and geographical trends, as well as more complex analyses. Numerous published studies have demonstrated the utility of HES data, including the identification of population and healthcare factors associated with increased COPD admissions (Calderón-Larrañaga et al., 2011), increases in emergency admissions related to sickle cell disease (Aljuburi et al., 2012) and identifying links between the smoking ban and a reduction in asthma-related emergency admissions (Millett et al., 2013). The continuing development of new technologies and novel platforms for data visualisation has led to the development of explicit strategies in the UK to harness data for research through greater access to patient data across research and industry (Department for Business Innovation and Skills, 2011). Furthermore, UK policies have been developed that explicitly focus on improving the delivery and effectiveness of clinical services through the use of routine data and metrics (Department for Health, 2008).
Informed decision making is a key part of healthcare resource allocation. GIS allows for a more detailed analysis of small area variation, developing the integration of multiple aspects of data into a clear message (Wennberg and Gittelsohn, 1973; Noble et al., 2012). Outputs do not require the user to have intimate knowledge of the data integration, though such knowledge can help their understanding as well as furthering the implementation of GIS and its various aspects into decision making (Baum et al., 2010; Clarke et al., 1996; Ricketts, 2003). The more accurate the data representation, the better able commissioners, health service providers and the community will be to act towards improvement.
Visualising geographic health data has often been based around point locations, and aggregations of these to polygonal regions of real geographies. The presented work does not aim to demerit any such work, which is considered by the authors to be the backbone of geo-visualisation, but instead to provide an adapted perspective that can be used to aid decision making for health service provision. At the time of data collection, UK health services were geographically arranged into hierarchical divisions consisting of 10 Strategic Health Authorities (SHA) and 152 Primary Care Trusts (PCT), which were responsible for local delivery of care. Census based geographies are aggregated to Lower Super Output Areas (LSOA) that are intended to cover an area of 1500 residents, though there exists some variation. The concern aired here is that the use of these real geographies loses the visual impact of variation that occurs in public health relating to population density, as well as depending on boundary specific data with no information on the relationship between neighbouring localities.
We describe a new method that attempts to overcome the problems described and combine the strengths of the two separate techniques, cartograms and interpolation surfaces. In the following section, using an example dataset, we analyse the results and discuss the potential utility of this method in light of expert feedback and in general principles.

Methods
ESRI's ArcGIS 10.1 for Desktop and the open source software ScapeToad are used to combine simple GIS methods and tools, manipulating the data to provide a novel methodological output, a combined cartogram interpolation surface we term CartIS. To demonstrate the practical application of CartIS we analysed data on COPD and bronchiectasis taken from the ICD-10 (International Classification of Diseases, tenth revision) codes J40-J44, J47 and J96.X that specified ED admissions across England during the 2010/11 financial year. Data obtained from HES were aggregated to LSOA geographies, retaining anonymity, allowing them to be directly linked to the Office for National Statistics (ONS) LSOA boundary dataset (www.ons.gov.uk). Mid-2010 population count estimates and LSOA boundaries were retrieved from the ONS; population estimates are based on 2001 census population data. These boundaries could then be aggregated to PCT and SHA geographies (www.edina.ac.uk/census) using LSOA centroids, a necessity as the 2004 PCT and SHA boundaries are not coterminous with the 2001 LSOAs, or with each other outside of London. These LSOA boundary alterations reflect population changes to ensure that boundaries adhere, as near as possible, to covering an area of 1500 residents. To provide sufficient but not over exhaustive analysis, the use of LSOAs has been restricted to London SHA. Whilst PCTs have now been replaced by Clinical Commissioning Groups (CCGs), any conclusions drawn can still be considered relevant due to the focus on a finer scale of geography and the comparison between the two. Data were linked to the LSOA shapefile using the LSOA code, which for COPD and bronchiectasis is taken from the patient's place of residence.
Cartograms, created using ScapeToad and based on the mid-2010 population estimates of LSOAs in London SHA, were used to visualise these data accounting for differences in population density, using the same classification as the normal geography to allow direct comparison. This software tool uses the Gastner & Newman method (2004) to perform an iterative diffusion using the chosen value attribute upon the original geography. Input parameters used within the tool have been recorded and placed into Appendix A.
Once the cartograms were completed, interpolation surfaces were created with ArcGIS's Spline tool, one each for the normal and cartogram geographies, using the LSOA centroids; both included all LSOAs within 10 km of London SHA to prevent possible edge effects (Gatrell et al., 1996). Each interpolation surface was then clipped back to the London SHA extent before being classified with a colour gradient to clearly depict the variation within the dataset.
For direct comparison between cartogram and real geographies, an average nearest neighbour test was carried out using the centroids of LSOAs from both geographies to assess the pattern of dispersion. The test produces a Z-score and a nearest neighbour ratio, given by the ratio of the observed mean distance to the expected (random) mean distance to the nearest centroid. A count of the number of pixels in the interpolation surface raster gave the percentage of those that were extreme values, defined as values outside the limits of the original data. LSOA size was analysed to assess the geographic effect of uniform population density in the cartogram.
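The centroid-based aggregation described above can be sketched as follows. This is a minimal illustration, not the workflow used in ArcGIS: the LSOA codes, coordinates, admission counts and region names are invented, and the point-in-polygon test is a generic ray-casting routine assumed here for demonstration.

```python
from collections import defaultdict


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon [(x, y), ...]."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast to the right of (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_by_centroid(lsoa_centroids, admissions, regions):
    """Sum LSOA admission counts into the region containing each LSOA centroid."""
    totals = defaultdict(int)
    for code, (cx, cy) in lsoa_centroids.items():
        for region_name, polygon in regions.items():
            if point_in_polygon(cx, cy, polygon):
                # LSOAs with no recorded admissions contribute zero.
                totals[region_name] += admissions.get(code, 0)
                break
    return dict(totals)


# Hypothetical data: codes, coordinates and counts are invented for illustration.
lsoa_centroids = {"LSOA_A": (1.0, 1.0), "LSOA_B": (3.0, 1.0), "LSOA_C": (6.0, 2.0)}
admissions = {"LSOA_A": 4, "LSOA_B": 7}
regions = {
    "PCT_WEST": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "PCT_EAST": [(4, 0), (8, 0), (8, 4), (4, 4)],
}

pct_totals = aggregate_by_centroid(lsoa_centroids, admissions, regions)
print(pct_totals)
```

Assigning each LSOA by its centroid gives every LSOA exactly one parent region even where the boundary sets are not coterminous, which is the property the aggregation step relies on.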
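The goal of the cartogram transformation can be illustrated with a short calculation. The sketch below does not implement the Gastner & Newman diffusion or call ScapeToad; it only computes, under the idealised assumption of perfectly uniform population density and an unchanged total map area, the area each unit would occupy. The areas and populations are invented.

```python
def target_areas(areas_km2, populations):
    """Area each unit would occupy at uniform population density,
    keeping the total mapped area unchanged (idealised assumption)."""
    total_area = sum(areas_km2)
    total_pop = sum(populations)
    return [total_area * p / total_pop for p in populations]


def size_ratio(areas):
    """Ratio of the largest to the smallest area."""
    return max(areas) / min(areas)


# Hypothetical units: a dense inner-city unit, a suburb and a sparse outer unit.
areas_km2 = [0.5, 1.0, 4.5]
populations = [1500, 1600, 1400]

targets = target_areas(areas_km2, populations)
print("ratio before:", size_ratio(areas_km2))
print("ratio after:", size_ratio(targets))
```

Because the total area is held constant, the mean unit area is unchanged while the range of sizes collapses towards the range of populations. A real cartogram only approximates these targets, since it must also keep borders contiguous.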
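The average nearest neighbour statistics can be sketched as below. This is an assumption-laden illustration rather than the ArcGIS tool itself: it uses the standard Clark and Evans form, in which the expected mean distance for a random pattern is 0.5 / sqrt(n / A) and the standard error is 0.26136 / sqrt(n² / A), with the study area A supplied by the user. The point sets are invented.

```python
import math


def average_nearest_neighbour(points, study_area):
    """Return (nearest neighbour ratio, z-score) for a list of (x, y) points.

    Brute-force O(n^2) search, adequate for a small illustration.
    """
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        best = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        nearest.append(best)
    observed = sum(nearest) / n                      # observed mean distance
    expected = 0.5 / math.sqrt(n / study_area)       # expected under randomness
    std_error = 0.26136 / math.sqrt(n * n / study_area)
    return observed / expected, (observed - expected) / std_error


# Hypothetical centroids: an evenly spaced 5 x 5 grid and a tight cluster,
# both assessed against the same 25 square unit study area.
grid = [(float(i), float(j)) for i in range(5) for j in range(5)]
cluster = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]

grid_ratio, grid_z = average_nearest_neighbour(grid, 25.0)
cluster_ratio, cluster_z = average_nearest_neighbour(cluster, 25.0)
print(grid_ratio, grid_z)
print(cluster_ratio, cluster_z)
```

A ratio above 1 with a positive z-score indicates a dispersed pattern, and a ratio below 1 with a negative z-score indicates clustering. Both statistics depend on the area value used, so comparisons between geographies are only meaningful when the study area is defined consistently.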

Results
Fig. 1 visualises COPD admission values for England SHAs (1a), London PCTs (1b) and London LSOAs (1c), highlighting heterogeneity in the data which would be lost at higher geographies. Because LSOA geographies fluctuate in size relative to population density, the map visually emphasises areas where geographies are larger, though not necessarily of more significance. Fig. 2 presents a possible solution to this issue using the properties of a cartogram, which deforms the geographies based on the mid-2010 population estimates. The effect of using a cartogram is that the range of sizes that each LSOA covers is reduced, with the ratio of largest area to smallest area decreasing from 860:1 in standard geography to 18:1 in the cartogram, while the average remains roughly the same at 0.334 km². Clearly visible is the apparent swelling of areas in central London, where the population is denser, while the areas of LSOAs on the outskirts decrease, leading to an overall equalising of LSOA area size variation. While this does, to best effect, solve the issue of visual priority, the gradient classification still limits the user's visual understanding of data variation and connectivity.
The use of an interpolation surface, which estimates surface values at unsampled points based on known surface values of surrounding points, is demonstrated in this example with the ArcGIS spline tool. The classification is based on the minimum and maximum of the dataset and is initially produced upon normal geography, Fig. 3a. The combination of both the cartogram and spline, CartIS, produces a population-based geography with a stretched classification surface, Fig. 3b.
The nearest neighbour ratio and z-score produced from the average nearest neighbour tool within ArcGIS were higher for LSOAs within the cartogram, 1.24 and 31.78, than for real geography, 1.15 and 19.57, respectively. The percentages of extreme low and high values taken from the interpolation surfaces were 20.6% and 0.06% for real geography and 23.2% and <0.01%, respectively, for CartIS.
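The size comparison reported here (largest-to-smallest area ratio and mean area) can be computed from polygon vertices with the shoelace formula. The following is a minimal sketch with invented polygons in planar coordinates; it is not the calculation performed in ArcGIS, which would normally read areas from the projected shapefile.

```python
def polygon_area(vertices):
    """Shoelace formula: area of a simple polygon given as [(x, y), ...]."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def area_summary(polygons):
    """Return (largest:smallest area ratio, mean area) for a list of polygons."""
    areas = [polygon_area(p) for p in polygons]
    return max(areas) / min(areas), sum(areas) / len(areas)


# Hypothetical units: a 1 x 1 square and a 3 x 2 rectangle (units of km).
units = [
    [(0, 0), (1, 0), (1, 1), (0, 1)],
    [(0, 0), (3, 0), (3, 2), (0, 2)],
]
ratio, mean_area = area_summary(units)
print(ratio, mean_area)
```

Running the same summary on the standard and cartogram geometries gives the two figures being compared: a shrinking ratio alongside a stable mean.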

Discussion
Data at SHA level show some geographical heterogeneity but, whilst useful for broad brush benchmarking, are ineffectual in identifying specific areas that could be targeted for improvement. To date, the standard unit for large area analysis has been the PCT, utilised for regional benchmarking of clinical outcomes, as demonstrated by the various atlases. The analysis at LSOA level provides greater detail of the geographical heterogeneity within each PCT, as shown for London in Fig. 1c, and highlights that trends at this level can be more clearly represented at smaller geographies with data that are already available.
Following a literature review that uncovered no similar previous work (Appendix B), the method presented here, CartIS, tests the feasibility of combining these two techniques, density-equalising maps and interpolation surfaces, to overcome the issues of focus on geography rather than populations. CartIS is an attempt to produce visualisations that work on the strengths of these separate techniques and produce an easily readable representation that can be used for health care analysis.
The use of a cartogram transformation to reshape the geographic regions so that a chosen attribute, e.g. population density, is uniform depends on the resolution of data upon which this deformation occurs. A higher resolution can leave the visualisation unrecognisable, while a lower resolution will not ensure that the variable is uniform (Gastner and Newman, 2004). In practice, the CartIS uses a resolution which aims to maximise the uniformity of the chosen variable, rather than to preserve geographies, to ensure accurate information can be portrayed. It should be noted that unless practitioners have a clear idea of the "undistorted" map (i.e. Fig. 1b), correctly interpreting cartograms (i.e. Fig. 2) can be problematic, although some shapes are distinctive enough that even when heavily distorted they are recognisable. In other maps, features such as rivers and major roads can be inserted to aid with visual interpretation. Following on from the LSOA, PCT and SHA regions, the use of an interpolation surface allows the visualisation of the variability in the dataset and locations of similar neighbouring values. The nature of the tool is to use the nearest n points to create a smooth surface across the region of study using, in this example, LSOA centroids. What this means is that areas of lower population density have larger space over which to smooth values, a side effect of which is extreme values that do not represent the underlying data. This is a result of the technique itself, which is not constrained to the minimum or maximum of the input dataset. Thus, when two high values are surrounded by lower values, the created surface peaks above and in between these points to ensure that it is smooth. The extreme values generated when there are no further points to calculate from beyond the boundary of data are referred to as edge effects. These are overcome through the inclusion of values that lie outside of the study region (Gatrell et al., 1996), which are then cropped prior to completion.
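The overshoot described above, and the counting of extreme values against the limits of the input data, can be illustrated in one dimension. The sketch below is a stand-in only: it uses a uniform Catmull-Rom spline along a line of equally spaced values, not the two-dimensional spline implemented in ArcGIS, and the profile values are invented.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Uniform Catmull-Rom spline value between p1 and p2 at t in [0, 1]."""
    return 0.5 * (
        2 * p1
        + (-p0 + p2) * t
        + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
        + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3
    )


def interpolate_profile(values, steps=10):
    """Sample a smooth curve through equally spaced values (end values repeated)."""
    padded = [values[0]] + list(values) + [values[-1]]
    samples = []
    for i in range(1, len(padded) - 2):
        for s in range(steps):
            samples.append(
                catmull_rom(padded[i - 1], padded[i], padded[i + 1], padded[i + 2], s / steps)
            )
    samples.append(values[-1])
    return samples


def percent_extreme(samples, data_min, data_max):
    """Percentage of samples below data_min and above data_max."""
    low = sum(1 for v in samples if v < data_min)
    high = sum(1 for v in samples if v > data_max)
    return 100 * low / len(samples), 100 * high / len(samples)


# Hypothetical admission counts along a transect: two high values amid zeros.
values = [0, 0, 10, 10, 0, 0]
samples = interpolate_profile(values)
pct_low, pct_high = percent_extreme(samples, min(values), max(values))
print(max(samples), pct_low, pct_high)
```

In this example the curve peaks at 11.25 midway between the two values of 10 and dips below zero next to them, so both kinds of extreme value appear even though neither exists in the input. The same counting logic applies to raster pixels when the limits are taken from the original data.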
The comparison of real and cartogram geography LSOA size, nearest neighbour analysis and interpolation surface extreme values reinforces the strength of CartIS as a viable method. A decrease in range and similar average size from real to cartogram LSOAs is an expected effect of the cartogram as it is designed to achieve uniform population density. The goal of uniformity guarantees a change in LSOA geography but maintains similar average size to aid visual recognition while enabling detailed small scale analysis. The z-scores and nearest neighbour ratio reinforce this by indicating that LSOA centroids within the cartogram are more dispersed, as shown by the higher values, and therefore mitigate the visualisation problem of population density. While the percentage of extreme negative values is greater for CartIS, this is of little impact and is an expected product of the zero weighted sampling of admission values. CartIS does manage to reduce the percentage of extreme positive values, and the visualisation methods used highlight positive values rather than negative ones. Healthcare resources will not be allocated to areas of low or zero prevalence, but will rather be focused on genuine need in areas of high prevalence.
GIS is a useful method for resource planning and is increasingly being used in health care, where it is important to present as true a picture of the healthcare needs of the population as possible, but there is an uneven knowledge of GIS skills in such settings (Ricketts, 2003). Although large scale evaluation was not done, preference testing towards the CartIS methodology on a convenience sample of 66 was undertaken (Appendix C). Feedback supports the presented method but reinforced concerns around the ability of non-professionals to fully grasp the output. We maintain that the CartIS method is superior to other solutions, despite some feedback relating to the viewer comprehension of the visualisation, because CartIS more faithfully represents small scale data than the presented alternatives. If there were an increased uptake in usage, we consider that there would be greater comprehension and application of the information contained within CartIS maps. Indeed, half of the feedback respondents, when the methodologies behind the maps were explained, chose CartIS as the best single map visualisation. Due to the expert focus of the sample group, there is space for expansion and development of this feedback that would provide greater value and a more in-depth understanding of the user's needs and skills when using CartIS and GIS as a whole. Further research is also required on the potential dependence of the interpolated output on the precise choice of parameters, and the effect this might have on the interpretation of the outputs.
Developing this work further, it is envisaged that disease and non-disease specific investigations would be undertaken at both regional and national scales for the purpose of influencing strategic planning and targeting of population level interventions. The work presented here uses an exemplar dataset, and the method can be applied to any number of similar, richly detailed analyses.

Conclusion
In this article we have combined the use of cartograms and interpolation surfaces into CartIS. It provides an effective tool for the visualisation of variation in small scale geographies across larger areas to individuals with little technical understanding of the methodology used. The difference between normal and cartogram geographies warrants additional explanation and teaching, be that through lectures, tutorials or user guides, which would allow individuals to assess and effectively implement changes based on the data. While GIS has been around for decades, healthcare professionals' and the public's understanding of the techniques available remains limited, as does the ability of the skilled GIS practitioner to invoke their use in improvement projects. CartIS represents a visualisation of population density and public health data that is more closely aligned to the data than simple interpolation and easier to interpret than a cartogram. The CartIS method provides a mapping product that forms a visual narrative for decision makers and has the advantages of providing information on the position and size of populations as well as the severity of disease. This method is presented for consideration and suitability testing.

Fig. 1. Choropleths of total COPD emergency admissions from 2010/11 at (a) SHA level for England, (b) PCT level in London and (c) LSOA level in London.