The Canadian Geospatial Data Infrastructure and health mapping L’infrastructure des données géo-spatiales canadienne et la cartographie de la santé

Due to the recent outbreak of SARS and the danger of pandemic bird flu, governments have a growing need to strengthen health surveillance and disease control. The development of the Canadian Geospatial Data Infrastructure (CGDI) has shown great potential in many fields such as emergency management, public health, disaster relief, environmental impact assessment, transportation, and land information systems. In this paper, our aims are to use the CGDI and to assess its usability in supporting online health mapping. To assess the usability of the CGDI for health mapping, we employed nine usability metrics. We also designed an architecture based on the CGDI to support the basic functions for health mapping, and implemented an infectious disease simulation for New Brunswick and Maine. Within the CGDI framework, this research enabled cross-border health data visualization, integration, and sharing, as well as exploration of the spatio-temporal trends of an infectious disease outbreak through thematic maps. Based on the experience of the developers and the feedback from users, a matrix relating the usability metrics to the CGDI components (technical standards, national framework data, enabling technologies, and common data policies) was evaluated using this cross-border health mapping application. The use of the CGDI in health applications has great potential in supporting effective and secure health data sharing and integration. Enrichment of the CGDI would further facilitate data sharing and improve the efficiency and effectiveness of decision making.
Cybergeo : Revue europeenne de géographie, Cartographie, Imagerie, SIG, N°434, 08/12/2008


Introduction to the Canadian Geospatial Data Infrastructure (CGDI)
Spatial data have a reference to geographic locations in space, which helps in the understanding of the "where" problem, e.g., the spatial location and area of some features, and the spatial distribution and correlation of some phenomena. According to the United States General Accounting Office, almost 80 percent of all government information has a geospatial context (GAO, 2003). Analyzing government information with spatial data has shown great prospects in many areas such as emergency management, human health and environment, disaster relief, transportation, land information systems, etc. Thus, it could be quite useful to integrate spatial data for decision making processes in public health practice.
The complexity of spatial data, the diversity and heterogeneity of spatial sources and spatial data formats create barriers for users of the CGDI in the public health domain. With the rapid development of geospatial science and Web-based technology, it is now possible to share geospatial information through a distributed network. The CGDI is a framework that facilitates the sharing of Canada's spatial data through the information highway. Since spatial data are collected by different levels of government and organizations, housing all the spatial data in a central data warehouse would be too costly and risky. The CGDI does not host all the spatial information in a central data warehouse, but attempts to create an interoperable infrastructure that allows various communities to share geospatial information. The CGDI is composed of four key components: technical standards, national framework data, common data policies, and enabling technologies (GeoConnections, 2006a). Technical standards guide the sharing of location based information in an interoperable way. National framework data is the base component of the CGDI, and it is integrated from different providers. Enabling technologies are used to develop online applications based on the endorsed standards. Common data policies are agreed by various agencies to reduce data duplication and support data sharing. The vision of the CGDI is "to enable access to authoritative and comprehensive sources of Canadian geospatial information to support decision making" (GeoConnections, 2005).

Health mapping and geospatial aspects
Since Dr. John Snow used geospatial information to analyze cholera deaths about 150 years ago (Mcleod, 2000), the integration of disease studies with geography has received great attention. Cliff and Hagget (1988) presented an atlas of disease distributions (such as respiratory tuberculosis, malaria and measles) for analyzing epidemiological data. Geographical understanding and exploration of diseases are very useful given current outbreaks of HIV/AIDS and SARS (Gould, 1993; Banos and Lacasa, 2007). Geographical studies in health can address many tasks, such as determining disease distribution, detecting spatial and temporal clusters, analyzing spatial and temporal trends, modeling diseases in space and time, and analyzing health facility capacity.
1) Disease mapping can represent disease incidences using locations, classify disease information into different levels, or display disease distribution information with charts. Choropleth maps are usually used to depict patterns of disease rates, and spatial continuity is assumed to generate smooth maps (Boulos, 2004).
2) An excess of cases in space (a geographic cluster), in time (a temporal cluster), or in both space and time is called a cluster (Boulos, 2004). Spatial clustering helps to detect regions of high disease prevalence. Many spatial clustering algorithms have been implemented so far, such as the Geographical Analysis Machine method (Openshaw et al., 1987) and the spatial scan statistic method (Kulldorff, 1997), while others, such as cluster detection based on Geary's c and Moran's I, are designed for areal data. Temporal clustering aids in understanding how the disease emerges in time. Spatio-temporal clustering is a challenge as it integrates the space dimension with the time dimension, and many knowledge discovery and data mining methods have been applied to it (Neill et al., 2005).
3) Analyzing spatio-temporal trends can explain how the peak of a disease moves from one region to another through time. Generally, two methods are used in visualizing the spatio-temporal trend of a disease (Cromley and McLafferty, 2002). One is to use map sequences, a series of maps showing the disease distribution at different time points. The other way uses animation technology, with visualized maps of a disease as it passes through a certain time interval. Ogao (2006) mentioned three types of animation methods: passive, interactive, and inference-based animations according to the levels of interactivity and complementary domain knowledge that each of them offers to the user.
4) Spatio-temporal modeling can be used to predict disease outbreaks and the diffusion of a disease. The approaches used in spatio-temporal modeling include stochastic modeling methods, logistic regression methods, Bayesian methods, etc. (Kleinn et al., 1999; Yang et al., 2005; Yu and Christakos, 2006). Moreover, some recent studies use artificial intelligence techniques in disease simulation (Yergens et al., 2006). Various kinds of factors can be examined in the disease modeling process, such as Normalized Difference Vegetation Index (NDVI), air pollution, temperature, race and income.
5) Health facility capacity analysis includes applications such as mapping the health service locations and needs, identifying new sites for health facilities and finding the nearest clinic location (Cromley and McLafferty, 2002).
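As an illustration of the areal cluster statistics mentioned in item 2, the following minimal Python sketch computes the global Moran's I for a set of regions. It is not the implementation used in this study. The function name `morans_i`, the binary adjacency weights, and the sample values are assumptions chosen for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for areal values.

    values  -- one disease rate per region (hypothetical sample data below)
    weights -- square spatial weights matrix; weights[i][j] > 0 when
               regions i and j are neighbours (assumed binary adjacency)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    # Sum of all spatial weights.
    w_sum = sum(sum(row) for row in weights)

    # Weighted cross-products of deviations between neighbouring regions.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))

    # Total squared deviation from the mean.
    den = sum(d * d for d in dev)

    return (n / w_sum) * (num / den)


# Four regions arranged in a line; each region neighbours the next one.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_gradient = morans_i([1, 2, 3, 4], chain)   # similar neighbours
checkerboard = morans_i([1, 0, 1, 0], chain)      # dissimilar neighbours
```

Positive values indicate that neighbouring regions have similar rates (spatial clustering), while negative values indicate that high and low rates alternate.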

Usability metrics
According to the ISO-9241-11 standard, system usability is measured by "the extent to which the intended goals of use are achieved, the resources that have to be expended to achieve the intended goals and the extent to which the user finds the use of the product acceptable" (ISO, 1998). Hunter et al. (2003) introduced approximately 40 elements of spatial data usability. Considering the geospatial aspects of health mapping, the important goal is to achieve effective and secure health data sharing. Taking this goal into account, we designed the following nine elements for evaluating the usability of the CGDI (including national framework data, common data policies, technical standards, and enabling technologies) in health mapping.
Cost. Cost means the users' expenses for their applications and plays an important role in the factors of usability. A flexible data sharing network could increase the reuse of data and service, which can reduce the cost of data collection. Finally, the relatively low cost of data accessing is very attractive to users.
Accessibility. Accessibility means the quality of accessing the standards, data, and service. Accessibility determines how users are likely to use the information. Common interfaces and well maintained metadata would facilitate the discovery and access of the required data and services.
Response time. In emergencies, timely access to data has received great attention. Processing time and transmission time are the two primary concerns in data dissemination. The increase in computer processing power and the development of optimal algorithms will improve the processing time. The transmission time depends on the network topology, data compression methods, and progressive transmission.
Data quality. Data are likely to be collected by different authorities or organizations, with different levels of resolution. According to ISO 19113 principles, the quality elements of spatial data include completeness, logical consistency, positional accuracy, temporal accuracy, and thematic accuracy. High resolution data are essential in the modeling and statistical analysis of geospatial health applications.
Reliability. The trust in and quality of data and service access are considered in many applications. High availability of the data and services is important for their use.
Exchangeability. Exchangeability deals with the quality of the capacity to exchange information. Standards are useful in the exchange of information.
Interoperability. Interoperability is the ability to communicate, execute programs, or transfer data among various functional units, even though the user has little or no knowledge of the unique characteristics of those units (ISO, 1993). Good interoperability ensures that the contents are understandable.
Cartographic Representation. Representing spatially related information in two- or three-dimensional maps or graphics gives a vivid way to understand the information.
Security. Security is used to protect the privacy and confidentiality of data and services, and it is a fundamental principle for most applications. While considering the security factor, the efficiency of data access should not be greatly affected.

Standards in the CGDI
To address spatial data sharing and interoperability, several international organizations such as OGC and ISO/TC211 are working on the construction of basic standards and application specifications. The ISO/TC 211 group works more on abstract standards, while OGC concentrates on the implementation specifications. The main standards that the CGDI adopts are from the ISO/TC 211 and OGC. The CGDI-endorsed specifications fall into the following categories:
Data representation. Web Map Service (WMS) provides standard interfaces for producing maps (OGC, 2006). Styled Layer Descriptor (SLD) enables user-defined symbolization of geospatial features (OGC, 2005a).
Data accessing. Web Feature Service (WFS) supports feature level geospatial data operation (OGC, 2005b). Web Coverage Service (WCS) provides access to coverage data such as remote sensing images and digital elevation data (OGC, 2006b).
Data manipulation. Web Processing Service (WPS) supports spatially related data processing through the Web (OGC, 2005c).
Data discovery. Geodata discovery service is used for retrieving geospatial data. FGDC CSDGM and ISO 19115 are used as metadata standards.
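To make the data representation category concrete, the following Python sketch assembles a WMS GetMap request using the standard key-value parameters of the WMS 1.1.1 specification. It is an illustration only: the endpoint `https://example.org/wms`, the layer names, and the bounding box (a rough extent around New Brunswick and Maine) are hypothetical and do not refer to actual CGDI services.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layers, bbox, width, height,
                   srs="EPSG:4326", fmt="image/png", version="1.1.1"):
    """Build a WMS GetMap URL from standard key-value parameters.

    bbox is (min_x, min_y, max_x, max_y) in the units of the given SRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": version,
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),          # comma-separated layer list
        "STYLES": "",                        # empty value = default styles
        "SRS": srs,                          # WMS 1.3.0 uses CRS instead
        "BBOX": ",".join(str(c) for c in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer names, for illustration only.
url = wms_getmap_url(
    "https://example.org/wms",
    ["hospitals", "health_regions"],
    (-71.1, 43.0, -63.7, 48.1),
    800,
    600,
)
```

Because every compliant WMS server accepts the same parameters, a client such as a health portal can overlay maps from several providers by changing only the base URL and the layer names.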

Architecture design
Since disease outbreaks are usually spatially distributed, using the geographical information framework for the development of Web-based health systems could improve health data sharing, outbreak detection, and disease control. Based on the above mentioned metrics, the purpose of this research is to design an application to apply the CGDI in health mapping. The architecture design uses CGDI-endorsed standards for health data sharing and supporting health decision making. This architecture provides the basic functions for geospatial health applications including thematic mapping, spatio-temporal processing, spatio-temporal trend representation, and health facility distribution.

Figure 1: Architecture design
Figure 1 shows our designed architecture. Spatially related data can be accessed from the web services provided in the CGDI. In health applications, we can create new web services such as WMS, WFS, WPS, and WCS. These services can be registered to the CGDI. The health portal is used for service integration and map visualization.

Implementation of a health application

Study sites and data description
Experience with infectious disease outbreaks, especially the recent SARS outbreak, has heightened concern about infectious diseases and shown the need for an international strategy (David, 2003). The Province of New Brunswick (Canada) and the State of Maine (USA) are our study sites; they share a common, highly traveled international border. Since people are more likely to visualize information based on jurisdictional regions, we used the administrative areas as our infectious disease mapping boundaries. Different health organizations or users require different levels of detail in health data. Meanwhile, considering the privacy of the health data, a given health organization or user can only access and track certain levels of health data. Thus, we chose six levels of administrative/census areas that cover the entire territory on both sides of the border. The six levels in New Brunswick are Province, Health Region, Census Division, Census Subdivision, Forward Sortation Area, and Dissemination Area. In Maine, the corresponding levels are State, Health Service Area, County, County Subdivision, Zip code, and Census Block Group. The province or state is the top level. The health region/health service area level is the location of the patient's hospital in the classification system. The census division/county level is a group of neighboring municipalities joined together for the purpose of regional planning and managing common services (such as police or ambulance service). The census subdivision/county subdivision level comprises municipalities or areas treated as municipal equivalents for statistical purposes. The forward sortation area/zip code is assigned to one or more postal zones. The dissemination area/census block group level is a relatively stable geographical unit composed of one or more blocks (the smallest geographical areas for which population and dwelling counts are disseminated).
The data used on both sides include spatial data, census data and patient data of New Brunswick and Maine. These data were acquired from different health departments and from web services of the CGDI and the National Spatial Data Infrastructure (NSDI). In addition, simulated influenza outbreak data for 120 days (including influenza cases and other data such as grocery retail, grocery supply, fuel retail, fuel supply, school, pharmacy, and hospital bed occupation), based on a 1968 influenza variation, were generated for the spatio-temporal analysis. For health mapping, the essential task is the geo-coding process, which locates patient data from the recorded street or postcode. After geo-coding, it is possible to roll up the patient data or other data sets from the bottom level upward using spatial operations that analyze spatial relationships such as point in polygon, polygon in polygon, etc. This also helps to protect confidential data sets by aggregating patient data to a health region or polygon.
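The roll-up of geo-coded patient locations to administrative polygons can be sketched as follows. This is a minimal illustration using a standard ray-casting point-in-polygon test, not the spatial operations actually used in this study. The region names, coordinates, and function names are hypothetical.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices; the last vertex is implicitly
    connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def roll_up(points, regions):
    """Count geo-coded cases per region; only the counts are returned."""
    counts = Counter()
    for x, y in points:
        for name, polygon in regions.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # each case is assigned to at most one region
    return counts


# Hypothetical regions (unit squares) and geo-coded case locations.
regions = {
    "region_A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "region_B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
cases = [(0.5, 0.5), (0.2, 0.8), (1.5, 0.5), (5.0, 5.0)]
counts = roll_up(cases, regions)
```

Only the aggregated counts leave the function, which mirrors the way aggregation to a health region or polygon conceals individual patient locations.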

Mapping variables for health data processing
The first step towards the understanding and explanation of any geographical phenomenon is thematic mapping (Benenson and Omer, 2003). For decision making on disease outbreaks, many factors can influence the mapping result, such as population density, health inequalities, racial tendency, environmental pollution, social recognition, economic development, and cultural difference. In this research, we mainly concentrated on the demographic factor and its influence on the disease outbreak. Other factors are not currently integrated, as low frequency values would negatively impact classification methods. We used the following established statistical methods:
a) Crude Morbidity Rate (CMR): the total number of incidents relative to the total population in their population group (Eqn. 1).
$$\mathrm{CMR} = \frac{I}{P_{risk}} \times \rho_{const} \qquad (1)$$
where $I$ is the sum of patients for each geo-cell, $P_{risk}$ is the population-at-risk total for each geo-cell, and $\rho_{const}$ is the Population Constant (e.g., 1,000).
The purpose of these statistical methods is to provide a processing capacity for data representation that is consistent across temporal and jurisdictional layers. The above twelve statistical values are calculated. These values may be expressed by multi-dimensional vectors: temporal dimensions (e.g., 5-years, annual, seasonal, monthly, weekly, daily), data use dimension (e.g., American or Canadian data used separately for standardization, or both), gender divisions (e.g., male, female, both), age group (e.g., 0-4, … 65+, total), geographic divisions (e.g., Dissemination Areas/Census Block Group, Census Divisions/County, State/Province, etc.), and disease types (e.g., influenza). The calculated values could be used to create classification maps. We could generate a thematic map by selecting one value of each dimension, such as the parameters (Census Division/County level, Year 2002 week 1, age group 65+, Indirect Standardized Morbidity Ratio, male, influenza, and both data sets used: Maine and New Brunswick).
The calculated values could also be used to generate pie charts or bar charts, for instance, the distribution across three age groups for the parameters (Census Division/County level, Year 2002 week 1, Indirect Standardized Morbidity Ratio, male, influenza, and both data sets used).
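The Crude Morbidity Rate defined above, the number of patients divided by the population at risk and scaled by the population constant, can be sketched in Python as follows. The function name, the geo-cell identifiers, and the counts are hypothetical. This is an illustration under those assumptions rather than the processing service implemented in this study.

```python
def crude_morbidity_rate(cases, population_at_risk, constant=1000):
    """Cases per `constant` persons at risk in one geo-cell."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return cases * constant / population_at_risk


# Hypothetical geo-cells: (number of patients, population at risk).
geo_cells = {
    "cell_1": (25, 50_000),
    "cell_2": (12, 8_000),
    "cell_3": (0, 15_000),
}

# One rate per geo-cell, ready to be classified for a thematic map.
rates = {
    cell: crude_morbidity_rate(cases, population)
    for cell, (cases, population) in geo_cells.items()
}
```

Expressing each rate per 1,000 persons makes geo-cells with very different populations comparable on the same thematic map.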

Health mapping results
With the health facility data published by WMS, WFS, or WCS, and the statistical processing functions provided by WPS, health data could be accessed via the Internet. Moreover, the services can be integrated to support health surveillance. Figure 2 shows a map viewer integrating two WMS (hospital distribution) and a WPS (SMR rate at health region level in 1999). Also, the time tag in the service could be used to achieve animated maps. Figure 3 shows the time tags included in WMS maps of simulated data on day 20 of the disease risk level and the hospital bed information. Figure 4 shows the WMS maps of the same data on day 80.
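The use of a time tag for animated maps can be sketched by requesting one WMS map per time step through the optional TIME parameter of the WMS specification. The following Python example is a hedged illustration: the endpoint, the layer name `disease_risk`, the bounding box, and the start date of the simulated outbreak are all hypothetical.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def animation_frame_urls(base_url, layer, bbox, start, days, step=1):
    """Return one GetMap URL per time step, using the WMS TIME parameter."""
    urls = []
    for offset in range(0, days, step):
        day = start + timedelta(days=offset)
        params = {
            "SERVICE": "WMS",
            "VERSION": "1.1.1",
            "REQUEST": "GetMap",
            "LAYERS": layer,
            "STYLES": "",
            "SRS": "EPSG:4326",
            "BBOX": ",".join(str(c) for c in bbox),
            "WIDTH": 800,
            "HEIGHT": 600,
            "FORMAT": "image/png",
            "TIME": day.isoformat(),   # ISO 8601 date, e.g. 2002-01-21
        }
        urls.append(base_url + "?" + urlencode(params))
    return urls


# Hypothetical service and start date; one frame every 20 days of a
# 120-day simulation.
frames = animation_frame_urls(
    "https://example.org/wms",
    "disease_risk",
    (-71.1, 43.0, -63.7, 48.1),
    date(2002, 1, 1),
    days=120,
    step=20,
)
```

A client can display the returned maps in sequence to animate the outbreak, or show selected frames side by side as a map sequence.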

Discussions
Hosted by the Emergency Measures of the Province of New Brunswick, an exercise of "High Tide" enlisted many participants to test the decision making environment within the framework of the CGDI. With the experience from the developers in this health mapping application and the feedback from the users participating in the "High Tide" exercise, we developed a matrix that links usability metrics to the four key components of the CGDI in health mapping, as shown in
In the technical standards, the CGDI adopted many international standards for describing, publishing, visualizing, accessing, and manipulating geospatial resources. The standards are highly accessible through the Internet. Meanwhile, the standards are developed version by version with good reliability. In health mapping, sharing the data through the standard interfaces is convenient for data access. CGDI-endorsed standards have been successfully applied to health data mapping, as in this study. As a result, a low standards development cost for health mapping is possible within the CGDI framework. With these standards, health data can be accessed through standard interfaces, which makes information exchange between different organizations very easy. However, the standards mainly address syntactic heterogeneity. Achieving semantic interoperability in health fields still requires the development of a geospatial health ontology. As to cartographic representation, CGDI-endorsed standards support various representation formats, such as JPG, GIF, PNG, and GeoTIFF, so the cartographic representation of health data can be done without difficulty. However, thematic mapping support is relatively weak in CGDI-endorsed standards. The SLD standard only supports classification maps, and gives no standard way of generating chart styles. Meanwhile, it would be better to develop a thematic mapping standard for health mapping, for example by defining standard symbology or color ramps for describing specific kinds of health information. Moreover, the development of multimedia standards, including sound, could support better understanding of social phenomena. In regard to security, there are currently few related standards under the CGDI.
The national framework data is the core of the CGDI. The aim of the CGDI is to "collect data once, share many times" (GeoConnections, 2006a). It is estimated that up to 80 percent of the cost of geospatial applications is spent on the spatial data collection process. The spatial data collection cost for health data mapping can be shared with many other departments that use these data, such as forestry departments, agricultural departments, emergency departments, etc. With the shared cost and less redundant work, the data collection cost is relatively low. Through the GeoConnections discovery portal, geospatial data and services can be discovered using keywords, location, and/or theme. The CGDI encourages organizations that are closest to the source to provide the data. This can provide users with data of good quality and precision, and eliminate duplication and overlap problems. The precision of spatial data is important for analyzing health information. Geo-coding is often used to map health records to their geographical locations. Spatial data are updated frequently, and methods to obtain the data accurately and in a timely manner are beneficial to decision making. Sometimes, different versions of spatial data exist in the CGDI, and the update frequency is also a problem. Both difficulties reduce the reliability of data quality. As different laws govern access to and use of public health information, the CGDI is not very comprehensive in providing health data. There are also some reliability problems with the CGDI. Although many geospatial data sets and services exist in the CGDI, the availability and performance of data access are unknown. In the health field, the current standards and rules for spatially visualizing confidential information are seriously limited (Leitner, 2006). Our study used statistical and geographical masking, with data aggregated to certain levels for visualization, to maintain the privacy of health information. When compared with the original data, the aggregated results at different spatial resolutions might show some differences. With the CGDI-endorsed standards, health data could also be easily shared with different planning or health departments. Health data in the CGDI are exchangeable, as shown in our cross-border application, since the WMS service is compatible with the American NSDI. The cartographic representation of the data, which conforms to the technical standards, is satisfactory. The SLD standard can solve possible style problems with data accessed from different services. In the CGDI, data are stored in a distributed environment rather than a central database, so security is greatly enhanced against a central database crash.
Presently, the enabling technologies in the CGDI use a distributed service-oriented architecture for the geospatial domain. Web services technology is mature and easy to implement. These technologies are highly accessible and reliable in web environments. Most geospatial health information systems have used thin client or thick client architectures, and it is usually difficult to reuse and integrate them. With the adoption of the service-oriented architecture, reusability and integration can be greatly improved. However, the response time of web services is not fully satisfactory because of their platform-neutral implementation. Semantics-based service and data integration is not mature yet; thus, semantic interoperability still needs development. The use of Web-based technology is acceptable for representation and visualization, but it is not quite suitable for cartography, e.g., printing high quality maps. Cartographic considerations are often overlooked in many Web-based mapping applications. Since the enabling technologies protect security through secure web services, good security could be achieved.
The common policies harmonize the access and use of geospatial information in the CGDI, and they are readily accessible to people who wish to participate in and use the CGDI. The policy making process in the CGDI considers the extensibility of the policies and their exchangeability with other countries as well. For the use of public health information, different jurisdictions have different laws governing access; thus some policies need to be developed to provide cooperative mechanisms for preventing, tracking, and responding to disease outbreaks. Some policies still need to deal with reliability problems, such as whether the services are running or not and whether the data are updated or not. As to security issues, the policies do not specify at which level, or for which types, geospatial data should be kept secure. The policies for public health data are highly valuable because of the confidentiality issues.

Conclusions
Recent disease outbreaks have demonstrated the need for geographic applications in public health. Public health is one of four priority applications at GeoConnections in the development of the CGDI (GeoConnections, 2006b). In this research, we have implemented health data sharing based on the CGDI framework and evaluated the usability of the CGDI in health mapping. This research will foster the use of the CGDI in health studies and the implementation of new web services for public health within the CGDI for online data sharing and access. The information provided by the CGDI will be more comprehensive with the enrichment of health data. Currently, few studies concentrate on the usability of Spatial Data Infrastructures (SDI), and this study might bring a novel approach by using the feedback of developers and users in the evaluation process. The usability metrics we designed here are mainly based on our application. In future usability evaluations of SDI, more comprehensive and in-depth metrics and methodologies should be considered for better evaluation.
The health mapping application based on the CGDI can lower the cost of data sharing, use the standards for data access, and provide real-time map visualization to users. This study shows the high usability of the CGDI in supporting disease management and decision making for local, provincial/state, and national officials, and the public. The quality of the cartographic representation in this application is limited by the capabilities of Web-based GIS, and it still has to be improved to enhance the understanding of disease phenomena by health practitioners and the general public. Our future work will be devoted to advancing the usability of the CGDI in health applications for data sharing.