A framework for the modelling of uncertainty between remote sensing and geographic information systems
Introduction
Good science requires statements of accuracy by which the reliability of results can be understood and communicated. Where accuracy is known objectively, it can be expressed as error; where it is not, the term uncertainty applies (Hunter and Goodchild, 1993). Thus, uncertainty covers a broader range of doubt or inconsistency and, in the context of this paper, includes error as a component. The understanding of uncertainty as it exists in geographic data remains a problem that is only partly solved (Story and Congleton, 1986; Goodchild and Gopal, 1989; Veregin, 1995; Ruiz, 1997; Worboys, 1998). However, without quantification, the reliability of any results produced remains difficult to assess and to communicate to the user. A geographic information system (GIS) provides a whole series of tools with which data can be manipulated, without offering any control over misuse. To quote Openshaw et al. (1991):
A GIS gives the user complete freedom to combine, overlay and analyse data from many different sources, regardless of scale, accuracy, resolution and quality of the original map documents and without any regard for the accuracy characteristics of the data themselves.
This is a serious issue; without quantification of uncertainty, the results themselves may only be considered as qualitative information, and this greatly devalues their merit in both a scientific and a practical sense. To compound the problem, in the fusion of activities from remote sensing and GIS, an integrated approach to managing geographic information is required. This must necessarily support many different types of data (Ehlers et al., 1991), gathered according to different models of geographic space (Goodchild, 1992), each possessing different types of inherent errors and uncertainties (Chrisman, 1991). As well as providing individual support for these different models of space, it is necessary to explicitly include methods to keep track of uncertainty, as data is changed from low level forms (such as remotely sensed image data) to the higher level abstractions required by cartography and GIS (such as distinct objects and themes). It is particularly toward the propagation of uncertainty through different conceptual models that this paper is oriented.
Whether a particular data set can be considered suitable for a given task depends on many different criteria, and despite the fact that various aspects of uncertainty can be measured objectively, their importance will be largely determined by the current task. Our overall goal with modelling uncertainty is therefore threefold: (i) to produce a statement of uncertainty to be associated with each data set so that an objective statement of reliability may be reported, (ii) to develop methods to propagate uncertainty as the data is processed and transformed, and (iii) to ultimately determine the suitability of a data set for a given task (‘fitness for use’). Another valid goal, not considered here, is to communicate uncertainty information to the user (e.g. Hunter and Goodchild, 1996).
A good deal of research effort has been directed towards the study of uncertainty in geographic data. A useful framework, recognising the separate error components of value, space, time, consistency and completeness, was proposed by Sinton (1978) and later extended by Chrisman (1991). However, uncertainty in geographic data can be described in a variety of alternative ways, such as those provided by Bedard (1987), Miller et al. (1989) and Veregin (1989). Although different, these approaches have a number of aspects in common, including the observation that uncertainty itself occurs at different levels of abstraction. For example, positional and temporal errors describe uncertainty in a metric sense within a space–time framework, whereas completeness and consistency represent more abstract concepts describing coverage and reliability, and are consequently more problematic to describe.
Work to date on uncertainty addresses the inherent errors present within specific types of data structure (e.g. raster or vector) or data models (e.g. field, object). The effects of combining data layers within these various paradigms have been studied by Veregin (1989, 1995), Openshaw et al. (1991), Goodchild et al. (1992), Heuvelink and Burrough (1993), Ehlers and Shi (1997) and Leung and Yan (1998). Scant attention has so far been given to the problem of modelling uncertainty as the data is transformed through different models of geographic space; a notable exception is the work of Lunetta et al. (1991). A typical path taken by data captured by satellite and then abstracted into a suitable form for GIS is shown in Fig. 1, and involves four such models. Continuously varying fields are quantised by the sensing device into image form, then classified and finally transformed into discrete mapping objects. The overall object extraction process is sometimes referred to as semantic abstraction (Waterfeld and Schek, 1992) due to the increasing semantic content of the data as it is manipulated into forms that are easier for people to work with.
When data are transformed between different conceptual models of geographic space, their uncertainty characteristics may change: the techniques used to transform the data alter the inherent uncertainty and may, in addition, introduce further uncertainty of their own. Furthermore, many of the abstraction techniques employed combine data with different uncertainty characteristics, for example multi-temporal image classification (Jeon and Landgrebe, 1992) or knowledge-based feature extraction (McKeown, 1987). Consequently, two inter-related problems must be addressed, namely:
1. How do the uncertainty characteristics of data change as data are transformed between models?
2. How do the transformation methods used affect and combine the uncertainty present in the data (Lodwick et al., 1990)?
One of the consequences of the separation of GIS and RS activities into separate communities and separate software environments is that there is an artificial barrier between the two disciplines. Thus, the integration of these two branches of science is to some extent an artificial problem. As a result, there is no easy flow of meta-data between systems, inter-operability is often restricted to the exchange of image files or object geometry and the problem of managing uncertainty is compounded.
The four stages shown in Fig. 1 represent four models of geographic space that are considered here. They are termed the field (F), image (I), thematic (T) and object (or feature) (O) models and are typical (though not exhaustive) of those used in the integration of GIS and remote sensing activities. These models represent the conceptual properties of the data only and are considered here as independent from any particular data structure that might be used to encode and organise the data.
The description of uncertainty used here follows that proposed by Sinton (1978). It covers the sources of error as they occur in remote sensing and GIS integration (although other approaches may be equally valid). Uncertainty is restricted to the following properties: (i) value (including measurement and label errors), (ii) spatial, (iii) temporal, (iv) consistency and (v) completeness. These are symbolised below using the following five letters of the Greek alphabet (α, β, χ, δ and ε, respectively). Of these, α, β and χ can be applied either individually to a single datum or to any set of data. The latter two properties of consistency and completeness can only apply to a defined data set since they are comparative (either internally among data or to some external framework). Issues regarding scale and resolution of the data (e.g. Bruegger, 1995) are postponed; these are not objective uncertainty criteria, but instead become important when assessing the ‘fitness for use’ of data for a specific task. Also, we do not concern ourselves here with structures for the provision of lineage information (e.g. Lanter, 1991), although these are certainly required to support the propagation of uncertainty.
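The five properties and their scope of application can be written down as a small lookup structure. The following is a minimal sketch only: the names `Scope`, `UncertaintyProperty` and `applicable_to_single_datum` are hypothetical and introduced here for illustration, and the Greek symbols are spelled out in ASCII.

```python
from enum import Enum


class Scope(Enum):
    """Where a property can be applied (illustrative naming, not from the paper)."""
    DATUM_OR_SET = "single datum or any set of data"
    SET_ONLY = "defined data set only"


class UncertaintyProperty(Enum):
    """The five uncertainty properties, each with its symbol and scope."""
    VALUE = ("alpha", "value (measurement and label errors)", Scope.DATUM_OR_SET)
    SPATIAL = ("beta", "spatial", Scope.DATUM_OR_SET)
    TEMPORAL = ("chi", "temporal", Scope.DATUM_OR_SET)
    CONSISTENCY = ("delta", "consistency", Scope.SET_ONLY)
    COMPLETENESS = ("epsilon", "completeness", Scope.SET_ONLY)

    def __init__(self, symbol, description, scope):
        # Unpack the tuple value into named attributes on each member.
        self.symbol = symbol
        self.description = description
        self.scope = scope


def applicable_to_single_datum():
    """Return the symbols of properties that can describe one datum."""
    return [p.symbol for p in UncertaintyProperty if p.scope is Scope.DATUM_OR_SET]
```

Consistency and completeness are marked `SET_ONLY` because they are comparative and therefore have no meaning for an isolated datum.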
The techniques proposed for quantifying error and uncertainty referred to in this paper are not in themselves new. The purpose of the formal notation developed later is to describe succinctly and unambiguously the forms of uncertainty that exist in the data, where they originate and what they affect. Ongoing work by the authors and others is attempting to quantify some of these uncertainty terms as they relate to the integration of GIS and remote sensing. For example, ISPRS Commissions 2, 3 and 4 span the full range of integration activities described here (see http://www.isprs.org/technical_commissions.html for more details).
Description of data model properties and their uncertainty characteristics
More formally, a single geographic datum (a) at any level of abstraction can be described by its value (d), spatial extents (s) and temporal extents (t) (Gahegan, 1996), each of which has associated uncertainty u, where u has three components: α, β and χ.
The above expression assumes that α, β and χ may be orthogonalised. As will be shown later, it is often difficult to treat these components separately, since each may affect the others. A similar formulation, using upper case
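As an illustration of the description above, a datum and its uncertainty components might be held in a record such as the following. This is a sketch under stated assumptions, not the authors' formulation: the class names, the use of a bounding box for s, an interval for t, and plain floating-point magnitudes for α, β and χ are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Uncertainty:
    """Uncertainty u with its three components (magnitudes are assumed floats)."""
    alpha: float  # value uncertainty
    beta: float   # spatial uncertainty
    chi: float    # temporal uncertainty


@dataclass(frozen=True)
class Datum:
    """A single geographic datum a, described by d, s and t plus uncertainty u."""
    d: Any                              # value, e.g. a radiance or a class label
    s: Tuple[float, float, float, float]  # spatial extents as (xmin, ymin, xmax, ymax), assumed
    t: Tuple[float, float]              # temporal extents as (start, end), assumed
    u: Uncertainty


# Hypothetical example: one image pixel with made-up uncertainty magnitudes.
pixel = Datum(
    d=112,
    s=(0.0, 0.0, 30.0, 30.0),
    t=(0.0, 1.0),
    u=Uncertainty(alpha=2.0, beta=15.0, chi=0.5),
)
```

Storing the three components as separate fields mirrors the orthogonality assumption; where the components interact, a record of this kind would need to carry their dependence as well.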
The transformation process between geographic models
This section describes the three transformation processes between geographic models, field→image, image→theme and theme→object and their effects on the data characteristics introduced above.
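The chain of transformations can be sketched as a sequence of steps, each of which inherits the uncertainty sources of the previous model and records any that it introduces. This is a bookkeeping illustration only: `ModelState` and `transform` are hypothetical names, the properties listed as introduced at each step are placeholders rather than results from the paper, and no numerical propagation rule is implied.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# The four models in the order data passes through them.
MODELS = ["field", "image", "thematic", "object"]


@dataclass
class ModelState:
    """Data held in one model, with the origin of each uncertainty source."""
    model: str
    # Each entry is (property, step that introduced it), e.g. ("alpha", "field->image").
    sources: List[Tuple[str, str]] = field(default_factory=list)


def transform(state: ModelState, introduced: List[str]) -> ModelState:
    """Move to the next model, inheriting existing sources and adding new ones."""
    idx = MODELS.index(state.model)
    if idx == len(MODELS) - 1:
        raise ValueError("the object model is the last model in the chain")
    target = MODELS[idx + 1]
    step = f"{state.model}->{target}"
    inherited = list(state.sources)              # carried over unchanged
    added = [(prop, step) for prop in introduced]  # new sources from this step
    return ModelState(model=target, sources=inherited + added)


if __name__ == "__main__":
    state = ModelState("field")
    state = transform(state, ["alpha", "beta"])   # placeholder properties
    state = transform(state, ["alpha"])           # placeholder properties
    state = transform(state, ["beta", "delta"])   # placeholder properties
    for prop, origin in state.sources:
        print(prop, "introduced by", origin)
```

Keeping the origin of each source alongside the property makes it possible to ask, for data in the object model, which transformation a given uncertainty term came from.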
Discussion, conclusions and future work
The expressions described above seem to offer some insight into the complex transformation processes that occur within integrated GIS, particularly with reference to keeping track of the sources of errors and uncertainties and their effects. Table 1 categorises these errors and uncertainties as they apply to the four geographic models considered here.
Uncertainty propagates from left to right across the table as the higher models inherit uncertainty properties from the lower models and add to
References (41)
Geographical data modelling. Comput. Geosci. (1992)
Performance characterisation in computer vision. CVGIP-Image Understanding (1994)
Computation with imprecise geospatial data. Comput., Environ. Urban Syst. (1998)
Uncertainties in land information databases
Theory for the integration of scale and representation formats: major concepts and practical implications
Fuzzy mathematical methods for soil survey and land evaluation. J. Soil Sci. (1989)
The error component in spatial data
et al., Modelling uncertainty in photo-interpreted boundaries. Photogramm. Eng. Remote Sens. (1996)
et al., Integration of remote sensing and GIS: data and data access. Photogramm. Eng. Remote Sens. (1991)
et al., Error modelling for integrated GIS. Cartographica (1997)
Specifying the transformations within and between geographic data models. Trans. GIS
A model to support the integration of image understanding techniques within a GIS. Photogramm. Eng. Remote Sens.
Recent developments towards integrating scene understanding within a geographic information system for agricultural applications. Trans. GIS
Development and test of an error model for categorical data. Int. J. Geogr. Inf. Syst.
The use of contextual information in the classification of remotely sensed data. Photogramm. Eng. Remote Sens.
Propagation of errors in spatial modelling with GIS. Int. J. Geogr. Inf. Syst.
Error propagation in cartographic modelling using Boolean logic and continuous classification. Int. J. Geogr. Inf. Syst.
Mapping uncertainty in spatial databases, putting theory into practice. J. Urban Reg. Inf. Syst. Assoc.
Communicating uncertainty in spatial databases. Trans. GIS