Improving dissolve spatial operations in a simple feature model

https://doi.org/10.1016/j.advengsoft.2008.03.014

Abstract

This paper presents an algorithm to improve the performance of a spatial operation called ‘dissolve’, widely used in Geographic Information Systems (GIS) through spatial database systems. In simple feature models (which lack persistent topology), executing some common spatial operations requires a large amount of system resources. Such operations occur, for example, in the ‘OpenGIS Simple Features for SQL’ protocol (SFS), a client-server interoperability standard defined by ‘The Open Geospatial Consortium, Inc.’ (OGC). The dissolve operation is carried out using the union spatial operator (defined by OGC) and consists of removing the boundaries between adjacent polygons. The proposed algorithm substantially improves the performance of this spatial operation, needing between 100 and 1000 times fewer resources. It thus enables the database server to carry out this spatial operation on huge datasets containing up to millions of geometries. To check and validate this algorithm, a new open source software package (PGAT) has been developed.

Introduction

The use of spatial databases such as Oracle Spatial or PostGIS in Geographic Information Systems (GIS) has increased substantially in recent years. One of the reasons for this trend has been the alignment of these systems with well-known standard protocols defined by the ‘Open Geospatial Consortium, Inc.’ (OGC), such as the ‘OpenGIS Simple Features for SQL’ (SFS) [9]. This implementation specification defines interfaces that enable transparent access to geographic data held in heterogeneous processing systems on distributed computing platforms using the SQL language. When geographic objects are stored using a simple feature model (SFM), the geometries do not share arcs or nodes [8]; that is, they do not hold topological spatial relationships in a persistent way [4]. The SFM is not the best choice for operations that take into account relationships between features (such as spatial relations or topological predicates) [6]. In fact, some of the spatial operations defined in the SFS specification do not work well because they do not consider the spatial relationships between different features.

The motivation of this work is to obtain an algorithm that works properly with medium and huge datasets, especially when performing spatial operations such as removing boundaries between adjacent polygons. This way, institutions (especially public institutions, which might be more interested in using open source software) [2] can use open source spatial databases and analyze geographic information even if it is made up of millions of geometries. Until now, this has not been possible using free software and/or standard protocols (SFS and other OGC protocols). The aim of this research is to make it possible.

One of the operators defined by SFS that does not work properly is the spatial operator union, defined by the OGC as “Union (anotherGeometry: Geometry): Geometry – Returns a geometric object that represents the point set union of this geometric object with anotherGeometry” (Fig. 1) [10]. This spatial operator is used to remove the boundaries between adjacent geometries. It can be applied to polygon, arc and point features. Although the standard name of this spatial operator (according to the OGC) is ‘union’, in GIS terminology the operation that results from applying it is commonly known as ‘dissolve’.

Note that at no point are we referring to an overlay function. Some readers may get confused because the OGC ‘union’ spatial operator has the same name as the GIS ‘union’ overlay operation.
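To make the idea of "removing the boundaries between adjacent polygons" concrete, the following is a minimal Python sketch, not the OGC union operator nor the algorithm proposed in this paper. It assumes that adjacent polygons share edges with exactly matching vertices and do not overlap; the function name `dissolve_boundary_edges` and the sample lots are hypothetical. Under that assumption, an edge shared by two polygons appears twice and can be dropped, while an outer edge appears once and is kept.

```python
from collections import Counter


def dissolve_boundary_edges(polygons):
    """Return the undirected edges that remain after shared edges are removed.

    Each polygon is a list of (x, y) vertices, implicitly closed.
    Assumption: neighbours share edges with identical end points.
    """
    counts = Counter()
    for ring in polygons:
        for i, start in enumerate(ring):
            end = ring[(i + 1) % len(ring)]
            # frozenset makes the edge independent of its direction
            counts[frozenset((start, end))] += 1
    # Edges seen exactly once lie on the outer boundary of the dissolved shape
    return {edge for edge, n in counts.items() if n == 1}


# Two unit squares side by side, sharing the edge from (1, 0) to (1, 1)
lot_a = [(0, 0), (1, 0), (1, 1), (0, 1)]
lot_b = [(1, 0), (2, 0), (2, 1), (1, 1)]
boundary = dissolve_boundary_edges([lot_a, lot_b])
print(len(boundary))  # 6: the shared edge is gone, six outer edges remain
```

A general-purpose union operator must also handle overlaps, holes and edges that touch only partially, which this sketch deliberately ignores.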

The dissolve spatial operation is a common and useful operation in GIS [3]. Take for example a layer containing urban areas: obtaining the block boundaries from information about lots requires carrying out this spatial operation by grouping the polygons contained in each block [7] (obviously, the lots spatial table does not contain any attribute column with information about the corresponding blocks). As described in the next section, this spatial operation does not have an obvious solution in a simple feature model because the spatial database does not know which lots belong to each block, unlike a GIS with persistent topology [3], [12]. In other words, the spatial database does not contain any information about which polygons are disjoint from one another.
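The grouping problem described above (deciding which lots belong to the same block when no attribute says so) can be illustrated with a small Python sketch. It is an illustration only, not the algorithm proposed in this paper: it assumes that lots in the same block share edges with identical end points, and it uses a union-find structure to collect connected lots. The name `group_adjacent` and the sample data are hypothetical.

```python
def group_adjacent(polygons):
    """Group polygon indices into clusters connected through shared edges."""
    parent = list(range(len(polygons)))

    def find(i):
        # Follow parent links to the cluster representative (path halving)
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edge_owner = {}
    for idx, ring in enumerate(polygons):
        for k, start in enumerate(ring):
            edge = frozenset((start, ring[(k + 1) % len(ring)]))
            if edge in edge_owner:
                # Another polygon already has this edge: merge both clusters
                parent[find(idx)] = find(edge_owner[edge])
            else:
                edge_owner[edge] = idx

    groups = {}
    for idx in range(len(polygons)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


# Lots 0 and 1 touch along x = 1; lot 2 lies across a street
lots = [
    [(0, 0), (1, 0), (1, 1), (0, 1)],
    [(1, 0), (2, 0), (2, 1), (1, 1)],
    [(5, 0), (6, 0), (6, 1), (5, 1)],
]
blocks = group_adjacent(lots)
print(blocks)  # [[0, 1], [2]]
```

In a spatial database the exact-edge lookup would presumably be replaced by a spatial index query plus a topological predicate, but the clustering step would follow the same pattern.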

To improve the performance of the dissolve spatial operation we need to collect the spatial relationship grouping of the disjoint polygons. According to OGC, the spatial operator union can be used for joining (dissolving) two features. Spatial databases like Oracle Spatial or PostGIS define an SQL aggregate operator based on the union operator. This aggregate function enables these databases to join more than two features [13]. For example, to perform a dissolve operation on the whole lots layer, the SQL statement is

INSERT INTO "public"."blocks" ("geom") SELECT multi(geomunion("geom")) FROM "public"."lots";

This SQL aggregate (called geomunion in PostGIS and sdo_aggr_union in Oracle Spatial) works as follows: in a first step it joins the first two geometries (A, B) to obtain a single geometry (c); then it joins this new polygon (c) with a third geometry (C) to obtain another new polygon (d). The process is repeated as many times as there are geometries stored in the spatial table. Each new geometry is therefore bigger than the previous one, and the process uses an increasing amount of computing resources (memory, time) in each iteration. The final result is a huge geometry (a multipolygon in this case). Even if the source spatial table contains just a few thousand geometries, this final geometry can be made up of millions of vertices stored in a single row of the spatial table. The resulting geometry is very complex and thus of limited usefulness for carrying out other spatial operations. Furthermore, a spatial index does not help in subsequent operations because the table has just one row.
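The growth in cost per iteration can be illustrated with a rough Python cost model. This is a sketch under stated assumptions, not a measurement of PostGIS or Oracle Spatial: it assumes every geometry has the same number of vertices, that each union step reads all vertices of both inputs exactly once, and that no vertices are eliminated by the union (a worst case). The function name `incremental_union_cost` is hypothetical.

```python
def incremental_union_cost(n_geometries, vertices_each):
    """Count vertices read by a running union that absorbs one geometry per step.

    Assumptions: equal-sized geometries, linear cost per step, and no
    vertices removed by the union (worst case).
    """
    accumulated = vertices_each  # the first geometry seeds the running result
    total = 0
    for _ in range(n_geometries - 1):
        total += accumulated + vertices_each  # both inputs are read at each step
        accumulated += vertices_each          # the running result keeps growing
    return total


small = incremental_union_cost(1_000, 10)
large = incremental_union_cost(10_000, 10)
print(small, large, large / small)
```

Under these assumptions the total is `vertices_each * (n * (n + 1) / 2 - 1)`, so ten times more geometries cost roughly one hundred times more work. Real union implementations are more expensive than linear per step, so this model only shows the trend.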

To test the performance of the dissolve operations, a computer with the following characteristics was used: Pentium Dual Core 2 1600 MHz with 1 GB of RAM, running openSUSE Linux 10.2, PostgreSQL 8.1 and PostGIS 1.2.

Section snippets

Approaching the problem

Fig. 3 charts the time to dissolve a spatial table corresponding to a real cadastral dataset like the one shown in Fig. 2. The tests were carried out with only 10,000 geometries in order to limit the resources needed for the computation. As Fig. 3 shows (non-fragmented), PostGIS takes around 1600 s (almost half an hour) just to dissolve 10,000 polygons (lots). The resulting spatial table contains only one row. This geometry is a complex multipolygon made up of more than

Solution

As the reader may notice, this article does not discuss how to deal with object attributes during the dissolve process. This poses no real difficulty and is completely solved using the standard SQL aggregate and statistical functions. The software package developed to test this algorithm supports all of these options (see the bottom of the screen capture in Fig. 8). Consequently, in the rest of the article the dissolve process refers only to the geometry component.

Fig. 4

Experiments

To obtain reliable conclusions and carry out an exhaustive analysis, some tools have been developed in an open source package called PGAT [5]. This software package has been developed by the authors of this paper. To apply the algorithm shown in this paper, the package creates spatial datasets simulating the structure of spatially clustered polygons according to user-defined parameters. The designed algorithm is then applied, and the new dissolved layers can be displayed using PGAT.

Conclusions and future work

The authors have designed and evaluated an algorithm for dissolving polygons that uses far fewer resources than current approaches, e.g., an SQL aggregate that divides the spatial table into several groups. For dissolving a spatial table made of 100,000 geometries, our algorithm requires 200 MB, whereas 1800 MB are needed for the fragmentation algorithm. Our tests compared the proposed algorithm with the fragmented one, which is already an improvement over using only one aggregate

Acknowledgements

This project was developed at the University of Victoria (British Columbia, Canada) thanks to the grant awarded by “La Secretaria de Estado de Universidades e Investigacion del Ministerio de Educacion y Ciencia” of Spain (Ref. 2006-0264).

References (16)

  • Aref W, et al. SP-GiST: an extensible database index for supporting space partitioning trees. Journal of Intelligent Information Systems (2001)
  • Coll E, et al. Information and management in local administration. Research Project BIA2003-07914 sponsored by the...
  • Davis B. GIS: a visual approach (2001)
  • Galdi D. Spatial data storage and topology in the redesigned MAF/TIGER system. US Census Bureau. Geography division....
  • Martinez-Llario J. PGAT open source software. Available online at <http://sourceforge.net/projects/pgat>;...
  • Oosterom P, et al. The balance between geometry and topology. In: Proceedings of 10th international symposium on...
  • Oosterom P, et al. Spatial data management on a very large cadastral database. Comput Environ Urban Syst (2001)
  • Oosterom P, Verbree E. Storing and manipulating simple and complex features in database management systems. In:...
There are more references available in the full text version of this article.
