A dissimilarity function for geospatial polygons

  • Regular Paper
Knowledge and Information Systems

Abstract

Similarity plays an important role in many data mining tasks and information retrieval processes. Most supervised, semi-supervised, and unsupervised learning algorithms depend on a dissimilarity function that quantifies how similar or different each pair of objects in the dataset is. Traditional similarity functions, however, fail to treat all the spatial attributes of geospatial polygons adequately, because they represent the structural and topological information contained in polygonal datasets only incompletely in quantitative terms. In this paper, we propose a new dissimilarity function, the polygonal dissimilarity function (PDF), which integrates both the spatial and the non-spatial attributes of a polygon so as to account for the density, distribution, and topological relationships that exist within polygonal datasets. We represent a polygon as a set of intrinsic spatial attributes, extrinsic spatial attributes, and non-spatial attributes. Using this representation, PDF is defined as a weighted function of the distances between two polygons in the different attribute spaces. To evaluate our dissimilarity function, we compare it with other distance functions proposed in the literature that work with both spatial and non-spatial attributes. In addition, we investigate its effectiveness in a clustering application that applies a partitional clustering technique (e.g., \(k\)-medoids) to two characteristically different sets of data: (a) irregular geometric shapes determined by natural processes, i.e., watersheds, and (b) semi-regular geometric shapes determined by human experts, i.e., counties.
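The general idea of a weighted dissimilarity over several attribute spaces, used inside a partitional clustering loop, can be illustrated with a minimal Python sketch. It is not the PDF formulation of the paper, whose component distances and weighting scheme are not given in this abstract. The `Polygon` fields, the Euclidean distance in each attribute space, the equal default weights, the naive "first k items" initialization, and the sample values are all assumptions made only for illustration, and the attributes are assumed to be already scaled to comparable ranges.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Polygon:
    # Hypothetical attribute vectors, assumed pre-normalized.
    intrinsic: Tuple[float, ...]    # e.g. area, perimeter
    extrinsic: Tuple[float, ...]    # e.g. centroid coordinates
    non_spatial: Tuple[float, ...]  # e.g. a thematic measurement


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_dissimilarity(p: Polygon, q: Polygon,
                           w_int: float = 1 / 3,
                           w_ext: float = 1 / 3,
                           w_ns: float = 1 / 3) -> float:
    """Weighted sum of per-space distances (illustrative stand-in only)."""
    return (w_int * euclidean(p.intrinsic, q.intrinsic)
            + w_ext * euclidean(p.extrinsic, q.extrinsic)
            + w_ns * euclidean(p.non_spatial, q.non_spatial))


def assign(items: Sequence[Polygon], medoids: List[int],
           dist: Callable[[Polygon, Polygon], float]) -> List[int]:
    """Label every item with the index of its nearest medoid."""
    return [min(medoids, key=lambda m: dist(items[i], items[m]))
            for i in range(len(items))]


def k_medoids(items: Sequence[Polygon], k: int,
              dist: Callable[[Polygon, Polygon], float],
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Simple alternating k-medoids; returns (medoid indices, labels)."""
    medoids = list(range(k))  # naive deterministic start: the first k items
    for _ in range(max_iter):
        labels = assign(items, medoids, dist)
        updated = []
        for m in medoids:
            members = [i for i, lab in enumerate(labels) if lab == m]
            if not members:  # keep the medoid if its cluster became empty
                updated.append(m)
                continue
            # New medoid: the member with the smallest total distance to the rest.
            updated.append(min(
                members,
                key=lambda c: sum(dist(items[c], items[j]) for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(items, medoids, dist)


# Made-up sample data: two small polygons and two large ones.
polygons = [
    Polygon((1.0, 4.0), (0.0, 0.0), (10.0,)),
    Polygon((9.0, 12.0), (5.0, 5.0), (30.0,)),
    Polygon((1.1, 4.1), (0.1, 0.0), (10.5,)),
    Polygon((9.2, 12.1), (5.1, 5.0), (31.0,)),
]
medoids, labels = k_medoids(polygons, 2, weighted_dissimilarity)
```

Because \(k\)-medoids only needs pairwise dissimilarities and picks cluster representatives from the data itself, any dissimilarity function can be passed in as `dist` without defining a mean polygon. In practice the initialization would be randomized or seeded more carefully than the "first k items" rule used here.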

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grants No. 0219970 and 0535255. We would like to thank David Marx for his valuable insight and feedback. We would also like to extend our thanks to Lei Fu, Bill Waltman, and Tao Hong for their assistance in data processing and preparation for this research.

Author information

Corresponding author

Correspondence to Deepti Joshi.

About this article

Cite this article

Joshi, D., Soh, LK., Samal, A. et al. A dissimilarity function for geospatial polygons. Knowl Inf Syst 41, 153–188 (2014). https://doi.org/10.1007/s10115-013-0666-2

  • DOI: https://doi.org/10.1007/s10115-013-0666-2
