
Spatial Data Cleaning

Chapter in: Spatial Data Mining

Abstract

Spatial data almost always contain problems, which makes spatial data cleaning the most important preparation for data mining. If spatial data are used without cleaning, the subsequent discovery may be unreliable and may produce inaccurate knowledge and erroneous decisions. This chapter discusses the problems that occur in spatial datasets and the data cleaning techniques used to remedy them. Spatial observation errors mainly include stochastic errors, systematic errors, and gross errors, which show up as incompleteness, inaccuracy, repetitiveness, inconsistency, and deformation in spatial datasets drawn from multiple sources with heterogeneous characteristics. Classical stochastic error models are further categorized as indirect adjustment, condition adjustment, indirect adjustment with conditions, condition adjustment with parameters, and condition adjustment with conditions. All of these models treat the parameters as non-random variables within a generalized error model. Li Deren assumed two multi-dimensional alternative hypotheses to the Gauss-Markov model when he established the discernibility and reliability theory of adjustment, and he introduced iteration with selected weights. As a result, Li extended Baarda's theory into multiple dimensions and unified robust estimation and least squares.
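The last point, that iterating with selected weights unifies robust estimation and least squares, can be illustrated with a small sketch. It fits a straight line (a minimal indirect adjustment, y = a + b·x) to made-up observations that contain one gross error. The first pass, with all weights equal to 1, is ordinary least squares; each later pass recomputes the weights from standardized residuals and re-adjusts, so the gross error is progressively down-weighted. The Huber weight function, the constant k = 1.5, and the median-based scale estimate are assumptions borrowed from general robust-statistics practice, not the weight function Li derived, and the function names are hypothetical.

```python
import statistics


def weighted_line_fit(xs, ys, ws):
    """Weighted least squares for y = a + b*x by solving the 2x2 normal equations."""
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * swxx - swx * swx
    if det == 0:
        raise ValueError("normal equations are singular")
    b = (sw * swxy - swx * swy) / det
    a = (swy - b * swx) / sw
    return a, b


def huber_weight(u, k):
    """Huber weight for a standardized residual u (a stand-in weight function)."""
    return 1.0 if abs(u) <= k else k / abs(u)


def iterate_with_selected_weights(xs, ys, k=1.5, max_iter=100, tol=1e-10):
    """Re-select weights from residuals and re-adjust until the estimates settle.

    With every weight equal to 1 the first pass is ordinary least squares.
    """
    ws = [1.0] * len(xs)
    a, b = weighted_line_fit(xs, ys, ws)
    for _ in range(max_iter):
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        # Robust scale: 1.4826 * median absolute residual (about sigma for normal noise).
        scale = 1.4826 * statistics.median([abs(r) for r in residuals])
        if scale < 1e-12:
            break
        ws = [huber_weight(r / scale, k) for r in residuals]
        a_new, b_new = weighted_line_fit(xs, ys, ws)
        converged = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b, ws


# Synthetic data: y = 2 + 0.5x plus small noise, with a gross error (+10) at x = 4.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [2.05, 2.47, 3.02, 3.46, 14.0, 4.53, 4.98, 5.51]

a_ls, b_ls = weighted_line_fit(xs, ys, [1.0] * len(xs))
a_rob, b_rob, weights = iterate_with_selected_weights(xs, ys)
print(f"least squares: a={a_ls:.3f}, b={b_ls:.3f}")
print(f"reweighted:    a={a_rob:.3f}, b={b_rob:.3f}, weight at x=4: {weights[4]:.4f}")
```

Because the same routine returns the least-squares solution when every weight stays at 1, least squares appears as a special case of the reweighting procedure. On this synthetic data, least squares should be pulled to a slope of roughly 0.62, while the reweighted estimate should return close to the true slope of 0.5, with the point at x = 4 carrying only a small weight.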



Author information

Correspondence to Deren Li.


Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Li, D., Wang, S., Li, D. (2015). Spatial Data Cleaning. In: Spatial Data Mining. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48538-5_4


  • DOI: https://doi.org/10.1007/978-3-662-48538-5_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48536-1

  • Online ISBN: 978-3-662-48538-5

  • eBook Packages: Computer Science, Computer Science (R0)
