Abstract
Spatial data always contain problems, which makes spatial data cleaning the most important preparation for data mining. If spatial data are used without cleaning, the subsequent discovery may be unreliable, producing inaccurate knowledge and erroneous decisions. This chapter discusses the problems that occur in spatial datasets and the data cleaning techniques that remediate them. Spatial observation errors mainly comprise stochastic errors, systematic errors, and gross errors; in spatial datasets drawn from multiple sources with heterogeneous characteristics, they appear as incompleteness, inaccuracy, repetitiveness, inconsistency, and deformation. Classical stochastic error models are further categorized as indirect adjustment, condition adjustment, indirect adjustment with conditions, condition adjustment with parameters, and condition adjustment with conditions. In a generalized error model, all of these models treat the parameters as non-random variables. Using iteration with selected weights, and assuming two multi-dimensional alternative hypotheses of the Gauss-Markov model, Li Deren established the distinguishability and reliability theory of adjustment. As a result, Li extended Baarda's theory to multiple dimensions and unified robust estimation with least squares.
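The idea of reaching a robust estimate by re-selecting weights across least-squares iterations can be illustrated with a minimal sketch. It is not the weight function or model from the chapter: it assumes a straight-line model y = a + b·x as a two-parameter indirect adjustment, uses the Huber weight function as a stand-in, and scales residuals by the median absolute residual. The names `fit_line`, `huber_weight`, and `robust_fit_line` are hypothetical.

```python
def fit_line(xs, ys, weights):
    """Weighted least squares for y = a + b*x via the 2x2 normal equations."""
    sw = sum(weights)
    swx = sum(w * x for w, x in zip(weights, xs))
    swy = sum(w * y for w, y in zip(weights, ys))
    swxx = sum(w * x * x for w, x in zip(weights, xs))
    swxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    det = sw * swxx - swx * swx
    a = (swxx * swy - swx * swxy) / det
    b = (sw * swxy - swx * swy) / det
    return a, b


def huber_weight(residual, scale, k=1.5):
    """Huber weight (an assumed stand-in): 1 within k*scale, shrinking beyond it."""
    r = abs(residual)
    limit = k * scale
    return 1.0 if r <= limit else limit / r


def robust_fit_line(xs, ys, iterations=20):
    """Iteratively reweighted least squares; the first pass is plain least squares."""
    weights = [1.0] * len(xs)
    for _ in range(iterations):
        a, b = fit_line(xs, ys, weights)
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        # Robust scale from the (upper) median absolute residual,
        # floored so the weight function never divides by zero.
        abs_sorted = sorted(abs(r) for r in residuals)
        mad = abs_sorted[len(abs_sorted) // 2]
        scale = max(1.4826 * mad, 1e-12)
        # Re-select the weights: observations with large residuals are down-weighted.
        weights = [huber_weight(r, scale) for r in residuals]
    return a, b, weights


# Synthetic observations on y = 1 + 2x with one gross error at index 5.
xs = [float(i) for i in range(10)]
ys = [1.0 + 2.0 * x for x in xs]
ys[5] = 100.0

a_ols, b_ols = fit_line(xs, ys, [1.0] * len(xs))
a_rob, b_rob, final_weights = robust_fit_line(xs, ys)
print("least squares:", a_ols, b_ols)
print("reweighted:   ", a_rob, b_rob)
print("weight of the gross-error observation:", final_weights[5])
```

With all weights fixed at 1 the loop reduces to ordinary least squares, so the two estimators differ only in how the weights are chosen between iterations. In this sketch the weight of the contaminated observation also serves as a simple flag for a suspected gross error.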
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Li, D., Wang, S., Li, D. (2015). Spatial Data Cleaning. In: Spatial Data Mining. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48538-5_4
DOI: https://doi.org/10.1007/978-3-662-48538-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48536-1
Online ISBN: 978-3-662-48538-5
eBook Packages: Computer Science (R0)