Skip to main content
Log in

A constrainedk-means clustering algorithm for classifying spatial units

  • Statistical Applications
  • Published:
Statistical Methods and Applications Aims and scope Submit manuscript

Abstract

In some classification problems it may be important to impose constraints on the set of allowable solutions. In particular, in regional taxonomy, urban and regional studies often try to segment a set of territorial data in homogenous groups with respect to a set of socio-economic variables taking into account, at the same time, contiguous neighbourhoods. The objects in a class are thus required not only to be similar to one another but also to be part of a spatially contiguous set. The rationale behind this is that if a spatially varying phenomenon influences the objects, as could occur in the case of geographical units, and this spatial information were ignored in constructing the classes then it would be less likely to be detected. In this paper a constrained version of thek-means clustering method (MacQueen, 1967; Ball and Hall, 1967) and a new algorithm for devising such a procedure are proposed; the latter is based on the efficient algorithm proposed by Hartigan and Wong (1979). This algorithm has proved its usefulness in zoning two large regions in Italy (Calabria and Puglia).

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Anania G, Cersosimo D, Costanzo GD (2001) Le Calabrie contemporanee. Un'analisi delle caratteristiche degli ambiti economico produttivi sub-regionali. In: Scelte pubbliche, strategie private e sviluppo economico in Calabria. Conoscere per Decidere, Rubbettino, Soveria Mannelli, 333–380

  • Ball GH, Hall DJ (1967) A clustering technique for summarizing multivariate data. Behavioural, Science12, 153–155

    Google Scholar 

  • Batagelj V (1984) Agglomerative methods in clustering with constrains. Preprint Series Dept. Math. Univ. Ljublijana22 (102), 5–19

    Google Scholar 

  • Caliñski T, Harabasz J (1974) A dendrite method for cluster analysis. Communications in Statistics3, 1–27

    Article  MATH  Google Scholar 

  • Christofides N (1975) Graph Theory. Academic Press, London.

    MATH  Google Scholar 

  • Cressie NAC (1993) Statistics for spatial data. Wiley, New York

    MATH  Google Scholar 

  • De Soete G, DeSarbo WS, Furnas GW, Carrol JD (1984) The estimation of ultrametric and path trees from rectangular proximity data. Psychometrika49, 289–310

    Article  Google Scholar 

  • De Soete G, Carrol JD (1994)K-means clustering in a low-dimensional Euclidean space. In: Diday E et al. (eds.) New approaches in classification and data analysis, pp. 212–219. Springer, Berlin Heidelberg New York

    Google Scholar 

  • DeSarbo WS, Mahajan V (1984) Constrained classification: the use of a priori information in cluster analysis. Psychometrika49, 187–215

    Article  MATH  Google Scholar 

  • Ferligoj A, Batagelj V (1982) Clustering with relational constraint. Psychometrika47, 413–426

    Article  MATH  MathSciNet  Google Scholar 

  • Ferligoj A, Batagelj V (1983) Some types of clustering with relational constraint. Psychometrika48, 541–522.

    Article  MATH  MathSciNet  Google Scholar 

  • Ferligoj A, Batagelj V (1992) Direct multicriteria clustering algorithms. Journal of Classification9 (1), 43–61

    MATH  MathSciNet  Google Scholar 

  • Ferligoj A, Batagelj, V (1998) Constrained clustering problems. In: Proceedings of IFCS '98, Rome, 541–522

  • Ferligoj A, Batagelj V (2000). Clustering relational data. In: Gaul W, Opitz O., Schader M (eds.) Data analysis, Springer, Berlin heidelberg New York, 3–15

    Google Scholar 

  • Gordon AD (1973) Classifications in the presence of constraints. Biometrics29, 821–827

    Article  Google Scholar 

  • Gordon AD (1980) Methods of constrained classification. In: Tomassone R (ed.) Analyse de données et informatique. (INRIA, Le Chesnay), 149–160.

    Google Scholar 

  • Gordon AD (1999) Classification. Chapmann & Hall, London

    MATH  Google Scholar 

  • Gordon AD (1987) Parsimonious trees. Journal of Classification4, 85–101

    Article  MATH  Google Scholar 

  • Gordon AD (1996) A survey of constrained classification. Computational Statistics & Data Analysis21, 17–29

    Article  MATH  MathSciNet  Google Scholar 

  • Gordon AD (1996) (a). How many clusters? An Investigation of five procedures for detecting nested cluster structure. In: Hayashi C et al. (eds.) Data science, classification, and related methods. Berlin Heidelberg New York, Springer, 109–116

    Google Scholar 

  • Gordon AD, Vichi M (2001) Fuzzy partition models for fitting a set of partitions.Psychometrika 66, 229–248

    Article  MathSciNet  Google Scholar 

  • Harary F (1969) Graph theory Addison-Wesley, Reading, MA

    Google Scholar 

  • Hartigan JA (1975) Clustering algorithms Wiley, New York

    MATH  Google Scholar 

  • Hartigan JA, Wong MA (1979) Algorithm AS 136: Ak-means clustering algorithm. Applied Statistics28 (1), 100–108

    Article  MATH  Google Scholar 

  • Hubert LJ (1974) Some applications of graph theory to clustering. Psychometrika39 (3), 283–308

    Article  MATH  MathSciNet  Google Scholar 

  • Lebart L (1978) Programme d'agrégation avec contraintes. Le Cahiers de l'Analyse des Données3, 275–287

    Google Scholar 

  • Lechevallier Y (1980) Classification sous contraintes. In: Diday E et al. (eds.) Optimisation en classification automatique INRIA, Paris, 677–696

    Google Scholar 

  • Lefkovitch LP (1980) Conditional clustering. Biometrics36, 43–58

    Article  MATH  Google Scholar 

  • Legendre P (1987) Constrained clustering. In: Legendre P et al. (eds.) Developments in numerical ecology. Springer, Berlin Heidelberg New York

    Google Scholar 

  • MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: LeCam LM et al. (eds.) Proceedings of the Fifth Berkeley Symposium on Mathematic, Statistics and Probability, vol. 1, Statistics, University of California Press, Berkeley, CA, 281–298

    Google Scholar 

  • Maravalle M, Simeone B, Naldini, R (1997). Clustering on trees. Computational Statistics & Data Analysis24, 217–234

    Article  MATH  MathSciNet  Google Scholar 

  • Milligan GW, Cooper MC (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika50, 159–179

    Article  Google Scholar 

  • Mills G (1967) The determination of local government boundaries. Operational Research Quarterly18, 243–255

    Article  Google Scholar 

  • Monestiez P (1977) Méthode de classification automatique sous contraintes spatiales. Statistique et Analyse des Données3, 75–84

    Google Scholar 

  • Murtagh F (1985) A survey of algorithms for contiguity-constrained clustering and related problems. Computer Journal28, 82–88

    Article  Google Scholar 

  • Openshaw S (1977) A geographical solution to scale and aggregation problems in region-building, partitioning and spatial modelling. Transaction of the Institute of British Geographers52, 247–258

    Google Scholar 

  • Seber GAF (1984) Multivariate observations, Wiley, New York

    MATH  Google Scholar 

  • Späth H (1980) Cluster analysis algorithms. Ellis Horwood, Chichester

    MATH  Google Scholar 

  • Taylor PJ (1973) Some implications of the spatial organizations of elections. Transaction of the Institute of British Geographers60, 121–136

    Article  Google Scholar 

  • Upton G, Fingleton B (1985) Spatial data analysis by example, vol. 1, Wiley, New York

    Google Scholar 

  • Vicari D (1990) Indici per la scelta del numero dei gruppi. Metron49, 473–492

    Google Scholar 

  • Webster R (1977) Quantitative and numerical methods in soil classification and survey. Clarendon Press, Oxford New York

    Google Scholar 

  • Wilson RJ (1996) Introduction to graph theory. Addison Wesley Longman, England

    MATH  Google Scholar 

  • Zani S (1993) Classificazione di unità territoriali e spaziali. In: Zani S (ed.) Metodi statistici per le analisi territoriali. Franco Angeli, Milano, 93–121

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

Damiana Costanzo, G. A constrainedk-means clustering algorithm for classifying spatial units. Statistical Methods & Applications 10, 237–256 (2001). https://doi.org/10.1007/BF02511650

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF02511650

Key words

Navigation