A comparison of two methods for classifying trajectories: a case study on neighbourhood poverty at the intra-metropolitan level in Montreal

In recent years several studies have examined changes in the distribution of poverty in the North American cities, with most empirical work assessing neighbourhood change between two time points. This paper aims to make a methodological contribution to the study of neighbourhood change, by comparing two classification methods, one classical (k-means clustering) the other more novel (Latent Class Growth Modelling; LCGM) to identify groups of census tracts having followed similar trajectories of poverty in the Montreal metropolitan area, Canada. Here trajectories of poverty are measured over a twenty year period, using five time points. The relative performance of the LCGM vs. the k-means clustering was assessed using a series of multinomial logistic regressions examining how different socioeconomic variables were associated with the trajectories of poverty. Results showed that k-means and LCGM identified similar groups of census tracts characterised by ascending, descending, or stable poverty levels throughout the period, with LGCM only marginally outperforming k-means clustering.


Introduction
In the past decades, rapid urban expansion in conjunction with a growing income gap (Heisz et al., 2006) and structural forces such as economic restructuring, demographic shifts, as well as changes in government policies have impacted on the socioeconomic divisions in North American cities (Kitchen et al., 2009;Walks, 2001), as well in European cities (Van Kempen et al., 2009).Hence the spatial dimension of urban and neighbourhood change in cities of developed countries is receiving increasing attention (Kitchen et al., 2009).To date, most of the empirical work has examined neighbourhood change between two time points (see for example the recent works of Mikelbank, 2006;Reibel et al., 2011;Vicino, 2008).Except the recent work of Mikelbank (2011) on the Cleveland-Akron metropolitan area, and the one of Apparicio et al. (2012) on Montreal, few studies have identified trajectories of change in socioeconomic conditions, allowing the direction and magnitude of trends to vary between more than two time points, and how this change plays out at the intra-metropolitan level.Yet processes of gentrification (Berry, 1986;Bunting et al., 1988;Clark, 1987;Ley, 1986) and the impoverishment of inner suburbs documented in more recent work (Cooke et al., 2006;Jargowsky, 2003;Lee et al., 2007;McConville et al., 2003;Short et al., 2007) are indicative of important transformations in the urban social geography, and are important to grasp in their dimensions.
In a recent paper (Apparicio et al., 2012), we use a new clustering technique for longitudinal data -latent class growth model (LCGM) -to identify trajectories of neighbourhood poverty in Montreal over five consecutive census years (1986 to 2006).Then, we conduct a multinomial logistic regression analysis for modelling the trajectories obtained by the LCGM.
Although based on the same spatial dataset of that previous study, our objective in this paper is quite different.While the classification methods such as k-means clustering and hierarchical cluster analysis are largely used in quantitative geography, the LCGM approach is still littleknown by many geographers.Consequently, the purpose of this study is to compare LCGM versus the k-means clustering to assess which method performs better in identifying trajectories of neighbourhood change.The potential benefits of the LCGM in quantitative research in urban geography are discussed as well.Indeed, this approach for classifying spatial longitudinal data can be applied to analyze the neighbourhood changes such as poverty, immigration, ethnic or social diversity, unemployment, etc.
For illustrative purposes, we analyze the geography of low-income population as measured at five time points over a 20-year period in Montreal, i.e. the trajectories of relative poverty at the neighbourhood level.In practical terms, the paper aims to identify groups of neighbourhoods characterized by a similar pattern of change in their poverty levels.The present study therefore attempts primarily to make a methodological contribution to the study of neighbourhood change.

Background Studying neighbourhood poverty change
Several studies concerned with changes in high-poverty neighbourhood, at least in the USA, have focussed on the growth of these neighbourhoods in suburban areas, and especially in innerring suburbs.Some studies aim to identify the evolution in the distribution of poverty zones in metropolitan areas, notably by opposing the central city to inner-ring suburbs (Cooke et al., 2006;Lee, 2011;Lee et al., 2007).Other research, such as the study by McConville and Ong (2003), is interested by the trajectories of poor neighbourhood, i.e. whether they stayed poor, worsen or improved over time, in relation to change in other neighbourhood conditions, e.g.ethnic, immigration, education, employment and family profiles.
In Canada, few studies have examined transformations of poverty neighbourhood.In 2000, Ley and Smith (2000) noted the changing nature of some census tracts in Toronto, Montreal and Vancouver over a twenty year period, observing that deprived neighbourhoods in 1971 were not necessarily deprived by 1991 and inversely, none-deprived neighbourhood in 1971 could have become deprived by 1991.Their observations were based on several cumulative indicators associated with deprivation (measured using thresholds) measured first in 1971 and then in 1991.
Using census data, Heisz and McLeod (2006) showed that both the proportion of low-income neighbourhoods and their spatial distribution across the different Canadian metropolitan areas varied between 1981 and 2001.For example, they observed that, compared to 1981, low-income neighbourhoods in Montreal and Toronto were less likely to be located in inner city neighbourhoods by 2001 and more in inner-ring suburban areas.Although this study by Heisz and McLeod identified broad trends for each metropolitan area, the authors did not identify neighbourhood trajectories per se.
A study by Kitchen and Williams (2009) set in Saskatoon, a Canadian metropolitan area of moderate size, analyzed neighbourhood change between 1991 and 2001, considering two 'subperiods', 1991-1996 and 1996-2001.Their analyses considered the socioeconomic profile of 58 neighbourhoods at the beginning of the period (1991), classifying them as low, middle or high socioeconomic status (SES) neighbourhoods.Neighbourhoods were then characterised as following three possible trajectories of change in SES between 1991and 2001, i.e. decline, improvement or stability. Kitchen and Williams study (2009) is interesting as it considered socioeconomic conditions at the beginning of the study period and their change over a 10 year period.Yet ten years of observation might be too short to identify important changes in urban processes such as filtering down (out-migration of households to newer and more elaborate dwellings and in-migration of households of lesser wealth and lower social status), gentrification, and the suburbanization of poverty.
Our study proposes to analyse trajectories of neighbourhood poverty on a longer 20-year period.As Kitchen and Williams (2009), we consider poverty levels at the beginning of the period, i.e. 1986, but also neighbourhood poverty levels in 1991, 1996, 2001 and 2006.Thus, over time, a neighbourhood could be characterised by an ascending trajectory, then a descending trajectory, followed by a stable level of poverty, to a declining slope at the end of the study period.At each time point poverty is measured as a continuous variable.This will allow us to identify trajectories with more precision and to identify an optimal number of trajectories.Statistically, this will be achieved by applying and comparing two clustering techniques to group neighbourhoods having followed similar trajectories of change in poverty levels.Hence the objective of this paper is essentially a methodological one, i.e. to identify a clustering method most suited to measure neighbourhood change.We will see later that these approaches allow maximising variation between trajectories and minimising variation within trajectories.This step is crucial if the aim is to identify the determinants of neighbourhood change and to measure their relative importance.This paper builds on previous work by defining trajectories of poverty using five time points, allowing for the magnitude and direction change in poverty to vary at each time point.
This precision, however, comes with methodological challenges, including the construction of a longitudinal database at the intra-metropolitan scale (i.e. at the census tract level) with comparable and harmonised socioeconomic data and geographical boundaries across several census years.Another methodological challenge is to find the most accurate approach to group neighbourhoods characterized by a similar evolution of their poor population over time, with each group (i.e.trajectory) being most different amongst themselves.In the next section, we discuss two possible techniques.
Identifying trajectories of neighbourhood change at the intrametropolitan level: the possibilities provided by k-means clustering and latent class growth modelling Modelling social change has mainly been tackled from time-series and econometric perspectives, namely to study economic cycles and changes in labour markets at broad geographic levels.It is also possible to envision trajectories of change in terms of 'clusters' of areas having followed a similar pattern of change along a variable (or variables) of interest over time.To date unsupervised clustering techniques such as hierarchical cluster analysis (HCA) and k-means clustering, or extensions of k-means (eg.fuzzy k-means (Friedman et al., 1998), partitioning around the medoid (Kaufman et al., 2005); see Jain (2007) for a description and a comparison of several extensions of the k-means clustering), mainstreamed in urban geography and social sciences, have mainly been applied to cluster spatial 'objects' with similar characteristics at one point in time (see for example Mikelbank, 2004;Vicino et al., 2011) and less often at two points in times (Reibel et al., 2007(Reibel et al., , 2011;;Vicino, 2008).
However, it is possible to apply these 'classical' clustering techniques to a longitudinal dataset to identify neighbourhood trajectories.For example, Mikelbank (2011) applied an HCA to group census tracts along several demographic, housing and socioeconomic variables extracted from four censuses (1970,1980,1990,2000).These variables were standardized (zscores) at each time point, and were then appended in one table, on which an HCA was computed.This allowed identifying five types of neighbourhoods over the period: struggling, struggling African American, stability, new starts, and suburbia.Thus, a census tract could belong to the same neighborhood type during the four census years, or it could change type one or more times.Finally, Mikelbank (2011) built several transition tables to identify stability or change in neighbourhood trajectories across two years or all the time periods.
Whereas it is possible to apply classical clustering techniques to longitudinal spatial datasets, new semi-parametric analytical procedures have recently been developed to groups 'objects' having followed similar trends over time, namely Latent Class Growth Models (LCGM) (Nagin, 2005).To date, LCGM has mainly been applied in psychology (Nagin, 2005) and epidemiology, for example, to group individuals with similar trajectories of change in health-related behaviours (Barnett et al., 2008;Brookmeyer et al., 2009).To our knowledge, it has been applied to spatial data to examine trajectories of change within a metropolitan area or a country in only three studies (Apparicio et al., 2012;Pearson et al., 2013;Riva et al., 2012).
K-means and LCGM are statistical clustering techniques that can be applied to classify objects (i.e.census tracts) into k number of clusters (i.e.trajectories) with similar change in a variable (i.e.poverty) over time.K-means is an exploratory statistical clustering technique that uses an allocation/re-allocation algorithm to optimally reassign census tracts to the nearest cluster centroid (Everitt et al., 2001).The goal is to maximize between clusters variations and to minimize within cluster variations, and thus to group into k types of local areas having followed similar trajectories of change in poverty over the study period.Some statistical software, e.g.SAS, optimizes the choice of initial cluster centres; thus, the random selection of cluster centres, potentially leading to different solution when the model is re-run, is no longer being a 'problem'.Compared to HCA algorithms, the number of k must be chosen a priori in k-means clustering and there are several methods to identify the optimal cluster solution (Milligan et al., 1985) including, amongst others: the Pseudo-F statistic (Calińskia et al., 2007); the Cubic Clustering Criterion (Sarle, 1983); or a more discursive method (Tibshirani et al., 2001) consisting of plotting the average distance to cluster centroid for each cluster solution and visually identifying the optimal cluster solution where there is a natural levelling off in the distribution (the 'elbow criterion').
LCGM is a semi-parametric approach to classification (Andruff et al., 2009;Collins et al., 2009;Duncan et al., 2009).Although each census tract will follow a unique trajectory of changing poverty levels, the heterogeneity in the distribution of census tracts is summarized by a finite set of polynomial functions, each corresponding to a discrete class or trajectory (Andruff et al., 2009;Nagin, 2005).Because the magnitude and direction of change can freely vary between trajectories, a set of model parameters, i.e. intercept and slopes, is estimated for each trajectory (Andruff et al., 2009;Nagin, 2005).For each trajectory, the slope and intercept are treated as fixed (equal) between census tracts.In LCGM, the optimal number of classes is informed by a built-up modelling approach whereby the modelling starts with a one-class model, and classes are subsequently added to evaluate improvement in model fit.The model providing the best fit to the data is identified by interpreting and comparing several diagnostic tools, including the model with the lowest Bayesian Information Criterion (BIC) and posterior probabilities of group membership (a maximum-probability assignment rule is used to assign each individual to the trajectory to which it holds the highest posterior membership probability) (Andruff et al., 2009).LCGM is now relatively easy to apply in software such as SAS (ProcTRAJ) (Jones et al., 2007), Mplus (Muthén & Muthén), and LatentGOLD (Statistical Innovations).The main differences between k-means and LCGM are summarised in Table 1 (adapted from LatentGold website: http://www.statisticalinnovations.com/articles/kmeans2a.htm).
The underlying principles of k-means and LCGM are thus different: one is a descriptive/ exploratory technique whereas the other adopts a semi-parametric approach to classification.It is also worth noting that K-Means requires continuous or dichotomous variables while LCGM can be applied to any type of distribution (continuous, ordinal, nominal, count and binomial).In this study, the variable used for the classification -i.e. the location quotient-is continuous.Yet no studies have compared how the methods fare in generating groups of spatial units having followed similar trajectories of change.

Table 1. Differences between k-means clustering and latent class growth modelling K-means LCGM Approaches to classification
Use of allocation/re-allocation algorithm (and ad-hoc distance measure) to optimally reassign objects to nearest cluster centre.
Probability-based method of classification also producing information on misclassification of object into clusters.

Identification of number of optimal clusters
Lowest average distance to cluster centre; cubic clustering criterion; pseudo-F statistic.
Various modelled-based diagnostics such as the BIC statistic and the posterior probabilities of group membership.

Types of variables and standardization
Interval scale and dichotomous variables for which Euclidean distance measures can be calculated.Standardization of variables is recommended.
Continuous, categorical (nominal or ordinal), counts variables or any combination of these.Standardization of variables is not necessary.

Covariates can be used to maximise the classification
Objectives of the study Aiming to better characterize trajectories of neighbourhood poverty change, the purpose of this study is to apply and compare two clustering techniques to 20 years of census data (five time points) to identify groups of neighbourhoods having followed similar trajectory of poverty between 1986 and 2006.We apply both k-means and LCGM techniques to assess which method performs better in identifying trajectories of poverty.The selection of the most accurate classification represents a crucial step before developing explanatory models of socioeconomic changes operating in metropolitan areas.
Several studies have demonstrated that the clustering accuracy of the k-means is superior to that of HCA, especially when computed on large data sets (see for example Abbas, 2008).In addition, results of HCA vary according to the distance metric (Euclidian distance, squared Euclidean distance, Mahalanobis distance, etc.) and the cluster method (single, complete, median, centroid linkages, Ward's method, etc.).To prevent comparing results of LCGM models to several variants of the HCA, we opted for k-means clustering.

Study area and data
This study is set in Canada, in the Montreal Census Metropolitan Area (CMA) comprising a population of about 3.64 million inhabitants spread over a territory of 4,259 km 2 in 2006 (Statistics Canada, 2007).Intra-metropolitan areas are defined using census tract boundaries.Administrative and census geographical boundaries in the Montreal CMA have changed considerably between 1986 and 2006: the number of census tracts increased from 698 to 825 over this period.Harmonisation of the geographic boundaries of census tracts was therefore necessary.Starting with the 1986 census tracts geographic boundaries (the earliest time point), this was achieved by aggregating contiguous census tracts in order to obtain comparable boundaries across all census years.A total of 611 census tracts was obtained.Relative poverty was measured every five years between 1986 and 2006 using data from the Canadian Census using the 'low-income cut-offs' variable calculated by Statistics Canada.This variable corresponds to the income level at which a household spend 20% or more of its income (before tax) on the basics (i.e.food, shelter and clothing) than does the average household of similar size (Statistics Canada, 2011).
This measure is the only one in the Canadian census that allows identifying low-income persons or families at a small geographical scale, e.g.census tracts (Apparicio et al., 2007;Séguin et al., 2012).Because comparing 'crude' poverty levels rates between census tracts and across time might be influenced by the changing economy (i.e.periods of recession or economic prosperity), poverty was modelled as a 'location quotient' so that, at each time point, the poverty rate of each census tract was divided by the rate observed for CMA as a whole; we are thus using a measure of relative poverty concentration.The proportion of low-income population in the CMA at each time point is showed in Table 2.The location quotient gives an overview of how, at any time, local poverty levels compare to the CMA average.This measure of concentration is largely used in urban and regional studies (Mikelbank, 2006;Shearmur et al., 2008;Shearmur et al., 2009;Vicino et al., 2011;Walks et al., 2008), and is computed as follow: Where: xi= low income population in private households the census tract i; ti= total persons in private households in the census tract i; X= low income population in private households in the CMA; T= total persons in private households in the CMA.
A location quotient greater than 1 indicates a concentration of poverty (i.e. a percentage of low-income population higher than that of the CMA) whereas a value below 1 indicates an underrepresentation of poverty (i.e. a percentage of low-income population lower than that of the CMA).

LCGM and k-means clustering methods to identify trajectories of relative poverty
Generation of trajectories of neighbourhood poverty was first conducted using LCGM, as this technique provides diagnostic statistics regarding the optimal cluster solution.We set as an initial criterion that each cluster/trajectory needed a minimum of 5% of the 611 census tracts, so a minimum of 30 census tracts per trajectory.This was set to ensure a minimum of observations per trajectory in later validation stage (e.g. and in line with minimum requirement of observation for regression analysis).As we had no a priori for the optimal number of classes, LCGM was conducted for 5 to 20 classes; a minimum of 5 classes was set in order to have a minimum of differentiation between groups of census tracts.The optimal cluster solution is identified by: 1) a minimum of 5% of census tracts per trajectories; and 2) the lowest BIC value.Analyses were conducted in LatentGOLD software (Statistical Innovations).
K-means clustering was conducted in SAS 9.2 (SAS Institute Inc), specifying again 5 to 20 clusters.The value of the average distance to cluster centroid for each cluster solution was plotted to identify a 'natural break' in the distribution, indicating the optimal cluster solution.In the end, the choice of the optimal number of cluster solutions was informed by the LCGM solution providing the best fit to the data.

Statistical analyses
To assess the relative performance of the LCGM vs. the k-means clustering in identifying trajectories of relative poverty, two sets of analyses were conducted.First, in a multinomial logistic regression (MLR), the variables used for the classification (i.e. the location quotients from 1986 to 2006) were modelled as predictors of the trajectories (trajectories are modelled as a categorical dependent variable).This approach is some form of discriminant analysis, used to test the performance of different methods of classification (see for example Magidson et al., 2002) or different numbers of cluster solutions.The idea here is to use the R-Square and model fit statistics from this analysis to inform which of the k-means and LCGM cluster solutions better summarizes the variation in poverty concentration.
A second series of MLR was then conducted to empirically examine how a set of predictors theoretically associated with poverty explain each trajectory: unemployment rate, single parent families (%), one-person households (%), the elderly (≥ 65 years) (%), recent immigrants (%), low education population (%), university education (%) and renters (%) (see Table 2 for a description of the values taken by the predictors between 1986 and 2006 for the study region).These variables were retained because recent studies have demonstrated that they are strongly associated with the spatial distribution of poverty across the Montreal CMA at the census tract level (Apparicio et al., 2007;Séguin et al., 2012).In separate MLR models, these predictors were modelled at the start of the period, i.e. 1986, at end, i.e. in 2006, and as variation between 1986 and 2006 (e.g. unemployment rate 2006 -unemployment rate 1986).A final model including baseline predictors and variation between 1986 and 2006 was run.For each model, the focus is on the strength (R-Square) and the fit (Akaike Information Criterion [AIC] and Bayesian Information Criterion [BIC]; lowest value of the AIC and BIC are indicative of better model fit) of the model.

Results
Location quotients of the low-income population for the five census years at the census tract level (CT) are mapped in Figure 2. As reported in previous studies (Apparicio et al., 2007;Séguin et al., 2012), CTs displaying a concentration of poverty are mainly located in the central part of the Island of Montreal, corresponding to inner-city neighbourhoods, whereas CTs characterized by an under-representation of poverty are observed in Laval and the North and South shores, corresponding to suburban areas.Over the 1986-2006 period, the presence of poverty in many central CTs (inner-city neighbourhoods) became less pronounced, while poverty gained grounds outside the central CTs on the island of Montreal in areas urbanized during the 1950's and 1960's.Some areas of concentrated poverty located in the northern periphery of the CMA disappeared over the period under study (these are old village centres in municipalities that have witnessed the arrival of new, wealthier populations).Optimal cluster solution for both the LCGM and k-means clustering are presented in Figure 3.According to the LCGM analysis, the 611 census tracts were optimally classified in eight trajectories, identified by the 'cluster solution' insuring a minimum of 5% census tracts per trajectory and with the lowest BIC values.The eight-cluster solution also appeared to fit the data well in the k-means analysis.However, this cluster solution lead to one group of areas comprising less than 5% of census tracts (n=26, 4.3%).Nonetheless, as we elected to use the LCGM model with the best fit to inform the selection of the optimal number of clusters, analyses are conducted on the eight-cluster solution for both the LCGM and k-means.
In order to facilitate their analysis, trajectories obtained from the k-means and LCGM were plotted using the 'gravity centres' of classes i.e. the mean values of the location quotients for the different classes (Figure 3); the graph easily characterizes each trajectory.In addition, the cartography of census tracts according to their poverty trajectory is shown in the Figure 4.As shown in Figure 4, both methods of classification have allowed to identify groups of census tracts having followed trajectories of increasing or decreasing levels of poverty.This indicates changes in concentration of low-income population in some neighbourhoods in the Montreal CMA.However, for a majority of census tracts, relative poverty levels did not change considerably over the period, as indicated by groups of census tracts with relatively "stable" trajectories throughout the period.Table 3 illustrates how the 611 census tracts were cross-classified in different trajectories, as per the k-means and LCGM cluster solutions; a rapid inspection of the Table 3 and of the Figure 5 shows that both methods produced relatively similar cluster solutions.The upward trend in poverty concentration in trajectory B is steeper in the LCGM output than in the k-means.Whereas there appears to be two trajectories of decreasing poverty concentration in k-means (D1 and D2), these appeared to be 'summed' under one trajectory in LCGM (trajectory D).In addition, LCGM further differentiate between groups of low poverty concentration.Although the LCGM and k-means cluster solutions do not exactly replicate each other, there are some similarities between the results of the two approaches of classification.For example, all observations in LCGM trajectory D are distributed across k-means trajectories D1 (82.2%) and D2 (17.8%).Likewise, all observations in k-means trajectory G are distributed among trajectories G (70.4%) and H (29.6%) of LCGM.When these results are mapped (Figure 4), discrepancies in classification of census tracts in the k-means and LCGM trajectories appear minimal.However, a clear distinction is visible in lower levels of concentration of poverty as illustrated by darker shades of blue in some of the suburbs located in the North and South shores of Montreal Island (i.e regions of Laurentides, Lanaudière and Montégérie).In both k-means and LCGM, trajectories A and B capture the traditional poverty of inner city areas located in the central city of Montreal (districts of Ville-Marie, Mercier-Hochelaga-Maisonneuve, Villeray-Saint-Michel-Parc-Extension, Côte-des-Neiges-Notre-Dame-de-Grâce, Le Sud-Ouest); these areas are characterized by trajectories of high concentration of low-income population.It is worth noting that trajectory B is increasing during the period, indicating that in this group of census tracts, concentration of poverty has increased between 1986 and 2006.Census tracts in trajectories C and E (k-means and LCGM) are characterized by increasing poverty concentration over the period; they are mainly located in inner suburbs (municipalities of Saint-Laurent, Saint-Léonard.Anjou, Montréal-Est, Montréal-Nord, LaSalle), characterized by null concentration of poverty in 1986, but increasing over the period.This supports other studies reporting a suburbanization of poverty in some North American metropolitan areas (Cooke et al., 2006;Lee, 2011;Madden, 2003).Trajectories D1 and D2 produced by k-means and trajectory D produced by LCGM correspond to gentrifying census tracts located in inner city areas, mostly in the districts of Plateau-Mont-Royal and Rosemont-La Petite-Patrie (Bourne, 1993;Ley, 1988;Rose, 1984;Walks et al., 2008).These groups of census tracts are characterized by decreasing concentration of poverty over the past 20 years.Finally, trajectory F and G in k-means and F, G and H in LCGM  Results of multinomial logistic regressions further inform on which of the k-means and LCGM performs better in identifying trajectories of neighbourhood change.To do so, we only present fit statistics from MLR models (R-Square, max-rescaled R-Square, AIC and BIC); results are reported in the Table 4. Model A examines how the location quotients measured at the five time points (and used for the classification) predict 'belonging' to a trajectory.Results show that LCGM very marginally outperformed k-means as shown by lowest AIC and BIC values.This indicates that LCGM might be more accurate in identifying groups of census tracts having followed similar trajectories of change over the period.
When other predictors of poverty were used to predict trajectories (models B to E), LCGM again seems to perform better than k-means.Yet the difference between models is again very modest.Indeed, as shown in Table 4, the R-Square of multinomial regression are very strong for both the k-means and LCGM models, given that the predictors of the models are known to be associated with poverty (excessive multicolinearity was not a problem in our models, as indicated by variance inflation factor values lower than 10 not reported here).R-Square values are essentially the same in all models (for example, R-Squares for model A are 0.915 for kmeans and 0.925 for LCGM).However, except for the model D, all values of AIC and BIC are lower for the LCGM clustering.

Discussion and conclusion
The aim of this paper was to compare two clustering techniques, k-means clustering and latent class growth models, in order to better characterize trajectories of neighbourhood poverty over a period of twenty years.Thus, the main contribution of this paper to the study of neighbourhood change is methodological, contributing novel applications of statistical clustering techniques to longitudinal data at the area level.Findings showed that both k-means and LCGM techniques were successful in grouping census tracts characterized by similar trajectories of change in poverty levels between 1986 and 2006.As observed in other studies, different trajectories of neighbourhood change in the Montreal CMA were captured, including trajectories of high concentration of low-income population in inner city areas; suburban areas characterized by increasing concentration of poverty over the twenty-year period.
These results supports other studies reporting a suburbanization of poverty in some North American metropolitan areas (Cooke et al., 2006;Madden, 2003); the gentrification of certain central areas (Heisz et al., 2006;Ley, 1986Ley, , 1988;;Walks et al., 2008) along the increase of poverty concentration in other parts of the central city during the last twenty years (Bourne, 1993;Madden, 2003).However, most census tracts are characterized by somewhat stable poverty levels over the period, either in terms of stable but high levels of poverty concentration, or stable but low representation of poverty.Yet, most studies of neighbourhood trajectories are mainly concerned with change, i.e. either increase or decrease in poverty levels.Nevertheless, our study shows that some areas have been persistently experiencing high levels of poverty between 1986 and 2006, whereas in others, poverty levels are continuously lower compared to the CMA as a whole.
Overall, the results of the two cluster solutions were relatively similar.However, we found some discrepancies between the maps of trajectories of relative deprivation obtained by k-means and LCGM methods.For the census tracts located in the suburbs, the LCGM solution identified three stable trajectories with low poverty levels against two for the k-means.In the central areas, the LCGM solution grouped into one trajectory 45 census tracts characterized by a decline of poverty concentration as a result of a gentrification process.Nonetheless, these 45 census tracts are divided into two trajectories of decreasing poverty concentration in k-means solution.In other words, the LCGM cluster solution provided more details in the suburbs while the k-means provided more details in central areas.
Finally, results of the multinomial logistic regressions showed that LCGM marginally outperformed k-means clustering.Since nowadays the LCGM method is more complex to implement than k-means clustering, the k-means still appears to be a robust and relevant approach to estimate trajectories of neighbourhood change where the clustering variable is measured continuously (e.g.location quotients).It should be recalled, however, that the k-means clustering could only be applied on continuous or binary variables whereas LCGM allows the identification of trajectories using variables with a range of different distributions (continuous, ordinal, nominal, count and binomial).Moreover, with the LCGM, it is also possible to maximize trajectories as a function of external variables (such as the predictors examined here for their association with the trajectories).Consequently, the LCGM approach is promising given its broader application.For instance, LCGM could be used to identify trajectories of neighbourhood change based on a count variable or a qualitative variable.Future studies are needed to test applications of LCGM to model neighbourhood change.

Figure 3 .
Figure 3. Cluster solutions identified by LCGM and k-means.

Figure 4 .
Figure 4. Trajectories of relative deprivation concentration obtained by k-means and LCGM methods.

Figure 5 .
Figure 5. Cartography of trajectories of relative deprivation concentration obtained by k-means and LCGM methods.
group census tracts characterized by lower poverty levels; they are essentially located in suburban areas of the CMA (Laval, North and South Shores and the West of the Island of Montreal) and in traditionally wealthier municipalities in the central part of the Island of Montreal (e.g.Westmont, Outremont and Town Mont-Royal).

Table 2 . Description of indicators of poverty for the Montreal CMA between 1986 and 2006 a
For 1986For , 1991For   and 1996censuses: Population 15 years and over with less than grade 13 without secondary school certificate; For 2001 census: Population 20 years and over with less than grade 13 without secondary school certificate; For 2006 census, population 15 years and over without diploma.c Population 15 years and over, except for the 2001 census where this information is available for population aged 20 years and over. b