Segregation Analyzer: a C#.Net application for calculating residential segregation indices

Segregation indices are today well known and increasingly used in urban studies. However, in the absence of specialized computer tools, calculating them soon becomes a long and complicated process. The few free applications designed to calculate indices are implemented in geographic information systems (ArcInfo, ArcView and MapInfo). Users wishing to calculate indices with these applications must own the GIS software in which the application runs and must know how to use geographic information systems. These two conditions can limit the use of, and broad access to, residential segregation indices. To remedy this situation, we propose a free, standalone application developed in C#.Net, called Segregation Analyzer, that allows some forty segregation indices (one-group, intergroup, and multigroup) to be calculated quickly and easily, regardless of the data or city being studied. This application can be downloaded free of charge from the Spatial Analysis and Regional Economics Laboratory (SAREL) website of INRS Urbanisation, Culture et Société.


The development and use of segregation indices
Segregation indices were developed in the United States, where the concentration of ethnic and racial groups has long been a concern, as evidenced by various Chicago School writings dating from the 1920s (Park, Burgess & McKenzie, 1925). Since the 1940s, a new generation of quantitative researchers has proposed residential segregation measures (Jahn et al., 1947, cited in Rhein, 1994). These include the classic Delta and dissimilarity indices of Duncan and Duncan (1955a and b) and the exposure indices of Bell (1954). Briefly, these indices can be used to determine whether a group is unevenly distributed across a set of spatial units (e.g., the census tracts of a metropolitan area), or whether two groups have similar spatial distributions. More recently, in the 1980s and 1990s, other U.S. researchers developed spatial segregation measures to refine or complement existing residential segregation measures. These researchers include Jakubs (1981), Morgan (1983), White (1986), Morrill (1991), and Wong (1993). Meanwhile, another group of researchers proposed multigroup segregation indices to compare the spatial distribution of several groups at once (Theil, 1972; Sakoda, 1981; James, 1986; Carlson, 1992; Reardon, 1998; Wong, 1998; Reardon & Firebaugh, 2002). In the United States, the development and use of segregation indices are partially linked to African American desegregation policies. In this historical and political context, clear, easy-to-interpret measures of how segregated the African American population is are crucial, whether in residence, schooling, or the labor market (Rhein, 1994: 137-140). Of course, the use of segregation indices is not limited to this objective alone. In urban studies in particular, they are used to describe and compare how population groups that differ in ethnic origin or economic status are distributed across the metropolitan area. This distribution is usually measured by place of residence or, more rarely, by place of work.

The five dimensions of residential segregation and Massey and Denton's concept of hypersegregation
In a remarkable literature review on residential segregation indices, Massey and Denton (1988) classify the types and spatial manifestations of segregation into five distinct dimensions, illustrated in Figure 1: evenness, exposure, concentration, clustering, and centralization. For each dimension, three types of indices are generally identified. One-group indices measure a group's distribution compared to that of the entire population. Intergroup indices compare a group's distribution with that of another group. Multigroup indices compare the spatial distribution of several groups at once. Evenness refers to the distribution of one or more population groups across the spatial units of the metropolitan area (e.g., census tracts). Evenness indices measure a group's over- or under-representation in the spatial units of a metropolitan area: the more unevenly a population group is distributed across these spatial units, the more segregated it is. In Figure 1, situation (a) shows a group that is evenly distributed across the spatial units, so its one-group evenness indices equal zero. For example, a group that represents 5% of the population of a metropolitan area is evenly distributed if it also represents 5% of the population of each spatial unit of this area. In contrast, situation (b) shows a segregated distribution: members of group X reside in only four spatial units of the metropolitan area, where they represent 25% of the total population of each unit. Although the spatial structures differ completely, the same "segregative" status is found in situations (c) to (j). Members of group X again reside in only four spatial units, where they represent 25% of the total population. In other words, the evenness index values are identical in situations (b) to (j).
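To make the evenness argument concrete, the Python sketch below computes Duncan and Duncan's one-group segregation index, IS = 0.5 Σ |x_i/X − (t_i − x_i)/(T − X)|. Here x_i is the group's population in unit i, t_i is the unit's total population, and X and T are the metropolitan totals. The 16-unit layout and all counts are hypothetical analogues of situations (a) and (b), not the data behind Figure 1. The function name is illustrative, not part of Segregation Analyzer.

```python
from typing import Sequence


def segregation_index(group: Sequence[float], total: Sequence[float]) -> float:
    """One-group segregation index IS (Duncan and Duncan).

    IS = 0.5 * sum_i | x_i / X - (t_i - x_i) / (T - X) |
    """
    X = sum(group)
    T = sum(total)
    rest = T - X  # population outside the group
    return 0.5 * sum(abs(x / X - (t - x) / rest) for x, t in zip(group, total))


# Hypothetical metropolitan area: 16 spatial units of 100 residents each.
totals = [100] * 16

# Like situation (a): the group is 5% of every unit.
even = [5] * 16
# Like situation (b): the group lives in four units only, at 25% of each.
clustered = [25, 25, 25, 25] + [0] * 12
# Same four units at 25%, but scattered across the area.
scattered = [25 if i in (0, 5, 10, 15) else 0 for i in range(16)]

print(round(segregation_index(even, totals), 3))       # 0.0
print(round(segregation_index(clustered, totals), 3))  # 0.8
print(round(segregation_index(scattered, totals), 3))  # 0.8
```

Moving the four group units elsewhere leaves IS unchanged. This is because the formula compares shares unit by unit and ignores where the units lie relative to one another.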
This shows that evenness measures alone (such as the segregation index or the dissimilarity index), although certainly valid, cannot capture the full complexity of how population groups are distributed across a metropolitan area. Exposure is the degree of potential contact inside spatial units, either between members of the same group (one-group) or between members of two groups (intergroup) (Massey & Denton, 1989). It measures the probability that members of one group will encounter members of their own group (isolation) or of another group (interaction) in their spatial unit. Two groups may have a similar distribution across the city, in which case the intergroup evenness measure (the dissimilarity index) equals zero. This does not necessarily indicate a high degree of interaction between their members. Situation (d) in Figure 1 illustrates an extreme case in which members of group X are completely isolated. They share no spatial unit with members of other groups, as they represent 100% of the population of each of the four spatial units where they reside. In situation (c), however, they share spatial units with members of other groups, as they represent 25% of the total population of each of the four spatial units where they reside.
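The exposure indices and the dissimilarity index can be sketched in the same way. The Python code below assumes the standard P* forms of the isolation index, xPx* = Σ (x_i/X)(x_i/t_i), and the interaction index, xPy* = Σ (x_i/X)(y_i/t_i). It also uses the intergroup dissimilarity index, ID = 0.5 Σ |x_i/X − y_i/Y|. All counts are hypothetical. The first two cases mimic situations (c) and (d). The last case shows two small groups with identical distributions (ID = 0) whose interaction is nonetheless low.

```python
from typing import Sequence


def isolation(x: Sequence[float], t: Sequence[float]) -> float:
    """xPx* = sum_i (x_i / X) * (x_i / t_i): exposure of X to its own group."""
    X = sum(x)
    return sum((xi / X) * (xi / ti) for xi, ti in zip(x, t) if ti > 0)


def interaction(x: Sequence[float], y: Sequence[float], t: Sequence[float]) -> float:
    """xPy* = sum_i (x_i / X) * (y_i / t_i): exposure of X to group Y."""
    X = sum(x)
    return sum((xi / X) * (yi / ti) for xi, yi, ti in zip(x, y, t) if ti > 0)


def dissimilarity(x: Sequence[float], y: Sequence[float]) -> float:
    """ID = 0.5 * sum_i |x_i / X - y_i / Y|: intergroup evenness."""
    X, Y = sum(x), sum(y)
    return 0.5 * sum(abs(xi / X - yi / Y) for xi, yi in zip(x, y))


totals = [100] * 16  # hypothetical: 16 units of 100 residents

# Like situation (c): group X is 25% of each of four units.
x_c = [25] * 4 + [0] * 12
# Like situation (d): group X is 100% of each of four units.
x_d = [100] * 4 + [0] * 12

print(isolation(x_c, totals))  # 0.25
print(isolation(x_d, totals))  # 1.0

# Two small groups, each 2% of every unit: identical distributions,
# yet a member of X shares a unit with few members of Y.
x_small = [2] * 16
y_small = [2] * 16
print(dissimilarity(x_small, y_small))                  # 0.0
print(round(interaction(x_small, y_small, totals), 3))  # 0.02
```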

Figure 1. The five dimensions of residential segregation
Concentration refers to the physical space occupied by a group: the smaller the share of the metropolitan area a group occupies, the more concentrated it is. According to Massey and Denton (1988), segregated minorities generally occupy a small portion of metropolitan areas. Situations (e) and (f) are identical in terms of evenness, yet concentration is minimal in (e) and maximal in (f). Other indices measure clustering. The more contiguous spatial units a group occupies, thereby forming an enclave within the city, the more clustered, and therefore segregated, it is according to this dimension. In Figure 1, situations (g) and (h) are identical in terms of evenness, but clustering is minimal in (g) and maximal in (h). Finally, centralization indices measure the degree to which a group is located in or near the center of the metropolitan area, usually defined as the central business district (CBD). The closer a group is to the city center, the more centralized, and thus segregated, it is according to this dimension. In Figure 1, the group is totally centralized in situation (j), unlike in (i).
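Concentration and centralization indices bring land area and distance into the calculation. As an illustration, the Python sketch below implements Duncan's Delta index, DEL = 0.5 Σ |x_i/X − a_i/A|, where a_i is the area of unit i and A the total area. It also implements Massey and Denton's absolute centralization index, ACE = Σ X_{i−1}A_i − Σ X_i A_{i−1}, with both sums running over i = 2..n. For ACE, units are ordered by increasing distance from the CBD, and X_i and A_i are the cumulative proportions of the group and of land area. The four-unit area, its unit areas and its distances are hypothetical.

```python
from typing import Sequence


def delta(x: Sequence[float], area: Sequence[float]) -> float:
    """Delta index: DEL = 0.5 * sum_i |x_i / X - a_i / A|."""
    X, A = sum(x), sum(area)
    return 0.5 * sum(abs(xi / X - ai / A) for xi, ai in zip(x, area))


def absolute_centralization(x: Sequence[float], area: Sequence[float],
                            dist_to_cbd: Sequence[float]) -> float:
    """ACE = sum_{i>=2} X_{i-1} A_i - sum_{i>=2} X_i A_{i-1}.

    Units are sorted by increasing distance to the CBD; X_i and A_i are the
    cumulative proportions of the group and of land area. Positive values
    mean the group lives closer to the CBD than land area is distributed.
    """
    order = sorted(range(len(x)), key=lambda i: dist_to_cbd[i])
    X_tot, A_tot = sum(x), sum(area)
    cum_x, cum_a = [], []
    run_x = run_a = 0.0
    for i in order:
        run_x += x[i] / X_tot
        run_a += area[i] / A_tot
        cum_x.append(run_x)
        cum_a.append(run_a)
    return sum(cum_x[k - 1] * cum_a[k] - cum_x[k] * cum_a[k - 1]
               for k in range(1, len(order)))


areas = [1, 1, 4, 4]         # small central units, large peripheral ones
dist = [1, 2, 3, 4]          # distance of each unit to the CBD
central = [50, 50, 0, 0]     # group lives in the two small central units
peripheral = [0, 0, 50, 50]  # group lives in the two large outer units

print(round(delta(central, areas), 3), round(delta(peripheral, areas), 3))  # 0.8 0.2
print(round(absolute_centralization(central, areas, dist), 3),
      round(absolute_centralization(peripheral, areas, dist), 3))           # 0.8 -0.2
```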
In the U.S. literature, the implicit reason for using these five dimensions is not hard to guess: to determine whether the African American minority in a given city is concentrated in a ghetto. Indeed, an African American ghetto can be characterized along these five dimensions:
1) a place where most of the African American community in the metropolitan area resides (uneven distribution);
2) a homogeneous area largely inhabited by African Americans (high isolation);
3) a small part of the metropolitan area where population density is among the highest (high concentration);
4) an enclave formed of contiguous census tracts (high clustering);
5) an area generally located at the city center (high centralization).
The work of Massey and Denton reflects this. In a study on the distribution of Blacks and Hispanics in 60 U.S. metropolitan areas, these sociologists concluded that African Americans in Baltimore, Chicago, Detroit, Milwaukee, and Philadelphia are hypersegregated, i.e., highly segregated according to all five dimensions (Massey & Denton, 1989). Grouping segregation indices into five dimensions is therefore a relevant analytical approach to exploring social, ethnic, age and economic residential segregation in metropolitan areas. These indices can also be used in regional studies. There are some forty to fifty residential segregation indices, some of which involve relatively complex calculations, and few computer applications are designed to calculate them (Reardon & O'Sullivan, 2004). To address this situation, we developed a C#.Net application called Segregation Analyzer that computes some forty segregation indices.

RESIDENTIAL SEGREGATION INDICES INCLUDED IN SEGREGATION ANALYZER
The Segregation Analyzer application includes a total of 42 segregation indices: 19 one-group indices, 13 intergroup indices, 8 multigroup indices, and two local indices (Table 1). Formulas for all of these indices are reported in Appendix 1. We will not go into the properties and meaning of each index, as these are discussed extensively by Massey and Denton (1988), Massey et al. (1996), Apparicio (2000), Hutchens (2001), Reardon and Firebaugh (2002), Wong (2003), and Reardon and O'Sullivan (2004). It is worth noting, however, that most segregation indices vary from 0 to 1 (i.e., from no segregation to maximum segregation). One-group, intergroup and multigroup segregation indices are global measures. Each yields a single value for the entire study area, describing the "segregative status" of the population group(s) being studied. These indices therefore cannot answer spatial questions such as "Where is a given population group located in the metropolitan area?" or "Is a given area dominated by one or more ethnic or social groups?" To answer such questions, we use measures referred to here as local segregation indices. These include the location quotient (LQ) and the entropy or diversity index (H2) (see Table 7 in Appendix 1). The LQ identifies the spatial units of the study area where a population group is over-represented (LQ > 1) or, conversely, under-represented (LQ < 1). The entropy or diversity index identifies spatial units that are completely homogeneous (inhabited by only one population group, H2 = 0) or maximally diversified (all population groups equal in size, H2 = 1). These two local indices, which are usually mapped, are also included in Segregation Analyzer.
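The two local indices are simple enough to compute for every spatial unit. The Python sketch below assumes the usual definitions: LQ_i = (x_i/t_i)/(X/T), and an entropy normalized by ln K so that it ranges from 0 to 1 for K groups. Table 7 in Appendix 1 gives the formulas actually used by Segregation Analyzer, which may differ in detail. The counts are hypothetical.

```python
import math
from typing import List, Sequence


def location_quotient(x: Sequence[float], t: Sequence[float]) -> List[float]:
    """LQ_i = (x_i / t_i) / (X / T) for every spatial unit i.

    LQ > 1: the group is over-represented in unit i; LQ < 1: under-represented.
    """
    share = sum(x) / sum(t)
    return [(xi / ti) / share for xi, ti in zip(x, t)]


def entropy_index(counts: Sequence[float]) -> float:
    """Normalized entropy of one spatial unit over K >= 2 groups, in [0, 1].

    H2 = -sum_k p_k ln(p_k) / ln(K), where p_k is group k's share of the unit
    and 0 * ln(0) is taken as 0.
    """
    total = sum(counts)
    shares = [c / total for c in counts]
    h = sum(-p * math.log(p) for p in shares if p > 0)
    return h / math.log(len(counts))


group = [10, 40, 50]     # hypothetical group counts in three units
totals = [100, 100, 100]
print([round(v, 2) for v in location_quotient(group, totals)])  # [0.3, 1.2, 1.5]

print(entropy_index([100, 0, 0]))             # 0.0 (one group only)
print(round(entropy_index([30, 30, 30]), 6))  # 1.0 (three equal groups)
```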

DESCRIPTION OF SEGREGATION ANALYZER
The Segregation Analyzer application does not include data; users must build their own dataset in order to use it. However, initiatives by U.S. and Canadian census organizations (the U.S. Census Bureau and Statistics Canada) to broaden access to geographic data have been highly effective. Today, anyone can find a geographic file of the spatial units of a Canadian or U.S. metropolitan area (e.g., census tracts), as well as the related ethnic or socioeconomic data. Consequently, it is relatively easy to create a geographic database in shapefile format (ESRI) to analyze the residential segregation of various population groups in a given urban area. Without specialized computer tools, however, calculating the indices can be a long and complicated process. This led us to develop Segregation Analyzer, a C#.Net application that works with geographic shapefiles. It can be downloaded free of charge from the Spatial Analysis and Regional Economics Laboratory website of INRS Urbanisation, Culture et Société (SAREL, http://laser.ucs.inrs.ca/EN/Download.html).

Segregation Analyzer: a C#.Net application
The Segregation Analyzer application was developed in C#, a language that works with the Microsoft .Net platform and makes it possible to develop Windows applications rapidly. Segregation Analyzer is a standalone application that does not require GIS or statistical software. Consequently, users who wish to calculate indices need no knowledge of GIS or statistical software, which greatly facilitates the use of our application. The process of computing residential segregation indices is illustrated in Figure 2. Calculating nonspatial segregation indices, which make up the majority of all indices, involves three steps:
1) creating a data table by reading the dBase file through an ODBC or OLEDB driver (this table contains the populations of the selected groups for each spatial unit of the metropolitan area);
2) applying the various index formulas;
3) exporting the results to an output file.
Spatial segregation indices include those proposed by Morgan (1983), White (1988), Morrill (1991), and Wong (1993 and). For these indices, the process is more complicated because three matrices (contiguity, distance and common border length) and three vectors (center of gravity, area, and perimeter of polygons) must be calculated. To do so, we also developed a DLL in C#.Net that reads the geometric information of the spatial units from the shapefile and then constructs the various matrices and vectors. Figure 3 shows the application interface, which is available in English, French, and Spanish. The application also comes with a help file in PDF format that includes a user guide, the formulas of the indices, and references. Users can select one or more segregation indices to calculate, as well as one or more population groups. Segregation indices are grouped into five categories: one-group, intergroup, multigroup, location quotient, and entropy index. Users can export results to dBase, TXT, tab-delimited and CSV formats.
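The three steps for nonspatial indices can be sketched as follows. This Python version is only an analogue of the C# implementation. It reads a CSV string instead of a dBase file accessed through ODBC/OLEDB, and the field names CT_ID, POP_TOT, GRP_A and GRP_B are hypothetical. It computes the one-group segregation index IS for every selected group in one pass, mirroring the ability to process several groups at once, and exports the results as CSV.

```python
import csv
import io
from typing import Dict, List, Sequence, Tuple


def read_table(text: str, total_field: str,
               groups: Sequence[str]) -> Tuple[List[float], Dict[str, List[float]]]:
    """Step 1: build a table of total and group populations per spatial unit."""
    rows = list(csv.DictReader(io.StringIO(text)))
    totals = [float(r[total_field]) for r in rows]
    counts = {g: [float(r[g]) for r in rows] for g in groups}
    return totals, counts


def segregation_index(x: Sequence[float], t: Sequence[float]) -> float:
    """One-group segregation index IS (Duncan and Duncan)."""
    X, T = sum(x), sum(t)
    return 0.5 * sum(abs(xi / X - (ti - xi) / (T - X)) for xi, ti in zip(x, t))


def run(text: str, groups: Sequence[str]) -> str:
    totals, counts = read_table(text, "POP_TOT", groups)
    # Step 2: apply the index formula to every selected group.
    results = {g: segregation_index(counts[g], totals) for g in groups}
    # Step 3: export the results, one row per group.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["group", "IS"])
    for g in groups:
        writer.writerow([g, f"{results[g]:.3f}"])
    return out.getvalue()


data = """CT_ID,POP_TOT,GRP_A,GRP_B
001,100,5,50
002,100,5,0
003,100,5,50
004,100,5,0
"""
print(run(data, ["GRP_A", "GRP_B"]), end="")
# group,IS
# GRP_A,0.000
# GRP_B,0.667
```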

Calculation of geometric parameters for spatial segregation indices
A number of geometric parameters are required to calculate spatial segregation indices (see Table 2). Morrill's indices (1991) and Wong's multigroup dissimilarity index (1998), namely IS(adj), D(adj), and SD, require a binary contiguity matrix between the spatial units of the metropolitan area. Similarly, Wong's IS(w) and ID(w) indices (1993) require a matrix of common border lengths. Wong's IS(s) and ID(s) indices also require two vectors, one of polygon areas and the other of polygon perimeters. In contrast, the concentration indices (Delta, absolute concentration, and relative concentration: DEL, ACO, RCO) require only one vector, the area of the spatial units. Clustering and centralization indices require a distance matrix or a binary contiguity matrix (ACL and RCL) and a vector of polygon centers of gravity. Finally, the spatial segregation index S proposed by Wong (1999) is based on centrographic analysis. In concrete terms, S is one minus the ratio between the area of the intersection and the area of the union of the deviational ellipses of the n population groups:

$$S = 1 - \frac{\operatorname{area}\left(\bigcap_{i=1}^{n} E_i\right)}{\operatorname{area}\left(\bigcup_{i=1}^{n} E_i\right)} \qquad (1)$$

where n is the number of population groups and E_i is the deviational ellipse of group i. S varies from 0 (no segregation) to 1 (maximum segregation). Take the case of two population groups. If they have identical spatial distributions, their ellipses are also identical, so E_1 ∩ E_2 = E_1 ∪ E_2 and S equals 0. Conversely, if the two groups have completely different spatial distributions, their ellipses do not intersect, so E_1 ∩ E_2 = ∅ and S equals 1.

The advantage of our application is its independence from any GIS. We do not use a GIS programming language such as Avenue (ArcView), MapBasic (MapInfo) or ArcObjects (ArcGIS). We therefore have to calculate the various geometric parameters ourselves: centers of gravity, perimeters and areas of polygons, and the contiguity, distance and common border length matrices. Because researchers may find it useful to know how these geometric parameters are computed in our application, we describe these calculations in further detail in Appendix 2.
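The geometric parameters listed above can be derived from polygon vertices alone. The Python sketch below is not the procedure of Appendix 2, which is not reproduced here. It uses the standard shoelace formulas for area and center of gravity. It assumes topologically clean data in which neighboring polygons share identical vertices along their common border, so that a shared edge appears in both vertex lists. Binary contiguity is taken here as sharing a border of positive length (rook contiguity), and distances are Euclidean distances between centers of gravity. Polygons with holes or multiple parts are not handled.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def open_ring(ring: List[Point]) -> List[Point]:
    """Drop the repeated closing vertex that shapefile rings carry."""
    return ring[:-1] if len(ring) > 1 and ring[0] == ring[-1] else ring


def area_perimeter_centroid(ring: List[Point]):
    """Area, perimeter and center of gravity of a simple polygon (shoelace)."""
    pts = open_ring(ring)
    n = len(pts)
    a2 = cx = cy = per = 0.0  # a2 accumulates twice the signed area
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        per += math.hypot(x1 - x0, y1 - y0)
    signed_area = a2 / 2
    # Using the signed area keeps the centroid correct for either vertex order.
    return abs(signed_area), per, (cx / (6 * signed_area), cy / (6 * signed_area))


def edge_set(ring: List[Point]):
    """Undirected edges of a ring, so A->B and B->A compare equal."""
    pts = open_ring(ring)
    return {frozenset((pts[i], pts[(i + 1) % len(pts)])) for i in range(len(pts))}


def spatial_matrices(rings: List[List[Point]]):
    """Common border length, binary (rook) contiguity and centroid distances.

    Assumes neighboring polygons share identical vertices along common borders.
    """
    k = len(rings)
    edges = [edge_set(r) for r in rings]
    cents = [area_perimeter_centroid(r)[2] for r in rings]
    border = [[0.0] * k for _ in range(k)]
    contig = [[0] * k for _ in range(k)]
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            length = float(sum(math.dist(*tuple(e)) for e in edges[i] & edges[j]))
            border[i][j] = border[j][i] = length
            contig[i][j] = contig[j][i] = 1 if length > 0 else 0
            dist[i][j] = dist[j][i] = math.dist(cents[i], cents[j])
    return border, contig, dist


# Three unit squares: a and b share an edge; c touches b at one corner only.
a = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
b = [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)]
c = [(2, 1), (3, 1), (3, 2), (2, 2), (2, 1)]
print(area_perimeter_centroid(a))  # (1.0, 4.0, (0.5, 0.5))
border, contig, dist = spatial_matrices([a, b, c])
print(contig)  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

Polygons b and c touch at a single corner, which does not count as contiguity under this rook definition. A queen definition would also require comparing shared vertices.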

Computational time
Segregation Analyzer is fast enough to calculate a wide variety of residential segregation indices in a reasonable time. Table 3 gives an idea of its calculation speed. It lists the computational times obtained for two geographic files of the Montréal census metropolitan area, each with 20 population groups: one contains census tracts (N = 846) and the other census subdivisions (N = 106). These times were obtained on a computer with a Pentium IV 2.8 GHz processor running Microsoft Windows XP Professional. They are quite reasonable, at less than three seconds for each index category. The exception is the 13 intergroup indices, which required approximately 11 seconds because the three matrices and three vectors described above had to be constructed. Indeed, calculating the geometric parameters took most of the computational time, as the differences in computational times between spatial and non-spatial indices show.

Reardon and O'Sullivan (2004) rightly pointed out the lack of specialized computer tools for calculating residential segregation indices. The few tools that currently exist are implemented in GIS. For example, Apparicio (2000) offers a tool that computes twenty-four one-group and intergroup indices in MapInfo. Wong (2003) has an ArcView application that calculates intergroup and multigroup dissimilarity indices, five spatial indices (ID(adj), ID(w), ID(s), SD, and S), and one local segregation measure.
The Segregation Analyzer application we have just described is quite different from these tools in four respects. First, it is standalone, i.e., it does not require any GIS or statistical software. Users therefore do not need to know how to use geographic information systems or any specific statistical package (e.g., SAS, STATA or SPSS). Second, it calculates many more indices, although the list is not exhaustive. It offers 42 in all: 19 one-group indices, 13 intergroup indices, 8 multigroup indices, and two local segregation measures (location quotient and entropy index). Third, it differs from the two applications mentioned above (Wong's in ArcView and Apparicio's in MapInfo) in how it handles groups. It can calculate all one-group indices for several preselected population groups at once, rather than one group at a time, and likewise for intergroup and multigroup indices. Last, its computational times are short. Segregation Analyzer nonetheless has limitations, if not major shortcomings. At present, it can compute local measures such as the location quotient and entropy index and can generate the deviational ellipses of population groups, but it cannot yet map them. In a second phase of development, we therefore plan to add a mapping module to Segregation Analyzer. This module would allow the distribution of population groups to be geovisualized and the location quotient and entropy index to be mapped automatically. Notwithstanding these limitations, we hope that Segregation Analyzer can help broaden access to residential segregation indices by making them easy to calculate, regardless of the data or city being studied. To date, a few empirical studies have already used this application to analyze ethnic and social residential segregation in Montréal and in London (Apparicio & Leloup, in press; Charron & Apparicio; Mateos et al., 2006).