Measuring polycentric urban development: The importance of accurately determining the ‘balance’ between ‘centers’

Abstract In recent years, much research has been devoted to developing appropriate analytical frameworks to capture polycentric urban development (PUD). In a recent contribution to this journal, Bartosiewicz and Marcińczak (2020) present what is arguably the most comprehensive, comparative review to date of the degree to which different analytical frameworks produce consistent results. The purpose of this research note is to show why we believe parts of Bartosiewicz and Marcińczak's (2020) findings need nuance and qualification. Our starting point is that a useful comparison between different studies and measurement frameworks needs to consider the relevance of consistency in several key dimensions, two of which are particularly pertinent here: (1) the careful specification of what constitutes a ‘center’ in a polycentric urban system, and (2) the identification of the ‘balance’ between centers as a measure of the degree of polycentricity. Two brief empirical analyses of the degree of morphological polycentricity in Polish NUTS-3 areas and the Chinese city-regions along the ‘Yangtze Economic Belt’ are included. Finally, suggestions are provided to facilitate future comparative analyses of PUD.

Measuring polycentric urban development: relevance
Polycentric urban development (PUD) has evolved into a key concept in urban studies, both as an empirical assessment framework (Brezzi & Veneri, 2015; Hall & Pain, 2006; Taubenböck, Standfuß, Wurm, Krehl, & Siedentop, 2017) and as a normative territorial development goal (Bailey & Turok, 2001; Rauhut, 2017; Wang, Wang, & Kintrea, 2020). Van Meeteren, Poorthuis, Derudder, and Witlox (2016) point out that PUD, partly fueled by its popularity, has evolved into a stretched concept causing Babel-like confusion in the scientific debate. According to Davoudi (2003), the scientific ambiguity surrounding the meaning of PUD has even been instrumental in policy circles: as every actor involved in an urban planning process can attribute its own interpretation to it, it becomes easier to (seemingly) establish consensus (as exemplified by Granqvist, Sarjamo, & Mäntysalo, 2019). Nonetheless, the uneven, sometimes undue, and often fuzzy use of the PUD concept does not imply that it is unusable: instead, it raises the stakes of (1) being clear about what PUD does and does not entail, (2) translating this into appropriate analytical frameworks, and (3) carefully specifying and examining the alleged virtuous processes, including heightened economic productivity (Sun & Lv, 2020), lower environmental emissions (Burgalassi & Luzzati, 2015), and reduced traffic congestion (Li, Xiong, & Wang, 2019), that are sometimes associated with it.
In recent years, much research has been devoted to the second strand of the PUD literature, which involves developing PUD typologies and measurement frameworks. Although some of the variations in these measurement frameworks can be attributed to the specifics of the research question, they invariably need to identify the 'basics' of PUD: whether there are multiple proximate centers without pronounced hierarchical differentiation between these centers. This generic starting point involves making decisions on a range of specific elements, including data sources (polycentric in terms of what?), the definition of the units of analysis (what is a center?), the identification of the lack of hierarchical differentiation (when can we speak of a balance and between large or small centers?), the underlying processes (what produces this pattern?), as well as the geographical scale (how far does the region or city stretch out?). In the literature, there has been increased attention on how to specify all or some of these elements properly. Examples include Möck and Küpper (2020), who explore the relevance of the territorial delineation of the study area and the number of units considered within it; Zhang and Derudder (2019), who examine the relevance of the identification of what constitutes a 'center'; Münter and Volgmann (2020), who develop a conceptual typology describing the various dimensions of PUD for the North-Western European context; and Burger, van der Knaap, & Wall (2014), who show that the analysis of PUD requires a more explicit consideration of issues of multiplexity and individual-level heterogeneity.
In a recent contribution to this journal, Bartosiewicz and Marcińczak (2020, p. 2) present what is arguably the most comprehensive and comparative review to date of "whether measuring polycentricity by various methods produces consistent results". To this end, they outline and compare the results of the most commonly used measurement frameworks when applied to both a set of ideal-typical urban structures and a number of actual Polish urban regions. They find that differences can be sizable, which has major repercussions in using a proper measurement framework when assessing PUD and/or when using these findings in subsequent analyses of its putative causal powers. Bartosiewicz and Marcińczak's (2020) analysis enriches the literature on a number of fronts. However, at the same time, we believe their analyses risk misportraying the potential relevance of some indicators. Against this backdrop, in this research note, we show why we believe at least this part of Bartosiewicz and Marcińczak's (2020) findings need nuance and qualification. Subsequently, we use this to provide suggestions for future comparative analyses of PUD.

What is a center? What is the balance?
Our starting point is that, as mentioned, a useful comparison between different studies and measurement frameworks needs to consider the relevance of several key dimensions, two of which are pertinent here: (1) a careful specification of what constitutes a 'center' and (2) the identification of the 'balance' between those centers. Both are to a large degree related in that the specification of what constitutes a center has ramifications for the number of centers N when identifying the presence or absence of balance. Depending on the scale of analysis, this usually involves having a cut-off point at which a settlement is deemed 'urban' (in the analysis of polycentric urban regions, see Möck & Küpper, 2020) or a specific area is deemed a 'sub-center' (in the analysis of intra-urban polycentricity, see Lee & Gordon, 2007; Krehl, 2015; Li et al., 2019). The identification of the balance, then, entails the measurement of differences in the distribution of 'importance', and has been implemented based on proportion-based measures (Burger, de Goei, Van der Laan, & Huisman, 2011; Li et al., 2019), rank-size regression (Burger & Meijers, 2012; Meijers & Burger, 2010), standard deviation-based measures (Green, 2007; Liu, Derudder, & Wu, 2016), and inequality measures such as Gini-type coefficients (Li & Phelps, 2017; Tsai, 2005). This is in line with Bartosiewicz and Marcińczak's (2020, p. 6) observation that "the number of urban centers, the differences between these cities in terms of size and their specialisation in the region… should all be taken into account".
However, a key component of choosing an 'appropriate' approach is that, as pointed out, both dimensions (i.e., the number of centers (N) and the balance among the centers) are related. As a result, they can be measured separately or conjointly. When measuring the two dimensions separately, researchers have to select a fixed number of centers N (i.e., isolating the effects of N, as in Burger et al., 2011; Burger & Meijers, 2012) or adjust indicators to account for the effects of N when gauging balance (i.e., through various normalization and standardization approaches as in Gini coefficients; Tsai, 2005; Li & Phelps, 2017). By contrast, measures based on standard deviation, such as Green's (2007) and Liu et al.'s (2016), combine the effects of N and the effects of balance. Such 'composite' indicators may accommodate varying levels of N and incorporate additional information about the magnitude of the urban system under investigation (see also Sun & Lv, 2020). This difference can be understood when considering the following toy example of two urban regions, with one region consisting of four centers (7, 1, 1, 1) and the other of eight centers (7, 1, 1, 1, 1, 1, 1, 1). The two regions will receive the same polycentricity score in rank-size calculations if the top 2, top 3, or top 4 centers are considered. However, standard deviation-based methods would consider the latter to be more polycentric, and primacy-based methods would point to a lower level of primacy in the latter case. Bartosiewicz and Marcińczak (2020, p. 1) voice concern over standard deviation-based indicators producing different results than the rank-size and primacy-based methods.
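The toy example can be checked numerically. The sketch below is illustrative only and rests on several assumptions of ours: the rank-size indicator is taken to be the OLS slope of ln(size) on ln(rank) over the top k centers, primacy is the largest center's share of the regional total, and the standard deviation-based indicator is taken to be 1 − σ/σ_max, with σ_max the standard deviation of a two-center system (0, largest value), which is our reading of Green (2007). All function names are hypothetical.

```python
import math
from statistics import pstdev


def rank_size_slope(sizes, top_k):
    """OLS slope of ln(size) on ln(rank) for the top_k largest centers."""
    top = sorted(sizes, reverse=True)[:top_k]
    xs = [math.log(rank) for rank in range(1, len(top) + 1)]
    ys = [math.log(size) for size in top]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


def primacy(sizes):
    """Share of the largest center in the regional total."""
    return max(sizes) / sum(sizes)


def sd_polycentricity(sizes):
    """Assumed form 1 - sigma / sigma_max, where sigma_max is the population
    standard deviation of a two-center system (0, largest observed value)."""
    sigma = pstdev(sizes)
    sigma_max = pstdev([0, max(sizes)])
    return 1 - sigma / sigma_max


region_a = [7, 1, 1, 1]
region_b = [7, 1, 1, 1, 1, 1, 1, 1]

for k in (2, 3, 4):
    print(f"top {k}: slope A = {rank_size_slope(region_a, k):.3f}, "
          f"slope B = {rank_size_slope(region_b, k):.3f}")
print(f"primacy: A = {primacy(region_a):.2f}, B = {primacy(region_b):.2f}")
print(f"SD-based: A = {sd_polycentricity(region_a):.3f}, "
      f"B = {sd_polycentricity(region_b):.3f}")
```

Under these assumptions the rank-size slopes coincide for every k, because both regions share the same top four centers, whereas the primacy and standard deviation-based values differ because they also respond to the four additional small centers.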
Bartosiewicz and Marcińczak's (2020) analysis deals with both morphological and functional polycentricity, with the former focusing on the balance in the size distribution or absolute importance across centers, and the latter emphasizing the balance in the distribution of functional linkages or relative importance across centers (see also Burger & Meijers, 2012). As for morphological polycentricity, Bartosiewicz and Marcińczak (2020) observe that standard deviation-based indicators tend to categorize individual metropolitan areas in Poland as morphologically polycentric (with a large number of centers within individual metropolitan regions). In our view, the relatively high levels of morphological polycentricity produced by standard deviation-based indicators in the Polish case may be explained along the following lines:
• Conceptually, the differences produced by different indicators may emanate from the fact that the rank-size and primacy-based methods commonly present a different take on an 'urban system' than methods based on standard deviation-type indicators. The former group focuses on the largest few centers, while the latter group includes all centers deserving that designation in light of the research question. By this token, the two sets of measures would only need to produce similar results if the top slice and the rest of the urban system follow similar distributions. However, as both groups of indicators essentially propose different treatments of the number of centers and the measurement of balance, it is understandable that they may not necessarily produce similar results. Put differently, indicators may entail different conceptualizations of 'polycentricity' and, therefore, may have strengths and weaknesses in revealing different aspects of urban systems;
• Mathematically, if, as in the Polish case, a large number of small settlements are considered to qualify as 'centers' in an urban region, the standard deviation-based indicators may reflect the (im)balance among the many small settlements (as opposed to the (im)balance among larger centers). Furthermore, unlike rank-size based methods, standard deviation-based methods assign 'equal weights' to all centers in the calculations, as neither ranking nor logarithm transformation is applied. As mentioned above, standard deviation-based measures are composite measures as they combine the effects of the number of centers and 'balance' across centers. By contrast, the rank-size and primacy-based methods reported in Bartosiewicz and Marcińczak (2020) do not have such an issue as the number of centers used for analysis is fixed at two/three/four and one, respectively, regardless of the actual number of centers;
• Empirically, geographical context matters. For example, in the analysis of functional urban areas in Europe and North America, studies often focus on the two to six largest or most important centers (e.g., Burger & Meijers, 2012). Similarly, in the Chinese case, most city-regions and prefecture-level cities have fewer than ten centers and more hierarchical patterns. However, a visual examination of Fig. 3 in Bartosiewicz and Marcińczak (2020) suggests that there seem to be more than 30 nodes/centers in the metropolitan areas under study. As standard deviation-based indicators account for both the number of centers and their balance, the reported high polycentricity may well reflect the relative balance across a large number of small centers that are less legible in Fig. 3. Bartosiewicz and Marcińczak (2020, p. 10) acknowledge that "the city-regions in Poland reflect a somewhat different socioeconomic and spatial context than the regions in Northwest Europe or China", but it is worth noting that this may, to a degree, explain such results.
Furthermore, Bartosiewicz and Marcińczak (2020) raise concerns about standard deviation-based indicators identifying Polish city-regions as generally having low levels of functional polycentricity. However, further to the arguments we developed for morphological polycentricity, the following points can be made with regard to functional polycentricity.
First, having a large number of less well-connected centers in an analysis of functional polycentricity may affect the 'network density', a key variable in Green's (2007) indicator. In addition to N and balance factors, the density variable adjusts for 'effectively' connected centers within city-regions. Centers may be deemed not to be 'effectively' connected when the ratio between their dyadic connections and the theoretical maximum is close to zero. This adjustment is necessary as "a maximally connected network and an unconnected network would both be completely polycentric since, mathematically speaking, each of the nodes is equally connected to each of the other nodes (…) - so that unconnected collections of nodes are not defined as a functionally polycentric system" (Green, 2007, p. 2084). Put differently, standard deviation-based measures of functional polycentricity such as Green's (2007) add the condition of network density to the effects of N and the effects of balance (see also Sun and Lv (2020) for the use of three-factor composite indicators). Following this logic, even if interactions between centers are mathematically balanced, when such balance is achieved at a low density, the final interpretation may still be a lack of functional connections and thus a lack of functional polycentricity. This difference can be understood when considering a second toy example of two urban regions, with one region consisting of three centers with intra-region connectivity of (4, 3, 2) and the other region of three centers with intra-region connectivity of (400, 300, 200). The two regions would receive the same polycentricity score based on rank-size and primacy calculations. However, standard deviation-based methods would consider the latter to be more polycentric, as the former is comparatively far less connected and may not be associated with functional connections.
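The second toy example can be sketched in the same spirit. The snippet below is an illustration under stated assumptions, not Green's (2007) exact formulation: it treats the composite indicator as the product of a balance term, 1 − σ/σ_max, and a density term, and it simplifies density to total observed connectivity divided by a hypothetical benchmark L_max, here N times the largest nodal connectivity found across both regions. The benchmark choice and all function names are our own.

```python
from statistics import pstdev


def balance(connectivity):
    """Balance term, assumed to be 1 - sigma / sigma_max with sigma_max taken
    from a two-center system (0, largest observed value)."""
    return 1 - pstdev(connectivity) / pstdev([0, max(connectivity)])


def density(connectivity, benchmark_max):
    """Simplified density: observed connectivity relative to a hypothetical
    maximum in which every center reaches benchmark_max."""
    l_max = len(connectivity) * benchmark_max
    return sum(connectivity) / l_max


def composite_polycentricity(connectivity, benchmark_max):
    """Composite score combining balance and density (assumed product form)."""
    return balance(connectivity) * density(connectivity, benchmark_max)


region_a = [4, 3, 2]
region_b = [400, 300, 200]

# Assumption: one benchmark for the whole study area, set to the strongest
# nodal connectivity observed in either region.
benchmark = max(region_a + region_b)

for name, region in (("A", region_a), ("B", region_b)):
    print(f"region {name}: balance = {balance(region):.3f}, "
          f"density = {density(region, benchmark):.4f}, "
          f"composite = {composite_polycentricity(region, benchmark):.4f}")
```

With a shared benchmark the balance term is identical for both regions, since it is scale-invariant, so the difference in the composite score comes entirely from density. A region-specific benchmark would remove that difference, which is why the choice of L_max matters.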
Second, all else being equal, having a large number of less well-connected centers tends to overestimate the theoretical maximum L_max, compress the network density towards zero, and thus lead to the relatively small functional polycentricity measures observed in Bartosiewicz and Marcińczak (2020). This is because L_max is estimated based on the number of centers and the theoretical maximum dyadic value. Having a larger number of less well-connected centers would, therefore, lead to an inflated theoretical maximum (L_max) that is less appropriate to serve as a 'benchmark', and thus produce network density values closer to zero. In this case, alternative ways of estimating L_max may be more appropriate and represent different assumptions regarding the 'upper limits' of the connectivity within individual urban regions or cities, or even more generally, their development level (see footnote 1).
Against this backdrop, we believe a comparison of polycentricity measures needs to incorporate the number/type of centers and the measurement of balance in a similar fashion. For example, in areas with a large number of centers or areas where the top slice of the urban hierarchy has a different rank-size distribution than the rest, to compare rank-size, primacy, and standard deviation-based indicators, one may have to opt to restrict the analysis to a fixed number of significant centers. By the same token, for the case of functional polycentricity, different indicators such as those of Burger and Meijers (2012) and Green (2007) can be compared when density is incorporated (or left out) in a similar fashion. To explore what this means in practice, in Tables 1 and 2, we use data on the population distribution across Local Administrative Units (LAUs) within selected Polish urban regions at the Nomenclature of Territorial Units for Statistics level 3 (NUTS-3) and measure their morphological polycentricity.
Specifically, we calculate for each city:
(a) a rank-size regression as in Burger and Meijers (2012);
(b) a rank-size regression without limitations on N;
(c) a standard deviation-based measure as in Green (2007);
(d) a standard deviation-based measure with N limited to the largest two, three, and four centers;
(e) a standard deviation-based measure with an alternative 'maximum possible standard deviation' (specifically, the denominator in Green's (2007) indicator is calculated based on a hypothetical urban system of the same size as the one under investigation in which one center has the entire population and the remaining centers none); and
(f) urban primacy (Table 1).
The following observations can be made:
• Consistent with Bartosiewicz and Marcińczak (2020), when N is not accounted for, the association between indicators could be relatively weak. For example, (c) tends to point to relatively high levels of morphological polycentricity in cases with relatively high primacy and/or a 'steep slope' such as PL224, PL314, and PL343 (see below our discussions on possible thresholds for identifying (absolute) levels of polycentricity). As mentioned above, for cities with a relatively large number of equally sized (and relatively small) subdivisions/LAUs, (c) may be more reflective of the 'balance' between these subdivisions/LAUs. In Table 1, 45 out of 59 (76%) NUTS-3 units have at least 30 LAUs, and PL122 has 91 LAUs;
• While the correlation between rank-size (a) and primacy (f) is −0.834 (Table 2), which is consistent with Bartosiewicz and Marcińczak's (2020) finding that Burger et al.'s (2011) measure is essentially 1-(f), the correlation between (a) and (b) is only 0.231. In other words, indices from the same family (the slope of a rank-size regression line) can vary vastly due to differences in N;
• When the number of LAUs (N) is accounted for, rank-size, standard deviation, and primacy-based indicators produce more consistent results.
For example, when controlling for the largest two, three, and four LAUs within each NUTS-3, the Pearson correlation between (a) and (d) is 0.954. Similarly, when the theoretical maximum (i.e., the denominator) is adjusted to reflect the size of the urban region under investigation (see also other adjustments of Green's indicator in Chen et al. (2017)), the correlation between (a) and (e) is 0.754. The Pearson correlation between (d) and 1-(f) is 0.758, and the correlation coefficient between (e) and 1-(f) is 0.974;
• These results need to be read in a geographical context. The overall population distribution within Polish NUTS-3 may be rather dispersed. For example, the average primacy (f) is 0.202, meaning that, on average, the single largest LAU accounts for only about 20% of the total population of those NUTS-3 units. This may be associated with processes that are conceptually beyond PUD, such as suburbanization (see, however, Spórna & Krzysztofik, 2020). While Bartosiewicz and Marcińczak (2020, p. 10) acknowledge this when they state that "the process of deconcentration (suburbanisation) in Poland is tightly linked to the degree of polycentric urban development -more substantial deconcentration implies more polycentric urban forms", in our view this may be a conceptual rather than a measurement problem.

Footnote 1: For example, the maximum dyadic value for individual urban regions/city or the average dyadic value in the most well-connected urban region/city may be used instead of the overall maximum dyadic value. The use of the former would assume that dyads are theoretically as strong as the strongest dyads within the same urban region/city under investigation, while the use of the latter would imply the connectivity of the urban region/city under investigation could theoretically reach the 'development level' or connectivity of the most well-connected urban region/city in the entire study area. For a more detailed discussion, see Green (2007).

Table note: the data are based on the pre-2018 delineation of NUTS-3 units.
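The variants (c) to (f) described above can be sketched as follows. This is illustrative only: the population vector is invented, the function names are hypothetical, and we assume that Green's (2007) denominator is the standard deviation of a two-center system (0, largest value), and that the alternative denominator in (e) keeps both the number of centers and the total population fixed.

```python
from statistics import pstdev


def sd_polycentricity(sizes, top_k=None, concentrated_max=False):
    """Standard deviation-based polycentricity, 1 - sigma / sigma_max.

    top_k limits the analysis to the largest top_k centers, as in (d).
    concentrated_max switches the denominator to a system with the same
    number of centers and total size in which one center holds everything,
    as in (e). Expects at least two centers.
    """
    ranked = sorted(sizes, reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    if concentrated_max:
        benchmark = [sum(ranked)] + [0] * (len(ranked) - 1)
    else:
        benchmark = [0, ranked[0]]
    return 1 - pstdev(ranked) / pstdev(benchmark)


def primacy(sizes):
    """Share of the largest unit in the regional total, as in (f)."""
    return max(sizes) / sum(sizes)


# Hypothetical LAU populations for one NUTS-3 unit; not taken from Table 1.
lau_population = [120_000, 30_000, 25_000, 10_000, 8_000, 7_000]

measure_c = sd_polycentricity(lau_population)
measure_d = {k: sd_polycentricity(lau_population, top_k=k) for k in (2, 3, 4)}
measure_e = sd_polycentricity(lau_population, concentrated_max=True)
measure_f = primacy(lau_population)

print(f"(c) = {measure_c:.3f}")
for k, value in measure_d.items():
    print(f"(d), top {k} = {value:.3f}")
print(f"(e) = {measure_e:.3f}")
print(f"(f) = {measure_f:.3f}")
```

Both denominators keep the score between 0 and 1. They differ in what counts as the extreme monocentric case, which is one reason why (c) and (e) can rank the same regions differently.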
The relevance of geographical context also shows from an analysis of some results for Chinese city-regions along the 'Yangtze Economic Belt' (Table 3). Here, we assess the polycentricity among prefecture-level cities in these regions based on their connectivity in the producer service networks, by drawing on the data presented in Zhang and Derudder (2019). In this example, urban systems within Chinese city-regions, in general, have fewer (and vastly larger) centers and are comparatively more hierarchical than in the Polish case. Table 3 suggests that:
• Overall, the performance of standard deviation-based indicators seems to be more consistent with other indicators in the Chinese context. For example, the correlation between polycentricity measured by (a) and (c) is 0.847. The Pearson correlation between (c) and 1-(f) is strong at 0.902;
• This association persists even when N is accounted for: the Pearson correlation between (a) and (d) is 0.965, while the correlation between (a) and (e) is 0.977. Still, the Pearson correlation between (d) and 1-(f) is 0.971, and the correlation coefficient between (e) and 1-(f) is 0.973;
• Furthermore, Wang (2020) has shown that indicators based on standard deviation and rank-size regressions are highly correlated and can serve as a robustness check for each other in analyzing Chinese prefecture-level cities, where most cities would have fewer centers and a relatively more hierarchical urban system. Still, the correlation between standard deviation and rank-size indicators for the 183 prefecture-level cities with at least two population centers, as reported in Liu and Wang (2016), is 0.864.
A final comment concerns setting universal in-between critical values (e.g., 0.25, 0.50, and 0.75, as in Bartosiewicz and Marcińczak (2020)) for the categorization of (absolute) high and low levels of polycentricity. First, having a series of evenly spaced cut-off values may imply theoretically evenly distributed values for indicators ranging between 0 and 1. This may, however, not necessarily be the case. In this regard, the Gini coefficient may be a good example. It ranges between 0 and 1, but does not have 0.5 as the 'mid-point' and often has observed empirical values below 0.5 when measuring income (in)equality. Alternatively, as explained by Teng et al. (2011, p. 135), "according to the United Nation's definition, Gini index <0.2 represents perfect income equality, 0.2-0.3 relative equality, 0.3-0.4 adequate equality, 0.4-0.5 big income gap, and above 0.5 represents a severe income gap". Second, following this logic, the cut-off points for indicators may need to be determined heuristically or empirically from the dataset, such as through the use of averages and medians. Similarly, in Bartosiewicz and Marcińczak (2020), the cut-off points for rank-size based measures are determined empirically from the dataset instead of using any pre-defined critical values within (−∞, 0). For example, if median values are used to categorize indicators reported in Table 1, (a) and (d) indicators would point to the same categories (i.e., higher versus lower levels of polycentricity) in all 59 Polish urban regions and (a) and (e) would arrive at the same categories in 45 out of 59 cases. Lastly, the theoretical implications of absolute cut-offs may need further consideration. As in the example of primacy-based indicators, while 0.50 has an explicit meaning (the largest unit has half of the morphological/functional 'importance' of the urban system), whether and to what extent the urban system is polycentric requires further investigation (e.g., of N and balance). Such interpretation may be even less straightforward for composite indicators (e.g., Green (2007); Sun and Lv (2020)).
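The median-based categorization mentioned above can be sketched as follows. The indicator values are made up for illustration and the helper names are hypothetical. The sketch only shows how two indicators on different scales can be compared once each is split at its own median.

```python
from statistics import median


def median_split(values):
    """Label each value True if it lies above the indicator's own median."""
    cutoff = median(values)
    return [value > cutoff for value in values]


def agreement(values_x, values_y):
    """Count regions that two indicators place in the same category."""
    labels_x = median_split(values_x)
    labels_y = median_split(values_y)
    return sum(lx == ly for lx, ly in zip(labels_x, labels_y))


# Hypothetical values for six regions; not taken from Table 1.
# Rank-size slopes closer to zero indicate a flatter, more polycentric system.
rank_size_slopes = [-1.9, -0.4, -1.2, -0.7, -2.3, -0.9]
sd_based_scores = [0.21, 0.80, 0.45, 0.66, 0.15, 0.38]

matches = agreement(rank_size_slopes, sd_based_scores)
print(f"{matches} of {len(rank_size_slopes)} regions fall in the same category")
```

Because each indicator is split at its own median, the comparison does not require the indicators to share a scale or a theoretical mid-point.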
The examples laid out in Tables 1-3 zoom in on the example of morphological polycentricity, but we are confident that, mutatis mutandis, these hold for functional polycentricity. Overall, then, we find a much better alignment between different polycentricity measures than reported in Bartosiewicz and Marcińczak (2020), which leads us to the following suggestions for future comparative analyses of PUD:
• The conceptualization of 'polycentricity', the design of indicators (in particular, the treatment of N and its impact on (im)balance), and empirical/geographical contexts matter. For example, the use of four centers in Burger and Meijers (2012) and Meijers and Burger (2010) is based on a theoretically sound understanding of urban patterns in North-Western Europe and North America, respectively. Still, the density factor in Green's (2007) functional polycentricity entails assumptions regarding the theoretical maximum levels of connectivity that individual city-regions may achieve and may vary across different types of connections;
• It is useful to provide multiple indicators to see whether they point to similar patterns and/or need adjustments (e.g., Li et al., 2019; Liu & Wang, 2016). It may also be useful to present results with different empirical choices regarding N and the (im)balance across centers, as shown in Zhang and Derudder (2019) and Zhang, Sun, Li, Dan, and Wang (2019). This is in line with Bartosiewicz and Marcińczak's (2020, p. 10) call that the "choice of indicators should be based on a careful comparison of potential indices". Put differently, as implied in various examples above, any single indicator may have its own strengths, weaknesses, and caveats;
• For example, Green's indicators have been applied in many Asian contexts (e.g., Korea as in Kim, Lee, and Kim (2018), Indonesia as in Sadewo, Syabri, Antipova, Pradono, and Hudalah (2020), the Yangtze and Pearl River Deltas in China as in Zhao, Derudder, and Huang (2017)). In this regard, one caveat of Green's (2007) indicator, as implied in Bartosiewicz and Marcińczak (2020), may be that it is more applicable in 'normal' hierarchical contexts (e.g., power-law patterns as in Angel (2012) or, in Green's (2007) terms, when urban regions/cities have a few large centers that 'stand out'). Therefore, for the use of indicators in different contexts/dimensions as well as in more extreme theoretical settings, researchers may wish to pay more attention to the selection of centers under different contexts as well as consider alternative adjustments (e.g., adjustments such as (d) and (e) shown above). This is especially the case when non-composite indicators (e.g., Meijers and Burger (2010)) and composite indicators (e.g., Green (2007); Sun and Lv (2020)) are compared side by side;
• Caution is needed when setting theoretical/absolute in-between critical values of polycentricity indicators for categorizing high and low levels of polycentricity. In practice, those critical values can be determined heuristically/empirically (e.g., the use of averages to determine cut-offs for Burger and Meijers' (2012) indicator in Bartosiewicz and Marcińczak (2020) as well as for Green's (2007) indicator). Heuristic thresholds have been widely used to measure the (im)balance across individuals in socioeconomic inequality studies (see, for example, Cobham et al., 2013); by the same token, such methods can be applied to measure the (im)balance across urban centers;
• It is imperative to thoroughly reflect on what constitutes a center in light of the research question at hand. Center definition should reflect the conceptual focus on PUD rather than related but possibly different processes such as suburbanization.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.