Socio-economic Segregation in a Population-Scale Social Network

We propose a social network-aware approach to study socio-economic segregation. The key question that we address is whether patterns of segregation are more pronounced in social networks than in the common spatial neighborhood-focused manifestations of segregation. We therefore conduct a population-scale social network analysis to study socio-economic segregation at a comprehensive and highly granular social network level. For this, we utilize social network data from Statistics Netherlands on 17.2 million registered residents of the Netherlands who are connected through around 1.3 billion ties distributed over five distinct tie types. We take income assortativity as a measure of socio-economic segregation, compare a social network approach with a spatial neighborhood approach, and find that the social network exhibits twice as much segregation. As such, this work complements the spatial perspective on segregation in both literature and policymaking. While patterns of socio-economic segregation may appear relatively minimal at a widely used unit of spatial aggregation (e.g., the geographical neighborhood), they may in fact persist in the underlying social network structure. Furthermore, we find higher socio-economic segregation in larger cities, shedding a different light on the common view of cities as hubs for diverse socio-economic mixing. A population-scale social network perspective hence offers a way to uncover hitherto "hidden" segregation that extends beyond spatial neighborhoods and infiltrates multiple aspects of human life.


Introduction
Social segregation is one of the most persistent problems standing in the way of upward socio-economic mobility. It is typically viewed as the degree to which two or more groups live separately from one another (Massey et al. 1988), whether voluntarily, as a result of policies, or due to a reinforcing combination of both. As such, segregation seeps into every facet of life. The common approach to studying segregation is to analyze the spatial distribution of groups of people with different characteristics, zooming in on neighborhoods. Indeed, neighborhoods are to a large extent representative of early schooling opportunities, provide access to a particular set of neighborhood relations, and create other opportunities for people to interact. They are also relevant because the probability of establishing social ties decreases with geographical distance (Liben-Nowell et al. 2005; Small et al. 2019; Viry 2012; Bidart et al. 2022). Hence, a neighborhood's social composition to a certain extent defines its residents' social and economic opportunities (Atkinson et al. 2001; Chetty and Hendren 2018; Friedrichs et al. 2003; Mayer et al. 1989).
However, the all-pervasive nature of segregation extends well beyond neighborhoods, as the limited exposure of various population groups to one another is inherent to all facets of life, including social circles, lifestyle practices, education trajectories, and career choices (Bojanowski et al. 2014; Henry et al. 2011; Hofstra et al. 2017). High levels of segregation therefore lead to the emergence of "socio-economic bubbles" in which people almost exclusively interact with similar others (cf. the filter bubble concept; see Pariser 2011). Hence, it is not obvious that administrative neighborhoods are the best level of analysis to measure, study, and combat segregation. First, individuals' social networks are constructed from a wide variety of social contexts that may well extend beyond the spatially defined neighborhood, such as family, school classes, or colleagues. Second, some convincingly argue that neighborhoods in post-industrial societies have lost much of their previous importance for providing social cohesion and as the locus of social activities (Dahlin et al. 2008; Duyvendak 2011; Pinkster 2016). Thus, notwithstanding the important contributions made by the work on residential segregation, the spatial manifestation of social segregation in neighborhoods represents only part of a wider issue (Leo et al. 2016; Dong et al. 2020; Bidart et al. 2022; Morales et al. 2019).
We develop a social network-aware approach to study socio-economic segregation and ask whether patterns of segregation in social networks are more pronounced than the common spatial manifestations of segregation. We intend to uncover potential "hidden segregation" that is not captured by spatial neighborhoods. In addition, we want to study segregation as a multi-layered phenomenon, in the sense that it plays out across different social contexts. To capture this, we conduct a population-scale social network analysis (Bokányi, Heemskerk, et al. 2022) to analyze socio-economic segregation at a comprehensive and highly granular level. Recent advances in data collection have made it possible to investigate large-scale social networks, for instance using digital traces of online platforms (Chetty, Jackson, et al. 2022a; Chetty, Jackson, et al. 2022b) as well as high-quality register data collected by national statistical bureaus (van der Laan 2021). We build on the latter, which offers a promising opportunity to study social cohesion and segregation, and investigate individual-level information on various social ties of an entire country's population: around 17.2 million registered residents of the Netherlands as of October 2018, linked through around 1.3 billion ties. These residents live in around 7.7 million households, which are in turn connected through around 914 million ties. With these data in hand, it is possible to represent the entire population as a large-scale social network in which nodes are individuals or households and edges represent the relationships between them as defined in the state-administered registers.
Population-scale network analysis has considerable benefits over existing approaches to social network inference (Peel et al. 2022), which rely on a wide variety of data collection designs ranging from study-specific surveys and name or position generators to larger-scale online social network data and mobile communication networks. Surveys, for instance, can be highly tailored to specific analytical goals, but are prone to sampling-related issues, non-response, and cognitive bias. Moreover, they are costly to implement for larger sample sizes (Brüggen, Wetzels, et al. 2011; Brüggen, Van den Brakel, et al. 2016; Bosnjak et al. 2005). Online social network data such as Meta/Facebook or Twitter do offer a cost-effective way to obtain large amounts of data on self-reported social ties of individuals. However, these sources still suffer from issues of sample representativeness as well as the absence of information on the different types of relationships between individuals and on individual node-level characteristics (Bokányi, Heemskerk, et al. 2022; Lazer et al. 2021). Similarly, while mobile phone communication records capture particular social ties between individuals and are well suited for measuring the intensity and temporal frequency of interactions at a granular level, they leave no room for distinguishing different social contexts (Onnela et al. 2007; Eagle et al. 2009). As such, it is hard to study the multi-layered structure of social segregation with such data.
In contrast, the official register data used in this work offer a clear and precise definition for each type of social tie, such as a household member, a classmate, or a colleague. In addition, they come with a wide range of high-quality attribute data from those same registers. These register-based population-scale social network data thus allow us to study various patterns of social segregation without a priori spatial assumptions. As opposed to more informal ties such as retweets, phone calls, or friendships, the social ties encoded in official registers are so-called formal ties, i.e. officially recognized or institutionalized ties and affiliations of residents such as their family connections, next-door neighbors, household members, school classmates, or colleagues. It has been shown that formal ties capture the majority of people's strong connections (Wrzus et al. 2013; Buijs et al. 2022; van Eijk 2010). We consider our population-scale social network data to be a comprehensive mapping of the social opportunity structure available to individuals (Bokányi, Heemskerk, et al. 2022).
However, we also acknowledge the limitations of the chosen approach. The administrative registers that serve as input for the construction of the population-scale social network do not offer a way of quantifying or distinguishing the strength of a tie or the intensity of communication between individuals. For instance, we know that two students belong to the same class; nevertheless, we do not know whether they communicate at all, what the nature of their relationship is (unless they share a tie in another social context covered by the data), or whether it is a positive or a negative tie. Furthermore, the chosen approach has limited potential for tailoring the input data to a specific research question. We rely on multiple existing registers collected for various administrative purposes, and while it is not impossible to modify the registers' design, such a task is a long-term policy initiative that would involve multiple stakeholders.
In the next section, we first summarize the literature on social segregation and compare spatial neighborhood approaches with a broader social network approach, followed by an introduction of the Netherlands as our research site. The subsequent section discusses the population-scale social network data as well as the methods used in the analysis. The empirical section presents the results of that analysis and shows that, on average, levels of social segregation are twice as high in the social network approach as in a spatial neighborhood approach. Further decomposition of the social network segregation patterns by layer (social context) as well as by region reveals even higher levels of segregation for particular types of social ties and population subgroups. In the discussion and conclusion, we reflect on the consequences of this previously "hidden" social segregation for policy interventions as well as on the opportunities of population-scale social network analysis as an efficient approach to map and analyze social segregation.

A social network perspective on socio-economic segregation
The post-war economic growth of the second half of the 20th century has benefited many, but has also led to a rather heterogeneous distribution of wealth among different socio-economic groups (Stiglitz 2015; Dabla-Norris et al. 2015). This is at odds with the central prediction of neoclassical growth theory, which envisions conditional convergence of disparities between countries, regions, neighborhoods, and households. Despite state and subnational efforts to tackle these issues, both by implementing equality-enhancing policies and by increasing market forces, socio-economic inequalities remain self-reinforcing "traps" or "low-level equilibria" that are remarkably persistent over time (Robert Moffitt 2018). Many explanations have been suggested for the long-term persistence of social divisions and related inequalities, such as the intergenerational transmission of wealth and ability; capital market imperfections imposing credit constraints on mobility; local spatial segregation patterns; and self-fulfilling beliefs (Piketty 2000). However, the root cause of segregation may lie deeper in the social fabric of our societies (Chang et al. 2020). We know that social networks do not emerge randomly. Far from it: social networks are typically segregated structures in which people connect to similar others with respect to a variety of characteristics. This is in part due to homophily: an individual is more likely to establish a social tie with someone who has similar attributes, such as age, gender, language, nationality, religion (Melamed et al. 2020; McPherson et al. 2001), or socio-economic status (Leo et al. 2016; Morales et al. 2019; Dong et al. 2020; Bokányi, Juhász, et al. 2021). When the well-endowed cluster together, and those in a more disadvantaged position do so as well, these social cleavages make existing inequalities persistent. They may even amplify them (van Ham et al. 2018), because social ties provide a form of capital, serving as an infrastructure for accessing certain resources (Bourdieu 1986). The social capital literature firmly establishes that social relations provide much-needed access to a wide range of resources that are important for reducing inequalities and enabling upward mobility. The resources that may flow through social networks include social and financial opportunities (Uzzi 1999), valuable information relevant for job search (Demchenko 2011; Granovetter 1973; Montgomery 1992; Rajkumar et al. 2022), and educational opportunities (Frank et al. 2018). Using diverse resources embedded in one's social network results in better socio-economic outcomes (Lin 1982), and access to such resources is in turn determined by "the strength of [social] position" (Campbell et al. 1986; Lin 1999).
Following a long tradition of social capital research, recent studies with access to large-scale social network data have empirically demonstrated how social networks, among other factors, explain socio-economic inequality (Jackson 2008; Chetty, Jackson, et al. 2022a; Chetty, Jackson, et al. 2022b). Jackson finds that "immobility results when people end up trapped by the social circumstances into which they are born: the networks in which they are embedded fail to provide them with the information and opportunities that they need to succeed" (Jackson 2019, p. 118). Other work shows how "stronger" personal social networks, characterized by larger quantities of "high-quality" social contacts, are associated with socio-economic benefits (Pinquart et al. 2000; Woolcock 2001; Chetty, Jackson, et al. 2022a; Chetty, Jackson, et al. 2022b). The social fabric people are embedded in is therefore best seen as a social opportunity structure (Bokányi, Heemskerk, et al. 2022). Roberts (Roberts 1977; Roberts 2009) theorized how a complex interplay of external and internal sociological factors (such as family background, school choices, and labor market conditions) limits the choices available to an individual. He showed, for instance, that macro-level labor market imbalances, particularly youth unemployment, do not stem exclusively from individual "wrong choices" but are driven and reinforced by existing opportunity structures (Roberts 2009). In what follows, we build upon this conceptualisation of the social fabric as an opportunity structure. The extent to which this opportunity structure is segregated into different socio-economic foci or bubbles determines how pervasive inequalities are in a society.
The relation between social segregation and inequalities has long been recognised by policy makers, and many efforts have therefore been made to combat segregation. Most attention has been given to the spatial manifestation of social segregation. On the one hand, geographic neighborhoods are indeed directly related to residents' social and economic opportunities, as a fair share of social life plays out inside neighborhoods (Lenahan et al. 2022; Kuyvenhoven et al. 2021). They provide access to a particular set of neighbors and, in general, create opportunities for people to interact in community centers, sports clubs, and other civil society organizations (Friedrichs et al. 2003; Sharp et al. 2018). On the other hand, spatial planning also requires strategic reasoning about desired neighborhood composition, because the neighborhood is the "level of analysis" of urban development. Together, this underpins the dominance of a spatial perspective on segregation in both literature and policy making. For instance, smart urban planning approaches see cities as "champions of diversity" because they provide larger sets of social choices, which is expected to result in lower levels of segregation (Jacobs 1961; Milgram 1970; Glaeser 2011; Bettencourt 2013). The policy focus has also triggered a large body of literature on neighborhood effects and "moving to opportunity" social experiments. In these experiments, families were moved to other, more well-off neighborhoods, and this change in their residential environment has been reported to improve intergenerational socio-economic mobility, long-term health, education, and employment outcomes (Chetty, Hendren, and Katz 2016; Chetty and Hendren 2018; Ludwig et al. 2013; Orr et al. 2006).
There are, however, good reasons to assume that the importance of neighborhoods as key social foci is diminishing. First, increased geographic mobility adds to the spatial dispersion of social connections, discounting the importance of neighborhoods (Liben-Nowell et al. 2005; Viry 2012; Bokányi, Juhász, et al. 2021). Second, the neighborhood plays a diminishing role as a site of social tie (re)production: co-worker ties are already much more frequent than ties to neighbors (Dahlin et al. 2008). Third, such spatial dispersion is exacerbated by the rise of online social networks, which further reduce the importance of spatial residential patterns for social cohesion and segregation (Scellato et al. 2021). In sum, while the spatial focus on neighborhoods in both literature and policy making is understandable, it may misread and underestimate actual patterns of social segregation. We therefore propose a population-scale network analysis to map and analyze the patterns of segregation for an entire country. The unit of analysis is the household, because households integrate their members into joint social and economic units that substantially share their life trajectories, activities, and responsibilities. Households are therefore widely regarded as the locus of socio-economic status as well as the main target of welfare policies (Katz et al. 2001; Kling et al. 2007).
We choose to investigate the social opportunity structure of the relatively egalitarian Netherlands. Despite ranking high in terms of social cohesion and being among the top-tier high-income economies, the Netherlands has been reported to have one of the highest levels of wealth inequality in the world as measured by the Gini coefficient (Credit Suisse Group Research Institute 2019). Wealth distribution in the Netherlands is heavily skewed, as it is largely influenced by housing and pension practices that lead to more than half of the population having zero or negative net wealth (Salverda et al. 2013). Income, on the other hand, is more evenly distributed: ranked by income inequality as expressed by the Gini coefficient, the Netherlands has consistently been reported among the fifteen best-performing economies (Salverda et al. 2013). The combination of relatively high wealth concentration and moderate income inequality makes the Netherlands a prime case study for social network segregation patterns that persist despite strong welfare state efforts.
Statistics Netherlands. Further details about ensuring data security and privacy are described by van der Laan (2021).
Network construction. The schematic representation of the input population-scale person network is presented in Figure 1. Based on this network of people, we first aggregate the network to the household level, such that the result is a multilayer network in which nodes are households and edges are family, neighbor, school, and work ties connecting any of the households' members. The network is unweighted: for each layer, at most a single link is preserved between any pair of households, even if several of their members are connected in that context. For instance, if parents in two households work together and their children are in the same school class, the respective households share two links: one work tie and one school tie. Except for single-person households, the majority of households consist of either nuclear or extended family members living at the same address. Households not covered by kinship or institutional relations are classified as "other households", consisting of individuals who live at the same address and act as a joint economic unit as defined by Statistics Netherlands (Witvliet 2012). For the purposes of this study, we exclude institutional households such as care homes, prisons, and orphanages from the analysis. Table 1 summarizes the properties of the transformed network, providing the number of nodes and edges along with the average, minimum, and maximum degrees for each configuration.
Node attributes. Based on the individual-level node attributes of household members, we assign joint household characteristics with respect to income, educational level, ethnic group, and migration background. Joint household-level attributes are derived from the profile of the "founding" adults: the single adult in the case of single-person or single-parent households, or the "founding" couple in other cases. If a household has a single founding adult, that adult's income, education, ethnic group, and migrant generation are assigned to the household. For households with a founding couple, we combine the individual characteristics of both founding adults into a household-level attribute. In such cases, continuous variables such as income correspond to an equivalised combined household income, i.e. a value that takes into account the size and composition of the household. For nominal data, which include ethnic group and migrant generation, we combine the individual-level categories as provided by Statistics Netherlands. A list of the categories that the ethnic group variable takes is shown in Table 2. For migrant generation, we distinguish between native Dutch, first-generation, and second-generation migrants. If both founding adults belong to the same ethnic group and/or migrant generation, the household-level characteristic takes that same value. Whenever the two founding adults come from different ethnic or migrant generation backgrounds, we combine their individual-level characteristics into a mixed household attribute. For instance, a native Dutch person living with a Surinamese partner would be described as a mixed Dutch-Surinamese household. For ordinal data describing people's highest achieved education level, we follow the same aggregation approach. On the individual level, education follows a natural ordering starting from primary education, as detailed in Table 2. When aggregated to the household level, the education attribute takes the individuals' shared value if they obtained the same education level, or a mixed category otherwise, for instance MBO-WO (vocational and higher education). Information about node attributes is summarized in
Network layers. We distinguish four network layers. The first is the family layer, which is based on child-parent information and allows us to derive links to an individual's parents, children, siblings, grandparents/grandchildren, as well as aunts/uncles and cousins. The second is the school layer, which is based on people who are enrolled in a formal educational institution at the time of data collection (October 2018). This layer distinguishes between primary, secondary/special secondary, vocational, and higher education levels. For all school levels, people are grouped into classes if they attend the same educational institution at the same location (applicable to elementary, secondary/special secondary, and higher education), with the same overall duration of schooling and the same type of education (which can be distinguished among secondary, vocational, and higher educational institutions). The third is the work layer, which provides a set of social ties between colleagues who work for the same employer. For larger workplaces, only the 100 geographically closest colleagues (in terms of their place of residence) are sampled. Finally, the neighbor layer includes at most the 10 geographically closest neighboring households, referred to as "next-door neighbors". We include this layer as part of the social opportunity structure derived from the built environment. Except for family, all layers in our study have a spatial dimension: the neighbor layer has the smallest scale, but lower levels of education, especially primary schooling, and workplace choice are also to some extent influenced by spatial constraints. However, these layers are not restricted to an a priori spatial unit of aggregation, as is the case in the strict neighborhood approach dominant in the literature, with which we compare our results.
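The household-level aggregation described under "Network construction" and "Node attributes" can be illustrated with a minimal sketch. It assumes a hypothetical person-level edge list of `(person_u, person_v, layer)` tuples and a hypothetical `household_of` mapping from person to household id (institutional households simply absent from the mapping). Ties between members of the same household are assumed to be dropped, since those members collapse into a single node. The helper `combine_categories` is likewise hypothetical and orders mixed categories alphabetically, which is only one possible canonical ordering. This is not the authors' pipeline.

```python
from collections import defaultdict


def aggregate_to_households(person_edges, household_of):
    """Collapse a person-level multilayer edge list into household-level layers.

    person_edges: iterable of (person_u, person_v, layer) tuples (hypothetical format)
    household_of: dict person id -> household id; persons in excluded
                  (e.g. institutional) households are absent from the dict
    Returns a dict layer -> set of (household_a, household_b) pairs, with at most
    one unweighted link per household pair and layer.
    """
    layers = defaultdict(set)
    for u, v, layer in person_edges:
        hu, hv = household_of.get(u), household_of.get(v)
        if hu is None or hv is None:
            continue  # one endpoint lives in an excluded household
        if hu == hv:
            continue  # assumption: ties inside a household do not become links
        # Canonical ordering makes (A, B) and (B, A) the same undirected link,
        # so repeated person-level ties collapse into one link per layer.
        layers[layer].add((min(hu, hv), max(hu, hv)))
    return dict(layers)


def combine_categories(a, b=None):
    """Hypothetical helper: household category from one or two founding adults."""
    if b is None or a == b:
        return a  # single founding adult, or both adults share the category
    return "-".join(sorted((a, b)))  # mixed category, e.g. "Dutch-Surinamese"


# The example from the text: parents in households A and B work together,
# and their children are in the same school class.
household_of = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}
edges = [("p1", "p3", "work"), ("p3", "p1", "work"),
         ("p2", "p4", "school"), ("p1", "p2", "family")]
print(aggregate_to_households(edges, household_of))
print(combine_categories("Dutch", "Surinamese"))
```

Using a set per layer is what makes the resulting network unweighted: however many members of two households are connected within one context, only a single link survives in that layer.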

Segregation measurement.
In what follows, we outline how we compare the social network perspective on segregation with a neighborhood perspective.The concept of a neighborhood here denotes people living in the same geographic neighborhood as defined by the smallest administrative neighborhood borders in the Netherlands (the so-called "buurt" in Dutch), typically consisting of a few thousand households.The social network perspective is tailored to the considered population-scale multilayer network consisting of family, classmates, colleagues, and next-door neighbors explained above.To measure segregation from the spatial and network perspective, we take a three-step approach as detailed below.
Grouping households.We first categorize households into respective groups as defined by their attributes: education level, ethnic group, migrant generation, and household income.For nominal data (education, ethnic and migration background) the size of each group corresponds to the prevalence of each attribute in the population of households.For the continuous variable in the analysis (income) we perform a normalization.We linearly split the full income range of Dutch residents into equally large deciles, or brackets, (D=1,2,. . .,10) such that the 10% lowest-income households are grouped in the 1 st decile, and 10% of the richest ones in the 10 th one.
Mixing matrix construction.To operationalize the concept of the social opportunity structure, we introduce a mixing matrix, which is a matrix X that represents the connectivity between different subgroups of the population as defined by their attributes.To construct such matrices, we count the occurrence of ties between households that belong to a certain subgroup as defined by one of the attributes in consideration and those in different subgroups of population including their own.Based on these occurrences, we construct a mixing matrix in which rows and columns represent all possible values that an attribute takes.Each cell at the intersection of two subgroups of the population (for instance, two income deciles) displays the share of contacts a household of a certain income bracket (as indicated by the value on the vertical axis) shares with the households in the income decile as indicated by the value on the horizontal axis.The elements of the matrix are normalized by row.The diagonal elements represent the share of contacts each income decile has within its own income bracket.The matrices are therefore a visual representation of aggregated mixing patterns in the population that also serve as an input for further quantification of segregation levels.
Assortativity coefficient.Finally, we quantify the level of segregation as assortative mixing along the chosen dimensions (income, highest achieved education level, ethnic group, and migration background).
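We do so through the common assortativity coefficient ρ (Newman 2002; Newman 2003), which captures the Pearson correlation between the attributes of connected nodes in a network. Based on the previously obtained mixing matrices, the assortativity coefficient is defined as in Equation 1a for scalar (numeric) values such as income deciles, and as in Equation 1b for discrete characteristics:

$$\rho = \frac{\sum_{i,j} i\,j\,X_{ij} - \left(\sum_i i\,a_i\right)\left(\sum_j j\,b_j\right)}{\sigma_a\,\sigma_b} \qquad (1a)$$

$$\rho = \frac{\sum_i X_{ii} - \sum_i a_i b_i}{1 - \sum_i a_i b_i} \qquad (1b)$$

where i and j denote income deciles (D = 1, 2, ..., 10) in Equation 1a and attribute categories in Equation 1b; X represents the previously defined mixing matrix, here normalized such that all elements add up to 1, so that each element $X_{ij}$ represents the share of links between subgroup i and subgroup j; $a_i = \sum_j X_{ij}$ and $b_j = \sum_i X_{ij}$ denote the row-wise and column-wise sums of these normalized shares, respectively; and the denominator of Equation 1a is the product of the standard deviations of the row-wise and column-wise marginal distributions, e.g. $\sigma_a = \sqrt{\sum_i i^2 a_i - \left(\sum_i i\,a_i\right)^2}$, with $\sigma_b$ defined analogously from $b_j$.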
The assortativity coefficient ρ essentially captures how diagonal a matrix is: higher observed assortativity would indicate a higher share of links within one's own or similar subgroup of the population.The coefficient ranges from −1 to 1, where higher positive values indicate stronger preference for mixing with someone of a similar trait.An assortativity coefficient of 0 signals random mixing, and negative values point to preference for matching with someone of a different type.While there are two assortativity coefficients defined for scalar and discrete values, for readability purposes we refer to both as ρ.
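Equations 1a and 1b can be computed directly from a mixing matrix. The sketch below is a straightforward transcription of the two formulas and is not the authors' code. It expects a raw count matrix (or one already normalized to sum to 1). Passing the row-normalized matrix would silently give every subgroup equal weight, which is generally not the same joint distribution. The `values` argument holds the numeric value of each row/column, e.g. deciles 1 to 10.

```python
import numpy as np


def _joint(X):
    """Normalize a count matrix so that all elements add up to 1."""
    X = np.asarray(X, dtype=float)
    return X / X.sum()


def scalar_assortativity(X, values):
    """Equation 1a: Pearson correlation of attribute values at both ends of a link."""
    e = _joint(X)
    v = np.asarray(values, dtype=float)
    a, b = e.sum(axis=1), e.sum(axis=0)  # row-wise and column-wise sums
    mean_a, mean_b = v @ a, v @ b
    sd_a = np.sqrt((v ** 2) @ a - mean_a ** 2)
    sd_b = np.sqrt((v ** 2) @ b - mean_b ** 2)
    # v @ e @ v equals the double sum over i, j of v_i * v_j * e_ij.
    return (v @ e @ v - mean_a * mean_b) / (sd_a * sd_b)


def discrete_assortativity(X):
    """Equation 1b: Newman's assortativity for categorical attributes."""
    e = _joint(X)
    a, b = e.sum(axis=1), e.sum(axis=0)
    ab = a @ b  # sum over i of a_i * b_i
    return (np.trace(e) - ab) / (1 - ab)


deciles = np.arange(1, 11)
print(scalar_assortativity(np.eye(10), deciles))             # perfect within-decile mixing
print(scalar_assortativity(np.ones((10, 10)), deciles))      # random mixing
print(scalar_assortativity(np.fliplr(np.eye(10)), deciles))  # richest only with poorest
```

The three toy matrices reproduce the interpretation given in the text: a purely diagonal matrix yields 1, uniform mixing yields 0, and a purely anti-diagonal matrix yields -1.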
To contrast spatial and social network perspectives on segregation, we apply the assortativity coefficient to both contexts. For the social network perspective, we measure the assortativity coefficient of the social opportunity structure, i.e. among direct network neighbors in the multilayer population-scale social network, referred to as "link assortativity". For the spatial analysis, we calculate the assortativity coefficient among households sharing the same administrative neighborhood ("buurt").
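For the spatial comparison, a count matrix over neighborhoods is needed before Equations 1a and 1b can be applied. The text does not spell out how co-residence is turned into pairs, so the sketch below rests on an explicit assumption: every pair of distinct households within the same buurt counts as a link (a complete graph inside each neighborhood, self-pairs excluded). This lets the counts be obtained from per-buurt group sizes without enumerating pairs. The function name and inputs are hypothetical.

```python
import numpy as np


def neighborhood_mixing_counts(groups, buurt, n_groups):
    """Count matrix treating all distinct household pairs within a buurt as ties.

    Assumption (not specified in the text): each buurt is a complete graph of
    its households, self-pairs excluded, counted as ordered pairs.
    groups: 0-based subgroup index per household; buurt: neighborhood id per household
    """
    groups = np.asarray(groups)
    buurt = np.asarray(buurt)
    counts = np.zeros((n_groups, n_groups))
    for b in np.unique(buurt):
        c = np.bincount(groups[buurt == b], minlength=n_groups).astype(float)
        # Ordered pairs: c_i * c_j off the diagonal and c_i * (c_i - 1) on it.
        counts += np.outer(c, c) - np.diag(c)
    return counts


# One buurt with two households in group 0 and one household in group 1.
print(neighborhood_mixing_counts([0, 0, 1], [0, 0, 0], 2))  # [[2. 2.] [2. 0.]]
```

The resulting counts can be passed to the same assortativity functions as the network counts. The boolean-mask loop is fine for a sketch; at population scale, a grouped count (e.g. via a pandas `groupby`) would be more efficient.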
To shed further light on problematic cases of high concentration of contacts within one's own subgroup, we introduce an additional metric that applies to scalar variables only (income deciles). We measure the concentration of each decile's contacts in its own bracket, as well as in its own bracket combined with the two adjacent income deciles. For instance, for households in decile 2 (the second-poorest income group), we report the share of their contacts in the same, 2nd income decile, and the share of their contacts that fall into deciles 1 to 3. Assuming a uniform distribution of contacts across deciles, only 10% and 30% of one's contacts would be expected to fall into one's own decile and into this three-decile band, respectively. In the next section, we visually present the assortativity analysis in mixing matrices that show to what extent particular subgroups (based on their attributes) have an opportunity to interact with people who belong to the same or different subgroups.
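The within-decile concentration metric can be read off the row-normalized mixing matrix: the diagonal gives the own-decile share, and summing a band of width one around the diagonal gives the share in the own decile plus adjacent deciles. The sketch below assumes that the band is simply truncated at the ends of the income range, so deciles 1 and 10 only have one adjacent decile, and their uniform-mixing benchmark is 20% rather than 30%. How the paper handles these edge deciles is not stated.

```python
import numpy as np


def band_concentration(P, width=1):
    """Own-decile share and +/- `width` band share per row of a row-normalized matrix."""
    P = np.asarray(P, dtype=float)
    idx = np.arange(P.shape[0])
    own = np.diag(P).copy()
    # mask[i, j] is True when decile j lies within `width` of decile i.
    mask = np.abs(idx[:, None] - idx[None, :]) <= width
    band = (P * mask).sum(axis=1)
    return own, band


# Uniform mixing benchmark over 10 deciles.
own, band = band_concentration(np.full((10, 10), 0.1))
print(own[1], band[1])  # 10% own decile, 30% band for an interior decile
print(band[0])          # only 20% for decile 1, which has a single neighbor
```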

Results
In this section, we present the findings of our analysis, starting by applying the segregation measurement strategy defined above to the selected node attributes that characterize households in the social network. We then focus on the most pronounced dimension of socio-economic segregation in social networks and contrast it with that of spatial neighborhoods. As a concluding step, we present the assortativity analysis for the constituent layers of the social network.
Dimensions of segregation in the Netherlands.First, we investigate segregation in socio-economic status or migrant background, arguably the two most studied facets of segregation.We capture segregation along the four dimensions: household income, highest achieved level of education, ethnic group, and migrant generation.
Figure 2a presents the assortativity analysis of the multilayer population-scale social network, which suggests that the captured social opportunity structure is segregated with respect to all four attributes, with assortativity coefficients ranging from around 0.05 to around 0.25. For comparison, previous work that uses assortativity to gauge socio-economic segregation finds values no higher than around 0.5, even in the most extreme cases (Leo et al. 2016; Morales et al. 2019; Bokányi, Juhász, et al. 2021; Dong et al. 2020). We find that assortativity is most pronounced for income (link assortativity ρ = 0.22). By contrast, the highest achieved level of education is the trait that segregates households least, with an assortativity coefficient of only 0.07. This speaks to the importance of education as a means to break free from income segregation. Migrant background is also a clear segregator, though less so than income, with assortativity coefficients of 0.15 for migrant generation and 0.13 for ethnic group.
These aggregate values hide considerable variation across regions of the country. We therefore perform the assortativity analysis on the municipality-level social networks for each attribute. Figure 2b shows the distribution of assortativity values across all municipalities in the Netherlands and reveals great heterogeneity across the country. Income assortativity by municipality (Figure 2b, blue bars) spans a wide range, from 0.08 to 0.28. Here, too, income assortativity is higher than that of the other attributes in the majority of municipalities. Thus, despite a well-developed welfare state with a strong focus on income redistribution policies, income remains the most significant cleavage in Dutch society. In what follows, we therefore focus on income as the main dimension across which segregation manifests. Figure 3a shows how income assortativity values are geographically distributed across the country. Municipalities with the highest levels of income assortativity are located mostly in the central-western Netherlands, the so-called "Randstad" area containing the largest Dutch cities, including Amsterdam, The Hague, Rotterdam, and Utrecht, and their suburbs. Similarly, the border regions of Zeeland (bottom left) and South Limburg (bottom right), which contain some of the larger municipalities (Terneuzen, Maastricht), also exhibit relatively high levels of income assortativity. In fact, we find a log-linear relationship between income assortativity in a municipality and the size of its population (Figure 3b). This means that, contrary to the perception of cities as "champions of diversity" in smart urban planning approaches, larger municipalities exhibit higher levels of segregation. This is not due to differences in degree distributions across municipalities, which are consistent regardless of population size (see Supplementary Information S1).
Comparing the social network and the spatial approach. We now turn to the main issue at hand: how segregation in the social network compares to segregation in spatial neighborhoods. Figure 4 shows the social opportunity structures for the population-scale social network and the administrative neighborhood approaches, as introduced and discussed in Section 3. The comparison of the two reveals striking differences. First and foremost, segregation in the social network is more than twice as high as in administrative (spatial) neighborhoods (income assortativity of 0.217 and 0.103, respectively). Second, the distributions of connections across income ranges are markedly different. The social network perspective (Figure 4a) displays a higher concentration of contacts along the diagonal of the matrix and in the neighboring cells. This indicates a stronger tendency of households to establish ties with other households that are (relatively) similar in terms of income. In contrast, the administrative neighborhood matrix (Figure 4b) shows a rather uniform distribution of contacts across the entire income range. The only exception is the relatively high concentration of contacts among the poorest as well as among the richest deciles. Yet even this self-orientation at the extremes is more pronounced in the social network matrix (by 3 percentage points in the bottom-left cell for the poorest decile and by 6.5 percentage points in the top-right cell for the richest). Furthermore, administrative neighborhoods demonstrate slightly higher exposure between the extremes of the income distribution. The top-income decile has a relatively higher share of contacts with the poorest-income decile, which is partially driven by policy efforts aimed at evenly redistributing social housing units across the city, even in affluent neighborhoods.
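The mixing matrices compared in Figure 4 can be sketched as follows. The first function builds a decile-by-decile matrix of shares of tie ends from an edge list, counting each undirected tie in both symmetric cells so that all cells sum to 1. The second is a hypothetical spatial counterpart that assumes every pair of households in the same administrative neighborhood counts as a potential tie; it works from per-neighborhood household counts by decile, so pairs never have to be enumerated. Whether this matches the exact neighborhood construction of Section 3 is an assumption, and all names are illustrative.

```python
def mixing_matrix(edges, decile, n_deciles=10):
    """Symmetric matrix of the share of tie ends between income deciles."""
    m = [[0.0] * n_deciles for _ in range(n_deciles)]
    n_edges = 0
    for u, v in edges:
        du, dv = decile[u], decile[v]
        m[du][dv] += 1
        m[dv][du] += 1
        n_edges += 1
    total = 2 * n_edges  # every tie contributes two tie ends
    return [[c / total for c in row] for row in m]


def neighborhood_mixing_matrix(counts_by_neighborhood, n_deciles=10):
    """Hypothetical spatial analogue: all household pairs within a neighborhood.

    counts_by_neighborhood: dict neighborhood id -> list of household counts
    per decile (length n_deciles).
    """
    m = [[0.0] * n_deciles for _ in range(n_deciles)]
    for counts in counts_by_neighborhood.values():
        for i in range(n_deciles):
            for j in range(n_deciles):
                # Ordered pairs of distinct households with deciles (i, j);
                # each unordered pair lands once in (i, j) and once in (j, i).
                if i == j:
                    m[i][j] += counts[i] * (counts[i] - 1)
                else:
                    m[i][j] += counts[i] * counts[j]
    total = sum(sum(row) for row in m)
    return [[c / total for c in row] for row in m]


# Three households in one neighborhood: two in decile 0, one in decile 1.
print(neighborhood_mixing_matrix({"A": [2, 1]}, n_deciles=2))
print(mixing_matrix([(1, 2), (1, 3), (2, 3)], {1: 0, 2: 0, 3: 1}, n_deciles=2))
```

For a single neighborhood whose households are fully connected, the two functions return the same matrix, which is a convenient sanity check.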
Population-scale social network patterns of segregation are clearly more pronounced than spatial patterns of segregation. While households may well have a rather diverse potential exposure to households of various socio-economic backgrounds in spatial neighborhoods, their actual social network opportunity structure shows consistently higher levels of segregation. In fact, income assortativity in the network is more than twice as high as in spatial neighborhoods. Furthermore, the spatial neighborhood approach suggests that segregation occurs predominantly at the extremes of the income distribution, most evidently for the poorest and the richest income deciles. Social networks, in contrast, reveal not only high levels of income segregation for the poorest and the richest income groups but also moderate levels of segregation across the whole income range.
The multi-layered nature of social segregation. We have established that social segregation is more pronounced when we study it through the lens of the social opportunity structure. This raises the question of how the constituent layers, or social contexts, of that social opportunity structure reinforce social segregation. We therefore calculate the contribution of each layer to the overall income assortativity pattern observed across the whole network. We distinguish between the layers discussed in Section 3 (family, school, work, and next-door neighbors) and find rather dissimilar segregation patterns, as Figure 5 illustrates.
Family is the least socio-economically segregated context (ρ=0.14), followed by school (ρ=0.16), work (ρ=0.20), and finally next-door neighbors, the most segregated network layer (ρ=0.34). This ranking is consistent across the whole country, irrespective of the population size of the municipality (see Supplementary Information S4). Combined, these layers constitute the segregated social opportunity structure presented in Figure 4a (ρ=0.22). When we remove any one of the layers from the entire social opportunity structure, the segregation score remains between 0.19 and 0.24, indicating that no single layer predominantly drives segregation (Supplementary Information S5a).
When we measure segregation while adding layers one at a time in the order listed above, the work layer makes the largest contribution to segregation (Supplementary Information S5b). This is unsurprising, given that the work layer is also the largest in terms of average degree.
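The leave-one-layer-out and cumulative-inclusion analyses (Supplementary Information S5a and S5b) amount to recomputing the assortativity coefficient on different unions of layer edge sets. The sketch below assumes that a household pair tied in several layers is counted once in the combined network, which this section does not specify. It includes a compact version of the scalar assortativity coefficient so that it runs on its own; the layer names and toy data are hypothetical.

```python
import math


def numeric_assortativity(edges, value):
    """Pearson correlation of a scalar attribute across both tie orientations."""
    xs = [value[u] for u, v in edges] + [value[v] for u, v in edges]
    ys = [value[v] for u, v in edges] + [value[u] for u, v in edges]
    n = len(xs)
    m = sum(xs) / n  # xs and ys hold the same values, so they share mean and variance
    cov = sum((x - m) * (y - m) for x, y in zip(xs, ys)) / n
    var = sum((x - m) ** 2 for x in xs) / n
    return cov / var


def union_edges(layers, names):
    """Combine the edge sets of the named layers; duplicate pairs count once."""
    combined = set()
    for name in names:
        for u, v in layers[name]:
            combined.add((min(u, v), max(u, v)))  # normalise undirected pairs
    return combined


def leave_one_out(layers, value):
    """Assortativity of the network with each layer removed in turn."""
    return {
        dropped: numeric_assortativity(
            union_edges(layers, [n for n in layers if n != dropped]), value)
        for dropped in layers
    }


def cumulative(layers, value, order):
    """Assortativity after adding layers one at a time in the given order."""
    return [
        (order[k], numeric_assortativity(union_edges(layers, order[: k + 1]), value))
        for k in range(len(order))
    ]


layers = {"a": [(1, 2), (3, 4)], "b": [(1, 3), (2, 4)]}
value = {1: 0, 2: 0, 3: 1, 4: 1}
print(leave_one_out(layers, value))
print(cumulative(layers, value, ["a", "b"]))
```

Counting multiplex pairs once is a design choice; weighting a pair by the number of layers it appears in would be an equally plausible alternative and could shift the results.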
Figure 5 presents a detailed assortativity analysis for each of the four layers of the network. For each matrix, the bars on the right-hand side show, for each decile, the exposure to households in one's own income decile (purple bars) as well as the additional exposure to households in the two most closely related income deciles (pink bars). The yellow vertical dashed lines mark the expected shares of 10% and 30% under a uniform distribution of ties over the entire population. This reveals additional insights into the pattern of segregation for each layer. Family connections (Figure 5a, left), for instance, are the least segregated in overall terms, as indicated by the assortativity coefficient of 0.14. Still, they show an apparent diagonal, which indicates that across the whole income range, households have the most family ties with households of exactly the same socio-economic standing. This is well illustrated by the right-hand side of Figure 5a, which shows that across the whole income range, exposure to one's own income decile is higher than the 10% baseline. When we also take into account the most similar income groups, represented by the pink bars (for instance, for the poorest income decile 0, the two most similar income deciles beyond its own are deciles 1 and 2; for decile 3, these are deciles 2 and 4), it is clear that the richer the household, the more likely it is to have family members of either exactly the same or a similar income background. The top income decile even has half of its family connections within its own and the two most similar income deciles. The school layer exhibits a slightly higher level of segregation but a very different pattern from the family layer (Figure 5b, left). In the school context, all households have a relatively high exposure (on average, 15% of all contacts) to the 10% poorest households. This exposure is highest for the lowest income decile: about 30% of the school contacts of the lowest-income households are themselves in the lowest income decile. Similarly, the 10% richest households in the country have the highest concentration of school connections within their own income decile. When the most similar income deciles are included, segregation becomes even more pronounced, with the top and bottom deciles having around 50% and around 45%, respectively, of their social opportunities within their own income range. The granularity of our data allows us to further specify segregation patterns by school type. This relatively high level of segregation at the extremes of the income distribution (poor to poor as well as rich to rich) is reproduced in primary and secondary schools, whereas vocational training institutions create high exposure primarily among the poorest students. Unlike the other institutions, higher education exhibits high exposure to the 10% poorest households across the entire income range (the corresponding matrices are presented in Supplementary Information S6). This is because, in many cases, children move out of their parents' household and become independent single-person households when they enroll in a higher education institution. These single-person households are, to a large extent, without income and fall into the poorest income decile in the population.
While the workplace social opportunity structure (Figure 5c) does not exhibit as apparent a diagonal as, for instance, the family layer, households generally tend to have colleagues in, if not the exact same, then a relatively similar income bracket. Moreover, this pattern is stronger for higher-income households. The upper half of the income distribution has a disproportionately high exposure to households of similar socio-economic standing, and this tendency becomes more pronounced as household income increases.
The most segregated network layer is that of next-door neighbors (Figure 5d). This is mostly due to the bottom 30% of the income range as well as the 10% richest households, for which around 20-25% of next-door neighbors are in exactly the same income decile as the observed household. For the middle and upper-middle income classes, the tendency to mix within one's own or a similar income decile is present, yet less pronounced. This may well reflect the opportunities for higher-income households to choose where they want to live, as well as the inability of the lowest-income households to do so.
Figure 6 visually summarizes the results of our analysis and shows the level of segregation as measured by assortativity for each layer, for the network as a whole, and for administrative neighborhoods. The population-scale social network approach introduced here reveals segregation twice as high as can be captured within the boundaries of administrative (spatial) neighborhoods. Each constituent network layer (family, school classmates, work colleagues, and next-door neighbors) likewise exhibits higher segregation than administrative neighborhoods. We conclude that at a sufficiently coarse scale of spatial aggregation, patterns of socio-economic segregation appear minimal, while in fact they persist in the underlying structure of social networks.

Discussion and conclusion
We asked whether patterns of segregation are more pronounced in social networks than in the common spatial manifestations of segregation. Our research showed that levels of segregation in the population-scale social network are indeed more than twice as high as in administrative neighborhoods.
Neighborhood-level indicators of segregation hide underlying persistent patterns of segregation and inequality. This means that both scholars and policy makers may systematically underestimate the levels of segregation in society. Of course, spatial and social aspects remain deeply intertwined. Our contribution is to point out that neighborhoods may capture some, but certainly not all, of the social ties that segregate our societies. As such, we position ourselves in the literature that calls on scholars and policy makers to go beyond spatial patterns of segregation. We suggest a practical and low-cost approach to include social networks, while others have, for instance, convincingly argued for including temporal segregation (i.e., how segregation varies throughout a day, a week, and different seasons; Silm et al. 2014), segregation in amenities (Moro et al. 2021), or social interactions (Calvano et al. 2022), as well as segregation measurement based on multi-hop distances in the network (van der Laan et al. 2022).
Going beyond spatial patterns is necessary to understand the persistent inequalities in our societies.
Our approach also allows us to investigate segregation as a multi-layered phenomenon that plays out across different social contexts. We found that social contexts are rather dissimilar in terms of socio-economic segregation. The family and school layers offer the most diverse exposure.
Family connections are the least segregated social context, yet we still observe a higher concentration of contacts within one's own income group across the whole income range, and this pattern is more pronounced for affluent households. The school layer, marginally more segregated than the family layer, reveals socio-economic segregation at the extremes of the income distribution. This is to a large extent a result of segregation patterns at the lower education levels, namely primary and secondary schools. Previous research has provided evidence that at earlier stages of education, especially elementary school, commuting distances are relatively short, so school segregation largely reflects residential segregation (Boterman 2019). At later stages of education, by contrast, students are more likely to commute longer distances to a school of their choice or even to move out of their parents' household to pursue further education. This is reflected in a high exposure to poor households in the context of higher education: children who moved out of their parents' house to study become independent single-person households with little or no income, which is why we observe the highest exposure to such low-income students across the whole income range.
On the other hand, work colleagues and next-door neighbors are the social contexts that limit one's exposure to diverse socio-economic groups the most. Colleague ties are the most numerous (in terms of average degree), but also the second most segregated social context. Social mixing patterns at workplaces demonstrate limited exposure to various socio-economic groups, and this is most prominent for the richer half of the income distribution. Finally, the most segregated network layer of all is next-door neighbors, with the extremes of the income distribution (the poorest and the richest households) being the most segregated. This points to the limited effectiveness of city planning practices aimed at evenly redistributing social housing and creating more diverse neighborhoods. Even though the composition of a neighborhood as defined by administrative boundaries may seem well mixed, on a more granular level, people living next door are disproportionately likely to belong to exactly the same socio-economic stratum.
The aim of our study was to compare two different approaches to capturing socio-economic segregation, not to establish whether segregation in the Netherlands is high or low. Our findings can, of course, be placed in the context of other work. The overall income assortativity of 0.22 that we found is in line with previous studies in which empirical social networks exhibit similar assortativity levels. A recent study on patterns of urban mobility and social network structure based on Twitter data from the fifty largest municipalities in the USA reports income assortativity values ranging from 0.05 to 0.5, with the majority of metropolitan areas falling into a narrower 0.1-0.25 span (Bokányi, Juhász, et al. 2021). Other work on empirical interaction networks (inferred from purchase and Twitter data) at the country level yields income assortativity of up to around 0.5 (Dong et al. 2020). The levels of segregation we found therefore appear realistic and provide a good foundation for further comparative work, both across countries and over time. Such comparative studies would, however, need to carefully consider the sensitivity of the assortativity measure to the network structure as well as to the underlying inequality in the income distribution.
The network that we investigated here is a social opportunity structure. We therefore expect that social networks in which tie formation or activation is based on individual choice are even more segregated.
The interaction between the social opportunity network and activated ties is therefore a fruitful and important avenue for further research. Of particular interest here is enriching population-scale social network data from registers with population-wide surveys that include social network generating questions.
Our findings also speak to the perception of cities as "champions of diversity". We find that larger municipalities tend to exhibit higher levels of socio-economic segregation. In larger cities, despite a diverse pool of social contact opportunities, households gravitate towards their own income range in contexts such as family, school, workplace, and next-door neighbors. An intuitive explanation for this finding is that in smaller communities there is a relatively narrow pool of choices in terms of affiliations, and individuals are hence limited in choosing the kind of social ties they desire (low choice homophily). This moderates segregation in smaller communities. In contrast, larger cities, despite a more diverse demographic distribution, have a greater potential for choice homophily to materialize. A similar pattern of results was obtained in a recent study based on a social interaction network inferred from mobile communication data (Nilforoshan et al. 2022). The authors demonstrated that, in terms of potential for social interactions, larger metropolitan areas are more segregated as a result of greater socio-economic differentiation of amenities. Even though our study relies on a vastly different approach to social network inference, the results complement each other and both challenge the dominant view of cities as hubs for diverse socio-economic mixing.
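The log-linear relationship between municipal income assortativity and population size (Figure 3b) that underlies this finding can be summarised by regressing ρ on the natural logarithm of population. The sketch below uses ordinary least squares, which is our assumption since the fitting procedure is not described here; a positive slope corresponds to higher assortativity in larger municipalities. The function name is hypothetical, and the inputs in the usage lines are synthetic, chosen only to show the call, not data from this study.

```python
import math


def fit_log_linear(populations, assortativity):
    """OLS fit of assortativity = intercept + slope * ln(population).

    Returns (intercept, slope, r_squared).
    """
    xs = [math.log(p) for p in populations]
    ys = list(assortativity)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy * sxy / (sxx * syy)  # squared Pearson correlation
    return intercept, slope, r_squared


# Synthetic inputs lying exactly on rho = 0.01 + 0.02 * ln(population).
pops = [1_000, 10_000, 100_000, 800_000]
rhos = [0.01 + 0.02 * math.log(p) for p in pops]
print(fit_log_linear(pops, rhos))  # approximately (0.01, 0.02, 1.0)
```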
For the case of the Netherlands, we established that segregation is most pronounced along the dimension of household income. Nevertheless, in the public and policy discourse, income segregation is often overshadowed by "ethnic" segregation, i.e. segregation of natives versus multiple migrant categories of the population (Boterman et al. 2021). Interestingly, in the Dutch context, income-based interventions are typically used as a proxy for tackling segregation associated with immigration. This is because ethnic segregation raises various ethical and political concerns, and also because neighborhood income is associated with neighborhood ethnic composition (Gent et al. 2016). However, we found that this correlation between ethnic background and income does not hold at the household level (as reported in Supplementary Information S2). This leads to the somewhat surprising situation that while policy interventions to reduce "ethnic" segregation are ill-informed and therefore misdirected, these same interventions do in fact, by accident, affect the more important dimension of socio-economic segregation.
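Supplementary Information S2 tests the relationship between income decile and the categorical household attributes with one-way ANOVA. The sketch below computes the ANOVA F statistic for income deciles grouped by a categorical attribute such as ethnic group; it is a textbook illustration, not the authors' code, and the grouping input is hypothetical. For a p-value, SciPy's `scipy.stats.f_oneway` returns both the statistic and the p-value.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic.

    groups: dict mapping a category (e.g. ethnic group) to a list of numeric
    values (e.g. household income deciles); needs >= 2 groups and n > k.
    """
    values = [x for g in groups.values() for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (means[c] - grand_mean) ** 2 for c, g in groups.items())
    ss_within = sum((x - means[c]) ** 2 for c, g in groups.items() for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


print(one_way_anova_f({"group_a": [1, 2, 3], "group_b": [4, 5, 6]}))  # 13.5
```

With population-scale samples, even tiny differences between group means yield significant F statistics, so effect sizes deserve attention alongside p-values.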
Our approach may inspire new avenues for effective policies to combat segregation and related inequalities. Such avenues should look beyond the assortativity values that we report for the network and its constituent layers and at least take into account the number of connections in each layer as well. For instance, policies that decrease workplace segregation may bring more benefits than policies that decrease next-door neighbor segregation, since they affect more connections, even though assortativity is higher for the latter.
The approach used by us also comes with a number of limitations. First, although the data coverage and quality are unprecedented, they are not flawless. The family layer, for instance, is affected by missing family registers before 1995 and a large share of missing family ties for first-generation migrants. The work layer is currently subject to a firm-size boundary that applies to large companies. The school layer, in turn, cannot distinguish between school classes in the same year. These are important caveats to take into account when interpreting the results, but they also serve as important input for improving such population-scale social network datasets. Second, registers only capture "formal" ties and miss more "informal" ties such as friendships. However, previous work has firmly established that friendships are mostly realized from these contextual relationships to institutions. Nevertheless, we expect that informal ties such as friendships exhibit more choice homophily and add to the levels of segregation. Third, the population-scale social network is strictly defined by the borders of the country, so it omits cross-border ties. While we checked that degree distributions are consistent across different parts of the country, this limitation is especially relevant for border regions, where people may commute to a neighboring country for work or have family there, and for Dutch residents who migrated elsewhere.
Our results suggest that social networks play a fundamental role in assessing socio-economic segregation and uncovering its multi-layered nature. Focusing on the population-scale social network as a whole as well as on its constituent layers reveals consistently higher levels of segregation than could be captured with existing approaches. This may be considered a promising validation of incorporating social-network-aware measures of segregation into existing social cohesion evaluation frameworks. Furthermore, such an extension is becoming more achievable in light of the increasing availability of population registers worldwide.

Figure 1: Schematic representation of the multilayer population-scale social network: nodes are registered residents of the Netherlands, links are ties in one or more social contexts: family, next-door neighbors, household members, school classmates, and colleagues.

Figure 2: Assortativity in a population-scale network of households. (a) Population-scale link assortativity values for various attributes. (b) Distribution of link assortativity across municipalities in the Netherlands: education level assortativity in red, ethnic group in orange, migrant generation in green, and income in blue.


Figure 3: Geographical decomposition of income assortativity.(a) Income assortativity per municipality.(b) Relationship between income assortativity and size of the municipality (natural logarithmic scale)

Figure 4: Mixing matrices for the population-scale social network of households (a) and administrative neighborhoods (b). Colors denote the share of links between the two income deciles.

Figure 5: Assortativity for four layers: (a) family, (b) school, (c) work, (d) next-door neighbors. The left-hand side of each panel contains the mixing matrix, similar to Figure 4. The right-hand side of each panel presents the within-decile concentration of contacts within similar income groups: dark purple bars represent the concentration of contacts within the exact same income decile, and pink bars show the concentration of contacts within the two other most similar income deciles. Yellow dashed vertical lines mark 0.1 and 0.3.

Figure S2: Results of the significance tests for correlation between household attributes: p-values of ANOVA tests for the relationship between a numeric variable (income decile) and the remaining categorical variables (migrant generation, ethnic group, education level), as well as p-values of chi-square tests examining the relationship between categorical variables. The confidence level is 99%.

Figure S3: Household degree distributions.(a) Overall household degree distribution.(b) Household degree distribution for individual layers of the network: family, next-door neighbors, school, and work.(c) Overall household degree distribution for two groups of municipalities: border regions (green dots) and the rest of the country (orange dots).(d) Distributions of the household degree in the neighbors layer for the largest (orange dots) and the smallest municipalities in the country (green dots).

Figure S4: Scatterplot of income assortativity per layer in relation to the population size of a municipality.

Figure S5: Contribution of each layer to the overall level of income assortativity: (a) income assortativity levels for network composed by alternating exclusion of individual layers (w/o (without) denotes the layer which was excluded from the network), (b) income assortativity levels for network composed by consecutive inclusion of individual layers.

Table 1: Network properties for various network layer combinations. Columns show the number of non-isolated nodes (Nodes), edges (Edges), average degree <k>, minimum degree k_min, and maximum degree k_max.

Table 2 ,
and their distributions are presented in Supplementary Information S1.Correlations between household-level attributes are examined in Supplementary Information S2.