1 Introduction

Cultural distance is frequently cited as a potential source of problems: it complicates the management of multinational enterprises, reduces commercial exchanges, arouses suspicion and sometimes hostility, and slows down the integration of foreigners in the host country. It also has potential benefits: for instance, it attracts curious tourists, helps to address complex problems from different angles, and counters the tendency of the world towards standardisation under the pressure of globalisation. But whilst these benefits sometimes occur, they seem to materialise only rarely.

This issue is of great cultural and social significance, but research into it has made limited progress because the abstract concepts of culture and cultural distance are difficult to define, impossible to measure directly (they are latent constructs), and hard to measure even indirectly, because the corresponding manifest variables are typically questionable and partial.

In the first part of this paper, we briefly discuss the notion of cultural distance, highlighting two separate issues. The first regards the definition of culture and its empirical indicators. Despite the relevance of the topic, we do not devote much space to it because our data do not permit us any choice: we must stick to the few variables that we have (Sect. 3). The second issue relates to the type of distance to be measured, distinguishing between what we will call here the collective and the individual approach. The former, which is far more widely accepted, relies on the idea of a ‘representative’ (e.g., average, or modal) value, such as national culture: groups have their ‘characteristic’ traits, and distances are computed between these representative values. The latter approach, the one that we will adopt here, relates instead to the statistical notion of variability, and can be broadly described as an attempt to estimate how likely it is for individuals of group A to find someone ‘like them’ in group B.

In Sect. 3 we present our data. By merging two similar but independent Italian surveys of 2011 and 2013, we try to assess the cultural distance between Italians and foreigners in Italy. Unfortunately, overlap between the two surveys is rather limited; this forced us to use just a few survey questions, focused on what may be called ‘culturally-driven use of time (with a focus on selected items)’. For brevity’s sake, we will sometimes refer to this as (revealed, or actual manifestations of) culture. While we admit that this may be a misnomer, and that in all cases the variables that we have may at best reveal only part of the general picture (cultural distance between groups), we believe that this may nonetheless prove of interest for three main reasons.

The first is that we do not know fully what people’s internal beliefs are, but we may observe external social behaviour, which serves as a reflection of culture (and other variables, to be sure). This is precisely what our variables measure, albeit only partially and imperfectly.

The second reason is that the issue (potential problems caused by the presence of ‘aliens’ in the country) is deeply felt in Italy and is a cause of considerable political controversy. Despite this, to the best of our knowledge, no scientific evidence has ever been produced on the subject. Is the cultural distance between natives and immigrants large or small? Is it increasing or decreasing over time? Is it the same for all groups of foreigners?

The third reason lies in the DBS (Distance Between Strata) method, introduced just a few years ago (De Santis, Maltagliati, and Salvini, 2016). This presentation improves on the preceding ones (see also Mucciardi and De Santis, 2017; De Santis and Mucciardi, 2017) in a few respects:

  (1) we illustrate its overall philosophy more clearly,

  (2) we show that treating ordinal scales as interval scales does not worsen, and arguably improves, the performance of the method (Sect. 4), and, perhaps most importantly,

  (3) we argue that, within limits, two of the critical choices of this method (the clustering criterion and the number of clusters) are less problematic than it may seem: in a large majority of the applications that we subjected to our sensitivity analysis (Sect. 7), results do not essentially change.

We produced a number of different results (Sects. 5 and 6). The first, and possibly the most significant, is that heterogeneity characterises all groups, including Italians (by macro-region of residence), who show a clear North–South gradient in their culturally-related use of time. The presence of heterogeneity suggests, on the one hand, that stereotypes have scant scientific basis and, on the other, that measuring cultural distances on ‘average’ (or ‘representative’) profiles is questionable.

Among people of foreign origin living in Italy, we can distinguish those who are no longer foreigners, because they acquired Italian nationality at some point, and those who remain foreigners by country (or world region) of origin. This is, we believe, of the greatest interest, both in itself (which foreigners are closer to Italians? Can we discern a pattern in the cultural distance between groups of foreigners?) and as a check of the method: we cannot prove that the method works, but we can show that it leads to very reasonable results.

Indeed, assessing the validity of a (relatively) new measure is not an easy task. One way to do it is to see how closely its results conform to assumptions, or at least very strong expectations. We designed two checks based on our assumptions: the method needs to pass these tests to prove worthy of further consideration. These are:

Check (A) Italians (by macro-region of residence, of which we have five) should be culturally close to each other and more or less in geographical order, from North to South;

Check (B) people of foreign origin but now with an Italian citizenship should be culturally closer to Italians (with Italian origin – simply Italians from now on) than foreigners are.

As our method passes both checks (Sect. 5), we deem it reliable, and we use it to test the following hypotheses:

Hy. (#1) foreigners should form relatively close cultural clusters when they come from the same world region (e.g., Latin America or Africa), which implies that

Hy. (#1.a) foreigners coming from EU countries should be culturally closer to Italians than foreigners coming from other parts of the world;

Hy. (#2) the cultural distance should be smaller for people with some common background, be it in geographical or socio-economic development terms;

Hy. (#3) based on a recent survey (Istat, 2020, p. 22), the Chinese, and less so also the Filipinos and the Peruvians living in Italy, should be the national groups most culturally distant from Italians, given their linguistic difficulties (in the first two cases) and their tendency to form homogeneous but also relatively closed communities (all of them).

Surprisingly, though based on clustering, the proposed DBS method does not require that clusters be understood, interpreted, or labelled. However, it is instructive to do so and to understand why certain national groups are close to or far from others; this is what Sect. 6 does.

Section 7 reports the main results of our sensitivity analysis. We attempted several alternative paths (e.g., different clustering procedures and different numbers of clusters) and we compared the outcomes. As we show, in the large majority of cases the results are highly consistent, which reinforces our claim that they are robust and not model-induced.

Substantive and methodological conclusions are drawn in the eighth and final Section.

2 The Notion and Use of Cultural Distance

Culture may be defined in several ways, whether it be according to norms, values, or beliefs. In all cases, two broad approaches can be identified, which we will label ‘collective’ and ‘individual’, respectively. The former is more popular and intuitive. In a famous interview, for instance, the social psychologist Hofstede defines culture as the ‘collective programming of the mind that distinguishes one group or category of people from another’ (http://www.geerthofstede.nl/). Shortly after, in the same interview, Hofstede simplifies his own definition to ‘the unwritten rules of the social game’. This definition implies group homogeneity, which leads to the definition of cultural difference as ‘a difference in human values that are rooted in national culture, which affect individuals’ attitude and behaviour’ (Poh Chuin, 2019).

These two sources are not chosen at random: they derive from the business and administration world, where the main objective is to translate the overarching idea of culture into something measurable and useful, such as for managing multinational enterprises (Hofstede, 1980) and understanding what guides international commerce (Poh Chuin, 2019), tourism (Ng et al., 2007; Petit & Seetaram, 2019), cultural consumption (Schwartz, 2013), including the success of TV programmes (Berg, 2017; Ksiazek & Webster, 2008), the tendency of customers to complain (Luria et al., 2016), the entrepreneurial success of women (Naidu & Chand, 2017), and human development (Gamlath, 2017). These ideas are so deeply rooted in this context that Guiso et al. (2006) assert that ‘the ultimate validity of the notion of culture resides in its ability to enhance our understanding of economic behaviour’.

However, culture is also frequently invoked outside the economic sphere. For instance:

  • terrorism in OECD countries tends to increase with immigration, but not among immigrants from culturally close countries (Böhmelt & Bove, 2020),

  • the necessity and difficulty of integrating (sometimes just assimilating) foreigners, whose cultural difference is perceived as a menace, has inspired and continues to inspire the immigration policy of several countries, such as Switzerland (Piñeiro & Haller, 2012),

  • potential migrants are culturally different from the rest of the population in their origin countries, they select their destination country on the basis of cultural traits (Docquier et al., 2020), and they contribute to the cultural modification not only of the destination but also of the origin country thanks to the new ideas that they come into contact with and bring back home (Fargues, 2011),

  • democratic stability or breakdown may depend on the depth of internal cultural divisions (McDoom & Gisselquist, 2020),

  • national cultures that attribute greater importance to economic success tend to breed feelings of inferiority in the poor; this, while boosting economic activity, may prove socially disruptive (Steckermeier & Delhey, 2019).

In all cases, the attempt to define and measure a ‘typical’ cultural profile (for instance, of a country) and to use this profile to ‘explain’ a dependent variable is rarely successful (Beugelsdijk et al., 2015; Shenkar, 2001), and, when it is, suspicions arise regarding the choice of the manifest variables used to define the latent (non-observable) dimension of culture. If these manifest variables are chosen with the dependent variable already in mind, as seems inevitable, results may be biased. While several indicators may have a weak theoretical basis (Ortega-Villa & Ley-Garcia, 2018), and some, including our own, are clearly driven more by data availability than by a-priori reasoning (e.g., Vieira et al., 2020), it is also reasonable to suggest that finding suitable empirical data is a particularly serious obstacle in this field.

Part of the problem may lie in the often implicit assumption that ‘there are larger cultural differences between countries than within countries’ (Ng et al., 2007). However, this assumption may be questioned, which leads to the second line of research on culture, the ‘individual’ approach, which we follow here. We take up Beugelsdijk et al.’s (2015) suggestion to build variance-based measures of cultural distance, based on an idea that can be traced back to Byrne and Nelson’s (1965) observation that individuals are attracted by those who resemble them.

The point can be better illustrated with an example. Imagine three countries ‒ A, B (reference), and C ‒ and imagine that, on some reliable scale of ‘national culture’, these three countries score, respectively, 90, 100, and 120. Imagine further that a multinational enterprise, based in B, wants to open a branch abroad, in country A or C, based on cultural distance. At first sight, country A is a better choice, because the average cultural distance is smaller, 10 instead of 20. However, imagine that the citizens of country A (potential employees and customers of our enterprise) are extremely homogeneous, so that each of them scores approximately 90, whereas those of country C are highly heterogeneous, scoring, say, between 80 and 160. In this case, there will be a reasonable share of them scoring approximately 100, thus proving culturally close to our enterprise, coming from B. The example could be made more realistic, by allowing for some variability within our (B) enterprise, because not all of its managers and staff will score exactly 100. However, the point is that, in the search for somebody who resembles us, it may be unwise to rely only on group (country) means; it is better to consider also (or even primarily) internal variability. Beugelsdijk et al. (2015) do so using variance. However, when individual data are available, it seems preferable to measure cultural distance directly at the individual level, as we do here (Sect. 4).
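The arithmetic behind this example can be made concrete with a minimal sketch. It assumes, purely for illustration, that scores are uniformly distributed within each country (88 to 92 for the homogeneous country A, 80 to 160 for the heterogeneous country C) and that being ‘culturally close’ to the enterprise means scoring within 5 points of 100. The uniform shape, the 88–92 band, and the 5-point tolerance are assumptions introduced here; they are not part of the example above.

```python
def share_within(lo, hi, target, tol):
    """Share of a uniform(lo, hi) population scoring within `tol` of `target`."""
    left = max(lo, target - tol)
    right = min(hi, target + tol)
    return max(0.0, right - left) / (hi - lo)


# Hypothetical score ranges (assumptions, see lead-in).
country_a = (88.0, 92.0)    # homogeneous, mean 90
country_c = (80.0, 160.0)   # heterogeneous, mean 120

# Collective approach: compare country means with the enterprise's score of 100.
mean_a = sum(country_a) / 2
mean_c = sum(country_c) / 2

# Individual approach: share of citizens who resemble the enterprise.
share_a = share_within(*country_a, target=100.0, tol=5.0)
share_c = share_within(*country_c, target=100.0, tol=5.0)

print(f"A: mean gap {abs(mean_a - 100):.0f}, share close to B {share_a:.3f}")
print(f"C: mean gap {abs(mean_c - 100):.0f}, share close to B {share_c:.3f}")
```

Under these assumptions, country A wins on the mean-based comparison, yet none of its citizens falls within the tolerance band, while one in eight citizens of country C does.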

To summarise, two separate issues are worth considering. The first refers to the specific cultural aspects to include in the analysis. Scholars who are unable to design their own survey must work with the empirical data they have at hand, frequently less than optimal. Our case is no exception, as discussed in Sect. 3.

The second issue is whether it is preferable to use the average-based (‘collective’) approach or the variability-based (‘individual’) approach. With the former, researchers look for the typical, or representative, individual of a group. With the latter, which is our preference, researchers look at the composition (or ‘mix’) of individuals within groups, and it is these internal distributions (not their averages, or any other representative value) that make two groups close, or far away, from each other.

3 Data: Foreigners, Former Foreigners, and Italians in Italy (2011–13)

Our data come from two Istat surveys (Istat is the National Institute of Statistics in Italy). The first is the multipurpose survey on the Condition and Social Integration of Foreign Citizens in Italy – CSIFCI (‘Condizione e Integrazione Sociale dei Cittadini Stranieri’, 2011–12). At that time, foreigners in Italy numbered approximately 4 million (6.8% of all residents), and comparatively little was known about them. They were in principle included in all surveys, but they always ended up being too few to be analysed separately, especially by origin (Footnote 1). Adapting to their case the rationale (and the questionnaire) of the multipurpose surveys that it routinely implemented in those years, Istat interviewed 25,326 of them on this occasion.

To compare foreigners in Italy with Italians, we used the ‘standard’ 2013 multipurpose survey (ADL-Aspects of Daily Life, or ‘Aspetti della vita quotidiana’) with 20,275 respondents.

Unfortunately, to merge the two surveys we were forced to use only the questions that were (virtually) identical in the two cases, which turned out to be relatively few (Table 1). Based on the 11 manifest variables that we could retain for the analysis, our (implicit) definition of ‘culture’ focuses on the way respondents spend part of their (free) time, alone or with others, and on related activities: surfing the internet, or reading books, newspapers, and magazines; attending cultural events (cinema, theatre, music); talking about politics. Admittedly, it is not only their cultural background that matters here; the outcome depends also on several other variables that we cannot keep under control, such as their resources, and the ‘cultural supply’ of their environment.

Table 1 Questions used for clustering taken from two Istat surveys, codes, and distribution of respondents (Italy, 2011–13)

All of these factors are also linked to the respondents’ age, which is known. As our method is not well suited to include covariates (see Sect. 4), we needed to limit the heterogeneity of the group by retaining only respondents in their adult (18–64) years. As we wanted to deal with sufficiently large national groups, we further limited our sample to nations (or groups of supposedly homogeneous nationalities) with at least 100 members, and we discarded the rest (Footnote 2).
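The two restrictions just described can be sketched as a simple filter. This is only an illustration under stated assumptions: the record layout (dictionaries with the hypothetical keys `age` and `group`) is invented here, and the sketch assumes that group size is counted after the age restriction has been applied.

```python
from collections import Counter


def restrict_sample(records, min_age=18, max_age=64, min_group_size=100):
    """Keep adult respondents, then drop national groups that are too small."""
    # Step 1: retain respondents in their adult years only.
    adults = [r for r in records if min_age <= r["age"] <= max_age]
    # Step 2: count the remaining members of each national group.
    sizes = Counter(r["group"] for r in adults)
    # Step 3: keep only groups that reach the minimum size.
    return [r for r in adults if sizes[r["group"]] >= min_group_size]


# Toy data with a lowered threshold so that the effect is visible.
toy = (
    [{"group": "X", "age": 30}] * 3
    + [{"group": "X", "age": 70}]   # dropped: outside 18-64
    + [{"group": "Y", "age": 40}]   # dropped: group too small
)
kept = restrict_sample(toy, min_group_size=2)
print(len(kept), {r["group"] for r in kept})
```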

These restrictions explain why, of the 20,275 respondents to the ADL-2013 Survey (Aspects of Daily Life), only 11,481 appear in Table 1: these we will call ‘Italians’ in the rest of this paper. Similarly, of the original 25,326 respondents of the 2011–12 ‘Foreigners’ Survey, only 15,007 survive in our analyses. Among these, there are 448 persons of foreign origin but with (later acquired) Italian nationality, and 14,559 foreigners. Overall, we have 26,488 observations (Table 2).

Table 2 Respondents in our sample, by gender and national group, Italy 2011–13

4 The DBS Method (Distance Between Strata)

As the DBS method that we will apply here has been extensively illustrated elsewhere (De Santis, Maltagliati, and Salvini, 2016; De Santis and Mucciardi, 2017; Mucciardi and De Santis, 2017), what we are offering in these pages is an alternative, graphical illustration.

Imagine that we observe 13 units (e.g., individuals) belonging to three different groups (from now on, ‘strata’): Triangles, Squares, and Circles. These strata can be all types of collective units: nations (e.g., Thailand, South Korea, and China–so that initials match), ethnic groups, soccer teams, etc. Imagine further that we classify these individuals on two manifest variables and that, given the outcome (Fig. 1), we form three clusters. Notice that our methodological contribution is not in the clustering method, but in what comes next: assuming that these clusters are meaningful, we use them to characterise the strata to which the clustered units belong. We do this by looking at how the strata-specific units distribute (proportionally) among clusters.

Fig. 1 Imaginary individuals belonging to three strata (Triangles, Squares, and Circles), classified on the basis of two manifest variables and grouped in three clusters. Source: illustrative example. Strata could be, for instance, nations (e.g., T = Thailand; S = South Korea; C = China)

The easiest way to conceptualise this is to imagine the strata as ‘planets’ in an N-dimensional space, where N + 1 is the number of clusters. The coordinates of these ‘planets’ (strata) are the observed proportions. For instance, ‘Circle’ has a third of its members in each cluster, while ‘Triangle’ has 50% of its members in cluster A, 25% in cluster B, and another 25% in cluster C. This leads to Fig. 2.
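The step from clustered individuals to stratum coordinates can be sketched as follows. The 13 (stratum, cluster) pairs below are a hypothetical assignment chosen only to reproduce the proportions quoted in the text for Triangle and Circle; the split of the 13 units across strata and the distribution of Square are assumptions, as the figure itself is not reproduced here.

```python
from collections import Counter, defaultdict

CLUSTERS = ("A", "B", "C")

# Hypothetical (stratum, cluster) labels for 13 imaginary units.
units = (
    [("Triangle", "A")] * 2 + [("Triangle", "B")] + [("Triangle", "C")]
    + [("Circle", "A")] * 2 + [("Circle", "B")] * 2 + [("Circle", "C")] * 2
    + [("Square", "B")] + [("Square", "C")] * 2   # assumed distribution
)


def stratum_profiles(units, clusters=CLUSTERS):
    """Return, for each stratum, the share of its members in each cluster."""
    counts = defaultdict(Counter)
    for stratum, cluster in units:
        counts[stratum][cluster] += 1
    profiles = {}
    for stratum, by_cluster in counts.items():
        total = sum(by_cluster.values())
        profiles[stratum] = [by_cluster[c] / total for c in clusters]
    return profiles


profiles = stratum_profiles(units)
for stratum, shares in profiles.items():
    print(stratum, [round(s, 3) for s in shares])
```

Each list of shares sums to one, which is why three clusters yield points that can be drawn exactly on a plane.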

Fig. 2 Imaginary strata (Triangle, Square, and Circle) classified on the basis of the proportional distribution of their members among the previously-formed clusters. Source: see Fig. 1

In this case, a bi-dimensional space suffices to represent our strata despite the apparent tri-dimensionality of their coordinates (N + 1 = 3), because we work with proportions, the sum of which is one, which reduces the degrees of freedom (to two, in this example). When clusters are more than three, an exact bi-dimensional representation of the strata (planets in a hyper-space with N + 1 dimensions) becomes impossible. However, acceptable approximations are generally offered by ad-hoc dimension-reducing statistical techniques, such as factor analysis or, as in our case, MDS – multidimensional scaling. This bi-dimensional plot, incidentally, is not a necessary ingredient of the DBS method, but it helps to understand its results.

The final step consists of calculating how far these ‘planets’ (our strata) are from each other (in Fig. 2), which can be done in the simplest possible way, i.e. calculating Euclidean distances, the extension of Pythagoras’s formula to an N-dimensional space.
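A minimal sketch of this final step is given below. The Triangle and Circle profiles are those quoted in the text; the Square profile is a hypothetical value, and the function name `dbs_distance` is introduced here for illustration only.

```python
import math
from itertools import combinations

# Cluster proportions (clusters A, B, C) for each stratum.
profiles = {
    "Triangle": (0.50, 0.25, 0.25),
    "Circle": (1 / 3, 1 / 3, 1 / 3),
    "Square": (0.0, 1 / 3, 2 / 3),   # hypothetical, not given in the text
}


def dbs_distance(p, q):
    """Euclidean distance between two vectors of cluster proportions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# One distance for every unordered pair of strata.
distances = {
    (s, t): dbs_distance(profiles[s], profiles[t])
    for s, t in combinations(profiles, 2)
}

for (s, t), d in distances.items():
    print(f"{s:8s} - {t:8s}: {d:.3f}")
```

With S strata, the result is a symmetric matrix with S(S − 1)/2 distinct distances, which is the object that the rest of the analysis works with.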

Note how different this approach is from using strata’s average values. In Fig. 1, for instance, one could also easily calculate the average of each stratum on both dimensions (manifest variables) and use these three barycentres as strata-representative values. In so doing, however, internal variability gets lost: one would obtain the same results if the units that belong to the same stratum had different values with the same average. As mentioned, Beugelsdijk et al. (2015) try to correct for this loss of information by allowing for internal variance, thus creating a sort of ‘acceptance zone’ around these barycentres: units that lie in that zone are reasonably similar to the nationally representative values. We submit that the proposed DBS (Distance Between Strata) method, while moving along the same lines, performs better, because it preserves individual values, and lets the strata space (of Fig. 2) reflect the exact distribution of respondents among clusters.
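The loss of information caused by averaging can be illustrated with a toy case. Everything in this sketch is hypothetical: the two strata, their values on a single manifest variable, and the threshold rule `cluster_of`, which merely stands in for a real clustering step.

```python
import math

# Two imaginary strata with the same mean but different internal variability.
stratum_x = [3, 3, 3, 3]   # perfectly homogeneous
stratum_y = [1, 1, 5, 5]   # polarised

CLUSTERS = ("low", "mid", "high")


def mean(values):
    return sum(values) / len(values)


def cluster_of(value):
    """Hypothetical stand-in for clustering: cut the scale into three groups."""
    if value <= 2:
        return "low"
    if value >= 4:
        return "high"
    return "mid"


def profile(values):
    """Share of a stratum's members falling in each cluster."""
    return [sum(cluster_of(v) == c for v in values) / len(values) for c in CLUSTERS]


# Collective approach: distance between stratum averages.
mean_gap = abs(mean(stratum_x) - mean(stratum_y))

# Individual approach: distance between cluster distributions.
dbs_gap = math.sqrt(
    sum((a - b) ** 2 for a, b in zip(profile(stratum_x), profile(stratum_y)))
)

print(f"mean-based distance: {mean_gap:.3f}")
print(f"distribution-based distance: {dbs_gap:.3f}")
```

The averages coincide, so the mean-based distance is zero, whereas the distribution-based distance is large because no member of one stratum shares a cluster with any member of the other.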

4.1 Cautions

Caution is needed in interpreting results. First, the maximum possible distance between strata is √2, regardless of the number of clusters (N + 1), and the maximum empirical distance found so far is even smaller, approximately 1.2 (Mucciardi and De Santis, 2017, Table 3). Even more importantly, results are relative, not only to the manifest variables, but also to the terms of comparison. In Figs. 1 and 2, for instance, if we added a new stratum (Diamonds), and if its members were culturally distant from all those previously observed, either the number of clusters would change, or, with the same number of clusters, a different distribution of respondents among clusters would be observed. In all cases, the coordinates of the previous ‘planets’ (or strata: Triangles, Squares, and Circles) would change and so would the distances between them. In this example, remembering that the maximum possible distance is √2, these distances would shrink, to ‘accommodate’ the heterogeneous newcomer in the new strata space. In other words, Triangles, Squares, and Circles would now appear closer to one another, and all of them far from Diamonds. This happens because the method builds a strata space (of the type shown in Fig. 2) that adapts automatically to what is analysed. For instance, Poland may appear an outlier in a map focused on the distance between France and the UK, but very close to France and the UK in a map that also includes China, and all four of them very close to one another if the chart also includes the Moon.
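The √2 bound can be verified in a few lines. The notation is introduced here for illustration only: p and q are the vectors of cluster proportions of two strata, so that each has non-negative entries summing to one.

```latex
\[
d(p,q)^2 \;=\; \sum_{k=1}^{N+1} (p_k - q_k)^2
\;=\; \sum_{k} p_k^2 \;+\; \sum_{k} q_k^2 \;-\; 2\sum_{k} p_k q_k
\;\le\; 1 + 1 - 0 \;=\; 2 ,
\]
\[
\text{because } 0 \le p_k \le 1 \text{ and } \sum_k p_k = 1
\;\Rightarrow\; \sum_k p_k^2 \le \sum_k p_k = 1,
\qquad \sum_k p_k q_k \ge 0 .
\]
```

Equality requires each stratum to be entirely concentrated in a single cluster, with the two strata in different clusters, which helps to explain why observed distances stay below the theoretical maximum.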

Table 3 Respondents by national group (stratum) and five clusters (Cl, A to E), proportions and total. Cultural distance of each stratum from Italy (2011–13)

A few more observations are in order. The first is that manifest variables can be considered all together (as we will do here) or subdivided into presumably homogeneous categories, ideally referring to the same cultural sphere, e.g., ethics, family, and religion. In the latter case, cultural distances will be evaluated by ‘area’, or ‘sphere’.

Secondly, the members of the strata must be sufficiently numerous for their distribution among clusters to be robust to random variations. Unfortunately, as the number of clusters is a priori unknown (see below), it is not easy to determine the minimum number of members of each stratum that is required for the method to work properly. In this paper, we set this minimum to 100 and we worked (primarily) with five clusters; this ensures we remain in the safety zone and, besides, our sensitivity tests (in which we progressively increased the number of clusters up to 70; see Sect. 7) confirm that our results are extremely robust.

The third observation is that, beyond being members of a stratum, respondents have other characteristics (e.g., sex, age, and education) that will affect their answers (manifest variables), interfering with the connection of interest (stratum-culture). In ‘normal’ models, one would introduce these characteristics as covariates, to keep them ‘under control’. However, this case is different, because individuals characterise their strata only indirectly, by forming clusters, and the inclusion of individual covariates becomes impossible at this stage, while the introduction of a summary measure (e.g., the mean age of the members of the cluster) would contradict the individual nature of the approach (and lead to insignificant results – not shown here).

Combining covariates (e.g., nationality and sex) and creating more homogeneous strata is a possibility, but the number of strata increases multiplicatively, and soon becomes incompatible with the requirement mentioned above, that a reasonable balance be maintained between the number of observations per (homogeneous) stratum and the number of clusters. Another possibility is to select units (respondents, in our case) that are similar in structural terms, as we did here (respondents aged 18–64 years).

Fourthly, in this paper, we treated ordinal, Likert-type scales as interval scales, using the values indicated in Table 1. We did this for three reasons. First, as discussed by De Santis, Maltagliati, and Salvini (2016), treating these answers as nominal requires long and painstaking transformations (Jaccard’s index of similarity), which exceed the computing capacity of most computers and require ad-hoc solutions. Second, this choice is likely to be close to the average respondent’s perceptions, i.e. non-distortive, and is in any case frequently adopted in situations like this (e.g., Carifio & Perla, 2007; Spitzer, Greulich, & Hammer, 2018; Wu & Leung, 2017). Finally, and perhaps most importantly, none of these variables is analysed directly: in combination with others, they are used to form clusters, and (almost) any transformation that preserves the ranking of the answers leads to similar results (Harwell & Gatti, 2001; Hennig et al., 2015; Walesiak & Dudek, 2010).

The fifth qualification is that clustering involves other arbitrary choices, among which are the questions of which clustering method to adopt and how many clusters to form. Certain methods, among them EM (De Santis and Mucciardi, 2017; Mucciardi and De Santis, 2017), have the advantage of an incorporated stopping rule (which depends itself on an arbitrary parameter predefined by the researcher), but they are not available in all statistical packages. Here, after considering several options, we eventually opted for the Ward method (minimal internal variance within clusters), because it tends to create clusters of comparable size, a non-trivial advantage for the DBS method, which looks precisely at how the members of a certain stratum distribute among clusters. As the ultimate purpose of the procedure is to construct a matrix of distances between strata (in this case, between national groups of Italians and foreigners living in Italy), we checked that different clustering methods (and different numbers of clusters within each of them) resulted in comparable distances (Sect. 7).

5 On the Heterogeneity of Foreigners (and Italians) in Italy

Several results emerge from our analysis. For the sake of brevity, in this section we will present only those that we obtained from what we believe is the best clustering method: Ward, with five clusters. Robustness checks are discussed in Sect. 7.

The single most important result is displayed in Table 3, where we present the distribution of our observations among the five clusters that we formed (A–E), and where strata (national groups) appear in alphabetical order of their label. While Table 3 contains all the information, Fig. 3 is arguably easier to interpret: using MDS (multidimensional scaling), we projected the distances of Table 3 on a bi-dimensional plane, preserving most of the initial information (the correlation between the original and the transformed distances is 0.992).

Fig. 3 Bi-dimensional representation (with MDS – multidimensional scaling) of the 861 distances between the national groups listed in Table 3. Italy 2011–13. Note: with 42 national groups, there are (42·41/2 =) 861 distances. Sources and labels: see Table 2

Let us first check whether the method passes the two tests that we had set from the beginning:

  A. Are Italians culturally closer to each other than to all other nationalities? Are the five macro-regions of residence more or less in geographical order, from North to South?

The answer to both questions is yes. Italian macro-regions form a separate and relatively homogeneous subgroup (top left of Fig. 3). Within that, the regions of the North-Centre form a sub-cluster, with strong internal homogeneity, clearly separated from the other, composed of South and Islands (Table 4). However, despite this marked cultural divide, regardless of the region of residence, Italians are closer to other Italians than they are to any other national group.

  B. Are people of foreign origin, but now with an Italian citizenship, culturally closer to Italians (with Italian origin) than foreigners are?

Table 4 Average cultural distances within and between Italian macro-regions (2011–13)

Yes. The distance between Italians and these ‘late’ Italians is 0.152 (Table 3). This number means nothing in itself. However, compared to the others of Table 3, it tells us that this distance is larger than, although comparable to, the distances among Italians of different macro-regions (Table 4), but smaller than that between Italians and any other foreign group.

We can move on to determining whether our expectations are satisfied.

  1. Do foreigners form relatively close cultural clusters when they come from the same world region (e.g., Latin America or Africa)? Are immigrants from EU countries culturally closer to Italians than immigrants coming from elsewhere?

The answer is yes in both cases. Let us start with the second question. The cultural distances between Italy and the (few) available European nationalities range between 0.226 (Germany) and 0.299 (other northern European countries), but the distance between Italy and other (non-European) nationalities is larger, 0.309 (Table 5). Closer inspection of Fig. 3 and Table 5 reveals that the Centre-North of Italy is closer to the rest of Europe than Southern Italy is; the latter is instead closer to immigrants from other origins (i.e., from less developed countries). This, too, is consistent with the available information on the degree of proximity of the central and northern part of Italy with the rest of Western Europe, not only in geographical but also in economic and commercial terms.

Table 5 Cultural distance between Italy, North-Western and Southern Italy, and selected world regions (2011–13)

As for the other countries, a graphical answer to the question of their cultural distance can be found in Fig. 3, where we encircled nationalities with a common origin. Of course, there are exceptions and overlaps: for instance, Egypt is far from other Northern African countries, while Eastern Europe (in particular Romania and Bulgaria) appears to be very close to Africa. Overall, however, expectations are satisfied: geographically homogeneous countries tend to be characterised by smaller cultural distances (see also Table 6).

  2. Is the cultural distance that we obtain positively correlated with the geographical and ‘development’ distance (the latter measured with, for instance, the HDI – Human Development Index)?

Table 6 Regression of cultural distance on geographical and development distance, selected countries (and nationalities of foreigners in Italy 2011–13)

We expected a similar background (in terms of geographic proximity, or socio-economic development of the country of origin, or both) to translate into a similar culturally-driven use of time (as measured by our empirical indicators). Table 6 shows that this expectation is only partly satisfied. For the 29 single countries listed in Table 2, we measured the corresponding distances in terms of ‘culture’ (our dependent variable), geography (distance between capital cities, in kilometres), and development (absolute value of the difference between the HDIs of each pair of countries). We did this twice: for the entire set of countries, with all the possible (29·28/2 = 406) distances, and for Italy only (28 distances between Italy and the other countries on the list). In both cases, the results are hardly conclusive: the coefficients of the independent variables have the expected sign, but they are generally not significant (exception: ‘development distance’ for the whole set of countries), and the goodness of fit is very low.
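The structure of this exercise can be illustrated with a minimal sketch. It is not the code used for Table 6: all input values below are random placeholders, the helper names (`pairwise_abs_diff`, `ols`) are hypothetical, and significance tests are omitted. The sketch only shows how the 406 country pairs (and the 28 pairs involving one reference country) arise, and how an ordinary least squares fit with an intercept could be set up.

```python
import itertools

import numpy as np


def pairwise_abs_diff(values):
    """Absolute difference of a scalar indicator for every unordered pair."""
    return np.array([abs(a - b) for a, b in itertools.combinations(values, 2)])


def ols(y, X):
    """Ordinary least squares with intercept; returns (coefficients, R^2)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    centred = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centred @ centred)
    return beta, r2


n_countries = 29
pairs = list(itertools.combinations(range(n_countries), 2))
n_pairs = len(pairs)  # 29 * 28 / 2 unordered pairs

# PLACEHOLDER inputs (random numbers, not the paper's data).
rng = np.random.default_rng(0)
hdi = rng.uniform(0.4, 0.95, n_countries)         # hypothetical HDI values
dev_dist = pairwise_abs_diff(hdi)                 # development distance
geo_dist = rng.uniform(500.0, 12000.0, n_pairs)   # hypothetical km between capitals
cult_dist = rng.uniform(0.0, np.sqrt(2), n_pairs)  # hypothetical cultural distances

# Regression on all pairs.
beta_all, r2_all = ols(cult_dist, np.column_stack([geo_dist, dev_dist]))

# Regression restricted to pairs involving the reference country (index 0).
ref_mask = np.array([0 in p for p in pairs])
beta_ref, r2_ref = ols(
    cult_dist[ref_mask],
    np.column_stack([geo_dist[ref_mask], dev_dist[ref_mask]]),
)
```

With random placeholders the fit is, of course, meaningless; the point is only the bookkeeping of pairs and the shape of the regression.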

  3. Are the Chinese, and less so also the Filipinos and the Peruvians, particularly far from the Italians, as another Istat survey suggests (Istat, 2020)?

Note, first, that the 2014–15 Istat survey (published in 2020) that we are using here as a term of comparison is focused on a different target of respondents: the second generation of foreigners (born in Italy to foreign parents, or who immigrated at a young age), attending junior and senior secondary school (11–18 years). The connection between this group of selected youngsters and the group that we analyse in the rest of the paper (foreigners aged 18–64 years) is rather loose, and relies on two assumptions: that certain cultural traits are preserved and passed on to the next generation, and that they are reflected in the 11 empirical variables that we use. With these cautions in mind, our results seem to be in line with expectations (Figure 4): the greater the cultural distance of the group, the smaller the proportion of young respondents from that group who declare that ‘they feel Italian’ (Footnote 3).

Fig. 4 Cultural distance from Italians (2011–12) and proportion of students (2nd generation immigrants) declaring that they feel Italian (2014–15). Note: the nationalities displayed in this figure are those reported in Istat (2020), Table 2.1, p. 23. Source: for the cultural distance, Istat (2016a, 2016b); for the proportion of students (of secondary schools, aged 11–18 years) declaring they feel Italian (out of a representative sample of over 42 thousand respondents), Istat (2020, p. 23)

6 Interpreting Clusters

The DBS method does not require scholars to interpret and label the resulting clusters; indeed, we reached our conclusions on the relative cultural distance of the various groups of foreigners from one another, and from Italians, without commenting on our clusters.

However, once a sufficiently reliable result has been reached, it may be worthwhile to stop and consider what the members of a given cluster have in common and why certain national groups differ from, or are instead close to, others.

Let us start with Fig. 5, where we compare four groups: two are Italian and very similar to each other (both from the southern part of the country), while the other two are immigrants, from China and, separately, from predominantly Muslim countries (Footnote 4). The two Italian subgroups are very similar: not because they are internally homogeneous, but because the five typologies that we identify (i.e., people belonging to clusters A–E) have very similar proportions in the two areas. Conversely, the Chinese appear to be culturally distant from this standard because the distribution of their members among the five clusters is markedly different. For instance, more than 30% of (southern) Italians are in cluster A, where fewer than 5% of the Chinese can be found. Approximately 12% of the Italian respondents are in cluster D, which hosts only 1% of the Chinese. Conversely, 45% of the Chinese are in cluster E, which includes only about 17% of the Italians.

Fig. 5 Unit distribution among five clusters of four selected strata: Italy-South and Italy-Islands, China, and immigrants from Muslim countries (‘Muslims’). Ward method. Note: CL = Cluster. ‘Muslims’ are immigrants from Algeria, Egypt, Kosovo, Middle East, Morocco, Pakistan, Senegal, and Tunisia

Immigrants from Muslim countries lie somewhere in between: if we considered only the members of this group belonging to cluster A (12%), we would conclude that they are closer to the Chinese. If, instead, we focused only on the proportion belonging to cluster E (21%), we would come to the opposite conclusion. In fact, their position is better assessed by looking at the entire distribution, and it transpires that immigrants from Muslim countries are approximately halfway between southern Italians and the Chinese.
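The idea of comparing entire distributions across clusters, rather than single clusters, can be sketched as follows. The sketch assumes that the distance between two strata is the Euclidean distance between their vectors of cluster shares; this is consistent with the 0 to √2 range of the metric, but it is an assumption made here for illustration and not a statement of the exact formula. The function names and the toy strata are hypothetical.

```python
import math


def cluster_shares(labels, n_clusters):
    """Share of a stratum's units in each cluster (labels coded 0..n_clusters-1)."""
    counts = [0] * n_clusters
    for lab in labels:
        counts[lab] += 1
    total = len(labels)
    return [c / total for c in counts]


def share_distance(p, q):
    """Euclidean distance between two share vectors (ASSUMED metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Toy strata with five clusters A..E coded 0..4 (hypothetical, not survey data).
stratum_1 = cluster_shares([0, 0, 1, 4], 5)
stratum_2 = cluster_shares([0, 1, 0, 4, 0, 4, 1, 0], 5)  # same proportions, larger size

# Two extreme strata: every unit in cluster A versus every unit in cluster E.
all_in_a = cluster_shares([0, 0, 0], 5)
all_in_e = cluster_shares([4, 4, 4], 5)

d_same = share_distance(stratum_1, stratum_2)   # identical distributions
d_extreme = share_distance(all_in_a, all_in_e)  # no overlap at all
```

Under this assumption, two strata with identical proportions are at distance 0 whatever their size or internal heterogeneity, while two strata concentrated in different clusters are at the maximum distance, √2.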

What characterises the members of these clusters is illustrated in Fig. 6 and Table 7.

Fig. 6 Illustration of the main characteristics of the five clusters (A to E). Ward method. Note: the darker the circle, the more that activity is practised by the members of that cluster

Table 7 Clustering variables, distribution of units by five (Ward) clusters, and relative proportions

Cluster A, for instance, is composed of respondents who frequently use personal computers, access the internet, go out (theatre, cinema, music, and sport events), read books and newspapers, and like to talk about politics. More than 30% of Italians are like this. Cluster C is just the opposite: the members of this group do very little of any of these activities, and this describes more than 30% of immigrants from Muslim countries. Cluster E groups respondents who, while active with computers and on the internet (even more than those of cluster A, actually), limit their outings to dancing, and rarely, if ever, talk about politics. About 45% of the Chinese are more or less like this. The other clusters can be characterised in a similar way, by looking at both Fig. 6 and Table 7.

7 Sensitivity Analysis

This section is devoted to a sensitivity analysis, which shows that several plausible alternative choices would not have meaningfully affected our results.

The first question that we explore concerns the best number of clusters. For each clustering method (e.g., Ward), we know the proportion of the total variance explained with N-1 clusters, and the improvement obtained in passing from N-1 to N clusters. Figure 7 shows that this improvement declines very rapidly: it is slightly higher than 3 percentage points in the passage from four to five clusters (when about 81% of the total variance is explained), but it drops to less than 1 percentage point after that. This justifies our choice of stopping at five clusters.
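A stopping rule of this kind can be sketched in a few lines. The example below is illustrative only: it uses random placeholder data (300 hypothetical respondents, 11 variables) and SciPy's Ward linkage, and it measures the explained share as one minus the ratio of within-cluster to total sum of squares, which is one common definition and is assumed here. The names `explained_share`, `shares` and `gains` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def explained_share(X, labels):
    """Share of total variance explained: 1 - within-cluster SS / total SS."""
    total_ss = ((X - X.mean(axis=0)) ** 2).sum()
    within_ss = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        within_ss += ((members - members.mean(axis=0)) ** 2).sum()
    return 1.0 - within_ss / total_ss


# PLACEHOLDER data: random numbers, not the survey variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 11))

# One Ward dendrogram, cut at 1, 2, ..., 10 clusters.
Z = linkage(X, method="ward")
shares = {}
for n in range(1, 11):
    labels = fcluster(Z, t=n, criterion="maxclust")
    shares[n] = explained_share(X, labels)

# Marginal improvement in passing from N-1 to N clusters.
gains = {n: shares[n] - shares[n - 1] for n in range(2, 11)}
```

Because the partitions are nested cuts of a single dendrogram, the explained share cannot decrease as N grows; the question is only where the marginal gain becomes negligible.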

Fig. 7 Share of variance explained, and increases in the explained share of variance, by number of clusters (Ward method). How to read the figure: with the Ward method, our preferred choice, if one uses three clusters, approximately 74% of the original variance is explained (thick line, left scale). As this share is approximately 67% with two clusters, the improvement (or marginal progress) is approximately 7 percentage points (dotted line, right scale). Source: Istat (2016a, 2016b)

Besides, the results do not change in any relevant way as the number of clusters varies. With 42 national groups (five are Italians, one is Italians with foreign origin, and 36 are foreigners) we have 42·41/2 = 861 cultural distances. These distances change almost linearly with the number of clusters and are therefore very closely correlated with those that one finds working with just five clusters (Fig. 8, left).
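The logic of this check can be sketched as follows, again on random placeholder data and under the assumption (made here for illustration) that the distance between two strata is the Euclidean distance between their cluster-share vectors. The stratum sizes, the helper `strata_distances`, and the chosen values of N are all hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def strata_distances(cluster_labels, strata, n_strata):
    """Condensed vector of distances between strata, from their cluster shares."""
    ks = np.unique(cluster_labels)
    shares = np.array(
        [
            [np.mean(cluster_labels[strata == s] == k) for k in ks]
            for s in range(n_strata)
        ]
    )
    return pdist(shares)  # Euclidean by default (ASSUMED metric)


# PLACEHOLDER data: 42 strata, 20 hypothetical respondents each, 11 variables.
rng = np.random.default_rng(1)
n_strata = 42
strata = np.repeat(np.arange(n_strata), 20)
stratum_effect = rng.normal(size=(n_strata, 11))[strata] * 0.5
X = rng.normal(size=(strata.size, 11)) + stratum_effect

Z = linkage(X, method="ward")

# Baseline: distances between strata with five clusters.
base = strata_distances(fcluster(Z, t=5, criterion="maxclust"), strata, n_strata)

# Correlation of the baseline distances with those obtained with N clusters.
corr = {}
for n in (6, 10, 20, 50):
    alt = strata_distances(fcluster(Z, t=n, criterion="maxclust"), strata, n_strata)
    corr[n] = float(np.corrcoef(base, alt)[0, 1])
```

The correlations obtained on random placeholders say nothing about the survey data; the sketch only shows that a single dendrogram can be cut at several levels and the resulting 861 distances compared with the five-cluster baseline.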

Fig. 8 Correlation of results (distances between nationalities) between the case with five clusters and the case with N clusters. Ward (left) and other methods (right). How to read the figure: with the Ward method (left panel), if one uses 6 clusters instead of 5 (standard of reference), the resulting 861 cultural distances are almost the same (or, more precisely, are an almost perfect linear transformation of the baseline case, i.e., they produce substantially the same result): the correlation is 0.996. As the number of clusters increases, the correlation of the 861 cultural distances with the reference case (5 clusters) declines, but only very slightly: with 50 clusters, for instance, it is still 0.882. The same happens in the right panel, where we show the results of the same test for different clustering criteria (Average linkage, Centroid, …). The only exception is the Average linkage, where a discontinuity emerges in the passage from 35 to 40 clusters. Source: Istat (2016a, 2016b)

This consistency (virtually the same results as the number of clusters varies) is not limited to the Ward method; it also emerges with almost all the other clustering methods that we tried: Centroid, Complete linkage, EML, Median linkage, and Single linkage (Fig. 8, right). In short, the number of clusters does not affect results in any significant way.

This permits us to compare methods on a predefined number of clusters: five for all methods and, for the Average linkage method only, also 50 (five being representative of what happens up to 35 clusters, and 50 of what happens with 40 or more; Fig. 9).

Fig. 9 Correlation of results (861 distances between nationalities) between the standard (Ward, 5 clusters) and other clustering methods (with 5 clusters, except for the Average method, also used with 50 clusters). Source: Istat (2016a, 2016b)

The correlation with our reference (Ward) method is sufficiently high overall (above 0.60), except for the Single linkage method (Footnote 5). In three cases, the correlation with Ward is extremely high (above 0.90): Complete linkage, EML, and Average linkage (this one, however, only with a large number of clusters).

In short, the conclusion of this section is that, as always happens with clustering, several alternatives are possible, not all leading to the same results. However, Ward’s clustering method seems preferable in this type of application, because it tends to generate clusters of comparable size, i.e., with approximately the same number of units. As the DBS (Distance Between Strata) method is based on the distribution of units (from different strata) among clusters, it is better to avoid an excessive concentration of units in the same cluster, if this is at all possible. In this case in particular, it transpires that similar results emerge with other robust clustering methods, which reinforces our claim that Ward’s clustering criterion should be the preferred choice when using the DBS method.

8 Conclusions

A few notes of caution are in order. The first is that emigrants are a selected subgroup: their cultural orientation may not be representative of that of their home country (see Docquier et al., 2020) or, depending on the length of their stay abroad, it may no longer be. If anything, however, emigrants should be closer to natives than their fellow citizens who have remained in the country of origin are: because they were positively selected towards emigration and towards a given destination at the start, or because they have somewhat adapted to the cultural habits of the host country, or both. In short, our results likely underestimate the cultural distances that would emerge from a comparison of the resident populations of the respective countries.

The main note of caution, however, comes from the ‘cultural’ variables that we used for our analysis. We could not choose them because we were data-constrained; we considered only the questions that were asked, with the same wording, in the two Istat surveys that we merged (Istat, 2016a, 2016b), one of which focused on immigrants in Italy. Our empirical variables are therefore few, only 11, and they refer primarily to what respondents do in their (free) time. This use, while surely culturally driven, is also influenced by several other variables that we could not keep under control, such as personal resources, availability of free time, and the endowment of the areas where respondents live. In other words, we are also measuring the socio-economic standing of our respondents, together with their cultural orientation and their constraints, and we cannot determine how relevant each of these factors is.

However, first, this is what always happens, to varying degrees, with empirical indicators. Secondly, it was impossible to do any better at this stage: future surveys will hopefully cover an increasing number of appropriate cultural dimensions. Thirdly, this weakness can also be used to defend our approach: even with the few and imperfect manifest variables that we had at our disposal, the method produced an output that passed all our preliminary checks, proved robust to alternative specifications (e.g., clustering method and number of clusters), and ‘makes sense’. The cultural distances that we find are in fact consistent with expectations, and with what alternative sources suggest (e.g., a later Istat survey, conducted in 2014–15; Istat, 2020).

Among the expectations that were fulfilled, there is the distinction between the various parts of Italy, which also emerges in our data, with the well-known geographical gradient, North to South. Further, we found that people with Italian nationality but foreign origin are relatively close to Italians tout court, definitely closer than all foreigners living in Italy, including other Europeans, such as Germans and French. Their distance from the Italian ‘standard’ (assuming its existence) is 0.155 (on a 0‒√2 scale), comparable with the distance that separates ‘the two Italies’ (North to South, equalling 0.137), which can be conveniently used as a standard of reference in this peculiar context, where the metric is conventional and otherwise impossible to appreciate.

Unfortunately, we do not have panel data (not even time series) for any of the measures presented here and we cannot be sure about the correct interpretation. We offer two, not necessarily alternative, explanations: convergence and selection. The former implies that people at different stages of their assimilation into Italian society (first foreigners, then Italians with foreign origin) are also characterised by varying degrees of proximity to the Italian ‘culture’. In this interpretation, cultural orientations (influencing the use of time, which is what we measure here) change over time. The latter, instead (i.e., selection), is a mechanism that induces people with greater affinity to the Italian culture to come, stay, and eventually acquire the Italian nationality. In this interpretation, cultural orientations do not need to change; they simply differ between individuals. The truth likely lies somewhere in between, but, as mentioned, the lack of longitudinal data prevents us from exploring the matter in greater depth.

As for foreigners, their average cultural distance from Italians is approximately 0.300, but they are far from homogeneous. We began with an expectation: that the closer they were ‘at the start’ (better: the closer their country of origin was, in terms of geographical distance and in terms of development, measured through the Human Development Index), the closer they would turn out to be in cultural terms. This expectation found only moderate empirical support, and surely needs further investigation, with better data and alternative methods.

The nationalities that a subsequent Istat survey indicates as the most difficult to integrate into Italian society (e.g., the Chinese and the Filipinos) appear to be the farthest (or at least among the farthest) in our data, despite the use of different target populations, different indicators, and a very different methodology. Based on this, and on the other pieces of evidence presented above, we submit that our ranking (in Table 3) may be used as a possible starting point to obtain at least an indication of the cultural distance of the various national subgroups of immigrants in Italy.

We suggest three main lines of future research. The first is to check whether these results are robust to alternative approaches. For instance, one could go into much greater depth by separating the two databases that we merged here, ADL for Italians and CSIFCI for foreigners. The comparison between Italians and immigrants would become impossible, but one could verify, on a much wider and more appropriate set of questions, whether the results for the two subgroups (separately: Italians by macro region of origin; foreigners by nationality) remain at least roughly the same.

The second is to apply this methodology to more recent databases, as soon as they become available, and to determine how cultural distances evolve over time, both in general and for specific national subgroups. Is there convergence towards the Italian ‘standard’ (which, incidentally, may not remain constant)? If so, how fast is it? Can this convergence be related to covariates such as length of stay, marital status, and labour market participation?

This leads us to the third line of research: how to use our results. Even without going as far as Guiso et al. (2006), who view culture exclusively as a way of predicting economic behaviour, it seems reasonable to wonder how these findings can help us understand Italian society, and the socio-economic (and demographic) behaviours of its increasingly diverse actors. An additional difficulty is that cultural proximity can be both a cause (for instance, of greater or lesser economic success) and an effect (for instance, of intermarriage). Disentangling the causal chain appears to be particularly problematic in this case.

Much remains to be done, but no progress is possible until a reliable measure of cultural distances becomes available. The one that we have proposed in these pages hopefully will pave the way for advancements in this field.