1 Introduction

Identity is the way that individuals categorize themselves and others. When people are mobilized based on their identity group, the politicized identity can have a significant sociopolitical impact and even become a source of violent conflict, as in Northern Ireland in the 1970s and Rwanda in the 1990s (Bhavnani & Backer, 2000). While identity often appears to be pre-determined and sticky, contemporary scholars tend to agree that identity is not fixed or inherent but rather constructed by social, political, and institutional contexts (Abdelal et al., 2006; Anderson, 2006). Therefore, identity is malleable and fluid. Moreover, rather than choosing one single identity, people can have multiple identities simultaneously (Medrano & Gutiérrez, 2001; Waters, 1990). However, the measurement of identities, particularly mixed identities, can be challenging (Royuela, 2020; Voicu & Ramia, 2021).

Hong Kong is an ideal place to study identity politics, particularly for mixed identities. Hong Kong was once a refugee society as many mainland Chinese fled to the city in the late 1940s. With Hong Kong’s economic take-off and the political turmoil in mainland China, new generations born in Hong Kong after the Second World War began to develop a strong Hong Kong identity (Lau & Kuan, 1988; Lui et al., 2018, 4). Since the handover of sovereignty from Britain to China in 1997, scholars have made great efforts to examine the identities of Hong Kong. In the initial years after the handover, studies suggest a rise in Chinese identity and mixed identities (Fung, 2004; Lee & Chan, 2022). Based on survey data from 1997 to 2013, Steinhardt et al. (2018) argue that the Chinese and Hong Kong identities are compatible and can co-exist in a non-zero-sum game. The correlation between Chinese and Hong Kong identities was robustly positive until 2008; however, the positive correlation no longer existed after 2008 (Steinhardt et al., 2018, 268). Following a period of bolstering Chinese national identity until 2008, the sense of Chinese identity among Hong Kong citizens started to decline.

Since the mid-2000s, Hong Kong has emerged as a hotbed of identity disputes within China (Lee, 2020, 371). The rise of localist identities and escalating antagonistic sentiments against mainland China have been seen in both social protests and legislative elections (Cheng, 2016; Ku, 2012; Ma, 2015; Wong & Chu, 2017). Since the 2010s, the contention arising from the proposed national education, the discontent over the inflow of mainland immigrants and visitors amid fast economic integration (Yang et al., 2022), and the disputes surrounding the allocation of public goods have fueled the rise of localism (Chan, 2014). Since the Umbrella Movement in 2014, a localism advocating greater autonomy and even independence has risen (Veg, 2017; Kwong, 2016; Tang & Chung, 2022; Wong & Wan, 2018; Yang & Wu, 2023; Ip, 2019). Studies suggest that the youth’s local identity is compatible with global citizenship identity (Chan & Tang, 2019), but their local identity has become increasingly incompatible with Chinese identities since the 2010s (Lee & Chan, 2022). For some citizens with localist political orientations, the Chinese identity and Hong Kong identity are incompatible and seen as a zero-sum game (Kwong, 2016; Ng, 2021). The segregation between these identities is further posited to stem from the Hong Kong identity being civic, emphasizing democratic values, while the Chinese identity is more ethnically centered (Veg, 2017; Chow et al., 2020).

The existing literature has offered valuable insights into the identity shift in Hong Kong since the handover. However, much of the current literature only adopts a unidimensional single-item indicator for national-subnational identities. Scholars have insightfully pointed out that the single-item indicator may not accurately estimate the compatibility of identities (Steinhardt et al., 2018) and the prevalence of various combinations of identities in a population (Lee, 2020, 377). Some recent studies adopt multiple indicators for identities (Steinhardt et al., 2018; Lee & Chan, 2022; Chan & Tang, 2019) but rely on correlation analysis. Although statistical correlations can show whether identities are compatible in the population or in certain groups (e.g., youth), correlation analysis alone cannot fully reveal the prevalence of different combinations of mixed identities.

This study contributes to the literature by examining the prevalence of different combinations of identities through the K-means algorithm on multi-item indicators, using the biannual survey data from the Hong Kong Public Opinion Research Institute (HKPORI, formerly the Hong Kong University Public Opinion Program) from 2016 to 2022. Moreover, this study conducts regression analyses to examine factors determining a citizen’s membership in a cluster identified by K-means clustering. The key research questions that guide this study are: What are the patterns of self-perceived identities from 2016 to 2022? What are the attributes associated with a cluster identified by the K-means algorithm?

This study hence aims to generate new insights into the identity shift in Hong Kong. In addition to its contribution to Hong Kong studies, as mixed identities are prevalent globally, this study contributes to the literature on measuring identity multiplicity, arguing that multidimensional measurement should be preferred when the research aims to examine the prevalence of different combinations of identities.

2 Identity Fluidity, Multiplicity and Its Measurement

Identity can be defined as the social category in which individuals are placed based upon one or more attributes (Egan, 2020, 700). Indeed, many important attributes that define people’s identity can be either impossible to change (e.g., place of birth, ancestry) or hard to change (e.g., gender). Some important attributes can change, but they only change slowly and in a constrained way, such as socioeconomic status (Chandra, 2006). Moreover, families, communities, and organizations, which have greater staying power than individuals, can shape individuals’ identities, which makes identity span generations and creates a sense that it has always been there.

However, identity is fluid for two reasons (Egan, 2020, 701). First, though objective group membership tends to be straightforward and fixed in many cases, subjective group identification can shift over time (Huddy, 2015). The multiplicity of identities offers a crucial condition that allows the shift of subjective group identification. For instance, individuals may develop a stronger identification with one identity category but lower their identification with another. Second, the boundaries of identity categories can be fuzzy, which leads individuals to identify with and report different identity categories at different time points. Some individuals may have important attributes sitting at those fuzzy boundaries, making their identities particularly malleable (Egan, 2020, 701).

In addition to the fluidity of identity, scholars widely agree that individuals can hold multiple identities simultaneously. Identity multiplicity is particularly salient among immigrants and ethnic minorities, who tend to identify with national, ethnic, and religious identities with different degrees of compatibility or conflict (Fleischmann & Phalet, 2016; Wiley et al., 2019). Moreover, identity multiplicity can be politically salient in regions with a history of intergroup conflicts, such as Northern Ireland and Catalonia (Crisp et al., 2001; Moreno et al., 1998). For example, although identities in Northern Ireland are conventionally divided along religious lines (Protestants and Catholics), the social categorizations are also shaped by other factors such as gender (Crisp et al., 2001).

Given the fluidity and multiplicity of identities, it can be particularly challenging to measure mixed identities. The literature has suggested several strategies to measure multiple identities accurately. First, the literature has proposed adding both measures of self-categorizations as a member of multiple groups and a member of a blended group (Wiley et al., 2019). For instance, the respondents will be asked about their identification with religious and national identities, and they can be high on both identifications (Fleischmann & Verkuyten, 2016). The respondents will also be asked whether they identify with a blended group (e.g., British Muslims). Second, researchers recommend considering multiple perspectives of identities (Wiley et al., 2019). Rather than simply studying whether one categorizes oneself into a group, other perspectives, such as how individuals positively feel about the group, can be added. Third, longitudinal research has been recommended for tracing development and change (Fleischmann et al., 2019).

While the literature has offered general advice on measuring mixed identities, the selection of measurement should depend on the research objective, particularly considering the constraints of time and resources. This study argues that when the main research objective is to examine the prevalence of different combinations of identities, multidimensional measurement is preferable to unidimensional measurement for three reasons. First, the unidimensional measurement places identities on a single continuum. However, the identities that researchers try to measure are not necessarily each other’s opposite (Arends-Tóth & Vijver, 2007). A unidimensional measurement will therefore miss several important combinations of identities (e.g., dual identity with high identification on both items, or low identification on both items). Second, because the boundaries of many identity categories are fuzzy (Egan, 2020, 701), a unidimensional categorical variable creates inherent difficulties for individuals reporting mixed identities, particularly for those with important attributes sitting at those fuzzy boundaries (e.g., individuals with parents from different races). Third, for individuals with multiple identities, their multiple identity categories may not be fully convergent or overlapping to form a single ingroup identity category as indicated in a unidimensional categorical measurement (Roccas & Brewer, 2002).

Based on the practices described in the literature on studying multiple identities (Fleischmann & Verkuyten, 2016; Ng & Verkuyten, 2013), this study adopts K-means clustering analysis to examine multiple identity indicators using longitudinal data in Hong Kong and compares these results to a conventional unidimensional categorical indicator commonly used in Hong Kong surveys. The findings confirm that when the research aims to measure the prevalence of different combinations of identities, multidimensional measurement should be preferred.

3 Measuring Citizens’ National-subnational Identities in Hong Kong

Measuring identities accurately is crucial to examining identity shifts and mixed identities. In studying national-subnational identities in Hong Kong, scholars mainly focus on examining citizens’ identification with Chinese and Hong Kong identities. The Chinese identity is usually taken as a “superordinate identity” (national identity) as it covers diverse sub-groups, whether defined in ethnic or socioeconomic terms (Jaśko & Kossowska, 2013; Lee, 2020, 372). The Hong Kong identity is often treated as a subordinate identity (Steinhardt et al., 2018, 263), although some scholars disagree, as discussed below (Fong, 2020; Ho, 2022).

The existing literature often uses a unidimensional single-item indicator to examine national-subnational identities in Hong Kong. However, in the study of nested identities, it is difficult to apply a unidimensional single-item indicator to examine the compatibility of identities and identity correlations (Levy, 2014; Gries et al., 2012). The most commonly used conventional single-item measurement invites respondents to pick an answer from categories: Hong Konger, China’s Hong Konger, Hong Kong’s Chinese, and Chinese. However, as Francis Lee and Joseph Chan point out, the meanings of “China’s Hong Konger” and “Hong Kong’s Chinese” are not immediately clear (Lee & Chan, 2005, 20), and Hong Kong and Chinese identities are treated as dichotomous assuming that identifying with Hong Kong will weaken Chinese identity (Lee & Chan, 2005). Such a unidimensional, single-item indicator may prime the contrast between Chinese identity and Hong Kong identity but cannot reveal the compatibility of identities (Lee & Chan, 2022). Moreover, this unidimensional single-item indicator is not able to capture the many combinations of mixed identities, such as the combination of equally strong Chinese and Hong Kong identities.

This study agrees that it is more appropriate to use multidimensional measurement for mixed identities rather than a unidimensional single-item indicator, as people may adopt multiple identities simultaneously (Steinhardt et al., 2018; Lee & Chan, 2022). The multidimensional measurement can better estimate and compare the prevalence of different combinations of identities in the population than a unidimensional measurement.

This study adopts four indicators to measure nested identities. The four indicators are a scale of the strength of identity as a Hong Konger from 0 to 10, a scale of the strength of identity as a Chinese citizen from 0 to 10, a scale of the strength of identity as a citizen of PRC from 0 to 10, and a scale of the strength of identity as a member of the Chinese race from 0 to 10.Footnote 1 This study holds the strength of the identity as a citizen of the PRC as an indicator of a citizen’s level of identification with China as a political community and the strength of identity as a member of the Chinese race as an indicator of a citizen’s level of identification with China as a cultural and ethnic community. Previous studies have suggested that Hong Kong citizens traditionally have a stronger affection for “cultural and economic China” while being more distanced from the Chinese government (Fung, 2004). Adding the strength of the identity as a citizen of the PRC and a member of the Chinese race in the analysis can help to reveal citizens’ Chinese identities from multiple dimensions.

It is worth noting that some scholars have suggested that the study of identities in Hong Kong should go beyond the perspective of “national identity versus local identity” and propose a new perspective of a “stateless nation” (Fong, 2020; Ho, 2022), which indeed also requires the measurement of identities not taking Chinese identity and Hong Kong identity as unidimensional in a single-item indicator; multi-item indicators are, therefore, preferred (Wong et al., 2021b, 66).

4 Data and Methods

This study uses biannual ethnic identity surveys from December 2016 to June 2022 offered by the HKPORI, publicly available on its website.Footnote 2 The sample size for each biannual survey is around 1000 (ranging from 1000 to 1034 in the selected period). The biannual survey collects data from a random sample of Hong Kong citizens through telephone surveys. The analysis starts from December 2016 because of data availability: the biannual datasets before that date do not include some important variables, such as political inclinations, whereas the datasets since December 2016 do.

The biannual surveys conducted by the HKPORI include both the four-item indicators and the conventional unidimensional single-item indicator of “Hong Konger, China’s Hong Konger, Hong Kong’s Chinese, Chinese.” Therefore, they offer a rich dataset to examine citizens’ identities through the four-item indicators and to compare the results with the single-item indicator. In contrast, other major publicly available surveys, such as the World Values Survey and Asian Barometer, only have the single-item indicator. Hence, this study uses the survey by HKPORI for analysis and comparison.

This study employs the K-means clustering method, a type of unsupervised machine learning, to discern patterns in national-subnational identities. K-means clustering produces centroid coordinates, which are the within-cluster means of each feature. Starting from an initial set of centroids, the procedure allocates each observation to the nearest centroid and then recalculates the centroid of each cluster. These two steps are iterated until there is no further change in the assignment of clusters.
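The assign-and-update loop described above can be sketched in a few lines of Python. This is a minimal illustration only, not the software used in this study: the six respondents are hypothetical, the column order (Hong Konger, Chinese citizen, PRC citizen, Chinese race) is an assumed layout, and the random initialization from sampled observations is an assumption, since the initialization scheme is not stated here.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(rows):
    """Column-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def kmeans(rows, k, seed=0, max_iter=100):
    """Plain K-means: assign to nearest centroid, recompute centroids, repeat."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k observations drawn at random.
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(r, centroids[j]))
            for r in rows
        ]
        if new_labels == labels:  # no assignment changed: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_vector(members)
    return labels, centroids

# Hypothetical 0-10 scores: (Hong Konger, Chinese citizen, PRC citizen, Chinese race)
respondents = [
    (9, 2, 1, 3), (10, 1, 0, 2), (9, 3, 2, 4),     # high Hong Kong, low Chinese
    (8, 9, 9, 10), (9, 10, 9, 9), (8, 9, 8, 10),   # high on all four items
]
labels, centroids = kmeans(respondents, k=2)
for j, c in enumerate(centroids):
    print(j, [round(v, 1) for v in c])
```

Each printed centroid is the vector of within-cluster means on the four items, which is how the centroid tables in this study should be read.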

Regarding the number of clusters, this study starts by performing multiple imputation by chained equations for each biannual dataset incorporated in this study. Following the practice for multiple imputation in clustering analysis proposed by Basagaña et al. (2013), this study then uses K-means clustering to identify clusters for all imputed samples (ten imputed samples for each biannual dataset). In theory, the four-item indicators with a 0–10 scale can yield 14,641 (11^4) combinations of identities. In all imputed samples, the Calinski-Harabasz Index reaches its highest value with two clusters and decreases as the number of clusters increases further, which suggests that two clusters offer the most distinct clustering. Of these two clusters, one has a high level of identification on all four indicators, namely the strength of identity as a Hong Konger, the strength of identity as a Chinese citizen, the strength of identity as a citizen of the PRC, and the strength of identity as a member of the Chinese race. The other cluster has a high identification as a Hong Konger but a medium–low level of identification on the other three indicators. Given the rising localism in Hong Kong since the Umbrella Movement in 2014, which suggests an exclusive Hong Kong identity, this study chooses to set the number of clusters to three to capture the rise of localism. After setting the number of clusters to three based on domain knowledge (Ahlquist & Breunig, 2012; Wagstaff et al., 2001), a cluster of predominant Hong Kong identity emerges in all imputed samples. This study hence examines the imputed samples with three clusters. In the data analysis, this study uses the first imputed samples for the December 2016, December 2019, and December 2021 surveys to illustrate the centroids of clusters and to perform the regression analyses.
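For readers unfamiliar with the Calinski-Harabasz Index, the sketch below computes it from its standard definition: the between-cluster dispersion divided by (k − 1), over the within-cluster dispersion divided by (n − k), where n is the number of observations and k the number of clusters. The one-dimensional data and the two labelings are hypothetical and serve only to show that a more distinct partition yields a higher value; this is not the code used in this study.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(rows):
    """Column-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def calinski_harabasz(rows, labels):
    """Calinski-Harabasz Index for a given partition (requires 2 <= k < n)."""
    n = len(rows)
    clusters = sorted(set(labels))
    k = len(clusters)
    overall = mean_vector(rows)
    between = 0.0  # size-weighted squared distances of centroids to the grand mean
    within = 0.0   # squared distances of observations to their own centroid
    for c in clusters:
        members = [r for r, lab in zip(rows, labels) if lab == c]
        centroid = mean_vector(members)
        between += len(members) * squared_distance(centroid, overall)
        within += sum(squared_distance(r, centroid) for r in members)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical one-dimensional scores forming two tight groups
scores = [(0,), (2,), (10,), (12,)]
good_partition = [0, 0, 1, 1]   # groups the nearby points together
poor_partition = [0, 1, 0, 1]   # mixes the two groups

ch_good = calinski_harabasz(scores, good_partition)  # (100 / 1) / (4 / 2) = 50.0
ch_poor = calinski_harabasz(scores, poor_partition)  # (4 / 1) / (100 / 2) = 0.08
print(ch_good, ch_poor)
```

In practice the index would be computed for the K-means solution at each candidate number of clusters, and the values compared across candidates.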

With the number of clusters set to three, in all imputed samples, Cluster 1 refers to the predominant Hong Kong identity, indicating citizens who identify predominantly with Hong Kong and have a low Chinese identity. Cluster 2 is named “moderate hybrid identity” and represents citizens with medium–high Hong Kong and medium Chinese identities (the level of identification with each dimension is moderate). Following the practice in the literature of defining “dual identity” as being high on both dimensions (Fleischmann & Verkuyten, 2016, 152), Cluster 3 is labeled “dual identity” and includes citizens with simultaneously high Hong Kong and Chinese identities. The clusters identified in the K-means clustering analysis are, therefore, highly different from the picture presented through the conventional single-item indicator (Hong Konger, China’s Hong Konger, Hong Kong’s Chinese, Chinese).

In contrast to the predominant Hong Kong identity, which is a prominent pattern in the datasets, the clustering analysis does not identify a predominant Chinese identity. Ascribing three levels to each identity (high, medium, and low for both Hong Kong and Chinese identities) results in nine potential combinations, one of which is the predominant Chinese identity (high Chinese identity with low Hong Kong identity). However, K-means clustering does not identify a cluster of predominant Chinese identity even when the number of clusters is set to nine. This illustrates that in Hong Kong, citizens with a high Chinese identity generally also have a strong or medium Hong Kong identity. Therefore, those citizens with a high Chinese identity tend to take the Chinese identity as a superordinate (national) identity that does not conflict with their Hong Kong identity.

5 Comparing the Multidimensional Measurement with Unidimensional Measurement

Before looking into the patterns of citizens' identity from 2016 to 2022, this study compares the conventional unidimensional single-item indicator with the multiple identity indicators. First, the categories of “Hong Konger” or “Chinese” in the single-item indicator, which are usually interpreted as denoting citizens with an exclusive identity, can actually correspond to a mixed identity for many observations according to the K-means clustering analysis of multidimensional measurement. For example, in the cross-tabulation for the December 2016 dataset (Tables 1 and 2), interviewees who identify themselves as “Chinese” in the single-item measurement mostly belong to the clusters of moderate hybrid identity and dual identity, in which they have a strong Hong Kong identity rather than exclusively identifying as Chinese. Specifically, 18% of the interviewees who identify themselves as “Chinese” in the single-item indicator belong to Cluster 2 of moderate hybrid identity with a medium–high level of Hong Kong identity (7.6 out of 10), and 78.1% of them are in the cluster of dual identity, with a high level of Hong Kong identity (8.6 out of 10). Similarly, among the citizens identifying as “Hong Kongers,” 45.45% are in the cluster of moderate hybrid identity, with a medium-level Chinese identity rather than an exclusive Hong Kong identity (6.2 out of 10). Moreover, 20.25% are in the cluster of dual identity, with a strong Chinese identity rather than an exclusive Hong Kong identity.

Table 1 Coordinates of centroids of three clusters for December 2016 survey
Table 2 Cross-tabulation of cluster label variable with the single-item indicator for December 2016 survey
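The row percentages in a cross-tabulation of this kind can be computed as sketched below. The eight respondents and their answers are entirely hypothetical and do not reproduce the survey data; the function name `row_percentages` is introduced only for this illustration.

```python
from collections import Counter

def row_percentages(single_item, cluster_labels):
    """Share of each cluster within each single-item category, in percent."""
    cell_counts = Counter(zip(single_item, cluster_labels))
    row_totals = Counter(single_item)
    return {
        (row, col): 100.0 * count / row_totals[row]
        for (row, col), count in cell_counts.items()
    }

# Hypothetical answers to the single-item indicator and assigned clusters
single_item = ["Chinese"] * 4 + ["Hong Konger"] * 4
clusters = [
    "dual", "dual", "dual", "moderate hybrid",
    "predominant Hong Kong", "moderate hybrid", "moderate hybrid", "dual",
]

table = row_percentages(single_item, clusters)
for (row, col), pct in sorted(table.items()):
    print(f"{row:12s} {col:22s} {pct:5.1f}%")
```

Reading across a row shows how respondents who chose one single-item category are distributed over the clusters, which is the comparison made in the paragraph above.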

This study suggests that researchers should not assume that the categories of “Hong Konger” or “Chinese” in the unidimensional measurement mean that those citizens identify exclusively with one or the other. This is particularly true regarding Chinese identity, as citizens with a high level of Chinese identity also tend to have a strong or medium level of Hong Kong identity. However, this study does acknowledge that citizens who pick “Hong Konger” in the single-item indicator are more likely to have a predominant Hong Kong identity. In Table 2, 34.3% of citizens who pick “Hong Konger” in the single-item indicator belong to the cluster of predominant Hong Kong identity.

The K-means clustering analysis suggests that the proportion of citizens with mixed identities in the population has been underestimated by previous studies that adopt the unidimensional single-item indicator assuming that Chinese identity and Hong Kong identity are each other’s opposite. Regarding the December 2016 dataset, the clustering analysis suggests that 83.7% of respondents have mixed identities (clusters of moderate hybrid identity and dual identity), much higher than the 47.1% suggested by the conventional single-item measurement. Regarding the December 2019 dataset (Table 3), the clustering analysis suggests that 73.3% of respondents have mixed identities (clusters of moderate hybrid identity and dual identity), much higher than the 29.9% suggested by the conventional single-item indicator. In the December 2021 dataset (Table 4), 76.3% of respondents have mixed identities (clusters of moderate hybrid identity and dual identity), higher than the 42.8% suggested by the conventional single-item indicator.

Table 3 Coordinates of centroids of three clusters for December 2019 survey
Table 4 Coordinates of centroids of three clusters for December 2021 survey

This comparison illustrates that the unidimensional measurement underestimates the proportion of citizens with mixed identities in the population, as it places Chinese identity and Hong Kong identity on a single dimension. In contrast, the multidimensional measurement takes the strength of identity as a Hong Konger, a Chinese citizen, a citizen of the PRC, and a member of the Chinese race as four separate features, which enables a more accurate measurement of the major combinations of identities and their proportion in the population.

6 Decoding the Citizens’ Identities from 2016 to 2022

This study adopts the K-means clustering of four indicators to examine citizens’ identities according to biannual datasets from December 2016 to June 2022. This study starts by examining the patterns of identities through three biannual datasets (December 2016, 2019, and 2021). Then, we analyze the trend of shifting identities from 2016 to 2022 among the general population and the youth.

This study first finds that in these three biannual datasets (December 2016, 2019, and 2021), dual identity (high Hong Kong identity combined with high Chinese identities) is always one of the three clusters (45.5% in the December 2016 dataset; 34.3% in the December 2019 dataset; 46.6% in the December 2021 dataset), which is not well captured in previous studies adopting the conventional single-item indicator. This cluster has a high level of identification as a Chinese citizen (9.3 in the December 2016 dataset, 9.5 in the December 2019 dataset, 9.3 in the December 2021 dataset), as well as a high level of identification as a citizen of the PRC and a member of the Chinese race. The strength of Hong Kong identity in the cluster of dual identity is slightly lower than the other three indicators but is still high (8.6 in the December 2016 dataset, 8.3 in the December 2019 dataset, 8.6 in the December 2021 dataset). Thus, this group simultaneously embodies high Hong Kong and Chinese identities.

Another pattern of mixed identities among the population is the moderate hybrid identity (a medium–high level of Hong Kong identity combined with a medium level of Chinese identities, 38.2% in the December 2016 dataset; 39% in the December 2019 dataset; 29.8% in the December 2021 dataset), which is also not well captured in previous studies using a single-item indicator. This cluster has a medium–high to high level of Hong Kong identity, a medium level of identity as a Chinese citizen and as a member of the Chinese race, with a relatively low level of identification as a citizen of the PRC. Among the three indicators of Chinese identities, the strength of the identity with Chinese race is mostly the highest (6.9 in the December 2016 dataset; 6.6 in the December 2019 dataset; 5.8 in the December 2021 dataset), while the strength of the identity as a citizen of the PRC is the lowest (5.2 in the December 2016 dataset; 3.7 in the December 2019 dataset; 4.6 in the December 2021 dataset). This fits with the existing literature that Hong Kong citizens tend to have a stronger ethnic Chinese identity and affection for China culturally yet are distanced from the Chinese government (Fung, 2004). When comparing this cluster among these three datasets, we find the strength of identification as a Hong Konger reached its highest in December 2019, while the strength of identification as a Chinese citizen and a PRC citizen both reached their lowest in December 2019 among the three datasets, which could reflect the impact of the 2019 anti-ELAB Movement on citizens’ identities.

A predominant Hong Kong identity (high Hong Kong identity with low Chinese identities) is identified as one of the three patterns in all three datasets, which fits the study on the rise of localism in Hong Kong (Kwong, 2016). The proportion of citizens who belong to this cluster increases from 16.3% in the December 2016 dataset to 26.7% in the December 2019 dataset. It then declines slightly and reaches 23.7% in the December 2021 dataset. This cluster has a high level of identification as a Hong Konger but a low level of identification as a Chinese citizen, a PRC citizen, and a member of the Chinese race (among the three indicators of Chinese identities, the identification with the Chinese race is the highest). When comparing the cluster of predominant Hong Kong identity in all three datasets, we find that the strength of Chinese identities declined in all three indicators from 2016 to 2019. More specifically, in the December 2016 survey, this group still has a medium level of identification with the Chinese race (5.0 out of 10). However, in the December 2019 survey, citizens in this group have a much lower level of identification as a member of the Chinese race (2.1 out of 10), and it remains at a low level in the December 2021 survey (2.2 out of 10). This suggests that citizens with localist orientations not only reject being a member of the PRC but also start to reject cultural China and no longer recognize themselves as members of the Chinese race. In contrast, the strength of their identity as Hong Kongers remains high. A strong Hong Kong identity with a declining Chinese identity indicates that citizens in this group have developed an even more exclusive Hong Kong identity in recent years.
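As a worked check, the cluster shares reported in this section can be added up and compared with the mixed-identity totals reported in Section 5 (83.7%, 73.3%, and 76.3%). The figures below are copied from the text; the dictionary layout and variable names are introduced only for this illustration.

```python
# Cluster shares (percent) as reported in this section
shares = {
    "December 2016": {"predominant": 16.3, "hybrid": 38.2, "dual": 45.5},
    "December 2019": {"predominant": 26.7, "hybrid": 39.0, "dual": 34.3},
    "December 2021": {"predominant": 23.7, "hybrid": 29.8, "dual": 46.6},
}
# Mixed-identity totals (percent) as reported in Section 5
reported_mixed = {
    "December 2016": 83.7,
    "December 2019": 73.3,
    "December 2021": 76.3,
}

computed_mixed = {}
for wave, s in shares.items():
    # Mixed identities = moderate hybrid identity + dual identity
    computed_mixed[wave] = round(s["hybrid"] + s["dual"], 1)
    total = round(sum(s.values()), 1)
    print(wave, computed_mixed[wave], reported_mixed[wave], total)
```

The sums match the reported totals for December 2016 and December 2019. For December 2021 the rounded cluster shares add up to 76.4 rather than 76.3, and the three shares total 100.1, a difference of one rounding unit.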

After examining the clusters in the three datasets, this study further looks into the trend of changing identities from 2016 to 2022 in the general population (Fig. 1). Citizens with mixed identities (moderate hybrid identity and dual identity) make up between 68% and 87% of respondents during this period. The proportion of mixed identities declined between 2017 and 2020 and has increased slightly since 2021. More specifically, the proportion of citizens with dual identity, that is, with simultaneously high Hong Kong and Chinese identities, first declined in 2018 and reached its lowest level in June 2019 (33.8%); it then rebounded and reached 46.6% in the June 2022 survey. Moderate hybrid identity (medium–high Hong Kong identity and medium Chinese identities) was as high as 42.6% in June 2019 but has declined since then, reaching 29.4% in June 2022. In contrast, the proportion of citizens in the cluster of predominant Hong Kong identity started to increase in 2018 and peaked in the December 2020 survey at 32% before declining to 21.8% in the June 2022 survey. The identity shift during 2019–2020 can probably be attributed to the Anti-ELAB Movement in 2019.

This study further examines the identities among young citizens (aged 18–29) in Fig. 2. The proportion of young people with a predominant Hong Kong identity largely increased from 2016 to 2022, while the proportion of youth with mixed identities declined substantially during this period. When comparing Figs. 1 and 2, the proportion of young citizens with the predominant Hong Kong identity is much higher than that of the general population. The proportion of young citizens holding the predominant Hong Kong identity started to increase in December 2018 and reached its peak in December 2020, with 60.4% of young respondents belonging to the cluster of predominant Hong Kong identity. It then declined slightly, but more than half of young citizens still had a predominant Hong Kong identity in the June 2022 survey (51.8%). In contrast, the proportion of young citizens with dual identity is much lower than in the general population. The proportion of young citizens with simultaneously high Hong Kong and Chinese identities was low (9.2%) in June 2019 and lowest (8.0%) in December 2020. It has increased since 2021, reaching 17.5% in June 2022. The cluster of moderate hybrid identity was as high as 55.8% in June 2017 but then declined to 30.7% in June 2022. The cluster of predominant Hong Kong identity has gradually taken over a portion of the moderate hybrid identity cluster over the years (Figs. 1 and 2).

Fig. 1 Citizens' identities from 2016 to 2022
Fig. 2 Young citizens' identities from 2016 to 2022 (aged 18–29)

7 Examining Attributes Associated with the Clusters Identified by the K-means Clustering Analysis

After using K-means clustering to identify the patterns of citizens’ identities, this study further examines the characteristics associated with the clusters identified by the K-means clustering analysis. Though the available variables in the public datasets of the HKPORI are limited, this study incorporates the following variables based on the existing literature.

First, the literature generally suggests that political values influence the identities of Hong Kong citizens, particularly citizens’ anti-authoritarian and pro-authoritarian attitudes (Veg, 2017; Chow et al., 2020). More specifically, pro-democracy and anti-authoritarian political values are found to be correlated with a stronger Hong Kong identity. In contrast, pro-authoritarian values are associated with identification with China. Therefore, this study uses citizens’ self-reported political orientations as a measure of their political values. Based on the literature, we develop the following two hypotheses:

Hypothesis 1a: citizens who have pro-democracy political orientations are more likely to have the predominant Hong Kong identity.

Hypothesis 1b: citizens who do not have pro-democracy political orientations are more likely to have dual identity.

Second, homeownership influences the political attitudes of Hong Kong citizens (Wong & Wan, 2018). Homeowners tend to support pro-establishment political parties that favor the status quo. In contrast, non-homeowners tend to support opposing parties. Hence, our study also incorporates homeownership status into the analysis. This study proposes the following hypothesis based on the literature:

Hypothesis 2: Homeowners are more likely to have a dual identity.

Third, the literature has suggested that citizens born in Hong Kong tend to have a stronger Hong Kong identity (Steinhardt et al., 2018) and have stronger pro-democracy political attitudes than immigrants (Wong et al., 2018). Therefore, we incorporate the variable of birthplace in our analysis and propose the following hypotheses:

Hypothesis 3a: citizens who are born in Hong Kong are more likely to have a predominant Hong Kong identity.

Hypothesis 3b: citizens who are not born in Hong Kong are more likely to have a dual identity.

Fourth, scholars have found that younger citizens tend to have a stronger Hong Kong identity and post-materialist orientations (Steinhardt et al., 2018; Tang & Cheng, 2021; Wong et al., 2021a). Thus, we also add age to our analysis and propose Hypothesis 4:

Hypothesis 4: younger citizens are more likely to have a predominant Hong Kong identity.

In addition to the abovementioned variables, we also include the variables of education level and self-perceived class, as the literature suggests these variables may influence citizens’ political attitudes, which may also influence their identities (Lee et al., 2017; Wong & Wan, 2018). Moreover, we also incorporate gender as it is a commonly used demographic control variable.

7.1 Data and Measures

The dependent variable is the cluster of identities among Hong Kong citizens. This study uses the K-means algorithm to divide each dataset into three clusters. As the dependent variable is nominal, this study uses multinomial regressions for the analysis.
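The clustering step can be sketched as follows. This is a minimal, standard-library illustration of K-means (Lloyd's algorithm) with three clusters, not the study's actual implementation. The toy respondents, the assumption that each respondent is described by two 0–10 ratings (Hong Kong identity, Chinese identity), and the farthest-first initialisation in `farthest_first_init` are all assumptions made for the example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest
    from the centroids chosen so far (an assumption, not the study's method)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster as is
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical (Hong Kong identity, Chinese identity) ratings on a 0-10 scale.
respondents = [
    (9, 9), (8, 9), (9, 8),    # high on both
    (7, 5), (8, 5), (7, 6),    # medium-high Hong Kong, medium Chinese
    (9, 1), (10, 2), (9, 2),   # high Hong Kong, low Chinese
]
centroids, labels = kmeans(respondents, k=3)
print(labels)
```

In this toy data the three groups of respondents end up in three separate clusters; the resulting label for each respondent then serves as the nominal dependent variable.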

Regarding the independent variables, first, for the self-reported political orientations, we use a dummy variable indicating whether the respondents supported the pan-democratic camp (citizens who support the pan-democratic camp are coded as 1, while other political orientations are coded as 0). Second, for the level of education, we include a three-level ordinal variable, with 1 as primary or below, 2 as secondary, and 3 as tertiary or above. Third, we include a dummy variable for homeownership (homeowners = 1, non-homeowners = 0). Fourth, we add the variable of self-perceived class, measured on a scale from 1 (lower class or grassroots) to 5 (upper class).

For the demographic variables, citizens’ birthplace is a dummy variable (born in Hong Kong = 1, born elsewhere = 0). We also include gender as a dummy variable (male = 1, female = 0). Finally, we include the citizens’ age group in the analyses, coded from 1 (18–29 years old) to 6 (70 years and above).
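The coding scheme described above can be sketched as a small encoding function. The raw field names (for example `political_orientation` and `born_in_hk`) and the function `encode_respondent` are hypothetical and do not reflect the actual column names in the HKPORI datasets; only the numeric codes follow the text.

```python
EDUCATION_CODES = {"primary or below": 1, "secondary": 2, "tertiary or above": 3}


def encode_respondent(raw):
    """Map one raw survey record (hypothetical field names) to model predictors."""
    if not 1 <= raw["self_perceived_class"] <= 5:
        raise ValueError("self-perceived class must be coded 1 to 5")
    if not 1 <= raw["age_group"] <= 6:
        raise ValueError("age group must be coded 1 to 6")
    return {
        # 1 = supports the pan-democratic camp, 0 = any other orientation
        "pan_democrat": 1 if raw["political_orientation"] == "pan-democratic" else 0,
        # 1 = primary or below, 2 = secondary, 3 = tertiary or above
        "education": EDUCATION_CODES[raw["education"]],
        "homeowner": 1 if raw["homeowner"] else 0,
        # 1 = lower class or grassroots ... 5 = upper class
        "self_perceived_class": raw["self_perceived_class"],
        "born_in_hk": 1 if raw["born_in_hk"] else 0,
        "male": 1 if raw["gender"] == "male" else 0,
        # 1 = 18-29 years old ... 6 = 70 years and above
        "age_group": raw["age_group"],
    }


example = encode_respondent({
    "political_orientation": "pan-democratic",
    "education": "tertiary or above",
    "homeowner": False,
    "self_perceived_class": 2,
    "born_in_hk": True,
    "gender": "female",
    "age_group": 1,
})
print(example)
```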

7.2 Empirical Results for the December 2016 Dataset

This study uses multinomial regression analyses for all three datasets (December 2016, December 2019, and December 2021) (Table 5). We use the cluster of dual identity as the reference category in the regression analyses.
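For readers less familiar with this type of model, the specification can be written out as follows, assuming the regressions take the standard multinomial logit form (the text does not state the estimator in more detail). The symbols are introduced here for illustration: $Y_i$ is the identity cluster of respondent $i$, $\mathbf{x}_i$ is the vector of predictors, and $\alpha_k$ and $\boldsymbol{\beta}_k$ are the intercept and coefficients for cluster $k$.

```latex
\[
\ln \frac{\Pr(Y_i = k \mid \mathbf{x}_i)}{\Pr(Y_i = \text{dual} \mid \mathbf{x}_i)}
  = \alpha_k + \mathbf{x}_i^{\top} \boldsymbol{\beta}_k ,
\qquad k \in \{\text{predominant Hong Kong},\ \text{moderate hybrid}\}
\]
\[
\Pr(Y_i = k \mid \mathbf{x}_i)
  = \frac{\exp\!\left(\alpha_k + \mathbf{x}_i^{\top} \boldsymbol{\beta}_k\right)}
         {1 + \sum_{m} \exp\!\left(\alpha_m + \mathbf{x}_i^{\top} \boldsymbol{\beta}_m\right)} ,
\qquad
\Pr(Y_i = \text{dual} \mid \mathbf{x}_i)
  = \frac{1}{1 + \sum_{m} \exp\!\left(\alpha_m + \mathbf{x}_i^{\top} \boldsymbol{\beta}_m\right)}
\]
```

Here the sums run over the two non-reference clusters. Under this form, a positive coefficient for a predictor in the equation for cluster $k$ means that higher values of the predictor raise the odds of belonging to cluster $k$ rather than to the dual identity cluster.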

Table 5 Multinomial regression on factors influencing citizens’ identity categories

First, regarding the cluster of predominant Hong Kong identity (Cluster 1), we find that citizens who support the pan-democratic camp are more likely to be in this cluster rather than the reference category of citizens with simultaneously high Chinese and Hong Kong identities (p < 0.001). Second, citizens born in Hong Kong are more likely to be in this cluster than the reference category (p < 0.001). Third, younger citizens are more likely to be in this cluster (p < 0.001). Finally, citizens who perceive themselves as belonging to the upper class are less likely to be in this group (p < 0.05).

Regarding the cluster of moderate hybrid identity, we find that citizens who support the pan-democratic camp are more likely to be in this group, compared to the reference category of dual identity (p < 0.001). Citizens born in Hong Kong are also more likely to be in this cluster than in the reference category (p < 0.01), as are younger citizens (p < 0.001).

7.3 Empirical Results for the December 2019 Dataset

We keep the cluster of dual identity as the reference category in the multinomial regression analyses for the December 2019 dataset. Regarding the cluster of predominant Hong Kong identity, we find that citizens who have pan-democratic political orientations are more likely to be in this cluster than in the reference category of dual identity (p < 0.001). Moreover, citizens who are born in Hong Kong are more likely to be in the cluster of predominant Hong Kong identity than in the reference category (p < 0.001). Citizens with a higher level of education are also more likely to have a predominant Hong Kong identity (p < 0.05). The analysis further suggests that homeowners are less likely to have a predominant Hong Kong identity than a dual identity (p < 0.05). We further find that citizens who perceive themselves as belonging to a lower class are more likely to be in this cluster (p < 0.05). Younger citizens are also more likely to have a predominant Hong Kong identity (p < 0.001).

In terms of the cluster of moderate hybrid identity, the analysis suggests that citizens who support the pan-democratic camp are more likely to be in this cluster compared to the reference group of dual identity (p < 0.001). Citizens born in Hong Kong are also more likely to be in this group (p < 0.01). Moreover, citizens who perceive themselves in the higher classes are less likely to belong to this group (p < 0.01). Younger citizens are more likely to be in this cluster than the reference category (p < 0.01).

The analysis of the 2019 dataset suggests that homeowners are more likely to have dual identity than predominant Hong Kong identity. The literature suggests that homeowners tend to hold more conservative political attitudes, supporting the status quo and pro-establishment parties (Wong & Wan, 2018). This study adds to the literature by examining homeowners’ identities.

7.4 Empirical Results for the December 2021 Dataset

The empirical findings for the December 2021 survey are largely consistent with those for the two earlier datasets. Again, we use dual identity as the reference category in the multinomial regression analyses. Regarding the cluster of predominant Hong Kong identity, we find that citizens who have pan-democratic political orientations, are born in Hong Kong, and are younger are more likely to be in this cluster than in the reference category of dual identity (p < 0.001). Those in the cluster of moderate hybrid identity, again, tend to be pro-democratic, born in Hong Kong, and younger than those with dual identity (p < 0.001).

In summary, in all three datasets, we find that citizens who support the pan-democratic camp, are younger, and are born in Hong Kong are less likely to have dual identity, which fits the hypotheses developed from the literature. This finding aligns with findings in the literature that birthplace, political values, and age significantly impact citizens’ identities (Steinhardt et al., 2018; Veg, 2017; Chow et al., 2020; Lee & Chan, 2022).

8 Conclusion

Measuring mixed identity is a challenging task. The existing literature on identities in Hong Kong often adopts a unidimensional single-item indicator for national-subnational identities, which cannot accurately estimate the compatibility of identities (Steinhardt et al., 2018) or the prevalence of different combinations of identities in a population (Lee, 2020, 377). Some studies adopt multiple indicators for national-subnational identities (Steinhardt et al., 2018; Lee & Chan, 2022; Chan & Tang, 2019). The statistical correlation analyses in those studies can estimate the compatibility of identities but cannot reveal the prevalence of various combinations of identities in a population. This study moves a step further by examining the prevalence of different combinations of national-subnational identities in Hong Kong from 2016 to 2022 through K-means clustering analyses of multidimensional measurement, and compares the results with the conventional unidimensional measurement that is commonly used in surveys in Hong Kong. The clustering analysis identified three clusters, namely dual identity, moderate hybrid identity, and predominant Hong Kong identity, which are different from the identity categories presented in the conventional unidimensional measurement (i.e., Hong Konger, China’s Hong Konger, Hong Kong’s Chinese, and Chinese). The finding, therefore, generates a different picture of the prevalence of major combinations of identities in Hong Kong.

The findings suggest that previous studies adopting a conventional single-item measurement underestimate the proportion of mixed identities. Two of the three clusters identified by the K-means algorithm were mixed identities, including the dual identity and moderate hybrid identity. According to the K-means clustering analysis, between December 2016 and June 2022, the percentages of citizens with mixed identities ranged from 68% to 87%, much higher than what the conventional single-item measurement suggests.

The present study illustrates the rise of the predominant Hong Kong identity between 2016 and 2022, which fits with the rise of localism widely discussed in the literature (Chong, 2022; Kwong, 2016; Veg, 2017). The proportion of citizens in the general population who belong to this cluster reached its highest point in December 2020 (32%). Meanwhile, the proportion of young people (aged 18–29) with a predominant Hong Kong identity was much higher, reaching its peak in December 2020 at 60.4%, and has remained higher than 50% since 2020. Moreover, this study illustrates that, for citizens with a predominant Hong Kong identity, their identification with the Chinese race declined from 2016 to 2022. This suggests that citizens with localist orientations are distant from China politically and increasingly distanced from China culturally.

As identity multiplicity is a global phenomenon, this study contributes to the literature on measuring mixed identity by proposing that when the research aims at examining the prevalence of different combinations of identities, then multidimensional measurement should be preferred to unidimensional measurement. However, depending on the objectives of the research, unidimensional measurement can have unique advantages in examining social identifications and psychological attachment in certain contexts (Ng & Verkuyten, 2013, 855). For instance, explicitly defining oneself as Hong Kong Chinese can have its distinctive psychological meaning and attachments, compared to separately indicating how strongly one identifies with Chinese and Hong Kong identities.

The present study further examines the attributes associated with the identity clusters. We find that citizens with dual identity are more likely to be older, non-supporters of the pan-democratic camp, and not born in Hong Kong. Homeowners and citizens who perceive themselves as belonging to the upper class are also more likely to have dual identity. In contrast, citizens who are younger, supporters of the pan-democratic camp, and born in Hong Kong, are more likely to have the moderate hybrid identity or predominant Hong Kong identity. The findings fit the hypotheses generated from previous literature.

This study examines the identity shift in Hong Kong from 2016 to 2022. We acknowledge that citizens’ understanding of the meaning of Chinese and Hong Kong identities may have changed during this period. However, according to the regression analysis, the predictors of citizens’ identities remain consistent among the three datasets analyzed in this study, which suggests that citizens’ self-rated strength of Hong Kong and Chinese identities remains comparable over the years. In the future, qualitative, interview-based studies are warranted to examine how the meanings of these identities change over time. Moreover, given the limited information contained in the HKPORI datasets, the variables added to the models are relatively limited. New surveys that include more variables and examine their impact on citizens’ identities would be highly informative.