Assessment of community vulnerability during the COVID-19 pandemic: Hong Kong as a case study

The COVID-19 pandemic continues to threaten global public health. Reliable assessment of community vulnerability is therefore essential to fighting and mitigating the pandemic. This study presents a framework that considers the roles of internal and external factors, including the components of social vulnerability, exposure, and sensitivity, to comprehensively and accurately assess community vulnerability to the pandemic. With respect to internal factors, we summarized the inherent social characteristics of people groups using census data and explored the roles of both overall and four major thematic social vulnerabilities in shaping community infection by COVID-19. We then designed two external factors to characterize exposure and sensitivity and implemented an aggregation by multiplying them with the internal social vulnerability to achieve a comprehensive vulnerability assessment. The role of the estimated vulnerability in shaping community infection was evaluated by statistical and spatial analysis as well as by risk factor classification using defined rules. This case study of Hong Kong demonstrated the value of our framework in vulnerability assessment and revealed the role of vulnerability in shaping community infection by COVID-19.


Introduction
COVID-19, a novel contagious disease that causes acute respiratory syndrome in humans, has spread rapidly (Zhou et al., 2020b). Its high transmissibility still affects most countries and has paralyzed the world, causing the World Health Organization to declare it a pandemic in March 2020 (World Health Organisation, 2020a). Various challenges have inevitably emerged, such as intense pressure on global health systems (Lal et al., 2021), social isolation and psychological trauma (Calbi et al., 2021; Sheffler, Joiner and Sachs-Ericsson, 2021), and unequal effects on vulnerable groups (Herrmann, Nielsen and Aguilar-Raab, 2021; Malard et al., 2020; Shen et al., 2021). Governments have had to maintain a high degree of vigilance and transform relatively fixed modes of operation into adaptive ones (Allain-Dupré et al., 2020). Various adaptive countermeasures have been implemented, including social distancing policies, school closures, factory closures, and even lockdowns (Hale et al., 2021). This has created a need for a reference around which to formulate adaptive countermeasures (Huang et al., 2021). Effective assessment of community vulnerability to COVID-19 is therefore of great importance in designing adaptive responses that support the WHO public health axiom of "detect, protect and treat" (Smith and Judd, 2020; World Health Organisation, 2020b).
Vulnerability is viewed as the harm proneness of people and assets if exposed to hazard events (Turner et al., 2003). Components of vulnerability can vary in different hazard contexts. With respect to COVID-19, it is thought to consist of three factors: exposure, sensitivity, and social vulnerability (Costa and Kropp, 2012). Exposure indicates the degree to which people and assets are exposed to harm (Frazier, Thompson and Dezzani, 2014;Pluchino et al., 2021), sensitivity is the extent to which system features respond to hazards (Zacharias and Gregr, 2005), and social vulnerability pertains to the capability of groups with different social, economic, demographic, and geographic characteristics to withstand risk (Cutter et al., 2003;Cutter and Finch, 2008;Kim and Bostwick, 2020). We consider the effective assessment of vulnerability as essential to assisting COVID-19 risk management. Related studies have fallen into three categories.
First, researchers have analyzed the correlation between internal factors (i.e., social vulnerability) and COVID-19 risk. They have typically concluded that groups and communities with more vulnerable social demographic characteristics suffer from higher infections and deaths. For example, impoverished people may lack medical insurance and be unable to afford medical expenses, leading to higher mortality rates (Baena-Diez et al., 2020;Bong et al., 2020). Minority groups with low social status may cluster in areas with poor public health conditions which increases their infection risk (Cordes and Castro, 2020;Franch-Pardo et al., 2020). People with poor immunity or comorbidities also have a higher mortality risk (Grasselli et al., 2020;Shi et al., 2020).
Second, research has aimed to account for external factors to provide a comprehensive assessment of vulnerability, which usually emphasizes exposure. For example, this has been accomplished by combining factors related to COVID-19 infection or mortality (such as public exposure and health system capacity) with those related to social vulnerability (Acharya and Porwal, 2020;Kiaghadi, Rifai and Liaw, 2020;Sarkar and Chouhan, 2021). The combination is typically performed through a summation aggregation operation, such as the equal weight summation or PCA (Principal Component Analysis) (Flanagan et al., 2011;Kim and Bostwick, 2020;Sarkar and Chouhan, 2021). Few studies have used multiplication for aggregation (Pluchino et al., 2021) to consider the interaction effects of internal and external factors.
The third category of research has emphasized extracting the characteristics of historical cases to improve risk management, which elucidates the role of sensitivity in vulnerability. For example, by analyzing the spatial trajectories and distribution of confirmed cases, researchers have detected places with higher infection risk, such as grocery stores, gyms, restaurants, and offices (Jiang et al., 2021; Sun et al., 2020; Zhou et al., 2020a). This category of research has expanded the view of external factors. Although the indicators used improve risk warning and management by drawing on empirical information from the current epidemic, the perspective remains incomplete because internal factors are not considered.
In summary, the first category of research has primarily used inherent social vulnerability to reveal the unequal capabilities of people or communities to cope with risk. However, because social vulnerability is an inherent factor, its applicability to specific hazards such as COVID-19 is limited. The second category of research has addressed the importance of external exposure factors by aggregating social and exposure factors. The aggregation usually takes the form of summation, which weakens the interaction effects between the internal and external factors. Few studies have recognized this and used multiplication instead. This type of research has greatly contributed to vulnerability assessment, but it ignores the historical analysis needed for realistic risk management of COVID-19, which is the focus of the sensitivity-oriented research in the third category. Based on these previous studies, we aimed to develop a framework that integrates social vulnerability, exposure, and sensitivity to comprehensively assess community vulnerability to COVID-19 for effective risk management.
The framework consists of three parts. The first part selects and calculates indicators related to social vulnerability, exposure, and sensitivity. The second part uses these indicators to comprehensively assess vulnerability. The third part evaluates the effectiveness of the framework and explores the role of vulnerability in shaping the community infection risk of COVID-19. By applying this framework to the case of Hong Kong, our study makes three contributions: (1) theoretically, it demonstrates the value of jointly considering internal social vulnerability, external exposure, and realistic sensitivity for a robust and comprehensive assessment of vulnerability, (2) methodologically, it proposes a feasible framework and reveals the aggregation mechanism of these three components of vulnerability assessment, and (3) in terms of social meaning, it identifies vulnerable areas and explores the role of vulnerability in community infection by COVID-19 to inform risk management suggestions.

Study area and data description
Hong Kong is a city and special administrative region of China with over 7.5 million residents living in a territory of 1,104 km². The territory contains 3 regions subdivided into 18 districts for administrative management. It also contains 9 new towns developed to cope with population growth, as shown in Fig. 1(1). We chose Hong Kong as a research case for three reasons. First, Hong Kong has serious wealth disparity, with a Gini coefficient of 0.539, far exceeding the 0.4 inequality line (Central Intelligence Agency, 2016). This suggests that Hong Kong has the disadvantaged groups or areas necessary to study social vulnerability. Second, Hong Kong is the fourth most densely populated area in the world, with packed housing and mobility conditions, which greatly increases the likelihood of risk exposure and facilitates the study of the impact of external factors in the context of COVID-19. Third, effective but costly countermeasures such as complete lockdowns have created a dilemma for Hong Kong due to its position as a financial center and commercial port. Despite experiencing four waves of COVID-19 over the past two years, the city has never completely closed. These factors illustrate the importance of making an accessible and reliable assessment of community vulnerability to assist COVID-19 risk management in Hong Kong.
This study used the following four datasets.
(1) To investigate the spatial variance in assessed community vulnerability, the 291 Tertiary Planning Units (TPUs) set by the Planning Department were used as the basic spatial unit for the community (Fig. 1(2)). TPUs cover the whole territory of Hong Kong and are commonly used to support fine-grained urban studies. These data are publicly available from the website DATA.GOV.HK (Boundaries of Tertiary Planning Units & Street Blocks / Village Clusters | DATA.GOV.HK).
(2) To fight the COVID-19 pandemic, the Department of Health created a platform (COVID-19 Thematic Website - Together, We Fight the Virus - Home (coronavirus.gov.hk)) to give daily updates on infections. We compiled the 11,971 confirmed cases released from January 23, 2020, to July 22, 2021 (Table 1). After eliminating records with missing locations, our final dataset contained 10,089 cases, each with valid location information. This figure is the total over all daily releases in the data collection period.
(3) Calculating social vulnerability requires sociodemographic characteristics to represent internal factors. We thus collected social, economic, demographic, and housing data from the latest census in 2016 provided by the Census and Statistics Department (2016 Population By-census (bycensus2016.gov.hk)).
(4) Public transportation data of Hong Kong. Hong Kong has a highly developed public transportation system, which serves as a popular means of daily travel. However, in the context of COVID-19, public transport in areas with a high population density may put people at high risk of exposure. Our study therefore considered public transport data including MTR, Franchised Buses, and Minibuses.
Hong Kong's extensive railway network, the MTR, forms the backbone of the public transport system. The highly developed public bus system complements the railway network and runs in places that the railway cannot reach. Franchised Buses, the main component of the bus system, have passenger capacity comparable to that of the railway. Minibuses offer a feeder service and serve areas with relatively low passenger demand or where the use of high-capacity transport modes is not suitable. Minibuses comprise Green Minibuses (GMBs) and Red Minibuses (RMBs). The latter were excluded from the study because they do not operate fixed service routes.
We downloaded the latest geodata of Franchised Bus stations and GMB stations from the Hong Kong Geodata Store website (Hong Kong GeoData Store), and all MTR stations were geocoded from Google Maps. Table 2 shows the public transportation information and statistics of the number of vehicles and capacity per day, published in Public Transport Strategy Study 2017 (Transport and Housing Bureau, 2017).

Method
To assess community vulnerability to COVID-19 for risk management, we developed a framework that included four main modules: indicator selection and calculation, social vulnerability estimation, vulnerability assessment, and influence evaluation of vulnerability on infection (Fig. 2).
(1) Indicator selection and calculation. Our study aimed to construct a comprehensive assessment of community vulnerability to COVID-19 with components of internal social vulnerability, external exposure, and realistic sensitivity.
Based on available data and previous studies (CDC/ATSDR, 2018; Song et al., 2020;Tiwari et al., 2021), we first selected 11 sociodemographic indicators from the census data for use in the social vulnerability assessment. These indicators were further divided into four themes for theme-related vulnerability assessment, including socioeconomic status, household composition, minority status and language, and housing conditions (CDC/ATSDR, 2018). Details and calculations of these indicators are shown in Table 3.
Given that COVID-19 is an infectious respiratory disease that spreads easily in high-exposure areas and that Hong Kong has packed housing and mobility conditions, we considered that not only a large population size, as adopted in traditional studies (Pluchino et al., 2021), but also crowding in public transport may put people at high risk. Public transport load was calculated as an indicator by dividing population size by the number of public transport stations, as shown in Equation (1). The percentile rank of the load was then calculated to eliminate the influence of magnitude differences. Considering the differences in capacity among the transport types described in Table 2, weights were further assigned according to per-day capacity, as shown in Equation (2).
$$TL_{it} = \frac{P_i}{T_{it}} \qquad (1)$$
where $TL_{it}$ represents the population load of type $t$ public transport in TPU $i$, $P_i$ signifies the population in TPU $i$, and $T_{it}$ is the number of stations of type $t$ public transport in TPU $i$.
where $TL^{Pr}_{it}$ represents the weighted percentile rank of the load of type $t$ public transport in TPU $i$, $r_{TL_{it}}$ signifies the rank of the load of type $t$ public transport in TPU $i$, $N_t$ is the total number of TPUs containing type $t$ public transport stations, and $C_t$ signifies the per-day capacity of type $t$ public transport.
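The load and its capacity-weighted percentile rank can be sketched as follows. This is a minimal illustration rather than the study's implementation: the percentile rank is assumed to be (rank - 1) / (N_t - 1), the weight of each transport type is assumed to be its share of total per-day capacity, and the TPU names, station counts, and capacities are made up.

```python
def percentile_rank(values):
    """Share of other values strictly below each value, i.e. (rank - 1) / (N - 1)."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    return [sum(other < v for other in values) / (n - 1) for v in values]


def weighted_transport_load(population, stations_by_type, capacity):
    """Return {(tpu, type): capacity-weighted percentile rank of the transport load}."""
    total_capacity = sum(capacity.values())
    result = {}
    for t, stations in stations_by_type.items():
        # Equation (1): load = population / stations, only for TPUs with type-t stations.
        load = {tpu: population[tpu] / n for tpu, n in stations.items() if n > 0}
        tpus = list(load)
        ranks = percentile_rank([load[tpu] for tpu in tpus])
        for tpu, pr in zip(tpus, ranks):
            # Assumed weighting: capacity share of type t among all types.
            result[(tpu, t)] = pr * capacity[t] / total_capacity
    return result


# Made-up numbers for three TPUs and two transport types.
population = {"A": 1000, "B": 4000, "C": 2000}
stations_by_type = {
    "MTR": {"A": 1, "B": 2, "C": 0},
    "Bus": {"A": 10, "B": 10, "C": 5},
}
capacity = {"MTR": 3.0, "Bus": 1.0}
loads = weighted_transport_load(population, stations_by_type, capacity)
```

TPUs without any station of a given type are left out of that type's ranking, which matches the definition of $N_t$ as the number of TPUs containing type $t$ stations.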
To realistically assess community sensitivity to COVID-19, we considered the age distribution of confirmed cases and assumed that the more sensitive an area is to COVID-19, the more consistent the age distribution of confirmed cases is with that of the community population. The basis for this assumption is that in the early stages of the spread, there were more infections between people of similar age, given the high likelihood of interactions. Then, as the pandemic spread and evolved, this age bias broke down, and the distribution approached the age distribution of the community population. This assumption is explored more thoroughly in the discussion section. Here, we focus on the measurement of sensitivity. The main idea is to measure the variation in confirmed cases by age based on the age grouping criteria in the census data. We first calculated the ratio of confirmed cases in each age group to the total number of people in that age group. Then, the uniformity of the ratios in each TPU was measured to reflect the realistic sensitivity to COVID-19. In high-sensitivity areas, there was generally high uniformity that indicated broader exposure among age groups, while the opposite was true of low-sensitivity areas. These calculations were accomplished using Equations (3)-(5).
$$m_i = \frac{1}{n}\sum_{a=1}^{n}\frac{c_{ia}}{P_{ia}} \qquad (3)$$
where $m_i$ represents the mean of the ratios in TPU $i$, $P_{ia}$ signifies the population size of age group $a$ in TPU $i$, $c_{ia}$ indicates the number of confirmed cases of age group $a$ in TPU $i$, and $n$ is the number of age groups, which was 5 in our data.
where $std_i$ represents the standard deviation of the ratios in TPU $i$ and the other variables are as previously stated.
where $AU_i$ represents the uniformity of confirmed cases by age distribution in TPU $i$ and the other variables are as previously stated.
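A small sketch of the uniformity measure is given below. The mean of the ratios follows the definition of $m_i$; the population form of the standard deviation and the mapping from spread to uniformity, 1 / (1 + std / mean), are assumptions chosen only so that identical ratios give a uniformity of 1. They may differ from the form used in Equations (4) and (5).

```python
from math import sqrt


def age_uniformity(cases, population):
    """Uniformity of case-to-population ratios across age groups (illustrative form)."""
    ratios = [c / p for c, p in zip(cases, population) if p > 0]
    n = len(ratios)
    if n == 0:
        return 0.0
    mean = sum(ratios) / n                      # m_i: mean of the ratios
    if mean == 0:
        return 0.0                              # no confirmed cases in this TPU
    # Assumed population standard deviation of the ratios.
    std = sqrt(sum((r - mean) ** 2 for r in ratios) / n)
    # Assumed mapping: 1 when all ratios are equal, smaller as they diverge.
    return 1.0 / (1.0 + std / mean)


# Made-up counts for five age groups of equal size.
even_tpu = age_uniformity([2, 2, 2, 2, 2], [100] * 5)       # cases spread over all ages
skewed_tpu = age_uniformity([10, 0, 0, 0, 0], [100] * 5)    # cases in one age group only
```

Under these assumptions the evenly affected TPU scores close to 1, while the TPU whose cases sit in a single age group scores about one third.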
(2) Social vulnerability assessment. Social vulnerability is commonly assessed by constructing an index. Because indicators are measured on different scales (e.g., "income is 10000" versus "house size is 50"), researchers have used ranking-based aggregation. The ranking operation removes the problem of dimensionality and focuses on the differences among research units. However, such aggregations usually treat all indicators as having equal weights (CDC/ATSDR, 2018; Kiaghadi et al., 2020). To address the varied importance of indicators, we assigned weights to them before aggregation using principal component analysis (PCA). Constructing the social vulnerability index then involved three steps: percentile rank calculation, weight calculation by PCA, and social vulnerability value calculation.
1) Percentile rank calculation. Percentile rank was calculated for all of the selected indicators described in Table 3 according to Equation (6).
where $Pr_{ij}$ is the percentile rank of indicator $j$ in TPU $i$, $r_{ij}$ is the rank of indicator $j$ in TPU $i$, and $N_j$ is the total number of TPUs with indicator $j$. For indicators with a negative influence, the rank value was reversed.
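The ranking step, including the reversal for indicators with a negative influence, might look like the sketch below. The formula (rank - 1) / (N - 1) is an assumption about the exact form of Equation (6), and the `higher_is_worse` flag and the sample values are hypothetical.

```python
def percentile_rank(values, higher_is_worse=True):
    """Percentile rank in [0, 1]; 1 marks the most vulnerable TPU for this indicator."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    if not higher_is_worse:
        # Reverse the ranking for indicators where a high value means low vulnerability.
        values = [-v for v in values]
    # (rank - 1) equals the number of TPUs with a strictly smaller value.
    return [sum(other < v for other in values) / (n - 1) for v in values]


# Made-up indicator values for three TPUs.
crowding = percentile_rank([10, 30, 20])                       # high value = more vulnerable
income = percentile_rank([10, 30, 20], higher_is_worse=False)  # high value = less vulnerable
```

Ranking removes the unit of each indicator, so an income in dollars and a floor area in square metres become comparable values between 0 and 1.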
2) Defining weight by PCA. We performed a PCA on the selected indicators in Table 3 to define their weights. PCA is a common linear dimensionality reduction method that projects the original data into a lower-dimensional space (Maćkiewicz and Ratajczak, 1993). In essence, the projection finds linear combinations of the original variables to obtain the reduced-dimensional data. The coefficients of each linear combination reflect how strongly the original variables correlate with and contribute to the projected data. The overall importance of each indicator in this projection is measured by the overall rating of its coefficients, which we regarded as the corresponding weight. Let $X = [x_{ij}]_{p \times s}$ represent the centered original data (i.e., $X$ is normalized by the mean value), where $s$ is the total number of indicators and $p$ is the number of TPUs. Matrix $PC = [pc_{ik}]_{p \times m}$ contains the $m$ components derived by PCA. $L = [l_{jk}]_{s \times m}$ signifies the principal axes, which are parallel to the eigenvectors and represent the loadings of the original variables on the corresponding principal components. $E = [e_k]_{1 \times m}$ and $Ep = [ep_k]_{m \times 1}$ are, respectively, the value and percentage of variance explained by each component in the PCA. From these, we calculated the weights of the indicators using Equations (7), (8) and (9).
where $coef'$ is a matrix representing the score coefficient of each original variable on the PCA components, $s$ is the total number of indicators, $m$ signifies the number of components, $l_{jk}$ is the loading of the original indicator $j$ on the projected component $k$, and $e_k$ is the variance explained by component $k$.
where $coef$ is the comprehensive score coefficient that represents the overall contribution of each variable to the PCA components, $ep_k$ is the percentage of variance explained by component $k$, and the other variables are as previously stated.
where $weight_j$ is the normalized weight of indicator $j$ in the result of the PCA and the other variables are as previously stated.
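One way to turn PCA output into indicator weights is sketched below with NumPy. The three steps are assumptions about the form of Equations (7)-(9): the score coefficient is taken as |l_jk| / sqrt(e_k), the comprehensive coefficient as the average of the score coefficients weighted by explained-variance percentage, and the final weight as that coefficient normalized to sum to 1. Absolute values are used because the sign of an eigenvector is arbitrary. The function name and the random data are hypothetical.

```python
import numpy as np


def pca_indicator_weights(data, n_components):
    """Derive one non-negative weight per indicator from the leading principal components."""
    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=0)                         # center each indicator
    cov = np.cov(x, rowvar=False)                  # s x s covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    e = eigvals[order]                             # explained variance e_k (must be > 0)
    axes = eigvecs[:, order]                       # principal axes l_jk, shape s x m
    ep = e / eigvals.sum()                         # explained variance percentage ep_k

    coef_prime = np.abs(axes) / np.sqrt(e)         # assumed form of Equation (7)
    coef = coef_prime @ ep / ep.sum()              # assumed form of Equation (8)
    return coef / coef.sum()                       # assumed form of Equation (9)


# Random stand-in for the percentile-ranked indicators of 50 TPUs and 4 indicators.
rng = np.random.default_rng(0)
weights = pca_indicator_weights(rng.random((50, 4)), n_components=2)
```

The returned vector has one entry per indicator and sums to 1, so it can be applied directly to percentile-ranked indicator values.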
The Kaiser-Meyer-Olkin (KMO) measure (Cureton and D'Agostino, 1993) and Bartlett's Test of Sphericity (Bartlett, 1951) were applied to the indicators before the PCA to verify that the variables were sufficiently correlated. The KMO measure compares the strength of the correlations between variables with that of their partial correlations, which determines how well suited the data are for factor analysis. A KMO value close to 1.0 indicates strong suitability for factor analysis, and a value below 0.5 is unacceptable. Bartlett's Test of Sphericity tests the null hypothesis that the selected variables are uncorrelated and therefore not suitable for factor analysis. A significance value of less than 0.05 rejects this hypothesis and supports the inclusion of the tested variables. Typically, a KMO value greater than 0.7 and a significance value of Bartlett's Test of Sphericity less than 0.05 are regarded as the standards for PCA.
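Both checks can be computed from the correlation matrix, as in the sketch below. It uses the standard definitions (KMO as the ratio of squared correlations to squared correlations plus squared partial correlations, and Bartlett's chi-square statistic with p(p - 1) / 2 degrees of freedom); the function name and the synthetic data are hypothetical, and the p-value is left out to avoid a dependency beyond NumPy.

```python
import numpy as np


def kmo_and_bartlett(data):
    """Return (KMO value, Bartlett chi-square statistic, degrees of freedom)."""
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    corr = np.corrcoef(x, rowvar=False)

    # Partial correlations from the inverse correlation matrix.
    inv = np.linalg.inv(corr)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)

    off_diag = ~np.eye(p, dtype=bool)
    r2 = (corr[off_diag] ** 2).sum()
    a2 = (partial[off_diag] ** 2).sum()
    kmo = r2 / (r2 + a2)

    # Bartlett's statistic: -(n - 1 - (2p + 5) / 6) * ln|R|.
    chi_square = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(corr))
    dof = p * (p - 1) // 2
    return kmo, chi_square, dof


# Synthetic indicators that share one common factor, so they are strongly correlated.
rng = np.random.default_rng(0)
factor = rng.normal(size=(200, 1))
sample = factor + 0.5 * rng.normal(size=(200, 4))
kmo, chi_square, dof = kmo_and_bartlett(sample)
```

The p-value of Bartlett's test is the upper tail of the chi-square distribution at `chi_square` with `dof` degrees of freedom, which can be obtained, for example, with `scipy.stats.chi2.sf(chi_square, dof)`.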
3) Social vulnerability value calculation. After calculating the percentile rank and PCA weight, we aggregated the indicators of different social themes to obtain the thematic social vulnerability of each TPU i according to Equation (10). We then aggregated the thematic social vulnerability values to obtain the overall social vulnerability value, as in Equation (11).
where $z$ represents the four themes in Table 3, $SV_{iz}$ signifies the social vulnerability value of TPU $i$ estimated based on the indicators of theme $z$, and $weight_{zj}$ is the weight of indicator $j$ of theme $z$ derived by the PCA.
where $SV_i$ is the overall social vulnerability value of TPU $i$ and the other variables are as previously stated.
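The two aggregation steps can be illustrated for a single TPU as below. The sketch assumes that the thematic value is the weighted sum of the percentile ranks of the indicators in a theme and that the overall value is the sum of the thematic values; the indicator names, ranks, and weights are invented for illustration and are not those of Tables 3 and 5.

```python
def social_vulnerability(percentile_ranks, weights, themes):
    """Return (thematic values by theme, overall value) for one TPU."""
    thematic = {
        theme: sum(weights[j] * percentile_ranks[j] for j in indicators)
        for theme, indicators in themes.items()
    }
    overall = sum(thematic.values())
    return thematic, overall


# Hypothetical indicators grouped under the four themes named in the text.
themes = {
    "socioeconomic status": ["income"],
    "household composition": ["elderly"],
    "minority status and language": ["language"],
    "housing conditions": ["rent"],
}
ranks = {"income": 0.8, "elderly": 0.5, "language": 0.2, "rent": 0.6}
weights = {"income": 0.4, "elderly": 0.3, "language": 0.1, "rent": 0.2}
thematic, overall = social_vulnerability(ranks, weights, themes)
```

Because the weights sum to 1 and the ranks lie in [0, 1], the overall value of this sketch also stays in [0, 1].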
(3) Vulnerability assessment. The comprehensive calculation of vulnerability should aggregate all the above indicators. As discussed earlier, summation and multiplication are the two main methods of aggregation (Flanagan et al., 2011; Kim and Bostwick, 2020; Pluchino et al., 2021; Sarkar and Chouhan, 2021). In Equation (12), without loss of generality, the external exposure and realistic sensitivity components were combined with social vulnerability by a multiplication operation.
where $BV_i$ is the vulnerability value of TPU $i$, $TL^{Pr}_{it}$ represents the weighted percentile rank of the load of type $t$ public transport in TPU $i$ as shown in Equation (2), $AU_i$ represents the uniformity of confirmed cases by age distribution in TPU $i$ as shown in Equation (5), and $SV_i$ is the overall social vulnerability value of TPU $i$ as shown in Equation (11).
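The difference between the two aggregation operations can be seen in a toy comparison. The sketch assumes that exposure has already been reduced to one value per TPU (for instance by summing the weighted ranks over transport types), which is an assumption rather than the stated form of Equation (12); the component values are made up.

```python
def aggregate(sv, exposure, sensitivity, how="product"):
    """Combine the three components by multiplication or by summation."""
    if how == "product":
        return sv * exposure * sensitivity
    if how == "sum":
        return sv + exposure + sensitivity
    raise ValueError(f"unknown aggregation: {how}")


# TPU A is socially vulnerable and sensitive but barely exposed; TPU B is moderate on all three.
tpu_a = (0.9, 0.1, 0.9)
tpu_b = (0.6, 0.6, 0.6)

sum_a, sum_b = aggregate(*tpu_a, how="sum"), aggregate(*tpu_b, how="sum")
product_a, product_b = aggregate(*tpu_a), aggregate(*tpu_b)
```

Summation ranks TPU A above TPU B, whereas multiplication ranks it below, because a low value on any single component pulls the product down. This is the interaction effect that summation weakens.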
(4) Influence evaluation of vulnerability on infection.
To evaluate the role of vulnerability in shaping community infection, our framework analyzed both the correlations between vulnerabilities and the current COVID-19 situation, using Spearman's rank correlation test (Spearman, 1961), and their spatial distribution patterns to provide insight into risk management. The number of confirmed cases was used as the indicator of the COVID-19 situation. The correlation analysis aimed to infer which types of factors had a greater impact on risk, while the spatial analysis explored the spatial variance in the different vulnerability factors. We then defined rules based on the indicators to classify vulnerable areas. Let $R_i = (r_1, r_2, \cdots, r_f)$ denote the classified risk type of TPU $i$, where $r_f$ is the flag of TPU $i$ on rule $f$. For example, if the uniformity of age distribution is an identified rule represented by $f$, then $r_f = 1$ means that TPU $i$ is vulnerable with respect to the sensitivity factor, and $r_f = 0$ otherwise. With this approach and spatial overlay, every vulnerable TPU returned a sequence of rule flags to support the classification of risk types relevant to risk management.
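Spearman's coefficient is the Pearson correlation of the ranks of two variables, which the following dependency-free sketch computes. Ties receive the average rank; the vulnerability values and case counts are made up.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, smallest value first; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation; undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Made-up vulnerability values and confirmed case counts for four TPUs.
vulnerability = [0.1, 0.4, 0.2, 0.9]
cases = [3, 40, 7, 500]
rho = spearman(vulnerability, cases)
```

Because only ranks enter the calculation, the coefficient is insensitive to the very different magnitudes of the vulnerability values and the case counts.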

Results
Based on the modules in our framework, we analyzed our results in four parts. The first two parts focused on correlation analysis. Specifically, the first part analyzed the correlation between the COVID-19 situation and social vulnerability, while the second part analyzed its correlation with vulnerability with consideration of external factors. In the third part, we analyzed the spatial distributions of the results related to the first two parts. The fourth part focused on the classification and investigation of areas of high infection risk to formulate suggestions for risk management.
(1) Thematic and overall social vulnerability. By applying the previously described method to the selected indicators, we assessed social vulnerability in Hong Kong. The PCA-related statistics are shown in Table 4. The p-value of Bartlett's Test of Sphericity was less than 0.05, and the KMO value was 0.85 (greater than 0.7), demonstrating that PCA was applicable to our selected indicators. To retain enough variability, we selected the first two principal components, each of which had an explained variance greater than 1 (Girden, 1996). Together, the selected components explained 72.5% of the variance, which is acceptable. The indicator weights obtained by the PCA are shown in Table 5. We calculated the thematic and overall values of social vulnerability in Hong Kong by applying this set of weights to the percentile rank values of the indicators using Equations (10)-(11). Table 6 shows their correlations with the COVID-19 situation.
Both the overall and thematic values of social vulnerability had moderate correlations with the COVID-19 situation (Table 6), and the different types of social vulnerability differed in how prominent their correlations were, even if they were not far apart. Areas with vulnerable socioeconomic status and vulnerable minority status and language did not have a prominently higher correlation with the number of confirmed cases but had a higher risk of mortality once diagnosed. Conversely, areas with vulnerable household composition and vulnerable housing conditions showed the opposite pattern. Vulnerable minority or socioeconomic status is less of a disadvantage in avoiding infection than in being treated for it, owing to incomplete medical protection and limited savings. Household composition and housing conditions are, to a certain extent, related to the external environment, making them more relevant to the differences in the number of confirmed cases.
The above findings preliminarily showed that factors linked to the external environment were more closely related to infection, although the difference was not yet prominent. We explore this phenomenon in the following section.
(2) Vulnerability with consideration of external factors. Social vulnerability was combined with external exposure and realistic sensitivity to COVID-19 using Equation (12) to achieve a comprehensive assessment. The correlations of the assessed vulnerabilities with the COVID-19 situation are shown in Fig. 3. Given that the external factors had a greater influence on the number of confirmed cases than on fatalities, Fig. 3 only shows the correlation values for the number of confirmed cases.
These results show that external factors certainly affected COVID-19 infection. By comparing elements vertically, the correlation values incrementally increased when considering the external factors. The correlation value between overall vulnerability and the number of confirmed cases increased from 0.48 to 0.68 or 0.74, and then to 0.79, revealing that the consideration of external factors is effective in comprehensively revealing COVID-19 infection risk.
External factors also had a stronger relationship to confirmed cases than internal factors. Comparing different elements horizontally showed that the correlation values related to overall vulnerability were lower than those related to housing condition and exposure, the latter of which was the highest. The value located at (BV (ep), exposure) was 0.72, while that at (BV (ep), overall) was 0.68. This result suggests that considering external exposure alone yielded a more effective risk assessment than when internal factors were included. The value at (BV (ep&rs), exposure), which represents the correlation between the combination of the two external factors (i.e., exposure and sensitivity) and the number of confirmed cases, was 0.85, exceeding that of (BV (ep&rs), overall), which also considered internal factors. This result also indicates the greater power of external factors compared to internal factors.
In addition, the results demonstrated the principle of multiplication among the components of social vulnerability, external exposure, and realistic sensitivity in vulnerability. Fig. 3 shows the result of the summation in the "overall+" column, which was inferior to that of the "overall" column assessed by multiplication, thus verifying the effectiveness of the principle of multiplication.
(3) Spatial distribution analysis of vulnerabilities. The spatial distribution elucidated the features of vulnerable areas and showed which combinations of components determined the COVID-19 situation in these areas (Figs. 4-7), revealing several findings.
First, most results effectively identified areas with a high spatial agglomeration of confirmed cases. However, the factors leading to this high agglomeration differed. For example, the high-level vulnerable areas in the south of Kowloon region occurred in districts with both a high population size and poor housing conditions, such as the Wong Tai Sin and Kwun Tong districts, suggesting that these factors were the two main causes of infection in these areas. As one of the new town centers, Sha Tin district is well-developed, but high mobility and crowded housing conditions elevated the risk of infection. The wealthy Eastern district in the northeast of Hong Kong Island also contained high-risk and sub-high-risk areas. Its large wealth gap and high population density placed vulnerable groups in this district at high risk, which also increased the risk of infection in the surrounding wealthy areas.
Second, the results with external indicators considered produced more credible and robust assessment compared to the thematic social vulnerability, which were highly varied. For example, areas in the suburbs could have high-vulnerability socioeconomic status but medium-level housing conditions or even low-vulnerability household compositions. Downtown areas were less likely to have vulnerable social status, but high human mobility greatly increased risk, which was ignored by social vulnerability. The assessment of vulnerability with consideration of external indicators had substantial advantages in accurately identifying the risk levels of these areas. For example, some suburbs in the northern New Territories had sub-high or medium-level social vulnerability, but a low infection risk, which was observed best in the comprehensive vulnerability assessment. Therefore, by considering both external exposure and realistic sensitivity, we produced credible and robust results.
Third, the two external factors contained in the vulnerability assessment showed strengths in different ways. External exposure highlighted the role of residential clusters in the spread of infection. As shown in Fig. 5(4), a high exposure risk tended to occur in areas with high residential density, such as eastern Sha Tin district, which contains residential areas like City One. The realistic sensitivity to COVID-19 helped identify which areas were more likely to form high-risk transmission chains. As shown in Fig. 6(4), sensitive areas consistently occurred in the centers of Hong Kong's new towns (i.e., the planned satellite towns shown in Fig. 1(1)). Regionally developed infrastructure and a greater distance from the central business district facilitated strong internal connections, making high-risk internal transmission chains more likely. Finally, housing conditions had a great influence on the infection risk. As shown in Figs. 4-7, most areas with crowded or poor housing conditions had more infections, including the commercial Sha Tin district, the affluent Eastern district, and the crowded Wong Tai Sin and Kwun Tong districts. Given the high infectiousness of COVID-19, crowded housing produced more family transmission, and lower sanitary conditions in poor living facilities facilitated further infections. This result suggests that risk management should focus on people with poor housing conditions. The overall results aggregated by summation revealed areas with high levels of infections, but were not accurate in areas with other infection levels. This once again demonstrated that infection risk was determined by the multiplication of internal and external factors.

*, **, *** signify that the p-value of Spearman's test is less than 0.05, 0.01, and 0.001, respectively.

Fig. 3. Correlations between the results of the vulnerabilities from different calculation schemes and the numbers of confirmed cases. "SV" represents social vulnerability, "BV (ep)" represents vulnerability with exposure considered, "BV (rs)" represents vulnerability with sensitivity considered, and "BV (ep&rs)" represents vulnerability with both exposure and sensitivity considered. "overall" signifies the aggregated vulnerability value of the previously described method. "overall+" signifies the vulnerability results aggregated by summation. "exposure" refers to indicators of external exposure.

Fig. 4. Spatial distribution of social vulnerabilities. CPc signifies the correlation value and significance with respect to confirmed cases, and CPf signifies those with respect to fatalities.
(4) Results of the classification and investigation of high infection areas.
Based on the previous correlation and spatial analysis, we defined four indicators as classification rules. The social status indicator was the sum of two themes, socioeconomic status and minority status and language, which tended to reveal mortality risk. We then summed household composition and housing condition as the housing indicator, as they were internal factors with links to external infection. External exposure and realistic sensitivity were selected as the other two indicators. Areas in which any indicator ranked in the first quantile were designated as high-risk areas and investigated. The results of the spatial overlay and rule classification, shown in Fig. 8, provide references for risk management.
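The classification rules described above can be sketched in Python as follows. This is only an illustrative sketch: the area names and scores are hypothetical, the dictionary keys are our own naming, and "first quantile" is interpreted here as the top 25% of areas for each indicator, which is an assumption rather than a value stated in the text.

```python
INDICATORS = ["social_status", "housing", "exposure", "sensitivity"]


def build_indicators(area):
    """Combine thematic scores into the four classification indicators."""
    return {
        "social_status": area["socioeconomic"] + area["minority_language"],
        "housing": area["household_composition"] + area["housing_condition"],
        "exposure": area["exposure"],
        "sensitivity": area["sensitivity"],
    }


def top_threshold(values, fraction):
    """Smallest value that still falls within the top `fraction` of values."""
    ordered = sorted(values, reverse=True)
    count = max(1, int(len(ordered) * fraction))
    return ordered[count - 1]


def classify(areas, fraction=0.25):
    """Map each area to the indicators that rank in the top `fraction` (assumed 25%)."""
    scores = {name: build_indicators(area) for name, area in areas.items()}
    thresholds = {
        key: top_threshold([s[key] for s in scores.values()], fraction)
        for key in INDICATORS
    }
    return {
        name: [key for key in INDICATORS if s[key] >= thresholds[key]]
        for name, s in scores.items()
    }


# Hypothetical example areas (not data from the study).
example_areas = {
    "A": {"socioeconomic": 0.8, "minority_language": 0.7, "household_composition": 0.2,
          "housing_condition": 0.1, "exposure": 0.1, "sensitivity": 0.2},
    "B": {"socioeconomic": 0.2, "minority_language": 0.1, "household_composition": 0.9,
          "housing_condition": 0.8, "exposure": 0.3, "sensitivity": 0.1},
    "C": {"socioeconomic": 0.3, "minority_language": 0.2, "household_composition": 0.4,
          "housing_condition": 0.3, "exposure": 0.9, "sensitivity": 0.8},
    "D": {"socioeconomic": 0.1, "minority_language": 0.1, "household_composition": 0.1,
          "housing_condition": 0.2, "exposure": 0.2, "sensitivity": 0.3},
}

risk_factors = classify(example_areas)
high_risk_areas = [name for name, factors in risk_factors.items() if factors]
print(risk_factors)
```

An area with an empty list is not designated as high risk; an area with several entries corresponds to risk dominated by several factors.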
Targeted policies can be designed for areas with risk dominated by specific factors. When risk mainly arises from sensitivity, policies should focus on restricting or adjusting activities, such as reducing the offline operation of commercial facilities and encouraging online purchasing. For risk dominated by exposure, which was prevalent in the residential clusters, public facilities such as parks and swimming pools should be closed. When risk mainly arises from housing factors, facilities prone to hygienic infection such as elevators and garbage drop points should be regularly disinfected, and supplies can be provided by the government.
For areas with risk dominated by several factors, policies must be more adaptive and effective. Fig. 8 shows that these areas tended to cluster, such that the multiplicative interactions of factors may spill into surrounding areas. In these cases, community lockdowns can be implemented and supplemented by targeted measures for specific factors.
In addition, because housing-related infection covered more areas than infection linked to other factors in Hong Kong, more and stronger policies for this factor should be developed. Conversely, areas with vulnerable social status but less external risk had considerably lower infection risk. In summary, the results demonstrate the feasibility of our method and indicate that the components of social vulnerability, external exposure, and realistic sensitivity shape COVID-19 infection risk by area. Importantly, analyzing the vulnerability factors of areas provides valuable insights for risk management.

Discussion
This study constructed a framework for the comprehensive assessment of vulnerability to COVID-19 by considering the components of social vulnerability, exposure, and sensitivity. The sensitivity analysis assumed that the more sensitive an area is to COVID-19, the stronger the transmission chain and the more consistent the age distribution of confirmed cases with the age distribution of the community population. To verify this assumption, we analyzed the age distribution of confirmed cases in different pandemic stages using the infection relationships recorded in our data, that is, the records with non-null "related case" fields in Table 1. Measuring by pandemic stage, rather than by region, allowed us to avoid the problem of insufficient data with respect to infection relationships.
We first filtered the data with non-null "related case" fields and divided them into the corresponding stages of COVID-19 according to the date ranges in Table 7. A time interval was left between the early and outbreak stages as a buffer, and other stages were excluded.
Then, we calculated the coefficient of variation of age among related case groups according to the age distribution of confirmed cases using Equation (13):

$$\mathrm{coef}_r = \frac{\mathrm{std}_r}{m_r} \qquad (13)$$

where $m_r$ signifies the mean value of age within related group $r$ and $\mathrm{std}_r$ is the standard deviation of age within group $r$. $\mathrm{coef}_r$ represents the coefficient of variation of age among confirmed cases. Fig. 9(2)-(4) show the age distribution of confirmed cases in different COVID-19 transmission stages. The distribution of the mean age of all confirmed cases and the population age distribution derived from the census were largely consistent, with the exception of those around the age of 40, indicating that COVID-19 in Hong Kong has spread widely among all age groups (Fig. 9(2)). The unusually large number of confirmed cases around the age of 40 may have been due to the importance of this group in infecting other age groups. People of this age typically commute more and have greater responsibilities regarding the care of children and the elderly, making them more likely to generate cross-age infections.
To further explore the age features of confirmed cases, we compared the results of the early and outbreak stages. With respect to the distribution of the coefficient of variation of age, the early values (maximum around 0.8, most below 0.1) were generally smaller than the outbreak values (maximum around 1.2, most around 0.2). This result indicates that the age variation among early infected cases was smaller than that among outbreak cases, which supported our assumption. With respect to the distribution of mean age, that of outbreak cases was most consistent with the population age distribution of the census, indicating that the age distribution of confirmed cases in the outbreak stage was uniform. Therefore, the analysis of the age distribution of confirmed cases in different stages of COVID-19 supported our assumption that there were more mutual infections between people of similar age in the early stages, but the age distribution of confirmed cases approached that of the community population as infections became widespread.
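The stage-wise comparison of the coefficient of variation of age can be sketched in Python as follows. The ages below are hypothetical and only illustrate the calculation. The use of the population standard deviation (`statistics.pstdev`) and of the median as a summary are our assumptions, since the text does not specify either choice.

```python
from statistics import mean, median, pstdev


def age_cv(ages):
    """Coefficient of variation of age within one related-case group (std / mean)."""
    m = mean(ages)
    if m == 0:
        raise ValueError("mean age must be non-zero")
    # Assumption: population standard deviation; a sample estimate is equally plausible.
    return pstdev(ages) / m


def stage_cvs(groups):
    """Coefficient of variation for every related-case group in a stage."""
    return [age_cv(ages) for ages in groups]


# Hypothetical related-case groups (ages of linked confirmed cases) per stage.
early_groups = [[30, 31, 29], [45, 45], [60, 62]]
outbreak_groups = [[20, 40, 60], [10, 35, 70], [25, 50]]

early_median = median(stage_cvs(early_groups))
outbreak_median = median(stage_cvs(outbreak_groups))
print(f"early median CV: {early_median:.3f}")
print(f"outbreak median CV: {outbreak_median:.3f}")
```

A smaller coefficient in the early stage corresponds to linked cases of similar age, while larger outbreak values correspond to transmission across age groups.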

Conclusion
Reliable measurement of COVID-19 vulnerability is essential to risk management. Our study contributes to this emerging body of research by presenting a framework for assessing community vulnerability to offer a comprehensive explanation of observed differences in COVID-19 severity.
The framework consists of three parts. The first part focused on the selection and preprocessing of indicators for vulnerability assessment, including components of social vulnerability, exposure, and sensitivity. Social vulnerability examined the inherent social characteristics of people groups in fighting the pandemic. Exposure focused on external exposure risk based on the magnitudes of population and public transportation. Sensitivity showed the realistic transmission risk indicated by the uniformity of the age distribution of confirmed cases, based on the hypothesis discussed above that higher sensitivity areas had stronger transmission chains and a more uniform age distribution of confirmed cases. In the second part of the framework, we implemented social vulnerability assessments with consideration of the roles of thematic and overall social vulnerability in shaping the COVID-19 situation. Considering the impact of external factors, we then comprehensively assessed vulnerability by designing factors that represent the external exposure risk and the realistic sensitivity to COVID-19 before aggregating them with social vulnerability. The third part evaluated the roles of vulnerability in shaping infection risk. Using statistical and spatial analysis, we first identified indicators that were relevant to infection risk, and then classified high infection areas by the composition of vulnerability factors.
This case study on Hong Kong illustrates the feasibility and value of our framework in providing comprehensive vulnerability assessment for risk management. The results indicate that social vulnerability, external exposure, and realistic sensitivity shape community infection. Most factors effectively identified areas with high spatial agglomeration of confirmed cases, and the results showed that external factors were more influential than internal factors. In addition, the two external factors showed strengths in different ways. Exposure played a role in revealing the risks related to residential density, while realistic sensitivity was important to recognizing which areas were more likely to form high-risk transmission chains. Given that factors differed in their influences on infection, we classified the high infection areas and investigated their composition of related vulnerability factors to provide insights for risk management.

Fig. 7. Spatial distribution of vulnerabilities with consideration of both external exposure and sensitivity. CP means the correlation value and significance to confirmed cases. + means the result is aggregated by summation.
The main findings point to several suggestions for risk prevention and management. First, considering the high sensitivity in the new towns, the government could formulate epidemic prevention strategies to adjust internal interaction activities, such as reducing the offline operation of shops and encouraging online purchasing. Second, in the early stages of the pandemic, people of the same age group as that with the most confirmed cases should be more vigilant. Also, tracking the trajectories of confirmed cases in the group of people aged around 40 could help mitigate transmission across age groups. Third, focus should be placed on areas with vulnerable external factors. For example, for areas in which the risk mainly arises from exposure, such as residential clusters, public facilities could be closed. Fourth, due to the crowded housing in Hong Kong, housing-related infection was particularly influential, suggesting that relevant policies should be developed for mitigation. For example, facilities prone to hygienic infection such as elevators and garbage drop points should be disinfected. Fifth, for areas with risk dominated by several factors that could infect surrounding areas, lockdowns could be implemented and supplemented by targeted measures for specific factors. Lastly, areas with vulnerable social status but less external risk have considerably lower overall risk.
Although our study demonstrates the feasibility of assessing comprehensive vulnerability for risk management, it can be improved in several ways. First, a wider range of data would have facilitated more in-depth findings. For example, we did not explore more factors related to mortality risk due to the lack of medical facility data and comorbidity data. Second, the main data used were static census data, preventing the consideration of spatial interactions between areas. Third, our sensitivity factor only contained the uniformity of age distribution, and social vulnerability did not consider human and policy responses. Fourth, dynamic human mobility could be taken into consideration for a more effective exposure factor assessment if the data are available. All of these issues should be addressed in future research.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
We have shared the official links to our data in the article.