Not Everyone is Engaged: An Innovative Approach to Measure Engagement Levels on the Labor Market

Individuals' level of engagement on the labor market is hypothesized to be of critical importance for labor market outcomes. Based on a recent labor market survey in the Kingdom of Saudi Arabia (KSA), this paper develops an innovative index to measure individuals' labor market engagement across three dimensions: preferences, intensity, and barriers. The index can also be used to cluster individuals with similar engagement levels in order to target labor market interventions more effectively. Because it can be calculated for out-of-sample individuals, it may also be used to roughly measure the effectiveness of labor market interventions over time. The index is computed individually and in aggregate for six labor market groups: employed, unemployed, and out of the labor force, each separately for men and women. The methodology includes: (i) identifying labor subgroups and engagement dimensions, (ii) identifying relevant variables within each group and dimension, (iii) constructing an index for each group and dimension that captures the relative status of an individual against his/her reference group, and (iv) synthesizing the different engagement dimensions into a single indicator. Findings confirm the strong heterogeneity of labor market engagement in the KSA and the usefulness of differentiating interventions for job searchers depending on which cluster they belong to.


Introduction
Labor market engagement is of critical importance to labor market outcomes, but not everyone who participates in the labor force or who finds employment is engaged. Traditional measurements of employment, unemployment, and being out of the labor force (OLF) may not fully measure engagement levels. For example, a priori, it may be assumed that an employed person is more engaged than an unemployed person. However, the employed person may exert minimum effort at work while the unemployed person actively searches for a job. Traditional measures are sufficient to explain labor market outcomes and indicate the effectiveness of interventions only if the level of engagement between individuals is assumed to be homogeneous. In reality, that is not always the case.
To this end, and with the aim of filling this gap in labor market metrics, the paper develops an engagement index called the Relative Engagement Labor Index (RELI) based on three dimensions: the extent of individuals' preferences to be engaged; the intensity of the effort they are undertaking to be engaged, and the constraints they face to be engaged. A principal component methodology is adopted to construct the index using labor market survey data from the Kingdom of Saudi Arabia (KSA). Cluster analysis is then undertaken to profile subpopulations according to engagement levels, with the goal of targeting interventions to the emerging clusters.
The KSA was chosen for developing the index as its labor market has unique characteristics. The KSA labor market is segmented by at least three dimensions: (i) between KSA nationals and foreign workers; (ii) between the public sector, where most KSA nationals work, and the private sector, which is dominated by foreign workers; and (iii) between men and women. Further, both the open demand-driven admission scheme and the way oil wealth is redistributed in the KSA create major distortions in the labor market (Bodor and Holzmann 2016). These distortions may have an impact on the engagement levels of KSA nationals. Being able to quantify the heterogeneity of labor market engagement in the KSA is central for the design and implementation of better policy interventions. For the sake of clarity, the current paper focuses only on KSA national workers; however, a natural extension for this paper is the study of foreign workers.
The purpose of RELI is therefore fourfold:
• To establish the scope, depth, and heterogeneity of labor market engagement for national labor market groups by socioeconomic characteristics. This should inform policy makers on the size of the problem.
• To use these disaggregated results to design and direct policy interventions toward groups with low engagement levels. A successful profiling of engagement-distant groups is expected to emerge as an operational and effective approach.
• To detect relevant differences in aggregate results across all labor market groups by socioeconomic characteristics, which may offer guidance about policy gaps and intervention opportunities.
• To suggest a set of questions that can be used to evaluate interventions applied between different measurement periods.
Few attempts have been made to move well beyond traditional labor market categories and to exploit household data and ad hoc surveys for some measure of engagement. The International Labor Organization (ILO) and the World Bank use ADePT software to translate household survey data into ready-to-use analytical labor market tables (Pietschmann et al. 2016). In the U.S., a Labor Market Engagement Index aggregates levels of employment, labor force participation, and education to measure geographic differences in engagement across countries (data.world). Last but not least, measures of labor intensity (occupation, days, and hours worked) are also used to explain differences in BMI (body mass index) and to explore their causal link.
To the best of the authors' knowledge, RELI is the first attempt to use a labor market survey to construct an engagement index within the traditional categories of employed, unemployed, and OLF. In contrast, the methodology of index construction applied in this paper, Principal Component Analysis (PCA), is much more widespread and well developed. Many examples of PCA applications exist in the economics literature; a few of them are cited in this paper. Cordova (2008), for example, uses PCA to construct a relative wealth index based on household assets for 21 Latin American and Caribbean countries. Fuchs et al. (2018) use PCA to reduce the dimensionality of demographic variables when forecasting labor participation. Filmer and Pritchett (2001) use PCA to construct a linear wealth index from asset ownership indicators for Indonesia, Pakistan, and Nepal and correlate the index with school enrollment. Huh and Park (2018) use PCA to develop a composite index to measure the degree of regional integration in Asia. Drafor (2017) uses PCA to reduce the number of variables in the Ghana Living Standards Survey to construct a spatial index, analyzing the spatial disparity between rural and urban areas.
The paper is organized as follows. Section 2 presents the data and methodology used to construct the index. Section 3 presents findings based on application of the index to the KSA. Section 4 translates the findings into potential policy applications. Section 5 summarizes key lessons learned while constructing the index. Finally, section 6 concludes. A comprehensive annex details some of the results.

Index Data
Data from a labor market survey of KSA nationals were used to construct the index. The survey was conducted between November 2015 and January 2016. A total of 4,939 KSA nationals were sampled via a tightly controlled quota sample whereby interviewers had to recruit respondents to meet a set of criteria on key respondent characteristics. These quotas were derived for economic activity status (OLF, unemployed, and employed), age group, and gender, all by province.
Questions in the survey included information on the characteristics and employment status of Saudis, attitudes to work and barriers for women's employment and participation in the labor market, job characteristics for those employed, economic history for those interviewed dating back to the last 10 years, job search intentions and efforts for both employed and unemployed Saudis, income information, and views on certain interventions being implemented in the KSA.

Index Methodology: Framework and Strategy
This section outlines the methodologies used to construct the index and to identify clusters of individuals who exhibit similar levels of engagement.

Construction of the Relative Engagement Labor Index (RELI)
The methodology for constructing the index encompasses the following four steps: (i) identifying labor subgroups and engagement dimensions, (ii) identifying relevant variables within each group and dimension, (iii) constructing an index for each group and dimension that captures the relative status of an individual against his/her reference group, and (iv) synthesizing the different engagement dimensions into a single indicator.
Step 1: Identifying relevant groups and dimensions
Groups: Labor markets are embedded within social structures and certain groups of individuals are different from one another and should not be pooled together when constructing and analyzing the index. One way to identify these groups is by considering variables that have strong social implications regarding the way individuals behave, such as gender and work status. Women and men face structurally and historically different social realities, and if pooled together, may bias identification of the vulnerability of the population in each group. Similarly, while work status does not trivially identify engagement, being employed, unemployed, or OLF may also embed social realities and affect individuals' behaviors. RELI is therefore calculated for six groups: (i) OLF men; (ii) OLF women; (iii) unemployed men; (iv) unemployed women; (v) employed men; and (vi) employed women.
Dimensions: This paper defines engagement as a combination of three different dimensions that jointly determine the level of interaction of the individual with the labor market and allow measurable comparisons between individuals. These dimensions are:
• Barriers: Social contexts can, in some instances, preclude individuals from offering their work (social barriers). Also, firms may not be interested in the skills and capacities that individuals are willing to offer (technical barriers). This dimension measures the social and technical barriers.
• Preferences: Contrary to the previous dimension, preferences explore the willingness of individuals to offer their work. This dimension measures attitudes toward work and how much importance individuals attach to having a job.
• Intensity: Intensity measures how committed people are to their job (or to their job search).
Although preferences and intensity both highlight individuals' willingness to work, preferences focus on the breadth of individuals' willingness to work (i.e., attitudes toward characteristics of a job), while intensity focuses on the depth (how much of each activity a person is willing to do).
Step 2: Identifying variables per group and dimension
The three dimensions (barriers, preferences and intensity) are inherent and unobservable, and thus, it is difficult to measure them in a single question. Hence, the paper presents a methodology based on principal component analysis to extract latent variables based on the survey questions. For each dimension, several survey questions were chosen based on their relation to the latent variable. These questions are described in the annex.
Step 3: Constructing an index for each group and dimension
The second stage uses PCA to extract the common information of these variables (Johnson and Wichern 2007).
Applications of PCA traditionally use Pearson correlations to identify the common information between variables. Examples of their application can be found in Filmer and Pritchett (2001) and in Vyas and Kumaranayake (2006). Howe, Hargreaves, and Huttly (2008) offered a solution calculated with polychoric correlations. Although this methodology is mentioned here, it was not adopted for two reasons: (i) PCA scores obtained from polychoric correlations cannot be reconstructed for individuals who were not in the baseline, which would make it very difficult to design policy evaluations with them; and (ii) the strong similarity of the Spearman and Pearson results suggests that the relevant information is captured by these two methods, which are easy to extend to policy evaluation scenarios.
3 The current results were calculated using the statistical software R. It was chosen for its flexible format, which allowed PCA to be run on correlation matrices other than the Pearson correlation matrix.

Based on the previous results, all the variables are positively correlated. This is consistent with step 2 and reveals that some common information is being manifested in these answers, in this case a preference index.
Once the correlation matrices are obtained, the first two principal components (the first and second eigenvectors of these matrices) are calculated. Whereas the first component is the index presented in Table 2.3, Figure 2.1 shows both the first and second components, as the latter may contain additional important information. For example, for OLF women, the two variables "reasons_not_working" and "work_attitudes" have the highest weights in the first component. The other two variables give complementary information, but because they also hold additional information, their weights are lower in the first component. The previous procedure is replicated for each group and each dimension.
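As a minimal sketch of step 3, the code below standardizes a handful of survey answers, builds their Pearson correlation matrix, extracts the first principal component, and scores each respondent. It is an illustration only and not the authors' R code. The function names and the toy answers are hypothetical, the use of the population standard deviation is an assumption, and power iteration stands in for a full eigendecomposition so that the sketch needs only the Python standard library.

```python
import math
import statistics

def standardize_columns(rows):
    """Scale every survey variable to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]  # population SD is an assumption
    scaled = [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]
    return scaled, means, sds

def correlation_matrix(scaled):
    """Pearson correlations, computed from already standardized columns."""
    n, p = len(scaled), len(scaled[0])
    return [[sum(r[a] * r[b] for r in scaled) / n for b in range(p)] for a in range(p)]

def first_component(matrix, iterations=500):
    """Leading eigenvector of a symmetric matrix, found by power iteration."""
    p = len(matrix)
    vector = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        product = [sum(matrix[a][b] * vector[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # An eigenvector's sign is arbitrary; orient it so the weights sum to a positive number.
    return vector if sum(vector) >= 0 else [-x for x in vector]

def dimension_scores(rows):
    """Score each respondent on the first component, rescaled to unit SD."""
    scaled, _, _ = standardize_columns(rows)
    weights = first_component(correlation_matrix(scaled))
    raw = [sum(w * x for w, x in zip(weights, row)) for row in scaled]
    overall_sd = statistics.pstdev(raw)
    return [value / overall_sd for value in raw], weights

# Made-up answers of six respondents to three ordinal questions (not survey data).
toy_answers = [[1, 2, 1], [2, 2, 2], [3, 4, 2], [4, 3, 4], [5, 5, 4], [2, 1, 1]]
scores, weights = dimension_scores(toy_answers)
```

Because the toy variables are all positively correlated, every weight is positive and the respondent with the highest answers receives the highest score. A Spearman variant would replace each column by its ranks before standardizing.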
Step 4: Synthesizing information
The last step aggregates the dimensions into one unique measure of engagement using two methods. The first method, called RELI with fixed weights (RELIF), assumes that each of the three dimensions is equally relevant. After scaling them, each one is multiplied by a third and then summed together to create the RELIF:
$$\mathrm{RELIF}_i = \tfrac{1}{3} B_i + \tfrac{1}{3} P_i + \tfrac{1}{3} I_i$$
where $B_i$, $P_i$, and $I_i$ are the scaled barriers, preferences, and intensity dimensions of individual $i$. With this index, individuals who are above the average in each dimension will have higher levels of engagement than individuals below the average.
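The equal-weight aggregation can be sketched in a few lines. The function names and the toy dimension values are hypothetical, and scaling by the population standard deviation is an assumption about how the dimensions are standardized.

```python
import statistics

def standardize(values):
    """Rescale one dimension to mean 0 and standard deviation 1 within a group."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD is an assumption
    return [(v - mean) / sd for v in values]

def relif(barriers, preferences, intensity):
    """Equal-weight index: one third of each scaled dimension, summed per individual."""
    scaled = [standardize(dim) for dim in (barriers, preferences, intensity)]
    return [sum(person) / 3 for person in zip(*scaled)]

# Made-up dimension values for three individuals of one group (not survey data).
toy_relif = relif([0.0, 1.0, 2.0], [10.0, 20.0, 30.0], [-1.0, 0.0, 1.0])
```

In this toy group the third individual is above the average in every dimension and therefore obtains the highest index value, while the second sits exactly at the group average.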
The second method, called RELI with endogenous weights (RELIE), replicates step 3 over the first and second components of each dimension. Although the first component is the most relevant one, the second component, as observed previously, provides additional information; including it thus reduces the noise of RELI. The benefit of this procedure is that weights are calculated based on the data. The problem is that the dimensions capture different and independent features of engagement. Thus, as observed in the biplots of Figure 2.2 and in Table 2.4, the index does not accurately capture the relevant effects. For this reason, even though the weights are consistent with the data, RELIF is considered more appropriate and is thus used in the analysis.
Figure 2.2: RELI biplot of OLF women

Identification of Clusters
This section outlines the methodology for identifying clusters with shared characteristics in the three dimensions (preferences, intensity, and barriers). Hierarchical clustering is performed using Ward's method over the dimensions (Rokach and Maimon 2005). The method is carried out at the dimension level rather than in aggregate as it captures more information about the particularities of each cluster. The case of OLF women is presented in Figure 2.3.

Figure 2.3: Hierarchical clustering for OLF women
Although there is no consensus about the best way to determine the number of clusters, too many clusters may not be useful to policy makers because policy may end up being case specific. On the other hand, having no clusters creates issues because policies will equally treat individuals with heterogeneous conditions. Due to these considerations, and based on the previous dendrogram, this paper generates four clearly defined clusters. Following the cluster identification, it is possible to measure the average value of each dimension in each cluster as presented in Figure 2.4. Cluster 4 is characterized by high values of each dimension, cluster 3 has average values, cluster 2 has negative values, and cluster 1 has very negative values, except for the intensity dimension, which is considerably high. This information allows for characterization of each cluster with its associated individual traits. Section 3 provides a more detailed explanation of these findings.
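The clustering step can be illustrated with a naive implementation of Ward's criterion, which at every step merges the two clusters whose union increases the within-cluster sum of squares the least. This is a sketch under stated assumptions rather than the procedure actually run for the paper: the function names and the eight toy points (barriers, preferences, intensity) are hypothetical, and the quadratic search over pairs is only suitable for small samples.

```python
def ward_clusters(points, k):
    """Agglomerative clustering with Ward's merge cost, stopped at k clusters."""
    clusters = [[i] for i in range(len(points))]

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(len(points[0]))]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if clusters a and b are merged.
        gap = sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))
        return len(a) * len(b) / (len(a) + len(b)) * gap

    while len(clusters) > k:
        _, i, j = min((merge_cost(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def cluster_means(points, clusters):
    """Average value of each dimension within each cluster."""
    return [tuple(sum(points[i][d] for i in members) / len(members)
                  for d in range(len(points[0])))
            for members in clusters]

# Made-up standardized (barriers, preferences, intensity) values, not survey data.
toy_points = [
    (-2.0, -2.0, 1.0), (-2.1, -1.9, 1.1),
    (-1.0, 0.0, 0.5), (-0.9, 0.1, 0.4),
    (0.0, 0.0, 0.0), (0.1, -0.1, 0.1),
    (1.5, 1.5, 1.5), (1.4, 1.6, 1.4),
]
groups = ward_clusters(toy_points, 4)
profiles = cluster_means(toy_points, groups)
```

Cutting the hierarchy at four clusters recovers the four pairs of neighboring points, and `profiles` gives the per-dimension averages that would be plotted in a chart such as Figure 2.4.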

Aggregation of Information Across All Six Groups
This section describes how to analyze each dimension independent of the six groups (unemployed, employed, OLF, by women and men). A priori, this exercise seems unachievable because each dimension in each of the six groups was built with different variables and weights. Nevertheless, since step 4 standardizes each dimension, it is possible to aggregate all the available information into a common frame.
To illustrate this, the following exercise is performed over age ranges: For each age range (i.e., 18-24, 25-34, 35-44, 45-54, 55-64), the average values of the preference dimension are calculated. The results are presented in Figure 2.5. The preference index for individuals aged 18-24 is concentrated around 0.3. The youngest cohort's preferences are therefore 0.3 standard deviations above the average of their own group. Each group has a different standard; e.g., for unemployed groups, higher preferences suggest more flexibility looking for a job, while for employed groups, higher preferences suggest working shifts and extra hours. Thus, the meaning of a standard deviation is different for each group. Yet independent of the standard for each group, younger individuals are about one-third of a standard deviation above the reference of their group. Furthermore, this exercise shows that as age increases, the preference index is systematically lower and that the final age groups are around 0.2 standard deviations below the average. By replicating this procedure for different policy variables (e.g., education level, province, etc.), it is possible to characterize the way in which these variables are associated with each dimension and RELI.
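A minimal sketch of this aggregation is given below. Each record carries a preference score that is assumed to have been standardized within the individual's own labor market group beforehand, so averaging across groups is meaningful. The record layout, the field names, and the numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
import statistics

def mean_by_category(records, category_field, score_field):
    """Average an already standardized score within each category, pooling all groups."""
    buckets = defaultdict(list)
    for record in records:
        buckets[record[category_field]].append(record[score_field])
    return {category: statistics.fmean(values) for category, values in buckets.items()}

# Made-up records (not survey data); scores are standardized within each group.
toy_records = [
    {"group": "unemployed_women", "age_range": "18-24", "preference": 0.4},
    {"group": "employed_men", "age_range": "18-24", "preference": 0.2},
    {"group": "olf_women", "age_range": "55-64", "preference": -0.3},
    {"group": "employed_women", "age_range": "55-64", "preference": -0.1},
]
preference_by_age = mean_by_category(toy_records, "age_range", "preference")
```

The same function can be pointed at any other policy variable, such as an education or province field, by changing `category_field`.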

Construction of RELI for Out-Of-Sample Individuals
RELI and its dimensions approximate the engagement of an individual relative to the average of his/her group. Due to scaling, however, it is not straightforward to assign index values to individuals out of the sample. The inclusion of new individuals has two central uses in policy design and evaluation, as discussed below.
Targeting Thresholds: Many policies are designed to concentrate resources on individuals facing difficult conditions. For example, consider a policy that aims to help individuals with the most barriers. In this scenario, policy makers would need to identify whether the barrier dimension of a certain candidate is below a given threshold. Therefore, there must be a methodology capable of identifying the index value of an individual who was not part of the original sample.
Policy Evaluation: In contrast to the previous case, consider a group that has been identified for policy intervention. In this situation, the challenge for policy makers is to evaluate the policy's effect on the group over time. This comparison could measure the improvement of the group as a first step toward a rigorous impact evaluation.

Targeting Thresholds
This section explains how to calculate the index value of an individual who was not part of the original sample. For this purpose, recall that the formula used to calculate the dimension value is:
$$D(i) = \frac{1}{\sigma_D} \sum_{j} w_j \, \frac{x_{ij} - \bar{x}_j}{\sigma_j}$$
where $D(i)$ is the value of the dimension of individual $i$, $x_{ij}$ is the value of variable $j$ for individual $i$, $\bar{x}_j$ and $\sigma_j$ are the mean and standard deviation of this variable in the group, respectively, $w_j$ is the weight of that variable in the component, and $\sigma_D$ is the overall standard deviation of the component.
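The scoring of a new individual can be sketched as follows: the baseline means, standard deviations, weights, and overall standard deviation are stored once and then applied unchanged to the answers of someone who was not in the original sample. The variable names, the baseline numbers, and the threshold are hypothetical, and dividing the weighted sum by the overall standard deviation is an assumption about how the scaling is applied.

```python
def score_new_individual(answers, baseline):
    """Dimension value of an out-of-sample individual under fixed baseline standards."""
    total = 0.0
    for name, value in answers.items():
        mean, sd, weight = baseline["variables"][name]
        total += weight * (value - mean) / sd
    return total / baseline["dimension_sd"]

def below_threshold(answers, baseline, threshold):
    """True when the individual's dimension value falls under a targeting threshold."""
    return score_new_individual(answers, baseline) < threshold

# Made-up baseline: (mean, standard deviation, PCA weight) per variable.
toy_baseline = {
    "variables": {
        "education_barrier": (2.0, 1.0, 0.6),
        "experience_barrier": (3.0, 2.0, 0.8),
    },
    "dimension_sd": 1.25,
}
candidate = {"education_barrier": 1.0, "experience_barrier": 1.0}
candidate_score = score_new_individual(candidate, toy_baseline)
```

With these toy numbers the candidate sits about 1.12 standard deviations below the baseline average, so a program targeting everyone below -1.0 would include this person.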

Policy Evaluation
Using the equation from the previous section, this section considers a temporal extension of the index's construction. For illustration purposes, this section uses an example based on the barriers dimension.
All the variables are now indexed in time. Let $x_{ij}^{t}$ be the value of variable $j$ for individual $i$ during period $t$, and let $\bar{x}_j^{t}$ and $\sigma_j^{t}$ be the respective mean and standard deviation of this variable at time $t$. In the same way, $w_j^{t}$ is the PCA weight of variable $j$ constructed with the information of year $t$, and $\sigma_B^{t}$ is the corresponding overall standard deviation. Finally, let $G$ be the focus group, which has $n_G$ individuals, and $B^{t}(G)$ the average barrier value of that group. A focus group can be any group of individuals who need to have their score calculated after the baseline is defined.
For example, consider a set of individuals from the original sample who are to be tracked one period into the future. From the equation used in the previous subsection,
$$B^{t+1}(G) = \frac{1}{n_G} \sum_{i \in G} \frac{1}{\sigma_B^{t}} \sum_{j} w_j^{t} \, \frac{x_{ij}^{t+1} - \bar{x}_j^{t}}{\sigma_j^{t}} = \frac{1}{\sigma_B^{t}} \sum_{j} w_j^{t} \, \frac{\bar{x}_j^{t+1}(G) - \bar{x}_j^{t}}{\sigma_j^{t}}$$
where $\bar{x}_j^{t+1}(G)$ stands for the average value of variable $j$ for an individual of group $G$.
The evolution of the group index is then represented by:
$$B^{t+1}(G) - B^{t}(G) = \frac{1}{\sigma_B^{t}} \sum_{j} \frac{w_j^{t}}{\sigma_j^{t}} \left( \bar{x}_j^{t+1}(G) - \bar{x}_j^{t}(G) \right)$$
The standards (weights, means, and standard deviations) are fixed at time $t$. Policy makers can thus measure changes in engagement between two periods of time. The methodology can also include situations where there is a control and a treatment group. In that case, the policy maker can use the standards of the control group to compare the evolution of the treatment.
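The temporal comparison can be sketched by scoring the same focus group in two periods with the baseline standards held fixed and taking the difference of the group averages. The function names, variable names, and numbers below are hypothetical, and the scoring rule assumes the weighted sum of standardized variables is divided by the baseline overall standard deviation.

```python
import statistics

def dimension_value(answers, standards):
    """Dimension value of one individual under fixed baseline standards."""
    total = 0.0
    for name, value in answers.items():
        mean, sd, weight = standards["variables"][name]
        total += weight * (value - mean) / sd
    return total / standards["dimension_sd"]

def group_average(group, standards):
    """Average dimension value of a focus group."""
    return statistics.fmean(dimension_value(person, standards) for person in group)

def engagement_change(group_before, group_after, standards):
    """Change in the group average between two periods, standards held fixed."""
    return group_average(group_after, standards) - group_average(group_before, standards)

# Made-up baseline standards: (mean, standard deviation, PCA weight) per variable.
toy_standards = {
    "variables": {
        "education_barrier": (2.0, 1.0, 0.6),
        "experience_barrier": (3.0, 2.0, 0.8),
    },
    "dimension_sd": 1.25,
}
# Made-up answers of a two-person focus group in the baseline and follow-up periods.
before = [{"education_barrier": 1.0, "experience_barrier": 1.0},
          {"education_barrier": 2.0, "experience_barrier": 3.0}]
after = [{"education_barrier": 2.0, "experience_barrier": 3.0},
         {"education_barrier": 3.0, "experience_barrier": 5.0}]
change = engagement_change(before, after, toy_standards)
```

Because the standards are fixed, the change depends only on how the group's average answers moved between the two periods. With a control group, its standards would simply be passed in place of `toy_standards`.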

Aggregate and Group Indices and Individuals' Characteristics
The relative labor engagement index constructed at the aggregate level or for each of the six population groups can be used to determine how engagement levels differ by individuals' characteristics. For illustration purposes, the paper presents results by age group for the aggregate index, and by education and sector of employment for the six group indices, using data on the Saudi labor market.

Aggregate Index by Age
Overall, younger individuals have higher engagement levels than their older cohorts, a result mainly driven by higher preferences and intensities. While a young individual (aged 18-24) is 0.3 standard deviations above the average of his respective group in terms of the preference dimension, an older individual (aged 55-64) is 0.3 standard deviations below the average of the corresponding group. In terms of the intensity dimension, an increase of 0.04 standard deviations arises between the two youngest cohorts, yet after that the intensity decreases until it is 0.16 standard deviations below the average for the oldest cohort. Finally, the effect of age on barriers is ambiguous. Whereas the young population might have more education and find it more acceptable to work, the older population has more experience. Therefore, age is not directly related to barriers.

Group Indices by Education
The analysis on educational attainment is conducted for each dimension and for the overall engagement index.
Barriers: Systematically across groups, and unsurprisingly, having a secondary degree or below lowers one's barriers relative to having a higher degree. The comparison between vocational education and a bachelor's degree is less clear. For women, vocational education reduces barriers more than a bachelor's degree. The opposite holds true for men. This difference in signs suggests that the jobs available to women require personnel with technical skills, while men's vacancies demand higher studies.

2: Barrier Index Average by Educational Level for Six Labor Market Groups
Intensity: Individuals differ strongly in intensity levels based on their working status but not as much on their gender. For employed individuals, the lowest intensity is in the group with the highest education (-0.17 standard deviations for women and -0.21 standard deviations for men) while the highest intensity is obtained by people with vocational education (0.24 standard deviations for women and 0.33 standard deviations for men). This difference can be a consequence of the KSA's strong public sector, which hires bachelor's degree holders but does not incentivize higher work effort. For unemployed individuals, the opposite tendency occurs. In this case, the lowest intensity is in the group with the lowest education (-0.67 standard deviations for women and -0.17 standard deviations for men), while the highest intensity is obtained by people with the highest education level (0.73 standard deviations for women and 0.46 standard deviations for men). These numbers also highlight that these differences are stronger for women. One possible explanation of the ordering is that educated people who are willing to work have better knowledge of how to apply for jobs, and therefore apply more often. Finally, for the OLF, people with a vocational education have the highest values (0.19 standard deviations for women and 0.17 standard deviations for men), while highly educated people have the lowest values (-0.08 standard deviations for women and -0.07 standard deviations for men). This might suggest that people with vocational education have enough skills to find a job if needed, so they might be keen to work in the future. In contrast, highly educated individuals may have both the wealth and willingness to keep studying, and thus may not feel a strong need to look for a job in the future.
Preferences: Individuals differ strongly in preference levels based on both their working status and gender. Among the unemployed, men and women with vocational education have the highest preferences: 0.68 standard deviations above the group average for women and 0.47 for men. On the other hand, employed women and men with tertiary education have lower preferences (0.31 standard deviations below the average for women and 0.14 standard deviations for men). Finally, for the OLF, the variation is low and inconclusive.

Figure 3.4: Preference Index Average by Educational Level for Six Labor Market Groups
Overall, after aggregating the three dimensions of the index, the results show that unemployed men and women with tertiary education have the highest engagement levels. For employed individuals, the highest engagement is manifested in people with vocational studies (0.69 standard deviations for women and 0.71 for men). As for OLF individuals, results differ by gender. Women with vocational studies have the highest engagement levels, while this is true for men with bachelor's degrees.

Group Indices by Sector of Employment
Public employees face fewer barriers than private ones. For women, the difference between these two groups is 0.39 standard deviations, while for men the difference is 0.16 standard deviations. In contrast, public employees manifest lower intensity and preferences. For men, the differences are -0.17 and -0.36 standard deviations, respectively, while for women they are -0.35 and -0.78 standard deviations. This implies that people in the private sector have a more positive attitude toward work and may work more hours and exert more effort. Due to the strong effect on preferences and intensity, overall engagement is lower for public employees.

Results from the Cluster Analysis
Section 2 described the clustering methodology, whereby individuals belonging to each of the six identified labor market groups were divided into four clusters based on their preferences, intensity, and barriers. These clusters were ordered such that individuals in cluster 1 have the lowest RELI average, while those in cluster

Unemployed Men
Characteristics and the engagement level of the four clusters of unemployed men are described below.

Characteristics of Clusters
Socioeconomic and demographic variables were not used during the construction of RELI and the clustering analysis. However, the analysis after cluster identification indicates that the four clusters are very different in terms of their characteristics:
• Education: In general, unlike women, unemployed men are largely of low education (62% with secondary or below). However, within the unemployed men group, the least engaged tend to have the lowest educational levels; clusters 1 and 2 have significantly lower levels of education than clusters 3 and 4. The percentage of individuals with secondary education or below is 78% in cluster 1 and 88% in cluster 2. Cluster 2 has an even lower level of higher education than cluster 1. In contrast, around 53% of individuals in clusters 3 and 4 have a secondary education or below. Cluster 3 has a slightly higher rate of unemployed men with vocational and technical education (17.65%), while cluster 4 has more individuals with a bachelor's education or above (34%).
• English Skills: Similar to education, clusters 1 and 2 have the lowest levels of English proficiency: 53% and 58% of their populations, respectively, do not speak English at all. Cluster 3 follows: 39% of its population has no knowledge of English, but 27% has a good English level. Finally, in cluster 4, the most educated one, only 27% of its members are without knowledge of English, while 37% speak it properly.
• Age: Clusters 3 and 4 have the youngest populations. Cluster 2 also has a large portion of young people (80% are below 34 years old), but one-fifth of its individuals are 35-44 years old. Finally, cluster 1 has the oldest population profile: only 62% of its members are below 34 years old, and about one-fifth of its members are 45-54 years old.
• Marital Status: Clusters 2 and 4 have the highest shares of single men (91%), while cluster 1 has the lowest share (60%).

Figure 3.8: Cluster Characteristics - Unemployed Men
Engagement Level of Clusters
Three dimensions determine individuals' level of engagement and, accordingly, the cluster to which they belong. Variables from the data were used to construct each of these three dimensions. For example, questions on work attitudes and types of jobs the unemployed are willing to accept were mapped to the preferences dimension; self-identified reasons why the individual is not working and barriers faced while searching for jobs were mapped to the barriers dimension; and search actions, search period, and updating of CV were mapped to the intensity dimension. The analysis after cluster identification also looked at differences between clusters in terms of the variables used to construct the engagement level.
• Cluster 1's RELI is low due to its preferences and intensity levels. Individuals in cluster 1 are the hardest to place as they are the least engaged. Cluster 1's preferences are 1.9 standard deviations below the average individual. This is driven by the fact that 90% of its members are not willing to relocate, more than 40% are not willing to work shifts, only 16% are willing to work more than 40 hours per week, and they have the lowest share of individuals with the highest attitudes (34%; see footnote 9). Cluster 1's intensity is also very low, at 1.17 standard deviations below the group average. Around 76% of them updated their CV more than a year ago, 68% have applied at most to one job, and most of them barely spent any time searching, looking mainly at websites for jobs. Finally, in terms of barriers, this cluster fares 0.27 standard deviations better than the average unemployed man.
• Cluster 2 has the lowest average on the barrier dimension, at 2.9 standard deviations below average. Due to their low education levels, lack of English, and young age, this cluster's members confront strong technical barriers when applying for jobs. Indeed, 72% of them consider their education as a barrier, 58% were affected by their lack of skills, and 72% believe that they lack work experience. Yet their intensity is considerably high, about 0.5 standard deviations above the average individual. In contrast to cluster 1, about 80% of them updated their CV during the previous year, and 60% applied for more than one job. Cluster 2 also has positive preferences, about 0.15 standard deviations above the average. This is driven by high work attitudes, as 64% of cluster 2 members have very high attitudes toward work, only 22% are not willing to work shifts, and 44% are willing to work over 40 hours per week.
• Cluster 3's intensity is the lowest of all clusters, at 1.67 standard deviations below the average. Around 78% of its members updated their CV more than a year ago, and they have applied to one or no jobs at all. However, cluster 3 has the lowest barriers: its individuals are 0.8 standard deviations above the average. This is due to their higher levels of education and English skills. Cluster 3's preferences are also 1.0 standard deviations above average. Many are willing to relocate for jobs, work more than 40 hours, and do shifts, and they have the highest work attitudes.
• Cluster 4 is the easiest to place and above average in all dimensions (average barrier 0.57, average intensity 1.22, and average preference 0.16). Its members are more flexible about the jobs they would accept, exert more search effort, and undertake many actions to search for jobs.
Footnote 9: Individuals who have the highest attitudes are those who agreed or strongly agreed with all of the following six statements: Life without work is very boring; I believe self-reliance is the key to being successful; Working is an important part of who I am; I always look out for opportunities for improving my situation; I find a hard day's work very fulfilling; and I would rather be at home than go to work (negative value).
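The cluster averages reported above can serve as a rough profiling aid for a newly scored unemployed man. The sketch below assigns an individual to the cluster whose reported averages are closest in Euclidean distance. The nearest-centroid rule and the function name `nearest_cluster` are assumptions made for illustration; the text here does not state which assignment rule the paper uses for out-of-sample individuals.

```python
import math

# Cluster averages for unemployed men, in standard deviations from the group
# mean, as reported in the text: (preferences, intensity, barriers).
# Higher barrier values mean fewer barriers, following the text's usage.
CENTROIDS = {
    1: (-1.90, -1.17, 0.27),
    2: (0.15, 0.50, -2.90),
    3: (1.00, -1.67, 0.80),
    4: (0.16, 1.22, 0.57),
}


def nearest_cluster(preferences, intensity, barriers):
    """Return the cluster whose reported averages are closest (Euclidean).

    Illustrative assumption: the paper's own profiling rule may differ.
    """
    point = (preferences, intensity, barriers)
    return min(CENTROIDS, key=lambda c: math.dist(point, CENTROIDS[c]))


# Hypothetical individual: average preferences, active search, few barriers.
profile = nearest_cluster(preferences=0.2, intensity=1.0, barriers=0.5)
```

For the hypothetical individual above, the closest reported averages are those of cluster 4, the easiest-to-place group.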

Unemployed Women
Characteristics and engagement levels of the four clusters of unemployed women are described below (see footnote 10).

Characteristics of Clusters
Women's clusters are also very different in terms of their characteristics:
• Education: Unemployed women are highly educated: 52% have a degree above secondary level. Unlike men, though, women's level of engagement is not correlated with their educational level.
• English Skills: Cluster 1 has the highest share of individuals without knowledge of the English language (39%). Even though their educational level is low, cluster 3 has the highest share of women with good English skills (53%).
• Age: Most unemployed women are aged 25-34. Cluster 1 has the oldest members: about one-third are above 35 years old. In contrast, clusters 3 and 4 are mostly young women, of whom around 84% are less than 34 years old.
• Marital Status: Marital status is correlated with engagement level, whereby the least engaged cluster has the highest share of married women.
Footnote 10: The four clusters have the following sample sizes: 56 in cluster 1, 59 in cluster 2, 30 in cluster 3, and 35 in cluster 4.

Engagement Level of Clusters
The analysis in this subsection is similar to that of unemployed men. However, additional questions asked of women were used to construct the three dimensions, such as self-reported barriers stemming from family restrictions, views on gender mixing in workplaces, etc. (see annex for a complete list of questions). The level of engagement and profile of the four clusters of women analyzed from the data are as follows:
• Cluster 1 is the hardest to place and is below average in all three dimensions. Barriers are 1.0 standard deviations below the average, mainly driven by technical barriers as this cluster has the lowest education and English skills. Cluster 1's intensity is also 1.3 standard deviations below the average. Around 90% of these women have not applied for any job, and 62% did not update their CV in the previous year. Their main search action is to check for employment on websites. Finally, regarding preferences, there is almost no willingness to relocate (only 10% are willing to do so). Paradoxically, these women have a high reservation wage (see next section).
• Cluster 2 is also hard to place, with very high barriers (0.7 standard deviations below the average) and low preferences (1.2 standard deviations below average). Cluster 2 is highly educated but faces social barriers. Moreover, their preferences are low; for example, many are not willing to work shifts. However, cluster 2's intensity is higher than average: 30% updated their CVs in the last year. These women also have high reservation wages (see next section).
• Cluster 3's intensity is the lowest of all clusters (1.6 standard deviations below the average). Cluster 3's members have high engagement regarding preferences and barriers. More than 80% have a positive attitude toward work. However, their main effort for job searching is looking through traditional advertisements and many do not apply for jobs.
• Cluster 4 is the easiest to place and is above average in all dimensions. It is composed of highly educated, young, and single women. Their search intensity is high: about four-fifths have applied to two places, almost all of them recently updated their CV, and one-third frequently check employment advertisements.

19: Barriers -Unemployed Women
Reservation wages: Comparison between unemployed men and women
All clusters of unemployed men have a similar reservation wage distribution, centered around 5,000-10,000 SAR/month. The dispersion is lower for clusters 2 and 4, which have about 65% of their members around the mean. For clusters 1 and 3, the dispersion is higher yet very concentrated.
In contrast to unemployed men, unemployed women have systematically different reservation wages. Clusters 1 and 2 have very high reservation wages: the mode of the distribution is between 5,000 and 10,000 SAR/month. This might be due to social barriers and preferences, which may lead women to only accept a job if the salary is high. Further, a significant share of cluster 2 has a bachelor's degree, which may be aimed at getting a public-sector job that pays more, thereby influencing wage expectations. Cluster 3 has the lowest reservation wage, with a mode of 3,000-4,000 SAR/month. This is consistent with a young population that is willing to work and accept lower wages to join the labor market. Cluster 4's reservation wages exhibit a bimodal distribution.

Policy Applications
The findings from the cluster analyses disaggregated by the six population groups can be used to propose targeted interventions for each cluster, including an effective profiling procedure for jobseekers. The index may also be used to signal progress in engagement and the effectiveness of interventions between two periods of time, if the survey or the index-relevant subset of questions is repeated. Table 4.1 summarizes and compares the levels of RELI components by the six labor market groups and for each of the four clusters. This gives 24 rows and hence 24 potential labor market intervention and implementation proposals that are linked to the labor market engagement of a cluster. However, the lowest and the highest clusters (i.e., clusters 1 and 2 of the OLF women and men groups, and clusters 3 and 4 of the employed men and women groups) are not considered a priority in this paper when proposing policy interventions. The lowest clusters represent the most difficult groups in the population: they have the lowest engagement and likely require the highest effort to engage. A similar but opposite consideration applies to clusters 3 and 4 of the employed group of both genders. They are not only employed but also the most engaged in the labor market. As a result, any intervention is likely to have a small marginal return. The paper thus focuses on clusters 3 and 4 of the OLF men and women groups, and clusters 1 and 2 of the employed men and women groups.

OLF Men and Women - Clusters 3 and 4:
Cluster results suggest interventions that focus on basic skills gaps. These skills gaps exist at the level of cognitive and noncognitive skills and can be addressed with educational retrofitting, teaching of labor market basics and job-search methods, and similar interventions; they should address identified labor market barriers and intensity issues. Addressing preference issues predominantly requires changes in attitude and behavior, which can be pursued efficiently through social marketing interventions, such as new role models presented in TV series.

Employed Men and Women - Clusters 1 and 2:
These groups profit most from: finessing their skills while working (e.g., working with employers to strengthen on-the-job learning and in-job training); increasing their job mobility (across firms and regions, and for some, across professions); and directly addressing barriers through gender-specific interventions and improved labor exchange services, especially for women in cluster 1, who have low education levels. Further, cluster 2 members mainly work in the public sector with below-average levels of preferences and work intensity; as such, improving public sector performance would be key to increasing their engagement levels.

Unemployed Men and Women - All Clusters:
Once individuals are profiled, personalized interventions can be implemented to assist them in finding employment. An example of profiling the unemployed and identifying personalized interventions is illustrated below, based on findings from section 3.

Unemployed Men
• Cluster 1: They are the least engaged, with low preferences and search intensity levels. Individuals with this profile have low education levels and English skills. They have the highest share among all profiles of old and married men. Preferences are low, as many are not willing to relocate for jobs, nor to work extra time or shifts, and they have low work attitudes. Intensity is low: many did not apply for more than one job, and they barely spent time updating resumes and searching for jobs. Cluster 1 would benefit most from interventions focusing on increasing the level of engagement in all dimensions. This implies interventions that would increase their level of education or provide them with job-specific skills, but also interventions that would change their attitudes toward work and help them with job searches. This profile is quite likely the hardest to activate.
• Cluster 2: They have the highest barriers. Individuals in this group have the lowest education levels and English skills among all profiles. Many face strong technical barriers when applying for jobs and almost all are young men. Their search intensity and preferences are higher than the average. They also have high reservation wages despite their low skills. Cluster 2 would benefit most from interventions that would increase their level of education or provide them with job-specific skills. They would also benefit from interventions that help them set the right expectations.
• Cluster 3: They have the lowest search intensity. Individuals in this group mainly check websites and newspapers for jobs and do not really know how to apply for jobs even though their preferences are high and barriers low. Around 18% have a vocational training education, 50% secondary or below, and the rest a bachelor's degree. Further, 40% are not proficient in English, which may play a part in how they search for jobs. Cluster 3 would primarily require intermediation services such as information on available job opportunities, job-search assistance, counseling, or support on how to prepare a resume or for an interview.
• Cluster 4: They are the most engaged, with the highest preferences and intensity and lowest barriers. Individuals with this profile are young and better educated. Similar to cluster 2, reservation wages are high. Cluster 4's members are young and market-ready but have slightly lower preferences. It may be the easiest profile to find jobs for if the right expectations are set.

Unemployed Women
• Cluster 1: They are the least engaged and are below average in all three dimensions. Women with this profile have the lowest education level and English skills. They have the highest share among all profiles of old and married women. Their work attitudes are low, many do not exert any search effort, and they face technical and social barriers. Paradoxically, they have high reservation wages. Cluster 1 is the hardest to place and would benefit most from interventions focusing on increasing the level of engagement in all dimensions.
• Cluster 2: They are also hard to place, with high barriers. Individuals with this profile are highly educated but face many social barriers. Their search intensity is higher than average, though. Their reservation wages are also high, driven by their preference for public sector jobs. Cluster 2 would primarily require intermediation services focusing on jobs that may be attractive to these women. Interventions also need to focus on changing mindsets through the use of behavioral tools.
• Cluster 3: They have the lowest search intensity. They are young with low education but good English skills. A significant share is not married. They have high attitudes toward work, but their main job search effort is looking through traditional advertisements. They also have the lowest reservation wages of all clusters. Cluster 3 would primarily require intermediation services such as information on available job opportunities, job-search assistance, counselling, and support on how to prepare a resume or for an interview.
• Cluster 4: They are the most engaged, with the highest preferences and intensity and the lowest barriers. They are highly educated and young women with a significant share that is not married. Cluster 4 members have the right preferences and intensity and low barriers, and they are market-ready. The actual barrier is likely to be effective labor demand.

Table 4.2 summarizes the profiling intervention proposals. This should provide a better overview, allow for consistency checks, and inspire thinking about interventions for clusters by the three engagement dimensions: preferences, intensity, and barriers. Table 4.2 raises three main observations: (i) measurement of engagement level by three independent dimensions allows developing interventions for each dimension separately; (ii) the level of the engagement dimension offers first indications on how much an intervention is needed; and (iii) determination of specific interventions that are both needed and most promising requires deeper analysis of the survey results. Table 4.2 also suggests that not all engagement dimensions in all clusters require an individualized intervention. The lower the overall engagement index/cluster number, the more interventions are seemingly needed. This follows directly from the fact that a lower-rated cluster signals deficiency in more than one or even all three dimensions. Higher-rated clusters require few or even no interventions, such as unemployed women in cluster 4.

Signaling Policy Effectiveness and Progress in Engagement
RELI can be applied to provide early signals of the effectiveness of an intervention for a subset of labor market participants. For example, consider an intervention for unemployed women in Cluster 1 to increase the intensity of job search. If the intensity of individuals is measured through the appropriate questions before and after the treatment, a measure of progress in the intensity dimension can be constructed. To this end, the weights of the pre-treatment intensity measurement need to be fixed and applied to the post-treatment intensity measurement (as an out-of-sample observation). This is similar to a Laspeyres price index where the consumed quantities are kept constant to measure the price level change between a base period and the current period.
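The Laspeyres analogy can be written out as follows. The first expression is the standard Laspeyres price index; the second is a sketch of the fixed-weight intensity measure under assumed notation, which is introduced here for illustration and is not taken from the paper: x_{j,t} is the value of variable j at time t, \mu_{j,0} and s_{j,0} are its baseline mean and standard deviation, w_{j,0} is its baseline PCA weight, and \sigma_0 is the standard deviation of the baseline PCA component.

```latex
% Laspeyres price index: base-period quantities q_{i,0} are held fixed.
P_L = \frac{\sum_i p_{i,t}\, q_{i,0}}{\sum_i p_{i,0}\, q_{i,0}}

% Fixed-weight intensity measure (assumed notation): baseline weights,
% means, and standard deviations are held fixed; only x_{j,t} changes.
D_t = \frac{1}{\sigma_0} \sum_j w_{j,0}\, \frac{x_{j,t} - \mu_{j,0}}{s_{j,0}}
```

In both cases everything carrying the subscript 0 is frozen at the base period, so a change in the index can only come from the quantity observed at time t.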
A hypothetical example is presented to describe the methodology. Consider the evaluation of a government intervention that aims to encourage women to update their CV more frequently. The methodology would be to randomly select two representative groups of women: a treatment group and a control group. The program is only implemented in the first group, but after a reasonable period of time, both groups have to answer the same questionnaire. In this case, the relevant questions are those used for the intensity dimension, defined as follows:

Table 4.3 shows the current baseline values of these questions, as well as a hypothetical situation after an intervention takes place. This example highlights two possibilities that can arise during execution of the program. First, since people are encouraged to update their CV more often, they also end up increasing their search actions. Moreover, they invest more time in their job search and thus take it more seriously. For this reason, the mean of five categories increases. Second, other events might take place outside the program. For example, Internet diffusion helps people to look for online jobs easily. Given that an Internet search is a type of search action, individuals can increase their search actions independent of their participation in the program if their access to the Internet improves. Hence, the treatment group can increase its search actions. Without having a control group, it would be very difficult to separate the program effect from other events happening in society.
Following the methodology presented in section 2, the intensity dimension for both groups is calculated using the baseline, as depicted in Table 4.4:
1. The variable values are standardized using the mean and standard deviation of the baseline.
2. These values are multiplied by the PCA scores of the baseline and divided by the standard deviation of the PCA component.
3. These values are added together to calculate the new intensity value of each group.
The results from the hypothetical example show that:
• Overall, the intensity of the treatment group is now 1.36 standard deviations above the average of the baseline group.
• Of that, 0.18 standard deviations correspond to events that occurred outside the program. Hence, the program effect is the remaining 1.18 standard deviations.
• Whereas the program originally targeted CV updates, it has a positive spillover effect on other components of the dimension. Indeed, CV updates explain an increase of 0.65 standard deviations. The remaining 0.53 standard deviations are explained by the program's externalities.
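The three steps and the decomposition above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names and all baseline inputs (`BASE_MEAN`, `BASE_STD`, `BASE_WEIGHTS`, `BASE_COMPONENT_STD`) are hypothetical placeholders, since Tables 4.3 and 4.4 are not reproduced here. Only the figures 1.36, 0.18, and 0.65 come from the text.

```python
import numpy as np


def group_intensity(group_means, base_mean, base_std, base_weights, base_component_std):
    """Score a group's variable means with weights fixed at the baseline."""
    # Step 1: standardize with the baseline mean and standard deviation.
    z = (np.asarray(group_means, dtype=float) - base_mean) / base_std
    # Step 2: apply the baseline PCA weights, scaled by the component's std.
    contributions = z * base_weights / base_component_std
    # Step 3: add the contributions together.
    return contributions.sum(), contributions


def decompose(treatment_score, control_score, targeted_contribution):
    """Split the treatment group's gain into program effect and spillover."""
    program_effect = treatment_score - control_score
    spillover = program_effect - targeted_contribution
    return program_effect, spillover


# Hypothetical baseline parameters for three intensity variables.
BASE_MEAN = np.array([0.3, 2.0, 4.0])
BASE_STD = np.array([0.4, 1.5, 3.0])
BASE_WEIGHTS = np.array([0.6, 0.55, 0.58])
BASE_COMPONENT_STD = 1.3

# By construction, a group sitting at the baseline means scores zero.
baseline_score, _ = group_intensity(
    BASE_MEAN, BASE_MEAN, BASE_STD, BASE_WEIGHTS, BASE_COMPONENT_STD
)

# Decomposition using the figures reported in the text.
program_effect, spillover = decompose(1.36, 0.18, 0.65)
```

The per-variable `contributions` returned by `group_intensity` are what allow a statement such as "CV updates explain an increase of 0.65 standard deviations".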

Lessons Learned
The index, as presented, was not constructed in one go. It was developed by trial and error, until both a defensible methodology and applicable results emerged. This process holds valuable lessons to better understand the index structure and may aid countries that want to adapt the index to their own needs. The following three sections summarize the most relevant of these lessons.

Adding Engagement Measures to Traditional Labor Market Measures
To measure the level of engagement, the analysis uses the dimensions of preferences, intensity, and barriers already developed, and selects and maps the survey questions into these dimensions. The weights are exogenous and uniform, and the aggregation of individuals by gender is straightforward. The results in Figure 5.1 appear promising and useful. This structure and approach have some clear drawbacks, though:
• The hierarchical order leads even the most engaged unemployed person to always have a higher distance than the most disengaged employed person.
• Using fixed weights risks significant over- and underestimation.
• The mapping and normalization of questions into engagement/disengagement measures is somewhat ad hoc.
• Individuals are aggregated across labor market status, even though they have not all answered the same questions.

Addressing the LMDI Limitations
These drawbacks of the LMDI led to a review of the index approach with two main goals:
• To separate labor market status and engagement level; i.e., to establish individuals' preferences, intensity, and barriers independent of their labor market status (OLF, unemployed, or employed); and
• To move from fixed weights for the engagement components to endogenous weights, which have better properties.
The solution presented herein is capable of achieving both goals, but also led to several conceptual issues, summarized next as three challenges.

Challenge 1: Fixed Versus Endogenous Weights
The first challenge faced during construction of the LMDI centered on the benefits of using fixed or endogenous weights. The main objective is to measure the dimensions of engagement, but since it is not possible to measure these with an explicit question, they have to be inferred from questions already available in the survey. By doing so, the index built represents, to some extent, the underlying variable.
There are two ways to proceed. The index can be constructed using either fixed weights or weights from the correlation structure of the data. By using fixed weights, the data can be compared across time. For example, if the weight of an extra hour of work in the intensity dimension of employed men is 0.4, then independent of the year, an extra hour of work increases intensity by 0.4 units. The main problem with this methodology is that weights can change across time. For example, consider a cultural change in which women are now expected to work. Before the change, an unemployed woman who dedicates one hour per week to searching for a job might manifest very high preferences because she is going against the status quo. After the change, dedicating an hour per week might be taken as laziness and will not manifest positive preferences; indeed, it might signal a lack of them.
A methodology that extracts the weights from the data solves this problem. Nevertheless, it suffers from a lack of comparability. The weights now depend on the sample; i.e., the weights calculated from two different populations, or from a population in two different time periods, can have different values. Taking that into account, consider a population of unemployed women who increase their intensity by one unit, and the only variable that changes is search hours; that is, now they do one more hour of job search. Was the change in the dimension due to the increase in search hours? Or to a change in the weights? It is not possible to distinguish the two effects, and therefore it is very difficult to understand the exact effect of the increase in search hours on the intensity of the population. Hence, it is not possible to evaluate policy based on this methodology.
Given these issues, the methodology in this paper takes a middle path. For the baseline, the weights are constructed using PCA. This guarantees that for each dimension, every variable has a weight that corresponds to the structure of the population. However, for policy evaluation, the methodology presumes that the overall structure of the population has not changed and therefore, the calculation of indices outside the baseline treats the baseline weights as fixed. This conceptual modification implies, in practice, that comparisons are now possible. Moreover, the underlying assumption is reasonable for scenarios that are not far in the future. Still, weights should be recalculated periodically according to changes in society.
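The comparability problem and the middle path can be illustrated on synthetic data. The sketch below is an assumption-laden illustration rather than the paper's code: it takes the weights from the first principal component of the correlation matrix (computed with NumPy), and the data, function names, and the one-unit shift in the first variable are all invented. With baseline weights held fixed, the follow-up sample shows a measurable gain; when everything is re-estimated within the follow-up sample, the standardized index has mean zero by construction, so the gain disappears.

```python
import numpy as np


def fit_weights(X):
    """Baseline mean, std, first-component weights, and component std."""
    X = np.asarray(X, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    weights = eigvecs[:, -1]      # eigh returns eigenvalues in ascending order
    if weights.sum() < 0:         # an eigenvector's sign is arbitrary; fix it
        weights = -weights
    return mean, std, weights, np.sqrt(eigvals[-1])


def score(X, mean, std, weights, component_std):
    """Index value per individual, in standard deviations of the fitted sample."""
    z = (np.asarray(X, dtype=float) - mean) / std
    return z @ weights / component_std


# Synthetic baseline: three positively correlated intensity variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=300)
baseline = np.column_stack(
    [latent + rng.normal(scale=0.5, size=300) for _ in range(3)]
)
# Synthetic follow-up: only the first variable rises, by one unit for everyone.
follow_up = baseline + np.array([1.0, 0.0, 0.0])

base_fit = fit_weights(baseline)
fixed_mean = score(follow_up, *base_fit).mean()                 # baseline weights
refit_mean = score(follow_up, *fit_weights(follow_up)).mean()   # re-estimated
```

Under fixed weights the gain equals the first variable's weight divided by its baseline standard deviation and the component's standard deviation, so it can be attributed to that variable alone.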

Challenge 2: Group Comparability
Who is more engaged in the labor market: an employed man who only works four hours per day because he does not want to work more or an unemployed woman who has to do housework for twelve hours per day and dedicates four hours to looking for a job? Under the current methodology, it is impossible to answer this question. Employed men and unemployed women are in different groups, and their indices are built using different sets of variables. Hence these values are not comparable. Two questions emerge from this challenge: (i) Is it appropriate to compare these groups? and (ii) If so, how?
The answer to the first question is not obvious. It would be ideal to organize all individuals of a population in a single line to understand who is in "better" or "worse" condition and to design public policy according to these rankings. Yet the reality is multidimensional. As explained in previous sections, some populations are embedded in very different social structures. For example, the current Saudi society has different expectations of the kinds of jobs men and women should perform. Thus, ignoring context and imposing the same standards on men and women creates inappropriate interpretations. In addition, the objective of a public policy may be to create a cultural change that reduces structural differences between the two groups. For example, consider a policy that promotes equal job opportunities for both men and women, independent of industry and occupation. For this type of policy, it would be very important to place both groups under the same framework to see if structural differences diminish over time. In this context, the second question becomes more relevant.
The solution for the second question is theoretically easy but in practice requires very careful consideration. If policy makers want to make two groups comparable, their indexes must have the same variables. This condition is necessary and sufficient to solve this challenge. Unfortunately, application of this idea is not trivial. Consider, for example, the intensity dimension and the unemployed and OLF groups. For the unemployed, intensity was linked to efforts made by individuals to find a job, while for the OLF, intensity was associated with their willingness to participate in the labor force in the future. From this exercise it is clear that even with two very similar groups, identifying common variables is not a trivial task. Fortunately, in other dimensions common variables are more easily constructed. For example, consider the barrier dimension. Independent of their gender or working status, people can have a family that supports them to work (or to find a job). Thus a question such as "Does your family approve that you work/look for a job?" is common to the barrier dimension of all groups. In this way, even if not all dimensions are comparable, some might become comparable after new questions are designed to harmonize the methodology across groups.

Challenge 3: Relative Versus Absolute Comparisons
Consider an individual with a RELIF of 1. Is this individual engaged in the labor market? The answer is not clear. For sure, this individual is more engaged than the average member of his group. Yet this does not mean that his level is good. If the group is very disengaged, 1.0 standard deviations above the group might not be enough to call this person engaged.
Implicit in the absolute comparison lies the idea of a minimum standard. To illustrate this, assume that the only measure of intensity for the unemployed is hours dedicated to a job search. Clearly, 0 hours per week suggests total disengagement while 168 hours per week (an unreasonable, yet possible quantity) undoubtedly suggests a very engaged individual. Yet it is not clear where to draw the line. For example, notice that 40 hours (8 hours per working day, as in many formal jobs) may sound excessive, but it is less than one-fourth of the available hours that an individual can dedicate per week to the job search. Phrased in this way, dedicating only 25% of the available time does not seem very engaged. To complicate the problem, consider the existence of multiple variables per dimension. From the PCA methodology, it is possible to derive the maximum and minimum values that an individual can achieve in any dimension. With this information, the variable can be normalized so that all variables are within the range 0-1. Yet two reasons explain why this exercise is not appropriate. First, it is possible that the theoretical maximum is not realistic (as in the example of the hours). Second, given that the index is composed of several variables, several combinations of values can have the same score. Thus, scores that are achieved when the standards of each individual variable are satisfied can also be achieved in cases when some of these standards are not fulfilled.
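Both objections to a 0-1 normalization can be checked with a small calculation. The sketch below uses two hypothetical variables (weekly search hours and number of applications) with invented weights and ranges; none of these values come from the survey, and the weighted sum stands in for the PCA-based index only for illustration.

```python
import math

HOURS_PER_WEEK = 7 * 24                    # theoretical maximum of search hours
full_time_share = 40 / HOURS_PER_WEEK      # share of the week used by 40 hours


def normalized_score(values, weights, lows, highs):
    """Weighted sum rescaled to 0-1 by its theoretical minimum and maximum.

    Assumes all weights are positive, so the extremes of the sum are reached
    at the extremes of each variable.
    """
    raw = sum(w * v for w, v in zip(weights, values))
    lowest = sum(w * lo for w, lo in zip(weights, lows))
    highest = sum(w * hi for w, hi in zip(weights, highs))
    return (raw - lowest) / (highest - lowest)


# Hypothetical weights and ranges: (search hours per week, applications sent).
WEIGHTS = (0.05, 0.4)
LOWS = (0, 0)
HIGHS = (HOURS_PER_WEEK, 20)

# Two very different profiles that end up with the same normalized score.
searcher = normalized_score((40, 0), WEIGHTS, LOWS, HIGHS)   # searches, never applies
applicant = normalized_score((0, 5), WEIGHTS, LOWS, HIGHS)   # applies, never searches
```

A person searching 40 hours a week without ever applying and a person sending five applications without any search time receive the same score, and both look weak on the 0-1 scale only because the theoretical maximum of 168 hours is unrealistic.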
Previous attempts to develop the engagement index expressed it as a percentage, which lends itself to some standards regarding what is good and bad. But per the previous discussion, the added value of this type of index is outweighed by its conceptual constraints. Thus this paper developed a methodology that uses relative comparisons to compare groups over time. Even if it is not possible to claim that a value is good or bad, it is possible to know if it has improved in comparison to a base year.

Revising the Survey and Questions
The idea for an index that measures the level and heterogeneity in labor market distance/engagement was only realized after the results of the questionnaire were analyzed and the need for such an index to better understand and present the results became clear. In consequence, the index maps existing questions that may not be ideal from an engagement measurement point of view and, furthermore, may differ across the main employment status categories. To improve the index's capabilities and quality, some questions will need to be revised, some new ones added, and some existing ones dropped. Key considerations are that:
• Identical questions across labor market status categories allow for easy aggregation, but utility and comparability come at a price. Not all questions have the same importance or the same meaning for all labor market segments. Hence asking all groups the same question incurs a tradeoff.
• Preparation and analysis of RELI revealed that some questions provide the same answer/outcome and thus can essentially be dropped (or can be used if they are drawn from a larger set of questions to explore the engagement levels of the unemployed registering for government services).
• To improve the interpretation of the dimensions, questions should be coded in accordance with the dimension; i.e., greater values should reflect higher engagement.
• New questions may need to be added if the use of RELI as a profiling instrument is strengthened. Individuals might learn how to answer these surveys in a way that their scores provide them economic benefits. Thus, updating the basic set of questions can help to reduce this bias.

Conclusions
The paper develops an index to measure engagement on the labor market, an issue of critical importance to labor market outcomes. The lower the level of labor market engagement, the worse labor market outcomes will be. RELI departs from traditional measures of labor markets in that it accounts for heterogeneity of individuals' engagement levels within the employed, unemployed, and OLF categories (and in the aggregate) based on three dimensions: preferences, intensity, and barriers. RELI is developed using data from a 2015/2016 labor market survey of nationals in the KSA. Findings offer very useful insights about the engagement differences by age cohort and education level. For example, on average, younger cohorts are more engaged in all dimensions than older cohorts. For women, having a vocational degree rather than a bachelor's degree reduces barriers to being engaged in the labor market.
The paper also presents a way in which the framework can be used to evaluate policies and to target interventions. A clustering technique along the index dimensions is used to group individuals with similar levels of engagement. Applying it to the KSA results in four clusters that require different interventions. For example, some clusters of the unemployed need search assistance while others need upskilling.
The main contribution of the paper is its capacity to measure differences in engagement levels across labor market categories. To the best of the authors' knowledge, the multivariate statistical techniques applied in this research have not been used in labor market analyses so far. Traditionally, labor economics assumes that all individuals are willing to work if the wage at least compensates the opportunity cost. By observing that the motivation of individuals depends on multiple dimensions, the paper constructs a conceptually grounded index that measures the engagement of individuals. Using this index, the paper demonstrates that the Saudi adult population is highly heterogeneous in its engagement level and that, in consequence, different policies are required for each of the clustered profiles. An extension of the paper would be to apply the index to foreign labor in the KSA and other GCC countries. The latter countries would be natural candidates to measure the level and differences of engagement among their national populations.