Skill mismatch among migrant workers: evidence from a large multi-country dataset


Introduction
Is overeducation more common among migrants compared to native workers? If so, is the overeducation incidence alike across migrants from various home countries and across various host countries? Are the arguments behind overeducation the same for native and migrant workers? This article unravels the migrants' incidence of skill mismatch, defined as the situation in which workers have jobs for which lower skill levels are required compared to their current educational level. The focus is on the skill mismatch of almost 700,000 native and migrant workers in 86 countries¹ over the period 2008-2013. Migrants are defined as workers not born in the country where they are currently living. In the sample, they originate from 79 countries and therefore represent a highly heterogeneous group, ranging from refugees and those who have migrated for economic reasons, to expats, intercultural married couples and others.
The academic discourse on mismatch in the labour market covers such issues as residential mismatch and hours mismatch, but this article focuses on what is generally considered the most relevant sort of mismatch: skill mismatch. Furthermore, the literature in the field often considers skill mismatch a unique phenomenon, treating overeducation and undereducation as one. Since they are two separate circumstances, in this paper we solely focus on overeducation and its explanations, leaving undereducation to further studies. The literature on skill mismatch can be classified into three categories. A number of studies have investigated the incidence of over- and undereducation, some of which provide breakdowns for specific groups in the labour market, such as gender and firm size. Many studies have addressed the impact of over- and undereducation, mostly on wages. Finally, an important body of knowledge relates to the dynamics of overeducation, that is, how educational requirements and the educational composition of the workforce have changed over time. As Leuven and Oosterbeek (2011) point out in their overview study, few studies have addressed the incidence of overeducation among migrants, although the literature has grown since they performed their study.
In our study, we use a dataset that is particularly suited to investigate differences in skill mismatch between native and migrant workers and that allows us to distinguish between different country of origin and destination combinations. This article extends the body of knowledge on migrants' overeducation in two ways. Firstly, we provide a multi-country perspective including nations and migrants from all continents, whereas most of the existing literature relies on single-country studies. Secondly, we test the specific relation between migrants' overeducation and some of the most widely accepted theoretical explanations for the phenomenon. We achieve these results by fulfilling three research objectives, which are to investigate (1) the factors affecting overeducation and whether migrants are more often overqualified compared to native workers, (2) the relation between overeducation and different country of origin and destination combinations, and (3) whether a range of theoretically based assumptions affect the incidence of overeducation and the extent to which they are relevant in the case of migrant workers.
The outline of this article is as follows. Section 2 concerns the theoretical and empirical literature with regard to the skill mismatch of migrant and native workers. In section 3, we describe the data and methods used. We present our results in section 4, and in section 5 we discuss our findings and conclusions.

What is skill mismatch?
Skill mismatch refers to the mismatch between a worker's educational attainment and the requirements of his or her job, whereby several types of skill mismatch are distinguished (for example, McGuinness and Sloane 2011). A vertical mismatch refers to a worker whose level of education is either above or below the educational level required for his or her job. Here, the terms 'overeducation' (also referred to as 'overschooling') and 'undereducation' are used. Educational level is a crude measure for indicating an individual's educational attainment or job requirements. For jobs, the skill-based approach seems more adequate, as are the terms 'overskilling' and 'underskilling'. However, skills are more difficult to measure than educational attainment. The most common method is to measure an individual's generic skills, for example in cognitive tests or in the OECD's IALS and PIAAC literacy surveys, whereas job-specific skill requirements are hardly used because they are far more difficult to measure. A horizontal mismatch refers to a worker who is educated in a field other than the one that his or her job requires. Particularly in Germany, the concept of occupational mismatch is clearly distinguished from that of educational mismatch because of the country's widespread vocational training system, which provides the majority of the labour force with a generally accepted qualification for a wide range of occupations (Burkert and Seibert 2007). This article focuses solely on vertical skill mismatch, and particularly on overeducation.
Studying skill mismatch requires information about the educational attainment of individuals as well as insight into the educational level required for jobs. The former is less subject to dispute than the latter. In country-specific surveys the educational attainment of individuals is measured mainly in terms of national educational categories. For cross-country comparisons, the ISCED classification, which distinguishes seven educational attainment levels, is most often used (OECD 1999). In order to collect information about the educational requirements of jobs, the most frequently applied method is asking individual workers to indicate the educational attainment required for their job or whether they have sufficient skills to perform their job. This is called the subjective method because it is based on surveys entailing workers' self-assessment (Van der Velden and van Smoorenburg 1997; Groot and Maassen van den Brink 2000; Jensen et al. 2007; Leuven and Oosterbeek 2011; Piracha et al. 2012).
A second method is called the objective method because it is based on expert classification of the education and skills required to perform particular jobs. Here, a wide range of approaches can be noted. One approach is to classify jobs according to broad job levels, for example, the four skill levels ranging from unskilled to highly skilled, distinguished by the International Labour Organisation (ILO) in the first digit of its ISCO-08 occupational classification (ILO 2007). In many countries, national statistical agencies have adopted ISCO in their labour market surveys, either by classifying occupations directly in terms of ISCO or by using crossover tables from a national occupational classification. Statistics Netherlands has attempted to classify the 1,200 occupations in its SBC classification in terms of seven job levels (CBS 1993). O*net, the occupations database in the United States that is based on desk research and company visits, indicates skill requirements for a large range of occupations (O*net 2002).² A third method is called the empirical method, whereby the mean (alternatively the median or the mode) number of years of schooling of all workers in a given occupation or group of occupations is compared to the schooling of an individual in the occupation. Individuals are defined as overeducated if their schooling level is more than one standard deviation above the mean (or median or mode) of all individuals in that occupation (Clogg and Shockey 1984; Verdugo and Verdugo 1989; Van der Velden and van Smoorenburg 1997). Chevalier (2003) applies a mixed method based on subjective and objective overeducation to distinguish between apparently overeducated and truly overeducated workers.
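The empirical method described above can be illustrated with a minimal sketch. The occupation labels, the years of schooling, and the function name are hypothetical; the one-standard-deviation cutoff follows the definition in the text, while the use of the mean and of the population standard deviation is an assumption made for illustration only.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flag_overeducated(workers, threshold_sd=1.0):
    """Flag workers whose schooling exceeds their occupation's mean
    by more than `threshold_sd` standard deviations.

    workers: list of (occupation, years_of_schooling) tuples (hypothetical layout).
    Returns a list of booleans in the same order as `workers`.
    """
    # Collect the years of schooling observed within each occupation.
    by_occupation = defaultdict(list)
    for occupation, years in workers:
        by_occupation[occupation].append(years)

    # Occupation-specific cutoff: mean plus one (population) standard deviation.
    cutoffs = {
        occ: mean(ys) + threshold_sd * pstdev(ys)
        for occ, ys in by_occupation.items()
    }
    return [years > cutoffs[occ] for occ, years in workers]


# Hypothetical data, not taken from any survey.
sample = [
    ("clerk", 10), ("clerk", 11), ("clerk", 12), ("clerk", 12), ("clerk", 18),
    ("engineer", 16), ("engineer", 16), ("engineer", 16), ("engineer", 16),
]
flags = flag_overeducated(sample)
```

In this toy sample only the clerk with 18 years of schooling is flagged. Because the cutoff depends on the schooling distribution within each occupation, it also shows why the choice of reference measure and threshold matters for the resulting incidence.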
Objections have been raised to all three methods. The first method is criticised because workers may be inclined to over- or understate the educational requirements of their job or simply to equate these requirements to their own level of education (Hartog and Jonker 1997). Furthermore, respondents may not always have a good insight into the level of education required to perform a job (Cohn and Khan 1995; Halaby 1994). The second method, the objective one, is criticised because it assumes that skill requirements within a given occupation do not vary (Halaby 1994). Based on a survey of school leavers, Van der Velden and van Smoorenburg (1997) conclude that job analysts systematically overestimate the level of required education, probably because they do not use the 'real' situation as the basis of their rating, but descriptions of the tasks and the nature and required level of knowledge and skills. The third method also ignores the variation in terms of educational requirements within an occupation. Additionally, the choice of the reference measure (mean, median or mode) and the choice of one standard deviation seem rather arbitrary (Halaby 1994). Therefore, Hartog and Jonker (1997) and Verhaest and Omey (2006) conclude that this should be the least preferred method for determining overschooling.

The incidence of skill mismatch
All studies on skill mismatch confirm the existence of some rate of overeducation among workers. Leuven and Oosterbeek (2011) carried out a meta-analysis of more than 180 studies covering countries in Asia, Europe (predominantly the EU15), the Americas and Australia over a period of five decades and concluded that, on average, 30% of the workforce is overeducated and 26% is undereducated. Overeducation is found less often in Latin America and most often in the USA/Canada. From the 1970s to the 1990s, overeducation declined, but then increased in the 2000s, although the authors note that this might be due to a single 2008 study. In an earlier meta-analysis, Groot and Maassen van den Brink (2000) concluded that the overall incidence of overeducation in the labour market appears to be about 26%.
The incidence of overeducation is likely to be affected by the measurement method. According to Leuven and Oosterbeek (2011), studies based on self-assessment and job analysis methods do not point to large differences in this respect, but the method based on the mean reveals lower levels of overeducation. Groot and Maassen van den Brink (2000) find that overeducation is more frequent when self-reported rather than when objective measures are used. Leuven and Oosterbeek (2011) found that although many studies find statistically significant relations between overeducation and individual characteristics, the specifications of these characteristics vary widely. More or less consistent findings across studies are that young people, women and migrants are more likely to be overeducated. Remarkably few findings refer to the incidence of overeducation for specific educational categories. Mavromaras et al. (2009) analysed the Australian HILDA Survey 2001-2006 and found that overeducation occurs more often in the top half of education brackets than in the lower half, pointing to a relative lack of high-skilled jobs.³ According to Leuven and Oosterbeek (2011), only a few studies have addressed the incidence of over- and undereducation among migrants. The available evidence indicates that migrants are more likely to be overeducated. Different arguments are presented as the source of divergence, for example, the imperfect international transferability of human capital (Chiswick and Miller, 2009a), low destination-country language skills (Green et al., 2007) or previous work experience in the home country (Piracha et al. 2012). Most of these contributions are at a national level. This is the case for the labour markets of Canada (Wald and Fang, 2008), Sweden (Dahlstedt, 2011; Joona et al., 2014) and Denmark (Nielsen, 2011).
In a study based on the Labour Force Survey in the United Kingdom, Lindley and Lenton (2006) suggest that immigrants initially experience higher overeducation but that this difference is eroded with time spent in the UK. The result of an analysis run on immigrants in Sweden (Joona et al., 2014), however, suggests that the persistence over time of overeducation among migrant workers is, at least, higher than among native workers. In a study based on the Longitudinal Survey of Immigrant Australians (LSIA), Green et al. (2007) conclude that migrants are more likely to be overeducated than the native population, even if the migrants entered the country in question on skill-based visas. They were better educated than the native-born population but were relatively less likely to be found in managerial and professional occupations and were overrepresented in unskilled work. The authors find that overeducation is greater for migrants from non-English speaking backgrounds (Korpi, 2012). Further details on home countries are provided by Battu and Sloane (2002), using a survey of ethnic minorities in the UK. They conclude that different ethnic groups have varying levels of overeducation, with the highest incidence of overeducation among Indian and African-Asian groups.
However, the results of a study of the US high-skilled labour market by Chiswick and Miller (2009b) show that overeducation is widespread among both migrants and the native-born. In the USA, the extent of overeducation declines with job tenure as high-skilled migrants obtain jobs commensurate with their educational level. Using the Longitudinal Survey of Immigrants to Australia, Piracha et al. (2012) reveal that a significant part of the variation in the migrants' probability of being over- or undereducated in the Australian labour market can be explained by having been over- or undereducated in their last job in the home country. Home country mismatch was notably large in the case of undereducation. We could not find in the literature arguments relating migrant mismatch to the country of origin/destination combination. Nevertheless, we believe that this distinction could be the cause of different mismatch incidence since, firstly, the extent of the international transferability of human capital might vary according to this factor; secondly, it could reflect different levels of dissimilarities among labour markets; and thirdly, language and ethnic differences might vary consistently among different combinations.
Turning to the dynamics of over- and undereducation over time and their methodological implications, there is a massive literature on upgrading and downgrading with regard to occupations. In the past 15 years, much of this literature has been devoted to so-called skill-biased technological change, assuming, and largely confirming, that in developed countries the educational requirements for a similar job within industries have increased over time, mainly due to technological developments (Berman et al. 1998; Machin 2001; Autor et al. 2003). Upgrading entails that, with tenure, the incidence of undereducation increases, whereas downgrading works the other way round. A second dynamic process refers to the inflation of qualifications, implying that new entrants are more likely to be overeducated. Third, dynamics over time may also be caused by fluctuations in labour market conditions, with alternating periods of scarce and excess labour supply: in periods of scarce supply, new entrants are more likely to be undereducated, whereas the reverse holds for entrants in periods of excess supply. No studies have yet revealed the impact of the economic crisis on the skill structure of the labour market, that is, whether more high-skilled than low-skilled jobs have been lost, or vice versa. Finally, in a study of skill mismatch among migrants, the dynamics over time caused by national migration policies should be taken into account. Policies that stimulate access for high-skilled migrants may affect the educational composition of relevant cohorts of migrants, but this also applies to more restrictive policies towards migration (Korpi 2012). Our study does not consider these dynamic processes.
Few empirical attempts have been made to investigate the longitudinal impact of over- and undereducation, while a legitimate question is whether job allocation frictions diminish over an individual's lifecycle. Korpi and Tahlin (2009) did not find support for the assumption that mismatch dissolves with the time individuals spend in the labour market. Using cross-sectional and panel data from the Swedish Standard of Living Surveys 1974-2000, the authors conclude that the overeducated are penalised early on by an inferior rate of return to schooling, from which this group does not recover.
A final caveat must be made here. Following Piracha et al. (2012), a match or mismatch is observed only for employed individuals. Skill mismatches may be larger for the unemployed labour force, for example, if the educational level of the unemployed does not match the educational requirements of relevant job vacancies. When assuming a higher incidence of mismatch for migrants, the fact that they may constitute a self-selected sub-sample might be overlooked.

Explanations of skill mismatch and their relation to the migrant condition
In this section, we explore some theoretical explanations of overeducation and the implications of such explanations for the higher incidence of overeducation among migrants. Most of the literature points to explanations related to job allocation frictions. Here, we present five explanations for overeducation that appear to be widely accepted.
A first explanation refers to the assumption that, to begin with, entry-level workers might have jobs for which they are overeducated and later on move to jobs that better match their educational attainment. In their overview studies, Leuven and Oosterbeek (2011) and Cedefop (2010) conclude that, according to many studies, younger workers are more likely to be overeducated than older workers. This supports the assumption that overeducation is part of an adaptation process in the early stages of a working career, in which it compensates for the lack of other human capital endowments, such as ability, experience or on-the-job training. Following this explanation, we investigated job allocation frictions in our empirical study by testing the assumption that the incidence of overeducation is higher among workers who have recently entered the labour market.
A second explanation details the assumption of job allocation frictions. This explanation refers to specific groups of workers when entering the labour market. It is assumed that workers with low bargaining power (for example, students with a job on the side, re-entering housewives for whom a job-education match has a low priority, or workers who have had unemployment spells and involuntary quits) will have jobs for which they are overeducated. This assumption is supported by a range of research results. According to Groot and Maassen van den Brink (2000), workers who have experienced a career break are more likely to be found in jobs for which they are overeducated. Sloane et al. (1999) found that overeducated workers had more unemployment spells and involuntary quits than others. The evidence of Sicherman (1991) showed that overeducated workers changed jobs more frequently and that they had less experience, tenure, and on-the-job training than correctly matched workers. In our empirical analysis, we investigated this type of job allocation friction by testing the assumption that the incidence of overeducation is higher among females and workers who have experienced unemployment spells and quits.
A third theoretical explanation refers to job allocation frictions due to labour market discrimination: employers have a preference for workers from the 'same group'. Field experiments show pervasive ethnic discrimination in many countries (OECD 2007). The condition of ethnic disparity is often embedded in the condition of migrant. Nevertheless, to better isolate this effect in our empirical study, we also investigated whether second-generation migrants (nationals whose parents were born in a foreign country) are more likely to be overeducated compared to native workers.
A fourth theoretical explanation of overeducation refers to job allocation frictions that are related to career mobility. This explanation assumes that individuals accept a lower-level job if the probability of promotion is higher (Sicherman and Galor, 1990). In our empirical study, we tested whether the incidence of overeducation is higher for jobs with good promotion prospects compared to jobs with average or poor promotion prospects.
A fifth theoretical explanation concentrates on job allocation frictions due to the poor abilities of individual workers. This assumption goes beyond the crude measurement of educational attainment and details a worker's ability as well as the skill requirements of a job. As regards the migrant population, a single ability has been investigated, namely the worker's mastery of the native language or lingua franca of the host country. Thus, in this approach the language ability of the worker is critical. According to a study carried out in Australia, workers from non-native language speaking backgrounds showed a higher and more persistent incidence of overeducation than those from native-language speaking backgrounds (Kler, 2005). In our empirical study, we tested whether migrants from home countries where the native language or lingua franca does not match that of the host country are more likely to experience overeducation. The theories presented and the proposed tests are summarised in Table 1.

Table 1 Overeducation theoretical explanations and proposed tests

| | Theories of overeducation | Proposed test |
| --- | --- | --- |
| 1 | Overeducation is part of an adaptation process in the early stages of a working career. | Overeducation is higher among workers who have recently entered the labour market. |
| 2 | Given job allocation frictions, workers with low bargaining power are more likely to be overeducated. | Overeducation is higher among females and workers who have experienced unemployment spells. |
| 3 | Labour market discrimination leads to overeducation. | Overeducation is higher among second-generation migrants. |
| 4 | Individuals accept overeducation if the probability of promotion is higher. | Overeducation is higher for jobs with good promotion prospects. |
| 5 | Overeducation is related to personal abilities (not measured by educational attainment). | Overeducation is higher among migrants whose native language or lingua franca does not match that of the host country. |

Data and definitions
This article is based on statistical analyses of the WageIndicator dataset. The WageIndicator project is currently running in 86 countries on five continents. It consists of national websites, each of which receives large numbers of visitors, primarily because the websites post a 'salary check' that provides free information on occupation-specific wages. Worldwide, the national WageIndicator websites attract large numbers of web visitors, more than 20 million in 2013. The websites are consulted by workers when making job mobility decisions or before annual performance talks or wage negotiations. The sites are also consulted by school pupils, students and re-entrant women facing occupational choices, and by employers in small and medium-sized companies when recruiting staff or negotiating wages with their employees.⁴ The WageIndicator dataset is derived from a web survey on work and wages that is posted on all national WageIndicator websites and is comparable across all countries. In return for the free provision of information, visitors are asked to complete the survey. Thus, the survey is voluntary, continuous, and multi-country.⁵ It contains detailed questions about, for example, education, occupation, skill mismatch, industry, country of birth, country of birth of parents, and, in some countries, ethnic group. Respondents are asked if they were born in the country of survey; if not, they can select a country from a list. In this article we use 'native workers' and 'migrant workers' to identify the two groups. The web survey does not allow the identification of return migration. The large sample size allows, for each country, a breakdown of migrant groups according to country of birth in order to better capture the heterogeneity of migrants. In order to improve the intelligibility of our results, countries of residence and countries of birth were grouped into continent classes (see section 3.2 for further details).⁶
This source has been used to describe and study a wide variety of labour-related issues. Often it has supplied information in circumstances where official sources (a) face technical difficulties, e.g., wage studies in African countries; (b) do not provide adequate sector breakdowns, e.g., De Vries and Tijdens (2010), Tijdens et al. (2013a), and Steinmetz et al. (2014), who approach several labour-related issues in the health sector; or (c) do not provide specific information, e.g., Guzi and de Pedraza (2015), who research workers' well-being. One of its main advantages is that it provides a solid base for international comparisons. Fabo and Tijdens (2014), for example, measure the demand for specific skills in occupations across countries, while Tijdens et al. (2013b) internationally evaluate the tasks implemented within different occupations. The role of the WageIndicator is also discussed within the larger debate on the growth of internet-based information in the social sciences. Kureková et al. (2014) bring it into play when presenting advantages and disadvantages in using online data for labour market analysis. Askitas and Zimmermann (2015) mention this web survey among the internet sources of human resources data. The data source has also been the subject of several methodological discussions (Andreadis, 2013, on response time analysis in web surveys; Tijdens, 2014, on dropout rates; Steinmetz et al. 2009a, and Steinmetz et al. 2013a on potential biases and correction methods).
We used the pooled annual data on 86 countries for the years 2008-2013. Note, however, that some countries joined the survey later than 2008 and that in some of the countries the question about skill mismatch has been asked only since 2009. We excluded respondents aged under 15 or over 70, unemployed people, school pupils, students and those who have never had a job. Altogether, 673,898 observations were included in the analysis. However, the response rate for some of the variables presented in the following section was less than 20% of the total, which led to a considerable reduction in the number of complete cases. In order to consider the possible bias arising from this, we present two analyses, one including and one excluding those variables.
The web survey is voluntary, and therefore using this data may have important drawbacks. By definition, a web survey is completed only by individuals with sufficient language and computer skills to read and answer the survey questions. This might be particularly off-putting for migrants and could lead to biased data due to the low representativeness of some specific demographic groups.⁷ We did not employ within-country weights since previous studies that used this dataset have shown how weighting to correct for these groups scarcely affects the means of some of the variables under study and that, in general, weighting volunteer surveys to control for socio-demographic composition does not solve the small bias in some specific variables (such as wages); see Steinmetz et al. (2009b) for further details. The problem, however, is not as bad as it seems because it can be assumed that literacy skills are higher among employed migrants compared to unemployed migrants, and the size of the group of employed migrants with insufficient literacy skills is relatively small compared to the labour force as a whole.
On the other hand, because of the characteristics discussed above, our dataset can represent the internet population, although not the entire world population. When focusing attention on migrants, the worldwide internet users form a population from which interesting information can be extracted.
In addition, the dataset presents other clear advantages. The main advantage is the large size of the sample and the large variety of countries of birth and destination, which allowed analyses to be performed on large groups even when considering specific continent of provenance and destination combinations. In addition, when our dataset is compared to the means of demographic variables known from other sources, the sample variable means do not deviate to a large extent. For example, two meta-analysis studies (Groot and Maassen van den Brink 2000; Leuven and Oosterbeek 2011) found an average incidence of overeducation ranging from 26% to 30%. Our dataset reveals 25% of overeducation in the overall sample. The EU Labour Force Survey shows an incidence of migrants among the total population aged between 15 and 64 years of 8.7% in the EU15 and 1.6% in the EU12. These figures compare with the 7.4% and 3.5%, respectively, registered for the employed population in our dataset. All things considered, we acknowledge the limitation of our dataset and therefore consider our findings to be exploratory rather than conclusive.

Methodology
We set up an analytical model in order to observe the relation between self-assessed skill mismatch and individual characteristics. The WageIndicator survey includes the question "Do your qualifications match your job?" The three response options are "Yes", "No, I am over-qualified for my job", and "No, I am under-qualified for my job". We only considered the difference between overqualification and the other options (proper match and under-qualification) and obtained a binary variable: the dependent variable in our model. We then assessed the relation between the probability of being overeducated and some idiosyncratic characteristics. These features can be classified into two groups. The first is composed of demographic characteristics (e.g., gender or country of residence), the second of proxy variables to test the five theoretical explanations of overeducation. The model specification is as follows:

$$ y = X_d \beta_d + X_t \beta_t + \beta_m m \qquad (1) $$

and the coefficients of this logistic regression model were estimated using the method of maximum likelihood.
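The recoding of the three response options into the binary dependent variable can be sketched as follows. The response wording is taken from the survey question quoted above; the function and variable names are hypothetical, and the handling of missing answers is deliberately left out.

```python
# Response option that marks a worker as overeducated (wording from the survey).
OVER_QUALIFIED = "No, I am over-qualified for my job"


def overeducated_indicator(response):
    """Return 1 for the over-qualified answer and 0 for the other two options
    (proper match and under-qualification)."""
    return 1 if response == OVER_QUALIFIED else 0


# Hypothetical answers from four respondents.
responses = [
    "Yes",
    OVER_QUALIFIED,
    "No, I am under-qualified for my job",
    "Yes",
]
y = [overeducated_indicator(r) for r in responses]

# Share of overeducated workers in this toy sample.
share_overeducated = sum(y) / len(y)
```

Collapsing proper match and under-qualification into a single category means the model contrasts overeducated workers with everyone else, which matches the focus on overeducation alone.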
In equation 1, $y$ is the logit of being overeducated, in other words, the natural logarithm of the odds of being overeducated versus the other options. $X_d$ is the matrix of the demographic characteristics, and $X_t$ is the matrix of the proxies of the theoretical explanations of overeducation presented in section 2.3; $m$ is a categorical variable distinguishing between native workers, native workers whose mother or father was born in a foreign country (second-generation migrants) and migrant workers. Accordingly, $\beta_d$ and $\beta_t$ are the vectors of the corresponding coefficients, and $\beta_m$ is the coefficient of the migrant condition.
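For reference, the standard logistic identities link the logit to the probability of being overeducated. The symbol $p$ is introduced here for illustration and is not part of the original notation; the odds-ratio reading of a coefficient is a general property of logistic regression, not a result of this study.

```latex
% Logit and its inverse: p is the probability of being overeducated.
y = \ln\frac{p}{1-p}
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-y}}

% A one-unit change in a regressor with coefficient \beta, holding the others fixed,
% multiplies the odds p/(1-p) by e^{\beta}.
\frac{p'/(1-p')}{p/(1-p)} = e^{\beta}
```

Under this reading, a positive coefficient on the migrant category would correspond to odds of overeducation above those of the reference group of native workers, other characteristics being equal.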
The variables labelled as demographic characteristics are a mix of personal and professional characteristics. We took into consideration features such as gender (male being the reference category) or country of residence. We grouped the country of residence into seven continental categories: the EU15⁸ (the reference category), the EU12,⁹ Africa, Latin America, Asia, North America & Oceania, and non-EU European countries (predominantly Russia and CIS countries such as Belarus and Ukraine). We did this because for some countries the number of respondents is limited and to keep the number of categories manageable. Because mismatch is considered to be dependent on educational attainment, we included a variable describing the educational attainment of each individual. For the sake of international comparison, we recoded the national educational categories into the worldwide International Standard Classification of Education classification 1997, as designed by UNESCO.¹⁰ The variable ranges from 1 (primary level of education and our reference value) to 6 (second stage of tertiary education, leading to an advanced research qualification). The analysis controlled for job-specific attributes. We considered the difficulties related to measuring job levels in section 2. In our study, we used a job level indicator, derived from the occupation variable. It is called 'corporate hierarchy' and is based on a mapping of the 1,700 occupations distinguished in the survey into six corporate hierarchical levels, ranging from 1 = helper to 6 = CEO, developed by the second author¹¹. Furthermore, a firm-size categorical variable and an industry variable, where jobs are classified under three main categories (agriculture and manufacturing; distribution related services; and other public, commercial and personal services) were included.
Altogether, six variables were included to test the theoretical explanations of overeducation. The age of individuals was used to proxy the entry-level workers effect (theoretical explanation number 1). 'Breaks' is a variable denoting the number of career breaks an individual has experienced. It was used to test the assumption that particular groups of workers with low bargaining power are more likely to experience overeducation (theoretical explanation number 2). The gender variable, which contrasts female with male workers, captures the same effect on overeducation. Theoretical explanation number 3 involves the role of ethnic discrimination. The categorical variable distinguishing between native, migrant, and second-generation migrant workers helps to identify this effect on overeducation. Survey respondents are asked whether they perceive that they have career opportunities in the organisation they are presently working for. The response to this question was included to test theoretical explanation number 4. Job allocation frictions due to the poor abilities of individual workers (theoretical explanation 5) were not tested on the native population, but this aspect was taken into consideration for migrant workers. Lower language ability is used as a proxy for poor individual abilities. Workers with lower language abilities are defined here as migrant workers born in a country with a native language or a lingua franca that does not match that of the host country.
The relation between overeducation and the migrant condition, which was the main focus of the present study, was studied from different points of view. First, we tested the effect of being a migrant controlling for all demographic and theory related characteristics (estimating equation 1). We then tested whether overeducation is similar across migrants from various home countries and across different host countries. To do so, we estimated equation 2, where the effect of country of residence, country of birth and their interaction is taken into consideration. We made use of the estimated coefficients to compute the predicted probabilities of all groups of migrant workers and compare them with the native workers' predicted probabilities on each continent.
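Since equation 2 is estimated through logistic regression, predicted probabilities are computed as follows. For natives,

$$P(\text{overeducated}) = \frac{e^{\text{intercept} + \text{cont. of residence coefficient}}}{1 + e^{\text{intercept} + \text{cont. of residence coefficient}}},$$

and for migrants the exponent additionally includes the continent of birth coefficient and the coefficient of its interaction with the continent of residence.

Finally, we modified equation 1, introducing interaction terms between the migrant condition and all other covariates (matrix $X_i$ in equation 3). Estimating equation 3 allowed us to observe whether the theoretical explanations for overeducation affect migrant and native workers in a different manner.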
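To make the computation of predicted probabilities from a logistic regression concrete, the following minimal sketch applies the inverse logit to a sum of coefficients. All coefficient values are hypothetical placeholders, not estimates from this study, and the way the continent of birth and interaction terms enter for migrants is an assumption based on the description of equation 2.

```python
import math

def predicted_probability(linear_predictor: float) -> float:
    """Inverse logit: probability implied by a sum of logit coefficients."""
    return math.exp(linear_predictor) / (1.0 + math.exp(linear_predictor))

# Hypothetical coefficients (NOT estimates from the paper)
intercept = -1.40
coef_residence = 0.10      # continent of residence
coef_birth = 0.15          # continent of birth (assumed to enter for migrants only)
coef_interaction = 0.05    # residence x birth interaction (assumed)

# Natives: intercept plus continent of residence coefficient
p_native = predicted_probability(intercept + coef_residence)

# Migrants: the birth and interaction coefficients are added to the exponent
p_migrant = predicted_probability(
    intercept + coef_residence + coef_birth + coef_interaction
)

# Difference between the two groups, in percentage points
gap_pp = 100 * (p_migrant - p_native)
print(f"native: {p_native:.3f}, migrant: {p_migrant:.3f}, gap: {gap_pp:.1f} pp")
```

A difference of this kind, computed for each combination of continent of birth and residence, is the type of quantity compared between migrant and native workers.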
4. Empirical findings on skill mismatch

4.1 Descriptive analysis of skill mismatch and migration
Table 2 provides basic information about the age and gender distribution of the workers in the dataset. We made use of 673,898 observations. Individuals between 15 and 45 years represent around 80% of the total; within this age group, males comprise the largest share (around 56%). Table 3 shows the respondents' continent of residence and continent of birth, and the matrix of continent of residence/birth. It helps to understand the incidence of the migrant population and its provenance on each continent. Our dataset is clearly European focused, since around 60% of the respondents work in a European country. Within Europe, we distinguish between EU15, EU12 and non-EU European countries. Central & South American workers and Asian workers account for 19% and 14% of the surveyed population, respectively. Africa and North America & Oceania are clearly underrepresented, with a share of 6% and 1%, respectively. The incidence of migration varies widely across continents. In Central & South America and in the EU12, the presence of migrants is the lowest, accounting for around 3% of the surveyed population. Migrants represent 5% of the population in Asia and around 10% in the EU15, Africa and the non-EU European countries. Finally, migrants make up 25% of the (low number of) survey respondents from North America & Oceania. Not surprisingly, within-continent migration accounts for most of the flows observed. Only in the cases of Asia and North America & Oceania do they not represent the largest share of the phenomenon.¹² In the former case, the relevance of migrants from non-EU European countries is evident. In the EU15, the largest group of migrants, more than 40% of all migrants (2.40% of the total population), comes from another EU country; Central & South America is the second largest region of provenance (20% of migrants, 1.59% of the total population).
The incidence of overeducation in the various countries is shown in Figs. 1, 2 and 3, which help to disentangle the relation between overeducation and migration status. At first, it is worth noticing how overeducation is a phenomenon that affects countries, and therefore continents, differently. Figure 1 shows how the incidence of self-assessed overeducated  workers in the total survey population is around 20% in EU countries. Around the same figure is observed in African countries, while its incidence is slightly higher in North America & Oceania (23%), Central & South America (26%) and Asia (27%). More than one third of respondents from non-EU European countries stated that they were overeducated.
When it comes to establishing whether overeducation has a different incidence among native and migrant workers (Fig. 2), it appears this is the case on every continent except in the non-EU European countries. Nevertheless, the relation between migration and overeducation can be of two kinds. In Europe and Asia, overeducation affects migrants to a greater extent than it does native workers. Its incidence appears to be around 5 percentage points higher: 25% versus 20% in the EU15, 26% versus 21% in the EU12, and 33% versus 28% in Asian countries. Overeducation affects migrants to a lesser extent than native workers in African, Central & South American, and North American & Oceanian countries. On these continents the phenomenon is around 3 percentage points lower among immigrants. The migration phenomenon differs across countries in terms of magnitude as well as country of provenance, and these differences reflect differences in the reasons behind the migration decision. Differences in the incidence of overeducation with respect to the native population might be a consequence of the different reasons for which people migrate. Figure 3 presents a comparison of the incidence of overeducation per continent of residence and distinguishes between native and migrant workers according to their continent of birth. There are some clear patterns. EU15 migrants, for example, show just a slightly higher overeducation incidence with respect to native workers in other EU15 countries and European non-EU countries, and they are clearly less affected by overeducation when they migrate to the rest of the world. North American & Oceanian migrant workers are in a similar situation, with a lower incidence of overeducation than native workers in all countries except the EU12 countries. Quite the opposite case is shown by EU12 migrants, who are always (except in North America & Oceania and non-EU Europe) more overeducated than native workers. Non-EU European migrants show a similar pattern.
Differences in the incidence of overeducation with respect to native workers among African, Central & South American and Asian migrants depend on the continent of destination. It is clear that the differences in the incidence of overeducation depend not only on the country of residence but, as can be expected, also on the bilateral relation between country of origin and country of destination.
The findings of our data exploration show the need for a deeper investigation into the relation between overeducation and migration, focusing on workers' idiosyncratic characteristics.
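The native-migrant gaps quoted above can be read either as differences in percentage points or as relative differences, and the two are easily confused. A minimal sketch using only the incidences quoted in the text for the EU15, the EU12 and Asia (the variable names are illustrative):

```python
# Incidence of overeducation (%) quoted in the text: (migrants, natives)
incidence = {"EU15": (25, 20), "EU12": (26, 21), "Asia": (33, 28)}

gaps = {}
for region, (migrants, natives) in incidence.items():
    point_gap = migrants - natives              # gap in percentage points
    relative_gap = 100 * point_gap / natives    # gap as a percent of the native rate
    gaps[region] = (point_gap, round(relative_gap, 1))

for region, (points, relative) in gaps.items():
    print(f"{region}: {points} percentage points, {relative}% relative to natives")
```

The gap is 5 percentage points in all three regions, while the relative gap shrinks as the native incidence rises.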

Overeducation and idiosyncratic characteristics
Our first research objective was to assess whether personal demographic and job-related characteristics are related to self-assessed overeducation. This analysis was aimed at obtaining evidence as to whether overeducation is related to personal characteristics (which, in turn, reflect national labour market characteristics), assuming that skill mismatch varies with workers' characteristics, such as educational attainment and job level. The first column of Table 4 reports the results of the logistic regression analysis, run on the total survey population, including what we defined in section 3.2 as demographic characteristics. The covariates' coefficients behave as expected, and the corresponding odds ratios are presented. According to the results, female workers are over 30% more likely to be overeducated than male workers. It is also interesting to observe the (expected) significant relation between overeducation and education level: the higher the education level attained by a worker, the greater the probability of being overeducated. When considering job-specific characteristics, it turns out that the corporate hierarchy has a significant negative relation to overeducation. This relation appears to be close to linear, since high-level workers (CEOs) are 77% less likely to be overeducated than low-level workers (helpers). Firm size also seems to influence the probability of being overeducated: the probability is lower in large firms than in small firms. Workers in the trade and transport sectors are the most likely to experience overeducation (around 60% more likely than agriculture and manufacturing workers). Finally, as regards geographical differences, the results confirm our previous findings: non-EU European workers are the most likely to experience overeducation, followed by Asian and Latin American workers.
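Because the regression results are reported as odds ratios, statements such as "30% more likely" describe changes in the odds, which differ from changes in the probability itself. The sketch below illustrates the conversion. The odds ratios 1.30 and 0.23 are round values implied by the percentages in the text, and the 20% baseline probability is an assumption chosen for illustration, not an estimate from this study.

```python
import math

def percent_change_in_odds(coef: float) -> float:
    """Percentage change in the odds implied by a logit coefficient."""
    return 100.0 * (math.exp(coef) - 1.0)

def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Probability obtained after multiplying the baseline odds by odds_ratio."""
    odds = baseline_prob / (1.0 - baseline_prob) * odds_ratio
    return odds / (1.0 + odds)

# Round odds ratios implied by the text (illustrative, not the exact estimates)
or_female = 1.30   # "over 30% more likely"
or_ceo = 0.23      # "77% less likely" than helpers

# A coefficient of log(0.23) corresponds to a change in the odds of about -77%
change_ceo = percent_change_in_odds(math.log(or_ceo))

# Assumed baseline probability of 20% for the reference group
p_female = apply_odds_ratio(0.20, or_female)
p_ceo = apply_odds_ratio(0.20, or_ceo)
```

With these assumed values, a 30% increase in the odds raises the probability from 20% to roughly 24.5%, a relative increase of about 23% rather than 30%.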

Theoretical explanations
Our first research objective also comprised testing whether a range of theoretically based assumptions affect the incidence of overeducation. We tested whether the theoretical explanations for overeducation presented in section 2.3 hold in our dataset. For this, we made use of the proxies presented in section 3.2. The second column of Table 4 shows the regression results when these covariates are included. As expected, overeducation is a phenomenon that affects younger workers to a greater extent: as each year passes, the probability of the typical worker being overeducated decreases by more than 1%. Poor bargaining power, measured here by the number of career breaks workers experience, has a significant effect on the probability of having jobs for which they are overeducated. For each additional career break, this probability increases by almost 10%, confirming that overeducated workers change jobs more frequently. Ethnic differences also matter, according to our findings: second-generation migrants who are nationals are 15% more likely to be overeducated than other nationals. The fourth theoretical explanation refers to job allocation frictions related to career mobility, assuming a direct relation between overeducation and probability of promotion. Our results do not seem to support this idea. There is no trade-off between good job prospects and accepting being overeducated. Indeed, the opposite seems to be confirmed: workers who believe they have good job prospects in their present organisation are 60% less likely to be overeducated. This outcome suggests that there could be a clear distinction concerning the quality of jobs, with good skill match and proper career development opportunities opposed to low quality positions with poor match and career opportunities. It has to be highlighted, however, that the job prospects variable has a non-negligible number of missing values, with a response rate of only 18%, which might bias the results for the whole sample.
A similar response rate is found in the industry variable (25%). Results omitting these variables are therefore reported in the third column of Table 4. In order to avoid these biases, these variables were excluded from the subsequent analysis.
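The per-unit effects reported above compound multiplicatively in the odds. A minimal sketch, taking 0.99 and 1.10 as round per-unit odds ratios suggested by the text ("more than 1%" per year of age, "almost 10%" per career break); these are approximations, not the estimated coefficients:

```python
def cumulative_odds_ratio(per_unit_odds_ratio: float, units: int) -> float:
    """Odds ratio after `units` one-unit increases of a covariate in a logit model."""
    return per_unit_odds_ratio ** units

# Approximate per-unit odds ratios read from the text (assumptions)
OR_PER_YEAR = 0.99    # odds fall by about 1% per additional year of age
OR_PER_BREAK = 1.10   # odds rise by about 10% per additional career break

ten_years = cumulative_odds_ratio(OR_PER_YEAR, 10)     # a worker ten years older
three_breaks = cumulative_odds_ratio(OR_PER_BREAK, 3)  # three career breaks

print(f"10 years older: odds multiplied by {ten_years:.3f}")
print(f"3 career breaks: odds multiplied by {three_breaks:.3f}")
```

Under these round values, ten additional years lower the odds by just under 10%, while three career breaks raise them by about a third.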

Overeducation and the migrant condition
The main aim of this study was to assess whether overeducation is more common among migrants compared to native workers. The last column of Table 4 shows the regression outcome when considering the full equation 1 including a discrete variable describing the migrant condition. It confirms that overeducation occurs more often among migrant workers. When considering the whole dataset, it is clear that being a migrant has a significant relation to the probability of experiencing overeducation: migrants are 10% more likely to be overeducated than native workers.
We clearly observed in section 4.1 that the migrant condition has a different effect on the overeducation incidence depending on the continent of residence and that of birth. We therefore estimated equation 2 taking into consideration the personal characteristics presented above as well as the interaction between the continent of birth and that of residence. We then computed the overeducation predicted probability for each country of birth/residence combination and compared it with the probability of native workers. The differences between migrant and native workers are reported in Table 5.¹³
The analysis so far has relied on the strong assumption that the covariates' effects have the same intensity across countries, in which case the covariates' coefficients should be the same (or at least similar) across different countries. Furthermore, we are treating countries with high and low survey penetration similarly. In order to test the extent to which these assumptions hold, we ran the regressions based on equation 1 for 10 country-specific datasets¹⁴ and a dataset composed of the information collected in the 46 countries with the lowest survey penetration. We then compared the odds obtained. The control countries were Germany, the Netherlands, Belgium, Belarus, Ukraine, Kazakhstan, Indonesia, Brazil, Argentina, and South Africa. The results are presented in Appendix 1. The means and standard deviations presented suggest that the assumptions essentially hold. Demographic variables' coefficients show a very low variation across countries. The age effect shows a standard deviation of 0.07; the gender effect is negative for female workers in all but non-EU European countries. Job-related variables, such as the role of breaks and position in the corporate hierarchy, are very similar across the world. The coefficients concerning the role of educational attainment vary the most; nevertheless, almost all countries share an increasing effect as education levels grow.
In conclusion, we acknowledge that the strong assumption of the same intensity for the covariates in different countries can be accepted.
It is well known that migration leads different types of individuals to move to certain destination places according to their personal (observable and unobservable) characteristics (Borjas 1987, 1989). This selection process can, in turn, affect the observed probabilities of migrants to be overeducated. According to the literature, individuals make the migration decision upon consideration of the utility they can get from the different options, which in turn are driven by both host and source countries' socio-economic conditions. Studying the migration assessment process goes well beyond the purpose of this paper. Nevertheless, we need to acknowledge that migrants across countries are non-randomly selected individuals and that the characteristics that drive the migration decision might also be related to the probabilities of these individuals to be overeducated. It is possible that potential migrants compare the professional development prospects in the various countries and choose to move to the country with the best professional fit, of which overeducation is a measure. In equation 1 we can control for those observable characteristics influencing the fit; however, there could be some unobserved characteristics that simultaneously affect the migration decision and the overeducation outcome. Not controlling for these characteristics might lead to biased results. Since Table 3 shows that observed characteristics explain overeducation only to a limited extent (given the low pseudo-R2), unobserved idiosyncratic differences among individuals could therefore be relevant in our study.
In order to take into account this self-selection mechanism and observe to what extent it affects our results, we estimated a system of equations where the probability of overeducation is jointly estimated with the probability of being a migrant. This two-stage method (first developed by Heckman 1976) corrects for selection bias. Furthermore, it produces an estimate of the extent to which the individuals' unobservable characteristics are simultaneously related to both the probability of being a migrant and the probability of being overeducated. This is measured by the rho coefficient, which can vary in absolute value from 0 to 1.
Results of our estimation are presented in Appendix 2. The significant and negative rho coefficient observed when the system is estimated on the whole dataset confirms the presence of a self-selection bias. There are personal characteristics among migrants responsible for the migration decision, as well as for the fact that the person considers herself overeducated in the present job, even though they affect those probabilities in opposite directions. However, the value of the coefficient is limited, ranging in absolute value from 0.23 to 0.36. When the system is estimated for two specific countries (the Netherlands and Germany), the rho coefficient is not significantly different from 0. We therefore recognise the relevance of the self-selection process in the overeducation of migrant workers; nevertheless, we consider that treating it as negligible does not change the fundamental results of the analysis that follows.
The second research objective was to observe whether the probability that the migrant status affects overeducation varies consistently by continent of residence and of origin.
Despite the large variety among different combinations, observing the situation by migrant groups (by columns in Table 5), some patterns emerge. For example, when EU15 citizens migrate they seem to show a lower overeducation incidence than native workers, with the exception of other EU15 countries, where they are slightly more affected. African migrants, on the other hand, are more overeducated, especially when moving to European countries. Central & South American migrants are more likely than natives to experience overeducation when migrating to EU15 and EU12 countries but less likely when migrating to Asia, Central & South America, and Africa. Given the limited number of respondents residing in North America & Oceania, we can hardly draw conclusions from the reported figures.
If personal characteristics and the migrant condition are related to overeducation, it is worth studying how these features interact. Our third research objective was to examine the relation between the theoretical explanations of overeducation presented in section 2.3 and the migrant condition. The effects of the interaction between the migrant condition and the proxy variables standing for the theoretical explanations are presented in Table 6.¹⁵ Note how young migrant workers are no more overeducated than older migrants, ceteris paribus. It seems that while a native worker's probability of being overeducated decreases over his or her lifecycle, this is not the case for migrants. The effect is indeed the opposite: the odds of being overeducated increase slightly as the years go by. Time does not 'cure' migrants' overeducation as it does with native workers. This finding, which is in line with previous studies, indicates that overeducation is a problem that affects migrants not only during their integration process, but also in the long run. Although the low bargaining power assumption still holds for migrants, its effect is significantly smaller. A native worker is 10% more likely to be overeducated for each career break experienced, while this value is reduced to half in the case of migrants. When observing the poor bargaining power associated with gender, it seems that there is no significant difference between native and migrant workers. In section 2.3 it was also hypothesised that workers facing discrimination by employers are more likely to report overeducation. This third theoretical explanation (ethnic discrimination) was not tested for migrants, given the absence of a specific proxy variable that allowed for ethnic distinctions.
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' Note: Results concerning the interaction between the migrant condition and demographic covariates are omitted and are available from the authors upon request.
Career mobility has the same (unexpected) effect for both native and migrant workers: the better the career prospects, the lower the propensity towards overeducation. We then tested the job allocation frictions due to the poor abilities of migrant workers. Not sharing the same mother language with native workers is assumed to be a proxy for poor abilities. The language mismatch increases the probability of being overeducated by approximately 10%. In conclusion, given the relations to age, career breaks, and poor job prospects reported here, overeducation appears to be a persisting problem that affects immigrants in the long run.
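In a logit model with interaction terms, the effect of a covariate for migrants is the sum of the main coefficient and the interaction coefficient, so the corresponding odds ratios multiply. A sketch with round numbers: the native odds ratio of 1.10 per career break comes from the text, while reading "reduced to half" as an odds ratio of about 1.05 for migrants is an assumption made here for illustration.

```python
import math

# Main effect: native workers, about 10% higher odds per career break (from the text)
beta_breaks = math.log(1.10)

# Assumed odds ratio per career break for migrants ("reduced to half", read as ~5%)
or_migrant_assumed = 1.05

# Interaction coefficient implied by the two values above
beta_interaction = math.log(or_migrant_assumed) - beta_breaks

# Odds ratios recovered from the coefficients
or_native = math.exp(beta_breaks)
or_migrant = math.exp(beta_breaks + beta_interaction)
or_interaction = math.exp(beta_interaction)   # below 1: the effect is weaker for migrants

print(f"native: {or_native:.2f}, migrant: {or_migrant:.2f}, "
      f"interaction: {or_interaction:.3f}")
```

An interaction odds ratio below 1 therefore does not mean that career breaks lower migrants' odds of overeducation, only that they raise them by less than for natives.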

Conclusion
Skill mismatch is more common among migrants compared to native workers, although the incidence differs across migrants depending on the country of residence and the country of origin. Furthermore, idiosyncratic workers' characteristics affect native and migrant workers' overeducation differently. To achieve these exploratory results, this study used data from a survey in which workers themselves assess whether they are qualified or overqualified for their job. The data stem from a continuous and voluntary multi-country web survey, extracting 673,898 observations for the years 2008-2013 from 86 countries. The dataset is European focused: almost 60% of the respondents work in the EU or in a non-EU eastern European country. The share of migrants varies, depending on the continent, from 3% to 10%.¹⁶ Within-continent migration represents the greatest share of migrants. The main disadvantage of the dataset is embodied in the low representativeness of some specific demographic groups; nevertheless, when our dataset is compared to the means of demographic variables known from other sources, the sample's demographic means do not deviate to a large extent. The main advantages are its ample size and the large variety of country of origin and destination combinations, which allows for detailed analysis of different migration flows. Nevertheless, given the limitations of our dataset, we consider our findings exploratory rather than conclusive.
Overeducation affects migrants to a different extent according to the country of residence. Migrants working in Europe and Asia are more likely than native workers to be overeducated, whereas migrants in Africa and Latin America are less likely to be overeducated compared to native workers. These findings might be the sign of two kinds of migration. Nevertheless, differences in the incidence of overeducation between native and migrant workers are related not only to the country of residence, but also to the combination of country of origin and destination. Immigrants experience a higher incidence of overeducation than native workers, although the disparity differs by migrant origin and destination. For example, the incidence of overeducation is higher compared to native workers for EU12 migrants living in an EU15 country, and it is lower for EU15 migrants working in the EU12. This finding is confirmed by both an exploratory and an in-depth analysis.
As regards the relation between personal (demographic and job-related) characteristics and overeducation, our dataset confirms the results most frequently highlighted by the literature. Our analysis shows, not surprisingly, that female workers are more likely to be overeducated. The higher the individual's level of education, the more overeducation can be expected; and the higher the individual's job level, the less overeducation can be expected. Controls for firm size and industry reveal that overeducation occurs more often in small firms compared to large firms and more often in trade, transport, and hospitality compared to the other commercial services or to primary and secondary economic activities. Thus, the characteristics of both workers and national labour markets influence the incidence of overeducation.
A few theoretically based assumptions, grounded in the literature on the subject, are used to explain overeducation. When tested, they reveal that recent labour market entrants are more likely to be overqualified. This inclination decreases over time. Workers with poor bargaining power (e.g., workers with several career breaks) are also inclined towards a higher overeducation incidence. Employer discrimination is assumed to increase the incidence of overeducation as well. Indeed, second-generation migrants are prone to labour market discrimination, and this in turn seems to relate to the likelihood of overqualification. Finally, a trade-off between job prospects and overeducation is hypothesised, with workers accepting temporarily a job for which they are overqualified. Our analysis refutes this hypothesis.
When the migrant condition was introduced to test whether it is related to the probability of being overeducated, our study confirmed the higher incidence of overeducation among the migrant population. In the search for personal characteristics determining the migrant educational mismatch, we tested the same theoretically based assumptions specifically for the migrant population. The relation between overeducation and these personal features found for the total population does not always hold in the case of migrants. Age is not negatively related to the skill mismatch as it is in the case of native workers, while being part of a poor bargaining power group (for gender or career history reasons) seems to have less effect on the overeducation incidence among migrant than among native workers. Good job prospects are related to lower overeducation, as in the case of native workers. Furthermore, other migrant-specific conditions, such as sharing the same mother language with native workers, matter, reducing the chances of overeducation. We conclude that overeducation is a persisting problem that affects immigrants over the long run to a greater extent than it does native workers.
This study confirms the existing literature on the effect of personal characteristics on overeducation and extends it to the case of migrant workers. In addition, our analysis classifies migrants into groups according to their country of origin and destination combination, something that had not been done before, and shows that this sort of classification needs to be controlled for when researching overeducation among migrants. However, this article calls for further investigations into the theoretical underpinnings behind the country of origin and destination combination as a factor affecting overeducation. It also underlines the need for further empirical research on comparative cross-national differences.
Endnotes
1 Of whom, 42% live in an EU country, 18% in a non-EU European country, 18% in Latin America, 13% in Asia, 6% in Africa, and 1% in North America & Oceania.
2 For the purpose of matching jobseekers to vacancies, skill requirements need to be far more detailed. This is usually done by professional job analysts, who analyse skill requirements in job advertisements, study realised job matches, or undertake company studies of required skills. However, this method typically addresses a selected set of occupations and does not cover all occupations in a national labour market, as the latter is a huge undertaking.
3 The latter finding could also be justified by the fact that a certain level of education is a necessary condition to observe overeducation.
4 The project's website is www.wageindicator.org.
5 Note that web surveys based on email invitations from a large database (panel) of respondents are also volunteer surveys. Only a very few web surveys, such as the LISS panel from Tilburg University, are randomly sampled using non-internet sampling frames. Note further that randomly sampled surveys may also be biased in the case of substantial non-response, with response rates nowadays dropping below 50% in many surveys.
The six levels within the corporate hierarchy are: 1 = Helper, 2 = Occupation, 3 = First line supervisor, 4 = Departmental manager, 5 = Manager, company director, or chief executive, 6 = CEO. Respondents self-identify their occupation from a database of 1,700 occupational titles, which is sufficiently large for the vast majority of respondents, through text string matching and through a search tree. The second author has classified these 1,700 occupations according to the corporate hierarchy.
12 This is not surprising in the case of the North America & Oceania group, which is composed of only four countries.
13 Predicted probabilities for natives and migrant groups are computed and compared. Table 5 presents the difference of the predicted probability of each migrant group with respect to the corresponding native population on each continent. The regression results and the predicted probabilities are presented in Table 10 in Appendix 3.
14 Countries were chosen taking into consideration continent representativeness and data availability criteria.
15 The regression results are reported in Table 11 in Appendix 3.
16 With the exception of North America & Oceania, where we registered an incidence of 25% of migrant population. Nevertheless, this value is probably biased by the low number of respondents from that geographical area.