Effects of urban environmental attributes on graduate job preferences in Northeastern China: an application of conjoint analysis and big data methods

A constant supply of novel ideas and contributions from all economic sectors is required to further the sustainable development of cities. Therefore, there is a growing need for well-educated graduates to enter metropolitan job markets. As urban environments and culture have been shown to affect a graduate's eventual career choice and trajectory, governments often seek to change their local environments to attract graduates who can help efficiently allocate and utilize a city's often-limited environmental budgets. In this study, the conjoint analysis (CA) method was employed to explore the effects of four environmental attributes (water pollution, air pollution, littering, and green area) on graduate employment preferences in Northeastern China. Water pollution was shown to have the greatest effect on graduate preferences (43.6%), followed by air pollution (34.1%), littering (20.7%), and green area (1.6%). According to this ranking of importance, cities could improve their environmental attributes to maximize their attractiveness to Northeastern graduates. Moreover, this study applied the Baidu index (a big data sharing platform) to improve the attribute selection process of the CA method. The improvement reduced the cost of the CA method and enhanced its objectivity.


Introduction
The development of sustainable cities requires significant human resources [1]. However, it is difficult to find well-educated employees and retain them over the long term [2]. In addition to this human capital, urban sustainability is directly affected by several other factors, particularly the local environment, economy, and society [3]. City governments can adjust these three factors to efficiently attract well-educated human resources and effectively advance sustainable development goals. Therefore, Chinese metropolitan governments have issued many policies that aim to attract skilled workers.
Specifically, cities in Northeastern China have been attracting highly educated graduate workers. Northeastern China consists of three provinces (Heilongjiang, Jilin and Liaoning) with five cities in
In this article, the term 'job preferences' is used to refer to various employee tendencies regarding the process of career selection. Typically, people decide where to live and work by measuring several critical attributes and comparing them between cities. These attributes often include wealth effects [8], city size, geographical preferences of employees [9-11], social services, income, and population [12]. Previous studies have also indicated that local environmental attributes affect job preferences [13]. Indeed, many economically prominent regions have created initiatives that promote environmental sustainability and clean-up to attract well-educated talent [14]. However, while numerous studies have suggested that environmental attributes are important for attracting talent, the relative importance of different attributes has yet to be explored.
Previously, studies have sought to quantify the level or health of urban environments by combining several of the most important environmental attributes. Zenker et al applied the conjoint analysis (CA) method to compare several German cities and analyze their attractiveness to well-educated graduates and talents [15]. Their research grouped several environmental attributes, including the number of parks, amount of pollution, and access to water, into one factor, 'Nature and Recreation'. de Noni et al [16] combined quantifications of a city's green areas, air pollution, and waste management efficiency into a single parameter to assess Milan, Italy. Similarly, Merrilees et al [17] utilized a 'clean environment' factor, which accounted for the cleanliness and pollution of a city's local environment. However, not all studies clearly delineate which approach is applied to determine the most relevant environmental criteria or the rank of importance given to each attribute. This information is vital for informing future studies as well as policy-level decisions regarding the distribution of the often-limited environment-related budgets of local governments.
CA, as applied by Zenker et al [15], first became a widely applied marketing research technique used to assess consumer multi-attribute utility functions in the 1970s [18,19]. It has since become one of the most common approaches for exploring individual preferences over the past four decades [20,21]. CA was effectively used to assess the non-market value of chosen attributes [20] and was initially applied to environmental studies by Beggs et al [22]. CA has even been applied to determine the willingness of participants to pay for environmental issues [23]. Thus, due to the well-documented reliability of CA, it was chosen to examine graduate attitudes toward different urban environmental attributes in this study.
While CA can be an effective tool, studies that employ this method are generally limited in the number of attributes that can be considered in a single study, as a larger pool of attributes leads to an increased number of questions on the corresponding questionnaire. This can result in respondents experiencing a fatigue effect that reduces the reliability of the results [20,24]. Moreover, research has indicated that individuals make decisions primarily by considering only a few critical attributes [25,26]. It is possible to use the CA method to assess respondent preferences for a long list of relevant attributes; however, this requires complicated mathematical models (e.g. hybrid CA and adaptive CA) [27,28]. Therefore, it is common to use information from preliminary experiments or consultation with experts to determine the most important parameters and reduce the number of study attributes and questions [27-29]. However, conducting exploratory experiments is often costly in both time and money.
Fortunately, big data technology has been shown to accurately predict human preferences and behaviors [30,31], creating an opportunity for a new attribute number reduction method to be explored and established in this study. The Baidu index is a big data sharing platform built from the behavioral data of Baidu's very large user base, and it represents a normalized search volume for selected keywords over a specified period [32,33]. Since Baidu is the search engine most commonly used by Chinese people, the Baidu index has become an important source for many big data studies in China [21,34-36]. The application of the Baidu index may help researchers to improve traditional CA methods by establishing an objective, low-cost attribute number reduction process.
In this study, the Baidu index was applied to develop and optimize a new attribute reduction process for the CA method. Moreover, the significance of each selected urban environmental attribute was explored in terms of its impact on Northeastern graduate job preferences. Finally, the differences among recent graduates with different expected incomes were also examined.
The remainder of this paper is organized as follows. Section 2 introduces the research method and verifies the feasibility of simplifying the conventional CA process by exploiting the Baidu index. Section 3 explains the results of the CA experiments, including the importance rank of the different environmental attributes regarding their influence on graduate preferences, as well as the correlation between the expected income of recent graduates and their environmental preferences. Section 4 elucidates the implications of our results and compares them with the conclusions of other similar studies. Section 5 presents the conclusions of our work and provides suggestions for future research. An outline is provided at the end of the study.

Methods
A face-to-face survey was designed using the Tencent questionnaire (https://wj.qq.com/). All data were acquired between March 2019 and May 2019, and a mathematical model was built using the statistical software environment, R. An exploratory experiment examined the correlation between the Baidu index of environmental attributes and graduate preferences for the attributes.

Conjoint analysis (CA) method
CA was chosen to assess graduate job preferences in this study due to its well-documented reliability and use over time (see section 1). Relevant urban environmental attributes were chosen using the Baidu index, the big data sharing platform of Baidu. This method of attribute selection was chosen because it required significantly less time and money than preliminary analyses. All processes of the CA needed to be accomplished rapidly before the respondents graduated, as students usually plan their future careers and form certain psychological expectations in the months prior to graduation [37]. Furthermore, expert consultation is typically regarded as a highly subjective method that is difficult to apply uniformly to varying situations and was thus not considered in this study [38].

Sampling process
All respondents in this study were from one of five Chinese universities. At least one university was chosen from each major province in Northeastern China. According to the comprehensive strength ranking from New Oriental Education & Technology Group Inc. (Xindongfang in Chinese) [39], the largest education company in China, five universities at different levels were selected as sample sources: Jilin University (ranked 9th), Harbin Engineering University (ranked 63rd), Dalian Maritime University (ranked 112th), Changchun University of Science and Technology (ranked 171st) and Jilin Jianzhu University (ranked 433rd). In 2017, 10.8% of graduates in China were from the top 50 Chinese universities [40]. Thus, we selected approximately 10.8% of respondents from Jilin University, and the remaining respondents were chosen from the four other universities. The 2017 graduate data were also used to determine the educational level of respondents, with 85% graduating with a bachelor's degree, 13.5% with a master's degree and 1.5% with a doctoral degree.
We planned to recruit 500 respondents for the preliminary experiment and 1600 respondents for the main experiment. Through cooperation with the staff of the five universities, we randomly selected respondents in the graduating class according to the proportions above and distributed questionnaires to their mobile phones or computers. Finally, 483 valid questionnaires were collected for the preliminary experiment, and 1589 valid questionnaires were collected for the CA experiment.
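The proportional quotas described above reduce to simple arithmetic. The following minimal sketch applies the shares reported in the text to the planned main-experiment sample of 1600; the function name `quota` and the rounding rule are illustrative assumptions, and the split of the remaining respondents across the four other universities is not specified in the text, so it is not modeled here.

```python
# Planned sample size for the main experiment (from the text).
planned_total = 1600

# Shares reported in the text for 2017 graduates.
top50_share = 0.108  # graduates from top-50 universities
degree_shares = {"bachelor": 0.85, "master": 0.135, "doctoral": 0.015}


def quota(total, share):
    """Round a proportional quota to the nearest whole respondent (assumed rule)."""
    return round(total * share)


jilin_quota = quota(planned_total, top50_share)
other_quota = planned_total - jilin_quota  # shared by the four other universities
degree_quotas = {d: quota(planned_total, s) for d, s in degree_shares.items()}

print("Jilin University:", jilin_quota)
print("Other four universities combined:", other_quota)
print("By degree:", degree_quotas)
```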

Experimental process design
As shown in figure 1, the urban environmental attributes and levels to be considered for this experiment were identified in Step 1. A preliminary experiment was utilized to select these attributes. In the preliminary experiment, we selected the most relevant attributes using the traditional questionnaire method alongside the new Baidu index selection method. The results of the two selection methods were compared to verify the reliability of the new method. In Step 2, the main-experiment questionnaire was designed to examine the characteristics and socioeconomic information of respondents and their preferences for the combinations of different urban environmental attributes and levels. Table 2 outlines the respondents' reported demographics. We used the orthogonal processing function of SPSS software to design these attributes and levels into nine combinations. To examine graduate preferences, these were portrayed as nine virtual cities that the respondents scored in the CA portion of the questionnaire (see appendix 1). Then, in Step 3, the questionnaire data were analyzed based on the model established using R version 3.4.3 [41], and the part-worth utility (PWU) and relative importance results were obtained. Finally, in Step 4, we evaluated these results to identify overall trends in graduate preferences for urban environmental attributes and the job preferences of graduates with different expected incomes.

Preliminary experiment
A preliminary experiment was conducted to rank the urban environmental attributes based on their importance using two methods. The first method was based on the Baidu index. The Baidu index refers to the total number of times a particular word was searched on the Baidu website within a period specified by the researchers. On the Baidu website, researchers can select a specific word and determine the Baidu index of other words that are most closely related to this word. Through this function, we obtained the Baidu index for all attributes related to the urban environment in 2018. There were seven urban environmental attributes identified with an average of 10 or more searches per day: air pollution, water pollution, green area, littering, noise pollution, soil pollution, and light pollution. Figure 2 presents the monthly Baidu index of these attributes for the entire period of 2018. A higher Baidu index indicates more searches. Objects with more searches have attracted more interest from people [42,43], meaning that they are perceived as more important than other objects with fewer searches.

Table 1. Four urban environmental attributes included in the conventional CA task. (P is the proportion of annual air pollution days and A is park green area per capita.)

Attribute | Attribute definition | Attribute levels
Air pollution | In this study, the attribute of air pollution is quantified as the frequency of occurrence of fine particulate matter (PM2.5) pollution in a year. | P < 5%; 5% < P < 20%; 20% < P < 35%
Water pollution | The attribute of water pollution is defined as the contamination of water sources in the city. | No; Slight; Serious
Littering | The attribute of littering refers to the existence of garbage in areas such as citizens' living quarters, workplaces, and major transportation routes. | No; Slight; Serious
Green area | Green area refers to the size of the vegetation coverage area in the main living area of the citizens. In this study, this attribute is quantified by the city's park green area per capita. | (see table A1)
To examine the similarity between the urban environmental attribute ranks obtained with the Baidu index and those obtained with the traditional questionnaire method, we used web-based questionnaires to explore graduate preferences for the seven attributes mentioned above. Respondents were asked to score each attribute from 1 to 7 (7-point Likert scale), where higher scores represented greater importance. Graduate preferences regarding urban environmental attributes were then determined using the total scores from 483 respondents. These totals were compared with the results from the Baidu index analysis (figure 3).
A similar trend was found between the Baidu index and the pre-test scores. SPSS software was used to calculate the Spearman correlation coefficient, which was 0.964 (p < 0.001, N = 7), suggesting that the Baidu index was highly correlated with the pre-test scores. Thus, to avoid the fatigue effect and focus only on the most important attributes, we selected the four environmental attributes with the highest Baidu index for the CA: air pollution, water pollution, littering, and green area.
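The comparison and selection steps above can be sketched as follows. This is a minimal illustration, not the SPSS procedure used in the study: all numeric values are hypothetical placeholders (the actual Baidu index values and questionnaire totals are shown only in figures 2 and 3), and the helper names are assumptions. With seven untied items, a Spearman coefficient of 0.964 corresponds to two rankings that differ by a single swap of adjacent ranks, which is what the placeholder data reproduce.

```python
# Hypothetical yearly Baidu index values (placeholders, not the study's data).
baidu_index = {
    "water pollution": 900, "air pollution": 800, "littering": 600,
    "green area": 500, "noise pollution": 300, "soil pollution": 200,
    "light pollution": 100,
}
# Hypothetical questionnaire score totals (placeholders, not the study's data).
survey_total = {
    "water pollution": 3000, "air pollution": 2900, "littering": 2600,
    "green area": 2400, "noise pollution": 1900, "soil pollution": 2000,
    "light pollution": 1500,
}


def rank_desc(scores):
    """Map each key to its rank (1 = largest value). Assumes no tied values."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: pos + 1 for pos, name in enumerate(ordered)}


def spearman_no_ties(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid without ties."""
    rx, ry = rank_desc(x), rank_desc(y)
    n = len(rx)
    d2 = sum((rx[k] - ry[k]) ** 2 for k in rx)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


rho = spearman_no_ties(baidu_index, survey_total)
top4 = sorted(baidu_index, key=baidu_index.get, reverse=True)[:4]

print("Spearman rho:", round(rho, 3))
print("Attributes kept for the CA:", top4)
```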
There exist several CA methods, including conventional CA, hybrid CA, adaptive CA and hierarchical CA [44]. To achieve high performance in data analysis with fewer than five attributes, the conventional CA method was chosen for the following experiment [20].

Conjoint analysis (CA) process
The survey was conducted in 10-15 min of class time with the assistance of university staff. The first part of the questionnaire focused on respondent characteristics and socioeconomic information (e.g. gender, age, educational background, income, residential area).
The second part was the CA, where attribute levels were set according to the standards established by the Chinese government. In China, urban air quality is usually assessed using the annual percentage of days where air pollution is higher than a set threshold. Green area was assessed as the amount of park green area per capita. The levels of these two environmental attributes were set using data from the four most developed Chinese cities (Beijing, Shanghai, Guangzhou, and Shenzhen) [45]. The levels of the other two attributes (water pollution and littering) directly refer to the pollution degree classification of the Chinese urban environment [46,47]. Before answering the questionnaire, respondents were confirmed to be able to understand and distinguish each study attribute and level.
Orthogonal processing was used to simplify the levels of each attribute into nine different combinations. Table 1 lists the selected urban environmental attributes along with their levels. Bigsby and Ozanne [48] found that visual stimuli can help respondents understand the different attribute and level combinations; thus, respondents were presented with pictures to better understand different environmental attributes.
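To illustrate why nine combinations suffice, note that four attributes with three levels each give 3^4 = 81 possible cities, while a standard L9 orthogonal array covers every pair of attribute levels exactly once in nine rows. The sketch below builds one such array with modular arithmetic. It is an assumption-laden illustration: the design produced by SPSS for the study may contain different rows, and the level indices 0, 1, 2 are generic stand-ins for the levels in table 1.

```python
from itertools import combinations, product

LEVELS = 3
ATTRIBUTES = ["air pollution", "water pollution", "littering", "green area"]


def l9_design():
    """Build an L9(3^4) orthogonal array: columns a, b, a+b, a+2b (mod 3)."""
    return [
        (a, b, (a + b) % LEVELS, (a + 2 * b) % LEVELS)
        for a, b in product(range(LEVELS), repeat=2)
    ]


def is_orthogonal(design):
    """Check that every pair of columns contains each level pair exactly once."""
    all_pairs = sorted(product(range(LEVELS), repeat=2))
    for i, j in combinations(range(len(design[0])), 2):
        if sorted((row[i], row[j]) for row in design) != all_pairs:
            return False
    return True


design = l9_design()
for city, row in enumerate(design, start=1):
    print(f"virtual city {city}:", dict(zip(ATTRIBUTES, row)))
print("orthogonal:", is_orthogonal(design))
```

Because each level of each attribute appears in exactly three of the nine rows, main effects can be estimated without presenting all 81 combinations.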
Additionally, participants were asked to respond to a traditional CA questionnaire. This model was first described by Louviere and Woodworth [49]. Shocker and Srinivasan further suggested that the attributes of the CA method should be made operational [42].
Consequently, the respondents were required to indicate whether they had previous experience with poor levels of any of the four study attributes. The staff then guided the respondents to recall their experience, personally involving them in the simulated situation. Finally, respondents were required to rate the nine simulated pictures of the different attribute and level combinations. The rating scale ranged from 0 to 10, where 0 indicated that working in that city was unacceptable and 10 indicated that it was extremely desirable to work there.
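As a simplified illustration of how ratings of nine profiles translate into part-worths, the sketch below uses level means rather than the logit and hierarchical Bayes models applied in this study. For a balanced orthogonal design, the least-squares part-worth of a level under effects coding equals the mean rating of the profiles containing that level minus the grand mean. The design rows and the ratings are hypothetical assumptions, not study data.

```python
from itertools import product

ATTRIBUTES = ["air pollution", "water pollution", "littering", "green area"]

# Assumed L9 orthogonal design; level 0 = best, level 2 = worst (illustrative).
design = [(a, b, (a + b) % 3, (a + 2 * b) % 3) for a, b in product(range(3), repeat=2)]

# Hypothetical 0-10 ratings that one respondent might give the nine virtual cities.
ratings = [10, 7, 3, 8, 4, 2, 6, 3, 0]


def part_worths(design, ratings):
    """Level mean minus grand mean for every level of every attribute."""
    grand_mean = sum(ratings) / len(ratings)
    result = {}
    for col, name in enumerate(ATTRIBUTES):
        result[name] = []
        for level in range(3):
            scores = [r for row, r in zip(design, ratings) if row[col] == level]
            result[name].append(sum(scores) / len(scores) - grand_mean)
    return result


pwu = part_worths(design, ratings)
for name, values in pwu.items():
    print(name, [round(v, 2) for v in values])
```

Within each attribute the part-worths sum to zero, so positive values mark levels that raise the rating above average and negative values mark levels that lower it.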

Mathematical model and data processing
The findings of the CA section of the survey were analyzed for all samples, and 11 different sociodemographic and personal variables were additionally accounted for. Graduate preferences were calculated from the results of the CA section using a multinomial logit function. The function determined the importance of each attribute for the graduates' decision making relative to the other attributes, as well as the desirability of each level of the attributes (PWU). In the derived graduate preference model, β₀ denotes a constant coefficient for each alternative, and β₁, β₂, β₃, …, βₙ represent the coefficients obtained by the logit model and indicate the relative importance of the attributes in each alternative.
The relative importance of an attribute reflects its weight in decision making across all of its levels. The relative importance values of each attribute were calculated by determining the PWUs for each level of the attribute. PWU values show how much a specific level of an attribute is desired or unwanted. For example, the 'water pollution' attribute has three levels (no pollution, slight pollution, and serious pollution). If no pollution, slight pollution, and serious pollution had PWU values of 7, 0, and −4, respectively, then 'no pollution' is the most desired level, with a positive PWU. 'Slight pollution' is neither desired nor unwanted, and 'serious pollution' is unwanted. The relative importance of an attribute is the ratio of the difference between its highest and lowest PWU to the sum of the differences between the highest and lowest PWU of all attributes.
Moreover, we adopted the demographics hierarchical Bayes model to analyze the 1589 samples of the main study. The model was built using R's bayesm package [50] for the hierarchical Bayes random-effects model, which is capable of estimating general and individual parameters simultaneously.
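The relative importance definition above can be written as I_k = (max PWU_k − min PWU_k) / Σ_j (max PWU_j − min PWU_j). The following minimal sketch applies it to the illustrative 'water pollution' PWUs from the text (7, 0, −4); the PWUs for the other three attributes are hypothetical values chosen only to complete the example, and the function name is an assumption.

```python
def relative_importance(pwu_by_attribute):
    """Share of each attribute's PWU range in the sum of all PWU ranges."""
    ranges = {name: max(v) - min(v) for name, v in pwu_by_attribute.items()}
    total = sum(ranges.values())
    return {name: r / total for name, r in ranges.items()}


pwu_example = {
    "water pollution": [7, 0, -4],    # illustrative values from the text
    "air pollution": [5, 1, -3],      # hypothetical
    "littering": [3, 2, -2],          # hypothetical
    "green area": [0.5, 0.0, -0.5],   # hypothetical
}

importance = relative_importance(pwu_example)
for name, share in importance.items():
    print(f"{name}: {share:.1%}")
```

Here the PWU ranges are 11, 8, 5 and 1, so 'water pollution' receives 11/25 of the total importance.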
Table 2 lists the demographic characteristics of the respondents.

Part-worth utilities (PWUs)
The PWUs of all attribute levels increased with decreasing pollution, as expected. However, 'green area' did not increase as significantly as the other attributes (figure 4). The PWUs of this attribute's levels were extremely close, indicating that all levels were relatively desirable for most respondents. For air pollution, an attribute level of less than 5% was significantly more desirable (PWU = 2.672) than the 5%-20% level (PWU = 1.346) or the 20%-35% level (PWU = −0.2731). 'Water pollution' exhibited the most significant PWU difference of all tested attributes. The PWUs of the three levels (None, Slight, Serious) were 3.127, 1.682 and −0.637, respectively, suggesting that respondents prefer no water pollution almost twice as much as slight water pollution. Moreover, the PWU values indicated that 'no littering' had similar desirability to 'slight littering', and both of these less-severe levels were significantly more preferred than 'serious littering'.
Figure 5 presents the relative importance of the four selected attributes. The results revealed that water pollution (43.6%) was the most important attribute, followed by air pollution (34.1%), littering (20.7%), and green area (1.6%). The relative importance of per capita green area was significantly lower than that of the other attributes. The rank of their relative importance is completely identical to the rank of their Baidu indexes (figure 2).
Figure 6 shows that the relative importance of urban environmental attributes changed with an increase in expected future income. To avoid the effect of education level, only students graduating with a bachelor's degree were selected for this analysis. The results showed that the importance of littering and air pollution increased with expected future income, the importance of water pollution decreased, and the importance of green areas was the greatest for graduates with the lowest expected future income.
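As a rough consistency check, the PWU ranges reported above for air and water pollution can be compared with the reported relative importance values. The sketch below assumes that importance is proportional to the aggregate PWU range, which is the definition given in the methods; a hierarchical Bayes analysis may instead average importance over individuals, so only approximate agreement should be expected. The implied ranges for littering and green area are derived quantities, not values reported in the study.

```python
# PWUs and relative importance values as reported in the text.
reported_pwu = {
    "air pollution": [2.672, 1.346, -0.2731],
    "water pollution": [3.127, 1.682, -0.637],
}
reported_importance = {
    "water pollution": 0.436, "air pollution": 0.341,
    "littering": 0.207, "green area": 0.016,
}

ranges = {name: max(v) - min(v) for name, v in reported_pwu.items()}

# If importance is proportional to PWU range, these two ratios should agree.
range_ratio = ranges["water pollution"] / ranges["air pollution"]
importance_ratio = (
    reported_importance["water pollution"] / reported_importance["air pollution"]
)

# Total range implied by water pollution, and the ranges it implies for the
# two attributes whose PWUs are not listed numerically in the text.
implied_total = ranges["water pollution"] / reported_importance["water pollution"]
implied_ranges = {
    name: reported_importance[name] * implied_total
    for name in ("littering", "green area")
}

print("range ratio (water/air):", round(range_ratio, 3))
print("importance ratio (water/air):", round(importance_ratio, 3))
print("implied PWU ranges:", {k: round(v, 2) for k, v in implied_ranges.items()})
```

Both ratios come out near 1.28, which is consistent with the range-based definition of relative importance.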

Discussion
For the four urban environmental attributes examined in this study (water pollution, air pollution, littering, and green area), our results suggest that similar trends existed for the Baidu index and pre-test score results. Combined with a high Spearman correlation (0.964), our validation indicates that environmental attributes for the CA method can be accurately selected using the Baidu index, as both preliminary analyses created similar rankings. This result suggests that a Baidu index-based attribute selection method can replace the traditional questionnaire method, reducing experimental time and expense while retaining its utility. In the study by Arning et al that investigated the public acceptance of sustainable CO2-derived building materials, the preliminary experiment used five participants (out of 145 total participants) to determine the attributes for the main experiment [51]. In a study by Chelsea et al, ten patients (out of 200 total participants) were selected for one-to-one 20 min interviews that investigated the attributes of arthrodesis and arthroplasty they perceived as most important [52]. For studies with too few pre-experimental participants, use of the Baidu index method to filter attributes can help to avoid subjective results due to a shortage of interviewees. Additionally, CA of environmental cases often requires a large number of samples, so pre-experiments also require a large number of respondents to make them effective. Gil-Hwan Lim adopted a conjoint analysis method to investigate South Korean residents' attitudes towards nuclear power plants [53]. The researchers used the questionnaire responses of 81 participants (out of a total of 1417 participants) in the preliminary experiment to identify the three most important attributes for all respondents.
For CA experiments in the environmental field, the Baidu index method can be used to shorten the experimental period by simplifying the pre-experiment and greatly reducing experiment costs. The Baidu index is just a basic application of big data technology on the Chinese search engine platform. There are still many limitations to its application, particularly the lack of users in English-speaking regions of the world. Google Trends is a search behavior analysis platform with similar functions for English-language users. Google Trends could therefore plausibly take the place of the Baidu index in the attribute selection process of CA, allowing this method to be applied on a more global scale.
Our results indicate that the most important environmental attribute influencing graduate job preferences is water pollution, followed by air pollution, littering, and green area. These results indicate that improvements to water pollution issues will be more attractive for Northeastern graduates than improvements to other environmental attributes. For graduates from other regions, the results of this study may not be applicable. Previous studies have suggested that green space and cleanliness are the most important environmental attributes for attracting workers or students [54,55]. In a study evaluating the preferences of residents in Porto, Portugal, the most important environmental attribute was green space, which was much more important than urban cleanliness or pollution (air, water, and noise) [56]. Conversely, environmental quality (green areas, urban waste management, and noise/air pollution prevention) was shown to have a more negative impact on the attractiveness of Milan, Italy to its residents [16]. It is likely that regional differences have caused the disparities between the results of our study and those of other cities and regions. Graduates from Northeastern China have different preferences and experiences than respondents from the Americas or Europe. For developing Chinese cities, Kumar et al indicate that the quality of water resources plays a critical role in their development, and that preventing water pollution is the most important part of urban environmental pollution control [57]. These results show similarities to the environmental preferences of Northeastern graduates in our study.
Besides Beijing, Shanghai, Guangzhou and Shenzhen, many fast-growing Chinese cities also have a huge demand for talent. Due to the significant differences in scale between these metropolises and other Chinese cities, it is necessary to minimize the gap between non-environmental factors to better reflect the differences in the attractiveness of different environmental factors. Based on the agglomeration of commercial resources, urban hubs, urban resident activity, lifestyle diversity, and future plasticity levels, Chinese cities are divided into six levels. The number of cities at each level is as follows: 4 first-tier cities, 15 new first-tier cities, 30 second-tier cities, 70 third-tier cities, 90 fourth-tier cities, and 128 fifth-tier cities [58]. It is feasible to select cities of the same level and adopt our method to evaluate their environmental attractiveness to graduates. However, our research still has other limitations. Although our results point out that improving water pollution is the most important environmental measure, graduates' detailed preferences regarding ways to improve water quality and their willingness to pay still need to be examined.
Expected future income refers to income estimated based on the past experiences of an individual. Graduates with different expected future incomes indicated varying preferences for the environmental attribute levels. The graduates expecting higher incomes tended to select cities with higher levels of positive environmental attributes (e.g. less air pollution and littering). However, graduates expecting lower incomes were more concerned about local urban water pollution. Green areas did not exhibit any correlation with the expected future income of graduates. Since graduates usually have no real income before they graduate, we classified respondents based on their expected future income and examined the subsequent differences in their urban environmental preferences. The reason for the difference in preferences of students with different expected incomes might be that graduates with higher expected incomes have sufficient budgets to select a place to live that is surrounded by a clean environment. Air quality is not particularly different across the various regions of the same city; thus, 'air pollution' was the urban environmental attribute they considered first. Jacobsen, Lundhede and Thorsen indicated that expected future income is critical to environmental attitudes [59]. A study of 39 cities in the Czech Republic documented that inhabitants with unfavorable socioeconomic status mainly reside in smaller cities with higher concentration levels of combustion-related air pollutants [60]. A study by Liu et al revealed that an increase in air pollution concentration has short-term positive production effects but long-term negative impacts on individual income and will exacerbate income inequality across socioeconomic statuses [61].
We speculate that graduates with higher expected incomes may have relatively strong confidence in their careers and are unwilling to trade long-term negative effects on their careers for short-term positive production effects, so they choose places to live that have clean environments. Li et al also indicated that regional environmental pollution will widen the income gap of local workers [62]. Regional income inequality is likely to affect the tendency of graduates to choose employment in such a place. Future research on this subject could apply expected income as a special social background factor in urban environmental research questionnaires to classify people without income and obtain their different preferences. A study by Wang et al showed that absolute income has a positive correlation with environmental concerns [63]. Li and Chen argued in their study that relative income is more likely than absolute income to exert a significant effect on environmental concerns [64]. However, most graduates do not yet have an income. Instead, their welfare is still dependent on their expected future income [65].
While graduates did indicate that it was undesirable to live in a location with a high level of littering, they overwhelmingly suggested that air and water pollution were the most critical environmental attributes in determining the location of their first career job. Thus, a policy focus on improving these two attributes should be effective in attracting graduates to urban centers. However, 'green area' should not be considered a necessary attribute to promote, as our results showed no significant differences in preferences for the different levels. Therefore, campaigns to increase green areas are unlikely to have a significant impact on skilled-worker immigration, meaning urban governments should focus on other environmental problems instead of increasing their green areas. Additionally, the close PWU values found for the low and moderate levels of 'littering' indicate that graduate preferences will not significantly change if a city with 'slight littering' enacts campaigns to reach 'no littering'. Thus, governments should not necessarily strive to improve littering from 'slight' to 'none', as they will receive little added benefit in skilled labor influx.

Conclusions and suggestions
In this study, we utilized the CA method to examine the importance of various urban environmental attributes for attracting graduates to work in metropolises. This paper employed the Baidu index in a novel manner to improve the attribute selection process of the CA method, reducing its cost and enhancing its objectivity.
Data on graduate job preferences were collected via questionnaires. Then, a mathematical model was created within R's statistical environment to analyze the obtained data. The primary conclusions of this study are as follows:
(a) The Baidu index can be feasibly and efficiently employed in the attribute selection process of the CA method.
(b) Water pollution was the most critical urban environmental attribute with the highest importance (43.6%), followed by air pollution (34.1%), littering (20.7%), and green area (1.6%).
(c) Local governments should focus on improving local water and air quality to attract highly educated graduate workers; littering should simply be maintained at or below a moderate level.
(d) Graduates with higher expected future incomes prioritize air pollution and littering over water pollution. Conversely, graduates with lower expected future incomes prioritize water pollution over any other environmental attribute of a metropolis.
In this study, we discussed which environmental attribute levels are preferred by graduates. This information can be used to help inform the spending decisions of regional or city governments to optimize the most crucial attributes. It should be noted, however, that the eventual benefit of attracting graduates by optimizing the local environment to their preferences depends greatly on the methods of environmental improvement and how the local budget is divided.
In the future, several problems still require further examination, primarily including the following:

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.

Funding

Appendix 1
To help you clearly understand the different levels of the environmental attributes in our questionnaire, we will show you some related pictures (see figures A1-A9).
In this survey, we set three levels for each of four environmental attributes (air pollution, water pollution, littering, green area), and nine virtual metropolises were created by combining different levels of these attributes (for detailed levels, refer to table A1).
You need to evaluate your willingness to work in these nine virtual metropolises and score each on a scale of 1-10 (1 represents absolute rejection, 10 is extremely eager). Please carefully weigh these nine virtual metropolises and score them according to your will.
Thank you for taking the time to complete this questionnaire for us!

Table A1. Four urban environmental attributes included in the conventional CA task. (P is the proportion of annual air pollution days and A is park green area per capita.)

Attribute | Attribute levels
Air pollution | P < 5%; 5% < P < 20%; 20% < P < 35%
Water pollution | No; Slight; Serious
Littering | No; Slight; Serious
Green area | 5 m² < A; 10 m² < A < 20 m²; 20 m² < A