Research on the Changing Trend of Employment-Relevant Terms Based on Internet Big Data Analysis

Using publicly available data collected from mainstream information platforms, this study applied the term frequency-inverse document frequency (TF-IDF) algorithm to identify 74 popular terms and phrases about employment, analyzed the changes in their ranking, and visualized the changing trend in attention to employment skills from 2017 to 2019. The results are expected to facilitate the application of big data technology to teaching administration in colleges and to guide college students in planning their study of vocational skills.


Introduction
College students are a valuable form of human capital, full of vigor and creativity. Their employment bears on a nation's political stability and economic sustainability. As reported by the National Bureau of Statistics, China's GDP grew by 7.04%-11.47% per year from 2015 to 2019, whereas the number of college graduates in China grew by merely 0.69%-4.49% per year. The growth in the number of graduates has thus failed to keep pace with economic growth. Besides, according to the annual "Chinese College Student Employment Report" released by MyCOS, a third-party investigation agency, the employment rate of college students half a year after graduation stayed between 91% and 92% over five years but showed a slight decline, as shown in Table 1.
These numbers highlight two problems facing China's college education: first, the pace of talent training has lagged behind the economic growth rate; and second, the employment rate of college students has been declining year by year. Together, they point to an imbalance between the supply of and demand for college graduates. This study attempts to visualize the changes in attention to employment skills and capacities based on big data analysis. It is expected that this study could promote the application of big data technology in the teaching administration of colleges, improve the supply-demand balance of college talents, and contribute to the sustainable, healthy, and stable development of the Chinese economy.

Materials
Tencent WeChat, Sina Weibo, and Toutiao are typical "we-media" Internet information platforms in China. Users of these platforms are, on the one hand, receivers of content and, on the other, distributors of secondary content. Trending topics therefore spread quickly on these platforms, and data collected from them can reflect social trends well. For this reason, document data collected and sorted from these three platforms by professional Internet opinion analysis agencies were analyzed in this study.

Methods
In this study, the term frequency-inverse document frequency (TF-IDF) algorithm was used to process document terms. For term frequency (TF), the more often a term occurs in a single document, the more prominent the topic that the term represents. For inverse document frequency (IDF), a term that occurs frequently in a single document but rarely in other documents has a better capacity to distinguish between documents. The calculation method is as follows:
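A commonly used textbook formulation of TF-IDF is sketched here for reference. It is an assumption that this variant matches the one used in the study; implementations differ in details such as the logarithm base and smoothing of the denominator. The symbols are illustrative: n_{t,d} is the number of occurrences of term t in document d, N is the total number of documents, and the set in the denominator of the IDF contains the documents in which t occurs.

```latex
\mathrm{tf}(t,d) = \frac{n_{t,d}}{\sum_{k} n_{k,d}}, \qquad
\mathrm{idf}(t) = \log \frac{N}{\lvert \{\, d : t \in d \,\} \rvert}, \qquad
\operatorname{tf\text{-}idf}(t,d) = \mathrm{tf}(t,d) \cdot \mathrm{idf}(t)
```

Under this form, a term that appears in every document has an IDF of zero, and a term concentrated in a few documents receives a high weight.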

Research implementation
In this study, 79,867 publicly available documents containing the three keywords "recruitment", "employment", and "job-hunting" were collected from Sina Weibo, WeChat, and Toutiao for the period from January 1st, 2017 to December 31st, 2019, including Weibo blogs, subscription account articles, news, and review texts. After applying the TF-IDF algorithm, we removed terms and phrases irrelevant to vocational skills and selected 74 terms and phrases related to vocational skills and capacities. Terms with similar meanings were grouped into the same item. For instance, "word", "excel", "PowerPoint", and "ppt" were grouped into the item "office", and "collaboration", "team", and "coordination" were grouped into the item "cooperation". Finally, we obtained the top 20 terms and phrases for each year, as shown in Table 2.
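The pipeline described above (TF-IDF weighting, merging of synonymous terms, and selection of the top-ranked items) can be illustrated with a minimal Python sketch. It is not the study's actual implementation. The names SYNONYMS, normalize, tfidf_scores, and top_terms are hypothetical, the synonym map contains only the two examples given in the text, documents are assumed to be already tokenized, and summing per-document TF-IDF weights into one corpus-level score is an assumption about how terms might be ranked.

```python
import math
from collections import Counter

# Hypothetical synonym map built only from the two examples in the text.
SYNONYMS = {
    "word": "office", "excel": "office", "powerpoint": "office", "ppt": "office",
    "collaboration": "cooperation", "team": "cooperation",
    "coordination": "cooperation",
}

def normalize(tokens):
    """Lower-case each token and map synonyms to their shared item."""
    return [SYNONYMS.get(t.lower(), t.lower()) for t in tokens]

def tfidf_scores(docs):
    """Return a corpus-level score per term: the sum of tf * idf over documents.

    Summation across documents is an assumed aggregation rule.
    """
    docs = [normalize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = Counter()
    for doc in docs:
        if not doc:
            continue  # skip empty documents to avoid division by zero
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[term])  # unsmoothed variant
            scores[term] += tf * idf
    return scores

def top_terms(docs, k=20):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    scores = tfidf_scores(docs)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Toy corpus standing in for one year's tokenized documents.
docs_2017 = [
    ["Excel", "team", "English"],
    ["ppt", "sales"],
    ["English", "python"],
]
print(top_terms(docs_2017, k=3))
```

Running top_terms separately on each year's documents would give one ranked list per year, which could then be compared across years. For Chinese-language text, a word segmentation step would be needed before this sketch applies.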

Research results
As Table 2 shows, five terms, i.e., "office", "cooperation", "English proficiency", "stress resilience", and "communication skills", remain at the top of the ranking, and "sales experience", "copywriting skills", "writing skills", and "learning abilities" are also among the top-ranking terms. Office software, English proficiency, and writing skills are evidently vocational skills valued by modern enterprises. The great value that enterprises attach to cooperation, communication skills, learning abilities, and stress resilience reflects the complicated division of labor, fast updating, and high efficiency of the modern job market. Some other terms also occur in the table, such as "business trips", "programming", "SPSS", "database", "CAD", "dancing", "Python", "short-video production", "new media operation", and "data modeling". The changes in their ranking reflect the changes in people's recognition of these skills across the years, and these changes deserve attention from both college teachers and students.

Limitations
The limitations of this study are as follows. First, the documents studied come from "we-media" Internet information platforms, where the same document is very likely to be released multiple times by different users. Second, due to limited resources, only three years of data, from 2017 to 2019, were analyzed, and the analysis result fails to show obvious trends in the changes of hot social topics. Third, the research results reflect hot social topics, so real-world conditions should be considered before these results are applied to curriculum design and students' career planning.

Conclusions
This study has examined the frequency of terms about employment skills on "we-media" Internet information platforms in China from 2017 to 2019, analyzed the changes in the ranking of these terms, and explored the changes in hot topics about employment skills on the Internet. The data processing methods are presented through a description of the research process. It is expected that future researchers can build on this study by collecting a larger data set and processing and visualizing the data more precisely, so that the changes in employment-related terms on Internet media can help colleges and students better understand the needs of the job market. With more research efforts, we can improve the balance between the supply of talents from colleges and the demand for talents in the job market, better leverage the demographic dividend, and boost China's economy.