Exploring Travel Patterns during the Holiday Season—A Case Study of Shenzhen Metro System During the Chinese Spring Festival

Abstract: Research has shown that growing holiday travel demand in modern society has a significant influence on daily travel patterns. However, few studies have focused on what distinguishes travel patterns during a holiday season, and travel behavior studies of the Chinese Spring Festival (CSF) at the city level are rarer still. This paper adopts a text-mining model, latent Dirichlet allocation (LDA), to explore travel patterns and travel purposes during the CSF season in Shenzhen based on metro smart card (MSC) data and points of interest (POIs) data. The study aims to answer two questions: (1) How can MSC and POIs data be used to infer travel purpose at the metro station level without the socioeconomic backgrounds of the cardholders? (2) What are the overall inner-city mobility patterns and travel activities during the Spring Festival holiday week? The results identify six features of CSF travel behavior and infer nine travel patterns and trip activities, which fall into three broad categories. The activities in which travelers engaged during the CSF season are mainly consumption-oriented events, visits to relatives and friends, and traffic-oriented events. This study is beneficial to metro corporations (timetable management), business owners (promotion strategy), researchers (inference of travelers' social attributes) and decision-makers (examination of public services).


Introduction
Previously, traditional traffic surveys mainly collected data on people's workdays [1,2] and there were few surveys specifically for holidays (e.g., the 1995 American Travel Survey). As a consequence, research on travel behavior focused on relatively habitual travel, while more flexible types of travel that are freer in both time and space, such as occasional weekend trips or holiday trips [3], were not fully understood in terms of their characteristics and motivations [4]. However, as sightseeing, shopping and family gatherings have become part of the mainstream lifestyle of modern society, holiday travel demand has increased dramatically [5] and has hence attracted the attention of policy-making bodies and researchers. For example, in Germany, the National Household Travel Survey (MiD, Mobility in Germany) includes long-distance travel information about public holidays such as Christmas. Likewise, the Reiseanalyse (RA) collects holiday behavior as well as the holiday interests and motivations of German-speaking people in Germany [6]. Since behaviors like long-distance travel and leisure consumption are largely coupled with holidays, more and more large-scale household travel surveys in European countries have covered long-distance travel, such as INVERMO (Germany), Micro Census (Switzerland) and MEST/TEST (France, Portugal, Sweden, UK) [7].

Chinese Spring Festival and Its Travel Behavior Studies
The reason we chose the CSF as the holiday season in this study is not merely the availability of research data; another significant reason is that the CSF is the most representative and important holiday for Chinese people. In addition, many people outside China readily associate the CSF with the color red, lion dances, fireworks and red packets, while few could name other major Chinese holidays such as the Mid-Autumn Festival or the Dragon Boat Festival. Although CSF customs differ slightly from place to place in China, the uniqueness of the festival is generally reflected in the following six aspects, which may affect mobility patterns. First, the CSF has the longest official time off, usually at least seven days, which produces abundant activities and travel during this period. Second, the CSF may be one of the few important opportunities for a family reunion in a whole year [31]; this awareness is deeply rooted among Chinese people and prompts them to return to their hometowns from afar. Third, the CSF has an impact beyond the official holiday because people usually begin preparing for it one or two weeks before it starts [33]. Fourth, many people go out to purchase necessities such as food, couplets and new clothes before or during the CSF. Fifth, face-to-face social interaction becomes frequent and close during the CSF, owing to the custom that people should visit their parents-in-law, relatives and close friends on the first or second day of the New Year. Lastly, people take part in varied activities such as going to the temple to burn joss sticks, strolling around the flower market and watching a fireworks show. The CSF period consists of non-working, family-gathering days, so people may make travel choices that are fully independent of their usual routines.
From the above, mobility is expected to vary during the CSF, yet studies of CSF mobility, as a specific kind of holiday mobility study, have been rare. It was not until 2014, with the advent of location-based services, that the first study of the CSF travel rush was carried out based on Baidu migration data [33]. Through visualization and statistics of the travel flows between cities, Wang et al. (2014) [33] found that the overall migration trend fluctuates strongly between one week before and one week after New Year's Day (which serves as the cut-off point). In addition, taking Guangdong Province, Beijing and Shanghai as examples, they found that migration source and destination regions tend to be geographically close. Subsequently, based on location-based service data from the Baidu, Tencent and Qihoo platforms, Li et al. (2016) [31] applied complex network and time-sequence analysis to study the spatiotemporal characteristics of the travel peak during the CSF. They found that the CSF travel network at the provincial scale showed multicenter and geographic clustering characteristics instead of small-world and scale-free characteristics. Moreover, they noted that the CSF travel network was influenced more by socioeconomic factors than by geographical location. Using complex network analysis and data mining techniques, Hu et al. (2017) [34] built an urban network of the population based on Weibo social media data. They visualized the spatial and temporal network structure of human mobility from the perspective of society as a whole and explored the relationship between human mobility patterns and urban economic development. They found that CSF customs and traditions indeed influence people's travel behavior and that the eastern region of China is the key attraction for the floating population, which shows that people tend to move from/to areas with a higher level of economic development. Similarly, Wei et al. (2018) [35] used the rich-club coefficient of the weighted network and the normalized imbalance coefficient to analyze the rich-club phenomenon and the imbalance in the population movement network during the Spring Festival of 2015.
Evidently, previous CSF mobility studies have concentrated on relatively large study areas (interprovincial or intercity) to reveal the unbalanced regional economic development that accompanies China's rapid urbanization, while few CSF studies have performed analyses at the inner-city level. Accordingly, the policy implications derived from large-scale studies were usually limited to the national level, such as household registration policy and industrial structure adjustment policy. By contrast, an inner-city CSF study is important because it may offer city-level policymakers insights for inspecting the public services provided within the city.

Metro Travel Purpose Inference
The MSC dataset collected by the automatic fare collection (AFC) system can be regarded as appropriate data with which to study inner-city CSF mobility because the metro network is extensive and its demand is high. For example, in Shenzhen, the metro system accounts for nearly 14% of the total travel volume. However, an intrinsic limitation of the MSC data is that it is hard to estimate metro passengers' final destinations, trip purposes and activities [18], even though this information is important for predicting travel demand, modeling travel behavior, adjusting transportation planning decisions and so on. Recently, various new datasets have been used to infer travelers' trip purposes, for example, social media and online service data [36,37], mobile phone data [38][39][40], taxi trajectory data [41] and bike-sharing data [42,43], which carry relatively more information (in space, time and flexibility) for inferring trip purpose than the MSC data. Because the MSC data carry less information, many of the research methods applied to the above datasets are hard to apply to them.
Overall, inferring the trip purpose at the metro station level is a difficult task since the MSC data itself has limited information, so the MSC data is usually combined with personal travel survey (PTS) data to accomplish this task. Combining MSC, PTS and land-use datasets, Chakirov and Erath (2012) [19] applied a rule-based model and a discrete choice model to detect the activities of public transport passengers in the city-state of Singapore. Taking the activity duration, activity start time and land use into account, they identified home and work activities and their locations. With the same types of data (MSC, PTS and land-use data), Alsger et al. (2018) [20] used a rule-based model to predict five trip purposes (work, education, shopping, home and recreational) in Brisbane, Queensland with an overall accuracy of 78%; among them, the inference accuracy of work and home trips reached 92% and 96%, respectively. Kusakabe and Asakura (2014) [21] approached the task slightly differently: they fused the PTS data with the MSC data to estimate the trip purpose with a naïve Bayes classifier, and five major travel activities (going to work, going to school, leisure, business and returning home) were identified with an accuracy of 76.8%.
Regarding metro travel purpose inference, although a few attempts have been made, the relevant work has relied almost entirely on PTS and land-use data. However, collecting PTS data is time- and labor-consuming, while land-use data have a relatively low spatial resolution for trip purpose inference. In addition, previous works mainly concentrated on inferring passengers' trip purposes on workdays, while holidays, which might be a blind spot of urban public service, were seldom investigated. Therefore, a method that relies on fewer datasets to infer travel purposes at the metro station level during holiday seasons is needed.

Shenzhen and Shenzhen Metro System
Our study area, Shenzhen, is a highly developed city in China, with a total area of 1997 square kilometers. Shenzhen is located in the southern part of Guangdong province (Figure 1) and serves as a link and a bridge connecting Hong Kong and the Chinese mainland. According to the Shenzhen Statistical Yearbook of 2017, approximately 12 million people live in this city. The first metro line in Shenzhen officially opened on 28 December 2004. Presently, there are 8 metro lines in Shenzhen with 166 stations and a total length of 285 km (Figure 1). According to the Shenzhen Transport Annual Report 2016, the annual metro passenger traffic volume is 1297.13 million person-times and the daily average transport volume is 3.55 million person-times, accounting for nearly 14% of the total travel volume in Shenzhen. The share of metro ridership in the total travel volume may help readers gauge the impact of the conclusions in the following parts.

Metro Smart Card and POIs Datasets
The metro smart card (MSC) dataset used in this study consists of 3 weeks of metro transaction records of 4,901,073 cardholders in Shenzhen (the second week is the Spring Festival holiday week). There are more than 50 million transaction records, covering 21 consecutive days from 20 January to 9 February 2017 (27 January to 2 February is the holiday week). Every time a traveler passes through a metro gantry, a transaction record is automatically collected. The 6 attributes contained in the records are shown in Table 1. The points of interest (POIs) data of Shenzhen City used in our research were collected from Amap (https://www.amap.com/) in March 2019, with a total of more than 700,000 data points, which is comparable in size to the 611,122 records used in a related study [44]. The collected POIs data are separated into 17 categories and 140 subcategories.


Table 1. Attributes contained in the transaction records.

Field               Value
Card_ID             Identifier of a unique cardholder
Trmnl_ID            Represents a unique subway station
Transaction_Time    Transaction time
Transaction_Type    Enter or exit station
Line                The corresponding line to the station

Figure 2 shows the overview of the methodology framework of this research, which includes three parts (highlighted in blue, orange and green): data preprocessing; descriptive analysis of the holiday mobility characteristics; and exploration of holiday travel patterns and trip activities.

Data Preprocessing
The basic step in deriving mobility patterns from the smart card dataset is to reconstruct the original data format. As described in Section 3.2, the original smart card data are stored by day and each entry represents one card-swiping act. However, this storage format is neither efficient nor convenient for knowledge mining. For instance, it is hard to identify a passenger's complete multiday transaction record listed in time sequence. The objective of data reconstruction is to combine each passenger's card-usage information by week. Thus, the 21 separate daily datasets (Figure 3a) are regrouped into 3 weekly datasets (Figure 3b).
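The regrouping step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the record layout `(card_id, station, timestamp, kind)`, the helper names `week_index` and `regroup_by_week`, and the toy records are assumptions; only the 20 January 2017 start date and the three 7-day weeks come from the text.

```python
from collections import defaultdict
from datetime import datetime, date

# First day of the 21-day study window stated in the text.
STUDY_START = date(2017, 1, 20)

def week_index(ts: datetime) -> int:
    """Return 1, 2 or 3 for the study week that a timestamp falls in."""
    offset = (ts.date() - STUDY_START).days
    if not 0 <= offset < 21:
        raise ValueError(f"{ts} is outside the 21-day study window")
    return offset // 7 + 1

def regroup_by_week(records):
    """Turn day-wise swipes into {card_id: {week: [time-ordered swipes]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for card_id, station, ts, kind in records:
        grouped[card_id][week_index(ts)].append((ts, station, kind))
    for weeks in grouped.values():
        for swipes in weeks.values():
            swipes.sort()  # chronological order within each week
    return grouped

# Hypothetical records: (card_id, station, timestamp, kind), deliberately unordered.
records = [
    ("A", "S2", datetime(2017, 1, 27, 9, 30), "exit"),
    ("A", "S1", datetime(2017, 1, 27, 9, 0), "enter"),
    ("A", "S1", datetime(2017, 1, 20, 8, 0), "enter"),
    ("B", "S3", datetime(2017, 2, 9, 18, 0), "enter"),
]
weekly = regroup_by_week(records)
```

With this layout, week 2 corresponds to the holiday week (27 January to 2 February), so a passenger's holiday-week sequence can be read directly from `weekly[card_id][2]`.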

Furthermore, considering the appropriate walking time (6-10 min) and the coverage of metro stations in Shenzhen (Figure 1), the six types of POIs within a radius of 500 m of each metro station are counted. We can thus differentiate the characteristics of the 166 stations, which makes it possible to infer the travel purpose of passengers at the metro station level. As a result, the inferred travel purpose changes from a single event (e.g., working) to multiple events with a probability distribution (e.g., working: 60%, shopping: 30% and business: 10%). Some key issues related to the processing of the POIs dataset are explained as follows:


• Why POIs data instead of land-use data? The POIs categories are closely related to the current land-use types in China [45]; although the two are not exactly the same, POIs can reflect the type of land use [46]. Compared with land-use data, POIs data have several advantages: (1) POIs data are much finer than land-use data and contain more information that is useful for further studies of personal preference; (2) POIs data can represent mixed-use situations instead of a single land-use type, so the potential activities for a given land use are expanded and specified; (3) POIs data are easier for researchers to obtain for academic research.
• How many POIs categories are used in this study? According to the Urban Land Classification and Construction Land Planning Standards (GB50137-2011), we re-categorize the 17 primary categories (140 subcategories) of POIs into 6 types, namely, housing-related (HR), work-related (WR), consumption-related (CR), recreation-related (RR), public-service-related (PR) and traffic-related (TR) POIs.
• What are the POIs-appended stations like? As shown in Figure 4a, every metro station is changed from a text label to a POIs feature vector. For example, $S_1 = (CR_1, HR_1, PR_1, RR_1, TR_1, WR_1)$, where $CR_1$, $HR_1$, $PR_1$, $RR_1$, $TR_1$ and $WR_1$ are the proportions of each POIs category within a radius of 500 m around station $S_1$. Therefore, the travel between two stations forms a matrix with 36 (6 × 6) variables, which correspond to 36 potential types of activities (Figure 4b).
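The POIs-appended station vector and the 36-variable trip matrix can be illustrated with a short sketch. Several points here are assumptions rather than details given in the text: POIs are taken as `(lat, lon, category)` tuples, distance is computed with the haversine formula, and the 6 × 6 matrix is formed as the outer product of the origin and destination vectors. All coordinates and the destination vector are made up for illustration.

```python
import math

CATEGORIES = ("CR", "HR", "PR", "RR", "TR", "WR")  # order used in the S_1 example
RADIUS_M = 500.0                                    # buffer radius stated in the text

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def station_vector(station, pois):
    """Proportion of each POI category within RADIUS_M of a station (lat, lon)."""
    counts = dict.fromkeys(CATEGORIES, 0)
    for lat, lon, cat in pois:
        if haversine_m(station[0], station[1], lat, lon) <= RADIUS_M:
            counts[cat] += 1
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CATEGORIES]

def trip_matrix(origin_vec, dest_vec):
    """Assumed construction: entry (i, j) = origin share of i * destination share of j."""
    return [[o * d for d in dest_vec] for o in origin_vec]

# Hypothetical station and POIs; the last POI lies several kilometres away.
s1 = (22.5431, 114.0579)
pois = [
    (22.5435, 114.0580, "CR"),
    (22.5440, 114.0585, "CR"),
    (22.5428, 114.0570, "WR"),
    (22.5429, 114.0575, "HR"),
    (22.6000, 114.1000, "TR"),
]
s1_vec = station_vector(s1, pois)            # [0.5, 0.25, 0.0, 0.0, 0.0, 0.25]
s2_vec = [0.1, 0.6, 0.1, 0.1, 0.05, 0.05]    # made-up destination vector
matrix = trip_matrix(s1_vec, s2_vec)
```

Under the outer-product assumption the 36 entries sum to 1, so each trip between two stations can be read as a probability distribution over the 36 potential activity types.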

Definition of Frequent and Focused Travelers
Occasional metro passengers, who travel infrequently and with low periodicity, are not informative for exploring trip patterns: their rare metro trips do not contain enough information to reveal any meaningful spatiotemporal pattern. To address this issue, we filter passengers from the whole dataset according to their metro travel frequency (active travel days) during the 21-day research period. For example, if a passenger uses the metro every day during these 21 days, his/her number of active days is 21. Figure 5a shows the distribution of the number of active days for all cardholders who used the metro at least once during the data collection period; it indicates that nearly 59.2% of passengers traveled by metro on only 1 or 2 of these 21 days.
However, owing to differences in dataset structure, study period and study area, the active-day thresholds used to filter frequent travelers vary across previous studies. For example, 6/21 (6 days out of 21 days), 7/30, 10/30, 2/30 and 10/28 are used in relevant studies [47][48][49][50][51]. By contrast, k-means clustering is a more objective way of handling passengers with different travel frequencies [51,52]. Following previous studies, we apply the k-means algorithm to cluster passengers into two groups: frequent travelers and infrequent travelers. The resulting boundary between the two groups is 6 days out of 21, indicating that a frequent traveler uses the metro on at least 28.6% of the days during the 21-day research period. The red line in Figure 5a differentiates frequent and infrequent passengers; the figure shows that among the 4,901,073 passengers, 865,557 (17.7%) are frequent travelers and 4,035,516 (82.3%) are infrequent travelers.
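A minimal sketch of this filtering step is given below, assuming swipes are available as `(card_id, day)` pairs. The two-cluster k-means is written out by hand on the one-dimensional active-day counts; the function names and the toy counts are illustrative and are not meant to reproduce the 6-day boundary reported above.

```python
from collections import Counter

def active_days(swipes):
    """Count distinct travel days per card from (card_id, 'YYYY-MM-DD') pairs."""
    return Counter(card for card, _day in set(swipes))

def kmeans_1d_two_groups(values, iterations=100):
    """Plain two-cluster k-means on scalars; returns (low_centre, high_centre)."""
    low, high = float(min(values)), float(max(values))
    for _ in range(iterations):
        low_grp = [v for v in values if abs(v - low) <= abs(v - high)]
        high_grp = [v for v in values if abs(v - low) > abs(v - high)]
        if not high_grp:          # all values identical: nothing to separate
            break
        new_low = sum(low_grp) / len(low_grp)
        new_high = sum(high_grp) / len(high_grp)
        if (new_low, new_high) == (low, high):
            break                 # centres stopped moving
        low, high = new_low, new_high
    return low, high

def is_frequent(days, low, high):
    """A passenger is 'frequent' when closer to the high-activity centre."""
    return abs(days - high) < abs(days - low)

# Hypothetical swipes and active-day counts.
swipes = [("A", "2017-01-20"), ("A", "2017-01-20"), ("A", "2017-01-21"), ("B", "2017-01-27")]
per_card = active_days(swipes)                     # A: 2 days, B: 1 day
toy_counts = [1, 1, 1, 2, 2, 3, 10, 12, 15, 21]
low_c, high_c = kmeans_1d_two_groups(toy_counts)
share_at_boundary = round(6 / 21 * 100, 1)         # 28.6, the share quoted in the text
```

The boundary between the two groups is the midpoint of the two centres, which is how a clustering result translates into a single active-day threshold such as 6 out of 21.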
According to which of the three weeks the 865,557 frequent travelers use the metro in, their metro-usage pattern can be divided into 7 types (Figure 5b), one for every non-empty combination of the three weeks. From the frequent travelers, we further filter focused travelers, who use the metro on at least one day in every week (Figures 5b, 6 and 7), to explore how mobility patterns vary from week to week. In summary, from the whole database of 21 days (4,901,073 travelers), we define frequent travelers (865,557 travelers) and focused travelers (421,156 travelers) as follows: frequent travelers are cardholders with at least 6 active days out of the 21, and focused travelers are frequent travelers who use the metro on at least one day in each of the three weeks.


Evaluation of the Holiday Mobility Features
The holiday mobility characteristic is measured from three levels for three different groups of people: the holiday overall mobility trend of all travelers (4,901,073); the overall holiday effect on the mobility of frequent travelers (865,557); and the overall holiday mobility pattern of focused travelers (421,156).
Firstly, three indicators illustrate the overall mobility trend of the holiday season from a macro perspective. The first indicator is the travel flow volume of each day, which is represented by the daily number of card-swiping records of all passengers. The second is the daily average time that passengers spend in the metro per trip. The third is the scope of influence of the holiday, which is reflected by the flow patterns formed by the hourly passenger flow volume of each day. Secondly, the travel frequency of frequent travelers is used to describe the overall holiday effect, which helps to show how many frequent travelers are influenced by the holiday. Lastly, the overall mobility patterns of focused travelers in the three weeks are compared. These patterns are represented by the POIs-depicted travel purpose, which is estimated from the shift between two metro stations with appended POIs attributes.
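The three macro indicators can be computed from raw swipes roughly as follows. This is a sketch under assumptions: records are `(card_id, station, timestamp, kind)` tuples with `kind` equal to `"enter"` or `"exit"`, and a trip is taken to be an entry followed by the same card's next exit, with unmatched swipes dropped. The pairing rule and the toy data are not taken from the paper.

```python
from collections import Counter, defaultdict
from datetime import datetime, date

def daily_volume(records):
    """Indicator 1: number of card-swiping records per calendar day."""
    return Counter(ts.date() for _card, _station, ts, _kind in records)

def mean_trip_minutes(records):
    """Indicator 2: average in-metro time per trip, keyed by the entry day."""
    open_entry = {}
    durations = defaultdict(list)
    for card, _station, ts, kind in sorted(records, key=lambda r: r[2]):
        if kind == "enter":
            open_entry[card] = ts
        elif kind == "exit" and card in open_entry:
            start = open_entry.pop(card)
            durations[start.date()].append((ts - start).total_seconds() / 60)
    return {day: sum(v) / len(v) for day, v in durations.items()}

def hourly_flow(records):
    """Indicator 3: swipes per (day, hour), i.e. each day's flow pattern."""
    return Counter((ts.date(), ts.hour) for _card, _station, ts, _kind in records)

# Hypothetical swipes; card C has an exit without a matching entry.
records = [
    ("A", "S1", datetime(2017, 1, 20, 8, 0), "enter"),
    ("A", "S2", datetime(2017, 1, 20, 8, 30), "exit"),
    ("B", "S1", datetime(2017, 1, 20, 8, 10), "enter"),
    ("B", "S3", datetime(2017, 1, 20, 8, 50), "exit"),
    ("A", "S2", datetime(2017, 1, 27, 10, 0), "enter"),
    ("A", "S4", datetime(2017, 1, 27, 10, 20), "exit"),
    ("C", "S4", datetime(2017, 1, 27, 11, 0), "exit"),
]
volume = daily_volume(records)
trip_time = mean_trip_minutes(records)
flow = hourly_flow(records)
```

Comparing `volume` and `flow` across the three weeks gives the kind of day-by-day and hour-by-hour contrast that the first and third indicators are meant to capture.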
The theory behind the LDA technique is that each document is treated as a random mixture of words and the latent topics are extracted from the word distribution probability in each document [54]. In the LDA model, a document consists of several words and the words of all documents form the corpus. Each document of the corpus has multiple topics, while each word of a document supports certain topics [60]. Several parameters are used in the generating process. T is the number of topics. V is the number of distinct words (the vocabulary size) of the corpus D. M is the number of documents. θ is the T × M matrix of topic proportions for the T topics in each document. φ is the V × T matrix that gives the distribution of the V words in each topic. Z is the topic assignment of the words in each document and W is the observed words of each document. α and β are the prior parameters for θ and φ, respectively; both priors follow the Dirichlet distribution, and α = 50/T and β = 0.1 are the most commonly used values in related studies (e.g., References [42,57,61]). According to the notations above, the generating mechanism of LDA can be explained as a graphical model representation (Figure 6). Apart from α and β, the topic number (T) also needs to be predetermined. According to previous studies, perplexity can be applied to infer a reasonable value of T and it is one of the most widely used evaluation measures for LDA [54].
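The generating mechanism described above can be made concrete with a small simulation. This is a sketch of the standard LDA generative story, not the fitting procedure used in the study: the function names, the fixed document length and the toy sizes are assumptions, and `phi` is stored as T rows of length V (the transpose of the V × T layout in the text). Only α = 50/T and β = 0.1 come from the text.

```python
import random

def sample_dirichlet(concentration, size, rng):
    """Draw from a symmetric Dirichlet by normalising Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(M, T, V, doc_length, alpha, beta, seed=0):
    """Sample phi per topic, theta per document, then z and w for every word."""
    rng = random.Random(seed)
    phi = [sample_dirichlet(beta, V, rng) for _ in range(T)]   # topic -> word distribution
    corpus, thetas = [], []
    for _ in range(M):
        theta = sample_dirichlet(alpha, T, rng)                # document -> topic proportions
        words = []
        for _ in range(doc_length):
            z = rng.choices(range(T), weights=theta)[0]        # latent topic assignment
            w = rng.choices(range(V), weights=phi[z])[0]       # observed word id
            words.append(w)
        corpus.append(words)
        thetas.append(theta)
    return corpus, thetas, phi

# Toy sizes chosen for illustration only.
T = 3
corpus, thetas, phi = generate_corpus(M=4, T=T, V=8, doc_length=10, alpha=50 / T, beta=0.1)
```

Fitting LDA runs this story in reverse: given only the observed words, it estimates the θ and φ that most plausibly generated them.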
In general, a better LDA clustering result corresponds to a lower value of perplexity, indicating better predictive performance and less uncertainty in the model. Perplexity can be calculated by the following formula:

$$\mathrm{perplexity}(D) = \exp\left\{-\frac{\sum_{m=1}^{M} \log p(\mathbf{w}_m)}{\sum_{m=1}^{M} N_m}\right\}$$

where M is the number of documents, $N_m$ is the length of document m and $p(\mathbf{w}_m)$ is the likelihood of document m of the corpus.
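As a quick numerical check of the perplexity definition in terms of M, N_m and p(w_m), the sketch below evaluates it from per-document log-likelihoods. The helper name and the toy model are assumptions: the example uses a model that assigns every word a probability of 1/8, for which the perplexity should equal 8 regardless of document lengths.

```python
import math

def perplexity(log_likelihoods, doc_lengths):
    """exp(-sum_m log p(w_m) / sum_m N_m) over M documents."""
    if len(log_likelihoods) != len(doc_lengths):
        raise ValueError("need one log-likelihood and one length per document")
    return math.exp(-sum(log_likelihoods) / sum(doc_lengths))

# Toy check: a uniform model over 8 words gives log p(w_m) = N_m * log(1/8).
doc_lengths = [10, 5, 20]
log_likelihoods = [n * math.log(1 / 8) for n in doc_lengths]
uniform_perplexity = perplexity(log_likelihoods, doc_lengths)
```

In practice the same function would be evaluated for several candidate values of T, with the log-likelihoods coming from the fitted model for each candidate.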

Data Reformatting and Mobility Pattern Clustering
Most previous research merely applies LDA for one dataset to discover pattern (topics) characteristics. Few researchers have tried to explore variations for multiple datasets through LDA. Based on the POIs-appended metro stations and the reconstructed data over three weeks presented in Section 4.1, we apply LDA twice to detect the mobility patterns of three separate weeks (week 1-3) and to explore the travel purpose during the holiday-week (week 2).
Before applying LDA, the previously processed datasets have to be reformatted. Slightly differently from previous studies, we apply LDA twice in total, once to each of two sub-databases. The first application detects the holiday mobility pattern by comparing week 1 with week 2, and the second does so by comparing week 2 with week 3. The two sub-databases and the way they are reformatted from the processed data are introduced here. The preliminaries described below and Figure 7 aid in understanding this process. Based on the above definitions, in our research context, the variation texts derived from all passengers' mobility patterns make up 2 sub-databases for LDA processing: (1) U21: all focused passengers' pattern variation texts of week 2 compared with week 1; (2) U23: all focused passengers' pattern variation texts of week 2 compared with week 3. Every passenger (cardID) is a document, each variation trip text is a word of a document and all trip texts form the corpus. Clustering the mobility patterns through LDA generates 2 groups of results. The patterns in these two groups are mobility patterns that appear only during the holiday season. By interpreting these mobility patterns, we can determine the travel purposes and activities that passengers are most likely engaged in.
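One way to build the two sub-databases is sketched below. The exact definition of a "variation text" is not recoverable from this section, so the sketch assumes that a trip is encoded as a token such as `"HR>CR"` (origin type, destination type) and that a passenger's variation text consists of the holiday-week tokens that occur more often than in the comparison week. Both the token format and this rule are hypothetical.

```python
from collections import Counter

def variation_text(holiday_trips, reference_trips):
    """Holiday-week trip tokens in excess of their count in the reference week."""
    extra = Counter(holiday_trips) - Counter(reference_trips)  # keeps positive counts only
    return sorted(extra.elements())

def build_sub_databases(weekly_trips):
    """weekly_trips: {card_id: {1: [...], 2: [...], 3: [...]}} -> (U21, U23)."""
    u21, u23 = {}, {}
    for card, weeks in weekly_trips.items():
        u21[card] = variation_text(weeks.get(2, []), weeks.get(1, []))
        u23[card] = variation_text(weeks.get(2, []), weeks.get(3, []))
    return u21, u23

# Hypothetical focused traveler: commuting in weeks 1 and 3, shopping trips in week 2.
weekly_trips = {
    "A": {
        1: ["HR>WR"] * 5,
        2: ["HR>CR", "HR>CR", "HR>WR"],
        3: ["HR>WR"] * 4 + ["HR>CR"],
    }
}
U21, U23 = build_sub_databases(weekly_trips)
```

Each value in `U21` and `U23` is then one document (one passenger) whose words are the variation trip tokens, which is the input shape LDA expects.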

Lower Travel Frequency
The mobility volume of each day according to the card-swiping frequency is counted (Figure 8), which shows that the travel frequency is significantly reduced since the Minor Spring Festival (1/20, also called Xiao Nian), reaching the lowest level on Chinese New Year's Eve (1/27) and gradually recovering after the end of the holiday. In general, the travel frequency during this holiday is much lower than that of normal weeks, indicating a large influence of this holiday on travel decisions and the findings are consistent with previous studies [33].

Evaluation of the Holiday Mobility Features
The holiday mobility characteristics are measured at three levels for three different groups of people: the overall holiday mobility trend of all travelers (4,901,073); the overall holiday effect on the mobility of frequent travelers (865,557); and the overall holiday mobility pattern of focused travelers (421,156).
Firstly, three indicators illustrate the overall mobility trend of the holiday season from a macro perspective. The first indicator is the daily travel flow volume, represented by the daily number of card-swiping records of all passengers. The second is the average time that passengers spend on the metro per trip on each day. The last is the influence scope of the holiday, reflected by the flow patterns formed by the hourly passenger flow volume of each day. Secondly, the travel frequency of frequent travelers is used to describe the overall holiday effect, which helps to understand how many frequent travelers are influenced by the holiday. Lastly, the overall mobility patterns of focused travelers in the three weeks are compared. The overall mobility patterns are represented by the POIs-depicted travel purpose, which is estimated from the shift between two metro stations with the appended POIs attribute.
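The three macro indicators can be illustrated with a minimal sketch. The record layout (card ID, entry time, exit time), the sample values and the function name `macro_indicators` are assumptions made for illustration only and do not reproduce the actual MSC data schema; each record is treated as one completed trip.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical trip records: (card_id, entry_time, exit_time). Values are made up.
records = [
    ("A", "2017-01-20 08:10", "2017-01-20 08:40"),
    ("B", "2017-01-20 08:30", "2017-01-20 09:10"),
    ("A", "2017-01-27 16:05", "2017-01-27 16:50"),
]
FMT = "%Y-%m-%d %H:%M"


def macro_indicators(trips):
    """Return daily volume, average minutes per trip and hourly flow per day."""
    daily_volume = defaultdict(int)
    daily_minutes = defaultdict(float)
    hourly_flow = defaultdict(lambda: defaultdict(int))
    for _card, t_in, t_out in trips:
        start = datetime.strptime(t_in, FMT)
        end = datetime.strptime(t_out, FMT)
        day = start.date().isoformat()
        daily_volume[day] += 1                                    # indicator 1
        daily_minutes[day] += (end - start).total_seconds() / 60
        hourly_flow[day][start.hour] += 1                         # indicator 3
    # Indicator 2: average time spent on the metro per trip, per day.
    avg_minutes = {d: daily_minutes[d] / daily_volume[d] for d in daily_volume}
    return dict(daily_volume), avg_minutes, {d: dict(h) for d, h in hourly_flow.items()}


volume, avg_minutes, hourly = macro_indicators(records)
```

Plotting `hourly` day by day would give the kind of flow pattern used to judge how far the holiday influence extends.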

Latent Dirichlet Allocation
Latent Dirichlet allocation (LDA) is a three-level hierarchical Bayesian model that refines probabilistic latent semantic analysis [53]. Unlike the decision tree, Bayes classifier and support vector machine (SVM), for which the classes have to be predefined, LDA generates possible patterns from the data itself and is a data-driven, unsupervised learning method. Furthermore, unlike other approaches that are sensitive to outliers, LDA is insensitive to data noise and is computationally efficient for big data. Therefore, in our research context, LDA is regarded as an efficient and robust method for mining travel patterns and trip purposes. More information about the advantages of LDA over other unsupervised learning approaches is given by Bao et al. (2017) and Blei et al. (2003) [42,54]. LDA was initially used for mining text topics, while recently researchers have applied it to transportation, mobility and urban studies. For example, Reference [59] used LDA to analyze tweet data to differentiate activity types.
The theory behind the LDA technique is that each document is treated as a random mixture of latent topics and each latent topic is extracted from the word distribution probability in the documents [54]. In the LDA model, a document consists of several words and the words of all documents form the corpus. Each document of the corpus has multiple topics, while each word of a document supports certain topics [60]. Several parameters are used in the generating process. T is the number of topics. V is the total number of words in the corpus D. M is the number of all documents. θ is the T × M matrix of the topic proportions for the T topics in each document. φ is the V × T matrix of the distribution of the V words in each topic. Z is the topic assignment of the words in each document and W is the observed words of each document. α and β are the prior parameters of θ and φ, respectively, both of which follow the Dirichlet distribution; α = 50/T and β = 0.1 are the most commonly used values in related studies (e.g., References [42,57,61]). According to the notations above, the generating mechanism of LDA can be explained as a graphical model representation (Figure 6): (1) for each topic t, draw $\varphi_t \sim \mathrm{Dirichlet}(\beta)$; (2) for each document m, draw $\theta_m \sim \mathrm{Dirichlet}(\alpha)$; (3) for each word position n in document m, draw $Z_n \sim \mathrm{Multinomial}(\theta_m)$ and then draw $W_n \sim \mathrm{Multinomial}(\varphi_{Z_n})$.
Besides α and β, the topic number (T) also needs to be predetermined. According to previous studies, perplexity can be applied to infer a reasonable value of T and it is one of the most widely used evaluations of LDA [54]. In general, a better LDA clustering result corresponds to a lower value of perplexity, indicating better predictive performance and fewer uncertainties in the model. Perplexity can be calculated by the following formula:

$$\mathrm{perplexity}(D) = \exp\left\{-\frac{\sum_{m=1}^{M} \log p(w_m)}{\sum_{m=1}^{M} N_m}\right\}$$

where M is the number of documents, $N_m$ is the document length (document-specific) and $p(w_m)$ is the likelihood of a text document of the corpus.
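The perplexity criterion can be sketched in a few lines, using the usual definition of perplexity as the exponential of the negative per-word log-likelihood of the corpus. The function name, the toy documents and the candidate scores below are assumptions for illustration; in practice the per-document log-likelihoods would come from a fitted LDA model.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """exp(-(sum of log p(w_m)) / (sum of N_m)) over all M documents."""
    total_log_likelihood = sum(doc_log_likelihoods)
    total_words = sum(doc_lengths)
    return math.exp(-total_log_likelihood / total_words)


# Sanity check: a model that assigns probability 1/4 to every word
# behaves like a uniform choice among 4 words, so its perplexity is 4.
lengths = [3, 5]
log_liks = [n * math.log(0.25) for n in lengths]
uniform_perplexity = perplexity(log_liks, lengths)

# Choosing T: keep the candidate topic number with the lowest perplexity.
# These scores are invented placeholders, not results from the study.
candidate_scores = {18: 152.3, 20: 148.9, 22: 147.1, 24: 149.6}
best_T = min(candidate_scores, key=candidate_scores.get)
```

Because the measure is normalized by the total number of words, corpora with different document lengths can be compared on the same scale.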

Data Reformatting and Mobility Pattern Clustering
Most previous research merely applies LDA for one dataset to discover pattern (topics) characteristics. Few researchers have tried to explore variations for multiple datasets through LDA. Based on the POIs-appended metro stations and the reconstructed data over three weeks presented in Section 4.1, we apply LDA twice to detect the mobility patterns of three separate weeks (week 1-3) and to explore the travel purpose during the holiday-week (week 2).
Before applying LDA, the previously processed datasets have to be reformatted. Slightly different from previous studies, we apply the LDA twice (in total) to two sub-databases. The first application is to detect the holiday mobility pattern based on the comparison of week 1 & week 2 and the second application is based on the comparison of week 2 & week 3. Two sub-databases and how they are reformatted from the processed data are introduced. The preliminaries described below and Figure 7 aid in understanding this process.

Definition 1 (Trip).
A trip T contains four items: leaving station $L_s$, arriving station $A_s$, leaving time $L_t$ and arriving time $A_t$. T1, T2 and T3 represent the trips that happened in the first, second and third weeks of the research period, respectively. For example, the second trip in the first week of passenger X is $T1_{x2} = (L1_{s2}, A1_{s2}, L1_{t2}, A1_{t2})$.
Textualize is the process of translating the numeric variables into text characters and time slot is the process of converting a timestamp into a temporal interval (e.g., 08:30 changes to 08:00-09:00, and 08:00-09:00 is expressed by 08 for later processing).

Definition 4 (Travel pattern texts).
The travel pattern texts are the sum of the trip texts of a certain passenger in a week. The travel pattern texts describe the travel pattern by the texts that contain the POIs-appended shift between two stations and the corresponding arriving time. The pattern texts can help to explore the possibility of travelers' potential activities. Generally, the more patterns there are for a certain type of text, the more likely that kind of activity is engaged. For example, '17WH' means arriving at a housing-related station from a working-related station at 17:00-18:00.
Based on the above definitions, any specific trip can be represented by several texts, and the number of occurrences of a text represents the possibility of the corresponding potential activity: the larger the number of a certain type of text, the higher the possibility of that activity. When a trip is written out in this form, the value at the upper right of each text is the occurrence number of the text, which reflects these probabilities.
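The encoding of a single trip into weighted trip texts can be sketched as follows. The station names, the POI shares, the category letters beyond those used in the text, the `scale` factor and the rounding rule are all assumptions for illustration; the sketch only follows the stated format, in which a text such as '17WH' combines the arriving time slot, the origin category and the destination category.

```python
from collections import Counter

# Hypothetical POI shares of two stations (W: working, C: consumption, H: housing).
station_poi = {
    "S1": {"W": 0.7, "C": 0.3},
    "S2": {"H": 0.8, "C": 0.2},
}


def time_slot(hhmm):
    """Map a timestamp such as '17:23' to its hourly slot label '17'."""
    return hhmm[:2]


def trip_texts(origin, destination, arriving_time, scale=10):
    """Return trip texts with occurrence counts for one trip.

    Each count is the product of the origin and destination POI shares,
    scaled and rounded (an assumed rule), so a larger count marks a more
    likely activity.
    """
    slot = time_slot(arriving_time)
    texts = Counter()
    for o_cat, o_share in station_poi[origin].items():
        for d_cat, d_share in station_poi[destination].items():
            count = round(o_share * d_share * scale)
            if count > 0:
                texts[slot + o_cat + d_cat] = count
    return texts


texts = trip_texts("S1", "S2", "17:23")
```

In this toy case the work-to-home text dominates, which matches the intuition that an evening trip from a mostly working-related station to a mostly housing-related station is most likely a commute home.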

Definition 5 (Mobility pattern variation texts).
The mobility pattern variation texts are the difference between a certain traveler's mobility pattern texts in two weeks. These texts describe the difference in travel behavior between the two weeks. For example, if the mobility text '23CH' of a passenger does not exist in week 1 but appears in week 2, then '23CH' is his/her mobility pattern variation text between week 1 and week 2, which can also be regarded as his/her mobility pattern in week 2. Following this definition, we generate 2 kinds of variation texts to reflect the mobility pattern of the holiday-week: (1) mobility pattern texts in week 2 based on the comparison of week 1 and week 2; (2) mobility pattern texts in week 2 based on the comparison of week 2 and week 3. Accordingly, we can perform a pairwise comparison (week 1 & week 2 and week 2 & week 3) to obtain the holiday mobility patterns and then explore their corresponding travel purposes.
Based on the above definitions, in our research context, the variation texts derived from all passengers' mobility patterns make up 2 sub-databases for LDA processing: (1) U21: all focused passengers' pattern variation texts of week 2 compared with week 1; (2) U23: all focused passengers' pattern variation texts of week 2 compared with week 3. Every passenger (cardID) is a document, the variation trip text is a word of a document and all trip texts form the corpus. After clustering the mobility patterns through LDA, 2 groups of LDA clustering results are generated. The patterns of these two groups are mobility patterns and only appear during the holiday season. Through the interpretation of these mobility patterns, we can determine what travel purpose and activities passengers are most likely engaged in.
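The construction of the two sub-databases can be sketched as below. The card IDs and pattern texts are invented, and keeping repeated texts as repeated words is an assumption; the sketch only follows the rule that a week-2 text counts as a variation text when it is absent from the comparison week, with each passenger treated as one document.

```python
def variation_texts(reference_week, holiday_week):
    """Week-2 texts that do not occur in the reference week (week 1 or week 3)."""
    seen = set(reference_week)
    return [text for text in holiday_week if text not in seen]


# Hypothetical pattern texts per passenger (cardID) and week.
patterns = {
    "card_1": {1: ["08HW", "17WH"], 2: ["16HC", "23CH", "16HC"], 3: ["08HW", "23CH"]},
    "card_2": {1: ["08HW"], 2: ["08HW"], 3: ["08HW"]},
}


def build_sub_database(all_patterns, reference):
    """One document per passenger; passengers without variation texts are dropped."""
    documents = {}
    for card_id, weeks in all_patterns.items():
        words = variation_texts(weeks[reference], weeks[2])
        if words:
            documents[card_id] = words
    return documents


U21 = build_sub_database(patterns, reference=1)  # week 2 compared with week 1
U23 = build_sub_database(patterns, reference=3)  # week 2 compared with week 3
```

The resulting dictionaries map directly onto the LDA vocabulary: each key is a document and each list entry is a word, so any bag-of-words LDA implementation could take them as input after counting.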

Lower Travel Frequency
The mobility volume of each day according to the card-swiping frequency is counted (Figure 8), which shows that the travel frequency is significantly reduced since the Minor Spring Festival (1/20, also called Xiao Nian), reaching the lowest level on Chinese New Year's Eve (1/27) and gradually recovering after the end of the holiday. In general, the travel frequency during this holiday is much lower than that of normal weeks, indicating a large influence of this holiday on travel decisions and the findings are consistent with previous studies [33]. ISPRS Int. J. Geo-Inf. 2020, 9, x FOR PEER REVIEW 13 of 22

Longer Time for Each Trip
Another feature of Spring Festival travel is the average time that travelers spend on each trip. Figure 9 shows that the travel time reaches a peak during the Spring Festival holiday season, which may indicate that passengers travel longer distances within the city. A small sample supports the findings above. We randomly select 1000 focused travelers from the datasets and find that their average travel frequency and travel time in the normal week (week 1) are 6.5 times/week and 27.2 min/trip, respectively, whereas during the holiday they are 4.0 times/week and 32.0 min/trip. Although, according to holiday regulations, employees cannot take leave on the weekends before and after the holiday, there is still a slight fluctuation in the travel time in Figure 9. This may be due to private enterprises and self-employed individuals who do not have to follow the deferred-holiday rules.
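The size of the shift in the 1000-traveler sample can be expressed as a relative change. The short sketch below uses only the four averages reported above; the helper name `pct_change` is an assumption.

```python
def pct_change(before, after):
    """Relative change from the normal week to the holiday-week, in percent."""
    return (after - before) / before * 100


frequency_change = pct_change(6.5, 4.0)   # average travel frequency
time_change = pct_change(27.2, 32.0)      # average travel time
print(f"frequency: {frequency_change:.1f}%, travel time: {time_change:+.1f}%")
```

On these figures the travel frequency drops by roughly 38% while the average travel time rises by roughly 18%.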

Influence Beyond the Holiday Season
In terms of the hourly number of passengers each day, according to prior studies [62][63][64], the travel volume patterns from Monday to Friday should be roughly the same, indicating a normal workday. However, in the 3-week study period, only 1/20 and 2/6 to 2/9 maintain a relatively regular state; the volume patterns of the other dates are clearly disturbed by the Spring Festival holiday (Figure 10). The figure shows that the volume pattern is affected before the official start and after the official end of the holiday, even on working days, such as 1/25 and 1/26 in week 2 and 2/3 to 2/5 in week 3. This indicates that the impact of the Spring Festival is continuous and that its scope is not limited to the holiday period. One possible reason is that passengers stop working to prepare for the festival before its official start and gradually return to work after its official end. This phenomenon, in which the travel volume pattern is disturbed for nearly half a month, has not been studied in related research.

Holiday Mobility Patterns Compared to Week 1
In the mobility pattern comparison between week 1 and week 2 (U21), among the 421,156 focused travelers, 409,410 (97.2%) have distinct mobility patterns during the holiday-week (week 2). The perplexity reaches its minimum at T = 22 for U21 (Figure 11a), which means that 22 mobility patterns in the holiday-week do not exist in the week before the holiday (week 1). Although the LDA generated 22 mobility patterns, we only interpret the details of the 4 most important mobility patterns in Figure 12. The 18 remaining patterns have a structure similar to these four and can be interpreted from the figures in the same way.

As shown in Figure 12a, mobility pattern #5 reveals that on the holiday, passengers have the greatest possibility of arriving at consumption-related metro stations from housing-related stations (H-C) or other consumption-related stations (C-C) at 16:00-17:00, which implies that the travel purpose at this time is mainly consumption-oriented events. Other high-possibility shifts are also included: (1) arriving at housing-related stations from consumption-related stations (C-H) at 16:00-17:00; (2) arriving at housing-related stations from other housing-related stations (H-H) at 16:00-17:00; (3) arriving at consumption-related stations from traffic-related stations (C-T) at 16:00-17:00.
Mobility pattern #5 indicates that the activities in which people commonly participate at 16:00-17:00 during the holiday-week are consumption-related events (shopping, eating and so on) or visiting friends and/or family (H-H). By contrast, 16:00-17:00 is when travelers are usually working in a normal week, which makes this a particular mobility pattern of the holiday-week.
Compared with mobility pattern #5, patterns #6, #4 and #19 (Figure 12b-d) show similar travel purposes, namely, consumption-related activities (reflected by C-C, H-C, etc.), going home after consuming (C-H) and visiting friends (H-H). The difference among them is the arriving time chosen for these activities. For example, pattern #6 shows that passengers arrive at a destination at 9:00-10:00 and then return home at 23:00-24:00. Pattern #4 implies that people go out slightly late (11:00-12:00) on the holiday and pattern #19 indicates that people come home late at night (21:00-22:00). These patterns are not only mobility patterns that the week before the holiday does not have but also reveal the reasons why travelers might travel on a holiday. Furthermore, the remaining 18 holiday mobility patterns are listed in Figure 13. Compared to week 1, the times at which passengers choose to travel on the holiday are relatively balanced and few travelers take the metro at 07:00-09:00. Instead, 9:00-10:00, 16:00-17:00 and 23:00-24:00 are the three peak periods when most of the travelers choose to travel, which indicates that few of them took the metro during these periods in week 1. Overall, these 22 mobility pattern clusters can be divided into 3 types. These 3 types can also explain why holiday travel is distinct as well as the trip purposes of travelers.
In the first and second types, travelers choose other times to travel on holidays and their trip purposes are mainly consumption-oriented events (C-C, H-C, T-C, etc.) or visiting friends (H-H). These two types are represented by patterns #1, #2, #3, #4, #5, #6, #7, #9, #10, #12, #14, #15, #16, #19, #20 and #22 (Figures 12 and 13a) and passengers associated with these patterns account for 94.5% (386,748/409,410). Two factors cause the uniqueness of these mobility patterns. One is that the time at which travelers take the metro on the holiday is different from that of the week before the holiday and the other is the different travel purposes. For example, people in pattern #5 did not travel at 16:00-17:00 during week 1 and most of their trip purposes were not consumption-oriented events or visiting friends. They may have traveled at the same time (16:00-17:00) for other purposes (e.g., working: H-W) or at another time for the same purposes.
In the third type, travelers change their trip purposes on the holiday and most of the trip purposes on the holiday are traffic-oriented events (T-C, T-H, C-T and H-T). This type is represented by patterns #8, #11, #13, #17, #18 and #21 (Figure 13b) and travelers involved in these patterns account for 5.5% (22,662/409,410). These patterns are distinct on the holiday mainly because the travelers' trip purposes differ from those of week 1. Their trip purposes are mainly traffic-oriented events, indicating leaving or arriving in Shenzhen during the CSF holiday week.
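The reported shares for U21 can be checked with a short calculation. The sketch uses only the counts quoted in this section; the variable names are assumptions.

```python
focused_travelers = 421_156      # all focused travelers
with_holiday_pattern = 409_410   # travelers with a week-2 variation pattern (U21)
consumption_or_visiting = 386_748
traffic_oriented = 22_662

# The two groups partition the travelers that have a holiday pattern.
groups_sum = consumption_or_visiting + traffic_oriented

share_with_pattern = round(100 * with_holiday_pattern / focused_travelers, 1)
share_consumption_visiting = round(100 * consumption_or_visiting / with_holiday_pattern, 1)
share_traffic = round(100 * traffic_oriented / with_holiday_pattern, 1)
```

The counts add up to 409,410 and reproduce the 97.2%, 94.5% and 5.5% figures after rounding to one decimal place.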

Holiday Mobility Patterns Compared to Week 3
By comparing the mobility patterns between week 1 and week 2, we obtained some of the primary holiday travel purposes from 22 holiday mobility patterns (U21). However, a concern is raised here: do these mobility patterns and travel purposes remain the same between week 2 and week 3 (U23)? To address this concern, we also compare the mobility patterns between week 2 and week 3 to check the holiday mobility patterns and travel purposes. In the LDA clustering for U23, the perplexity reaches its minimum at T = 20. Thus, 20 distinct holiday mobility patterns are generated between week 2 and week 3 and most are consistent with the previous 22 holiday mobility patterns (U21). Similarly, the 20 mobility patterns can also be divided into 3 types: 96.9% (395,618/408,258) are the consumption-oriented and visiting-related types and 3.1% (12,667/408,258) are the traffic-oriented type. The close agreement between the three mobility types of U21 and U23 indicates that the trip purposes during the CSF in Shenzhen derived from the two comparisons (week 1 with week 2 and week 2 with week 3) are almost the same.
Therefore, we can state that the holiday travel purposes of focused travelers, in order of possibility, mainly include the following: (1) C-C: consuming or relaxing at various commercially oriented places, such as different shopping malls. (2) H-C: going to consumption-oriented places ... (6) is the visiting-related type and (7)-(9) are the traffic-related type.

Mobility Patterns of the Normal-Day (Week 1)
To illustrate the difference between the holiday and normal-day patterns, we extract the mobility patterns of the weekdays (week 1) with the same clustering method. Apart from some similar patterns of consumption events, Figure 14 shows that some weekday patterns have obvious working- and housing-related attributes, which are not prominent in the holiday patterns. These weekday patterns can be roughly divided into three groups. The first group contains the patterns of passengers starting or ending their lunch break (C-W, H-W, W-C, W-H), usually distributed between 12:00 and 15:00 (Figure 14a). The second group contains the patterns of passengers getting to work (H-W) or going out to handle their personal affairs (H-C, H-P, H-T), which usually happen from 7:00 to 10:00 (Figure 14b). The last group contains the patterns of passengers getting off work (W-C, W-H) or engaging in recreation at night (H-C, C-C), which usually happen from 16:00 to 19:00 (Figure 14c). This comparison shows that, by combining the product of the POIs ratios of the OD station pairs, the arriving time at the station pairs and the day type (holiday or weekday), our proposed method can help to estimate the different travel patterns and activities at the metro station level.
Figure 14. Working- and housing-related mobility patterns of the weekday (week 1).

Summary of Results and Policy Implications
To date, public transport during vacations has been poorly studied. Accordingly, this study performs experiments on two datasets, namely, the metro smart card (MSC) dataset and the POIs dataset, to explore travel patterns and travel activities during the CSF holiday season in Shenzhen. The main findings of this study in terms of our research questions can be summarized as follows. Firstly, with the MSC and POIs datasets only, the proposed strategy of appending the POIs attribute to metro stations within a radius of 500 m changes the trip purpose inference from a single event to multiple events with a possibility distribution at the station level. The holiday travel purposes of focused travelers are inferred through clustering analysis and the results highlight the uniqueness of the Spring Festival travel compared to one week before and one week after the holiday. Three general types of travel patterns are revealed: consumption-oriented events, visiting-friends events and traffic-oriented events. Among the three types of patterns, nine primary travel purposes are discovered. Secondly, we define frequent travelers and focused travelers to separate them from all passengers according to their active travel days during the Spring Festival. Then, six characteristics of holiday mobility are measured in these three groups of passengers: (1) extremely low travel frequency; (2) longer trip time; (3) relatively long-lasting influence beyond the holiday; (4) about 50% of frequent travelers stop their metro travel on holidays; (5) the holiday mobility pattern of focused travelers is distinct from those of the other two normal weeks in three aspects, namely, travel purposes, travel stability and travel peak time; (6) the times of going out and coming back home are both late on holidays.
Our analysis is beneficial to metro corporations (timetable management), business owners (promotion strategy), researchers (travelers' social attribute inference) and decision-makers (examining public services). Firstly, the analysis could provide information for adjusting the MTR holiday services. That traffic-oriented activities occur frequently during the CSF is nearly common sense; beyond that, however, the MTR corporations could identify where and when these activities are generated within the city from the passenger clusters of the analysis and then take corresponding countermeasures. Secondly, from an urban planning perspective, this study could help to optimize the pattern of urban retail business centers. For most urban planners (at least in China), holiday demand is a factor that is barely considered during the planning process. If planners consider this factor, they may find potential gaps in the current planning and then optimize it. After all, besides the CSF, there are many other public holidays in China. Thirdly, the study helps researchers and policymakers to better detect different social groups and then deliver targeted social welfare, because travelers' trip purposes during a special period (holiday season) can reflect their socioeconomic classes in some ways. For instance, since the CSF is very important to all Chinese people and is a rare opportunity to relax and reunite, unusual travel behavior during this period attracts attention and may also imply some essential underlying mechanism. Therefore, the analysis could be considered in the process of allocating social welfare, such as applications for affordable housing and concessionary metro fares.

Limitations and Further Steps
Nevertheless, our study is an initial step in exploring the different travel patterns of weekdays and a long holiday season. Based on the work presented herein, several improvements could be made. First, since the metro system in Shenzhen only accounts for 14% of the total traffic volume, our results and conclusions may be confined to the subway system and are limited in their generalization. Second, the usage of the POIs in the study has its innate limitations. Although POIs might be more appropriate than land-use data for inferring travel activities, using only the number of POIs is not accurate enough because information such as the scale and the size of the POIs is not included. For example, a train station may be represented by only a few POIs but they carry more meaning than the many POIs of restaurants. Third, due to the special development background of Shenzhen City (a large number of migrants), the CSF mobility patterns of other Chinese cities are worth exploring because they might have common features or differences that need to be further discussed. Lastly, since the proposed approach to infer travel activities ...
Future studies can proceed from the following aspects: (1) adding other mobility data sources such as taxi trajectories, bus smart cards and bike-sharing, or combining personal travel survey data, to better understand the overall CSF inner-city mobility; and (2) extending the holiday travel studies to a larger area, such as the national level, and to other public holiday seasons.