Forecasting Daily Room Rates on the Basis of an LSTM Model in Difficult Times of Hong Kong: Evidence from Online Distribution Channels on the Hotel Industry

Abstract: Given the influence of the financial-economic crisis, hotel room demand in Hong Kong has dropped significantly since June 2019. Because studies on room rates remain limited, this study considers the demand for hotel rooms across different categories and districts. It forecasts room rates from mid-October 2019 to mid-June 2020, a difficult period for Hong Kong owing to the onset of social unrest and the novel coronavirus outbreak. The study develops an approach to short-term forecasting of daily hotel room rates based on the Long Short-Term Memory (LSTM) model, leveraging day-of-week properties to improve accuracy. It collects a data set covering 235 hotels over this period from various online distribution channels and generates separate time series for each day of the week. The proposed model is validated against three baseline models, namely, autoregressive integrated moving average (ARIMA), support vector regression (SVR), and Naïve models. The findings shed light on how combining a rolling procedure with separate day-of-week time series can lessen the impact of violent fluctuations in the hospitality industry. The study thus enriches the theory and practice of hotel room demand forecasting, supporting hoteliers in adjusting room pricing strategies to improve revenue management and helping customers find appropriate deals when booking hotel rooms.

Draft Preparation, Literature Search, Methodology, Writing (Original Draft Preparation): S.L.; Data Curation, Formal Analysis: Z.C.; Data Acquisition, Data Preparation: Y.Q.; Initiation, Idea Generation, Writing (Review, Revision, and Editing): R.L. All authors have read and agreed the


Introduction
In many regions worldwide, tourism has become a key, or even the only, decisive driver of local income growth and of economic and social change [1]. As an international tourism destination, Hong Kong is particularly sensitive to global economic conditions [2]. Since June 2019, tourist arrivals to Hong Kong have continued to decline owing to the social unrest that began in June 2019 and the novel coronavirus outbreak since early 2020. The crisis kept visitors away, threatening the sustainability of the tourism industry. As part of the tourism ecosystem, the hotel industry is vital to sustaining local development [3]. A recent review of hotel supply reports in Hong Kong across different classes of hotels from January 2019 to June 2019 (data source: https://securepartnernet.hktb.com/china/sc/research_statistics/research_publications/index.html?id=3634) shows that the average room and occupancy rates were 1329 and

Literature Review
The hotel industry is one of the most crucial components of the tourism industry and essential to a region's economic growth. Owing to continuous changes in market demand and the perishability of guest rooms, hoteliers must react promptly to the dynamics of the customer source market and take appropriate action by continuously monitoring current changes [14]. Accurate hotel room demand forecasting gives hoteliers valuable information for making timely decisions about source markets and offers useful guidance to consumers concerned about hotel room rates. Considerable advances have been made in tourism and hotel demand modelling and forecasting in recent decades, with numerous related papers published in tourism journals.

Forecasting Objects
Forecasting hotel room demand has been one of the essential components of revenue management in the hotel industry [15]. Guest arrivals, the number of nights stayed, and occupancy rates significantly affect a hotel's accommodation demand forecast, and all three variables can be influenced by hotel room rates [3]. However, research on hotel room rate forecasting is scarce. Therefore, the following paragraphs review publications related to guest arrivals, the number of nights stayed, and occupancy rates.
Comparatively few published studies address guest arrivals and the number of nights stayed. Weatherford and Kimes compared seven forecasting methods using guest arrival data from Choice Hotels and Marriott Hotels and identified the most accurate one [16]. Haensel and Koole used actual hotel reservation data from three regions with different characteristics to test forecast updating methods for predicting the accumulated booking curve and the number of reservations expected for each day in the booking horizon [17]. Some scholars used daily booking data to conduct empirical research on hotels in the United States and Hong Kong [18,19]. Brännäs et al. conducted an empirical study of Norwegian guests in Swedish hotels and found that average check-in and check-out probabilities are seasonal [20]. Lim et al. [17] used a variety of time-series forecasting models to forecast the number of nights stayed by hotel and motel guests in New Zealand.
For occupancy rates, scholars have pursued two research directions: using historical occupancy data to forecast occupancy at different time scales, and testing the performance of forecasting models. Daily [7,21], weekly [8,22,23], and monthly occupancy rates [24] have been investigated in previous studies. Other studies used historical occupancy rates to build a hotel demand estimation mechanism for different hotel categories in order to evaluate the likelihood of forthcoming occupancy peaks and troughs [25]. Meanwhile, some researchers have paid close attention to the impact of crises on hotel room demand. For example, Wu et al. [7] used independent component analysis (ICA) to analyze the monthly occupancy rates of hotels in different regions of Hong Kong and found that infectious disease outbreaks, economic performance, and service prices were the main factors affecting occupancy in Hong Kong [10]. Song et al. [26] predicted the quarterly room demand of four types of hotels in Hong Kong from nine major source markets to assess the impact of the financial/economic crisis on hotel room demand.

Forecasting Methods
Given increasing requirements for accurate demand forecasting, researchers have applied many forecasting models to tourism and hotel room demand. Models such as pick-up [27], autoregressive moving average with exogenous inputs (ARMAX) [22,23], EEMD-ARIMA [7], and modified EEMD-ARIMA [8] have been applied and shown to perform well under different circumstances. Forecasting models can be divided into three categories: time-series, econometric, and artificial intelligence models [9].
Time-series and econometric models have been the two most widely used model types in demand forecasting over the past three decades [8]. Time-series models focus on the historical trends and evolution patterns of variables. Because a univariate time series model has only one variable and data collection costs are relatively low, such models appear to be prevalent among researchers [28], especially the ARIMA model [9]. However, these models require a stable economic environment and a stable (stationary) time series [29]. Under the influence of external shocks (such as wars, epidemics, and economic crises), such linear models lose their performance advantages.
In recent years, artificial intelligence models have gradually emerged in tourism and hospitality demand forecasting. Artificial neural networks (ANNs) [30,31], rough sets [32], and fuzzy time series [33] have frequently been used in previous studies. Support vector regression (SVR), a method for solving nonlinear regression problems, has also been introduced to demand forecasting owing to its good generalization ability. However, SVR requires researchers to set several parameters themselves, and improperly chosen parameters lead to over-fitting or under-fitting [34]. Therefore, how to select these parameters accurately has become the key issue in using SVR.
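For reference, the standard ε-insensitive SVR formulation below shows where the user-chosen parameters enter: the penalty C, the tube width ε, and, for a radial basis function (RBF) kernel, the kernel width γ. The text does not state which kernel or parameter grid this study used, so the RBF kernel here is an assumption for illustration only.

```latex
\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right)
\quad \text{s.t.} \quad
\begin{cases}
y_i - w^{\top}\phi(x_i) - b \le \varepsilon + \xi_i, \\
w^{\top}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\
\xi_i,\ \xi_i^{*} \ge 0,
\end{cases}
\qquad
K(x, x') = \phi(x)^{\top}\phi(x') = \exp\!\left(-\gamma \lVert x - x' \rVert^{2}\right)
```

A large C combined with a small ε makes the model fit the training data closely and risks over-fitting, whereas a small C with a large ε flattens the fit and risks under-fitting, which is why these parameters are usually tuned by a search procedure rather than set by hand.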
As a branch of artificial intelligence, deep learning models (DLMs) have in recent years surpassed traditional artificial intelligence methods in demand forecasting and have been applied to time series forecasting [35]. Long Short-Term Memory (LSTM), a prominent deep learning architecture, was specifically designed to address long-term dependencies in time series and has shown strong advantages in prediction tasks with sequential input [29].
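For readers unfamiliar with the architecture, one common formulation of an LSTM cell at step t is given below, with input x_t, previous hidden state h_{t-1}, and previous cell state c_{t-1}; σ is the logistic sigmoid and ⊙ the element-wise product. This is the textbook form, not necessarily the exact configuration used in this study.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{aligned}
```

The additive update of c_t lets information and gradients pass across many steps largely unchanged when f_t is close to 1, which is the sense in which LSTM addresses long-term dependencies.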

Methodology
This study employed DLMs to forecast daily demand for different categories of hotel rooms in Hong Kong. Rather than proposing complex models, it adopted a simple approach that resembles human intuition.

Data Set Collection
At the outset, the online travel agency (OTA) TripAdvisor.com was chosen as the main distribution channel from which to collect the data, as it is probably the largest and most popular hotel room booking website, on which thousands of tourists book hotel rooms and share their experiences [36][37][38]. A web-crawling bot written in Python was used for this task. First, the list of hotels was obtained from the official Hong Kong Tourism Board (HKTB) website in September 2019. Second, the URL of each listed hotel on TripAdvisor.com was identified manually. According to TripAdvisor.com, 235 hotels in Hong Kong were available for online reservation. Third, each day, the bot visited all URLs and collected the prices offered through the OTA for a single overnight stay at the time of the request. Because the room rate offered on the website varied considerably with the number of days before check-in, the check-in date was fixed at seven days after the collection date, in line with previous studies [12,17,21,39]. The variables collected were the hotel's name, district, and star rating; the collection date (i.e., the date on which the price was collected from the website); the target date (i.e., the date of the overnight stay, here seven days after the collection date); the names of the sub-channels (e.g., Expedia.com, Booking.com, and Agoda.com); and the corresponding room rates. Several issues related to the raw data require further explanation.

1. The classification of Hong Kong hotels by district followed the official system used by the HKTB, namely, Hong Kong Island, Kowloon, New Territories, and Islands.

2. Because Hong Kong does not have a formal hotel star-rating system [40][41][42], the rating system on Hotels.com was used for hotel categories in this study. If the star rating of a selected hotel was not available on Hotels.com, it was verified by cross-checking with Booking.com and TripAdvisor. In such cases, a half-star strategy was adopted: ratings were kept at a granularity of 0.5 stars, so the category list includes 3.5- and 4.5-star levels. This was intended to yield accurate results for the different categories. Because Hotels.com itself uses half-star ratings, this strategy could accommodate differences in star ratings between Hotels.com and Booking.com or TripAdvisor, as well as differences between Booking.com and TripAdvisor for the same hotel.

3. The data collection took place over a 9-month period from 2019 to 2020, covering mid-October 2019 to mid-June 2020: 238 days in total (i.e., 34 weeks).

4. Each price was the average nightly price provided by a sub-channel. Consequently, for a given collection date, target date, and sub-channel, the room rate is a real number (in HK dollars) if the scraping bot found the hotel, and missing if the bot could not find any price for the room types returned by the sub-channels. In total, the raw data set contained about 455,457 room prices.
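The raw records described above can be sketched as a small Python data class. The class and field names are hypothetical, chosen only to mirror the variables listed in this section; a missing price (item 4) is represented as None, and the target date is derived from the seven-day lead time rather than stored separately.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Check-in is fixed at seven days after the collection date.
LEAD_TIME = timedelta(days=7)


@dataclass
class PriceRecord:
    """One scraped price quote (hypothetical schema)."""

    hotel_name: str
    district: str          # HKTB district, e.g. "Kowloon"
    star_rating: float     # half-star granularity, e.g. 3.5
    collection_date: date  # day the price was scraped
    sub_channel: str       # e.g. "Booking.com"
    room_rate_hkd: Optional[float]  # None if no price was returned

    @property
    def target_date(self) -> date:
        """Date of the overnight stay, derived from the lead time."""
        return self.collection_date + LEAD_TIME
```

Deriving the target date as a property keeps it consistent with the fixed seven-day lead time by construction, including across month and year boundaries.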

Model Development
Forecasting daily rates for a single hotel was especially difficult because of complex and erratic fluctuations [8,43]. Another difficulty stemmed from the crises, whose influence was not consistent from month to month and may have varied, or even overlapped, across days. Predicting solely from historical data was therefore challenging, particularly with so few data (34 observations per sample, as detailed below). This paper therefore developed a forecasting procedure suited to the hotel room rate problem.
First, the room rate offered on the website varied not only with the number of days before arrival but also with the arrival day of the week, because daily revenue and customer demand tend to be similar for the same calendar period and the same day of the week (including weekdays and weekends). It therefore made sense to treat each day of the week separately in room rate forecasting, an idea adopted by previous studies [3,17,18,44]. Hence, the 238 daily observations were divided into seven day-of-week samples of 34 observations each (i.e., 34 Sundays as the first day-of-week sample, 34 Mondays as the second, and so on). That is, the room rates of past Mondays were used to forecast room rates on future Mondays. This meant that the forecasting horizon was set to 7, consistent with Tse's study [19]. This strategy can leverage the key properties of the days of the week to improve the accuracy of short-term hotel room demand forecasting.
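The day-of-week split can be sketched as strided slicing. The sketch assumes a complete, gap-free daily series whose first element falls on the weekday labelled as sub-series 0 (Sunday in the text); with real dated records, grouping on the calendar weekday of each date (for example via `datetime.date.weekday()`, which numbers Monday as 0) would be the safer choice. The function name is hypothetical.

```python
def split_by_day_of_week(daily, period=7):
    """Split a gap-free daily series into `period` weekday sub-series.

    Sub-series k holds observations k, k + period, k + 2 * period, ...,
    so all of its values fall on the same weekday. With 238 days and
    period 7 this yields seven sub-series of 34 observations each.
    """
    if len(daily) % period != 0:
        raise ValueError("series length must be a multiple of the period")
    return [list(daily[k::period]) for k in range(period)]
```

Each returned sub-series is then modelled on its own, so past Mondays only ever inform forecasts for future Mondays.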
Second, to mimic the real-world scenario in which new observations become available on each day-of-week and can be used as inputs for forecasting the next one, a rolling model was introduced. It predicted the value for the next day-of-week by adding the latest forecast result to the inputs, so that the value predicted for today was treated as known data when forecasting the next one. The rolling period was set to 1. In this setting, each day-of-week sample could reasonably be treated as a continuous, periodic series whose consecutive observations are one week apart. A common-sense strategy is therefore to predict that the next room rate will equal the current one, taking the observation from the previous same day-of-week as the input and the observation on the current day-of-week as the output. Because the input, forget, and output gates of an LSTM allow the network to discard outdated information from its memory cell and update its hidden state on the basis of new information, the LSTM provides a sound basis for rolling forecasting. Specifically, the LSTM-based rolling model was first trained on the training set, and the entire training set was then predicted to build up the network's internal state for forecasting the test set. Finally, the rolling model was used to predict each subsequent day-of-week observation in the test set.
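Framing each day-of-week series as single-lag input/output pairs, as described above, might look like the following. The function names are hypothetical; the lag of one (previous same weekday as input, current weekday as output) is taken from the text, and the persistence rule is the common-sense baseline the text describes, against which a learned mapping such as the LSTM's can be compared.

```python
def to_supervised(series, lag=1):
    """Frame one day-of-week series as (input, target) pairs.

    With lag=1 the input is the rate on the previous same weekday and
    the target is the rate on the current one, so a 34-point series
    gives 33 training pairs.
    """
    return [(series[t - lag], series[t]) for t in range(lag, len(series))]


def persistence_predictions(pairs):
    """Common-sense baseline: predict each target as its input value."""
    return [x for x, _ in pairs]
```

The LSTM is fed the same pairs; what it adds over persistence is a learned, state-dependent correction to "next week equals this week".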

Data Set Preprocessing
The use of a single data point in a forecast may be subject to considerable noise [3]. In this research, single prices were therefore replaced by a simple average of the prices collected from the sub-channels to reduce erratic daily fluctuations. Another noteworthy issue was that the number of collected prices (more precisely, average prices here) for each hotel and day-of-week often did not match the expected number of 34, as hotels sometimes chose not to offer rooms through the OTA, for example, when they had no vacancies, were close to full capacity, or were closed because of the sudden crises [12]. Therefore, only day-of-week samples with 34 consecutive observations were retained for subsequent analysis; all other samples were excluded. The final data set was thus restricted to 1089 samples (instead of 235 × 7 = 1645), each with 34 non-missing observations, for a total of only 37,026 prices (i.e., average prices, here and hereinafter). Each sample was a time series of 34 observations for a specific day-of-week at a specific hotel.
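The two preprocessing rules above, averaging over sub-channels and keeping only complete 34-observation samples, can be sketched as follows. The data layout (a dict keyed by a hypothetical (hotel, weekday) tuple, holding lists with None for missing values) and the function names are assumptions for illustration.

```python
from statistics import mean

EXPECTED_OBS = 34  # observations in a complete day-of-week sample


def average_price(sub_channel_prices):
    """Average the sub-channel prices for one hotel and target date.

    Missing prices are given as None and ignored; returns None when no
    sub-channel returned a price at all.
    """
    found = [p for p in sub_channel_prices if p is not None]
    return mean(found) if found else None


def complete_samples(samples, expected=EXPECTED_OBS):
    """Keep only (hotel, weekday) samples with every observation present."""
    return {
        key: series
        for key, series in samples.items()
        if len(series) == expected and all(v is not None for v in series)
    }
```

Applied to all 235 × 7 = 1645 candidate samples, a filter of this kind is what leaves 1089 complete series of 34 values (1089 × 34 = 37,026 prices).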
To assess how well the model performed on unseen data, each time series in the final data set was divided into two parts. The first 80% of the series was used to train the initial model, and the remaining 20% was iterated over to test it. Specifically, out of the 34 observations in each day-of-week series, the last seven weeks' observations were held back as the test set for hold-out validation to compare the accuracy of the different forecasting methods, and the remaining observations were used to fit the model (27 and 7 observations, roughly 80% and 20%).
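The hold-out scheme with a rolling period of 1 can be sketched as a walk-forward loop over the last seven observations. The text can be read two ways (appending the newly observed value, or appending the latest forecast), so the sketch exposes both through a flag; the function name and the flag are hypothetical. Any callable mapping a history list to a one-step forecast can be plugged in; in the study that would be the LSTM trained on the first 27 points, whereas the usage check below uses persistence only as a stand-in.

```python
def walk_forward(series, forecaster, n_test=7, feed_predictions=False):
    """Rolling one-step-ahead forecasts over the last `n_test` points.

    Each step forecasts one observation from everything before it and
    then rolls forward by one step (rolling period = 1). By default the
    actual observation is appended once known (walk-forward validation);
    with feed_predictions=True the forecast is appended instead, so later
    steps build on earlier predictions.
    """
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    history = list(series[:-n_test])  # training part, e.g. 27 of 34
    actuals = list(series[-n_test:])  # hold-out part, e.g. the last 7
    predictions = []
    for actual in actuals:
        predictions.append(forecaster(history))
        history.append(predictions[-1] if feed_predictions else actual)
    return actuals, predictions
```

For a 34-point series this gives a 27/7 split (about 79%/21%), matching the roughly 80/20 proportions stated above.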

Baseline Models
To investigate the performance of the proposed DLM in tourism demand forecasting, Naïve, SVR [45], and ARIMA models [46] were used as baseline models.
A previous study showed that SVR has some nonlinear forecasting ability when dealing with small data sets [47]. To avoid over-fitting or under-fitting [34,45], an automated grid search, rather than manual setting, was used to find the optimal parameters. For the ARIMA model, the stationarity of the historical data was evaluated with a unit root test, and non-stationary data were differenced. The model orders were then selected by minimizing the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), and the resulting model was close to ARIMA(0, 1, 1). The Naïve model is both the easiest to understand and the least computationally complex [31]. In addition, the Naïve method is the starting point for other, more complex time series methods.
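Because the selected ARIMA specification was close to ARIMA(0, 1, 1), the two simplest baselines can be sketched in plain Python. This is a simplified stand-in, not the study's implementation: θ is chosen by a coarse grid search minimizing the conditional sum of squared one-step errors, whereas statistical packages usually use maximum likelihood, and the unit root test and AIC/BIC order selection are not reproduced. The function names are hypothetical.

```python
def naive_forecast(history):
    """Naïve model: the next value equals the last observed value."""
    return history[-1]


def _css_errors(series, theta):
    """One-step errors of ARIMA(0,1,1): x_t - x_{t-1} = e_t + theta*e_{t-1}.

    The recursion starts from e_0 = 0, i.e. it is conditional on the
    first observation.
    """
    errors = [0.0]
    for t in range(1, len(series)):
        forecast = series[t - 1] + theta * errors[-1]
        errors.append(series[t] - forecast)
    return errors


def fit_theta(series, grid_steps=199):
    """Grid-search theta in (-1, 1) minimizing the conditional SSE."""
    best_theta, best_sse = 0.0, float("inf")
    for k in range(1, grid_steps + 1):
        theta = -1 + 2 * k / (grid_steps + 1)  # -0.99, -0.98, ..., 0.99
        sse = sum(e * e for e in _css_errors(series, theta)[1:])
        if sse < best_sse:
            best_theta, best_sse = theta, sse
    return best_theta


def arima011_forecast(history, theta=None):
    """One-step forecast x_hat_{t+1} = x_t + theta * e_t."""
    if theta is None:
        theta = fit_theta(history)
    return history[-1] + theta * _css_errors(history, theta)[-1]
```

With θ = 0 the ARIMA(0, 1, 1) forecast collapses to the Naïve forecast, and in general its one-step forecast is simple exponential smoothing with weight α = 1 + θ, which illustrates how the Naïve method serves as the starting point for more elaborate models.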
To validate the forecasting performance of the different models, the data sample was split into training and test data, as described above. However, the way the data sample is divided and how the models are trained can influence performance. Therefore, each model was evaluated with a 10-repeated-validation technique, in which the test was run 10 times to ensure robustness, and the final test result was the average performance over the 10 runs. In this way, the models' performance could be evaluated before forecasts were generated for unknown values of the series.

Measures of Forecast Errors
Performance was evaluated mainly with the mean absolute percentage error (MAPE) [35] and the mean arctangent absolute percentage error (MAAPE) [15,48]. MAPE is the predominant measure of forecasting accuracy and the easiest to interpret [49], whereas MAAPE overcomes the division-by-zero problem and has been reported to perform better than MAPE [15]. To provide a complete comparison of the models, four further metrics were used to measure forecast errors, namely, the mean absolute error (MAE), the root mean squared error (RMSE), the root mean square percentage error (RMSPE) [22,23], and the symmetric mean absolute percentage error (SMAPE) [50]. These metrics show how well the forecasting models performed once the actual data became available.
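The paper names the six error measures without writing them out. The sketch below uses common textbook definitions, which is an assumption: SMAPE in particular has several variants, and this one divides by the mean of |A| and |F|. MAPE, RMSPE, and SMAPE are returned in percent, and MAAPE in radians (range 0 to π/2), following its usual definition. The function names are hypothetical.

```python
import math


def _pairs(actual, forecast):
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return list(zip(actual, forecast))


def mae(actual, forecast):
    p = _pairs(actual, forecast)
    return sum(abs(a - f) for a, f in p) / len(p)


def rmse(actual, forecast):
    p = _pairs(actual, forecast)
    return math.sqrt(sum((a - f) ** 2 for a, f in p) / len(p))


def mape(actual, forecast):
    # Percent; undefined when an actual value is zero.
    p = _pairs(actual, forecast)
    return 100 * sum(abs((a - f) / a) for a, f in p) / len(p)


def rmspe(actual, forecast):
    # Percent; undefined when an actual value is zero.
    p = _pairs(actual, forecast)
    return 100 * math.sqrt(sum(((a - f) / a) ** 2 for a, f in p) / len(p))


def smape(actual, forecast):
    # Percent; 2|A - F| / (|A| + |F|) is |A - F| over the mean magnitude.
    p = _pairs(actual, forecast)
    return 100 * sum(2 * abs(a - f) / (abs(a) + abs(f)) for a, f in p) / len(p)


def maape(actual, forecast):
    # atan2(|A - F|, |A|) equals arctan(|(A - F) / A|) for A != 0 and
    # stays bounded (pi/2) when A == 0, avoiding division by zero.
    p = _pairs(actual, forecast)
    return sum(math.atan2(abs(a - f), abs(a)) for a, f in p) / len(p)
```

For room rates, which are always positive, MAPE and MAAPE rank models almost identically for small errors, since arctan(x) ≈ x when x is small; the bounded arctangent matters mainly when a few forecasts are very far off.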

Findings
The four models were applied to each hotel separately because each hotel is unique in terms of general demand, operating cost, and revenue management patterns. For a particular hotel, each day-of-week sample was estimated separately, yielding 1089 estimation results. Figure 1 shows the trend in the average daily room rates of the 235 hotels over the 34-week period. As shown in the figure, average daily room rates fluctuated between HKD 600 and HKD 1300 over the whole series. Differences during the first 16 weeks were relatively large, with pronounced peaks and troughs, revealing that room rates for all hotels were extremely volatile even over short periods. The escalation of social unrest and the continuing deterioration of the epidemic could explain these marked fluctuations in the early stage. The two external crises forced hoteliers to adjust room rates daily according to market conditions to reduce the risk of revenue disruption or even bankruptcy. From the 17th to the 34th week, however, fluctuations stabilized and the pattern across days of the week became consistent. This may have been due to the gradual abatement of social unrest, the government's control of the epidemic, and the stabilization of tourist numbers, so that the gap in average room rates gradually narrowed in the later period. The trend in daily average room rates shows that hotel room rates in Hong Kong were greatly affected by the external environment, suggesting that pricing strategy largely depended on tourist arrivals. The daily average room rates of 3.5-star hotels fluctuated dramatically before the 16th week and less so afterwards. The fluctuation of 4-star hotels is similar to that of 3- and 3.5-star hotels.
However, 4.5-star hotels fluctuated violently and frequently in the early stage, and their daily average room rates varied greatly within a week. Five-star hotels had fluctuations in the 4th, 8th, and 11th weeks. Overall, the fluctuation trend was generally consistent across all categories: fluctuations were pronounced in the early stage and weakened dramatically in the later period. Moreover, all categories showed fluctuations on the Tuesday of the 11th week; this may have been due to the New Year holiday, as the next day was the first day of 2020, and hoteliers raised room prices to increase revenue. Although the average daily room rates of all star categories increased greatly in the 4th week, there were slight differences: 3-, 3.5-, and 4-star hotels fluctuated noticeably on Tuesday, Wednesday, and Thursday of the 4th week, whereas 4.5- and 5-star hotels also fluctuated on Monday, possibly reflecting different types of tourist arrivals.

Preliminary Analysis
The collected daily room rate data were fed into the ARIMA, SVR, Naïve, and LSTM models, and their prediction performance was evaluated with six indicators: RMSE, MAE, SMAPE, RMSPE, MAAPE, and MAPE, with MAPE treated as the most important [35]. For all six measures, smaller values indicate better prediction performance, so the model with the lowest value is considered the best. Tables S1-S4 (see Supplementary Materials) show the performance values of the four prediction models across the 1089 samples.
The results showed that, for every day of the week from Sunday to Saturday, all six error measures of the LSTM model were far lower than those of the three baseline models, indicating that the LSTM model outperformed the traditional models in predicting short-term hotel room rate changes. For example, on Sunday and Monday, the average MAPE fell from 13.80 for SVR to 3.86 for LSTM. This suggests that the LSTM model was better at capturing turning points in the time series.
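As a quick check on the size of this gap, the relative reduction in average MAPE from SVR to LSTM for Sunday and Monday, computed only from the two figures reported above, is:

```latex
\frac{\mathrm{MAPE}_{\mathrm{SVR}} - \mathrm{MAPE}_{\mathrm{LSTM}}}{\mathrm{MAPE}_{\mathrm{SVR}}}
= \frac{13.80 - 3.86}{13.80}
= \frac{9.94}{13.80}
\approx 0.720
```

That is, roughly a 72% reduction in average MAPE relative to SVR.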

To show the performance of LSTM clearly and intuitively, scatter plots of the MAPE values of the 235 hotels for the different models and days of the week appear in Figure A2a-d (see Appendix A). In general, the distributions for LSTM, ARIMA, SVR, and Naïve look similar, although the LSTM distribution is narrower around its mean, whereas those of the other three models are more spread out. The MAPE values of LSTM are highly concentrated and lie mostly near the origin (0 being the lowest possible value), where they are fairly evenly distributed. By comparison, the highest MAPE among the baseline models exceeds 35.00, whereas the highest for LSTM is around 12.00. The contrast is even more drastic for SVR and ARIMA, many of whose values are scattered across the high-error region (MAPE of 20.00 or above). The mean and standard deviation (SD) of the MAPE for the different models are reported in Table 1. On average, both the mean and the SD of the MAPE of LSTM were much lower than those of the three baseline models, mainly because of the many LSTM MAPE values below 4.00. Another plausible reason is that, for ARIMA, SVR, and Naïve, values near the mean are no more frequent than values far from it. The many isolated values lying away from the dominant central cluster may help explain the inflated MAPE of these models. The higher concentration (with fewer isolated values) and smaller MAPE values indicate that the LSTM model had lower forecast error, more accurate predictions, and more robust generalization than its competitors. Overall, our model shows clear advantages in short-term prediction of hotel room rates in an extremely unstable external social environment.
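The comparison above rests on a few distribution summaries per model. A minimal sketch of how they might be computed from per-sample MAPE values follows; the function name is hypothetical, the 20.00 threshold for the high-error region is taken from the text, the use of the sample (rather than population) SD is an assumption, and the numbers in the usage check are toy values, not results from this study.

```python
from statistics import mean, stdev


def summarize_mape(mape_values, high_error=20.0):
    """Summarize one model's per-sample MAPE values.

    Returns the mean, the sample standard deviation, and the share of
    samples falling in the high-error region (MAPE >= high_error).
    """
    return {
        "mean": mean(mape_values),
        "sd": stdev(mape_values),
        "share_high": sum(v >= high_error for v in mape_values) / len(mape_values),
    }
```

Because a single isolated value far from the cluster inflates both the mean and the SD, comparing `share_high` across models helps separate a generally worse model from one with a few extreme failures.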
Because LSTM can detect and learn both long- and short-term dynamics in a time series, it produced the lowest error rates for daily room rates with periodic or non-periodic fluctuations. On the basis of the MAPE and the other five measures, it can be concluded that the LSTM model appears to be more feasible than the traditional ARIMA, SVR, and Naïve models for predicting room rate fluctuations in difficult times. A plausible explanation is that forecasting tools such as ARIMA and SVR are not well suited to ever-changing tourism data [11].

Discussion
The current study is part of ongoing research aimed at developing an intelligent system that uses big data to generate forecasts. It addresses daily room rates for different categories of hotels in Hong Kong using the LSTM model and a rolling strategy. The proposed framework draws on an analogy with intuitive one-step forecasting, which predicts that tomorrow's room rate will equal today's. The results of out-of-sample validation on past room rate data demonstrate the utility of the proposed model and suggest that it is a promising approach to hotel demand modelling. For the approach to be robust, a sufficiently large sample of hotels must be used for estimation, and each day-of-week data set must be examined. The effectiveness of the model is supported by the small values of the MAPE, MAAPE, MAE, RMSE, RMSPE, and SMAPE errors. Because the LSTM forecasts are derived purely from the room rate time series under examination, they are blind to any external information. Integrating other external information (e.g., income, exchange rates, marketing spending, infrastructure, weather, and climate) into the model at an aggregate level can therefore be expected to improve performance [6]. Another advantage is that neither a formal model specification nor a probability distribution needs to be assumed for the data. Therefore, unlike other methods (typically ARIMA), for which forecasts may be inaccurate if stationarity is not carefully checked, a careful examination of whether the time series is stationary can be waived. This can greatly reduce additional computation as well as forecasting errors in practical use.
The study highlights the urgent need for reliable and accurate models for estimating hotel room rates, which can provide hotels with information for planning investments and undertaking strategic marketing, and give customers appropriate booking suggestions, thus maintaining sustainability in the hotel industry. The novelty of the present study lies in its fine-grained modeling, which integrates a rolling procedure with separate day-of-week forecasting. Although there are many existing models for hotel demand forecasting (typically of hotel occupancy rates and tourist arrivals), these models are not necessarily suitable for hotel room rates. On the one hand, the price range differs greatly between luxury and economy hotels; room rates may vary substantially, from "a cushion to an island". Meanwhile, unlike the other two popular variables, hotel room rates have relatively low price elasticity. For example, in extreme cases occupancy rates or tourist arrivals can fall to zero, whereas room rates cannot (excluding closed hotels). On the other hand, each hotel (for occupancy rates) or area (for tourist arrivals) has its own distinctive marketing strategy. Consequently, a successful estimation of the occupancy rate (tourist arrivals) for one hotel (area) may not be applicable to other hotels (areas). For example, on a particular date, occupancy may be high in one hotel but low in another. By contrast, pricing patterns appear to be much more consistent across hotels within the same star category, whereas occupancy rates (tourist arrivals) may exhibit many sharp and significant peaks and troughs [25]. Because occupancy rates (tourist arrivals) are more volatile than room rates, it is harder for a model to generalize well to them.
Hence, room rate forecasting is area-specific (or at least category-specific) instead of hotel-specific, which indicates that our model is more robust and has a wider range of application. Through the above findings, the present study offers several valuable insights for research and practice.

Theoretical Insights
Theoretically, the present study bridges previous studies focusing on the room occupancy rate or tourist arrivals [30,33,51] and studies focusing on hotel revenue management [16]. It contributes novel and meaningful insights into hotel demand forecasting and sustainable development in hospitality and tourism using big data analytics.
First, this paper introduces a way to address complex time-series data by constructing an empirical analysis framework for hotel room rate forecasting in Hong Kong. Although the accuracy achieved in this study is arguably area-specific (Hong Kong), room rate prediction using historical data alone can potentially enrich existing knowledge about hotel demand forecasting. The study was conducted with Hong Kong data, but the resulting estimates provide insights applicable to comparable major destination cities. Second, the present study illustrates the power of the LSTM model, which can achieve high performance with small, oscillatory datasets. Owing to its self-looping (recurrent) structure, LSTM can detect and learn complex dynamics and produce low forecasting errors, which suggests that it can accommodate volatile fluctuations and sudden changes. Third, the current study adds to the growing body of tourism research on hotel demand forecasting by demonstrating how the rolling procedure and the day-of-week strategy contribute to daily room rate forecasts. Because the objective of this study is to develop and validate a model that hoteliers can use in crises as an effective tool for accurate predictions of daily room rates, the rolling procedure and day-of-week strategy, which lessen the impact of violent fluctuations, help achieve this goal. Fourth, the present study extends prior studies [18,39] by treating each day separately according to its day-of-week. This treatment makes it possible to verify the effects of various characteristics of room revenue on room rate. Fifth, the present study provides new insights into the relationship between occupancy rates (or tourist arrivals) and hotel room revenue.
For example, Chen used panel regression tests to examine the response of hotel performance to international tourism development and crisis events in Taiwan and found that the SARS outbreak not only decreased hotel sales revenue but also increased discount rates [52]. Wu applied independent component analysis to separate the dominant factors determining hotel occupancy rates in Hong Kong [53]. The current study links occupancy rates (or tourist arrivals) and hotel room revenue by clarifying that hotel room revenue is not directly influenced by occupancy rates (or tourist arrivals) but is mediated by hotel room price. Sixth, the present study provides a way to apply an "infinite" forecast horizon to the hotel demand forecasting problem. That is, the above findings can be generalized to obtain consistently good forecasts for all future days (for each day-of-week separately), which demonstrates the power of the rolling procedure. In essence, as time elapses, the observed room rates of any day(s) can be substituted for the predicted ones at the corresponding time step(s), which yields more accurate predictions for the steps that follow. Seventh, the major advantages of the room rate data provided by online travel agency (OTA) distribution channels are that they are real-time, high-frequency (daily rather than quarterly or annually), and sensitive to slight changes in consumer behavior. Given these merits, this study highlights the value of massive online data in generating accurate forecasts, which addresses a key problem of traditional models: their heavy reliance on a consistent historical pattern. Last, the findings may open the door to more comprehensive applications that integrate various variables to forecast other important quantities, such as room occupancy rates and tourist arrivals.
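The separation of a daily series into day-of-week sub-series and the rolling substitution of observed for predicted rates described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: `model` is any callable that maps a history to a one-step forecast, a naive last-value rule (`last_value`) stands in for the trained LSTM, and all function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List, Sequence

# Hypothetical forecaster interface: history in, one-step forecast out.
Forecaster = Callable[[Sequence[float]], float]


def split_by_weekday(start: date, rates: Sequence[float]) -> Dict[int, List[float]]:
    """Split a daily series starting on `start` into seven sub-series keyed by weekday (0 = Monday)."""
    subs: Dict[int, List[float]] = {d: [] for d in range(7)}
    for offset, rate in enumerate(rates):
        subs[(start + timedelta(days=offset)).weekday()].append(rate)
    return subs


def recursive_forecast(history: Sequence[float], steps: int, model: Forecaster) -> List[float]:
    """Forecast `steps` periods ahead, feeding each prediction back in as if it were observed."""
    window = list(history)
    preds: List[float] = []
    for _ in range(steps):
        y_hat = model(window)
        preds.append(y_hat)
        window.append(y_hat)
    return preds


def rolling_one_step(history: Sequence[float], actuals: Sequence[float],
                     model: Forecaster) -> List[float]:
    """Rolling-origin one-step forecasts: once a rate is observed, it (not the
    prediction) is appended to the history before the next step is forecast."""
    window = list(history)
    preds: List[float] = []
    for observed in actuals:
        preds.append(model(window))
        window.append(observed)
    return preds


def last_value(history: Sequence[float]) -> float:
    """Naive stand-in for a trained model: repeat the latest rate."""
    return history[-1]
```

In this sketch each weekday's sub-series would be forecast independently, for example `rolling_one_step(train, test, last_value)` on the Monday series. `recursive_forecast` covers days whose rates are not yet known; as real rates arrive, `rolling_one_step` replaces the fed-back predictions with observations, which is the substitution step the text describes.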

Managerial Implications
Discovering an accurate model to predict hotel room rates would benefit consumers and business owners. First, for hotel researchers and practitioners, utilizing big data and analytic techniques when setting room rates with a view to optimal pricing is crucial. Previous studies could, at best, help practitioners obtain tourist arrival figures earlier than official reports, so their applicability was limited. Hotel room rate forecasts would be more useful for practitioners, as hotel managers can use our findings to assess the extent of any negative (or positive) impact a particular adjustment is likely to have on the room rates of their hotels. Appropriate price strategies can therefore be implemented for different target guests to either maximize total revenue or reflect the buildup of reservations on a specific day-of-week. By using new technologies, the current study contributes to the transformation of pricing processes from inventory controls to market developments [14], allowing hotels to improve their response to changes in demand while also supporting alternative tactics to increase revenues and creative strategies to survive contractions. For example, offering a lower rate could be an appropriate operational or even tactical response during this difficult period in Hong Kong, setting aside the distribution costs (transaction costs and processing fees) of online booking through a particular third-party channel. Furthermore, when a future major loss or an unexpected crisis occurs, hotel operators would have a better idea of expected room demand and estimated room pricing. Given this considerable practical value, the industry can benefit from using the proposed model because forecasts can be made at low cost for effective planning.
Accurate estimations of hotel room rate may inform the development of more precise operational plans that minimize costs and maximize returns, thereby yielding potential savings and sustainable development for the hotel industry. Second, given that the practical implications of this study rest on its application to individual hotels, practitioners can confidently use the findings to tailor their room prices to hotel-specific characteristics at any future time, as mentioned above. Forecasts computed any number of days in advance allow revenue managers to plan marketing strategies, whether as a prompt response to changing conditions or as continuous long-run efforts, especially for promoting off-season visits and possible season extensions during low-demand periods. Hoteliers can benefit from studying room rate trends to acquire background information and form a clearer picture when preparing to shift demand during expansion and contraction periods. Third, the study provides an alternative, fast method to estimate a competitor's average room rate before its financial statements are released. This method can help hoteliers make more informed decisions about their own pricing and react quickly and carefully to new market conditions. Fourth, for consumers, knowing when a hotel's room rate is likely to be low may help them reserve hotel rooms online. The proliferation of online distribution channels has made hotel room rates transparent to customers. Coupled with this exposure, accurate room rate forecasting facilitates their bookings with low-rate parity [54]. The desire to book an upscale hotel (e.g., a 5-star hotel) at a low rate (e.g., a 4-star price) to obtain high-quality service does not seem to be a mirage, especially during this difficult time, allowing customers to pay less for better service.
Likewise, making reservations several days or even several weeks before the desired check-in date can also help customers secure an attractive deal.

Conclusions, Limitations and Future Directions
Given the political turmoil and outbreaks of infectious disease, the tourism and hospitality industry in Hong Kong is struggling. Because the prices exhibit irregular oscillations and the time-series size (here, the length of each sequence) is limited, traditional methods fall far short of an ideal solution for accurate forecasting. Considering that daily series of hotel room rates behave similarly when they refer to the same day-of-week, this study represents a pioneering attempt to tackle the hotel room rate forecasting problem by combining a rolling procedure with separate day-of-week time series on the basis of the LSTM model. Exploratory and empirical testing shows that LSTM yields more accurate daily forecasts than its competitors.
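Before an LSTM can be trained on a day-of-week sub-series, the sequence is usually recast as supervised pairs of a fixed look-back window and the next value. The NumPy sketch below shows only this preparation step, not the network itself. The look-back length is an assumption (this section does not state one), min-max scaling fitted on training data is a common practice assumed here rather than confirmed by the text, and both function names are hypothetical. The arrays follow the (samples, time steps, features) layout that common LSTM layers, such as the Keras `LSTM` layer, expect.

```python
from typing import Callable, Sequence, Tuple

import numpy as np


def make_windows(series: Sequence[float], lookback: int) -> Tuple[np.ndarray, np.ndarray]:
    """Turn a 1-D series into LSTM inputs of shape (samples, lookback, 1)
    and next-step targets of shape (samples,)."""
    x = np.asarray(series, dtype=float)
    if lookback < 1 or x.size <= lookback:
        raise ValueError("series must be longer than lookback, and lookback must be >= 1")
    # Row i holds rates i .. i+lookback-1; its target is rate i+lookback.
    X = np.stack([x[i:i + lookback] for i in range(x.size - lookback)])
    y = x[lookback:]
    return X[:, :, np.newaxis], y


def minmax_fit(train: Sequence[float]) -> Callable[[Sequence[float]], np.ndarray]:
    """Return a scaler mapping values to [0, 1] using the training minimum and maximum only."""
    lo, hi = float(np.min(train)), float(np.max(train))
    span = hi - lo if hi > lo else 1.0  # guard against a constant series
    return lambda values: (np.asarray(values, dtype=float) - lo) / span
```

For scale: 238 days split by day-of-week gives 34 observations per sub-series, so a look-back of, say, 4 weeks leaves 30 input/target pairs before any hold-out split. This shortness is why the text stresses that the model must cope with small datasets.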
Our study reveals intriguing findings on room rate forecasting within short-term periods, and initial experiments using this approach proved promising. However, several limitations require consideration, and several areas call for further in-depth analysis, as follows. First, multiple room rates exist for the same room category and the same check-in date, so prices must be collected at intervals within a single day to reflect that day's room rate faithfully. Unfortunately, obtaining these data every day at a consistent level for all hotels was not possible owing to limited manpower. More refined sampling is necessary for a fine-grained examination and validation of the model. Second, we collected hotel room rates seven days in advance of the date of the overnight stay (i.e., the target date). Further research should collect rates 1, 2, 4, 15, 22, 30, 45, 60, and 90 days in advance to improve the model's prediction accuracy [39]. Third, the data set is comparatively small because it covers only 238 days; a longer sample period would allow a more robust analysis, and future research extending the sample size is recommended to enhance the generalizability of the findings. Fourth, in real-world circumstances, hotel room rates must be re-predicted multiple times as the date of stay approaches. Consequently, a single round of data processing may not be sufficient for a complete evaluation. Fifth, we could not separate the effects of the two crises (political turmoil and infectious disease), given that algorithms based only on historical data are usually unable to distinguish non-recurring events. The impact of each crisis on room rate patterns individually deserves further investigation.
Sixth, future studies are encouraged to account for other variables that characterize external information at an aggregate level. Given that pricing, occupancy, and demand variables are key determinants of hotel revenue [55], especially in the short term, future research should focus on the role of occupancy and demand variables in the pricing optimization process. For example, it would be useful to find out how researchers can infer user preferences from the history of online hotel bookings and crisis-related information, and how these preferences, the occupancy rate, and the schedule of future demand can be used to improve the accuracy of hotel room rate forecasting. Extending room rate estimation to incorporate occupancy, demand, and external variables would help explain causality and further test the reliability of the proposed approach. Further research is likewise needed to extend the scope of these variables and identify those with the largest impact on room rate forecasting for different levels of hotel. Seventh, the findings of this study are limited to a single geographic area (Hong Kong), and each area has its own local color and regional characteristics. A natural extension of this study is therefore to repeat the analysis for hotels in other geographical regions; additional analyses can then test the utility of the LSTM for room rate forecasting. Last, the present study used the average nightly price on each sub-channel, and the price for subsequent analysis was calculated as the mean of these averages. This dual average may serve as a good approximation but is not a true nightly fare for a particular room category. More work is needed in future studies to provide a holistic analysis of online hotel room rate forecasting. Nonetheless, these limitations do not undermine the internal validity of the proposed model or its demonstration of the power of big data analytics using deep learning models (DLMs) in hospitality.
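Why the dual average described in the last limitation is only an approximation can be illustrated with a small Python sketch. The channel names and prices are hypothetical, and the sketch assumes that each sub-channel lists one or more prices for the same room category and date. Because the mean of per-channel averages weights every sub-channel equally, regardless of how many prices it lists, it can differ from the mean over all listed prices.

```python
from statistics import fmean
from typing import Dict, List


def dual_average(prices_by_channel: Dict[str, List[float]]) -> float:
    """Mean of per-channel average nightly prices: every sub-channel counts equally."""
    return fmean(fmean(p) for p in prices_by_channel.values() if p)


def pooled_average(prices_by_channel: Dict[str, List[float]]) -> float:
    """Mean over all listed prices: channels listing more prices weigh more."""
    return fmean(x for p in prices_by_channel.values() for x in p)


# Hypothetical nightly prices (HKD) for one room category on one check-in date.
prices = {"channel_a": [1000.0, 1200.0, 1400.0], "channel_b": [900.0]}
print(dual_average(prices))    # 1050.0
print(pooled_average(prices))  # 1125.0
```

Neither figure is the fare a guest would actually pay for a specific room, which is the gap the limitation points to.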

Conflicts of Interest:
The authors declare no conflict of interest. The funding sponsor had no role in the design of the study, in the collection, analyses, interpretation of data, in the writing of the manuscript, and in the decision to publish the results.

Appendix A
Sustainability 2020, 12, x FOR PEER REVIEW 12 of 17
Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1, Table S1: Forecast errors of hotels in Hong Kong (LSTM), Table S2: Forecast errors of hotels in Hong Kong (ARIMA), Table S3: Forecast errors of hotels in Hong Kong (SVR), Table S4: Forecast errors of hotels in Hong Kong (Naïve).