Parking lot occupancy prediction using long short-term memory and statistical methods

In crowded city centers, drivers searching for an available parking space generate extra traffic, and the resulting exhaust gases contribute to air pollution. Directing drivers to a parking spot in an intelligent way is therefore an important task for smart city applications. This task requires predicting the occupancy states of parking lots, which involves appropriate processing of historical parking data. In this work, Long Short-Term Memory (LSTM) and Autoregressive Integrated Moving Average (ARIMA) methods were applied to parking data collected from curbside parking spots in Adana, Turkey to predict future parking lot occupancy. Experiments were performed with prediction horizons of 1 minute, 5 minutes, and 15 minutes on data from five different days. The performances of the methods were compared using root mean squared error (RMSE) and mean absolute error (MAE). With a prediction horizon of 1 minute, LSTM achieved RMSE and MAE values of 0.98 and 0.72, respectively, whereas ARIMA achieved 0.62 and 0.35. On the other hand, LSTM achieved smaller error values for larger prediction horizons. In conclusion, LSTM is more suitable for larger prediction horizons, whereas ARIMA is better at predicting near-future values.


Introduction
As the world population grows, city centers are getting more crowded every day. Finding an available parking spot in such city centers is a challenging and frustrating task that consumes extra time and fuel. In addition, drivers searching for a spot slow down traffic, and this recurring behavior causes further traffic problems. It is reported that drivers searching for parking spots cause more than one third of traffic congestion [1]. The Covid-19 pandemic has added another source to this problem: more people prefer personal vehicles over public transportation to reduce the risk of infection. Hence, more vehicles may be present in traffic than in pre-pandemic times. Consequently, finding a parking spot is becoming critical for the economy, health, and air quality.
To address this parking problem, various technologies such as wireless communication, sensors, and machine learning are used in smart city applications. Within the scope of Internet of Things (IoT) systems, a network of low-cost and low-power sensors may provide a useful infrastructure for a smart parking system. Such a system should offer more than instantaneous information about available parking spots. Because of the dynamic nature of the parking problem, a smart parking system needs to be capable of guiding drivers in an intelligent way. One key feature may be the processing of historical parking data using machine learning methods to make statistical inferences and future predictions. Recent advancements in various technologies enable the widespread use of sensors in smart city applications. One example is wireless communication protocols. The relatively recent "long range wide area network" (LoRaWAN) protocol makes wireless data transmission possible over distances of up to 15 kilometers while consuming a very small amount of power. Investments in other enabling technologies such as cloud computing, big data analytics, and embedded systems trigger the deployment of more smart parking systems in various cities. As a result, the amount of collected parking data increases, and it becomes necessary to analyze the data using advanced methods. Such analysis involves both statistical analysis of past data and prediction of parking space availability for a particular area.
Machine learning methods are suitable for such time-series prediction problems; hence, various machine learning methods are utilized for predicting parking spot states. Among the traditional machine learning methods, Artificial Neural Networks (ANN) [5], Support Vector Machines (SVM) [6], and Regression Trees (RT) [6,7] are widely used for this purpose. On the other hand, Long Short-Term Memory (LSTM), an architecture based on Recurrent Neural Networks (RNN) for time-series prediction, is a commonly used deep learning method [8].
In this work, Autoregressive Integrated Moving Average (ARIMA) and LSTM methods are applied to parking data collected from curbside parking spots in Adana, Turkey. ARIMA is a statistical method in which the data is regressed on its own prior values. LSTM is a deep learning method that can handle long-term dependencies in the data with the help of memory cells in its structure. Both methods were developed for time-series analysis; hence, this study compares their performance on the parking data and underlines the advantages of each method over the other.
The rest of the paper is organized as follows. Section 2 reviews the related literature. Background information about the LSTM and ARIMA models is provided in Section 3. Section 4 contains the details of the dataset and the experiments. The results are given in Section 5, and the paper is concluded in Section 6.

Literature review
The importance of parking lot occupancy prediction has been mentioned in the literature with increasing frequency in recent years. The reason may be the emergence of the IoT concept and its widespread applications. Within the scope of smart cities, the prediction of occupancy rates contributes to various goals such as reducing traffic congestion and environmental pollution. Another useful application is efficient pricing for smart parking systems. A machine learning based pricing system was proposed by Saharan et al., in which on-street occupancy rates are predicted using four well-known methods, namely linear regression, decision tree (DT), neural networks (NN), and random forest (RF) [9].
Typically, machine learning methods are utilized for solving this prediction problem. In a recent study by Awan et al., the prediction performances of multilayer perceptron, k-nearest neighbors (KNN), DT, RF, and an ensemble of these methods were compared, and it was shown that the voting-based ensemble method outperforms the others [10]. Ensemble methods have attracted the attention of other researchers as well. For example, Sampathkumar et al. proposed a majority voting ensemble model that combines the outputs of seven different prediction methods for predicting parking availability in a smart city [11]. Tekouabou et al. further analyzed ensemble methods for parking space availability using four regression methods: bagging, RF, adaptive boosting, and gradient boosting [12]. Instead of considering only the occupancy rate, Provoost et al. defined the state of a parking area using the occupancy rate as well as the in-flux and out-flux of vehicles [13]. They showed that this definition is less sensitive to the prediction horizon. Furthermore, there is a large number of publications on this problem utilizing different variants of machine learning methods. These studies include road segment clustering and Kalman filter based prediction [14], neural networks considering dynamic distribution characteristics [15], and wavelet neural networks [16].
Strategies utilizing deep learning methods for parking availability prediction are also very common [17]. For example, a hybrid approach was proposed by Qiu et al., in which the parameters of an RNN model were optimized using a genetic algorithm [18]. In the work of Yang et al., several traffic data sources such as parking meter transactions, traffic speed, and weather conditions were used as inputs to a Graph-Convolutional Neural Network (GCNN) model [19]. It was shown that GCNN has better prediction performance than the baseline LSTM and lasso models. Ensemble learners have also been used together with deep learning models in the literature. Lv et al. proposed an ensemble learning algorithm with an RNN model [20], whose parameters were optimized using particle swarm optimization. At this point, it is worth mentioning that the number of studies using time-series analysis is limited compared to machine learning and deep learning based methods [21,22].
MJEN MANAS Journal of Engineering, Volume 9 (Issue 2) © 2022 www.journals.manas.edu.kg

Long short-term memory
LSTM is a type of RNN architecture that has been applied to various problems such as language modeling, speech-to-text transcription, machine translation, and time-series prediction. LSTM networks are able to learn long-term dependencies in the data through the repeating memory cells in their structure. A memory cell has three main building blocks: the input, output, and forget gates. These gates process the previous hidden state ($h_{t-1}$), the previous cell state ($c_{t-1}$), and the current input ($x_t$). The previous hidden state and the current input are concatenated to form the current information ($[h_{t-1}, x_t]$). The forget gate decides how much of the previous cell state to keep by passing the current information through a sigmoid activation function and multiplying the result with the previous cell state. The input gate determines which parts of the current information are important enough to be written to the cell, so that the less important parts are left out. The outputs of these two gates are summed to update the cell state. The updated cell state ($c_t$) is passed through a tanh function and then multiplied by a sigmoid output of the current information. This final step is called the output gate because it outputs the updated hidden state ($h_t$) of the cell. A memory cell and the corresponding operations are illustrated in Figure 1.
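The gate operations described above can be sketched in a few lines of plain Python. This is a minimal illustration of one forward step of a standard LSTM cell, not the implementation used in this work; the function name `lstm_cell_step`, the `params` layout, and the gate labels are assumptions introduced for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One forward step of a standard LSTM memory cell.

    x_t, h_prev, c_prev: lists of floats (input, previous hidden state,
    previous cell state). params maps each gate name to (W, b), where every
    row of W has len(h_prev) + len(x_t) weights (hypothetical layout).
    """
    v = h_prev + x_t  # concatenation [h_{t-1}, x_t], the "current information"

    def affine(name):
        W, b = params[name]
        return [sum(w * a for w, a in zip(row, v)) + b_k
                for row, b_k in zip(W, b)]

    f = [sigmoid(z) for z in affine("forget")]       # how much old state to keep
    i = [sigmoid(z) for z in affine("input")]        # how much new content to write
    g = [math.tanh(z) for z in affine("candidate")]  # candidate new content
    o = [sigmoid(z) for z in affine("output")]       # how much state to expose

    # The forget and input contributions are summed to update the cell state.
    c_t = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # Output gate: tanh of the updated cell state scaled by a sigmoid.
    h_t = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_t)]
    return h_t, c_t
```

Calling the function repeatedly, feeding each returned `(h_t, c_t)` pair into the next call, gives the repeating cell structure of an LSTM network.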
The repeating structure is obtained by providing the output states of one cell as inputs to the next, which yields an LSTM network. The weights associated with the gates are learned when sequence data is used to train the network. As a result, the dependencies between the states of the sequence are modelled by the network. An LSTM network having a visible layer with one input and a hidden layer with four neurons is used in this work.

Autoregressive integrated moving average
For a time series $y_t$, the autoregressive (AR) model describes the current value as a linear combination of its past values $y_{t-i}$. Thus, it is a regression of the series on its own lagged values. The notation AR($p$) denotes an AR model of order $p$ and is defined as
$$y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t$$
where $c$ is a constant, $\phi_i$ are the model parameters, and $\varepsilon_t$ is white noise.
On the other hand, rather than past values, the moving average (MA) model uses a linear combination of previous prediction errors and the current error to model the current value of a variable. An MA model of order $q$, MA($q$), can be defined as
$$y_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$
where $\mu$ is the mean of the time series, $\varepsilon_t, \varepsilon_{t-1}, \ldots$ are the white noise error terms, and $\theta_j$ are the model parameters.
The combination of AR and MA models cannot capture non-stationary properties in the data. To overcome this, the "integrated" (I) part of the ARIMA model is introduced. In this step, the difference between each data point and its previous value is calculated, and the data points are replaced with these differences. The differencing may be repeated more than once. The order of the integrated step is generally denoted by $d$ and represents the number of consecutive differencing operations. As a result, an ARIMA($p$, $d$, $q$) model is defined as
$$y'_t = c + \sum_{i=1}^{p} \phi_i y'_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t$$
where $y'_t$ is the differenced version of the time series $y_t$.
For such a model, $p$ is the order of the AR part, $d$ is the number of differencing operations, and $q$ is the order of the MA part.
In this work, an ARIMA(1,1,0) model is used to make predictions on the parking data.
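To make the ARIMA(1,1,0) structure concrete, the sketch below differences the series once, fits a single autoregressive coefficient on the differences, and integrates the forecast back to the original scale. It is only an illustration: the least-squares estimate, the included constant term, and the function names `fit_arima_110` and `forecast_arima_110` are assumptions of this example, and statistical packages typically estimate ARIMA parameters by maximum likelihood instead.

```python
def fit_arima_110(series):
    """Least-squares estimate of c and phi in d_t = c + phi * d_{t-1} + e_t,
    where d_t = y_t - y_{t-1} is the first difference of the series."""
    d = [b - a for a, b in zip(series, series[1:])]  # differencing step (d = 1)
    x, y = d[:-1], d[1:]                             # lagged / current differences
    n = len(x)
    if n < 2:
        raise ValueError("need at least 4 observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx if sxx > 0 else 0.0  # constant differences: no AR signal
    c = my - phi * mx
    return c, phi

def forecast_arima_110(series, c, phi, steps=1):
    """Iterate the AR(1) recursion on the differences and add the forecast
    differences back onto the last observed value."""
    last, diff = series[-1], series[-1] - series[-2]
    out = []
    for _ in range(steps):
        diff = c + phi * diff
        last = last + diff
        out.append(last)
    return out
```

Setting `steps` to 5 or 15 produces multi-step forecasts by feeding each forecast difference back into the recursion, which is one common way to reach a longer horizon.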

The dataset
The dataset used in this work was obtained from Adana Metropolitan Municipality. It contains data from a curbside parking lot in the city center for five working days. The raw data was recorded in a vehicle-based format in which each record contained the entry and exit times of a vehicle. A preprocessing step was performed to convert this data into a time-series format in which each sample denotes the total number of vehicles for one minute of a day. A sample of the parking data after preprocessing is shown in Figure 2. Since the curbside parking system is active only during the daytime, the data belonging to the intervals [00:00,09:00] and [17:00,23:59] are all zero. These hours are ignored in the experiments and only the data for the active hours is used, which corresponds to nearly 500 samples of the time-series data.
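The conversion from vehicle-based records to a per-minute series can be sketched as follows. The record format (pairs of "HH:MM" strings), the function names, and the convention that a vehicle is counted from its entry minute up to but not including its exit minute are assumptions made for illustration; the actual format of the municipality data is not described here.

```python
def to_minutes(hhmm):
    """Convert an "HH:MM" string to minutes since midnight."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)

def occupancy_per_minute(records, start="09:00", end="17:00"):
    """Count parked vehicles for each minute in [start, end).

    records: iterable of (enter, exit) "HH:MM" pairs (hypothetical format).
    """
    t0, t1 = to_minutes(start), to_minutes(end)
    delta = [0] * (t1 - t0 + 1)  # +1 at entry minute, -1 at exit minute
    for enter, leave in records:
        # Clip each stay to the active hours.
        a = min(max(to_minutes(enter), t0), t1)
        b = min(max(to_minutes(leave), t0), t1)
        if b > a:
            delta[a - t0] += 1
            delta[b - t0] -= 1
    counts, running = [], 0
    for change in delta[:-1]:    # running sum gives the occupancy per minute
        running += change
        counts.append(running)
    return counts
```

With the default active hours of 09:00 to 17:00, the function returns 480 samples per day, in line with the roughly 500 samples mentioned above.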

Experiments
For both the ARIMA and LSTM methods, three different experiments were performed to predict the parking lot occupancy 1, 5, and 15 minutes ahead. These experiments were applied separately to each day of data. In other words, the data belonging to each day was processed individually.
The sliding window approach is employed for making one-step-ahead predictions. If $y_t$ denotes the $t$-th sample in the time-series data, $y_{t+1}$ is predicted by training the model on all the samples from $y_1$ to $y_t$. Next, the samples from $y_1$ to $y_{t+1}$ are used to predict $y_{t+2}$. This procedure continues until the end of the time-series data. For this purpose, the first 66% of the one-day data is taken as the initial training set and the next single record in the dataset is used as test data. The remaining 34% of the one-day data is predicted in this manner, and each prediction result is stored for performance analysis.
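The evaluation loop described above can be sketched as follows. The training history grows by one sample at every step, and a simple persistence forecaster stands in for ARIMA or LSTM so that the example stays self-contained. The function names and the `fit_predict` interface are hypothetical, and the handling of horizons longer than one step is an assumption, since the procedure is described here only for the one-step case.

```python
def walk_forward(series, fit_predict, train_fraction=0.66, horizon=1):
    """Evaluate a forecaster with a training history that grows over time.

    fit_predict(history, horizon) must return the forecast for the sample
    `horizon` steps after the end of `history`.
    Returns parallel lists (predictions, actuals).
    """
    start = int(len(series) * train_fraction)  # size of the initial training set
    preds, actuals = [], []
    for t in range(start, len(series) - horizon + 1):
        history = series[:t]                   # all samples observed so far
        preds.append(fit_predict(history, horizon))
        actuals.append(series[t + horizon - 1])
    return preds, actuals

def persistence(history, horizon):
    """Placeholder model: repeat the last observed value."""
    return history[-1]
```

For example, `walk_forward(day_series, persistence, horizon=5)` would return the stored predictions and the matching actual values for a 5-minute horizon, where `day_series` is the per-minute occupancy of one day.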
When training the LSTM model, the five most recent samples (i.e., $y_{t-k}$ where $k \in \mathbb{Z}$ and $0 \le k \le 4$) are used as lookback features, and the training is performed for 30 epochs. These values were determined by gradually increasing the number of lookback features and training epochs. It was observed that larger values of these hyperparameters bring no considerable improvement in the performance of the model. The learning curve showing the training loss over the epochs is given in Figure 3. These hyperparameters do not apply to the ARIMA model, which makes each new prediction from the entire updated history.
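The lookback features can be built by sliding a fixed-length window over the series. The sketch below shows one way to form the input/target pairs with five lookback samples; the function name and the `horizon` parameter are assumptions added for illustration.

```python
def make_lookback_dataset(series, lookback=5, horizon=1):
    """Return (X, y): X[k] holds `lookback` consecutive samples and y[k] is
    the sample `horizon` steps after the last one in X[k]."""
    X, y = [], []
    for end in range(lookback, len(series) - horizon + 1):
        X.append(list(series[end - lookback:end]))  # five most recent samples
        y.append(series[end + horizon - 1])         # value to be predicted
    return X, y
```

Each row of `X` then serves as one input sequence to the network, and the matching entry of `y` is its training target.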

Figure 3. Change of the training loss in LSTM
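The performances of the models are evaluated by computing the root mean squared error (RMSE) and the mean absolute error (MAE), which are given by the following equations:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|$$
where $\hat{y}_i$, $y_i$, and $n$ denote the predicted values, actual values, and the total number of samples, respectively.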
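Both metrics are straightforward to compute from the stored predictions. The following is a minimal sketch in plain Python; the function names are chosen for this example.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

def mae(predicted, actual):
    """Mean absolute error between two equally long sequences."""
    n = len(actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / n
```

Because the errors are squared before averaging, RMSE penalizes large individual errors more heavily than MAE and is never smaller than MAE on the same predictions.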

Results and discussion
As indicated earlier, the dataset contains data for five different days, and predictions are made for three different horizons (1, 5, and 15 minutes ahead). The prediction results for the LSTM and ARIMA methods are given in Tables 1 and 2, respectively.
As can be seen from the tables, the minimum error for both methods is achieved when predicting one minute ahead. This is expected, because predicting future values becomes more difficult as the prediction horizon increases, and the error grows accordingly. According to the results presented in Tables 1 and 2, ARIMA has the lower error in all experiments with a prediction horizon of 1 minute, while the error achieved by LSTM is generally lower when the prediction horizon is 5 minutes or 15 minutes. On average, ARIMA predicts future values more accurately when the prediction horizon is small. For larger prediction horizons, however, the error of ARIMA becomes higher than that of LSTM. This suggests that LSTM may be a better choice than ARIMA in the case of a large prediction horizon.
This information can be useful when developing a practical application for a smart city parking system. If the driver is looking for a parking lot at a nearby location, ARIMA will be the more suitable method for generating suggestions to the user. On the other hand, if the user is looking for a parking lot at a distant location (in other words, it will take some time for the driver to reach the target location), using LSTM will allow the user to find a parking lot with a higher probability.

Conclusion
As city centers get more crowded, it becomes necessary to organize and manage the facilities of cities in an intelligent way. The car parking problem is one of the consequences of high population and brings about several other issues such as extra traffic and increased carbon emissions. The machine learning and data analysis methods available today may allow the occupancy status of a specific parking spot to be predicted, which makes it possible to direct drivers to an available place. In this work, ARIMA and LSTM methods are used to predict parking lot occupancy using parking data collected from Adana, Turkey. For both methods, experiments were performed with three different prediction horizons: 1 minute, 5 minutes, and 15 minutes. According to the results, ARIMA outperforms LSTM when the prediction horizon is small. On the other hand, LSTM performs better for the larger prediction horizons. This may be due to the memory cells in the structure of LSTM, which enable long-term dependencies to be considered when making predictions. Since predictions with horizons longer than one minute are more useful in practical applications, it may be concluded that LSTM is more suitable than ARIMA for such problems. Increasing the amount of training data and including traditional machine learning methods in the experiments are planned as future work.