A fundamental diagram based hybrid framework for traffic flow estimation and prediction by combining a Markovian model with deep learning

Accurate traffic congestion estimation and prediction are critical building blocks for smart trip planning and rerouting decisions in transportation systems. Over the decades, there have been many studies focusing on traffic congestion estimation and prediction with different statistical approaches (e.g., Markov chain) and machine learning models (e.g., clustering, Bayesian networks, and artificial neural networks). However, there is a lack of a unified framework to address the mechanisms of different models and integrate the advantages of different methods through combinations. This paper introduces the FD-Markov-LSTM model, a hybrid interpretable approach that combines the fundamental diagram (FD), Markov chain, and long short-term memory (LSTM). The aim is to estimate and predict traffic states by integrating statistical data in both congested and uncongested scenarios. The FD-Markov-LSTM model leverages the FD to identify hierarchical traffic states and utilizes the Markov process to capture the probabilistic transitions between these states. We employ the LSTM model to further capture the residual time series produced by the Markov chain model (assuming a memoryless property) to enhance the estimation and prediction performance. The proposed model's accuracy in estimating and predicting traffic flow is evaluated using empirical data from three case studies conducted in Beijing and Los Angeles. The results highlight a significant improvement in accuracy compared to classical benchmark models such as the Markov model, ARIMA model, k-Nearest Neighbor model, Random Forest model, and LSTM. Specifically, the FD-Markov-LSTM model achieves reductions of over 39% in mean absolute error, 35% in root mean squared error, and 7.4% in mean absolute percentage error. These results demonstrate that the FD-Markov-LSTM model outperforms the benchmark models, enabling more precise predictions of traffic flow.


Introduction
Short-term traffic forecasting plays a crucial role in Intelligent Transportation Systems (ITS) (Zhang et al., 2023). Accurate and timely estimation and prediction of traffic conditions help ITS adjust the relevant state promptly, leading to faster and more precise alleviation of traffic congestion (Comert et al., 2021). Traffic state estimation and prediction can be categorized into two main approaches: analytical models, such as fundamental diagrams (FDs), and data-driven methods, including machine learning techniques. Traffic variables, including flow, speed, and occupancy, are the key measurements of network performance, and their relationships are usually described by FD models. FD models can analytically divide traffic states into uncongested and congested regimes; however, they cannot capture traffic dynamics over time. The increasing availability of transport big data and advances in artificial intelligence have led to a growing focus on data-driven methods, particularly deep learning approaches, for traffic congestion estimation and prediction (Gao et al., 2021). Data-driven methods usually need large amounts of historical data as training inputs, and they usually lack interpretability with respect to model mechanisms and traffic flow phenomena. Both analytical models and data-driven methods therefore have advantages and disadvantages, and it is natural to combine their strengths in a hybrid model (Hou et al., 2022). Accordingly, this paper develops a hybrid stepwise framework to estimate and predict traffic dynamics with analytical state transition probabilities.
The proposed hybrid stepwise model combines a Markov chain model with a deep learning model for accurate traffic flow estimation and prediction. The method involves two main steps. First, using the FD, we identify uncongested and congested states for a single lane on the freeway. With the Markovian model, we then determine traffic flow states and calculate the state transition probability matrix, capturing regular patterns and explicitly addressing temporal correlation errors between estimated and observed flows. In the second step, the long short-term memory (LSTM) model captures additional traffic flow features within the residual series that are not entirely captured by the Markov chain model. Integrating deep learning with the Markov model thus allows important traffic flow characteristics to be extracted from the residuals of the Markov prediction. Finally, the predicted flows from the Markovian and LSTM models are combined to produce the final prediction.

Literature review on traffic state identification
Accurate identification of traffic flow states within a transportation network is crucial for improving traffic flow prediction and enabling effective traffic management (Pan et al., 2023). It serves as a fundamental component for implementing informed traffic management strategies, and its efficacy significantly affects the precision of short-term traffic estimation and prediction.
Many previous studies on traffic state identification used fuzzy mathematical methods. Ban et al. (2007) used percentile speeds based on multiday data to identify and calibrate traffic flow at bottlenecks. Huang et al. (2011) introduced a real-time traffic state identification method based on fuzzy c-means clustering. Lu et al. (2015) presented an algorithm that leverages the fuzzy c-means model to improve the applicability of traffic state clustering for identifying and predicting real-time traffic conditions. Moreover, Esfahani et al. (2018) and Bao (2019) improved the accuracy of traffic flow state identification by introducing a modified fuzzy c-means clustering strategy.
Researchers have also employed various data-driven intelligent approaches to detect changes in traffic states. Yang and Qiao (1998) used a self-organizing neural network technique, categorizing traffic states into seven distinct groups. The identification of traffic flow states has also been explored extensively through the lens of traffic flow theory. Long et al. (2008) introduced a congestion propagation model that uses the cell transmission model to identify congestion bottlenecks in urban networks. Kerner (2013) proposed the three-phase traffic theory, which categorizes traffic flow into three distinct states (free flow, synchronized flow, and wide-moving jam) and provides a comprehensive framework for understanding traffic dynamics. Building upon this, Lin (2019) developed an approach for traffic state identification that combines the Macroscopic Fundamental Diagram (MFD) with the support vector machine algorithm, allowing traffic states to be identified across the entire network.
Accurately determining actual traffic conditions, especially during periods of congestion, is a significant challenge because multiple interacting factors contribute to the overall traffic state (Chen et al., 2022). Previous studies have rarely used a comprehensive set of classification indicators, including critical parameters such as flow, speed, and density obtained from the FD, to identify distinct traffic states. The FD establishes the mathematical relationship between traffic flow, speed, and density. Once calibrated with observed data, it assigns each sample to one of two classes, congested or uncongested. Dividing traffic states based on the FD therefore both conforms to the characteristics of traffic flow and yields a more accurate division, especially between congested and uncongested states. In this study, we fill this gap by introducing a classification indicator system based on the calibration of the FD.

Literature review on traffic state estimation and prediction with Markov model
By examining the time series data of traffic flow, a strong relationship between the current and previous traffic flow states becomes apparent, which can be represented using conditional probabilities. The dynamic nature of traffic flow makes the Markov chain an appropriate model for describing this relationship. Consequently, several studies (Geroliminis and Skabardonis, 2005) have successfully used the Markov chain for traffic flow estimation and prediction. Shin and Sunwoo (2018) proposed a Markov-based model incorporating speed constraints to predict vehicle speed, which performed particularly well in cornering sections. Evans et al. (2001) employed the Markov model for breakdown occurrences, demonstrating that higher vehicle arrival rates increase the likelihood of breakdown. Kidando et al. (2018) developed a Markov-based structured model with random variations to describe congestion, offering a valuable resource for detecting different traffic conditions. Markov-based models have also been widely used in travel time prediction (Kharoufeh and Gautam, 2004; Ramezani and Geroliminis, 2011, 2012; Tang et al., 2020). These experiments demonstrated the ability of the Markov chain approach to accurately estimate the distribution of dependent link travel times and to capture correlations among link travel times.
Furthermore, there is a growing trend to integrate Markov models with other analytical or machine learning approaches to enhance traffic state prediction. Noroozi and Hellinga (2014) introduced a method that combines a short-memory time series model with a Markov model incorporating time-varying covariates, demonstrating its effectiveness in basic time series modeling. Li et al. (2020) devised a comprehensive Markov-based model that incorporates various external factors to predict traffic network conditions. By considering these factors, their model captures the dynamic changes in traffic conditions and outperforms single-model methods. Qi and Ishak (2014) and Sun et al. (2020) used the Markov model to predict congestion patterns during peak periods.
The current literature still leaves a gap in analyzing bottlenecks with the Markov model and in capturing the state transition dynamics of traffic flows during congested and uncongested periods in the urban transport system.

Literature review on estimation and prediction with machine learning
With the rapid progress of information processing technology, machine learning techniques have become widely used in traffic state estimation and prediction. These methods bring notable benefits in modeling and optimizing nonlinear systems. Notable examples include LSTM networks, as used by Ma et al. (2015), the Diffusion Convolutional Recurrent Neural Network developed by Li et al. (2017), the Spatial-Temporal Transformer Networks proposed by Xu et al. (2020), and the Spatio-Temporal Graph Mixformer by Lablack and Shen (2023). These methods have significantly contributed to the advancement of traffic state estimation and prediction. Antoniou et al. (2013) proposed a two-step dynamic framework for local traffic state prediction, utilizing available information. Fusco et al. (2016a, 2016b) conducted a comparative analysis of models such as Bayesian Networks for short-term traffic estimation and prediction. In the context of urban transportation, Wang et al. (2019) presented a path-based deep learning framework for predicting city-wide traffic speeds, which provided logical and interpretable outcomes. Ma et al. (2023) introduced a multi-task learning approach for forecasting both traffic flow and speed, leveraging shared transferable features and task-specific correlations. Data-driven methodologies have also expanded to other domains, including demand estimation and prediction in network systems (Kim et al., 2020), traffic congestion forecasting for connected vehicles (Elfar et al., 2018), traffic speed interval prediction (Song et al., 2022), and fusion techniques based on machine learning for precise predictions (Ma et al., 2020).
Despite the multitude of studies discussed above, many have overlooked the information contained in time series residuals during the estimation and prediction of traffic conditions. Because residuals may still contain important traffic features, we use the LSTM model on the residual series to further improve the accuracy of traffic estimation and prediction under congested conditions.

Literature review on estimation and prediction with hybrid method
In the context of congestion bottlenecks or adverse weather, accurately estimating driver behavior poses challenges for traffic state estimation. Neither model-driven nor data-driven approaches can handle the complexity of such behavior on their own. To overcome these limitations, researchers have proposed combined approaches that integrate the strengths of both, so that each compensates for the weaknesses of the other.
Such a hybrid method uses traffic flow models to approximate traffic states and observed traffic data to discover unknown model parameters, which can improve both the accuracy and the interpretability of traditional methods. Nevertheless, numerous open questions remain in the development of hybrid methods for traffic state estimation.
Numerous studies have investigated hybrid approaches for traffic state estimation and prediction, aiming to combine the strengths of model-driven and data-driven techniques. For instance, Krakovna and Doshi-Velez (2016) combined LSTM and a hidden Markov model to enhance interpretability and prediction accuracy. Raissi (2018) and Raissi and Karniadakis (2018) introduced deep learning methods that incorporate partial differential equation (PDE) features for traffic prediction. Belezamo et al. (2019) combined the kinematic wave model with the Markov chain model to estimate traffic state variables and develop an explainable framework. Huang and Agarwal (2020) introduced a physics-informed method that enhances traffic state estimation accuracy, particularly in real-time scenarios with limited data. Kim et al. (2020) proposed an interpretable model that combines linear regression models with a neural network incorporating LSTM layers to predict taxi ride demand. These studies highlight the ongoing exploration and development of hybrid methods for more effective and accurate traffic state estimation and prediction.
Hybrid approaches have also been used to enhance model accuracy in other domains. For instance, Nguyen-Le et al. (2020) integrated LSTM and Markov models to forecast crack propagation in engineering. Similarly, Ma et al. (2021) introduced an LSTM-Markov model that used Markov models to mitigate the prediction errors of LSTM models, improving accuracy in predicting COVID-19. Integrating data-driven and model-driven approaches through hybrid methods therefore holds promise for traffic flow prediction, and ongoing research continues to refine these techniques across applications. Specifically, combining traffic flow models, statistical methods, and machine learning models can improve the accuracy of the model while also enhancing its interpretability, so that the resulting estimations and predictions are both precise and readily understandable.
Short-term traffic flow prediction methods have distinct strengths and limitations. We compare the following commonly used short-term traffic flow forecasting methods: time series methods, Kalman filtering methods, regression methods, machine learning methods, deep learning methods, and hybrid models. A comprehensive analysis of the advantages and disadvantages of these models is given in Table 1. In summary, selecting an appropriate prediction method depends on data availability, model complexity, computational resources, and the prediction time frame. Regardless of the chosen method, careful attention to data quality, preprocessing, and parameter tuning remains essential for accurate and reliable short-term traffic flow predictions.

Objectives and contributions
Although numerous efforts have been made on different aspects of the traffic flow estimation and prediction problem, some critical challenges still need further investigation. First, for traffic state identification, most existing methods apply clustering, which lacks physical interpretability and cannot incorporate a traffic flow model into the identification method. The FD is recognized as a highly effective tool for capturing the relationship among traffic flow, speed, and density. It not only yields essential traffic flow parameters but also enables precise identification of traffic states. Second, many current methods for traffic flow estimation and prediction use deep learning only as a final stage in data processing and fail to exploit the valuable information present in the residuals, even though traffic features can still be captured within the residual sequence (Ma et al., 2020). This paper introduces a comprehensive hybrid framework, termed FD-Markov-LSTM, which combines the fundamental diagram, the Markov model, and the LSTM method. The primary goal of our research is to systematically address the interpretability and accuracy of prediction at bottlenecks. The key contributions of this paper are summarized as follows:
(1) By embedding the FD model into traffic state identification, the calibrated key parameters (e.g., critical speed and critical density) can accurately capture the change in traffic flow during the transition between different traffic states.
(2) We use the Markov-based model to determine the state-to-state transition probabilities, enabling precise prediction of future traffic states.
(3) A hybrid interpretable model framework is constructed to combine Markov and LSTM, which can further capture the data residual. The FD-Markov-LSTM model demonstrates significantly smaller prediction errors compared to benchmark models.
(4) The proposed FD-Markov-LSTM model enhances the accuracy of short- and medium-term trend prediction for bottleneck traffic flow.
The remainder of this paper is structured as follows. Section 2 introduces an interpretable framework that combines FD, Markovian, and LSTM models for traffic state estimation and prediction. Section 3 describes the experiments and the performance evaluation of the proposed method. Section 4 analyzes and discusses the experimental results. Finally, Section 5 summarizes this work and provides insights into potential future research directions.

Methodology
Enhancing the precision of traffic state estimation and prediction is an effective way to alleviate urban traffic congestion within traffic management systems. To this end, this paper introduces a stepwise hybrid model (FD-Markov-LSTM) that combines the strengths of the FD, Markov, and LSTM models. By leveraging the advantages of both traditional traffic flow prediction methods and deep learning techniques, this model enables accurate forecasting of future traffic flow states. The detailed flowchart of the proposed FD-Markov-LSTM model is shown in Fig. 1. Each stage is explained below.
(i) In stage 1, we use the calibrated FD to identify the traffic flow state, separating uncongested from congested conditions by the critical speed $v_c$. Improving the accuracy of the …
(iii) In stage 3, we use a deep learning method to capture the traffic flow features that remain in the residuals. We build an LSTM model whose inputs are the traffic state and the residual between the Markov prediction and the observed data, which further improves the prediction of future traffic flow.

Traffic state identification based on fundamental diagram
Numerous researchers (Ma et al., 2020) have emphasized the significance of defining and categorizing traffic states across various levels, ranging from free-flow conditions to congested situations. Common approaches include k-means and fuzzy c-means clustering. However, these clustering methods require the number of clusters to be predefined. Additionally, the k-means method assumes spherical clusters and an equal distribution of observations among groups, which makes it less suitable for analyzing time-series data, and the fuzzy c-means method is not suitable for data with uneven density or large numerical fluctuations. In a bottleneck or congested segment for a single lane, the time series data fluctuate significantly between congested and uncongested states, as shown in Fig. 2. Partitioning the traffic state with these traditional clustering methods would therefore produce significant inaccuracies at the traffic bottleneck. Consequently, we categorize the traffic state using the FD, as explained below.
Fig. 3 illustrates the limitations of the FD and time series plots in capturing the dynamic nature of traffic relationships over time. While the FD provides insights into traffic states and variable relationships, it does not fully capture the evolving nature of traffic. Conversely, time series plots effectively depict the temporal evolution of traffic but do not comprehensively capture traffic states through a combination of flow variables. In this study, we aim to capture the key parameters (capacity $c$, critical speed $v_c$, critical density $k_c$, and free-flow speed $v_f$), which can be calibrated from the FD with observation data.
We can then use the critical speed or critical density to divide the traffic state into uncongested and congested conditions in the FD (as shown in Fig. 3). This FD-based method is more interpretable in terms of traffic flow characteristics, and the number of clusters does not need to be estimated in advance. In this paper, we therefore calibrate the FD to derive the critical parameters and define the traffic states at the bottleneck accordingly. The calibration collects traffic flow data from observation equipment and uses the least squares method to calibrate the speed-density function of the S3 model (Cheng et al., 2021, 2024). The S3 function is shown in Eq. (1).
The primary aim is to minimize the disparity between the observed speed data $v_i$ and the estimated speed value $v$ calculated using the model. The unknown parameters in the objective function are the free-flow speed, the critical density, and the parameter $m$, as represented in Eq. (2).
where $v_f$, $k_c$, and $m$ are the parameters for calibration, $N$ is the total number of observations, and $v_i$ and $k_i$ are the observed speed and density data. By obtaining the critical density $k_c$ from Eq. (2) and substituting it into Eq. (1), we can determine the critical speed $v_c$. This critical speed serves to differentiate between congested and uncongested traffic states.
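As an illustration of the calibration step, the sketch below assumes that the S3 speed-density function takes the form $v(k) = v_f / [1 + (k/k_c)^m]^{2/m}$ reported by Cheng et al. (2021), so that substituting $k = k_c$ gives $v_c = v_f / 2^{2/m}$. The least-squares objective sums the squared differences between observed and modeled speeds. The function names, the synthetic observations, and the coarse grid search (a stand-in for a proper nonlinear least-squares solver) are assumptions made for illustration and are not the calibration code used in this study.

```python
import math


def s3_speed(k, v_f, k_c, m):
    """Speed given by the assumed S3 speed-density function at density k."""
    return v_f / (1.0 + (k / k_c) ** m) ** (2.0 / m)


def critical_speed(v_f, m):
    """Speed at k = k_c, obtained by substituting k_c into the S3 function."""
    return v_f / 2.0 ** (2.0 / m)


def sse(params, densities, speeds):
    """Least-squares objective: sum of squared speed errors."""
    v_f, k_c, m = params
    return sum((v - s3_speed(k, v_f, k_c, m)) ** 2
               for k, v in zip(densities, speeds))


def calibrate_grid(densities, speeds, v_f_grid, k_c_grid, m_grid):
    """Pick the (v_f, k_c, m) candidate with the smallest objective value."""
    candidates = ((v_f, k_c, m)
                  for v_f in v_f_grid for k_c in k_c_grid for m in m_grid)
    return min(candidates, key=lambda p: sse(p, densities, speeds))


# Synthetic observations generated from known parameters (made-up values).
true_params = (100.0, 30.0, 4.0)          # v_f [km/h], k_c [veh/km], m
densities = [5.0, 15.0, 25.0, 35.0, 50.0, 80.0]
speeds = [s3_speed(k, *true_params) for k in densities]

best = calibrate_grid(densities, speeds,
                      v_f_grid=[90.0, 100.0, 110.0],
                      k_c_grid=[25.0, 30.0, 35.0],
                      m_grid=[3.0, 4.0, 5.0])
v_c = critical_speed(best[0], best[2])
print(best, round(v_c, 2))
```

With real detector data the grid search would normally be replaced by a gradient-based least-squares routine, but the objective and the way $v_c$ follows from the calibrated parameters stay the same.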

State transition probability matrix in the Markovian model
The Markovian model assumes that the current state of the system depends solely on the preceding state, following the principle of memorylessness. In this model, the transition probability matrix describes the stochastic process of traffic flow evolution.
The transition probability from state $i$ to state $j$ is denoted $p_{ij}$. The total number of transitions from state $S_i$ to state $S_j$ is denoted $m_{ij}$, while $m_i$ is the total number of transitions from state $S_i$ to all states, including $S_i$ itself. These definitions lead to Eq. (3).
The transition probability can be defined by Eq. (4).
In Eq. (4), $p_{ij}$ is the transition probability from state $S_i$ to state $S_j$. For all $i, j \in \{1, 2, \ldots, n\}$, $p_{ij} \ge 0$ and $\sum_{j=1}^{n} p_{ij} = 1$ for $i = 1, 2, \ldots, n$. The details of the notations used in this paper can be found in Table 9 in the Appendix.
$P$ in Eq. (4) can be partitioned into four blocks for an FD in terms of the uncongested and congested periods, as shown in Eq. (5).
where $P_{UU}$ is the transition matrix from an uncongested state to another uncongested state, $P_{UC}$ is the transition matrix from an uncongested state to a congested state, $P_{CU}$ is the transition matrix from a congested state to an uncongested state, and $P_{CC}$ is the transition matrix from a congested state to another congested state.
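The estimation and partitioning described above can be sketched as follows. This is a minimal illustration that assumes the count-based estimate $p_{ij} = m_{ij} / m_i$ implied by the definitions of $m_{ij}$ and $m_i$, and assumes that states are indexed so that all uncongested states come before the congested ones. The function names and the toy state sequence are hypothetical.

```python
def transition_matrix(states, n):
    """Estimate p_ij = m_ij / m_i from a sequence of state indices in 0..n-1."""
    counts = [[0] * n for _ in range(n)]
    for i, j in zip(states, states[1:]):
        counts[i][j] += 1                      # m_ij: transitions i -> j
    P = []
    for row in counts:
        total = sum(row)                       # m_i: all transitions out of i
        # A state with no observed outgoing transition keeps a zero row.
        P.append([c / total if total else 0.0 for c in row])
    return P


def partition(P, n_uncongested):
    """Split P into UU, UC, CU, CC blocks (uncongested states indexed first)."""
    u = n_uncongested
    return {
        "UU": [row[:u] for row in P[:u]],
        "UC": [row[u:] for row in P[:u]],
        "CU": [row[:u] for row in P[u:]],
        "CC": [row[u:] for row in P[u:]],
    }


# Toy sequence: states 0 and 1 are uncongested, states 2 and 3 are congested.
states = [0, 0, 1, 2, 2, 3, 2, 1, 0, 0]
P = transition_matrix(states, n=4)
blocks = partition(P, n_uncongested=2)
for row in P:
    print([round(p, 3) for p in row])
```

Every row of `P` with at least one observed transition sums to one, which matches the constraints on $p_{ij}$ stated above.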
The first step in constructing the Markovian model is to define the states systematically. In this study, each state $S_i$ within the Markov chain model is characterized by a tuple comprising a range of flow values, a range of density values, and a speed level, which together describe the traffic state. The FD describes the congestion level through the critical speed, which distinguishes uncongested from congested conditions: a traffic flow condition satisfying $\{(q, k, v) \mid v > v_c\}$ is classified as uncongested, and one satisfying $\{(q, k, v) \mid v \le v_c\}$ is classified as congested. Based on this identification, we further divide the flow and density data into intervals to obtain the different traffic states.
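A minimal sketch of this state definition is given below. It applies the rule $v > v_c$ for the uncongested regime and then bins the flow into intervals; the interval edges, units, and function name are hypothetical, and density could be binned in the same way.

```python
def classify_state(q, v, v_c, flow_edges):
    """Return (regime, flow_bin) for one observation.

    regime is "U" when v > v_c and "C" when v <= v_c. flow_bin counts how many
    interval edges lie at or below q, i.e. the index of the flow interval.
    """
    regime = "U" if v > v_c else "C"
    flow_bin = sum(1 for edge in flow_edges if q >= edge)
    return regime, flow_bin


# Hypothetical flow interval edges in veh/h and a critical speed of 60 km/h.
edges = [500.0, 1000.0, 1500.0]
print(classify_state(q=1200.0, v=80.0, v_c=60.0, flow_edges=edges))
print(classify_state(q=900.0, v=60.0, v_c=60.0, flow_edges=edges))
```

The second call shows the boundary case: a speed exactly equal to $v_c$ falls in the congested regime, in line with the definition above.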

Markovian-based model for capturing regular patterns
In this research, $X(t)$ is the observed link volume and $S_i$ is the state of $X(t)$ after classification. Given the transition probability matrix and the state of the initial data, we can predict the state of the subsequent time-series data with Eq. (6), which indicates the state $S_k$ to which the current state $S_i$ is most likely to transfer.
where $M(t)$ is the flow estimated by the Markov model at time $t$, $q_k(t)$ is the average flow of all observations classified into state $k$ at time $t$, and $p_{ik}(t)$ is the transition probability from state $i$ to state $k$ at time $t$.
The residual between the observed flow and the estimated flow is defined as a new variable $r(t)$:

$$r(t) = X(t) - M(t), \quad t \in \{1, 2, \ldots, T\} \tag{8}$$

Fig. 4 shows a time series of flow collected from a loop detector on the Los Angeles I-405 corridor, with a periodic pattern as shown in Fig. 4 (a). Fig. 4 (b) depicts the residual time series of flow produced by the Markovian model with an interpretable state transition matrix. It indicates that a significant repeating pattern remains within the residual series. This motivates the use of the LSTM model to capture the additional traffic features in the residual series that the Markov chain model does not capture comprehensively.
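The Markov step and the residual in Eq. (8) can be sketched as follows. Because the exact forms of Eqs. (6) and (7) are not reproduced here, the sketch makes two assumptions: the most likely next state is taken as the one with the largest $p_{ik}$, and the Markov flow estimate is taken as the probability-weighted average $\sum_k p_{ik} q_k$. The matrix, the state means, and the observations are made-up values.

```python
def most_likely_next(P, i):
    """Index k of the state with the largest transition probability from i."""
    row = P[i]
    return max(range(len(row)), key=lambda k: row[k])


def expected_flow(P, i, mean_flow):
    """Assumed Markov estimate: sum over k of p_ik * q_k."""
    return sum(p * q for p, q in zip(P[i], mean_flow))


def residuals(observed, estimated):
    """r(t) = X(t) - M(t) for each time step, as in Eq. (8)."""
    return [x - m for x, m in zip(observed, estimated)]


P = [[0.7, 0.3, 0.0],
     [0.2, 0.5, 0.3],
     [0.0, 0.4, 0.6]]
mean_flow = [400.0, 1200.0, 1800.0]    # q_k: average flow of each state
prev_states = [0, 1, 1, 2]             # state occupied before each prediction
X = [650.0, 1250.0, 1300.0, 1500.0]    # observed flows

M = [expected_flow(P, i, mean_flow) for i in prev_states]
r = residuals(X, M)
print([round(m, 1) for m in M], [round(e, 1) for e in r])
```

The residual series `r` is what the LSTM stage would take as input.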

LSTM-based model for capturing the traffic features within the residual time series
LSTM is extensively employed in traffic flow prediction because of its unique capabilities and its consistent outperformance of traditional time series models in various benchmarks (Chen et al., 2019; Lu et al., 2022). LSTM was chosen for several reasons. First, its aptitude for handling sequential data and capturing both short- and long-term temporal dynamics matches the inherently sequential nature of traffic flow data. Second, it can capture complex and nonlinear traffic patterns that traditional linear models struggle with, owing to the ability of the recurrent neural network (RNN) architecture to represent intricate data relationships. Third, LSTM can integrate diverse data sources such as traffic sensors, weather data, and historical patterns, significantly enhancing prediction accuracy. Moreover, LSTM's automatic feature learning reduces the need for extensive manual feature engineering, which proves valuable when dealing with intricate and time-varying relationships between factors such as weather …
Traffic flow data represents a non-stationary random sequence, yet it displays discernible regularities and clear trends within a continuous time series. The proposed approach selects LSTM networks, a specific type of recurrent neural network, for short-term traffic prediction. LSTM is well suited to this task because it can (i) handle time series data (Abdoos and Bazzan, 2021) and (ii) capture the pronounced fluctuations and nonlinearity present in the data (Ma et al., 2015). This paper therefore uses the LSTM model to capture the flow characteristics within the residual time series.
We use the residual derived from the Markovian model as the input variable of the LSTM model. The model input is the residual series $r(t)$ and the output sequence is $L(t)$, the residual time series predicted by the LSTM, where $T$ is the maximum time index. The structure of the LSTM is shown in Fig. 5. The LSTM model employed in this paper is composed of one input layer, three hidden layers, and one output layer. It is worth noting that, when working with time series data, increasing the number of hidden layers can substantially increase computation time and memory usage (Hu and Chen, 2018; Hua et al., 2019). Determining the parameters of the LSTM models is a crucial aspect of the modeling process, ensuring fair comparisons and optimal performance (Pan et al., 2022). While some parameters are adjusted automatically during training, hyperparameters must be set manually. These hyperparameters include the number of input layers, the hidden layer configuration (the number of hidden layers and the nodes within them), the output layer specification, activation functions, loss functions, optimization algorithms, learning rates, and iteration counts, among others. In our study, we adopted LSTM models with distinct network structures to explore their impact on predictive performance. Specifically, we conducted experiments with LSTM models employing one, two, and three hidden layers, all trained on the same dataset, and selected the model that minimized the error metrics obtained in testing and evaluation.
The first step in the LSTM model is the forget gate, denoted as f(t). It applies the sigmoid function δ to attenuate irrelevant noise, so that the values of f(t) are confined to the range from 0 to 1.
The second step involves the input gate, denoted as i(t), and the external input gate, denoted as g(t). These gates determine how new information is incorporated into the LSTM model.
The third step updates the previous cell state, denoted as s(t). This step ensures that the LSTM model retains relevant information from previous time steps while discarding unnecessary information.
The last step is the output gate, denoted as o(t).
Here δ denotes the sigmoid function; W_ff, W_rf, W_gg, W_rg, W_oo, and W_ro are weights; b_f, b_g, and b_o are biases; h_t and r_t are variables; ⊗ denotes the Hadamard product; and t ∈ {1, 2, ..., T}. The notation used in this paper is detailed in Table 10 in the Appendix.
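As a rough illustration of the four steps above, the following sketch advances a standard LSTM cell by one time step in plain Python. It follows the common textbook gate formulation rather than the exact parameterization of this paper: the parameter layout (the dictionary keys and the `(w_r, W_h, b)` tuples), the tanh activation for the candidate values, and the helper `zero_params` are assumptions introduced only for illustration.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(r_t, h_prev, s_prev, params):
    """Advance a textbook LSTM cell by one time step.

    r_t    : scalar input (here, a Markovian residual).
    h_prev : previous hidden state, list of length n.
    s_prev : previous cell state, list of length n.
    params : dict mapping "forget", "input", "candidate", "output" to a
             tuple (w_r, W_h, b): input weights (length n), recurrent
             weights (n x n), and biases (length n).
             This layout is hypothetical, not the paper's notation.
    """
    n = len(h_prev)

    def pre_activation(name):
        w_r, W_h, b = params[name]
        return [
            w_r[j] * r_t + sum(W_h[j][k] * h_prev[k] for k in range(n)) + b[j]
            for j in range(n)
        ]

    # Step 1: forget gate, values in (0, 1).
    f = [sigmoid(a) for a in pre_activation("forget")]
    # Step 2: input gate and candidate values (tanh is an assumption).
    i = [sigmoid(a) for a in pre_activation("input")]
    c = [math.tanh(a) for a in pre_activation("candidate")]
    # Step 3: elementwise (Hadamard) update of the cell state.
    s = [f[j] * s_prev[j] + i[j] * c[j] for j in range(n)]
    # Step 4: output gate and new hidden state.
    o = [sigmoid(a) for a in pre_activation("output")]
    h = [o[j] * math.tanh(s[j]) for j in range(n)]
    return h, s


def zero_params(n):
    """All-zero parameters, useful only for checking the mechanics."""
    zeros = ([0.0] * n, [[0.0] * n for _ in range(n)], [0.0] * n)
    return {name: zeros for name in ("forget", "input", "candidate", "output")}
```

With all-zero parameters every gate evaluates to 0.5, so the cell state is simply halved at each step; in practice the weights and biases would be learned from the residual series.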

Integrated modeling framework: FD-Markov-LSTM
In this study, we present a hybrid model that integrates three distinct components: the FD, Markov chain, and LSTM models. By combining traffic-theoretic statistical models with machine learning approaches, the components are expected to complement one another and capture complex patterns in traffic flows, thereby improving the accuracy of estimation and prediction. The simplified structure of the proposed FD-Markov-LSTM model is shown in Fig. 6. The comprehensive model can be written as Eq. (16),
where X(t) represents the observed flow time series, M(t) is the prediction of the Markov model, and L(t) is the residual time series predicted by the LSTM.
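The definitions above suggest an additive decomposition, in which the LSTM models whatever the Markov model leaves unexplained. The sketch below assumes that form, X(t) ≈ M(t) + L(t); the function names and the sample numbers are hypothetical and are not taken from the case studies.

```python
def markov_residuals(observed, markov_pred):
    """Residual series r(t) = X(t) - M(t), used as the LSTM input."""
    if len(observed) != len(markov_pred):
        raise ValueError("series must have the same length")
    return [x - m for x, m in zip(observed, markov_pred)]


def combine_hybrid(markov_pred, lstm_residual_pred):
    """Hybrid estimate M(t) + L(t), assuming an additive combination."""
    if len(markov_pred) != len(lstm_residual_pred):
        raise ValueError("series must have the same length")
    return [m + l for m, l in zip(markov_pred, lstm_residual_pred)]


# Hypothetical flows in veh/hr/ln, for illustration only.
observed = [100.0, 120.0]
markov_pred = [90.0, 125.0]
residuals = markov_residuals(observed, markov_pred)   # [10.0, -5.0]
hybrid = combine_hybrid(markov_pred, [8.0, -4.0])     # [98.0, 121.0]
```

Under this assumption, a perfect residual prediction would reproduce the observed flow exactly, so the accuracy of the hybrid estimate is bounded by how well the LSTM tracks the residual series.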

Data description
To demonstrate the subtle but essential differences between the various traffic flow prediction models, we use three datasets, shown in Fig. 7 and described in Table 2. The three selected corridors are among the busiest freeways in California and the city of Beijing.
Traffic data for the busiest freeways in California is sourced from the Performance Measurement System (PEMS), a widely utilized repository for transportation research and analysis. PEMS gathers real-time traffic data, encompassing traffic flow, speed, and occupancy, from sensors and detectors on major California highways. This publicly available data can be accessed via the California Department of Transportation's website (https://pems.dot.ca.gov/) or data portal. Meanwhile, in Beijing, traffic data is collected through strategically placed loop detectors managed by local transportation authorities. These detectors capture real-time information on flow, speed, and occupancy, which is then aggregated and processed. Accessing this Beijing data typically necessitates coordination with relevant transportation agencies due to its proprietary sensor networks and data management systems (Fig. 8).
The raw data comprised aggregated flows, occupancies, and spot speeds for each mainline lane. To transform the occupancy data into density data, the conversion equation Eq. (17) introduced by May (1990) was employed,
where occ is occupancy, l is the average length of the vehicle, and d is the average area of influence for a detector. In this paper, l + d is 25 feet.
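The conversion can be sketched as follows. Because Eq. (17) is not reproduced here, the sketch assumes the commonly quoted form of the occupancy-density relation, k = 5280 · occ / (l + d), with occupancy expressed as a fraction, lengths in feet, and density in vehicles per mile per lane. The function name and the sample occupancy are hypothetical.

```python
FEET_PER_MILE = 5280.0


def occupancy_to_density(occ, effective_length_ft=25.0):
    """Convert detector occupancy to density in veh/mile/lane.

    occ                 : occupancy as a fraction in [0, 1].
    effective_length_ft : l + d, the average vehicle length plus the
                          detector's zone of influence (25 ft in this paper).
    The functional form is an assumption, see the text above.
    """
    if not 0.0 <= occ <= 1.0:
        raise ValueError("occupancy must be a fraction between 0 and 1")
    if effective_length_ft <= 0.0:
        raise ValueError("effective length must be positive")
    return FEET_PER_MILE * occ / effective_length_ft


# Hypothetical reading: 10 % occupancy.
density = occupancy_to_density(0.10)  # about 21.1 veh/mile/lane
```

Under this form, full occupancy corresponds to 5280 / 25 ≈ 211 veh/mile/lane, which serves as the implied jam density for a 25 ft effective length.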

Evaluation metrics for prediction results
To assess the performance of different traffic flow prediction models, it is essential to establish evaluation criteria that gauge prediction quality and guide method improvement. This evaluation primarily involves comparing predicted traffic outcomes with real-time traffic conditions. The three metrics used are the mean absolute error (MAE), the root mean squared error (RMSE), and the mean absolute percentage error (MAPE).
MAE is computed as the mean of the absolute discrepancies between the actual values (ground truth) and the predicted values. In mathematical notation, it is represented as:
\[ \mathrm{MAE} = \frac{1}{M} \sum_{m=1}^{M} \left| \mathrm{measured}_m - \mathrm{estimated}_m \right| \]
where measured_m is the measured traffic flow value for observation m, estimated_m is the estimated traffic flow value for observation m, and M is the total number of flow counts.
RMSE is the square root of the mean of the squared differences between the predicted values and the actual values. This can be expressed mathematically as:
\[ \mathrm{RMSE} = \sqrt{ \frac{1}{M} \sum_{m=1}^{M} \left( \mathrm{measured}_m - \mathrm{estimated}_m \right)^2 } \]
MAPE is commonly employed in traffic forecasting to mitigate the issue of a small number of instances with low traffic flow having a disproportionate impact on error measurement. It takes into account not only the absolute error between the predicted and actual values but also the relative error as a ratio to the actual value. The definition of MAPE is expressed as:
\[ \mathrm{MAPE} = \frac{1}{M} \sum_{m=1}^{M} \frac{ \left| \mathrm{measured}_m - \mathrm{estimated}_m \right| }{ \mathrm{measured}_m } \times 100\% \]
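The three metrics can be computed with a few lines of plain Python. This is a minimal sketch using the standard definitions; the function names and the sample values are hypothetical, and MAPE is assumed to be reported in percent and to be evaluated only on nonzero measured flows.

```python
import math


def mae(measured, estimated):
    """Mean absolute error."""
    return sum(abs(x - y) for x, y in zip(measured, estimated)) / len(measured)


def rmse(measured, estimated):
    """Root mean squared error."""
    squared = sum((x - y) ** 2 for x, y in zip(measured, estimated))
    return math.sqrt(squared / len(measured))


def mape(measured, estimated):
    """Mean absolute percentage error, in percent.

    Assumes every measured value is nonzero.
    """
    ratios = sum(abs(x - y) / abs(x) for x, y in zip(measured, estimated))
    return 100.0 * ratios / len(measured)


# Hypothetical flows in veh/hr/ln, for illustration only.
measured = [100.0, 200.0]
estimated = [110.0, 180.0]
print(mae(measured, estimated))   # absolute errors 10 and 20, mean 15
print(rmse(measured, estimated))  # sqrt((100 + 400) / 2), about 15.81
print(mape(measured, estimated))  # relative errors 10 % and 10 %, mean about 10
```

The example shows how the metrics differ: the larger absolute error on the second observation raises RMSE above MAE, while both observations contribute equally to MAPE because their relative errors match.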

Traffic state identification
The FD establishes a relationship between flow, speed, and density, which captures the characteristics of traffic flow and road segments. We use the critical speed v_c to distinguish congested from uncongested conditions. Section 2.1 provides a detailed description of how the least squares method is applied to calibrate the FD against the observed data.
Defining the states systematically is a crucial step in establishing the Markovian model. In this study, each state in the Markov chain model is defined by a tuple of ranges for flow, density, and speed. This approach captures variations in traffic conditions by combining specific flow and density intervals with speed levels. First, we divide traffic flow into uncongested and congested states using the critical speed calibrated from the FD. Then, we divide traffic flow and density into equal intervals, which yields 20 traffic states in the Markovian model. Fig. 9 plots the fundamental diagram with the states defined in the Markovian model. Some states do not appear in the observed data, such as states 8-12 and 18-20 in the Beijing West Third Ring dataset. These states correspond to flows of more than 1400 veh/hr/ln or less than 400 veh/hr/ln, which is consistent with the FD. In addition, this representation allows transportation planners and analysts to examine state transitions from uncongested to congested conditions.
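The two-stage state definition can be sketched as follows. This is a simplified illustration, not the paper's exact scheme: it splits observations by the critical speed and then bins only the flow into equal intervals, whereas the paper's state tuples also include density ranges. The state numbering (uncongested states first), the flow range, and the sample values are assumptions.

```python
def identify_state(flow, speed, critical_speed, flow_max, bins_per_regime=10):
    """Map one observation to a state index in 1..2*bins_per_regime.

    flow           : flow in veh/hr/ln.
    speed          : speed in the same unit as critical_speed.
    critical_speed : v_c calibrated from the FD.
    flow_max       : upper end of the binned flow range (hypothetical).
    States 1..bins_per_regime are taken as uncongested and the remaining
    ones as congested; this numbering is an assumption.
    """
    if flow < 0:
        raise ValueError("flow must be non-negative")
    width = flow_max / bins_per_regime
    # Equal-width flow bins; flows at or above flow_max go to the last bin.
    flow_bin = min(int(flow // width), bins_per_regime - 1)
    congested = speed < critical_speed
    return flow_bin + 1 + (bins_per_regime if congested else 0)


# Hypothetical values: v_c = 50, flow range 0-2000 veh/hr/ln.
free_flow_state = identify_state(450.0, 65.0, 50.0, 2000.0)   # state 3
congested_state = identify_state(450.0, 30.0, 50.0, 2000.0)   # state 13
```

The same flow value maps to two different states depending on the regime, which is what lets the Markov chain distinguish the free-flow branch of the FD from the congested branch.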

State transition probability matrices
The transition probability matrix is partitioned into four blocks: P_UU, P_UC, P_CU, and P_CC. The transition probability matrices derived from the Markovian model give the probability of traffic flow transitioning to each possible state. Taking the Los Angeles I-405 corridor as an example, Table 4 shows the transition probability matrix P_UU for transitions from one uncongested state to another uncongested state. It shows that an uncongested state tends to remain in its current state with high probability. In uncongested conditions, the high values on the diagonal (highlighted in orange) indicate that free-flow traffic generally persists for some time before transitioning. Apart from remaining in the current state, the most likely transitions are to an adjacent state; for example, state 10 is most likely to evolve to state 9, with a probability of 100 %.
The transition matrix P_UC, for transitions from an uncongested state to a congested state, is shown in Table 5. Most of the probability values are 0 %, indicating that transitions from an uncongested to a congested state occur infrequently. This outcome can be attributed to the breakdown phenomenon, which is characterized by an instantaneous and abrupt reduction in traffic flow or speed.

Table 4
The 5-min state transition probability matrix in Los Angeles I-405: Transition matrix P_UU.

Table 5
The 5-min state transition probability in Los Angeles I-405: Transition matrix P_UC.

Table 6
The 5-min state transition probability in Los Angeles I-405: Transition matrix P_CU.

Table 7
The 5-min state transition probability in Los Angeles I-405: Transition matrix P_CC.
Similarly, the transition probability matrix P_CU has small transition probability values, as shown in Table 6. In P_CC, each element represents the probability of transitioning from one congested state to another. Table 7 displays the probabilities of traffic states either remaining in the same congested state or moving to a different one. In the tables, the orange entries mark the highest probabilities of remaining within the same state, while the green entries mark the highest probabilities of transitioning from one state to another. Tables 11-18 in the Appendix provide the state transition probabilities for the remaining two cases.
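The block structure can be illustrated with a short sketch that estimates a transition matrix from an observed state sequence by counting consecutive pairs and then splits it into the four blocks. The function names, the toy sequence, and the assumption that uncongested states are numbered before congested ones are hypothetical.

```python
def transition_matrix(states, n_states):
    """Row-normalized transition frequencies from a 1-based state sequence.

    Rows for states that never occur as an origin are left as zeros.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for origin, dest in zip(states, states[1:]):
        counts[origin - 1][dest - 1] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def partition_blocks(matrix, n_uncongested):
    """Split a transition matrix into the blocks P_UU, P_UC, P_CU, P_CC.

    Assumes states 1..n_uncongested are uncongested and the rest congested.
    """
    u = n_uncongested
    return {
        "UU": [row[:u] for row in matrix[:u]],
        "UC": [row[u:] for row in matrix[:u]],
        "CU": [row[:u] for row in matrix[u:]],
        "CC": [row[u:] for row in matrix[u:]],
    }


# Toy sequence with 4 states, of which states 1-2 are uncongested.
sequence = [1, 1, 2, 1, 3, 3, 1]
blocks = partition_blocks(transition_matrix(sequence, 4), 2)
```

In this toy sequence state 1 moves to states 1, 2, and 3 once each, so the first row of the UU block is [1/3, 1/3] and the first row of the UC block is [1/3, 0]. State 4 never occurs, so its row stays at zero, mirroring the unobserved states noted above.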

Performance comparison of the proposed model with other benchmark models
In addition to the hybrid interpretable model proposed in this study, several benchmark models were tested in the experiments. The benchmark algorithms consisted of two traditional time-series forecasting models, namely the Markov chain model and the ARIMA model, as well as several machine learning approaches: k-Nearest Neighbor (KNN), Random Forest (RF), and LSTM. The performance comparison of the models is shown in Table 8. Three case studies were conducted using empirical data collected in Beijing and Los Angeles to evaluate the accuracy of traffic flow estimation and prediction achieved by the proposed FD-Markov-LSTM model. The results demonstrate higher prediction accuracy than the classical benchmark models. Specifically, the MAE of the FD-Markov-LSTM model is reduced by an average of 39 % compared to the benchmark models, while the RMSE is reduced by an average of 35 %. Furthermore, the proposed FD-Markov-LSTM model shows an average reduction of 7.4 % in MAPE. These improvements across all performance indicators indicate that the FD-Markov-LSTM model outperforms the other models, resulting in more accurate traffic flow predictions, particularly in bottleneck scenarios (Table 18).
This study employs a hybrid approach that integrates multiple models to enhance prediction accuracy and adaptability to diverse traffic flow conditions, particularly in complex congestion scenarios. The hybrid model outperforms both the basic Markov model and the LSTM model. Additionally, hybrid models mitigate the risk of overfitting by combining multiple methods: each model may overfit in different respects, so the combination improves generalization. However, building a hybrid model in multiple stages increases model complexity and computational demands, requiring more computational resources and training time. Consequently, this model may not be suitable for large-scale traffic management but proves advantageous for precise traffic control and congestion management. It is important to note that this study evaluated prediction accuracy only, omitting considerations of model training speed and time.

Table 11
The 2-min state transition probability in Beijing West Third Ring: Transition matrix P_UU.

Table 12
The 2-min state transition probability in Beijing West Third Ring: Transition matrix P_UC.

Table 13
The 2-min state transition probability in Beijing West Third Ring: Transition matrix P_CU.

Table 14
The 2-min state transition probability in Beijing West Third Ring: Transition matrix P_CC.

Performance comparison of congested and uncongested period
A further objective of this paper is to assess the accuracy of the proposed method across different traffic flow conditions. To accomplish this, we partitioned the data from a single day into congested and uncongested periods based on the prevailing congestion conditions, and then conducted separate tests on the study cases. Following the traffic state identification in Section 4.1, we use the calibrated critical speed v_c in Table 3 to distinguish the uncongested and congested periods. The prediction performance under the different traffic conditions is shown in Table 9. The proposed method yields small errors in both periods, indicating that the proposed hybrid interpretable model can reasonably capture the congested and uncongested patterns.

Table 15
The 10-min state transition probability in Beijing North Third Ring: Transition matrix P_UU.

Table 16
The 10-min state transition probability in Beijing North Third Ring: Transition matrix P_UC.

Table 17
The 10-min state transition probability in Beijing North Third Ring: Transition matrix P_CU.

Table 18
The 10-min state transition probability in Beijing North Third Ring: Transition matrix P_CC.
Furthermore, we compared model performance in different time periods, including all-day, congested, and uncongested periods, as depicted in Fig. 10. The superiority of the proposed model is evident, particularly during congested periods, demonstrating its suitability for accurate traffic flow estimation and prediction in the presence of congestion, despite the increased complexity associated with congested traffic conditions.
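A per-period evaluation of this kind can be sketched as follows. The example assumes that an observation is labeled congested when its speed falls below the critical speed v_c, and it reports only MAE for brevity; the function names and all numeric values are hypothetical.

```python
def mae(measured, estimated):
    """Mean absolute error."""
    return sum(abs(x - y) for x, y in zip(measured, estimated)) / len(measured)


def mae_by_period(speeds, measured, estimated, critical_speed):
    """MAE for all-day, congested, and uncongested observations.

    An observation counts as congested when speed < critical_speed
    (an assumption). Periods without observations are omitted.
    """
    groups = {"congested": ([], []), "uncongested": ([], [])}
    for v, x, y in zip(speeds, measured, estimated):
        key = "congested" if v < critical_speed else "uncongested"
        groups[key][0].append(x)
        groups[key][1].append(y)
    result = {"all_day": mae(measured, estimated)}
    for key, (m, e) in groups.items():
        if m:
            result[key] = mae(m, e)
    return result


# Hypothetical speeds and flows, with v_c = 50.
speeds = [60.0, 20.0, 65.0, 25.0]
measured = [100.0, 200.0, 120.0, 180.0]
estimated = [110.0, 190.0, 115.0, 200.0]
errors = mae_by_period(speeds, measured, estimated, 50.0)
# errors == {"all_day": 11.25, "congested": 15.0, "uncongested": 7.5}
```

Reporting the congested and uncongested errors separately makes it visible when an all-day average hides a weaker fit in one regime, as in this toy example.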
Our study presents several key advantages in the realm of traffic flow estimation and prediction. Firstly, we introduce the FD-Markov-LSTM hybrid model, combining the strengths of the fundamental diagram, Markov chain, and LSTM to enhance accuracy and interpretability. This approach addresses limitations inherent in conventional models and consistently outperforms classical benchmark models in traffic flow prediction. However, it is important to note some limitations. Our model's complexity may pose implementation and resource challenges compared to simpler, traditional statistical approaches. Moreover, while the model performs well in short- and medium-term trend prediction, further research is required to strengthen its long-term forecasting capabilities, particularly for extended time intervals. Additionally, the method relies on historical data, and its performance could be influenced by specific dataset characteristics in the case studies.

Conclusions
In summary, this study introduces the FD-Markov-LSTM model, a hybrid framework for traffic flow estimation and prediction. The model combines the strengths of FD, Markov chain, and LSTM techniques to improve the interpretability and accuracy of traffic forecasting. The key contributions of this research are improved traffic state identification through FD integration, state-to-state transition prediction using the Markov-based model, an interpretable hybrid framework combining Markov and LSTM, and superior short- and medium-term trend prediction. Empirical findings consistently validate the FD-Markov-LSTM model's superior performance compared to traditional benchmark models.
The study's results highlight the predictive capabilities of the FD-Markov-LSTM model compared to classical benchmark models. Compared with established benchmarks such as Markov, ARIMA, k-Nearest Neighbor, Random Forest, and LSTM, the FD-Markov-LSTM model exhibits substantial improvement, with a 39 % reduction in MAE, a decrease of over 35 % in RMSE, and a 7.4 % drop in MAPE. The model bridges the gap between analytical and data-driven approaches, and its practical utility is underscored by the substantial reduction in prediction errors. These results position it as a valuable choice for real-world applications within intelligent transportation systems.
Future research can focus on real-world validation, long-term forecasting refinements, and the incorporation of multi-source data fusion techniques, promising further improvements in dynamic traffic flow estimation and prediction for intelligent transportation systems. It can be developed in the following aspects: (1) To ensure the robustness and practicality of the proposed method, it is essential to validate its performance in real-world scenarios that involve various sources of nonrecurring delays, including adverse weather conditions, incidents, and work zones. (2) Further refinement of the proposed model is required to enhance its accuracy in making long-term forecasts, specifically for time intervals such as 30 min or 1 h. (3) To achieve more accurate estimation and prediction of dynamic traffic flow, it is recommended to employ multi-source data fusion techniques.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1. Illustration of the proposed stepwise framework for traffic flow prediction.

Fig. 4. Time series of data. (a) Time series of original flow measurements; (b) time series of Markovian residuals.

Fig. 7. Areas of the selected corridors used in the datasets.

Fig. 8. Uncongested and congested state identification in the three datasets.

Fig. 9. Different traffic flow state identification in the Markovian model.
Y.A.Pan et al.

Fig. 10. Performance comparison of the dataset during the different periods.

Table 1
Comparison of advantages and disadvantages of different short-time traffic flow forecasting methods.

Table 2
Description of empirical datasets.

Table 3
Calibrated parameters in the three datasets.

Table 8
Comparisons of prediction performance with different models in datasets.

Table 9
Comparisons of prediction performance with the uncongested and congested period.

Table 10
Nomenclature used in this paper.