ST-CRMF: Compensated Residual Matrix Factorization with Spatial-Temporal Regularization for Graph-Based Time Series Forecasting

Li, Jinlong; Wu, Pan; Li, Ruonan; Pian, Yuzhuang; Huang, Zilin; Xu, Lunhui; Li, Xiaochen

doi:10.3390/s22155877

¹ School of Civil Engineering and Transportation, South China University of Technology, Guangzhou 510641, China

² College of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China

³ Department of Civil and Environmental Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA

* Author to whom correspondence should be addressed.

Sensors 2022, 22(15), 5877; https://doi.org/10.3390/s22155877

Submission received: 30 June 2022 / Revised: 28 July 2022 / Accepted: 3 August 2022 / Published: 5 August 2022

(This article belongs to the Section Intelligent Sensors)

Abstract

Despite extensive efforts, accurate traffic time series forecasting remains challenging. Taking the non-linear nature of traffic into account in depth, we propose a novel ST-CRMF model, consisting of Compensated Residual Matrix Factorization with Spatial-Temporal regularization, for graph-based traffic time series forecasting. Our model inherits the benefits of MF and regularizer optimization and further carries out compensatory modeling of the spatial-temporal correlations through a well-designed bi-directional residual structure. Of particular note, MF modeling and the subsequent residual learning share and synchronize iterative updates as equal training parameters, which considerably alleviates the error propagation problem associated with rolling forecasting. Besides, most existing prediction models neglect the hard-to-avoid issue of missing traffic data; the ST-CRMF model can repair possible missing values while fulfilling the forecasting tasks. After testing the effects of key parameters on model performance, numerous experimental results confirm that our ST-CRMF model can efficiently capture comprehensive spatial-temporal dependencies and significantly outperforms state-of-the-art models in short- to long-term (5-/15-/30-/60-min) traffic forecasting tasks on the open Seattle-Loop and METR-LA traffic datasets.

Keywords: traffic time series forecasting; matrix factorization; residual learning; missing value

1. Introduction

Accompanied by the evolution of the urban intelligent transportation system (ITS), modern societies have benefited from a variety of human-centered traffic regulations. As one of the crucial steps for ITS, the search for reliable and cost-effective traffic time series forecasting has been under way for decades. The resulting algorithms are vital for building ITS that enable cities to become smarter. In general, recorded data from multiple sensors form multivariate time series and can be interlinked [1]. Usually, traffic time series forecasting involves the forward-looking estimation of traffic variables (e.g., volume, speed, or occupancy), which can then be exploited for system efficiency upgrades and policy making. By the length of the time horizon, traffic forecasting can be broadly categorized into short-term ($\leq$ 30 min) and mid- to long-term ($>$ 30 min) [2]. Technically, traffic forecasting falls within the scope of multivariate time series analysis, but it faces challenges due to the high complexity and dynamics of ITS [3]. Specifically, how to comprehensively capture the inherent spatial-temporal correlations of traffic time series becomes the primary challenge [2,4,5].

Against that background, a great deal of research effort has been devoted to this issue. Broadly speaking, existing models for traffic forecasting can be classified as classical models and deep learning (DL) models, the former of which can be further subdivided into statistical models and traditional machine learning (ML) models [6]. In detail, many earlier works simply concatenated a series of data from different locations into a vector and then applied the vector autoregressive (VAR) and autoregressive integrated moving average (ARIMA) models and their variants for time series forecasting [7,8]. However, these statistical models face significant challenges from the ML models because they fail to take the spatial or temporal correlation into account [2]. As representatives of data-driven methods, the traditional ML models have also suffered from various difficulties in traffic forecasting. For example, the classical support vector regression (SVR) model is very sensitive to the choice of kernel function and hyper-parameters. Moreover, these models either impose ideal stationarity assumptions or incur a heavy computational burden when dealing with large-scale traffic datasets [6]. Once confronted with overly complex historical data, they fail to account for the highly non-linear traffic properties.

Different from those previous attempts, the DL models have emerged as the predominant traffic prediction models owing to their thorough consideration of the factors affecting traffic conditions [9]. The paradigm of these models manifests itself as the stacking of basic learnable blocks, or layers, to create the specified deep architectures [10], and much progress has been made in related works. Among the modeling efforts, the recurrent neural network (RNN) class of methods is well known for processing sequential data, and its variants, the long short-term memory network (LSTM) and the simplified gated recurrent network (GRN), have proven to be effective structures for resolving the vanishing gradients of RNN when capturing the temporal dependencies of historical data [6,11]. Studies on real-world issues have verified the applicability and advantages of the LSTM or GRN in traffic forecasting tasks [10,12]. Unlike RNN-based temporal modeling, the graph-related models have shown a powerful impact on spatial feature learning of road networks [13]. Typically, the joint spatial-temporal graph convolutional network (STGCN) can efficiently raise the accuracy of traffic forecasting by modeling the multi-scale traffic networks [2]. Moreover, sequence-to-sequence algorithms as well as other technical tricks (e.g., attention and residual mechanisms) can also greatly augment the predictive accuracy [14,15]. Although DL models perform well, their drawbacks are that the diversified spatial-temporal correlations have not been fully leveraged from traffic networks and that excessive model training may worsen the overall prediction performance [2,16]. In contrast, matrix factorization (MF) is well suited to traffic forecasting because it can rapidly capture the global spatial-temporal properties through a low-rank approximation. Furthermore, the regularization terms designed for the factor matrices can further strengthen MF models [17,18].

Additionally, most existing studies work on historical time series under the assumption of no missing data entries [3]. However, when collecting traffic data from ITS, data loss is inevitable due to sensor or software failures [5,19], which makes it difficult to make accurate forecasts [20].

To solve the above problems, we have developed a spatial-temporal MF framework with compensatory residual modeling, which works together with GRN and graph Laplacian regularizers for traffic time series forecasting. Overall, the main contributions of this article are summarized as follows:

We propose a novel forecasting model entitled ST-CRMF for in-depth extraction of the non-linear spatial-temporal dependencies within historical time series and their residuals, where the spatial-temporal regularizers and bi-directional residual structure greatly augment model performance.
Apart from accurately predicting future traffic sequences, the ST-CRMF can deal with incomplete traffic datasets and proves its predictive effectiveness in various missing-data cases.
We empirically demonstrate the superiority of the ST-CRMF model on real-world traffic datasets (i.e., Seattle-Loop and METR-LA). The experimental results confirm that the proposed ST-CRMF model achieves satisfactory results for several pre-defined prediction lengths.

The rest of this article is organized as follows: Section 2 briefly introduces the existing literature; in Section 3, we illustrate the spatial-temporal matrix factorization and model architecture in detail; in Section 4, we describe the effectiveness of the ST-CRMF and its superiority over baseline methods; finally, the concluding remarks are given in Section 5.

2. Related Work

Recently, traffic time series forecasting has attracted an increasing amount of research attention on account of the incentives of ITS [9]. Inspired by this, we first briefly review the forecasting of traffic time series under commonly encountered constraints, followed by their applications.

2.1. Traditional Multivariate Time Series Forecasting

Traditional multivariate time series forecasting mainly relies on a few mathematical and statistical modeling approaches [9]. In the early stages, Ahmed et al. [21] first applied the ARIMA model to traffic data, and Williams et al. [22] later used the seasonal ARIMA model to predict short-term traffic flow. Furthermore, VAR extends the AR model to capture linear dependencies among multiple time series. For example, Yu et al. [23] introduced an AR regularizer for temporal modeling. Chen et al. [17] used a matrix VAR for temporal regularization, which captured complex temporal dependencies because it utilized more parameters than AR. Nevertheless, the model complexity of these statistical methods grows quadratically with the number of variables [1]. Because of their capability of capturing non-linear relationships and dealing with high-dimensional time series, ML models have gradually outperformed the traditional models [6,9,11]. Since the resurgence of ML, models including SVR, K-nearest neighbor (KNN), and random forest regression (RFR) have been widely employed for traffic time series forecasting thanks to their learning capacity. Zhang et al. [24] built a hybrid forecasting framework based on SVR, RFR, and an enhanced genetic algorithm (GA) to improve the accuracy of traffic forecasting. Li et al. [25] developed an adapted SVR model for short-term traffic flow prediction. Besides, hybrid models [8] combining statistical methods and ML have been successfully applied in traffic forecasting. Although these models exhibit certain advantages in terms of accuracy and interpretability of model parameters, they struggle with large-scale traffic datasets and the complex influences contained therein.

2.2. Deep Learning for Traffic Time Series Forecasting

Subsequently, DL has prevailed in correlated time series forecasting because of its superior ability to model complex functions and exploit underlying features without tricky feature engineering [26,27]. This capacity empowers traffic prediction to serve a vital role in ITS (e.g., driving decisions/services [28]). Previously, Lv et al. [29] confirmed that DL is effective for traffic prediction. Technically, DL models comprise deep architectures formed by stacking several basic learnable blocks or layers, and a series of related application paradigms have been developed and deployed. Ma et al. [30] first proposed a capsule network incorporating a nested LSTM for traffic state forecasting. Gu et al. [31] built the two-layer fusion DL (FDL) model combining LSTM and GRN to predict the traffic speeds of lanes. However, the potential of the RNN-based approaches is far from being adequately exploited. Beyond sequential modeling, studies increasingly prefer to model historical data from both temporal and spatial perspectives. For example, Zhao et al. [7] developed a temporal GCN (T-GCN) for capturing spatial and temporal dependencies. Fang et al. [32] proposed a meta-learning-based multi-source spatial-temporal network (Meta-MSNet) to model spatiotemporal characteristics of multi-source traffic data (e.g., weather, holiday notification). Graph WaveNet [4] integrated GCN with dilated causal convolution to reduce the computational cost of dealing with long sequences. Moreover, Li et al. [33] built the dynamic graph convolutional recurrent network (DGCRN), which generated dynamic adjacency matrices and extracted features from node attributes. Bai et al. [3] established the adaptive GCRN (AGCRN) for automatically capturing node-specific spatial-temporal dependencies. Besides, models such as the graph multi-attention network (GMAN) [34] combined the attention mechanism with GCN to obtain more dynamic dependencies of traffic data.

2.3. Spatial-Temporal Modeling Using Incomplete Dataset

Most forecasting studies either delete incomplete data blocks or simply repair them using interpolation models, but the removal of redundant data or unsatisfactory filling may induce over-fitting [5]. A few models accomplish the prediction tasks along with data recovery. Cui et al. [10] developed the stacked bidirectional and unidirectional LSTM (SBU-LSTM) method with an imputation mechanism for network-wide traffic data repair and forecasting. However, this architecture inherently lacked the capability to extract spatial features, and so it failed to further enhance its overall performance. Matrix factorization has been employed for traffic sequence recovery and prediction thanks to its approximation ability during the factorization process. Chen et al. [17] built a Bayesian temporal factorization (BTF) model by integrating a VAR layer into a Bayesian probabilistic matrix/tensor factorization algorithm to enable the forecasting and recovery of missing data. In the field of structural health monitoring (SHM), Ren et al. [35] established the incremental Bayesian matrix/tensor learning model for effective repair and response prediction of spatial-temporal data. Chen et al. [36] proposed a low-rank autoregressive tensor completion (LATC) framework, which introduced the temporal variation as the regularization term but overlooked the spatial factor. So far, despite the existence of a few models for time series recovery and prediction, the current methods still lack in-depth modeling of spatial-temporal data, especially of the residuals at the modeling stage. In this study, we propose a novel spatial-temporal factorization model, ST-CRMF, with residual learning for accurate forecasting and imputation of historical time series.

3. Methodology

3.1. Preliminaries

In this study, we focus on formulating a nonlinear fitting function $f_{\theta}$ with partially historical series collected from a sensor network for traffic time series prediction over a certain time interval. The traffic variable can be speed, flow, or density; in our experiments it refers to traffic speed. We first describe the symbolic representation of variables and relevant concepts in detail and then give a formal problem definition. Figure 1 illustrates multivariate time series forecasting.

Definition 1 (Urban Sensor Network Topology).

We assume that the sensor network is divided as a 2D grid map of size $m \times n$ by its longitude and latitude, each cell of which is an equal-sized region. Then, the topology of this sensor network is defined as an unweighted graph expressed by $G = (V, \mathcal{E})$, where $V \in \mathbb{R}^{M}$ is a set of nodes and $M$ is the number of nodes; $\mathcal{E} \in \mathbb{R}^{N}$ is a set of edges that represents the connectivity between road segments, and $N$ is the number of edges.

Definition 2 (Traffic Network Matrix Sequence).

As illustrated in Figure 1, the traffic speeds observed at the successive time intervals ($1, 2, \dots, t - 1, t$) result in the matrix sequence $X_{t} = \{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{t - 1}, \mathbf{x}_{t}\}$, and $x_{ij}$ stands for the traffic value on the $i$th node at the $j$th time slot, $i \in \{1, 2, \dots, M\}$, $j \in \{1, 2, \dots, N\}$.

Definition 3 (Problem Statement).

As per previous studies [37,38], the forecasting problem of traffic time series can be interpreted as learning the function $f_{\theta}$ based on the topology $G$ and its feature matrix $X \in \mathbb{R}^{M \times N}$ to predict the traffic speeds in the future period of $T$ with high accuracy. Therefore, we can formally define the studied problem in Equation (1) as follows:

$$ \{\mathbf{x}_{t + T}, \mathbf{x}_{t + T - 1}, \dots, \mathbf{x}_{t + 2}, \mathbf{x}_{t + 1}\} = f_{\theta} (\{\mathbf{x}_{t}, \mathbf{x}_{t - 1}, \dots, \mathbf{x}_{2}, \mathbf{x}_{1}\}; G) \tag{1} $$

where $f_{\theta} (\cdot)$ denotes the desired nonlinear mapping function; $\theta$ represents the learnable parameters; $T$ refers to the length of the traffic time series to be predicted.

3.2. Spatial-Temporal Matrix Factorization

3.2.1. Matrix Factorization Description

In general, multivariate traffic time series involves both spatial and temporal dimensions, and our forecasting task revolves around modeling the implied spatial-temporal features to establish a study paradigm [9]. Following the general framework of available MF models [5,17], the standard MF as depicted in Figure 2 satisfies the basic process of Equation (2) as follows:

$$ X \approx Q^{T} T, \quad \text{element-wise} \quad x_{ij} \approx \mathbf{q}_{i}^{T} \mathbf{t}_{j} \tag{2} $$

where $Q \in \mathbb{R}^{R \times M}$ is the spatial feature matrix whose row is $\mathbf{q}_{i}^{T}$; $T \in \mathbb{R}^{R \times N}$ is the temporal feature matrix whose column is $\mathbf{t}_{j}$; the element $x_{ij}$ is estimated by the inner product of $\mathbf{q}_{i}^{T}$ and $\mathbf{t}_{j}$; $R$ denotes the positive integer referring to the matrix rank. In Figure 2, the future time series $X_{new}$ remains related to $T_{new}$ in the same time direction when completing the data approximation by Equation (2).

Naturally, the motivation behind modeling spatial-temporal traffic data is to impose a low-rank structure on the above 2D matrix, capturing its dependencies within the framework of low-rank factorization and finally reconstructing this approximate relationship by the product of $Q \in \mathbb{R}^{R \times M}$ and $T \in \mathbb{R}^{R \times N}$ with $R \ll \min \{M, N\}$. More precisely, the dependencies can deliver valuable insights into the spatial-temporal structure and assist in traffic forecasting tasks [15]. Therefore, the general optimization problem of MF can be summarized as follows:

$$ \min_{Q^{*}, T^{*}} \frac{1}{2} \sum_{(i, j) \in \Phi} (x_{ij} - Q_{i}^{T} T_{j})^{2} + \frac{\lambda_{q} \eta}{2} \| Q \|_{F}^{2} + \frac{\lambda_{t} \eta}{2} \| T \|_{F}^{2} \tag{3} $$

where $\Phi$ represents the set of ($i, j$) pairs for which the matrix element $x_{ij}$ is known (i.e., the training set); $Q_{i}$ and $T_{j}$ stand for the $i$th row and $j$th column vectors of the feature matrices $Q$ and $T$, respectively; the terms $\| Q \|_{F}^{2}$ and $\| T \|_{F}^{2}$ serve to prevent over-fitting and enhance stability, and their coefficients $\lambda_{q}$, $\lambda_{t}$ and $\eta$ adjust the degree of regularization. However, the traditional MF models lack an adequate consideration of the spatial-temporal dependencies, resulting in undesirable non-linear fits. As a result, we introduce spatial-temporal regularizers in MF to reinforce the factorization process. Specifically, we employ the graph Laplacian [18] and GRN [12] as the spatial and temporal regularizers of MF to further characterize the spatial and temporal dependencies, respectively.
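The objective in Equation (3) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the function name `mf_objective`, the boolean `mask` encoding of the observed set $\Phi$, and the toy data are all hypothetical and not taken from the paper.

```python
import numpy as np

def mf_objective(X, mask, Q, T, lam_q, lam_t, eta):
    """Eq. (3): squared error on observed entries plus Frobenius penalties.

    X: (M, N) data; mask: (M, N) boolean, True where x_ij is observed;
    Q: (R, M) spatial factors; T: (R, N) temporal factors.
    """
    err = (X - Q.T @ T)[mask]          # residuals on the observed set only
    fit = 0.5 * np.sum(err ** 2)
    reg = 0.5 * eta * (lam_q * np.sum(Q ** 2) + lam_t * np.sum(T ** 2))
    return fit + reg

# Toy data (assumed): an exactly rank-R matrix with some entries hidden.
rng = np.random.default_rng(0)
M, N, R = 6, 8, 2
Q_true = rng.normal(size=(R, M))
T_true = rng.normal(size=(R, N))
X = Q_true.T @ T_true
mask = rng.random((M, N)) > 0.3

loss_at_truth = mf_objective(X, mask, Q_true, T_true, 0.0, 0.0, 1.0)
loss_at_zero = mf_objective(X, mask, np.zeros((R, M)), np.zeros((R, N)), 0.0, 0.0, 1.0)
```

With the penalties switched off, the true factors give a zero loss, while all-zero factors give half the squared norm of the observed entries.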

3.2.2. Regularized Spatial-Temporal Matrices Modeling

Inherently, adjacent sensors in road networks collect traffic data with similar characteristics, and the degree of mutual influence varies with the urban environment (e.g., location). In graph-based models, one usually constructs similarity graphs by linking close neighboring data points in a feature space and processes them to strengthen the modeling effects [9]. With that in mind, the graph Laplacian is a sensible choice as the spatial regularizer of MF. In theory, the graph Laplacian spatial regularizer penalizes the discrepancies between the spatial feature vectors and their adjacent neighbors. More specifically, for an unweighted graph $G = (V, \mathcal{E})$, the graph Laplacian matrix is given as follows:

$$ L_{G} = D_{G} - A_{G} \tag{4} $$

where $L_{G}$ is a positive semidefinite matrix; $D_{G}$ is the diagonal degree matrix of nodes $V$, of which each element on the diagonal satisfies $D_{G} (i, i) = \sum_{k} A_{G} (i, k)$, $k \in \{1, 2, \dots, M\}$; $A_{G}$ is the adjacency matrix of $G$, which states whether there are adjacent edges between nodes.

Therefore, the graph Laplacian spatial regularizer can be formulated as follows:

$$ \mathcal{L} (Q \mid \theta) = tr (Q^{T} L_{G} Q) = \frac{1}{2} \sum_{i, k}^{M} A_{G} (i, k) \| \mathbf{q}_{i} - \mathbf{q}_{k} \|_{2}^{2} \tag{5} $$

where $\theta$ contains the learnable regularization parameters; $tr (\cdot)$ denotes the matrix trace operator.
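The identity in Equation (5) can be checked numerically on a small graph. The sketch below is an illustration, not the authors' code: the 4-node path graph and the random factors are assumptions. Because `Q` is stored here as an $R \times M$ array with node vectors as columns, the trace is evaluated as `trace(Q @ L @ Q.T)`, which matches the paper's $tr(Q^{T} L_{G} Q)$ when node vectors are stored as rows.

```python
import numpy as np

# Assumed toy graph: a path 0 - 1 - 2 - 3 (unweighted, symmetric adjacency).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))      # diagonal degree matrix, D(i, i) = sum_k A(i, k)
L = D - A                       # graph Laplacian, Eq. (4)

rng = np.random.default_rng(1)
R, M = 3, A.shape[0]
Q = rng.normal(size=(R, M))     # column i is the spatial feature vector q_i

# Trace form of the regularizer.
trace_form = np.trace(Q @ L @ Q.T)

# Pairwise form: 0.5 * sum_{i,k} A(i, k) * ||q_i - q_k||^2.
diff = Q[:, :, None] - Q[:, None, :]              # shape (R, M, M)
pairwise_form = 0.5 * np.sum(A * np.sum(diff ** 2, axis=0))

min_eigenvalue = np.linalg.eigvalsh(L).min()      # non-negative up to rounding
```

The two forms agree, and the smallest eigenvalue of `L` is zero up to rounding, consistent with $L_{G}$ being positive semidefinite.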

To incorporate periodic temporal information, we apply GRN as the temporal regularizer of MF to capture multiple temporal dependencies. As depicted in Figure 3, GRN is a combination of a number of gated recurrent units (GRU), while GRU is a simplified variant of LSTM that uses two gate structures (i.e., reset and update gates). Both gates are the basis vectors that determine which information should be passed to the output. For each GRU cell, the sequential representation of the traffic status can be calculated iteratively by the following equations:

$$ \text{Reset Gate}: \quad v_{t} = \sigma (W_{v} [x_{t}, h_{t - 1}] + b_{v}) \tag{6} $$

$$ \text{Update Gate}: \quad u_{t} = \sigma (W_{u} [x_{t}, h_{t - 1}] + b_{u}) \tag{7} $$

$$ \text{Cell State}: \quad \tilde{h}_{t} = \tanh (W_{h} [x_{t}, v_{t} \circledast h_{t - 1}] + b_{h}) \tag{8} $$

$$ \text{Output}: \quad h_{t} = u_{t} \circledast h_{t - 1} + (1 - u_{t}) \circledast \tilde{h}_{t} \tag{9} $$

where $v_{t}$ and $u_{t}$ are the reset and update gates, respectively; ($W_{v}$, $W_{u}$, $W_{h}$) and ($b_{v}$, $b_{u}$, $b_{h}$) stand for their weight and bias parameters; $h_{t - 1}$, $h_{t}$, and $h_{t + 1}$ are the previous, current, and next outputs; $\tilde{h}_{t}$ is the current cell state; $x_{t - 1}$, $x_{t}$, and $x_{t + 1}$ are the previous, current, and next inputs; $\tanh(\cdot)$ denotes the tanh activation function; the symbol “$\circledast$” is the Hadamard product. In this experiment, we use the sigmoid function (indicated by $\sigma$) as the activation of the hidden state, and the hyperbolic tangent function (indicated by tanh) as the activation of the output.
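A single GRU step following Equations (6)–(9) can be sketched as below. This is a minimal NumPy illustration under stated assumptions: the helper names `sigmoid` and `gru_step`, the dimensions, and the random weights are hypothetical, and no training is performed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, params):
    """One GRU update following Eqs. (6)-(9)."""
    W_v, b_v, W_u, b_u, W_h, b_h = params
    xh = np.concatenate([x_t, h_prev])
    v_t = sigmoid(W_v @ xh + b_v)                                        # reset gate, Eq. (6)
    u_t = sigmoid(W_u @ xh + b_u)                                        # update gate, Eq. (7)
    h_tilde = np.tanh(W_h @ np.concatenate([x_t, v_t * h_prev]) + b_h)   # cell state, Eq. (8)
    return u_t * h_prev + (1.0 - u_t) * h_tilde                          # output, Eq. (9)

# Assumed toy sizes and random weights, for illustration only.
rng = np.random.default_rng(2)
d_in, d_h = 3, 4
params = tuple(
    p for _ in range(3)
    for p in (rng.normal(scale=0.5, size=(d_h, d_in + d_h)), np.zeros(d_h))
)

h = np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):   # run five time steps
    h = gru_step(x_t, h, params)
```

Since each output is a gate-weighted mix of the previous output and a tanh value, every component of `h` stays within $[-1, 1]$ when starting from zeros.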

During temporal modeling, GRN offers a more efficient structure owing to fewer hyperparameters. We therefore adopt GRN as the temporal regularizer of MF, which can be formulated as follows:

$$ \mathcal{L} (T \mid \theta) = \frac{1}{2} \sum \| T - \hat{T} \|_{2}^{2} = \frac{1}{2} \sum_{j = \ell_{\delta} + 1}^{N} \| \mathbf{t}_{j} - y_{\theta} (\mathbf{t}_{j - \ell_{1}}, \mathbf{t}_{j - \ell_{2}}, \dots, \mathbf{t}_{j - \ell_{\delta}}) \|_{2}^{2} \tag{10} $$

where $y_{\theta} (\cdot)$ is the GRN mapping function; $\hat{T}$ denotes the $T$ predicted by GRN; $\varphi = \{\ell_{1}, \ell_{2}, \ell_{3}, \dots, \ell_{\delta}\}$ stands for the time lag set, which represents the time-correlated topology.
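The structure of Equation (10) can be illustrated with a stand-in predictor. In the sketch below the GRN mapping $y_{\theta}$ is replaced by a hypothetical `mean_of_lags` function purely to show how the lag set indexes the columns of $T$; it is an assumption for illustration and not the paper's regularizer.

```python
import numpy as np

def temporal_penalty(T, lags, predict):
    """Eq. (10): 0.5 * sum_j || t_j - predict(t_{j-l_1}, ..., t_{j-l_delta}) ||^2."""
    max_lag = max(lags)
    total = 0.0
    # 0-based j starts at max_lag, the first column with every lag available.
    for j in range(max_lag, T.shape[1]):
        lagged = [T[:, j - l] for l in lags]
        total += 0.5 * np.sum((T[:, j] - predict(lagged)) ** 2)
    return total

def mean_of_lags(lagged):
    # Hypothetical stand-in for the GRN mapping y_theta.
    return np.mean(lagged, axis=0)

lags = [1, 3]                                           # time lag set, as in phi = {1, 3}
T_const = np.ones((2, 10))                              # constant temporal factors
T_ramp = np.tile(np.arange(10, dtype=float), (2, 1))    # linearly increasing factors

penalty_const = temporal_penalty(T_const, lags, mean_of_lags)
penalty_ramp = temporal_penalty(T_ramp, lags, mean_of_lags)
```

A constant sequence is predicted perfectly by the stand-in and incurs no penalty, whereas the ramp is consistently under-predicted and is penalized.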

Overall, by combining Equations (5)–(10), the objective function of the regularized MF is as follows:

$$ \min_{Q^{*}, T^{*}, \theta} Z (Q, T) = \min_{Q^{*}, T^{*}, \theta} \frac{1}{2} \sum_{(i, j) \in \Phi} (x_{ij} - Q_{i}^{T} T_{j})^{2} + \frac{\lambda_{q} \eta}{2} \| Q \|_{F}^{2} + \frac{\lambda_{t} \eta}{2} \| T \|_{F}^{2} + \lambda_{q} \mathcal{L} (Q \mid \theta) + \lambda_{t} \mathcal{L} (T \mid \theta) \tag{11} $$

3.3. Overall Architecture of the ST-CRMF Model

In this section, we describe the overall architecture of the ST-CRMF model as detailed in Figure 4. The critical idea is to first model the spatiotemporal properties by the regularized MF model, then perform complementary modeling of the resulting residuals to update the MF model parameters, and finally predict the future time series by the calculation in Equation (2). Besides, for incomplete traffic datasets, we first repair the missing data and then conduct the forecasting tasks, but the latter is the focus of our study. Next, we describe the sampling and updating of the spatial-temporal feature matrices to derive the model prediction and potential imputation.

3.3.1. Model Implementation

Starting with the data acquisition and matricization in Figure 4, the implementation of the ST-CRMF unfolds sequentially. In essence, the objective of sampling and updating $Q$ and $T$ is to obtain a tractable closed form of Equation (11) that gives a concrete expression of the spatial-temporal correlations between observations. Because the closed-form solutions of $Q$ and $T$ depend on each other, it is difficult to perform the model inference by a single back-propagation. Motivated by the work [17], our idea is to estimate the spatial-temporal feature matrices (i.e., $Q$ and $T$) by utilizing an alternating minimization scheme. Specifically, we introduce the popular alternating least-squares (ALS) method to perform the inference on the regularized MF model. In our study, the ALS minimizes the cost function $Z (Q, T)$ iteratively by optimizing each component individually while keeping the other factor matrices fixed, so that the problem reduces to a linear least-squares problem. The ALS is attractive for its simplicity and can also provide satisfactory results for low-rank MF models.

First, to update $Q$, we can write the partial derivative of $Z (Q, T)$ with respect to $Q$ as below:

$$ \frac{\partial \mathcal{L} (Q \mid \theta)}{\partial Q} = \frac{\partial \, tr (Q^{T} L_{G} Q)}{\partial Q} = L_{G} Q + L_{G}^{T} Q \tag{12} $$

$$ \frac{\partial Z}{\partial Q} = - (X - Q^{T} T) T + \lambda_{q} \eta Q + \lambda_{q} (L_{G} Q + L_{G}^{T} Q) \tag{13} $$

Let $\frac{\partial Z}{\partial Q} = 0$, then we have Equation (14):

$$ Q = (T T^{T} + \lambda_{q} \eta I + \lambda_{q} (L_{G} + L_{G}^{T}))^{- 1} X T \tag{14} $$

Next, to update $T$, the partial derivative of $Z (Q, T)$ with respect to $T$ is given by:

$$ \frac{\partial Z}{\partial T} = - (X - Q^{T} T) Q^{T} + \lambda_{t} \eta T + \lambda_{t} (T - \hat{T}) \tag{15} $$

Let $\frac{\partial Z}{\partial T} = 0$, then we have Equation (16):

$$ T = (Q Q^{T} + \lambda_{t} \eta I + \lambda_{t} I)^{- 1} (X Q + \lambda_{t} \hat{T}) \tag{16} $$
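The alternating updates can be sketched in their per-node and per-time-slot form, in the spirit of the element-wise ALS steps listed later in Algorithm 1. The sketch makes several assumptions that are not confirmed by the paper: the function name `als_sweep`, the interpretation of the graph term as the degree of node $i$ times the identity, the use of neighbour vectors from the previous sweep, and the reuse of the current $T$ as a stand-in for the GRN prediction $\hat{T}$.

```python
import numpy as np

def als_sweep(X, mask, Q, T, T_hat, A, lam_q, lam_t, eta):
    """One alternating least-squares sweep over the columns of Q and T."""
    R, M = Q.shape
    N = T.shape[1]
    I = np.eye(R)
    Q_new = Q.copy()
    for i in range(M):
        obs = mask[i]                           # observed time slots of node i
        Tj = T[:, obs]
        lhs = Tj @ Tj.T + lam_q * (eta + A[i].sum()) * I
        rhs = Tj @ X[i, obs] + lam_q * (Q @ A[i])    # neighbours from the previous sweep
        Q_new[:, i] = np.linalg.solve(lhs, rhs)
    T_new = T.copy()
    for j in range(N):
        obs = mask[:, j]                        # observed nodes at time slot j
        Qi = Q_new[:, obs]
        lhs = Qi @ Qi.T + lam_t * (eta + 1.0) * I
        rhs = Qi @ X[obs, j] + lam_t * T_hat[:, j]
        T_new[:, j] = np.linalg.solve(lhs, rhs)
    return Q_new, T_new

def masked_rmse(X, mask, Q, T):
    return np.sqrt(np.mean((X - Q.T @ T)[mask] ** 2))

# Assumed toy problem: rank-2 data on a ring graph with about 20% missing entries.
rng = np.random.default_rng(3)
M, N, R = 8, 30, 2
X = rng.normal(size=(R, M)).T @ rng.normal(size=(R, N))
mask = rng.random((M, N)) > 0.2
A = np.zeros((M, M))
for i in range(M):
    A[i, (i + 1) % M] = A[(i + 1) % M, i] = 1.0

Q = rng.normal(size=(R, M))
T = rng.normal(size=(R, N))
rmse_before = masked_rmse(X, mask, Q, T)
for _ in range(30):
    # Stand-in for the GRN output: reuse the current T as T_hat.
    Q, T = als_sweep(X, mask, Q, T, T, A, lam_q=0.1, lam_t=0.1, eta=0.1)
rmse_after = masked_rmse(X, mask, Q, T)
```

Solving each small $R \times R$ system with `np.linalg.solve` avoids forming an explicit inverse; on this toy problem the error on observed entries drops from its random-initialization value.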

Additionally, we update the weights and biases of the GRN regularizer by back-propagation with batch gradient descent until the value of $Z (Q, T)$ falls below a set criterion. Then, we make a rolling forecast of the future temporal feature vector $\hat{\mathbf{t}}_{j + 1}$ in the GRN regularizer by Equation (17):

$$ \hat{\mathbf{t}}_{j + 1} = y_{\theta} (\mathbf{t}_{j - \ell_{1} + 1}, \mathbf{t}_{j - \ell_{2} + 1}, \dots, \mathbf{t}_{j - \ell_{\delta} + 1}) \tag{17} $$
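The rolling scheme of Equation (17) can be sketched as follows. The predictor here is a hypothetical linear extrapolation (`linear_extrapolation`) standing in for the trained GRN, and the tiny factor matrices are assumptions chosen so that the result is easy to verify by hand.

```python
import numpy as np

def rolling_forecast(Q, T, lags, predict, horizon):
    """Extend T one column at a time (Eq. (17)) and map back through Q^T (Eq. (2))."""
    T_ext = T.copy()
    for _ in range(horizon):
        j_next = T_ext.shape[1]                        # index of the column to predict
        lagged = [T_ext[:, j_next - l] for l in lags]  # t_{j+1-l} for each lag l
        T_ext = np.column_stack([T_ext, predict(lagged)])
    T_new = T_ext[:, T.shape[1]:]
    return Q.T @ T_new

def linear_extrapolation(lagged):
    # Hypothetical stand-in for y_theta: 2 * t_{j} - t_{j-1}.
    return 2.0 * lagged[0] - lagged[1]

Q = np.eye(2)                                  # identity, so forecasts equal the new columns of T
T = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
X_new = rolling_forecast(Q, T, lags=[1, 2], predict=linear_extrapolation, horizon=2)
```

Each predicted column is appended to the temporal factors and then reused as an input for the next step, which is why prediction errors can propagate in rolling forecasts.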

In fact, in each iteration the alternating scheme first updates $Q$ and $T$ through ALS and then updates the GRN parameters by back-propagation. Subsequently, we expand the temporal dimension of $X$ iteratively by computing $Q^{T} \hat{\mathbf{t}}_{j + 1}$, and further update the parameters of the MF model. Moreover, when faced with incomplete datasets, the product of Equation (2) can impute the missing data contained in $X$ in parallel after completing the update of $Q$ and $T$.
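The imputation step can be sketched as below: observed entries are kept and missing ones are filled with the corresponding entries of $Q^{T} T$. The function name `impute_missing`, the boolean mask, and the toy numbers are illustrative assumptions.

```python
import numpy as np

def impute_missing(X, mask, Q, T):
    """Keep observed entries of X and fill the rest with the Eq. (2) product Q^T T."""
    return np.where(mask, X, Q.T @ T)

Q = np.array([[1.0, 2.0]])            # R = 1, M = 2
T = np.array([[1.0, 2.0, 3.0]])       # R = 1, N = 3
X = np.array([[1.1, 0.0, 2.9],        # zeros mark placeholders for missing values
              [2.0, 4.2, 0.0]])
mask = np.array([[True, False, True],
                 [True, True, False]])

X_filled = impute_missing(X, mask, Q, T)
```

Here the two missing entries are replaced by 2.0 and 6.0, the values of $Q^{T} T$ at those positions, while the observed entries are left untouched.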

3.3.2. Compensated Residual Learning

Although the above-trained traffic forecasting model can generate preliminary results, these results are not sufficiently competitive with state-of-the-art models [2,7] in terms of in-depth learning of spatial-temporal properties (e.g., daily and weekly periods in Figure 4). As such, we design a residual feedback structure that efficiently adjusts the training parameters of the MF model and thus strengthens the data processing capability of the model. In particular, the residual modeling operation shares the training parameters with the MF model and further optimizes them iteratively by the well-designed bi-directional residual structure in Figure 4. As per Equation (1), the universal framework integrating both the MF output and the compensated residual learning mechanism is as follows:

$$ \{\mathbf{x}_{t + T}^{*}, \mathbf{x}_{t + T - 1}^{*}, \dots, \mathbf{x}_{t + 2}^{*}, \mathbf{x}_{t + 1}^{*}\} = f_{\hat{\theta}} (\{\mathbf{x}_{t}, \mathbf{x}_{t - 1}, \dots, \mathbf{x}_{2}, \mathbf{x}_{1}\}, X_{res}; G) \tag{18} $$

where $f_{\hat{\theta}} (\cdot)$ represents the well-retrained MF model parameterized by $G$ and the residual matrix $X_{res}$; $\hat{\theta}$ denotes the modified training parameters; $X_{res}$ denotes the residuals between the predicted values $\hat{X} = \{\hat{\mathbf{x}}_{t}, \hat{\mathbf{x}}_{t - 1}, \dots, \hat{\mathbf{x}}_{2}, \hat{\mathbf{x}}_{1}\} = f_{\theta} (\{\mathbf{x}_{t}, \mathbf{x}_{t - 1}, \dots, \mathbf{x}_{2}, \mathbf{x}_{1}\}; G)$ and their ground truth values.

As illustrated in Figure 4 for the architecture design, our residual computational unit includes the feature matrix $X$ and its direction matrix $O$, the prediction matrix $\hat{X}$ and its direction matrix $P$, the residual matrix $X_{res}$ and the bi-directional matrix $S$. Of these, for the matrices $O$ and $P$, the elements $o_{ij}$ and $p_{ij}$ both satisfy the categorical criteria as follows:

$$ o_{ij} = \begin{cases} 1, & x_{ij} > 0 \\ -1, & x_{ij} \leq 0 \end{cases} \qquad p_{ij} = \begin{cases} 1, & \hat{x}_{ij} > 0 \\ -1, & \hat{x}_{ij} \leq 0 \end{cases} \tag{19} $$

Next, we calculate the Hadamard product of the matrices $O$ and $P$ to generate the above $S$, whose bi-directional attributes assist the MF model in acquiring more information within residuals. With these prerequisites ready, the single computation procedure of residual learning is as below:

$$ X - \beta (S \circledast (Q^{T} T)) = X_{res}, \quad \beta = \frac{1}{\omega + 1} \tag{20} $$

where $\omega$ is the training iteration number for both MF modeling and residual learning; $\beta$ is the step coefficient, which varies with the number of iterations. At each iteration, of particular interest is that MF updates alternate with residual learning, which ensures that the residuals keep decreasing while the learning ability of the ST-CRMF model keeps improving. Additionally, in Figure 4, apart from the data processing (e.g., dataset division and missing data preparation), we perform the parameter analysis for multiple learnable variables to examine their effects on the performance of the ST-CRMF model.
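One residual computation following Equations (19) and (20) can be sketched as below. The function names `direction` and `residual_step` and the toy matrices are assumptions for illustration; the sketch covers only this single step, not the surrounding training loop.

```python
import numpy as np

def direction(Z):
    """Eq. (19): +1 where the entry is positive, -1 otherwise (zero maps to -1)."""
    return np.where(Z > 0, 1.0, -1.0)

def residual_step(X, Q, T, omega):
    """Eq. (20): X_res = X - beta * (S ⊛ Q^T T) with beta = 1 / (omega + 1)."""
    X_hat = Q.T @ T
    S = direction(X) * direction(X_hat)    # Hadamard product of O and P
    beta = 1.0 / (omega + 1)
    return X - beta * (S * X_hat), S, beta

X = np.array([[2.0, -1.0],
              [0.0, 3.0]])
Q = np.eye(2)                              # identity, so X_hat equals T
T = np.array([[1.0, 1.0],
              [-2.0, 2.0]])

X_res, S, beta = residual_step(X, Q, T, omega=1)
```

The entries of `S` are +1 where the observation and prediction share a direction and -1 where they disagree, and the step coefficient shrinks as the iteration count grows.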

3.4. Pseudo-Code of the ST-CRMF Model

The training procedure of the ST-CRMF model is summarized in Algorithm 1. Notably, we define the time lag as the combination of different time intervals (e.g., $\varphi = \{1, 3\}$ in Figure 1) to account for a variety of periodical patterns (e.g., closeness in Figure 4). In particular, the selection of the number of iterations and ranks varies considerably and will be analyzed in detail in Section 4.3.

Algorithm 1: Training Procedure of ST-CRMF Model

Input: Graph network $G = (V, \mathcal{E})$; feature matrix $X$; rank $R$; missing rates/scenarios; maximum iteration $\lambda$.
Output: Learned ST-CRMF model; factor matrices $Q$, $T$ and forecasted/repaired $\hat{X} = Q^{T} T$; future sequence.
1. Initialize all trainable parameters in ST-CRMF.
2. For $\omega = 0$ to $\lambda - 1$ do
3. Compute and update $\mathbf{q}_{i}^{(\omega + 1)}$ by the ALS solution in Equation (14):

$$ \mathbf{q}_{i}^{(\omega + 1)} = \Big( \sum_{(i, j) \in \Phi} \mathbf{t}_{j}^{(\omega)} {\mathbf{t}_{j}^{(\omega)}}^{T} + \lambda_{q} \eta I + \lambda_{q} \sum_{i, k}^{M} A_{G} (i, k) I \Big)^{- 1} \Big( \sum_{(i, j) \in \Phi} x_{ij} \mathbf{t}_{j}^{(\omega)} + \lambda_{q} \sum_{i, k}^{M} A_{G} (i, k) \mathbf{q}_{k}^{(\omega)} \Big) $$

4. Compute and update $\mathbf{t}_{j}^{(\omega + 1)}$ by the ALS solution in Equation (16):

$$ \mathbf{t}_{j}^{(\omega + 1)} = \Big( \sum_{(i, j) \in \Phi} \mathbf{q}_{i}^{(\omega + 1)} {\mathbf{q}_{i}^{(\omega + 1)}}^{T} + \lambda_{t} \eta I + \lambda_{t} I \Big)^{- 1} \Big( \sum_{(i, j) \in \Phi} x_{ij} \mathbf{q}_{i}^{(\omega + 1)} + \lambda_{t} \hat{\mathbf{t}}_{j}^{(\omega + 1)} \Big) $$

5. Update the training parameters $\theta$ in the GRN regularizer by back-propagation with batch gradient descent.
6. Compute $\beta^{(\omega)}$ and the residual matrix $X_{res}^{(\omega + 1)}$ by Equation (20), and let it replace $X^{(\omega)}$:

$$ X^{(\omega)} - \beta^{(\omega)} (S^{(\omega)} \circledast (Q^{T} T)^{(\omega)}) = X_{res}^{(\omega + 1)} $$

7. Update the training parameters $\hat{\theta}$ according to their gradient and learning rate.
8. Until the model stopping criteria are met.
9. Repair the possible missing values in $X$ and then update it.
10. Rolling forecast of the future time series $\{\mathbf{x}_{t + T}^{*}, \mathbf{x}_{t + T - 1}^{*}, \dots, \mathbf{x}_{t + 2}^{*}, \mathbf{x}_{t + 1}^{*}\}$ by Equation (18).

4. Experiment Study

4.1. Datasets Description

We conducted extensive experiments on two publicly available and independent traffic speed datasets, Seattle-Loop and METR-LA, to train ST-CRMF and compare it with baseline models. Both datasets are widely used for traffic forecasting. The former was collected in 2015 on four connected freeways (I-5, I-405, I-90 and SR-520) in the Greater Seattle Area; the latter was collected on the highways of Los Angeles County from 1 March 2012 to 30 June 2012. Table 1 summarizes the key statistics [6,10] of both datasets, and Figure 5 provides their area maps [7,27]. In this experiment, we used only the first two months of traffic data for modeling, and the sub-datasets were split in chronological order, with the first 70% serving as the training set and the remaining 30% as the testing set.
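
The chronological 70/30 split described above can be sketched as follows. This is only an illustration: the function name `chronological_split` and the assumption that the data is stored as a sensors × timestamps matrix are ours, not taken from the paper.

```python
import numpy as np

def chronological_split(X, train_ratio=0.7):
    """Split a (sensors x timestamps) matrix along time, without shuffling."""
    n_steps = X.shape[1]
    cut = int(round(n_steps * train_ratio))  # round() guards against float error
    return X[:, :cut], X[:, cut:]

# Toy example: 4 sensors, 10 days of 5-min data (288 steps per day, see Table 1).
X_demo = np.tile(np.arange(2880, dtype=float), (4, 1))
train, test = chronological_split(X_demo)
```

Because the cut is made along the time axis only, every test timestamp lies after every training timestamp, so no future information leaks into training.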

4.2. Experimental Settings

4.2.1. Baseline Models

To fully evaluate the predictive performance of ST-CRMF, we compared it with a series of baseline models under the same conditions. Note that we omit the introduction of the ARIMA, GRN, LSTM, and T-GCN models because they are simple and widely known.

(1): STGCN: Spatio-Temporal Graph Convolutional Network employs Chebyshev GCN and gated CNN for capturing the dynamics of spatial and temporal dependencies, respectively [2];
(2): AGCRN: Adaptive Graph Convolutional Recurrent Network captures the node-specific spatial and temporal correlations in traffic time series automatically without a pre-defined graph [3];
(3): Graph-WaveNet: Graph WaveNet integrates diffusion graph convolutions with dilated causal convolutions (called WaveNet) to capture the spatial-temporal dependencies simultaneously [4];
(4): PGCN: Progressive Graph Convolutional Network combines the gated activation unit and the dilated causal convolution to extract the temporal feature in traffic data [26];
(5): DGCRN: Dynamic Graph Convolutional Recurrent Network uses a dynamic graph that cooperates effectively with the pre-defined graph to improve the prediction performance [33];
(6): GMAN: Graph Multi-Attention Network utilizes a variety of types of purely attention modules to learn complex spatial-temporal dependencies [34];
(7): GATs-GAN: The model incorporates Graph Attention Networks and Generative Adversarial Network to learn the node features and achieve the traffic state derivation [37];
(8): MRA-BGCN: Multi-Range Attentive Bicomponent Graph Convolutional Network uses the edge-wise graph construction, attention mechanism, and so on for traffic prediction [38].

4.2.2. Measures of Model Effectiveness

We adopted four evaluation metrics to evaluate the effectiveness of the ST-CRMF: the mean absolute percentage error (MAPE), the root mean square error (RMSE), the mean absolute error (MAE), and the coefficient of determination (R^{2}) [39,40]. These metrics are defined as:

MAPE = \frac{1}{| Φ |} \sum_{(i, j) \in Φ} \left| \frac{𝓍_{i j} - {\hat{𝓍}}_{i j}}{𝓍_{i j}} \right| \times 100

(21)

RMSE = \sqrt{\frac{1}{| Φ |} \sum_{(i, j) \in Φ} {(𝓍_{i j} - {\hat{𝓍}}_{i j})}^{2}}

(22)

MAE = \frac{1}{| Φ |} \sum_{(i, j) \in Φ} | 𝓍_{i j} - {\hat{𝓍}}_{i j} |

(23)

R^{2} = 1 - \frac{\sum_{(i, j) \in Φ} {(𝓍_{i j} - {\hat{𝓍}}_{i j})}^{2}}{\sum_{(i, j) \in Φ} {(𝓍_{i j} - {\bar{𝓍}}_{i j})}^{2}}

(24)

where | Φ | is the size of the index set Φ; 𝓍_{i j} and {\hat{𝓍}}_{i j} are the actual and predicted values, respectively; and {\bar{𝓍}}_{i j} is the average of the actual values. In general, a higher R^{2} or lower values of the other metrics indicate a better model.
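
The four metrics in Equations (21) to (24) can be computed with a few lines of NumPy. The sketch below is illustrative: the function name `evaluate` and the optional boolean `mask` (standing in for the index set Φ) are our own naming, and the code assumes no actual value in Φ is zero, since MAPE is undefined in that case.

```python
import numpy as np

def evaluate(x_true, x_pred, mask=None):
    """Return MAPE (%), RMSE, MAE and R^2 over the entries selected by mask."""
    x_true = np.asarray(x_true, dtype=float)
    x_pred = np.asarray(x_pred, dtype=float)
    if mask is None:
        mask = np.ones(x_true.shape, dtype=bool)   # use every entry
    y, y_hat = x_true[mask], x_pred[mask]
    err = y - y_hat
    return {
        "MAPE": float(np.mean(np.abs(err / y)) * 100),           # Equation (21)
        "RMSE": float(np.sqrt(np.mean(err ** 2))),               # Equation (22)
        "MAE": float(np.mean(np.abs(err))),                      # Equation (23)
        "R2": float(1 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)),  # (24)
    }

# Small hand-checkable example with speeds in arbitrary units.
metrics = evaluate([50, 60, 40, 50], [45, 60, 44, 50])
```

For this toy input the errors are 5, 0, -4 and 0, so the MAE is 9/4 = 2.25 and the MAPE is (0.1 + 0 + 0.1 + 0)/4 × 100 = 5.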

4.2.3. Parameters Study

Recall that we aim to learn the function f_{\hat{θ}} from historical data and then forecast future data. To this end, we identified the best experimental parameters of the ST-CRMF through repeated trials. Specifically, the critical parameters of the ST-CRMF model include the size of GRN’s hidden cells, the batch size, and the number of iterations. Considering both performance and efficiency, the size of GRN’s hidden cells was set to the rank R for both datasets, and the batch size was set to 64. We trained ST-CRMF using the Adam optimizer with an initial learning rate of 10⁻⁴ and used MAE as the loss function for GRN training. We used a linear activation function in the fully-connected layer of GRN, applied a grid search with sliding-window cross-validation to select the λ_{𝓆}, λ_{𝓉} and η parameters, and employed early stopping to avoid over-fitting. Besides, we set up combinations of random missing (RM), cluster missing (CM), and hybrid missing (HM) scenarios [5] with 10–90% missing rates (in 10% steps) to test the repair effect of ST-CRMF.
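
One possible way to organize a grid search with sliding-window cross-validation is sketched below. The paper does not give the window sizes, the grid values, or the scoring routine, so everything here is an assumption: `sliding_window_splits`, `grid_search`, the window lengths, and the candidate grid are hypothetical, and `score_fn` stands for a user-supplied function that fits the model on the training indices and returns a validation error (for example, MAE).

```python
import itertools
import numpy as np

def sliding_window_splits(n_steps, train_size, val_size, stride):
    """Yield (train_idx, val_idx); validation always follows training in time."""
    start = 0
    while start + train_size + val_size <= n_steps:
        split = start + train_size
        yield np.arange(start, split), np.arange(split, split + val_size)
        start += stride

def grid_search(grid, score_fn, splits):
    """Return (best_params, best_score) with the lowest mean validation score.

    `splits` must be a list, because it is reused for every candidate.
    """
    best_params, best_score = None, np.inf
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = np.mean([score_fn(params, tr, va) for tr, va in splits])
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical setup: 7 days of 5-min data, 5-day training and 1-day validation windows.
splits = list(sliding_window_splits(n_steps=2016, train_size=1440,
                                    val_size=288, stride=288))
grid = {"lam_q": [0.01, 0.1, 1.0], "lam_t": [0.01, 0.1, 1.0], "eta": [0.01, 0.1, 1.0]}
```

Sliding the window forward in time, rather than shuffling, keeps each validation block strictly after its training block, which matches the forecasting setting.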

4.3. Effect of Key Parameters on the ST-CRMF Model

First, the hidden layer setting of GRN influenced the prediction results of the ST-CRMF. For 5-min traffic forecasting, we analyzed this parameter by varying the hidden layers from 1 to 128. Figure 6 displays the parameter sensitivity of the ST-CRMF model with varying layer settings. We can observe that the ST-CRMF achieves better performance when the hidden layers for datasets (S) and (M) are set to 16 and 8, respectively.

In traffic forecasting tasks, the choice of key parameters bounds the achievable performance of the ST-CRMF model. Thus, we further studied the parameters that affect model complexity (e.g., the rank and the number of iterations). Note that all parameter test experiments of the ST-CRMF model were based on 5-min traffic forecasting. In each experiment, we altered the targeted parameter and kept the other parameters fixed. Figure 7 shows the variation of the evaluation metrics of the ST-CRMF with different rank settings. Overall, the effect of the rank on the ST-CRMF complied with expectations. As the rank increased, the error curves (MAPE, RMSE, and MAE) of both datasets (S) and (M) declined and the R^{2} curves ascended significantly, before gradually stabilizing. In principle, a larger rank gives the factorization more capacity and should yield better prediction results, but an excessive rank consumes more storage and computational resources, and the curves flatten once the rank exceeds 40 in Figure 7. Hence, as a balance between accuracy and cost, we ran the ST-CRMF model at a rank of 40 for both datasets.

One of the merits of the proposed ST-CRMF is its ability to perform compensatory modeling by means of the designed feedback residual structure. Therefore, we further investigated the effect of the number of training iterations on the efficiency of the ST-CRMF. As before, we altered the number of iterations while fixing all of the other parameters. Figure 8 shows the evaluation metrics of the ST-CRMF model on datasets (S) and (M). As the number of iterations increased, our ST-CRMF model did not suffer from under-fitting owing to insufficient information. Instead, the RMSE and R^{2} of the ST-CRMF model stabilized after about 50 iterations, and further training may increase the cost of the ST-CRMF model with little return. Since the optimal parameter values should be identified from the dataset itself [41], we set the number of iterations to 50 for datasets (S) and (M).

4.4. Empirical Results and Analysis

4.4.1. Comparison with Baselines for 5-/15-/30-min Forecasting

For a fair comparison, as in most previous studies, we compared the ST-CRMF with several advanced baselines. We adopted the default settings of the original proposals on datasets (S) and (M) and list some of the key parameters of the baseline models here. Specifically, for reference [7], the learning rate, batch size, and number of training epochs of T-GCN were set to 10⁻³, 32, and 5000, respectively. For reference [37], the learning rates of the generator and discriminator of GATs-GAN were 10⁻³ and 10⁻⁵, respectively. For reference [34], the numbers of attention blocks and heads were 3 and 8, respectively, and the dimensionality of each attention head was 8. For references [4,26], the PGCN and Graph-WaveNet models used the Adam optimizer with an initial learning rate of 10⁻³ for training. For reference [2], the STGCN executed a grid search to select optimal parameters. For reference [3], the dimension of the node embedding was one of the crucial parameters in AGCRN, and its optimal value was 10. For reference [33], the number of layers, the size of the hidden state, and the batch size of DGCRN were set to 1, 64, and 64, respectively.

The following tables present the evaluation metrics of the ST-CRMF and baseline models. Specifically, Table 2 provides the prediction results on dataset (S) for 5-/15-/30-min forecasting.

Overall, the errors of all models increase with the prediction horizon, but the ST-CRMF performs well over almost all horizons, which verifies both the accuracy and stability of our model. At each horizon, our ST-CRMF outperforms (1) temporal models (e.g., LSTM, GRN, and T-GCN), confirming the importance of modeling spatial correlations; and (2) spatial-temporal models (e.g., GMAN, PGCN, and Graph-WaveNet), demonstrating the benefits of compensated residual modeling. Further, the recent GATs-GAN model performs well and even exceeds ST-CRMF in MAPE and RMSE at the 5-min horizon, but it is less stable at the longer horizons. Besides, Table 2 shows that the performance of all models tends to deteriorate at an accelerating rate as the forecasting horizon grows, which becomes increasingly visible in long-term forecasting.
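
To make the size of these gaps concrete, the short script below computes the relative improvement of ST-CRMF over the best baseline for each metric at the 15- and 30-min horizons. The numbers are copied from Table 2; the helper name `relative_gain` and the choice of "best baseline per metric" as the reference point are our own illustration, not an analysis from the paper.

```python
# (MAPE, RMSE, MAE) per model, copied from Table 2 (Seattle-Loop dataset).
TABLE2 = {
    "15-min": {
        "GRN": (9.67, 5.39, 3.48), "LSTM": (8.88, 5.15, 3.28),
        "T-GCN": (8.52, 5.12, 3.18), "GMAN": (8.15, 4.86, 2.97),
        "PGCN": (7.56, 4.80, 2.85), "GATs-GAN": (7.63, 4.56, 2.97),
        "Graph-WaveNet": (8.35, 5.11, 3.10), "ST-CRMF": (7.39, 4.45, 2.80),
    },
    "30-min": {
        "GRN": (10.23, 5.76, 3.63), "LSTM": (9.80, 5.67, 3.55),
        "T-GCN": (10.80, 6.06, 3.74), "GMAN": (9.97, 5.71, 3.34),
        "PGCN": (9.46, 5.80, 3.28), "GATs-GAN": (8.89, 5.19, 3.46),
        "Graph-WaveNet": (10.83, 6.37, 3.68), "ST-CRMF": (8.63, 5.04, 3.25),
    },
}
METRICS = ("MAPE", "RMSE", "MAE")

def relative_gain(horizon):
    """Map each metric to (best baseline, % error reduction achieved by ST-CRMF)."""
    rows = TABLE2[horizon]
    ours = rows["ST-CRMF"]
    gains = {}
    for k, metric in enumerate(METRICS):
        name, best = min(
            ((m, v[k]) for m, v in rows.items() if m != "ST-CRMF"),
            key=lambda item: item[1],
        )
        gains[metric] = (name, 100.0 * (best - ours[k]) / best)
    return gains

for horizon in TABLE2:
    for metric, (name, gain) in relative_gain(horizon).items():
        print(f"{horizon} {metric}: {gain:.2f}% lower than {name}")
```

A positive percentage means that ST-CRMF has the lower error on that metric.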

4.4.2. Comparison with Baselines for 15-/30-/60-min Forecasting

To further verify whether our model maintains its superiority in long-term forecasting tasks, we investigated the results of ST-CRMF for 15-/30-/60-min forecasting on the METR-LA dataset. Table 3 presents the experimental metrics of all models. What stands out is that all metrics largely follow the trend of the short-term forecasts in Table 2. One difference is that the deterioration of the metrics in Table 3 becomes more apparent as the time span grows, which is consistent with the general behavior of traffic prediction. Besides, our ST-CRMF delivers competitive results under almost all evaluation metrics for all the forecasting periods. In particular, the ST-CRMF outperforms classical forecasting models, such as ARIMA and STGCN, by a large margin. Moreover, compared with graph-based models (e.g., DGCRN and MRA-BGCN), the ST-CRMF model performs better thanks to its spatial-temporal regularizers and residual modeling. Hence, we conclude that our ST-CRMF can handle complex traffic data prediction across short- to long-term forecasting tasks.

4.5. Ablation Study

Similar to [2,5,27], we conducted a comprehensive ablation study on the Seattle-Loop and METR-LA datasets to investigate the effectiveness of the key components that contribute to the improved results of the ST-CRMF model. We named the variants of ST-CRMF without individual components as follows: (1) ST-CRMF w/o GL: ST-CRMF without the graph Laplacian spatial regularizer; (2) ST-CRMF w/o GRN: ST-CRMF without the GRN temporal regularizer; (3) ST-CRMF w/o RL: ST-CRMF without the residual learning. The baseline for the ablation study was the full ST-CRMF model. Figure 9 shows the metrics of the ablation experiments for 15- and 30-min traffic forecasting tasks. We can conclude that: (1) compared with the full ST-CRMF model, all of the simplified variants suffer performance degradation; (2) both the spatial and the temporal regularization of MF improve its forecasting accuracy, but the latter is more effective; (3) the effect of bi-directional residual modeling is the most evident, as it extracts additional spatial-temporal information from the residuals. All in all, these findings confirm that each component of the ST-CRMF model is indispensable for traffic time series forecasting.

4.6. Model Robustness Analysis

As discussed in [5,19,36,42], real-world traffic time series often suffer from data quality problems. We therefore investigated how our ST-CRMF performed under data loss, including its forecasting and recovery effects. We conducted several experiments on combinations of the RM, CM, and HM scenarios and 10–90% missing rates. Due to space constraints, we only present the experimental results of 5-min traffic prediction on the Seattle-Loop dataset. Note that we focus on the effect of data loss on the predictions of the ST-CRMF model rather than on a comparison with repair models. Figure 10 shows the metrics of the ST-CRMF model for the repair and forecasting tasks in the three scenarios.

In terms of prediction, the error metrics of the ST-CRMF increased under missing data relative to the non-missing case. More precisely, as the missing rate increased, the forecasting accuracy of our ST-CRMF model remained high but declined at an accelerating rate. For extreme missing rates exceeding 80%, the performance of the ST-CRMF model deteriorated so sharply that the forecasting tasks almost failed. Meanwhile, the scenarios influenced the ST-CRMF differently. Among the three scenarios, the MAPE, RMSE, and MAE of the ST-CRMF model varied most markedly under the CM mode because it corrupts the data structure most severely. In contrast, the RM mode impaired the performance of the ST-CRMF model the least, as expected. Consequently, the continuous loss of a large portion of a time series should be avoided during the acquisition phase. Besides, Figure 10 shows that the repair results of the ST-CRMF model followed the same pattern as its forecasting results across the combinations of missing rates and scenarios. Therefore, we can conclude that our ST-CRMF delivers excellent prediction and repair results (except in extreme cases) under complex missing conditions.
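
For readers who want to reproduce this kind of robustness test, the sketch below generates observation masks for two of the scenarios. The exact definitions used in the paper follow [5] and are not restated in this section, so the code rests on our own assumptions: RM drops each entry independently, and CM drops whole contiguous blocks (here, hypothetically, one day of 288 steps) per sensor. The function names and the block length are illustrative.

```python
import numpy as np

def random_missing(shape, rate, rng):
    """RM (assumed definition): each entry is dropped independently.

    Returns a boolean mask where True means "observed".
    """
    return rng.random(shape) >= rate

def cluster_missing(shape, rate, block, rng):
    """CM (assumed definition): contiguous blocks of `block` steps are dropped
    per sensor, so the missing values form runs in time."""
    n_sensors, n_steps = shape
    n_blocks = -(-n_steps // block)                 # ceiling division
    keep = rng.random((n_sensors, n_blocks)) >= rate
    return np.repeat(keep, block, axis=1)[:, :n_steps]

rng = np.random.default_rng(0)
rm_mask = random_missing((200, 2880), rate=0.3, rng=rng)
cm_mask = cluster_missing((200, 2880), rate=0.3, block=288, rng=rng)
```

Both masks have roughly the requested missing rate, but the CM mask removes long consecutive runs for a sensor, which is the kind of structural corruption the paragraph above identifies as the most harmful.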

4.7. Prediction Visualization

We visualized the predictions of the ST-CRMF model to further validate its effectiveness. Figure 11 shows heat maps of the ground-truth and predicted values of ST-CRMF for 15-min forecasting. Both datasets (S) and (M) exhibit obvious periodic variations in Figure 11, and our ST-CRMF model captured this periodicity well. Meanwhile, the residuals were very small overall, which also indirectly illustrates the superiority of the ST-CRMF model. In conclusion, both the above quantitative and visualization results indicate that our ST-CRMF model can effectively and accurately complete the tasks of forecasting and recovery of traffic time series.

5. Conclusions

The challenges of accurately predicting urban traffic time series arise from the extraction of non-linear spatial-temporal correlations. In this study, we proposed a novel matrix factorization framework (i.e., ST-CRMF) that integrates spatial-temporal regularizers and compensatory residual learning to forecast future and impute historical traffic time series. Overall, MF forms the core of our framework, which provides a research paradigm for subsequent complex spatial-temporal modeling. We selected GRN as the temporal regularizer of MF to capture temporal correlations in time series, and the graph Laplacian served as the spatial regularizer to exploit the local spatial dependence information of adjacent sensors and strengthen the factorization process of MF. Further, we designed a bi-directional residual structure for compensatory modeling of the regularized MF by way of feedback optimization, which substantially reinforced the predictive performance of our ST-CRMF model.

The ST-CRMF shared and updated all the training parameters during rolling forecasting until it reached the predetermined number of iterations. For each experiment, we performed model inference with an ALS method to estimate the spatial-temporal feature matrices of MF. We tested the effect of key parameters on the ST-CRMF through extensive experiments to identify their optimal values: for datasets (S) and (M), the hidden layers of GRN were set to 16 and 8, respectively, and the matrix rank and the number of iterations were set to 40 and 50, respectively, as a balance between accuracy and cost. Lastly, we evaluated ST-CRMF on two publicly available traffic speed datasets, Seattle-Loop and METR-LA. Experimental results indicated that our ST-CRMF outperformed other state-of-the-art baseline models (e.g., PGCN, GATs-GAN and DGCRN) under almost all evaluation metrics in the short- to long-term (5-/15-/30-/60-min) traffic forecasting tasks, and the ablation experiments further confirmed the validity of the key components of the ST-CRMF. Besides, our ST-CRMF can make predictions from an incomplete dataset while accomplishing data recovery. Except for extreme cases, our ST-CRMF model achieved highly precise imputation and prediction under combinations of the RM, CM, and HM scenarios and 10–90% missing rates.

In future studies, we will concentrate on improving the predictive efficiency through optimizing model parameters and increasing the forecasting accuracy by considering other factors (e.g., weather). Furthermore, we plan to test the proposed ST-CRMF model on more spatiotemporal traffic datasets.

Author Contributions

Conceptualization, J.L., X.L., P.W.; methodology, J.L., X.L., R.L., P.W.; software, J.L., Y.P.; resources, Z.H., L.X.; writing—original draft preparation, J.L.; writing—review and editing, J.L., X.L., R.L.; visualization, J.L., R.L., Y.P.; supervision, X.L., L.X.; project administration, Z.H., L.X.; funding acquisition, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 11702099) and the Science and Technology Project in Guangzhou (No. 202102021053).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors thank all members of the research group for their technical support during the research activities. Additionally, the authors thank the anonymous referees for their valuable comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Wu, Z.H.; Pan, S.R.; Long, G.D.; Jiang, J.; Chang, X.J.; Zhang, C.G. Connecting the dots: Multivariate time series forecasting with graph neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, California, CA, USA, 6–10 July 2020; pp. 753–763.
2. Yu, B.; Yin, H.T.; Zhu, Z.X. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv 2017, arXiv:1709.04875.
3. Bai, L.; Yao, L.; Li, C.; Wang, X.Z.; Wang, C. Adaptive graph convolutional recurrent network for traffic forecasting. Adv. Neural Inf. Process. Syst. 2020, 33, 17804–17815.
4. Wu, Z.H.; Pan, S.R.; Long, G.D.; Jiang, J.; Zhang, C.Q. Graph wavenet for deep spatial-temporal graph modeling. arXiv 2019, arXiv:1906.00121.
5. Li, J.L.; Xu, L.H.; Li, R.N.; Wu, P.; Huang, Z.L. Deep spatial-temporal bi-directional residual optimisation based on tensor decomposition for traffic data imputation on urban road network. Appl. Intell. 2022, 52, 11363–11381.
6. Yin, X.Y.; Wu, G.Z.; Wei, J.Z.; Shen, Y.M.; Qi, H.; Yin, B.C. Deep learning on traffic prediction: Methods, analysis and future directions. IEEE Trans. Intell. Transp. Syst. 2021, 23, 4927–4943.
7. Zhao, L.; Song, Y.J.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-gcn: A temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3848–3858.
8. Alsolami, B.; Mehmood, R.; Albeshri, A. Hybrid statistical and machine learning methods for road traffic prediction: A review and tutorial. In Smart Infrastructure and Applications; Springer: Cham, Switzerland, 2020; pp. 115–133.
9. Ye, J.X.; Zhao, J.J.; Ye, K.J.; Xu, C.Z. How to build a graph-based deep learning architecture in traffic domain: A survey. IEEE Trans. Intell. Transp. Syst. 2020, 23, 3904–3924.
10. Cui, Z.Y.; Ke, R.M.; Pu, Z.Y.; Wang, Y.H. Stacked bidirectional and unidirectional LSTM recurrent neural network for forecasting network-wide traffic state with missing values. Transp. Res. Part C Emerg. Technol. 2020, 118, 102674.
11. Gao, C.X.; Zhang, N.; Li, Y.R.; Bian, F.; Wan, H.Y. Self-attention-based time-variant neural networks for multi-step time series forecasting. Neural Comput. Appl. 2022, 34, 8737–8754.
12. Wang, H.W.; Peng, Z.R.; Wang, D.S.; Meng, Y.; Wu, T.L.; Sun, W.L.; Lu, Q.C. Evaluation and prediction of transportation resilience under extreme weather events: A diffusion graph convolutional approach. Transp. Res. Part C Emerg. Technol. 2020, 115, 102619.
13. Zhang, S.; Guo, Y.; Zhao, P.; Zheng, C.; Chen, X. A Graph-Based Temporal Attention Framework for Multi-Sensor Traffic Flow Forecasting. IEEE Trans. Intell. Transp. Syst. 2022, 23, 7743–7758.
14. Pan, Z.Y.; Liang, Y.X.; Wang, W.F.; Yu, Y.; Zheng, Y.; Zhang, J.B. Urban traffic prediction from spatio-temporal data using deep meta learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 1720–1730.
15. Zhang, J.B.; Zheng, Y.; Qi, D.K. Deep spatio-temporal residual networks for citywide crowd flows prediction. In Proceedings of the Thirty-first AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 1655–1661.
16. Chen, F.L.; Chen, Z.Q.; Biswas, S.; Lei, S.; Ramakrishnan, N.; Lu, C.T. Graph convolutional networks with kalman filtering for traffic prediction. In Proceedings of the 28th International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, 3–6 November 2020; pp. 135–138.
17. Chen, X.Y.; Sun, L.J. Bayesian temporal factorization for multidimensional time series prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2021.
18. Rao, N.; Yu, H.F.; Ravikumar, P.K.; Dhillon, I.S. Collaborative filtering with graph information: Consistency and scalable methods. Adv. Neural Inf. Process. Syst. 2015, 28, 2099–2107.
19. Chen, X.Y.; He, Z.C.; Sun, L.J. A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transp. Res. Part C Emerg. Technol. 2019, 98, 73–84.
20. Laña, I.; Olabarrieta, I.I.; Vélez, M.; Del Ser, J. On the imputation of missing data for road traffic forecasting: New insights and novel techniques. Transp. Res. Part C Emerg. Technol. 2018, 90, 18–33.
21. Ahmed, M.S.; Cook, A.R. Analysis of Freeway Traffic Time-Series Data by Using Box-Jenkins Techniques; Transportation Research Board: Washington, DC, USA, 1979; pp. 1–9.
22. Williams, B.M.; Hoel, L.A. Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results. J. Transp. Eng. 2003, 129, 664–672.
23. Yu, H.F.; Rao, N.; Dhillon, I.S. Temporal regularized matrix factorization for high-dimensional time series prediction. Adv. Neural Inf. Process. Syst. 2016, 29, 847–855.
24. Zhang, L.Z.; Alharbe, N.R.; Luo, G.C.; Yao, Z.Y.; Li, Y. A hybrid forecasting framework based on support vector regression with a modified genetic algorithm and a random forest for traffic flow prediction. Tsinghua Sci. Technol. 2018, 23, 479–492.
25. Li, C.; Xu, P. Application on traffic flow prediction of machine learning in intelligent transportation. Neural Comput. Appl. 2021, 33, 613–624.
26. Shin, Y.; Yoon, Y.J. PGCN: Progressive Graph Convolutional Networks for Spatial-Temporal Traffic Forecasting. arXiv 2022, arXiv:2202.08982.
27. Kong, X.Y.; Zhang, J.; Wei, X.; Xing, W.W.; Lu, W. Adaptive spatial-temporal graph attention networks for traffic flow forecasting. Appl. Intell. 2022, 52, 4300–4316.
28. Zhang, K.L.; Xie, C.Y.; Wang, Y.J.; Ángel, S.M.; Nguyen, T.M.T.; Zhao, Q.D.; Li, Q. Hybrid short-term traffic forecasting architecture and mechanisms for reservation-based Cooperative ITS. J. Syst. Archit. 2021, 117, 102101.
29. Lv, Y.S.; Duan, Y.J.; Kang, W.W.; Li, Z.X.; Wang, F.Y. Traffic flow prediction with big data: A deep learning approach. IEEE Trans. Intell. Transp. Syst. 2014, 16, 865–873.
30. Ma, X.L.; Zhong, H.Y.; Li, Y.; Ma, J.Y.; Cui, Z.Y.; Wang, Y.H. Forecasting transportation network speed using deep capsule networks with nested LSTM models. IEEE Trans. Intell. Transp. Syst. 2020, 22, 4813–4824.
31. Gu, Y.L.; Lu, W.Q.; Qin, L.Q.; Li, M.; Shao, Z.Z. Short-term prediction of lane-level traffic speeds: A fusion deep learning model. Transp. Res. Part C Emerg. Technol. 2019, 106, 1–16.
32. Fang, S.; Pan, X.B.; Xiang, S.M.; Pan, C.H. Meta-msnet: Meta-learning based multi-source data fusion for traffic flow prediction. IEEE Signal Process. Lett. 2021, 28, 6–10.
33. Li, F.X.; Feng, J.; Yan, H.; Jin, G.Y.; Yang, F.; Sun, F.N.; Jin, D.P.; Li, Y. Dynamic graph convolutional recurrent network for traffic prediction: Benchmark and solution. ACM Trans. Knowl. Discov. Data 2021, 1156–4681.
34. Zheng, C.P.; Fan, X.L.; Wang, C.; Qi, J.Z. Gman: A graph multi-attention network for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 1234–1241.
35. Ren, P.; Chen, X.Y.; Sun, L.J.; Sun, H. Incremental Bayesian matrix/tensor learning for structural monitoring data imputation and response forecasting. Mech. Syst. Signal Process. 2021, 158, 107734.
36. Chen, X.Y.; Lei, M.Y.; Saunier, N.; Sun, L.J. Low-rank autoregressive tensor completion for spatiotemporal traffic data imputation. IEEE Trans. Intell. Transp. Syst. 2021, 1–10.
37. Xu, D.W.; Lin, Z.Q.; Zhou, L.; Li, H.J.; Niu, B. A GATs-GAN framework for road traffic states forecasting. Transp. B Transp. Dyn. 2022, 10, 718–730.
38. Chen, W.Q.; Chen, L.; Xie, Y.; Cao, W.; Gao, Y.S.; Feng, X.J. Multi-range attentive bicomponent graph convolutional network for traffic forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 3529–3536.
39. Li, J.L.; Sun, L.J.; Li, Y.S.; Lu, Y.C.; Pan, X.Y.; Zhang, X.L.; Song, Z.W. Rapid prediction of acid detergent fiber content in corn stover based on NIR-spectroscopy technology. Optik 2019, 180, 34–45.
40. Li, J.L.; Sun, L.J.; Li, R.N. Nondestructive detection of frying times for soybean oil by NIR-spectroscopy technology with Adaboost-SVM (RBF). Optik 2020, 206, 164248.
41. Huang, Z.L.; Xu, L.H.; Lin, Y.J. Multi-stage pedestrian positioning using filtered WiFi scanner data in an urban road environment. Sensors 2020, 20, 3259.
42. Wu, P.; Huang, Z.L.; Pian, Y.Z.; Xu, L.H.; Li, J.L.; Chen, K.X. A combined deep learning method with attention-based LSTM model for short-term traffic speed forecasting. J. Adv. Transp. 2020, 2020, 8863724.

Figure 1. Illustration of multivariate time series forecasting (each 𝔁_{t} is a frame of traffic data at time step t).

Figure 2. Schematic of the spatial-temporal matrix factorization (white boxes denote missing data).

Figure 3. The structure of a GRN that consists of multiple GRU units.

Figure 4. Architecture description and flow of the ST-CRMF model.

Figure 5. The area maps of the Seattle-Loop and METR-LA traffic speed datasets.

Figure 6. Impact of the hidden layers of GRN on the ST-CRMF model.

Figure 7. Impact of ranks on the evaluation metrics of the ST-CRMF model.

Figure 8. The number of iterations of the ST-CRMF model on the Seattle-Loop and METR-LA datasets.

Figure 9. Ablation experiments of the ST-CRMF model on the Seattle-Loop and METR-LA datasets.

Figure 10. Imputation and prediction results of ST-CRMF model on Seattle-Loop dataset for RM, CM, and HM scenarios and 10–90% missing rates. Blue and green denote the predictive indicators of the non-missing and missing cases, respectively; red denotes the recovery metrics after missing.

Figure 11. Visualization results of the predicted maps by the ST-CRMF model for 15-min forecasting interval.

Table 1. Summary statistics of the Seattle-Loop and METR-LA traffic speed datasets.

Information	Seattle-Loop (S)	METR-LA (M)
Location	The Greater Seattle Area	Los Angeles County
No. of sensors	323	207
Time scope	1 January 2015 ~ 31 December 2015	1 March 2012 ~ 30 June 2012
Time granularity	5 min	5 min
Period step	288 (60/5 × 24)	288 (60/5 × 24)
Timestamps	105120	34272
Sources (https://github.com)	/zhiyongc/Seattle-Loop-Data (accessed on 2 July 2018)	/liyaguang/DCRNN (accessed on 2 October 2018)

Table 2. Prediction results of the ST-CRMF and baseline models on the Seattle-Loop dataset.

Dataset (S)	5-min			15-min			30-min
Models	MAPE (%)	RMSE	MAE	MAPE (%)	RMSE	MAE	MAPE (%)	RMSE	MAE
GRN [12]	8.74	4.92	3.24	9.67	5.39	3.48	10.23	5.76	3.63
LSTM [10]	8.17	4.70	3.09	8.88	5.15	3.28	9.80	5.67	3.55
T-GCN [7]	6.74	4.65	3.02	8.52	5.12	3.18	10.80	6.06	3.74
GMAN [34]	/	/	/	8.15	4.86	2.97	9.97	5.71	3.34
PGCN [26]	/	/	/	7.56	4.80	2.85	9.46	5.80	3.28
GATs-GAN [37]	6.38	3.85	2.65	7.63	4.56	2.97	8.89	5.19	3.46
Graph-WaveNet [4]	/	/	/	8.35	5.11	3.10	10.83	6.37	3.68
ST-CRMF (Ours)	6.81	4.02	2.59	7.39	4.45	2.80	8.63	5.04	3.25

Best results are highlighted in bold fonts and “/” stands for not available.

Table 3. Prediction results of the ST-CRMF and baseline models on the METR-LA dataset.

Dataset (M)	15-min			30-min			60-min
Models	MAPE (%)	RMSE	MAE	MAPE (%)	RMSE	MAE	MAPE (%)	RMSE	MAE
ARIMA [21]	9.60	8.21	3.99	12.70	10.45	5.15	17.40	13.23	6.90
STGCN [2]	7.62	5.74	2.88	9.57	7.24	3.47	12.70	9.40	4.59
AGCRN [3]	7.70	5.58	2.87	9.00	6.58	3.23	10.38	7.51	3.62
GMAN [34]	7.41	5.55	2.80	8.73	6.49	3.12	10.07	7.35	3.44
DGCRN [33]	6.63	5.01	2.62	8.02	6.05	2.99	9.73	7.19	3.44
MRA-BGCN [38]	6.80	5.12	2.67	8.30	6.17	3.06	10.00	7.30	3.49
Graph-WaveNet [4]	6.90	5.15	2.69	8.37	6.22	3.07	10.01	7.37	3.53
ST-CRMF (Ours)	6.43	5.05	2.52	7.87	5.94	2.85	9.65	7.10	3.38

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

