Observing and tracking bandlimited graph processes from sampled measurements

A critical challenge in graph signal processing is the sampling of bandlimited graph signals, i.e., signals that are sparse in a well-defined graph Fourier domain. Current works have focused on sampling time-invariant graph signals and have ignored their temporal evolution. However, time can bring new insights on sampling, since sensor, biological, and financial network signals are correlated in both domains. Hence, in this work, we develop a sampling theory for time varying graph signals, named graph processes, to observe and track a process described by a linear state-space model. We provide a mathematical analysis to highlight the role of the graph, the process bandwidth, and the sample locations. We also propose sampling strategies that exploit the coupling between the topology and the corresponding process. Numerical experiments corroborate our theory and show that the proposed methods trade off the number of samples well with accuracy.


Introduction
Graph signal processing (GSP) provides new tools to analyze signals on networks, such as measurements in sensor networks, fMRI recordings in brain networks, and preferences in opinion networks [1]. By introducing the notion of graph frequency, GSP allows for a harmonic decomposition of these signals [1,2]. Recent contributions have therefore developed concepts like graph signal filtering [2,3,4,5,6] and sampling [7,8,9,10,11] to support this analysis. Sampling theory for graph signals assumes the latter are sparse (or bandlimited) in the graph frequency domain. This assumption is satisfied by temperature measurements in sensor networks [8], fMRI data in brain networks [12], and user ratings in recommendation networks [13], to name a few. The works in [14,15,8,10,11] exploit signal bandlimitedness to propose sampling and reconstruction strategies for graph signals.
While seminal, the above works ignore the signal temporal evolution and focus on a single snapshot. In our opinion, this is limited because time varying graph signals, referred to as graph processes, are often encountered in practice, e.g., in consecutive sensor measurements, in biological signals evolving under stimuli, and in information diffusion over networks. We wish, therefore, to propose a graph and time sampling framework to observe and track a bandlimited graph process, i.e., a time varying graph signal that has contributions only on a fixed subset of graph frequencies. Before detailing our contributions, we highlight the differences with earlier works that dealt with graph processes.

Related works
The importance of approaching graph processes with GSP was recognized in [16], which formalized a graph-time harmonic analysis. The works in [17,18] discuss two-dimensional graph-time filters, while [19,20] use these filters to predict the process evolution. The work in [21] considered sampling strategies for adaptive reconstruction of a bandlimited graph signal from a stream of noisy samples. We build on the latter and propose a more involved sampling theory to observe and track a graph process that follows a linear model.
Observability of network processes was considered in [22,23] for sensor placement and in [24] for designing observable topologies. However, these works do not discuss sampling. We here do so, where by exploiting the bandlimited prior, we derive theoretical guarantees for when a graph process is observable from a few samples. We also propose sampling schemes to collect the measurements over the graph and the temporal dimension.
Other works that exploited the bandlimited prior for graph processes are [21,25,26,27]. These works propose adaptive algorithms to track slowly time varying graph signals from a few nodes. But since their main goal is adaptive signal reconstruction, tracking arises only as a byproduct, without theoretical guarantees. We instead leverage Kalman filtering and propose sampling strategies to optimally track the process. Our motivation to use the Kalman filter comes from its proven success in applications like network dynamics, data fusion, and target tracking; see, e.g., [28] and references therein. Kalman filtering for tracking graph processes has also been used in [29]. That work formulated tracking from a few measurements as a regularized interpolation problem on a larger graph; the Kalman filter is used there to alleviate the computational burden. We identify three main differences with [29]: i) we derive conditions on the minimum number of samples required to track the process; ii) we propose sampling schemes optimized to improve tracking, rather than sampling uniformly at random; iii) we use the Kalman filter to match the process model and avoid cross-validating the regularizer.
Kalman filtering has been used to track network dynamics in [30,31]. In these works, however, the graph topology encodes the communication links between sensors and is decoupled from the process evolving over it. To the best of our knowledge, this work represents the first attempt to reconcile tracking with graph sampling theory.

Contribution
We group the paper contributions into two parts. i) Observability of graph processes (Section 4): We develop a sampling theory to observe graph processes. We derive necessary and sufficient conditions to observe the process from a few samples. We propose two approaches for observability: i-a) observability with deterministic sampling, i.e., when the nodes are sampled deterministically (Section 4.1); i-b) observability with random sampling, i.e., when the nodes are sampled with a given probability (Section 4.2). We also perform a mean-square error (MSE) analysis of the estimated state to connect the graph topology with the process bandwidth and sampling set. Finally, we develop sparse sensing techniques to pick the minimum number of samples that guarantee a target MSE performance.
ii) Tracking of graph processes (Section 5): We propose Kalman filtering to track bandlimited graph processes that follow a linear dynamic model. First, we consider time varying models (Section 5.1) and then we specialize our derivations to time-invariant ones (Section 5.2). We derive necessary conditions on the minimum number of nodes required to track the process and reconcile these conditions with those for observability. The MSE tracking performance, given by the posterior (or steady-state) error covariance matrix, highlights the role played by the graph topology, process bandwidth, and sampling set. A sparse sensing approach is developed to collect samples with guaranteed tracking MSE.
These theoretical findings focus, for the first time, on the sampling of time varying graph signals; they are corroborated with synthetic experiments and further validated on the ETEX data set [32].
Notation. We denote scalars, column vectors, and matrices with plain letters a (A), bold lowercase letters a, and bold uppercase letters A, respectively. A_ij is the (i,j)th element of A, I_N is the N × N identity matrix, and 1_N (0_N) is the N × 1 vector of all ones (zeros). diag(·) denotes the diagonal operator, i.e., A = diag(a) is a diagonal matrix with a on the main diagonal and a = diag(A) stores the diagonal of A in a. Similarly, A = blkdiag(A_1, ..., A_N) is a block diagonal matrix containing A_i as the ith diagonal block. The pseudoinverse of A is denoted A†, its trace Tr(A), and its spectral norm ‖A‖. For two sets R and S, |R| denotes the cardinality of R, R ⊂ S the subset relation, R ∪ S the union of the sets, and S^c = R\S the complement of S w.r.t. R. The vector 1_R is the set indicator vector, whose rth entry equals one if r ∈ R and zero otherwise. ⌈·⌉ indicates the ceiling operator.
The remainder of this paper is organized as follows. Section 2 covers the background material, while Section 3 presents the problem formulation. Section 6 contains the numerical experiments and Section 7 concludes the paper. The proofs are collected in the appendix.

Background
Basics of GSP. Consider an undirected graph G = (V, E) with node set V = {1, ..., N} and edge set E. The nodes' connectivity is represented by the weighted adjacency matrix W, where W_nm > 0 is the weight of the edge connecting nodes n and m and W_nm = 0 means the nodes are disconnected. The graph Laplacian is L = diag(W 1_N) − W. Both W and L are candidates for the graph shift operator matrix S, an N × N matrix that allows for a frequency analysis of graph signals [1,2,33]. Due to its symmetry, S can be eigendecomposed as S = U Λ U^T with eigenvector matrix U and diagonal eigenvalue matrix Λ. This eigendecomposition carries the notion of frequency to the graph setting. The projection of x onto the eigenspace of S, i.e., x̃ = U^T x, is named the graph Fourier transform (GFT), and its inverse is x = U x̃. The main diagonal of Λ contains the eigenvalues of S, named the graph frequencies, which constitute the spectral support of x.
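To make the GFT concrete, the following sketch (our own toy example, not from the paper; the graph, weights, and signal are arbitrary) builds a small Laplacian, eigendecomposes it, and verifies that the inverse GFT recovers the signal:

```python
import numpy as np

np.random.seed(0)
N = 6

# Random symmetric weighted adjacency with zero diagonal.
W = np.triu(np.random.rand(N, N), 1)
W = W + W.T

# Combinatorial graph Laplacian L = diag(W 1_N) - W, used as shift operator S.
L = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(L)        # S = U diag(lam) U^T; lam are the graph frequencies

x = np.random.randn(N)            # a graph signal
x_tilde = U.T @ x                 # graph Fourier transform (GFT)
x_back = U @ x_tilde              # inverse GFT

print(np.allclose(x, x_back))     # U is orthonormal, so recovery is exact
```

Since L is symmetric, `eigh` returns an orthonormal U, so the forward and inverse GFT compose to the identity.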
Signal reconstruction. Reconstructing a graph signal x from its sampled version relates to the joint localization properties of x in the vertex and graph frequency domains [11]. Consider a subset of vertices S ⊆ V and the respective set projection matrix C_S = diag(1_S). The graph signal x is localized over the set S if C_S x = x. Similarly, consider a subset of graph frequency indices F ⊆ {1, ..., N} and define U_F ∈ R^{N×|F|} as the matrix containing the columns of U relative to the set F. The matrix C_F = U_F U_F^T is a bandlimiting operator, and the graph signal x is localized over F, or F−bandlimited, if C_F x = x. The latter, coupled with localization over S, implies C_S C_F x = x, which in turn means x is an eigenvector of C_S C_F (or, equivalently, of C_S U_F) associated with the unit eigenvalue. The unit eigenvalue is by construction the largest admissible eigenvalue of C_S C_F, and x is the respective eigenvector if ‖C_S U_F‖ = 1. Then, an F−bandlimited graph signal can be recovered from the samples in S iff

‖C_{S^c} U_F‖ < 1,    (1)

where C_{S^c} = I_N − C_S is the projection matrix onto the complementary vertex set S^c = V\S [11]. That is, an F−bandlimited graph signal can be recovered from the samples in S if it is not perfectly localized over the complementary vertex set. Indeed, if ‖C_{S^c} U_F‖ = 1 were true, there would exist a nontrivial F−bandlimited graph signal localized on the complementary set S^c. This signal could not be recovered from the samples in S, which are null due to the perfect localization over the complementary sampling set; this makes condition (1) necessary and sufficient for perfect recovery; refer to [11] for further detail.
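Condition (1) can be checked numerically. In this sketch (our own illustration; the shift operator, frequency set, and sampling set are arbitrary choices) we build an F−bandlimited signal, evaluate ‖C_{S^c} U_F‖ for a candidate sampling set, and recover the signal from its samples by least squares:

```python
import numpy as np

np.random.seed(1)
N, F_dim = 8, 3

# Orthonormal GFT basis from a random symmetric shift operator (toy example).
S_op = np.random.rand(N, N)
S_op = S_op + S_op.T
_, U = np.linalg.eigh(S_op)
U_F = U[:, :F_dim]                      # columns of U indexed by F

x = U_F @ np.random.randn(F_dim)        # an F-bandlimited graph signal

S_set = [0, 2, 5, 7]                    # candidate sampling set, |S| >= |F|
S_comp = [n for n in range(N) if n not in S_set]

# Condition (1): spectral norm of C_{S^c} U_F strictly below one.
norm_cond = np.linalg.norm(U_F[S_comp, :], 2)

# Least squares recovery of the GFT coefficients from the samples in S.
x_rec = U_F @ (np.linalg.pinv(U_F[S_set, :]) @ x[S_set])
print(round(norm_cond, 4), np.allclose(x, x_rec))
```

When the norm is strictly below one, the submatrix U_F[S, :] has full column rank and the pseudoinverse recovers the |F| GFT coefficients exactly in the noiseless case.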

Problem formulation
A graph process x_t is a time varying graph signal in which entry x_{tn} is a time series evolving on node n. In the sequel, we first formulate the graph process as a linear system on a graph. Then, we define a bandlimited graph process and analyze systems on graphs in the graph Fourier domain.

Systems on graphs
Consider the N−state discrete linear time varying system

x_t = A_t x_{t−1} + B_t u_{t−1},    (2a)
y_t = C_{S_t}(x_t + v_t),    (2b)

where x_t is the N × 1 state representing the graph signal at time t, u_t is the N × 1 input signal, and A_t and B_t are the N × N time varying state-transition and input matrices, respectively. The measurement is y_t ∈ R^N and C_{S_t} = diag(c_{t1}, ..., c_{tN}) denotes the sampling matrix with entry c_{tn} = 1 if node n is sampled at time t and c_{tn} = 0 otherwise. The set of nodes sampled at time t is S_t = {(n, t) | n ∈ {1, ..., N}, c_{tn} = 1}. Vector v_t denotes zero-mean Gaussian noise with covariance matrix Σ_v = σ_v² I_N. Throughout this work, we consider the graph and the system matrices A_t and B_t given. Readers interested in estimating the graph and the system matrices in (2) from data are directed to [34,35,36] and references therein. Model (2) comprises the following processes.
Signal diffusion. For x_0 being the initial graph signal state, its diffused state at time t is given by the matrix exponential product

x_t = A x_{t−1} = A^t x_0,    (3)

where w > 0 is the diffusion rate and A = e^{−wL} is the time-invariant state-transition matrix [37]. This recursion is also known as the heat diffusion model. If the graph is the mesh of a surface, the initial state x_0 may be a heat source starting at some node and x_t the heat state at time t. We can also incorporate an input signal u_{t−1} in (3) to represent additional heat sources available only at time t−1 > 0. Besides modeling heat diffusion on meshes, model (3) is also used to analyze chemical substance dispersion, opinion and influence propagation [38,39], brain signal connections [40], and protein interactions [41].
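As an illustration (our own sketch; the graph and diffusion rate are arbitrary), the heat recursion x_t = A^t x_0 with A = e^{−wL} can be simulated through the eigendecomposition of L; on a connected graph the state converges to the average of x_0, since every non-zero graph frequency is damped:

```python
import numpy as np

np.random.seed(2)
N, w = 6, 0.5

# Connected toy graph (all off-diagonal weights positive).
W = np.triu(np.random.rand(N, N), 1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(L)

A = U @ np.diag(np.exp(-w * lam)) @ U.T   # state-transition matrix e^{-wL}

x = np.random.randn(N)
x0_mean = x.mean()
for _ in range(300):                      # diffuse: x_t = A x_{t-1}
    x = A @ x

# Modes with lam > 0 decay as exp(-w*lam*t); only the constant mode survives.
print(np.allclose(x, x0_mean * np.ones(N), atol=1e-6))
```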
Wave propagation. For an initial state z_0 and propagation speed c, the discretized wave equation on graphs follows the two-step recursion

z_t = (2 I_N − c² L) z_{t−1} − z_{t−2}.    (4)

This recursion can be formulated in form (2a) by defining the stacked state vector x_t = [z_t^T, z_{t−1}^T]^T with z_{−1} = 0_N [42]. The latter is of interest for seismic data [16].

ARMA graph processes. We denote an ARMA graph process as

x_t = f(S) x_{t−1} + g(S) u_{t−1},    (5)

where f(S) and g(S) are matrix functions that share the eigenvectors with the shift operator, such as polynomials in S [19,43,20]. A particular form of (5) is the first-order recursion

x_t = wS x_{t−1} + u_{t−1},    (6)

which, for a constant input u, has the steady state

x_∞ = (I_N − wS)^{−1} u    (7)

if the parameter w satisfies 0 < w < 1/λ_max(S). Expression (7) solves the aggregate diffusion model used in image smoothing [44], Tikhonov denoising [1], and rating prediction [45].
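For intuition, the following sketch (our own example) iterates a first-order recursion of the form x_t = wS x_{t−1} + u with constant input; when 0 < w < 1/λ_max(S) and S is positive semidefinite, the spectral radius of wS is below one, so the iteration converges geometrically to (I_N − wS)^{−1} u:

```python
import numpy as np

np.random.seed(3)
N = 6
W = np.triu(np.random.rand(N, N), 1)
W = W + W.T
S = np.diag(W.sum(axis=1)) - W            # shift operator (Laplacian, PSD)

lam_max = np.linalg.eigvalsh(S).max()
w = 0.9 / lam_max                          # 0 < w < 1/lam_max -> spectral radius 0.9

u = np.random.randn(N)                     # constant input
x = np.zeros(N)
for _ in range(600):                       # x_t = w S x_{t-1} + u
    x = w * (S @ x) + u

x_inf = np.linalg.solve(np.eye(N) - w * S, u)   # closed-form steady state
print(np.allclose(x, x_inf))
```

The residual shrinks by a factor of at most 0.9 per step, so 600 iterations leave it at numerical noise level.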

Bandlimited systems on graphs
To proceed with the graph Fourier analysis of (2), we define the following.
Definition 1. A graph process x_t is F−bandlimited if, for all t ≥ 0, its GFT x̃_t = U^T x_t has non-zero frequency content only on the subset of graph frequency indices F.
The set F = {n ∈ {1, ..., N} | ∃ t for which x̃_{tn} ≠ 0} is common to all x_t, t ≥ 0. In other words, F is the union of all instantaneous sets F_t = {n ∈ {1, ..., N} | x̃_{tn} ≠ 0}. We can then write the graph signal at time t as x_t = U_F x̃_t, where (with a slight abuse of notation) the vector x̃_t ∈ R^{|F|} contains the entries of the GFT indexed by F.
We pose two assumptions to obtain an F−bandlimited graph process.
Assumption 1. The system evolution matrices A t and B t are diagonalizable by the eigenvectors of the graph shift operator S.
Assumption 2. The input u t is an F−bandlimited graph process.
Assumption 1 considers linear time varying systems on time-invariant graphs. It restricts attention to graph processes whose system evolution matrices share the eigenvectors with the shift operator S, which is the case for the processes in (3)-(7). Assumption 2 requires the input signal to have a sparse GFT over time. That is, u_t should be a smooth graph signal, or have properties similar to the signals studied in [12,46,47,14,15,8,10,11,21,25,26,27,19,20,48,49]. While in practice it is difficult to find a perfectly bandlimited graph process, this assumption is commonly made in GSP to derive theoretical results. It is, however, possible to find approximately bandlimited graph processes in practice; if we treat them as bandlimited, the theoretical findings remain close to the empirical ones. In our numerical results, we never impose exact bandlimitedness.
With the above assumptions, we want to link the graph process, its GFT, and the sampling strategy. If one of the two assumptions is violated, similar sampling strategies can be derived by ignoring the limited frequency support and considering the process as N−bandlimited. Since the latter is a particular case of our framework, and since it has a looser connection with the graph spectrum, we shall not discuss it further. We will nevertheless evaluate the full bandwidth case in Section 6.
We can now claim the following.

Proposition 1. Under Assumptions 1 and 2, the graph process x_t following model (2a) is F−bandlimited.

Proposition 1 implies that the in-band process evolution x̃_t is an equivalent representation for x_t. By denoting the GFT coefficients of u_{t−1} indexed by F with ũ_{t−1} ∈ R^{|F|}, we can write the evolution of x̃_t as

x̃_t = Ã_t x̃_{t−1} + B̃_t ũ_{t−1},    (8a)

where Ã_t = U_F^T A_t U_F and B̃_t = U_F^T B_t U_F are diagonal matrices that contain the in-band spectrum of A_t and B_t, respectively. The F−bandlimited graph processes considered in this work follow evolutions (2a) and (8a) in the vertex and spectral domain, respectively. We shall refer to systems of the form (2) that can be written in the form (8) as F−bandlimited systems on graphs.
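The vertex-domain and spectral-domain evolutions can be checked for consistency numerically. In this sketch (our own example; the polynomial coefficients of A are arbitrary), A is a polynomial in S, so U_F^T A U_F is diagonal and one step of (2a) projected onto F matches one in-band step:

```python
import numpy as np

np.random.seed(4)
N, F_dim = 6, 3

W = np.triu(np.random.rand(N, N), 1)
W = W + W.T
S = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(S)
U_F = U[:, :F_dim]

# A shares eigenvectors with S (Assumption 1): here a polynomial in S.
A = 0.5 * np.eye(N) - 0.1 * S + 0.02 * (S @ S)

A_tilde = U_F.T @ A @ U_F                  # in-band spectrum of A
print(np.allclose(A_tilde, np.diag(np.diag(A_tilde))))   # diagonal matrix

# One vertex-domain step vs one spectral-domain step for an F-bandlimited state.
x_tilde = np.random.randn(F_dim)
x = U_F @ x_tilde
print(np.allclose(U_F.T @ (A @ x), A_tilde @ x_tilde))
```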
Problem statement. Given an F−bandlimited graph process x_t following model (2) in the vertex domain and model (8) in the spectral domain, the goal is to find conditions on, and design, the sampling matrices C_{S_t} for t ≥ 0 to observe and track the graph process.

Observing graph processes
In this section, we devise a sampling theory for observing the graph process. We generalize condition (1) from reconstructing an F−bandlimited graph signal to observing an F−bandlimited graph process. We start by adapting the definition of observability [50] to our setting.

Definition 2.
An F−bandlimited system on a graph is observable over the set S_{0:T} if, for any F−bandlimited initial state x_0 and some final time T, the initial state x_0 can be uniquely determined in the absence of noise from knowledge of the input u_t and the measurements y_t collected over the sets S_t for all t ∈ {0, ..., T}.
Definition 2 follows the same baseline idea as conventional observability, i.e., it describes when the initial state x_0 of a system is uniquely determined in the absence of noise. Since we now deal with F−bandlimited systems on graphs and samples collected only on the set S_{0:T}, it specializes the conventional definition in [50] to account for these quantities. The set S_{0:T} specifies all graph-time locations where and when to collect samples in the interval {0, ..., T}. With the above formulation in place, we want to answer the questions: (Q1) Under which conditions is an F−bandlimited graph process observable from a few samples? (Q2) How should we collect samples to estimate x_0 with a controlled accuracy?
We start by unrolling the state recursion (8a) as

x̃_t = Ã_{t:0} x̃_0 + Σ_{τ=1}^{t} Ã_{t:τ} B̃_τ ũ_{τ−1},    (9)

where Ã_{t:τ} = Ã_t Ã_{t−1} ··· Ã_{τ+1} and Ã_{t:t} = I_{|F|}. Then, writing the measurements as a function of the initial F−bandlimited state x̃_0 and stacking y_{0:T} = [y_0^T, ..., y_T^T]^T gives

y_{0:T} = O_{0:T} x̃_0 + J_{0:T} ũ_{0:T−1} + C_{S_{0:T}} v_{0:T},    (11)

where we defined the observability matrix

O_{0:T} = [(C_{S_0} U_F Ã_{0:0})^T, (C_{S_1} U_F Ã_{1:0})^T, ..., (C_{S_T} U_F Ã_{T:0})^T]^T,    (12)

the block matrices C_{S_{0:T}} = blkdiag(C_{S_0}, ..., C_{S_T}), and v_{0:T} = [v_0^T, ..., v_T^T]^T. The matrix J_{0:T} captures the input evolution in the interval {0, ..., T}; its expression is not required for our derivations, but can be obtained from (9). In Section 4.1 we answer questions (Q1) and (Q2) for C_{S_t} being deterministic (called deterministic sampling), while in Section 4.2 we consider the case where the entries of C_{S_t} follow a Bernoulli distribution (called random sampling).

Observability with deterministic sampling
Recall that C_{S_{0:T}} is the set projection matrix for the set S_{0:T}. System (8) is observable over the set S_{0:T} if and only if the observability matrix O_{0:T} in (12) is full rank, i.e., rank(O_{0:T}) = |F|. From the structure of O_{0:T}, we deduce that a sufficient condition for observability is that at least one block matrix C_{S_t} U_F Ã_{t:0} has rank |F|, which in turn requires sampling at least |F| nodes at a specific t, i.e., |S_t| ≥ |F|. This condition is similar to that obtained in adaptive graph signal reconstruction [21,27]. Differently, we link the cardinality of the sampling set with the process evolution by bringing Ã_{t:0} into the analysis. This link allows us to collect samples in a graph-time fashion, resulting in so-called graph-time samples. We claim the following.

Proposition 2. An F−bandlimited system on a graph [cf. (8)] is observable over the set S_{0:T} only if the cardinality of the sampling set is greater than or equal to the process bandwidth, i.e.,

|S_{0:T}| = Σ_{t=0}^{T} |S_t| ≥ |F|.    (13)

That is, only if at least |F| graph-time samples are taken in the time interval {0, ..., T}. These samples can be taken by |F| nodes at a single time instant, by one node at |F| time instants, or by a combination of the two.
Note that Proposition 2 provides only a necessary condition: the observability matrix O_{0:T} [cf. (12)] may be ill-conditioned depending on the location of the samples and the process spectral support. It is then paramount to pick the graph-time samples such that O_{0:T} has full rank |F|. By exploiting the rank identity rank(O_{0:T}^T O_{0:T}) = rank(O_{0:T}) = |F| and expression (12) of the observability matrix, the sampling set S_{0:T} should satisfy

rank( Σ_{t=0}^{T} Ã_{t:0}^T U_F^T C_{S_t} U_F Ã_{t:0} ) = |F|.    (14)

For T = 0, equation (14) particularizes to the one-shot graph signal reconstruction condition derived in [11]. The following theorem generalizes result (1) to a necessary and sufficient condition for the observability of an F−bandlimited graph process over a sampling set.

Theorem 1. An F−bandlimited system on a graph [cf. (8)] is observable over the set S_{0:T} if and only if

‖C_{S^c_{0:T}} Q_{0:T}‖ < 1,    (15)

where C_{S^c_{0:T}} = I_{N(T+1)} − C_{S_{0:T}} is the projection onto the complementary set S^c_{0:T} and the columns of Q_{0:T} form an orthonormal basis for the range of [(U_F Ã_{0:0})^T, ..., (U_F Ã_{T:0})^T]^T.

Theorem 1 and Proposition 2 are our answer to question (Q1) for samples collected deterministically. Theorem 1 involves the localization properties of graph signals while accounting also for the process evolution through the matrices Ã_{t:0}. In other words, Theorem 1 implies there is no F−bandlimited graph process localized on the complementary set S^c_{0:T} throughout its evolution. For T = 0, the result in (15) for observing a graph process specializes to the one-shot graph signal reconstruction condition in (1).
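The rank condition is easy to test numerically. The sketch below (our own construction; the in-band transition entries and sampled nodes are arbitrary) stacks the blocks C_{S_t} U_F Ã_{t:0} into the observability matrix and verifies that |F| graph-time samples, here one node per time instant, generically suffice:

```python
import numpy as np

np.random.seed(5)
N, F_dim, T = 8, 3, 2                      # T+1 = |F| time instants

Wg = np.triu(np.random.rand(N, N), 1)
Wg = Wg + Wg.T
_, U = np.linalg.eigh(np.diag(Wg.sum(axis=1)) - Wg)
U_F = U[:, :F_dim]

A_tilde = np.diag(np.linspace(0.5, 0.9, F_dim))   # diagonal in-band transition

# Sample one node per time instant: S_0 = {1}, S_1 = {4}, S_2 = {6}.
nodes = [1, 4, 6]
rows = []
At0 = np.eye(F_dim)                        # A_{0:0} = I
for t in range(T + 1):
    rows.append(U_F[nodes[t], :] @ At0)    # the nonzero row of C_{S_t} U_F A_{t:0}
    At0 = A_tilde @ At0

O = np.vstack(rows)                        # (T+1) x |F| observability matrix
print(np.linalg.matrix_rank(O))
```

With distinct diagonal entries in Ã and generic eigenvectors, these |F| graph-time rows are linearly independent, so the rank attains |F|.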
MSE analysis. When the measurements y_t are corrupted by noise v_t ≠ 0, we estimate the initial state x̃_0 through least squares as

x̃_0^ls = O_{0:T}^† z_{0:T},    (16)

where z_{0:T} is the measurement vector y_{0:T} with the input contribution J_{0:T} ũ_{0:T−1} subtracted. This implies that, besides ensuring full rank, the sampling set S_{0:T} should also contain those nodes that lead to the minimum reconstruction error from noisy measurements. The mean squared error is a standard metric to characterize the impact of noisy data and is formalized in the following proposition.
Proposition 3. Consider an F−bandlimited graph process following model (8) and let the sampling set S_{0:T} satisfy condition (15). Then, the MSE of the least squares estimator (16) is

MSE = σ_v² Tr[ ( Σ_{t=0}^{T} Ã_{t:0}^T U_F^T C_{S_t} U_F Ã_{t:0} )^{−1} ].    (17)

The proof follows from the covariance matrix of the least squares estimator [51, Chapter 8].
Besides characterizing the impact of the graph-time samples on the MSE, the result in (17) shows that their location is as important as their number. This is due to the presence of the sampling matrices C_{S_t} in the inverted matrix in (17). Thus, we focus next on how to select these samples to guarantee a target MSE performance.
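The closed-form least squares MSE, σ_v² Tr[(O^T O)^{−1}], can be checked against Monte Carlo simulation. In this sketch (our own toy stand-in for the sampled rows of O_{0:T}) the theoretical value is compared with the empirical average of ‖x̂_0 − x_0‖²:

```python
import numpy as np

np.random.seed(6)
m, F_dim, sigma_v = 10, 3, 0.1

O = np.random.randn(m, F_dim)              # stands in for the sampled rows of O_{0:T}
mse_theory = sigma_v**2 * np.trace(np.linalg.inv(O.T @ O))

x0 = np.random.randn(F_dim)
O_pinv = np.linalg.pinv(O)

errs = []
for _ in range(20000):
    y = O @ x0 + sigma_v * np.random.randn(m)   # noisy graph-time samples
    x0_hat = O_pinv @ y                         # least squares estimate
    errs.append(np.sum((x0_hat - x0) ** 2))

mse_mc = np.mean(errs)
print(round(mse_theory, 6), round(mse_mc, 6))
```

The two numbers agree up to Monte Carlo fluctuation, which shrinks as the number of trials grows.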
Sampling strategy. We follow sparse sensing to design the sampling set S_{0:T} [52]. This consists of finding the minimum number of samples that ensures the MSE in (17) is smaller than a target value γ > 0. Defining the sampling vector c_{0:T} = [c_0^T, ..., c_T^T]^T with C_{S_t} = diag(c_t), we find the graph-time samples by solving the relaxed convex problem

minimize_{c_{0:T}}  ‖c_{0:T}‖_1
subject to  σ_v² Tr[ ( Σ_{t=0}^{T} Ã_{t:0}^T U_F^T diag(c_t) U_F Ã_{t:0} )^{−1} ] ≤ γ,
            c_{0:T} ∈ [0, 1]^{N(T+1)},    (18)

where the objective function is the l1-norm surrogate of the l0-norm and imposes sparsity on S_{0:T}, the first constraint imposes the upper bound γ on the MSE, and the last constraint relaxes the Boolean constraint c_{0:T,i} ∈ {0, 1} to the box one. As an alternative, we can adopt a greedy approach to build the set S_{0:T} [53]. Both approaches answer our question (Q2). We can also consider the problem opposite to (18), where we minimize the MSE under a fixed number of samples; this also translates into a convex problem.

Observability with random sampling
One major limitation of deterministic observability is the design of the sampling set S_{0:T} by solving the suboptimal relaxed problem (18). We tackle this limitation by considering the entries of C_{S_t} = diag(c_t) in (2b) to be i.i.d.-in-time Bernoulli random variables with expected value C̄ = diag(c̄). Hence, the set S_{0:T} over which we want to observe the system becomes random. Our goal is to relate the sampling probabilities c̄ to the observability of an F−bandlimited system on a graph over a realization of S_{0:T}. Depending on these probabilities, the set S_{0:T} may contain all samples (c̄_n = 1 for all n ∈ {1, ..., N}) or be empty (c̄_n = 0 for all n ∈ {1, ..., N}). Denote then by S̄ = {n ∈ {1, ..., N} | c̄_n > 0} the set of nodes sampled with a probability greater than zero, i.e., the expected sampling set. Our task now translates into answering questions (Q1) and (Q2) w.r.t. the expected set S̄.
Given the measurements y_{0:T} and a realization of C_{S_{0:T}}, we define the measurement vector with the input contribution subtracted as

z_{0:T} = y_{0:T} − J_{0:T} ũ_{0:T−1} = O_{0:T} x̃_0 + C_{S_{0:T}} v_{0:T}.    (19)

From Proposition 2, it follows that a necessary condition for the observability matrix O_{0:T} [cf. (12)] to be full rank is that the realization of the sampling set S_{0:T} has cardinality greater than or equal to the signal bandwidth |F|. From the structure of O_{0:T}(C_{S_{0:T}}), it follows that the rank of O_{0:T} depends on the rank of C̄; that is, the rank of the observability matrix depends on the cardinality of the expected sampling set S̄. We formalize this observation in the following proposition.

Proposition 4. An F−bandlimited system on a graph [cf. (8)] is observable over a realization of S_{0:T} only if |S̄|(T+1) ≥ |F|, which is equivalent to

|S̄| ≥ ⌈ |F| / (T+1) ⌉.    (20)

It implies that for T ≥ |F|, we could observe an F−bandlimited graph process by taking random measurements at only one node. This result is not entirely surprising: the authors in [10] have shown that a graph signal can also be reconstructed by sampling successive aggregations at a single node. Hence, we can observe the process from a node that has linearly independent measurements in time; but since the sampling is random in the finite interval {0, ..., T}, there is a chance that a realization of the set S_{0:T} has cardinality smaller than |F|. We quantify the latter in the following proposition.
Proposition 5. Consider an F−bandlimited graph process following model (8), and let the diagonal sampling matrix C_{S_t} in (2) have i.i.d.-in-time Bernoulli entries with expected value C̄. The probability that the cardinality of a realization of S_{0:T} falls an integer ε ≥ 0 or more below the process bandwidth |F| is approximated by

Pr[ |S_{0:T}| < |F| − ε ] ≈ Σ_{k=0}^{|F|−ε−1} e^{−α} α^k / k!,    (21)

where α = (T+1) 1_N^T c̄ is the mean of the underlying Poisson binomial distribution.
Propositions 4 and 5 are our answer to question (Q1) for samples collected randomly. The interesting part of (21) is ε = 0, which quantifies the probability of the sampling set being smaller than the process bandwidth. This probability in fact drops to zero even for moderate sampling probabilities c̄ when N ∼ 100 and T ≫ 0. In Section 6.2, we test observability with random sampling on the ETEX data set and show that the probability in (21) drops below machine precision. Next, we perform a mean squared error analysis of the least squares estimated state to quantify the impact of the sampling probabilities C̄ = diag(c̄) on the process observability.
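The probability of drawing too few samples can be computed exactly by a dynamic-programming convolution over the (T+1)N Bernoulli draws, and compared with a Poisson approximation of mean α = (T+1) 1^T c̄ (our own sketch; the probabilities and bandwidth are arbitrary choices):

```python
import math

N, T = 20, 4
c_bar = [0.15] * N                 # sampling probability per node
F_card = 6                         # process bandwidth |F|

trials = c_bar * (T + 1)           # (T+1)*N independent Bernoulli draws
alpha = sum(trials)                # Poisson binomial mean

# Exact distribution of |S_{0:T}| by convolving one Bernoulli at a time.
dist = [1.0]
for p in trials:
    new = [0.0] * (len(dist) + 1)
    for k, q in enumerate(dist):
        new[k] += q * (1 - p)      # trial fails: count unchanged
        new[k + 1] += q * p        # trial succeeds: count + 1
    dist = new

p_exact = sum(dist[:F_card])                               # Pr[|S_{0:T}| < |F|]
p_poisson = sum(math.exp(-alpha) * alpha**k / math.factorial(k)
                for k in range(F_card))                    # Poisson approximation
print(round(p_exact, 6), round(p_poisson, 6))
```

Even with only a 15% per-node rate, the probability of collecting fewer than |F| graph-time samples is already small here; it vanishes quickly as N or T grows.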
MSE analysis. The least squares estimate of the initial state x̃_0 is x̃_0^ls = O_{0:T}^† z_{0:T}. To render the MSE analysis tractable, we follow the procedure used for the Cramér-Rao lower bound (CRLB) [51] and propose a lower bound on the achievable MSE. The following proposition quantifies this finding.

Proposition 6. Consider an F−bandlimited graph process following model (8), sampled with i.i.d.-in-time Bernoulli sampling matrices of expected value C̄. The MSE of the least squares estimator is lower bounded as

MSE ≥ σ_v² Tr[ ( Σ_{t=0}^{T} Ã_{t:0}^T U_F^T C̄ U_F Ã_{t:0} )^{−1} ].    (22)
Besides expressing the lowest achievable MSE as a function of C̄, the bound in (22) is useful for designing the sampling probabilities.
Sampling strategy. We design the expected sampling set S̄ following the sparse sensing principle; but instead of using the CRLB as design criterion, we use the lower bound in (22). This consists of finding the minimum sampling rate such that the MSE lower bound is smaller than a target value γ > 0. In other words, we find c̄, and therefore S̄, by solving the convex problem

minimize_{c̄}  1_N^T c̄
subject to  σ_v² Tr[ ( Σ_{t=0}^{T} Ã_{t:0}^T U_F^T diag(c̄) U_F Ã_{t:0} )^{−1} ] ≤ γ,
            c̄ ∈ [c_min, c_max]^N.    (23)

Although conceptually similar to the deterministic sampling design (18), problem (23) differs in two main aspects that preserve the optimality of the solution. First, the objective is the true function (i.e., the overall sampling rate) we want to minimize. Second, the convex box constraint c̄ ∈ [c_min, c_max]^N is not a relaxation, since we optimize over the sampling probabilities for some rate bounds c_min and c_max. Problem (23) preserves optimality because we changed the problem setting: the samples are collected randomly. The price to pay is that condition (20) is necessary but not sufficient for observability and holds in probability, as quantified by (21). While solving (23) answers question (Q2) for random sampling, we could also add a constraint on the probability criterion (21) to control outliers. The latter would need to be relaxed through an upper bound, however, since it leads to a non-convex problem. As we show in Section 6, this constraint is unnecessary, since a small enough γ on the MSE trades off the sampling probabilities well with performance.

Tracking graph processes
We now consider tracking an F−bandlimited graph process from deterministic samples. Given the process (8), we use the Kalman filter to optimally track it [50]. First, we discuss the time varying scenario and then we move to time-invariant models [cf. (3)-(7)]. For both cases, we provide conditions on the sampling set and design strategies to track the process.

Kalman filtering for time varying models
The Kalman filter on graphs for (8) evolves as described in Algorithm 1. Here, we also consider uncertainty in the process evolution [cf. (8a)] through an additive zero-mean Gaussian noise w̃_{t−1} with covariance matrix Σ̃_w, independent of v_t. The Kalman filter initializes the a posteriori state estimate x̃_0^+ and the a posteriori error covariance matrix P_0^+. The a priori error covariance matrix P_t^− is updated in step ii), and the Kalman gain matrix K_t in step iii). The a posteriori estimate x̃_t^+ is updated in step iv), and the corresponding error covariance matrix P_t^+ in step v). The Kalman gain matrix K_t depends on the sampling matrix C_{S_t} and leads to the minimum a posteriori MSE, Tr(P_t^+); hence, C_{S_t} should be carefully designed to improve tracking.

Algorithm 1 : Kalman filtering on graphs
Initialize the a posteriori estimate x̃_0^+ and the a posteriori error covariance matrix P_0^+ = ε I for a small ε > 0. For t > 0 repeat:
i) Predict the a priori estimate x̃_t^− = Ã_t x̃_{t−1}^+ + B̃_t ũ_{t−1};
ii) Update the a priori error covariance matrix P_t^− = Ã_t P_{t−1}^+ Ã_t^T + Σ̃_w;
iii) Compute the Kalman gain K_t = P_t^− (C_{S_t} U_F)^T [C_{S_t} U_F P_t^− (C_{S_t} U_F)^T + σ_v² C_{S_t}]^†;
iv) Update the a posteriori estimate x̃_t^+ = x̃_t^− + K_t (y_t − C_{S_t} U_F x̃_t^−);
v) Update the a posteriori error covariance matrix P_t^+ = (I − K_t C_{S_t} U_F) P_t^−.

The sampling matrix C_{S_t} in the Kalman gain requires a minimum number of samples to be fully exploited. Consider R < |F| samples at time t and rank(P_t^−) = |F|. Then rank(C_{S_t}) < |F|, and the rank of the pseudoinverse part in the Kalman gain matrix K_t is at most R < |F|. Thus, a necessary condition for K_t to be full rank is that

|S_t| ≥ |F|    (24)

samples are collected at time t. In other words, the Kalman filter on graphs fully exploits the Kalman gain only if the number of sampled nodes is greater than or equal to the signal bandwidth for each t. Condition (24) generalizes the observability condition in (13) to tracking. Moreover, the sampled nodes also impact the a posteriori error covariance matrix P_t^+. We will exploit this connection to design a set S_t at time t that guarantees a target a posteriori MSE.
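A minimal in-band Kalman filter following steps i)-v) can be sketched as follows (our own example; the model matrices, noise levels, and fixed sampling set are arbitrary, and the gain uses an ordinary inverse since the |S_t| ≥ |F| sampled rows are kept explicitly):

```python
import numpy as np

np.random.seed(7)
N, F_dim, T, sigma_v = 10, 3, 40, 0.1

Wg = np.triu(np.random.rand(N, N), 1)
Wg = Wg + Wg.T
_, U = np.linalg.eigh(np.diag(Wg.sum(axis=1)) - Wg)
U_F = U[:, :F_dim]

A = np.diag([0.9, 0.7, 0.5])               # stable in-band transition
Q = 0.01 * np.eye(F_dim)                   # process noise covariance

x = np.random.randn(F_dim)                 # true in-band state
x_hat = np.zeros(F_dim)                    # a posteriori estimate
P = np.eye(F_dim)                          # a posteriori error covariance

for t in range(T):
    # True process evolution with noise.
    x = A @ x + np.sqrt(0.01) * np.random.randn(F_dim)

    # i)-ii) predict.
    x_hat = A @ x_hat
    P = A @ P @ A.T + Q

    # Sample |S_t| = |F| nodes (here a fixed choice) and measure.
    St = [0, 3, 7]
    H = U_F[St, :]                         # C_{S_t} U_F restricted to sampled rows
    y = H @ x + sigma_v * np.random.randn(len(St))

    # iii)-v) gain, correction, covariance update.
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + sigma_v**2 * np.eye(len(St)))
    x_hat = x_hat + K @ (y - H @ x_hat)
    P_post = (np.eye(F_dim) - K @ H) @ P
    assert np.trace(P_post) <= np.trace(P) + 1e-9   # update never adds uncertainty
    P = P_post

print(round(np.trace(P), 4))
```

The in-loop assertion reflects the general fact that the measurement update can only shrink the trace of the error covariance.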
Sampling strategy. Condition (24) and the a posteriori error covariance matrix P + t highlight the role played by the location of the samples for tracking. To optimally select these samples, we consider a sparse sensing approach with the posterior Cramér-Rao bound (PCRB) as design criterion [54,55].
Given that the log-likelihood of the measurements satisfies the regularity condition E[∂ ln p(y_t; x̃_t)/∂x̃_t] = 0 for all t, the PCRB satisfies

E[(x̃_t − x̃_t^+)(x̃_t − x̃_t^+)^T] ⪰ F_t^{−1}(x̃_t),    (25)

where F_t(x̃_t) is the posterior Fisher information matrix (FIM) and ⪰ stands for the matrix inequality in the positive semidefinite sense. For linear systems in additive Gaussian noise, P_t^+ is independent of x̃_t and expression (25) holds with equality, i.e., the Kalman filter is optimal. Thus, by designing the sampling set S_t w.r.t. the FIM F_t(x̃_t), we achieve the same result as working with the a posteriori error covariance matrix P_t^+. The posterior FIM for the Kalman filter in Algorithm 1 is

F_t(x̃_t) = (Ã_t F_{t−1}^{−1}(x̃_{t−1}) Ã_t^T + Σ̃_w)^{−1} + Σ_{n=1}^{N} F_{tn}^o(x̃_t),    (26)

where c_{tn} is the nth diagonal entry of the sampling matrix C_{S_t} and F_{tn}^o(x̃_t) = σ_v^{−2} c_{tn} u_{F,n} u_{F,n}^T is the FIM related to the nth node measurement at time t, with u_{F,n}^T the nth row of U_F. The first term on the right-hand side of (26) is the FIM of tracking up to t−1, while the second term accounts for the innovation at time t. By substituting F_{tn}^o(x̃_t) and rearranging the sum, we obtain the a posteriori FIM

F_t(x̃_t) = (Ã_t F_{t−1}^{−1}(x̃_{t−1}) Ã_t^T + Σ̃_w)^{−1} + σ_v^{−2} U_F^T diag(c_t) U_F.    (27)

Following then the sparse sensing idea as in (18) and (23), we design the instantaneous sampling set S_t by solving the relaxed convex problem

minimize_{c_t}  ‖c_t‖_1
subject to  Tr[F_t^{−1}(x̃_t)] ≤ γ,  c_t ∈ [0, 1]^N.    (28)

Problem (28) is the counterpart for tracking of the relaxed problem (18) and carries over all the remarks about solution optimality.

Steady-state Kalman filtering on graphs
We specialize the above derivations to time-invariant F−bandlimited systems on graphs, such as those for models (3)-(7). These systems often converge to a steady state that can be exploited to design a fixed sampling set S = S_t for all t.
Differently from the time varying models, the sampling set S now affects the entire system evolution. The a priori error covariance matrix P_t^− converges to the unique limit point P_∞ if: i) the pair (Ã, B̃) is stabilizable; and ii) the pair (Ã, C_S U_F) is detectable. The first condition is a characteristic of the graph process and is application specific; for stable time-invariant processes such as the ones we discuss, it is satisfied. The second condition requires the sampling set to guarantee steady-state convergence. As follows from system theory, an observable system is also detectable; thus, the Kalman filter on graphs converges if the limiting observability matrix is full rank. Defining the complementary sampling set S^c = V\S and following similar arguments as in Theorem 1, the Kalman filter on graphs converges if

lim_{T→∞} ‖C_{S^c_{0:T}} Q_{0:T}‖ < 1,    (30)

where C_{S^c_{0:T}} = blkdiag(C_{S^c}, ..., C_{S^c}) and the columns of Q_{0:T} form an orthonormal basis for the range of the stacked matrix with blocks U_F Ã^t, t = 0, ..., T. Condition (30) accounts for the localization properties of the sampling set throughout the entire temporal evolution (in practice, for T ≫ 0). For a set S satisfying (30), the steady-state error covariance matrix P_∞ solves the discrete algebraic Riccati equation (DARE)

P_∞ = Ã [ P_∞ − P_∞ (C_S U_F)^T (C_S U_F P_∞ (C_S U_F)^T + σ_v² C_S)^† C_S U_F P_∞ ] Ã^T + Σ̃_w,    (31)

which can be solved numerically [50]. The steady-state Kalman gain matrix is

K_∞ = P_∞ (C_S U_F)^T (C_S U_F P_∞ (C_S U_F)^T + σ_v² C_S)^†,    (32)

with posterior state estimate

x̃_t^+ = x̃_t^− + K_∞ (y_t − C_S U_F x̃_t^−).    (33)

The steady-state Kalman filter is only asymptotically optimal. As follows from (32), a necessary condition to fully exploit the Kalman gain is that the cardinality of the sampling set satisfies |S| ≥ |F|; that is, we need to fix at least |F| nodes. We discuss next how these nodes are selected.
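Since a closed-form DARE solution is unavailable, P_∞ can be found by iterating the Riccati recursion to a fixed point (our own sketch; the stable Ã, the noise levels, and the |F|-node sampling set are arbitrary choices):

```python
import numpy as np

np.random.seed(8)
N, F_dim, sigma_v = 10, 3, 0.1

Wg = np.triu(np.random.rand(N, N), 1)
Wg = Wg + Wg.T
_, U = np.linalg.eigh(np.diag(Wg.sum(axis=1)) - Wg)
U_F = U[:, :F_dim]

A = np.diag([0.9, 0.7, 0.5])               # stable in-band transition
Q = 0.01 * np.eye(F_dim)                   # process noise covariance
H = U_F[[0, 3, 7], :]                      # C_S U_F rows for a fixed set S
R = sigma_v**2 * np.eye(H.shape[0])

P = np.eye(F_dim)
for _ in range(500):                       # a priori covariance recursion
    G = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    P = A @ (P - G @ H @ P) @ A.T + Q

# Fixed-point (DARE) residual should vanish at P_inf.
G = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
resid = np.linalg.norm(A @ (P - G @ H @ P) @ A.T + Q - P)
print(resid < 1e-10)
```

Because Ã is stable and the sampled pair is detectable, the recursion is a contraction toward the unique positive definite fixed point.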
Sampling strategy. Following the sparse sensing idea, we design S as the set that minimizes the steady-state error. This translates into solving the non-convex problem

minimize_c Tr(P_∞) subject to C_S = diag(c), c ∈ {0, 1}^N, 1^T c = |S|. (34)

Problem (34) finds the sampling set S that leads to the best steady-state tracking accuracy. This problem is intractable since no closed-form solution for the DARE (31) exists. We tackle this issue by adapting the greedy selection strategy from [56], as illustrated in Algorithm 2. The sampling strategy starts with an empty set; solves the DARE in step ii) numerically for each candidate node; and greedily adds the node with the smallest steady-state error increment in step iii). The algorithm stops when |S| nodes are selected.
Algorithm 2: Greedy node sampling for problem (34)
Start with an empty sampling set S = ∅, a target cardinality |S|, and a counter c = 0:
i) while c < |S|:
ii) compute Tr(P_∞(S ∪ {n})) as in (31) for all n ∈ S^c;
iii) select n* = argmin_n Tr(P_∞(S ∪ {n}));
iv) update the sampling set S = S ∪ {n*};
v) update the counter c = c + 1.

The greedy Algorithm 2 is optimal with respect to problem (34) if: i) the measurement noise v_t is uncorrelated; ii) the set of node information matrices F_{t,n}^o is ordered w.r.t. the order relation of positive semidefiniteness [56]. The first condition is easily met in practice. The second connects the graph topology and the process bandwidth to the optimal sampling set S. Consider then the information matrix for all nodes in S,

F(S) = Σ_{n∈S} F_{t,n}^o = σ_v^{-2} Σ_{n∈S} u_{F,n} u_{F,n}^T.

Condition ii) implies that, for two sets S' and S'', the cost function in (34) satisfies Tr(P_∞(S')) ≤ Tr(P_∞(S'')) if F(S') ⪰ F(S''). That is, the node sampled in S' is a better choice than the node sampled in S'' [56]. This guarantees Algorithm 2 builds the sampling set with the best steady-state tracking accuracy.
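The greedy strategy of Algorithm 2 can be sketched as follows, solving the DARE numerically for each candidate node via SciPy. This is an illustrative implementation under our own naming; it also guards against candidate sets for which the DARE is not solvable (i.e., the detectability condition fails).

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def steady_state_mse(A, UF, Sigma_w, sigma_v2, S):
    # Steady-state a priori error covariance for sampling set S via the
    # filtering DARE (control/estimation duality: pass A^T and H^T)
    H = UF[sorted(S), :]
    R = sigma_v2 * np.eye(len(S))
    P_inf = solve_discrete_are(A.T, H.T, Sigma_w, R)
    return np.trace(P_inf)

def greedy_sampling(A, UF, Sigma_w, sigma_v2, budget):
    """Greedily build a sampling set of `budget` nodes (Algorithm 2 sketch)."""
    N = UF.shape[0]
    S = set()
    for _ in range(budget):
        best_n, best_cost = None, np.inf
        for n in set(range(N)) - S:
            try:
                cost = steady_state_mse(A, UF, Sigma_w, sigma_v2, S | {n})
            except (ValueError, np.linalg.LinAlgError):
                continue  # DARE not solvable: detectability fails for this set
            if cost < best_cost:
                best_n, best_cost = n, cost
        if best_n is None:
            break  # no admissible node found
        S.add(best_n)
    return S
```

In a toy example where only some nodes carry energy of the bandlimited modes, the greedy rule picks exactly those informative nodes.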

Numerical evaluation
We corroborate the above theory with numerical results on both synthetic and real scenarios. First, we test observability and then tracking. We used the GSPBox [57] and CVX [58].

Observing graph processes
We test observability with deterministic sampling on the Molene weather data set and observability with random sampling on the European tracer experiment (ETEX) data set [32].
Deterministic sampling. The Molene weather data set contains R = 744 hourly temperature recordings collected in January 2014 over N = 32 cities around Brest (France). We used these recordings to build a synthetic diffusion scenario for model (3). Our rationale is to corroborate the sampling approaches when the bandlimited assumption on the state signal is violated. The graph is a k-nearest neighbor (kNN) graph with k = 3. Each temperature recording is treated as a separate instance; it represents the initial state x_0 for model (3) and is diffused with a weight w = 1.5 for T = 10 time instances. First, we analyzed the effect of the sampling set S_{0:T} when the process has a contribution on the entire bandwidth |F| = N = 32. We corrupted the measurements y_t in (2b) with zero-mean Gaussian noise of variance σ_v^2 = 10^{-1}; for r_τ being the τth recording, this corresponds to an average signal-to-noise ratio (SNR) of 19.3 dB. The |S_{0:T}| samples are the ones that minimize the MSE in (17) in a sparse sensing manner. The performance is evaluated through the normalized MSE (NMSE) between the estimated recording r̂_τ and the true one r_τ. Figure 1 shows the NMSE as a function of the number of samples |S_{0:T}| for the proposed sparse sensing approach (18) and for uniformly random sampling, whose NMSE is averaged over 100 additional iterations; here the graph process has a localized spectrum on the entire bandwidth |F| = N = 32. We observe that with 60 out of 320 samples the proposed approach achieves an NMSE of −20 dB, while uniformly random sampling requires far more measurements. This finding suggests that sparse observability is possible also for graph processes that have a contribution on the entire bandwidth, thanks to the extrapolation of the sampling to the temporal dimension.
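For reference, a small helper computing the NMSE in dB, assuming the standard normalization by the energy of the true recording:

```python
import numpy as np

def nmse_db(r_hat, r):
    """NMSE in dB, assuming the normalization ||r_hat - r||^2 / ||r||^2."""
    err = np.sum((np.asarray(r_hat) - np.asarray(r)) ** 2)
    return 10.0 * np.log10(err / np.sum(np.asarray(r) ** 2))
```

An NMSE of 0 dB corresponds to estimating the all-zero signal; −20 dB corresponds to a residual energy of 1% of the signal energy.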
To provide further insight, we show in Table 1 the theoretical and empirical NMSE as a function of the target value γ in (18), together with the cardinality of the sampling set S_{0:T}. These results suggest that a stricter NMSE requirement (smaller γ) leads to a larger number of nodes in S_{0:T} and, vice versa, a looser NMSE requirement (larger γ) leads to a smaller number of nodes in S_{0:T}. Second, we analyzed the effect of the sampling set S_{0:T} when the process bandwidth is restricted to the first |F| = 8 graph frequencies. This inevitably causes a loss of information about x_0. We tested two different noise variances σ_v^2 ∈ {10^{-1}, 5} (SNR = {19.3 dB, 2.3 dB}). The |S_{0:T}| samples are collected as in the previous scenario. Figure 2 shows the proposed sampling approach yields a lower NMSE than uniformly random sampling. The NMSE has a higher floor than in the full-bandwidth case (Figure 1) and its value does not reduce even when the sampling set cardinality |S_{0:T}| increases. We attribute this limitation to the restricted bandwidth: the out-of-band process contribution plays a role in improving the performance. We corroborate this observation in Figure 3, where we show the average SNR as a function of the graph frequency index. We observe that in a low noise regime it is beneficial to consider a larger bandwidth, since the average SNR per frequency is high. We do not observe this behavior in a high noise regime, since the average SNR becomes negative at the higher graph frequencies.
Hence, the SNR, and not the signal energy alone, defines the signal bandwidth. Third, we analyzed the effect of the signal bandwidth. We fixed the number of samples to |S_{0:T}| = 100 and computed the NMSE for different bandwidths |F| and noise powers σ_v^2. Figure 4 shows an increasing trend of the NMSE in the high noise regime (σ_v^2 = 5), implying the meaningful information is concentrated in the first few frequencies and, therefore, the Molene recordings are naturally bandlimited. By increasing the bandwidth |F|, the average SNR degrades and we end up bringing in more noise; hence, the performance degrades. This observation is reinforced in the low noise regime, where a larger bandwidth is preferred to exploit the SNR at the higher frequencies and improve observability. This result further corroborates that the signal-to-noise ratio determines the process bandwidth.
Random sampling. The ETEX experiment contains measurements of an identifiable perfluorocarbon tracer released near Rennes (France) and diffused over Europe [32]. Thirty measurements were collected over a period of 72 hours at N = 168 ground-level stations. These stations are the nodes of a kNN graph and the T = 30 temporal measurements form the graph process. For different reasons, the measurements are often unavailable and, in these cases, the tracer concentration is set to zero. We considered model (3) to capture the tracer evolution over time. The set F contains the frequency indices where 30% of the process energy concentrates, with cardinality |F| = 6. Since diffused graph signals are often bandlimited, our intuition is that also in this experiment most of the frequencies will not carry useful information. With this setup, we found heuristically that k = 3 and w = 3.5 lead to the smallest estimation error when using all the 168 × 72 recordings. This result serves as a baseline to test observability with random sampling. To account for measurement noise, we corrupted the signal with zero-mean Gaussian noise of variance σ_v^2 = 10^{-4} (SNR = 29.2 dB). The obtained results are averaged over 2000 iterations. Figure 5 (left) shows the tracer concentration at t = 0, wherein the red circles are the three nodes with the highest concentration. The middle plot shows the estimate of the observed state by using model (3) with all nodes collecting measurements. The graph model is able to identify the region where the tracer was released but, at the same time, it yields a tracer concentration of around 0.04 ng/m^3 on all nodes. We attribute this leakage to the missing values that are set to zero and to the absence of wind information on those days. For the scope of this work, we use this observed state (Figure 5 (center)) as a baseline, since it is the best result with the fitted model.
The right plot shows the average observed state with a sampling rate of 60; the sampling probabilities are obtained by solving the opposite of problem (23) and are illustrated in Figure 6, where several nodes are sampled with probability one and the overall solution is highly sparse. The proposed sampling approach achieves an average NMSE of −16.2 dB with a variance of −48.1 dB. We observe the following: i) the deviation of a particular realization from the averaged observed state is in general negligible, yielding a good practical result; and ii) similarly to the approaches that use the CRLB to perform sparse sampling, the lower bound (22) is a suitable cost function to design a sparse sampler. These observations should nevertheless be considered insightful rather than conclusive, since random sampling is validated only in a specific scenario. Table 2 shows the impact of the MSE requirement γ in (23) on the lower bound (22), the empirical NMSE, the overall sampling rate, and α in (21). Despite the gap between the lower bound and the empirical NMSE, a looser requirement on (22) induces a lower sampling rate; and all the reported values of α lead to a probability in (21) below machine precision. This corroborates criterion (22) for designing the node sampling probabilities for observability with random sampling.

Tracking graph processes
We now corroborate the performance of the time varying and steady-state Kalman filtering discussed in Section 5. All results are averaged over 500 different realizations.

Time varying. We considered a synthetic tracking scenario where the graph process follows the diffusion model in (3) with a zero initial state and inputs u_t for t ∈ {1, 101, . . . , 401} taken from the recordings of the Molene data set. We considered real (non-synthetic) measurements as input to represent approximately bandlimited processes. The graph is a 3NN graph, the process bandwidth F consists of the first 16 graph frequencies, the diffusion weight is w = 1, and the input matrix is B_t = I_N. In a nutshell, the state evolution considers temperature diffusion for 100 iterations, after which a new input is introduced. The model and measurement noise are zero-mean with covariance matrices Σ_w = 10^{-4} I_N and Σ_v = 10^{-1} I_N, respectively. We initialized the Kalman filter with a posteriori estimate x̂_0^+ = 1_{|F|} and error covariance matrix P_0^+ = Σ_w. We compared the proposed sparse sensing approach (the opposite problem of (28), which selects the |S| nodes with minimum MSE) with uniformly random sampling, whose performance is averaged over 500 additional realizations. Figure 7 shows the tracking NMSE as a function of the iteration index for different sampling set cardinalities |S_t|. When more nodes are sampled, a smaller NMSE is achieved; this is most evident in the first iterations. These results also show that 50% of the samples can be saved compared with the full-bandwidth case without hindering the NMSE. Uniformly random sampling is an option to track the process only for large t. We also remark that problem (28) may often give a dense solution for larger t and, since it is an SDP relaxation, it might also lead to sampling rates that are far from the minimum MSE. Finally, remark that the spikes in the estimated NMSE correspond to the arrival of a new input signal and are common to both sampling strategies.
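The time varying Kalman filter with per-instant sampled measurements can be sketched as below. This is a generic textbook predict/update loop adapted to the sampled observation matrix C_{S_t} U_F; the names and the zero-input assumption are ours, not the paper's code.

```python
import numpy as np

def kalman_track(A, UF, Sigma_w, sigma_v2, y, masks, x0, P0):
    """Track an F-bandlimited graph process from per-instant sampled nodes.

    A      : |F| x |F| reduced state-transition matrix (zero input assumed)
    UF     : N x |F| matrix of bandlimited GFT modes
    y      : T x N node measurements (only sampled entries are used)
    masks  : T x N boolean array; masks[t, n] is True if node n is in S_t
    x0, P0 : initial a posteriori state estimate and error covariance
    """
    xh, P = x0.copy(), P0.copy()
    estimates = []
    for t in range(y.shape[0]):
        # Predict
        xh = A @ xh
        P = A @ P @ A.T + Sigma_w
        # Update with the sampled nodes only: observation matrix C_{S_t} U_F
        idx = np.flatnonzero(masks[t])
        if idx.size:
            H = UF[idx, :]
            R = sigma_v2 * np.eye(idx.size)
            K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
            xh = xh + K @ (y[t, idx] - H @ xh)
            P = (np.eye(P.shape[0]) - K @ H) @ P
        estimates.append(UF @ xh)  # interpolate back to the node domain
    return np.array(estimates)
```

With a static state and (nearly) noiseless samples, the loop converges to the true bandlimited signal within a few iterations.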
Next, we compare the Kalman filter with the LMS [21] and RLS [27] on graphs. Figure 8 shows the NMSE versus the iteration index for the Kalman filter (KF), LMS (µ = 0.0875) [21], and RLS (β = 0.95) [27]. The sampling set S_t of the Kalman filter consists of one node sampled uniformly at random. The sampling probabilities for the RLS on graphs are found with β_RLS = 0.95 and γ_RLS = 7 × 10^{-2} [27]. These parameters result in an average sampling rate of 16.08 (greater than |F| = 16) for each t, with five nodes sampled with probability one. These sampling probabilities are also used for the LMS on graphs, whose step size is set to µ_LMS = 0.0875 to meet the RLS steady-state MSE. Both algorithms are initialized as the Kalman filter. Figure 8 shows the Kalman filter suffers in the first iterations, but it outperforms both the LMS and RLS (hence, other state-of-the-art algorithms with which the LMS and RLS compare well [21,26,27]) once the system evolution is learned. This result highlights the potential of the proposed approach to track the process by collecting only one sample at a time.
In Figure 9, we further analyze how the graph topology affects the Kalman filter. We changed the number of neighbors k in the kNN graph, while keeping all other parameters fixed. The sampling set is composed of |S_t| = 6 nodes selected uniformly at random. The Kalman filter tracks faster in less connected graphs, but once the initialization effect vanishes (after more iterations) it tracks better in denser graphs. For k ≥ 5, we have not seen any significant difference in tracking accuracy.
Steady-state. We test the steady-state Kalman filter to track a heat diffusion process on a rectangular grid of N = 75 nodes (5 × 15) with unit edge weights. We aim to provide insights into how GSP tools can be useful in temperature monitoring systems. The initial state x_0 is one at the five nodes of the leftmost column of the grid and zero elsewhere. This signal is diffused following the heat propagation model (3) with weight w = 10 for T = 500 instances. The set F comprises the frequency indices where 99% of the energy of x_0 is concentrated, resulting in |F| = 18 active frequencies. We considered model and measurement noise with covariance matrices Σ_w = 10^{-4} I_N and Σ_v = 10^{-1} I_N, respectively. Figure 10 shows the tracking NMSE as a function of the diffusion time for different cardinalities of the sampling set. A larger number of samples improves both the steady-state NMSE and the convergence rate. The sampling strategy in Algorithm 2 is especially beneficial when few samples are collected, while uniformly random sampling can be adopted for larger |S|.
We then compared the steady-state Kalman filter with the LMS and RLS on graphs algorithms. We chose the parameters of the adaptive algorithms as in Figure 8: µ_LMS = 0.041 (LMS), β_RLS = 0.99 (RLS), and a sampling rate of 18 samples per iteration (equal to the bandwidth). The Kalman filter with time varying sampling in Algorithm 1 is considered as a baseline, with sampling set |S_t| = |S| = 6 chosen uniformly at random. Figure 11 shows that the Kalman filter outperforms the adaptive algorithms in both steady-state performance and convergence speed. Both Kalman filter approaches reach an identical steady-state NMSE, but with a sampling rate that is three times smaller than that of the adaptive algorithms. The convergence improvement of the time varying Kalman filter comes at the expense of complexity, i.e., updating at each t the Kalman gain matrix and the a priori and a posteriori error covariance matrices.
In Figure 12, we finally show how different grid connectivities affect the steady-state Kalman filter. In addition to the original rectangular grid, we considered a star grid (i.e., nodes also have diagonal connections) and a multi-hop grid (i.e., nodes are also connected to their two-hop neighbors in the rectangular grid). The tracker is quite robust to the different grid connectivities. Differently from the time varying scenario, we considered the greedy sampling strategy, which has a bigger impact on the performance. The proposed method adapts the sampled nodes to achieve the optimal steady-state performance, while exhibiting a similar convergence rate across the grids.

Conclusion
This work generalized graph signal sampling to the temporal dimension for observing and tracking graph processes with a contribution only on a fixed set of graph frequencies. We derived necessary and sufficient conditions to observe a bandlimited graph process from a subset of nodes. These conditions relate the graph structure with the process bandwidth and the location of the samples on the graph, and they contain sampling theory for graph signals and adaptive algorithms as particular cases. We also developed a mathematical framework for observability with random sampling: we proposed conditions to observe the graph process and corresponding sampling strategies. The proposed findings extend naturally to tracking the graph process with a Kalman filter. We derived conditions on the minimum number of nodes such that the Kalman gain is fully leveraged and provided sampling strategies for tracking. The theory is corroborated with numerical experiments, which show the usefulness of the bandlimited prior in reducing the number of samples collected over graph and time.
We identify three directions that require additional research. First, we considered the process bandwidth time-invariant. This is necessary for our derivations to find recovery conditions; however, it limits the applicability to a few models. It should be investigated whether similar or different conditions can be obtained when this assumption is violated. Second, our derivations concern graph topologies that are time-invariant. It is an interesting direction to generalize some of these findings to time varying graphs. Third, from a more practical perspective, future research is needed to adapt the proposed sampling methods to nonlinear dynamics over graphs, which alter the state bandwidth over time.

Proof of Proposition 1

From Definition 1, we have that an F-bandlimited graph process has a contribution only on the set F of graph frequencies. Then, under Assumption 1, we can write x_t as

x_t = U Â_{t-1} U^T x_{t-1} + U B̂_{t-1} U^T u_{t-1},

where Â_{t-1} and B̂_{t-1} are N × N diagonal matrices containing the eigenvalues of A_{t-1} and B_{t-1}, respectively. Then, since x̂_{t-1} = U^T x_{t-1} and û_{t-1} = U^T u_{t-1} are the GFTs of x_{t-1} and u_{t-1}, respectively, we have that

x̂_t = Â_{t-1} x̂_{t-1} + B̂_{t-1} û_{t-1}. (38)

Since the matrices Â_{t-1} and B̂_{t-1} are diagonal, we can split (38) into N independent models, with the nth model representing the process evolution in the nth graph frequency. Then, x̂_t evolves in the frequency set F only if the previous state x̂_{t-1} or the input û_{t-1} have a contribution on this set. Since û_{t-1} is F-bandlimited [cf. Assumption 2], it is zero on the complementary frequency set F^c. Hence, the state x̂_t evolves in F^c only if the former state x̂_{t-1} has a contribution on this set. The latter forms a recursion and implies that the state x̂_t evolves on F^c only if the initial state x̂_0 is non-zero on this set.
For the sufficiency part, observe that if x̂_0 is F-bandlimited, it is zero on the complementary set F^c. This implies that x̂_t has a contribution only in F and is, hence, F-bandlimited. For the necessity part, observe that if x̂_0 is not F-bandlimited, it is non-zero on the complementary set F^c. This implies that x̂_t has a contribution also on F^c and is, hence, not F-bandlimited.
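The frequency-decoupling argument can be checked numerically: with a shared eigenbasis U and an F-bandlimited initial state, the GFT of the diffused state stays supported on F. The snippet below uses a random orthonormal basis for illustration, not an actual graph from the experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
N, Fsize = 6, 3                                     # N nodes, |F| = 3
U = np.linalg.qr(rng.standard_normal((N, N)))[0]    # shared eigenvectors (Assumption 1)
A = U @ np.diag(rng.uniform(-0.9, 0.9, N)) @ U.T    # graph-aligned transition matrix

# F-bandlimited initial state: GFT supported on the first Fsize frequencies
x_hat = np.zeros(N)
x_hat[:Fsize] = rng.standard_normal(Fsize)
x = U @ x_hat

# Diffuse and verify the GFT support never leaves F
for _ in range(10):
    x = A @ x
    x_hat = U.T @ x
    assert np.allclose(x_hat[Fsize:], 0)            # zero outside F at every t
```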

Proof of Proposition 2
By applying the rank inequality rank(AB) ≤ min{rank(A), rank(B)} to O_{0:T} = C_{S_{0:T}} (I_{T+1} ⊗ U_F) Ã_{0:T} in (12), O_{0:T} is full column rank |F| only if

rank(C_{S_{0:T}}) ≥ |F|, (39)

which, from the structure of C_{S_{0:T}}, is true when the claim is satisfied.

Proof of Theorem 1
By substituting C_{S_{0:T}} = I_{N(T+1)} − C^c_{S_{0:T}} into the rank argument of (14), we can write the matrix-form expression (40), where λ_min(A) denotes the minimum eigenvalue of A. Here, we exploited the positive semi-definiteness of both matrices on the right-hand side of (40). Then, from the sub-multiplicativity of the norm, we obtain the bound in (41).
The equality in (42) derives from the definition of the spectral norm and from the relation between the singular values and the eigenvalues of a matrix.

Proof of Proposition 4
From the structure of C_{S_{0:T}}, we have that rank(C_{S_{0:T}}) ≤ rank(E[C_{S_{0:T}}]) = rank(I_{T+1} ⊗ C̄).
A necessary condition for rank(C_{S_{0:T}}) to be |F| is then rank(I_{T+1} ⊗ C̄) ≥ |F|.
Then, since C̄ is diagonal, this means that at least |F|/(T + 1) nodes must be sampled with a probability different from zero.
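The Kronecker rank identity underlying this necessary condition is easy to verify numerically (the sampling probabilities below are hypothetical, for illustration only):

```python
import numpy as np

T, N = 3, 5
c_bar = np.array([0.7, 0.0, 0.3, 0.0, 1.0])   # hypothetical sampling probabilities
C_bar = np.diag(c_bar)

# rank(I_{T+1} (x) C_bar) = (T + 1) * rank(C_bar): only nodes with non-zero
# probability contribute, hence at least |F| / (T + 1) such nodes are needed
block = np.kron(np.eye(T + 1), C_bar)
assert np.linalg.matrix_rank(block) == (T + 1) * np.count_nonzero(c_bar)
```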

Proof of Proposition 5
Denote by c_t = diag(C_{S_t}) the random sampling vector with expectation c̄ for t ∈ {0, . . . , T}. Let also d = |S_{0:T}| = Σ_{t=0}^T Σ_{n=1}^N c_{t,n} be an auxiliary variable that characterizes the cardinality of a realization of the set S_{0:T}. Then, d is a Poisson binomial random variable, since it is the sum of N(T + 1) independent Bernoulli random variables [59]. The claim (21) follows from standard statistical properties of this distribution.
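The first two moments of the Poisson binomial cardinality d follow directly from the Bernoulli sampling probabilities; a small sketch under our own naming:

```python
import numpy as np

def cardinality_stats(c_bar, T):
    """Mean and variance of d = |S_{0:T}| when node n is sampled independently
    with probability c_bar[n] at each of the T + 1 time instants; d is then a
    Poisson binomial variable, the sum of N(T + 1) Bernoullis."""
    p = np.tile(np.asarray(c_bar, dtype=float), T + 1)
    return p.sum(), (p * (1.0 - p)).sum()
```

For example, with c̄ = [0.5, 1.0] and T = 1, the expected cardinality is 3 with variance 0.5; nodes sampled with probability one contribute no variance.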

Proof of Proposition 6
By rewriting the MSE and using (17), we can express the expected MSE over the random sampling sets. Then, since the function ϕ : X → Tr[X^{-1}] is convex for a positive semi-definite matrix X, we apply Jensen's inequality ϕ(E[X]) ≤ E[ϕ(X)] to lower bound the expected MSE as in (22).