An Online Approach to Physical Watermark Design

This paper considers the problem of designing physical watermark signals in order to optimally detect possible replay attack in a linear time-invariant system, under the assumption that the system parameters are unknown and need to be identified online. We first provide a replay attack model, where an adversary replays the previous sensor data in order to fool the system. A physical watermarking scheme, which leverages a random control input as a watermark to detect the replay attack, is then introduced. The optimal watermark signal design problem is cast as an optimization problem, which aims to achieve the optimal trade-off between control performance and intrusion detection. An online watermarking design and system identification algorithm is provided to deal with systems with unknown parameters. We prove that the proposed algorithm converges to the optimal one and characterize the almost sure convergence rate. A numerical example and an industrial process example are provided to illustrate the effectiveness of the proposed strategy.


I. INTRODUCTION
C YBER-Physical Systems (CPSs) offer close integration of computational elements and physical processes [1]. They are defined as systems where "physical and software components are deeply intertwined, each operating on different spatial and temporal scales, exhibiting multiple and distinct behavioral modalities, and interacting with each other in a myriad of ways that change with context" [2]. Such systems play a critical role in large varieties of fields, such as manufacturing, health care, environment control, transportation, etc. Due to their wide applications and critical functions, it is of paramount importance to ensure the secure operation of CPS [3], [4]. Any successful attack on CPS may jeopardize critical infrastructure and people's lives and properties, even threaten national security. In 2010, Stuxnet malware launched a devastating attack on Iranian uranium enrichment facilities [5], [6]. This incident raised a great deal of attention to CPS security in recent years [7].
However, CPS security faces a wide variety of challenges. Cardenas et al. [8] discussed three main challenges and identified unique properties of CPS security compared to traditional IT security. Besides, the physical part of CPS poses new security challenges. Similar discussion can be found in [9]. Gollmann and Krotofil [10] pointed out that also people performing security analysis of CPS is a key challenge. The authors argued that it is difficult for people to expertise in both cyber and physical safety and able to appreciate limitations in their own domain.

Literature Review
A significant amount of research effort has been devoted to intrusion and anomaly detection algorithms to enhance CPS security. Zimmer et al. [11] presented three mechanisms for time-based intrusion detection. The techniques, through bounds checking, were developed in a self-checking manner by the application and through the operating system scheduler. Mitchell and Chen [12] proposed a hierarchical performance model and techniques for intrusion detection in CPS. They classified the modern CPS intrusion detection system techniques into two classes: detection technique and audit material. They summarized advantages and disadvantages in [13]. Kwon et al. [14] discussed necessary and sufficient conditions for when the attacker could be successful without being detected. Their method can be employed to evaluate vulnerability degree of certain CPSs. Corresponding detection and defense methodologies against stealthy deception attacks can be developed. In [15], the authors proposed a mathematical framework for CPS and investigated limitations of the monitoring system. Centralized and distributed attack detection and identification monitors were also discussed.
In this paper, we consider the detection problem of replay attacks. In [16], [17], [18], a replay attack model is defined and its effect on a steady-state control system is analyzed. An algebraic condition is provided on the detectability of the replay attack. For those systems that cannot detect replay attack efficiently, a physical watermarking scheme is proposed to enable the detection of a replay attack. In particular, by injecting a random control signal, the watermark signal, into the control system, it is possible to secure the system. However, the watermark signal may deteriorate the control performance, and therefore it is important to find the optimal trade-off between the control performance and the detection efficiency, which can be cast as an optimization problem. Similar watermarking schemes are also proposed in the literature [19], [20], [21].
Different from the previous additive watermarking schemes, a multiplicative sensor watermarking scheme is proposed in [22]. In this scheme, each output is respectively fed to a SISO watermark generator and due to the inclusion of a watermark removing functionality, the control performance will not be sacrificed. Applying some techniques of non-cooperative stochastic games, Miao et al. [23] designed a suboptimal switching control policy that balances control performance and the intrusion detection rate for replay attacks. Hoehn and Zhang [24] provided a novel technique via exciting the system in non-regular time intervals and signal processing to detect the replay attack. Other replay attack detection mechanisms have also been proposed in the literature [25].
It is worth noticing that in majority of the aforementioned research, the precise knowledge of the system parameters is required in order to design the watermark signal and the detector. However, acquiring these parameters may be troublesome and costly. Moreover, for a large system, the system parameters may change during its operation. Hence, it is beneficial for the system to learn the parameters in an online fashion and automatically generate the optimal detector and the watermark signal in real-time. The problem of learning parameters of dynamical systems, system identification, has been studied over the past decades. Most methods, however, require persistent excitation on the input.
In this paper, due to the nature of the optimal watermark signal, we shall design the input that asymptotically converges to a signal that does not satisfy the persistent excitation condition. However, by controlling the convergence rate, we can still prove that the system parameters converge to true parameters almost surely.
Some preliminaries results regarding online design of physical watermarks are contained in our former work [26]. The main differences between the current version of the paper and [26] are: 1) we not only prove that we can asymptotically identify the system parameters, but also characterize the rate of the convergence; 2) we provide a procedure to automatically generate the Neyman-Pearson detector; 3) we add the simulation on an industrial process to verify the effectiveness of the proposed approach.

Contributions
The goal of this paper is to develop a data-driven approach to design physical watermark signals to protect systems with unknown parameters, against replay attack. The main contributions of this paper are threefold: 1) An online "learning" algorithm is presented to simultaneously infer the parameters of the system based only on the system input and output data and generate the watermark signal as well as the optimal detector based on the estimated parameters.
To the best of our knowledge, it is the first time to study the detection of replay attacks under the scenario with unknown system parameters. 2) We prove that the system parameters which are inferred via our proposed online algorithm converge to the true parameters almost surely even if the input signal asymptotically converges to a degenerate signal. 3) We also characterize the almost sure convergence rate of the estimated system parameters to the true parameters and provide an upper bound for this rate.

Outline of the Paper
The rest of paper is organized as follows. Section II formulates the problem by introducing the system as well as the attack model. The physical watermarking scheme is introduced in Section III. In Section IV, we present an online algorithm to simultaneously infer the parameters of the system and design the watermark signal as well as the detector based on the estimated parameters. We further prove the almost sure convergence of the watermark signal to the optimal one and characterize the convergence rate. In Section V, a numerical example and an industrial process example are provided to verify the effectiveness of the proposed technique. Concluding remarks are given in Section VI. For the sake of legibility, most of the proofs are included in the appendix.

Notations
A of the matrix A is the spectral norm of an m × n matrix A, which is its largest singular value. A ⊗ B is the Kronecker product of matrices A and B. A > 0 (A ≥ 0) indicates that A is positive definite (positive semidefinite). A + denotes the pseudo-inverse of A. We say that f (k) ∼ O(g(k)) if there exists an M > 0, such that |f (k)| ≤ M × g(k) for all k ∈ N 0 .

II. PROBLEM FORMULATION
In this section, we introduce a linear time invariant system model of CPS as well as a replay attack model, which will be employed in the rest of this paper.
We consider a linear time-invariant system described by the following equation: where x k ∈ R n is the state vector at time k, and w k ∈ R n is a zero mean independently and identically distributed (i.i.d) Gaussian process noise with covariance Q ≥ 0. φ k ∈ R p is the watermark signal that will be discussed in details in Section III. A sensor network is monitoring the above system. The observation equation is given by where y k ∈ R m is a collection of all sensors' measurements at time k. v k ∈ R m is a zero mean i.i.d. Gaussian measurement noise with covariance R ≥ 0.

Remark 1.
To simplify notations, in this paper we consider a stable open-loop system. However, our framework can be easily extended to a closed loop system with an unstable plant but a stabilizing controller, which is discussed in Section III. Notice that the purpose of the watermark signal is intrusion detection instead of stabilization. As a result, we only consider stable systems or systems that have been pre-stabilized by some controller.
We assume that the process noise w 0 , w 1 , · · · and the measurement noise v 0 , v 1 , · · · are independent of each other. Furthermore, since CPSs usually operate for an extended period of time, it is assumed that the system is already in the steady state, which means that the initial condition x −1 is a zero mean Gaussian random vector independent of the process noise and the measurement noise and with covariance Σ, where Σ satisfies the following Lyapunov equation: We further make the following assumptions regarding the system parameters: Remark 2. The observability and controllability assumption is without loss of generality as we can perform a Kalman decomposition [27] and only work with the observable and controllable subspace.
Next we introduce a replay attack model. We assume that the adversary has the following capabilities: 1) The attacker has access to all the real-time sensory data. In other words, it knows the sensor's measurement y 0 , · · · , y k at time k.
2) The attacker can modify the real sensor signals y k to arbitrary sensor signals y ′ k . Given these capabilities, the adversary can employ the following replay attack strategy: 1) The attacker records a sequence of sensor measurements y k s from time k 1 to k 1 +T , where T is large enough to guarantee that the attacker can replay the sequence for an extended period of time during the attack. 2) The attacker modifies the sensor measurements y k to the recorded signals from time k 2 to k 2 + T , i.e., Notice that since the system is already in the steady state, both the replayed signal y ′ k and the real signal y k from the sensors will share exactly the same statistics. As a result, replay attack can be stealthy for a large class of linear systems, if no watermark signal is present, i.e. φ k = 0. For more detailed discussion on the detectability of replay attack, please refer to [16].
Let us consider the system illustrated in Fig. 1.

Detector
Online Learning The overarching goal of this paper is to design an online learning algorithm for the optimal replay attack detector as well as the optimal parameters U k of the physical watermark signals, based on the collected input φ k and output y k . The physical watermark scheme is introduced in detail in Section III. Based on this scheme, we develop an approach to infer the system parameters based only on the system input data φ k and output data y k , and design the highlighted parameters in Fig. 1: the covariance U k of the watermark signal φ k and the optimal detector based on the estimated parameters.

III. PHYSICAL WATERMARK FOR SYSTEMS WITH KNOWN PARAMETERS
This section introduces the concept of physical watermark, which enables the detection of replay attack. The optimal physical watermark is derived via solving an optimization problem which aims to achieve the optimal trade-off between control performance and intrusion detection. Then we will present the extension to a closed-loop system.

A. Physical Watermark Scheme
The main idea of physical watermark is to inject a random noise φ k , which is called the watermark signal, into the system (1) to excite the system and check whether the system responds to the watermark signal in accordance to the dynamical model of the system. In this section we will restrict the watermark signal φ k to be zero mean i.i.d. Gaussian random variables and its covariance is denoted as U .
In the absence of the attack, y k can be represented as: For simplicity, let us define where H τ is defined as Therefore, y k can be simplified as: It is easy to show that ϕ k is a zero mean Gaussian whose covariance converges to U, where Similarly, ϑ k is a zero mean Gaussian noise whose covariance is On the other hand, let us consider the system under the replay attack, where the replayed y ′ k can be written as y ′ k = y k−∆k = ϕ k−∆k + ϑ k−∆k , Now since ∆k is unknown to the system operator, we shall treat ϕ k−∆k as a zero mean Gaussian random variable with covariance U. As a result, y ′ k is a zero mean Gaussian random variable with covariance U + W. Therefore, to detect replay attack, we need a detector to differentiate the distribution of y k under the following two hypotheses: H 0 : The sensor measurement y k follows a Gaussian distribution N 0 (ϕ k , W). H 1 : The sensor measurement y k follows a Gaussian distribution N 1 (0, U + W).

Remark 3.
It is worth noticing that the watermark signal φ 0 , · · · , φ k are known to the system operator and detector and the conditional distribution (conditioned on {φ k } k ) of y k converges to a Gaussian distribution with mean ϕ k and covariance W.
The Neyman-Pearson detector [28] for hypothesis H 0 versus hypothesis H 1 takes the following form: where η is a threshold chosen by the system operator. Otherwise, hypothesis H 0 is accepted.
Remark 4. For simplicity, we only consider detecting replay attack based on the current measurement y k . In principle, one may take a moving horizon approach to design a detector, by considering joint distribution of y k , y k−1 , · · · , y k−∆t . However, the proposed methodology in this paper can be easily extended to multiple y k s case by stacking the state vector.
Remark 5. It is worth noticing that since hypothesis H 0 is time-varying due to the ϕ k term, the threshold η needs to be time-varying to ensure a constant false alarm rate. If η is still chosen as a constant instead, then the system operator could calculate the expected false alarm rate by numerical integration, since ϕ k is a stationary process.
The following theorem quantifies the performance of the detector, in terms of the expected KL-divergence between distribution N 0 and N 1 : Furthermore, the expected KL divergence satisfies the inequality Proof. The proof is essentially the same as the proof in [17].
Remark 6. It is worth noticing that the expected KL-divergence is a convex function of U and hence U . However, both the upper and lower bounds of it are increasing functions of tr(UW −1 ). Hence, instead of directly maximizing the detection performance, which is computationally difficult, we could maximize tr(UW −1 ), which is linear with respect to U .
Note that although the watermark signal can enable the detection of replay attack, it also deteriorates the system control performance. As a result, it is important to design the signal to achieve the optimal trade-off between the control performance loss and the detection performance. In this paper, to quantify the performance loss, we use the following Linear Quadratic Gaussian (LQG) metric: where is the weight matrix for the LQG control, which is chosen by the system operator.

Remark 7.
The LQG cost is a common choice to quantify the performance of a system running in steady state. On the other hand, we do not foresee any fundamental difficulty to incorporate other performance metrics into our framework, as long as they can be computed from the Markov parameters H τ .
Since y k and φ k converge to a stationary process, J can be written in an analytical form as Therefore, J is an affine function of U , which can be written as where J 0 is the optimal LQG cost, and S is linear with respect to U , being defined as Therefore, in order the achieve the optimal trade-off between the control performance and detection performance, we can formulate the following optimization problem: where δ is a design parameter depending on how much control performance loss is tolerable. An important property of the optimization problem (13) is that the optimal solution is usually a rank-1 matrix, which is formalized by the following theorem: where The optimal solution to (14) is where z is the eigenvector corresponding to the maximum eigenvalue of the matrix X −1 P and z T X z = δ. Furthermore, the solution is unique if X −1 P has only one maximum eigenvalue.
Proof. From the definition of U, we know that Following similar steps as in the above proof, we have that tr(XS) = tr(U X ). Moreover, since X > 0, we have that If the optimal U * has rank greater than 1, then follow the same line of argument in the proof of Theorem 7 in [18], U * can be decomposed as U = α 1 U 1 + · · · α l U l , where the following holds Therefore, by the optimality of U * , we can conclude that However, since U * is a convex combination of U 1 , . . . , U l , we must have tr(U * P) = tr(U 1 P) = · · · = tr(U l P), which shows that the rank one matrix U i is also optimal.
In order to derive the optimal rank one U * , it is clear that U * = zz T for some z = 0. Hence, the optimization problem (14) is converted to Using the Lagrangian multipliers, one can prove that Pz = λX z, which shows that z is the eigenvector of X −1 P. If we enumerate all eigenvectors of X −1 P, it is not difficult to prove that the maximum is achieved when z is the eigenvector corresponds to the largest eigenvalue of X −1 P and z T X z = δ.

B. Extension to Closed-loop Systems
Before continuing on to the next section, we would like to discuss how to generalize the problem formulation for a closed-loop system with a stabilizing controller. Consider the following system discussed in [16]: with the following estimator and controller: and LQG cost as where u k denotes the optimal LQG control signal. We can redefine the statex k and outputỹ k asx and the design of watermark signal in a closed-loop system can be converted to the open-loop formulation.
It is worth noticing that in order to design the detector and the optimal watermark signal, precise knowledge of the system parameters is needed. However, acquiring the parameters may be troublesome and costly. Furthermore, there may be unforeseen changes in the model of the system, such as topological changes in power systems. As a result, the identified system model may change during the system operation. Therefore, it is beneficial for the system to "learn" the parameters and design the detector and watermark signal in real-time, which will be our focus in the next section.

IV. PHYSICAL WATERMARK FOR SYSTEMS WITH UNKNOWN PARAMETERS
This section is devoted to developing an online "learning" procedure to infer the system parameters, based on which, we show how to design watermark signals and the optimal detector and prove that the physical watermark and the detector asymptotically converge to the optimal ones.
Throughout the section, we make the following assumptions: The system is not under attack during the learning phase.
4) The number of distinct eigenvalues of A, which is denoted asñ, is known. 5) The LQG weight matrix X and the largest tolerable LQG loss δ are known.
Remark 8. The first and second assumptions are required in order to ensure that the optimal covariance of the watermark signal is a differentiable function of H τ , i.e., the problem is not ill-conditioned. The third assumption is necessary since there is no way to do system identification without (real) sensory data and it is also needed to prove the asymptotic convergence of our algorithm to the true optimal solution as this cannot be achieved in finite time due to the inherent process and measurement noise. Nevertheless, we shall illustrate through simulation, that after a certain period of learning phase, our algorithm can approximate the optimal solution with reasonably well accuracy and the system can detect replay attack. The fourth assumption is also required to prove convergence, although we shall demonstrate in the simulation that we can use a reduced model to approximate the system with good accuracy. The fifth assumption should hold for all practical cases as X and δ are design parameters chosen by the system operator.
For the sake of legibility, we shall introduce our algorithm first and present the theorem on the correctness of our approach in the end.

A. An Online Algorithm
In this subsection, we will present the complete algorithm in a pseudo-code form. After that, the online "learning" scheme will be introduced in detail.
Algorithm 1 describes our proposed online watermarking algorithm. The notations are described later in the subsection. First, we initialize some parameters which will be used later. In each round of the while iteration, the optimal covariance of the watermarking U k, * based on current knowledge is computed firstly. Based on the derived covariance, one can update the covariance U k by combining "exploration" and "exploitation" term which will be described in detail later. According to the updated covariance, we generate the watermarking signals φ k and inject them to the plant. Then we collect the sensory data y k and employ them and watermarking signals to infer necessary system parameters H k,τ , P k , X k . Based on the estimated parameters, one can update the Neyman-Pearson detectorĝ k . Then one can repeat the above process to identify system parameters and design the watermarking signals as well as the optimal detector.
A pseudo-code form for Algorithm 1 is as follows: Remark 9. For Algorithm 1, P k , X k are defined in (18), U is the covariance of watermarking signal, and H k,τ is defined in (20).
Step 3 is the update of the covariance of the physical watermark in (17). All parameters will be illustrated in the following subsections.

Generation of the Watermark Signal φ k
Let us design U k , which can be considered as an approximation for the optimal covariance of the watermark signal U , as where 0 < β < 1, δ is the maximum tolerable LQG loss defined in (13), and U k, * is the solution of the following optimization problem and P k−1 and X k−1 are the estimate of P and X matrices, respectively, based on y 0 , . . . , y k−1 , φ 0 , . . . , φ k−1 , both of which are initialized as: The inference procedure of P k and X k for k ≥ 0 will be provided in the further subsections.
Remark 10. Notice that the second term (k + 1) −β I on the RHS of (17) is crucial for parameter identification. The reason is that U k, * is in general a rank 1 matrix (as is proved in Thereom 2) and hence it does not provide persistent excitation to the system for us to identify the necessary parameters. Conceptually, the (k + 1) −β I term can be interpreted as an "exploration" term, as it provide necessary excitation to the system in order for us to infer the parameters. The U k, * is the "exploitation" term, as it is optimal under our current knowledge of the system parameters.
At each time k, the watermark signal is chosen to be where ζ k s are i.i.d. Gaussian random vectors with covariance I.

Inference on H τ
The rest of this section is devoted to inferencing the system parameters from the collected sensory data y 0 , . . . , y k and watermarks φ 0 , . . . , φ k . We will first identify the Markov parameters H τ of the system. Let us define the following quantity H k,τ , where 0 ≤ τ ≤ 3ñ − 2, as where H k,τ is an estimate of H τ at time k.
Remark 11. It is worth noticing that other methods, such as subspace identification, may be superior for classical system identification tasks to the method we proposed. However, since the covariance of our watermark signal converges to a degenerate matrix (of rank 1), it is non-trivial to analyze the convergence properties for more advanced system identification methods, such as subspace identification, which we shall leave as a further research direction.
It is worth noticing that the calculation of the matrices U, W, P and X requires H τ for all τ ≥ 0. Next we shall show that in fact only finitely many H τ s are needed to compute those matrices, which requires one intermediate result: Lemma 2. Assuming the matrix A is diagonalizable with λ 1 , . . . , λñ being its distinct eigenvalues, then there exist unique Ω 1 , · · · , Ωñ, such that Proof. Without loss of generality, we assume that A is a diagonal matrix. As a result, A τ = diag(λ τ 1 I 1 , λ τ 2 I 2 , · · · , λ τ n Iñ), where λ i denotes the ith distinct eigenvalue of A, λ τ i denotes λ i to the power of τ , and I i is the identity matrix of size n i by n i with n i the multiplicity of λ i . Hence, we have with Ω i = C diag(0, . . . , 0, I i , 0, . . . , 0)B, which completes the proof.
Since A satisfies its own minimal polynomial p( . + α 0 , we know that for any i ≥ 0: Leveraging (22), we could use H 0 , H 1 , · · · , H 3ñ−2 to estimate both λ i s and Ω i s and thus H τ for any τ . To this end, let us define:    α k,0 . . . where Remark 12. One can prove that α k,i from (23) is the solution of the following minimization problem: where · F denotes the Frobenius norm of a matrix.
Let us denote the roots of the polynomial p k (x) = xñ +α k,ñ−1 xñ −1 +· · ·+α k,0 to be λ k,1 , · · · , λ k,ñ . Define a Vandermonde like matrix V k to be where λ k,i is an estimate of λ i at time k and λ τ k,i is λ k,i to the power of τ , and we shall estimate Ω i as    Ω k, 1 . . .
Inference on ϕ k , ϑ k and W This subsection is devoted to the inference of ϕ k and ϑ k defined in (5), which corresponds to the parts of y k generated by the watermark signal and noise respectively. We will further infer the covariance W of ϑ k .
Let us defineφ withφ k,i = λ k,iφk−1,i + Ω k,i φ k , andφ −1,i = 0. As a result, we can estimate ϑ k aŝ The covariance of ϑ k can be estimated as Inference on P, X , U and g k Finally we can derive an estimation of the P and X matrices, which are required to compute the optimal covariance U of the watermark signal, given by where (28) is derived from the summation of geometric series, and The Neyman-Pearson detection statistics g k can be approximated bŷ where Remark 13. For the proposed online algorithm, the system identification and watermark design are tightly coupled. As is commented in Remark 10, the watermarking-based replay attack detection requires the injection of a rank-1 watermarking signal (assuming it is performed optimally). On the other hand, persistency of excitation is required for system identification, i.e., the injected signal needs to be full rank. As a result, we carefully design the covariance of the injected signal to be the "optimal" rank-1 covariance matrix on the current knowledge of the system, plus a diminishing factor (k + 1) −β I, and we further prove in this paper that this additional term, although vanishing asymptotically, provides us with enough information to perfectly identify the necessary parameters of the system.

Remark 14.
It is worth noticing that comparing to an approach with off-line system identification first and then watermarking design later, our approach provides the following advantages: 1) Theoretically speaking, finite-time system identification cannot identify the system parameters precisely and hence the watermarking scheme will not be optimal if the system identification process is stopped. 2) In practice, the control system could slowly change due to various reasons (e.g., components wear out), so we need to adjust the parameters continuously. 3) Moreover, for many practical control systems, a model of the system is not available. It is often too expensive to stop the system operation and to perform off-line system identification. We would like to further point out that the classical system identification procedure can be easily integrated to our approach, by providing better estimation of P and X in the initialization step in Algorithm 1. Hence, classical system identification approach complements our algorithm very well.

B. Algorithm Properties
The following theorem establishes the convergence of U k, * and g k , the proof of which is reported in the appendix for the sake of legibility.
From the definition of U k = U k, * + (k + 1) −β δI, we immediately have the following corollary: Assuming that A is strictly stable and Assumption 2 holds. If 0 < β < 1, then for any ǫ > 0, the following limit holds almost surely: Remark 15. It is worth noticing that (32) implies that both U k, * − U * andĝ k − g k are of the order O(k −γ+ǫ ) as k goes to infinity. Hence, the convergence rate γ is maximized when β → 0 + , which corresponds to the case where the exploration term (k + 1) −β δI in U k stays constant. However, although this will maximize the performance for the inference algorithm, the covariance U k of the watermark signal φ k will not converge to the true optimal U * . In order to achieve "fastest" convergence rate of U k , we need to choose the decay rate for the exploration term to be β = 1/3 = arg max β (γ, β). We would also like to point out that Theorem 3 only provides an upper bound for the almost sure convergence rate and we plan to investigate the exact convergence rate in our future work. It is also interesting to see if faster convergence can be achieved by using more advanced system identification techniques.

V. SIMULATION RESULT
In this section, the performance of the proposed algorithm is evaluated. We will apply the proposed online "learning" approach to a numerical example and an industrial process, Tennessee Eastman Process (TEP).

A. A Numerical Example
First we choose m = 3, n = 5, p = 2 and A, B, C are all randomly generated, with A being stable. It is assumed that X in (12), the covariance matrices Q and R are all identity matrices with proper dimensions. We assume that δ in (14) is equal to 10% of optimal LQG cost J 0 . Fig. 2 shows relative error U k, * − U * F / U * F of the estimated U k, * v.s. time k for different βs. it seems that the convergence speed of the error for different β is comparable. Notice that Theorem 3 only provides an upper bound for the convergence rate. As a result, it would be interesting to quantify the exact impact of β on the convergence rate, which we shall leave as a future research direction. Now we consider the detection performance of our online watermark signal design, after an initial inference period, where no attack is present. It is assumed that the attacker records the sensor readings from time 10 4 + 1 to 10 4 + 100 and replays them to the system from time 10 4 + 101 to 10 4 + 200. Fig 3 shows the trajectory of the Neyman-Pearson statistic g k and our estimateĝ k of g k for one simulation. Notice thatĝ k can track g k with high accuracy. Furthermore, bothĝ k and g k are significantly larger when the system is under replay attack (after time 10 4 + 101). Hence one can conclude that even without parameter knowledge, we can successfully estimate g k and detect the presence of the replay attack.

B. TEP Example
Tennessee Eastman Process (TEP) is a commonly used process control system proposed by Downs and Vogel in [29]. In this simulation, we adopt a simplified version of TEP from [30], as follows: where A, B and C are constant matrices 1 .
This system simulates a MIMO system of order n = 8 with p = 4 inputs and m = 10 outputs. We discretize the system using the control system toolbox in MATLAB, by selecting a sample time of 0.6s. Again, we choose X in (12), the covariance matrices Q and R to be identity matrices with proper dimensions. We assume that δ in (14) is equal to 5% of J 0 , and β = 1/3. In this simulation, we assume that we do not know the dimension of the state space, which is 8, and instead we underestimate it by assuming that A only hasñ = 5 distinct eigenvalues. Fig. 4 illustrates the relative error U k, * − U * F / U * F after running the system for roughly 1 week (10 6 × 0.6s ≈ 0.992week). Fig 5 illustrates the NP statistics g k and the estimated NP statisticsĝ k , assuming that the adversary collects the measurement from 10 6 + 1 to 10 6 + 100 and replays them to the system from time 10 6 + 101 to 10 6 + 200. One can see that although we underestimate the dimensions of the system, our algorithm can still achieve a high accuracy.

VI. CONCLUSION
In this paper, an algorithm that can simultaneously generate the watermarking signal and infer the system parameters is proposed. We prove that our algorithm converges to the optimal one and characterize an upper bound for the almost surely convergence rate. For future works, we would like to quantify the exact convergence rate, as well as exploring other system identification methods and prove their convergence. It is of interest to study secure control in other cases, such as batachoperating process. We are also interested in adversarial learning when the sensor data is compromised.

APPENDIX A PROOF OF THEOREM 3
The whole appendix is devoted to proving Theorem 3. We shall present several preliminary results first and then proceed with the proof of Theorem 3.

Preliminary Results
To simplify notations, for a random variable (vector, or matrices) x k , we denote that Notice that x k ∼ O(k α ) implies that x k ∼ C(α), but the reverse is not necessarily true 2 . The following lemma establishes some basic properties of C(α) functions: Assuming that x k ∼ C(α) and y k ∼ C(β), with α ≥ β, then the following statements hold: 3) Suppose f is differentiable at 0 and α < 0, then Proof. The first three statements can be trivially proved and hence we only focus on the last three statements. 4) Since x k ∼ C(α), it is easy to see that for any ǫ > 0, As a result, which almost surely converges to 0 as k goes to ∞. As a result, s k ∼ C(α). 5) For the last statement, notice that Therefore, the argument that S k ∼ C(α) follows the same line of proof as the fourth statement. 6) We only need to prove for the case where ζ k follows the standard normal distribution. The high dimensional case can then be proved by checking each entry of ζ k with proper scaling and shifting. For any ǫ, φ > 0, we have Suppose that k is large enough, such that φk ǫ > 1, then we have As a result, using direct comparison test for infinite series, we can prove that which further implies (by Borel-Cantelli Lemma), that Since φ can be arbitrarily small, ζ k /k ǫ → 0 almost surely, which finishes the proof.
Let {F k } be a filtration of sigma algebras and {M k } be a matrix-valued stochastic process that is adapted to the filtration {F k }, we call {M k } a (matrix-valued) matingale (with respect to the filtration {F k }) if the following equality holds for all t: For the rest of the paper, we shall assume that the filtration F k is the σ-algebra generated by the random variables {x −1 , φ 0 , · · · , φ k , w 0 , · · · , w k , v 0 , · · · , v k }. Now we have the following lemma to establish a strong law for matrix-valued martingale: where 0 ≤ β < 1, then M k /k converges to 0 almost surely. Furthermore, Proof. Let Φ k,ij (M k,ij ) be the (i, j)-th entry of the matrix Φ k (M k ). It is easy to prove that {M k,ij } is a scalar martingale (adapted the same filtration {F k }) and since 3 we have that EΦ 2 k,ij ∼ C(β). For simplicity, let us define κ = (β + 1)/2. One can easily verify that for any ǫ > 0 and large enough i, the following inequalities hold: The last inequality is true since ǫ > 0 and for large enough i, Finally, one can prove the following equality As a result, by Lemma 1 in [31], we can deduce that Notice that (34) is true for all entries of the matrix M k . Therefore, M k /k ∼ C(κ − 1), with κ − 1 = (β + 1)/2 − 1 = (β − 1)/2. Since β < 1, M k /k converges to 0 almost surely.
We are now ready to prove Theorem 3, which requires several intermediate steps.
Proof. It is easy to see that y k and U k+1 are measurable w.r.t. F k . Furthermore, let k 1 , k 2 ≥ 0 be two time indices, then it is easy to prove that which, combined with (4), implies that Next we shall compute the expectation of y k+τ φ T k U −1 k 2 . Notice that from (19), φ k = U 1/2 k ζ k , where ζ k follows the standard normal distribution. Hence, The last inequality is true due to (35). As a result, by Cauchy-Schwarz inequality, Notice that ζ k is χ-distributed with p degree of freedom, which implies that E ζ k 4 = p(p + 2). On the other hand, one can prove that sup k E y k 4 is bounded since by (35), U k is upper bounded. As a result, we prove that which further implies that As a result, by (37), one can prove that the following stochastic process is a matrix-valued martingale for the filtration F kτ +i , whereτ = τ + 1, and 0 ≤ i ≤ τ . Now by (38) and Lemma 4, we know that From the definition of S τ,i (k), one can see that for large enough k, where k i = max{t ∈ N : tτ + i ≤ k}. Notice that k i ≥ 0 and k i = k. Hence, the estimation error of H k,τ − H τ is a convex combination of S τ,i s. As a result, for any ǫ > 0 Notice that when k is large enough, k/k i → τ , which implies that H k,τ − H τ ∼ C(−γ). The a.s. convergence can be trivially proved by the fact that γ > 0 is positive.

Convergence of λ k,i and Ω k,i
Notice that due to the convergence of H k,τ to H τ , we have that Ξ k converges to Ξ, where We shall first prove that Ξ is invertible. Suppose that there existsα = [α 0 , . . . ,αñ −1 ] T , such that Ξα = 0, then By the fact that (A, B) is controllable and (A, C) is observable,p(A) must be 0. However, since p(x) is minimal polynomial of A,p(x) must be constantly 0, which proves thatα = 0 and Ξ is invertible. Let us denote α i s as the coefficients of the minimal polynomial p(x) = xñ + αñ −1 xñ −1 + · · · + α 0 of A, i.e., the monic polynomial with minimum degree. As a result, we have H i+ñ + αñ −1 H i+ñ−1 + · · · + α 0 H i = CA i p(A)B = 0.
Since p(x) is the minimal polynomial of A and A is diagonalizable, the roots λ i s of p(x) are distinct, which proves that V is of full column rank, i.e., rank(V ) =ñ. Therefore, rank(V ⊗ I m ) = rank(V ) × rank(I m ) =ñm, which implies that V ⊗ I m is of full column rank. Therefore, by Lemma 2, Hence, by Lemma 3.3, Ω k,i − Ω i ∼ C(−γ).
where ϑ k k t=0 CA t w k−t + v k + CA k+1 x −1 .
Proof. Let us define function A : R n×n → R n×n , such that for any symmetric matrix X ∈ S n×n , A(X) = X + AXA T + A 2 XA 2T + · · · , For non-symmetric X ∈ R n×n , we define A(X) = A X + X T 2 .
One can prove that To simplify notations, let us define w −1 = x −1 . By mathematical induction, k t=0 ϑ t ϑ T t can be written as where with and initial condition One can then prove that Hence, M k − kW is a martingale and M k /k − W ∼ C(−0.5) by Lemma 4. On the other hand, for N k , since A ⊗ A is stable, N k ∼ C(0) by Lemma 3, which proves that 1 k + 1 k t=0 ϑ t ϑ T t − W ∼ C(−0.5).

Convergence of the Rest
By Lemma 3.3, one can prove that P k − P, X k − X are all of the class C(−γ), as they are differentiable functions of λ k,i , Ω k,i and W k . Therefore, U k, * − U * ∼ C(−γ) since U k, * is a differentiable function of P k and X k at a neighborhood of P and X (see [32]).
Hence, one can prove that U k − U ∼ C(−γ), as U k is a differentiable function of λ k,i , Ω k,i and U k, * . Finally we prove thatĝ k − g k ∼ C(−γ) due to Lemma 3.1.