Change Detection via Affine and Quadratic Detectors

The goal of the paper is to develop a specific application of the convex optimization based hypothesis testing techniques developed in A. Juditsky, A. Nemirovski, "Hypothesis testing via affine detectors," Electronic Journal of Statistics 10:2204--2242, 2016. Namely, we consider the following Change Detection problem: given noisy observations, evolving in time, of the outputs of a discrete-time linear dynamical system, we intend to decide, in a sequential fashion, on the null hypothesis stating that the input to the system is a nuisance, vs. the alternative stating that the input is a "nontrivial signal," with both the nuisances and the nontrivial signals modeled as inputs belonging to finite unions of some given convex sets. Assuming the observation noises zero-mean sub-Gaussian, we develop "computation-friendly" sequential decision rules and demonstrate that in our context these rules are provably near-optimal.

• whenever the nuisance hypothesis is true, the probability of a false alarm (signal conclusion somewhere on the time horizon t = 1, ..., d) is at most a given ε ∈ (0, 1/2);
• for every t ≤ d and every τ ≤ t, whenever the input is a signal of shape τ and magnitude ≥ ρ_tτ, the probability of the signal conclusion at time t or earlier is at least 1 − ε. In other words, for every input of shape τ and magnitude ≥ ρ_tτ, the probability of nuisance conclusions at all time instants 1, 2, ..., t should be at most ε.
In what follows we refer to ε as the risk of the collection {T_t, 1 ≤ t ≤ d}. Needless to say, we would like to meet the outlined design specifications with as small thresholds ρ_tτ as possible.
Our related results can be summarized as follows: we develop specific decision rules T_t and thresholds ρ_tτ meeting the design specifications and such that
• T_t and ρ_tτ are yielded by explicit convex optimization problems and thus can be built in a computationally efficient fashion; moreover, the decision rules T_t are easy to implement;
• the resulting inference procedure is near-optimal in some precise sense. Specifically, for every τ and t, 1 ≤ τ ≤ t ≤ d, consider the testing problem where, given the observations ω_1, ..., ω_t, we want to decide on just two hypotheses on the input u underlying the observations: the hypothesis H_1 "u = 0" and the alternative H_2(ρ) "u is a signal of shape τ and magnitude ≥ ρ," where ρ > 0 is a parameter. It may happen that these two hypotheses can be decided upon with risk ≤ ε, meaning that "in the nature" there exists a test which, depending on the observations ω_1, ..., ω_t, accepts exactly one of the hypotheses with error probabilities (i.e., the probability to reject H_1 when u = 0 and the probability to reject H_2(ρ) when u is a signal of shape τ and magnitude ≥ ρ) at most ε. One can easily find the smallest ρ = ρ*_tτ for which such a test exists. The relatively broad applicability of operational results, including those considered in this paper, more than compensates for the lack of explanatory power typical of computation-based constructions. It should be added that under favorable circumstances (which, in the context of this paper, do take place in case I), the operational procedures we are about to develop are provably near-optimal in a certain precise sense (see Section 3.4). Therefore, their performance, whether good or bad from the viewpoint of a particular application, is nearly the best possible under the circumstances.

Terminology and notation
In what follows:
1. All vectors are column vectors.
2. We use "MATLAB notation:" for matrices A_1, ..., A_k of common width, [A_1; A_2; ...; A_k] stands for the matrix obtained by (up-to-down) vertical concatenation of A_1, A_2, ..., A_k; for matrices A_1, ..., A_k of common height, [A_1, A_2, ..., A_k] is the matrix obtained by (left-to-right) horizontal concatenation of A_1, A_2, ..., A_k.
3. S^n is the space of n × n real symmetric matrices, and S^n_+ is the cone of positive semidefinite matrices from S^n. The relation A ⪰ B (A ≻ B) means that A, B are symmetric matrices of the same size such that A − B is positive semidefinite (respectively, positive definite), and B ⪯ A (B ≺ A) is the same as A ⪰ B (respectively, A ≻ B).
4. SG[U, 𝒰], where U is a nonempty subset of R^n and 𝒰 is a nonempty subset of S^n_+, stands for the family of all Borel sub-Gaussian probability distributions on R^n with sub-Gaussianity parameters from U × 𝒰. In other words, P ∈ SG[U, 𝒰] if and only if P is a probability distribution such that for some u ∈ U and Θ ∈ 𝒰 one has

ln(∫ e^{h^T y} P(dy)) ≤ u^T h + ½ h^T Θ h for all h ∈ R^n

(whenever this is the case, u is the expectation of P); we refer to Θ as the sub-Gaussianity matrix of P. For a random variable ξ taking values in R^n, we write ξ ∼ SG[U, 𝒰] to express the fact that the distribution P of ξ belongs to SG[U, 𝒰].
Similarly, G[U, U] stands for the family of all Gaussian distributions N (u, Θ) with expectation u ∈ U and covariance matrix Θ ∈ U, and ξ ∼ G[U, U] means that ξ ∼ N (u, Θ) with u ∈ U , Θ ∈ U.
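To make the sub-Gaussianity condition of item 4 concrete, here is a small numerical sanity check (all names are illustrative and not part of the formal development): for a Gaussian distribution N(u, Θ), the logarithmic moment generating function equals u^T h + ½ h^T Θ h exactly, so Gaussians are sub-Gaussian with these very parameters, and a Monte Carlo estimate of the left hand side should match the bound.

```python
import numpy as np

rng = np.random.default_rng(0)
u = np.array([1.0, -0.5])
Theta = np.array([[2.0, 0.3], [0.3, 1.0]])  # a positive definite sub-Gaussianity matrix

# draw samples of y ~ N(u, Theta) to estimate ln E exp(h^T y)
y = rng.multivariate_normal(u, Theta, size=200_000)

def log_mgf_empirical(h):
    # log of the empirical moment generating function at h
    return np.log(np.mean(np.exp(y @ h)))

def sub_gaussian_bound(h):
    # u^T h + (1/2) h^T Theta h, the right hand side of the sub-Gaussianity inequality
    return u @ h + 0.5 * h @ Theta @ h

h = np.array([0.2, -0.1])
gap = sub_gaussian_bound(h) - log_mgf_empirical(h)  # ~0 for a Gaussian: the bound is tight
```

For genuinely sub-Gaussian (non-Gaussian) distributions, e.g. bounded ones, the gap would be nonnegative rather than zero.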

Dynamic change detection: preliminaries
In the sequel, we address the situation which can be described informally as follows. We observe noisy outputs of a linear system at times t = 1, ..., d, the input to the system being an unknown vector x ∈ R^n. Our "full observation" is

y^d = Ā_d x + ξ^d, (2.1)

where Ā_d is a given ν_d × n sensing matrix, and ξ^d ∼ SG[{0}, 𝒰] (see item 4 in Section 1.2), where 𝒰 is a given nonempty convex compact subset of int S^{ν_d}_+. Observation y^d is obtained in d steps; at a step (time instant) t = 1, ..., d, the observation is

y^t = S_t y^d, (2.2)

where 1 ≤ ν_1 ≤ ν_2 ≤ ... ≤ ν_d, S_t is a ν_t × ν_d matrix of rank ν_t, and y^t "remembers" y^{t−1}, meaning that S_{t−1} = R_t S_t for some matrix R_t. Clearly, ξ^t = S_t ξ^d is sub-Gaussian with parameters (0, Θ_t), with

Θ_t = S_t Θ S_t^T ∈ 𝒰_t := {S_t Θ S_t^T : Θ ∈ 𝒰}; (2.3)

note that 𝒰_t, 1 ≤ t ≤ d, are convex compact sets comprised of positive definite ν_t × ν_t matrices.
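The nested observation scheme is easy to emulate: if S_t extracts the first ν_t coordinates of the full observation, then the "memory" relation S_{t−1} = R_t S_t holds with R_t = [I, 0], and the covariance of ξ^t is S_t Θ S_t^T, as in (2.3). A minimal sketch (illustrative dimensions):

```python
import numpy as np

nu_d = 5

def S(nu_t):
    # S_t extracts the first nu_t coordinates of the full observation y^d
    return np.eye(nu_d)[:nu_t, :]

# "y^t remembers y^{t-1}": S_{t-1} = R_t S_t with R_t = [I_{nu_{t-1}}, 0]
nu = [2, 3, 5]
for t in range(1, len(nu)):
    R_t = np.hstack([np.eye(nu[t - 1]), np.zeros((nu[t - 1], nu[t] - nu[t - 1]))])
    assert np.allclose(S(nu[t - 1]), R_t @ S(nu[t]))

# the covariance of xi^t = S_t xi^d is Theta_t = S_t Theta S_t^T
Theta = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])  # a candidate matrix from the set U
Theta_2 = S(3) @ Theta @ S(3).T
```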
Our goal is to build a dynamic test for deciding on the null, or nuisance, hypothesis, stating that the input to the system underlying our observations is a nuisance, vs. the alternative of a signal input. Specifically, at every time t = 1, ..., d, given the observation y^t, we can either decide that the input is a signal and terminate ("termination at step t with a signal conclusion," or, equivalently, "detection of a signal input at time t"), or decide ("nuisance conclusion at step t") that so far, the nuisance hypothesis holds true, and pass to the next time instant t + 1 (when t < d) or terminate (when t = d).
Given an upper bound ε on the probability of a false alarm (detecting a signal input somewhere on the time horizon 1, ..., d in the situation when the true input is a nuisance), our informal goal is to build a dynamic test which respects the false alarm bound and, under this restriction, detects signal inputs "as fast as possible." We consider two different types of detection procedures, those based on affine and on quadratic detectors, each type dealing with its own structure of nuisance and signal inputs.

Change detection via affine detectors
We start with describing the structure of nuisance and signal inputs that we intend to deal with.

Setup
Consider the setup as follows.
1. Inputs to the system belong to a given convex compact set X ⊂ R n , and nuisance inputs form a given closed and convex subset N of X, with 0 ∈ N .
2. The informal description of a signal input x is as follows: x ∈ X is obtained from some nuisance input v by adding an "activation" w of some shape and some magnitude. There are K possible shapes, the k-th of them represented by a closed convex set W_k ⊂ R^n such that
2.1. 0 ∉ W_k;
2.2. W_k is semi-conic, meaning that when w ∈ W_k and ρ ≥ 1, it holds that ρw ∈ W_k.
The magnitude of an activation is just a positive real, and an activation of shape k and magnitude at least ρ > 0 is an element of the set

W_k^ρ = ρW_k = {ρw : w ∈ W_k}.

Example: Let K = n and let W_k be the set of all inputs w ∈ R^n with the first k − 1 entries equal to zero and the k-th entry ≥ 1. In this case, the shape of an activation w ∈ R^n is its "location," the index of the first nonzero entry in w, and activations of shape k and magnitude ≥ ρ are vectors w from R^n with the first nonzero entry in position k and the value of this entry at least ρ.
We have presented the simplest formalization of what informally could be called an "activation up." To get an equally simple formalization of an "activation down," one should take K = 2n and define W_{2i−1} and W_{2i}, i ≤ n, as the sets of all vectors from R^n for which the first nonzero entry is in position i, and the value of this entry is at least 1 for W_{2i−1} ("activation up" of magnitude ≥ 1 at time i) or is at most −1 for W_{2i} ("activation down" of magnitude ≥ 1 at time i).
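For the up/down example just described, the shape and magnitude of an activation are read off its first nonzero entry. A small helper illustrating this (the function name is ours, not the paper's):

```python
import numpy as np

def shape_and_magnitude(w):
    """For the 'activation up/down' example: return (location, direction, magnitude),
    where the location is the 1-based index of the first nonzero entry of w."""
    w = np.asarray(w, dtype=float)
    nz = np.flatnonzero(w)
    if nz.size == 0:
        return None  # the zero input carries no activation
    k = nz[0]
    return k + 1, ('up' if w[k] > 0 else 'down'), abs(w[k])
```

For instance, [0, 0, 2.5, −1, 7] is an activation up of magnitude 2.5 located at time 3.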
3. The formal description of "signal" inputs is as follows: these are vectors x from X which for some k ≤ K can be represented as x = v + w with v ∈ V_k and w ∈ W_k^ρ for some ρ > 0, where the W_k are as described above, and the V_k, 0 ∈ V_k, are nonempty compact convex subsets of X. Thus, when speaking about signals (or signal inputs), we assume that we are given K nonempty closed convex sets W_k, k ≤ K, each of them semi-conic and not containing the origin, and K nonempty compact convex sets V_k ⊂ X. These sets give rise to single-parametric families of compact convex sets

X_k^ρ = [V_k + W_k^ρ] ∩ X,

indexed by "activation shape" k and parameterized by "activation magnitude" ρ > 0. Signals are exactly the elements of the set 𝒳 = ∪_{ρ>0, k≤K} X_k^ρ. In the sequel, we refer to inputs from N as feasible nuisances, to inputs from X_k^ρ as feasible signals with activation of shape k and magnitude ≥ ρ, and to inputs from 𝒳 as feasible signals. To save words, in what follows "a signal of shape k and magnitude ≥ ρ" means exactly the same as "a signal with activation of shape k and magnitude ≥ ρ." From now on, we make the following assumption: for every k ≤ K, the set X_k^ρ is nonempty for some ρ > 0. Since X_k^ρ shrinks as ρ grows due to the semi-conicity of W_k, it follows that for every k, the sets X_k^ρ are nonempty for all small enough positive ρ.

Outline
Given an upper bound ε ∈ (0, 1/2) on the probability of false alarm, our course of actions is as follows.
1. We select d positive reals ε_t, 1 ≤ t ≤ d, such that ∑_{t=1}^d ε_t = ε; ε_t will be an upper bound on the probability of a false alarm at time t.
2. We select thresholds ρ_tk > 0, 1 ≤ k ≤ K, in such a way that a properly designed test T_t utilizing the techniques of [17, Section 3] is able to distinguish reliably, given an observation y^t, between the hypotheses H_{1,t}: x ∈ N and H_{2,t}: x ∈ ∪_{k=1}^K X_k^{ρ_tk} on the input x underlying the observation y^t. After y^t is observed, we apply the test T_t to this observation, and, according to what the test says,
• either claim that the input is a signal, and terminate,
• or claim that so far, the hypothesis of a nuisance input seems to be valid, and either pass to the next observation (when t < d), or terminate (when t = d).
The generic construction we intend to use when building the test T t stems from [8,17].

Implementation: preliminaries
Building block: affine detectors for sub-Gaussian families. Our principal building block originates from [17] and is as follows. Let 𝒰 be a convex compact set comprised of positive definite ν × ν matrices, and let U_1, U_2 be two closed nonempty convex subsets of R^ν, with U_1 bounded. The following result was proved in [17]: the associated convex-concave saddle point problem is solvable, and its saddle point gives rise to an affine detector φ_* and an upper bound on the risk of this detector for the families of distributions in question. In particular, when deciding, via a single observation ω, on the Gaussian hypotheses H^G_χ, χ = 1, 2, with H^G_χ stating that ω ∼ N(θ, Θ) with (θ, Θ) ∈ U_χ × 𝒰, the risk of the test which accepts H^G_1 when φ_*(ω) ≥ 0 and accepts H^G_2 otherwise is at most Erf(δ), where Erf(δ) = (2π)^{−1/2} ∫_δ^∞ e^{−s²/2} ds is the normal error function.
Given k ∈ {1, ..., K}, observe that the set X_k^ρ is nonempty when ρ > 0 is small enough (this was already assumed) and is empty for all large enough values of ρ (since X is compact and W_k is a nonempty closed convex set not containing the origin). From these observations and the compactness of X it follows that there exists the largest ρ = R_k > 0 for which X_k^ρ is nonempty.
Let us fix t ∈ {1, ..., d}, and let 𝒰_t be the set of allowed covariance matrices of the observation noise ξ^t in observation y^t, so that 𝒰_t is a convex compact subset of the interior of S^{ν_t}_+. According to our assumptions, for any nuisance input the distribution of the associated observation y^t, see (2.2), belongs to the family SG[N_t, 𝒰_t], with N_t = {Ā_t x : x ∈ N}, where N ⊂ X is the convex compact set of nuisance inputs. Given, along with t, an integer k ≤ K and a real ρ ∈ (0, R_k], we can define the set U^t_{kρ} = {Ā_t x : x ∈ X_k^ρ}; whatever be a signal input x from X_k^ρ, the distribution of the observation y^t associated with x belongs to the family SG[U^t_{kρ}, 𝒰_t].
Applying Proposition 3.1 to the data U_1 = N_t, U_2 = U^t_{kρ}, and 𝒰 = 𝒰_t, we arrive at a convex-concave saddle point problem. The corresponding saddle point (h_{tkρ}; θ^1_{tkρ}, θ^2_{tkρ}, Θ_{tkρ}) does exist and gives rise to the affine detector and its risk; therefore, in view of (1.3), relation (3.8) holds. To proceed, we need the following simple observation:
3. Finally, we set α_t = ln(κ_t/ε_t) and process the observation y^t at step t as follows:
• if there exists k such that ρ_tk < ∞ and φ_tk(y^t) < α_t, we claim that the input underlying the observation is a signal, and terminate;
• otherwise, we claim that so far, the nuisance hypothesis is not rejected, and pass to the next time instant t + 1 (when t < d) or terminate (when t = d).
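The decision rule of item 3 can be sketched as a simple scan over the finite thresholds at each step (container names and calling conventions here are ours, purely for illustration):

```python
import math

def sequential_change_detector(observations, detectors, rho, alpha):
    """Sketch of the decision rule: detectors[t][k] is an affine detector phi_tk,
    rho[t][k] its threshold (math.inf when shape k is undecidable at time t),
    and alpha[t] the acceptance level at time t.
    Returns (t, k) at the first signal conclusion, or None."""
    for t, y_t in enumerate(observations):
        for k, phi_tk in enumerate(detectors[t]):
            if rho[t][k] < math.inf and phi_tk(y_t) < alpha[t]:
                return t, k          # signal conclusion: terminate
        # nuisance conclusion so far: pass to the next observation
    return None                      # nuisance conclusion on the whole horizon

# toy run: shape 0 is undecidable at t = 0; at t = 1 the detector fires
detectors = [[lambda y: y], [lambda y: y - 5]]
rho = [[math.inf], [0.5]]
result = sequential_change_detector([10.0, 3.0], detectors, rho, alpha=[0.0, 0.0])
```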

Characterizing performance
The performance of the above inference procedure can be described as follows: (i) the probability of a false alarm on the time horizon 1, ..., d is at most ε; (ii) when t ∈ {1, ..., d} and k ∈ {1, ..., K} are such that ρ_tk < ∞, and the input belongs to a set X_k^ρ with ρ ≥ ρ_tk, then the probability to terminate at step t with the signal conclusion is at least 1 − ε.

Refinement in the Gaussian case
In the case when observation noise ξ in (2.1) is N (0, Θ) with Θ ∈ U, the outlined construction can be refined. Specifically, at a time instant t ≤ d we now act as follows.

Construction
1. Let ErfInv stand for the inverse error function: for ε ∈ (0, 1/2), ErfInv(ε) is the value δ ≥ 0 such that Erf(δ) = ε. Assuming ε < 1/2 and given t ∈ {1, ..., d}, we define for δ ≥ 0 the sets L_t(δ) in such a way that L_t(δ) is non-increasing and continuous from the right in δ ≥ 0, and we define δ_t by (3.13). Clearly, δ_t is well defined and positive, since L_t(δ) as defined above is continuous from the right.
2. For k ∈ L_t(δ_t), we have SV_tk(R_k) < −½δ_t², and SV_tk(ρ) > −½δ_t² for all small enough ρ > 0. Invoking Lemma 3.1, there exists (and can be rapidly approximated to high accuracy by bisection) ρ = ρ_tk ∈ Δ_k such that SV_tk(ρ_tk) = −½δ_t². After ρ_tk is specified, we define the associated detector φ_tk(·) ≡ φ_{tkρ_tk}(·) by applying the construction from Proposition 3.1 to the data (N_t, U^t_{kρ_tk}, 𝒰_t), that is, we find a saddle point (h_*; θ_*^1, θ_*^2, Θ_*) of the corresponding convex-concave function (such a saddle point does exist). By Proposition 3.1, the resulting detector φ_tk is affine. Moreover (see (3.1)), for all α ≤ δ² and β ≤ δ² the relations (3.17) hold. Comparing the second equality in (3.16) with the description of SV_tk(ρ_tk), we see that the corresponding risk equals exp{SV_tk(ρ_tk)}, which combines with the first equality in (3.16) and with (3.15) to imply that δ in (3.16) is nothing but δ_t as given by (3.13). The bottom line is that
(#) for k ∈ L_t(δ_t), we have defined reals ρ_tk ∈ Δ_k and affine detectors φ_tk(y^t) such that relations (3.17) are satisfied with δ = δ_t given by (3.13) and every α ≤ δ_t², β ≤ δ_t².
For k ∉ L_t(δ_t), we set ρ_tk = ∞.
3. Finally, we process the observation y^t at step t as follows. We set α and β in such a way that, in view of (3.14), α ≤ δ_t², β ≤ δ_t². Next, given the observation y^t, we look at the k's with finite ρ_tk (that is, at the k's from L_t(δ_t)) and check whether for at least one of these k's the relation φ_tk(y^t) < α is satisfied. If this is the case, we terminate and claim that the input is a signal; otherwise we claim that so far, the nuisance hypothesis seems to be true, and pass to time t + 1 (if t < d) or terminate (when t = d).
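The bisection step of item 2 exploits only the monotonicity of SV_tk in ρ. A generic sketch (the toy function in the usage line is illustrative, not the SV of the paper):

```python
def bisect_threshold(SV, target, lo, hi, tol=1e-10):
    """Find rho in [lo, hi] with SV(rho) = target by bisection, assuming SV is
    continuous and nonincreasing with SV(lo) > target > SV(hi); in item 2 the
    target would be -delta_t**2 / 2."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if SV(mid) > target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# toy monotone SV: rho -> -rho^2, target -1/2, so the root is sqrt(1/2)
rho_hat = bisect_threshold(lambda r: -r * r, -0.5, 0.0, 10.0)
```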

Characterizing performance
The performance of the above inference procedure can be described as follows (cf. Proposition 3.2): (i) the probability of a false alarm on the time horizon 1, ..., d is at most ε; (ii) when t ∈ {1, ..., d} and k ∈ {1, ..., K} are such that ρ_tk < ∞, and the input belongs to a set X_k^ρ with ρ ≥ ρ_tk, then the probability to terminate at step t with the signal conclusion is at least 1 − ε.

Near-optimality
Our goal now is to understand how good the inference procedures we have developed are. For the sake of definiteness, we consider two assumptions about the observation noise ξ in (2.1), along with the two respective change inference procedures:
• the sub-Gaussian case, where ξ is known to be sub-Gaussian with parameters (0, Θ) and Θ is known to belong to 𝒰; the corresponding inference procedure is built in Section 3.2;
• the Gaussian case, where ξ ∼ N(0, Θ) with Θ ∈ 𝒰; the corresponding inference procedure is described in Section 3.3.
Let us fix a time instant t ≤ d and a signal shape k ≤ K.
Given ε ∈ (0, 1/2), it may happen that SV_tk(R_k) > −½ErfInv²(ε). In this case, informally speaking, even the feasible signal of shape k and the largest possible magnitude R_k does not allow us to claim at time t that the input is a signal "(1 − ε)-reliably." Indeed, denoting by (h_*; θ_*^1, θ_*^2, Θ_*) the saddle point of the convex-concave function (3.5), the latter implies that when ξ^t ∼ N(0, Θ_*) (which is possible), there is no test which allows distinguishing via the observation y^t with risk ≤ ε between the feasible nuisance input z_* and the feasible signal v_* + R_k w_* of shape k and magnitude ≥ R_k. In other words, even after the nuisance hypothesis is reduced to the single nuisance input z_*, and the alternative to this hypothesis is reduced to the single signal v_* + R_k w_* of shape k and magnitude R_k, we are still unable to distinguish (1 − ε)-reliably between these two hypotheses via the observation y^t available at time t.
Now consider the situation where SV_tk(R_k) ≤ −½ErfInv²(ε). Similarly to the above, ρ*_tk is just the smallest magnitude of a signal of shape k which is distinguishable from nuisance at time t, meaning that for every ρ < ρ*_tk there exist a feasible nuisance input u and a feasible signal input of shape k and magnitude ≥ ρ such that these two inputs cannot be distinguished via y^t with risk ≤ ε. A natural way to quantify the quality of an inference procedure is to look at the smallest magnitude ρ of a feasible signal of shape k which, with probability 1 − ε, ensures the signal conclusion and termination at time t. We can quantify the performance of a procedure by the ratios ρ/ρ*_tk stemming from various t and k; the closer these ratios are to 1, the better. The result of this quantification of the inference procedures we have developed, stated below, gives separate bounds for the Gaussian case, the case where 𝒰 contains a ⪰-largest element, and the sub-Gaussian case.
(3.21) Then, whenever the input is a feasible signal of shape k and magnitude at least χρ*_tk, the probability for the inference procedure from Section 3.2 in the sub-Gaussian case, and the procedure from Section 3.3 in the Gaussian case, to terminate at time t with the signal inference is at least 1 − ε.
Discussion. Proposition 3.4 states that when (3.19) holds (which, as was explained, just says that feasible signals of shape k of the largest possible magnitude R_k can be (1 − ε)-reliably detected at time t), the ratio χ of the magnitude of a signal of shape k which is detected (1 − ε)-reliably by the inference procedure we have developed to the lower bound ρ*_tk on the magnitude of an activation of shape k detectable (1 − ε)-reliably at time t by any other inference procedure can be made arbitrarily close to the right hand side quantities in (3.21). It is immediately seen that the latter quantities are upper-bounded by

χ̄ = O(1) ln(Kd/ε)/ln(1/ε),

provided ε ≤ 0.5. We see that unless K and/or d are extremely large, χ̄ is a moderate constant. Moreover, when K, d remain fixed and ε → +0, we have χ̄ → 1, which, informally speaking, means that with K, d fixed, the performance of the inference routines in this section approaches the optimal performance as ε → +0.
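The behavior of the non-optimality factor is easy to eyeball numerically; in the sketch below the absolute O(1) constant is set to 1 purely for display, which is an assumption of ours, not a claim about the actual constant:

```python
import math

def chi_bar(K, d, eps):
    # illustration of chi_bar = O(1) * ln(K d / eps) / ln(1 / eps),
    # with the absolute constant taken to be 1 for display purposes
    return math.log(K * d / eps) / math.log(1 / eps)

# with K, d fixed, the ratio decreases toward 1 as eps -> +0
ratios = [chi_bar(16, 16, e) for e in (1e-2, 1e-6, 1e-12)]
```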

Numerical illustration
The setup of the numerical experiment we are about to report upon is as follows. We observe on the time horizon {t = 1, 2, ..., d = 16} the output z_1, z_2, ... of the dynamical system (3.22), where Δ is the shift in the space of two-sided sequences, and ζ is the random input noise with zero-mean independent Gaussian components ζ_t with variances varying in [σ², 1], for some given σ ∈ (0, 1]. Our goal is to dynamically test the nuisance hypothesis about the system's input vs. a signal alternative.
We start with specifying the model of the system input. Note that, aside from the noise and the system input u^d = [u_1; ...; u_d] on the time horizon we are interested in, the observed output [z_1; ...; z_d] depends on the past (prior to time instant t = 1) outputs and inputs. The influence of this past on the observed behavior of the system can be summarized by the initial conditions v (in the case of the dynamics described by (3.22), v ∈ R³). We could augment the input u^d by these initial conditions, consider as the input the pair x = [v; u^d], and express our hypotheses on the input in terms of x, thus bringing the situation back to that considered in Section 3.1. It turns out, however, that when no restrictions are imposed on the initial conditions, our inferential procedure may become numerically unstable. On the other hand, note that by varying the initial conditions we shift the trajectory z^t = [z_1; ...; z_t] along a low-dimensional linear subspace E_t ⊂ R^t (in the case of (3.22), E_t is the space of collections (z_1, ..., z_t) with entries z_s depending quadratically on s). Given t, we can project the observed z^t onto the orthogonal complement of E_t in R^t and treat this projection, y^t, as the observation we have at time t.
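The projection that removes the influence of the initial conditions is a standard least-squares construction; a sketch under the stated assumption that E_t is spanned by the sequences 1, s, s² (our helper name is illustrative):

```python
import numpy as np

def orth_complement_projector(t):
    """Projector onto the orthogonal complement in R^t of
    E_t = span{(s^0)_s, (s^1)_s, (s^2)_s}: the trajectories produced
    by initial conditions alone in this example."""
    s = np.arange(1, t + 1, dtype=float)
    E = np.stack([s ** 0, s ** 1, s ** 2], axis=1)  # basis of E_t, shape t x 3
    Q, _ = np.linalg.qr(E)                          # orthonormal basis of E_t
    return np.eye(t) - Q @ Q.T

P = orth_complement_projector(6)
s = np.arange(1, 7, dtype=float)
z_init = 2.0 - 3.0 * s + 0.5 * s ** 2  # a trajectory driven by initial conditions only
# the projected observation y^t = P z^t carries no trace of z_init
```

Note that P has rank t − 3, which is consistent with the restriction t ≥ 4 mentioned below: for t ≤ 3 the projection annihilates everything.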
It is immediately seen that the resulting observation scheme is of the form (2.2), with the matrix Ā_t readily given by t, and zero-mean Gaussian noise ξ^t with covariance matrix Θ belonging to a "matrix interval." Note that the restriction t ≥ 4 reflects the fact that for t ≤ 3, E_t = R^t, and thus our observations z^t, t ≤ 3, bear no information on the input u^d. Now we have reduced the problem to the framework of Section 3.1, with inputs to the system being the actual external inputs u^d = [u_1; ...; u_d] on the observation horizon. In our experiments, the nuisance and signal inputs were as follows:
• The set X of all allowed inputs was the box {u^d : ‖u^d‖_∞ ≤ R} with R = 10⁴.
• The set N of nuisances was just the origin: N = {0}.
• The sets V_k and W_k, 1 ≤ k ≤ K, responsible for signal inputs, were as follows: the number K of these sets was set to d = 16, and we used V_k = {0}, k ≤ K. We considered three scenarios for the sets W_k of "activations of shape k and magnitude at least 1" (pulses, jumps up, and steps). In other words, in our experiments, signals of shape k are exactly the same as "pure activations" of shape k: these are sequences u_1, ..., u_d which "start" at time k (i.e., u_t = 0 for t < k), of magnitude which is the value of u_k. In addition, there are some restrictions, depending on the scenario in question, on the u_t's for t > k.
In this situation, the detection problem becomes a version of the standard problem of sequentially detecting a pulse of a given form in the (third) derivative of a time series observed in Gaussian noise. The goal of our experiment was to evaluate the performance of the inference procedure from Section 3.3 for this example. The procedure was tuned to the probability of false alarm ε = 0.01, equally distributed between the d = 16 time instants, that is, we used ε_t = 0.01/16, t = 1, ..., d = 16.
We present the numerical results in Figure 1. We denote by ρ_tk the magnitude of an activation of shape k which is provably detected at time t with confidence level 0.99; we also denote by ρ*_tk the "oracle" lower bound on this quantity. Figure 1 displays the dependence of ρ_tk (left plots) and of the ratio ρ_tk/ρ*_tk (right plots) on k (horizontal axis) for different activation geometries (pulses, jumps up, and steps). We display these data only for the pairs t, k with finite ρ*_tk; recall that ρ*_tk = ∞ means that with the upper bound R = 10⁴ on the uniform norm of a feasible input, even the ideal inference does not allow us to detect 0.99-reliably an activation of shape k at time t.
Our experiment shows that ρ*_tk is finite in the domain {(t, k) : 4 ≤ t ≤ d, 3 ≤ k ≤ t}. The restriction k ≤ t is quite natural: we cannot detect a signal of shape k before the corresponding activation starts. Note that signals of shapes k = 1, 2 are "undetectable," and that no signal inputs can be detected at time t = 3, seemingly due to the fact that the activation can be completely masked by the initial conditions in the case of an "early" activation and/or a short observation horizon. Our experiment shows that this phenomenon affects equally the inference routines from Sections 3.2 and 3.3 and the ideal detection, and disappears when the initial conditions for (3.23) are set to 0 and our inferences are adjusted to this a priori information.
The data in Figure 1 show that the "non-optimality ratios" ρ_tk/ρ*_tk of the proposed inferences as compared to the ideal detectors are quite moderate: they never exceed 1.34. This is not bad at all, especially taking into account that the ideal detection assumes a priori knowledge of the activation shape (position).

Extension: union-type nuisance
So far, we have considered the case of a single nuisance hypothesis and multiple signal alternatives. The proposed approach can be easily extended to the case of multiple nuisance hypotheses, namely, to the situation differing from the one described in Section 3.1 in exactly one point: instead of assuming that nuisances belong to a closed convex set N ⊂ X, we assume that nuisance inputs run through the union ∪_{m=1}^M N_m of given closed convex sets N_m ⊂ X, with 0 ∈ N_m for all m. The implied modifications of our constructions and results are as follows.
The sets N_m, 1 ≤ m ≤ M, give rise to the parametric families (3.25), so that K_t(κ) is nondecreasing and continuous from the left. At time instant t we act as follows:
1. We define the quantity κ_t. Clearly, κ_t is well defined and takes values in (0, 1), since K_t(κ) is continuous from the left.
2. Invoking Lemma 3.1, there exists (and can be rapidly approximated to high accuracy by bisection) the threshold ρ_tk. Given ρ_tk, we define the affine detectors φ^m_tk, where (h_tkm; θ^1_tkm, θ^2_tkm, Θ_tkm) is a solution to the saddle point problem (3.24) with ρ = ρ_tk. For the remaining k, we set ρ_tk = +∞.
3. Finally, we process the observation y^t at step t as follows:
• if there exists k such that ρ_tk < ∞ and φ^m_tk(y^t) < α_t for all m ≤ M, we claim that the observed input is a signal, and terminate;
• otherwise, we claim that so far, the nuisance hypothesis is not rejected, and pass to the next time instant t + 1 (when t < d) or terminate (when t = d).
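The only change in the decision rule, relative to the single-nuisance case, is that the signal conclusion for shape k now requires all M detectors associated with k to fire simultaneously. A minimal sketch (illustrative containers, not the paper's notation):

```python
import math

def union_nuisance_decision(y_t, phi, rho_t, alpha_t):
    """Decision at step t with M nuisance sets: phi[k][m] plays the role of the
    detector phi^m_tk; the input is declared a signal iff some k with finite
    rho_t[k] has phi[k][m](y_t) < alpha_t for ALL m <= M."""
    for k, detectors_k in enumerate(phi):
        if rho_t[k] < math.inf and all(d(y_t) < alpha_t for d in detectors_k):
            return ('signal', k)
    return ('nuisance so far', None)

# toy example with one shape (k = 0) and M = 2 nuisance sets
phi = [[lambda y: y - 1.0, lambda y: y - 2.0]]
```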
The performance of the inference policy we have described is given by the following analogue of Proposition 3.2: the quantities ρ^m*_tk are well defined (for the "lower bound interpretation" of these quantities, see the comments after (3.20)). Then for every χ satisfying the analogue of (3.21) and every feasible signal input of shape k and magnitude ≥ χ max_{m≤M} ρ^m*_tk, the probability of termination with the signal conclusion at time t is ≥ 1 − ε.
The proof of Proposition 3.5 is given by a straightforward modification of the proofs of Propositions 3.2 and 3.4.

Outline
In Section 3, we were interested in deciding, as early as possible, upon hypotheses about the input x underlying observations (2.2) in the situation where both signals and nuisances formed finite unions of convex sets. Solving this problem was reduced to decisions on pairs of convex hypotheses, namely, those stating that the expectation of a (sub-)Gaussian random vector with partly known covariance matrix belongs to the union of convex sets associated with the hypotheses, and we could make decisions by looking at the (signs of) properly built affine detectors, that is, affine functions of observations. Now we intend to address the case when the signals (or nuisances) are specified by non-convex restrictions, such as "u belongs to a given linear subspace and has Euclidean norm at least ρ > 0." This natural setting is difficult to capture via convex hypotheses: in such an attempt, we are supposed to "approximate" the restriction "the ‖·‖₂-norm of vector x is ≥ ρ" by a union of convex hypotheses like "the i-th entry in x is ≥ ρ′"/"the i-th entry in x is ≤ −ρ′"; the number of these hypotheses grows with the input's dimension, and the "quality of approximation," whatever be its definition, deteriorates as the dimension grows.
In this situation, a natural way to proceed is to look at "quadratic liftings" of inputs and observations. Specifically, given a vector w of dimension m, we associate with it its "quadratic lifting," the symmetric (m + 1) × (m + 1) matrix Z(w) = [w; 1][w; 1]^T. Observe that restrictions on w expressed by linear and quadratic constraints induce linear restrictions on Z(w). Second, given a noisy observation y^t = Ā_t x + ξ^t of the signal x, the quadratic lifting Z(y^t) can be thought of as a noisy observation of an affine image of the lifted input Z(x) (here and in what follows, the empty block refers to the null matrix). As a result, roughly speaking, linear and quadratic constraints on the input translate into linear constraints on the expectation of the "lifted observation" Z(y^t), and different hypotheses on the input, expressed by linear and quadratic constraints, give rise to convex hypotheses on Z(y^t). Then, in order to decide on the resulting convex hypotheses, we can use detectors affine in Z(y^t), that is, quadratic in y^t, and this is what we intend to do.
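The key mechanism, linear and quadratic constraints on w becoming linear constraints on Z(w), is worth seeing numerically. In the sketch below the bordered matrix Q_a encoding a linear form is our illustrative construction:

```python
import numpy as np

def Z(w):
    """Quadratic lifting: Z(w) = [w; 1][w; 1]^T, a symmetric (m+1)x(m+1) matrix."""
    v = np.append(w, 1.0)
    return np.outer(v, v)

w = np.array([3.0, -4.0])
Zw = Z(w)

# a quadratic constraint on w is a LINEAR constraint on Z(w):
# ||w||_2^2 = Tr(Q Z(w)) with Q = diag(1, ..., 1, 0)
Q = np.diag([1.0, 1.0, 0.0])
norm_sq = np.trace(Q @ Zw)           # equals 3^2 + (-4)^2 = 25

# a linear form a^T w is Tr(Q_a Z(w)) with Q_a carrying a/2 in the border
# (zero NW block, zero SE entry)
a = np.array([1.0, 2.0])
Q_a = np.zeros((3, 3))
Q_a[:2, 2] = a / 2
Q_a[2, :2] = a / 2
lin_form = np.trace(Q_a @ Zw)        # equals a^T w = 3 - 8 = -5
```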

Gaussian case
In the sequel, the following result (which is a slightly modified concatenation of Propositions 3.1 and 5.1 of [17]) is used: (i) Let 𝒰 be a convex compact set contained in the interior of the cone S^ν_+ of positive semidefinite ν × ν matrices in the space S^ν of symmetric ν × ν matrices. Let Θ_* ∈ S^ν_+ be such that Θ_* ⪰ Θ for all Θ ∈ 𝒰, and let δ ∈ [0, 2] be such that (4.1) holds, where ‖·‖ is the spectral norm. Finally, let γ ∈ (0, 1), let A be a ν × (n + 1) matrix, let Z be a nonempty convex compact subset of the set Z_+ = {Z ∈ S^{n+1}_+ : Z_{n+1,n+1} = 1}, and let φ_Z(Y) be the support function of Z. These data specify the closed convex set (4.6). That is, the risk, as defined in item 5 of Section 1.2, of the detector φ_* on the families in question is as stated. For the proof, see [17]; for the reader's convenience, we reproduce the proof in Section A.5. The justification for the remark below can be found in Appendix A.6.
with diagonal matrices Q^χ_j, and these sets intersect the interior of the positive semidefinite cone S^{ν+1}_+. In this case, the convex-concave saddle point problem (4.7) admits a saddle point (h_*, H_*; Θ^1_*, Θ^2_*) with h_* = 0 and H_* diagonal, and restricting h to be zero and H to be diagonal drastically reduces the design dimension of the saddle point problem.

Sub-Gaussian case
The sub-Gaussian version of Proposition 4.1 is as follows:

Proposition 4.2.
(i) Let 𝒰 be a convex compact set contained in the interior of the cone S^ν_+ of positive semidefinite ν × ν matrices in the space S^ν of symmetric ν × ν matrices, let Θ_* ∈ S^ν_+ be such that Θ_* ⪰ Θ for all Θ ∈ 𝒰, and let δ ∈ [0, 2] be such that (4.1) holds true. Finally, let γ, γ_+ be such that 0 < γ < γ_+ < 1, let A be a ν × (n + 1) matrix, let Z be a nonempty convex compact subset of the set Z_+ = {Z ∈ S^{n+1}_+ : Z_{n+1,n+1} = 1}, and let φ_Z(Y) be the support function of Z, see (4.2). These data specify the closed convex sets (4.11); the resulting detector, when applied to the families of sub-Gaussian distributions SG_χ, χ = 1, 2, has the corresponding risk. Similarly, the convex minimization problem (4.13) is solvable, and the quadratic detector induced by its optimal solution (h_*, H_*), when applied to the families of sub-Gaussian distributions SG_χ, χ = 1, 2, has the corresponding risk, so that for the just defined φ_* relation (4.13) takes place.
Remark 4.2. Proposition 4.2 offers two options for building quadratic detectors for the families SG_1, SG_2: those based on the saddle point of (4.12) and on the optimal solution to (4.13). Inspecting the proof, one sees that the number of options can be increased to four: we can replace any one of the functions Φ^δχ_{Aχ,Zχ}, χ = 1, 2 (or both these functions simultaneously) with Φ_{Aχ,Zχ}. The second of the original two options is exactly what we get when replacing both Φ^δχ_{Aχ,Zχ}, χ = 1, 2, with Φ_{Aχ,Zχ}. It is easily seen that, depending on the data, each of these four options can result in the smallest risk bound. Thus, it makes sense to keep all of them in mind and to use the one which, under the circumstances, results in the best risk bound. Note that the risk bounds are efficiently computable, so identifying the best option is easy.

Setup
We continue to consider the situation described in Section 2, but with different specifications of noise and of nuisance and signal inputs, as compared to Section 3.1.
We define nuisance and signal inputs as follows. 1. Admissible inputs, nuisance and signal alike, belong to a bounded set X ⊂ R^n containing the origin and cut off R^n by a system of quadratic inequalities, where the Q_i are (n+1) × (n+1) symmetric matrices. We assume w.l.o.g. that the first constraint defining X is ‖x‖_2^2 ≤ R^2, that is, Q_1 is the diagonal matrix with diagonal 1, ..., 1, 0, and q_1 = R^2. We set (4.15), so that 𝒳 is a convex compact set in S^{n+1}_+, and Z(x) ∈ 𝒳 for all x ∈ X. 2. The set N of nuisance inputs contains the origin and is cut off X by a system of quadratic inequalities, so that N = {x ∈ R^n : Tr(Q_i Z(x)) ≤ q_i, 1 ≤ i ≤ I_+}, I_+ > I. (4.16) We set (4.17), so that 𝒩 ⊂ 𝒳 is a convex compact set in S^{n+1}_+, and Z(x) ∈ 𝒩 for all x ∈ N. 3. Signals belonging to X are of different shapes and magnitudes, a signal of shape k, 1 ≤ k ≤ K, and magnitude ≥ 1 being defined as a vector from a set W_k given by two types of quadratic constraints: • constraints of type A: b_ik ≤ 0, the symmetric matrices Q_ik have zero North-West (NW) block of size n × n and zero South-East (SE) diagonal entry; these constraints are just linear constraints on x; • constraints of type B: b_ik ≤ 0, the only nonzeros in Q_ik are in the NW block of size n × n.
We denote the sets of indices i of constraints of these two types by I^A_k and I^B_k and assume that at least one of the right hand sides b_ik is strictly negative, implying that W_k is at a positive distance from the origin. We define a signal of shape k and magnitude ≥ ρ > 0 as a vector from the set W^ρ_k = ρW_k. We set the corresponding lifted sets 𝒲^ρ_k, ensuring that Z(x) ∈ 𝒲^ρ_k whenever x ∈ W^ρ_k. Note that the sets W^ρ_k shrink as ρ > 0 grows, due to b_ik ≤ 0. We assume that for small ρ > 0 the sets W^ρ_k ∩ X are nonempty (this is definitely the case when some signals of shape k and positive magnitude are admissible inputs; otherwise signals of shape k are of no interest in our context, and we can ignore them). Since X is compact and some of the b_ik are negative, the sets W^ρ_k are empty for large enough values of ρ. As a byproduct of the compactness of X, it is immediately seen that there exists R_k ∈ (0, ∞) such that W^ρ_k is nonempty when ρ ≤ R_k and is empty when ρ > R_k.
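Since nonemptiness of W^ρ_k ∩ X is monotone in ρ (nonempty for ρ ≤ R_k, empty beyond), the threshold R_k can be located by bisection once a convex-feasibility oracle for the lifted constraints is available. The sketch below and its names are ours, not the paper's; `is_nonempty` stands for such an oracle:

```python
def largest_feasible_rho(is_nonempty, rho_hi, tol=1e-6):
    """Locate R_k by bisection.

    is_nonempty(rho) reports whether W_k^rho (intersected with X) is
    nonempty; this property is monotone in rho.  rho_hi must be an
    a priori upper bound on R_k.
    """
    lo, hi = 0.0, rho_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_nonempty(mid):
            lo = mid   # still feasible: R_k lies to the right
        else:
            hi = mid   # infeasible: R_k lies to the left
    return lo
```

In practice, `is_nonempty(rho)` would solve the convex feasibility problem defined by the constraints cutting W^ρ_k ∩ X (e.g., a semidefinite feasibility problem over the quadratic lift).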

Change detection via quadratic detectors, Gaussian case
In this section, we consider the situation of Section 2, assuming the noise ξ d in (2.1) to be zero mean Gaussian: ξ d ∼ N (0, Θ).

Preliminaries
Given t ≤ d, let us set the quantities below, so that the observation y_t ∈ R^{ν_t} at time t is Gaussian with expectation A_t[x; 1] and covariance matrix Θ belonging to the convex compact subset U_t of the interior of the positive semidefinite cone S^{ν_t}_+, see (2.2), (2.3). We fix γ ∈ (0, 1) and Θ_{*,d} ∈ S^{ν_d}_+ such that Θ_{*,d} ⪰ Θ for all Θ ∈ U_d. For 1 ≤ t ≤ d, we set Θ_{*,t} = S_t Θ_{*,d} S_t^T, so that Θ_{*,t} ≻ 0 satisfies Θ_{*,t} ⪰ Θ for all Θ ∈ U_t. Further, we specify reals δ_t ∈ [0, 2] for which the analogue of (4.1) holds, and, given t, k, and ρ ∈ (0, R_k], we set Z^ρ_k = W^ρ_k ∩ X. Invoking Proposition 4.1, we obtain the following

This saddle point problem is solvable, and a saddle point (h_*, H_*) of it induces a quadratic detector such that, when applied to the observation y_t = Ā_t x + ξ_t, see (2.2), we have:

(ii) whenever x ∈ X is a signal of shape k and magnitude ≥ ρ,
Our change detection procedure is as follows: at step t = 1, 2, ..., d, given the observation y_t, we look at all values k ≤ K for which ρ_tk < ∞. If k is such that ρ_tk < ∞, we check whether φ_{tkρ_tk}(y_t) < α. If this is the case, we terminate with a signal conclusion. If φ_{tkρ_tk}(y_t) ≥ α for all k with ρ_tk < ∞, we claim that so far the nuisance hypothesis seems to be valid, and pass to time t + 1 (if t < d) or terminate (if t = d).
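The step-t decision rule just described can be sketched in a few lines; the detector functions and finite thresholds below are hypothetical stand-ins for the φ_{tkρ_tk} and ρ_tk built by the convex programs above (names ours):

```python
def change_detection_step(y_t, detectors, rho, alpha):
    """One step of the sequential procedure at time t.

    detectors[k] plays the role of the quadratic detector phi_{t,k,rho_tk};
    rho[k] plays the role of rho_tk (float('inf') when no detectable
    magnitude exists); alpha is the acceptance threshold.
    """
    for k, phi in detectors.items():
        # only shapes k with finite rho_tk are inspected
        if rho[k] < float("inf") and phi(y_t) < alpha:
            return "signal"      # terminate with a signal conclusion
    # nuisance hypothesis retained; pass to time t+1 (or stop if t == d)
    return "nuisance"
```

In the actual procedure, a "signal" outcome stops the observation process, while "nuisance" passes control to the next time instant.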

Proposition 4.3.
Let the input x ∈ X be observed according to (2.2), and let the observation noise ξ^d be Gaussian with zero mean and covariance matrix Θ ∈ U_d. Then • if x is a nuisance, the probability for the above detection procedure to terminate with a signal conclusion is at most ε; • if x is a signal of shape k and magnitude ≥ ρ > 0, and t ≤ d is such that ρ_tk ≤ ρ, then the probability for the detection procedure to terminate with a signal conclusion at time t or earlier is at least 1 − ε.

Numerical illustration
Here we report on a preliminary numerical experiment with the proposed detection procedure via quadratic detectors.
The observation scheme we deal with is the following: z_t, x_t, and w_t are, respectively, the states, the inputs, and the outputs of a linear dynamical system, of dimensions n_z, n_x, n_w, and the ξ_t are standard Gaussian noises independent across t. In order to account for the initial state and to make the expectations of observations known linear functions of the inputs, we define, same as in Section 3.5, E_t as the linear subspace of R^{n_w t} comprised of all collections of accumulated outputs [w_1; ...; w_t] of the zero-input system z_s = Az_{s−1}, w_s = Cz_s, and define our (accumulated) observation y_t at time t as the projection of the accumulated outputs onto the orthogonal complement E^⊥_t of E_t. We represent this projection by the vector y_t of its coordinates in an orthonormal basis of E^⊥_t and set ν_t = dim E^⊥_t. Note that in this case the corresponding noises ξ_t, 1 ≤ t ≤ d, see (2.2), are standard Gaussian of dimension ν_t (as projections of standard Gaussian vectors), so that we are in the situation U_t = {I_{ν_t}}, see (2.3). Therefore we can set Θ_{*,t} = I_{ν_t} and δ_t = 0, see Section 4.4.1.
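The construction of y_t from the accumulated outputs can be sketched as follows; the function name and the SVD route to an orthonormal basis of E^⊥_t are ours, assuming the system matrices A, C of the zero-input recursion z_s = Az_{s−1}, w_s = Cz_s:

```python
import numpy as np

def accumulated_projection(w_acc, A, C, t):
    """Project accumulated outputs [w_1;...;w_t] onto the orthogonal
    complement of E_t, the subspace of zero-input trajectories
    w_s = C A^s z_0.  Returns the coordinate vector y_t in an
    orthonormal basis of E_t^perp."""
    # columns of B span E_t: stacked C A^s, s = 1..t
    B = np.vstack([C @ np.linalg.matrix_power(A, s) for s in range(1, t + 1)])
    # full SVD: leading left singular vectors span E_t, the rest span E_t^perp
    U, S, _ = np.linalg.svd(B, full_matrices=True)
    rank = int(np.sum(S > 1e-10))
    basis_perp = U[:, rank:]            # orthonormal basis of E_t^perp
    return basis_perp.T @ w_acc
```

Since the columns of `basis_perp` are orthonormal, a standard Gaussian noise on the accumulated outputs stays standard Gaussian (of dimension ν_t) after this projection, as noted above.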
We define the admissible nuisance and signal inputs as follows: • the admissible inputs x = [x_1; ...; x_d], x_t ∈ R^{n_x}, are those with ‖x‖_2 ≤ R (we set R = 10^4); • the only nuisance input is x = 0 ∈ R^n, n = n_x d; • there are K = d signal shapes, a signal of shape k and magnitude ≥ 1 being a vector of the form x = [0; ...; 0; x_k; x_{k+1}; ...; x_d] with ‖x_k‖_2 ≥ 1 ("a signal of shape k and magnitude ≥ 1 starts at time k with a block x_k of energy ≥ 1"). We consider three different types of signal behavior after time k: • free jump: x_{k+1}, ..., x_d may be arbitrary.
The description of the matrix Ā_t arising in (2.2) is self-evident, and so is the description, required in Section 4.3, of the nuisance set N by quadratic constraints imposed on the quadratic lifting of an input. The corresponding descriptions of signals of shape k and magnitude ≥ 1 are as follows: • pulse: Q_1k is the diagonal (n+1) × (n+1) matrix whose only nonzero diagonal entries, equal to −1, are in positions (i, i), i ∈ J_k := {i : (k−1)n_x + 1 ≤ i ≤ kn_x}, and b_1k = −1. The constraint Tr(Q_1k Z(x)) ≤ b_1k says exactly that ‖x_k‖_2^2 ≥ 1. The remaining constraints are homogeneous and express the facts that the entries of Z(x) with indices (i, n+1), i ≤ n, except for those with i ∈ J_k, are zeros (easily expressed by homogeneous constraints of type A), and that the entries of Z(x) with indices (i, j), i ≤ j ≤ n, except for those with i, j ∈ J_k, are zeros (easily expressed by homogeneous constraints of type B); • step: Q_1k and b_1k are exactly as above, and the remaining constraints are homogeneous as well. We obtain a discrete-time system whose output u_t is observed in standard Gaussian noise at times t = 1, 2, ..., d. Our time horizon was d = 8, and the required probability of false alarm was ε = 0.01. The results of the experiments are presented in Table 1; the cells (t, k) with k > t are blank, because signals of shape k > t start after time t and are therefore "completely invisible" at this time. Along with the quantity ρ_tk, the magnitude of the signal of shape k which makes it detectable with probability 1 − ε = 0.99 at time t (the first number in a cell), we present the "non-optimality index" (the second number in a cell), defined as follows. Given t and k, we compute the largest ρ = ρ*_tk such that, for a signal x_tk of shape k and magnitude ≥ ρ, the ‖·‖_2-norm of Ā_t x, see (2.2), is at most 2 ErfInv(ε).
The latter implies that if all we need to decide at time t is whether the input is the signal θx_tk with θ < 1, or is identically zero, a (1 − ε)-reliable decision would be impossible. Since θ can be made arbitrarily close to 1, ρ*_tk is a lower bound on the magnitude of a signal of shape k which can be detected (1 − ε)-reliably by a procedure utilizing the observation y_t (cf. Section 3.4). The non-optimality index reported in the table is the ratio ρ_tk/ρ*_tk. Note that the computed values of this ratio are neither close to one (which is bad news for us) nor "disastrously large" (which is good news). In this respect it should be mentioned that the ρ*_tk are overly optimistic estimates of the performance of an "ideal" change detection routine.
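Reading Erf as the standard Gaussian tail probability (an assumption on the paper's notation), ErfInv(ε) is the (1 − ε)-quantile of N(0, 1), and the lower-bound threshold 2 ErfInv(ε) is easy to evaluate with the standard library; a minimal sketch (function name ours):

```python
from statistics import NormalDist

def rho_star_threshold(eps):
    """2 * ErfInv(eps), with Erf read as the standard Gaussian tail
    probability: when the distance ||A_t x||_2 between the two Gaussian
    means stays below this value, no test can decide between "u = 0" and
    "u is the signal" with both error probabilities <= eps."""
    return 2.0 * NormalDist().inv_cdf(1.0 - eps)
```

With ε = 0.01 as in the experiment, the threshold is about 4.65 standard deviations.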

Change detection via quadratic detectors, sub-Gaussian case
Using Proposition 4.2 in the role of Proposition 4.1, the constructions and the results of Section 4.4 can be easily adjusted to the situation where the noise ξ^d in (2.1) is zero mean sub-Gaussian, ξ^d ∼ SG(0, Θ), rather than Gaussian. In fact, there are two options for such an adjustment, based on the quadratic detectors yielded by the saddle point problem (4.12) and the convex minimization problem (4.13), respectively. To save space, we restrict ourselves to the first option; utilizing the second option is completely similar.
The only modification of the contents of Section 4.4 needed to pass from Gaussian to sub-Gaussian observation noise is the redefinition of the functions Φ_t(h, H; Θ) and Φ_tkρ(h, H; Θ) introduced in Section 4.4.1. With this redefinition, all results of Section 4.4 (with "let the observation noise ξ^d be Gaussian with zero mean and covariance matrix Θ ∈ U_d" replaced with "let the observation noise ξ^d be sub-Gaussian with zero mean and matrix parameter Θ ∈ U_d") remain intact.

Situation
In this Section, we present an example motivated by material science applications, in which one aims to detect the onset of a rust signal in a piece of metal from a sequence of noisy images. In general, this setup can be used to detect degradation in systems of a similar nature.
The rust signal occurs at some time, and its energy grows in the subsequent images. This can be modeled as follows. At times t = 0, 1, ..., d, we observe vectors where • y is a fixed deterministic "background," • x_t is a deterministic spot, which may correspond to a rust signal at time t, and • the ξ_t are zero mean Gaussian observation noises, independent across t, with covariance matrices Σ_t.
We assume that x 0 = 0, and our ideal goal is to decide on the nuisance hypothesis x t = 0, 1 ≤ t ≤ d, versus the alternative that the input ("signal") x = [x 1 ; ...; x d ] is of some shape and some positive magnitude. We specify the shape and the magnitude below.

Assumptions on observation noise
Assume that the observation noise covariance matrices Σ_t are, for all t, known to belong to a given convex compact subset Ξ of the interior of the positive semidefinite cone S^ν_+. We allow the following two scenarios: C.1: Σ_t = Σ ∈ Ξ for all t; C.2: Σ_t can vary with t, but stays in Ξ at all times.

Assumptions on spots
We specify signals x = [x_1; ...; x_d] by shape k ∈ {1, ..., K}, K = d, and magnitude ρ > 0. Namely, a signal x = [x_1; ...; x_d] of shape k and magnitude ≥ ρ > 0 "starts" at time k, meaning that x_t = 0 when t < k. After the "change" happens, the signal satisfies (5.2), where the α_t,s are given nonnegative coefficients responsible for the dynamics of the energies ‖x_t‖_2^2: • with α_t,s ≡ 0, we get an "occasional spot" of magnitude ≥ ρ and shape k: x_t = 0 for t < k, the energy of x_k is at least ρ, and there are no restrictions on the energy of x_t for t > k; • with α_t,1 = λ_t ≥ 0 and α_t,s = 0 when s > 1, we get x_t = 0 for t < k and prescribed energy dynamics afterwards; in other words, the energy of the signal of shape k increases or decreases in a prescribed way after the instant k.
2. Setting α_t,s ≡ 0, we get signals of shape k with x_t = 0 for t < k and energies satisfying ‖x_t‖_2^2 ≥ ρ p(t − k + 1) for t ≥ k. On top of (5.2), we impose on a signal x of shape k and magnitude ≥ ρ a (perhaps empty) system of linear constraints (5.3) with c_k ≤ 0.

Processing the situation: formulation
Let us treat as our observation at time t, t = 1, ..., d, the vector y_t with blocks y_i − y_0, 1 ≤ i ≤ t, arriving at the observation scheme where • Ā_d is the unit matrix of size n = νd, and S_t is the natural projection of R^n onto the space of the first ν_t = νt coordinates; • ξ^d ∼ N(0, Θ), where Θ is a positive semidefinite d × d block matrix with ν × ν blocks Θ_tτ, 1 ≤ t, τ ≤ d. We can easily translate the a priori information on Σ_s, 0 ≤ s ≤ d, described in Section 5.1.1, into a convex compact subset U of the interior of S^{νd}_+ such that Θ always belongs to U. We now cast the above "spot detection" problem into the setup of Section 4.3 as follows. We set Z(x) = [x; 1][x; 1]^T.
1. We assume that the magnitudes of all entries of a meaningful input are bounded by R, for a given R > 0, and put the corresponding constraints, where e_i is the ith canonical basis vector in R^{n+1}. We further set I = n and (cf. (4.15)) proceed accordingly. 2. In our current situation, the nuisance set N is the origin. To represent this set in the form (4.16), it suffices to set I_+ = I + 1 = n + 1, q_{n+1} = 0, and to take, as Q_{n+1}, the (n+1) × (n+1) diagonal matrix with diagonal entries 1, ..., 1, 0. We put (cf. (4.17)) the corresponding lift. 3. The sets W_k of signals of shape k and magnitude ≥ 1, as described in Section 5.1.2, are given by quadratic constraints on x = [x_1; ...; x_d]: • linear constraints on the traces of diagonal blocks of Z(x); in the terminology of Section 4.3, these are type B constraints; • in addition, the linear constraints C_k x ≤ c_k defined in (5.3) map to linear constraints on the first n entries of the last column of Z(x); all these constraints are of type A (recall that c_k ≤ 0).
Observe that among the right hand sides of the constraints (5.6) there is a (−1), implying that all W k are at a positive distance from the origin.
Finally, we put W^ρ_k = ρW_k and convert these sets, as described in Section 4.3, into their quadratic lifts, that is, into sets of matrices containing Z(x) whenever x ∈ W^ρ_k. Note that with our X, all sets W^ρ_k with small enough positive ρ do intersect X. We have thus covered the problem posed in Section 5.1 by the setup of Section 4.3 and, consequently, can apply the machinery of Section 4.4 to process the problem.

Processing the situation: computation
A computational issue related to this approach stems from the fact that in our intended application y and x t are images, implying that ν = dim y = dim x t can be in the range of tens of thousands. This would make our approach completely unrealistic computationally, unless we can "kill" the huge dimensions of the arising convex programs. We are about to demonstrate that under some meaningful structural assumptions this indeed can be done. These assumptions, in their simplest version, are as follows: 1. Matrices Σ t , 0 ≤ t ≤ d, are equal to each other and are of the form θσ 2 I ν , with known σ > 0 and known range [ϑ, 1] of the factor θ, with ϑ ∈ (0, 1]. 2. The only restrictions on the activation signal, apart from the componentwise boundedness, are energy constraints in (5.2) (e.g., linear constraints as in (5.3) are not allowed).

1) We deal with observations
where (a) y^t and x^t are block vectors with t blocks, y_i and x_i, respectively, the dimension of every block being ν, and (b) the noise covariance matrix runs through the corresponding set as the parameter θ runs through [ϑ, 1] (cf. (5.5)). In other words, denoting by J_t the t × t matrix with diagonal entries equal to 2 and off-diagonal entries equal to 1, we have U_t = {θσ^2 J_t ⊗ I_ν : θ ∈ [ϑ, 1]}. It is immediately seen that U_t has a ⪰-largest element, specifically, the matrix Θ_{*,t} = σ^2 J_t ⊗ I_ν.
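Materializing Θ_{*,t} = σ² J_t ⊗ I_ν is immediate; a minimal sketch (the function name is ours):

```python
import numpy as np

def theta_star(t, nu, sigma):
    """Largest element of U_t: sigma^2 * (J_t kron I_nu), where J_t has
    diagonal entries 2 and off-diagonal entries 1; this is the covariance
    of the differences y_i - y_0 when every frame has covariance
    sigma^2 * I_nu."""
    J = np.ones((t, t)) + np.eye(t)          # diagonal 2, off-diagonal 1
    return sigma ** 2 * np.kron(J, np.eye(nu))
```

The Kronecker structure is exactly what the symmetry argument below exploits: the design variable H can be restricted to the form G ⊗ I_ν, so the sizes of the optimization problems do not grow with ν.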
2) We specify the set Z_tkρ ⊂ S^{νt+1}_+ as follows: Z_tkρ = {Z ∈ S^{νt+1}_+ : Z_{νt+1,νt+1} = 1, Tr(Z Diag{D_tks, 0}) ≤ ρ d_tks, 1 ≤ s ≤ S_tk}, where D_tks = D̄_tks ⊗ I_ν with diagonal t × t matrices D̄_tks readily given by the coefficients in (5.6). Now we are in the situation where the functions Φ_t and Φ_tkρ from Section 4.4.1 take the corresponding form, the saddle point problem SP(t, k, ρ) reads (5.9), and NW(Q) is the North-Western νt × νt block of the (νt+1) × (νt+1) matrix Q. Note that the saddle point problem in (5.9) has symmetry; specifically, if D = I_t ⊗ P with a matrix P obtained from a ν × ν permutation matrix by replacing some of its unit entries with their negatives, then, as is immediately seen from (5.9), it holds that Ψ(D^T HD) = Ψ(H). As a result, (5.9) has a saddle point with H = D^T HD for all indicated D's, or, which is the same, with H = G ⊗ I_ν for some symmetric t × t matrix G. Specifying G reduces to solving a saddle point problem of sizes not affected by ν. Remark 5.1. Our approach is aimed at processing the situation where the magnitude of a spot is quantified by its energy. When y_t represents an image with ν pixels, this model makes sense if changes in the image are more or less spatially uniform, so that a "typical spot of magnitude 1" means a small (of order of σ) change in brightness of a significant fraction of the pixels (i.e., we are in the case of dense alternatives, in the terminology of [15]). We can also easily process the model where a "typical spot of magnitude 1" means large (of order of 1) changes in brightness of just a few pixels (in the terminology of [15], this is the case of sparse alternatives). In the latter situation, we do not need the quadratic lift: we can model the set of "spots of shape k and magnitude ≥ ρ > 0" as the union of two convex sets, one where the k-th entry of the spot is ≥ ρ, and the other where this entry is ≤ −ρ. In this model, all we need are affine detectors.

Real-data example
In this Section, we consider a sequence of metal corrosion images captured using bright-field transmission electron microscopy. We downsize each image to 308-by-308 pixels. There are 23 gray images (frames) in the sequence, at 2 frames per second; hence the sequence corresponds to 11.5 seconds of the original video. At some point, a corrosion spot initiates in the image sequence. Sample images from the sequence are illustrated in Fig. 2.
The dynamics of the signal model, in terms of the definition in (5.2), has the following parameters: α_t,1 = 1 and α_t,s = 0 for s > 1; p(1) = 1 and p(s) = 0 for s > 1; ρ is about 1.2 × 10^2, estimated from the real data.
In the example, we set the risk tolerance ε = 0.1 and let ϑ = 0.5. To evaluate detection performance, we run 3000 Monte Carlo trials, adding zero mean Gaussian noise (with variance 25) to the images. To estimate the noise variance σ², we use the empirical estimate obtained from the first 5 noisy images in the sequence (we assume they do not contain a rust spot); the resulting estimate is 25.
Since the rust signal is local (when it occurs, a cluster of pixels captures the rust), we apply our detector in the following scheme: break each image into (rectangular or square) patches of equal size; design a quadratic detector as described above for each patch; then, at each time, whenever some patch detects a change, we claim there has been a change. This corresponds to a "multi-sensor" scheme in which the local detection statistics are combined by taking their maximum.
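The patch-wise ("multi-sensor") aggregation can be sketched as follows, with a hypothetical per-patch statistic standing in for the quadratic detector (names ours):

```python
import numpy as np

def patch_max_statistic(image, patch, detect_stat):
    """Split an image into non-overlapping patch x patch blocks and return
    the maximum of the per-patch detection statistics; a change is declared
    when any patch fires, i.e., local statistics are combined by their max.
    detect_stat maps a flattened patch to a scalar (a placeholder for the
    quadratic detector output)."""
    H, W = image.shape
    stats = [detect_stat(image[i:i + patch, j:j + patch].ravel())
             for i in range(0, H - patch + 1, patch)
             for j in range(0, W - patch + 1, patch)]
    return max(stats)
```

Comparing the returned maximum against a calibrated threshold then yields the patch-level change/no-change decision at each time instant.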
We compare our quadratic detector to the "sliding window" (Sl-W) detector developed in [18, 12] and defined as follows. Given a "window width" h ∈ {1, 2, ...} and denoting by y_tj the vector of observations at time t in patch j, we build the left and the right estimates, ȳ^ℓ_tj(h) and ȳ^r_tj(h), of y_tj. At time t, Sl-W always accepts the nuisance hypothesis when t ≤ 2h − 2; when t ≥ 2h − 1, the nuisance hypothesis is accepted if, for every patch j = 1, ..., N, the corresponding statistic does not exceed the threshold, and is rejected otherwise. In our experiments, h = 2 and h = 3 were used. The corresponding thresholds κ are computed using Monte Carlo simulation, see [12] for details. Simulation results are presented in Table 2. While the performance of Sl-W with properly selected h and number of patches N is quite good, the quadratic detector is a clear winner in terms of reliability (zero empirical probabilities of a false alarm and a miss), and with N = 49 there is no delay in detecting the change.
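Under our reading of [18, 12], the Sl-W statistic for a single patch compares the means of an h-wide window to the left and to the right of a scanned split point; the sketch below takes the plain distance between window means, which is an assumption, since the exact window placement, normalization, and threshold calibration are specified in [12]:

```python
import numpy as np

def slw_statistic(ys, t, h):
    """Sliding-window statistic for one patch from the first t observations
    ys[0..t-1]: max over split positions of the distance between the mean
    of the h observations left of the split and the h observations right
    of it (a sketch; [12] gives the exact normalization)."""
    best = 0.0
    for s in range(h, t - h + 1):             # split after observation s
        left = np.mean(ys[s - h:s], axis=0)    # left window mean
        right = np.mean(ys[s:s + h], axis=0)   # right window mean
        best = max(best, float(np.linalg.norm(left - right)))
    return best
```

The nuisance hypothesis is retained while this statistic stays below the Monte Carlo calibrated threshold κ in every patch.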
This implies that the probability of coming to the signal conclusion at step t (this conclusion is made only when L_t(δ_t) > 0 and φ_tk(y_t) < α for some k ∈ L_t(δ_t)) is at most L_t(δ_t) · (ε_t/L_t(δ_t)) = ε_t, as claimed.
(ii) Now assume that t and k are such that ρ_tk < ∞, and that the input belongs to X^ρ_k with ρ ≥ ρ_tk. Since X^ρ_k shrinks as ρ grows, the input in fact belongs to X^{ρ_tk}_k, and therefore the distribution P of the observation y_t belongs to G[U^t_{kρ_tk}, U_t]. Since, as was already explained, β as given by (3.18) satisfies β ≤ δ²_t, invoking (#) we conclude that for our P the inequality in (3.17.b) holds, that is, (3.14) holds, and, since Erf(·) is nonincreasing, the resulting probability does not exceed ε.
In other words, in the situation in question, the P-probability of terminating at time t with the signal conclusion (which is made when φ_tk(y_t) < α for some k with ρ_tk < ∞) is at least 1 − ε.
Invoking (A.6), we get SV_tk(R_k) < ln(κ_t) by (A.7); that is, recalling the construction of Section 3.2.3, ρ_tk is well defined and satisfies ρ_tk ∈ (0, R_k) with SV_tk(ρ_tk) = ln(κ_t). (A.8) Because SV_tk(ρ) is nonincreasing, we conclude from (A.4), (A.7), and the second relation in (A.8) that ρ̄ ≥ ρ_tk. Invoking item (ii) of Proposition 3.2, we conclude that if the input is a feasible signal with activation of shape k and magnitude at least ρ̄, the probability for the inference routine of Section 3.2.3 to terminate at time t with the signal conclusion is at least 1 − ε.

A.5. Proof of Proposition 4.1
Observe that when m, ν ∈ R^n and S ∈ S^n with ‖S‖ < 1, one has (A.9); indeed, a direct computation yields exactly (A.9).

A.7.1. Preliminaries
We start with the following result. The resulting inequality holds true for all small enough positive λ; taking the lim inf of the right-hand side as λ → +0 and recalling that Θ_0 = Θ, we get