Quickest Change Detection With Controlled Sensing

In the problem of quickest change detection, a change occurs at some unknown time in the distribution of a sequence of random vectors that are monitored in real time, and the goal is to detect this change as quickly as possible subject to a certain false alarm constraint. In this work we consider this problem in the presence of parametric uncertainty in the post-change regime and controlled sensing. That is, the post-change distribution contains an unknown parameter, and the distribution of each observation, before and after the change, is affected by a control action. In this context, in addition to a stopping rule that determines the time at which it is declared that the change has occurred, one also needs to determine a sequential control policy, which chooses the control action at each time based on the already collected observations. We formulate this problem mathematically using Lorden's minimax criterion, and assuming that there are finitely many possible actions and post-change parameter values. We then propose a specific procedure for this problem that employs an adaptive CuSum statistic in which (i) the estimate of the parameter is based on a fixed number of the most recent observations, and (ii) each action is selected to maximize the Kullback-Leibler divergence of the next observation based on the current parameter estimate, apart from a small number of exploration times. We show that this procedure, which we call the Windowed Chernoff-CuSum (WCC), is first-order asymptotically optimal under Lorden's minimax criterion, for every possible value of the unknown post-change parameter, as the mean time to false alarm goes to infinity. We also provide simulation results to illustrate the performance of the WCC procedure.

The problem of quickest change detection (QCD) arises in many branches of science and engineering. The observations of the system are assumed to undergo a change in distribution at the change-point, and the goal is to detect this change as soon as possible, subject to false alarm constraints. See [2], [3], [4], [5] for books and survey articles on the topic.
In this paper we study an interesting variant of the QCD problem described in [6], [7], which the authors call the "bandit" QCD problem. In this setting, the distribution of the observations is affected not only by the change but also by a control (action) variable that is chosen by the observer. As described in [7], a canonical example application for such a setting is in surveillance systems, in which sensors can be switched and steered (controlled) to look for targets in different locations in space, and only a subset of locations can be probed at any given time. The policy for controlling the sensors has to be designed jointly with the change detection algorithm to provide the best tradeoff between detection delay and false alarm rate. A number of other application contexts are described in [7].
On a fundamental level, the QCD problem in which the distribution of the observations is affected by control actions falls squarely within the larger context of sequential decision-making problems with observation control (or controlled sensing [8]). Such controlled sensing problems have a rich history going back to the seminal work of Chernoff [9] on the sequential design of experiments, in which a sequential composite binary hypothesis testing problem with observation control is studied. Other works on sequential hypothesis testing with observation control include [10], [11], [12], [13], [14], [15] and, more recently, [16], [17], [18], [19].
There has also been considerable progress on the special "multi-channel" case of sequential hypothesis testing with observation control, which is also commonly referred to as sequential anomaly detection. In this context, there are multiple data streams, some of which are anomalous, and the goal is to accurately pick out the anomalous ones among them, while observing only a subset of the streams at each time-step [20], [21], [22], [23], [24], [25], [26]. The QCD problem in the multi-channel setting with observation control has been studied in [27], and more recently in [6], [28], [29], [30]. In this context, an unknown subset of the streams undergo a change in their distributions at the unknown change-point, and only a subset of them can be observed at each time-step.
Our work is inspired by [7], in which a general setting for QCD with controlled sensing is considered. In particular, it is assumed that the pre- and post-change distributions can be affected by the choice of a control action, which takes values in a finite set. Furthermore, the post-change distribution is allowed to have parametric uncertainty within a finite parameter set.
The formulation of the optimization problem to obtain the best tradeoff between detection delay and false alarm rate in [7] is different from the standard formulations of the QCD problem [4]. In particular, the false alarm constraint used in [7] is one where the probability of stopping (and declaring a change) before a fixed time m under the pre-change regime is constrained to some level α. The authors of [7] refer to [31] to justify their false alarm constraint, but the false alarm constraint proposed in [31] requires that the probability of stopping in any time-window of size m in the pre-change regime be small (see [31, Sec. II.B]), not just the window from 0 to m. Furthermore, it is more common in the QCD literature to constrain the mean time to false alarm [4], which is also a constraint used in [31, Sec. I].
The measure of delay used in [7] is the expected delay conditioned on a fixed change-point. In QCD problems where no prior is assumed on the change-point, it is common to measure delay by taking the supremum of the expected delay over all possible change-points (see, e.g., the Lorden and Pollak formulations [4]). Furthermore, the asymptotic upper bound on the delay for a fixed change-point given in [7, Th. 3] is larger than the corresponding lower bound given in [7, Th. 2] (which assumes that the change happens at time 1) by a factor greater than 160. In contrast, first-order asymptotic optimality results in the QCD literature require the ratio of the upper and lower bounds on the delay metric to converge to 1 as the false alarm rate goes to 0 [4].
The ε-Greedy Change Detector (ε-GCD) proposed in [7] for change detection with observation control uses, at each time instant, a maximum likelihood estimate (MLE) of the post-change parameter to determine the best action, except that with a fixed probability ε the action is chosen uniformly at random. The MLE at each time instant is determined only by those samples that resulted from random exploration. The use of the current maximum likelihood estimate for determining the current action is in fact the key feature of Chernoff's proposed control policy in [9] for the problem of sequential composite binary hypothesis testing with observation control. The need for random exploration of actions in that context arises because the true post-change parameter may not be distinguishable from other possible post-change parameter values under certain actions. However, exploring at random with probability ε > 0 at each time instant will generally lead to substantial performance loss (by roughly a factor of 1/(1 − ε)) relative to that of an oracle that knows the post-change parameter. In contrast, as we show in this paper, we can achieve the performance of the latter to a first-order asymptotic approximation, as the false alarm rate goes to 0, by performing random exploration at only o(n) time-steps over a time-horizon of length n.
Another consideration with the use of an MLE that relies on data from the beginning to estimate the post-change parameter in the QCD setting (in contrast to the sequential hypothesis testing setting) is that it could potentially be biased away from the true post-change parameter, due to the pre-change observations, if the change-point is not small. This means that it will take longer for the MLE to converge to the true post-change parameter in the post-change regime, thereby increasing the delay, as the change-point gets larger. We get around this problem by forcing the MLE to use only a window of past observations, and by using a windowed CuSum test (as is done in [32] for the problem of quickest change detection with post-change parametric uncertainty, without controlled sensing).
To sum up, our goal in this paper is to precisely formulate the QCD problem with controlled sensing, and to propose and analyze a novel, asymptotically optimal algorithm for this problem. As described in Section II, we use Lorden's metric [33] for the delay, and we pose the optimization problem as the minimization of this delay metric under a constraint on the mean time to false alarm (MTFA). In Section III, we derive a universal lower bound on the delay of any procedure under the MTFA constraint. In Section IV, motivated by the discussion in the previous paragraphs, we develop a procedure which we call the Windowed Chernoff-CuSum (WCC) procedure. We establish that the WCC procedure is asymptotically optimal under Lorden's criterion as the MTFA goes to infinity. In Section V, we provide some simulation results that illustrate the main points of the theoretical analysis.

II. PROBLEM FORMULATION
Let {X_n : n ∈ N} be a sequence of random vectors whose values are observed sequentially, let {U_n : n ∈ N} be a sequence of random variables to be used for randomization purposes, and let {F_n : n ∈ N} be the filtration generated by these two sequences, i.e., F_n = σ(X_1, U_1, . . ., X_n, U_n). We also denote by F_0 the trivial σ-algebra, and for any m, n ∈ N with m ≤ n we set X_[m,n] := {X_m, . . ., X_n} and A_[m,n] := {A_m, . . ., A_n}. Assumption 1: We assume that, for any n ∈ N, U_n is independent of F_{n−1} and uniformly distributed in [0, 1], and that X_n is independent of U_n and conditionally independent of F_{n−1} given the value of a control/action A_n. We assume that the action A_n takes values in a finite set A, with at least two elements. Furthermore, for n > 1, A_n is an F_{n−1}-measurable random variable, with A_1 being uniformly distributed on the set A.
We refer to the sequence of actions A := {A n : n ∈ N} as a control policy and we denote by A the family of all control policies, i.e., A = {A n : n ∈ N} ∈ A.
Let {f_θ^a : θ ∈ Θ, a ∈ A} be a set of densities with respect to a dominating measure λ, where Θ is an arbitrary finite set. Furthermore, for some θ_0 ∉ Θ, let {f_{θ_0}^a : a ∈ A} be a set of densities with respect to λ.
Let the change-point be denoted by ν ∈ N, which we assume is completely unknown and deterministic. Given that A_n = a, where a ∈ A, we assume that for n < ν (pre-change), X_n has conditional density f_{θ_0}^a, and that for n ≥ ν (post-change), X_n has conditional density f_θ^a, with θ ∈ Θ.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
To be more specific, we denote by P_{ν,A}^θ the underlying probability measure, and by E_{ν,A}^θ the corresponding expectation, when the change-point is ν, the post-change parameter is θ ∈ Θ, and the control policy A is used, which means that for any n ∈ N and any Borel set B we have P_{ν,A}^θ(X_n ∈ B | F_{n−1}) = ∫_B f_{θ_0}^{A_n} dλ if n < ν, and P_{ν,A}^θ(X_n ∈ B | F_{n−1}) = ∫_B f_θ^{A_n} dλ if n ≥ ν. Moreover, we denote by P_{∞,A} the underlying probability measure, and by E_{∞,A} the corresponding expectation, when the change never occurs and the control policy A is used, which means that for any n ∈ N and any Borel set B we have P_{∞,A}(X_n ∈ B | F_{n−1}) = ∫_B f_{θ_0}^{A_n} dλ. A procedure for quickest change detection with controlled sensing consists of a pair (A, T), where A is a control policy, i.e., A ∈ A, and T is an {F_n}-stopping time, i.e., {T = n} ∈ F_n for every n ∈ N. We denote by C the family of all procedures, i.e., (A, T) ∈ C.
False Alarm Measure: We measure the false alarm performance of a procedure in terms of its mean time to false alarm, and we denote by C_γ the subfamily of procedures for which the mean time to false alarm is at least γ, i.e., C_γ := {(A, T) ∈ C : E_{∞,A}[T] ≥ γ}. Delay Measure: We use a worst-case measure for delay, namely the commonly used Lorden measure [33]. Specifically, for any θ ∈ Θ and (A, T) ∈ C we set ESADD_θ(A, T) := sup_{ν ≥ 1} ess sup E_{ν,A}^θ[(T − ν + 1)^+ | F_{ν−1}]. Optimization Problem: The optimization problem we consider is to find a procedure that can be designed to belong to C_γ for every γ > 1, and that achieves inf_{(A,T) ∈ C_γ} ESADD_θ(A, T) to a first-order asymptotic approximation as γ → ∞, simultaneously for every post-change parameter θ ∈ Θ. We make the following further assumptions in our analysis: Assumption 2: For every θ ∈ Θ there exists an a ∈ A such that the Kullback-Leibler (KL) divergence between the densities f_θ^a and f_{θ_0}^a is positive, i.e., I_θ^a := D(f_θ^a ‖ f_{θ_0}^a) > 0 for some a ∈ A. This means that the post-change distribution is distinguishable from the pre-change distribution for at least one choice of control.
If Assumption 2 does not hold for some θ ∈ Θ, then the change will not be detected efficiently if the true post-change parameter is θ. This assumption implies that I_θ := max_{a ∈ A} I_θ^a > 0. Assumption 3: For every θ ∈ Θ and a ∈ A we have Assumption 3 is a technical condition that is needed in our analysis (see, e.g., Theorem 1 and Theorem 2).
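As a concrete illustration of Assumption 2 (a sketch that is not part of the paper's formal development), consider unit-variance Gaussian observations, as in the multichannel example of Section V: action a observes stream a, and a change shifts the mean of the affected stream. The stream indices and post-change means below are illustrative assumptions.

```python
# Illustrative check of Assumption 2 for a Gaussian multichannel model
# (cf. Section V): action a observes stream a; stream theta changes mean.
# KL( N(mu, 1) || N(0, 1) ) = mu^2 / 2.
def kl_gauss_unit_var(mu_post, mu_pre=0.0):
    return (mu_post - mu_pre) ** 2 / 2.0

K = 3
post_change_means = {1: 0.5, 2: 0.75, 3: 1.0}  # assumed values, for illustration

for theta, mu in post_change_means.items():
    # Under action a != theta the observed stream is unchanged, so the KL
    # divergence is 0; under action a == theta it is mu^2 / 2 > 0.
    kls = {a: (kl_gauss_unit_var(mu) if a == theta else 0.0) for a in range(1, K + 1)}
    assert max(kls.values()) > 0  # Assumption 2 holds for this theta
```

Here I_θ = max_a I_θ^a is attained by observing the affected stream itself, which is what makes the greedy action choice in Section IV natural in this model.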
Assumption 4: For every θ, θ′ ∈ Θ such that θ ≠ θ′, there exists an a ∈ A so that D(f_θ^a ‖ f_{θ′}^a) > 0, i.e., the post-change distributions for two distinct values of the post-change parameter are distinguishable by at least one control. If there is no control that distinguishes a particular pair of distinct values of the post-change parameter, then we keep only one of them in Θ and reduce the size of Θ by one. Assumption 4 is crucial for the consistent estimation of θ in the post-change regime (see Lemma 3).
Remark 1: One could strengthen Assumption 4 by assuming that D(f_θ^a ‖ f_{θ′}^a) > 0 for all θ, θ′ ∈ Θ such that θ ≠ θ′, and for all a ∈ A. However, this stronger assumption fails to hold in many applications of interest.
For example, consider a multichannel setting with K streams (see also Section V) in which a single (unknown) stream of observations undergoes a change in distribution at the change-point, and only one of the streams is observed at each time step. Here Θ = A = {1, 2, . . ., K}. If θ = j, then the j-th stream undergoes a change in distribution, and if a = i, then the i-th channel is observed. In particular, if a = 1, then for θ = 2 and θ′ = 3, the observed channel (channel 1) has the pre-change distribution, which means that f_2^1 = f_3^1. Thus, this strengthened version of Assumption 4 fails to hold in this case.
For any θ, θ′ ∈ Θ such that θ ≠ θ′, and a ∈ A, we further define the Bhattacharyya coefficient (see, e.g., [34]): ρ(θ, θ′, a) := ∫ √(f_θ^a f_{θ′}^a) dλ. By Assumptions 2 and 4, it follows that for any θ, θ′ ∈ Θ such that θ ≠ θ′, there exists an a ∈ A such that ρ(θ, θ′, a) < 1, and as a result min_{a ∈ A} ρ(θ, θ′, a) < 1. We also denote by ρ the maximum of these quantities: ρ := max_{θ ≠ θ′} min_{a ∈ A} ρ(θ, θ′, a). The quantity ρ, which is less than 1, will play a key role in the consistent estimation of θ in the post-change regime (see Lemma 3), as well as in lower bounding the post-change expected values of the increments of the proposed detection statistic (see Lemma 4).
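For unit-variance Gaussian densities, the Bhattacharyya coefficient has a closed form, which gives a quick sanity check of the requirement ρ(θ, θ′, a) < 1. A minimal sketch under this Gaussian assumption (the function name and means are illustrative):

```python
import math

# Bhattacharyya coefficient of N(m1, 1) and N(m2, 1): the integral of
# sqrt(f1 * f2) over the real line equals exp(-(m1 - m2)^2 / 8).
def bhattacharyya_gauss(m1, m2):
    return math.exp(-((m1 - m2) ** 2) / 8.0)

# Identical densities are indistinguishable (coefficient 1); any action
# under which the means differ yields a coefficient strictly below 1,
# which is what Assumptions 2 and 4 require for at least one action.
assert bhattacharyya_gauss(0.0, 0.0) == 1.0
assert 0.0 < bhattacharyya_gauss(0.0, 1.0) < 1.0
```

In the multichannel example above, observing an unaffected channel gives coefficient 1 for the corresponding pair of parameters, while observing the affected channel gives a coefficient strictly below 1, so ρ < 1 is driven entirely by the informative actions.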

III. UNIVERSAL LOWER BOUND
For θ ∈ Θ and A ∈ A, we define the log-likelihood ratio of the observation at time m ∈ N as Z_m^θ := log( f_θ^{A_m}(X_m) / f_{θ_0}^{A_m}(X_m) ), and we observe that for any ν, t ∈ N, we have a corresponding decomposition of the cumulative log-likelihood ratio. In the following result, we provide an asymptotic lower bound on the value of the optimization problem in (4) as the MTFA goes to infinity, i.e., as γ → ∞.

Theorem 1: For any θ ∈ Θ, as γ → ∞, the infimum of the worst-case delay over all procedures in C_γ is bounded below by (log γ / I_θ)(1 + o(1)).

Proof: Fix (A, T) ∈ C_γ and θ ∈ Θ. By [31, Th. 1] it suffices to show that, for every δ > 0, the relevant sequence converges to 0 as n → ∞. Indeed, this follows from a bound that holds for every n, ν ∈ N, in which the first inequality follows from the fact that I_θ^a ≤ I_θ for every a ∈ A, and the second one from a conditional version of Doob's submartingale inequality.

In the next section we propose a specific procedure for quickest change detection with controlled sensing, and establish that it achieves the lower bound in Theorem 1 asymptotically as γ → ∞.

IV. THE WINDOWED CHERNOFF-CUSUM PROCEDURE
Consider any control policy A ∈ A and a window of size w ∈ N. At time n > w, the maximum likelihood estimator (MLE) of θ based on the observations {X_m : m = n − w, . . ., n − 1} is given by: θ̂_n := arg max_{θ ∈ Θ} Π_{m=n−w}^{n−1} f_θ^{A_m}(X_m). To define the proposed policy, we also need to introduce a sequence of deterministic times, N ⊆ {w + 1, w + 2, . . .}, which contains at most q elements in any interval of length w, for some positive integer q that is smaller than w. The sequence of times N will be used for random exploration of controls, which will be crucial for consistent estimation of θ in the post-change regime, under Assumption 4.
Given such a sequence, we propose a control policy A * as follows.
• For n = 1, . . ., w, A*_n is selected uniformly at random from the set A, using the randomization variable U_{n−1}.
• If n > w and n ∈ N, then A*_n is selected uniformly at random from A, using the randomization variable U_{n−1}.
• If n > w and n ∉ N, A*_n is selected to maximize the Kullback-Leibler divergence of the post-change versus the pre-change distribution based on θ̂_n, i.e., A*_n := arg max_{a ∈ A} D(f_{θ̂_n}^a ‖ f_{θ_0}^a).

The proposed stopping time is defined using a windowed CuSum test, which was introduced in [32]. Specifically, for any A ∈ A and for n > w, we define the following CuSum-like statistic recursively: W_{n,A} := max(W_{n−1,A}, 0) + Z_n^{θ̂_n}, with W_{w,A} := 0, where the log-likelihood ratio is as defined in (12). The change is declared at the first time this statistic exceeds a threshold b, i.e., T_{b,A} := inf{n > w : W_{n,A} ≥ b}, where b is to be selected to satisfy the false alarm constraint. We refer to the pair (A*, T_{b,A*}) as the Windowed Chernoff-CuSum (WCC) procedure, to acknowledge that Chernoff [9] was the first to suggest a control policy similar to A* for the problem of sequential hypothesis testing with controlled sensing.
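To make the procedure concrete, the following is a minimal sketch of the WCC recursion for the Gaussian multichannel model of Section V (one stream observed per step). The function names, the exploration schedule, and the parameter values are illustrative assumptions, not the paper's exact specification:

```python
import random

def wcc(sample, mu, K, w=20, b=5.0, horizon=100_000):
    """Windowed Chernoff-CuSum sketch. sample(n, a): observation of stream a
    at time n; mu[a]: assumed post-change mean of stream a (unit variances)."""
    history = []  # last w (action, observation) pairs
    W = 0.0       # CuSum-like statistic: W_n = max(W_{n-1}, 0) + Z_n
    for n in range(1, horizon + 1):
        if n <= w or n % w == 0:  # exploration: first w steps, then once per window
            a = random.randrange(1, K + 1)
        else:
            # Windowed MLE of the affected stream: among recently observed
            # streams, pick the one with the largest sample mean.
            sums = {}
            for act, x in history:
                s, c = sums.get(act, (0.0, 0))
                sums[act] = (s + x, c + 1)
            a = max(sums, key=lambda k: sums[k][0] / sums[k][1])
        x = sample(n, a)
        history.append((a, x))
        history = history[-w:]  # keep only the estimation window
        # Log-likelihood ratio of N(mu[a], 1) against N(0, 1).
        Z = mu[a] * x - mu[a] ** 2 / 2.0
        W = max(W, 0.0) + Z
        if n > w and W > b:
            return n  # alarm raised
    return horizon
```

With a change at time 1 in a stream with post-change mean μ, once the windowed MLE locks onto that stream the statistic drifts upward at a rate close to the KL divergence μ²/2, so the alarm is typically raised within a few multiples of b/(μ²/2) steps after the window fills.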
In the remainder of this section, we establish bounds on the performance of the WCC procedure. We begin by establishing a choice of the threshold b that guarantees that the MTFA constraint is met.
Lemma 1: For any A ∈ A and γ > 1, by setting b = log γ, we can ensure that (A, T_{b,A}) ∈ C_γ, i.e., that E_{∞,A}[T_{b,A}] ≥ γ.
Proof: We can rewrite the recursion for the test statistic in (16) as: e^{W_{n,A}} = max(e^{W_{n−1,A}}, 1) · e^{Z_n^{θ̂_n}}. We now define a new statistic {R_{n,A}}, akin to the Shiryaev-Roberts statistic (see, e.g., [4]), which has the recursion: R_{n,A} = (1 + R_{n−1,A}) · e^{Z_n^{θ̂_n}} for n > w, with R_{w,A} := w.
Clearly, R_{n,A} ≥ e^{W_{n,A}} for all n ≥ w.
It is easily seen that {R_{n,A} − n : n ≥ w} is a martingale under P_{∞,A}, with respect to the filtration {F_n : n ∈ N}. Indeed, since A_n is F_{n−1}-measurable by Assumption 1, and θ̂_n is F_{n−1}-measurable by construction (see (13)), we have for n > w: E_{∞,A}[R_{n,A} | F_{n−1}] = (1 + R_{n−1,A}) · E_{∞,A}[e^{Z_n^{θ̂_n}} | F_{n−1}] = 1 + R_{n−1,A}. Then, by applying the Optional Sampling Theorem [3]: E_{∞,A}[T_{b,A}] = E_{∞,A}[R_{T_{b,A},A}] ≥ e^b. Therefore, setting b = log γ ensures that (A, T_{b,A}) ∈ C_γ. Note that Lemma 1 holds for any control policy A ∈ A satisfying Assumption 1, and not just the control policy A*.
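The identity underlying Lemma 1 is that the conditional expectation of the likelihood ratio e^{Z_n} under the pre-change measure equals 1, for any F_{n−1}-measurable action and parameter estimate. A quick Monte Carlo sanity check of this identity for unit-variance Gaussians (a sketch; the means are illustrative):

```python
import math
import random

random.seed(0)

# Under the pre-change measure X ~ N(0, 1), the likelihood ratio of N(mu, 1)
# against N(0, 1) is exp(mu*X - mu^2/2), and its expectation equals 1 for
# any mu -- the property that makes {R_n - n} a martingale in Lemma 1.
def mean_lr(mu, n_samples=100_000):
    total = 0.0
    for _ in range(n_samples):
        x = random.gauss(0.0, 1.0)
        total += math.exp(mu * x - mu * mu / 2.0)
    return total / n_samples

for mu in (0.5, 1.0):
    assert abs(mean_lr(mu) - 1.0) < 0.05  # unit mean, up to Monte Carlo error
```

Because the action and the parameter estimate are chosen before X_n is observed, this unit-mean property holds conditionally at every step, regardless of how the control policy adapts.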
Our main goal in the remainder of this section is to analyze the delay of the WCC procedure as γ → ∞. Toward this end, we first establish an upper bound on our measure of delay for any control policy A ∈ A (see (3)). Lemma 2: For any change-point ν ∈ N, control policy A ∈ A, and threshold b > 0, the delay measure admits the upper bound stated below. Proof: For each ν ≥ 1, we denote by T_{b,A}^ν a version of the stopping time in (17) that starts computing the test statistic from time ν + w, with initialization 0 at time ν + w − 1. Then, for any ν ≥ 1 we have the chain of inequalities leading to the claimed bound, where step (a) holds because x^+ ≤ w + (x − w)^+ for any x, w > 0.
Step (b) follows because for n > ν + w, the test statistic W_{n,A} (defined in (16)) is greater than or equal to W_{n,A}^ν (defined in (21)); thus, T_{b,A} ≤ T_{b,A}^ν. Also, by definition (see (20)), T_{b,A}^ν > ν + w − 1, which means that the positive part can be dropped.
Step (c) is due to the law of iterated expectation (and, as explained at the end of Section II, A_[ν,ν+w−1] denotes the set {A_ν, . . ., A_{ν+w−1}}).
Step (d) holds because T_{b,A}^ν depends on the data before time ν only through the controls A_[ν,ν+w−1], which determine the distributions of the data X_[ν+1,ν+w]. Finally, step (e) holds due to the definition of T_{b,A}^ν and the fact that T_{b,A} = T_{b,A}^ν for ν = 1. Next, we establish an auxiliary result for the MLE, which applies to any control policy that samples uniformly at random from the set A at the subsequence of time instances N. Specifically, at each time n > w, we upper bound the conditional probability that the MLE makes an error in identifying the true post-change parameter θ, given all the already collected data apart from those at the w most recent time instants.
Lemma 3: If A ∈ A samples uniformly at random from A at the subsequence of time instances N, then the conditional probability of a parameter estimation error admits the bound stated below. Proof: Fix an arbitrary integer n with n > w. From the definition of the MLE in (13) it is clear that θ̂_n = θ if S_n(θ, θ′) > 0 for all θ′ ≠ θ, where S_n(θ, θ′) := Σ_{m=n−w}^{n−1} Z_m(θ, θ′) and Z_m(θ, θ′) := log( f_θ^{A_m}(X_m) / f_{θ′}^{A_m}(X_m) ). Therefore, and by an application of the union bound, we obtain P(θ̂_n ≠ θ | F_{n−w−1}) ≤ Σ_{θ′ ≠ θ} P(S_n(θ, θ′) ≤ 0 | F_{n−w−1}). Furthermore, by the conditional Markov inequality it follows that: P(S_n(θ, θ′) ≤ 0 | F_{n−w−1}) ≤ E[e^{−S_n(θ, θ′)/2} | F_{n−w−1}]. By the conditional independence of Z_m(θ, θ′) given F_{m−1} and (9)-(10) we have E[e^{−Z_m(θ, θ′)/2} | F_{m−1}] ≤ 1 − (1 − ρ) · I{m ∈ N} / |A|,
where ρ is defined in (11) and I is the indicator function. Applying the previous inequality for m = n − 1 we obtain the corresponding bound, where the equality follows by an application of the law of iterated expectation, since w ≥ 1 implies that F_{n−2} ⊇ F_{n−w−1}. Repeating the same argument w times we obtain (25), where the second inequality follows from (14), according to which in any time-period of length w there are at most q times in N. Combining (25) with (23) and (24) completes the proof.

We next establish a lower bound on the drift of the log-likelihood ratio process that drives the WCC procedure in the post-change regime. This lower bound will be useful in upper bounding the delay of the WCC procedure. For this, we need to introduce some additional notation. In particular, for every u, θ ∈ Θ and a ∈ A, we define the quantities B_a(θ, u) and a*(u) in (26) and (27), respectively. Note that a*(u) is an optimal control for the WCC procedure at time n ∉ N, when the post-change parameter estimate is θ̂_n = u (see (15)). Also, it is clear that the inequality in (28) holds.

Lemma 4: Let n > w. Then the lower bounds in (29) hold, with a*(u) as defined in (27). Moreover, for large enough q, both lower bounds in (29) are strictly positive.

Proof: When n ∉ N, A*_n is chosen according to (15), and we have a chain of inequalities in which (a) follows from (15) and (27), and from the fact that, conditioned on θ̂_n = u, the optimal control is A*_n = a*(u) and X_n is independent of everything else. Step (b) follows from (28), and (c) from Lemma 3. This proves the first inequality in (29).
When n ∈ N, A*_n is chosen uniformly at random from the set A, and following steps similar to those used in the case n ∉ N, we obtain the second inequality in (29).
Finally, we note that from (26) it follows that, for all a ∈ A and u ∈ Θ, B_a(θ, u) ≤ I_θ^a and B_{a*(u)}(θ, u) ≤ I_θ. Therefore, J_θ ≥ 0 and K_θ^a ≥ 0 for all a ∈ A. Nevertheless, since 0 < ρ < 1, as q → ∞ both lower bounds in (29) converge to strictly positive limits, which proves that they are strictly positive for large enough q > 0.
Proof: Define the stopping time T̃_{b,A*} := inf{n > w : W̃_{n,A*} ≥ b}. Note from (16) that the difference between W̃_{n,A*} and W_{n,A*} is that we do not take the positive part on the right-hand side of the recursion in (30). This means that T_{b,A*} ≤ T̃_{b,A*}. Therefore, it suffices to show that, as q, w, b → ∞ so that q = o(w) and w = o(b), the corresponding worst-case expected delay of T̃_{b,A*} is bounded above by (b/I_θ)(1 + o(1)). In what follows, we fix arbitrary a_1, . . ., a_w in A, and we show that the corresponding conditional expectation is (b/I_θ)(1 + o(1)). To see this, we make the appropriate identifications in Proposition 1 in the Appendix. We also observe that, by Lemma 4, we can set μ*, μ_*, μ, v in Proposition 1 accordingly. By Lemma 4 again, we have μ_* → I_θ = μ as q → ∞. Thus, the condition in part (iii) of Proposition 1 is satisfied, which implies the result.
We then have the following result, which establishes the first-order asymptotic optimality of the WCC procedure.

V. SIMULATIONS
We consider a simulation study in which, for each n ∈ N, X_n is a K-dimensional random vector (X_n^1, . . ., X_n^K) with independent components. We assume that each of these components has variance 1 and that the mean changes in an unknown subset of components, θ. Specifically, if k ∉ θ, each X_n^k has mean 0, whereas if k ∈ θ, then X_n^k has mean 0 for n < ν and mean μ_k for n ≥ ν. Therefore, in this context, the parameter space Θ consists of all nonempty subsets of {1, . . ., K}. We further assume that it is possible to sample only one of these components at each time instant. Therefore, the action space A consists of all singletons {{k} : 1 ≤ k ≤ K}.
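The observation model just described can be sketched as a simple generator; the instantiation at the bottom is illustrative, with the exact means used in the study given in the text:

```python
import random

# Section V observation model: K independent unit-variance Gaussian streams.
# Streams in the affected set theta switch from mean 0 to mean mus[k] at time nu.
def make_sampler(K, theta, mus, nu):
    def sample(n, k):
        assert 1 <= k <= K
        mean = mus[k] if (k in theta and n >= nu) else 0.0
        return random.gauss(mean, 1.0)
    return sample

# Illustrative instantiation (the means here are placeholders):
sample = make_sampler(K=10, theta={1, 2, 3}, mus={k: 1.0 for k in range(1, 11)}, nu=1)
```

A detection scheme interacts with this model only through `sample(n, k)` for a single chosen k per time step, matching the singleton action space above.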
We set K = 10 and (μ_1, μ_2, μ_3, . . ., μ_K) = (0.5, 0.75, 1, . . ., 1), and we consider the case where only the first three components experience the change, i.e., the true value of θ is {1, 2, 3}. That is, among the three affected components, only one of them has the largest possible drift. Moreover, based on Lemma 2, we assume that the change occurs at time ν = 1.
We consider two schemes. (i) The proposed Windowed Chernoff-CuSum procedure, with two choices for the size of the window w: 10 and 20. With this procedure, we sample uniformly at random from all sources at the first w time instants, and an alarm can be raised only after this time. For both window choices, we set q = 1; that is, during each interval of the form [mw, (m + 1)w), where m ∈ N, the component is selected at random only once. Apart from these times, at each time after w, we estimate the subset of components that have been affected by the change, and select to sample, among those, the one with the largest post-change mean. This means that whenever a component apart from the first two is estimated to be affected by the change, it can be selected for sampling next. In order to break such ties, we select the component whose observations in the last w time instants had the largest average. This choice was found to lead to much better performance than random sampling among the components that are currently estimated as affected by the change. (ii) The greedy algorithm considered in [6], [27], [28], [29], [30], according to which we keep sampling the same component until its cumulative log-likelihood ratio statistic hits either the threshold b or 0. In the former case, the alarm is raised and the process is terminated.
In the latter case, all past observations are forgotten and the next component is sampled. This algorithm has been
shown to enjoy certain asymptotic optimality properties when either only a single component is affected by the change [28] or all affected components have the same post-change distribution [29], but not in the setup of this simulation study. Moreover, its performance depends heavily on the component that is being sampled when the change occurs. In order to capture its average behavior, we randomize the component that is being sampled at the time of the change. For comparison purposes, we also consider the best-case/oracle scenario, where it is component 3, i.e., the one with the largest post-change mean among those affected, that is being sampled at the time of the change in each simulation run. In all cases, the threshold b is selected as log γ, where γ takes values in {10^i : i = 1, . . ., 16}. Figure 1 presents the results, where on the vertical axis we plot the average detection delays of the above schemes, based on 16,000 simulation runs for each, and on the horizontal axis we plot log γ.
Based on this graph, we can make the following observations. First, for the proposed Windowed Chernoff-CuSum, the larger window, i.e., setting w equal to 20 instead of 10, leads to smaller expected detection delay for large enough values of γ. This is consistent with our asymptotic theory, according to which the window size w needs to go to infinity as γ → ∞ in order to achieve asymptotic optimality. Second, both window choices lead to expected detection delays smaller than that of the greedy algorithm. In fact, the expected detection delay of the proposed scheme with the larger window approaches that of the oracle for the greedy algorithm as γ increases.

VI. CONCLUSION
We considered the problem of quickest change detection when there is parametric uncertainty regarding the post-change regime, and the distribution of the observations is affected not only by the change but also by a control input that is chosen by the observer. We provided a precise mathematical formulation for a general setting of this problem. We then developed a procedure that is asymptotically optimal under Lorden's criterion as the false alarm rate goes to zero.
Our asymptotic optimality result is established as the expected time to false alarm goes to infinity, under the assumption of a finite and fixed action space and a finite and fixed parameter space for the post-change distribution. Therefore, a natural direction for further research is the theoretical analysis of this algorithm as the sizes of the action space and the post-change parameter space go to infinity, or even the case of an infinite action space and an infinite post-change parameter space. Another direction for further research is the consideration of more general stochastic models for the observations, in which the assumption of conditional independence is removed, in the spirit of [18]. Also, in our problem formulation, we implicitly assumed that all control actions incur the same cost. Extending the formulation and the analysis to allow for non-uniform costs across control actions would be of interest in certain applications. Finally, the proposed algorithm relies on a window of past observations during which the post-change parameter is estimated, and this estimate is subsequently used to assign the appropriate control. Our asymptotic theory requires that the length of this window go to infinity as the false alarm rate goes to 0. It would be of interest to obtain more precise bounds for the worst-case conditional expected detection delay that could inform the selection of this window size.

ACKNOWLEDGMENT
The first author owes a debt of gratitude to Prof. Toby Berger for encouraging him to work on quickest change detection when they were colleagues at Cornell University in the late 1990s, and for the engaging discussions they had on this and many other topics.

APPENDIX
In this Appendix we present a general, non-asymptotic upper bound on the expected hitting time of a sequence for which the conditional expectation of each increment, given all past observations, is bounded below by a positive constant. This is the basis of the proof of Theorem 2, but it is stated in a more abstract form, as it can be of independent interest.
Proposition 1: Let (Ω, F, P) be a probability space, and let E denote expectation with respect to P. Let {G_n : n ≥ 0} be a filtration on this space, where G_0 is the trivial σ-algebra. Let {Y_n : n ≥ 1} be a sequence of random variables in L^2 that is adapted to {G_n : n ≥ 1}. Let w ∈ N, N ⊆ {w + 1, w + 2, . . .}, μ*, μ_*, μ, v > 0, and q ∈ N, and suppose that conditions (31)-(34) hold for every n > w. Then: (i) the stopping time τ_b has finite expectation; and (ii) there is a C > 0, which does not depend on b, q, or w, such that the stated bound holds, where o(1) is a vanishing term as q, w, b → ∞ so that q = o(w) and w = o(b). Remark 2: This proposition generalizes [14, Proposition 2.1], as in the latter condition (31) is assumed to hold for w = 1 and condition (34) for q = 0.
Proof: (i) For any c > 0 and n > w we have a bound in which (a) follows by the conditional Cauchy-Schwarz inequality, (b) by the conditional Markov inequality, and (c) by the law of iterated expectation and (33). As a result, for c large enough we have (35), where we use a ∧ b to denote min{a, b}. Consequently, for c large enough we have a bound whose last inequality follows from (31) and (35). We write S_{τ∧m} as a sum of the increments Ỹ_n, as in (39). We start by lower bounding the conditional expectation of the first term on the right-hand side of (39). To this end, we first observe (40); the interchange of series and expectation in step (a) is justified because τ ∧ m is a bounded stopping time and the sequence (Ỹ_n) is bounded above.
Step (b) follows by an application of the law of iterated expectation. Then, by (40) and (37), we conclude (41). We continue by upper bounding the conditional expectation of the second term on the right-hand side of (39). Since (Ỹ_n) is bounded above and τ ∧ m is a bounded stopping time that takes values larger than w, by Wald's identity we have (42), where the inequality follows from (32).
Combining (38), (39), (41), and (42), we conclude that the stated bound holds for every m ∈ N with m > w. Letting m → ∞, by the Monotone Convergence Theorem we obtain the claimed bound on the expectation,
and it is clear that it suffices to show that there is a C > 0 so that (47) holds. In order to do so, we first claim the bound in (51). Combining (51) and (52), we obtain a quadratic inequality in √(E[τ_b]). Solving this quadratic inequality, we conclude the desired bound, where the second inequality follows by the subadditivity of the square root. This implies (47) with C = max{K, √K, √(Kμ)} and completes the proof.
(iii) This follows directly from (ii).

Fig. 1. The vertical axis represents expected detection delay and the horizontal axis log γ. The blue lines/circles correspond to the proposed window-limited Chernoff-CuSum scheme with windows w = 10 (dashed line) and w = 20 (solid line). The black lines/triangles correspond to the greedy algorithm. The solid line corresponds to its average behavior, and the dashed line to its best-case scenario, which we refer to as "oracle".

E[S_{τ_b}] ≥ Σ_{n > w} E[ E[Y_n | G_{n−w−1}] · I{τ_b + w ≥ n} ] ≥ Σ_{n > w} E[ E[Y_n | G_{n−w−1}] · I{n ∉ N} · I{τ_b + w ≥ n} ] ≥ μ_* (1 − q/w) E[τ_b]. (44)

Step (a) is obtained by following exactly the same steps as in (40), with Ỹ_n replaced by Y_n and τ ∧ m replaced by τ_b, and it is justified because, by the result of part (i), τ_b is integrable and, by assumption (33), (E[Y_n^2 | G_{n−1}]) is bounded. Steps (b) and (c) follow by (31), and step (d) by (34). Moreover,

E[S_{τ_b}] ≤ b + E[R_b] + wμ, (45)

where R_b ≡ S_{τ_b} − b and the inequality follows in exactly the same way as (42), again with Ỹ_n replaced by Y_n and τ ∧ m replaced by τ_b. From (44) and (45) we have μ_* (1 − q/w) E[τ_b] ≤ b + E[R_b] + wμ.