Asymptotically optimal Bayesian sequential change detection and identification rules

We study the joint problem of sequential change detection and multiple hypothesis testing. Suppose that the common distribution of a sequence of i.i.d. random variables changes suddenly at some unobservable time to one of finitely many distinct alternatives, and one needs to both detect and identify the change at the earliest possible time. We propose computationally efficient sequential decision rules that are asymptotically either Bayes-optimal or optimal in a Bayesian fixed-error-probability formulation, as the unit detection delay cost or the misdiagnosis and false alarm probabilities go to zero, respectively. Numerical examples are provided to verify the asymptotic optimality and the speed of convergence.

The problem boils down to optimally resolving the trade-off between the expected detection delay and the false alarm and misdiagnosis costs.
The sequential analysis methods such as Wald's (1947) sequential probability ratio test and Page's (1954) cumulative sum were developed for the quality control problems, in which a production process may suddenly get out of control at some unknown and unobservable time and one needs to detect the failure time as soon as possible. However, it is more realistic to assume that a production process consists of multiple processing units, each of which is prone to failure, and one needs to detect the earliest failure time and accurately identify the failed component.
In economics and biosurveillance, elevated concerns about financial crises and bioterrorism have increased the importance of early warning systems (see Bussiere and Fratzscher 2006 and Heffernan et al. 2004); structural changes need to be detected in time series such as the S&P 500 index for better financial risk management and over-the-counter medication sales for early signs of a possible disease outbreak. There are a number of potential causes of structural changes, and one needs to identify the cause of the change in order to take the most appropriate countermeasures. Although most existing structural change detection methods employ retrospective tests on historical data, online tests are more appropriate in these settings because time-inhomogeneous data arrive sequentially, and the changes must be identified as soon as possible after they occur.
In this paper, we focus on two online Bayesian formulations and propose two computationally efficient and asymptotically optimal strategies inspired by the separate asymptotic analyses of sequential multiple hypothesis testing (SMHT) (Baum and Veeravalli 1994; Dragalin et al. 1999; Dragalin et al. 2000) and sequential change point detection (CPD) (Tartakovsky and Veeravalli 2004).
We suppose that a system starts in regime 0 and suddenly switches at some unknown and unobservable disorder time $\theta$ to one of finitely many regimes $\mu \in \mathcal{M} := \{1, \ldots, M\}$. One observes a sequence of random variables $X = (X_n)_{n \ge 1}$ which are, conditionally on $\theta$ and $\mu$, independent and distributed according to some cumulative distribution function $F_0$ before time $\theta$ and $F_\mu$ at and after time $\theta$; namely,

$$\underbrace{X_1, \ldots, X_{\theta-1}}_{F_0\text{-distributed}}, \quad \underbrace{X_\theta, X_{\theta+1}, \ldots}_{F_\mu\text{-distributed}}.$$
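The data-generating mechanism described above can be sketched in a few lines. This is only an illustration under stated assumptions: the disorder time is drawn from a zero-modified geometric prior with parameters `p0` and `p` (the prior is specified formally in the problem formulation), and the regimes are taken to be scalar Gaussians with a common variance. The function name `simulate_path` and all of its arguments are hypothetical.

```python
import random

def simulate_path(n_obs, p0, p, nu, means, sigma=1.0, seed=0):
    """Draw (theta, mu, [X_1, ..., X_n]) from the change-point model.

    Assumptions of this sketch (not taken from the paper's examples):
    scalar Gaussian regimes N(means[k], sigma^2), k = 0, 1, ..., M.
    """
    rng = random.Random(seed)
    # Disorder time: theta = 0 with probability p0, otherwise geometric on {1, 2, ...}.
    if rng.random() < p0:
        theta = 0
    else:
        theta = 1
        while rng.random() >= p:
            theta += 1
    # New regime mu in {1, ..., M}, drawn with probabilities nu.
    mu = rng.choices(range(1, len(nu) + 1), weights=nu)[0]
    # X_n follows regime 0 for n < theta and regime mu for n >= theta.
    xs = [rng.gauss(means[mu] if n >= theta else means[0], sigma)
          for n in range(1, n_obs + 1)]
    return theta, mu, xs
```

Calling `simulate_path(50, 0.0, 0.1, [0.3, 0.7], [0.0, 1.0, -1.0])` returns one hidden pair `(theta, mu)` together with 50 observations; a detection rule only ever sees the observations.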
The objective is to detect the change as quickly as possible, and at the same time to identify the new regime μ as accurately as possible. More precisely, we want to find a strategy (τ, d), consisting of a pair of detection time τ and diagnosis rule d, in order to minimize the expected detection delay time and the false alarm and misdiagnosis probabilities. This paper studies the following formulations: (i) In the minimum Bayes risk formulation, one minimizes a Bayes risk which is the sum of the expected detection delay time and the false alarm and misdiagnosis probabilities. (ii) In the Bayesian fixed-error-probability formulation, one minimizes the expected detection delay time subject to some small upper bounds on the false alarm and misdiagnosis probabilities.
The precise formulations are given as Problems 1 and 2, respectively, in Sect. 2. A majority of practitioners prefer working with the Bayesian fixed-error-probability formulation because the hard constraints on error probabilities are easier to set up and understand than the costs of detection delay, false alarm, and misdiagnosis in the minimum Bayes risk formulation. The Bayesian fixed-error-probability formulation is often solved by means of its Lagrange relaxation, which turns out to be a minimum Bayes risk problem where the costs are the Lagrange multipliers (or shadow prices) of the false alarm and misdiagnosis constraints. We discuss in more detail the correspondence between the optimal solutions of these two formulations in Sect. 2. Another reason for solving the minimum Bayes risk formulation is that it allows the expert opinions about the risks to be naturally included in the solution. Therefore, we study both formulations in this paper.
Finding the optimal solutions under both formulations requires intensive computations. For example, the minimum Bayes risk formulation reduces to an optimal stopping problem as shown by Dayanik et al. (2008) (see also Lovejoy (1991), White (1991), Borkar (1991), and Runggaldier (1991) for general solution methods available for the partially observed Markov decision processes and Burnetas and Katehakis (1997) for adaptive control for Markov decision processes), and the optimal strategy is to stop as soon as the posterior probability process $\Pi = (\Pi_n^{(0)}, \ldots, \Pi_n^{(M)})_{n \ge 0}$, where $\Pi_n^{(i)} := P\{\text{the system is in regime } i \text{ at time } n \mid X_1, \ldots, X_n\}$ for every $i \in \mathcal{M}_0$ and $n \ge 0$, with $\mathcal{M}_0 := \mathcal{M} \cup \{0\}$, enters some suitable region of the $M$-dimensional probability simplex. Figure 1(a) illustrates the optimal stopping regions for a typical problem with $M = 2$. The process $\Pi$ starts in the lower-left corner, which corresponds to the "no change" state or regime 0. As observations are made, it progresses through the light-colored region, where raising a change-alarm is suboptimal. If it enters the shaded region in the top corner, then declaring a regime switch from 0 to 1 is optimal. If it enters the shaded region in the lower-right corner, then declaring a regime switch from 0 to 2 is optimal. The first hitting time to one of those shaded regions and the corresponding estimate of the new regime minimize the costs for the minimum Bayes risk formulation.
These shaded regions can in principle be found by dynamic programming methods; see, for example, Derman (1970), Puterman (1994) and Bertsekas (2005). However, those methods are generally computationally intensive due to the curse of dimensionality. The state space increases exponentially in the number of regimes, and finding an optimal strategy by using the classical dynamic programming methods tends to be practically impossible in higher dimensions.
Our goal is to obtain a practical solution that is both near-optimal and computationally feasible. We propose two simple and asymptotically optimal strategies by approximating the optimal stopping regions with simpler shapes. In particular, our strategy for the minimum Bayes risk formulation raises a change alarm and estimates the new regime when the posterior probability of at least one of the change types exceeds some predetermined threshold for the first time. In Fig. 1(b), the stopping regions of this strategy correspond to the union of the triangles in the two corners. Those triangular regions determine a stopping and selection strategy, and hence the problem is simplified to designing the triangular regions to minimize the risks.
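The threshold strategy just described can be sketched as a recursive Bayes update followed by a first-exceedance check. The sketch below is illustrative only and rests on assumptions that are not part of the text above: a zero-modified geometric prior on the disorder time (parameters `p0`, `p`), scalar Gaussian regimes, and thresholds given directly on the posterior scale. If the stopping rule is parametrized by the log-odds of the posterior crossing $-\log A_i$, the corresponding posterior threshold is $1/(1 + A_i)$. All function names are hypothetical, and ties between regimes are broken by the largest posterior, which is a choice of this sketch.

```python
import math

def gaussian_pdf(x, mean, sigma=1.0):
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def update_posterior(post, x, p, nu, means):
    """One Bayes step. post[0]: no change yet; post[i]: regime i is active."""
    m = len(nu)
    new = [0.0] * (m + 1)
    # Stay in regime 0: no change at this step (probability 1 - p).
    new[0] = (1.0 - p) * post[0] * gaussian_pdf(x, means[0])
    for i in range(1, m + 1):
        # Already in regime i, or switching from 0 to i at this step.
        new[i] = (post[i] + p * nu[i - 1] * post[0]) * gaussian_pdf(x, means[i])
    total = sum(new)
    return [v / total for v in new]

def threshold_rule(xs, p0, p, nu, means, thresholds):
    """Return (stopping time, declared regime), or (None, None) if no alarm."""
    post = [1.0 - p0] + [p0 * v for v in nu]  # posterior at time 0
    for n, x in enumerate(xs, start=1):
        post = update_posterior(post, x, p, nu, means)
        hits = [i for i in range(1, len(nu) + 1) if post[i] > thresholds[i - 1]]
        if hits:
            return n, max(hits, key=lambda i: post[i])
    return None, None
```

Each step costs O(M) operations, which is the computational appeal of the rule compared with solving the dynamic program over the full simplex.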
We give an asymptotic analysis of the change detection and identification problem, of which SMHT and CPD are special cases. The asymptotic optimality of our strategies can be proved using nonlinear renewal theory after casting the log-likelihood-ratio (LLR) processes as the sum of suitable random walks and some slowly-changing stochastic processes. We show that the r-quick convergence of Lai (1977) for an appropriate subset of the LLR processes in (1) is a sufficient condition for asymptotic optimality. We also pursue higher-order asymptotic approximations for the minimum Bayes risk formulation, inspired by the work of Baum and Veeravalli (1994) on SMHT.
The remainder of the paper is organized as follows. We formulate the Bayesian sequential change detection and identification problem in Sect. 2. In Sect. 3, we propose two sequential change detection and identification strategies and obtain sufficient conditions for their asymptotic optimality in terms of the LLR processes. In Sect. 4 we study certain convergence properties of the LLR processes that are required to implement the asymptotically optimal strategies. In Sect. 5, we obtain higher-order asymptotic approximations for the minimum Bayes risk formulation using nonlinear renewal theory. Section 6 concludes with numerical examples. The proofs and some auxiliary results are presented in the appendix.

Problem formulations
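Consider a probability space $(\Omega, \mathcal{F}, P)$ hosting a stochastic process $X = (X_n)_{n \ge 1}$ taking values in some measurable space $(E, \mathcal{E})$. Let $\theta : \Omega \to \{0, 1, \ldots\}$ and $\mu : \Omega \to \mathcal{M} := \{1, \ldots, M\}$ be independent random variables defined on the same probability space with the probability distributions

$$P\{\theta = t\} = \begin{cases} p_0, & t = 0, \\ (1 - p_0)(1 - p)^{t-1} p, & t \ge 1, \end{cases} \qquad \nu_i = P\{\mu = i\} > 0, \quad i \in \mathcal{M},$$

for some known constants $p_0 \in [0, 1)$, $p \in (0, 1)$, and positive constants $\nu = (\nu_i)_{i \in \mathcal{M}}$. The random variable $\theta$ has an exponential tail with

$$\varrho := -\lim_{t \to \infty} \frac{\log P\{\theta > t\}}{t} = |\log(1 - p)|. \tag{2}$$

Given $\mu = i$ and $\theta = t$, the random variables $X_1, X_2, \ldots$ are conditionally independent, and $(X_n)_{1 \le n \le t-1}$ and $(X_n)_{n \ge t}$ have common conditional probability density functions $f_0$ and $f_i$, respectively, with respect to some $\sigma$-finite measure $m$ on $(E, \mathcal{E})$; namely,

$$P\{\theta = t, \mu = i, X_1 \in E_1, \ldots, X_n \in E_n\} = P\{\theta = t\}\, \nu_i \int_{E_1 \times \cdots \times E_n} \prod_{k=1}^{(t-1) \wedge n} f_0(x_k) \prod_{l = t \vee 1}^{n} f_i(x_l)\, m(dx_1) \cdots m(dx_n)$$

for every $i \in \mathcal{M}$, $t \ge 1$, $n \ge 1$, and $(E_1 \times \cdots \times E_n) \in \mathcal{E}^n$. The following assumptions remove certain trivial cases; see Remark 4.10 below.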
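Assumption 2.1 For every $i \in \mathcal{M}_0$ and $j \in \mathcal{M}_0 \setminus \{i\}$, $0 < f_i(X_1)/f_j(X_1) < \infty$ a.s., and $F_i$ and $F_j$ are distinct.
Let $\mathbb{F} = (\mathcal{F}_n)_{n \ge 0}$ denote the filtration generated by $X$; namely, $\mathcal{F}_0 = \{\emptyset, \Omega\}$ and $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$ for every $n \ge 1$. A sequential change detection and identification rule $(\tau, d)$ is a pair consisting of an $\mathbb{F}$-stopping time $\tau$ (in short, $\tau \in \mathbb{F}$) and a random variable $d : \Omega \to \mathcal{M}$ that is measurable with respect to the observation history $\mathcal{F}_\tau$ up to the stopping time $\tau$ (namely, $d \in \mathcal{F}_\tau$). Let $\Delta$ be the collection of all sequential change detection and identification rules. The objective is to find a strategy $(\tau, d)$ that solves optimally the trade-off between the $m$th moment of the detection delay time $(\tau - \theta)_+$ for some $m \ge 1$ and the false alarm and misdiagnosis probabilities:

$$D^{(m)}(\tau) := E\big[(\tau - \theta)_+^m\big], \tag{3}$$
$$R_{0i}(\tau, d) := P\{d = i,\ \tau < \theta\}, \quad i \in \mathcal{M}, \tag{4}$$
$$R_{ji}(\tau, d) := P\{d = i,\ \mu = j,\ \theta \le \tau < \infty\}, \quad i \in \mathcal{M},\ j \in \mathcal{M} \setminus \{i\}. \tag{5}$$

Here and for the rest of the paper, $x_+ := \max(x, 0)$ and $x_- := \max(-x, 0)$ for any $x \in \mathbb{R}$.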
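We formulate the optimal trade-offs between (3)-(5) as in the following two related problems:
Problem 1 (Minimum Bayes risk formulation) For fixed $m \ge 1$, $c > 0$, and strictly positive constants $a = (a_{ji})_{i \in \mathcal{M}, j \in \mathcal{M}_0 \setminus \{i\}}$, calculate the minimum Bayes risk $\inf_{(\tau, d) \in \Delta} R^{(c,a,m)}(\tau, d)$, where

$$R^{(c,a,m)}(\tau, d) := c\, D^{(m)}(\tau) + \sum_{i \in \mathcal{M}} \sum_{j \in \mathcal{M}_0 \setminus \{i\}} a_{ji} R_{ji}(\tau, d) \tag{6}$$

is the expected sum of all risks arising from the detection delay time, false alarm and misdiagnosis, and find a strategy $(\tau^*, d^*) \in \Delta$ which attains the minimum Bayes risk, if such a strategy exists.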
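Problem 2 (Bayesian fixed-error-probability formulation) For fixed $m \ge 1$ and positive constants $\overline{R} = (\overline{R}_{ji})_{i \in \mathcal{M}, j \in \mathcal{M}_0 \setminus \{i\}}$, calculate the smallest $m$th moment $\inf_{(\tau, d) \in \Delta(\overline{R})} D^{(m)}(\tau)$ of detection delay time among all decision rules in

$$\Delta(\overline{R}) := \big\{ (\tau, d) \in \Delta : R_{ji}(\tau, d) \le \overline{R}_{ji},\ i \in \mathcal{M},\ j \in \mathcal{M}_0 \setminus \{i\} \big\}$$

with the same predetermined upper bounds on false alarm and misdiagnosis probabilities, and find a strategy $(\tau^*, d^*) \in \Delta(\overline{R})$ which attains the minimum, if such a strategy exists.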
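Problem 1 can in principle be solved optimally by stochastic dynamic programming. A standard way to solve Problem 2 optimally is by working through its Lagrange relaxation, which turns out to be an instance of Problem 1, where $a_{ji}$ serves as the Lagrange multiplier of the constraint $R_{ji}(\tau, d) \le \overline{R}_{ji}$ for every $i \in \mathcal{M}$ and $j \in \mathcal{M}_0 \setminus \{i\}$. Indeed, suppose that for some $a$, a decision rule $(\tau^*, d^*)$ attains the minimum Bayes risk $\inf_{(\tau, d)} R^{(c,a,m)}(\tau, d)$ and satisfies $R_{ji}(\tau^*, d^*) = \overline{R}_{ji}$ for every $i \in \mathcal{M}$ and $j \in \mathcal{M}_0 \setminus \{i\}$. Then, for every decision rule $(\tau, d)$ that satisfies the constraints of Problem 2, the inequality $R^{(c,a,m)}(\tau^*, d^*) \le R^{(c,a,m)}(\tau, d)$ implies that

$$c \big( D^{(m)}(\tau^*) - D^{(m)}(\tau) \big) \le \sum_{i \in \mathcal{M}} \sum_{j \in \mathcal{M}_0 \setminus \{i\}} a_{ji} \big( R_{ji}(\tau, d) - \overline{R}_{ji} \big) \le 0,$$

and hence, the same $(\tau^*, d^*)$ rule is also optimal for the Bayesian fixed-error-probability formulation. The asymptotically optimal decision rules proposed for Problems 1 and 2 will likewise be related.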
On the one hand, a majority of practitioners favor the formulation in Problem 2 over that in Problem 1, because the hard constraints R ji (τ, d) ≤ R ji , i ∈ M, j ∈ M 0 \ {i} in Problem 2 are easier to set up and to understand than the (shadow) costs c and a of decision delay, false alarm, and misdiagnosis. On the other hand, some practitioners still find Problem 1 useful to incorporate expert opinions.
Let us denote by α (i) n the random variable α (i) n (X 1 , . . . , X n ) for every n ≥ 0. Then the LLR processes defined in (1) can be written as In our analyses, it is often very convenient to work under the conditional probability measures: defined for every i ∈ M, n ≥ 1, (E 1 × · · · × E n ) ∈ E n . Let E i and E (t) i , respectively, be the expectations with respect to P i and P (t) i . Under P (0) i and P (∞) i , the random variables X 1 , X 2 , . . . are independent and have common probability density functions f i (·) and f 0 (·), respectively. We denote by P (∞) any P (∞) i for any i ∈ M. The LLR processes in (1) or (7) play a role in changing probability measures as the next lemma shows. Lemma 2.3 (Change of measure) For every i ∈ M, an F-stopping time τ , and an F τmeasurable event F , The next proposition introduces the key risk components and its proof follows directly from Lemma 2.3 after setting F := {d = i} ∈ F τ for every i ∈ M. Proposition 2.4 For every strategy (τ, d) ∈ , c > 0, m ≥ 1 and strictly positive constants a = (a ji ) i∈M,j ∈M\{i} , we can rewrite (4)-(6) as R (c,a,m) Here (10)-(12) correspond to the conditional risks given μ = i, written in terms of the process G (a) i (n), which is a linear combination of the exponents of the LLR processes and serves as the Radon-Nikodym derivative.
Remark 2.5 In the remainder, we prove a number of results in the $P_i$-a.s. sense for given $i \in \mathcal{M}$. These also hold automatically $P_i^{(t)}$-a.s. for every $t \ge 1$. Indeed, because $P\{\theta < \infty\} = 1$, $P\{\theta = t\} > 0$ for every $t \ge 1$ and $P_i(F) = \sum_{t=0}^{\infty} P\{\theta = t\} P_i^{(t)}(F)$ for every $F \in \mathcal{F}$, $P_i(F) = 1$ implies $P_i^{(t)}(F) = 1$ for every $t \ge 1$.

Asymptotically optimal sequential detection and identification strategies
We will introduce two strategies that are computationally efficient and asymptotically optimal. The first strategy raises an alarm as soon as the posterior probability of the event that at least one of the change types occurred exceeds some suitable threshold, and is shown to be asymptotically optimal for Problem 1. The second strategy is its variant expressed in terms of the LLR processes and is shown to be asymptotically optimal for Problem 2. The asymptotic performance analyses of both rules depend on the same convergence results of the LLR processes. The proofs can be conducted in parallel and almost simultaneously both for Problem 1 and for Problem 2 because the detection times can be approximated by the first hitting times of certain processes that share the same asymptotic properties. where τ (i) A := inf n ≥ 1 : (i) n > Define the logarithm of the odds-ratio processes as Then (14) can be rewritten as The values of A determine the sizes of the polyhedrons that approximate the original optimal stopping regions, e.g., the triangular regions when M = 2 as in Fig. 1(b), and need to be determined so as to minimize the Bayes risk. where υ (i) We show that, after choosing suitable A and B, the strategy (τ A , d A ) is asymptotically optimal for Problem 1 as c goes to zero, and the strategy (υ B , d B ) is asymptotically optimal for Problem 2 as goes to zero-while R ji /R ki for every j, k ∈ M 0 \ {i} remains bounded away from zero in the sense that for any strictly positive constants k = (k i ) i∈M -and this limit mode will still be denoted by " R ↓ 0" for brevity. More precisely, we find functions A(c) of the unit sampling cost c in Problem 1 and B(R) of the upper bounds (R ji ) i∈M,j ∈M 0 \{i} on the false alarm and misdiagnosis probabilities in for every fixed m ≥ 1 and every set a = (a ji ) i∈M,j ∈M 0 \{i} of strictly positive constants.
Here "$x_\gamma \sim y_\gamma$ as $\gamma \to \gamma_0$" means $\lim_{\gamma \to \gamma_0} x_\gamma / y_\gamma = 1$. In fact, we obtain stronger results.

Convergence of false alarm and misdiagnosis probabilities and detection delay
As c and R decrease to zero in Problems 1 and 2, respectively, we expect that the optimal stopping regions shrink, or equivalently the values of A and B should decrease. We therefore study the asymptotic behaviors of the false alarm and misdiagnosis probabilities and the change detection time as A and B go to zero, and then adapt their values as functions of c and R so as to attain asymptotically optimal strategies. Here, in concordance with (18), the limits $B_i \downarrow 0$ for every $i \in \mathcal{M}$ are taken such that (25) is satisfied. We first study the asymptotic behaviors of the false alarm and misdiagnosis probabilities. The upper bounds can be obtained by a direct application of Proposition 2.4.

Proposition 3.4 (Bounds on false alarm and misdiagnosis probabilities) (i) For every fixed
The asymptotic behavior of the detection delay is closely related to the convergence of the average increment $\Lambda_n(i, j)/n$ of the LLR processes. According to the next proposition, $\Lambda_n(i, j)/n$ converges $P_i$-a.s. as $n \uparrow \infty$ to some strictly positive constant for every $i \in \mathcal{M}$ and $j \in \mathcal{M}_0 \setminus \{i\}$. The proof of Proposition 3.7 is deferred to Sect. 4, where the limiting values are analytically expressed in terms of the Kullback-Leibler divergence between the alternative probability measures.
Proposition 3.7 For every $i \in \mathcal{M}$ and $j \in \mathcal{M}_0 \setminus \{i\}$, we have $P_i$-a.s. $\Lambda_n(i, j)/n \to l(i, j)$ as $n \uparrow \infty$ for some strictly positive constant $l(i, j)$.
Let us fix any i ∈ M. We show that, for small values of A and B, the stopping times τ (i) A and υ (i) B in (14) and (17) are essentially determined by the process (i, j (i)), where and P i -a.s. n (i, j (i))/n ≈ (i) n /n ≈ (i) n /n ≈ l(i) for sufficiently large n as the next proposition suggests.
The proof of part (i) follows from Proposition 3.7, and part (ii) follows from part (i) and Baum and Veeravalli (1994, Lemma 5.2). Proposition 3.8 implies the following convergence results.

Lemma 3.9 For every i ∈ M and any
Remark 3.10 We shall always assume that 0 < B ij < 1 or −∞ < log B ij < 0 for all i ∈ M and j ∈ M 0 \{i} as we are interested in the limits of certain quantities as B ↓ 0. Because where the last equality follows from the first two equalities.
Because we want to minimize the mth moment of the detection delay time for any m ≥ 1, we will strengthen the convergence results of Lemma 3.9. Condition 3.11 below for some r ≥ m is both necessary and sufficient for the L m -convergences.

Condition 3.11 (Uniform Integrability) For some r ≥ m,
Lemma 3.12 Let m ≥ 1 be any integer.
where the limits B i ↓ 0 for all i ∈ M are taken such that (25) is satisfied.
The proof of Lemma 3.12 follows from Lemma 3.9, Chung (2001, Theorem 4.5.4), Gut (2005, Theorem 5.2) and because τ (i) Using renewal theory, one can show that Condition 3.11 holds if n (i, j ) = X 1 + · · · + X n is a random walk for some sequence (X n ) n≥1 of i.i.d. random variables with Lai (1975). In the case of the SMHT, n (i, j ) is indeed a random walk with positive drift for every i ∈ M and j ∈ M 0 \ {i}; see Baum and Veeravalli (1994).
Condition 3.11 is often hard to verify. An alternative sufficient condition can be given in terms of the r-quick convergence. The r-quick convergence of suitable stochastic processes is known to be sufficient for the asymptotic optimalities of certain sequential rules based on non-i.i.d. observations in CPD and SMHT problems. We will show that the r-quick convergence of the LLR processes is also sufficient for the joint sequential change detection and identification problem.
Definition 3.13 (The r-quick convergence) Let (ξ n ) n≥0 be any stochastic process and r > 0.
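For orientation, the usual form of r-quick convergence in the sense of Lai (1977) is recalled below. The notation $T_\varepsilon$ and the limit $\xi$ are ours and may differ from the notation used by the authors.

```latex
T_\varepsilon := \sup\{\, n \ge 1 : |\xi_n - \xi| > \varepsilon \,\}, \qquad \sup \emptyset := 0,
\qquad\qquad
\xi_n \to \xi \ \text{ $r$-quickly}
\iff
E\big[ T_\varepsilon^{\,r} \big] < \infty \ \text{ for every } \varepsilon > 0 .
```

In words, the last time the process is farther than $\varepsilon$ from its limit must have a finite $r$th moment. Since $E[T_\varepsilon^r] < \infty$ forces $T_\varepsilon < \infty$ almost surely, r-quick convergence implies almost sure convergence.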
According to Proposition 3.15, stated below and proved in the appendix, Condition 3.11 holds if ( (i) n /n) n≥1 and ( (i) n /n) n≥1 converge r-quickly to l(i) under P i for every i ∈ M, which we put together as a different condition: Proposition 3.15 Let m ≥ 1. (i) If Condition 3.14 (i) holds for some r ≥ m, then (28) and Condition 3.11 (i) hold. (ii) If Condition 3.14 (ii) holds for some r ≥ m, then (29) and Condition 3.11 (ii) hold.

Asymptotic optimality
We now prove the asymptotic optimalities of (τ A , d A ) and (υ B , d B ) for Problems 1 and 2 under Condition 3.11 (i) and (ii), respectively.
We first derive a lower bound on the expected detection delay under the optimal strategy. It can be obtained similarly to the corresponding bounds for CPD and SMHT; see Baum and Veeravalli (1994), Dragalin et al. (1999), Dragalin et al. (2000), Lai (2000), Tartakovsky and Veeravalli (2004) and Baron and Tartakovsky (2006). This lower bound and Lemma 3.12 can be combined to obtain asymptotic optimality for both problems.

Lemma 3.17 For every
We now study how to set A in terms of c in order to achieve asymptotic optimality in Problem 1. We see from Proposition 3.4 and Lemma 3.12 that the false alarm and misdiagnosis probabilities decrease faster than the expected delay time and are negligible when A and B are small. Indeed, we have, in view of the definition of the Bayes risk in (10), by Proposition 3.4 and Lemma 3.12, for any 0 < σ i < a i for every i ∈ M, This motivates us to choose the value of A i such that it minimizes Consequently, it is sufficient to show that The proof of the asymptotic optimality below is similar to that of Theorem 3.1 in Baron and Tartakovsky (2006) for CPD.
Proposition 3.18 (Asymptotic optimality of (τ A , d A ) in Problem 1) Fix m ≥ 1 and a set of strictly positive constants a. Under Conditions 3.11 (i) or 3.14 (i) for the given m, the strategy (τ A(c) , d A(c) ) is asymptotically optimal as c ↓ 0; that is (21) holds for every i ∈ M.
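The choice of the threshold as a function of c can be illustrated numerically. The sketch below assumes, based on the discussion above, that the approximate risk for regime i has the form $c\,(-\log A_i / l(i))^m + \sigma_i A_i$, that is, a delay term plus an error term; this functional form is our reading of the text and not a formula quoted from it. The names `approx_risk` and `best_threshold` are hypothetical. Writing $u = -\log A_i$, the objective is convex in $u$ for $m \ge 1$, so a ternary search suffices.

```python
import math

def approx_risk(u, c, l, sigma, m=1):
    """Assumed approximate risk in terms of u = -log A: delay term + error term."""
    return c * (u / l) ** m + sigma * math.exp(-u)

def best_threshold(c, l, sigma, m=1, u_max=50.0, iters=200):
    """Minimize approx_risk over u in [0, u_max] by ternary search; return A = exp(-u)."""
    lo, hi = 0.0, u_max
    for _ in range(iters):
        u1 = lo + (hi - lo) / 3.0
        u2 = hi - (hi - lo) / 3.0
        if approx_risk(u1, c, l, sigma, m) <= approx_risk(u2, c, l, sigma, m):
            hi = u2
        else:
            lo = u1
    return math.exp(-0.5 * (lo + hi))
```

For m = 1 the first-order condition gives the closed form $A_i = c / (l(i)\,\sigma_i)$ whenever this value is below one, which provides a convenient check of the numerical search.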
It should be remarked here that the asymptotic optimality results hold for any 0 < σ i < a i . However, for higher-order approximation, it is ideal to choose such that In Sect. 5, we achieve this value using nonlinear renewal theory. We now show that (υ B , d B ) is asymptotically optimal for Problem 2. By Proposition 3.4, if we set This together with Lemma 3.17 shows the asymptotic optimality.

The convergence results of the LLR processes
In this section, we will prove Proposition 3.7 and obtain the limits $l(i, j)$ for every $i \in \mathcal{M}$ and $j \in \mathcal{M}_0 \setminus \{i\}$, which can be expressed in terms of the Kullback-Leibler divergence of the pre- and post-change probability density functions and the exponential decay rate in (2) of the disorder time probability distribution. Under some mild condition, we show that the convergence also holds in $L^r$ for every $r \ge 1$. Let us denote the Kullback-Leibler divergence of $f_i$ from $f_j$ by

$$q(i, j) := E_i^{(0)}\left[ \log \frac{f_i(X_1)}{f_j(X_1)} \right] = \int_E \log \frac{f_i(x)}{f_j(x)}\, f_i(x)\, m(dx),$$

which always exists and is non-negative. Furthermore, Assumption 2.1 ensures that $q(i, j) > 0$. To ensure that $E_i^{(0)}[\log(f_0(X_1)/f_j(X_1))]$ exists for every $i \in \mathcal{M}$, $j \in \mathcal{M}_0 \setminus \{i\}$, we assume the following.

Decomposition of the LLR processes
We will decompose each LLR process in (1) into a random walk with a positive drift and a stochastic process whose running average increment vanishes in the limit. In the SMHT case (namely, when $p_0 = 1$), for every $i \in \mathcal{M}$ and $j \in \mathcal{M} \setminus \{i\}$, the LLR process $\Lambda(i, j)$ is a $P_i$-random walk. Its running average increment $\Lambda_n(i, j)/n$ converges $P_i$-a.s. to the Kullback-Leibler divergence $q(i, j)$ as $n \uparrow \infty$ by the strong law of large numbers (SLLN).
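The SLLN statement for the random-walk case is easy to check by simulation. The sketch below assumes two scalar Gaussian densities with a common variance, for which the Kullback-Leibler divergence has the closed form $(\lambda_i - \lambda_j)^2 / (2\sigma^2)$; the Gaussian setting and the function names are assumptions of this illustration.

```python
import random

def llr_increment(x, mean_i, mean_j, sigma=1.0):
    """log f_i(x) / f_j(x) for Gaussian densities N(mean_i, s^2) and N(mean_j, s^2)."""
    return ((x - mean_j) ** 2 - (x - mean_i) ** 2) / (2.0 * sigma ** 2)

def average_llr(n, mean_i, mean_j, sigma=1.0, seed=1):
    """Running average of the LLR random walk after n observations drawn from f_i."""
    rng = random.Random(seed)
    total = sum(llr_increment(rng.gauss(mean_i, sigma), mean_i, mean_j, sigma)
                for _ in range(n))
    return total / n

def kl_gaussian(mean_i, mean_j, sigma=1.0):
    """Closed-form Kullback-Leibler divergence between the two Gaussians."""
    return (mean_i - mean_j) ** 2 / (2.0 * sigma ** 2)
```

With a large sample size, `average_llr(n, 1.0, 0.0)` should be close to `kl_gaussian(1.0, 0.0)`, which equals 0.5 here.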
Although ( (i, j )) j ∈M 0 \{i} , for p 0 = 0, are not P i -random walks, this observation nonetheless motivates us to approximate them by some random walks. Let We show that (i, j ) can be approximated by a random walk with drift q(i, j ) > 0 if j ∈ i and with q(i, 0) + > 0 otherwise; namely, with drift min(q for every n ≥ 1 and j ∈ M 0 . Then it can be checked easily that, for any j ∈ M 0 \ {i}, we have By (7), after taking logarithms on both sides, each LLR process can be written as where Moreover, n l=1 h ij (X l ) can be split into post-and pre-change terms, and we have for every fixed j ∈ M 0 \ {i}. Notice that the first term in (45) is conditionally a random walk under P (t) i given θ = t for every t ≥ 0.

The convergence of the LLR processes
Fix i ∈ M and j ∈ M 0 \ {i}. In view of (42), we can explore the convergence for ( n l=1 h ij (X l ))/n and n (i, j )/n separately. For the first term, notice that Because θ is an a.s. finite random variable, the first term on the righthand side converges by the SLLN, while the second term converges to zero. Then Remark 2.5 implies Lemma 4.2, and, under some mild additional conditions, Lemma 4.3 below.

Lemma 4.2 For every
Note that (47) holds if and only if the following condition holds.

Condition 4.4
For every i ∈ M, j ∈ M 0 \ {i}, and r ≥ 1, suppose that We now show that n (i, j )/n converges P i -a.s. to zero. The convergence result holds in L r (P i ) as well for r ≥ 1 under a mild condition. To show this, we first determine the limits of (L (·) n /n) n≥1 and (K (·) n /n) n≥1 as n ↑ ∞ under P i .

Lemma 4.5
For every i ∈ M, we have the following under P i .
(vii) For every j ∈ M, (|K (j ) n /n| q ) n≥1 is uniformly integrable for every 0 ≤ q ≤ r, if (48) holds and Notice in Lemma 4.5 (vi) that in order for L (i) n to converge in L r under P i to zero, it is sufficient to have The characterization of n (i, j ) in (44) leads to the next convergence result.
Lemma 4.6 For every i ∈ M and j ∈ M 0 \ {i}, we have n (i, j )/n → 0 as n ↑ ∞ P i -a.s.
Moreover, the convergence holds in L r under P i as well for some r ≥ 1 given the following condition.

Condition 4.7
Given i ∈ M, j ∈ M 0 \ {i} and r ≥ 1, we suppose that (50) holds and (i) j ∈ i and (48) holds, or (ii) j / ∈ i or j = 0 and (49) holds for the given r.
By combining the results in Lemmas 4.5 and 4.6, Proposition 3.7 indeed holds with l(·, ·) as defined in (46). Moreover, the following convergence results hold by Lemmas 4.5 and 4.8.  guarantees that l(i, j ) > 0 for every i ∈ M and j ∈ M 0 \ {i}. (iii) We later assume, in Sect. 5 below for higher-order approximations, that there is a unique j (i) ∈ M 0 \ {i} such that l(i) = l(i, j (i)) = min j ∈M 0 \{i} l(i, j ) for every i ∈ M. Then (i) implies l(i) < l(i, 0) and q(i, j (i)) < q(i, 0) + , and j (i) ∈ i and i = ∅.

Remark 4.11
We proved a number of results on the convergence of the LLR processes. However, those results do not guarantee their r-quick convergence. A sufficient condition derived by means of Jensen's inequality can be found in our technical report (Dayanik et al. 2011).

Higher-order approximations
In this section, we derive a higher-order asymptotic approximation for the minimum Bayes risk in Problem 1 by choosing the values of σ in (31) as discussed in the previous section. Proposition 3.4 (i) gives an upper bound on (R (a) i (·, ·)) i∈M , and here we investigate if there exists some σ such that (36) holds.

Asymptotic behaviors of the false alarm and misdiagnosis probabilities
Fix i ∈ M. By (12) and because τ A = τ (i)

Recall that τ (i)
A is the first time the process (i) n exceeds the threshold − log A i , and − log A i ↑ ∞ ⇐⇒ A i ↓ 0. The following lemma shows that the convergence holds on condition that the overshoot converges in distribution as A i ↓ 0 to some random variable W i under P i .

Lemma 5.1 Fix i ∈ M. If j (i) is unique and the overshoot W i (A i ) in (53) converges in distribution as
In Lemma 5.1 above, σ i does not depend on a ji for any j ∈ M 0 \ {i, j (i)} and therefore we see that R ji (τ A , d A ) is negligible compared with R j (i)i (τ A , d A ) for any j ∈ M 0 \ {i, j (i)} for small A.

Nonlinear renewal theory and the overshoot distribution
We now see that Lemma 5.1 indeed holds via nonlinear renewal theory on condition that j (i) is unique. We obtain the limiting distribution of the overshoot (53).
Observe that, for every k ∈ M 0 \ {i}, By (45) and (54), we have (i) n = n l=θ∨1 h ij (i) (X l ) + ξ n (i, j (i)), where We will take advantage of the fact that, given θ , the process n l=θ∨1 h ij (i) (X l ) is conditionally a random walk and ξ n (i, j (i)) can be shown to be "slowly-changing", in the sense that ξ n+1 (i, j (i)) − ξ n (i, j (i)) ≈ 0 for large n. This implies that the increments of the slowly-changing process ξ n (i, j (i)) are negligible compared to those of the random walk term n l=θ∨1 h ij (i) (X l ) at every large n. This result can be used to obtain the overshoot distribution of the process (i) at its boundary-crossing time τ (i) A for small A i by means of the nonlinear renewal theory (Woodroofe 1982;Siegmund 1985). Let us firstly give a few definitions and state a fundamental theorem of nonlinear renewal theory.

Definition 5.3
A sequence of random variables $(\xi_n)_{n \ge 1}$ is said to be slowly-changing if it is u.c.i.p. and $\max\{|\xi_1|, \ldots, |\xi_n|\}/n \to 0$ in probability as $n \to \infty$.
Remark 5.4 If a process converges a.s. to a finite random variable, then it is a slowly-changing process. Moreover, the sum of two slowly-changing processes is also a slowly-changing process.
The following theorem states that, if a process is the sum of a random walk with positive drift and a slowly-changing process, then the overshoot at the first time it exceeds some threshold has the same asymptotic distribution as that of the overshoot of the random walk, as the threshold tends to infinity.
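The claim can be illustrated with a small Monte Carlo experiment. The sketch below is not taken from the paper: it uses Gaussian increments with positive drift and the deterministic perturbation $\xi_n = 1 - 1/n$, which converges to a finite limit and is therefore slowly-changing in the sense of Remark 5.4. The function names and parameter values are hypothetical.

```python
import random

def overshoot(b, drift, rng, perturb=None):
    """Overshoot over level b at the first crossing of S_n (+ perturb(n) if given)."""
    s, n = 0.0, 0
    while True:
        n += 1
        s += rng.gauss(drift, 1.0)  # random walk with positive drift
        level = s + (perturb(n) if perturb else 0.0)
        if level > b:
            return level - b

def mean_overshoot(b, drift=0.5, n_paths=2000, perturb=None, seed=0):
    """Monte Carlo estimate of the expected overshoot over level b."""
    rng = random.Random(seed)
    return sum(overshoot(b, drift, rng, perturb) for _ in range(n_paths)) / n_paths

plain = mean_overshoot(30.0)
perturbed = mean_overshoot(30.0, perturb=lambda n: 1.0 - 1.0 / n, seed=1)
```

For a large threshold the two estimates should agree up to Monte Carlo error, in line with the statement that the perturbation does not change the limiting overshoot distribution.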
Theorem 5.5 (Woodroofe 1982, Theorem 4.1; Siegmund 1985, Theorem 9.12) On some (Ω, E, P), let (Z n ) n≥1 be a sequence of i.i.d. random variables with some common nonarithmetic distribution and mean 0 < EZ 1 < ∞. Let (ξ n ) n≥1 be a slowly-changing process and (Z k ) k≥n+1 be independent of (ξ l ) 1≤l≤n for every n ≥ 1. If T b := inf{n ≥ 1 : n i=1 Z i − ξ n > b} and T b := inf{n ≥ 1 : n i=1 Z i > b} for every b ≥ 0,
We fix i ∈ M and obtain the limiting distribution of the overshoot W i (A i ) as A i ↓ 0 using Theorem 5.5.
For every t ≥ 1 and j (i) ∈ arg min j ∈M 0 \{i} l(i, j ), define a stopping time, and random variable W (t) i whose distribution is given by The next lemma follows immediately from Theorem 5.5.

Lemma 5.7 Fix i ∈ M and t ≥ 0. If j (i) is unique, then the overshoot W i (A i ) converges to
i for every t ≥ 1, which leads to Lemma 5.8 below.

Proposition 5.9 Fix i ∈ M and suppose j (i) is unique. Then R (a) i (τ A , d A )/A i
where W i is the random variable defined in Lemma 5.8. Therefore, a higher-order approximation for Problem 1 can be achieved by setting in (32)

Numerical examples
To assess the performance of the asymptotically optimal rule, one first needs to find, for comparison, the optimal solution. As outlined in Sect. 2, in order to solve optimally the fixed-error-probability formulation, one first needs to transform it to a minimum Bayes risk formulation by means of Lagrange relaxation, and then solve the latter repeatedly for different values of the Lagrange multipliers. Because this method requires extensive calculations and its details are not of primary interest in this paper, we focus on the minimum Bayes risk formulation and evaluate the performance of the strategy (τ A(c) , d A(c) ) numerically in the i.i.d. Gaussian case described below. Its asymptotic optimality ensures that the strategy is near-optimal when the unit detection delay cost c is small. Our numerical example suggests that it is near-optimal even for moderately larger values of the unit detection delay cost.

The numerical comparison of the minimum and asymptotically minimum Bayes risks
We calculate the minimum and asymptotically minimum Bayes risks for the following example. We assume that M = 2, K = 2, p 0 = 0, p = 0.01, (ν 1 , ν 2 ) = (0.1, 0.9), and the mean vectors λ 0 = (λ (1) 0 , λ (2) 0 ) and λ i = (λ (1) i , λ (2) i ), i = 1, 2 before and after the change, respectively, satisfy Table 2 compares the performances of the strategy (τ A(c) , d A(c) ) and the optimal strategy for fixed a ji = 1 for every i ∈ M and j ∈ M 0 \ {i} as the unit detection delay cost c decreases. The optimal stopping regions are found by the value iteration described by Dayanik et al. (2008). The Bayes risks of the strategies are estimated via Monte Carlo simulation. For accurate approximations, we used (59), and (σ i ) i∈M are computed with Monte Carlo methods.
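A Monte Carlo estimate of the Bayes risk of a posterior-threshold rule can be sketched as follows. The prior parameters $p_0 = 0$, $p = 0.01$, $(\nu_1, \nu_2) = (0.1, 0.9)$ and the costs $a_{ji} = 1$ follow the example above, with $m = 1$. Everything else is an assumption of this sketch: the observations are scalar rather than two-dimensional, the regime means `(0.0, 1.0, -1.0)` are placeholders, a single common posterior threshold is used, and a path that never raises an alarm before `horizon` is counted as an error.

```python
import math
import random

def normal_pdf(x, mean):
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def run_once(rng, p0, p, nu, means, threshold, horizon):
    """Simulate one path, apply the threshold rule, return (delay, error indicator)."""
    m = len(nu)
    theta = 0 if rng.random() < p0 else 1
    while theta >= 1 and rng.random() >= p:  # geometric disorder time on {1, 2, ...}
        theta += 1
    mu = rng.choices(range(1, m + 1), weights=nu)[0]
    post = [1.0 - p0] + [p0 * v for v in nu]
    for n in range(1, horizon + 1):
        x = rng.gauss(means[mu] if n >= theta else means[0], 1.0)
        new = [(1.0 - p) * post[0] * normal_pdf(x, means[0])]
        new += [(post[i] + p * nu[i - 1] * post[0]) * normal_pdf(x, means[i])
                for i in range(1, m + 1)]
        total = sum(new)
        post = [v / total for v in new]
        best = max(range(1, m + 1), key=lambda i: post[i])
        if post[best] > threshold:
            error = (n < theta) or (best != mu)  # false alarm or misdiagnosis
            return max(n - theta, 0), 1.0 if error else 0.0
    return max(horizon - theta, 0), 1.0  # no alarm: treated as an error in this sketch

def estimate_bayes_risk(c, threshold, n_paths=2000, seed=0, p0=0.0, p=0.01,
                        nu=(0.1, 0.9), means=(0.0, 1.0, -1.0), horizon=5000):
    """Estimate c * E[(tau - theta)_+] + P(false alarm or misdiagnosis), all a_ji = 1."""
    rng = random.Random(seed)
    delay_sum = 0.0
    error_sum = 0.0
    for _ in range(n_paths):
        delay, error = run_once(rng, p0, p, nu, means, threshold, horizon)
        delay_sum += delay
        error_sum += error
    return c * delay_sum / n_paths + error_sum / n_paths
```

Evaluating `estimate_bayes_risk(c, threshold)` over a grid of thresholds for each c gives a rough picture of how the risk of the threshold rule varies; comparing it with the optimal risk would additionally require the value iteration mentioned above.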
We see that $(\tau_{A(c)}, d_{A(c)})$ is asymptotically optimal: as listed in the last column, the ratio of the optimal and approximate Bayes risk values converges to 1 as $c \downarrow 0$. Moreover, the approximate and the minimum Bayes risk values are close even for large values of $c$, which is due to the higher-order approximation studied in Sect. 5.

Acknowledgements
The authors thank Alexander Tartakovsky for the illuminating discussions. We also thank an anonymous referee and the editors for the constructive remarks and suggestions which significantly improved our presentation. The research of Savas Dayanik was supported by the TÜBİTAK Research Grants 109M714 and 110M610. Warren B. Powell was supported in part by the Air Force Office of Scientific Research, contract FA9550-08-1-0195, and the National Science Foundation, contract CMMI-0856153. Kazutoshi Yamazaki was in part supported by Grant-in-Aid for Young Scientists (B)22710143, the Ministry of Education, Culture, Sports, Science and Technology, and Grant-in-Aid for Scientific Research (B)2271014, Japan Society for the Promotion of Science.

Appendix A: Proofs and auxiliary results
A.1 Proof of Remark 2.2
We will prove that which implies that P-a.s. 0 < (i) To prove (61), let $E_i := \{x : 0 < f_i(x)/f_0(x) < \infty\}$ for every $i \in \mathcal{M}$. Then Assumption 2.1 implies that Because $P\{\theta \le 1, \mu = j\} > 0$ for every $j \in \mathcal{M}$ and $P\{\theta > 1\} > 0$, we must have the first equality follows. The proof of the second equality is similar.
A.3 Proof of Proposition 3.4
(13), where the equality and the last inequality follow from (15) and (16), respectively. Hence, we have R (a)

A.4 Proof of Proposition 3.6
For (i), because $\tau_A^{(i)}$ increases as $A_i \downarrow 0$, it is enough to show that there is a subsequence the limit of which exists and equals $\infty$, $P_i$-a.s. Fix $n \ge 1$. By (14), we have , which is zero by Remark 2.2. Namely, $\tau_A^{(i)} \to \infty$ in probability under $P_i$ as $A_i \downarrow 0$. Hence, there is a subsequence of $(A_i)$ along which $P_i$-a.s. $\tau_A^{(i)} \uparrow \infty$, which proves (i).
$\le 2\nu_j A_j$ by Proposition 3.4 (i), for every fixed $n \ge 1$, we have which goes to zero as $A \downarrow 0$ by (i) and by Proposition 3.4. Namely, $\tau_A \to \infty$ in probability under $P_i$ as $A \downarrow 0$; therefore, there is a subsequence of $(\tau_A)_{A>0}$ that goes to $\infty$, $P_i$-a.s. as $A \downarrow 0$. Because $(\tau_A)_{A>0}$ is increasing $P_i$-a.s. as $A \downarrow 0$, its limit exists and equals $\infty$, $P_i$-a.s. as well, and (ii) follows.
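The passage from convergence in probability to almost-sure convergence in (i) and (ii) relies only on monotonicity. As a sketch, take any sequence $A_k \downarrow 0$ and write $\tau_0 := \lim_{k \uparrow \infty} \tau_{A_k}$; the symbols $A_k$ and $\tau_0$ are introduced here for illustration only. Because the stopping times are integer-valued and nondecreasing in $k$, the argument can be spelled out as follows.

```latex
\begin{align*}
\{\tau_0 \le n\} &= \bigcap_{k \ge 1} \{\tau_{A_k} \le n\}
  && \text{(the events decrease in $k$ by monotonicity)}, \\
P_i\{\tau_0 \le n\} &= \lim_{k \uparrow \infty} P_i\{\tau_{A_k} \le n\} = 0
  && \text{(continuity from above; $\tau_A \to \infty$ in probability)}, \\
P_i\{\tau_0 < \infty\} &\le \sum_{n \ge 1} P_i\{\tau_0 \le n\} = 0 .
\end{align*}
```

Hence $\tau_0 = \infty$, $P_i$-a.s., without the need to identify the subsequence explicitly.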
2. Therefore, as in the proof of (i), $P_i$-a.s. $\upsilon_B^{(i)} \to \infty$ as $B_i \downarrow 0$, and (iii) follows. Furthermore, (iv) is immediate because, for every fixed $n \ge 1$, Proposition 3.4 (ii) implies

A.5 Proof of Lemma 3.9
First, (16) implies that (i) By Proposition 3.8 (i) and Proposition 3.6 (i), we have . By Proposition 3.8 (ii) and Proposition 3.6 (iii), we have l s. If we divide and multiply by $-\log B_{ij(i)}$ before we take the limits and use (27), then (iii) follows; (iv) follows from (iii) because

A.6 Proof of Proposition 3.15
Fix $i \in \mathcal{M}$. (i) Lemma 3.9 (i) and Fatou's lemma give the inequality lim inf Let us next define T δ := inf{n ≥ 1 : inf k≥n ( (i) k /k) > l(i) − δ} for every $0 < \delta < l(i)$. Because by hypothesis (i) n /n converges $m$-quickly ($m \le r$) to $l(i)$ as $n \uparrow \infty$ under $P_i$, we have $E_i[(T_\delta)^m] < \infty$ for every $0 < \delta < l(i)$. After dividing both sides by $(-\log A_i)$ and taking the $m$-norm on both sides, Minkowski's inequality applied to the right-hand side gives $[(\tau_A^{(i)}/(-\log A_i))^m]^{1/m} \le 1/l(i)$, which together with (62) proves (i).
(ii) Lemma 3.9 (iii) and Fatou's lemma imply that lim inf Let us define T δ := inf{n ≥ 1 : inf k≥n ( (i) k /k) > l(i) − δ} for every $0 < \delta < l(i)$. Because by hypothesis (i) n /n converges $m$-quickly ($m \le r$) to $l(i)$ as $n \uparrow \infty$ under $P_i$, we have $E_i[(T_\delta)^m] < \infty$ for every $0 < \delta < l(i)$. Using an argument similar to that in the first part, we can show that $\upsilon_B^{(i)} < -\log B_i/(l(i) - \delta) + 1 + T_\delta$. After dividing both sides by $(-\log B_i)$ and taking the $m$-norm of both sides, an application of Minkowski's inequality to the right-hand side gives $[(\upsilon_B^{(i)}/(-\log B_i))^m]^{1/m} \le 1/l(i)$. After raising both sides to power $m$, the inequality υ (i) Dividing and multiplying the left-hand side by $(-\log B_{ij(i)})^m$ prior to taking the limit gives $\limsup_{B_i \downarrow 0} E_i[(\upsilon_B^{(i)}/(-\log B_{ij(i)}))^m] \le 1/l(i)^m$ thanks to (27). The last inequality and (63) prove (ii).
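The Minkowski step used in both parts can be written out explicitly. The following is a sketch that assumes $m \ge 1$ and $0 < B_i < 1$ and uses the shorthand $\|Z\|_m := (E_i[|Z|^m])^{1/m}$, which is introduced here and is not the paper's notation. Starting from the bound $\upsilon_B^{(i)} < -\log B_i/(l(i) - \delta) + 1 + T_\delta$:

```latex
\begin{align*}
\left\| \frac{\upsilon_B^{(i)}}{-\log B_i} \right\|_m
  &\le \frac{1}{l(i) - \delta} + \frac{1 + \|T_\delta\|_m}{-\log B_i}
  && \text{(Minkowski's inequality)}, \\
\limsup_{B_i \downarrow 0} \left\| \frac{\upsilon_B^{(i)}}{-\log B_i} \right\|_m
  &\le \frac{1}{l(i) - \delta}
  && \text{(because $\|T_\delta\|_m < \infty$ and $-\log B_i \uparrow \infty$)}, \\
\limsup_{B_i \downarrow 0} \left\| \frac{\upsilon_B^{(i)}}{-\log B_i} \right\|_m
  &\le \frac{1}{l(i)}
  && \text{(letting $\delta \downarrow 0$)}.
\end{align*}
```

The same computation with $\tau_A^{(i)}$ and $A_i$ in place of $\upsilon_B^{(i)}$ and $B_i$ yields the bound used in part (i).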
Proof of Lemma 3.17
Fix a set of positive constants R, $0 < \delta < 1$ and (τ, d) ∈ . By Markov's inequality, By taking limits on both sides, lim inf which is greater than or equal to $\delta$ by Lemma A.3. The claim is proved because $0 < \delta < 1$ is arbitrary.
A.9 Proof of Proposition 3.18
Assume to the contrary that lim inf c↓0 inf (τ,d)∈ R (c,a,m) i (τ, d)/g (c) i (A i (c)) < 1, implying that there is a decreasing subsequence $(c_n)_{n \ge 1} \downarrow 0$ and corresponding strategies $(\tau^*_{c_n}, d^*_{c_n})$ such that lim n↑∞ R (cn,a,m) By (34) where the last inequality follows from (33). However, this contradicts (65), and the proof is complete.
A.10 Proof of Lemma 4.3
By Lemma 4.2, it is sufficient to show that $(|(1/n) \sum_{l=1}^n h_{ij}(X_l)|^r)_{n \ge 1}$ is uniformly integrable under $P_i$. The running sum $\sum_{l=1}^n h_{ij}(X_l)$ is a random walk under both $P^{(\infty)}$ and $P_i^{(0)}$, and it is uniformly integrable under both measures because (47) holds; see Gut (1988, Theorem 4.1). Hence, it is uniformly integrable under $P_i$ as well because i Z for every random variable Z.
A.11 Proof of Lemma 4.5
We first prove the following.
We now prove (iii) using the next sufficient condition for uniform integrability.