Online Learning for Changing Environments using Coin Betting

A key challenge in online learning is that classical algorithms can be slow to adapt to changing environments. Recent studies have proposed "meta" algorithms that convert any online learning algorithm into one that is adaptive to changing environments, where the adaptivity is analyzed through a quantity called the strongly-adaptive regret. This paper describes a new meta algorithm that has a strongly-adaptive regret bound that is a factor of $\sqrt{\log(T)}$ better than other algorithms with the same time complexity, where $T$ is the time horizon. We also extend our algorithm to achieve a first-order (i.e., dependent on the observed losses) strongly-adaptive regret bound, for the first time to our knowledge. At its heart is a new parameter-free algorithm for the learning with expert advice (LEA) problem in which experts sometimes do not output advice for consecutive time steps (i.e., \emph{sleeping} experts). This algorithm is derived by a reduction from optimal algorithms for the so-called coin betting problem. Empirical results show that our algorithm outperforms state-of-the-art methods in both learning with expert advice and metric learning scenarios.


Introduction
Online learning algorithms are typically tailored to stationary environments, but in many applications the environment is dynamic. In online portfolio management, for example, stock price trends can vary unexpectedly, and the ability to track changing trends and adapt to them are crucial in maximizing profit. In product reviews, words describing product quality may change over time as products evolve and the tastes of customers change. Keeping track of the changes in the metric describing the relationship between review text and rating is crucial for improving analysis and the quality of recommendations.
We consider the problem of adapting to changing environments in the online learning context. Let D be the decision space, L be a family of loss functions that map D to R, and T be the target time horizon. Let A be an online learning algorithm. We define the online learning protocol in Figure 1.
The usual goal of online learning is to find a strategy that compares favorably with the best fixed comparator in a subset W of the decision space D, in hindsight. (Often, W = D.)

Figure 1: Online learning protocol. At each time t = 1, 2, . . . , T:
• The learner A picks a decision $x_t^A \in D$.
• The environment reveals a loss function $f_t \in L$.
• The learner A suffers loss $f_t(x_t^A)$.

Table 1: SA-Regret and time complexity of meta algorithms. We show the part of the regret due to the meta algorithm only, not the black-box. The last column is the multiplicative factor in the time complexity introduced by the meta algorithm. CBCE (our algorithm) achieves the best SA-Regret and time complexity.

Algorithm | SA-Regret order | Time factor
FLH [9] | $\sqrt{T \log T}$ | $T$
AFLH [9] | $\sqrt{T \log T}\,\log(I_2 - I_1)$ | $\log T$
SAOL [5] | $\sqrt{(I_2 - I_1)\log^2(I_2)}$ | $\log T$
CBCE (ours) | $\sqrt{(I_2 - I_1)\log(I_2)}$ | $\log T$

Classically, one seeks a
low value of the following (cumulative) static regret objective:
$$\text{Regret}^A_T(w) := \sum_{t=1}^T f_t(x_t^A) - \sum_{t=1}^T f_t(w).$$
When the environment is changing, static regret is not a suitable measure, since it compares the learning strategy against a decision that is fixed for all t. We need to make use of stronger notions of regret that allow comparators to change over time. We introduce the notation
$$\text{Regret}^A_I(w) := \sum_{t \in I} f_t(x_t^A) - \sum_{t \in I} f_t(w)$$
for the regret of A on an interval $I = [I_1..I_2] \subseteq [T]$. Throughout, $I_1$ ($I_2$) denotes the starting (ending) time step of an interval I. We call an algorithm strongly-adaptive if it has a low value of SA-Regret, by which we mean a value $O(\text{polylog}(T)\, R_P(I))$ for every interval I, where $R_P(I)$ is the minimax static regret of the online learning problem P restricted to the interval I. Let us call $w_{1:T} := \{w_1, \ldots, w_T\}$ an m-shift sequence if it changes at most m times, that is, $\sum_{t=1}^{T-1} \mathbf{1}\{w_t \neq w_{t+1}\} \le m$. A related notion, m-shift regret [10], measures the regret with respect to a comparator sequence that changes at most m times in T time steps.

$$m\text{-Shift-Regret}^A_T := \sum_{t=1}^T f_t(x_t^A) - \min_{m\text{-shift sequence } w_{1:T}} \sum_{t=1}^T f_t(w_t).$$
While the m-shift regret is more interpretable, SA-Regret is a stronger notion, since it is well known that a tight SA-Regret bound implies a tight m-shift regret bound [13,5], as we discuss further in Section 7.2. As noted by [20], SA-Regret has a strong connection to the so-called dynamic regret, which is defined with respect to the temporal variations of the $f_t$'s.
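The distinction between static regret and regret on an interval is easy to see on toy data. The following sketch (illustrative numbers, not from the paper) uses two experts with linear losses and a single switch in the best expert (an m-shift sequence with m = 1): a learner that commits to one expert has zero static regret over the whole horizon, yet large regret on the second interval.

```python
# Two experts, linear losses; the best expert switches once at t = 50 (m = 1).
losses = [[0.0, 1.0]] * 50 + [[1.0, 0.0]] * 50   # losses[t][i]: loss of expert i at time t
plays = [0] * 100                                # the learner commits to expert 0 throughout

def regret_on(a, b):
    """Regret on the interval [a..b] against the best fixed expert for that interval."""
    learner = sum(losses[t][plays[t]] for t in range(a, b + 1))
    best = min(sum(losses[t][i] for t in range(a, b + 1)) for i in (0, 1))
    return learner - best
```

Here `regret_on(0, 99)` is 0 (expert 0 is tied-best overall) while `regret_on(50, 99)` is 50, which is exactly why a per-interval notion of regret is needed.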
Several generic online algorithms that adapt to changing environments have been proposed recently. Rather than being designed for a specific learning problem, these are "meta" algorithms that take any online learning algorithm as a black-box and turn it into an adaptive one. We summarize the SA-Regret of existing meta algorithms in Table 1.

Table 2: m-shift regret bounds of LEA algorithms. Our proposed algorithm (last row) achieves the best regret among those with the same time complexity and does not need to know the number m of switches in the best expert. Each quantity omits constant factors.

Algorithm | m-shift regret | Time
Fixed Share [10,4] | $\sqrt{mT(\log N + \log T)}$ | $NT$
Fixed Share [10,4] | $\sqrt{m^2 T(\log N + \log T)}$ | $NT$
GeneralTracking$^{\text{EXP}}$ [8] | $\sqrt{mT(\log N + m\log^2 T)}$ | $NT\log T$
GeneralTracking$^{\text{EXP}}$ [8] | $\sqrt{mT(\log N + \log^2 T)}$ | $NT\log T$
GeneralTracking$^{\text{EXP}}$ [8] ($\gamma \in (0,1)$) | $\sqrt{\frac{1}{\gamma} mT(\log N + m\log T)}$ | $NT^{1+\gamma}\log T$
GeneralTracking$^{\text{EXP}}$ [8] ($\gamma \in (0,1)$) | $\sqrt{\frac{1}{\gamma} mT(\log N + \log T)}$ | $NT^{1+\gamma}\log T$
ATV [13] | $\sqrt{mT(\log N + \log T)}$ | $NT^2$
SAOL$^{\text{MW}}$ [5] | $\sqrt{mT(\log N + \log^2 T)}$ | $NT\log T$
CBCE$^{\text{CB}}$ (ours) | $\sqrt{mT(\log N + \log T)}$ | $NT\log T$

In particular, the pioneering work of Hazan & Seshadhri [9] introduced adaptive regret, a slightly weaker notion than SA-Regret, and proposed two meta
algorithms called Follow-the-Leading-History (FLH) and Advanced FLH (AFLH).1 However, their SA-Regret depends on T rather than |I| and hence can be significantly larger. The SAOL approach of [5] improves the SA-Regret to $O(\sqrt{(I_2 - I_1)\log^2(I_2)})$.
In this paper, we propose a new meta algorithm called Coin Betting for Changing Environments (CBCE) that combines the idea of "sleeping experts" introduced in [2,6] with the Coin Betting (CB) algorithm [15]. The SA-Regret of CBCE is better by a factor $\sqrt{\log(I_2)}$ than that of SAOL, as shown in Table 1. We present our extension of CB to sleeping experts and prove its regret bound in Section 3. This result leads to the improved SA-Regret bound of CBCE in Section 4.
Our improved bound yields a number of improvements in various online learning problems. In describing these improvements, we designate by $M^B$ a complete algorithm assembled from a meta algorithm M and a black-box B. In this notation, our algorithm is designated CBCE$^{\text{CB}}$.
Consider the learning with expert advice (LEA) problem with N experts. We make comparisons with respect to m-shift regret bounds, as many LEA algorithms provide only bounds of this type. Our algorithm CBCE$^{\text{CB}}$ has m-shift regret $O(\sqrt{mT(\log N + \log T)})$ and time complexity $O(NT \log T)$. This regret is a factor $\sqrt{\log T}$ better than that of existing algorithms with the same time complexity. Although AdaNormalHedge.TV (ATV) and Fixed Share achieve the same regret, the former has larger time complexity and the latter requires prior knowledge of the number of shifts m. We summarize the m-shift regret bounds of various algorithms in Table 2. We emphasize that the same regret order and time complexity as CBCE$^{\text{CB}}$ can be achieved by combining our proposed CBCE meta algorithm with any black-box algorithm (e.g., AdaNormalHedge [13]).
In online convex optimization with G-Lipschitz loss functions over a convex set $D \subseteq \mathbb{R}^d$ of diameter B, Online Gradient Descent (OGD) has regret $O(BG\sqrt{T})$ [18]. Thus, CBCE with OGD (CBCE$^{\text{OGD}}$) has the following SA-Regret on any interval $I = [I_1..I_2]$:
$$\text{Regret}^{\text{CBCE}^{\text{OGD}}}_I = O\big(BG\sqrt{(I_2 - I_1)\log(I_2)}\big),$$
which improves by a factor $\sqrt{\log(I_2)}$ over SAOL$^{\text{OGD}}$.
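As a minimal, hedged sketch of the OGD building block (not the paper's implementation): one-dimensional projected OGD over D = [0, 1] (so B = 1) with losses $f_t(x) = (x - 0.5)^2$, which are 1-Lipschitz on D (so G = 1), and step size $B/(G\sqrt{t})$. The classical analysis guarantees static regret at most $1.5\,BG\sqrt{T}$.

```python
import math

B, G, T = 1.0, 1.0, 1000   # diameter of D = [0, 1], Lipschitz bound, horizon
c = 0.5                    # losses f_t(x) = (x - c)^2; the best fixed decision is x = c
x, total_loss = 0.0, 0.0
for t in range(1, T + 1):
    total_loss += (x - c) ** 2
    grad = 2.0 * (x - c)
    x -= (B / (G * math.sqrt(t))) * grad   # gradient step with step size B / (G * sqrt(t))
    x = min(1.0, max(0.0, x))              # project back onto D = [0, 1]
static_regret = total_loss - 0.0           # the best fixed point x = c incurs zero loss
```

A meta algorithm such as CBCE would run many such OGD instances on different intervals and hedge over them; this sketch shows only the single-run black-box.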
We also propose an improved version of CBCE that has a so-called first-order regret bound; that is, the SA-Regret on an interval $I = [I_1..I_2]$ scales with $L^*_I := \min_{w \in W} \sum_{t \in I} f_t(w)$ rather than with |I|, up to an extra $\log(I_2)$ factor and an additive term, where we omit the term due to the black-box algorithm. We emphasize that the main quantity $L^*_I$ can be significantly smaller than |I| when there exists a decision w whose loss is very small on I. To our knowledge, this is the first first-order SA-Regret bound in online learning.2 In Section 5, we compare CBCE empirically to a number of meta algorithms for changing environments in two online learning problems: LEA and Mahalanobis metric learning. We observe that CBCE outperforms the state-of-the-art methods in both tasks, thus confirming our theoretical findings.

Meta Algorithms for Changing Environments
Let B be a black-box online learning algorithm following the protocol in Figure 1. A trick commonly used in designing a meta algorithm M for changing environments is to initiate a new instance of B at every time step [9,8,1]. Denoting by $B_J$ the run of black-box B on interval J, the meta algorithm at time t takes a weighted average of the decisions from the runs $\{B_J : t \in J\}$. The underlying idea is as follows. Suppose we are at time t and the environment changed at an earlier time $t' < t$. We hope that the meta algorithm assigns a large weight to the black-box run $B_J$ with $J = [t'..\infty]$, since the other runs either are based on data from before $t'$ or use only a subset of the data generated since $t'$. Ideally, the meta algorithm would assign a large weight to $B_J$ soon after time $t'$, by carefully examining the online performance of each black-box run.
This schema requires updating t instances of the black-box algorithm at each time step t, leading to an O(t) multiplicative increase in complexity over a single run. This factor can be reduced to O(log t) by restarting black-box algorithms only on a carefully designed set of intervals, such as the geometric covering intervals [5] (GC) or the data streaming intervals [9,8] (DS), which is a special case of a more general set of intervals considered in [19]. While both GC and DS achieve the same goal, as we show in Section 7.3,3 we use the former as our starting point for ease of exposition. The geometric covering intervals are
$$\mathcal{J} := \bigcup_{k \ge 0} \mathcal{J}_k, \qquad \mathcal{J}_k := \{[i \cdot 2^k .. (i+1) \cdot 2^k - 1] : i \in \mathbb{N}\},$$
so that each $\mathcal{J}_k$ partitions the time steps (from $2^k$ onward) into consecutive intervals of length $2^k$. Define the set of intervals that include time t as $\text{Active}(t) := \{J \in \mathcal{J} : t \in J\}$. It can be shown that $|\text{Active}(t)| = \lfloor \log_2(t) \rfloor + 1$. Since at most O(log(t)) intervals contain any given time point t, the time complexity of the meta algorithm is a factor O(log(t)) larger than that of the black-box B.
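A small sketch of the geometric covering intervals (an illustrative implementation with interval indices $i \ge 1$): it enumerates the intervals starting at each time step and checks the claim that $|\text{Active}(t)| = \lfloor \log_2 t \rfloor + 1$.

```python
import math

def gc_starting_at(t):
    """Geometric covering intervals [i * 2^k, (i + 1) * 2^k - 1] that start at time t:
    one for each k such that 2^k divides t."""
    out, k = [], 0
    while t % (2 ** k) == 0:
        out.append((t, t + 2 ** k - 1))
        k += 1
    return out

def active(t):
    """All geometric covering intervals containing time t (brute-force scan)."""
    return [(a, b) for s in range(1, t + 1) for (a, b) in gc_starting_at(s) if a <= t <= b]
```

For example, `active(4)` contains [4,4], [4,5], and [4,7], matching $\lfloor \log_2 4 \rfloor + 1 = 3$.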
The following result (Lemma 1), from Daniely et al. [5], shows that an arbitrary interval I can be partitioned into a sequence of smaller blocks whose lengths successively double, then successively halve. This result is key to the usefulness of the geometric covering intervals.

Regret Decomposition.
We now show how to use the geometric covering intervals to decompose the SA-Regret of a complete algorithm $M^B$. Let $\{J^{(-a)}, \ldots, J^{(0)}, J^{(1)}, \ldots, J^{(b)}\}$ be the partition of an interval I given by Lemma 1. Writing the regret on I as a sum over these blocks, we can restate (1) as
$$\text{Regret}^{M^B}_I(w) = \sum_{i=-a}^{b} \Big( \sum_{t \in J^{(i)}} f_t(x_t^{M^B}) - f_t\big(x_t^{B_{J^{(i)}}}\big) \Big) + \sum_{i=-a}^{b} \text{Regret}^{B_{J^{(i)}}}_{J^{(i)}}(w), \quad (2)$$
where the first double sum is the meta algorithm's regret against the black-box runs and the second is the black-box regret. (We purposely use the symbol J for intervals in $\mathcal{J}$ and I for a generic interval that is not necessarily in $\mathcal{J}$.) The black-box regret on $J = [J_1..J_2] \in \mathcal{J}$ is exactly the standard static regret for $T = |J|$, since the black-box run $B_J$ was started at time $J_1$. Thus, in order to prove that a meta algorithm M suffers low SA-Regret, it remains to show two things: 1. M has low regret on every interval $J \in \mathcal{J}$; 2. The outer sums over i in (2) are small, for both the meta algorithm and the black-box algorithm. Daniely et al. [5] address the second issue in their analysis. They show that if the black-box regret on $J^{(i)}$ is $c\sqrt{|J^{(i)}|}$ for some c, then the second double summation of (2) is bounded by $8c\sqrt{|I|}$,4 which is perhaps the best one can hope for. The same holds for the meta algorithm. Thus, it remains to focus on the first issue above. This is our main contribution.
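The $8c\sqrt{|I|}$ claim is easy to sanity-check numerically: for any doubling-then-halving sequence of block lengths (the structure Lemma 1 provides), the sum of the square roots of the lengths is within a constant factor of the square root of the total length. A quick illustrative sketch, using pure powers of two as block lengths:

```python
import math

def sqrt_sum_vs_total(a, b):
    """Block lengths 1, 2, ..., 2^a (doubling) followed by 2^b, ..., 2, 1 (halving),
    mimicking the partition structure of Lemma 1. Returns the sum of square roots
    of the block lengths and the square root of the total length."""
    lengths = [2 ** k for k in range(a + 1)] + [2 ** k for k in range(b, -1, -1)]
    return sum(math.sqrt(L) for L in lengths), math.sqrt(sum(lengths))
```

Each geometric sequence contributes at most $\frac{1}{1 - 1/\sqrt{2}} \approx 3.42$ times the square root of its largest block, so the constant 8 is comfortable.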
In the next two sections, we describe the design and application of our meta algorithm. In Section 3, we propose a novel method that incorporates sleeping experts and the coin betting framework. Section 4 describes how our method can be used as a meta algorithm that has an SA-Regret guarantee.

Coin Betting Meets Sleeping Experts
Our meta algorithm CBCE extends the coin-betting framework [15] to a variant of the learning with expert advice (LEA) problem called "sleeping experts" [2,6]. CBCE is parameter-free (there is no explicit learning rate) and has near-optimal regret. Our construction below has further interest as a near-optimal solution for the sleeping bandits problem.

Sleeping Experts.
In the LEA framework, the decision set is $D = \Delta^N$, the N-dimensional probability simplex of weights assigned to the various experts. To distinguish LEA from the general online learning problem, we use the notation $p_t$ in place of $x_t$, and $h_t$ in place of $f_t$. Denoting by $\ell_t := (\ell_{t,1}, \ldots, \ell_{t,N}) \in [0,1]^N$ the vector of loss values of the experts at time t provided by the environment, the learner's loss function is $h_t(p) := \langle \ell_t, p \rangle$.
Since $p \in D$ is a probability vector, the learner's decision can be viewed as hedging between the N alternatives. Let $e_i$ be the indicator vector for dimension i; e.g., $e_2 = (0, 1, 0, \ldots, 0)^\top$. In this notation, the comparator set W is $\{e_1, \ldots, e_N\}$; that is, the learner competes with a strategy that commits to a single expert for the entire time interval [1..T].5 Recall that each black-box run $B_J$ is on a different interval J. The meta algorithm's role is to hedge its bets over multiple black-box runs. Thus, it is natural to treat each run $B_J$ as an expert and use an LEA algorithm to combine the decisions from the runs. The loss incurred on run $B_J$ at time t is $\ell_{t,B_J} := f_t(x_t^{B_J})$. The challenge is that each expert $B_J$ may not output decisions at time steps outside the interval J. This problem can be reduced to the sleeping experts problem studied in [2,6], in which experts are not required to provide decisions at every time step; see [13]. We introduce an indicator variable $I_{t,i} \in \{0, 1\}$, which is set to 1 if expert i is awake (that is, outputting a decision) at time t, and to 0 otherwise.
where N can be countably infinite. The algorithm is said to be "aware" of $I_t$ and assigns zero weight to the experts that are sleeping; that is, $p_{t,i} = 0$ whenever $I_{t,i} = 0$. We would like a guarantee on the regret with respect to expert i, but only for the time steps at which expert i is awake. Following Luo & Schapire [13], we define the regret with respect to $u \in \Delta^N$ as
$$\text{Regret}_T(u) := \sum_{t=1}^T \sum_{i=1}^N u_i I_{t,i} \big( \langle \ell_t, p_t \rangle - \ell_{t,i} \big). \quad (3)$$
If we set $u = e_j$ for some j, this is simply the regret with respect to expert j while that expert is awake, and we aim to achieve a regret of $O(\sqrt{\sum_t I_{t,j}})$ up to logarithmic factors. If $I_{t,j} = 1$ for all $t \in [T]$, then (3) recovers the standard static regret in LEA.
Coin Betting for LEA. We consider the coin betting framework of Orabona & Pál [15], which constructs an LEA algorithm from a coin betting potential function (explained below). A player starts with an initial endowment of 1. At each time step, the adversary chooses an outcome arbitrarily, while the player decides which side to bet on (heads or tails) and how much to bet; then the outcome is revealed. The outcome can be heads (+1), tails (−1), or any point on the continuum between these two extremes (e.g., −0.3), where the absolute value indicates the weight of the outcome. We encode the coin flip at iteration t as $g_t \in [-1, 1]$, where $|g_t|$ indicates the weight of the outcome. Let $\text{Wealth}_{t-1}$ be the total money the player possesses after time step t − 1. (Note that $\text{Wealth}_0 = 1$.) We encode the player's betting decision as a signed betting fraction $\beta_t \in (-1, 1)$, where the positive (negative) sign indicates heads (tails) and the absolute value $|\beta_t| < 1$ indicates the fraction of the current wealth to bet. Thus, the actual amount wagered is $w_t := \beta_t \text{Wealth}_{t-1}$. Once the coin flip $g_t$ is revealed, the player's wealth changes as $\text{Wealth}_t = (1 + g_t \beta_t)\text{Wealth}_{t-1}$: the player makes (loses) money when the bet is on the correct (wrong) side, and the amount of the change depends on both the flip weight $|g_t|$ and the wager $|w_t|$.
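The wealth dynamics are easy to simulate. The sketch below uses one classical betting fraction, the Krichevsky–Trofimov rule $\beta_t = (\sum_{s<t} g_s)/t$ that reappears later in this section (this is an illustration, not the paper's full algorithm): on a favorably biased coin the wealth grows quickly, and it remains positive on any flip sequence since $|\beta_t| < 1$.

```python
def kt_bet(flips):
    """Coin betting with the KT betting fraction beta_t = (g_1 + ... + g_{t-1}) / t.
    flips: coin outcomes g_t in [-1, 1]; the initial wealth is 1."""
    wealth, s = 1.0, 0.0
    for t, g in enumerate(flips, start=1):
        beta = s / t                 # |beta| <= (t - 1) / t < 1, so wealth stays positive
        wealth *= 1.0 + g * beta     # Wealth_t = (1 + g_t * beta_t) * Wealth_{t-1}
        s += g
    return wealth
```

On a stream of 20 heads the wealth exceeds three orders of magnitude, illustrating the exponential growth that the regret analysis converts into a regret bound.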
In the coin betting framework, a potential function $F_t(g_1, \ldots, g_t)$ plays an important role. Given this function, and denoting $g_{1:t} := g_1, g_2, \ldots, g_t$, the betting fraction $\beta_t$ and the amount wagered $w_t$ are determined as follows:
$$\beta_t(g_{1:t-1}) := \frac{F_t(g_{1:t-1}, +1) - F_t(g_{1:t-1}, -1)}{F_t(g_{1:t-1}, +1) + F_t(g_{1:t-1}, -1)}, \quad (4a) \qquad w_t := \beta_t(g_{1:t-1})\,\text{Wealth}_{t-1}. \quad (4b)$$
(We use $\beta_t$ in place of $\beta_t(g_{1:t-1})$ when it is clear from the context.) A precise definition of $F_t$ appears in Section 7.1; it suffices for now to say that the sequence $F_1, F_2, \ldots$ must satisfy the following key condition under the betting strategy (4a):
$$F_t(g_{1:t}) \le (1 + g_t \beta_t)\, F_{t-1}(g_{1:t-1}), \quad \forall g_t \in [-1, 1]. \quad (5)$$
By induction, this implies that $F_t(g_{1:t})$ is a lower bound on the wealth of a player who bets by (4a). We emphasize that the wager $w_t$ is decided before $g_t$ is revealed, yet inequality (5) holds for any $g_t \in [-1, 1]$. Property (5) is the key to analyzing the wealth arising from the strategy (4a); see Section 7.1. In the restricted setting in which $g_s \in \{\pm 1\}$, a betting strategy $\beta_t$ based on a potential function proposed by Krichevsky & Trofimov [11] achieves the optimal wealth up to constant factors [3].

Algorithm 1 Sleeping CB
Orabona & Pál [15] devised a reduction of LEA to the simple coin betting problem described above. The idea is to instantiate a coin betting problem for each expert i, where the signed coin flip $g_{t,i}$ is set to a conditionally truncated (instantaneous) regret with respect to expert i, rather than being set by an adversary. We denote by $\beta_{t,i}$ the betting fraction for expert i and by $w_{t,i}$ the corresponding wager. We apply this treatment to the sleeping experts setting and propose a new algorithm, Sleeping CB. Modifications are required because some experts may not output a decision at some time steps. Defining $z_{t,i} := I_{t,i}\, g_{t,i}$, we modify (4) as follows:
$$\beta_{t,i} := \frac{F_{t,i}(z_{1:t-1,i}, +1) - F_{t,i}(z_{1:t-1,i}, -1)}{F_{t,i}(z_{1:t-1,i}, +1) + F_{t,i}(z_{1:t-1,i}, -1)}, \quad (6a) \qquad w_{t,i} := \beta_{t,i}\,\text{Wealth}_{t-1,i}, \quad (6b)$$
where $F_{t,i}$ is a per-expert potential. Condition (5) on the potential functions is modified accordingly to
$$F_{t,i}(z_{1:t,i}) \le (1 + z_{t,i} \beta_{t,i})\, F_{t-1,i}(z_{1:t-1,i}), \quad \forall z_{t,i} \in [-1, 1]. \quad (7)$$
We denote by $\pi^{I_t}$ the prior π restricted to the experts that are awake at time t (those with $I_{t,i} = 1$). The Sleeping CB algorithm is specified in Algorithm 1. (Here and subsequently, we use the notation $[x]_+ := \max(x, 0)$.) The regret of Sleeping CB is bounded in Theorem 2. Unlike standard CB, in which all the experts use the same potential $F_t$ at time t, expert i in Sleeping CB uses $F_{t,i}$, which differs across experts. For this reason, the proof of the CB regret bound in [15] does not transfer easily to the regret (3) of Sleeping CB. This result is crucial to our improved strongly-adaptive regret bound.
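A simplified, illustrative sketch of a Sleeping-CB-style update (uniform prior, a KT-style betting fraction, and none of the paper's exact potentials or constants; the conditional truncation follows the reduction described above, truncating the instantaneous regret at 0 when the wager is nonpositive):

```python
import numpy as np

def sleeping_cb(losses, awake):
    """Illustrative Sleeping-CB-style learner. losses: (T, N) array in [0, 1];
    awake: (T, N) 0/1 array with at least one expert awake at every step."""
    T, N = losses.shape
    pi = np.full(N, 1.0 / N)             # uniform prior over experts
    sum_z = np.zeros(N)                  # running sums of the coin flips z_{t,i}
    S = np.ones(N)                       # KT-style denominator: 1 + awake steps so far
    wealth = np.ones(N)
    total = 0.0
    for t in range(T):
        I = awake[t].astype(float)
        beta = sum_z / S                 # per-expert betting fraction
        w = beta * wealth                # per-expert wager
        p = pi * I * np.maximum(w, 0.0)  # sleeping experts receive zero weight
        p = p / p.sum() if p.sum() > 0 else pi * I / (pi * I).sum()
        h = float(p @ losses[t])         # learner's loss h_t(p_t)
        total += h
        g = h - losses[t]                # instantaneous regret per expert
        g = np.where(w > 0, g, np.maximum(g, 0.0))   # conditional truncation
        z = I * g
        wealth += z * w                  # Wealth_t = (1 + z_t * beta_t) * Wealth_{t-1}
        sum_z += z
        S += I
    return total
```

On a toy instance where one expert is clearly better, the learner's cumulative loss quickly approaches that of the best expert.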
Theorem 2 (Regret of Sleeping CB). Let $\{F_{t,i}\}$ be a sequence of potential functions that satisfies (7), and suppose each $F_{T,i}$ admits a lower bound of the stated form for some $c_1 > 0$ and $c_{2,i} \in \mathbb{R}$. Then, for the regret defined in (3), Algorithm 1 satisfies the stated bound.

Proof sketch. By property (7) of the coin betting potentials, the wealth is lower bounded in terms of the potentials. Since H is symmetric around 0, its inverse $H^{-1}$ generally maps to two distinct values of opposite sign; to resolve this ambiguity, we define it to map to the nonnegative real value. Then, for any comparator $u \in \Delta^N$, the claimed bound follows, where step (a) is due to the Cauchy–Schwarz inequality (noting that the factors under the square root are all nonnegative). Note that if $u = e_j$, then the regret scales with $S_{T,j}$, which is essentially the number of time steps at which expert j is awake. While any potential function satisfying condition (7) and symmetry around 0 can be used, we present two interesting choices: the Krichevsky–Trofimov potential and the AdaptiveNormal potential.

Krichevsky-Trofimov Potential
The Krichevsky–Trofimov (KT) potential [15] is defined with a time-shift parameter $\delta \ge 0$, which we set to 0 in this work. Orabona & Pál [15] show that the KT potential satisfies (5). We modify the KT potential to handle sleeping experts by replacing t in several places with $S_{t,i}$, which yields a potential that satisfies (7).6 The betting fraction $\beta_t$ defined in (4a) with the KT potential exhibits the simple form $\beta_t = \frac{1}{t}\sum_{s=1}^{t-1} g_s$ (for $\delta = 0$). This matches the setup of Theorem 2 with $c_1 = \frac{1}{2}$ and a corresponding choice of $c_{2,i}$; plugging $c_1$ and $c_{2,i}$ into Theorem 2, we obtain the result.
AdaptiveNormal Potential. Let $G_t := \sum_{s=1}^t |g_s|$. The AdaptiveNormal (AN) potential (11) proposed by Orabona & Tommasi [16] involves a parameter $\xi > 0$ of minor importance in our context, which we set to 1. Orabona & Tommasi [16, Lemma 2] show that the AN potential satisfies condition (5). Let $Z_{t,i} := \sum_{s=1}^t |z_{s,i}|$. For sleeping experts, we use the analogous potential (12) with $Z_{t,i}$ in place of $G_t$, which satisfies (7) as a trivial consequence of (11) satisfying (5).
The betting fraction (6a) using the AN potential can likewise be simplified to a closed form, with the analogous modification for sleeping experts. We present the regret bound of Sleeping CB with the AN potential in Corollary 4.

Proof. Define $H_{T,i}$ for the AN potential as in the setup of Theorem 2.
To match this definition with the setup of Theorem 2, we redefine $S_{T,i} := 1 + Z_{T,i}$, for which the theorem still holds, since $S_{T,i}$ enters only through the function $H_{T,i}$. We also set the constants $c_1$ and $c_{2,i}$ accordingly.
Then, by Theorem 2, we obtain the first statement of the corollary.
For the second statement, we use Lemma 13 in Section 7.5, which bounds the regret in terms of the cumulative loss. When we set $u = e_i$ with this AN potential, we obtain a regret bound that scales with $L_{T,i}$, which is always smaller than $S_{T,i}$. The difference becomes significant when expert i suffers losses $\ell_{t,i}$ close to 0 for all t at which it is awake. Note that Sleeping CB with the AN potential is quite similar to AdaNormalHedge [13], which has the same regret order. The key difference is that for AdaNormalHedge the truncation operates in the potential function, whereas for ours it operates in the reduction to LEA (see the definition of $g_{t,B_J}$).
According to our results, the regret bound of the KT potential can be much larger than that of the AN potential. Thus, one might wonder if we should always use the AN potential. Our empirical study in Section 5 shows a case where KT has a benefit over AN.

Coping with a Changing Environment by Sleeping CB
In this section, we synthesize the results in Sections 2 and 3 to specify and analyze our meta algorithm. Recall that a meta algorithm must efficiently aggregate decisions from multiple black-box runs that are active at time t. By treating each black-box run as an expert, we use Sleeping CB (Algorithm 1) as the meta algorithm, with geometric covering intervals. An important motivation for the use of Sleeping CB is that it is parameter-free. Other sleeping bandits techniques require the number of black-box runs (experts) to be specified in advance, which results in a theoretical guarantee only up to some finite time horizon T . By contrast, our approach provides an "anytime" guarantee. The complete algorithm, which we call Coin Betting for Changing Environments (CBCE), is shown in Algorithm 2.
We first present the results with the KT potential and then discuss applying the AN potential in the same manner. We make use of the following assumption: (A1) the decision set D is convex, and the loss functions $f_t$ map D to [0, 1].
Algorithm 2 proceeds as follows at each time t:
• Each black-box run $B_J$ picks a decision $x_t^{B_J} \in D$, for all $J \in \text{Active}(t)$.
• The learner picks the decision $x_t = \sum_{J \in \text{Active}(t)} p_{t,B_J}\, x_t^{B_J}$.
• Each black-box run $B_J$ that is awake ($J \in \text{Active}(t)$) suffers loss $\ell_{t,B_J} := f_t(x_t^{B_J})$, and the learner suffers loss $f_t(x_t)$.
• For each $J \in \text{Active}(t)$, the meta algorithm updates its weight on $B_J$.
When D is nonconvex, the learner can instead pick a single run J with probability $p_{t,B_J}$ and play $x_t = x_t^{B_J}$; the regret guarantee still holds, but now in expectation. When loss functions are unbounded, they can be scaled and truncated to [0, 1]. Any nonconvexity that results can be handled in the manner just described.
We define our choice of prior $\bar\pi \in \Delta^{|\mathcal{J}|}$ as follows:
$$\bar\pi_{B_J} := \frac{1}{Z} \cdot \frac{1}{J_1^2\,(1 + \lfloor \log_2 J_1 \rfloor)}, \quad (13)$$
where Z is a normalization factor. Since there exist at most $1 + \log_2 J_1$ distinct intervals starting at time $J_1$, we have $Z < \sum_{t=1}^{\infty} t^{-2} = \pi^2/6$. We bound the meta regret with respect to a black-box run $B_J$ in Lemma 5.
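The bound on Z is easy to verify numerically. The sketch below uses the fact that the number of geometric covering intervals starting at t is one plus the exponent of the largest power of 2 dividing t, which is at most $1 + \lfloor \log_2 t \rfloor$ (an observation consistent with the construction in Section 2, stated here as an assumption of the sketch):

```python
import math

def v2(t):
    """Exponent of the largest power of 2 dividing t."""
    k = 0
    while t % 2 == 0:
        t //= 2
        k += 1
    return k

# Partial normalization constant Z: start time J1 = t contributes one term per interval
# starting there, and there are v2(t) + 1 <= 1 + floor(log2(t)) such intervals.
Z = sum((v2(t) + 1) / (t ** 2 * (1 + math.floor(math.log2(t))))
        for t in range(1, 100000))
```

Each summand is at most $1/t^2$, so the partial sums stay strictly below $\pi^2/6 \approx 1.645$.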
Proof. Note that our regret definition for meta algorithms is slightly different from the one in Theorem 2 with $u = e_i$, namely $\sum_{t \in J: I_{t,i}=1} \langle \ell_t, p_t \rangle - \ell_{t,i}$; in the language of meta algorithms, this is $\sum_{t \in J} h_t(p_t) - \ell_{t,B_J}$. We claim that Theorem 2 and Corollary 3 hold true for the regret (14) as well. One can verify that the proof of Theorem 2 goes through, and so does Corollary 3.
Since $\text{KL}(e_{B_J} \| \bar\pi) = \ln(1/\bar\pi_{B_J}) = O(\log J_1)$, the lemma follows. We now present the bound on the SA-Regret $R^{\text{CBCE}^B}_I(w)$ with respect to $w \in W$ on intervals $I \subseteq [T]$ that are not necessarily in $\mathcal{J}$ (Theorem 6).
For the standard LEA problem, one can run the algorithm CB with the KT potential (equivalent to Sleeping CB with $I_{t,i} = 1$ for all t, i), which achieves static regret $O(\sqrt{T \log(NT)})$ [15]. Using CB as the black-box algorithm, the regret of CBCE$^{\text{CB}}$ on I is $O(\sqrt{|I| \log(N I_2)})$, and so its SA-Regret is $O(\sqrt{|I| \log(NT)})$. It follows that the m-shift regret of CBCE$^{\text{CB}}$ is $O(\sqrt{mT \log(NT)})$, using the technique presented in Section 7.2.
As noted above, our bound improves over the best known result with the same time complexity, from [5]. The key ingredient that allows us to obtain the better bound is the Sleeping CB algorithm (Algorithm 1), which achieves a better regret than the meta algorithm of [5]. In Section 5, we show that empirical results also confirm the theoretical gap between these two algorithms.

The AdaptiveNormal Potential
We present the meta regret bound of CBCE with the AN potential on intervals J ∈ J in Lemma 7 and on any interval I in Lemma 8.

Lemma 7.
(Meta regret of CBCE with the AN potential.) Assume A1, and suppose we run CBCE (Algorithm 2) with a black-box algorithm B, the prior $\bar\pi$, and the AN potential (12).

Proof. The proof parallels that of Lemma 5, with deviations in the steps shown; throughout, we ignore additive terms scaling at most poly-logarithmically in $I_2$.
Proof. The first term in the minimum of the stated regret bound is trivial, since $\sum_{t \in J} f_t(x_t^{B_J}) \le |J|$. Thus, we focus on U(w). By Lemma 1, we know that I can be decomposed into two sequences of intervals $\{J^{(-a)}, \ldots, J^{(0)}\}$ and $\{J^{(1)}, J^{(2)}, \ldots, J^{(b)}\}$. For simplicity, we denote by $C_{1,i}\sqrt{L^{B_{J^{(i)}}}} + C_{2,i}$ the meta regret bound of CBCE for the interval $J^{(i)}$ (see Lemma 7), and define $\bar C_1 := \max_i C_{1,i}$. Continuing from (2), we apply Lemma 7 to each block and sum over i. For brevity, we ignore the term $\sqrt{E_2}$, which is smaller than $E_2$ unless $E_2 < 1$. Ignoring terms that cannot grow faster than poly-logarithmically in $I_2$, the regret of CBCE$^B$ on interval I simplifies to the stated bound. We emphasize that the order of regret stated in Lemma 8 is never larger than that of CBCE with the KT potential. Furthermore, the regret bound of the AN potential scales roughly with $\min_{w \in W} \sum_{t \in I} f_t(w) + (\sum_{t \in I} q_t)^{\alpha}$. In some cases, this form of regret can be much smaller, namely when there exists a decision w that has very small loss on the interval I. We instantiate the result above for LEA in Corollary 9 and for online convex optimization (OCO) in Corollary 10. To the best of our knowledge, Corollary 10 is the first first-order SA-Regret bound for OCO.
Discussion. Note that one can obtain the same result using the data streaming intervals (DS) [9,8] in place of the geometric covering intervals (GC). Section 7.3 elaborates on this with a lemma stating that DS induces a partition of an interval I in a very similar way to GC (a sequence of intervals of doubling lengths).
Our improved bound has another interesting implication. In designing strongly-adaptive algorithms for LEA, there is a well-known technique based on "restarts" or "sleeping experts" that has time complexity $O(NT^2)$ [9,13], and several studies have used DS or GC to reduce the time complexity to $O(NT \log T)$ [9,8,5]. However, it was unclear whether it is possible to achieve both an m-shift regret of $O(\sqrt{mT(\log N + \log T)})$ and a time complexity of $O(NT \log T)$ without knowing m. Indeed, every study on m-shift regret with time $O(NT \log T)$ results in a suboptimal m-shift regret bound [5,8,9], to our knowledge. Furthermore, some studies (e.g., [13, Section 5]) speculated that applying the data streaming technique might increase the SA-Regret by a logarithmic factor. Our analysis implies that one can reduce the overall time complexity to $O(NT \log T)$ without sacrificing the order of either the SA-Regret or the m-shift regret.7

Experiments
We now turn to an empirical evaluation of algorithms for changing environments, comparing the performance of the meta algorithms on two online learning problems: (i) learning with expert advice (LEA) and (ii) metric learning (ML). We compare CBCE with SAOL [5] and AdaNormalHedge.TV (ATV) [13]. Although ATV was originally designed for LEA only, it is not hard to extend it to a meta algorithm and to show, using the same techniques, that it has the same order of SA-Regret as CBCE. We run CBCE with both the KT and AN potentials, denoted by CBCE(KT) and CBCE(AN) respectively.
For our empirical study, we replace the geometric covering intervals (GC) with the data streaming intervals (DS) [9,8]. Let u(t) be the number such that $2^{u(t)}$ is the largest power of 2 that divides t; e.g., u(12) = 2. The data streaming intervals are $\mathcal{J} = \{[t..(t + g \cdot 2^{u(t)} - 1)] : t = 1, 2, \ldots\}$ for some $g \ge 1$. DS is an attractive alternative to GC because (i) DS initiates one and only one black-box run at each time step, and (ii) it is more flexible, in that the parameter g can be increased to enjoy smaller regret in practice at the cost of a constant-factor increase in time complexity.
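A small illustrative sketch of the data streaming intervals: it verifies that exactly one run is initiated per time step and that, for g = 1, the number of active runs at time t is at most $\lfloor \log_2 t \rfloor + 1$.

```python
import math

def u(t):
    """u(t): 2^u(t) is the largest power of 2 dividing t, e.g., u(12) = 2."""
    k = 0
    while t % 2 == 0:
        t //= 2
        k += 1
    return k

def ds_intervals(T, g=1):
    """Data streaming intervals [t .. t + g * 2^u(t) - 1], one initiated per time step."""
    return [(t, t + g * 2 ** u(t) - 1) for t in range(1, T + 1)]
```

Increasing g lengthens every interval by the same factor, which is the flexibility knob mentioned above.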
For both ATV and CBCE, we set the prior π over the black-box runs to the uniform distribution. Note that this does not break the theoretical guarantees, since the number of black-box runs is never actually infinite; we used $\bar\pi$ in (13) for ease of exposition.
For each meta algorithm, we use Sleeping CB with the AN potential as the black-box,8 where $I_{t,i} = 1$ for all $t \ge 1$ and $i \in [N]$, as there are no sleeping experts in this experiment. We warm-start each black-box run at time $t \ge 2$ by setting its prior to the decision $p_{t-1}$ chosen by the meta algorithm at time step t − 1. We repeat the experiment 200 times and plot the average loss, smoothed by a moving mean with window size 10, in Figure 3(a). The overall winner is CBCE(AN). While CBCE(KT) catches up with the environmental change faster than CBCE(AN), CBCE(AN) attains smaller loss than CBCE(KT) once the change settles down. ATV is outperformed by both CBCE variants but outperforms SAOL. Note that SAOL with GC intervals (SAOL-GC) tends to incur larger loss than SAOL with DS. We observe that this is true for every meta algorithm, so we omit those results to avoid clutter. We also run Fixed Share using the parameters recommended by Corollary 5.1 of [3], which requires knowing the target time horizon T = 900 and the true number of switches m = 2; such a strong assumption is often unrealistic in practice. Furthermore, we observe that Fixed Share is the slowest to adapt to the environmental changes. Nevertheless, Fixed Share can be attractive since (i) after a switch has settled down, its loss is competitive with CBCE(AN), and (ii) its time complexity O(NT) is lower than that of the other algorithms, O(NT log T).

Metric Learning
We consider the problem of learning a squared Mahalanobis distance from pairwise comparisons using the mirror descent algorithm [12]. The data point at time t is a triple $(z_t^{(1)}, z_t^{(2)}, y_t)$, where $y_t = 1$ if $z_t^{(1)}, z_t^{(2)} \in \mathbb{R}^d$ belong to the same class and $y_t = -1$ otherwise. The goal is to learn a squared Mahalanobis distance, parameterized by a positive semi-definite matrix M and a bias µ, that attains a small value of a margin-based loss regularized by the trace norm $\|M\|_*$. Such a formulation encourages predicting $y_t$ with large margin and low rank in M. A learned matrix M of low rank can be useful in a number of machine learning tasks, e.g., distance-based classification, clustering, and low-dimensional embedding. We refer to [12] for details. We create a scenario that exhibits shifts in the metric, inspired by [7]. Specifically, we create a mixture of three Gaussians in $\mathbb{R}^3$ whose means are well separated and whose mixture weights are .5, .3, and .2. We draw 2000 points from it while keeping a record of their memberships. We repeat this three times independently and concatenate the coordinates, yielding 2000 9-dimensional vectors. Finally, we append to each point a 16-dimensional vector of Gaussian noise, yielding 25-dimensional vectors. Such a construction implies that each point has three independent cluster memberships. We run each algorithm for 1500 time steps. For times 1 to 500, we randomly pick a pair of points from the data pool and assign $y_t = 1$ ($y_t = -1$) if the pair belongs to the same (different) cluster under the first clustering. For times 501 to 1000 (1001 to 1500), we do the same under the second (third) clustering. In this way, a learner must track the change in metric, especially the relevant low-dimensional subspace for each time segment.
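For concreteness, here is a minimal sketch of the quantity being learned (the identity matrix, the threshold rule, and the function names are illustrative; the actual margin-based loss and mirror descent update are as in [12]):

```python
import numpy as np

def mahalanobis_sq(M, z1, z2):
    """Squared Mahalanobis distance (z1 - z2)^T M (z1 - z2) for a PSD matrix M."""
    d = np.asarray(z1, dtype=float) - np.asarray(z2, dtype=float)
    return float(d @ M @ d)

def predict_same(M, mu, z1, z2):
    """Predict y = 1 ("same cluster") when the learned distance falls below the bias mu."""
    return 1 if mahalanobis_sq(M, z1, z2) < mu else -1
```

A low-rank M makes the distance insensitive to the noise dimensions, which is exactly what the trace-norm regularization encourages in the shifting-clusters scenario above.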
Since the loss of the metric learning problem is unbounded, we scale the loss by multiplying it by 1/5 and then capping it at 1, as in [7]. Although the randomized decision discussed in Section 4 could be used to maintain the theoretical guarantee, we stick to the weighted average since the loss is rarely capped at 1 in our experiments. As in our LEA experiment, we use the data streaming intervals with $g = 2$ and initialize each black-box algorithm with the decision of the meta algorithm at the previous time step. We repeat the experiment 200 times and plot the average loss in Figure 3(b), smoothed by a moving mean with window size 10. While CBCE(KT), CBCE(AN), and ATV are indistinguishable (see Figure 3(c)), all three methods outperform SAOL. We have verified that the visible gaps in Figure 3 are statistically significant. This confirms the improved regret bound of CBCE and ATV.
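The scaling-and-capping step can be written as a short helper. This is a sketch assuming the hinge-type loss with trace-norm regularization stated earlier in this section; the function names and the default $\lambda$ are illustrative.

```python
import numpy as np

def metric_loss(M, mu, z1, z2, y, lam=0.1):
    # Hinge-type metric learning loss with trace-norm regularization:
    #   max(0, 1 + y * (||z1 - z2||_M^2 - mu)) + lam * ||M||_*
    diff = z1 - z2
    dist_sq = diff @ M @ diff
    trace_norm = np.linalg.norm(M, ord='nuc')  # sum of singular values
    return max(0.0, 1.0 + y * (dist_sq - mu)) + lam * trace_norm

def capped_loss(raw_loss):
    # Scale by 1/5 and cap at 1 so the fed-back losses lie in [0, 1].
    return min(1.0, raw_loss / 5.0)
```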

Future Work
Among a number of interesting directions, we are interested in reducing the time complexity of online learning in a changing environment. For LEA, Fixed Share has the best time complexity. However, Fixed Share is inherently not parameter-free; in particular, it requires knowledge of the number of switches $m$. Achieving the best $m$-shift regret bound without knowing $m$, or the best SA-Regret bound, in time $O(NT)$ would be interesting future work. The same direction is interesting for the online convex optimization (OCO) problem. It would be interesting if an OCO algorithm such as online gradient descent could achieve the same SA-Regret as CBCE with OGD as its black-box without paying an extra order of computation.

The Coin Betting Potential
We precisely define the coin betting potential. In this paper, we set the initial endowment $\epsilon = 1$ throughout. For technical reasons, we define the potential as a function of the form $\hat{F}_t(x; y_{1:t})$. We then define $F_t(y_{1:t}) := \hat{F}_t(\sum_{s=1}^{t} y_s; y_{1:t})$. Definition 11 (Coin Betting Potential [15]). A sequence of functions $\{\hat{F}_t\}_{t=0}^{\infty}$ is called a sequence of coin betting potentials for initial endowment $\epsilon$ if it satisfies the following three conditions: (a) $\hat{F}_0(0; \cdot) = \epsilon$.
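As a concrete example, the Krichevsky-Trofimov (KT) potential used by CBCE(KT) can be recalled from the coin betting literature [15]; we state it here as an illustration in our notation (it does not depend on the second argument $y_{1:t}$):

```latex
\hat{F}_t(x) \;=\; \frac{2^t \,\Gamma\!\left(\frac{t+1+x}{2}\right)\Gamma\!\left(\frac{t+1-x}{2}\right)}{\Gamma\!\left(\frac{1}{2}\right)^2 \,\Gamma(t+1)},
\qquad \beta_t \;=\; \frac{\sum_{s=1}^{t-1} g_s}{t},
```

where $\beta_t$ is the associated betting fraction. One can check condition (a): $\hat{F}_0(0) = \Gamma(\frac12)^2 / (\Gamma(\frac12)^2\,\Gamma(1)) = 1 = \epsilon$.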
We now describe how the conditions for the coin betting potential lead to a lower bound on the wealth: $\mathrm{Wealth}_t \ge F_t(g_{1:t})$ for any $g_1, g_2, \ldots, g_t \in [-1, 1]$. We use induction. First, verify that $\mathrm{Wealth}_0 \ge F_0(\cdot) = \epsilon$ trivially.
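This wealth lower bound can be checked numerically for the KT potential and the KT bettor (betting fraction $\beta_t = (\sum_{s<t} g_s)/t$), both standard in the coin betting literature [15]. The sketch below is ours, computing the potential in log space for stability; the function names are illustrative.

```python
import math
import random

def kt_potential(t, x):
    # KT potential F_t(x) = 2^t * G((t+1+x)/2) * G((t+1-x)/2) / (G(1/2)^2 * G(t+1)),
    # where G is the gamma function; note G(1/2)^2 = pi.
    log_f = (t * math.log(2)
             + math.lgamma((t + 1 + x) / 2)
             + math.lgamma((t + 1 - x) / 2)
             - math.log(math.pi)
             - math.lgamma(t + 1))
    return math.exp(log_f)

def kt_wealth(coins, endowment=1.0):
    # KT bettor: bet the signed fraction beta_t = (sum of past coins) / t.
    wealth, total = endowment, 0.0
    for t, g in enumerate(coins, start=1):
        beta = total / t
        wealth *= 1.0 + beta * g
        total += g
    return wealth

random.seed(0)
coins = [random.uniform(-1, 1) for _ in range(50)]
# the induction argument guarantees Wealth_t >= F_t(sum of coins)
assert kt_wealth(coins) >= kt_potential(len(coins), sum(coins)) - 1e-9
```

For binary coins $g_s \in \{-1, 1\}$ the bound holds with equality, e.g. $\mathrm{Wealth}_2 = F_2(2) = 3/2$ for $g_{1:2} = (1, 1)$.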

The Data Streaming Intervals Can Replace the Geometric Covering Intervals
We show that the data streaming intervals (DS) achieve the same goal as the geometric covering intervals (GC). Let $u(t)$ be the number such that $2^{u(t)}$ is the largest power of 2 that divides $t$; e.g., $u(12) = 2$.
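The quantity $u(t)$ and the resulting intervals are easy to compute. The sketch below assumes the DS construction in which the interval starting at time $t$ has length $g \cdot 2^{u(t)}$ (this length is our reading of the construction used in this section); the helper names are ours.

```python
def u(t):
    # Exponent of the largest power of 2 dividing t (t >= 1),
    # i.e., the number of trailing zero bits of t.
    return (t & -t).bit_length() - 1

def ds_interval(t, g=1):
    # Data streaming interval starting at time t: [t, t + g * 2^u(t) - 1].
    return (t, t + g * 2 ** u(t) - 1)

assert u(12) == 2            # 4 divides 12, but 8 does not
assert ds_interval(12) == (12, 15)
```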
For any interval $J$, we denote by $J_1$ its starting time and by $J_2$ its ending time. We say an interval $J'$ is a prefix of $J$ if $J'_1 = J_1$ and $J' \subseteq J$.
We show in Lemma 12 that DS also partitions an interval $I$.
Proof. For simplicity, we assume $g = 1$; we later explain how the analysis can be extended to $g > 1$.
Let $I_1 = 2^u \cdot k$, where $2^u$ is the largest power of 2 that divides $I_1$. It follows that $k$ is an odd number. Let $J \in \mathcal{J}$ be the data streaming interval that starts at $I_1$. The length $|J|$ is $2^u$ by definition, and $J_2 = I_1 + 2^u - 1$. Define $\bar{J}^{(1)} := J$.
Then, consider the next interval $J' \in \mathcal{J}$ starting at time $I_1 + 2^u$. Note that $J'_1 = I_1 + 2^u = 2^u \cdot k + 2^u = 2^{u+1} \cdot \frac{k+1}{2}$, and $\frac{k+1}{2}$ is an integer since $k$ is odd. Therefore, $J'_1 = 2^{u'} \cdot k'$ where $u' > u$. It follows that the length of $J'$ is $2^{u'} \ge 2^{u+1}$. Then, define $\bar{J}^{(2)} := J'$.
We repeat this process until $I$ is completely covered by $\bar{J}^{(1)}, \ldots, \bar{J}^{(n)}$ for some $n$. Finally, modify the last interval $\bar{J}^{(n)}$ to end at $I_2$, which keeps it a prefix of some $J \in \mathcal{J}$. This completes the proof for $g = 1$.
For the case $g > 1$, note that setting $g > 1$ only makes the intervals longer. Observe that even if $g > 1$, the intervals $\bar{J}^{(1)}, \ldots, \bar{J}^{(n)}$ above are still prefixes of some intervals in $\mathcal{J}$.
Note that, unlike the partition induced by GC, in which interval lengths successively double and then successively halve, the partition induced by DS successively doubles its interval lengths except for the last interval. One can use DS to decompose the SA-Regret of $\mathcal{M}^{\mathcal{B}}$; that is, in (2), replace $\sum_{i=-a}^{b}$ with $\sum_{i=1}^{n}$ and $J^{(i)}$ with $\bar{J}^{(i)}$. Since the decomposition by DS has the same effect of "doubling lengths", one can show that Theorem 6 holds true with DS, too, with slightly smaller constant factors.
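The construction in the proof above can be sketched constructively. Assuming $g = 1$ and DS intervals of length $2^{u(t)}$ starting at each $t$ (the helper names are ours), the partition of $I$ is built greedily and its lengths double until the truncated last interval:

```python
def u(t):
    # Exponent of the largest power of 2 dividing t (t >= 1).
    return (t & -t).bit_length() - 1

def ds_partition(start, end):
    # Partition [start, end] as in the proof (g = 1): repeatedly take the
    # DS interval beginning at the current start (length 2^u(start)),
    # truncating the last one so it ends at `end`.
    parts, t = [], start
    while t <= end:
        length = 2 ** u(t)
        parts.append((t, min(t + length - 1, end)))
        t += length
    return parts

parts = ds_partition(12, 35)
# lengths at least double, except possibly for the truncated last interval
lengths = [b - a + 1 for a, b in parts]
assert all(lengths[i + 1] >= 2 * lengths[i] for i in range(len(lengths) - 2))
```

For example, `ds_partition(12, 35)` yields the prefix intervals starting at 12 (length 4), 16 (length 16), and 32 (truncated at 35).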

A Subtle Difference between the Geometric Covering and Data Streaming Intervals
There is a subtle difference between the geometric covering intervals (GC) and the data streaming intervals (DS). As long as the black-box algorithm has an anytime regret bound, both GC and DS can be used to prove the overall regret bound as in Theorem 6. In our experiments, the black-box algorithm has an anytime regret bound, so using DS does not break the theoretical guarantee.
However, there exist algorithms with fixed-budget regret bounds only. That is, the algorithm needs to know the target time horizon $T^*$, and the regret bound holds after exactly $T^*$ time steps only. When such algorithms are used as the black-box, there is no easy way to prove Theorem 6 with the DS intervals. The good news, still, is that most online learning algorithms are equipped with anytime regret bounds, and one can often use a technique called the "doubling trick" [3, Section 2.3] to turn an algorithm with a fixed-budget regret bound into one with an anytime regret bound.
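The doubling trick can be sketched generically: run fresh copies of the fixed-budget algorithm with budgets $1, 2, 4, \ldots$, restarting whenever the budget is exhausted. The `make_algorithm` factory and its `.predict()`/`.update()` interface below are hypothetical, used only to illustrate the restart schedule.

```python
def doubling_trick(make_algorithm, losses):
    # Generic doubling-trick sketch. `make_algorithm(budget)` is a
    # hypothetical factory returning a fixed-budget learner with
    # .predict() and .update(loss) methods. We restart with a doubled
    # budget each time the current budget is exhausted.
    t, budget = 0, 1
    alg = make_algorithm(budget)
    decisions = []
    for loss in losses:
        if t == budget:            # budget exhausted: double and restart
            budget *= 2
            t = 0
            alg = make_algorithm(budget)
        decisions.append(alg.predict())
        alg.update(loss)
        t += 1
    return decisions
```

Summing the per-epoch regret bounds over the $O(\log T)$ epochs yields an anytime bound at the cost of a constant factor.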

Technical Results
Proof. We closely follow the proof of Luo & Schapire [13, Theorem 2]. We first claim that the following holds; the proof is as follows: