Adaptive-Rate Reconstruction of Time-Varying Signals With Application in Compressive Foreground Extraction

We propose and analyze an online algorithm for reconstructing a sequence of signals from a limited number of linear measurements. The signals are assumed sparse, with unknown support, and evolve over time according to a generic nonlinear dynamical model. Our algorithm, based on recent theoretical results for $\ell_1$-$\ell_1$ minimization, is recursive and computes the number of measurements to be taken at each time on-the-fly. As an example, we apply the algorithm to online compressive video foreground extraction, a problem stated as follows: given a set of measurements of a sequence of images with a static background, simultaneously reconstruct each image while separating its foreground from the background. The performance of our method is illustrated on sequences of real images. We observe that it allows a dramatic reduction in the number of measurements or in the reconstruction error with respect to state-of-the-art compressive background subtraction schemes.

We assume the signals evolve according to the dynamical model

$$x[k] = f_k\big(x[1], x[2], \ldots, x[k-1]\big) + \epsilon[k], \qquad (1a)$$
$$y[k] = A_k\, x[k], \qquad (1b)$$

where $\epsilon[k] \in \mathbb{R}^n$ is modeling noise and $A_k \in \mathbb{R}^{m_k \times n}$ is a sensing matrix. In (1a), $f_k : (\mathbb{R}^n)^{k-1} \to \mathbb{R}^n$ is a known, but otherwise arbitrary, map that describes the evolution of the signals. The problem we address is the following: given (1), design an online algorithm that 1) uses a minimal number of measurements $m_k$ at time $k$, and 2) perfectly reconstructs each $x[k]$ from $y[k]$, acquired as in (1b), and possibly from $x[i]$, $i < k$.
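To make the setting concrete, the following Python sketch simulates a short sequence obeying (1); the shift-based map f_k, the dimensions, the noise level, and the fixed number of measurements are hypothetical choices for illustration, not part of the model (in the paper, m_k is computed on-the-fly).

```python
import numpy as np

rng = np.random.default_rng(0)
n, s, sigma_eps = 500, 10, 0.01   # hypothetical dimension, sparsity, noise scale

def f_k(past):
    # Toy evolution map: circularly shift the most recent signal.
    # The model (1a) allows any map of all past signals.
    return np.roll(past[-1], 1)

x = np.zeros(n)
x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
signals, matrices, measurements = [x], [], []
for k in range(2, 6):
    eps = sigma_eps * rng.laplace(size=n)                # modeling noise eps[k]
    x = f_k(signals) + eps                               # (1a)
    m_k = 100                                            # placeholder measurement count
    A_k = rng.standard_normal((m_k, n)) / np.sqrt(m_k)   # i.i.d. Gaussian sensing matrix
    signals.append(x)
    matrices.append(A_k)
    measurements.append(A_k @ x)                         # (1b): y[k] = A_k x[k]
```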
Note that our setting immediately generalizes from the case where each $x[k]$ is sparse to the case where $x[k]$ has a sparse representation in a linear, invertible transform. An aspect that distinguishes our problem from other recursive signal reconstruction problems, such as the ones addressed in [2]-[9], is that the number of measurements $m_k$ varies at each iteration and has to be computed recursively.

A. Applications
Many problems require estimating a sequence of signals from a sequence of measurements satisfying the model in (1). These include classification and tracking in computer vision systems [10], [11], radar tracking [12], dynamic MRI [13], [14] and several tasks in wireless sensor networks [15].
Our application focus, however, is compressive background subtraction [16] (a.k.a. foreground subtraction). Background subtraction is a key task for detecting and tracking objects in a video sequence, and it has been applied, for example, in video surveillance [17], [18], traffic monitoring [19], [20], and medical imaging [21], [22]. Although there are many background subtraction techniques, e.g., [11], [23], [24], most of them assume access to full frames and, thus, are inapplicable in compressive video sensing [25]-[30], a technology used in cameras where sensing is expensive. In particular, devices such as the single-pixel camera [25]-[30] acquire compressive measurements from images using few sensors, as in (1b); this makes it viable to use expensive sensors that capture images in non-conventional wavelengths, such as infrared or UV, and also enables high-speed imaging [30]. Performing these tasks with conventional camera technology is expensive due to the large number of required sensors/pixels. Medical imaging [13], [14], [21] is another application where measurements are expensive and are acquired as in (1b). Although we do not explore this application here, we believe our work can have significant impact in dynamic MRI [13], [14].
In compressive video sensing, one has access not to full frames, as in conventional video, but only to a small set of linear measurements of each frame, as in (1b). Cevher et al. [16] noticed that background subtraction is possible in this context if the foreground pixels, i.e., those associated with a moving object, occupy a small area in each frame. Assuming the background image is known beforehand, compressed sensing techniques [31], [32] such as $\ell_1$-norm minimization allow reconstructing each foreground. This not only reconstructs the original frame (if we add the reconstructed foreground to the known background), but also performs background subtraction as a by-product [16].
With the exception of [33], [34], most approaches to compressive video sensing and compressive background subtraction assume a fixed number of measurements for all frames [16], [25], [27]-[30], [35]. If this number is too small, reconstruction fails; if it is too large, reconstruction succeeds, but at the cost of unnecessary measurements in some or all frames. The work in [33], [34] addresses this problem with an online scheme that uses cross-validation to compute the number of required measurements. Given a reconstructed foreground, [33], [34] estimate the area of the true foreground using extra cross-validation measurements. Then, assuming that consecutive frames have equal foreground areas, the phase diagram of the sensing matrix, computed beforehand, gives the number of measurements for the next frame. This approach, however, fails to use information from past frames in the reconstruction process, information that, as we will see, can significantly reduce the number of measurements.

B. Overview of Our Approach and Contributions
Overview: Our approach to adaptive-rate signal reconstruction is based on the recent theoretical results of [36], [37], which characterize the performance of sparse reconstruction schemes in the presence of side information. The scheme we are most interested in is $\ell_1$-$\ell_1$ minimization:

$$\underset{x}{\text{minimize}} \;\; \|x\|_1 + \beta \|x - w\|_1 \quad \text{subject to} \;\; Ax = y, \qquad (2)$$

where $x \in \mathbb{R}^n$ is the optimization variable and $\|x\|_1 := \sum_{i=1}^{n} |x_i|$ is the $\ell_1$-norm. In (2), $y \in \mathbb{R}^m$ is a vector of measurements and $\beta$ a positive parameter. The vector $w \in \mathbb{R}^n$ is assumed known and is the so-called prior or side information: a vector similar to the vector that we want to reconstruct, say $x^\star$. Note that if we set $\beta = 0$ in (2), we obtain basis pursuit (BP) [38], a well-known sparse reconstruction problem at the core of compressed sensing [31], [32]. Problem (2) generalizes BP by integrating the side information $w$. The work in [36], [37] shows that, if $w$ has reasonable quality (in a precise sense defined below) and the entries of $A$ are drawn from an i.i.d. Gaussian distribution, the number of measurements required by (2) to reconstruct $x^\star$ is much smaller than the number of measurements required by BP. Furthermore, the theory in [36], [37] establishes that $\beta = 1$ is an optimal choice, irrespective of any problem parameter. This makes the reconstruction problem (2) parameter-free.
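As an illustration, problem (2) can be prototyped in a few lines with a generic convex solver such as CVXPY; this is a minimal sketch, not the solver used in the paper (Section VI uses SPGL1 and DECOPT):

```python
import cvxpy as cp

def l1_l1_reconstruct(A, y, w, beta=1.0):
    """Solve (2): minimize ||x||_1 + beta * ||x - w||_1  subject to  Ax = y."""
    x = cp.Variable(A.shape[1])
    objective = cp.Minimize(cp.norm1(x) + beta * cp.norm1(x - w))
    problem = cp.Problem(objective, [A @ x == y])
    problem.solve()
    return x.value
```

Setting beta=0.0 recovers BP, which is how the first two signals are reconstructed in Algorithm 1 below.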
We address the problem of recursively reconstructing a sequence of sparse signals satisfying (1) as follows. Assuming the measurement matrices are Gaussian, we propose an algorithm that uses (2) to reconstruct each signal. Building upon the results of [36], [37], we equip our algorithm with a mechanism to automatically compute an estimate of the number of required measurements. As an application, we consider compressive background subtraction and show how to generate side information from past frames.
Contributions: We summarize our contributions as follows:
1) We propose an adaptive-rate algorithm for reconstructing sparse sequences satisfying the model in (1).
2) We establish conditions under which our algorithm perfectly reconstructs a finite sparse sequence with high probability.
3) We describe how to apply the algorithm to compressive background subtraction problems, using motion-compensated extrapolation to predict the next image to be acquired. In other words, we show how to generate side information.
4) Given that images predicted by motion-compensated extrapolation are known to exhibit Laplacian noise, we characterize the performance of (2) under this model.
5) Finally, we show the impressive performance of our algorithm for compressive background subtraction on several sequences of real images.

Besides incorporating a scheme to compute a minimal number of measurements on-the-fly, there is another aspect that makes our algorithm fundamentally different from prior work. As overviewed in Section II, most prior algorithms for reconstructing dynamical sparse signals work well only when the sparsity pattern of $x[k]$ varies slowly with time. Our algorithm, in contrast, operates well even when the sparsity pattern of $x[k]$ varies arbitrarily between consecutive time instants, as shown by our theory and experiments. What is required to vary slowly is the "quality" of the prediction given by each $f_k$ (i.e., the quality of the side information) and, to a lesser extent, the sparsity of $x[k]$, i.e., its number of nonzero entries, rather than its sparsity pattern.

C. Organization
Section II overviews related work. In Section III, we state the results from [36], [37] that are used by our algorithm. Section IV describes the algorithm and establishes reconstruction guarantees. Section V concerns the application to compressive background subtraction. Experimental results illustrating the performance of our algorithm are shown in Section VI; and Section VII concludes the paper. The Appendix contains the proofs of our results.

II. RELATED WORK
There is extensive literature on reconstructing time-varying signals from limited measurements. Here, we provide an overview by referring to a few landmark papers.
The Kalman Filter: The classical solution for estimating a sequence of signals satisfying (1) or, in control terminology, the state of a dynamical system, is the Kalman filter [41]. The Kalman filter is an online algorithm that is least-squares optimal when the model is linear, i.e., $f_k(x[1], \ldots, x[k-1]) = F_k\, x[k-1]$ for a given matrix $F_k$, and the sequence $\{\epsilon[k]\}$ is Gaussian and independent across time. Several extensions are available when these assumptions do not hold [42]-[44]. The Kalman filter and its extensions, however, are inapplicable to our scenario, as they do not easily integrate the additional knowledge that the state is sparse.
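For reference, here is a minimal sketch of one predict/update cycle of the classical filter for the linear-Gaussian special case just mentioned; the measurement-noise covariance R is an assumption of the sketch (the model in (1b) is noiseless), and, as noted above, nothing here exploits sparsity of the state:

```python
import numpy as np

def kalman_step(x_est, P, F, Q, A, R, y):
    """One Kalman cycle for x[k] = F x[k-1] + eps[k], y[k] = A x[k] + noise,
    with eps ~ N(0, Q) and measurement noise ~ N(0, R)."""
    # Predict.
    x_pred = F @ x_est
    P_pred = F @ P @ F.T + Q
    # Update with the least-squares optimal gain K = P A^T S^{-1}.
    S = A @ P_pred @ A.T + R
    K = np.linalg.solve(S, A @ P_pred).T
    x_new = x_pred + K @ (y - A @ x_pred)
    P_new = (np.eye(P.shape[0]) - K @ A) @ P_pred
    return x_new, P_new
```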
Dynamical Sparse Signal Reconstruction: Some prior work incorporates signal structure, such as sparsity, into online reconstruction procedures. For example, [3]-[5] adapt a Kalman filter to estimate a sequence of sparse signals. Roughly, these methods keep an estimate of the signal's support at each time instant and use the Kalman filter to compute the (nonzero) signal values. When a change in the support is detected, the estimate of the support is updated using compressed sensing techniques. The work in [3]-[5], however, assumes that the support varies very slowly and does not provide any strategy to update (or compute) the number of measurements; indeed, the number of measurements is assumed constant over time. Also assuming that the support varies slowly and using a fixed number of measurements, the work in [45]-[47] addresses the same problem with a different approach: it integrates an estimate of the signal's support into the sparse reconstruction scheme using a technique known as Modified-CS [48]. Note that the methods in [3]-[5], [45]-[47] come with stability guarantees, in the sense that if the (fixed) number of measurements is large enough and consecutive signals have approximately the same support, then these methods succeed under weaker restricted-isometry conditions (and thereby milder requirements for reconstruction) than simple sparse reconstruction methods.
Related work that also assumes a fixed number of measurements includes [49], which uses approximate belief propagation, and [50], which integrates sparsity knowledge into a Kalman filter via a pseudo-measurement technique. The works in [6], [7] and [8] propose online algorithms, named GROUSE and PETRELS, respectively, for estimating signals that lie on a low-dimensional subspace. Their model can be seen as a particular case of (1), where each map $f_k$ is linear and depends only on the previous signal. Akin to most prior work, both GROUSE and PETRELS assume that the rank of the underlying subspace (i.e., the sparsity of $x[k]$) varies slowly with time, and they fail to provide a scheme to compute the number of measurements.
We briefly overview the work in [9], which is close to ours in that it uses a related reconstruction problem. However, perhaps due to the lack of theory for that problem, it assumes a fixed number of measurements for all signals (as most prior work) and, thus, does not solve the problem we consider in this paper. Three dynamical reconstruction schemes are studied in [9]. The one with the best performance is

$$\underset{x}{\text{minimize}} \;\; \|x\|_1 + \beta \|x - w\|_1 + \beta_2 \|Ax - y\|_2^2, \qquad (3)$$

where $\beta_2 > 0$ and $\|\cdot\|_2$ is the Euclidean ($\ell_2$-) norm. Problem (3) is the Lagrangian version of the noise-robust version of (2):

$$\underset{x}{\text{minimize}} \;\; \|x\|_1 + \beta \|x - w\|_1 \quad \text{subject to} \;\; \|Ax - y\|_2 \leq \sigma, \qquad (4)$$

where $\sigma$ is a bound on the measurement noise. For $\beta_2$ in a given range, the solutions of (3) and (4) coincide. This makes the approach in [9] related to ours. Nevertheless, using (4) has two important advantages: first, in practice, it is easier to obtain bounds on the measurement noise $\sigma$ than it is to tune $\beta_2$; second, and more importantly, the problem in (4) has well-characterized reconstruction guarantees [36], [37]. It is exactly those guarantees that enable our scheme for computing the number of measurements online.
Robust PCA: A technique that has been successfully applied to background subtraction is Robust PCA (RPCA) [24], [51], [52]. RPCA decomposes a data matrix into the sum of a sparse and a low-rank matrix. In the context of background subtraction, a column of the data matrix corresponds to a video frame, which is decomposed into foreground (sparse matrix) and background (low-rank matrix). In RPCA, both the foreground and the background are unknown, and the latter may vary slowly across time. RPCA, however, either requires access to full frames or, in the case of compressive RPCA [53], requires each measurement to contain information from several different frames, e.g., the first and the last ones, making it a batch algorithm: all frames are processed together, not online as in our algorithm.
There are, however, online extensions of RPCA, for example, the stochastic optimization approach of [54], and an algorithm called Prac-ReProCS [55]. The online algorithm in [54] is shown to converge to the same solution as the batch RPCA, but it requires access to full frames. Prac-ReProCS [55] is also backed up by theory [56] and can handle compressive measurements. It requires a training sequence of background images, which is akin to our assumption of knowing the background image, and works under the assumption that the foreground objects move slowly. The version of Prac-ReProCS that handles compressive measurements, however, reconstructs only the foreground sequence and not the (slow-changing) background and, in addition, assumes that the measurement matrix is the same for all frames. This implies, in particular, that the number of measurements is fixed, which is the reason why Prac-ReProCS fails to solve our problem.

III. PRELIMINARIES: STATIC SIGNAL RECONSTRUCTION USING $\ell_1$-$\ell_1$ MINIMIZATION

This section reviews results from [36], namely reconstruction guarantees for (2) in a static scenario, i.e., when we estimate just one signal, not a sequence. As mentioned before, $\beta = 1$ is an optimal choice: it not only minimizes the bounds in [36], but also leads to the best results in practice. This is the reason why we use $\beta = 1$ henceforth.

$\ell_1$-$\ell_1$ Minimization: Let $x^\star \in \mathbb{R}^n$ be a sparse vector, and assume we have $m$ linear measurements of $x^\star$: $y = A x^\star$, where $A \in \mathbb{R}^{m \times n}$. Denote the sparsity of $x^\star$ by $s := |\{i : x^\star_i \neq 0\}|$, where $|\cdot|$ is the cardinality of a set. Assume we have access to a signal $w \in \mathbb{R}^n$ similar to $x^\star$ (in the sense that $\|x^\star - w\|_1$ is small), and suppose we attempt to reconstruct $x^\star$ by solving the $\ell_1$-$\ell_1$ minimization problem (2) with $\beta = 1$:

$$\underset{x}{\text{minimize}} \;\; \|x\|_1 + \|x - w\|_1 \quad \text{subject to} \;\; Ax = y. \qquad (5)$$

The number of measurements that problem (5) requires to reconstruct $x^\star$ is a function of the "quality" of the side information $w$. Quality in [36] is measured in terms of the following parameters:

$$\xi := \big|\{i : w_i \neq x^\star_i = 0\}\big| - \big|\{i : w_i = x^\star_i \neq 0\}\big|, \qquad (6a)$$
$$h := \big|\{i : x^\star_i > 0,\, w_i < x^\star_i\} \cup \{i : x^\star_i < 0,\, w_i > x^\star_i\}\big|. \qquad (6b)$$

Note that the components of $w$ that contribute to $h$ are defined on the support of $x^\star$; thus, $0 \leq h \leq s$.
Theorem 1 (Th. 1 in [36]). Let $x^\star, w \in \mathbb{R}^n$ be the vector to reconstruct and the side information, respectively. Assume $h > 0$ and that there exists at least one index $i$ for which $x^\star_i = w_i = 0$. Let the entries of $A \in \mathbb{R}^{m \times n}$ be i.i.d. Gaussian with zero mean and variance $1/m$. If

$$m \geq 2h \log\Big(\frac{n}{s + \xi/2}\Big) + \frac{7}{5}\Big(s + \frac{\xi}{2}\Big) + 1, \qquad (7)$$

then, with probability at least $1 - \exp\big(-\tfrac{1}{2}(m - \sqrt{\bar m})^2\big)$, where $\bar m$ denotes the right-hand side of (7), $x^\star$ is the unique solution of (5).
Theorem 1 establishes that if the number of measurements is larger than (7), then, with high probability, (5) reconstructs $x^\star$ perfectly. The bound in (7) is a function of the signal dimension $n$ and sparsity $s$, and of the quantities $\xi$ and $h$, which depend on the signs of the entries of $x^\star$ and $w - x^\star$, but not on their magnitudes. When $w$ is such that $h$ is small, the bound in (7) is much smaller than the corresponding bound for BP in [57]:

$$m \geq 2s \log\Big(\frac{n}{s}\Big) + \frac{7}{5}\, s + 1. \qquad (8)$$

Namely, [57] establishes that if (8) holds and if $A \in \mathbb{R}^{m \times n}$ has i.i.d. Gaussian entries with zero mean and variance $1/m$, then, with probability similar to the one in Theorem 1, $x^\star$ is the unique solution of BP. Indeed, if $h \ll s$ and $\xi$ is larger than a small negative constant, then (7) is much smaller than (8). Note that, in practice, the quantities $s$, $\xi$, and $h$ are unknown, since they depend on the unknown signal $x^\star$. In the next section, we propose an online scheme to estimate them using past signals.
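In code, the quantities in (6) and the bounds (7) and (8) are straightforward to evaluate for a given pair $(x^\star, w)$; the following sketch transcribes the formulas above (the function names are ours):

```python
import numpy as np

def quality_params(x_star, w):
    """Sparsity s and side-information quality (xi, h) from (6).

    Note: exact comparisons are used; solver outputs should be thresholded
    to a sparse vector before calling this in practice.
    """
    s = np.count_nonzero(x_star)
    xi = (np.sum((x_star == 0) & (w != 0))
          - np.sum((x_star != 0) & (w == x_star)))
    h = np.sum(((x_star > 0) & (w < x_star)) | ((x_star < 0) & (w > x_star)))
    return s, xi, h

def l1l1_bound(n, s, xi, h):
    """Right-hand side of (7): measurements needed by l1-l1 minimization."""
    u = s + xi / 2.0
    return 2 * h * np.log(n / u) + (7.0 / 5.0) * u + 1

def bp_bound(n, s):
    """Right-hand side of (8): measurements needed by basis pursuit."""
    return 2 * s * np.log(n / s) + (7.0 / 5.0) * s + 1
```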
Noisy Case: Theorem 1 has a counterpart for noisy measurements, which we state informally; see [36] for details. Let

$$m \geq \frac{1}{(1-\tau)^2}\left(2h \log\Big(\frac{n}{s + \xi/2}\Big) + \frac{7}{5}\Big(s + \frac{\xi}{2}\Big) + 1\right), \qquad (9)$$

where $0 < \tau < 1$. Let $\hat x_{\text{noisy}}$ be any solution of (4). Then, with overwhelming probability, $\|\hat x_{\text{noisy}} - x^\star\|_2 \leq 2\sigma/\tau$, i.e., (4) reconstructs $x^\star$ stably. Our algorithm, described in the next section, adapts easily to the noisy scenario, but we provide reconstruction guarantees only for the noiseless case.

IV. ONLINE SPARSE SIGNAL ESTIMATION
Algorithm 1 describes our online scheme for reconstructing a sparse sequence $\{x[k]\}$ satisfying (1). Intuitively, given estimates $\hat x[i]$, $i < k$, of all the signals prior to time $k$, Algorithm 1 reconstructs $x[k]$ by solving (5) with side information $w[k] = f_k(\hat x[1], \ldots, \hat x[k-1])$. Although described for a noiseless scenario, the algorithm easily adapts to the noisy scenario, as discussed later. Such adaptation is essential in a real system, e.g., a single-pixel camera [26].

A. Algorithm Description
The algorithm consists of two parts: the initialization, where the first two signals $x[1]$ and $x[2]$ are reconstructed using BP, and the online estimation, where the remaining signals are reconstructed using $\ell_1$-$\ell_1$ minimization.
Part I (Initialization): Steps 1-6 compute the numbers of measurements $m_1$ and $m_2$ according to the bound in (8), and then reconstruct $x[1]$ and $x[2]$ via BP. The expressions for $m_1$ and $m_2$ in step 2 require estimates $\hat s_1$ and $\hat s_2$ of the sparsity of $x[1]$ and $x[2]$, which are given as input to the algorithm. Henceforth, variables with hats refer to estimates. Steps 7-9 initialize the estimator $\phi_k$: during Part II of the algorithm, $\phi_k$ should approximate the right-hand side of (7) for $x[k]$, i.e., with $s = s_k$, $h = h_k$, and $\xi = \xi_k$, where the subscript $k$ indicates that these are the parameters associated with $x[k]$.
Part II (Online Estimation): The loop in Part II starts by computing the number of measurements as $m_k = (1 + \delta_k)\phi_k$, where $\delta_k$, an input to the algorithm, is a (positive) safeguard parameter. We take more measurements from $x[k]$ than the ones prescribed by $\phi_k$ because $\phi_k$ is only an approximation to the bound in (7), as explained next. After acquiring measurements of $x[k]$, we reconstruct it by solving the $\ell_1$-$\ell_1$ minimization problem (5) (step 14). Next, in step 15, we compute the sparsity $\hat s_k$ and the quantities in (6) from the reconstructed signal. If reconstruction is perfect, all these quantities match their true values; in that case, $\hat m_k$ in step 16 also matches the true value of the bound in (7). Note, however, that the bound for $x[k]$, $\hat m_k$, is computed only after $x[k]$ is reconstructed. Consequently, the number of measurements used in the acquisition of $x[k]$, $k > 2$, is a function of the bound (7) for $x[k-1]$. Since the bounds for $x[k]$ and $x[k-1]$ might differ, we take more measurements than the ones specified by $\phi_k$, by a factor of $\delta_k$, as in step 11. Also, we mitigate the effect of failed reconstructions by filtering $\hat m_k$ with an exponential moving average filter in step 17,

$$\phi_{k+1} = (1 - \alpha)\,\phi_k + \alpha\,\hat m_k,$$

which closes the main loop of Algorithm 1 (Adaptive-Rate Sparse Signal Reconstruction). Indeed, if reconstruction fails for some $x[k]$, the resulting $\hat m_k$ might differ significantly from the true bound in (7); the role of the filter is to smooth out such variations.
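Putting the pieces together, a compact sketch of Algorithm 1 (noiseless case) might look as follows. The functions l1_l1_reconstruct, quality_params, l1l1_bound, and bp_bound from the earlier sketches are reused, and sense(k, m) is a hypothetical stand-in for the acquisition hardware, returning $(A_k, y[k])$ with $m$ Gaussian measurements of the unknown $x[k]$:

```python
import numpy as np

def adaptive_rate_reconstruction(sense, f, n, s1_hat, s2_hat, K,
                                 delta=0.1, alpha=0.5):
    """Sketch of Algorithm 1; f(past) is the map f_k producing side info."""
    x_hat = []
    # Part I (steps 1-6): reconstruct x[1], x[2] via BP, sized by bound (8).
    for s_hat in (s1_hat, s2_hat):
        m = int(np.ceil(bp_bound(n, s_hat)))
        A, y = sense(len(x_hat) + 1, m)
        x_hat.append(l1_l1_reconstruct(A, y, np.zeros(n), beta=0.0))  # BP
    # Steps 7-9: initialize phi; here we simply evaluate (7) treating
    # x_hat[0] as side information for x_hat[1] (an ad hoc initialization).
    s, xi, h = quality_params(x_hat[1], x_hat[0])
    phi = l1l1_bound(n, s, xi, h)
    # Part II (steps 10-18): online estimation via l1-l1 minimization.
    for k in range(3, K + 1):
        m_k = int(np.ceil((1 + delta) * phi))     # step 11: oversample by delta
        A, y = sense(k, m_k)
        w = f(x_hat)                               # side information w[k]
        xk = l1_l1_reconstruct(A, y, w, beta=1.0)  # step 14: solve (5)
        s, xi, h = quality_params(xk, w)           # step 15: estimate (6)
        m_hat = l1l1_bound(n, s, xi, h)            # step 16: estimate bound (7)
        phi = (1 - alpha) * phi + alpha * m_hat    # step 17: moving-average filter
        x_hat.append(xk)
    return x_hat
```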
Extension to the Noisy Case: Algorithm 1 can be easily extended to the scenario where the acquisition process is noisy, i.e., $y[k] = A_k x[k] + \eta_k$. Assume $\eta_k$ is arbitrary noise with bounded magnitude, i.e., we know $\sigma_k$ such that $\|\eta_k\|_2 \leq \sigma_k$. In that case, the constraint in the reconstruction problems in steps 5 and 14 should be replaced by $\|A_k x - y[k]\|_2 \leq \sigma_k$. The other modification is in steps 8 and 16, whose expressions for $m_k$ are multiplied by $1/(1-\tau)^2$, as in (9). Our reconstruction guarantees, however, hold only for the noiseless case.
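In the CVXPY sketch given earlier, this extension amounts to swapping the equality constraint for a ball constraint, mirroring (4) (the noise bound sigma is assumed known):

```python
import cvxpy as cp

def l1_l1_reconstruct_noisy(A, y, w, sigma, beta=1.0):
    """Noise-robust variant (4): the equality Ax = y becomes a ball constraint."""
    x = cp.Variable(A.shape[1])
    objective = cp.Minimize(cp.norm1(x) + beta * cp.norm1(x - w))
    problem = cp.Problem(objective, [cp.norm2(A @ x - y) <= sigma])
    problem.solve()
    return x.value
```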
Remarks: We will see in the next section that Algorithm 1 works well when each $\delta_k$ is chosen according to the prediction quality of $f_k$: the worse the prediction quality, the larger $\delta_k$ should be. In practice, it may be more convenient to make $\delta_k$ constant, as we do in our experiments in Section VI. Note that the conditions under which our algorithm performs well differ from those of the majority of prior work. For example, the algorithms in [3], [4], [6]-[8], [16], [33], [34], [49], [50], [58] work well when the sparsity pattern of $x[k]$ varies slowly between consecutive time instants. Our algorithm, in contrast, works well when the quality parameters $\xi_k$ and $h_k$, and also the sparsity $s_k$, vary slowly; in other words, when the quality of the prediction of $f_k$ varies slowly. On the other hand, if $f_k$ gives a prediction of bad quality at iteration $k$, the number of measurements taken at iteration $k+1$ will approach the number required by BP, as given by (8): in that case, $h_k = s_k$ and, when $n$ is large enough, the dominant terms in (7) and (8) both equal $2 s_k \log n$. We finally remark that Algorithm 1 can possibly be adapted to reconstruction problems that integrate prior information in a different way (e.g., Modified-CS and modified-BPDN [48]).

B. Reconstruction Guarantees
The following result bounds the probability with which Algorithm 1 with $\alpha = 1$ perfectly reconstructs a finite-length sequence. The idea is to rewrite condition (10) below as the condition that $(1 + \delta_i)$ times the bound (7) applied to $x[i-1]$ is at least the bound (7) applied to $x[i]$. If that condition holds for the entire sequence, then, using Theorem 1 and assuming that the matrices $A_k$ are drawn independently, we can bound the probability of successful reconstruction. The proof is in Appendix A.
Lemma 2: Let $\alpha = 1$, $m := \min\{m_1, m_2, \min_{i=3,\ldots,k} m_i\}$, and fix $k \geq 3$. Let also, for all $i = 3, \ldots, k$,

$$\delta_i \geq \frac{2 h_i \log(n/u_i) + \frac{7}{5}\, u_i + 1}{2 h_{i-1} \log(n/u_{i-1}) + \frac{7}{5}\, u_{i-1} + 1} - 1, \qquad (10)$$

where $u_i := s_i + \xi_i/2$. Assume $\hat s_q \geq s_q := |\{j : x_j[q] \neq 0\}|$ for $q = 1, 2$, i.e., that the initial sparsity estimates $\hat s_1$ and $\hat s_2$ are not smaller than the true sparsities of $x[1]$ and $x[2]$. Assume also that the matrices $\{A_i\}_{i=1}^{k}$ are drawn independently. Finally, assume that the estimates $w[i]$ produced by $f_i$ are such that, for all $i = 3, \ldots, k$, $h_i > 0$ and $x_j[i] = w_j[i] = 0$ for at least one index $j$. Then, the probability (over the sequence of matrices $\{A_i\}_{i=1}^{k}$) that Algorithm 1 reconstructs $x[i]$ perfectly for all $i = 1, \ldots, k$ is at least

$$\left(1 - \exp\Big(-\frac{1}{2}\big(m - \sqrt{m}\big)^2\Big)\right)^{k}. \qquad (11)$$

When the conditions of Lemma 2 hold, the probability of perfect reconstruction decreases with the length $k$ of the sequence, albeit at a very slow rate: for example, if $m$ is as small as 8, then (11) equals 0.9998 for $k = 10^2$ and 0.9845 for $k = 10^4$. If $m$ is larger, these numbers are even closer to 1.
Interpretation of (10): As shown in the proof, condition (10) is equivalent to requiring that $(1 + \delta_i)$ times the bound (7) for $x[i-1]$ be no smaller than the bound (7) for $x[i]$. To get more insight about this condition, it can be rewritten as

$$\delta_i \geq \frac{h_i - h_{i-1}}{h_{i-1}} + c_1(n) + c_2(n), \qquad (12)$$

where $c_1(n)$ and $c_2(n)$ are terms that vanish as $n$ grows. For large $n$, $c_1(n), c_2(n) \simeq 0$, and condition (12) tells us that the oversampling factor $\delta_i$ should be larger than the relative variation of $h_i$ from time $i-1$ to time $i$. In general, the magnitude of $c_1(n)$ and $c_2(n)$ can be significant, since they approach zero at a relatively slow rate, $o(1/\log n)$. Hence, those terms should not be ignored.
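Equivalently, in code, the smallest safeguard permitted by (10) can be read off directly from two consecutive evaluations of the bound (7), reusing l1l1_bound from the earlier sketch:

```python
def min_safeguard(n, params_prev, params_cur):
    """Smallest delta_i satisfying (10), i.e.,
    (1 + delta_i) * bound(i-1) >= bound(i).

    params_* are (s, xi, h) triples as returned by quality_params.
    """
    bound_prev = l1l1_bound(n, *params_prev)
    bound_cur = l1l1_bound(n, *params_cur)
    return max(0.0, bound_cur / bound_prev - 1.0)
```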
Remarks on the Noisy Case: There is an inherent difficulty in establishing a counterpart of Lemma 2 for the noisy measurement scenario: namely, the quality parameters $\xi$ and $h$ in (6) are not continuous functions of $x$. So, no matter how close a reconstructed signal is to the original one, their quality parameters can differ arbitrarily. And, in the noisy measurement case, we can never guarantee that the reconstructed and the original signals are equal; at most, if (9) holds, they are within a distance $2\sigma/\tau$, for $0 < \tau < 1$. So, even though (9) guarantees that the reconstructed signal is close to the original one, nothing can be guaranteed about the closeness of their quality parameters.

V. COMPRESSIVE VIDEO BACKGROUND SUBTRACTION
We now consider the application of our algorithm to compressive video background subtraction. We start by modeling the problem of compressive background subtraction as the estimation of a sequence of sparse signals satisfying (1). Our background subtraction system, based on Algorithm 1, is then introduced. Finally, we establish reconstruction guarantees for our scheme when $\epsilon[k]$ in (1a) is Laplacian noise.

A. Model
Let $\{Z[k]\}_{k \geq 1}$ be a sequence of images with resolution $N_1 \times N_2$, and let $z[k] \in \mathbb{R}^n$, with $n := N_1 \cdot N_2$, be the column-major vectorization of the $k$th image. At time $k$, we collect $m_k$ linear measurements of $Z[k]$: $u[k] = A_k\, z[k]$. We assume the background image, vectorized as $z_b \in \mathbb{R}^n$, is known, so the corresponding background measurements $u_b[k] = A_k\, z_b$ can be computed. Then, as suggested in [16], we subtract $u_b[k]$ from $u[k]$:

$$y[k] := u[k] - u_b[k] = A_k\big(z[k] - z_b\big) =: A_k\, x[k], \qquad (13)$$

where $x[k] := z[k] - z_b$ is the foreground of frame $k$. This equation tells us that, although we cannot measure the foreground image $x[k]$ directly, we can still construct a vector of measurements $y[k]$ as if we could. Given that $x[k]$ is usually sparse, the theory of compressed sensing tells us that it can be reconstructed by solving, for example, BP [31], [32]. Specifically, if $x[k]$ has sparsity $s_k$ and the entries of $A_k$ are realizations of i.i.d. zero-mean Gaussian random variables with variance $1/m_k$, then $2 s_k \log(n/s_k) + (7/5)\, s_k + 1$ measurements suffice to reconstruct $x[k]$ perfectly [57] [cf. (8)]. Notice that (13) is exactly the equation of measurements in (1b). Regarding (1a), we use it to model the estimation of the foreground of each frame, $x[k]$, from the previous foregrounds $x[k-2]$ and $x[k-1]$, via a motion-compensated extrapolation technique explained in Subsection V-C. This technique is known to produce image estimates with an error well modeled as Laplacian and, thus, each $\|\epsilon[k]\|_1$ is expected to be small. This aligns perfectly with the way we integrate side information in our reconstruction scheme: namely, the second term in the objective of the optimization problem in step 14 of Algorithm 1 is exactly $\|\epsilon[k]\|_1$.
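The construction in (13) is a one-liner once the background is known; a sketch (here $A_k$ is applied explicitly to the frame, whereas a real compressive camera would output $u[k]$ directly):

```python
import numpy as np

def foreground_measurements(A_k, z_k, z_b):
    """Implements (13): measure the frame, subtract background measurements."""
    u_k = A_k @ z_k   # measurements of the full frame (done in hardware)
    u_b = A_k @ z_b   # background measurements (background z_b is known)
    return u_k - u_b  # y[k] = A_k (z[k] - z_b) = A_k x[k]
```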

B. Our Background Subtraction Scheme
Fig. 1 shows the block diagram of our compressive background subtraction scheme and, essentially, translates Algorithm 1 into a diagram. The scheme does not apply to the reconstruction of the first two frames, which are reconstructed as in [16], i.e., by solving BP; this corresponds to Part I of Algorithm 1. The scheme in Fig. 1 contains a block implementing $f_k$, which takes a set of past reconstructed signals (in our case, $\hat x[k-2]$ and $\hat x[k-1]$, respectively) and outputs the side information $w[k]$. This is one of the inputs of the $\ell_1$-$\ell_1$ block, which solves the optimization problem (5). To obtain the other input, i.e., the set of foreground measurements $y[k]$, we proceed as specified in (13).

C. Motion-Compensated Extrapolation
To obtain an accurate prediction $e[k]$, we use a motion-compensated extrapolation technique similar to what is used in distributed video coding for generating decoder-based motion-compensated predictions [59]-[61]. Our technique is illustrated in Fig. 2. In the first stage, we perform forward block-based motion estimation between the reconstructed frames $\hat z[k-2]$ and $\hat z[k-1]$. The block matching algorithm is performed with half-pel accuracy and considers a block size of $\gamma \times \gamma$ pixels and a search range of $\rho$ pixels. The required interpolation for half-pel motion estimation is performed using the 6-tap filter of H.264/AVC [62]. In addition, we use the $\ell_1$-norm (or sum of absolute differences, SAD) as error metric. The resulting motion vectors are then spatially smoothed by applying a weighted vector-median filter [63]. The filtering improves the spatial coherence of the resulting motion field by removing outliers (i.e., motion vectors that are far from the true motion field). Assuming linear motion between $\hat z[k-2]$ and $\hat z[k-1]$, we linearly extrapolate the filtered motion vectors to frame $k$ and apply them to $\hat z[k-1]$, thereby obtaining the prediction $e[k]$ of the current frame. Remarks: In the above scheme, the motion extrapolation block creates an estimate $e[k]$ of the current frame $z[k]$ by assuming linear motion between the past two frames, $z[k-2]$ and $z[k-1]$. Hence, it gives good predictions when foreground objects move linearly. This is the case for small displacements, since nonlinear motion is locally well approximated by linear motion. When displacements are large (i.e., objects move fast), the above scheme still gives good predictions if the motion is linear. This contrasts with prior methods for background subtraction and dynamical sparse signal estimation, such as [3], [4], [6]-[8], [16], [33], [34], [49], [50], [58], which work well only in the slow-motion case, i.e., when the foreground area between consecutive frames is approximately constant.
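A stripped-down, full-pel version of this prediction pipeline is sketched below. It keeps the SAD block matching and the linear extrapolation, but omits the half-pel interpolation with the H.264/AVC 6-tap filter and the weighted vector-median filtering described above (block overlaps and uncovered pixels are also left unhandled in this sketch):

```python
import numpy as np

def motion_extrapolate(z_prev2, z_prev1, block=8, search=6):
    """Predict e[k] from reconstructed frames z[k-2] and z[k-1].

    For each block of z[k-1], find its best SAD match in z[k-2]; assuming
    linear motion, shift the block by the same displacement once more.
    """
    H, W = z_prev1.shape
    e = np.zeros_like(z_prev1)
    for i in range(0, H - block + 1, block):
        for j in range(0, W - block + 1, block):
            ref = z_prev1[i:i + block, j:j + block]
            best, best_v = np.inf, (0, 0)
            for di in range(-search, search + 1):
                for dj in range(-search, search + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii <= H - block and 0 <= jj <= W - block:
                        sad = np.abs(ref - z_prev2[ii:ii + block, jj:jj + block]).sum()
                        if sad < best:
                            best, best_v = sad, (di, dj)
            # The block moved by (-di, -dj) from k-2 to k-1; extrapolate the
            # same displacement from k-1 to k (clipped to the frame).
            ti = min(max(i - best_v[0], 0), H - block)
            tj = min(max(j - best_v[1], 0), W - block)
            e[ti:ti + block, tj:tj + block] = ref
    return e
```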
In practice, the background may change during the operation of the algorithm due to, for example, illumination change or camera movement. As in video coding [60]-[62], this indicates a scene cut, which can be detected by our algorithm via a dramatic increase in the measurement rate. In that case, the algorithm should take enough measurements to reconstruct an entire frame, the new background.

D. Reconstruction Guarantees for Laplacian Modeling Noise
It is well known that the noise produced by a motion-compensated prediction module, such as the one just described, is Laplacian [59], [64]. In our model, that corresponds to each $\epsilon[k]$ in (1a) being Laplacian. We assume each $\epsilon[k]$ is independent of the measurement matrix $A_k$.
Model for $\epsilon[k]$: As in [59], [64], [65] (and references therein), we assume that $\epsilon[k]$ is independent of $\epsilon[l]$, for $k \neq l$, and that the entries of each $\epsilon[k]$ are independent and zero-mean. The probability distribution of $\epsilon[k]$ is then

$$p(u) = \prod_{j=1}^{n} \frac{\lambda_j}{2}\, \exp\big(-\lambda_j |u_j|\big), \qquad (14)$$

where $u \in \mathbb{R}^n$ and $\lambda_j \geq 0$.
Given the model (1a), the distribution of $x[k]$ conditioned on the past signals $x[1], \ldots, x[k-1]$ is also Laplacian, yet not necessarily zero-mean. That is, for $u \in \mathbb{R}^n$ and $k \geq 2$,

$$p\big(x[k] = u \,\big|\, x[1], \ldots, x[k-1]\big) = \prod_{j=1}^{n} \frac{\lambda_j}{2}\, \exp\Big(-\lambda_j \big|u_j - [f_k(x[1], \ldots, x[k-1])]_j\big|\Big). \qquad (15)$$

In words, the distribution of each component of $x[k]$, conditioned on all past realizations $x[i]$, $1 \leq i < k$, is Laplacian with mean $[f_k(x[1], \ldots, x[k-1])]_j$ and parameter $\lambda_j$; furthermore, it is independent of the other components.
Reconstruction Guarantees: Note that $\{x[k]\}$ and $\{\epsilon[k]\}$ being stochastic processes implies that the quantities in (6), which we denote by $\xi_k$ and $h_k$ for signal $x[k]$, are random variables. Hence, at each time $k$, the conditions of Theorem 1, namely that $h_k > 0$ and that there is at least one index $i$ such that $x_i[k] = w_i[k] = 0$, become events, and may or may not hold. We now impose conditions on the variances $\sigma_j^2 = 2/\lambda_j^2$ that guarantee the conditions of Theorem 1 are satisfied and, thus, that $\ell_1$-$\ell_1$ minimization reconstructs $x[k]$ perfectly with high probability. Given $S \subseteq \{1, \ldots, n\}$, $S^c$ denotes its complement in $\{1, \ldots, n\}$.
Theorem 3: Let $w \in \mathbb{R}^n$ be given. Let $\epsilon$ have distribution (14), where the variance of component $j$ is $\sigma_j^2 = 2/\lambda_j^2$. Define $x := w + \epsilon$, and the sets $\Sigma := \{j : \sigma_j^2 \neq 0\}$ and $W := \{j : w_j \neq 0\}$. Assume $\Sigma^c \cap W^c \neq \emptyset$, that is, there exists $j$ such that $\sigma_j^2 = 0$ and $w_j = 0$. Assume $A \in \mathbb{R}^{m \times n}$ is generated as in Theorem 1 with a number of measurements

$$m \geq 2(\mu + t)\log\Big(\frac{n}{|\Sigma| + |\Sigma^c \cap W|/2}\Big) + \frac{7}{5}\Big(|\Sigma| + \frac{|\Sigma^c \cap W|}{2}\Big) + 1, \qquad (16)$$

for some $t > 1$, where $\mu := \sum_{j \in \Sigma} \frac{1 + \exp(-\lambda_j |w_j|)}{2}$. Let $\hat x$ denote the solution of $\ell_1$-$\ell_1$ minimization (5). Then,

$$P(\hat x = x) \geq \Big(1 - e^{-2\mu^2/|\Sigma|} - e^{-2t^2/|\Sigma|}\Big)\Big(1 - e^{-\frac{1}{2}(m - \sqrt{m})^2}\Big). \qquad (17)$$

The proof is in Appendix B. By assuming each component $\epsilon_j$ is Laplacian with parameter $\lambda_j = \sqrt{2}/\sigma_j$ (independent of the other components), Theorem 3 establishes a lower bound on the number of measurements that guarantees perfect reconstruction of $x$ with probability as in (17). Note that all the quantities in (16) are deterministic. This contrasts with a direct application of Theorem 1 to the problem, since the right-hand side of (7) is a function of the random variables $s$, $h$, and $\xi$. The assumption $\Sigma^c \cap W^c \neq \emptyset$ implies $\Sigma^c \neq \emptyset$, which means that some components of $\epsilon$ have zero variance and, hence, are equal to zero with probability 1. Note that, provided the variances $\sigma_j^2$ are known, all the quantities in (16), and consequently in (17), are known.
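Since all quantities in (16) are deterministic, the bound is directly computable; here is a sketch, assuming the Laplacian parameters $\lambda_j$ are known (with $\lambda_j = \infty$ encoding zero-variance components) and with $t$ a free slack parameter:

```python
import numpy as np

def laplacian_bound(w, lam, t=1.5):
    """Deterministic measurement bound (16) for Laplacian modeling noise.

    w   : side information vector
    lam : per-component Laplacian parameters lambda_j (np.inf where sigma_j = 0)
    t   : slack parameter, assumed t > 1
    """
    n = w.size
    Sigma = np.isfinite(lam)          # the set Sigma: sigma_j^2 != 0
    W = w != 0                        # the set W: w_j != 0
    mu = np.sum((1 + np.exp(-lam[Sigma] * np.abs(w[Sigma]))) / 2)
    u = np.sum(Sigma) + np.sum(~Sigma & W) / 2.0   # |Sigma| + |Sigma^c ∩ W|/2
    return 2 * (mu + t) * np.log(n / u) + (7.0 / 5.0) * u + 1
```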
We state without proof a consequence of Theorem 3 that is obtained by reasoning as in Lemma 2:

Corollary 4: Let $\{\epsilon[k]\}$ be a stochastic process where $\epsilon[k]$ has distribution (14) and each $\epsilon[k]$ is independent of $\epsilon[l]$, $k \neq l$. Assume that $\{x[k]\}$ is generated as in (1a), and consider Algorithm 1 with $\alpha = 1$ at iteration $k > 2$. Assume that $\epsilon[k]$ and $A_k$ are independent. Assume also, for $i = 3, \ldots, k$, that

$$\delta_i \geq \frac{2(\mu_i + t_i)\log(n/u_i) + \frac{7}{5}\, u_i + 1}{2(\mu_{i-1} + t_{i-1})\log(n/u_{i-1}) + \frac{7}{5}\, u_{i-1} + 1} - 1, \qquad (18)$$

where $u_i := |\Sigma_i| + |\Sigma_i^c \cap W_i|/2$, and the quantities $\mu_i$, $t_i$, $\Sigma_i$, and $W_i$ are defined as in Theorem 3 for signal $x[i]$. Assume the initial sparsity estimates satisfy $\hat s_1 \geq s_1$ and $\hat s_2 \geq s_2$ with probability 1, where $s_1$ and $s_2$ are the sparsities of $x[1]$ and $x[2]$. Then, the probability (over the sequences $\{\epsilon[i]\}$ and $\{A_i\}$) that Algorithm 1 reconstructs $x[i]$ perfectly for all $i = 1, \ldots, k$ is bounded below by a product of per-signal factors of the form (17).

Corollary 4 establishes reconstruction guarantees for Algorithm 1 when the modeling noise $\epsilon[k]$ in (1a) is Laplacian. In contrast with Lemma 2, the bound in (18) is a function of known parameters, but it requires the variances $\sigma_j^2[i]$ of each $\epsilon_j[i]$, which can be estimated from past frames in a block-based way [60], [64]. For some insight on (18), assume $u_i \simeq u_{i-1}$ and $t_i = t_{i-1}$, and that $n$ is large enough so that the terms not depending on it are negligible. Then, (18) becomes $\delta_i \geq (\mu_i - \mu_{i-1})/(\mu_{i-1} + t_{i-1})$, and we can select

$$\delta_i = \frac{\mu_i - \mu_{i-1}}{\mu_{i-1} + t_{i-1}} \lesssim 2\kappa - 1, \qquad (19)$$

where $\kappa := |\Sigma_i|/|\Sigma_{i-1}|$, and the inequality is due to $\mu_i \leq |\Sigma_i|$ and $\mu_{i-1} \geq |\Sigma_{i-1}|/2$. The approximation in (19) holds if $|\Sigma_{i-1}| \gg 2$, which is often the case in practice. The expression in (19) tells us that, for large $n$, $\delta_i$ is mostly determined by the ratio $\kappa$: if $\kappa > 1$ (resp. $< 1$), then we should select $\delta_i > 1$ (resp. $< 1$). We observe that, in practice, (18) and (19) give conservative estimates for $\delta_i$. We will see in the next section that selecting a small, constant $\delta_i$ (namely 0.1 and 0.01) leads to excellent results without compromising perfect reconstruction.

VI. EXPERIMENTAL RESULTS
We applied the scheme described in the previous section to several video sequences, described in Table I. The table shows the acronym by which we refer to each sequence, its source, the frame numbers and resolution we used, and a sample image from each sequence. We performed two types of experiments, Type I and Type II, each with a different purpose. The Type I experiments validate our algorithm, which was designed for Gaussian measurement matrices. Note that Gaussian matrices require explicit storage and their multiplication with vectors is expensive; therefore, in Type I experiments, we use only the first two sequences of Table I, Hall and PETS2009. The goal of the Type II experiments is, in turn, to compare our algorithm with the state of the art in compressive background subtraction [34], using all sequences in Table I. Here, instead of Gaussian matrices, we use partial DFT matrices, which do not require explicit storage and use the efficient FFT algorithm for matrix-vector multiplication. Our algorithm still works extremely well with DFT matrices, reconstructing video frames with a very small error, but it may require more measurements than with Gaussian matrices.
Recall that we address the problem of online compressive background subtraction, not classical background subtraction as in RPCA. In particular, we have no access to full frames, only to a limited number of measurements (decided by the algorithm), which are processed in an online fashion; also, the background image is known. In all experiments, the background image is always the first frame of the sequence, that is, the frame corresponding to the leftmost number in the third column of Table I.

A. Type I Experiments
The goal of these experiments is to validate Algorithm 1. Hence, we generated measurement matrices as in Theorem 1 and Lemma 2: each entry of $A_k \in \mathbb{R}^{m_k \times n}$ is i.i.d. Gaussian with zero mean and variance $1/m_k$, and $A_k$ and $A_j$ are independent for $j \neq k$.
Experimental Setup: As mentioned before, in the Type I experiments we use only the Hall and PETS2009 sequences. We set the oversampling parameters to $\delta := \delta_k = 0.1$, for all $k$, and the filter parameter to $\alpha = 0.5$. While for the Hall sequence we used the true sparsity of the first two foregrounds, i.e., $\hat s_1 = s_1 = 417$ and $\hat s_2 = s_2 = 446$, for the PETS2009 sequence we set these parameters to values much smaller than the true ones: $\hat s_1 = 10 \ll s_1 = 194$ and $\hat s_2 = 10 \ll s_2 = 211$. In spite of this poor initialization, the algorithm was able to adapt quickly, as we will see. We removed isolated pixels from each frame by preprocessing the full sequences. For motion estimation, we used $\gamma \times \gamma = 8 \times 8$ blocks and a search range of $\rho = 6$ pixels. Finally, we mention that after computing the side information $w[k]$ for frame $k$, we amplified the magnitude of its components by 30%; according to the theory in [36], this improves the quality of the side information. To solve BP in the reconstruction of the first two frames, we used SPGL1 [70], [71]. To solve the $\ell_1$-$\ell_1$ minimization problem (5) in the reconstruction of the remaining frames, we used DECOPT [72], [73]. Recall that Algorithm 1 assumes noiseless measurements. To illustrate its extension to noisy measurements, i.e., $y[k] = A_k x[k] + \eta_k$ with $\|\eta_k\|_2 \leq \sigma_k$, we also applied it to the case where the Hall sequence is acquired with noise. Specifically, we generated $\eta_k$ as a vector of i.i.d. Gaussian entries with zero mean and variance $4/m_k$, and used $\sigma_k = 2$ in all frames. The number of measurements was computed as in (9) with $\tau = 0.1$. For reference, in the noiseless case of the Hall sequence, we compared our algorithm with time-adapted Modified-CS [47], [48], a method for reconstructing a sparse sequence that uses a fixed number of measurements $m$. In these experiments, we set $m = 2000$, as it was the smallest (round) number that allowed reconstructing most of the frames without being too wasteful. Time-adapted Modified-CS [47] reconstructs each frame using the support of the previously reconstructed frame as an aid. In our implementation, the support of a signal was computed as the set of indices containing at least 90% of its energy.
Results: Fig. 3 shows the results of the Type I experiments. The plots on the left-hand side show the number of measurements $m_k$ that Algorithm 1 took from each frame and the corresponding estimate $\phi_k$ of (7); the same plots also show the true bounds (7) and (8). We observe that in all the left-hand side plots, $m_k$ and $\phi_k$ are always below the CS bound (8), except at a few frames in Fig. 3(e) (PETS2009 sequence). In those frames, there is no foreground, and thus the number of required measurements approaches zero. Since there are no such frames in the Hall sequence, the quantities in Figs. 3(a) and (c) do not exhibit such large fluctuations. In all the right-hand side plots, the estimation errors were approximately constant, around 0.01 for the Hall sequence (both in the noisy and noiseless cases) and around 0.93 for the PETS sequence. The reconstruction error varied between $3.8 \times 10^{-9}$ and $3.5 \times 10^{-6}$ for the noiseless Hall sequence [Fig. 3(b)] and between $2.4 \times 10^{-5}$ and $8.3 \times 10^{-5}$ for the noisy Hall sequence [Fig. 3(d)]. For the PETS sequence [Fig. 3(f)], it was always below $10^{-5}$, except at three instances, where the reconstruction error approached the estimation error. These correspond to the frames with no foreground [making the bounds in (7) and (8) approach zero] and to the initial frames, where the number of measurements was much smaller than (7). In spite of these "ill-conditioned" frames, our algorithm was able to quickly adapt in the subsequent frames and follow the $\ell_1$-$\ell_1$ bound curve closely. In Figs. 3(a) and (b), we can also see that Modified-CS [47], which took 2000 measurements from each frame, failed to reconstruct the first 34 frames for lack of measurements. Algorithm 1, in contrast, reconstructed all the frames perfectly and used 25% fewer measurements than Modified-CS did.
These experiments show three key features of Algorithm 1. The first and most important is that the estimate $\phi_k$ of the bound (7) is always very close to the true value of the bound. The second is that the algorithm uses significantly fewer measurements than BP [16], [34] would use, even if we knew the true foreground sparsity, and fewer than Modified-CS. The third feature is that there is not much difference between the noiseless and noisy cases in terms of the number of measurements; the most noticeable difference is in the reconstruction error: in Fig. 3(d), the reconstruction error is about three orders of magnitude larger than the one in Fig. 3(b).

B. Type II Experiments
The goal of the Type II experiments is to compare Algorithm 1 (Alg. 1) with the state of the art in adaptive-rate background subtraction, the ARCS-CV algorithm in [34]. Due to memory and time constraints, we obtained measurements from each frame using partial DFT matrices instead of Gaussian matrices, a modification that, as we will see, causes Alg. 1 to take more measurements, although without compromising reconstruction.
Experimental Setup: In these experiments, we set $\delta$ to an even smaller value, 0.01, and the search range $\rho$ to 4. Furthermore, the sparsity estimates for the first two frames, $\hat s_1$ and $\hat s_2$, were set to 150 in all sequences, independently of their true values. We used a simpler solver for (5), based on ADMM. All the other parameters were the same as in the Type I experiments. We compared Alg. 1 with the algorithm ARCS-CV in [34] (we used the implementation at http://garrettwarnell.com/ARCS-1.0.zip). Its parameters were set as in [34]: $\tau = 0.1$ and $\sigma_b = 0.0157$. The sparsity estimator $\hat s_1$ was initialized with 150 in all sequences (as in Alg. 1), and we modified the code so that its internal solver, SPGL1, had no constraints on the number of iterations. Recall that [34] presented experimental results for DFT matrices as well.
Results: Fig. 4 shows the results of these experiments for all the sequences of Table I. The results are presented as boxplots. In a boxplot, the data points are divided into four equal-sized groups, the quartiles. The median is represented with a horizontal line, and the second and third quartiles are enclosed in a box. The smallest (resp. largest) data point whose distance from the lower (resp. upper) quartile is smaller than 3/2 times the box height is marked with the extremity of a "whisker." Data points at farther distances, the outliers, are represented with dots. This type of plot provides a compact representation of data with large variability, which is the case for the data we are presenting: the number of measurements taken by Alg. 1 and ARCS-CV [34], and the respective reconstruction errors. Each subfigure of Fig. 4 contains two plots: the left-hand plot depicts the number of measurements, and the right-hand plot the relative reconstruction error of each frame. Within each plot, the left boxplot corresponds to Alg. 1 (in red) and the right boxplot to ARCS-CV [34] (in blue).
For example, in the Hall sequence [Fig. 4(a)], we see that the number of measurements taken by Alg. 1 had a median of 1976, an increase of 45% with respect to the Gaussian case [Fig. 3(a)]. In spite of this increase, Alg. 1 was slightly more efficient than ARCS-CV [34] in terms of the number of measurements, and much more efficient in terms of the reconstruction error [right-hand plot of Fig. 4(a)]. In fact, except for the PETS2009 sequence [Fig. 4(b)], the median of the reconstruction error of ARCS-CV was always larger than $10^{-3}$. Given that the internal solver of ARCS-CV, SPGL1 [70], [71], is a high-precision solver, this indicates that the number of measurements taken by ARCS-CV was not enough for reconstruction in most frames and for all sequences (except PETS2009). Taking this into account, it is then surprising that Alg. 1 required fewer measurements than ARCS-CV for some sequences. For example, the median of the number of measurements of Alg. 1 was smaller than the median of ARCS-CV in the Hall [Fig. 4(a)], PETS2006 [Fig. 4(d)], and Caviar [Fig. 4(h)] sequences; note that for these sequences the reconstruction error of Alg. 1 was also much smaller. In the remaining sequences, Alg. 1 took more measurements than ARCS-CV, but its reconstruction error was significantly smaller. We remark that the Traffic sequence exhibits jittering, causing some frames to have practically no background: all pixels comprise the foreground. As a consequence, Alg. 1 required $n = 19200$ measurements from most frames [Fig. 4(i)], that is, as many measurements as the image dimension. This, in turn, caused the reconstruction error to be extremely small. Overall, these experiments show that Alg. 1 outperforms the prior state-of-the-art algorithm ARCS-CV [34] in terms of the number of measurements for perfect reconstruction. Even when the background is not constant or the motion is rapid, our algorithm performs extremely well, often surpassing the state of the art in terms of the number of measurements, while always reconstructing with a smaller error.

VII. CONCLUSION
We proposed and analyzed an online algorithm for reconstructing sparse sequences of signals from a limited number of measurements. The signals vary across time according to a nonlinear dynamical model, and the measurements are linear. Our algorithm is based on 1 − 1 minimization and, assuming Gaussian measurement matrices, it estimates the required number of measurements to perfectly reconstruct each signal, automatically and on-the-fly. We also explored the application of our algorithm to compressive video background subtraction and tested its performance on sequences of real images. Our experiments show that it also works for DFT matrices and that it reduces the number of required measurements with respect to prior compressive video background subtraction schemes by a large margin. Interesting research directions include extending the algorithm to handle more complex priors, such as block sparsity and spatial foreground contiguity, adapting the algorithm to perform background subtraction in uncompressed videos, and exploring applications in medical imaging.

APPENDIX A PROOF OF LEMMA 2
First, note that condition (10) is a function only of the parameters of the sequences $\{x[k]\}$ and $\{\epsilon[k]\}$ and, therefore, is a deterministic condition. Simple algebraic manipulation shows that it is equivalent to

$$(1 + \delta_i)\, \bar m_{i-1} \geq \bar m_i, \qquad (20)$$

where $\bar m_i$ is the right-hand side of (7) applied to $x[i]$, that is, $\bar m_i := 2 h_i \log\big(n/(s_i + \xi_i/2)\big) + \frac{7}{5}(s_i + \xi_i/2) + 1$. Notice that the source of randomness in Algorithm 1 is the set of matrices (random variables) $A_k$, generated in steps 3 and 12.
Define the event $S_i$ as "perfect reconstruction at time $i$." Since we assume that $\hat s_1$ and $\hat s_2$ are not smaller than the true sparsities of $x[1]$ and $x[2]$, there holds [57], for $i = 1, 2$,

$$P(S_i) \geq 1 - \exp\Big(-\frac{1}{2}\big(m_i - \sqrt{\bar m_i}\big)^2\Big) \geq 1 - \exp\Big(-\frac{1}{2}\big(m - \sqrt{m}\big)^2\Big), \qquad (22)$$

where the second inequality is due to $m_i \geq m$ and to $1 - \exp\big(-\frac{1}{2}(m - \sqrt{m})^2\big)$ being an increasing function of $m$.
Next, we compute the probability of the event "perfect reconstruction at time $i$" given that there was perfect reconstruction at all previous time instants $l < i$, i.e., $P\big(S_i \mid \cap_{l<i} S_l\big)$, for all $i = 3, \ldots, k$. Since we assume $\alpha = 1$, we have $\phi_i = \hat m_{i-1}$, and step 11 of Algorithm 1 becomes

$$m_i = (1 + \delta_i)\, \hat m_{i-1}. \qquad (21)$$

(The hat variables are random variables.) Given perfect reconstruction at all times $l < i$, we have $\hat m_{i-1} = \bar m_{i-1}$ and, consequently, due to our assumption (20), step 11 yields $m_i \geq \bar m_i$, i.e., (7) is satisfied. By assumption, all the other conditions of Theorem 1 are satisfied and, hence, for $i \geq 3$,

$$P\Big(S_i \,\Big|\, \bigcap_{l<i} S_l\Big) \geq 1 - \exp\Big(-\frac{1}{2}\big(m - \sqrt{m}\big)^2\Big), \qquad (23)$$

where, again, we used the fact that $m_i \geq m$ and that $1 - \exp\big(-\frac{1}{2}(m - \sqrt{m})^2\big)$ is an increasing function. Finally, we bound the probability of perfect reconstruction at all time instants $1 \leq i \leq k$:

$$P\Big(\bigcap_{i=1}^{k} S_i\Big) = P(S_1)\, P(S_2 \mid S_1) \prod_{i=3}^{k} P\Big(S_i \,\Big|\, \bigcap_{l<i} S_l\Big) \qquad (24)$$
$$= P(S_1)\, P(S_2) \prod_{i=3}^{k} P\Big(S_i \,\Big|\, \bigcap_{l<i} S_l\Big) \qquad (25)$$
$$\geq \Big(1 - \exp\Big(-\frac{1}{2}\big(m - \sqrt{m}\big)^2\Big)\Big)^{k}. \qquad (26)$$

From (24) to (25), we used the independence between $S_1$ and $S_2$. From (25) to (26), we used (22) and (23).

APPENDIX B PROOF OF THEOREM 3
Recall the definitions of $\xi$ and $h$ in (6):

$$\xi = \big|\{i : w_i \neq x_i = 0\}\big| - \big|\{i : w_i = x_i \neq 0\}\big|,$$
$$h = \big|\{i : \epsilon_i > \max\{0, -w_i\}\} \cup \{i : \epsilon_i < \min\{0, -w_i\}\}\big|,$$

where we rewrote $h$ using $x = w + \epsilon$. Define the events $A :=$ "$\exists\, j : x_j = w_j = 0$", $B :=$ "$h > 0$", and

$$C := \;\text{"}m \geq 2h \log\Big(\frac{n}{s + \xi/2}\Big) + \frac{7}{5}\Big(s + \frac{\xi}{2}\Big) + 1\text{,"}$$

which are the assumptions of Theorem 1. In $C$, $m$ and $n$ are deterministic, whereas $s$, $h$, and $\xi$ are random variables. Then,

$$P(\hat x = x) \geq P(\hat x = x \mid A \wedge B \wedge C)\, P(A \wedge B \wedge C) \geq \Big(1 - e^{-\frac{1}{2}(m - \sqrt{m})^2}\Big)\, P(A \wedge B \wedge C), \qquad (27)$$

where we used Theorem 1. The rest of the proof consists of lower bounding $P(A \wedge B \wedge C)$.

Lower Bound on P (A ∧ B ∧ C):
Recall that $w$ is fixed and that each component $x_j$ is determined by $x_j = w_j + \epsilon_j$. Due to the continuity of the distribution of $\epsilon$, with probability 1 no component $j \in \Sigma$ (i.e., with $\sigma_j^2 \neq 0$) contributes to $\xi$. When $j \in \Sigma^c$, we have two cases: 1) $j \in \Sigma^c \cap W$ (i.e., $\sigma_j^2 = 0$ and $w_j \neq 0$): in this case, we have $x_j = w_j$ with probability 1. Hence, these components contribute to the second term of $\xi$.
2) $j \in \Sigma^c \cap W^c$ (i.e., $\sigma_j^2 = 0$ and $w_j = 0$): in this case, we also have $x_j = w_j$ with probability 1. However, these components do not contribute to $\xi$. We conclude that $P(D) = P(\xi = -|\Sigma^c \cap W|) = 1$, where $D$ is the event "$\xi = -|\Sigma^c \cap W|$." From the second case above, we also conclude that our assumption $\Sigma^c \cap W^c \neq \emptyset$ implies $P(A) = 1$. We can then write

$$P(A \wedge B \wedge C) = P(A \wedge B \wedge C \mid D) \qquad (28)$$
$$= P(B \wedge C \mid A, D) \qquad (29)\text{-}(30)$$
$$= P\big(B \wedge \{h \leq \mu + t\} \mid A, D\big). \qquad (31)$$

From (28) to (29), we used the fact that the events $A =$ "$\exists\, j : x_j = w_j = 0$" and $D =$ "$\xi = -|\Sigma^c \cap W|$" are independent; this follows from the independence of the components of $\epsilon$ and the disjointness of $\Sigma^c \cap W^c$ and $\Sigma^c \cap W$. From (30) to (31), we used the fact that event $C$ conditioned on $D$ is equivalent to "$h \leq \mu + t$." To see why, note that the sparsity of $x$ is given by $s = |\Sigma| + |\Sigma^c \cap W|$; thus, given $D$, $s + \xi/2$ equals $|\Sigma| + |\Sigma^c \cap W|/2$. Now, subtract the expression in assumption (16) from the expression that defines event $C$:

$$2\big(h - (\mu + t)\big)\log\Big(\frac{n}{|\Sigma| + |\Sigma^c \cap W|/2}\Big).$$

Using the fact that $n = |\Sigma| + |\Sigma^c| \geq |\Sigma| + |\Sigma^c \cap W| \geq |\Sigma| + |\Sigma^c \cap W|/2$, so that the logarithm is nonnegative, we conclude that $C$ is equivalent to the event "$h \leq \mu + t$." We now bound (31) as follows:

$$P\big(B \wedge \{h \leq \mu + t\} \mid A, D\big) \geq 1 - P(h = 0 \mid A, D) - P(h > \mu + t \mid A, D) \qquad (32)$$
$$\geq 1 - e^{-2\mu^2/|\Sigma|} - e^{-2t^2/|\Sigma|}, \qquad (33)$$

where the last step, explained below, is due to Hoeffding's inequality [74]. Note that once this step is proven, (33) together with (27) and (31) give (17), proving the theorem. Proof of Step (32)-(33): Hoeffding's inequality states that if $\{Z_j\}_{j=1}^{L}$ is a sequence of independent random variables with $P(0 \leq Z_j \leq 1) = 1$ for all $j$, then ([74], Th. 4)

$$P\Big(\sum_{j} Z_j - \sum_{j} E[Z_j] \geq \tau\Big) \leq e^{-2\tau^2/L}, \qquad (34)$$
$$P\Big(\sum_{j} E[Z_j] - \sum_{j} Z_j \geq \tau\Big) \leq e^{-2\tau^2/L}, \qquad (35)$$

for any $\tau > 0$. We apply (35) to the second term in (32) and (34) to the third term. This is done by showing that $h$ is the sum of $|\Sigma|$ independent random variables, taking values in $[0, 1]$ with probability 1, and whose expected values sum to $\mu$. Note that $\mu > 0$ by definition, and $t > 1$ by assumption.
We start by noticing that the components of $\epsilon$ that contribute to $h$ are the ones for which $\sigma_j^2 \neq 0$, i.e., $j \in \Sigma$ (otherwise, $\epsilon_j = 0$ with probability 1). Using $x_j = w_j + \epsilon_j$ [cf. (1a)], we have $h = \sum_{j \in \Sigma} Z_j$, where $Z_j$ is the indicator of the event

$$\big\{\epsilon_j > \max\{0, -w_j\}\big\} \cup \big\{\epsilon_j < \min\{0, -w_j\}\big\}, \qquad (36)$$

that is, $Z_j = 1$ if (36) holds, and $Z_j = 0$ otherwise. By construction, $0 \leq Z_j \leq 1$ for all $j$. Furthermore, because the components of $\epsilon$ are independent, so are the random variables $Z_j$. All we have left to do is to show that the sum of the expected values of $Z_j$, conditioned on $A$ and $D$, equals $\mu$; this involves just simple integration. Let $j \in \Sigma$. Then,

$$E[Z_j \mid A, D] = P\big(\epsilon_j > \max\{0, -w_j\}\big) + P\big(\epsilon_j < \min\{0, -w_j\}\big) \qquad (37)\text{-}(38)$$
$$= \frac{1 + \exp(-\lambda_j |w_j|)}{2} \qquad (39)$$
$$= \frac{1}{2} + \frac{1}{2}\exp\Big(-\frac{\sqrt{2}}{\sigma_j}\,|w_j|\Big). \qquad (40)$$

From (37) to (38), we used the fact that the events in (36) are disjoint for any $w_j$. From (38) to (39), we used the fact that $\lambda_j$ is finite for $j \in \Sigma$ and integrated the Laplacian density in (14). From (39) to (40), we simply replaced $\lambda_j = \sqrt{2}/\sigma_j$. The expected value of $h$ conditioned on $A$ and $D$ is then

$$E[h \mid A, D] = \sum_{j \in \Sigma} \frac{1 + \exp(-\lambda_j |w_j|)}{2} = \mu, \qquad (41)$$

where we used (40).