FDR-Control in Multiscale Change-point Segmentation

Fast multiple change-point segmentation methods, which additionally provide faithful statistical statements on the number, locations and sizes of the segments, have recently received great attention. In this paper, we propose a multiscale segmentation method, FDRSeg, which controls the false discovery rate (FDR) in the sense that the number of false jumps is bounded linearly by the number of true jumps. In this way, it adapts the detection power to the number of true jumps. We prove a non-asymptotic upper bound for its FDR in a Gaussian setting, which makes it possible to calibrate the only parameter of FDRSeg properly. Change-point locations, as well as the signal, are shown to be estimated in a uniform sense at optimal minimax convergence rates up to a log-factor. The latter is with respect to the $L^p$-risk, $p \ge 1$, over classes of step functions with bounded jump sizes and either a bounded, or possibly increasing, number of change-points. FDRSeg can be computed efficiently by an accelerated dynamic program; its computational complexity is shown to be linear in the number of observations when there are many change-points. The performance of the proposed method is examined in comparisons with several state-of-the-art methods on both simulated and real datasets. An R package is available online.


Introduction
To keep the presentation simple, we assume that observations are given by the regression model
$$Y_i = \mu(i/n) + \sigma\varepsilon_i, \qquad i = 0, \ldots, n-1, \tag{1}$$
where $\varepsilon_0, \ldots, \varepsilon_{n-1}$ are independent standard normally distributed random variables, and $\sigma > 0$. The mean-value function $\mu$ is assumed to be right-continuous and piecewise constant, i.e.
$$\mu(x) = \sum_{k=0}^{K} c_k \mathbf{1}_{[\tau_k, \tau_{k+1})}(x). \tag{2}$$
Here $c_k \ne c_{k+1}$ for $k = 0, 1, \ldots, K-1$, and $0 < \tau_1 < \ldots < \tau_K < 1$ denote the change-points of $\mu$, with the convention that $\tau_0 := 0$ and $\tau_{K+1} := 1$. For simplicity we will also use the notation $I_k = [\tau_k, \tau_{k+1})$ for the $k$-th segment; the value of $\mu$ on $I_k$ is denoted by $c_k$. We stress, however, that much of our subsequent methodology and analysis can be extended to other models, e.g. when the observations come from an exponential family or, more generally, when the errors obey certain moment conditions. Estimation of $\mu$ and its change-points in this seemingly simple model (1) (and variations thereof) has a long history in statistical research (see e.g. Csörgő and Horváth, 1997; Siegmund, 2013; Frick et al., 2014, for a survey) and has recently gained renewed interest from two perspectives in particular. Firstly, large-scale applications such as those from finance (see e.g. Inclán and Tiao, 1994; Bai and Perron, 1998; Lavielle and Teyssière, 2007; Spokoiny, 2009), signal processing (see e.g. Harchaoui and Lévy-Leduc, 2008; Blythe et al., 2012; Hotz et al., 2013) or genetic engineering (see e.g. Braun et al., 2000; Olshen et al., 2004; Zhang and Siegmund, 2007; Jeng et al., 2010) call for change-point segmentation methods which are computationally fast, say almost linear in the number of observations. Secondly, besides a mere segmentation of the data into pieces of constancy, some evidence on the accuracy of the number and locations of the change-points that come with this segmentation is demanded.
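Model (1) is straightforward to simulate, which is useful for reproducing the experiments below. The following sketch draws observations from a step function with given segment values and change-point locations; all names are illustrative and not taken from the paper's R package.

```python
import numpy as np

def sample_model(mu_values, tau, n, sigma, seed=0):
    """Draw Y_i = mu(i/n) + sigma * eps_i from model (1), where mu is the
    step function taking value mu_values[k] on [tau_k, tau_{k+1})."""
    rng = np.random.default_rng(seed)
    x = np.arange(n) / n
    # index of the segment containing each sampling point
    # (tau lists the interior change-points, excluding 0 and 1)
    k = np.searchsorted(tau, x, side="right")
    mu = np.asarray(mu_values)[k]
    return mu + sigma * rng.standard_normal(n)

y = sample_model([0.0, 2.0, -1.0], [0.3, 0.6], n=600, sigma=1.0)
```

Here the signal has $K = 2$ change-points at $0.3$ and $0.6$; segment means of `y` should be close to $0$, $2$ and $-1$.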
Many such methods are based on minimizing a penalized cost functional over the number of change-points $K$ and the change-point locations $\tau_k$. For a cost function $C$, which serves as a goodness-of-fit measure for a constant function on an interval, and a penalty $f(K)$ against over-fitting, these approaches search for a solution of the global optimization problem
$$\min_{K;\,\tau_1 < \cdots < \tau_K}\ \sum_{k=1}^{K+1} C(Y_{n\tau_{k-1}}, \ldots, Y_{n\tau_k - 1}) + \gamma f(K). \tag{3}$$
Fast and exact algorithms for this kind of method are often based on dynamic programming, such as the optimal partitioning method (Jackson et al., 2005) and the Potts estimate (Boysen et al., 2009; Storath et al., 2014), which advocate the sparsest subset selection penalty $f(K) = \ell_0(\mu) = K$.
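To make (3) concrete, here is a minimal sketch of the optimal-partitioning recursion for the least-squares cost $C = \mathrm{RSS}$ and the $\ell_0$ penalty $f(K) = K$. It is a textbook $O(n^2)$ implementation for illustration, not the pruned variant used by PELT nor the authors' code.

```python
import numpy as np

def optimal_partitioning(y, gamma):
    """Exact minimizer of sum_k RSS(segment_k) + gamma * (#segments)
    via the optimal-partitioning recursion (Jackson et al., 2005)."""
    n = len(y)
    s1 = np.concatenate([[0.0], np.cumsum(y)])
    s2 = np.concatenate([[0.0], np.cumsum(y ** 2)])

    def rss(j, i):  # residual sum of squares of y[j:i] under a constant fit
        return s2[i] - s2[j] - (s1[i] - s1[j]) ** 2 / (i - j)

    F = np.zeros(n + 1)          # F[i]: optimal cost of y[:i]
    last = np.zeros(n + 1, int)  # back-pointer to the last change-point
    for i in range(1, n + 1):
        cand = [F[j] + rss(j, i) + gamma for j in range(i)]
        j = int(np.argmin(cand))
        F[i], last[i] = cand[j], j
    cps, i = [], n               # backtrack the change-point positions
    while last[i] > 0:
        i = last[i]
        cps.append(i)
    return sorted(cps)
```

Charging $\gamma$ to every segment (rather than every jump) shifts the objective by the constant $\gamma$, so the minimizing segmentation is unchanged.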
For more general $f$, see e.g. the segment neighborhood method (Auger and Lawrence, 1989) or (Friedrich et al., 2008). More recently, Killick et al. (2012) introduced a pruned dynamic program (PELT) with expected linear complexity, mainly for $f(K) = K$. From a computational point of view, approaches of type (3) therefore seem beneficial. Nevertheless, the choice of the balancing parameter $\gamma := \gamma_n(Y)$ in (3) is subtle. Birgé and Massart (2006) offer examples and a discussion of this and other penalty choices, and Boysen et al. (2009) provide optimal choices of $\gamma_n$ as $n \to \infty$. Zhang and Siegmund (2007) proposed a penalty depending on $K$ and additionally on the distances between consecutive change-points. However, given the data at hand, drawing significance conclusions on the number, location and size of the jumps of the change-point function is not an easy task for the above-mentioned methods, although in many cases a good asymptotic understanding is available nowadays. A similar comment applies to other global segmentation methods which rely on an $\ell_1$ approximation of the $\ell_0$ penalty in (4), including lasso-type techniques, possibly combined with post-filtering to further enhance sparseness, see e.g. (Tibshirani et al., 2005; Friedman et al., 2007; Harchaoui and Lévy-Leduc, 2010). To overcome the difficulty of choosing $\gamma$ properly, and to draw conclusions on the obtained segmentation with statistical evidence, Bayesian methods offer an attractive alternative as well, see (Barry and Hartigan, 1993; Green, 1995; Rigaill et al., 2012) and the references therein. In contrast to solving the global optimization problem in (3), another prominent class of methods is based on the idea of iteratively applying a local segmentation method to detect a single change-point. If such a change-point is detected on a segment, the segment is split into two parts and the same routine is applied to both new segments. The method stops if no further change-points are found.
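The iterative splitting scheme just described can be sketched in a few lines. The CUSUM statistic and the fixed threshold below are a generic textbook choice for the local single-change-point test, not the exact rules used by BS, CBS or WBS.

```python
import numpy as np

def binary_segmentation(y, thresh):
    """Greedy binary segmentation: split at the maximal CUSUM statistic,
    recurse on both halves, stop when the statistic stays below thresh."""
    def cusum(z):
        n = len(z)
        k = np.arange(1, n)
        left = np.cumsum(z)[:-1] / k
        right = (z.sum() - np.cumsum(z)[:-1]) / (n - k)
        return np.sqrt(k * (n - k) / n) * np.abs(left - right)

    def recurse(lo, hi, out):
        if hi - lo < 2:
            return
        t = cusum(y[lo:hi])
        k = int(np.argmax(t)) + 1       # candidate split within y[lo:hi]
        if t[k - 1] > thresh:
            out.append(lo + k)
            recurse(lo, lo + k, out)
            recurse(lo + k, hi, out)

    cps = []
    recurse(0, len(y), cps)
    return sorted(cps)
```

Each recursion level tests only for a single change-point, which is why the error control of such methods is inherently local.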
This approach, referred to as binary segmentation (BS), is certainly among the most popular for change-point segmentation, in particular in the analysis of copy number variation data and related biostatistical problems. It was already suggested in Scott and Knott (1974), and more recently related methods have been proposed, such as circular binary segmentation (CBS) (Olshen et al., 2004; Venkatraman and Olshen, 2007) and wild binary segmentation (WBS) (Fryzlewicz, 2014). For these approaches, the parameter to be specified is the probability of including a false change-point in one iteration. Local error control can therefore be provided, but an overall control of the error of including or excluding wrong segments appears to be elusive for these methods as well. Frick et al. (2014) suggest a hybrid method, the simultaneous multiscale change-point estimator (SMUCE) (see also Boysen et al., 2009; Davies et al., 2012, in the context of variance estimation), which tries to address both tasks by minimizing the number of change-points under a local multiscale side-constraint. The side-constraint is based on a simultaneous multiple testing procedure on all scales (lengths of subsequent observations) which employs a scale-calibrating penalty, borrowed from (Dümbgen and Spokoiny, 2001). It can be shown that for the resulting segmentation $\hat\mu$ the number of change-points is not overestimated at a pre-defined probability $1 - \alpha_S$ (i.e. the family-wise error rate, FWER, is controlled). This provides a direct statistical interpretation. In fact, the error of including $j$ false positives incurred by SMUCE decays exponentially in $j$ (see Sieling, 2013; Frick et al., 2014), which in particular controls the overestimation of the number of jumps ($j = 1$ in (5)). Moreover, it can be shown that the method is able to detect the true number of change-points over a large range of scales with minimax detection power (see Theorem 5 in Frick et al., 2014).
However, according to (6), in particular in situations with a low signal-to-noise ratio (SNR) or with many change-points compared to the number of observations, this error control necessarily leads to a conservative estimate $\hat\mu$ of $\mu$ in (2), i.e. one with fewer change-points than the true number $K$. Therefore, in this paper we offer a strategy to overcome this drawback, which might be beneficial for other related methods as well. It is based on control of the false discovery rate (FDR) (Benjamini and Hochberg, 1995) instead of the FWER. Despite the huge literature on change-point detection, only a small number of papers address the FDR issue in this context (see Tibshirani and Wang, 2008; Siegmund et al., 2011; Hao et al., 2013). These are multiple-stage procedures and they only control the FDR in certain steps, not for the whole approach. In this work, we present FDR-SMUCE, similar in spirit to SMUCE, which however controls the FDR of the whole segmentation rather than the FWER. The significance statement given by the method is quite intuitive and also holds for a finite number of observations. The contribution of this work is thus twofold: First, the new method overcomes the conservative nature of SMUCE while maintaining a solid statistical interpretation. In doing so, we provide a general framework for combining FDR control with global segmentation methods. We are not aware of any other change-point segmentation method which shares this property. Second, all results are non-asymptotic and hold uniformly over all piecewise constant functions $\mu$ in model (2). Before going into details, we illustrate this with the example in Figure 1. We employed the blocks signal (Donoho and Johnstone, 1994) with Gaussian observations of standard deviation $\sigma = 10$ (with average SNR $\int |\mu(x)|\,dx / \sigma \approx 0.65$). Quite naturally, we declare such discoveries (estimated change-points) true if they are "close" (to be specified later) to true change-points.
In this example FDR-SMUCE (FDR $\le \beta = 0.1$) detects all the change-points correctly, while SMUCE ($\alpha_S = 0.1$) finds only 6 out of 11, due to its requirement to control the FWER in (6). On this data, the smallest $\beta$ for which FDR-SMUCE overestimates the number of change-points is 0.5. With this choice of $\beta$, FDR-SMUCE finds one additional false change-point (at 0.17, marked by a vertical line and an associated interval defined in (7), in the fourth panel) besides all the true ones. The ratio of false discoveries to the number of discoveries plus one (the number of segments) is hence $1/(12 + 1) \approx 0.08$. Later we will show that FDR-SMUCE is indeed able to control this ratio in expectation at the predefined level $\beta$. In the other direction, the largest $\beta$ for which FDR-SMUCE underestimates the number of change-points is 0.07, shown in the bottom panel. That is, FDR-SMUCE estimates the correct number of change-points for every $\beta \in (0.07, 0.5)$.
For our purpose it is helpful to interpret the "detection part" of the multiple change-point regression problem as a multiple testing problem. In the literature, methods of this flavor often consider multiscale local likelihood tests. Whereas local tests for the presence of a change-point on small systems of subsets (e.g. the dyadic ones) of the sampling points $\{0, 1/n, \ldots, (n-1)/n\}$ can be computed efficiently, they may have low detection power, and highly redundant systems, such as the system of all intervals, have been suggested instead (Siegmund and Yakir, 2000; Dümbgen and Spokoiny, 2001; Davies et al., 2012; Frick et al., 2014); see, however, (Walther, 2010; Rivera and Walther, 2013) for sparser but still asymptotically efficient systems. It was pointed out by Siegmund et al. (2011) that the classical FDR for redundant systems might be misleading, because such local tests are highly correlated and consequently tests on nearby intervals are likely to reject/accept the null hypothesis together; see also (Benjamini and Yekutieli, 2001; Guo and Sarkar, 2013) for a general discussion of this issue. Siegmund et al. (2011) therefore suggest testing for constancy on subintervals, grouping nearby false (or true) rejections, and counting them as a single discovery, which allows the FDR to be controlled group-wise. In our approach, we circumvent this difficulty but are still able to work with redundant systems, because we instead perform a multiple test for the change-points directly. It remains to define a true discovery. This is done by identifying a rejection as a true discovery if it is "close" to a true change-point. To be specific, let $\{\hat\tau_1, \ldots, \hat\tau_{\hat K}\}$ be the rejections (i.e. estimated change-points), and $\hat K$ the estimated number of change-points. For each $i \in \{1, \ldots, \hat K\}$, we classify $\hat\tau_i$ as a true discovery if there is a true change-point lying in the interval in (7), where $\hat\tau_0 := 0$ and $\hat\tau_{\hat K + 1} := 1$; otherwise, it is a false discovery, see again panel 4 in Figure 1.
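This classification can be sketched as follows. Since the exact neighbourhood (7) is defined in a display not reproduced here, the midpoint intervals between adjacent estimates used below are an assumption for illustration only.

```python
def count_false_discoveries(est, true, lo=0.0, hi=1.0):
    """Count estimated change-points with no true change-point in their
    neighbourhood.  The neighbourhood of est[i] is taken here to be the
    interval between the midpoints to its adjacent estimates -- an
    assumption standing in for the paper's interval (7)."""
    aug = [lo] + list(est) + [hi]
    fd = 0
    for i in range(1, len(aug) - 1):
        a = (aug[i - 1] + aug[i]) / 2
        b = (aug[i] + aug[i + 1]) / 2
        if not any(a <= t < b for t in true):
            fd += 1
    return fd
```

For instance, with estimates at 0.3 and 0.9 but a single true change-point at 0.31, the second estimate counts as a false discovery.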
Similar to Benjamini and Hochberg (1995), we then define the false discovery rate (FDR) by
$$\mathrm{FDR} = \mathbb{E}\left[\frac{\mathrm{FD}}{\hat K + 1}\right], \tag{8}$$
where FD is the number of false discoveries in the above sense. The rest of the paper is organized as follows. In Section 2, we introduce the new segmentation method (FDR-SMUCE) and show its FDR control. In Section 3 we develop a pruned dynamic program for its computation. The accuracy and efficiency of FDR-SMUCE are examined in Section 4 on both simulated and real datasets. The paper ends with a conclusion in Section 5. An implementation of FDR-SMUCE is provided in the R package "FDRS", available from http://www.stochastik.math.uni-goettingen.de/fdrs.

Method and Main Result
Now we give a formal definition of FDR-SMUCE. To simplify, we assume that the noise level $\sigma$ is known. For methods to estimate $\sigma^2$, see e.g. (Rice, 1984), (Dette et al., 1998) or (Davies and Kovac, 2001), among many others. Assume that $Y = (Y_0, \ldots, Y_{n-1})$ is given by model (1). For an interval $I \subset [0, 1)$ we consider the multiscale statistic in (9) with scale calibration, where $c$ is a real number, $\mathrm{pen}(x) = 2 \log(e/x)$ is the penalty term for the scale, and $|I|$ denotes, with slight abuse of notation, the number of observations in $I$ (the scale). For $\alpha \in (0, 1)$, we introduce the quantile $q_\alpha(m)$ in (10), where $\varepsilon = (\varepsilon_0, \ldots, \varepsilon_{n-1})$ is standard normally distributed, $\bar\varepsilon_I = \sum_{i/n \in I} \varepsilon_i / |I|$, and $I$ is a fixed interval with $|I| = m$. It can easily be shown that $q_\alpha(m)$ does not depend on the choice of $I$ as long as $|I| = m$, which justifies the definition (10).
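The quantiles $q_\alpha(m)$ in (10) lend themselves to Monte-Carlo estimation. In the sketch below, the local statistic (penalized normalized partial sums of the centered noise, with the penalty $\mathrm{pen}(x) = 2\log(e/x)$ evaluated at the scale ratio) is an assumption reconstructed from the surrounding text; the paper's statistic (9) may differ in detail.

```python
import numpy as np

def multiscale_stat(eps):
    """Guessed form of T_I(eps, mean(eps)) on an interval of m samples:
    max over subintervals J of |sum_J(eps - mean)| / sqrt(|J|) minus
    sqrt(pen(|J| / m)) with pen(x) = 2 log(e / x)."""
    m = len(eps)
    c = eps - eps.mean()
    s = np.concatenate([[0.0], np.cumsum(c)])
    best = -np.inf
    for i in range(m):
        for j in range(i + 1, m + 1):
            l = j - i
            t = abs(s[j] - s[i]) / np.sqrt(l) - np.sqrt(2 * np.log(np.e * m / l))
            best = max(best, t)
    return best

def q_alpha(m, alpha, reps=2000, seed=1):
    """Monte-Carlo estimate of the (1 - alpha)-quantile q_alpha(m)."""
    rng = np.random.default_rng(seed)
    sims = [multiscale_stat(rng.standard_normal(m)) for _ in range(reps)]
    return float(np.quantile(sims, 1 - alpha))
```

Since the quantile is computed per scale $m$, smaller $\alpha$ yields a larger threshold, consistent with the scale-dependent calibration described below.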
Remark 2.1. As a direct consequence of (Dümbgen and Spokoiny, 2001) (see also Dümbgen and Walther, 2008; Frick et al., 2014), the limiting distribution of $T_I(\varepsilon, \bar\varepsilon_I)$ as $|I| \to \infty$ is almost surely finite and is continuous (Dümbgen et al., 2006). The values $q_\alpha(m)$ are therefore uniformly bounded in $m$.
For our purpose we introduce in (11) the set $\mathcal{C}_K$ of step functions restricted to the multiscale side-constraint induced by (9) and (10). The estimated number of change-points $\hat K$ is given in (12), and the FDR-SMUCE estimate $\hat\mu$ is then given by (13), that is, the constrained maximum likelihood estimator within $\mathcal{C}_{\hat K}$. The main theorem of this paper is the FDR control, in the sense of (8), of the estimator $\hat\mu$ in (13). More precisely, the FDR can be controlled explicitly by choosing the local level $\alpha$ in (10) properly.
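For fixed change-point locations, the unconstrained Gaussian maximum-likelihood fit is simply the vector of segment means; a minimal sketch is given below. Note that the actual estimator (13) additionally restricts the admissible segment values to the multiscale side-constraint, which this toy version ignores.

```python
import numpy as np

def fit_step(y, cps):
    """Unconstrained ML step-function fit for given change-point
    positions: under Gaussian noise, this is the segment means."""
    fit = np.empty(len(y), dtype=float)
    bounds = [0] + list(cps) + [len(y)]
    for a, b in zip(bounds[:-1], bounds[1:]):
        fit[a:b] = y[a:b].mean()
    return fit
```

In FDR-SMUCE, this fit is the starting point that the constrained maximization within $\mathcal{C}_{\hat K}$ then adjusts.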
Remark 2.3. To calibrate the method for a given $\beta$, we simply rewrite (14) in terms of $\alpha$, which gives roughly $\alpha = \beta/2$ for small $\beta$, see Figure 2.

Remark 2.4 (Comparison of SMUCE and FDR-SMUCE). Let us stress some notable differences to SMUCE (Frick et al., 2014), which is based on restricting possible estimators to the analogous side-constraint set $\mathcal{C}_K^0$.

The statistic defining $\mathcal{C}_K^0$ is as in (9), but with penalty $\mathrm{pen}((j - i + 1)/n)$ instead. Firstly, this penalty term underlying SMUCE on the interval $[i/n, j/n]$ involves only the ratio between the number of observations in $[i/n, j/n]$ and the total number of observations, while that of FDR-SMUCE relies on the ratio between the number of observations in $[i/n, j/n]$ and the length of the corresponding segment $I$. This modification has a flavor similar to Zhang and Siegmund (2007)'s refined Bayes information criterion type of penalty. Secondly, the parameter $\alpha_S$ of SMUCE ensures that the true signal lies in the side-constraint $\mathcal{C}_K^0$ with probability at least $1 - \alpha_S$. In contrast, FDR-SMUCE considers the constant parts of the true signal individually, guaranteeing that the mean value of each segment $I_i$ lies in its associated side-constraint in $\mathcal{C}_K$ with probability at least $1 - \alpha$. This makes it much less conservative, and its error controllable in terms of the FDR (see Theorem 2.2); it is a key idea underlying FDR-SMUCE. For an illustration of this effect see Figure 3. Thirdly, the quantile in SMUCE is universal and in practice estimated by Monte-Carlo simulations using the worst-case scenario, i.e. $\mu = 0$ on all intervals, since the exact system of intervals on which the true signal is constant is unknown. In general, this leads to a larger quantile than required, making the method more conservative than necessary. In contrast, for FDR-SMUCE the quantiles $q_\alpha$ in (10) are scale dependent, relying on all intervals up to a certain length, which renders the resulting method less conservative.
In situations with many change-points or low SNR, to overcome the conservative nature of SMUCE, it has been suggested to choose the significance level $\alpha_S$ in (6), the overestimation error, close to one in order to produce an estimate with good screening properties (Frick et al., 2014). It follows from the arguments above that the parameter $\alpha$ of FDR-SMUCE relates to $\alpha_S$ roughly by
$$\alpha_S = 1 - (1 - \alpha)^{K+1}, \tag{15}$$
because the probability that the true signal is covered by $\mathcal{C}_K$ is $(1 - \alpha)^{K+1}$, where $K$ is the true number of change-points. This is confirmed by simulations. For example, consider the recovery of a teeth signal (adopted from Fryzlewicz, 2014) with $K = 50$ from 900 observations contaminated by standard Gaussian noise, see Figure 4. In Figure 5, histograms of the number of change-points estimated by SMUCE ($\alpha_S = 0.1$) and FDR-SMUCE ($\alpha = 0.1$) are shown as white bars from 1,000 repetitions. It can be seen that SMUCE ($\alpha_S = 0.1$) seriously underestimates the number of change-points, while FDR-SMUCE estimates the right number of change-points with high probability. If we adjust $\alpha_S$ according to (15), i.e. $\alpha_S = 1 - (1 - 0.1)^{51} \approx 0.995$, this leads to a significant improvement of the detection power of SMUCE, as shown by the corresponding histogram in grey bars (left panel in Figure 5); however, this comes at the expense of provable statistical error control, i.e. controlling the overestimation of the true $K$ for SMUCE becomes increasingly more difficult as $K$ gets larger. On the other hand, FDR-SMUCE adapts to $K$ automatically and works well with small values of $\beta$ in (14). Finally, it becomes apparent from a comparison of the two lower panels in Figure 4 that the local thresholding in (10) and (11) makes an important difference to SMUCE.
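The calibration just described is a one-line computation; the following snippet reproduces the teeth-example value from the text.

```python
def alpha_s_from_alpha(alpha, K):
    """FWER level matching a per-segment level alpha for K change-points
    (K + 1 segments), according to the approximate relation (15)."""
    return 1 - (1 - alpha) ** (K + 1)

# teeth example from the text: alpha = 0.1, K = 50
print(round(alpha_s_from_alpha(0.1, 50), 3))  # prints 0.995
```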
Remark 2.5 (Discussion of the bound in Theorem 2.2). Various simulation studies (not displayed) suggest the even sharper bound FDR $\le \alpha$, improving (14) by a factor of 2. Although we were not able to prove this, we stress that it might be useful for practical purposes when selecting and interpreting $\alpha$. For example, in Figure 6 we display results for the teeth signal (see Figure 4), where the FDR is estimated by the empirical mean over 1,000 repetitions with $n = 600$. It shows that the bound (14) (dashed line) is accurate when $\alpha$ is small and becomes less accurate as $\alpha$ increases.

Implementation
We show that FDR-SMUCE can be computed efficiently by a pruned dynamic programming algorithm. For convenience, let $\mathbb{1}([j/n, i/n))$ denote the indicator that some constant function satisfies the multiscale side-constraint on $[j/n, i/n)$. We first consider the computation of $\hat K$, defined in (12). Let $\hat K[i]$ be the number of change-points of the FDR-SMUCE estimate when applied to $(Y_0, \ldots, Y_{i-1})$, that is, the minimal number of change-points such that $[0, i/n)$ can be partitioned into a disjoint union of intervals on each of which the side-constraint admits a constant function, for $i = 1, \ldots, n$. Then the estimated number of change-points $\hat K$ in (12) is given by $\hat K[n]$. It can be shown that the recursive relation
$$\hat K[i] = \min\bigl\{\hat K[j] + 1 : 0 \le j < i,\ \mathbb{1}([j/n, i/n)) = 1\bigr\} \tag{16}$$
holds for $i = 1, \ldots, n$, with the convention $\hat K[0] := -1$. Eq. (16) is often referred to as the Bellman equation (Bellman, 1957), also known as the optimal substructure property in the computer science community (Cormen et al., 2009). It justifies the use of dynamic programming (Bellman, 1957; Bellman and Dreyfus, 1962) for computing the FDR-SMUCE estimate. In this way, the computation of $\hat K$ is decomposed into the smaller subproblems of determining the $\hat K[i]$. Each subproblem boils down to checking the existence of a constant function which satisfies the multiscale side-constraint on $[j/n, i/n)$, i.e. $\mathbb{1}([j/n, i/n)) = 1$. The $\hat K[i]$ are computed, via the recursive relation (16), as $i$ increases from 1 to $n$.
For each $i$, this involves a search over $\{0, \ldots, i-1\}$, which grows as $i$ approaches $n$. However, some of these searches are actually unnecessary and can be pruned. This can be seen by rewriting the recursive relation in terms of the number of change-points. Let $A_0 := \{0\}$ and $B_0 := \{1, 2, \ldots, n\}$; for $k = 1, 2, \ldots$, the sets $A_k$, $B_k$ and the indices $r_k$ are defined recursively. The reason for introducing $r_k$ is that there is no need to consider larger intervals once the multiscale side-constraint on an interval does not admit a constant signal even with the maximal penalty and the maximal quantile. For each $i$ we then only need to search a subset of $\{0, \ldots, i-1\}$. The value $\max_{0 \le k \le K}(r_{k+1} - \min A_k - |A_k|/2)^2$ in the complexity bound (17) depends on the signal and the noise. If the signal has many change-points and the segments have similar lengths, it is a constant independent of $n$; the higher the noise level, the larger it may be. In such situations the computational complexity is linear, although in the worst case it can be cubic in $n$.
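The unpruned recursion (16) can be sketched as follows. The feasibility check is left abstract, since in FDR-SMUCE it amounts to testing whether some constant value passes the multiscale constraint on $[j/n, i/n)$; a trivial stand-in (exact constancy) is used for illustration.

```python
import numpy as np

def min_jumps(y, feasible):
    """Unpruned Bellman recursion (16): K[i] is the minimal number of
    change-points for y[:i]; feasible(j, i) indicates that a constant fit
    on y[j:i] satisfies the side-constraint (abstracted away here)."""
    n = len(y)
    K = np.full(n + 1, np.inf)
    K[0] = -1.0          # convention: the first segment adds no jump
    for i in range(1, n + 1):
        for j in range(i):
            if np.isfinite(K[j]) and feasible(j, i):
                K[i] = min(K[i], K[j] + 1)
    return int(K[n])

# toy feasibility check: the candidate segment is exactly constant
y = np.array([0.0, 0.0, 5.0, 5.0, 5.0])
flat = lambda j, i: y[j:i].max() - y[j:i].min() < 1e-9
```

With the toy check, `min_jumps(y, flat)` returns 1, the single jump in `y`. Pruning, as described above, restricts the inner loop over `j` to a data-dependent subset.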
Indeed, the searches for $\hat K$ and the maximum likelihood estimate can be carried out simultaneously if we record the likelihood for each point $i$. The complexity is again bounded above by (17), but with a possibly larger constant. The memory complexity of the whole algorithm is linear, i.e. $O(n)$; we omit the technical details. The pruned algorithm is implemented in the statistical software R in the package "FDRS" (http://www.stochastik.math.uni-goettingen.de/fdrs).

Simulations and Applications
4.1. Simulation study. We now investigate the performance of FDR-SMUCE in situations with various SNRs and different numbers of change-points, and compare it with SMUCE (Frick et al., 2014), PELT (Killick et al., 2012), BS (Scott and Knott, 1974), CBS (Olshen et al., 2004; Venkatraman and Olshen, 2007) and WBS (Fryzlewicz, 2014). As mentioned in Section 1, these methods represent powerful state-of-the-art procedures from two different viewpoints: one is exact and fast optimization based on dynamic programming, including PELT and SMUCE; the other is greedy methods based on single change-point detection, including BS, CBS and WBS. Concerning implementation, we use the R packages "PSCBS" for CBS, "wbs" for BS and WBS, "changepoint" for PELT, and an efficient implementation in "FDRS" for SMUCE. All packages except "FDRS" are available on CRAN. For both SMUCE and FDR-SMUCE, we estimate the $\alpha$-quantile thresholds by 5,000 Monte-Carlo simulations. The penalty $2\log(K)$ is chosen for PELT, which is dubbed "SIC1" in the code provided by its authors and works much better than the default choice. If we identify a change-point with two parameters (location and jump size), this is the same as the Schwarz information criterion (SIC). For WBS we use the automatic rule, strengthened SIC, recommended by the author. The default parameter settings provided in the packages were used for BS and CBS. In all simulated scenarios, we assume that the noise level $\sigma$ is known beforehand. For quantitative evaluation, we use the mean integrated squared error (MISE), the mean integrated absolute error (MIAE), the FDR defined in (8) and the V-measure (Rosenberg and Hirschberg, 2007). The V-measure, a segmentation evaluation measure, takes values in $[0, 1]$, with larger values indicating higher accuracy.
It is based on two criteria for clustering usefulness, homogeneity and completeness, which capture a clustering solution's success in including all and only data points from a given class in a given cluster. In particular, a V-measure of 1 indicates a perfect segmentation. All experiments are repeated 1,000 times.

4.1.1. Varying noise level. Let us consider the impact of different noise levels. To this end, we use the mix signal (adopted from Fryzlewicz, 2014, see Figure 8) with additive Gaussian noise, which mixes prominent change-points between short intervals with less prominent change-points between longer intervals. The noise level $\sigma$ varies from 1 to 8, and the number of observations is $n = 560$. For SMUCE and FDR-SMUCE, we choose the same parameter $\alpha_S = \alpha = 0.15$. As shown in Figure 7, FDR-SMUCE outperforms the others at all noise levels in terms of V-measure, MISE, MIAE, and detection power measured by the average number of detected change-points. As indicated by the number of detected change-points, MISE and MIAE, PELT ranks second, followed by WBS, then CBS, SMUCE and lastly BS. The same ordering is also seen in the V-measure up to $\sigma = 5$, but SMUCE deteriorates more slowly as the noise level $\sigma$ increases and achieves a better V-measure than CBS for $\sigma \ge 6$ and than WBS at $\sigma = 8$. It is worth noting that the empirical FDR of FDR-SMUCE is around 0.1, far below the theoretical bound of $\approx 0.35$ (indicated by the dashed horizontal line in the lower-left panel). CBS has the second largest empirical FDR, while those of PELT, SMUCE, BS and WBS are almost zero. Once the quantiles for SMUCE and FDR-SMUCE are simulated, they can be stored and reused for later computations, and are therefore excluded from the recorded computation time. The computation time of FDR-SMUCE is similar to that of the fastest methods, namely PELT, BS and SMUCE, at $\sigma = 1$ and increases with the noise level $\sigma$. FDR-SMUCE is faster than WBS and CBS in all scenarios.
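For reference, the V-measure used in these comparisons can be computed directly from entropies. The snippet below is a transcription of Rosenberg and Hirschberg's definition (labels are per-observation segment indices), not the implementation used in the simulations.

```python
import math
from collections import Counter

def v_measure(true_labels, pred_labels):
    """V-measure: harmonic mean of homogeneity and completeness,
    both defined via conditional entropies of the label assignments."""
    n = len(true_labels)

    def entropy(labels):
        return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

    def cond_entropy(a, b):  # H(a | b)
        pb = Counter(b)
        joint = Counter(zip(a, b))
        return -sum(c / n * math.log(c / pb[vb]) for (_, vb), c in joint.items())

    ha, hb = entropy(true_labels), entropy(pred_labels)
    h = 1.0 if ha == 0 else 1.0 - cond_entropy(true_labels, pred_labels) / ha
    c = 1.0 if hb == 0 else 1.0 - cond_entropy(pred_labels, true_labels) / hb
    return 0.0 if h + c == 0 else 2 * h * c / (h + c)
```

A perfect segmentation gives 1, while collapsing everything into one segment destroys homogeneity and drives the score to 0.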
To have a closer look, we also show histograms of the locations of detected change-points for $\sigma = 8$ in Figure 8. In this situation, FDR-SMUCE has uniformly the largest detection power over all change-points.

4.1.2. Varying frequency of change-points. In order to evaluate the detection power as $K$ increases, we employed the teeth signal (see Figure 4) with $n = 3{,}000$ and $K = n^\theta$, $\theta = 0.1, 0.2, \ldots, 0.9$, since its SNR remains the same for different numbers of change-points. The same parameter $\alpha_S = \alpha = 0.1$ is chosen for SMUCE and FDR-SMUCE. The results are summarized in Figure 9. FDR-SMUCE and PELT perform comparably well in all situations in terms of the number of detected change-points, V-measure, MISE and MIAE. As shown by the V-measure, CBS and WBS fail when $\theta \ge 0.7$, BS fails when $\theta \ge 0.8$, and SMUCE deteriorates at $\theta = 0.9$. A similar trend can also be seen in the number of estimated change-points, MISE, and MIAE.

4.2. Array CGH data. Changes in DNA copy number are related to many diseases, in particular, various cancers. Array comparative genomic hybridization (CGH) provides the means to quantitatively measure such changes in terms of DNA copy number (Pinkel et al., 1998). The statistical task is to determine accurately the regions of changed copy number. The model (1) has been well justified and studied for this problem (Olshen et al., 2004; Zhang and Siegmund, 2007; Tibshirani and Wang, 2008; Jeng et al., 2010). We compared FDR-SMUCE with SMUCE and with CBS, which is designed for the analysis of array CGH data, on the Coriell data set from (Snijders et al., 2001). An outlier smoothing procedure introduced in (Olshen et al., 2004) was applied before segmentation. The CBS estimate was computed using the default parameters provided in the package "PSCBS". The copy number variations estimated by each method are plotted with the data (points) for cell line GM01524 in Figure 10. CBS finds the largest number of change-points, 17 in total. FDR-SMUCE with $\beta = 0.01$ (i.e.
$\alpha \approx 0.005$) finds 2 more change-points in chromosome 11 than SMUCE with $\alpha_S = 0.1$, in accordance with CBS. With a larger $\beta = 0.05$, FDR-SMUCE detects 2 additional bumps: one at chromosomes 1 and 2 and the other at chromosome 18, which are also found by CBS. If we continue to increase $\beta$ up to 0.37, FDR-SMUCE detects the whole set of change-points found by CBS, together with some additional change-points. Recall from Theorem 2.2 that the parameter $\beta$ controls the FDR. Hence, by applying FDR-SMUCE over a range of $\beta$, we can provide a hierarchy of significance statements for the change-points found by CBS (or any other method). This suggests that FDR-SMUCE can also be used as a tool to interpret the results of other methods which do not automatically come with statistical guarantees for their segmentation.
4.3. Ion channel idealization. As prominent components of the nervous system, ion channels play major roles in cellular transport (Hille, 2001) and are helpful in diagnosing many human diseases such as epilepsy, cardiac arrhythmias, etc. (Kass et al., 2005). The aim of the data analysis is to obtain information about channel characteristics and the effect of external stimuli by monitoring channel behavior with respect to conductance and/or kinetics (Chung et al., 2007). The measuring process involves an analog low-pass filter prior to digitization. As analyzed by Hotz et al.
(2013), a realistic model for the observations is given in (18), where $\Delta$ is the sampling rate, and the convolution kernel $\rho$ of the low-pass filter has compact support in an interval of length $L$ and satisfies $\int \rho(t)\,dt = 1$. Being i.i.d. Gaussian noise passed through the low-pass filter $\rho$, the $\tilde\varepsilon_i$ are still Gaussian with mean zero, but correlated. We observe that $\tilde\varepsilon_i$ is independent of $\tilde\varepsilon_j$ if $|i - j| > L\Delta$, and that $\rho * \mu$ equals $\mu$ on an interval of length $T - L$ if $\mu$ is constant for some time $T \ge L$. Thus, if we undersample the observations at rate $L\Delta$, the reduced data satisfy our model (1). We compare the SMUCE ($\alpha_S = 0.05$) and FDR-SMUCE ($\beta = 0.05$) estimates on the subsampled data (19). As a benchmark, the jump segmentation by multiresolution filter (J-SMURF) estimate, introduced by Hotz et al. (2013), was computed on the full data (18) with $\alpha = 0.05$; it takes the dependence structure of the noise into account. The implementation of J-SMURF is provided in the R package "stepR", available from http://www.stochastik.math.uni-goettingen.de/smuce. Figure 11 shows a characteristic conductance trace of gramicidin A with a typical SNR, $L = 30$ and $\Delta = 0.1$ ms. Compared with SMUCE, the additional change-points (indicated by vertical lines) detected by FDR-SMUCE are clearly correct, which is confirmed by J-SMURF. From the theoretical bound on the FDR (see Theorem 2.2), it follows that, of the 10 change-points detected by FDR-SMUCE, at most 0.55 are false on average. This shows that the ideas underlying FDR-SMUCE are very helpful for detecting change-points on various scales, as is to be expected for the investigated gramicidin channel (for an explanation see Hotz et al., 2013). It is therefore promising to modify FDR-SMUCE such that local dependencies are fully taken into account, which is postponed to future research.
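The undersampling step described above, which reduces (18) to model (1), is simple to implement; in this sketch the subsampling step is treated as a user-supplied parameter that should exceed the filter-kernel support measured in samples.

```python
import numpy as np

def undersample(y, step):
    """Keep every step-th observation.  For step larger than the kernel
    support (in samples), the retained noise terms are independent, so
    the reduced data follow model (1) on constant stretches."""
    return np.asarray(y)[::step]

trace = np.arange(3000)           # placeholder for a recorded trace
reduced = undersample(trace, 30)  # e.g. a step of 30 samples
```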

Conclusion
In this work we proposed a multiple change-point estimator, FDR-SMUCE, similar in spirit to SMUCE but with notable differences. The key is the relaxation of the family-wise error rate to the false discovery rate. The new FDR multiscale side-constraint succeeds in overcoming the conservativeness of SMUCE. In experiments on both simulated and real data, FDR-SMUCE shows a significant increase in detection power while maintaining controlled accuracy. A theoretical bound is provided for its FDR, which gives a meaningful interpretation to the only user-specified parameter $\alpha$.
Our method is not confined to i.i.d. Gaussian observations, although we restricted our presentation to this case in order to highlight the main ideas more concisely. Obviously, it can be extended to more general additive errors, because the proof of Lemma A.1 relies on Gaussianity only for the independence of the residuals and the mean. For other models, e.g. exponential family regression, we believe that one can argue along similar lines as in the proof of Theorem 2.2, but the results will only hold asymptotically. This, however, is beyond the scope of the paper and postponed to further research. Also, as we applied the CBS outlier smoothing procedure to the array CGH data, it might be of interest to have more robust versions of FDR-SMUCE. To this end, e.g. a local median, instead of a local mean, might provide useful results. Alternatively, one may transform the problem into a Bernoulli regression problem (see Dümbgen and Kovac, 2009; Frick et al., 2014), which might be interesting for future research.

Appendix

Now we consider the three intervals $I_1 = [0, i^*/n)$, $I_2 = [i^*/n, j^*/n]$ and $I_3 = (j^*/n, 1)$ and bound the probability of splitting them further into smaller intervals. It will be shown that
$$\mathbb{P}\left\{T_{I_k}(Y, \bar Y_{I_k}) > q_\alpha(|I_k|) \,\middle|\, T_{[0,1)}(Y, \bar Y) > q_\alpha(n)\right\} \le \alpha \qquad \text{for } k = 1, 2, 3.$$
Given $I_2 = [i/n, j/n]$, the random variable $T_{I_k}(Y, \bar Y_{I_k})$ depends only on $\{Y_i - \bar Y_{I_k} : i/n \in I_k\}$, which is independent of $\bar Y$ and $\bar Y_{I_2}$. It follows from (20) that $t_{[0,1)}(I_2)$ depends only on $\bar Y$ and $\bar Y_{I_2}$. Thus $T_{I_k}(Y, \bar Y_{I_k})$ is independent of $t_{[0,1)}(I_2)$ conditioned on $I_2$.
Proof of Theorem 2.2. For random variables $X$, $Y$ and $Z = X + Y$, we find by Jensen's inequality a bound on $\mathbb{E}[X/Z]$. We set $X = \mathrm{FD}$ and $Y = \mathrm{TD} + 1$, so that $Z = \hat K + 1$. Together with Lemma A.2 this yields the assertion.