1 Introduction

Changepoint detection (CPD) is the process of detecting significant changes in the data generating process of a time series. These changes are of two types, abrupt and gradual, though many authors define changepoints as only the abrupt ones. Abrupt changepoints correspond to rapid changes in the underlying state of the system, so their primary applications include quality control, system reliability engineering, and fault detection and monitoring in industrial plants (Basseville and Nikiforov 1993). Changepoint analysis is also used for time series modeling and signal processing; many studies rely on CPD techniques to segment the study period and then apply a different time series methodology to each homogeneous segment (Lemire 2007). Changepoint detection has proved its practicability in various domains such as finance, sound processing, climatology, and network traffic data (see Chen and Gupta 2012 for further references).

In contrast, gradual changepoints do not correspond to rapid discontinuous changes but to smooth continuous ones, in which the underlying parameters vary smoothly between different regimes. In other words, the underlying distribution or property of the time series remains approximately constant over some time, then slowly starts to change into another regime, followed by another period with constant properties. We refer to the midpoint of the period over which the smooth change takes place as a gradual changepoint. In domains such as linguistics (Sharma and Sankaran 2011), paleoclimatology (Trauth et al. 2018), paleontology (Rose and Bown 1986), and paleobiology (Hunt 2008), studies have shown that changes in different phenomena are often gradual in nature rather than abrupt. For example, the evolution of a language over a few hundred years contains only gradual changes, and it is thus difficult to point out any particular period in which the archaic language evolved into its current state. Such a changepoint is nonetheless useful: without detecting it, modeling the language as a whole is extremely difficult, while segmenting the study period into homogeneous regions allows the language in each period to be modeled more clearly. More motivating examples of gradual changepoints can be found in Vogt and Dette (2015).

It has been established that changepoint detection algorithms developed to tackle abrupt changes are inefficient in estimation, due to relatively larger variance, in the presence of a gradual changepoint (Hušková 1998). One particular problem with these algorithms is that they assume a definitive change in the distribution at a fixed point of time, which is not true for a gradual changepoint. In contrast, modeling these points of change in distribution using fuzzy set theory allows for a gradual changepoint that is difficult to pinpoint. In other words, a gradual changepoint is characterized by membership values in the continuum [0, 1], rather than in {0, 1} as in the abrupt case. Also, incorporating rough sets into the changepoint detection algorithm allows us to model the ambiguity in our knowledge of the distribution (as opposed to the probabilistic approach). In this paper, we propose a technique that incorporates these fuzzy-rough set theoretic modifications into most of the existing abrupt changepoint detection algorithms. As will be shown, this modification allows even simple abrupt CPD algorithms to detect gradual changepoints with much higher accuracy.

2 Existing literature and our contribution

Analysis of abrupt changepoint detection for quality control and industrial applications dates back to the 1950s (Page 1955). Since then, several parametric and nonparametric methods for change detection have been proposed (Basseville and Nikiforov 1993; Brodsky and Darkhovsky 2013). Following Hinkley's idea of likelihood-based single changepoint detection (Hinkley 1970), Scott and Knott (1974) extended it to the detection of multiple changepoints based on Binary Segmentation, which was popularized as a general method to detect abrupt changepoints. Some recent and popular algorithms for CPD include Pruned Exact Linear Time (PELT) (Killick et al. 2012), Bayesian changepoint detection (Turner et al. 2009; Adams and MacKay 2007), kernel changepoint detection (Harchaoui et al. 2009), and changepoint detection using nonparametric statistical measures (Haynes et al. 2017).

In comparison to the vast literature on the detection of abrupt changepoints (van den Burg and Williams 2020), the investigation of gradual changepoints is scarce. Although some recent studies (Liang and Xu 2021) have focused on gradual changes in mean, they retain abrupt changes in variance as the defining property of a changepoint. Some early investigations set entirely in the gradual changepoint framework measured the performance of control charts under gradual changes (Gan 1992). A few existing parametric methods (Hušková 1998) consider specific forms of smooth changes in mean and perform likelihood-based tests to detect gradual changepoints. On the other hand, nonparametric methods based on p-values (Mallik et al. 2011) and modifications of the CUSUM principle (Vogt and Dette 2015) rely on suitable adaptations of existing abrupt changepoint detection algorithms to the gradual change setup. Another approach to gradual changepoint detection is the application of abrupt CPD techniques in a multiscale setup (Wu and Zhou 2020), of which SMUCE (Frick et al. 2014) and HSMUCE (Pein et al. 2017) are two popular methods. However, there is no extensive empirical investigation of their performance in the presence of a gradual changepoint.

Only a few investigations directed towards the detection of gradual changepoints are based on fuzzy techniques. A major portion of them (Chang et al. 2015) uses the fuzzy c-means clustering method or its variants to segment the period of study into homogeneous segments. Other approaches include fuzzy regression algorithms (Chang et al. 2015) and neural networks (Angelo et al. 2011). No approach considers the discernibility issues that are prevalent in examples with gradual changepoints: observations at two close time points naturally come from similar distributions, which turns each segment into a rough set instead of a crisp one. The present paper aims to provide a general methodology based on fuzzy rough sets to increase the efficiency of any statistical measure that can be used to detect changepoints.

The novelty of this paper is three-fold. First, we mathematically model the uncertainty associated with gradual changepoints as a fuzzy set, and the indiscernibility between very close time points using rough set theory. This enables us to devise an algorithm that transforms most abrupt CPD algorithms into gradual CPD algorithms, one that is relatively robust to both a low signal-to-noise ratio in the data and a high degree of fuzziness in the actual changepoint. Simulation studies show that our estimates remain fairly accurate over an extensive range of hyperparameter values, further demonstrating the robustness of the proposed method. Second, we connect the image segmentation problem from the field of pattern recognition with the changepoint detection problem from the statistical domain. This connection was discussed by Chatterjee and Kar (2017), who formally expressed image segmentation as a special case of the changepoint detection problem. Our work can be interpreted as going in the reverse direction, generalizing the image segmentation methods proposed by Sen and Pal (2009) to the problem of changepoint detection. Third, we reduce the computational complexity of the aforementioned image segmentation algorithm through a series of results and fit the algorithm into the context of the statistical changepoint detection problem. To formally present the proposed algorithm within a statistical hypothesis testing framework, we derive the asymptotic distribution of the proposed rough-fuzzy estimator under suitable regularity conditions. This enables us to test for false positives among the detected changepoints and also allows for multiple changepoint detection. The performance of the proposed method is illustrated through extensive simulation studies as well as several real-data applications, including the estimation of behavioral changes during the COVID-19 period and the detection of changes in language using US presidential speech data. We also observe that our method outperforms several existing changepoint detection methods in different setups. The implementation is made available through a Python package roufcp, bearing the acronym of the proposed algorithm, Rough-Fuzzy CPD.

The rest of the paper is organized as follows. Section 3 briefly describes some mathematical concepts connected to the proposed method, which is then described in detail in Section 4.2. Different ways to speed up the computation of the algorithm are described in Section 4.1. In Section 4.3, we provide a theoretical justification of the proposed method in the light of a hypothesis testing framework, with an example. Section 5 provides support for the proposed RoufCP method through extensive simulation studies and also establishes the superiority of the method in comparison to several existing changepoint detection techniques. Finally, Section 6 demonstrates its usefulness in real-life examples, followed by concluding remarks in Section 7. For brevity of presentation, all proofs of the theorems are provided in the appendix.

3 Mathematical preliminaries

3.1 Problem statement

The mathematical setup of the changepoint detection problem starts with a multivariate time series signal \(\left\{ y_{t} \right\} _{t=1}^{T}\), where each \(y_{t} \in \mathbb {R}^{p}\). There also exist some fixed points \(1< t_{1}< t_{2}< \dots< t_{k} < T\), such that within each interval \(\left[ t_{(i-1)}, t_{i} \right]\) the time series observations follow the same probability distributional model, but these models differ from one interval to another. In particular,

$$\begin{aligned} y_{t} \sim {\left\{ \begin{array}{ll} F_{0}(t) &{} \text { if } 1 \le t< t_{1}\\ F_{1}(t) &{} \text { if } t_{1} \le t< t_{2}\\ \dots &{} \dots \\ F_{(k-1)}(t) &{} \text { if } t_{(k-1)} \le t < t_{k}\\ F_{k}(t) &{} \text { if } t_{k} \le t \le T\\ \end{array}\right. }, \end{aligned}$$
(1)

and \(F_{i} \ne F_{(i+1)}\) for \(i = 0, 1, \dots (k-1)\). A special case follows when the signal is assumed to be piecewise stationary. The problem of CPD deals with identifying the set of points \(\left\{ t_{1}, t_{2}, \dots t_{k}\right\}\) where the distribution or behavior of the signal changes, from the available time series observations \(y_{t}\). The number of changepoints k may or may not be known a priori; if unknown, it also has to be estimated from the data.

A general framework for existing CPD algorithms is provided in a comprehensive review by Truong et al. (2020). Most existing algorithms involve minimizing a criterion function \(V(\mathcal {T}, y)\) to obtain the best set of changepoints \(\mathcal {T}\),

$$\begin{aligned} V(\mathcal {T}, y) = \sum \limits _{i = 1}^{k} c\left( \left\{ y_{t} \right\} _{t_{(i-1)}}^{t_{i}}\right) + \text {Penalty}(\mathcal {T}). \end{aligned}$$
(2)
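To make Eq. 2 concrete, here is a minimal sketch (not from the paper; the helper names `segment_cost` and `criterion`, the squared-error cost, and the constant per-changepoint penalty are our illustrative choices) that evaluates \(V(\mathcal {T}, y)\) for a candidate segmentation:

```python
import numpy as np

def segment_cost(segment):
    # Squared-error cost: total within-segment deviation from the segment mean.
    return float(np.sum((segment - segment.mean()) ** 2))

def criterion(changepoints, y, penalty=10.0):
    # V(T, y) as in Eq. 2: sum of per-segment costs plus an assumed,
    # illustrative linear penalty on the number of changepoints.
    boundaries = [0] + sorted(changepoints) + [len(y)]
    cost = sum(segment_cost(y[a:b]) for a, b in zip(boundaries[:-1], boundaries[1:]))
    return cost + penalty * len(changepoints)

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100)])
print(criterion([100], y))  # typically lower than criterion([50], y) for this data
```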

3.2 Fuzzy set

Fuzzy sets, proposed by Zadeh (1965), provide a mathematical means of modeling vagueness and imprecise information. They are a generalization of crisp (ordinary) sets. A fuzzy set is represented by an ordered pair \(\langle X, \mu \rangle\). Denoting by \(\mathbb {U}\) the universe of objects, with \(X \subseteq \mathbb {U}\), the membership function of the set X is a map \(\mu : \mathbb {U} \rightarrow [0, 1]\) indicating the degree of inclusion of an element \(x \in \mathbb {U}\) in the set X. Consequently, three cases are possible based on the value of \(\mu (x)\):

  1. Not included in \(\langle \mathbf {X}, \mu \rangle\) if \(\mu (x) = 0\).

  2. Partially included in \(\langle \mathbf {X}, \mu \rangle\) if \(0< \mu (x) < 1\).

  3. Fully included in \(\langle \mathbf {X}, \mu \rangle\) if \(\mu (x) = 1\).

While fuzzy sets have been extensively used in fields like control systems for electronic devices, decision-making support systems in business, prediction systems in finance, quantitative pattern analysis for industrial quality assurance, and uncertainty modeling in pattern recognition, image processing, and vision (see Pal and Dutta-Majumder 1986; Pathak and Pal 1986 for references), their use in changepoint analysis, and in statistics in general, has been very limited. In this paper, we model the membership of a data point to a particular distribution as the membership function of a fuzzy set, thereby characterizing the ambiguity in the data generation process.

3.3 Rough set

The notion of rough sets, introduced by Pawlak (1982), is defined as follows. Let \(\mathcal {A} = \langle \mathbb {U}, A \rangle\) be an information system, where \(\mathbb {U}\) is the universe of all objects as before, and A is the set of attributes used to distinguish elements of \(\mathbb {U}\). Let \(B\subseteq A\) be a subset of attributes and \(X \subseteq \mathbb {U}\). We can approximate the set X using only the information contained in B by constructing lower and upper approximations of X. Let \([x]_{B}\) denote the equivalence class of object x under \(I_{B}\), the equivalence relation induced by the attributes in B. In rough set theory, the lower and upper approximations of the set X under \(I_{B}\) are defined as follows:

  • \(\underline{B}X\) = B-lower approximation of X in \(\mathbb {U}\) = \(\{x \in \mathbb {U}: [x]_{B} \subseteq X\}\)

  • \(\overline{B}X\) = B-upper approximation of X in \(\mathbb {U}\) = \(\{x \in \mathbb {U}: [x]_{B} \cap X \ne \emptyset \}\)

So the lower approximation of a set X relative to B consists of the elements of \(\mathbb {U}\) that can be classified with certainty as elements of X based on B: it is the set of all elements x in \(\mathbb {U}\) whose equivalence class \([x]_{B}\) is contained in the target set X. The upper approximation, on the other hand, is the set of all objects x whose equivalence class has a non-empty intersection with X; intuitively, these are the elements that cannot be unambiguously classified as belonging to the complement of X. In contrast to the lower approximation, the upper approximation comprises the certain as well as the possible members of the target set X.

The pair \(\langle \underline{B}X, \overline{B}X\rangle\) is the rough representation of the crisp set X with respect to B. This rough representation captures the uncertainty in defining X caused by the incomplete information provided by the subset of attributes B. A numerical characterization of the roughness of X can be obtained as follows (Pal and Skowron 1999):

$$\begin{aligned} \rho _{B}(X) = 1 - \frac{\vert \underline{B}X\vert }{\vert \overline{B}X \vert } \end{aligned}$$
(3)

Therefore, \(\rho _{B}(X) = 0\) means the set X is crisp, or exact, with respect to B; conversely, \(\rho _{B}(X) > 0\) means X is rough, i.e. ambiguous, with respect to B. This vague definition of X in \(\mathbb {U}\) (in terms of lower and upper approximations) signifies incompleteness of knowledge about \(\mathbb {U}\).
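For intuition, here is a minimal hypothetical example of the approximations above and of Eq. 3, with a six-element universe whose B-equivalence classes are assumed:

```python
# Equivalence classes of U = {1,...,6} induced by attributes B (assumed example).
classes = [{1, 2}, {3, 4}, {5, 6}]
X = {1, 2, 3}  # target set; it straddles the class {3, 4}, so X is rough w.r.t. B

lower = set().union(*[c for c in classes if c <= X])   # {1, 2}: certain members
upper = set().union(*[c for c in classes if c & X])    # {1, 2, 3, 4}: possible members
rho = 1 - len(lower) / len(upper)                      # Eq. 3: roughness = 0.5
print(lower, upper, rho)
```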

An amalgamation of fuzzy set theory and rough set theory can be used to characterize systems in which the sets are ambiguous due to partial knowledge of their attributes, while the elements themselves may belong to a set only partially. Similar to the lower and upper approximations of a crisp set, such approximations can also be defined for a fuzzy set. While the definitions of the lower and upper approximations may vary with the nature of the set X (crisp or fuzzy) as well as the nature of the relationship between elements of the universe \(\mathbb {U}\), here we are only concerned with approximations of a fuzzy set X with respect to an indiscernibility relation R which is either an equivalence relation (crisp or fuzzy) or a tolerance relation (crisp or fuzzy). The expressions of the upper and lower approximations for both these cases have been derived by Sen and Pal (2009). Since the set approximations obtained using equivalence relations are not always smooth, we restrict our consideration to tolerance relations. The general implicit expressions of these approximations are given below, while more explicit forms will be derived in Section 4.1.

A tolerance relation (crisp or fuzzy) is a (crisp or fuzzy) relation that satisfies (crisp or fuzzy) reflexivity and symmetry. Unlike equivalence relations, tolerance relations are not necessarily transitive. When R is a tolerance relation, the space \(\langle \mathbb {U}, R \rangle\) is called a tolerance approximation space. Associated with a tolerance relation R is a membership function \(S_{R}\) of the relation itself, where \(S_{R}(u, v)\) denotes the membership value of the pair (u, v) belonging to the relation R. With these, the lower and upper approximations of any crisp or fuzzy set X are obtained as follows:

$$\begin{aligned} \underline{R}X= & {} \{ (u, \underline{M}(u)) \left| \right. u \in \mathbb {U}\} \nonumber \\ \overline{R}X= & {} \{ (u, \overline{M}(u)) \left| \right. u \in \mathbb {U}\} \end{aligned}$$
(4)

where the approximations of the membership functions are based on the tolerance function \(S_{R}\) of the relation R (Sen and Pal 2009):

$$\begin{aligned} \underline{M}(u)= & {} \underset{\psi \in \mathbb {U}}{\inf } \max (\overline{S}_{R}(u, \psi ), \mu _{X}(\psi )) \nonumber \\ \overline{M}(u)= & {} \underset{\psi \in \mathbb {U}}{\sup } \min (S_{R}(u, \psi ), \mu _{X}(\psi )) \end{aligned}$$
(5)

where \(\mu _{X}\), which takes values in the interval [0, 1], is the membership function associated with set X and \(\overline{S}_{R}(u, \psi ) = 1 - S_{R}(u,\psi )\). When X is a crisp set, \(\mu _{X}\) would take values only from the set \(\{0, 1\}\). Similarly, when R is a crisp tolerance function, \(S_{R}(u, \psi )\) would take values only from the set \(\{0, 1\}\).
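On a finite universe, Eq. 5 reduces to row-wise min/max operations over a tolerance matrix. The following sketch (our own vectorization; the step membership and triangular tolerance are assumed examples) computes both approximation curves and checks the sandwich property \(\underline{M}(u) \le \mu _{X}(u) \le \overline{M}(u)\):

```python
import numpy as np

def rough_fuzzy_approximations(mu, S):
    # mu[psi] = mu_X(psi): fuzzy membership of each element of a finite universe.
    # S[u, psi] = S_R(u, psi): tolerance (similarity) between elements u and psi.
    # Eq. 5: lower(u) = inf_psi max(1 - S(u, psi), mu(psi)),
    #        upper(u) = sup_psi min(S(u, psi), mu(psi)).
    lower = np.min(np.maximum(1.0 - S, mu[None, :]), axis=1)
    upper = np.max(np.minimum(S, mu[None, :]), axis=1)
    return lower, upper

# Toy usage on U = {0, ..., 9} with a crisp "left half" set and triangular tolerance.
T = 10
t = np.arange(T)
mu = (t < 5).astype(float)
S = np.clip(1.0 - np.abs(t[:, None] - t[None, :]) / 3.0, 0.0, 1.0)
lower, upper = rough_fuzzy_approximations(mu, S)
print(np.all(lower <= mu) and np.all(mu <= upper))  # True
```

Note that since \(S_{R}(u, u) = 1\), taking \(\psi = u\) in Eq. 5 immediately yields \(\underline{M}(u) \le \mu _{X}(u) \le \overline{M}(u)\), which the check above confirms numerically.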

3.4 Entropy measures

Using the definitions of the upper and lower approximations of a set, an entropy can be defined to quantify the ambiguity in the description of X. While the roughness measure \(\rho _{B}(X)\) of Eq. 3 measures this ambiguity, it needs to be transformed to properly describe the gain in information. In this regard, two types of gain functions, as defined by Pal and Pal (1991), are considered.

One choice is to use a logarithmic function to measure the gain in incompleteness, which results in an entropy formula similar to the "gain in information" in Shannon's entropy. The logarithmic entropy measure quantifying the incompleteness of knowledge about \(\mathbb {U}\), with respect to the definability of a set \(X \subseteq \mathbb {U}\), is given as

$$\begin{aligned} {H_{R}^{L}}(X) = -\frac{1}{2}\left( \chi (X) + \chi (X^{C}) \right) \end{aligned}$$
(6)

where for any set \(D \subseteq \mathbb {U}\), \(\chi (D) = \rho _{R}(D)\log _{\beta }\left( \frac{\rho _{R}(D)}{\beta }\right)\). Note that the "gain in incompleteness" term is taken as \(-\log _{\beta }\left( \frac{\rho _{R}(D)}{\beta }\right)\), which for \(\beta > 1\) takes values in \([1,\infty )\).

In contrast, the other kind of entropy measure is defined using an exponential function for the "gain in incompleteness". This class of entropy is derived using \(\chi (D) = \rho _{R}(D)\beta ^{\overline{\rho }_{R}(D)}\), where \(\overline{\rho }_{R}(D) = 1-\rho _{R}(D)\). The expression of the entropy then becomes

$$\begin{aligned} {H_{R}^{E}}(X) = \frac{1}{2}\left( \rho _{R}(X)\beta ^{\overline{\rho }_{R}(X)} +\rho _{R}(X^{C})\beta ^{\overline{\rho }_{R}(X^{C})} \right) \end{aligned}$$
(7)

Here the gain in incompleteness term is taken as \(\beta ^{(1-\rho _{R})}\), which takes values in \([1, \beta ]\) when \(\beta >1\). This class of exponential entropy functions possesses various desirable properties that are absent from the usual Shannon entropy, as illustrated by Pal and Pal (1991). Some of these are mentioned below.

  1. In Shannon's entropy, the gain in information \(\log (1/p) \rightarrow \infty\) as \(p \rightarrow 0\) and is undefined at \(p = 0\). In real life, however, the gain in information from an event, whether highly unlikely or highly probable, is expected to be finite. The exponential gain function ensures that the gain in information is bounded between 1 and \(\beta\).

  2. Logarithmic entropy is very sensitive to outliers due to the nature of the \(\log\) function, whereas exponential entropy is more robust to them. In particular, a probability that is small relative to the other probabilities biases the logarithmic entropy towards the small-probability event, since the weight function \(\log (1/p)\) has an unbounded derivative; in comparison, the weight function \(\beta ^{(1-p)}\) used in exponential entropy has a bounded derivative.

We take \(\beta = e\) in the subsequent sections, but any value greater than 1 is suitable.
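The boundedness contrast in property 1 is easy to see numerically; the following small illustration (ours) compares the two gain functions for \(\beta = e\):

```python
import numpy as np

beta = np.e
p = np.array([1e-6, 1e-3, 0.1, 0.5, 0.9, 1.0])

log_gain = np.log(1.0 / p)       # Shannon-style gain: unbounded as p -> 0
exp_gain = beta ** (1.0 - p)     # exponential gain: bounded in [1, beta]

for pi, lg, eg in zip(p, log_gain, exp_gain):
    print(f"p={pi:8.1e}  log-gain={lg:10.3f}  exp-gain={eg:6.3f}")
```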

4 Proposed RoufCP algorithm

4.1 Precomputation

In terms of the mathematical setup, we consider \(\left\{ y_{1}, y_{2}, \dots y_{T} \right\}\) as the time series data with \(y_{t} \in \mathbb {R}^{p}\) for \(t = 1, 2, \dots T\). With the number of changepoints equal to 1, we assume that the data come from a distribution \(\mathcal {F}\) at first and then gradually come from a different distribution \(\mathcal {G}\) after some time. Since the location of the changepoint is ambiguous in nature, this suggests splitting the set of time points \(\mathbb {U} = \{1, 2, \dots T\}\) into two fuzzy partitions, \(\gamma _{\mathcal {F}} = \langle \mathbb {U}, \mu _{\mathcal {F}}\rangle\) and \(\gamma _{\mathcal {G}} = \langle \mathbb {U}, \mu _{\mathcal {G}}\rangle\). Here, \(\mu _{\mathcal {F}}\) and \(\mu _{\mathcal {G}}\) denote the membership functions of the partitions with observations coming from \(\mathcal {F}\) and \(\mathcal {G}\) respectively, subject to the obvious restriction \(\mu _{\mathcal {F}}(t) + \mu _{\mathcal {G}}(t) = 1\). This is related to the incompleteness of knowledge about \(\mathbb {U}\) and can be quantified by the entropy measures described by Sen and Pal (2009).

To formalize this notion of fuzziness, for an estimated changepoint s and bandwidth \(\Delta\), we define for each \(t \in \mathbb {U}\) a fuzzy membership function similar to that in Sen and Pal (2009):

$$\begin{aligned} \mu _{s,\Delta }(t) = {\left\{ \begin{array}{ll} 1 &{} t \le s -\Delta \\ 1-2\left[ \frac{t-(s -\Delta )}{2\Delta }\right] ^{2} &{} s -\Delta< t \le s \\ 2\left[ \frac{(s+\Delta ) - t}{2\Delta }\right] ^{2} &{} s < t \le s+\Delta \\ 0 &{} t > s+\Delta \end{array}\right. } \end{aligned}$$
(8)

On the basis of this estimated changepoint s, the partitions \(\gamma _{\mathcal {F}}\) and \(\gamma _{\mathcal {G}}\) can be reformalized as \(\gamma _{s} = \gamma _{\mathcal {F}} = \{(t, \mu _{s, \Delta }(t)) \left| \right. t\in \mathbb {U}\}\) and \({\gamma _{s}^{C}} = \gamma _{\mathcal {G}} = \{(t, 1 - \mu _{s, \Delta }(t)) \left| \right. t\in \mathbb {U}\}\), which captures the fuzzy nature of the two partitions arising from estimating the changepoint by the fixed quantity s.
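A direct vectorized implementation of the membership function in Eq. 8 may look as follows (a sketch; the function name is ours):

```python
import numpy as np

def mu_membership(t, s, delta):
    # Zadeh-style S-shaped membership of Eq. 8: 1 well before the proposed
    # changepoint s, 0 well after it, quadratic blend of half-width delta.
    t = np.asarray(t, dtype=float)
    out = np.where(t <= s - delta, 1.0, 0.0)
    left = (t > s - delta) & (t <= s)
    out[left] = 1.0 - 2.0 * ((t[left] - (s - delta)) / (2.0 * delta)) ** 2
    right = (t > s) & (t <= s + delta)
    out[right] = 2.0 * (((s + delta) - t[right]) / (2.0 * delta)) ** 2
    return out

print(mu_membership([0, 450, 500, 550, 999], s=500, delta=50))  # 1, 0.5 at t = s, 0
```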

Following the notion of grayness ambiguity formulated in Sen and Pal (2009), we consider a tolerance function \(S_{w}(u, v)\) such that:

  1. \(S_{w}(u, u) = 1\) for any \(u \in \mathbb {U} = \{ 1, 2, \dots T\}\).

  2. \(S_{w}(u, v)\) is a decreasing function in \(\vert u - v\vert\).

  3. \(S_{w}(u, v) = 0\) if \(\vert u - v \vert \ge 2w\), where w is a chosen window length. This means that sufficiently spaced timepoints can be distinguished quite nicely.

Then, the lower and upper approximations of the fuzzy set \(\gamma _{s}\) can be constructed as \(\underline{\gamma _{s}} = \left\{ (u, {M}_{\underline{\gamma _{s}}}) : u \in \mathbb {U} \right\}\) and \(\overline{\gamma _{s}} = \left\{ (u, {M}_{\overline{\gamma _{s}}}) : u \in \mathbb {U} \right\}\), where \({M}_{\underline{\gamma _{s}}}\) and \({M}_{\overline{\gamma _{s}}}\) are obtained using Eq. 5 with the tolerance function \(S_{w}\) and membership function \(\mu _{s, \Delta }(t)\). In a similar way, lower and upper approximations of the fuzzy complement set \({\gamma _{s}^{C}}\) can also be obtained. However, there is a much easier formula that allows one to obtain expressions of upper and lower approximations under set complementation.

Lemma 1

$$\begin{aligned} M_{\underline{\gamma }_{s}}(t)= & {} 1 - M_{\overline{\gamma }_{s}^{C}}(t) \\ M_{\overline{\gamma }_{s}}(t)= & {} 1 - M_{\underline{\gamma }_{s}^{C}}(t)\\ \end{aligned}$$

The proof of Lemma 1 is quite simple and can be found in Appendix A.1. Note that, since the computation of the lower and upper approximations \({M}_{\underline{\gamma _{s}}}(t)\) and \({M}_{\overline{\gamma _{s}}}(t)\) is independent of the data, they can be pre-computed for the changepoint analysis, given the number of timepoints T. If T is large, however, direct evaluation of Eq. 5 incurs a high memory and computational cost. Fortunately, it is possible to obtain exact expressions for these lower and upper approximations under a very general setup, which greatly reduces both the computational and storage complexity of the whole process. For instance, according to Eq. 5, \(M_{\underline{\gamma }_{s}}(t)\) is the minimum value of \(\max \left( \overline{S}_{w}(t, \psi ), \mu _{s, \Delta }(\psi ) \right)\) over \(\psi\), i.e. the minimum of the upper envelope of \(\overline{S}_{w}(t, \psi )\) and \(\mu _{s, \Delta }(\psi )\) shown in Fig. 1. It is thus easy to see that the minimum is attained at a point \(t^{*}\) where the two curves cross each other, i.e. \(\overline{S}_{w}(t, t^{*}) = \mu _{s, \Delta }(t^{*})\). These observations lead us to the following theorem.

Fig. 1 Lower approximation curves

Theorem 1

If the membership function \(\mu _{s, \Delta }(t)\) is given as in Eq. 8 and the tolerance function \(S_{w}(u,v)\) is continuous in its both arguments, then lower and upper approximations of the left partition of a chosen changepoint s are expressed by \(\underline{\gamma _{s}} = \left\{ (u, {M}_{\underline{\gamma _{s}}}) : u \in \mathbb {U} \right\}\) and \(\overline{\gamma _{s}} = \left\{ (u, {M}_{\overline{\gamma _{s}}}) : u \in \mathbb {U} \right\}\) respectively, where

$$\begin{aligned} M_{\underline{\gamma _{s}}}(t) = {\left\{ \begin{array}{ll} 1 &{} \text { if } t< (s - 2w - \Delta )\\ 1 - \underset{ \{ t^{*} : S_{w}(t, t^{*}) + \mu _{s, \Delta }(t^{*}) = 1 \} }{\max } S_{w}(t, t^{*}) &{} \text { if } (s - 2w - \Delta ) \le t < (s + \Delta )\\ 0 &{} \text { if } t \ge (s + \Delta )\\ \end{array}\right. } \end{aligned}$$

and

$$\begin{aligned} M_{\overline{\gamma _{s}}}(t) = {\left\{ \begin{array}{ll} 1 &{} \text { if } t< (s - \Delta )\\ \underset{ \{ t^{*} : S_{w}(t, t^{*}) = \mu _{s, \Delta }(t^{*}) \} }{\max } S_{w}(t, t^{*}) &{} \text { if } (s - \Delta ) \le t < (s + 2w + \Delta )\\ 0 &{} \text { if } t \ge (s + 2w + \Delta )\\ \end{array}\right. } \end{aligned}$$

The derivation of the above expressions is outlined in Appendix A.2. For some specific tolerance functions, it is possible to obtain closed-form solutions of \(S_{w}(t, t^{*}) = \mu _{s, \Delta }(t^{*})\) and similar equations. One such choice is provided in the following corollary.

Corollary 1 Let the membership function \(\mu _{s, \Delta }(t)\) be given as in Eq. 8, and a tolerance function be given as

$$\begin{aligned} S_{w}(t, t^{\prime }) = {\left\{ \begin{array}{ll} 0 &{} \text { if } \vert t - t^{\prime } \vert \ge 2w\\ 2\left[ \frac{t^{\prime } - (t - 2w)}{2w} \right] ^{2} &{} \text { if } (t-2w)< t^{\prime }< (t-w)\\ 1 - 2\left[ \frac{t^{\prime }-t}{2w} \right] ^{2} &{} \text { if } \vert t - t^{\prime } \vert \le w\\ 2\left[ \frac{(t + 2w) - t^{\prime }}{2w} \right] ^{2} &{} \text { if } (t+w)< t^{\prime } < (t+2w).\\ \end{array}\right. } \end{aligned}$$
(9)

Then the lower and upper approximations of the left and right partitions for a chosen changepoint s can be obtained as \(\underline{\gamma _{s}} = \left\{ (u, {M}_{\underline{\gamma _{s}}}) : u \in \mathbb {U} \right\}\), \(\overline{\gamma _{s}} = \left\{ (u, {M}_{\overline{\gamma _{s}}}) : u \in \mathbb {U} \right\}\), \(\underline{{\gamma _{s}^{C}}} = \left\{ (u, {M}_{\underline{{\gamma _{s}^{C}}}}) : u \in \mathbb {U} \right\}\) and \(\overline{{\gamma _{s}^{C}}} = \left\{ (u, {M}_{\overline{{\gamma _{s}^{C}}}}) : u \in \mathbb {U} \right\}\), where

$$\begin{aligned} M_{\underline{\gamma }_{s}}(t) = {\left\{ \begin{array}{ll} 0 &{} \text {if } t \ge (s + \Delta )\\ 2\left[ \frac{(s+\Delta ) - t}{2(w + \Delta )} \right] ^{2} &{} \text {if } (s - w) \le t< (s + \Delta ) \\ 1 - 2\left[ \frac{(t + 2w) - (s - \Delta )}{2(w + \Delta )} \right] ^{2} &{} \text {if } (s - 2w - \Delta ) \le t< (s - w)\\ 1 &{} \text {if } t < (s - 2w - \Delta ) \end{array}\right. } \end{aligned}$$

and

$$\begin{aligned} M_{\overline{\gamma }_{s}}(t) = {\left\{ \begin{array}{ll} 0 &{} \text {if } t \ge (s + 2w + \Delta )\\ 2\left[ \frac{(s+\Delta ) - (t - 2w)}{2(w + \Delta )} \right] ^{2} &{} \text {if } (s + w) \le t< (s + 2w + \Delta )\\ 1 - 2\left[ \frac{t - (s - \Delta )}{2(w + \Delta )} \right] ^{2} &{} \text {if } (s - \Delta ) \le t< (s + w)\\ 1 &{} \text {if } t < (s - \Delta ), \end{array}\right. } \end{aligned}$$

while the approximations for the complementary set can be obtained using Lemma 1.
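For completeness, here is a sketch implementing the tolerance function of Eq. 9 and the closed-form approximation curves of Corollary 1 (function names ours); by Lemma 1, the complement's curves are simply \(1 - M_{\overline{\gamma }_{s}}\) and \(1 - M_{\underline{\gamma }_{s}}\):

```python
import numpy as np

def S_w(t, tp, w):
    # Tolerance function of Eq. 9: S-shaped in |t - t'|, vanishing beyond 2w.
    d = np.abs(np.asarray(t, dtype=float) - tp)
    out = np.zeros_like(d)
    near = d <= w
    out[near] = 1.0 - 2.0 * (d[near] / (2.0 * w)) ** 2
    mid = (d > w) & (d < 2.0 * w)
    out[mid] = 2.0 * ((2.0 * w - d[mid]) / (2.0 * w)) ** 2
    return out

def M_lower(t, s, w, delta):
    # Closed-form lower approximation curve from Corollary 1.
    t = np.asarray(t, dtype=float)
    out = np.where(t < s - 2 * w - delta, 1.0, 0.0)
    a = (t >= s - 2 * w - delta) & (t < s - w)
    out[a] = 1.0 - 2.0 * (((t[a] + 2 * w) - (s - delta)) / (2.0 * (w + delta))) ** 2
    b = (t >= s - w) & (t < s + delta)
    out[b] = 2.0 * (((s + delta) - t[b]) / (2.0 * (w + delta))) ** 2
    return out

def M_upper(t, s, w, delta):
    # Closed-form upper approximation curve from Corollary 1.
    t = np.asarray(t, dtype=float)
    out = np.where(t < s - delta, 1.0, 0.0)
    a = (t >= s - delta) & (t < s + w)
    out[a] = 1.0 - 2.0 * ((t[a] - (s - delta)) / (2.0 * (w + delta))) ** 2
    b = (t >= s + w) & (t < s + 2 * w + delta)
    out[b] = 2.0 * (((s + delta) - (t[b] - 2 * w)) / (2.0 * (w + delta))) ** 2
    return out
```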

Another very general result, for symmetric tolerance functions, is given in Theorem 2. It uses the symmetry of the tolerance function to halve the computation of the upper (or lower) approximations \(M_{\overline{\gamma }_{s}}\) (or \(M_{\underline{\gamma }_{s}}\)). The proof of the theorem is deferred to Appendix A.3.

Theorem 2

If the tolerance function \(S_{w}(u, v)\) can be expressed as a function of the absolute difference of its arguments, i.e. \(S_{w}(u, v) = g(\vert u - v\vert )\) such that \(g(\cdot )\) is symmetric about 0, then

$$\begin{aligned} M_{\underline{\gamma }_{s}}(t)&= 1 - M_{\overline{\gamma }_{s}}(2s-t) \ \ \forall \ t\ :\ \max \{1, 2s-T\}<t<\min \{2s, T\} \end{aligned}$$

4.2 The final algorithm

The proposed algorithm Rough-Fuzzy CPD (RoufCP) first calculates a regularity measure from the input, which plays a role similar to that of the usual criterion \(V(\mathcal {T},y)\) in Eq. 2. In other words, for each timepoint t, a two-sample test statistic R(t) is computed to detect changes between the samples \(\{ y_{(t-\delta +1)}, y_{(t-\delta +2)}, \dots y_{t} \}\) and \(\{ y_{(t+1)}, y_{(t+2)}, \dots y_{(t+\delta )} \}\). We expect the regularity measure R(t) to take a higher value when there is no change in distribution and a lower value when there is a change, hence it can be taken as a suitable transformation of a usual two-sample test statistic. For example, one such regularity measure for detecting changes in mean can be based on Hotelling's T\(^{2}\) test statistic,

$$\begin{aligned} R(t) = \frac{1}{1 + (\bar{y}_{1} - \bar{y}_{2})^{\intercal } \Sigma ^{-1} (\bar{y}_{1} - \bar{y}_{2}) }, \end{aligned}$$

where

$$\begin{aligned} \bar{y}_{1}= & {} \delta ^{-1} \sum \limits _{t^{\prime } = (t-\delta +1)}^{t} y_{t^{\prime }}, \qquad \bar{y}_{2} = \delta ^{-1} \sum \limits _{t^{\prime } = (t+1)}^{(t+\delta )} y_{t^{\prime }}\\ \Sigma= & {} \delta ^{-1} \sum \limits _{t^{\prime } = (t - \delta +1)}^{(t + \delta )} \left( y_{t^{\prime }} - \frac{1}{2}(\bar{y}_{1} + \bar{y}_{2}) \right) \left( y_{t^{\prime }} - \frac{1}{2}(\bar{y}_{1} + \bar{y}_{2}) \right) ^{\intercal } \end{aligned}$$
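A sketch of this Hotelling-based regularity measure follows (our own implementation; the 0-based window indexing is chosen for convenience and may differ from the paper's convention by one position):

```python
import numpy as np

def regularity(y, delta):
    # Hotelling T^2-based regularity measure R(t) from Section 4.2 (sketch).
    # y: array of shape (T, p); R(t) compares the delta points before and after t.
    T, p = y.shape
    R = np.full(T, np.nan)
    for t in range(delta, T - delta):
        y1, y2 = y[t - delta:t], y[t:t + delta]       # the two local windows
        m1, m2 = y1.mean(axis=0), y2.mean(axis=0)
        centered = np.vstack([y1, y2]) - 0.5 * (m1 + m2)
        Sigma = centered.T @ centered / delta          # pooled scatter, as in the text
        d = m1 - m2
        R[t] = 1.0 / (1.0 + d @ np.linalg.solve(Sigma, d))
    return R
```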

Combining the pre-computed upper and lower approximations of the two fuzzy partitions \(\gamma _{s}\) and \({\gamma _{s}^{C}}\) induced by the proposed changepoint s with the regularity measure R(t) obtained from the data, one can construct roughness measures for the two partitions as follows:

$$\begin{aligned} \rho _{\Delta , \delta , w}(\gamma _{s})= & {} 1 - \frac{{\sum }_{t = 1}^{T} {M}_{\underline{\gamma _{s}}}(t) R(t) }{{\sum }_{t = 1}^{T} {M}_{\overline{\gamma _{s}}}(t) R(t)}\nonumber \\ \rho _{\Delta , \delta , w}({\gamma _{s}^{c}})= & {} 1 - \frac{{\sum }_{t = 1}^{T} {M}_{\underline{{\gamma _{s}^{c}}}}(t) R(t) }{{\sum }_{t = 1}^{T} {M}_{\overline{{\gamma _{s}^{c}}}}(t) R(t)} \end{aligned}$$
(10)

These roughness measures can be computed efficiently owing to the results of Section 4.1. In particular, to obtain the roughness measures \(\rho _{\Delta , \delta , w}(\gamma _{s})\) and \(\rho _{\Delta , \delta , w}({\gamma _{s}^{C}})\) in Eq. 10, it suffices to compute only one of them, since the other is a byproduct of Lemma 1. Theorem 2 further shows that for a symmetric and location-invariant tolerance function, only one approximation curve, either lower or upper, needs to be computed, reducing the computational cost to one-fourth of the naive implementation. Finally, Theorem 1 and Corollary 1 yield closed-form expressions for the upper (or lower) approximation for the particular tolerance function given in Eq. 9.

Based on these roughness measures, the entropy quantifying the ambiguity of the fuzzy partitions of the period, for the specifically chosen changepoint s, can be expressed using Eq. 6 or 7, with \(\rho _{R}(X)\) replaced by the roughness measures given in Eq. 10. Thus, we obtain

$$\begin{aligned} H^{E}_{\Delta , \delta , w}(s) = \rho _{\Delta , \delta , w}(\gamma _{s}) e^{\left( 1 - \rho _{\Delta , \delta , w}(\gamma _{s})\right) } + \rho _{\Delta , \delta , w}({\gamma _{s}^{C}})e^{\left( 1 - \rho _{\Delta , \delta , w}({\gamma _{s}^{C}}) \right) } \end{aligned}$$
(11)

Our proposed method minimizes this entropy (or uncertainty) \(H^{E}_{\Delta , \delta , w}(s)\) to obtain the changepoint:

$$\begin{aligned} t^{*} = \underset{t \in \{ 1, 2, \dots T\} }{\arg \min }\ H^{E}_{\Delta , \delta , w}(t). \end{aligned}$$
(12)

This estimated changepoint \(t^{*}\) in Eq. 12 will be called the rough-fuzzy CP, and the method Rough-Fuzzy CPD. Note that while the global minimum serves as the estimate of a single changepoint, the local minima of the entropy function, after suitable testing, can be used for multiple changepoint detection. A detailed analysis of how the detected changepoints can be tested within a statistical hypothesis framework, to reduce false positive detections, is given in Section 4.3. The complete algorithm is shown in Algorithm 1.

Algorithm 1 Rough-Fuzzy CPD (RoufCP)
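Pulling the pieces together, here is a compact end-to-end sketch of the single-changepoint procedure. It is our own condensation, reusing the hypothetical helpers sketched earlier (`regularity` from this section, `M_lower`/`M_upper` from the Corollary 1 sketch), rather than the actual roufcp package API:

```python
import numpy as np

def roufcp_single(y, delta=50, w=50, Delta=50):
    # 1. Data-dependent step: regularity measure R(t).
    T = y.shape[0]
    R = regularity(y.reshape(T, -1), delta)
    valid = ~np.isnan(R)
    t_grid = np.arange(T)

    H = np.full(T, np.inf)
    for s in range(delta + 1, T - delta):
        # 2. Pre-computable step: approximation curves for the left partition
        #    (Corollary 1) and, via Lemma 1, for its complement.
        Ml, Mu = M_lower(t_grid, s, w, Delta), M_upper(t_grid, s, w, Delta)
        Ml_c, Mu_c = 1.0 - Mu, 1.0 - Ml
        # 3. Roughness measures of Eq. 10, weighted by R(t).
        Rv = R[valid]
        rho = 1.0 - (Ml[valid] * Rv).sum() / (Mu[valid] * Rv).sum()
        rho_c = 1.0 - (Ml_c[valid] * Rv).sum() / (Mu_c[valid] * Rv).sum()
        # 4. Exponential entropy of Eq. 11.
        H[s] = rho * np.exp(1.0 - rho) + rho_c * np.exp(1.0 - rho_c)
    return int(np.argmin(H)), H  # Eq. 12: minimizer over candidate s
```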

In contrast to minimizing the entropy, the usual changepoint detection method using the regularity measure R(t) (Truong et al. 2020) can be broadly expressed as

$$\begin{aligned} t^{*} = \underset{t \in \{ 1, 2, \dots T\} }{\arg \min }\ R(t) = \underset{t \in \{ 1, 2, \dots T\} }{\arg \min }\ V\left( \left( \{ 1, 2, \dots t \}, \{ (t+1), (t+2), \dots T \} \right) , y\right) , \end{aligned}$$
(13)

where \(t^{*}\) is the estimated changepoint and \(V(\cdot , \cdot )\) is a cost function as in Eq. 2, which can be interpreted as a regularity measure. While in this approach the regularity measure R(t) itself serves as the indicator of the changepoint, it can be greatly enhanced with the help of fuzzy and rough set theory by constructing the entropy of Eq. 11 and locating its minimizer. Hence, the inclusion of fuzzy-rough set theory serves the purpose of variance reduction rather than constituting a completely new algorithm. We show later that this simple tweak is extremely beneficial when the changes in the time series are smooth. Intuitively, since the underlying distribution of the time series is unknown, all information about the changepoint must be gathered from the available observations \(y_{t}\), which are summarized through a single attribute, the regularity measure R(t). Along with this loss of information, because R(t) is computed on overlapping windows, information about the locality of a timepoint permeates to its neighboring timepoints as well, resulting in incomplete information about the timepoints themselves. This rough resemblance between nearby timepoints is modeled by the tolerance relation, which eventually leads to the rough set formulation.

4.3 Asymptotic analysis

While Rough-Fuzzy CPD can detect a single changepoint using Eq. 12, multiple changepoints can be detected from the local minima of the curve \(H^{E}_{\Delta , \delta , w}(s)\) as a function of s. However, a reference curve must be computed to select the true changepoints from among the many local minima. In statistical language, this reference curve is usually computed from the distribution of the statistic under a suitably chosen null hypothesis, recasting the problem in a hypothesis testing framework.

Considering the mathematical framework of Eq. 1, we can formulate the problem of detecting a changepoint as a hypothesis testing problem:

$$\begin{aligned} H_{0}:&\quad F_{0} = F_{1} = \dots = F_{k} = F\\ H_{1}:&\quad \text {There is at least one inequality} \end{aligned}$$

Since the regularity measure R(t) is an indicator of a possible changepoint, under the null hypothesis \(H_{0}\), \(\mathbb {E}(R(t))\) is a constant independent of the time t. Based on this, we obtain the asymptotic null distribution of the proposed statistic under some reasonable assumptions on the asymptotic null distribution of the regularity measure. Clearly, the asymptotics apply when \(\delta\), the number of samples on each side used to construct R(t), tends to infinity, which forces the total number of timepoints T to tend to infinity as well. Thus, in order to discuss asymptotics, we restrict ourselves to an infinite-dimensional normed space, which, without loss of generality, can be taken as \(l^{\infty }(\mathbb {N})\), the set of all uniformly bounded functions from \(\mathbb {N}\) to \(\mathbb {R}\). Also, as the number of timepoints T increases to infinity, the intervals between successive observations tend to zero, keeping the total period of observation constant, as in most practical situations.

Let the regularity measure \(R_{\delta _{n}}\) be an \(l^{\infty }(\mathbb {N})\)-valued random element defined on some probability space \((\Omega , \mathcal {A}, \mathbb {P})\), such that \(R_{\delta _{n}}(t)\) is an \(\mathbb {R}\)-valued random variable denoting the regularity measure based on the \(2\delta _{n}\) subsamples centered at t, for each \(t = 1, 2, \dots\)

Theorem 3

Assume that under the null hypothesis of no changepoint, the regularity measure \(R_{\delta _{n}}\) has an asymptotic distribution such that for some sequence \(a_{\delta _{n}} \rightarrow \infty\),

$$\begin{aligned} a_{\delta _{n}}(R_{\delta _{n}} - \mu \varvec{1}(\cdot )) \rightsquigarrow Z \end{aligned}$$

where \(\varvec{1} \in l^{\infty }(\mathbb {N})\) is the constant function \(\varvec{1}(x) = 1\), Z is an infinite-dimensional Gaussian process with mean function identically equal to 0 and covariance function \(\sigma : \mathbb {N} \times \mathbb {N} \rightarrow [0, \infty )\). Also, assume the following regularity conditions:

  1. \(\mu \ne 0\).

  2. For the proposed changepoint s, the series \(\sum _{t = 1}^{\infty } M_{\overline{\gamma _s}}(t)\) and \(\sum _{t = 1}^{\infty } M_{\overline{\gamma _s^C}}(t)\) are convergent.

Define,

$$\begin{aligned} b(s)= & {} \frac{\sum _{t = 1}^{\infty } M_{\underline{\gamma _{s}}}(t) }{\sum _{t = 1}^{\infty } M_{\overline{\gamma _{s}}}(t) }\\ \overline{b}(s)= & {} \frac{\sum _{t = 1}^{\infty } M_{\underline{{\gamma _{s}^{C}}}}(t) }{\sum _{t = 1}^{\infty } M_{\overline{{\gamma _{s}^{C}}}}(t) } \end{aligned}$$

Then, the exponential entropy based statistic \(H^{E}_{\Delta , \delta _{n}, w}(s)\) defined in Eq. 11 has an asymptotic distribution with the same normalizing constant \(a_{\delta _{n}}\), such that as \(a_{\delta _{n}} \rightarrow \infty\),

$$\begin{aligned} a_{\delta _{n}}(H^{E}_{\Delta , \delta _{n}, w}(s) - H^{*}(s)) \rightsquigarrow Z^{*} \end{aligned}$$

where

$$\begin{aligned} H^{*}(s) = \left( 1 - b(s) \right) e^{b(s)} + (1 - \overline{b}(s))e^{\overline{b}(s)} \end{aligned}$$

and \(Z^{*}\) is a univariate normally distributed random variable with mean 0 and variance \(\sigma ^{*}\), where

$$\begin{aligned} \sigma ^{*} = \sum \limits _{m = 1}^{\infty } \sum \limits _{n = 1}^{\infty } A_{s}(m)\sigma (m, n)A_{s}(n) \end{aligned}$$

and

$$\begin{aligned} A_{s}(n)= & {} \left[ b(s)e^{b(s)} \left\{ \frac{\sum _{t = 1}^{\infty } M_{\overline{\gamma _{s}}}(n) M_{\underline{\gamma _{s}}}(t) - \sum _{t = 1}^{\infty } M_{\underline{\gamma _{s}}}(n) M_{\overline{\gamma _{s}}}(t) }{\mu \left( \sum _{t = 1}^{\infty } M_{\overline{\gamma _{s}}}(t)\right) ^{2}} \right\} + \right. \nonumber \\&\qquad \qquad \qquad \qquad \left. \overline{b}(s)e^{\overline{b}(s)} \left\{ \frac{\sum _{t = 1}^{\infty } M_{\overline{{\gamma _{s}^{C}}}}(n) M_{\underline{{\gamma _{s}^{C}}}}(t) - \sum _{t = 1}^{\infty } M_{\underline{{\gamma _{s}^{C}}}}(n) M_{\overline{{\gamma _{s}^{C}}}}(t) }{\mu \left( \sum _{t = 1}^{\infty } M_{\overline{{\gamma _{s}^{C}}}}(t)\right) ^{2}} \right\} \right] \end{aligned}$$
(14)

provided that the series expression of \(\sigma ^{*}\) is convergent.

For the detailed proof of the above theorem, please refer to Appendix A.4. In Theorem 3, even if Z is not an infinite-dimensional Gaussian process but follows some other process, \(Z^{*}\) can be modified accordingly to accommodate such changes. In view of the functional delta method described in Römisch (2014), \(Z^{*}\) has the same distribution as \(\Psi ^{\prime }_{s}\vert _{\mu \varvec{1}(\cdot )}(Z)\). An immediate extension of Theorem 3 is the analogous result for multiple proposed changepoints, which can likewise be modified for the case when Z does not follow a Gaussian process.

Corollary 2 Under the same assumptions and conditions as in Theorem 3, the vector of exponential entropy based statistics \(H_{\Delta , \delta _{n}, w}^{E}(\cdot )\) at multiple proposed changepoints \(s_{1}, s_{2}, \dots s_{k}\) has the following asymptotic distribution as \(a_{\delta _{n}} \rightarrow \infty\):

$$\begin{aligned} a_{\delta _{n}} \left[ \begin{array}{c} H_{\Delta , \delta _{n}, w}^{E}(s_{1}) - H^{*}(s_{1})\\ H_{\Delta , \delta _{n}, w}^{E}(s_{2}) - H^{*}(s_{2})\\ \vdots \\ H_{\Delta , \delta _{n}, w}^{E}(s_{k}) - H^{*}(s_{k})\\ \end{array}\right] \rightsquigarrow \mathcal {N}_{k}(\varvec{0}_{k}, \Sigma ^{*}_{k\times k}) \end{aligned}$$

where \(\varvec{0}_{k}\) is the k-dimensional null vector, and the entries of the \(k\times k\) dispersion matrix are given by

$$\begin{aligned} \left( \Sigma ^{*}_{k \times k} \right) _{(i, j)} = \sum \limits _{m=1}^{\infty } \sum \limits _{n=1}^{\infty } A_{s_{i}}(m) \sigma (m, n) A_{s_{j}}(n) \qquad i, j = 1, 2, \dots k \end{aligned}$$

In particular, if \(\vert s_{i} - s_{j} \vert > (4w + 2\Delta )\), then \(\left( \Sigma ^{*}_{k \times k} \right) _{(i, j)} = 0\).

The conditions of Theorem 3 hold for a multitude of regularity measures. In particular, almost any usual two-sample test statistic can be suitably transformed into a regularity measure satisfying the conditions. To illustrate the applicability of Theorem 3, we consider a simple example here. Take the two-sample U-statistic

$$\begin{aligned} u_{\delta _{n}}(t) = \frac{1}{{\delta _{n}^{2}}} \sum \limits _{i=1}^{\delta _{n}}\sum \limits _{j=1}^{\delta _{n}} h\left( y_{t-i}, y_{t+j} \right) \end{aligned}$$
(15)

where \(h(y_{t-i}, y_{t+j}) = (y_{t-i} - y_{t+j})^{2}\). We assume that \(y_{1}, y_{2}, \dots y_{n}\) follow a normal distribution \(N(\mu , \sigma ^{2})\). Under the null hypothesis of no changepoint, \(y_{t-i} - y_{t+j}\) follows a normal distribution \(N(0, 2\sigma ^{2})\). An application of the two-sample U-statistics theorem (Serfling 2009) shows that

$$\begin{aligned} \sqrt{2\delta _{n}} (u_{\delta _{n}}(t) - 2\sigma ^{2}) \rightsquigarrow N(0, 8\sigma ^{4}) \end{aligned}$$

Generalizing this for all timepoints t yields

$$\begin{aligned} \sqrt{2\delta _{n}} (u_{\delta _{n}} - 2\sigma ^{2} \varvec{1}(\cdot )) \rightsquigarrow Z \end{aligned}$$

where Z is an infinite dimensional Gaussian process with mean function identically equal to 0 and covariance function \(\sigma (t, t+s) = 4\max \{(2-\lambda ),0\}\sigma ^{4}\) for any s and t satisfying \(\lim _{\delta _{n} \rightarrow \infty }s/\delta _{n} = \lambda\). Also, one can easily verify that

$$\begin{aligned} \frac{1}{{\delta _{n}^{2}}} \sum _{i=1}^{\delta _{n}}\sum _{j=1}^{\delta _{n}} (y_{t-i} - y_{t+j})^{2} = s_{t,-}^{2} + s_{t,+}^{2} + (\overline{y}_{t, -} - \overline{y}_{t, +})^{2} \end{aligned}$$

where \(\overline{y}_{t,-}, s_{t,-}^{2}\) are respectively the sample mean and sample variance of \(\{ y_{t-\delta _{n}}, y_{t-\delta _{n}+1}, \dots y_{t-1} \}\) and \(\overline{y}_{t,+}, s_{t,+}^{2}\) are respectively the sample mean and sample variance of \(\{ y_{t+1}, y_{t+2}, \dots y_{t+\delta _{n}} \}\). This shows that the U-statistic considered in Eq. 15 is a suitable transformation of the mean difference statistic used for changepoint detection.
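The identity above is straightforward to verify numerically, provided the sample variances are taken with the population convention (divide by \(\delta _{n}\)), under which the identity is exact:

```python
import numpy as np

rng = np.random.default_rng(1)
delta = 200
y_minus, y_plus = rng.normal(size=delta), rng.normal(size=delta)

# Left side: the two-sample U-statistic of Eq. 15 with h(a, b) = (a - b)^2.
u = np.mean((y_minus[:, None] - y_plus[None, :]) ** 2)

# Right side: s^2_- + s^2_+ + (mean difference)^2, with population variances (ddof=0).
rhs = y_minus.var() + y_plus.var() + (y_minus.mean() - y_plus.mean()) ** 2

print(np.isclose(u, rhs))  # True
```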

5 Simulation studies

In this section, we analyze the numerical performance of our proposed changepoint detection method. The section is organized as follows. First, we quantify how much Rough-Fuzzy CPD improves upon the usual changepoint detection methods based on Eq. 13 (hereafter referred to as the base methods) using the same regularity measures. Then, we check the performance of our method under different parameters of the underlying true changepoint; specifically, we address the effect of the fuzziness of the true changepoint and of the signal-to-noise ratio of the data on the performance of Rough-Fuzzy CPD relative to the base methods. Following this, we provide an extensive comparison between the proposed method and various existing CPD methods: standard methods for abrupt changepoint detection, multiscale methods, and different fuzzy clustering-based techniques. Finally, we perform a sensitivity analysis to study how the performance of our proposed method depends on the choice of hyperparameters. All the following simulations and real data applications were performed with the Python package roufcp developed by us.

5.1 Improvement upon base method

To demonstrate how incorporating fuzzy-rough set theory into the usual CPD methods improves their performance, we consider two variants of the same CPD method. In the first variant, Eq. 13 is used to estimate the changepoint from the regularity measure R(t), whereas in the second variant the regularity measure is combined with the roughness measures to estimate the changepoint using Eq. 12. To compare the performances, we consider three simulation setups of the form

$$\begin{aligned} y_{t} = \mu (t) + \epsilon _{t} \end{aligned}$$

where \(\mu (t)\) is the mean function dependent on time, and \(\epsilon _{t}\) denotes a standard Gaussian white noise process. Three different choices of the mean curve \(\mu (t)\) are considered. As shown in Fig. 2a, scenario (S1) has an abrupt changepoint in the mean function, corresponding to the usual definition of a changepoint. Scenarios (S2) and (S3) have gradual changes in the mean function: a piecewise linear continuous jump and a smooth jump respectively. Figures 2b and c show the mean functions and one simulated dataset for these scenarios.

Fig. 2 Different scenarios of changepoints with changes in the mean of the data

To establish the comparison, we consider three test statistics widely used in the two-sample setup to detect changes in mean: the square of the two-sample t-test statistic (parametric), the Kolmogorov-Smirnov (KS) test statistic (nonparametric), and the Augmented Dickey-Fuller (ADF) test statistic among unit root tests. In each case, the regularity measure R(t) is obtained by taking the reciprocal of the test statistic, ensuring that R(t) is high in the absence of a changepoint and low in its presence. Both Eqs. 13 and 12 were used to obtain the estimates of the changepoints as mentioned before, and the process was repeated over 200 Monte Carlo resamples to obtain error measures. In applying Rough-Fuzzy CPD, \(\delta = 50\) was chosen to compute the regularity measure, while different values of w and \(\Delta\) were used in the experiment to study their effects.
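For reference, here is a sketch of one replication of this experiment with the KS-based regularity measure (the exact ramp shape and jump size of the (S2) mean function are illustrative; the small guard term in the reciprocal is ours):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
T, s0, F = 1000, 666, 100                           # length, changepoint, ramp half-width
t = np.arange(T)
mu = np.clip((t - (s0 - F)) / (2.0 * F), 0.0, 1.0)  # (S2)-style linear ramp (illustrative)
y = mu + rng.normal(size=T)                          # standard Gaussian white noise

delta = 50
R = np.full(T, np.nan)
for u in range(delta, T - delta):
    stat = ks_2samp(y[u - delta:u], y[u:u + delta]).statistic
    R[u] = 1.0 / (stat + 1e-8)                       # reciprocal of the KS statistic

print(np.nanargmin(R))                               # base estimate, as in Eq. 13
```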

Table 1 Comparison of Rough-Fuzzy CPD (proposed) vs usual (base) methods based on cost minimization for 3 types of mean shift changepoints

Table 1 summarizes the results for all three simulation setups described above. We begin with the changepoint of type (S1), where a discrete jump in the mean function has occurred. Note that the hyperparameters w and \(\Delta\) control the amount of fuzziness and roughness incorporated when detecting changepoints. Clearly, \(w = \Delta = 0\) entails just a transformation of R(t) into the exponential entropy, and the two estimating Eqs. 13 and 12 yield the same changepoint. In general, increasing w and \(\Delta\) increases the RMSE (root mean square error) when the true model has a discrete jump in the mean function. Indeed, while for \(w = \Delta = 5\) with the KS test we obtain a \(74\%\) decrease in MSE, with higher values of w and \(\Delta\) we see higher RMSE for Rough-Fuzzy CPD compared to the base methods using the t-test and KS test. For the ADF test statistic, however, the proposed method outperforms the base method across all hyperparameter values, achieving more than a \(70\%\) reduction in MSE. Hence, it is important to use appropriate hyperparameter values. Overall, for discrete jump type changepoints, Rough-Fuzzy CPD performs poorly, giving higher MSE than the base method for the t-test and KS test, and higher values of w and \(\Delta\) decrease the estimation accuracy. This is because the fundamental assumption of the proposed method, viz. that the change is fuzzy in nature and not abrupt, is violated in this case. We now look at the other two types of changepoints, where the change is not abrupt but occurs gradually over some time.

From Table 1 we observe the performance of Rough-Fuzzy CPD in situation (S2) with a continuous change in the mean function. For both the t-test statistic and the Kolmogorov-Smirnov statistic, the RMSE of Rough-Fuzzy CPD is much lower than that of the base method using the regularity measure R(t). However, with an increase in w and \(\Delta\), the RMSE first decreases and then increases, suggesting the existence of optimal hyperparameters in between. Because of the specific parametric setup, the Augmented Dickey-Fuller (ADF) test and the nonparametric Kolmogorov-Smirnov test generally perform worse than the parametric t-test. However, in practical applications where the data generating process is unknown, the ADF and KS test statistics might perform better in conjunction with our proposed improvement. An interesting phenomenon occurs with the ADF test statistic-based regularity measure. As shown in Fig. 3, the ADF test usually outputs two changepoints around the true changepoint at 666, but as fuzziness is incorporated into the model by increasing w and \(\Delta\), the bimodal distribution gradually becomes unimodal. However, a negative bias possibly remains in the estimation, as indicated by the mode of the density curve of Rough-Fuzzy CPD for \(w = \Delta = 50\).

Fig. 3 Base method along with proposed Rough-Fuzzy CPD based on the regularity measure with the Augmented Dickey-Fuller statistic for the continuous jump changepoint (S2)

Turning to situation (S3) with a smooth change in the mean function, the results are similar to those in scenario (S2). Here too, Rough-Fuzzy CPD reduces the MSE by more than \(50\%\) in most cases. However, for \(w = \Delta = 100\) the efficacy of the model is greatly reduced and it even performs slightly worse than the t-test-based method. Again, as mentioned earlier, it is of utmost importance to choose the values of the hyperparameters w and \(\Delta\) carefully to obtain accurate estimates of the changepoint, even in situations where the change in the mean function is gradual and the underlying assumption of the method is not violated. As shown in Table 1, Rough-Fuzzy CPD generally obtains a higher reduction in MSE for (S2) (continuous jump) and (S3) (smooth jump) than for (S1) (discrete jump). So, with a greater degree of ambiguity in the changepoint, the relative performance of our model increases, as expected. We illustrate this phenomenon in greater detail, and more rigorously, in Section 5.2.

5.2 Effect of fuzziness of the true changepoint and SNR of the data

In our earlier simulations, scenarios (S2) and (S3) depict situations where a gradual change has occurred. To check how Rough-Fuzzy CPD performs under different levels of graduality or "fuzziness" of the change, we consider different cases of scenario (S2) where the mean function is defined as follows:

$$\begin{aligned} \mu (t) = {\left\{ \begin{array}{ll} 0 &{} t \le s_{0}- \mathcal {F}\\ \frac{t-(s_{0} -\mathcal {F})}{\mathcal {F}} &{} s_{0} -\mathcal {F} < t \le s_{0}+\mathcal {F} \\ 2 &{} t > s_{0}+\mathcal {F} \end{array}\right. } \end{aligned}$$

where \(s_{0}\) is the true changepoint and the parameter \(\mathcal {F}\) defines the degree of graduality or "fuzziness" of the changepoint. To see the effect of \(\mathcal {F}\) on the Rough-Fuzzy CPD estimate, we consider 1000 resamples of scenario (S2) with the true changepoint at \(s_{0} = 666\), for 15 different values of \(\mathcal {F}\) ranging from 10 to 150. For each level of fuzziness \(\mathcal {F}\), the MSE of the estimated changepoint obtained by Rough-Fuzzy CPD, as well as by the underlying base method with the Kolmogorov-Smirnov statistic, is calculated by the Monte Carlo method.

Fig. 4 Performance of the proposed Rough-Fuzzy CPD and the base method (KS test) for different values of fuzziness of the true changepoint

Figure 4 shows the RMSE of the changepoints estimated by Rough-Fuzzy CPD and its base counterpart for different values of the fuzziness \(\mathcal {F}\), along with the relative decrease obtained by Rough-Fuzzy CPD. As shown in Fig. 4a, a higher degree of fuzziness leads to a higher estimation error for both methods, though Rough-Fuzzy CPD is not as severely affected as the base method. Figure 4b shows that the relative performance gain from the proposed rough-fuzzy improvement appears only once the fuzziness of the true changepoint crosses a certain threshold, about \(\mathcal {F} = 31\) in our specific setup. As the fuzziness increases further, owing to the extreme sensitivity of the base method, Rough-Fuzzy CPD achieves nearly \(95-99\%\) performance gain in terms of MSE.

While detecting fuzzy changepoints is a hard problem, doing so in noisy data with a low signal-to-noise ratio (SNR) is even harder. The SNR is defined as the ratio of \(\mathbb {E}(S^{2})\) to \(\mathbb {E}(N^{2})\), where N is the noise component of the data and S is the true signal component, which here mainly comprises the mean function of the time series. To check the performance of Rough-Fuzzy CPD under different SNRs, we keep the noise N the same standard Gaussian white noise and vary the signal S by changing the size of the jump in the mean function. To illustrate the effect, we consider scenario (S2) with a continuous linear change in the mean function, with jump sizes ranging over \(1/5, 1/4, \dots 1/2, 1, 2, \dots 10\). For each such setup, we use the Kolmogorov-Smirnov test statistic to estimate the changepoints using the proposed method and the base method over 1000 resamples, and a Monte Carlo estimate of the MSE is obtained. In each resample, the true changepoint is kept fixed at 666. Figure 5 shows the RMSE of the estimated changepoint for the proposed Rough-Fuzzy CPD as well as the base method. We observe that Rough-Fuzzy CPD attains lower RMSE than the base method for all values of SNR. As the SNR increases, the predictive capability of both methods increases. It is important to note, however, that while the base method performs very badly for low SNR values (\(< 1\)), the performance of Rough-Fuzzy CPD using the same regularity measure is not so severely affected. In fact, across low and high SNR values, the relative improvement in MSE achieved by Rough-Fuzzy CPD remains fairly uniform, ranging over \(88-97\%\), with the highest reduction achieved at SNR 2 or 3.

Fig. 5: Performance of the proposed Rough-Fuzzy CPD and the base method (KS test) for different values of the signal-to-noise ratio

5.3 Comparison with existing methods

To compare the performance of Rough-Fuzzy CPD with existing changepoint detection algorithms, we have chosen a range of methods, which can be broadly classified into three categories: state-of-the-art abrupt changepoint detection algorithms, fuzzy changepoint detection algorithms, and other gradual changepoint detection algorithms. For the abrupt changepoint detection algorithms, we have chosen Binary Segmentation (Scott and Knott 1974), Pruned Exact Linear Time (PELT) (Killick et al. 2012) and Bayesian Online Changepoint Detection (Adams and MacKay 2007). Among existing fuzzy changepoint detection algorithms, we have chosen three methods, namely the FCP algorithm for regression models (Chang et al. 2015), the Fuzzy Shift Changepoint (FSCP) algorithm (Lu et al. 2016) and the Fuzzy Classification Maximum Likelihood Changepoint (FCMLCP) algorithm (Lu and Chang 2016). Finally, we have considered three gradual changepoint detection methods that do not use fuzzy techniques: Multiscale Jump Point Detection (MJPD) (Wu and Zhou 2020), the Simultaneous Multiscale Changepoint Estimator (SMUCE) (Frick et al. 2014) and the Heterogeneous Simultaneous Multiscale Changepoint Estimator (H-SMUCE) (Pein et al. 2017). For these comparisons, we run Rough-Fuzzy CPD with the tuning parameters \(w = \Delta = 50\) in combination with the standard t-test statistic. While the choice of parameters is subjective, the sensitivity analysis for w and \(\Delta\) (see Section 5.4) shows that the performance of the proposed method is quite robust to the choice of these hyperparameters.

Since we are only dealing with gradual changepoint detection, we consider the mean functions from scenarios (S2) and (S3) for comparison. To perform a more comprehensive comparison, we consider 6 different models for the errors \(\epsilon _{t}\) (a simulation sketch follows the list).

  • (Gaussian) \(\epsilon _{t} \sim\) iid \(\mathcal {N}(0,1)\)

  • (Gamma) \(\epsilon _{t} \sim\) iid Gamma\((\theta ,k)\) with \(\theta =k=1\)

  • (\(t\)-distribution) \(\epsilon _{t} \sim\) iid \(t_{5}\).

  • (AR) \(\epsilon _{t} = \sqrt{0.91} X_{t}\) where \(X_{t} - 0.3X_{t-1} = \eta _{t}\) with \(\eta _{t}\) being standard Gaussian white noise process.

  • (MA) \(\epsilon _{t} = X_{t} / \sqrt{1.25}\) where \(X_{t} = \eta _{t} + 0.5\eta _{t-1}\) with \(\eta _{t}\) being standard Gaussian white noise process.

  • (ARMA) Combining the AR and MA models above, we consider \(\epsilon _{t}\) generated from an ARMA(1,1) process with long run variance 1, i.e. \(\epsilon _{t} = X_{t}/2.142857\) where \(X_{t}-0.3X_{t-1} = \eta _{t}+0.5\eta _{t-1}\) and \(\eta _{t}^{\prime }\)s are iid \(\mathcal {N}(0, 1)\).
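For reproducibility, the following sketch simulates one sample path from each of the six error models; centering the Gamma errors at zero is an assumption here, since the list leaves the centering implicit.

```python
import numpy as np

rng = np.random.default_rng(42)
n, burn = 1000, 200  # burn-in lets the recursions reach stationarity

def arma(phi, theta):
    # X_t - phi * X_{t-1} = eta_t + theta * eta_{t-1}, eta_t iid N(0, 1)
    eta = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = phi * x[t - 1] + eta[t] + theta * eta[t - 1]
    return x[burn:]

errors = {
    "Gaussian": rng.standard_normal(n),
    "Gamma":    rng.gamma(shape=1.0, scale=1.0, size=n) - 1.0,  # centered (assumed)
    "t5":       rng.standard_t(df=5, size=n),
    "AR":       np.sqrt(0.91) * arma(phi=0.3, theta=0.0),       # unit variance
    "MA":       arma(phi=0.0, theta=0.5) / np.sqrt(1.25),       # unit variance
    "ARMA":     arma(phi=0.3, theta=0.5) / 2.142857,            # unit long-run variance
}
```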

Table 2 Performance comparison of the proposed Rough-Fuzzy CPD (or RoufCP) with existing fuzzy changepoint detection methods under the simulation setups

Table 2 shows the RMSE in estimating the true changepoint. The proposed method, RoufCP, performs best in 3 out of the 6 cases for the continuous jump in mean (setup S2), and likewise in 3 out of the 6 cases for the smooth jump in mean (setup S3). For both (S2) and (S3), RoufCP has the lowest RMSE for the Gamma, MA, and ARMA models. Furthermore, RoufCP achieves the second-lowest RMSE for Gaussian and t-distributed errors under both setups, and for autoregressive errors under setup (S3).

5.4 Sensitivity analysis

The proposed method Rough-Fuzzy CPD depends on three main factors, viz. the regularity measure R(t) and the tunable hyperparameters w and \(\Delta\). The parameter w denotes the degree of roughness of the tolerance function, with higher values indicating greater roughness. Similarly, the parameter \(\Delta\) determines the fuzziness of the membership function, with higher values corresponding to greater fuzziness. While we do not have control over the regularity measure R(t), as it is exogenous to Rough-Fuzzy CPD, the values of w and \(\Delta\) must be chosen judiciously to obtain good estimates. To understand the effect of these parameters, we perform a sensitivity analysis.

Fig. 6: Heatmap showing the RMSE in estimating the changepoint in scenarios S2 and S3 for different values of w and \(\Delta\)

We consider two special scenarios to study the effect of the hyperparameters. Figure 6a shows the variation of RMSE for scenario (S2) with a continuous jump in the mean function, and Fig. 6b for scenario (S3) with a smooth jump in the mean function, both with standard Gaussian white noise errors. In both cases, we vary the ordered pair \(\langle w, \Delta \rangle\) over \(\{10,20, \dots , 140\}\times \{10,20, \dots , 140\}\) and estimate the changepoints using RoufCP. It is evident from Fig. 6 that there exists a wide range of near-optimal values of w and \(\Delta\), and that setting both parameters too high or too low together is detrimental to the efficacy of the model. The difference in the range of RMSE values between Fig. 6a and b can be attributed to the difference in the mean curves of the data. As the shared color scale shows, Rough-Fuzzy CPD achieves a lower RMSE for scenario (S3) than for scenario (S2) at their respective optimal hyperparameter values. One possible reason is the resemblance of the chosen membership function to the change in the mean function in scenario (S3).
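The grid evaluation itself is a straightforward double loop; the sketch below shows one way to organize it, where `simulate()` and `estimate_changepoint(x, w, delta)` are hypothetical stand-ins for drawing a resample of the scenario and running Rough-Fuzzy CPD with the given hyperparameters.

```python
import numpy as np

def rmse_grid(simulate, estimate_changepoint, s0=666, n_rep=100):
    # RMSE over resamples for every (w, delta) pair on the grid
    # {10, 20, ..., 140} x {10, 20, ..., 140}, as in Fig. 6.
    grid = range(10, 150, 10)
    out = np.empty((len(grid), len(grid)))
    for i, w in enumerate(grid):
        for j, delta in enumerate(grid):
            est = np.array([estimate_changepoint(simulate(), w, delta)
                            for _ in range(n_rep)])
            out[i, j] = np.sqrt(np.mean((est - s0) ** 2))
    return out  # heatmap-ready: rows indexed by w, columns by delta
```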

6 Application to real data

We consider three real data sets to demonstrate the performance of Rough-Fuzzy CPD over the usual regularity measure-based methods. Two popular benchmark datasets in changepoint analysis, namely the “Flow of the River Nile” data (Durbin et al. 2001) and the “Seatbelts” data on monthly road casualties in Great Britain during \(1969-84\), both collected from the datasets package in R, provide the first example. For the second illustration, changepoints are estimated from the time-varying reproduction rate in the coronavirus disease of 2019 (COVID-19) incidence dataset. The final example analyzes the gradual changes in the language of the speeches of the presidents of the United States of America; it also illustrates the applicability of the proposed method to multivariate datasets.

6.1 Benchmark datasets

The very well-known “Flow of the River Nile” dataset comprises measurements of the annual flow of the River Nile at Aswan from 1871 to 1970. The data has a possible changepoint near 1898, which is associated with the Fashoda incident of the same year (Bickley and Bates 1984).

Fig. 7: Performance of the proposed Rough-Fuzzy CPD and the base method (KS test) on the Nile dataset

The results for the Nile dataset are shown in Fig. 7, where we use a transformation of the t-test statistic as the regularity measure. As shown in Fig. 7a, the base method detects multiple minima, while the proposed rough-fuzzy improvement smooths these minima out, so the false positives are automatically removed. The changepoint estimated by Rough-Fuzzy CPD turns out to be 1902, which coincides with the completion of the Zifta Barrage and Assiut Barrage (Wegmann 1922) and is close to the commonly believed changepoint at 1898.
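To illustrate what such a base regularity measure looks like in practice, the sketch below computes a sliding-window two-sample t-statistic on the Nile series. Loading the data via statsmodels and the window length are assumptions for illustration, and the simple argmax stands in for the paper's full transformation of the statistic.

```python
import numpy as np
from scipy.stats import ttest_ind
import statsmodels.api as sm

# Annual Nile flow at Aswan, 1871-1970 (shipped with statsmodels)
nile = sm.datasets.nile.load_pandas().data
x, years = nile["volume"].to_numpy(), nile["year"].to_numpy()

w = 15  # illustrative window length
tstat = np.array([abs(ttest_ind(x[s - w:s], x[s:s + w]).statistic)
                  for s in range(w, len(x) - w)])
print("most irregular year:", int(years[w + tstat.argmax()]))
```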

Another popular benchmark dataset, “Seatbelts”, contains monthly road casualty counts for drivers in Great Britain from January 1969 to December 1984. Two evident changepoints are possibly present in the data: the first corresponds to the start of the seat belt legislation movement from 1972, when seatbelts were made compulsory in newly manufactured cars, and the second to the compulsory enforcement of seat belt wearing while driving in 1983 (van den Burg and Williams 2020).

Fig. 8: Performance of the proposed Rough-Fuzzy CPD and the base method (KS test) on the Seatbelts dataset

Figure 8 shows the performance of Rough-Fuzzy CPD in detecting changepoints in the Seatbelts data. As with the Nile data, the regularity measure based on the nonparametric Kolmogorov-Smirnov test has too many local minima, each of which could be taken as an estimate of a changepoint. In conjunction with the rough-fuzzy improvement, however, the prominent local minima appear in 1975 and 1983, both of which are close to the true changepoints.

6.2 COVID-19 data

The coronavirus disease of 2019 (COVID-19) is an infectious disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus. First detected in Wuhan, China in December 2019, it spread across the globe and had caused nearly 200 million infections by \(4^{\text {th}}\) August 2021. A multitude of epidemiological models have tried to estimate the spread of COVID-19 in various countries and states (Purkayastha et al. 2020). However, many such models fail to predict the case counts accurately due to the frequently changing public health measures aimed at simultaneously controlling the spread of the disease and preventing economic crises. This makes the identification of changepoints in the epidemic data particularly important. Another use of identifying changepoints in the spread of COVID-19 is inference regarding the efficacy of non-pharmaceutical interventions.

Such work has been done for some countries like the USA (Zhang et al. 2020) and Germany (Dehning et al. 2020). To identify changepoints, Dehning et al. (2020) calculated the time-varying rate of transmission \(\beta (t)\) in the usual Susceptible-Infected-Recovered (SIR) model and detected changepoints in \(\beta (t)\) through the course of the pandemic. In the SIR model, the transmission rate is defined as the rate of transmission of infection from an infected individual to a susceptible individual.

While the SIR model and its parameters are relatively straightforward to understand and easy to estimate, the dynamics of transmission of COVID-19 are much more complex due to factors like undetected asymptomatic transmission, latency and incubation periods, delays in reporting, under-reporting, different transmission rates for asymptomatic, symptomatic, quarantined, and hospitalized individuals, and erroneous testing resulting in false negatives, to name a few (Zimmermann et al. 2021). So \(\beta (t)\) alone might not be a suitable representative of the spread of the disease.

We therefore look at a related parameter, the basic reproduction number \(R_{0}\). While the SIR model assumes a constant \(R_{0}\), we estimate the time-varying reproduction number \(R_{t}\), which can be defined as the expected number of cases directly generated by one case in the population at time t (Purkayastha et al. 2020). We estimate \(R_{t}\) using the EpiEstim package in R, based on the incidence curve in India from \(15^{th}\) March, 2020 to \(1^{st}\) June, 2021. The required data on the number of confirmed COVID-19 cases were collected from an API built by the volunteer-driven covid19india group. We then apply Rough-Fuzzy CPD to the estimated \(R_{t}\) to identify the changepoints. Note that modeling the changepoints in a fuzzy manner, instead of a crisp one, is logical and appropriate here, as interventions and measures rolled out by the government cannot be implemented instantaneously throughout a vast country like India.

Fig. 9: Changepoints in the estimated \(R_{t}\) for India

Figure 9 shows the estimated changepoints along with the estimates of \(R_{t}\). Our method detects 7 changepoints: on \(12^{\text {th}}\) April, \(28^{\text {th}}\) May, \(28^{\text {th}}\) July, \(14^{\text {th}}\) September and \(27^{\text {th}}\) November in 2020, and \(12^{\text {th}}\) February and \(23^{\text {rd}}\) April in 2021. The first changepoint (on \(12^{\text {th}}\) April, 2020) corresponds to the initial elbow in the estimate of \(R_{t}\) and is very close to the end of the \(1^{\text {st}}\) Lockdown in India (on \(14^{\text {th}}\) April, 2020). This shows that the effect of the Lockdown was not immediate; it took the entire \(1^{\text {st}}\) Lockdown to bring down the value of \(R_{t}\) substantially and check the unrestricted spread of the disease. Further, the changepoint detected on \(12^{\text {th}}\) February, 2021 is very close to \(10^{\text {th}}\) February, 2021, which is regarded as the start of the \(2^{\text {nd}}\) wave in India (Covid-19 in India 2021). The changepoint detected on \(23^{\text {rd}}\) April, 2021 marks the date after which there is a substantial decrease in the value of \(R_{t}\); in fact, immediately after this date, \(R_{t}\) drops below 1, which is followed by a sharp decline in new cases. Among the other detected changepoints, those on \(12^{\text {th}}\) April, \(28^{\text {th}}\) May, \(28^{\text {th}}\) July and \(27^{\text {th}}\) November are very close to the dates of commencement of Lockdown 2, Unlock 1, Unlock 3 and Unlock 7 respectively. There are no detected changepoints near the other lockdown and unlock phases, which implies that the effects of Unlock 1, Unlock 3, and Unlock 7 on \(R_{t}\) were much more pronounced than those of the other nationwide intervention phases in India.

6.3 US President Speech Data

From the presidency of George Washington (inaugurated April 30, 1789) to that of Joe Biden (current), the 46 presidents of the United States have delivered countless speeches addressing contemporary political, economic, and social issues. Among them, more than a thousand speeches have been recorded in the UVA Miller Center database (Presidential Speeches 2021). From George Washington’s first inaugural speech, “Fellow citizens of the senate and the house of representatives, among the vicissitudes incident to life no event could have filled me...”, to Franklin D. Roosevelt’s “iffy” or Donald Trump’s “covfefe”, the English spoken by the US presidents has undergone countless changes. While some sophisticated archaic words have given way to more natural colloquial synonyms, the topics addressed by the presidents have also varied over time. This linguistic shift is clearly gradual in nature, and hence the proposed RoufCP method can be used to segment the period from 1789 to 2021 into several homogeneous regions, within which the language is similar and simpler homogeneous linguistic models can be expected to hold. One important point to note is that the presidential speech dataset poses a particular challenge for identifying changepoints owing to its high dimensionality, caused by the huge number of words in the corpus. Such a challenge was not present in the other datasets considered earlier.

Fig. 10: Analysis of changepoint detection on the US president speeches data

Figure 10b shows the gradual shift in the usage of different words; a darker color indicates a higher incidence of a word in a particular period. The words were chosen to represent different topics that presidents across different periods addressed in their speeches, ranging from slavery and war to domestic and international politics, the economy, etc. The proposed RoufCP detects 3 changepoints, in the years 1848, 1921, and 1957. In Fig. 10b, we see that before 1848 there is higher usage of words like “public”, “power”, and “bank”, while immediately after 1848 presidential speeches include words like “slave” and “Missouri” at higher frequencies, corresponding to the movement against slavery and its eventual abolition in 1865. The second changepoint in 1921 marks a shift in politics and a decrease in the usage of words like “Panama canal”, “Mexico”, and “territory”. Finally, after the last changepoint in 1957, there is a stark increase in the frequency of words like “war”, “tax”, “America”, and “economy”, related to events like the cold war and the space race. In this way, RoufCP detects the major changepoints in US politics and in the English language from the presidential speech dataset.

7 Conclusion

Though gradual changepoints are present in various time series data, they are usually overlooked. Our study aims to contribute in this direction by enriching the existing methodologies using the principles of fuzzy-rough set theory. The biggest strength of our approach is that it is independent of the base method: any changepoint detection algorithm that can be expressed as shown in Fig. 13 can be improved by our proposal. Here, we have presented only three kinds of base methods for computing the regularity measure: parametric two-sample tests, nonparametric two-sample tests, and stationarity tests (unit root tests). However, the choices are not limited to these. Any changepoint detection method that provides some type of anomaly or regularity score can be used in our framework with a suitable transformation. Thus, rough-fuzzy CPD allows one to utilize the rich collection of methods existing for crisp changepoint detection for the problem of fuzzy changepoint detection.

As the various simulations show, combining even very simple regularity measures, such as the t-test and Kolmogorov-Smirnov test statistics, with rough-fuzzy CPD increases their efficiency considerably. In all cases except the discrete jump models (where the assumption of fuzziness in the changepoint is violated), rough-fuzzy CPD reduces the MSE in detecting the changepoints. In comparison to existing fuzzy and crisp changepoint detection algorithms, rough-fuzzy CPD turns out to be more efficient in estimating continuous and smooth changes in the mean. Furthermore, under a gradual change in the mean function, our method outperforms several existing changepoint detection methods by achieving a smaller MSE across different error distributions.

The asymptotic distributions established in Theorems 2 and 3 connect rough-fuzzy CPD to a hypothesis testing framework, allowing one to report the statistical significance of the estimated changepoints as well. Since Theorem 2 allows us to derive the joint asymptotic distribution of the proposed entropy at multiple time points, we can rule out false positives and apply rough-fuzzy CPD to multiple changepoint detection problems.

From a computational point of view, this paper addresses the issue of obtaining closed-form expressions for upper and lower approximations of the rough fuzzy partitions, described by an estimated changepoint, thus allowing the overall algorithm to be much faster. In general, the time and space complexity of the overall algorithm will be dominated by the cost of computing the regularity measures.

Our approach of using fuzzy and rough set theory has definite advantages, which are reflected in the robustness of rough-fuzzy CPD. Under changes in the signal-to-noise ratio, the method gives a steady improvement over regularity measure-based changepoint detection: for a diverse range of SNR, the improvement in accuracy obtained by rough-fuzzy CPD over the base model remains fairly uniform, ranging from 88% to 97%. On the other hand, increasing the fuzziness \(\mathcal {F}\) of the continuous change in the mean function increases the efficiency of rough-fuzzy CPD relative to the base statistic used; as the fuzziness grows, a rapid increase in estimation efficiency can be observed. Also, rough-fuzzy CPD rarely outputs an outlying or spurious false positive estimate of the changepoint, precisely because the entropy curve \(H^{E}_{\Delta , \delta , w}(s)\) is smoother than the underlying regularity measure R(t). Even in cases where the base model produces a bimodal distribution of estimated changepoints, our model shrinks the two modes towards the true changepoint, as illustrated in Fig. 3. Finally, through sensitivity analysis, we have shown that rough-fuzzy CPD performs reasonably well over a wide range of hyperparameter values.

On the flip side, there are some obvious limitations. The rough-fuzzy CPD may not perform well if the assumption of fuzziness in the true changepoint is violated (i.e., there is a discrete jump discontinuity in the mean function), or if the regularity measure R(t) is a bad indicator for the type of changepoint that we are trying to detect. Another limitation could be the choice of the hyperparameters \(w, \Delta\), and \(\delta\).

A future extension of this investigation may build a probabilistic view of the detected changepoint, which would enable one to provide a confidence interval for the estimated changepoint, a notion more meaningful in the context of gradual change. Further, rough-fuzzy CPD can be viewed as an extension of an image segmentation algorithm (Sen and Pal 2009) to the domain of changepoint detection. A future endeavor in this direction could be to generalize a family of image segmentation algorithms to fit the context of the changepoint detection problem.

8 Software

For broader dissemination of our work, we have developed the Python package roufcp, which is available at pypi.org/project/roufcp/. All code for the package has been open-sourced and is available in a GitHub repository at github.com/subroy13/roufcp.
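A minimal usage sketch is given below. The class and argument names follow the package's README at the time of writing and should be treated as assumptions to be checked against the current documentation, rather than a definitive specification.

```python
import numpy as np
from roufcp import roufCP  # names per the README; verify against current docs

# Toy series with a gradual shift in mean around t = 350
rng = np.random.default_rng(7)
x = np.concatenate([np.zeros(300), np.linspace(0.0, 1.0, 100), np.ones(300)])
x += 0.5 * rng.standard_normal(x.size)

# delta controls the fuzziness of the membership function,
# w the roughness of the tolerance function (see Section 5.4)
model = roufCP(delta=50, w=50)
result = model.fit(x, moving_window=50)  # regularity measure on a moving window
```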