A comparison of online methods for change point detection in ion-mobility spectrometry data

When on-site classification of volatile organic compounds (VOCs) is required, a portable ion mobility spectrometer (IMS) is a suitable choice. However, IMS readings often show transient phases before they stabilize. Even though the importance of the transient phase and the features extracted from it has been highlighted in the literature, it has, to our knowledge, not been used for IMS-based classification so far. This paper analyzes whether change point detection algorithms with low computational complexity can separate transient and stable phases in IMS readings. The algorithms were tested on IMS data from different types of mushrooms. All algorithms successfully detected switches from the transient to the stable phase. The most accurate results were provided by the previously proposed multivariate max-CUSUM algorithm and the matrix form CUSUM algorithm, which is developed in this paper.


Introduction
Change points are abrupt variations in time series data. Detection of these points is useful in modeling and predicting time series [1]. Change point detection algorithms are designed to find a time point where a process evolving in time has experienced a change. This time point indicates a change in a process generating the data points. Change point detection is widely used in quality control [2], navigation system monitoring [3], seismic data processing [4], medicine, etc. [5].
Different change point detection algorithms have been proposed in the literature [5][6][7][8]. Online algorithms run in real time while time series data are being measured, whereas offline algorithms run after the whole data set has been collected. Online algorithms can also be applied offline after data sets have been collected.
Aminikhanghahi and Cook published a survey on algorithms for change point detection in time series data. The survey described application areas of change point detection algorithms, different supervised and unsupervised approaches, and accuracy metrics. For more details the reader is referred to [1].
Recent studies on online change point detection indicate that likelihood-based and probabilistic approaches are the most attractive methods [9][10][11]. For example, in [10] the Bayesian online change point algorithm was adapted for detecting behavioral changes in daily water consumption time series. The daily consumption profiles were clustered to extract the main behavior patterns, which were fed into the general likelihood framework for sequential analysis. The proposed algorithm also accounts for variables that can influence transitions between states in time series. Another example is applying the Bayesian Online Change Point Detection (BOCPD) algorithm for assessing the impact of cracks on the structural safety of concrete dams over time [12]. The results of this work showed that BOCPD can successfully detect crack behavior changes in real time.
E-mail address: anton.kondratev@tuni.fi (A. Kondratev). A. Vehkaoja and P. Müller contributed equally to this article.
Another approach for change point detection is subspace identification. This type of algorithm is based on the idea that ''subspaces spanned by subsequences of time series data and the columns of the extended observability matrix are approximately equivalent'' [13]. Change points are detected by estimating a state-space model behind the time series. The authors demonstrated that their method is highly accurate.
In the context of the present work, online change detection algorithms are needed for supporting classification algorithms. The IMS readings of an electronic nose (eNose) often display transient and stable phases [14]. A common approach with IMS data is to wait for the switch from the transient phase to the stable phase and to use only data from the stable phase for scent classification. However, this slows down the classification, as one has to wait for the transient phase to end, which typically lasts from a few seconds up to 1-2 min. Moreover, information contained in the transient phase can be, according to some researchers, more valuable than information contained in the stable phase [15]. Features from the transient phase (e.g., derivatives or the length of the transient phase) may help with classification. In order to use the full potential of the information in the transient phase, it is essential to accurately and objectively determine the switch from the transient to the stable phase. Therefore, this paper studies online change detection algorithms for distinguishing the transient phase from the stable phase in IMS readings.
https://doi.org/10.1016/j.array.2022.100151 Received 5 November 2021; Received in revised form 8 March 2022; Accepted 1 April 2022
In total, five algorithms are discussed: Shewhart Control Charts, the Cumulative Sum (CUSUM) algorithm, two CUSUM variants (including one that, to our knowledge, has not been proposed before), and the Bayesian Online Change Point Detection algorithm. In this work we considered simple algorithms, the Bayesian Online Change Point Detection algorithm being the most complicated. There are other modifications of the CUSUM algorithm, such as the tabular CUSUM and the CUSUM V-mask. The latter uses a set of hyperparameters for which no definite rule exists on how to choose them. For that and other reasons, using this algorithm is not recommended [8, p. 416]. The tabular CUSUM would require, for the problem at hand, running a large number of algorithms in parallel and is therefore omitted from consideration. More sophisticated change point detection algorithms exist, based on, for example, neural networks [16], but they are out of scope for this paper because only a small number of data sets were available. The online algorithms described in this work have a short delay of 1 s because the data are preprocessed before being fed into the algorithms. In this work, we are interested only in methods with low computational demand that enable online change point detection on hand-held devices with limited computational capacity.
For each algorithm, the paper provides a brief explanation and a discussion of its suitability for detecting switches from transient to stable phases. For the suitable algorithms, pseudocode is presented; they are then used for detecting switches from transient to stable phases in IMS readings collected from mushrooms, and their performances are compared. The IMS data as well as ready-to-use Python code for the tested algorithms are freely available at [17].

Electronic noses based on ion mobility
An eNose is a set of gas sensors that measure the ambient gas atmosphere, for example, scents, flavors, or non-odorous chemicals. eNoses are based on the general principle that changes in the gaseous atmosphere alter the sensor properties in a characteristic way, depending on the eNose technology and the chemical sensor or sensor array used [18]. The sensors often consist of metal oxides, conducting polymer composites, and intrinsically conducting polymers [18].
Ion Mobility Spectrometry (IMS) is a common eNose technique in which ionized molecules are separated using an electric field and a buffer gas. The molecules are directed into a drift tube where they are ionized. Ions moving through the drift tube under the influence of the electric field collide with the molecules of the buffer gas, which causes them to slow down (i.e., changes the mobility of the ions). The ChemPro100i eNose consists of several sensing areas (e.g., several metal-oxide sensors), and the velocity of the ionized particles determines at which sensing area they will be measured. The velocity correlates with the mobility of the specific ion. Several types of IMS devices exist, and there are differences in the technical details of the devices, including the number of sensing areas and sensors used as well as, for example, the electric field strength and drift gas temperature, which do affect the data. A detailed discussion of IMS types can be found, for example, in [19].
In this work we used the ChemPro100i, an IMS-based eNose developed and patented by Environics Ltd. The ChemPro100i was developed for detecting chemical substances such as warfare agents and hazardous gases in ambient air. It uses the so-called ''open-loop aspiration'' principle and an Americium-241 source for ionization. IMS sensors have several favorable properties: they are light and small, which allows their use in portable eNoses. Their high sensitivity, low power consumption, and low operating costs [20] make them a suitable choice for on-site measurements. One significant advantage of the ChemPro100i is that air can be employed as the carrier gas, whereas other mobility spectrometers traditionally use specific carrier gases [21]. Fig. 1 illustrates the workflow of the ChemPro100i.
Among other information, such as temperature and humidity, the ChemPro100i provides electric current readings for 16 channels. However, the eighth and the sixteenth channels are control channels, which should be zero at all times. Of the remaining 14 channels, seven show positive and seven negative currents. The readings usually have transient and stable phases when a channel is reacting to a chemical, e.g., a scent. The transient phase is defined as either the uptrending or downtrending part of a particular sensor channel. The red dashed line in Fig. 2(a) separates these two phases: the transient phase is on the left of the red line and the stable phase is on the right. The measurements always contain noise, whose magnitude depends on how the scent source was sampled. If a scent is measured from a controlled headspace, e.g., a sealed flask, the readings are in general rather stable (see Fig. 2(a) for a typical example), as the composition of the air sucked into the eNose is stable. Scents measured in a less controlled setting, e.g., from a plate, in general yield more unstable channel responses or cause no valuable responses at all (see Fig. 2(b) for an example). The reason for the significant noise when measuring from a plate is the presence of other scents in the ambient air and the larger fluctuations in the composition of the air sucked into the ChemPro100i for analysis. Another problem is that the ChemPro100i does not react to certain scents. For example, the reading in Fig. 2(b) shows that the ChemPro100i possibly did not react (significantly) to the black chanterelle scent on the displayed channel. The change in readings is more than 10 pA when measured from a flask (Fig. 2(a)), whereas readings from a plate fluctuated between 43.5 and 46.5 pA, indicating that the scent concentration was too weak to be (clearly) visible in the IMS reading. Note that Figs. 2(a) and 2(b) are illustrative examples and represent different channels.
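The channel layout and the differencing step used later can be sketched in Python. This is a minimal sketch under our own assumptions (readings arriving as an (n, 16) NumPy array; the function name is ours, not part of any ChemPro100i interface):

```python
import numpy as np

def preprocess(readings):
    """Drop the control channels and difference the remaining readings.

    `readings` is assumed to be an (n_samples, 16) array of channel
    currents in pA; channels 8 and 16 (1-based) are the control channels.
    """
    readings = np.asarray(readings, dtype=float)
    usable = np.delete(readings, [7, 15], axis=1)  # 0-based control indices
    # First-order differencing makes the stable phase approximately
    # zero-mean stationary, as assumed by the detectors discussed later.
    return np.diff(usable, axis=0)
```

The output has one row fewer than the input (differencing) and 14 columns (the usable channels).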

Shewhart control charts
The Shewhart Control Charts algorithm is the simplest log-likelihood ratio based algorithm. The main idea of this algorithm is to capture the instance at which a distribution changes its parameters. The log-likelihood ratio is defined as
$$s(x) = \ln \frac{p_{\theta_1}(x)}{p_{\theta_0}(x)},$$
where $\theta_0$ and $\theta_1$ represent the sets of parameters of the distribution before and after a change point. The natural logarithm causes $s(x)$ to be negative when the likelihood of the distribution with the parameters $\theta_0$ is greater and positive when the likelihood under the parameters $\theta_1$ is greater. This algorithm requires $\theta_0$ and $\theta_1$ to be known.
Based on our observation of multiple data sets from various scent sources, we assumed the data in the stable phase to be approximately normally distributed with zero mean after differencing (i.e., the discrete difference operation). Differencing means that instead of the measured IMS values, the difference between two consecutive observations is used. This operation is often used in time series analysis for making a time series stationary [23]. The distribution in the transient phase is assumed to be normal with parameters equal to the sample mean and sample standard deviation. The log-likelihood ratio for following the change in mean is computed over a sliding window as
$$S_k = \sum_{i=k-s+1}^{k} \ln \frac{p_{\theta_1}(x_i)}{p_{\theta_0}(x_i)},$$
where $s$ is the size of the sliding window. A sliding window is a structure that rolls over a time series: it always contains the current and the $s-1$ previous data points. This technique enables sampling and calculating statistical information for a time series online and can be used for smoothing the time series. In [5] a change is declared when the cumulative log-likelihood ratio satisfies $S_k \ge h$, where $h$ is a conveniently chosen threshold (we used $h = 0$). The typical decision function is shown in Fig. 3. In this work we did not use the cumulative log-likelihood ratio (LLR). Instead, we used the sequence of log-likelihood ratios (Fig. 3, left plot), as it is more convenient for detecting where the log-likelihood ratio crosses zero. The workflow of the Shewhart Control Charts algorithm is shown in Algorithm 1. When the loop (line 7) stops iterating, the counter contains the detected time stamp of the change.

Algorithm 1: Shewhart Charts
Input: s – length of the sliding window, X – initial sample of size s. Result: time step at which the distribution has changed.

The disadvantage of this algorithm is that it is difficult to choose the threshold $h$. Setting $h$ to zero causes the algorithm to indicate a change point too early. Instead of thresholding the value of the LLR function, we add the size of the sliding window to the change point detected by the algorithm. This is based on the logic that when the algorithm has detected a change, the change probably happened at the end or in the middle of the sliding window. Adding half of the sliding window was tested and resulted in too early detection. For scent classification, we trust measurements from the stable phase more than measurements from the transient phase. Therefore, it is less critical for scent classification if an algorithm places a change point slightly after rather than before the true change point. The sliding window is defined starting from the first point in a sequence. This algorithm needs to be run for each channel separately, which increases memory consumption and calculation times. The final decision is made when the majority of the channels have yielded a detected change point, by taking the average of all detected change points. The Shewhart Charts can be implemented using matrices, which enables calculating all channels simultaneously.
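The procedure above can be sketched for a single channel as follows. This is our own minimal sketch, not the paper's exact pseudocode; the model (transient N(mu0, sigma) estimated from the first s points, stable N(0, sigma) after differencing) and all names are assumptions stated in the text:

```python
import numpy as np

def shewhart_change_point(diffed, s=20):
    """Shewhart-style detector: report the first time step at which the
    windowed log-likelihood ratio favors the stable-phase model.

    `diffed` is a 1-D array of differenced readings for one channel.
    """
    init = diffed[:s]
    mu0 = init.mean()
    sigma = init.std(ddof=1) or 1.0          # guard a constant initial sample
    for k in range(s, len(diffed)):
        window = diffed[k - s + 1:k + 1]
        # LLR of stable (mean 0) vs. transient (mean mu0), common sigma
        llr = np.sum((-window**2 + (window - mu0)**2) / (2 * sigma**2))
        if llr > 0:                          # stable model more likely
            return k + s                     # window-lag compensation (see text)
    return None
```

Returning `k + s` implements the compensation discussed above: the detected index is shifted by the window length so that the reported change point falls in the stable phase rather than before it.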

CUSUM
Cumulative Sum (CUSUM) was proposed by Page in [24]. The CUSUM method is a popular algorithm in statistical quality control and in change point detection. Many extensions based on the CUSUM have been proposed [6,8,25].
The core of the algorithm is the same log-likelihood ratio test as in the Shewhart Charts, but the decision on whether a change point has been found differs. The LLR value is calculated by (2). The decision about a change point is made by comparing the cumulative LLR with the minimum LLR value found in the previous iterations. The decision function is defined as
$$g_k = S_k - m_k \ge h,$$
where $S_k$ is the cumulative LLR, $m_k$ is the minimum LLR value found in the previous iterations, and $h$ is a conveniently chosen threshold. This can be rewritten as
$$g_k = S_k - \min_{1 \le j \le k} S_j \ge h.$$
In the algorithm implemented in this work a zero threshold was used, so the decision simplifies to $S_k \ge m_k$.
As in the description of the Shewhart Charts, we set the threshold to zero and add the size of the sliding window to the found time stamp. Figs. 4(a) and 4(b) show the typical behavior of CUSUM's decision function. The first plot (Fig. 4(a)) shows the decision function for artificially generated data, where the distribution changed from $\mathcal{N}(0, 1)$ to $\mathcal{N}(5, 1)$ at 150 s. The second plot (Fig. 4(b)) shows the decision function for one channel, picked as an illustrative example, from the data explained in Section 4. The red dashed line depicts the time stamp at which the algorithm found a change point. The last plot (Fig. 4(c)) shows the IMS readings on the channel and the change point found by the algorithm. Algorithm 2 shows pseudo-code for the CUSUM approach. As can be seen on line 13, the LLR value is stored in an array, and the difference between the current LLR value and the minimum value in the array is compared to zero (line 14). If the difference is greater than zero, a change point has been found.
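Under the same (our own) model assumptions as the Shewhart sketch above, the CUSUM decision rule with zero threshold and window-lag compensation could look like this:

```python
import numpy as np

def cusum_change_point(diffed, s=20):
    """Classic CUSUM: alarm when the cumulative LLR rises above its
    running minimum (threshold h = 0, as in the text).

    `diffed` is a 1-D array of differenced readings; the transient model
    N(mu0, sigma) is estimated from the first `s` points, the stable
    model is N(0, sigma).
    """
    init = diffed[:s]
    mu0 = init.mean()
    sigma = init.std(ddof=1) or 1.0         # guard a constant initial sample
    S = 0.0                                 # cumulative log-likelihood ratio
    m = 0.0                                 # running minimum of S
    for k in range(s, len(diffed)):
        x = diffed[k]
        S += (-x**2 + (x - mu0)**2) / (2 * sigma**2)
        if S - m > 0:                       # decision g_k = S_k - m_k > 0
            return k + s                    # window-lag compensation
        m = min(m, S)
    return None
```

Unlike the Shewhart sketch, the statistic here is cumulative: during the transient phase the LLR keeps drifting down (and the minimum follows it), so a single upward step after the change is enough to trigger the alarm.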

Multivariate Max-CUSUM Chart
This algorithm was proposed in [6]. The paper demonstrates how testing multivariate normal data with CUSUM can be reduced to univariate classic CUSUM testing.
The classic CUSUM scheme works here as well: the one-sided statistics
$$C_i^{+} = \max\left(0,\; C_{i-1}^{+} + Z_i - k\right), \qquad C_i^{-} = \max\left(0,\; C_{i-1}^{-} - Z_i - k\right)$$
are combined into
$$M_i = \max\left(C_i^{+},\; C_i^{-}\right),$$
where $Z_i$ is the standardized statistic at time $i$ and $k$ is the reference value; a change is signaled when $M_i$ exceeds the chart limit. The Multivariate Max-CUSUM algorithm calculates all channels simultaneously. As mentioned previously, the data points are run through the differencing operation before the algorithm is applied. Pseudo-code for the Multivariate Max-CUSUM algorithm is shown in Algorithm 3.
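A sketch of a max-CUSUM scheme on standardized multichannel data follows. The reference value and chart limit are illustrative assumptions of ours, not the exact parameterization of [6]:

```python
import numpy as np

def max_cusum(z, k=0.5, h=5.0):
    """Max-CUSUM sketch: track one-sided CUSUM statistics per channel and
    signal when the maximum over channels and sides exceeds `h`.

    `z` is an (n_samples, n_channels) array of standardized (differenced)
    readings; `k` is the reference value.
    """
    n, d = z.shape
    c_pos = np.zeros(d)                     # upward-shift statistics
    c_neg = np.zeros(d)                     # downward-shift statistics
    for t in range(n):
        c_pos = np.maximum(0.0, c_pos + z[t] - k)
        c_neg = np.maximum(0.0, c_neg - z[t] - k)
        if max(c_pos.max(), c_neg.max()) > h:
            return t                        # first alarm over all channels
    return None
```

Because the maximum is taken over all channels, a shift on any single channel is enough to raise the alarm, which matches the idea of reducing the multivariate test to a single univariate statistic.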

Matrix form CUSUM
We propose the matrix form CUSUM as a simple multivariate extension of the classic CUSUM algorithm. All calculations are the same as in the classic CUSUM but performed in matrix form. This approach shortens the computation time significantly and has approximately the same accuracy as the one-dimensional CUSUM. The algorithm can be implemented in two ways, providing either a uniform change point that is the average of the detected change points over all channels or separate change points for each channel. Averaging the detected points over all channels can result in less precise change points but increases robustness to channels that did not react to a scent and do not contain any change points. This approach only fails if the majority of channels fail or if the algorithm detects change points on multiple channels that are far from the real change points. Analysis of the data set collected for this paper and the data set used in [14] showed that both situations are improbable. Therefore, in this paper the Matrix Form CUSUM algorithm using averages of detected change points is used.
Let $X \in \mathbb{R}^{d \times s}$ be the matrix containing readings from all $d$ channels, where each row represents a channel and $s$ is the size of a moving window. That is, $x_{i,j}$ is the $j$th reading of the $i$th channel. The vector of sample means for each channel is calculated as
$$\bar{\mathbf{x}} = \frac{1}{s} X \mathbf{1}_s,$$
where $\mathbf{1}_s$ is a column vector of $s$ ones. The standard deviation for one dimension,
$$\sigma = \sqrt{\frac{1}{s}\sum_{j=1}^{s}\left(x_j - \bar{x}\right)^2},$$
can be converted into matrix form: the fraction under the square root becomes
$$\frac{1}{s}\left[\left(X - \bar{\mathbf{x}}\mathbf{1}_s^{\top}\right) \odot \left(X - \bar{\mathbf{x}}\mathbf{1}_s^{\top}\right)\right]\mathbf{1}_s,$$
so that the $d$-dimensional standard deviation is
$$\boldsymbol{\sigma} = \sqrt{\frac{1}{s}\left[\left(X - \bar{\mathbf{x}}\mathbf{1}_s^{\top}\right) \odot \left(X - \bar{\mathbf{x}}\mathbf{1}_s^{\top}\right)\right]\mathbf{1}_s}.$$
Notice that the square and the square root are calculated for each entry of the matrix. The log-likelihood ratio (LLR) for the CUSUM mean shift in one dimension [5] is calculated as
$$S_k = S_{k-1} + \frac{\mu_1 - \mu_0}{\sigma^2}\left(x_k - \frac{\mu_0 + \mu_1}{2}\right).$$
The same LLR is used for the Matrix Form CUSUM:
$$\mathbf{S}_k = \mathbf{S}_{k-1} + \frac{\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0}{\boldsymbol{\sigma}^2} \odot \left(\mathbf{x}_k - \frac{\boldsymbol{\mu}_0 + \boldsymbol{\mu}_1}{2}\right),$$
where the division is element-wise and $\odot$ denotes element-wise multiplication. Notice that $S_k$ is a scalar in the one-dimensional case and that $\mathbf{S}_1$ is the LLR calculated for the first sample of size $s$.
The decision function is calculated as
$$g_k = \operatorname{mean}\left(\mathbf{S}_k - \mathbf{m}_k\right) \ge h,$$
where
$$\mathbf{m}_k = \min_{1 \le j \le k} \mathbf{S}_j \quad \text{(element-wise)}$$
and $h$ is a conveniently chosen threshold. The vector $\mathbf{m}_k$ represents the minimum values of all previously calculated LLRs: each of its entries is the channel-specific minimum LLR value. If the algorithm detects that a calculated LLR value is less than the value in this vector, the corresponding entry of the vector is replaced. The averaging operation means averaging all entries of the vector. Sequential calculation of this log-likelihood ratio yields the same result as the one-dimensional CUSUM, but for all channels simultaneously. Vector $\mathbf{m}$ in Algorithm 4 contains the minimum values over all previous LLRs for each channel, and its entries need to be updated at each iteration.
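The matrix-form recursion and the channel averaging can be sketched in NumPy as follows (a sketch under the same model assumptions and naming as the one-dimensional CUSUM example; not the paper's Algorithm 4 verbatim):

```python
import numpy as np

def matrix_cusum(diffed, s=20):
    """Matrix Form CUSUM sketch: the classic CUSUM recursion for all
    channels at once via element-wise NumPy operations, averaging the
    channel-wise detections into one change point.

    `diffed` is an (n_samples, n_channels) array of differenced readings.
    """
    n, d = diffed.shape
    init = diffed[:s]
    mu0 = init.mean(axis=0)                 # per-channel transient mean
    sd = init.std(axis=0, ddof=1)
    sigma = np.where(sd > 0, sd, 1.0)       # guard constant channels
    S = np.zeros(d)                         # cumulative LLR per channel
    m = np.zeros(d)                         # running minimum per channel
    found = np.full(d, -1)
    for t in range(s, n):
        x = diffed[t]
        S += (-x**2 + (x - mu0)**2) / (2 * sigma**2)
        hit = (S - m > 0) & (found < 0)     # decision g = S - m > 0
        found[hit] = t + s                  # window-lag compensation
        m = np.minimum(m, S)
    detected = found[found >= 0]
    return float(detected.mean()) if detected.size else None
```

Replacing the per-channel Python loop with vectorized array operations is what makes this variant the fastest of the tested algorithms.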

Bayesian online change point detection
Bayesian online change point detection (BOCPD) was proposed in [26]. The idea of BOCPD is to detect a change point in terms of so-called run lengths. The concept of this algorithm is shown in Fig. 5. Whenever a new measurement is available, the algorithm calculates the probability that the corresponding run length grows by one. If the probability of a change is greater than the probability of growth, the run length drops to zero and a change point is detected. Fig. 5(a) has three partitions, separated by change points. The fourth point in partition 1 is the last before the change point occurs. Before this point, as can be seen in Fig. 5(b), the run length grows. The fifth point, which belongs to partition 2, originates from another distribution, which causes the run length to drop to zero. The algorithm maintains a matrix whose first column represents the probabilities of a change at times $t_1, \ldots, t_n$; the remaining columns represent the probabilities of different run lengths at different times. It is not necessary to keep the full matrix in memory; only the results of the last iteration must be stored. In this work the t-distribution was used as the underlying probability model (UPM) because the Normal distribution showed poor detection results. The t-distribution was used because $\mu$ and $\sigma^2$ of the data were unknown. The conjugate prior for estimating both parameters is the Normal-Gamma distribution [27]. In this case, the posterior predictive distribution is the generalized t-distribution with $\nu = 2\alpha$ degrees of freedom, mean $\mu$, and variance $\sigma^2 = \beta(\kappa + 1)/(\alpha\kappa)$. The Normal-Gamma distribution has parameters $\mu$, $\kappa$, $\alpha$, $\beta$, which must be initialized before the first iteration. After a point arrives, the algorithm performs four steps, and the parameters are updated as follows:
$$\mu' = \frac{\kappa\mu + x}{\kappa + 1}, \qquad \kappa' = \kappa + 1, \qquad \alpha' = \alpha + \frac{1}{2}, \qquad \beta' = \beta + \frac{\kappa\left(x - \mu\right)^2}{2\left(\kappa + 1\right)}.$$
The first step at time $t$ is calculating the posterior predictive probability for every possible run length,
$$\pi_t^{(r)} = p\left(x_t \mid r_{t-1} = r,\; x_t^{(r)}\right),$$
where $r$ represents the run length. This probability is calculated using the t-distribution with parameters $\nu$, $\mu$, and $\sigma^2$.
In the second step, the probability of growth for each run length is calculated as
$$P\left(r_t = r_{t-1} + 1,\; x_{1:t}\right) = P\left(r_{t-1},\; x_{1:t-1}\right)\, \pi_t^{(r)}\, \left(1 - H\right),$$
where $H$ is the hazard rate. The hazard rate may represent prior knowledge on how often change points occur; more explanation about the hazard rate can be found in [28]. In the third step, the probability of a change point, i.e., the run length being zero, is calculated as
$$P\left(r_t = 0,\; x_{1:t}\right) = \sum_{r_{t-1}} P\left(r_{t-1},\; x_{1:t-1}\right)\, \pi_t^{(r)}\, H.$$
The distribution of run lengths then needs to be normalized,
$$P\left(r_t \mid x_{1:t}\right) = \frac{P\left(r_t,\; x_{1:t}\right)}{\sum_{r_t} P\left(r_t,\; x_{1:t}\right)}.$$
Lastly, the parameters of the distribution are updated as described above.
In the implemented algorithm, a matrix R was used for keeping the probabilities of growth and of a change point. Rows of the matrix R represent time steps and columns represent the joints of the trellis described in the original paper. The matrix R was used only for illustrative purposes. The implementation of this algorithm can be found in the supplementary material.
One obstacle encountered in the implementation process was the initialization of the parameters of the Normal-Gamma distribution. Several sources simply set all initial parameters to 1 and the hazard rate to the prior belief on the frequency of change points, but no justification is given in the literature. A quick study [25, p. 37] using the data sets of [14] revealed that setting $\alpha$ and $\beta$ to one, $\mu$ to the sample mean, and $\kappa$ equal to the hazard rate ensures that the algorithm always finds meaningful change points. Initializing all parameters to 1 results in failing to find any change point.
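The four steps and the initialization heuristic above can be sketched compactly as follows. This is our own condensed sketch (names and the choice of the initial-sample mean for $\mu$ are ours); the SciPy t-distribution serves as the posterior predictive:

```python
import numpy as np
from scipy import stats

def bocpd_run_lengths(data, hazard=0.02):
    """BOCPD sketch with a Normal-Gamma prior and Student-t predictive.

    Initialization follows the heuristic in the text: alpha = beta = 1,
    mu = sample mean (of an initial sample here), kappa = hazard rate.
    Returns the most probable run length after each observation.
    """
    data = np.asarray(data, dtype=float)
    mu0 = float(np.mean(data[:20]))           # initial-sample mean (our choice)
    kappa0, alpha0, beta0 = hazard, 1.0, 1.0
    R = np.array([1.0])                       # run-length posterior
    mu = np.array([mu0]); kappa = np.array([kappa0])
    alpha = np.array([alpha0]); beta = np.array([beta0])
    best = []
    for x in data:
        # step 1: posterior predictive for every run-length hypothesis
        scale = np.sqrt(beta * (kappa + 1) / (alpha * kappa))
        pred = stats.t.pdf(x, df=2 * alpha, loc=mu, scale=scale)
        # step 2: growth probabilities; step 3: change-point probability
        growth = R * pred * (1 - hazard)
        cp = np.sum(R * pred * hazard)
        R = np.append(cp, growth)
        R /= R.sum()                          # step 4: normalize
        # conjugate Normal-Gamma updates for each surviving hypothesis
        mu_new = (kappa * mu + x) / (kappa + 1)
        beta_new = beta + kappa * (x - mu) ** 2 / (2 * (kappa + 1))
        mu = np.append(mu0, mu_new); kappa = np.append(kappa0, kappa + 1)
        alpha = np.append(alpha0, alpha + 0.5); beta = np.append(beta0, beta_new)
        best.append(int(np.argmax(R)))
    return best
```

Only the vectors from the last iteration are kept, as noted in the text; the run-length vectors grow by one entry per observation, which is one reason this approach is far slower than the windowed CUSUM variants.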

Data
Measurements from black chanterelle (Craterellus cornucopioides), yellowfoot (Craterellus tubaeformis), and a mix of both species were collected. The mushrooms were air dried. No additives (e.g., salt or water) were used in the process.
All scent sources were measured with a ChemPro100i, both from an open plate and from a sealed flask, at 1 Hz. For each scent source, 5 sets of 5 min were measured, meaning that the database contained a total of 30 data sets. Between the measurements of two sets, a break of 3 min was taken. The breaks were needed to flush the drift tube with ambient air until the IMS readings returned to the baseline. The ground truth points were selected manually by visual inspection of the time series plots.
The 30 data sets were then classified as good, bad, or ambiguous. A data set was characterized as good if it contained IMS readings with clearly visible transient and stable phases. A data set was bad if it did not contain any clearly visible phase changes. An ambiguous data set contained both channels with clearly visible phase changes and channels without clearly visible phases. The data are available at [17].

Results
All considered algorithms were implemented in Python. The source files, the data sets, and the supplementary material are available for download at [17].
The performance of each algorithm was measured using the Mean Absolute Error (MAE) metric, calculated as
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|g_i - d_i\right|,$$
where $g_i$ is the ground truth value and $d_i$ is the value returned by an algorithm. Table 1 reports the running times; the relative running times are calculated as ratios with respect to the fastest algorithm, the Matrix Form CUSUM. As can be seen, all algorithms except the Bayesian Change Point have relatively small running times. The Bayesian algorithm is over two thousand times slower than the Matrix Form CUSUM. This can be partly explained by the sliding window: the Bayesian algorithm does not use one, that is, it processes every subsequent data point. All implemented algorithms run faster than the sampling rate of the ChemPro100i. This means that all five algorithms would enable online detection and that there would still be time for preprocessing in future studies. The readings in Fig. 7(a) are from a data set classified as good. The algorithms detected the change point quite accurately; only CUSUM placed the change point slightly later. The readings shown in Fig. 7(b) are from a bad data set, which was measured from a plate. Almost all data sets classified as bad were sampled from a plate. The reading in Fig. 7(b) shows that there are no visible change points; the algorithms react to local trend changes. A possible explanation for such a reading is that the scent concentration in the air sucked into the eNose was too low to cause a significant change in the measured currents. Fig. 7(c) shows a mixture of yellowfoot and black chanterelle measured from a sealed flask. The readings show that the mixture has transient and stable phases. The algorithms performed well for the channels with multiple change points. There are multiple possible change points visible in Fig. 7(c), which means that it has been difficult for the algorithms to find the correct change point.
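The MAE metric above is straightforward to compute; a small helper (names ours) for comparing manually labeled and detected change points:

```python
import numpy as np

def mae(ground_truth, detected):
    """Mean absolute error between labeled and detected change points."""
    g = np.asarray(ground_truth, dtype=float)
    d = np.asarray(detected, dtype=float)
    return float(np.mean(np.abs(g - d)))
```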
The ambiguous class of data sets contains both channels with a clear change point and channels with an obscured one.
Multiple data sets measured from a table have channels containing only binary noise, as in Fig. 8. Ground truth points were marked for these channels as well, and the detected change points contributed to the MAE scores. Even though there are no change points, the algorithms still yield random points, which affects the MAE scores of the particular data set. To overcome this problem, the five algorithms were modified. The modified algorithms calculate the difference between the maximum and the minimum value of each channel in the initial sample. If this difference is less than 0.05 pA, the channel is excluded from the calculation. The threshold value 0.05 was chosen based on an analysis of all the data sets, which can be found in the supplementary material. Table 2 summarizes the results over all data sets. The table shows the best algorithm with its MAE scores (Eq. (33)) for three sliding window lengths. As can be seen from the table, Multivariate Max-CUSUM and Matrix Form CUSUM often have lower MAE scores, which means that they yield change points closer to the ground truth points than those yielded by the remaining three methods. It is crucial, however, to keep in mind that the ground truth points were selected based on intuition; thus, their accuracy cannot be quantified.
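The channel-exclusion rule described above (range of the initial sample below 0.05 pA) can be sketched as:

```python
import numpy as np

def usable_channels(initial_sample, min_range_pa=0.05):
    """Exclude channels whose initial-sample range suggests no reaction.

    `initial_sample` is an (s, n_channels) array of raw readings in pA.
    Returns a boolean mask of channels to keep; 0.05 pA is the threshold
    used in the text, the function name is ours.
    """
    channel_range = initial_sample.max(axis=0) - initial_sample.min(axis=0)
    return channel_range >= min_range_pa
```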
The supplementary material shows that in the case of a visually detectable change, the algorithms generally perform very well. The size of the sliding window plays a big role in the accuracy, except for the BOCPD, which does not use a sliding window. For example, the Shewhart Charts, the CUSUM, and the Matrix Form CUSUM often perform better with the largest sliding window. The BOCPD algorithm generally yields acceptable results only for clear change points.
Table 2
The best algorithms for each data set. The column ''quality'' indicates whether a data set was classified as good, bad, or ambiguous. The column ''best algorithm'' contains the names of the algorithms that performed best for the respective data set and window size. The letter ''e'' in parentheses means that the modified algorithm, which uses the exclusion of problematic channels, was the best option. The letter ''a'' means that the algorithm used all channels.

In the case of unclear change points, the MAE scores might not be accurate because the ground truth points are subjectively chosen and might not represent the true ground truth. The second problem is that some channels do not react to certain scents, as seen in Fig. 2(b). However, ground truth points were selected for such channels as well and compared to the results of the algorithms; these points were selected according to the first local peak or by setting them close to the ground truth points of other channels. To address this problem, we modified the algorithms to reject channels that possibly contain only noise. The drawback of this technique is that it might affect the accuracy for channels that do not react immediately to a presented scent. The third problem is that for samples measured from a table, channels 14 and 15 almost always contain binary noise (Fig. 8). We could not extract any valuable information from such channels. Nonetheless, ground truth for such channels was determined, and the found change points contributed to the MAE metric. To address the last two problems, we modified the algorithms to reject channels that did not show much variability within the initial sample. However, by rejecting channels with small variability in the initial sample, we could potentially lose channels that react slowly to a scent. As for the computational demand, all algorithms ran faster than the sampling rate of the ChemPro. The Bayesian Online Change Point Detector was the slowest of the considered algorithms and the Matrix Form CUSUM was the fastest. However, all algorithms were run on a laptop.
Based on the average run times presented in Table 1, it is safe to assume that all algorithms except the Bayesian approach will enable real-time change detection even on embedded devices with less computational capacity, provided the sampling rate is kept at 1 Hz. For the Bayesian approach, whether real-time change detection is possible depends on the exact computational capacity of the embedded device.

Discussion and outlook
In this paper we implemented five algorithms for detecting change points in ion mobility spectrometry readings. These algorithms are very simple and easy to implement. Focusing on simpler algorithms allowed us to evaluate them in a more thorough manner and to assess how suitable such approaches are for supporting classification methods.
The Shewhart Control Charts, CUSUM, and Bayesian Online Change Point Detection algorithms performed reliably but must be run on each channel separately. The BOCPD generally worked well in the case of clearly visible change points. Thus, these algorithms suited our needs and may be used for change point detection in IMS readings. The Multivariate Max-CUSUM and the Matrix Form CUSUM, which calculate all channels simultaneously, turned out to be the fastest and the most reliable of all considered algorithms.
In the case of visually detectable change points, all algorithms performed well. In other cases, the algorithms detected the first peak in the readings, which can possibly be a change point. Using change detection algorithms for automatic detection of the stable phase removes subjectivity from labeling change points, but it might also introduce some false detections, as the algorithms always pick one point as the change point. This drawback was circumvented by adding a pre-processing step that checked whether a channel reacted to a measured scent.
Besides the subjectivity of choosing the ground truth, there was a problem with the data sets measured from a table. Very often in such data sets, channels 7, 14, and 15 showed binary noise. We do not have an explanation for that behavior. Again, a pre-processing step that eliminates such channels can solve this issue.
Generally, all algorithms performed well and detected change points. Based on the results, we recommend using the Matrix Form CUSUM and the Multivariate Max-CUSUM with IMS data. By running the algorithms online, we can automatically separate the transient phase from the stable phase and use features of the transient phase, such as its length, variance, or derivatives, to strengthen and speed up the classification algorithms. Future research on change point detection can use more complex approaches, such as subspace identification [13], neural networks [16], and time series analysis [29].