TONTA: Trend-based Online Network Traffic Analysis in ad-hoc IoT networks

Internet of Things (IoT) refers to a system of interconnected heterogeneous smart devices communicating without human intervention. A significant portion of existing IoT networks falls under the umbrella of ad-hoc and quasi ad-hoc networks. Ad-hoc based IoT networks suffer from the lack of resource-rich network infrastructures that are able to perform heavyweight network management tasks using, e.g., machine learning-based Network Traffic Monitoring and Analysis (NTMA) techniques. Designing lightweight NTMA techniques that do not need to be (re-)trained has received much attention due to the time complexity of the training phase. In this study, a novel pattern recognition method, called Trend-based Online Network Traffic Analysis (TONTA), is proposed for ad-hoc IoT networks to monitor network performance. The proposed method uses a statistical lightweight Trend Change Detection (TCD) method in an online manner. TONTA discovers predominant trends and recognizes abrupt or gradual time-series dataset changes to analyze the IoT network traffic. TONTA is then compared with RuLSIF as an offline benchmark TCD technique. The results show that TONTA raises approximately 60% fewer false positive alarms than RuLSIF.


Introduction
IoT is increasingly recognized as a networking paradigm connecting heterogeneous smart devices without human intervention [1,2]. The heterogeneity can be in different aspects such as energy, processing power, and communication protocols [3]. IoT can be established on two types of network infrastructures: (i) Infrastructure-dependent networks: some fixed devices are in charge of connecting end devices, e.g. routers, switches, and Base Stations (BSs); and (ii) Infrastructure-independent or ad-hoc networks: no fixed node is available to establish the network infrastructure. Instead, some resource-rich end nodes temporarily work together to establish the network infrastructure and connect other nodes.
A significant portion of real-world IoT networks falls under the umbrella of ad-hoc networking. In several IoT applications, the nodes use conventional ad-hoc communication protocols, e.g. Zigbee, or Device-to-Device (D2D)-enabled communication protocols, e.g. WiFi-Direct and Cellular-D2D [4,5]. Fig. 1 shows an example of an ad-hoc IoT network. Considering data forwarding tasks in ad-hoc IoT, existing techniques, e.g. routing and clustering techniques [6-8], have been largely successful in establishing an efficient network infrastructure. On the contrary, in the case of NTMA, ad-hoc IoT is not mature yet [9].
Monitoring and analyzing the network traffic is a key requirement in such networks, as in many other conventional networks. NTMA is a broad topic, including the sub-fields of traffic classification, traffic prediction, fault management, and network security, to name a few. Network Traffic Analysis (NTA) is a sub-field of network traffic prediction [10,11] which plays an important role in monitoring network traffic behavior and efficiency. In addition, it can detect unusual events or gradual departures from normal operation by analyzing network traffic flow characteristics. Generally, given the size and complexity of the data and the dynamicity of the network, NTMA tasks are considered heavyweight.

* Corresponding author at: Department of Informatics (IFI), University of Oslo, Oslo, Norway. E-mail address: am.shahraki@ieee.org (A. Shahraki).
In infrastructure-dependent IoT networks, resource-rich devices, e.g. routers and switches, are able to perform the heavy tasks of monitoring and analyzing the network traffic. On the contrary, in infrastructure-independent networks, most of the infrastructure nodes do not have enough resources to perform such heavy NTMA tasks. Although existing NTMA solutions have good performance, most of them are based on Machine Learning (ML) techniques [9]. ML-based NTMA techniques suffer from some disadvantages in the case of ad-hoc IoT, as indicated below [12]:

• Unlike other ML applications, in network management solutions, ML techniques should be retrained frequently due to the high dynamicity of networks. Events that can cause ML model retraining [13] include daily training [14], changes in analysis requirements (by the NTMA platform itself or a human) [15], changes in the network traffic behavior [16], and the start of a new network traffic flow (the ML model should be trained from scratch).

• ML techniques need a considerable amount of resources to be trained. Training the ML models is also a time-consuming task that cannot support time-sensitive IoT applications.

• Conventional supervised learning techniques are efficient only in the case of labeled data, but in real-world networking applications, most of the data is unlabeled or semi-labeled [17]. Deep Learning (DL) is the most well-known ML technique to work with unlabeled data [18], but it has some drawbacks, mainly that it demands substantial resources and time to train the DL model.
Based on the above disadvantages, ML-based NTMA techniques are considered too resource-demanding for ad-hoc, resource-constrained IoT. To monitor networking efficiency, we need to invest in more lightweight methods. Compared to ML techniques, statistical techniques are more lightweight NTMA methods that generally do not need to be trained. The network traffic characteristics, e.g. delay, throughput and jitter, can be represented as time-series datasets as they are extracted gradually from the network over a period of time. In statistical techniques, pattern recognition methods are the most common solution to discover trends in time-series datasets [19,20]. Network traffic pattern recognition is a broad topic and includes several sub-techniques [21], e.g. anomaly detection and trend change detection. TCD is a solution generally designed for discovering trends in time-series datasets. TCD techniques also include various sub-techniques, e.g. time-series segmentation and Change Point Detection (CPD) [22]. Among the various TCD sub-techniques, CPD techniques are the most common online solutions to identify changes in gradually-generated time-series datasets [23]. CPD techniques process the time-series dataset online to detect Change Points (CPs), i.e. data points that divide the dataset into subsets with predominant trends. Compared to ML techniques, CPD techniques are very lightweight [24]. In addition, they do not need to be (re-)trained.
We remedy the lack of lightweight NTA techniques by proposing a network traffic monitoring technique based on an online TCD technique. The proposed technique determines the behavior of IoT network traffic by analyzing network management characteristics gathered as time-series datasets. The proposed method, called TONTA, can be used by all nodes in charge of forwarding data, e.g. middle nodes and gateways. To analyze the network traffic with TONTA, we consider two steps: (i) determining the boundaries of network traffic behavior changes by recognizing CPs of the time-series dataset, and (ii) identifying the trend between every two consecutive boundaries as the network traffic behavior. The methods introduced in [25,26] are the cornerstone of TONTA, but it includes fundamental improvements compared to the above works. The key contributions of this paper include:

• Proposing a new TCD method that does not need (re)training.
• Designing an NTA method to recognize abrupt and gradual changes in the network traffic in an online manner.

• Formulating a new model for a dynamic-sliding window to process datasets gathered from ad-hoc based IoT networks to improve the time complexity and accuracy of the proposed algorithm.
The rest of the paper is structured as follows. In Section 2, related work is discussed. In Section 3, we present TONTA, while its efficiency is evaluated and compared with Relative unconstrained Least-Squares Importance Fitting (RuLSIF) in Section 4. Finally, Section 5 concludes the paper and highlights future directions. Table 1 shows the acronyms used in this paper.

Related work
There are some survey papers reviewing NTMA techniques, e.g. [9,27]. To support QoS, the network infrastructure should be monitored periodically to detect abnormal or abrupt network traffic changes [28,29]. In terms of traffic prediction, both classic forecasting methods and machine learning techniques have been introduced. As statistical-based methods have lower overhead compared to ML techniques, they have attracted much attention in network monitoring. In [30], the authors evaluate three types of predictors, including classic time-series, artificial neural network and wavelet transform-based predictors, using real network traces. They show that Double Exponential Smoothing (DES), as a classic time-series predictor, offers a good trade-off between performance and cost overhead. In [31], the authors discuss statistical methods for network surveillance from different aspects, e.g. network security, data networks and dynamic networks.
In the case of classic forecasting, Auto Regressive Integrated Moving Average (ARIMA) is one of the most well-known techniques [32]. In [33], Mehdi et al. introduce FARIMA, which combines ARIMA and fuzzy regression to forecast traffic in cloud computing. They reduce ARIMA's need for historical data using fuzzy regression and also propose SOFA, a sliding window, to improve prediction accuracy. The authors in [34] apply fractional ARIMA to predict network throughput in HTTP-based adaptive video. In [35], the authors introduce ARIMA-based modeling for non-Gaussian traffic data to manage traffic congestion by predicting abnormal states. They use R for data preprocessing and ARIMA to analyze and predict the traffic. In [36], Schmidt et al. introduce an unsupervised detection approach for cloud monitoring based on an online ARIMA prediction technique for online data monitoring to detect degraded-state anomalies. Their results show that online ARIMA is efficient for anomaly detection, with high accuracy and a low number of false alarms. In [37], the authors introduce a network traffic prediction technique that combines Discrete Wavelet Transform (DWT), ARIMA and a Recurrent Neural Network (RNN). They use the proposed model to predict network traffic from time-series datasets to improve Quality of Service (QoS) and reduce cost. Statistical techniques are also used to support QoS. In [38], the authors use first- and second-order statistics to propose a metric that predicts the amount of data that can be delivered through a shared band of a spectrum to guarantee QoS statistically. They introduce a Probabilistic Link Capacity (PLC) metric based on a distributed, robust, data-driven approach. They use the estimation errors to evaluate the uncertainty of the statistics. In [39], the authors introduce a network monitoring system based on extensible SDN and NFV. Their proposed monitoring mechanism is based on a statistical data analysis technique that uses NetFlow to collect information through virtual routers.
Anomaly detection, a sub-field of NTMA, is very popular for detecting abnormal changes in computer networks [40,41], but such methods are generally used to improve the security of the network. Anomaly detection methods monitor the nodes and the traffic flows of the network to evaluate its efficiency [42,43]. Although anomaly detection methods are generally used to improve security, monitoring the network to improve QoS is another task they can perform [44]. Flow-based anomaly detection methods determine the behavior of packet streams to detect abnormal or abrupt changes [45]. Several techniques have been introduced to detect anomalies by analyzing traffic flows in a network [46,47].
Although different statistical methods are used to improve networking performance, CPD techniques have attracted particular attention. In [48], the authors propose a model to transform traditional IP flow analysis into stream-based IP flow analysis to improve security and real-time situational awareness with high throughput, low latency, and good scalability. The authors in [49] introduce a two-level model for continuous detection of performance anomalies based on the properties of streams. First, they use an adaptive learning model to estimate an underlying temporal property of the stream. Then, they use robust, statistics-based control charts to recognize deviations. In [50], the authors use a non-parametric model to describe congestion times during a file transfer using a CPD method. They use the proposed method to create time periods, dividing the data into subsets and mapping each subset onto a time period, called a segment, to analyze the subsets.
TCD is also used in anomaly detection techniques. In [51], the authors introduce a method to detect DDoS attacks based on extracting HIAD from IP-based network flows and analyzing it using trend change detection methods. In [52], M. Alkasassbeh proposes a new prediction model to estimate network traffic and uses a CPD method to detect DDoS attacks in computer networks.
In [53], the authors introduce an anomaly detection method for networks based on time variations and the correlation among statistics gathered from the network as time-series datasets. Table 2 compares the literature reviewed in this section.

Proposed method
Different QoS metrics can be gathered from the network to collect network traffic characteristics, e.g. throughput, delay and jitter. We focus on packet delay (briefly, delay) as an important metric in time-sensitive IoT applications. One-way Delay (OWD), also called end-to-end delay, is an important QoS metric to reveal network traffic behavior changes as a time-series dataset [54,55]. A time-series dataset follows various types of trends, but mainly, each of them can be an uptrend, a downtrend, or a side-way trend. In OWD time-series datasets, an uptrend shows that the delay of the packet stream is increasing and QoS cannot be provided after a while. A downtrend shows that the delay is decreasing and some forwarding resources will become free. A side-way trend shows that the packet stream behavior is stable and no important change is happening. TONTA follows the steps listed below to discover the network traffic behavior:

1. As the first step of data pre-processing, before each trigger, TONTA waits for enough new data points (i.e. OWD raw data) to perform a sampling method. After gathering enough raw OWD data, the sampling method is performed to calculate a data point, called a sampled data point. TONTA appends the sampled data to the end of the time-series dataset. We consider that the OWD raw data is extracted by Shallow Packet Inspection (SPI), but performing SPI is out of the scope of this study. When enough sampled data points have been appended to the time-series dataset (Interval_Thr), TONTA proceeds to the next step.

2. TONTA adapts the size of a dynamic-sliding window to select an adequate number of data points from the end of the time-series dataset.

3. As the second step of data pre-processing, a smoothing method is performed on the subset to improve the quality of the data and eliminate the impact of the variety of packet generator distribution functions.

4. TONTA identifies a temporary CP in the selected subset based on a novel CPD method and divides the subset into two new temporary subsets, called Sec_1 and Sec_2.

5. The temporary CP is evaluated by a novel Curve Detection Method (CDM) as the first step of CP verification. Curves are intermediate data points connecting two consecutive predominant trends, but they are not a new trend, as shown in Fig. 3. The CDM helps to avoid selecting repetitive CPs, or any data point within a curve, as a CP.
6. As the third step of data pre-processing, an outlier detection method is performed to remove outliers from Sec_1 and Sec_2 to prepare the subsets for comparison.

7. As the second step of CP verification, the two outlier-free subsets are compared to determine the importance degree of the temporary CP based on the intensities of their trends. If the importance degree of the temporary CP passes a threshold (Imp_Thr), the CP turns into a permanent CP; otherwise TONTA stops and returns to the first step, waiting for new OWD data gathered from the network.

8. If the temporary CP turns into a permanent CP, the intensity of Sec_2 is determined as the current network traffic behavior.

9. After selecting the permanent CP, Sec_1 and all data points before it are removed from the dataset to avoid re-processing.
Fig. 2 shows the activity diagram of TONTA and how its steps identify the network traffic behavior in an online manner. To adapt TONTA to the properties of network streams in IoT, we design it to face the following specifications. Although TONTA identifies the predominant trend changes, some events can cause jitter (temporary trend changes), e.g. the performance of the communication protocols and network congestion, to name a few. In addition, some events can cause short-term network traffic changes, e.g. packet bursting. These short-term trend changes can conceal predominant trends if they happen frequently, appearing as fluctuations in the time-series datasets. TONTA has been adapted to face these network properties through its data pre-processing and CP verification steps. In the rest of this section, we explain all of the aforementioned steps of TONTA.

Collecting the network traffic characteristics
As the first step of pre-processing, TONTA tries to optimize the volume of the data that should be processed. First, a sampling method reduces the volume of OWD raw data to improve the time complexity of the algorithm. Eq. (1) is used for sampling, where r is an OWD raw data point and s is the fixed sampling rate. In addition, to improve the time complexity and avoid unnecessary data processing, the Interval_Thr parameter is introduced. As shown in Fig. 2, TONTA gathers enough new sampled data before processing: every Interval_Thr new sampled data points trigger TONTA. As an example, when Interval_Thr = 20, every 20 new sampled data points trigger TONTA; thereby, if s = 10 and Interval_Thr = 20, every 200 new packets trigger TONTA. The reasons for using the sampling method are listed below:

• Generally, TONTA, as an online method, might be triggered after each new raw OWD data point, but this would cause a large amount of unnecessary data processing and resource consumption. The sampling method reduces the number of times that TONTA is triggered to process a time-series dataset.

• If TONTA were triggered for every few new raw OWD data points, there would be high data redundancy to process in every consecutive trigger. To avoid data redundancy, a large number of new data points should be appended before triggering TONTA.
A big Interval_Thr can increase the average delay of CP discovery, but it also reduces the number of TONTA triggers. The delay is calculated as the time between a CP occurring and it being identified. In addition, a big Interval_Thr may decrease the accuracy of TONTA, because in some cases CPs will be concealed among a huge number of new data points.

Online data analysis by dynamic-sliding window
To determine CPs in an online manner, data points need to be analyzed gradually, on the fly. As explained in Step 2, in each trigger TONTA processes a subset selected from the time-series dataset by a dynamic-sliding window. The subset should not be too big, to avoid consuming a lot of processing resources. In addition, it should not be too small, as this reduces the accuracy of TONTA. To achieve low time complexity and high accuracy, the dynamic-sliding window is proposed, which runs through the time-series dataset from the end and selects a subset in every trigger to process. The size of the window is highly dynamic and is adapted before each trigger based on different factors such as the last permanent CP, data density, and Interval_Thr.
There are two thresholds for the dynamic-sliding window, called Max_Thr and Min_Thr, which denote the maximum and minimum number of data points selected by the window, respectively. Max_Thr limits the size of the window to improve the time complexity and resource consumption. TONTA needs a minimum number of data points to discover the network traffic pattern; Min_Thr is proposed to provide enough data points to process. Algorithm 1 shows the pseudo-code of the dynamic-sliding window for selecting a subset of the time-series dataset. A shrink coefficient is used to improve the time complexity by removing unnecessary data points; it is applied when the number of data points in the window is sufficient but exceeds Max_Thr, and it keeps the size of the dynamic-sliding window between Min_Thr and Max_Thr efficiently. Its default value is 1.5. The subset selected by the window contains n data points.

Determining a temporary CP in a time-series dataset
In this step, a temporary CP in the selected subset should be determined. After appending enough new sampled data points, a smoothing method is performed on the subset to remove jitter and short-term trend changes. It also helps to transform datasets generated by different types of distribution functions into a form that can be processed by TONTA, as explained in Step 3. Eq. (2) is used for data smoothing based on a floating frame whose size is 2*m+1, where m is a parameter to change the size of the frame and x_i denotes the i-th data point of the subset. As an example, if m = 1, the smoothing method is applied over every three sampled data points. A bigger m makes the subset smoother, but too much smoothing can conceal the CPs.
In this step, the boundary data points are kept unchanged, i.e. the first and last smoothed points equal the first and last sampled points. To recognize a temporary CP in each trigger, as the first step, the first-order finite difference of the smoothed data is computed using Eq. (3).
To identify the temporary CP in the subset, we use the standard deviation method to predict the existing data points and calculate the error rate of the prediction. In other words, the existing smoothed data points are predicted using the standard deviation method. Then the predicted data points are compared with the smoothed data points to calculate the standard deviation errors. A novel exponential weighting method is used to give priority to the latest sampled data points. The weighting method helps to treat the more important trend changes that occurred recently as the current network traffic behavior. The weighting method changes the importance of each data point in the subset. Eq. (4) is used to calculate the weight of each data point.
A weight 0 < λ < 1, calculated by Eq. (5), is assigned to the data points of the subset. When λ approaches 1, the latest sampled data points are more important than older ones. On the contrary, when λ approaches 0, the importance of all data points tends to be equal. To identify the temporary CP, we need to give different weights, from low to high, to the smoothed data points and calculate the corresponding error rates. The value of K, determined by Eq. (5), gives the number of times that the data points are compared with different weights. Eq. (6) calculates the second-order finite difference together with the weighting model.
Eq. (7) removes the effect of the finite-order difference, calculates the standard deviation error of each weighted data point, and finally sums up the error rates for each λ.
The λ that yields the minimum prediction error determines the position of the temporary CP in the subset. As an example, if the minimum error appears at λ = 0.81 and n = 500, the temporary CP is the 0.81 * 500 = 405th data point of the subset.
If K = 100, TONTA compares the data points of the subset by giving 100 different weights to each data point. To reduce the time complexity of TONTA, K can be decreased, but the accuracy decreases simultaneously. The choice of K depends on different factors, e.g. the processing power of Node-x (the node performing TONTA), the sensitivity of the network traffic to discovering accurate CPs, and the volume of the network traffic. It is noteworthy that if there is more than one predominant trend in the subset, the last CP is selected, as the subset is weighted from the end.

Curve detection method
In computer networks, NTA methods encounter an important challenge when processing time-series datasets, namely curves. Unlike other applications of pattern recognition techniques (e.g. signal processing), the transition between two consecutive network traffic behaviors happens gradually. Curves are the data points between two consecutive predominant trends that transform the first one into the second one, but they are not new network traffic behaviors individually. Examples of curves and trends are shown in Fig. 3. Thick (black) lines on top of the time-series dataset show different predominant trends of the dataset, and narrow (blue) lines show subsets of the dataset that are curves. To avoid selecting data points of curves as a predominant trend, the Curve_Thr variable is introduced. Avoiding the selection of CPs in curves is considered the first step of CP verification. As shown below, there are two conditions (C1 and C2) that should be satisfied to confirm that the temporary CP is not in a curve. Len(x) returns the number of data points of x, and Pos(y) gives the position of data point y in the time-series dataset. The conditions also use the position of the last discovered permanent CP. The first new subset, called Sec_1, spans from the first data point of the subset to the temporary CP. The second new subset, called Sec_2, spans from the temporary CP to the end of the subset.
This method is called the CDM. C1 helps to avoid identifying curves as predominant trends; C2 helps to avoid selecting a CP located before the last permanent CP.

Verifying the temporary CP
After identifying the temporary CP, TONTA needs to determine the importance degree of the CP, to turn it into a permanent CP or reject it as a false alarm. It is noteworthy that the proposed method always determines a temporary CP in each trigger, but it can be false or insignificant depending on the intensity of the network traffic change. Therefore, Sec_1 and Sec_2 need to be compared to identify how important the change is. As TONTA is very sensitive to outliers, Sec_1 and Sec_2 should be prepared before the comparison. To address this challenge, an outlier detection method is proposed based on giving scores to data points. The assigned score is compared with a threshold to recognize and remove outliers. Eq. (8) calculates the slopes of Sec_1 and Sec_2 from the line y = ax + b, where a is the slope of the desired line and b represents its intercept. Assuming that the line slope is known, the line equation evaluated at the mean of the x-values should yield the mean of the y-values, which fixes the intercept. The score is calculated using Eq. (9) for every data point based on detrending the subset. Finally, Eq. (10) is used to determine whether each data point is an outlier or not. If the result of Eq. (10) for a data point is 1, it is considered an outlier and removed from the subset before comparing Sec_1 and Sec_2.
Outlier(x_i) = 1 if |Score(x_i)| > Mean(Score) + sqrt(Var(Score)) (10)

After removing the possible outliers, Eqs. (12)-(14) are used to compare Sec_1 and Sec_2. Eq. (13) shows the intensity of the difference between the trends of Sec_1 and Sec_2. In addition, the slope of Sec_2 shows the behavior of the trend, i.e. the current network traffic behavior. Eq. (14) gives the final result of the TONTA method in each trigger. If its result equals 1 for a CP, the method turns the temporary CP into a permanent CP. Imp_Thr is a parameter that adapts the sensitivity of the TONTA model for turning a temporary CP into a permanent CP. As an example, if Imp_Thr = 0.5, the difference between the trends of Sec_1 and Sec_2 should be at least 50% for the temporary CP to be selected as a permanent CP. In addition, the slope of Sec_2 can have a negative or a positive value. A negative value shows that the delay of the packet stream is decreasing, and a positive value shows that it is increasing. When it approaches 0, the pattern is side-way, meaning that the delay is stable.
After determining the temporary CP and passing the first and second verification steps, Sec_1 and all data points before it are removed from the time-series dataset. In other words, if the proposed method selects a new permanent CP, the data points before the CP will not be processed again. Furthermore, after selecting a new permanent CP, if the selected subset is bigger than Max_Thr (which happens if Len(Sec_2) + Interval_Thr > Max_Thr), then the extra data points are removed from the start of the subset. Removing the extra data means there are enough data points in the window, so there is no need for more data points to discover the CP. The extra data points are only removed from the window; the original data is kept to be used to determine permanent CPs, as shown in Fig. 2.

Evaluating the performance of TONTA
To evaluate the efficiency of TONTA, we use the Riverbed Modeler (OPNET Modeler) to gradually generate the time-series dataset in an ad-hoc IoT network. Two nodes are the source and destination of a packet stream, and three nodes are middle nodes that apply the effects of data forwarding queues. The forwarding rate of every middle node is 100 Pkt/Sec. The destination node is responsible for gathering the OWD of every packet by SPI and sending it directly to the MATLAB engine, where TONTA is implemented.
Table 3 shows the distribution functions of the packet generator at the source node for 1000 s. The number of raw data points (i.e., the number of generated packets) without using the sampling method is 99,800 in 1000 s. After sampling, 1914 data points are extracted gradually. The sampling rate is constant, s = 50. The table also shows the expected network traffic behavior after each change of the distribution function, using Uptrend (UP), Downtrend (DO), Side-way (S), Very (V), Low (L), Medium (M), and High (H). As an example, we expect that after the 200th second, the behavior of the network traffic changes from side-way to a downtrend (DO) with high (H) intensity, so it is denoted (DO-H). We examined the available public datasets, but to the best of our knowledge, no existing benchmark dataset provides the characteristics of our generated dataset, listed below:

• The dataset provides different network traffic changes, such as abrupt changes, gradual changes, short-term changes, jitter, downtrend-to-uptrend transformations and vice versa, and side-way-to-downtrend and side-way-to-uptrend transformations and vice versa. The generated dataset can represent all possible transformations to evaluate the efficiency of TONTA.

• The packet generator's distribution function changes frequently to provide a dataset based on different distribution functions, e.g. normal, Poisson, constant, and exponential.
As there is no public dataset that mimics all types of transformations and distribution functions, we generated the dataset using the OPNET simulator. The x-axis of all figures in the performance evaluation section is ''Time (s)'', whereas the y-axis denotes ''OWD (10 ms)''. In all figures, the red labels show permanent CPs based on Eq. (14), and the red values show the exact time of every CP.

The effects of TONTA parameters
In this section, we evaluate the effects of all parameters listed in Table 9. As it is impossible to evaluate all possible values of the parameters, we have assigned some default values to them. For each performance figure, the effect of changing the value of the parameter under evaluation is investigated. The default values are Min_Thr = 100, Max_Thr = 300, Interval_Thr = 50, Imp_Thr = 0.5 and K = 100. The reasons for assigning these values are discussed in Section 4.1.1. The following issues should be considered when evaluating the performance of TONTA:

• TONTA needs to determine CPs every 100 s, but due to the effects of curves and forwarding queues, it is practically impossible to recognize CPs at the destination node right after the distribution function changes in the data generator of the source node. After each change in the distribution function of the packet generator, it takes some time for a considerable network traffic change to appear in the OWD of packets at the destination node.

• As an online method, TONTA is not expected to recognize the current network traffic behavior immediately after a CP arises, as it needs some new data points after the new CP to extract the new network traffic behavior.

• As the distribution function of the packet generator changes every 100 s, we expect the methods to identify CPs close to the times of the changes. Considering the effects of queuing in the middle nodes and of curves, we accept alarms up to 30 data points before or after the time of the distribution function change. As TONTA is an online method, it may select CPs before curves; therefore, we accept CPs that are up to 30 data points early.
The effects of Interval_Thr. Table 4 shows the effects of changing the value of Interval_Thr from 10 to 100 in steps of 10. The table shows that assigning a very small value to Interval_Thr makes TONTA get stuck in short-term trend changes, because it has no time to wait for new data points. In addition, a small Interval_Thr can have a strong negative impact on the performance of TONTA, because TONTA must process the same data points several times in each trigger. A large Interval_Thr, conversely, conceals the CPs among the high volume of new data points.
As shown in the table, the number of false-positive alarms is very high when Interval_Thr is small, whereas TONTA misses CPs when Interval_Thr is big.
The effects of the comparison parameter. This parameter improves the accuracy and time complexity of TONTA by reducing the number of times data points must be compared to detect a CP in a subset. Fig. 4a shows that some CPs are missed when the parameter is set to 25. For example, at the 350th and 850th seconds there are abrupt changes that TONTA cannot recognize.
The value of this parameter is related to the density of the data: it cannot be very small, because that conceals CPs, and it cannot be very large, because that negatively affects the time complexity. Figs. 4b and 4c show that increasing its value improves the accuracy of TONTA. As shown in Table 5, increasing the value reduces the false-negative alarms and increases the true alarms. The results show that a higher value improves the accuracy of TONTA but simultaneously increases its time complexity.

The effects of dynamicity of the window size.
To analyze the effects of the window size, specifically Min_Thr and Max_Thr, TONTA has been run more than 50 times with different parameter values. The value of Min_Thr ranges from 50 to 300 and the value of Max_Thr from 100 to 600, both in steps of 50. Fig. 5a shows the results for small Min_Thr and Max_Thr whose values are close to each other. As shown in the figure, TONTA is unable to recognize abrupt changes, and the number of false alarms is very high. The reason is that, with two close values assigned to these parameters, the dynamic-sliding window removes older data points very quickly to provide enough space in the window for new data points. Thus, TONTA has little chance to compare new data points with previous ones. Fig. 5b shows that when Max_Thr is considerably greater than Min_Thr, the results are highly accurate. With a high Max_Thr, TONTA does not remove data points for a long time and compares new data points with old ones several times, which increases the time complexity of the algorithm. Although a high Max_Thr can improve the results, the time complexity of TONTA then becomes a challenge. Fig. 6 shows that, when both Min_Thr and Max_Thr are high, TONTA is unable to identify predominant trends based on Table 3. The reason for missing the CPs is that a high number of data points in the window can conceal the CPs. Therefore, a smaller Min_Thr is needed to trigger TONTA.
The effects of Imp_Thr. Imp_Thr is also used to change the sensitivity of TONTA in identifying CPs. Smaller values of Imp_Thr help TONTA recognize low-intensity changes as CPs, whereas higher values cause a change point to be considered a permanent CP only if it results from an abrupt change. Fig. 7a shows the results of TONTA for Imp_Thr = 0.2. As shown, most of the selected CPs connect two trends with no big difference between their intensities. The main reason a smaller Imp_Thr cannot identify the right CPs is that TONTA selects short-term changes and curves as predominant trends. Selecting these trends can conceal the real predominant trends based on CDM. Fig. 7b shows the results for Imp_Thr = 0.8, where only abrupt changes are considered permanent CPs and gradual changes are concealed by other data points. Fig. 7c shows the results when TONTA selects only changes for which the difference between the intensities of the two consecutive trends is 100%. As shown in the figure, only abrupt changes are selected, as TONTA is unable to detect short-term and predominant trends in this case.

Optimal values of parameters of TONTA
Based on the results of evaluating the different parameters, we have extracted the rules listed below:
• If the number of outliers removed in the second step of CP verification is high, then Imp_Thr should be small. Conversely, a small number of outliers shows that the packet stream cannot have many jitters. Generally, 50% is a good value to cover all possible situations.
• Interval_Thr is related to Min_Thr, where the best results are obtained for Interval_Thr × 2 ≈ Min_Thr.
Since the time-series dataset is gathered gradually in an online manner, TONTA can recognize important changes without needing to wait long after each CP. The time between a CP arising and its identification is related to the Interval_Thr and Max_Thr parameters. In TONTA, a CP should be determined within at most Max_Thr new data points after it arises; otherwise, the CP will be removed. Table 6 shows details of selecting the permanent CPs based on Fig. 8. The table shows how many data points are appended to the dataset before a CP is identified.
As the final results of TONTA, the table shows the network traffic behavior between every two consecutive CPs of Fig. 8 based on Eq. (13). The number of data points appended before a CP is identified is an important metric showing the delay of TONTA. As an example, between the 143rd and 190th seconds there are 111 data points, but TONTA recognizes the CP at the 190th second, after appending 19 new sampled data points. The network traffic behavior is also approximately 0, which means that the delay is stable. Table 6 shows that TONTA, as an online method, is fast enough to recognize important changes. In most cases, TONTA identifies CPs within about ½ × Interval_Thr new data points. The average number of data points after a CP is 23.5, where Interval_Thr = 50. The results show that the delay of TONTA in identifying CPs is related to Interval_Thr. If the value of Interval_Thr is very small, TONTA cannot discover the CPs right after appending new data points and needs to wait for more data; the current trigger is then useless and consumes resources without any significant result. Moreover, the value of Interval_Thr should not be very high, as that conceals the CP among a high volume of new data points.

Benchmarking
We compare TONTA with RuLSIF [56] as a benchmark TCD algorithm for processing time-series datasets. RuLSIF has been widely used for TCD in different applications, e.g. computer networks [57], healthcare [58], and daily activity segmentation [59], to name a few. In addition, several state-of-the-art TCD techniques have been compared with RuLSIF to evaluate their performance [23,57,60–63]. It is noteworthy that RuLSIF, as an offline method, needs the whole dataset at once. Although comparing an offline and an online method is not the best option, it can demonstrate the robustness of TONTA. In addition, RuLSIF has been selected for comparison with TONTA because of their similarities; e.g. both methods use a running window and try to reduce overlapped data points. RuLSIF is used to assess the ratio of relative density among subsequences of a time-series dataset, using the divergence of data points to measure the relative density. Three parameters are important for adapting RuLSIF: the number of overlapped windows, the size of each overlapped window, and the relationship among data points. If the data points in a time-series dataset are highly related to each other, a higher value of the latter parameter is needed. Using a supervised learning method, we have extracted the values 100, 10, and 0.1 for these three parameters, respectively, based on our generated time-series dataset.
Fig. 9 shows the results of the RuLSIF method. The red peaks show the scores that RuLSIF assigns to each data point. A score of 0.1 is used as the threshold to recognize CPs; therefore, most of the peaks are considered change points. Table 7 shows the results of RuLSIF in detail. We use the same method as in TONTA to calculate the network traffic behavior between two consecutive CPs for RuLSIF.
As shown in Table 7, there are some absolutely false alarms, in which RuLSIF detects a CP very close to another CP. Comparing Tables 3, 6 and 7 shows that TONTA has a high accuracy compared to RuLSIF. Although it is difficult to calculate the exact accuracy of both methods because of the queuing delays in middle nodes, comparing the tables shows that TONTA detects better CPs than RuLSIF based on the expected network traffic behaviors mentioned in Table 3. As an example, both TONTA and RuLSIF are expected to detect the network traffic behavior between the 600th and 700th seconds as an uptrend with high intensity. TONTA detects two CPs, at the 601st and 737th seconds, and the network traffic behavior between them is 0.09696, showing that the delay is increasing. As the maximum network traffic behavior is 0.5, the result shows a roughly 20% increase in the delay of every packet, which is high. RuLSIF detects four CPs, at the 602nd, 603rd, 652nd, and 697th seconds, and the network traffic behaviors between each two consecutive CPs are 0.26, 0.04, and 0.07, respectively. Even if we consider the average of these values, 0.123, the number of false alarms is very high: we change the rate of the packet generator once but receive four alarms from the CPD model. As the number of false alarms in RuLSIF is very high, they conceal the important network traffic changes and make it hard to trust the results. In addition, the divergence of each two consecutive changes that TONTA selects is higher than that of RuLSIF, based on comparing the Difference field in Tables 6 and 7. The comparison shows that TONTA selects more important network traffic behaviors, as there are some small values, e.g. 2% and 5%, in the results of RuLSIF. Comparing TONTA and RuLSIF shows that the accuracy of TONTA is higher based on the positions of the CPs recognized in a common time-series dataset. We consider three types of alarms to compare the efficiency of TONTA and RuLSIF, as shown in Table 8. As the application of TONTA is NTA, false alarms can highly affect the performance of the network.

Time complexity of TONTA
Based on Section 4.1, we evaluated the effect of the parameters on the accuracy of TONTA. In addition, we calculate the time complexity of TONTA to show how the proposed method performs on resource-weak machines. The time complexity of the sampling method is O(1) and that of the smoothing method is O(Max_Thr). Detecting a temporary CP costs O(Max_Thr · c), where c denotes the comparison parameter. The time complexity of outlier detection is also O(Max_Thr); therefore, the whole time complexity of TONTA equals O((2 · Max_Thr) + Max_Thr · c). The constant 2 is removed from the time complexity; thus, the final time complexity is O(Max_Thr + Max_Thr · c).
Table 9 shows the different parameters of TONTA and their effects on the time complexity of the proposed method and on its accuracy in terms of TCD, based on Section 4.1.

Øystein Haugen is a professor at Østfold University College. His research focuses on cyber-physical systems, variability modeling and software product lines, object-oriented languages and methods, software engineering, and practical formal methods.

Fig. 3. Example of the Trends and Curves in a time-series dataset. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


Fig. 8. The results of TONTA for the default parameter values.

Fig. 9. The results of RuLSIF with the number of overlapped windows set to 100, the window size to 10, and the relationship parameter to 0.1. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Amin Shahraki was born in Mashhad, Iran, in 1988. He received his bachelor's degree in computer software engineering and his master's degree in computer networks in 2009 and 2012, respectively. He received his Ph.D. degree in computer networks from the University of Oslo, Norway, in 2020. During his Ph.D., he also worked as a research assistant at Østfold University College on a smart buildings and welfare project. He was a visiting researcher at the University of Melbourne, CLOUDS Lab, from Aug. 2019 to Feb. 2020 under the supervision of Professor Rajkumar Buyya. He has been a member of the National Elite Foundation in Iran and of the IEEE since 2011. His current research interests are the Internet of Things, cellular networks, cognitive communications and networking, network behavior analysis, time-series analysis, clustering, QoS, and self-healing networks.

Amir Taherkordi is an Associate Professor in the Networks and Distributed Systems (ND) group at the Department of Informatics, University of Oslo. Amir received his Ph.D. degree from the Informatics Department, UiO; his Ph.D. thesis was titled ''Programming Wireless Sensor Networks: From Static to Adaptive Models''. He has experience from several RCN and EU projects and serves as a reviewer for many high-ranking conferences in the area of distributed systems. Amir's research interests lie broadly in the areas of distributed computing, software engineering, adaptation middleware, networked embedded systems (WSNs, IoT, and CPS), and Internet/Web engineering. He recently received a prestigious grant from the Norwegian Research Council under the Young Research Talents scheme to work on a novel IoT service computing model for future large-scale and dynamic Fog-Cloud systems.

Table 1
List of abbreviations.

Table 2
Comparing the literature.

Table 3
Distribution functions of the packet generator at the source node used to generate the evaluated packet stream.

Table 4
The effects of changing Interval_Thr on the results of TONTA.

• Based on the comparison parameter and the window size, TONTA needs to analyze a minimum number of data points equal to Min_Thr in each trigger, a number of times equal to the comparison parameter. The value of this parameter therefore needs to be adapted to the value of Min_Thr to optimize the time complexity: if Min_Thr is very high, the comparison parameter should be small; conversely, if Min_Thr is not very high, the comparison parameter can be a bit bigger to improve accuracy. There is thus a trade-off between the comparison parameter and Min_Thr.
• As shown for the window size, Min_Thr has a high correlation with Max_Thr, and their values should not be very close. In addition, big values of Max_Thr can cause inefficiency in terms of time complexity.
• The best results for Interval_Thr emerge when Interval_Thr = 40, which is approximately equal to Min_Thr/2. In the case of a very big Interval_Thr, important trend changes are concealed by curves. The results shown in Table 4 indicate that the best value for Interval_Thr is half of Min_Thr, improving the time complexity as well as the accuracy.
• The value of Imp_Thr can be adapted based on the number of outliers removed in the second step of CP verification: if the number of outliers is high, Imp_Thr should be small.

Table 6
Network traffic behavior analyzed by TONTA and the delay of TONTA in recognizing CPs, based on the default scenario.

Table 7
The results of RuLSIF based on Fig. 9.