Characterisation: Third-Order Statistics & Higher-Order Spectra for Precise Trafﬁc Modelling

Undoubtedly, the characterisation of network traﬃc ﬂows is vitally important in understanding the dynamics of Internet traﬃc and in appropriately dimensioning network resources for network and systems management. The vast majority of modelling techniques developed for volume-based traﬃc proﬁling (based on packet and/byte counts) imply the statistical assumptions of stationarity, Gaussianity and linearity, which are often taken for granted without being explicitly validated. In this paper, we demonstrate that such properties are often not applicable due to the high ﬂuctuations in Internet traﬃc, and should therefore be validated ﬁrst before they are assumed. We employ Time-Frequency (TF) representations and the Hinich algorithms for validating these three modelling assumptions on real backbone and edge network traces. We show by conducting a passive, oﬄine statistical analysis on real operational network traﬃc traces from both backbone and edge links that link traﬃc is extremely dynamic irrespective of the level of aggregation and that model characteristics vary. Subsequently, we propose the use of a representative of higher order spectra, the bispectrum, to act as a particularly suitable method for volume-based traﬃc proﬁling due to its ability to adapt to diﬀerent underlying statistical assumptions, as opposed to ARIMA timeseries models that have been typically used in the literature. We demonstrate that the bispectrum, a signal processing tool that has so far been used in the area of image processing and acoustic signals, can be exploited to accurately characterise traﬃc volumes per transport protocol, and can therefore contribute to ﬁne-grained network operations tasks such as application classiﬁcation and anomaly detection.

Understanding the underlying transport-layer traffic behaviour of backbone and edge networks is vital for traffic engineering tasks such as link capacity planning, traffic classification, and anomaly detection [1,2,3,4,5]. Traffic characterisation is typically addressed through statistical analysis of individual link(s) and network-wide traffic volume properties such as counts of bytes and packets [1,2,6,7], as well as by analyzing the distributional behaviour of particular packet header fields [1,4,5,8,9,10].
In the literature, numerous statistical and signal processing techniques have been proposed to construct traffic models. A number of studies attempt to determine the longterm network-wide behaviour based on past measurements using wavelet filters, Holt-Winters forecasting, Analysis of Variance (ANOVA) and Autoregressive Integrated Moving Average (ARIMA) methods (e.g. [6,11,14]). At the same time there are volume-based schemes that characterize network behaviour by optimizing the traffic matrix estimation problem (e.g. [7,12,13]). Having as a basis pre-captured IP packet data and Simple Network Manage-ment Protocol (SNMP) counters from Points of Presence (PoPs) aggregated and mapped as Origin-Destination (OD) flows, such methods can derive general models to represent overall network traffic demands and routing. Throughout past literature, the most influential theoretical propositions underpinning the traffic matrix problem have been the network tomography Gaussian models provided by Cao et al. [15], GoldschmidtâĂŹs deterministic linear models derived by Linear Programming (LP) methods [16], and Zhang et al.'s optimized gravity model [12]. Other notable work includes the linear state space modelling [17], and the joint model of Hidden-Markov-Model (HMM) and Gaussian distributions [18].
Apart from the traffic matrix approach, there have been studies mapping application-related traffic demands to the transport-layer traffic intensity observed on backbone links. For instance, one of the first significant studies by Fieldmann et al., employed a wavelet-based approach to show that scaling properties of Wide Area Network (WAN) traffic are linked to the high activity of Web flows [19]. Perapplication traffic characterisation studies have also used signal processing, graph-theoretic and machine-learning approaches to classify application layer activity [5,8,20]. Within anomaly detection, the work presented in [21] proposed a non-Gaussian model for characterizing the volume A C C E P T E D M A N U S C R I P T of unidirectional flows traversing a backbone link. In contrast to the studies mentioned above, the authors in [21] indicate that the strongly asymmetric traffic profile at a transit or backbone link forbids the employment of the well-known OD approach as well as the analysis of bidirectional flows. Moreover, the authors in [22] demonstrate that in order to map the aggregate Internet traffic volume under a Gaussian distribution several penalties need to be employed on a given model that surely do not capture the actual dynamics persisting in the examined measurements.

Problem Definition
Most of the above parametric solutions assume a complete knowledge of the probability distribution of the traffic volume, either on a network-wide, single PoP, or link volume traffic characterisation. Hence, they have incorporated mathematical models based on the de facto statistical assumptions of stationarity, Gaussianity and linearity. In general, these statistical properties determine whether the traffic volume may be represented as a stochastic process where 1 st -and 2 nd -order moments such as mean and variance do not change throughout time (i.e., are stationary). In parallel, the traffic volume is assumed to be modeled under a Gaussian fit that follows a normal distribution. Moreover, if the observations that compose the traffic volume (in our case byte/packet counts of a flow) have a linear relationship with the preceding or following observations then it can also be represented as a linear state model since it complies with the linearity assumption.
Nevertheless, the majority of studies within the realms of Internet traffic characterisation assume these three properties without rigorous validation, whereas others as in [6,13,23] use 2 nd -order statistics (i.e. mean, variance, autocorrelation sequence, power spectrum) to validate these assumptions. However, 2 nd -order statistics are problematic in validating timeseries properties such as stationarity, linearity and Gaussianity. As explicitly demonstrated in signal processing studies presented in [24] and [25], 2 nd -order statistics suppress phase characteristics such as the phase magnitude thus they are unable to capture phase transition peaks. In networking terms, phase transition peaks relate with the adequate capturing of the exact timing and duration of traffic fluctuations occurring in a network [26]. Consequently and as we show in this paper (section 4.4), any further modelling based on the three assumptions, without a pre-validation process, leads to questionable accuracy and hence this naturally leads to inaccurate interpretation of network traffic dynamics.

Contributions
In this paper, we focus on the rigorous validation of the statistical properties of stationarity, Gaussianity and linearity using 3 rd order statistics, which have been used in the past to model queuing performance [16] and packet interarrival time processes [5]. Here, we use 3 rd order statistics for volume-based traffic modelling based on byte and packet counts in unidirectional traffic flows on an aggregate as well as per-protocol basis. Our work sheds new light in the domain of Internet traffic characterisation through demonstrating the applicability of techniques that are typically used in other domains (e.g., image processing and acoustic signals processing) such as Time-Frequency (TF) representations and 3 rd order statistics estimated by the Hinich algorithms and the bispectrum [24,25,27].
We highlight the main contributions from our study: (1) We first empirically show that the three often-used, de-facto assumptions of stationarity, Gaussianity, and linearity should be rigorously validated before being applied, as they do not hold universally. The empirical analysis exhibited in this work relies upon real backbone and edge traffic traces and the employment of higher order statistics by the Hinich algorithms. In particular we have found that: • The non-linear behaviour of the instantaneous frequency and group delay, in all of our datasets on a protocol-specific and aggregate level in both packet and bytes, indicates a highly non-stationary persona in the examined traffic traces.
• All of our datasets on a transport-layer protocol, aggregate volume basis either on a byte or packet-based analysis, indicated to follow non-Gaussian properties.
• In the majority of cases and even in a small temporal interval in the same day (e.g., 30 min.), different transport-layer protocols do not comply with the same modelling assumptions (e.g., Gaussianity, linearity). Therefore their independent, per-protocol analysis is crucial for several traffic characterisation domains such as anomaly detection.
• Even in the process of independent protocol analysis, packets and bytes should be separately profiled since their distributional behaviours exhibit opposing and varying statistical properties with respect to linearity.
• We observe that traffic on the transport layer has many sudden changes within a small time period (e.g., 15 min.) and we also evidently show that the linearity assumptions often fail. Thus, it is essential to reconsider their validity at the pre-modelling stage, since they constitute the underlying basis for selecting a correct traffic characterisation scheme.
• With the use of measurements that have a 10 year gap between them (e.g., 2006-2016) we show that the Internet traffic volume on backbone links from an aggregate or protocol-specific viewpoint still holds the same highly non-stationary and non-Gaussian properties where linearity is evident in some instances.
(2) We show that methods derived by 2 nd -order statistics, commonly used for traffic profiling such as ARIMA models and the power spectrum (e.g. as in [6,7,10,14,15,19,21]), often fail to detect traffic fluctuations throughout A C C E P T E D M A N U S C R I P T the observational time frame due to their explicit dependency on assuming the stationary and linearity property for the traffic volume. Apart from not being able to capture phase information, such techniques are also unable to accurately localize frequency fluctuations on the time domain. As we show in this work, such sudden frequency changes undetected by 2 nd order statistics are mapped to sudden traffic changes triggered by adverse or malicious events of which network operators need to be aware.
(3) In order to overcome the limitations of the 2 ndorder techniques, we propose and validate the use of higher order spectra with the use of the bispectrum [24,25]. The bispectrum is a tool for examining a timeseries' 3 rd -order statistical properties which captures phase information and therefore can adequately map traffic fluctuations on the time-frequency plane. We compare the bispectrum with its commonly used 2 nd -order counterpart, the power spectrum [24,25], and show that our proposed bispectrum offers a much better resolution on the TF plane leading to higher accuracy in characterizing the behaviour of aggregate traffic volume.
(4) We demonstrate the bispectrum's applicability for practical network capacity planning purposes, because of its ability to adapt to different underlying statistical assumptions, which makes it ideal for link traffic volume peak analysis. On the other hand, we show that traditional methodologies such as the ARIMA timeseries (e.g. as in [6,11,13,14,23]) may not be in a position to precisely detect such peaks affecting the overall capacity planning process, due to their explicit dependency on assuming the stationary and linear property for the traffic volume.
The remainder of this paper is structured as follows: Section 1 describes the traffic traces used for our analysis while Section 2 describes the theoretical background and the results obtained from the stationarity test. Section 3 presents the theory behind the Hinich algorithms that underpin the validation of linearity and Gaussianity, and presents the results. Section 4 compares 2 nd -and 3 rd -order statistics when applied on aggregate traffic volume, and discusses the practical benefits offered by the proposed bispectrum method, especially in the cases of fluctuating traffic demands. It also elaborates on the benefits of perprotocol independent modelling and shows the advantages offered by the bispectrum. Finally section 5 summarizes the contributions of this work and concludes the paper.

Data Description
This section is dedicated to presenting and describing the data used within our experimentation. Our analysis is based on unidirectional traffic flows extracted from four anonymized packet payload traces collected in the US and Japan, as shown in Table 1. The EQUCH dataset is a subset of a larger CAIDA dataset 1 collected in 2016 at the 1 CAIDA anonymized Internet Traces 2016:http://www.caida. org/data/passive/passive_2016_dataset.xml Equinix 2 datacenter in Chicago, IL, that is connected to a 10 Gb/s backbone link of a Tier-1 ISP between Chicago, IL and Seattle, WA. The WIDE trace was collected on a 100 Mb/s US-Japan Trans-Pacific backbone link carrying commodity traffic for WIDE member organizations 3 . The Keio traces were captured on an 1 Gb/s Ethernet link from the Japanese academic network of the Keio University's Shonan-Fujisawa campus. With the use of CAIDA's Coral-Reef tool 4 we grouped the captured packets into their corresponding network flows and then computed per unidirectional transport flow (i.e. TCP, UDP, ICMP) volume statistics (e.g. counts of bytes, counts of packets).
From an application point of view, the EQUCH trace was dominated mainly by WWW(i.e. HTTP/HTTPS) traffic flows but also some considerable amount of unclassified TCP and UDP flows that we speculate were related to game traffic. In addition, the EQUCH trace is also comprised of DNS, RTMP, IPSEC, SSH and also game-related traffic with the QUAKE protocol. On the other hand and as presented in [5], the -10 years older-WIDE trace is mostly comprised of DNS flows, being followed by WWW (i.e. HTTP/HTTPS), FTP as well as unclassified traffic (by the payload-based crl_pay classifier [5]) and general network operation traffic such as NetBios, NTP, SNMP and Spamassasin. The Keio-I trace includes FTP, WWW, DNS, and Email traffic. In addition, some scanning attack traffic and a percentage of unknown traffic flows were observed. The Keio-II trace contains DNS, FTP transfers, WWW and streaming media traffic related to applications such as Realplayer, Windows Media Player and Quicktime, as well as P2P flows (e.g. BitTorrent). Though our analysis in this paper is primarily based on decomposing transport traffic into byte and packet counts of TCP, UDP, and ICMP traffic, still the more fine-grained application-based view helped us perform drill-down analyses on observed transport protocol behaviour and identify causal applications or attacks (section 4.5).

Stationarity Test
This section presents the experimentation conducted for validating the stationarity assumption in our datasets. It firstly presents via section 2.1 the signal-oriented principles of instantaneous frequency and group delay, which are used as the metrics for validating stationarity on the byte/packet count timeseries. Subsequently, section 2.2 demonstrates and discusses the results obtained by our stationarity analysis.

Methodology
Traditionally in statistics, a stochastic process is said to be stationary if there exists time invariance between A C C E P T E D M A N U S C R I P T observations. From a signal processing point of view, a signal is considered to be wide-sense stationary if its decomposition can be expressed as a discrete sum of sinusoids (i.e. as a sum of elements that have constant instantaneous amplitude and instantaneous frequency) [25]. Apart from keeping a constant mean and variance, a wide-sense stationary signal g(t) should also be described by an autocorrelation function (i.e. ACF) E[g(t 1 )g * (t 2 )] that only depends on the time difference t 2 − t 1 . In simple words the ACF relies on a single time lag and does not change with the time at which the function was calculated.
If we wish to non-parametrically characterise a widesense stationary signal on the Time-Frequency (TF) plane, it is firstly required to be in its analytical complex form. Under this form, it is expected that the sum of elements composing that signal should keep a constant amplitude and instantaneous frequency respectively as depicted in Fig. 2. In case the signal does not follow any of these constraints, it is considered as non-stationary. Let g(t) denote a signal representing our byte/packet count timeseries and its complex form G a (t) derived after a Hilbert transformation where G a (t)=g(t)+iH [g(t)] [25]. Its absolute value |G a (t)| gives the magnitude of change in the signal (amplitude) in bytes/packets for a given time t. Since the instantaneous peak in the signal is known, through the instantaneous amplitude we can use equation 1 to get a measure of the instantaneous frequency f (t): In our case, f (t) denotes the amplitude of frequency we observe in one count of a packet/byte arrival at a particular time t. We also use the group delay which shows the local time behaviour with respect to the frequency function (i.e. time distortion caused by the signal's instantaneous frequency). The group delay t G (v) is computed by solving equation 1 with respect to the Fourier transform of G a (t), F a (v): The definitions provided by the above equations are vital for our validation regarding the stationary behaviour in the byte/packet count timeseries. Hence, we are particularly interested in identifying whether they keep a constant and linear behaviour. As shown next, we observe the characteristics of these metrics and conclude on the inexistence of stationarity.

Stationarity Test: Results
This section presents the results obtained from the mathematical equations illustrated in section 2.1 for validating stationarity. Under the assumption of our traffic signal being stationary and given the definitions of instantaneous frequency and group delay in section 2.1, we anticipate that each independent byte/packet count timeseries would exhibit unique instantaneous frequency and group delay values. This intuition relates to the assumption that each protocol would behave as a mono-component signal either from the packets or bytes analysis. A mono-component signal behaviour relies on the fact that the counts of both packets and bytes numerically differ and their counts enforce modulated amplitude with complex sinusoidal signals under different time periods. Therefore resulting with a distinct time instant with a single dominant amplitude.
In simple networking terms, such a single dominant amplitude is translated as the case where the volume aggregate of protocol flows (e.g., TCP flows) on a given time instant indicates only a single peak initiated by a particular byte/packet count of a single flow, not by multiple flows. Thus, we cannot have multiple flows indicating a peak on the aggregate but only one which holds the highest byte/packet count. In order to assume a stationary behaviour of our mono-component signal, we expected that each unique frequency which relates with a single amplitude (i.e., peak initiated by a single, most dominant flow) alongside the rest of the subsequent estimated frequencies would have a constant and close-to-linear evolution in time. Similarly, group delay should present a normalized frequency close to linear with respect to time.
However, as depicted by Fig. 1, none of the anticipated outcomes of constant or linear characteristics for group delay and instantaneous frequency is observed, thus implying a non-stationary behaviour in our datasets. Due to the fact that all the results in all the three tested protocols lead to the same conclusions in all our traffic traces, we restrict this section on presenting results obtained from the Keio-I trace and particularly the stationarity analysis on TCP. Fig. 1 presents the evaluation conducted on Keio-I for TCP from the perspectives of byte and packet counts. The four upper plots in 1 show the behaviour of the instantaneous frequency and group delay on TCP packets/bytes when the initial packet/byte timeseries are not differenced whereas the bottom plots illustrate the same analysis when the packet byte timeseries are differenced up to the third order. At first glance, it is fairly obvious that in all cases, multiple instantaneous frequencies exist within the same time instant. In addition, the group delay outcomes indicate that   every signal suffers from time distortion and phase delay as shown by the unstable, non-linear graphs produced. Naturally, the time distortion indicates that the exact capturing of the instantaneous frequency which is related with the instant amplitude of the signal is not correctly adjusted on the TF plane, thus time shifts of individual components (i.e., byte/packets counts of a single flow) are not properly calibrated. These outcomes contradict the assumption that the datasets are of a mono-component nature, since there are different instantaneous frequencies (i.e., multiple amplitudes composed by various flows) and time distortions in the same time instant; therefore the signals have a multi-component nature.
In order to remove some of the distortion and possibly get closer to stationary characteristics we have used a common data analysis technique by differencing our timeseries (e.g., as in [6,14,16]). We have examined our signals up to the 3 rd order of differentiation but, as the four bottom plots of Fig. 1 show, the general conclusions with respect to the non-stationary persona of the byte and packet timeseries remain the same. A comparison between the results gained for the non-differenced series (four top plots of Fig. 1) with those attained for the differenced series shows an insignificant reduction of distortion, but still there are random simultaneous frequencies on both byte and packet counts. The outcomes of the four bottom plots in Fig. 1 clearly show that in certain cases the time and frequency distortion is eliminated to a certain extent, but still all signals are structured by several formants. For instance, the group delay graph for the differenced series of TCP bytes illustrates that at 500 seconds there are two different instantaneous frequencies for the TCP byte count signal (indicated with arrows -bottom right plot). Hence, the initial conclusion with respect to the non-stationary, multi-component nature of our protocols is still valid even after differentiating the datasets.

Linearity & Gaussianity Tests
This section consists of two main parts. Section 3.1 introduces the methodology employed for validating Gaussianity and linearity. In particular, section 3.1 and 3.2 elaborate upon the concepts of cumulants, bispectrum and bicoherence since they constitute the basic elements of the Hinich algorithms [27], which are used for assessing Gaussianity and linearity in our datasets. Section 3.3 illustrates and discusses the results obtained. Through our analysis, we validate (i) whether traffic flows on the aggregate and on a protocol-specific basis can be adequately fitted under a Gaussian fit and (ii) if the frequency of flow occurrences on the time domain exhibits linear dependencies. We show that each transport layer protocol independently exhibits different statistical characteristics with respect to Gaussianity and linearity within reasonably small time bins.
The identification of linear and Gaussian behaviour has been formulated by employing computational algorithms provided by Hinich [27]. The algorithms are suitable for accurately validating whether a signal's timeseries has linear and Gaussian characteristics. This capability exists due to their direct assessment of a timeseries' 3 rd order statistics, serving as the means for profiling phase properties which are not detected by 2 nd order statistics (e.g., mean, variance, autocorrelation sequence) [24,25,27]. The Hinich algorithms mainly expose the 3 rd order cumulant characteristics of a signal as indicated by the bispectrum and bicoherence values [24,25,27]. In this subsection we introduce the concepts of cumulants, bispectrum and bicoherence, as well as some of their composite statistical features and how these are mapped to our experimental analysis.

From moments to cumulants and polyspectra
Moments give a quantitative measure of the linear combination of points in the distribution of a random process. On the other hand, cumulants denote the non-linear combinations of points within the distribution for a random process g(t). Traditionally, the estimation of the wellknown power spectrum which defines the spectral density of a process is derived from the Fourier Transform (FT) of the 2 nd order moment sequence (i.e. autocorrelation sequence). Due to the elaboration on Kolmogorov's moment definitions provided by Rosenblatt [24], the power spectrum (also known as the 2 nd order polyspectrum) can be defined in terms of cumulants instead of moments. Subsequently, we can express the power spectrum S(ω 1 ) with respect to the 2 nd order cumulant sequence c 2 (τ 1 ) as: On the other hand, the topic of interest in the Hinich algorithms is the estimation of the bispectrum which is the 3 rd order polyspectrum. According to [24], higher spectra of order N 3 are strictly expressed by cumulants. The bispectrum provides a dual-frequency representation on the time-frequency (TF) plane in contrast to the onedimensional interpretation of the power spectrum, and as shown by equation 4, it is considering the 3 rd order cumulant sequence c 3 (τ 1 , τ 2 ) as its basic tuning function: Work in [24] and [25] suggest that the bispectrum provides a much finer level of granularity for detecting and further characterizing a non-linear signal than the power spectrum. The reason is that the power spectrum directly depends on the autocorrelation sequence, whereas the bispectrum deals with the 3 rd order cumulant sequence. In addition, the bispectrum can extract information related to the minimum phase shifts that take place on a non-linear multi-component signal. As we show in section 3.3 this is the case for Internet traffic and particularly on per-protocol counts of bytes and packets where most of them exhibit different phase properties and in many cases, a non-linear property.
Based on the aforementioned characteristics, we propose to use the bispectrum as the basic means of characterizing our traffic streams in order to verify the assumptions of linearity and Gaussianity. Due to the fact that we could not obtain a crisp understanding regarding the probability distribution function of our observed timeseries we apply a non-parametric estimation of the bispectrum [29]. Used alongside bicoherence that we explain next, the bispectrum also reveals other distributional properties such as the skewness of our dataset. Specifically, the Gaussianity test is commonly mentioned as the "zero-skewness test" since theoretically a dataset exhibiting zero skewness is considered to be Gaussian [29]. Although skewness is derived from the 3 rd order cumulant sequence (as the bispectrum and bicoherence are), zero valued skewness is only valid if it satisfies the requirement that our dataset shows zero mean, which in our case is not. Therefore, we consider it more appropriate to use the bispectrum and its squared normalized version (i.e. bicoherence) which do not require the zero mean as a prerequisite.
Bicoherence is a useful statistical tool derived from the definition of coherence. Coherence describes frequencydomain correlations for a given signal, and it is also established as a means for measuring linear dependencies of moments. On the contrary, bicoherence is used to detect quadratic non-linearities on the time-frequency (TF) plane as well as quadratic phase coupling, denoted as the squared normalized version of the bispectrum as follows: S(ω 1 ) and S(ω 2 ) are the power spectrums of two independent Fourier components, and S(ω 1 + ω 2 ) is the power spectrum of a composite 3 rd signal defined by the two independent Fourier signals ω 1 and ω 2 . equation 5 is considered a crucial element of the Hinich algorithms. It is actually the basic means for estimating the distance of cumulants from each other which in our process represents the counts of bytes and packets for each flow.

Hinich Algorithms
Firstly introduced in [11], the Hinich 5 algorithms are a set of statistical hypothesis tests to detect non-linear 5 Naturally the concepts invoked by the Hinich algorithm and the bispectrum involve non-parametric signal processing schemes that in practice consider the random traffic behaviour where its probability and further Gaussian or non-Gaussian characteristics on a given random stochastic process g(t). As already mentioned above, the process g(t) in our case is the count of packets or bytes in discrete time bins of length n, denoted as T = τ 1 , τ 2 , . . . , τ n . The following subsections introduce the basic equations employed by the Gaussianity and linearity tests.

Gaussianity Test
This test focuses on the bicoherence value as the result of the normalized bispectrum estimate. The Gaussianity (also known as zero-skewness) assumption according to Hinich is considered the case where the estimated bicoherence value E[b k B (ω1,ω2) ] as well as the skewness equal to zero. Since in reality we can never have a flat, zerobicoherence (due to noise), we take the mean bicoherence value which in practice represents a quantitative Gaussianity metric [24,25,27]. By using the definition of equation 5 we can calculate the mean bicoherence power estimate k with: Work demonstrated in [10] and [3] employ some comparisons of the k value in order to establish a concrete conclusion on whether the dataset under test actually follows a Gaussian distribution or not. As they suggest, the computed value of k is x 2 distributed (chi-squared distributed) with a Fast Fourier Transform (FFT) length density function either on a packet or byte perspective cannot be concretely defined as commonly assumed in the literature. Albeit that the Hinich algorithms have been challenged due to their dependency on Gaussian asymptotics as well as their suboptimal smoothing in the bispectral domain ( [30]) we still argue in fair of their validity since the opposing suggestions do not provide constructive comparisons on real datasets but they rather equationte synthetic timeseries of oscillatory random processes in extremely small time intervals( [30]). Thus, given the outcomes of several Internet traffic characterisation schemes we can characterize the traffic volume timeseries as the observable state of a random process but on the other hand we do not hold all the properties (e.g. frequency stability) that are explicit to oscillatory random processes. function for approximating the number of freedom degrees to the closest Gaussian fit. Therefore, in case the estimated k indicates that our timeseries has less than or equal to 10 freedom degrees, then we conclude that the dataset follows a Gaussian distribution. In parallel with k, the Zero Skewness Probability (ZSP) of false alarm ϑ is computed. ϑ illustrates the probability of the newly approximated value being much larger than the initial k estimate. When ϑ is small, we can reject the zero-skewness (i.e. Gaussian) hypothesis. In our tests we use 0.2 ϑ 1 as a valid range for approving Gaussianity, based on findings from work in [24,25,27,31].

Linearity Test
The linearity assumption holds in case where the bispectrum of a real process g(t): where C is a constant value for all ω 1 and ω 2 . As suggested by Swami [29], it is essential to approximate values of a sample interquartile range under a new bispectral estimate Ψ(ω 1 , ω 2 ) derived by B(ω 1 , ω 2 ): where M is the resulting boxcar window length, after rounding the FFT length with a resolution parameter ρ. The intuition for calculating the interquartile range is to compare it with a theoretical interquartile range Y (ω 1 , ω 2 ) which similarly to Ψ(ω 1 , ω 2 ) is chi-squared with a non-centrality parameter η defined as: In our experimentation we keep the values of a FFT length of 128 and a resolution parameter ρ of 0.51 since higher resolution parameter values would involve drawbacks with  [29]. The main step is to compare Ψ(ω 1 , ω 2 ) with Y (ω 1 , ω 2 ) and if the difference is higher than the limits provided by [27,29] (e.g., >10), then we reject the linearity hypothesis.

Linearity & Gaussianity Analysis: Results
This subsection examines the validity of the linearity and Gaussianity assumptions on our datasets, using the Hinich algorithms. Tables 2 and 3 demonstrate the results obtained for all our datasets on an aggregate and transport protocol-specific perspective. In general, all examined our datasets either on a protocol or aggregate-level analysis exhibited zero values on the Zero Skewness Probability (ZSP ϑ) from both a bytes and packets perspective. Therefore, all the timeseries examined promote a non-Gaussian property. This is also justified by the generated Degree of Freedom (DF) values produced for each packet or byte-wise distribution that hold values greater than the acceptable threshold of 10 DFs that we discussed earlier (section 3.2.1).
As illustrated in table 2, we observe that the linearity assumption varies in each dataset and is not necessarily related to the particular volume feature (i.e. packet or byte). By looking at the EQUCH dataset, we verify that the traffic volume from both a packet or byte perspective holds non-linear and non-Gaussian properties, thus they both need to be carefully modelled under a representation that severely considers these properties. On the other hand, the aggregate volume properties of the WIDE trace exhibit non-linear and non-Gaussian properties if we strictly consider the bytes distribution whereas the packet distribution demonstrates linear properties and non-Gaussian. Hence, a potential modelling approach for a given traffic characterisation application (e.g. anomaly detection) that requires granular view of these flow features should consider independently the modelling of packets and bytes. Similarly, such a modelling approach should act in the same fashion if it wishes to assess the dynamics of the Keio-II dataset since its aggregate packet volume distribution holds linear and non-Gaussian properties whereas its bytes distribution demonstrates a non-linear and non-Gaussian profile based on the computed statistics. Finally, the Keio-I dataset has linear and non-Gaussian properties on both its packet and byte-wise distribution, thus a potential modelling approach may assume linearity for the overall aggregate traffic volume.
Overall, table 2 has demonstrated that empirically we do see the linearity assumption to be seen as frequently as the non-linearity assumption. However, the transport protocol-based volume analysis illustrated by table 3 indicates that the linearity assumption is not true in most of the examined distributions. It is therefore evident that even when modelling a single protocol, we should consider the counts of bytes and packets independently and analyze them separately. On the other hand and similarly with the aggregate volume analysis, the non-Gaussianity assumption still holds throughout all the datasets from a transport protocol viewpoint.
Given the statistics depicted in table 3 we can clearly visualise that the transport protocol dynamics in the EQUCH dataset exhibit varying outcomes with respect to linearity on each individual transport protocol (i.e. TCP, UDP, ICMP). For instance, in EQUCH as well as in the WIDE dataset, TCP cannot be modelled in the same way on both packets and bytes since the former behaves linearly and the latter non-linearly. Nonetheless, the UDP protocol in the EQUCH trace can be modelled under the same statistical assumptions since both its packet and byte distributions appear to conform with non-linear and non-Gaussian properties. However, UDP in the WIDE trace has a linear and non-Gaussian behaviour from a packets perspective, thus it should be modelled independently from the packets distribution that demonstrates a non-linear and non-Gaussian property. Furthermore, ICMP in both the EQUCH and the WIDE trace appears to have non-linear and non-Gaussian properties from both a packet or byte-wise perspective. Therefore, both ICMP packets and bytes in this case could be modelled jointly by considering the aforementioned assumptions.
The Keio I/II characterisation presented in table 3 has produced some significant outcomes related to the statistical behaviour of each protocol on byte and packet counts. It is observed that even in a small temporal interval in the same day (i.e., 30 mins), TCP and ICMP may not be modeled under the same assumptions. According to the generated statistics assessing linearity and Gaussianity, TCP in Keio-I for both byte and packet counts complies with a linear (i.e., L) and non-Gaussian (i.e., N G) statisti- cal assumption, whereas ICMP was found in both packet and bytes to follow a non-linear (i.e., N L) and N G persona. Even though they can both be modeled within a non-Gaussian model, still a uniform scheme cannot be employed since a Gaussian (or mixed Gaussian) fit to both protocols would not be able to fully capture the varying and unstable changes of the mean and the variance, as implied by the NL profile of ICMP. Nevertheless, the findings indicated in Table 3 illustrate the existence of N G characteristics on the aggregate and all the protocols independently, thus justifying the efforts placed in [6] where traffic profiling was done under a non-Gaussian modelling method. In general, it is also obvious that not all the protocols may be modeled in the same way therefore perprotocol, independent analysis is warranted, as we will present in section 4.5. Throughout Keio-I, only ICMP and UDP can be modeled in the same way (i.e., they both fall under the N L & N G assumptions), whereas TCP should be treated under linear but non-Gaussian assumptions. All the three protocols for both packet and byte counts have a zero probability for exhibiting zero skewness. This zero probability is the main strong evidence for concluding that the data do not exhibit Gaussian behaviour. The linearity exhibited by TCP in Keio-I is demon strated by the small numerical difference (<10) between the theoretical and the estimated interquartile range. On the contrary, we conclude that neither UDP nor ICMP exhibit linear characteristics since the difference between the interquartile ranges gets greater than 10. Identical analysis on Keio-II reinforces the results from Keio-I indicating again that all protocols cannot be modeled together under the same statistical assumptions.
Our experimentation goes a step further in order to empower the argument of the varying outcomes on the fundamental assumptions of linearity and Gaussianity over small timescales. In particular, we assessed the validity of these assumptions in smaller timescales on the EQUCH dataset. In order to achieve this, we initially segmented the EQUCH dataset in smaller samples of 15 minutes (i.e. EQUCH I -IV) and examined the statistical behaviour of their aggregate volume from both a packet and bytes perspective. As evidenced by figure 3, for the first 30 minutes in the EQUCH dataset (i.e. EQUCH I, EQUCH II) the aggregate volume on both packets and bytes is considered as linear since the absolute difference of the estimated interquartile range Ψ est with theoretical interquartile range Ψ th is less than 10. However, the statistical assumptions change for the subsequent 15 minutes in EQUCH since the linearity assumption is not valid since the aforementioned interquartile range absolute difference goes beyond the acceptable linearity threshold. Finally, the last 15 minute bin on EQUCH promotes a linear property since the difference gets lower than the threshold. Thus, a potential traffic model that aims to profile traffic dynamics for fine-grained operations (e.g. anomaly detection) would essentially need to reconsider this varying property in order to achieve higher accuracy. In parallel, the visualization of the Degrees of Freedom (DF) depicted in figure 4 indicates that the Gaussianity assumption is not valid throughout the whole 60 minute duration on EQUCH. In fact, all the computed DF values as resulted by chi-squaring the k value using a FFT function, are beyond the acceptable Gaussianity threshold of 10 DFs. Overall, via this small granular traffic analysis we observe the linearity and Gaussianity assumptions failing in small timescales and therefore the need for reconsidering their validity is essential, since they constitute the underlying basis for selecting a correct traffic characterisation scheme.

Traffic characterisation under Higher Order Spectra
In this section we illustrate the importance of validating statistical assumptions within the context of capacity planning and particularly detection of sudden volume peaks. We compare the performance of our proposed bispectrum as derived out of higher order spectra against the typical ARIMA models employed in several past studies on capacity planning (e.g. [6]). Our choice of the bispectrum is motivated by the fact that (i) packet and byte traffic signals in all of our datasets showed highly non-stationary and non-Gaussian characteristics while linearity was evident in some cases, and (ii) the bispectrum is the most suitable candidate for reasonably profiling such signal protocols [24,25], due to its own ability of assessing higher, 3 rd -order statistical properties. 6

ARIMA models
We have generated several ARIMA models describing the residuals and the autocorrelations that exist in our datasets as well as periodograms that provide a power spectrum representation of the same datasets. The most appropriate fits for our datasets were selected based on the Akaike Information Criterion (AIC) and Akaike's Final Prediction Error (FPE) which are particularly suitable statistical metrics for comparing several linear models 7 . Based on the AIC and FPE values, the most appropriate model to fit the overall counts of bytes in WIDE was a first order Auto-Regressive (AR = 1) second order differentiation (I = 2) with three Moving Average steps (M A = 3). On the other hand, the AIC and FPE values obtained for the packet count profiling indicated that the best-fit ARIMA model had a first order Auto-Regressive step (AR = 1) and two Moving Average steps (M A = 2) while the whole series should have been initially differentiated twice (I = 2).
As shown in both Fig. 5 and Fig. 6, the residual plots (top) generated for both traffic signals cannot determine any trends and the generated autocorrelation (ACF) plots (bottom) empower the speculation of non-stationary and random characteristics. At the same time, both ACF plots illustrate the absence of strong autocorrelation between the observations on the x-axis implying that neither the packet nor byte count signals in WIDE possess self-similar characteristics.
The range of values obtained for the autocorrelation coefficient (i.e. close to 0) justifies the absence of strong 6 Albeit that the primitive estimation of bispectral estimates relies on the weak (but not strong) stationarity assumption, suggestions depicted in [24,25,29] ensured that a fine-grained tuning on the smoothing function within the estimation of the non-parametric indirect bispectrum, can reasonably characterize non-stationary timeseries as well. 7 Due to space limitations, we will not explain in detail how the ARIMA modelling, the AIC and the FPE are mathematically defined, rather we refer the interested readers to [32] A C C E P T E D M A N U S C R I P T   we would expect a non-degenerate ACF asymptotically decaying to zero.
Due to the not well-defined patterns and the inexistence of strong autocorrelations depicted by the plots provided by Fig. 5 and Fig. 6, it is clearly evident that the ARIMA model cannot provide a meaningful and interpretable view for the traffic behaviour with respect to its exact evolution on the time domain. We argue that this is due to the nature of the underlying statistical assumptions of linearity and non-stationarity which we validated earlier (sections 2.2, 3.3). By principle, the AutoRegressive (AR) and Moving Average (MA) processes embodied within an ARIMA model require the linear dependency of observations throughout time and in parallel assumes their stationary behaviour [33].

Power Spectrum vs. Bispectrum for Traffic Surge
Detection The resulting power spectrum representations for the WIDE trace as derived by the ARIMA models are shown in Fig. 7. Since the timeseries were initially differentiated up to a 2 nd -order by the ARIMA model, we followed the same process for the bispectrum estimation. Fig. 7 shows the generated periodograms and estimated models for ARIMA (1, 2, 3) and ARIMA (1, 2, 2) illustrating that overall byte and packet counts exhibit a close-to-exponential growth with no significant or dominant peaks. This representation implies that there have not been sudden traffic changes in our traces, and that it is likely to have minimal or nonexistent traffic bursts. In addition, even in the case of some small dominant frequency cycles (e.g. on λ = 0.5, 2.5, 3 for byte and λ = 0.3, 2, 2.3, 3 for packet counts) that one could be tempted to investigate, the power estimates were not able to capture the exact timing of traffic surge events as seen in the raw datasets. However, by manually inspecting the traces, we did observe certain traffic bursts due to high utilisation of the three transport protocols TCP, UDP and ICMP. These high peaks were reasonably detected by the bispectrum which, as shown in Fig. 8, had isolated the phase transitions present in the timeseries.
As illustrated at the top right corner 8 in both bispectra of Fig 8, there are several distinct points or regions (coloured in yellow and dark brown those with high amplitudes based on the amplitude legend next to each bispectrum figure) indicating phase transition peaks that explicitly denote sharp traffic surges, which were not detected by the ARIMA model.
Based on manual inspection conducted with the raw dataset, we confirmed that there were some significant traffic demands arising from all the three protocols. As depicted in Fig 8, the regions surrounding the 2D frequency points in both packet and byte count of (0.26, 0.34) and (0.27, 0.39) exhibit a close-to-synchronised behaviour of increased traffic utilisation. However, as it also happens with the rest of the regions around the points (0.33, 0.28), (0.33, 0.39), (0.39, 0.28) and (0.39, 0.35), we could clearly identify traffic volume surges on both packet and byte counts.
Generally, in contrast to the periodogram output shown in Fig. 7, the generated bispectral estimates were able to localise and further isolate the distinct traffic demands enabling much easier inspection in the raw traces. These results show that the bispectrum-based approach provides a much more accurate TF representation of traffic than the commonly-used spectral representation of a produced ARIMA model via the power spectrum approach.

Traffic Peak Analysis
The ability to accurately localise high traffic demands on the time dimension is a pre-requisite within the context of capacity planning. Therefore, this subsection demonstrates the applicability of the bispectrum on a traffic peak analysis scenario and we compare its accuracy with that of the ARIMA methodology. We have mapped the signal intensities (i.e. amplitudes) into their corresponding bytebased values (Mbps) and picked the most dominant average traffic volume peaks (i.e., highest averaged mean values) obtained in the WIDE-II bin from the original timeseries signal. Fig. 9 shows that the bispectral estimates (green plots) outperform the power spectrum approximations(red plots) resulted after the ARIMA modelling on the WIDE trace. The two-dimensional frequency indices provided by the bispectrum's definition (section 3.1.1) allow the generation of a range of distributions that reasonably match the traffic peaks present in the original signal. Due to the complex nature of the resulting bispectrum we have employed the least squares method in order to extract the best fit distributions compared with the original signal values. As illustrated in Fig. 9, in the case of WIDE-II we found the four best-fit distributions within the overall WIDE-II bispectrum which are by far more accurate than the periodogram. All of the four distributions tend to have a close approximation to the initial highest peak of the original signal (≈ 95M bps). Despite the fact that most of their subsequent estimations were close to the original signal, it is also evident that particularly the distributions of Fig. 9(a), Fig. 9(b) and Fig. 9(d) are unable to capture the second and third peaks at the 2 nd and 3 rd minute respectively. In addition, the distributions of Fig. 9(b) and Fig. 9(d) tend not to fully match with the original signal's values on the last minutes of observation (i.e., from the 11 th minute and onwards) whereas those of Fig. 9(a) and Fig. 9(c) are reasonably close to these values though not fully matching them. These outcomes are due to the initial addition of the first five cumulants within the bispectrum's cumulant sequence c 3 (τ 1 , τ 2 ) which is the core element of the bispectrum as explained in section 3.1.1. In some cases this process of computing the 3 rd order cumulant sequence involves complex and negative values that cause the dislocation of the resulted bispectral estimates when included  within the dual Fourier transformation required by the bispectrum's computation.
Nevertheless, from a capacity planning point of view it is fairly obvious by Fig. 9 that the power spectrum (ARIMA/periodogram) estimations produce more erroneous indications which subsequently lead to wrong conclusions. By considering the fact that the WIDE link holds a maximum capacity of 100M bps (during the trace collection period), the ARIMA predictions alarm two cases in t ≈ 0 and t ≈ 1 where the link is overflowed with instant link utilisation of 102 and 101M bps respectively. Under a real scenario, the network operator in this case would seriously consider the forecasting provided by the ARIMA model and would regard over-provisioning the infrastructure in the short term (e.g., through redundant links), which would essentially result in unnecessary extra cost for an ISP. On the contrary, the bispectral distributions in all the cases have reasonably matched the majority of peaks and are by far closer to the real valued peaks exposed by the original traffic signal.

Importance of validating statistical assumptions -ARIMA
modelling on volume aggregates This section presents our aggregate volume analysis on both the Keio traces in order to illustrate the importance evident that a suitable model for the byte series would be a linear exponential smoothing with 2 nd order differentiation (I=2) and two MA steps (MA=2) with no autoregressive components (AR=0), in contrast to the packet series where an ARIMA (1, 2, 3) is more suitable. In addition, Keio-I's byte count modelling differs with the profiling we perform on Keio-II since the Keio-II's byte dataset fits with ARIMA (2, 1, 3).
As opposed to the byte count profiling, packets in both datasets go with the same characterisation scheme under ARIMA (1, 2, 3). The generated outcomes imply that a monolithic blanket scheme cannot be applied since each raw dataset surely satisfies different assumptions with respect to stationarity, linearity and Gaussianity (section 3.3). As presented in section 4, none of the studied datasets (including the WIDE trace) can be sufficiently modelled using an ARIMA process, since all of them exhibit non-stationary characteristics even when differentiated in several orders, thus a non-stationary modelling approach is more appropriate. We also note that model fits estimated by a linearlybased model such as ARIMA depend on whether the initial raw timeseries are linear or not. For instance, the consistency of fitting the exact same ARIMA model on Keio-I/II packets was mainly due to the fact that both datasets have been classified as linear (even though non-stationary) by the Hinich algorithms.

Protocol-specific characterisation
Despite the fact that volume peaks are successfully detected by the bispectrum, we still lack an interpretation of the main cause of the traffic burst (e.g., which application flows are involved). The main reason for this inability is due to directly dealing with averaged values of the volume aggregate. Consequently, such a scheme poses a level of opaqueness on specifically identifying the flows associated with any particular traffic surge. A more granular methodology which employs protocol-specific modelling allows the identification and extraction of any flows that contribute significantly to the overall burst of the traffic process. Due to the diverse behaviour exhibited by the different protocols as demonstrated by the statistical assumption validation performed earlier (sections 2.2 and 3.3), we employ the bispectrum on each transport layer protocol and individually assess its dynamics.

Keio-I Analysis
In this section we present the analysis performed by the bispectrum only on the Keio-I trace. Fig. 10 graphically represents the bispectral estimates generated for counts of packets and bytes for each protocol, after our series were differentiated in order to remove drifts and noise components. It can be distinctively observed that each protocol exposes certain phase transition peaks on different 2D frequency points, indicating high demands or anomalous behaviour. The bispectral outcomes of the TCP analysis produced consistent characteristics on both packet and byte representations since high peaks due to sudden packet transmissions are normally mapped to high peaks on byte flows. Yet, there are certain phase shifts particularly on the bytes analysis indicating unknown behaviour, which is one extra reason to emphasise the need for separate analysis of byte and packet counts.
Generally in all the three protocols, for both packet and byte-based estimates, there were several distinct peaks within the range of points (0.2-0.4, 0.15-0.4) related to sudden changes caused by a range of applications. The most dominant applications with high frequency mappings were SSH, DNS, HTTP, HTTPS as well as the NNTP protocol. In addition, reasonably high frequency shifts are mainly caused by mail from POP3 and SMTP as well as by RTP streaming. There are particular byte-based estimates having a clear localisation on the time-frequency (TF) plane, but at the same time they cannot be directly linked to the packet frequency characteristics. Especially around point (0.4, 0.2) on TCP, the packet analysis shows a modest peak governing a large area (compared to the rest of the peaks on the same image) whereas the bytebased analysis indicates that the particular region is ruled by phase shifts caused by numerous bytes but not packet transmissions. After mining the raw dataset using the bispectral estimates, we identified an attack targeting TCP port 135 which is used as a service port by the Remote Procedure Call (RPC) protocol. The attacker used various source ports from the same machine to send a large number of single-packet flows with each packet being maximumsized. Due to the nature of the single-packet flows, it was not possible to detect this phase transition in the timeseries of packet counts, yet the bispectrum was sensitive enough to reveal it from the byte count timeseries. UDP applications exhibit greater phase shifts in packets than they do in bytes. Phase transitions observed are consistent with the analysis presented previously (sec-  ICMP exhibits characteristics on both packet and byte volume features, which were not extracted from the initial aggregate volume profiling. As the representative bispectra show, there are peaks on the expected point regions (i.e. 0.2-0.4 in both axes) having sporadic, scattered shifts. There are three distinct peaks on the packet-level observations at (0.36, 0.28), (0.36, 0.35) and (0.29, 0.35), and five peaks from a bytes perspective. It is quite interesting that one of the packet phase shifts at (0.29, 0.35) is mapped to ICMP inverse mapping. Specifically, an attacker sent ICMP reply packets from a single source to a range of IP addresses behind the Keio university firewall (which did not block them), resulting in a router to respond with an ICMP Destination Unreachable message. This way, the attacker was able to identify the next hop's IP address and expose (a part of) the network's internal topology.
In addition to the inverse mapping, we also identified certain misbehaviours from the bytes analysis. One out of the five peaks, specifically referenced at point (0.4, 0.29) is the incremental behaviour of ICMP flows consisting of a single packet of a length greater than or equal to the typical Ethernet Maximum Transmission Unit (MTU). These flows are known as pings of death, a very well-known attack triggered by ICMP. Apart from this byte-wise anomaly, the rest of the peaks refer to legitimate traffic which was mainly caused by normal ICMP interactions of internal updates taking place within the Keio network initiated by syslog operations. Figure 11, depicts the produced bispectra for each volume feature (i.e. bytes and packets) on each transport layer protocol. As evidenced, there was a number of phase transitions on the TF bispectral representation that are correspondingly mapped as sudden traffic peaks on each transport layer protocol in the WIDE trace. By considering all the peaks from all the examined protocols we identify the majority to appear in the range of points (0.2 − 0.4, 0.25 − 0.35) on the bispectral 2D estimate. The protocol-specific analysis has demonstrated a range of intriguing traffic peaks that aided towards a better interpretation of the traffic-wise behaviour in the WIDE trace.

WIDE Analysis
Similarly with the Keio analysis shown earlier and due to the fact that both traces were captured in the same year (i.e. 2006), the traffic trends and peaks were quite similar. In more detail, the most of the identified peaks were due to the most dominant applications of HTTP/HTTPS, SSH, DNS as well as SMTP and NNTP. By contrast with the Keio trace, the WIDE trace also exhibited some distinct peaks caused by the Napster protocol, RTSP and as well as game traffic particularly from games such as Starcraft and Halflife.
TCP applications on a per-packet analysis demonstrated higher amplitudes in their signal phase shifts rather than from a bytes viewpoint. There were three distinct peaks at (0.25, 0.25), (0.3, 0.33) and (0.35, 0.3) that had similar characteristics with respect to the time period of these flows. In fact, all three unidirectional flows had a relatively small duration in the range of 0.3 − 0.5ms but with relatively high number of packets. The first two peaks were associated with HTTPS whereas the third peak was related to a large file transfer over the FTP protocol. Interestingly enough, these peaks were not associated with the TCP byte-based phase shifts identified. As demonstrated in Fig. 11, there were two major phase shifts. From a manual inspection we derived that these shifts were associated with unidirectional flows from the same source IP address and they had a relatively small number of packets ( i.e. < 20) but with high number of bytes. Most of the packets were reaching the MTU threshold of 1500 bytes/packet and the applications associated were the Squid -an HTTP web proxy protocol-and SSH.
As demonstrated via Fig. 11 there were three significant phase shifts on UDP flows. . Two of these shifts on both packets and bytes were related with the same unidirectional UDP flows since they had a quite small duration but rather a large number of packets with relatively high number of bytes for each. A detailed inspection revealed that these flows were triggered by a RealAudio server that was streaming data over the UDP port 6970. The third packetbased shift at (0.43, 0.25) was identified as a single ,packetwise, large DNS flow in which despite the relatively same size for its packet had an extremely large number of packets with small inter-arrival time. Similarly, the byte-based shift at (0.42, 0.26) was mapped with a large Starcraft flow in which the last 15 packets were byte-wise large and they had a much smaller inter-arrival time compared to the previous packets within the same flow. Thus, a spike with respect to the UDP's byte-wise signal amplitude was localised by the bispectral estimates resulting to the detected phase shift.
From a packets perspective, the bispectra produced for the ICMP protocol revealed three distinct peaks at points (0.3, 0.35), (0.36, 0.33) and (0.44, 0.28). All three peaks were extracted from the raw datasets and they were identified as ICMP scanning activities. The first two peaks were originating from the same source IP address that performed horizontal scans from the port 2048 on multiple DNS domains. Both scans exhibited a large number of ICMP Echo Request packets sent within a small amount of time, thus the amplitude with respect to the TF behaviour of the packet-based timeseries got reasonably high and it was flagged as a phase shift from the resulted bispectrum estimate. The third shift at (0.44, 0.28) was a vertical scan in which a particular source IP address was sending multiple ICMP Echo Request packets on a given destination IP address over multiple ports on the same machine. We argue, that this type of scan had a malicious intent, since the same source IP address appeared to be related with the single peak identified on the byte-based bispectral at point (0. 35, 0.34) in which a ping of death was initiated A C C E P T E D M A N U S C R I P T on a different destination host. The two packets contained within that ICMP flow had a size of over 2500 bytes each; hence much bigger than the MTU threshold.

EQUCHAnalysis
The protocol-specific bispectrum analysis on the EQUCHtrace verified that the behaviour of the Internet has changed in terms of which application layer protocols are mostly utilised. In comparison with the analysis conducted on the Keio and WIDE traces that were captured more than a decade ago, the changes related to higher traffic volume and speeds as well as the trends mapping the diversity of application layer protocols dependent on TCP and UDP were quite clear. Nonetheless, as we following describe there were still some similarities, particularly for the analysis performed on the ICMP protocol.
The TCP protocol's volume was the most dominant throughout the trace and, as anticipated, distinct peaks were identified from both a packet and bytes perspective. As illustrated via Fig. 12, there were two distinct peaks pinpointed by the bispectral estimates at (0.32, 0.38) and (0.36, 0.30) that were also related with two out of the three peaks in the byte-based analysis at points (0.30, 0.36) and (0.35, 0.34) respectively. A further post-processing of the raw captured data indicated that these phase shifts were TCP flows in which the extracted IP addresses participated on social network platforms over the usual TCP port 80 via HTTPS and then they were redirected to YouTube via TCP port 443. Subsequently, the users were assigned to a new connection with YouTube for actually streaming the video content over the TCP port 1935 that is dedicated for the RealTime Media Player protocol (RMTP). This series of events was related to transmitted flows with low duration and in parallel high numbers of packets per flow. In particular, the phase shifts were mostly instantaneous with a large number of users redirected to port 1935 via YouTube. Our analysis could not reveal on whether the redirected users were watching the same video content, however the phenomenon was quite similar with that of a Flash Crowd. Moreover, all the flows associated to these signal phase shifts were peaks that also had a quite considerable count of bytes per packet, hence the redirection of the users to the TCP port 1935 caused sudden increments in the frequency of byte counts. Therefore, a higher amplitude of the byte-based signal on the bispectrum's TF plane was associated with two distinct peaks. The byte-based analysis also shown a third peak at (0.35, 0.36) that also had a quite interesting interpretation in terms of the network's behaviour. In more detail, the third peak was identified to have the behaviour of a typical TCP SYN stealth scan where a single source IP address seemed to perform a mixed vertical and horizontal scan on multiple hosts over multiple ports on a particular destination host. Apparently, the extracted scan didn't hold the typical properties of a normal TCP SYN scan as initiated by tools such as NMAP since the single packet flows had a size of more than 800 bytes each. Therefore, we argue that this scan was carefully crafted and it was probably the probing procedure of a greater botnet.
Of malicious intent were also UDP flows associated with three out of the four phase transition peaks on the packet-based analysis as depicted in Fig. 12. The three packet-based peaks were also linked with one of the two shifts on the byte-based analysis whereas the second bytebased shifts at (0.39, 0.31) was caused by high-byte VoIP flows. In more detail, the packet-based peaks at (0.28, 0.27), (0.31, 0.31) and (0.35, 0.30) were caused by the incremental packet-based behaviour of UDP flows between multiple hosts and a particular source IP address on UDP port 10050. A further investigation pointed that these flows were in fact initiated through the Zabbix agent protocol that is used for pulling real-time system information and performance metrics from Windows servers and workstations. Due to the real-time nature of the Zabbix agent there were in some cases relatively small duration flows (e.g. < 15s duration) with high packet but as well byte rates (e.g. > 30K packets per flow and >1300 bytes per packet) that eventually caused the sudden phase shifts detected by the produced bispectra. The fourth packet-based peak at (0.35, 0.35) appeared to be the result of multiple hosts communicating with a single destination IP address through their UDP port 8080. This particular UDP port has been flagged to be used by the Backdoor.Tjserv trojan that acts as a HTTP and SOCKS4/5 proxy and it serves the purpose of listening remote commands from a botserver/botmaster. Thus, we infer that the identified destination IP address could potentially be a command and control server of a greater botnet.
The ICMP analysis demonstrated quite similar characteristics with the other two traces (i.e. Keio and WIDE) in regards of the actual nature of flows that caused the detected peaks. All the three peaks detected for the packetbased bispectrum analysis at Fig. 12 were due to ICMP ping sweeps that are common scanning techniques initiated by tools such as NMAP. In fact, these scans were triggered by a single source IP address over multiple IP addresses over different DNS domains (i.e. horizontal scans) and they were characterized by relatively large packets (i.e. > 12 packets per flow) for extremely small flows with respect to their duration (i.e. <5s). The three peaks localized by the bispectrum over the byte-based analysis were all single packet flows with a byte size over the MTU threshold (i.e. > 1500 bytes per packet); the so-called pings of death.

Conclusions
Current trends in network traffic profiling include the examination of aggregate volume characteristics, and the use of models that inherently assume stationarity, linearity and Gaussianity of the timeseries. In this paper, we empirically showed that such statistical assumptions do not hold universally, thus they should be rigorously validated beforehand. We proposed the use of Time-Frequency representations for determining stationarity of a timeseries, and A C C E P T E D M A N U S C R I P T the Hinich algorithms for validating linearity and Gaussianity. In addition, we proposed the bispectrum for traffic characterisation, and demonstrate its capability of accurately exposing traffic fluctuations by capturing a signal's phase information. The combination of our theoretical and experimental evaluation with our traffic traces collected from backbone and edge links shows that: • The applicability of any traffic model depends on the underlying statistical assumptions of stationarity, linearity and Gaussianity, which can not be assumed for the entirety of a traffic trace. Hence, such assumptions should be rigorously validated before applying any scheme. Failure to do so results in modelling inaccuracies.
• We propose to use the bispectrum to assess transportlayer protocol or aggregate traffic characteristics, which provides a novel and accurate methodology for revealing application-layer activities and sudden traffic changes. We showed that this capability can be beneficially used within crucial traffic engineering tasks such as capacity planning as well as anomaly and traffic surge detection.
• In contrast to the schemes where only aggregate volume is considered (as e.g. in [2,6,11,13,16,17]), our proposed scheme can also employ protocol-based traffic decomposition to enable a detailed characterisation of the dynamics exhibited by different types of traffic and allow us to extract otherwise hidden patterns such as protocol-specific attacks (e.g. ICMP inverse mapping).
• Link traffic exhibits significant fluctuations in short timescales (30 mins.) which may be described by diverse models (rather than holistic ones, both time and protocol-wise), not adhering to identical statistical assumptions such as stationarity, linearity and Gaussianity.
• Byte counts mostly expose different modelling characteristics with respect to their Gaussian and linear properties than packet counts. Thus, a volume-based approach should consider the analysis of bytes and packets separately.
This work contributes towards the interpretation of traffic dynamics by taking a bottom-up approach and validating the de facto assumptions of stationarity, Gaussianity and linearity. It illustrates that these modelling assumptions do change with respect to time and either on an aggregate or protocol-specific viewpoint; thus their validation in the pre-modelling stage is extremely important for the composition of an accurate traffic model. By virtue of the fluctuating traffic behaviour exposed either on aggregate volume or at a protocol-specific level, this paper proposes the use of the bispectrum which can adapt to such dynamics and consequently provide useful insights for crucial traffic engineering tasks. Journal and IEEE TNSM, as well being editor of the Wiley book series in Computer Networks and Distributed Systems. He has helped build a strong research group in computer networks, which is well known internationally for contributions in a range of areas including Quality of Service architecture and mechanisms, multimedia caching and filtering, multicast engineering, active and programmable networking, content distribution networks, mobile IPv6 systems and applications, communications infrastructures for Grid based systems, testbed activities, and Internet Science. He now focuses largely on resilient and secure networking, with interests in Future Internet and also the protection of critical infrastructures including industrial control systems.

A C C E P T E D M A N U S C R I P T Appendix A. Bispectrum Properties
We explain the symmetrical properties which enable us to correctly relate bispectral estimates with the real events as captured in our raw dataset. Hasselman et al. used the Cramer spectral representations and provided a visualization of the symmetries that exist in the bispectrum, and elaborated upon their importance on real processes [34]. Based on symmetries that exist on the 3 rd order cumulant sequence c 3 (τ 1 , τ 2 ) [35], there is a reflective symmetry for the bispectrum as expressed next: In addition, for a real process g(t) the bispectrum is equal to its complex conjugate: where, −π ≤ ω 1 ≤ π, −π ≤ ω 2 ≤ π, −π ≤ ω 1 + ω 2 ≤ π (A.3) In simple terms, equation A.2 given the limits for the three independent Fourier components , ω 1 , ω 2 and (ω 1 + ω 2 ) in A.1 denote that under a real process scenario we may only consider a single region since the rest bispectral estimate domains are symmetric and may be approximated by knowing just one of them. For our case of representing different harmonic Fourier components(i.e. network traffic on different frequencies) in our dataset, we focus on a single region on the bispectrum and identify phase transitions from estimated peaks which are mapped as the exact initiation and time duration of significant volume-wise traffic fluctuations in our datasets. Even though the symmetry relationships provided by equations A.1 and A.2 are based on the angular frequency on the TF plane, the symmetry may also be viewed separately in the time and frequency dimension. Fig A.13 illustrates the symmetries expressed