Machine learning in eigensubspace for network path identification and flow forecast

This paper emphasizes the joint time-frequency interpretation of the eigensubspace representation of network statistics as features for the identification and tracking of traffic flows based on link-level activity. Eigencoefficients (frequency domain features) and eigenvector components (time domain features) are jointly utilized to quantify their combined significance for the representation of each link's data (each component of the link traffic vector) in the eigensubspace. The joint time-frequency method is employed to analyze the traffic data obtained from the Internet2 network. It is shown that the analysis with link-level resolution brings advantages for network traffic engineering applications. A machine learning method is investigated to identify network paths using the eigenanalysis of link statistics as the feature set. The merit of the method is validated by experimental studies of the network scenarios considered in the paper. Eigenvectors and eigenflows in the subspace are jointly used as factors (features) for linear regression to forecast the network link traffic. It is demonstrated that the eigensubspace based auto-regressive order two, AR(2), predictor is superior to the time-domain based predictor for forecasting the link-level traffic of a network.


INTRODUCTION
(This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2021 The Authors. IET Communications published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.)

Recent developments in the analysis of network traffic data have mostly focused on the classification and characterization of flows by using flow-level statistics [1][2][3][4][5][6][7][8][9]. Principal Component Analysis (PCA) was used to analyze IP flow statistics to identify anomalous behaviour in the network [10]. The authors used PCA as a dimension reduction tool by reducing a very large number of flows into a few most significant eigenflows for analysis. Similarly, PCA was used for traffic classification with IP flow statistics [3]. One of the major challenges with these techniques is the high number of flows in the network, which makes them impractical. The merits of analyzing traffic data with PCA to assess the state of the network for certain cases were presented in [10]. They argued that shifts in traffic behaviour can be analyzed better by looking at multiple links. They collected the time series data of the Origin-Destination (O-D) pair IP flows traversing multiple links in the network and transformed them onto an eigensubspace using PCA. It was demonstrated that the overall network behaviour could be studied by analyzing the first few eigenflows (eigencoefficients). This scheme suffers from two systemic problems. First, one needs to collect packet-level statistics on a per-flow basis for a large number of flows; there are n(n − 1)/2 flows for a network of n nodes. This is computationally very expensive in terms of data collection and processing. Second, these eigenflows do not reveal the significance and the status of a specific link within an O-D flow in the network.
Therefore, if an O-D flow is identified as anomalous, it is not possible to find which link in the path is the root cause of the problem. We address these issues by demonstrating the advantages of the proposed method in the following sections.
PCA-based flow decomposition and prediction, with extensions and improvements to the basic PCA, was studied in [11]. They primarily employed robust PCA (RPCA), which allows the normal data to dwell in a low-dimensional subspace, owing to the strong correlation among normal observations, while special events and noise dwell in a sparse subspace [12]. This technique was applied to highway traffic data. That work reinforces our perspective of using PCA components for prediction, but RPCA is not applicable to our case since the focus here is telecom network traffic data. The PCA based network traffic prediction problem was also studied in [13], where the K-means clustering algorithm was utilized to separate the flows into relevant groups. It suffers from the same problems discussed above.
An overview of machine learning (ML) methods to analyze networks and their flows is given in [14]. They presented the state-of-the-art deep learning architectures and algorithms relevant to network traffic control systems, and demonstrated a use case for intelligent traffic routing. They also discussed the applicability of deep learning to network flow prediction. Another survey, covering ML applications in all aspects of networking (traffic classification, prediction, routing, congestion control, resource management, fault management and network security) and their merits, is given in [15].
Here, we utilize the link level statistics to identify large flows in the network by using PCA (eigendecomposition) in a novel way [1]. This technique reduces the data collection overhead and provides finer tracking of the flows at link-level resolution, allowing improved solutions to network problems. We also use linear regression to predict eigenvectors and eigencoefficients, which are successfully used as features to forecast the network traffic. The advantages of the proposed methods are explained and verified through performance simulations on real data from the Internet2 network.

EIGENANALYSIS AND EIGENFLOWS
The eigendecomposition of the N × N matrix R is expressed as [1]

$R = \Phi \Lambda \Phi^T$    (1)

where $\Lambda$ is the diagonal matrix whose elements are the eigenvalues and, equivalently, the covariance matrix of the eigencoefficients. $\Phi$ is the eigenmatrix of R, populated by the eigenvectors as its columns, and T indicates the matrix transpose operation. It defines the resulting N-dimensional orthonormal eigensubspace for the given R [16].
Here, eigensubspace representations of random vectors that describe traffic variables, such as link bandwidth at measurement instances in megabits per second (Mbps), are used to analyze and understand the status of communication networks. Let us assume a network of N links, and let the snapshot of the link traffic vector at a periodic measurement time point be expressed as $r^T = [r_1, r_2, \ldots, r_N]$. The empirical correlation matrix R of link traffic is calculated over the predefined historical measurement window of W samples as [1]

$R = [\rho_{ij}]_{N \times N}$    (2)

where $\rho_{ij} = \mathrm{cov}(r_i, r_j) / (\sigma_{r_i} \sigma_{r_j})$ for the window W, and $\sigma_{r_i}$ and $\sigma_{r_j}$ are the standard deviations of $r_i$ and $r_j$, respectively.
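As a sketch of (1) and (2), the empirical correlation matrix and its eigendecomposition can be computed with NumPy; the traffic data below and the sizes W and N are synthetic assumptions, not the paper's measurements:

```python
import numpy as np

# Sketch of (1)-(2): build the empirical correlation matrix over a window of W
# traffic snapshots and eigendecompose it. The traffic data and the sizes W, N
# are synthetic assumptions.
rng = np.random.default_rng(0)
W, N = 100, 5
traffic = rng.gamma(shape=2.0, scale=50.0, size=(W, N))  # link bandwidths (Mbps)

# Empirical correlation matrix with entries cov(r_i, r_j) / (sigma_i * sigma_j).
R = np.corrcoef(traffic, rowvar=False)

# Eigendecomposition R = Phi Lambda Phi^T; columns of Phi are the eigenvectors.
eigvals, Phi = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # principal eigenvector first
eigvals, Phi = eigvals[order], Phi[:, order]
Lam = np.diag(eigvals)

assert np.allclose(Phi @ Lam @ Phi.T, R)     # the decomposition reconstructs R
```

Note that `np.linalg.eigh` returns eigenvalues in ascending order, so the columns are reordered to put the principal eigenvector first, matching the paper's convention.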
It is used in (1) to create the eigensubspace expressed in the $\Phi$ matrix. The measurements are repeated periodically to update the eigensubspace due to the statistical variations (non-stationarity) of the network dynamics. The snapshot of the N link bandwidths populates the column vector $c_n$, where n is the measurement time index on the regular clock, under the assumption of stationarity during the update period until n + 1. Then, one can project this link traffic vector onto the currently defined eigensubspace as (forward transform) [1]

$\theta_n = \Phi^T c_n$    (3)

where the most significant eigenvector, also known as the principal eigenvector, is commonly the first column of the eigenmatrix $\Phi$.
The link variables are expressed through the inverse transformation operator as

$c_n = \Phi \theta_n$    (4)

Now, we can rewrite this expression more explicitly as follows:

$c_n = \theta_n^1 [\Phi_{11}\ \Phi_{21}\ \cdots\ \Phi_{N1}]^T + \theta_n^2 [\Phi_{12}\ \Phi_{22}\ \cdots\ \Phi_{N2}]^T + \cdots + \theta_n^N [\Phi_{1N}\ \Phi_{2N}\ \cdots\ \Phi_{NN}]^T$    (5)

Note that the components of the first eigenvector in (5) are $[\Phi_{11}\ \Phi_{21}\ \cdots\ \Phi_{N1}]$, the first column vector of $\Phi$. We compute the eigenflows $\{\theta_n^k,\ k = 1, 2, \ldots, N\}$ that characterize the entire network at time n by using (3). Figure 1(a) displays the variations of the most significant eigenflow, $\theta_n^1$, at time n, for the network. The components of the principal eigenvector are displayed in Figure 1(b) for the case where 1 week of the Internet2 link data is used [16]. These components represent the contributions of the corresponding links to the most significant eigenflow.
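The forward and inverse transforms can be sketched as a round trip; the correlation matrix below is a synthetic stand-in for measured link statistics:

```python
import numpy as np

# Round-trip sketch of the forward transform (3) and inverse transform (4); the
# correlation matrix here is a synthetic stand-in for measured link statistics.
rng = np.random.default_rng(1)
N = 4
R = np.corrcoef(rng.standard_normal((N, 2 * N)))  # stand-in N x N correlation matrix
_, Phi = np.linalg.eigh(R)
Phi = Phi[:, ::-1]                           # principal eigenvector first

c_n = rng.uniform(10.0, 100.0, size=N)       # snapshot of N link bandwidths (Mbps)
theta_n = Phi.T @ c_n                        # forward transform: eigenflows
c_back = Phi @ theta_n                       # inverse transform recovers the links
assert np.allclose(c_back, c_n)

# Per-link view as in (6): link k is a weighted sum of all eigenflows.
k = 0
assert np.isclose(c_n[k], sum(Phi[k, l] * theta_n[l] for l in range(N)))
```

Because $\Phi$ is orthonormal, the inverse transform recovers the link vector exactly, which is what makes the per-link decomposition in the next section lossless.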
As a corollary to (4), we express the k-th link traffic for the measurement at time n in the eigensubspace as

$c_n^k = \sum_{l=1}^{N} \Phi_{kl}\, \theta_n^l$    (6)

In the next section, we define the joint time-frequency matrix to utilize the link or path specific (time/signal domain) features of the network along with the eigenflows.

REPRESENTATION OF A LINK IN EIGENSUBSPACE
Let $\{c_n^l\}$ represent the link measurements at time instance n for the entire network, l = 1, 2, …, N. Let us define a more detailed eigensubspace representation of the network status in which the products of eigenflows and eigenvector components are interpreted for analysis, as shown in the matrix

$\tilde{\Theta}_n = \Phi\, \Theta_n$    (7)

where

$\Theta_n = \mathrm{diag}\{\theta_n^1, \theta_n^2, \ldots, \theta_n^N\}$    (8)

is the N × N diagonal matrix of the eigenflows at time n. It is explicitly shown at time n as

$\tilde{\Theta}_n = \begin{bmatrix} \Phi_{11}\theta_n^1 & \Phi_{12}\theta_n^2 & \cdots & \Phi_{1N}\theta_n^N \\ \vdots & \vdots & & \vdots \\ \Phi_{N1}\theta_n^1 & \Phi_{N2}\theta_n^2 & \cdots & \Phi_{NN}\theta_n^N \end{bmatrix}$    (9)

The sum of the elements of the l-th row of $\tilde{\Theta}_n$ is equivalent to the traffic measurement on the l-th link, $c_n^l$, as given in (6). $\tilde{\Theta}_n$ is called the joint time-frequency matrix. Figure 2(b) displays the joint time-frequency matrix for the link measurements of the network that we analyze in the paper [1,2]. We will use this matrix to identify large network data flows through the network topology as described in Section 5. In the next section, we give a new interpretation of eigenflows with the link-level focus for the network using the eigensubspace representation framework described above.
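The defining property of the joint time-frequency matrix, that each row sums to the corresponding link measurement, can be checked numerically; `Theta_tilde` is our name for the matrix and the data below is synthetic:

```python
import numpy as np

# Sketch of the joint time-frequency matrix (7)-(9): element (l, k) is
# Phi[l, k] * theta_n[k], so row l sums to the measured traffic on link l.
# Theta_tilde is our name for the matrix; the data is synthetic.
rng = np.random.default_rng(2)
N = 6
R = np.corrcoef(rng.standard_normal((N, 3 * N)))   # stand-in correlation matrix
_, Phi = np.linalg.eigh(R)
c_n = rng.uniform(10.0, 100.0, size=N)             # link traffic snapshot
theta_n = Phi.T @ c_n                              # eigenflows at time n

Theta_tilde = Phi * theta_n                        # == Phi @ np.diag(theta_n)
assert np.allclose(Theta_tilde.sum(axis=1), c_n)   # row sums recover link traffic
```

The broadcast product `Phi * theta_n` scales column k of the eigenmatrix by the k-th eigenflow, which is exactly the matrix product with the diagonal matrix in (8).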

LINK-LEVEL INTERPRETATION OF EIGENFLOWS
The vector $\theta_n$ in (3) represents the eigenflows (eigencoefficients) of the network at time n. For example, the first eigenflow is the inner product of the first eigenvector and the link traffic vector, as seen in the matrix given in (9), where the vertical axis corresponds to the component indices of the eigenvectors and the horizontal axis to the eigenvector indices [1,2]. In contrast to the traditional PCA based studies reported in the literature [17][18][19], our experiments show that a few most significant eigenvectors and eigenflows (dimension reduction) are not always able to capture all of the network characteristics. They are rather spread out over a larger subset, as shown in Figure 2(a). We highlight that some components of eigenvectors with low eigenvalues, which may not survive the dimension reduction step, can be significant for the representation of certain link traffic in the eigensubspace. Therefore, we need to jointly look at the variations of all eigenvectors and all eigenflows in time in order to better track network dynamics and anomalies with the highest level of representation granularity.
Eigenflow (frequency domain) interpretation of network dynamics does not emphasize link or path specific (time/signal domain) features in a network. These features are equally critical to assess the state of the network. We propose the joint timefrequency interpretation [2] of eigenanalysis for network engineering as explained in the previous section [1]. The next section defines the process of identification of large flows using the techniques described in earlier sections.

IDENTIFICATION OF LARGE NETWORK FLOWS
We demonstrate the use of the joint time-frequency matrix to identify large network flows. The identification process is comprised of the following steps.
• Step 1: Identify the column vector containing the most significant components. This is done by taking the L2-norm of the column vectors and sorting them. It is seen from Figure 2(b) that the 7th column vector has the most significant components for this case.
• Step 2: Find the components of the vector identified in Step 1 with significant contributions by setting a threshold of 0.35. The components of that vector are plotted as a bar graph in Figure 3(a).
• Step 3: Map the components on the network topology [16].
• Step 4: Identify the paths in the network created by the identified eigenvector components. This is done by verifying that more than one link corresponding to the components is connected together to form a path. It is seen in Figure 3(b) that the links identified in Step 3 traverse a path taken by a flow from DC to LA. We also observe a second path taken by another flow from Seattle to Houston.
• Step 5: Go back to Step 1 and repeat the process for the next significant column vector of the joint time-frequency matrix.
Note that Link6 is mapped in the figure even though it is below the threshold, because its two adjacent links are significant contributors to a large flow; by inference, the flow has to pass through this link. While performing Step 4, there may be some links that are above the threshold but are not connected to any other link to form a path. Such links can be safely ignored from further analysis, mapping or plotting. For example, in Figure 3(b), Link5 and Link20 are ignored.
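The thresholding steps above can be sketched as follows; the matrix values are synthetic, with a "large flow" planted in the 7th column (index 6) for illustration, and the topology mapping of Steps 3-4 is only noted in comments:

```python
import numpy as np

# Sketch of Steps 1-2: pick the joint time-frequency column with the largest
# L2 norm, then keep components above the 0.35 threshold. The matrix values are
# synthetic, with a "large flow" planted in column 7 (index 6) for illustration;
# Steps 3-4 (topology mapping and path chaining) are only noted in comments.
rng = np.random.default_rng(3)
Theta_tilde = rng.normal(0.0, 0.05, size=(8, 8))
Theta_tilde[[1, 3, 4], 6] = [0.6, 0.5, 0.7]   # planted significant components

# Step 1: rank the column vectors by L2 norm.
norms = np.linalg.norm(Theta_tilde, axis=0)
col = int(np.argmax(norms))

# Step 2: threshold the components of the selected column.
threshold = 0.35
significant_links = np.flatnonzero(np.abs(Theta_tilde[:, col]) > threshold)

# Steps 3-4 would map these link indices onto the topology and keep only links
# that chain together into a contiguous path.
assert col == 6
assert set(significant_links) == {1, 3, 4}
```
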
The link-level interpretation of eigenanalysis and its mapping on the topology demonstrates that large flows in the network can be detected. It helps us to better understand the network activity in a real-world scenario with a more efficient implementation than the currently used methods [3][4][5]. Note that the selection of the threshold to identify significant links requires some experimental study on the network of interest and may also be automated.
There are various applications of this research. Network traffic has been growing drastically in the last decade with the advent of smart phones and proliferation of video applications.
Due to this, it has become a huge challenge to characterize, forecast and engineer network traffic. The traffic largely comprises many small flows and a few large flows over a period of time, each with its specific network requirements. The identification of these large flows, as addressed in the paper, has a major impact on overall network performance and security. If a flow is from a trusted source on the expected path, then the information is used for traffic engineering and optimization applications. The link-level granularity of the data is instrumental in long-term network planning applications. It is also used in congestion control, resource management, fault management and QoS management, to name a few. Besides the network applications mentioned above, there are several other uses in the areas of application monitoring, security awareness and intrusion detection, policy validation and service assurance. For example, if a large flow is identified and the source or the path is anomalous, then it could indicate a security violation or a Denial of Service (DoS) attack. It is extremely difficult to build applications that infer the network behaviour by merely looking at the traffic on a single link or on the entire network. Thus, the joint time-frequency interpretation of network traffic data in the eigensubspace gives better insight and features for analyzing link-level and network-wide behaviour.

NETWORK FORECAST IN EIGENSUBSPACE
Using variations of eigenvectors and eigencoefficients, we train a linear regression model to predict their future values, which are tied to the link parameters of the network. A subset of the link traffic data provided by a large service provider is used for this study. The network characteristics are summarized as follows: the data set covers 24 links, measured at 5-min intervals, so that each hour of a day contains 12 samples per link, over a number of consecutive days. This 3D array is represented as $L^d_{l,t}$, where d is the day index, l the link index and t the 5-min interval index. For convenience, we drop the subscript and/or superscript and use the following notation.
$L^d$ - matrix of link traffic measurements for day d, covering all 24 links, where each row vector of the matrix is 12-dimensional and populated by the corresponding measurements of the 5-min intervals for the given hour of the day. In this experiment, we only used the hour between 9:00 PM and 10:00 PM.
$L^d_l$ - corresponds to a single row vector of the $L^d$ matrix, as mentioned above, for link l.
The empirical correlation matrix of size 24 × 24 for the link traffic on a given day d is computed as

$R^d = [\rho_{ij}]_{24 \times 24}$    (11)

where i and j are the link indices, and the pairwise correlations $\rho_{ij} = E\{L^d_i L^d_j\}$ are calculated based on the 12 measurements of the 5-min intervals. The eigenmatrix $\Phi^d$ of the network for day d is expressed through the eigendecomposition

$R^d = \Phi^d \Lambda^d [\Phi^d]^T$    (12)

Then, the daily eigencoefficient matrix $\Theta^d$ of size N × T, comprised of the eigencoefficient vectors calculated for each 5-min interval of 9:00 to 10:00 PM as its columns, is defined in the current subspace $\Phi^d$ as

$\Theta^d = [\Phi^d]^T L^d$    (13)

Now, we drop the superscript d for convenience and expand the equation, where the first row of $\Phi^T$ is the principal eigenvector. The first column vector of the eigencoefficient matrix in (13) corresponds to the first 5-min time interval and is written as

$\theta_{(1)} = [\theta_{1,1}\ \theta_{2,1}\ \cdots\ \theta_{N,1}]^T$    (14)

Similarly, its second column corresponds to the second 5-min time interval as

$\theta_{(2)} = [\theta_{1,2}\ \theta_{2,2}\ \cdots\ \theta_{N,2}]^T$    (15)

and so on. As a result, the elements of the first row vector of the N × T eigencoefficient matrix are the most significant eigencoefficients of the T 5-min time intervals, where N = 24 and T = 12 in this example. Similarly, the second row corresponds to the second most significant eigencoefficients for those 5-min time intervals, and so on. A subset of the rows of the $\Theta$ matrix given in (13) is used for linear regression and prediction of link values, as discussed later in this section.
From (4), the traffic of link l, for day d and the specific 5-min time interval t, is calculated as follows:

$L^d_{l,t} = \sum_{k=1}^{V} \Phi^d_{lk}\, \theta^d_{k,t}$    (16)

where V is the predefined number of most significant eigenvectors (dimension reduction based on explained variance, V ≤ N) running over the dimension index k, used to approximate the link traffic values in the subspace. When we drop the day superscript d, for t = 1 and V = N (no dimension reduction), (16) is expanded as

$L_{l,1} = \sum_{k=1}^{N} \Phi_{lk}\, \theta_{k,1}, \quad l = 1, 2, \ldots, N$    (17)

In (17), only the first column of the $\Theta$ matrix has been used to compute the link traffic values for the first 5-min time slot. Similarly, the link values for the second 5-min time interval are computed by using the second column of the $\Theta$ matrix as follows:

$L_{l,2} = \sum_{k=1}^{N} \Phi_{lk}\, \theta_{k,2}, \quad l = 1, 2, \ldots, N$    (18)

Hence, the link traffic measurement matrix is defined for each 5-min time slot and for all the links, with T = 12 and N = 24, as

$L = \Phi\, \Theta$    (19)

Now, we use the eigenvectors and eigencoefficients of the previous two days to predict the link traffic for the next day as [20,21]

$\hat{\Phi}^{d+1} = \alpha_1 \Phi^d + \alpha_2 \Phi^{d-1} + \varepsilon_\Phi$    (20)

$\hat{\Theta}^{d+1} = \beta_1 \Theta^d + \beta_2 \Theta^{d-1} + \varepsilon_\Theta$    (21)

where $\alpha_1$ and $\alpha_2$ are the auto-regressive order two, AR(2), regression model parameters used to predict the eigenmatrix $\Phi$, and $\varepsilon_\Phi$ is the prediction (white) noise corresponding to the AR(2) process of the eigenmatrix $\Phi$. Similarly, $\beta_1$ and $\beta_2$ are the AR(2) regression model parameters used to predict the eigencoefficient matrix $\Theta$, and $\varepsilon_\Theta$ is the prediction (white) noise corresponding to the AR(2) process of the eigencoefficient matrix $\Theta$.
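The forecast step can be sketched as follows; the AR(2) weights are illustrative stand-ins (the paper estimates them from historical data), and the sizes are scaled down from the 24 × 12 case:

```python
import numpy as np

# Sketch of the AR(2) eigensubspace forecast: combine the eigenmatrices and
# eigencoefficient matrices of the two previous days, then reconstruct the
# link traffic. The AR(2) weights alpha/beta are illustrative stand-ins; the
# paper estimates them from historical data. Sizes are scaled down from 24 x 12.
rng = np.random.default_rng(4)
N, T = 4, 6                                   # links x 5-min slots

# Stand-in eigenmatrices for days d-1 and d (orthonormal via QR).
Phi_dm1, _ = np.linalg.qr(rng.standard_normal((N, N)))
Phi_d, _ = np.linalg.qr(rng.standard_normal((N, N)))
Theta_dm1 = rng.uniform(1.0, 10.0, size=(N, T))   # eigencoefficients, day d-1
Theta_d = rng.uniform(1.0, 10.0, size=(N, T))     # eigencoefficients, day d

alpha1, alpha2 = 0.7, 0.3                     # assumed AR(2) weights for Phi
beta1, beta2 = 0.7, 0.3                       # assumed AR(2) weights for Theta
Phi_hat = alpha1 * Phi_d + alpha2 * Phi_dm1   # predicted eigenmatrix, day d+1
Theta_hat = beta1 * Theta_d + beta2 * Theta_dm1

V = 2                                         # dimension reduction, V <= N
L_hat = Phi_hat[:, :V] @ Theta_hat[:V, :]     # forecasted N x T link traffic
assert L_hat.shape == (N, T)
```
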
By using the predicted $\hat{\Phi}^{d+1}$ and $\hat{\Theta}^{d+1}$ in (4), we compute $\hat{L}^{d+1}_l$ as expressed in (22) below:

$\hat{L}^{d+1}_{l,t} = \sum_{k=1}^{V} \hat{\Phi}^{d+1}_{lk}\, \hat{\theta}^{d+1}_{k,t}$    (22)

where V is the reduced dimension in the eigensubspace, V ≤ N, used to approximate the time series with the permissible prediction error (or explained variance).

PERFORMANCE COMPARISON OF TIME SERIES AND PCA BASED NETWORK FORECASTS
In this section, we develop an auto-regressive order two, AR(2), model to predict the link traffic of a network based on its historical data [14]. We built AR(2) predictors both in the time domain and in the eigensubspace and compared their performances, as explained in the following two subsections.

A. Linear Regression in Time Domain
We employed a vector auto-regressive model for linear regression in the time domain [22]. The 3D data set $L^d_{l,t}$ introduced in Section 6 is relabelled as $L^d_t$. We regressed the link traffic in the time domain for day d by using the data of the previous days d − i, i = 1, 2, …, M, with the vector auto-regressive (VAR) model expressed as [22,23]

$L^d_t = C_t + \sum_{i=1}^{M} A_{t,i}\, L^{d-i}_t + \varepsilon_t$    (23)

where $C_t$ is a column vector of constant offsets, $A_{t,i}$ contains the AR parameters of the model, M is the order of the auto-regressive model for t (M = 2 for the AR(2) model), t is the time index of the 5-min intervals for the same one-hour period in each day, t = 1, 2, …, T with T = 12, and $\varepsilon_t$ is a column vector of white noise. Equation (23) is rewritten for AR(2) as follows:

$\hat{L}^d_t = C_t + A_{t,1}\, L^{d-1}_t + A_{t,2}\, L^{d-2}_t$    (24)

The column vectors $\hat{L}^d_1$ to $\hat{L}^d_{12}$ are combined to form the forecasted link traffic matrix of the 5-min intervals for day d. Then, we estimate the parameters of the AR(2) model by using Equation (25). Figure 4 displays the link measurements for link l = 24 of day d along with their forecasted values by the AR(2) model in the time domain. It is seen from Figure 4 that the downward trend of the link traffic is forecasted with about 10% prediction error. We will repeat the same experiment in the eigensubspace in the next section and show its merit.
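The time-domain regression can be sketched for a single link; the offset and the two AR coefficients are fitted per 5-min slot by least squares over a synthetic history (the paper's estimation procedure in (25) is not reproduced here):

```python
import numpy as np

# Sketch of the time-domain AR(2) regression for a single link: fit the constant
# offset and two AR coefficients per 5-min slot by least squares over a synthetic
# history of daily 12-sample vectors, then forecast the next day.
rng = np.random.default_rng(5)
T, days = 12, 30
history = np.cumsum(rng.normal(0.0, 1.0, size=(days, T)), axis=0) + 100.0

# Targets: day d; regressors: a constant plus days d-1 and d-2.
Y = history[2:]                                    # shape (days - 2, T)
X = np.stack([np.ones_like(history[1:-1]), history[1:-1], history[:-2]], axis=-1)

# Solve for [C_t, a_{t,1}, a_{t,2}] per slot t, then forecast the next day.
coeffs = np.array([np.linalg.lstsq(X[:, t, :], Y[:, t], rcond=None)[0]
                   for t in range(T)])
L_next = coeffs[:, 0] + coeffs[:, 1] * history[-1] + coeffs[:, 2] * history[-2]
assert L_next.shape == (T,)
```
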

B. Linear Regression with Joint Time-Frequency Features in Eigensubspace
In this experiment, the regression is performed by using the eigencoefficient matrix $\Theta$ and the eigenmatrix $\Phi$, and the link traffic is predicted as per (22). We used the same data set as in the time-domain experiment. The traffic measurements are mapped onto the currently defined eigensubspace. A small subset of eigenflows (dimension reduction) is used in the parameter calculations of the AR(2) model. The eigensubspace based AR(2) predictor results are also displayed in Figure 4. It is observed from the figure that the eigensubspace based linear regression model forecasts the network traffic more accurately than the time domain based prediction [24].

MACHINE LEARNING TO IDENTIFY PATHS WITH NETWORK FEATURES IN EIGENSUBSPACE
The use of machine learning methods for network traffic classification and flow prediction was reported in [14]. We extend that work to investigate the use of network eigenfeatures in such learning methods. In Section 5, we used manual thresholding of the joint time-frequency matrix elements defined in the eigensubspace to identify large network flows. The manual thresholding described in Section 5 suffers from several shortcomings:
a. The threshold needs to be set heuristically based on historical data. Therefore, threshold detection will fail as soon as the trend diverges from the historical trend. Hence, it may produce inaccurate results in the long run when the non-stationarity in the data becomes significant.
b. The value of the threshold is crucial for this method, and choosing it becomes a major challenge due to the exponential growth in broadband traffic and increased data types. Employing an ML method for threshold selection may be an interesting research topic, but it is beyond the scope of this study.
c. Finally, the usage patterns are driven by many events and large geographies, where the thresholding method may not work as the centre of gravity of the traffic sources and destinations changes.
Hence, we came up with an ML method that learns from its past experience and quickly adapts to new trends in the traffic characteristics automatically, without manual intervention.
In this section, we present the method to automate the identification process. Instead of thresholding, we employ a machine learning technique utilizing the same column vectors of the joint time-frequency matrix, see (9), to identify paths carrying large traffic flows. It is based on an artificial neural network (ANN) using a multilayer perceptron (MLP) model.
The MLP is a class of feed-forward (FF) artificial neural network [25]. A single neuron MLP model is shown in Figure 5.
Its inputs $x^T = [x_1\ x_2\ \cdots\ x_n]$ are weighted by the coefficients $w^T = [w_1\ w_2\ \cdots\ w_n]$ and summed together at the summing junction. The output y is obtained through the activation function g(·) as

$y = g(w^T x)$

Each node in the network uses a nonlinear activation function except for the input nodes [25].
The MLP model is expressed as follows:

$y = g(v)$    (26)

where

$v = \sum_{i=1}^{n} w_i x_i$    (27)

There are several activation functions commonly used in machine learning algorithms. We mostly used the sigmoid function in this study, given as

$g(v) = \dfrac{1}{1 + e^{-v}}$    (28)

We also experimented with the hyperbolic tangent and other activation functions, as discussed later.
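A single neuron with the sigmoid activation can be sketched directly; the input and weight values below are assumed for illustration:

```python
import numpy as np

def sigmoid(v):
    # Logistic activation as in (28): g(v) = 1 / (1 + exp(-v)).
    return 1.0 / (1.0 + np.exp(-v))

def neuron(x, w):
    # Single MLP neuron: weighted sum of the inputs passed through the activation.
    return sigmoid(np.dot(w, x))

# Illustrative inputs and weights (assumed values, not from the paper).
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, 0.1])
y = neuron(x, w)
assert 0.0 < y < 1.0                         # a sigmoid output always lies in (0, 1)
```
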
In this study, we use the neural network that is comprised of the input layer, one or more hidden layers and the output layer as displayed in Figure 6.
We use the column vectors of the joint time-frequency matrix of the network statistics defined in (9) as the input to the neural network. The network path numbers are labelled as the outputs of the neural network model. We have a training set of p pairs $(\tilde{\theta}_i, y_i)$, where $\tilde{\theta}_i$ is a column vector of the joint time-frequency matrix and $y_i$ is the network path index corresponding to that column vector, i = 1, …, p. The network paths are identified by using the method described in Section 5. Herein, we intend to calculate the weights of the neurons that map a given joint time-frequency vector to the corresponding path index at the output.
The training process is an optimization problem expressed as

$\min_{w} \sum_{i=1}^{p} \left(y_i - g(w^T \tilde{\theta}_i)\right)^2$

It can be rewritten as

$\min_{w} \sum_{i=1}^{p} \left(y_i - w^T \tilde{\theta}_i\right)^2$    (29)

In (29), we assumed that the activation function is the identity map to simplify the optimization model [26]. We use this supervised ANN algorithm in the eigensubspace, where built-in dimension reduction is inherently achieved.
The optimization problem defined in (29) is solved to find the optimal sets of weights used by the neural network model to identify the paths. We used the feed-forward with back-propagation (FFBP) technique for learning, as described in [27][28][29]. This technique is implemented as an unconstrained optimization that uses a gradient descent algorithm. The gradient descent iteration used to find the optimal weights of the model is written as

$w(k+1) = w(k) - \mu(k)\, \nabla e(k)$    (30)

where k is the index of the iteration step, $\mu(k)$ is the learning rate, and e(k) is the error between the actual and predicted values. It is noted that we use batch gradient descent (BGD) [30] in (30) due to its superior performance over the scaled conjugate gradient (SCG) [31], Levenberg-Marquardt (LM) [32,33] and Bayesian regularization (BR) [34,35] algorithms. It yields an unbiased estimate of the gradients and is theoretically guaranteed to converge to the global minimum, along a straight trajectory, if the loss function is convex.
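A minimal batch gradient descent sketch for the least-squares problem in (29), with the identity activation assumed there; the data, learning rate and iteration count are illustrative:

```python
import numpy as np

# Minimal batch gradient descent (BGD) sketch for the least-squares training
# problem in (29) with the identity activation: every step uses the gradient over
# the entire batch. The data, learning rate and iteration count are illustrative.
rng = np.random.default_rng(6)
p, n = 200, 5                                # training pairs x input dimension
X = rng.standard_normal((p, n))              # stand-in feature vectors
w_true = rng.standard_normal(n)
y = X @ w_true                               # noiseless targets for the sketch

w = np.zeros(n)
lr = 0.05
for _ in range(500):
    e = X @ w - y                            # error over the whole batch
    w -= lr * (2.0 / p) * X.T @ e            # step along the full-batch gradient

assert np.allclose(w, w_true, atol=1e-5)     # converges on this convex problem
```

Because the loss here is convex, the full-batch gradient drives the weights to the global minimum, which illustrates the convergence property claimed for BGD above.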
Similarly, we used the tan sigmoid (TS) and Elliott sigmoid (ES) activation functions [32]. We also used the mean squared error (MSE) and cross-entropy error (CEE) minimization loss functions in our experimental studies and identified the best ones for the task at hand [36,37].
We experimented with various optimization algorithms and activation functions to train the network. We varied the number of neurons in the hidden layers and the size of the training data features in the eigensubspace, (9), to evaluate the convergence and performance of the algorithm.
Note that each column vector $\tilde{\theta}_i$ of (9) corresponds to a network path identified with the label $y_i$. Using the network traffic data, we identified and labelled the large network flows traversing a certain path in the network during the training step. We had limited data of only 200 input vectors with 100 network paths from the actual network. This dataset was too small for these experiments. Therefore, we generated a dataset of 100 times larger size by adding 20% Gaussian noise to the actual network measurements $\tilde{\theta}_i$ and keeping the same output label for a given pair. This allows us to train the algorithm with a sufficient amount of training data.

(Figure 7: Model validation performance.)
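The augmentation step can be sketched as follows; the sizes mirror the paper (200 vectors, 100 paths, 100-fold expansion), while the vectors themselves are synthetic:

```python
import numpy as np

# Sketch of the augmentation step: replicate each labelled input vector 100 times
# with 20% Gaussian noise, keeping the original path label. The sizes mirror the
# paper (200 vectors, 100 paths); the vectors themselves are synthetic.
rng = np.random.default_rng(7)
n_vectors, dim, n_paths, factor = 200, 30, 100, 100
inputs = rng.uniform(0.1, 1.0, size=(n_vectors, dim))
labels = np.arange(n_vectors) % n_paths      # path index per input vector

noise = rng.normal(0.0, 0.2, size=(factor, n_vectors, dim))   # 20% Gaussian noise
augmented = (inputs[None, :, :] * (1.0 + noise)).reshape(-1, dim)
augmented_labels = np.tile(labels, factor)   # labels stay aligned with the copies

assert augmented.shape == (20000, 30)
assert augmented_labels.shape == (20000,)
```
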
We found that more than 20,000 raw data samples are required to train our model, which uses a feature set of size 30, with the SCG training algorithm. The model is built with 30 neurons in each layer, the TS activation and the CEE loss function. Figure 7 displays the CEE performance of this ANN and its convergence for the network path identification problem, using column vectors of the joint time-frequency matrix in (9), as a function of training data size (epochs).
As a result of this investigation, we conclude that the use of eigenfeatures is an effective tool for network traffic classification. For the traffic characteristics present in the data used, the SCG algorithm with the CEE loss function provides the fastest convergence.

COMPUTATIONAL COMPLEXITY
We analyzed the computational complexity of the proposed techniques and compared them with the prior work in [24]. There are two fundamental differences in the collection of the time series data. First, the method in [10] collects O-D flow data, which can be very large in a given network, whereas our proposed methods collect link-level data, which scales only with the number of links. For a network of n nodes, there are n(n − 1)/2 possible O-D flows in a maximally connected network, compared to (n − 1) links in a minimally connected network. Second, the O-D flow data is collected by sampling the arriving packets; therefore, it depends on the link bandwidth, whereas the link-level data is collected at regular intervals of 5 min regardless of the link bandwidth. Assuming these flows traverse 100 Gbps links with a packet size of 1500 bytes, there are 8.3M packets/s passing through each interface. If a sampling ratio of N = 1000 is used, then 8.3K packet headers/second are collected per flow. With a standard 20-byte packet header, the total data collected is approximately 600 MB/h per flow. For n = 10 nodes, there are 45 O-D flows, which results in a data collection volume of 27 GB/h. Clearly, this is an insurmountable challenge for using this technique. It calls for a further increase in the sampling ratio N, plus aggregation and pre-processing at local nodes. This also adds to the infrastructure overhead and bandwidth load on the network. For the proposed method, in the same network with n = 10, we have 9 links to monitor every 5 min, that is, 108 samples/h. Furthermore, this data is already being collected; therefore, there is no added overhead on the network. Hence, in terms of computational complexity, our method is several orders of magnitude simpler than O-D flow analysis.
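The data-volume arithmetic above can be checked directly; all constants are taken from the text:

```python
# Back-of-the-envelope check of the data-collection volumes quoted in the text.
link_gbps = 100
packet_bytes = 1500
packets_per_sec = link_gbps * 1e9 / (packet_bytes * 8)   # ~8.3M packets/s
sampling_ratio = 1000                                    # 1-in-1000 packet sampling
header_bytes = 20
per_flow_bytes_per_hr = packets_per_sec / sampling_ratio * header_bytes * 3600

n = 10
od_flows = n * (n - 1) // 2                              # O-D flows, maximally connected
od_total_gb_per_hr = od_flows * per_flow_bytes_per_hr / 1e9

links = n - 1                                            # minimally connected network
link_samples_per_hr = links * (60 // 5)                  # one sample per link per 5 min

assert od_flows == 45
assert round(per_flow_bytes_per_hr / 1e6) == 600         # ~600 MB/h per flow
assert round(od_total_gb_per_hr) == 27                   # ~27 GB/h total
assert link_samples_per_hr == 108
```
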
It is noted that the O-D flow methods use Deep Packet Inspection to infer the path of a flow from the origin and destination of the packets, with an additional routing-table lookup step to identify the path in the network. The computational complexity of this method has been well studied; it is known to be computationally expensive and cannot be deployed in very large networks [38,39]. A performance comparison of the TomoGravity method used for O-D flows, the traditional PCA and a deep-architecture Long Short-Term Memory (LSTM) traffic matrix prediction method was reported in [40]. It demonstrated the superiority of the PCA and LSTM based methods over the O-D flow (TomoGravity) approach with respect to the temporal relative prediction error.

CONCLUSIONS
O-D flow based eigensubspace methods are computationally costly. More importantly, they lack the link-level resolution required to efficiently track the dynamics of a heterogeneous network [16]. We developed a granular method to monitor the variations of network traffic and to identify anomalies. The method utilizes the eigendecomposition of the empirical correlation matrix of link traffic in a given network. This joint time-frequency interpretation of the eigensubspace representation of traffic data provides additional insights to better understand the overall network behaviour as well as the individual links. We demonstrate in the paper that the link-level focus of the statistical analysis makes it possible to identify local anomalies as the building blocks of elephant flows in the network. We also show that the eigensubspace-based network forecast outperforms the methods that use time domain based measurements and predictions. We employed an ANN with FFBP to identify network paths by using joint time-frequency (eigencoefficient weighted) eigenvectors as features. Specifically, for our network model with 30 inputs (links/vectors) and 100 possible outputs (network paths), we have made the following observations:
1. A training set of approximately 25,000 inputs is required to accurately train an ANN model for network path identification.
2. The same number of neurons is required in the hidden layers as in the input layer.
3. The performance of the learning algorithm does not degrade with up to 20% of random noise in the input vector components.
4. The scaled conjugate gradient (SCG) algorithm with the cross-entropy loss function (CEE) shows better performance and speed of convergence in our experiments for the given data set. The rate of convergence differs from one sample data set to another, but the algorithm always converges.
5. The Elliott sigmoid (ES) activation function works faster than the tan sigmoid (TS) due to the elimination of the exponential factor in the function.
We conclude that an artificial neural network with feed forward and backward propagation model using scaled conjugate gradient optimization can be used as an effective machine learning technique to identify large network flow paths using the proposed joint time-frequency matrix as its input data features in the eigensubspace. This novel technique provides new insights to the operators for automating the network engineering operations.