Computer Network Flow Recognition Method Based on Improved Support Vector Machine

To address the defects of current mainstream network traffic algorithms, this paper designs a network traffic prediction algorithm based on an improved support vector machine and introduces a computer network traffic identification method built on it. It first analyzes the background of network traffic prediction, including the linear and nonlinear characteristics of network traffic, the theoretical basis of network traffic prediction, and the methods of obtaining traffic data. It then studies support vector machine theory and the least squares support vector machine, and proposes an improved algorithm for the least squares support vector machine. The purpose of this work is to design a network traffic identification and analysis system. On the one hand, monitoring network traffic makes it possible to grasp the operation of the entire network in real time; on the other hand, the system statistically analyzes the results of different stages, giving a more comprehensive understanding of the operational efficiency of network resources, network performance, and the rationality of the network configuration. The experimental results show that the traffic recognition efficiency of the improved support vector machine method is significantly improved: under this method, traffic security improved by 14% and traffic efficiency improved by 24%. The improved support vector machine is expected to be a future development trend in computer network traffic identification.


Research Status at Home and Abroad
With the increasing prosperity of the Internet, operators and Internet companies are paying more and more attention to the network traffic of their servers [4]. On the one hand, observing changes in network traffic reveals how the equipment is operating, which allows more reasonable planning of the company network; on the other hand, operators also need a detailed understanding of the company's traffic sources and user behavior characteristics in order to reformulate marketing strategies [5].
Foreign research institutions are also actively conducting similar research. Haffner P. and others use machine learning to identify application layer protocols. The process is to separate TCP data streams according to the upstream and downstream directions [6] and reassemble the data into training vectors; classic machine learning algorithms such as AdaBoost and Naive Bayes are then trained on these samples to generate a detection model. Subhabrata S, Paul Barford, and others have proposed a method based on the identification of payload features. This method identifies the application layer protocol by extracting the feature code of the protocol [7].

Related Work Content
The kernel parameters and penalty parameters of the support vector machine have a great influence on the complexity and accuracy of the prediction model. Tharwat A proposed a support vector machine parameter optimization method based on the Bat algorithm to reduce the classification error. To evaluate the proposed model, the experiment used 9 standard data sets obtained from the UCI Machine Learning Repository. However, due to the complexity of the evaluation, the results obtained are not very accurate [8]. This paper mainly studies the theory of network traffic prediction algorithms. It analyzes the existing prediction algorithm models, studies their respective advantages and disadvantages, and derives the theoretical feasibility of a new network traffic prediction algorithm based on the support vector machine model [9]. It also introduces the specific implementation of the system, which mainly includes unit-time traffic statistics and traffic distribution statistics, protocol feature extraction, business traffic distribution statistics, and network delay distribution. Finally, the content of this chapter is briefly summarized [10].

Linear Separable Support Vector Machine Algorithm
If the training samples are $(x_i, y_i)$, $i = 1, \dots, n$, with class labels $y_i \in \{+1, -1\}$, the linear classification decision function in the input space can be set as $D(x) = w \cdot x + b$. Set the optimal classification line as $w \cdot x + b = 0$ and normalize the classification decision function so that all samples of the two categories satisfy $|D(x)| \ge 1$. The classification decision function then satisfies the following constraints:

$$y_i (w \cdot x_i + b) - 1 \ge 0, \quad i = 1, \dots, n.$$

Solving for the optimal classification function means maximizing the classification margin $2 / \|w\|$ while satisfying these constraints. The Lagrange function is introduced to solve this constrained maximization problem, which is transformed into the objective function:

$$\max_{a} \; Q(a) = \sum_{i=1}^{n} a_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j y_i y_j (x_i \cdot x_j), \quad \text{subject to } \sum_{i=1}^{n} a_i y_i = 0, \; a_i \ge 0.$$

It can be seen from the above that when $a_i = 0$, the training sample has no effect on the classification problem. When $a_i > 0$, the training sample satisfies its constraint with equality and is a support vector. The optimal solution $a^*$ gives the weight vector, and the bias can be obtained from the original constraints:

$$w^* = \sum_{i \in S} a_i^* y_i x_i, \qquad b^* = y_s - w^* \cdot x_s \ \text{ for any } s \in S,$$

where $S$ is the set of support vectors. At this time, the best classification decision function is:

$$D(x) = \sum_{i \in S} a_i^* y_i (x_i \cdot x) + b^*.$$

If $D(x) = 0$, it means that $x$ is on the boundary and cannot be classified. If the training result is acceptable, the region $\{x \mid 1 > D(x) > -1\}$ is the generalization region.
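As a rough illustration of the decision function described above, the following Python sketch builds the weight vector and bias from a set of support vectors and evaluates D(x) for a few points. The two training points, their multipliers (0.25 each), and the helper names `dot`, `decision`, and `classify` are assumptions made up for this toy example; they are not data or code from the paper.

```python
# Toy illustration of the linear decision function D(x) = w.x + b.
# The support vectors and multipliers below are invented for this sketch.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Each entry: (x_i, y_i, a_i) with Lagrange multiplier a_i > 0.
support = [
    ((1.0, 1.0), +1, 0.25),
    ((-1.0, -1.0), -1, 0.25),
]

# w = sum over support vectors of a_i * y_i * x_i.
dim = len(support[0][0])
w = [sum(a * y * x[k] for x, y, a in support) for k in range(dim)]

# For a support vector s, y_s * (w.x_s + b) = 1, so b = y_s - w.x_s.
x_s, y_s, _ = support[0]
b = y_s - dot(w, x_s)

def decision(x):
    return dot(w, x) + b

def classify(x):
    d = decision(x)
    if d == 0:
        return "on boundary"
    label = "+1" if d > 0 else "-1"
    # Points with |D(x)| < 1 fall inside the margin band.
    return label + (" (inside margin)" if abs(d) < 1 else "")

for point in [(2.0, 2.0), (0.5, 0.5), (-3.0, 0.0), (1.0, -1.0)]:
    print(point, decision(point), classify(point))
```

In this toy case the weight vector is (0.5, 0.5) and the bias is 0, so both training points lie exactly on the margin, which is what makes them support vectors.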

Linear Inseparable Support Vector Machine Method
In the nonlinearly separable case, according to functional theory, as long as a function $K(x_i, x_j)$ satisfies the Mercer condition, it corresponds to an inner product in some transformation space. In other words, there exists a nonlinear mapping from the input space to a high-dimensional feature space such that $K$ equals the inner product of the mapped samples; $K$ is called the kernel function. Using an appropriate kernel function in the optimal classification surface therefore makes it possible to classify the problem linearly after the nonlinear transformation. When the high-dimensional space is involved, only inner products are needed, and these can be computed with a function in the original space, so the details of the transformation can be omitted. Although the dimension of the feature space increases considerably, the computational complexity does not increase much. The best classification decision function can then be obtained as:

$$D(x) = \sum_{i \in S} a_i^* y_i K(x_i, x) + b^*,$$

where $a_i^*$ are the optimal Lagrange multipliers, $y_i$ the class labels, $S$ the set of support vectors, and $b^*$ the bias. As in the linearly separable support vector machine method, if $D(x) = 0$, it means that $x$ is on the boundary and cannot be classified. If the training result is acceptable, the region $\{x \mid 1 > D(x) > -1\}$ is the generalization region.
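The claim that the inner product in the feature space can be computed with a function in the original space can be checked numerically. The sketch below assumes a degree-2 polynomial kernel on 2-D inputs, for which the explicit feature map is known, and compares the two computations. It also includes a Gaussian (RBF) kernel and a kernelized decision function. The choice of kernels, the sample points, and all function names are illustrative assumptions, not the paper's configuration.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def poly_kernel(x, z):
    # Homogeneous degree-2 polynomial kernel, computed in the input space.
    return dot(x, z) ** 2

def phi(x):
    # Explicit feature map for 2-D inputs that corresponds to poly_kernel.
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def rbf_kernel(x, z, gamma=0.5):
    # Gaussian kernel; gamma is a hypothetical default for this sketch.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def kernel_decision(x, support, b, kernel):
    # D(x) = sum_i a_i * y_i * K(x_i, x) + b over the support vectors.
    return sum(a * y * kernel(xi, x) for xi, y, a in support) + b

x, z = (1.0, 2.0), (3.0, -1.0)
k_input = poly_kernel(x, z)        # computed without leaving the input space
k_feature = dot(phi(x), phi(z))    # computed after the explicit mapping
print(k_input, k_feature)
```

Both values agree up to floating-point rounding, which is why the mapping itself never has to be evaluated when training or applying the classifier.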

Support Vector Machine Flow Detection Experiment
The data collected in this experiment is divided into 5 groups, from set A to set E, as shown in Table 1. Set A and set B are pure P2P traffic and non-P2P traffic respectively, set C is mixed P2P and non-P2P traffic, and set D and set E are mixed data collected over one day and one week respectively. Each data set is aggregated every 15 minutes, that is, four times per hour. The size of all data sets and the proportion of P2P in mixed traffic are listed in the table.
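The 15-minute aggregation and the P2P proportion mentioned above could be computed along the following lines. This is a minimal sketch: the record layout `(timestamp_seconds, n_bytes, is_p2p)`, the function name `aggregate`, and the sample values are hypothetical and do not come from the paper's data sets.

```python
from collections import defaultdict

BIN_SECONDS = 15 * 60  # 15-minute aggregation interval

def aggregate(records, bin_seconds=BIN_SECONDS):
    """Sum bytes per interval and compute the P2P share of each interval.

    records: iterable of (timestamp_seconds, n_bytes, is_p2p) tuples
             (a hypothetical layout chosen for this sketch).
    Returns {interval_start: (total_bytes, p2p_fraction)}.
    """
    totals = defaultdict(int)
    p2p = defaultdict(int)
    for ts, n_bytes, is_p2p in records:
        start = int(ts // bin_seconds) * bin_seconds
        totals[start] += n_bytes
        if is_p2p:
            p2p[start] += n_bytes
    # Skip empty intervals to avoid dividing by zero.
    return {s: (totals[s], p2p[s] / totals[s])
            for s in sorted(totals) if totals[s] > 0}

# Invented sample records spanning two 15-minute intervals.
sample = [
    (10, 500, True),
    (400, 1500, False),
    (899, 2000, True),
    (900, 1000, False),
    (1700, 3000, True),
]
result = aggregate(sample)
print(result)
```

Here the first interval holds 4000 bytes with a P2P share of 0.625, and the second holds 4000 bytes with a share of 0.75.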

Analysis of Network Traffic Distribution Results
Counting the amount of data transmitted in the network at each moment is very important for the company's equipment maintenance personnel, because the per-second transmission volume can be used to discover problems in the network equipment in time and make maintenance adjustments.

Improved Support Vector Machine Analysis of Traffic Recognition Results
The statistics of the improved support vector machine's predicted and measured values for the test samples are shown in Table 2. The comparison of the improved support vector machine's predicted and measured values is shown in Figure 2.
Figure 2. Improved support vector machine traffic prediction results
Comparing the prediction results of the BP neural network algorithm with those of the improved support vector machine algorithm shows that the prediction based on the improved support vector machine is clearly better than that of the neural network. This is because the neural network algorithm tends to overfit during training, which leads to large prediction errors. For individual points, its predicted value deviates greatly from the actual value, and the overall prediction effect is not ideal. The support vector machine is based on the principle of structural risk minimization and obtains a global optimal solution, so its generalization performance is higher than that of the neural network. The total time required by the improved support vector machine method also differs greatly from that of the neural network algorithm. Training and prediction with the support vector machine algorithm are much faster, but parameter optimization takes more time. However, the improved algorithm optimizes the model parameters through a genetic algorithm, which needs no prior knowledge and is insensitive to the initial parameters, so it is less prone to falling into a local minimum; as long as an appropriate population size is set, the search time can be greatly shortened.
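The genetic-algorithm parameter search described above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not give its encoding, operators, or fitness function, so the sketch uses real-valued individuals for log10 of the penalty parameter and log10 of the kernel parameter, truncation selection, arithmetic crossover, Gaussian mutation, and elitism. The function `surrogate_error` is a hypothetical stand-in for the validation error of a trained support vector machine.

```python
import random

def surrogate_error(log_c, log_gamma):
    # Hypothetical stand-in for a validation error; in practice this would
    # train an SVM with the given parameters and return its error.
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2

def genetic_search(fitness, bounds, pop_size=20, generations=40,
                   mutation_rate=0.2, seed=0):
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def clip(ind):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(ind, bounds)]

    population = [random_individual() for _ in range(pop_size)]
    history = []  # best error seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda ind: fitness(*ind))
        history.append(fitness(*population[0]))
        elite = population[:2]                 # elitism: keep the two best
        parents = population[:pop_size // 2]   # truncation selection
        children = []
        while len(children) < pop_size - len(elite):
            p1, p2 = rng.sample(parents, 2)
            w = rng.random()
            # Arithmetic crossover: a random blend of the two parents.
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
            if rng.random() < mutation_rate:
                k = rng.randrange(len(child))
                child[k] += rng.gauss(0.0, 0.5)  # Gaussian mutation
            children.append(clip(child))
        population = elite + children
    best = min(population, key=lambda ind: fitness(*ind))
    history.append(fitness(*best))
    return best, history

# Assumed search ranges for log10(C) and log10(gamma).
bounds = [(-3.0, 3.0), (-5.0, 1.0)]
best, history = genetic_search(surrogate_error, bounds)
print(best, history[0], history[-1])
```

Because the two best individuals are carried over unchanged, the best error never gets worse from one generation to the next. The population size trades search coverage against run time, which matches the remark above about choosing it appropriately.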

Conclusion
This paper studies an improved network traffic algorithm based on the support vector machine. It first summarizes the research progress on network traffic, analyzes the existing linear and nonlinear prediction algorithm models, points out the characteristics and deficiencies of each algorithm, and discusses the advantages and feasibility of using the support vector machine algorithm for network traffic prediction.
Network traffic contains a large amount of information exchanged between users and merchants, so it is necessary to build a comprehensive traffic statistics and analysis system. At present, most traffic statistics systems are functionally incomplete. For example, business statistics cannot be collected, and the transmission delay in the network is not accurately measured. Most traffic statistics systems are desktop application software and cannot fully utilize the computing power of the server. To address the problems of the support vector machine algorithm, this paper designs an improved support vector machine algorithm that introduces a genetic algorithm to optimize the parameters of the support vector machine, so that its parameters can be determined more quickly and accurately. Finally, tests on existing network traffic data and a comparison with the prediction results of the neural network algorithm show that the improved algorithm studied in this paper has much higher prediction accuracy.