Research on the Detection Method of SQL Injection Attack Based on Sequence Alignment

To improve the detection efficiency of SQL injection attacks, this paper proposes a detection method based on a sequence alignment algorithm. The Needleman-Wunsch algorithm and an improved variant are used to align sequences and find the best alignment result in the global scope. Experiments show that 100% of the SQL injection attacks tested were detected, and that the algorithm has the lowest response delay among the SQL injection detection methods compared. The detection method proposed in this paper improves detection quickly and effectively, and reduces time and space complexity.


Introduction
With the development of computer technology and the Internet, Internet-based applications have developed rapidly, and their security problems have attracted much attention [1]. The SQL injection attack [2] is one of the most popular script attacks on the Internet. In the web security threat assessment reports published by OWASP in 2013 and 2017 [3][4], the SQL injection attack ranked first. It has become a mainstream attack mode, and its harm is far greater than that of other attack types. SQL statements are flexible and changeable: through interaction, they can compile, execute and store data in the three-tier architecture of a web application. Because the interaction time of SQL is very short, it is difficult for common detection methods to detect such attacks effectively in so short a time. Therefore, this paper proposes a SQL injection detection method based on a sequence alignment algorithm [5]. Sequence alignment detection does not need to build a syntax parse tree, which reduces time and space complexity to a certain extent and improves the efficiency of SQL injection detection.

Related Works
In recent years, the proportion of information security problems caused by SQL injection attacks has been increasing. Program developers are aware of the harm brought by the attack, and research on it has deepened accordingly. Current research results mainly include input filtering methods, program analysis methods, methods based on machine learning, and sequence alignment methods. In terms of user input filtering, Zhang et al. [6] proposed a SQL injection attack interception method based on HttpModule; Roy S et al. [7] developed an injection attack detection method based on URL filtering. In terms of program analysis, Qin et al. [8] put forward a static analysis method that identifies SQL injection attacks by analysing the differences in syntax structure and semantic content of SQL statements; Li et al. [9] designed a method combining dynamic and static detection, which verifies suspected vulnerabilities through fuzz testing and reduces the false alarm rate. In terms of machine learning, Li et al. [10] trained the detection model with the SVM algorithm to improve the applicability and accuracy of the detection method; Kar D et al. [11] applied the bidirectional hidden Markov model directly to the detection of SQL injection attacks; Zhang et al. [12] presented an analysis model of SQL injection vulnerability based on an artificial neural network.
Although the above research has achieved good results, most of these methods still suffer from high false-negative and false-positive rates and cannot achieve real-time dynamic detection. The SQL injection attack detection technology based on sequence alignment proposed in this paper uses the Needleman-Wunsch algorithm [13] to analyse the structure of the SQL statement and judge whether there is attack behavior, so as to reduce the detection time and improve the detection efficiency.

Needleman-Wunsch Algorithm
Sequence alignment was first used in the field of biological gene sequence analysis. In bioinformatics it is of great significance to find, by a specific algorithm, the maximum number of matching bases between two or more sequences. According to the number of sequences, sequence alignment can be divided into pair-wise sequence alignment and multiple sequence alignment. According to the range of the alignment, it can be divided into global alignment and local alignment. The most classical global alignment algorithm is the Needleman-Wunsch algorithm, while the Smith-Waterman algorithm [14] is the representative local alignment algorithm.
This paper applies the Needleman-Wunsch pair-wise sequence alignment algorithm. It can carry out sequence alignment in the global scope and find the best alignment result, but its time and space complexity, both O(n*m) (where n and m are the lengths of the two sequences), are higher than those of other sequence alignment algorithms. The main body of the algorithm can be described as follows: first, a data matrix is constructed for the two sequences of length n and m, whose last element is the best alignment score of the two sequences; then the best alignment result is obtained by backtracking.
This paper takes the following statements as an example of detecting SQL injection attacks based on sequence alignment. Each statement is followed by its abbreviated sequence:
• SELECT user,password FROM user WHERE id=''
• Sequence p: S usr,psd F u W id=' '
• SELECT user,password FROM user WHERE id='1'
• Sequence q1: S usr,psd F u W id='1'
• SELECT user,password FROM user WHERE id='-1' or 1='1'
• Sequence q2: S usr,psd F u W id='-1' or 1='1'
The specific implementation steps of the algorithm are as follows. Let the two sequences be SqA = s1s2…sn and SqB = t1t2…tm, where n and m are their respective lengths. A data matrix S is constructed for the two sequences; i denotes the i-th character of sequence SqA, j denotes the j-th character of sequence SqB (1<=i<=n, 1<=j<=m), and S(i, j) represents the best score of the corresponding positions of the two sequences. The implementation of the algorithm is divided into three steps: initialization, filling and backtracking.
3.1.1. Initialization. The data matrix S is constructed for the two sequences. Here K1 is the score of a match, K2 is the score of a mismatch, and gap is the penalty for inserting a gap (the gap penalty is 0, that is, gap = 0). The match score should be greater than the mismatch score, and the mismatch score should be greater than the gap penalty, so that the result is meaningful.
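As a minimal sketch of the initialization step, the Python fragment below builds the (n+1) × (m+1) matrix and checks the ordering of the scores. The concrete values K1 = 2 and K2 = 1 are assumptions chosen only to satisfy match > mismatch > gap with gap = 0, and the border rule (row and column 0 filled with multiples of the gap penalty) is the textbook Needleman-Wunsch convention rather than something stated in this paper. The function names are hypothetical.

```python
def check_scores(k1: int, k2: int, gap: int) -> None:
    """Reject score settings that violate match > mismatch > gap."""
    if not (k1 > k2 > gap):
        raise ValueError("scores must satisfy match > mismatch > gap")


def init_matrix(n: int, m: int, gap: int = 0) -> list[list[int]]:
    """Build an (n+1) x (m+1) matrix with the first row and column initialised.

    Assumption: the textbook border rule S(i,0) = i*gap, S(0,j) = j*gap.
    With gap = 0 the whole border is simply zero.
    """
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        S[i][0] = i * gap
    for j in range(m + 1):
        S[0][j] = j * gap
    return S


# Assumed example scores; only gap = 0 comes from the text.
K1, K2, GAP = 2, 1, 0
check_scores(K1, K2, GAP)
S = init_matrix(len("id=''"), len("id='1'"), GAP)
print(len(S), len(S[0]))  # rows and columns of the matrix
```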

3.1.2. Filling. The filling process is a recursive calculation of S(i, j). The value of each remaining cell of the matrix is selected optimally from the values of the cells adjacent to it: the largest candidate is taken as the final value of the cell. The filling rule is as follows:

$$S(i,j) = \max \begin{cases} S(i-1,j-1) + \sigma(s_i,t_j) \\ S(i-1,j) + gap \\ S(i,j-1) + gap \end{cases} \qquad \sigma(s_i,t_j) = \begin{cases} K_1, & s_i = t_j \\ K_2, & s_i \neq t_j \end{cases}$$

where $K_1$ and $K_2$ are the match and mismatch scores and gap is the gap penalty.

3.1.3. Backtracking. It can be seen from Figure 1 that if the last characters of the p and q sequences are the same, the path backtracks from the bottom right towards the top left along the diagonal; otherwise, it moves one character to the left and compares again until the characters are the same. Suppose that the alignment result obtained by backtracking is sequence T, as follows:
• q1: S usr,psd F u W id='1'
• q2: S usr,psd F u W id='-1' or 1='1'
• T1: S usr,psd F u W id='_'
• T2: S usr,psd F u W id=______='_'
According to the alignment results, in sequence q1 the user inputs a string of length 1, whereas in sequence q2 the user inputs a number of length 8 and a string of length 1. Sequence q2 is therefore identified as a SQL injection attack.
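The filling and backtracking steps can be sketched in Python as below. This is an illustrative sketch of the textbook Needleman-Wunsch procedure, not the authors' implementation: the scores K1 = 2 and K2 = 1 are assumed values (only gap = 0 is given in the text), the function name is hypothetical, and the pattern is written with the empty input `id=''` of the first example statement. Characters of q that are not matched by p are shown as `_`, as in sequence T.

```python
def needleman_wunsch(p: str, q: str, k1: int = 2, k2: int = 1, gap: int = 0):
    """Globally align pattern p with statement q; return (score, T)."""
    n, m = len(p), len(q)
    # Initialization: border cells hold multiples of the gap penalty.
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    # Filling: each cell takes the largest of its three candidates.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sigma = k1 if p[i - 1] == q[j - 1] else k2
            S[i][j] = max(S[i - 1][j - 1] + sigma,
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
    # Backtracking from the bottom-right corner.
    i, j, t = n, m, []
    while j > 0:
        same = i > 0 and p[i - 1] == q[j - 1]
        if same and S[i][j] == S[i - 1][j - 1] + k1:
            t.append(q[j - 1])          # match: move along the diagonal
            i, j = i - 1, j - 1
        elif S[i][j] == S[i][j - 1] + gap:
            t.append("_")               # gap in p: move left
            j -= 1
        elif i > 0 and not same and S[i][j] == S[i - 1][j - 1] + k2:
            t.append("_")               # mismatch: diagonal, shown as "_"
            i, j = i - 1, j - 1
        else:
            i -= 1                      # gap in q: move up, nothing to show
    return S[n][m], "".join(reversed(t))


p = "S usr,psd F u W id=''"
q1 = "S usr,psd F u W id='1'"
score, T1 = needleman_wunsch(p, q1)
print(T1)
```

Because several alignments can share the best score when gap = 0, the T obtained for longer inputs such as q2 depends on how ties are broken during backtracking; the sketch prefers the diagonal, then the move to the left.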

Needleman-Wunsch algorithm optimization
The matrix filling process of the Needleman-Wunsch algorithm has two loop structures: 1) Judge the value of […]. In this way, the calculation time can be reduced because the horizontal and vertical values need not be calculated. Compared with the calculation process before the improvement, the time complexity is greatly reduced. The specific steps of the improved matrix filling are as follows: (1) If the i-th character of sequence p matches the j-th character of sequence q, then […]. (2) If the i-th character of sequence p does not match the j-th character of sequence q, then […].

Experimental Environment and Data
The experiment simulates an attacker who obtains a user's login account and password and then logs in to the back-end database to extract data. We used phpstudy, DVWA, and an MD5 decryption website to build the experimental environment. The experiment consists of two parts: finding the injection point and confirming the injection.

Search for injection point.
Enter "1", "1'", "1 and 1 = 1" and "1 and 1 = 2" into the website one at a time and submit each; the results are shown in Figure 2. The responses to the four inputs show that there is an injection point, from which it is inferred that the SQL query statement has the form: SELECT the First name column and the Surname column FROM the table WHERE id = 'the id we entered'.
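A minimal sketch of why these probes are informative, assuming the application splices the input directly into the statement. The table and column names below are hypothetical placeholders for the inferred statement, and the quote-counting check is a simplification that ignores comments and other quoting styles.

```python
# Hypothetical template for the inferred statement; names are placeholders.
TEMPLATE = "SELECT first_name, surname FROM users WHERE id='{}'"


def build_query(user_input: str) -> str:
    """Splice the raw input into the statement (the vulnerable pattern)."""
    return TEMPLATE.format(user_input)


def quotes_balanced(sql: str) -> bool:
    """An odd number of single quotes leaves a string literal unterminated."""
    return sql.count("'") % 2 == 0


for probe in ["1", "1'", "1 and 1 = 1", "1 and 1 = 2"]:
    sql = build_query(probe)
    print(f"{probe!r}: {sql} (balanced quotes: {quotes_balanced(sql)})")
```

The probe "1'" is the one that changes the structure of the statement: it leaves an unterminated string literal, so an error returned for that input suggests that the input reaches the statement unescaped.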

Confirming injection.
First, get the number of fields. Enter "1' order by 1", "1' order by 2" and "1' order by 3" one by one. The first two inputs return results normally; the third reports an error, as shown in Figure 3. Therefore, the number of fields is 2. Then, enter the code "1' union select database(), user() #" to get the current database name. The obtained database name is DVWA, and the user name is root@localhost, as shown in Figure 4. Next, enter the code "-1' union select column_name,2 from information_schema.columns where table_schema='dvwa' and table_name='users'#" to obtain the column information of the data table from the database, as shown in Figure 5(a). Finally, select the user and password columns to obtain all the user names and passwords in the table, as shown in Figure 5(b). After the obtained passwords are decrypted through the MD5 website, the attacker can log in to the back-end database.
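The field-counting step can be reproduced locally with a small sketch. It uses Python's built-in sqlite3 module and an in-memory table as a stand-in for the DVWA database, so several details are assumptions: the table and column names are hypothetical, and the comment marker is `--` because SQLite does not accept the MySQL `#` comment used in the experiment.

```python
import sqlite3


def count_fields(conn: sqlite3.Connection, limit: int = 10) -> int:
    """Raise the ORDER BY index until the database reports an error."""
    for n in range(1, limit + 1):
        payload = f"1' ORDER BY {n} -- "
        # Vulnerable pattern: raw input spliced into the statement.
        sql = f"SELECT first_name, surname FROM users WHERE id='{payload}'"
        try:
            conn.execute(sql).fetchall()
        except sqlite3.OperationalError:
            return n - 1  # the previous index was the last valid one
    return limit


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT, first_name TEXT, surname TEXT)")
conn.execute("INSERT INTO users VALUES ('1', 'admin', 'admin')")
print(count_fields(conn))
```

The statement selects two columns, so ordering by a third column fails, which mirrors the error reported for the third input in the experiment.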

Correctness Analysis
The correctness analysis is based on the following two points: (1) The matching path starts at the bottom right corner. If the user's input is legitimate, then p can be made to match q exactly only by inserting gaps into the p sequence, and the number of consecutive gaps is exactly the length of the user's input. Therefore, during the alignment the path can only move from the lower right corner to the left or along the diagonal towards the upper left corner.
(2) The data types of a database can be divided into character types and numeric types. In a database condition statement, if the field is a character type, a delimiter must be added in the condition (it is up to the application whether single or double quotation marks are used as delimiters); if the field is numeric, only digits or a "." may appear in the condition.
When there is a malicious SQL injection attack, the structure of the SQL statement or the type of the user input changes, so that the final alignment result violates the above two principles.
This paper defines the function ParseTree(p, q, num), where p is the pattern string, q is the target string, and num is the number of inputs supplied by the user. This function is called before the database operation is performed. If there is no SQL injection attack, the function returns true and the statement is executed; otherwise, it returns false. The experimental statistical results are shown in Table 1.
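One possible reading of this check is sketched below in Python. It is not the authors' implementation: the function names are hypothetical, num is assumed to mean the number of user inputs the pattern expects, and the alignment uses the simple right-to-left rule from the backtracking description (move diagonally on a match, otherwise move left) instead of the full scoring matrix.

```python
import re


def gap_mask(p: str, q: str):
    """Align p against q from the right; 'M' marks a match, 'G' a gap in p.

    Returns None when p cannot be embedded in q, i.e. the structure changed.
    """
    i = len(p) - 1
    mask = []
    for ch in reversed(q):
        if i >= 0 and p[i] == ch:
            mask.append("M")
            i -= 1
        else:
            mask.append("G")
    return "".join(reversed(mask)) if i < 0 else None


def parse_tree(p: str, q: str, num: int, quote: str = "'") -> bool:
    """Return True when q looks like p with at most num well-typed inputs."""
    mask = gap_mask(p, q)
    if mask is None:
        return False
    runs = [m.span() for m in re.finditer(r"G+", mask)]
    if len(runs) > num:                 # principle (1): extra gap runs
        return False
    for start, end in runs:
        text = q[start:end]
        quoted = (start > 0 and end < len(q)
                  and q[start - 1] == quote and q[end] == quote)
        if quoted:                      # principle (2): character field
            if quote in text:
                return False
        elif not re.fullmatch(r"[0-9.]+", text):  # numeric field
            return False
    return True


p = "S usr,psd F u W id=''"
print(parse_tree(p, "S usr,psd F u W id='1'", 1))
print(parse_tree(p, "S usr,psd F u W id='-1' or 1='1'", 1))
```

For the first statement the only gap run is the quoted input "1", so the check passes; for the second the alignment needs two gap runs, which exceeds the single expected input, so the statement is rejected before it reaches the database.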

Performance Testing
To further verify the performance advantages of the proposed algorithm, we test it against other types of SQL injection detection methods. During the experiment, the server does no other work. The response time includes the statement analysis time, the time to return results after the database operation, the time to close the connection, and so on. The results are shown in Figure 6. The larger the number of SQL injection attacks, the shorter the response time of the algorithm relative to the other methods, and the more obvious its advantage. When the number of SQL injections reaches 350, ParseTree has a response time of only 0.04 ms, while the response times of the other two methods are more than twice as long. Compared with learning-based detection and dynamic detection, the advantage of this method is that it does not rely on the accuracy of any learning algorithm and does not need the support of any external library, which reduces time and space complexity.

Conclusion
The sequence alignment algorithm originated in the study of biological sequences and was later applied to computer science. It supports multiple detection directions, is highly adaptable, and can be continuously explored and improved. In the future, it may become a main detection method in the field of network security defence. The detection method based on the Needleman-Wunsch pair-wise sequence alignment algorithm proposed in this paper performs sequence alignment in the global scope and quickly finds the best alignment result, giving a small response delay for SQL injection attacks. The algorithm is simple and does not rely on the support of an external database, which reduces time and space complexity.