A New Reversible Database Watermarking Approach with Ant Colony Optimization Algorithm

In fields such as medicine, science, and transportation, real-time information must often be maintained, and the data held in a centralized database is distributed to various clients. Distributed databases stored in data centers are reachable over the insecure public Internet, so sharing creates risks such as unauthorized copying, tampering with data, and redistribution of data to unauthorized persons for reuse. Many database watermarking methods have been proposed to protect digital rights, and optimizing these methods to reduce watermark distortion and increase watermark capacity has attracted growing attention. This paper proposes a method for embedding a differential expansion watermark (DEW) in a relational database, based on the ant colony optimization (ACO) algorithm. The method groups tuples with a secure hash algorithm, which makes the watermark embedding harder to detect. The ant colony algorithm then searches each subgroup for the best attribute columns in which to embed the watermark, and the differential expansion technique performs the embedding. Experimental results show that, compared with similar works reported in the literature, the proposed ACO-based reversible database watermarking method both increases the watermark capacity and improves robustness against various types of attacks.


Introduction
To protect the copyright of shared data and prevent tampering, appropriate protection methods must be applied to the shared data. Because encrypted information is unrelated to the original content and cannot be used for copyright protection, digital watermarking was proposed [1]. In 2008, Gupta and Pieprzyk used differential expansion watermarking (DEW) to make relational database watermarking reversible, and used reversible watermarks to address watermark attacks [2]. In 2012, Arif et al. proposed a scheme that considers the re-watermarking attack and uses dates to add the watermark, in order to overcome the linear transformation attack, which watermark detection cannot catch when tuples are deleted [3]. Jawad and Khan combined a genetic algorithm with the differential expansion watermark in a reversible method that reduces the distortion caused by embedding and increases the watermark capacity [4]; DEW was used to embed the watermark bits into the database reversibly. In 2017, Bilehan Imamoglu et al. proposed a database watermarking method based on the firefly algorithm that reduces distortion and algorithmic complexity and improves the robustness of DEW [5].
This paper proposes a new robust and reversible watermarking technique for protecting relational databases [6]. Based on the idea of differential expansion, the method uses the ant colony algorithm to increase the watermark capacity and reduce distortion. Because the method is reversible, the distortion introduced by inserting the watermark can be completely removed [7]. Distortion caused by difference expansion can help an attacker predict which attributes are watermarked. To prevent this, the fitness function of the ant colony algorithm combines the distortion measured over tuples with the distortion measured over attributes, which makes the watermarked attributes difficult to predict. Experimental results show that the proposed technique significantly reduces false positives and the problems caused by attribute order changes at the detection end, and that it resists various attacks, such as deletion, sorting, and bit flipping.

Proposed Watermarking Method
The watermarking algorithm consists of four modules: a preprocessing module, an ant colony determination module, a watermark embedding module, and a watermark extraction module. The preprocessing module sorts the tuples to obtain the prepared data sequence; the ant colony determination module outputs the best ant for the data set; the watermark embedding module adds the watermark to the data set according to the best ant; and the watermark extraction module extracts the watermark information from the database and reconstructs the original database using DEW.

Preprocessing
The maximum and minimum values of each attribute in the database are taken as the upper and lower limits of its non-distortion range. That is, for a database with M attribute columns $x_1, x_2, \ldots, x_M$, the undistorted range of an attribute column $x$ is $[\min(x), \max(x)]$. A secure hash algorithm is then used to group the original database tuples into several subgroups.
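The preprocessing step can be illustrated with a minimal Python sketch. It assumes, beyond what the text states, that each tuple has a primary key, that the hash is a keyed SHA-256 (HMAC), and that the subgroup index is the hash value modulo the number of subgroups; the names `undistorted_ranges`, `group_index`, and `partition` are hypothetical.

```python
import hashlib
import hmac


def undistorted_ranges(rows):
    """Return the (min, max) pair of every attribute column."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def group_index(primary_key, secret_key, num_groups):
    """Map a tuple to a subgroup via a keyed hash of its primary key.

    Assumption: HMAC-SHA256 reduced modulo num_groups; the paper only
    says that a secure hash algorithm is used for grouping.
    """
    digest = hmac.new(secret_key, str(primary_key).encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % num_groups


def partition(rows, keys, secret_key, num_groups):
    """Split rows into num_groups subgroups (some may stay empty)."""
    groups = [[] for _ in range(num_groups)]
    for pk, row in zip(keys, rows):
        groups[group_index(pk, secret_key, num_groups)].append(row)
    return groups
```

Because the subgroup of a tuple depends only on its key and the secret, the same grouping can be recomputed at detection time without storing it.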

Ant colony determination
The initial ant colony is generated as follows. For each attribute pair, the algorithm first picks one attribute at random and fixes it, and then uses the "dewbest" function to find the second attribute that minimizes the difference between the two. The "dewbest" function thus produces, for each row, an attribute pair with minimal difference, with the aim of minimizing the change introduced by embedding the watermark.
To determine the best ant, the objective function value of each ant generated in the previous stage is calculated, and the ant with the smallest objective function value is recorded as the best ant. In the subsequent steps of the algorithm, the best ant is used to strengthen the pheromone matrix.
The objective function consists of two parts with weights $w_1$ and $w_2$, respectively. The first part is the sum of all changes in data values before and after embedding the watermark, and the second part captures the effect of the number of failed cells on the objective function. Here $a$ is the sigmoid adjustment coefficient and $c$ is the number of failed cells. The sigmoid function is used as the influence function to approximate how the influence of the number of failed cells changes. $FD_k^{x_1}$ denotes the data value in row $k$, column $x_1$ of the original database, and $BD_k^{x_1}$ denotes the value in row $k$, column $x_1$ after the watermarking operation.
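One possible reading of the initial-colony step is sketched below in Python. The paper names a "dewbest" function but does not list it, so this body is an assumption: for a fixed attribute it returns the other attribute whose value is closest in the same row. `initial_ant` is a hypothetical helper that builds one ant as a list of attribute pairs, one per row.

```python
import random


def dewbest(row, fixed):
    """Return the index of the attribute closest in value to row[fixed]."""
    candidates = [j for j in range(len(row)) if j != fixed]
    return min(candidates, key=lambda j: abs(row[fixed] - row[j]))


def initial_ant(rows, rng):
    """Build one ant: a (fixed, partner) attribute pair for every row."""
    ant = []
    for row in rows:
        fixed = rng.randrange(len(row))  # random attribute, then fixed
        ant.append((fixed, dewbest(row, fixed)))
    return ant
```

A small difference between the two paired values keeps the expanded difference small, which is why such pairs are a reasonable starting point for the colony.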
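The structure of this objective can be sketched in Python as follows. The exact formula is not reproduced in the text, so the form used here is an assumption: the weighted sum of absolute value changes plus a weighted sigmoid of the scaled number of failed cells. The function names and default parameter values are hypothetical.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def objective(original, watermarked, failed_cells, w1=1.0, w2=1.0, a=1.0):
    """Assumed form: w1 * total change + w2 * sigmoid(a * failed_cells).

    original / watermarked: equally shaped lists of rows (FD and BD values).
    failed_cells: the number c of cells where embedding failed.
    """
    change = sum(
        abs(fd - bd)
        for fd_row, bd_row in zip(original, watermarked)
        for fd, bd in zip(fd_row, bd_row)
    )
    return w1 * change + w2 * sigmoid(a * failed_cells)
```

Under this reading, a smaller value means less distortion and fewer failed cells, which matches the rule that the ant with the smallest objective value is the best ant.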
$$P = [p_{ij}] \qquad (2)$$
The pheromone matrix that guides ant selection is designed as in Eq. (2) above, where $p_{ij}$ is the pheromone concentration of the $j$-th attribute column in the $i$-th row of the subset. The initial pheromone matrix is an all-ones matrix. The pheromone update consists of the normal volatilization of pheromones and the strengthening of pheromones by the best ant. The update algorithm for the pheromone matrix is designed as follows.
Function: PheromoneUpdate
Input: Best ant Y
Output: The updated pheromone matrix P
Step 3. Assign
Step 4. Repeat Step 4 for
Step 5. Repeat Step 5 for
Step 6. Assign
Step 7. Return P.
The algorithm above uses a heuristic enhancement strategy with two parameters: the heuristic index Q and the attenuation factor. The enhanced pheromone matrix is used to build a roulette-wheel selection probability matrix that guides each ant's next choice.
This paper uses double roulette selection to update the ants' choices: each ant makes its two attribute choices in each row by roulette-wheel selection. In each row, the greater the pheromone concentration of an attribute, the greater the probability that the attribute is selected.
These steps are repeated up to the maximum number of iterations to obtain the optimal decision vector, that is, the optimal watermark embedding positions for the current subgroup. The ant colony algorithm thus finds the attribute positions that maximize the watermark capacity while keeping the data within the non-distortion range and minimizing the change in values.
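A pheromone update of this kind can be sketched in Python. The individual steps of the update are not legible in the text, so the sketch falls back on the classical ant colony rule as an assumption: every entry is first multiplied by (1 - rho), and the cells chosen by the best ant then receive an additive bonus Q. The symbol `rho` for the attenuation factor and the default values are likewise assumptions.

```python
def pheromone_update(P, best_ant, Q=1.0, rho=0.1):
    """Return a new pheromone matrix after evaporation and reinforcement.

    P: list of rows, one pheromone value per attribute column.
    best_ant: for every row, the pair (j1, j2) of attribute indices
              chosen by the best ant.
    """
    # Normal volatilization: all entries decay by the attenuation factor.
    updated = [[(1.0 - rho) * p for p in row] for row in P]
    # Reinforcement: the best ant's choices gain extra pheromone.
    for i, (j1, j2) in enumerate(best_ant):
        updated[i][j1] += Q
        updated[i][j2] += Q
    return updated
```

Starting from the all-ones matrix, one update with `rho=0.5` and `Q=1.0` leaves unchosen cells at 0.5 and raises chosen cells to 1.5, so later ants are biased toward the best ant's attribute pairs.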
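Double roulette selection can be illustrated with the following Python sketch. It assumes that the selection probability of an attribute is proportional to its pheromone value in that row and that the second choice must differ from the first; neither detail is spelled out in the text, and the function names are hypothetical.

```python
import random


def roulette(weights, rng, exclude=None):
    """Pick an index with probability proportional to its weight."""
    indices = [j for j in range(len(weights)) if j != exclude]
    return rng.choices(indices, weights=[weights[j] for j in indices], k=1)[0]


def double_roulette(P, rng):
    """For every row of the pheromone matrix, spin the wheel twice."""
    ant = []
    for row in P:
        first = roulette(row, rng)
        second = roulette(row, rng, exclude=first)  # assumed: distinct attributes
        ant.append((first, second))
    return ant
```

Calling `double_roulette(P, random.Random(seed))` yields one new ant per call, and attributes with more pheromone are picked more often across calls.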

Watermark embedding
Once the subgroups have been obtained by the grouping method and the optimal embedding positions by the ant colony algorithm, the differential expansion technique is used to embed the generated watermark sequence. First, the original database is divided into subgroups with a keyed secure hash algorithm. Next, the ant colony algorithm selects, for every subgroup, the attributes in which the watermark is to be embedded. Finally, DEW is applied to each subgroup at the selected positions to embed the watermark sequence. One value used during embedding is generated by a random function.
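The embedding step rests on difference expansion of an integer pair, which can be sketched in Python as below. This is the standard integer difference expansion transform, shown under the assumption that the two selected attribute values of a tuple are integers; the paper does not list its exact variant, and the function names are hypothetical.

```python
def de_embed(x, y, bit):
    """Embed one bit into the integer pair (x, y) by difference expansion."""
    avg = (x + y) // 2         # integer average, unchanged by embedding
    diff = x - y
    expanded = 2 * diff + bit  # double the difference and append the bit
    return avg + (expanded + 1) // 2, avg - expanded // 2


def de_extract(x_w, y_w):
    """Recover the bit and the original pair from a watermarked pair."""
    avg = (x_w + y_w) // 2
    expanded = x_w - y_w
    bit = expanded & 1         # the embedded bit is the lowest bit
    diff = expanded // 2       # floor division restores the original difference
    return bit, avg + (diff + 1) // 2, avg - diff // 2
```

For example, `de_embed(206, 201, 1)` gives `(209, 198)`, and `de_extract(209, 198)` returns `(1, 206, 201)`, which is what makes the watermark reversible. In a full implementation a pair would be skipped when the watermarked values fall outside the non-distortion range.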

Watermark extraction
First, the watermarked database is grouped, and the DEW operation is applied to the tuples in each subgroup according to the saved attribute sequence to extract the watermark bits embedded in that subgroup. The extracted bits are then put to a vote: the numbers of 0s and 1s in the subgroup are counted, and the majority value is taken as the watermark bit of the subgroup. This extraction is performed on all subgroups in turn, which finally yields the embedded watermark sequence and the original database.
The simulation results of the method are given below. The method is coded in Python using the PyCharm IDE. The database is the Forest Cover Type (FCT) dataset provided by the University of California, which contains 581,012 tuples and 54 attribute columns. In this experiment, the first 25,000 tuples and the first 8 attribute columns are selected as experimental data. The watermarking method based on the firefly algorithm is denoted FFADEW [5], and the proposed method is denoted ACODEW; the performance of the two methods is compared.
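The majority vote used in the extraction step can be sketched in Python as follows. The sketch assumes that every subgroup carries one watermark bit and that ties resolve to 0; the tie rule and the function names are assumptions, not taken from the paper.

```python
from collections import Counter


def majority_bit(bits):
    """Return the bit value that occurs most often (ties give 0)."""
    counts = Counter(bits)
    return 1 if counts[1] > counts[0] else 0


def recover_watermark(bits_per_group):
    """Vote within every subgroup to rebuild the watermark sequence."""
    return [majority_bit(bits) for bits in bits_per_group]
```

Voting lets a subgroup recover its bit even when some of its tuples were deleted or altered, as long as the surviving correct bits remain in the majority.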

Watermark capacity experiment
The first experiment compares the watermark capacity of the two methods. The watermark capacity is the ratio of the number of tuples available for embedding in a data set (DS) to its total number of tuples. A higher watermark capacity means that more watermark information can be embedded in the same DS. Six different DS are created, with sizes including 4052, 8997, 15466, and 20031 tuples, to test the watermark capacity of the proposed method and the other method. Fig.1. shows the watermark capacity of the two methods. The proposed watermarking technique allows more watermark information to be embedded in the database, and its capacity tends to increase as the number of database tuples grows.
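The capacity ratio can be computed with a short Python sketch. It assumes that a tuple counts as available when difference expansion of its selected attribute pair stays inside both non-distortion ranges for either bit value; this availability test and the function names are assumptions made for illustration.

```python
def is_embeddable(x, y, x_range, y_range):
    """True if embedding 0 or 1 keeps both values inside their ranges."""
    avg, diff = (x + y) // 2, x - y
    for bit in (0, 1):
        expanded = 2 * diff + bit
        x_w = avg + (expanded + 1) // 2
        y_w = avg - expanded // 2
        if not (x_range[0] <= x_w <= x_range[1] and y_range[0] <= y_w <= y_range[1]):
            return False
    return True


def watermark_capacity(pairs, x_range, y_range):
    """Ratio of embeddable tuples to all tuples (0.0 for an empty set)."""
    if not pairs:
        return 0.0
    usable = sum(is_embeddable(x, y, x_range, y_range) for x, y in pairs)
    return usable / len(pairs)
```

With the ranges set to (0, 10), the pair (5, 5) is embeddable while (0, 10) is not, so a data set holding just these two tuples has a capacity of 0.5.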

Robust experiment
The second experiment tests robustness, using the watermark extraction accuracy (ACC) as the criterion:
$$ACC = \left(1 - \frac{\mathrm{sum}(w \oplus w')}{\mathrm{len}(w)}\right) \times 100\%$$
where $w$ is the watermark sequence before extraction, $w'$ is the watermark sequence obtained by extraction, $\oplus$ is the bitwise XOR operation, len returns the number of bits in the watermark sequence, and sum adds up the bits of a sequence. A higher ACC value indicates a more robust watermarking method. Two attack experiments are used, subset deletion and subset modification, each conducted at attack intensities of 0, 10%, 20%, 30%, 40%, 50%, 60%, and 70%. Fig.3. shows that in both experiments, as the attack intensity increases, the watermark extraction accuracy of the proposed ACODEW declines more slowly than that of FFADEW. ACODEW is therefore more robust than FFADEW.
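Reading the accuracy criterion as the percentage of watermark bits that survive extraction unchanged, it can be computed with the Python sketch below. The function name and the choice to reject empty or unequal-length sequences are assumptions.

```python
def extraction_accuracy(w, w_extracted):
    """Percentage of bits in w_extracted that match the original w."""
    if not w or len(w) != len(w_extracted):
        raise ValueError("watermark sequences must be non-empty and equal in length")
    # XOR is 1 exactly where the two sequences disagree.
    mismatches = sum(a ^ b for a, b in zip(w, w_extracted))
    return (1 - mismatches / len(w)) * 100.0
```

For instance, one wrong bit out of four gives an accuracy of 75.0, and a perfect extraction gives 100.0.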

Conclusion
Based on the results and discussion presented above, the following conclusions are drawn:
(1) The proposed method combines the ant colony algorithm with the differential expansion algorithm and applies the combination to relational database watermarking to protect database ownership.
(2) The method first groups tuples with a secure hash algorithm. Group-wise embedding makes the watermark embedding more covert.
(3) The ant colony algorithm is used to minimize the data distortion of the database after watermark embedding.
(4) Experimental results show that the proposed ACO-based reversible database watermarking method both increases the watermark capacity and improves robustness against various types of attacks.