Median Filtering Detection Based on Quaternion Convolutional Neural Network

Median filtering is a nonlinear signal processing technique that is also useful for image anti-forensics, so median filtering forensics has attracted growing attention. In this paper, a median filtering forensics method based on a quaternion convolutional neural network (QCNN) is proposed. The images are first preprocessed by computing their median filtering residuals (MFR). The MFR output is then expanded to four channels and used as the input of the QCNN. In the QCNN, a quaternion convolution is designed that mixes the information of different channels better than conventional convolution, and a quaternion pooling layer is designed to evaluate the result of the quaternion convolution. The QCNN thereby combines the three-channel information of a color image and fully extracts forensic features. Experiments show that the proposed method achieves higher accuracy and shorter training time than a conventional convolutional neural network of the same convolution depth.


Introduction
With the rapid development of multimedia technology, video [Pan, Lei, Zhang et al. (2018); Pan, Yi and Chen (2018)] and digital images have become indispensable information carriers in daily life. Images can be obtained in many ways, for example with mobile phones and cameras, and with the popularity of the Internet they can spread rapidly around the world. At the same time, the wide availability of image editing software has made it easy to tamper with the content of digital images, and it is difficult for humans to tell whether an image has been modified with such software. Altered images often cause misunderstandings and give lawbreakers opportunities to exploit them. Digital image forensics is therefore particularly important today. In recent years, research on image manipulation and forensics has mainly focused on JPEG compression [Luo, Huang and Qiu (2010)], geometric transformation [Popescu and Farid (2005)], contrast enhancement [Stamm and Liu (2010)], sharpening [Cao, Zhao, Ni et al. (2011)] and median filtering [Cao, Zhao, Ni et al. (2010); Yuan (2011)]. Median filtering is a nonlinear order-statistic filter commonly used in digital image processing; it suppresses random noise well while preserving edge details. Tamperers can easily use median filtering for anti-forensic operations. For example, the nonlinearity of median filtering can destroy the periodic correlation between adjacent pixels of a resampled image, so that geometric-transformation detectors based on periodicity become ineffective [Kirchner and Bohme (2008)]. Because of the role median filtering plays in image anti-forensics and information hiding, its detection and forensics have attracted increasing attention. In recent years, convolutional neural networks have also achieved great success in computer vision.
A CNN can automatically learn image features and classify them. Neural network research has gone through several booms and downturns over the years. After Yann LeCun successfully applied LeNet-5 to handwritten digit recognition in 1998, deep learning again developed rapidly, producing landmark architectures such as AlexNet [Krizhevsky, Sutskever and Hinton (2012)], GoogleNet [Szegedy, Liu, Jia et al. (2015)], ResNet [He, Zhang, Ren et al. (2016)], DenseNet [Huang, Liu, Van Der Maaten et al. (2017)] and DCGAN [Radford, Metz and Chintala (2015)]. These models have achieved remarkable success in computer vision and related fields. Fang et al. [Fang, Zhang, Sheng et al. (2018)] used DCGAN to improve CNN-based image recognition and obtained a substantial gain. Deep learning is also very effective for object detection in remote sensing images [Cheng, Zhou and Han (2016), Han, Zhang, Cheng et al. (2015)]. Quaternions and quaternion transforms are widely used in color image processing tasks such as image compression, texture classification, face recognition and digital watermarking. Soulard et al. [Soulard and Carré (2010), Soulard and Carré (2011); Carré and Denis (2006)] were the first to use the color quaternion wavelet transform (CQWT) to encode color images. Compared with the discrete wavelet transform (DWT), CQWT achieves a larger compression ratio with smaller distortion. Subsequently, the amplitude and angle coefficients of CQWT were used to classify color images; for example, Wang et al. [Wang, Li, Luo et al. (2018)] used CQWT to identify computer-generated images. Owing to its continuous multi-scale analysis, CQWT achieved higher classification accuracy than DWT and the continuous wavelet transform (CWT) with the same features, which confirms its superior performance. Schauerte et al. [Schauerte and Stiefelhagen (2012)] detected faces using the quaternion discrete cosine transform (QDCT) and a joint focus model. Rizo et al. [Rizo and Ziou (2015)] used quaternion-based local binary patterns (LBP) with illumination invariance for face recognition; this method defines LBP in quaternion form and solves the problem of information redundancy between adjacent points in traditional LBP. Many researchers have combined quaternions with neural networks and obtained good results [Parcollet, Morchid, Bousquet et al. (2016), Parcollet, Zhang, Morchide et al. (2018)]. These works show that quaternions have clear advantages for processing color digital images, so combining quaternions with neural networks for color images is a feasible approach [Shi and Funt (2007)]. However, most existing applications only use quaternions to preprocess color images and then feed the output into a conventional neural network [Rishiyur (2006); Kendall, Grimes and Cipolla (2015)]. In this paper, a median filtering detection method based on a quaternion convolutional neural network is proposed. The contributions of this scheme are summarized as follows:
(1) Quaternions are used to combine the information of the three channels of color images.
(2) The quaternion convolution layer and quaternion pooling layer are designed.
(3) The quaternion convolutional neural network achieves better median filtering detection performance than a CNN of the same depth.
The remainder of this paper is organized as follows. Section 2 briefly reviews related work, Section 3 introduces the proposed scheme, Section 4 presents the experiments and results, and Section 5 concludes the paper.

Median filtering residual
Chen et al. [Chen, Kang, Liu et al. (2015)] used a convolutional neural network to extract features of median filtered images. Unlike conventional CNNs, their network places a filter layer before the first convolution layer, because feeding color images directly into a CNN performs poorly for median filtering forensics. The filter layer computes the median filtering residual (MFR) of an image, and its output is then used as the input of the CNN.
A $w \times w$ sliding window is moved over the input image $x(i,j)$, and the median of each window gives the output $y(i,j)$:
$$y(i,j) = \operatorname{med}_{w \times w}\big(x(i,j)\big) \qquad (1)$$
The MFR is then defined as in Eq. (2):
$$\mathrm{MFR}(i,j) = y(i,j) - x(i,j) \qquad (2)$$
The filter layer is followed by five convolution layers and two fully connected layers. This method does not take into account the relationship between the RGB channels of a color image; it simply convolves the color image directly to extract features, which often loses important structural features.
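The filter layer can be sketched in NumPy as follows. This is a minimal illustration, not Chen et al.'s code: the function name `median_filter_residual` is hypothetical, each color channel is filtered independently, and image borders are handled by edge replication, which the text does not specify.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_residual(img: np.ndarray, w: int = 3) -> np.ndarray:
    """Return y - x, where y is the w x w median-filtered version of x.

    img: (H, W) grayscale or (H, W, C) color array; each channel is filtered
    independently. Borders use edge replication (an assumption).
    """
    x = np.asarray(img, dtype=np.float64)
    squeeze = x.ndim == 2
    if squeeze:
        x = x[..., None]                      # treat grayscale as one channel
    r = w // 2
    padded = np.pad(x, ((r, r), (r, r), (0, 0)), mode="edge")
    # windows has shape (H, W, C, w, w): one w x w neighborhood per pixel and channel
    windows = sliding_window_view(padded, (w, w), axis=(0, 1))
    y = np.median(windows, axis=(-2, -1))     # median-filtered image
    mfr = y - x                               # median filtering residual
    return mfr[..., 0] if squeeze else mfr
```

For an isolated bright pixel on a flat background, the residual is nonzero only at that pixel, because the median filter removes the impulse completely.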

Figure 1: An overall framework of the proposed method. The RGB channels of the color image are filtered by a 3 × 3 median filter, and the original image is subtracted from the filtered result to obtain the MFR. The MFR is then expanded to four dimensions and used as the input of the quaternion neural network.

Quaternion convolution
Zhu et al. [Zhu, Xu, Xu et al. (2018)] point out that the traditional convolutional neural network, which simply adds up the different input channels in its convolution operation, has some inherent deficiencies:
(1) Simple addition ignores the complex relations between channels.
(2) Some important structural information may be lost.
(3) The additive scheme gives the convolution kernel a very large degree of freedom, which easily leads to overfitting.
They therefore proposed a quaternion network with a quaternion convolution layer, in which each element of the quaternion convolution kernel is designed according to Eq. (3):
$$w_{ll'} = s_{ll'}\left(\cos\frac{\theta_{ll'}}{2} + \sin\frac{\theta_{ll'}}{2}\,\mu\right) \qquad (3)$$
where $\theta_{ll'}$ is the angle of rotation, $s_{ll'}$ is the scaling factor and $\mu$ is the rotation axis.
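The effect of a kernel element of the form $w = s(\cos\frac{\theta}{2} + \sin\frac{\theta}{2}\mu)$ on a color pixel encoded as a pure quaternion $q = R\cdot i + G\cdot j + B\cdot k$ can be checked numerically. The sketch below assumes the operation $\frac{1}{s} w q \bar{w}$, which rotates $q$ by $\theta$ about the unit axis $\mu$ and scales it by $s$; the helper names are hypothetical, and the gray axis $\mu \propto (1,1,1)$ in the example is chosen only for illustration.

```python
import numpy as np


def hamilton(p, q):
    """Hamilton product of quaternions given as (r, i, j, k) sequences."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])


def kernel_element(s, theta, mu):
    """w = s * (cos(theta/2) + sin(theta/2) * mu), with mu normalized to a unit axis."""
    mu = np.asarray(mu, dtype=float)
    mu = mu / np.linalg.norm(mu)
    return s * np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * mu))


def apply_kernel(w, s, q):
    """(1/s) * w q conj(w): rotate pure quaternion q by theta about mu, scale by s."""
    w_conj = w * np.array([1.0, -1.0, -1.0, -1.0])
    return hamilton(hamilton(w, q), w_conj) / s
```

With $s = 2$ and $\theta = 2\pi/3$ about the gray axis, a pure red pixel $(0, 1, 0, 0)$ is mapped to a green pixel $(0, 0, 2, 0)$, while a gray pixel, which lies on the axis, is only scaled.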
A traditional CNN only supports scaling transformations, whereas Zhu et al. [Zhu, Xu, Xu et al. (2018)] combine quaternions with neural networks to also support rotations in color space, which provides a more reasonable representation of color. However, the quaternion convolution in [Zhu, Xu, Xu et al. (2018)] is highly complex, which hinders its practical use.

Proposed model
The overall structure is shown in Fig. 1. In this section, quaternions are introduced first, and then the details of the proposed model are explained.

Definition of quaternion
Quaternions were introduced by Hamilton in 1843. A quaternion (hypercomplex number) $q$ is represented as
$$q = a + b\cdot i + c\cdot j + d\cdot k$$
where $a, b, c, d \in \mathbb{R}$ and $i, j, k$ are imaginary units defined by
$$i^2 = j^2 = k^2 = ijk = -1, \quad ij = -ji = k, \quad jk = -kj = i, \quad ki = -ik = j$$
Quaternions extend real and complex numbers and are generally regarded as a combination of a real part and an imaginary part. $a$ is the real part of the quaternion, also written $q_r$; $b$, $c$ and $d$ are the coefficients of the imaginary part, which can also be written $q_i\cdot i + q_j\cdot j + q_k\cdot k$. If $q_r = 0$, $q$ is called a pure quaternion. A quaternion can also be expressed in the complex form $q = A + B\cdot j$, where $A = q_r + q_i\cdot i$ and $B = q_j + q_k\cdot i$ are complex numbers.
For two quaternions $q_1 = q_{1r} + q_{1i}\cdot i + q_{1j}\cdot j + q_{1k}\cdot k$ and $q_2 = q_{2r} + q_{2i}\cdot i + q_{2j}\cdot j + q_{2k}\cdot k$, addition, subtraction and multiplication are defined as follows:
$$q_1 \pm q_2 = (q_{1r} \pm q_{2r}) + (q_{1i} \pm q_{2i})\,i + (q_{1j} \pm q_{2j})\,j + (q_{1k} \pm q_{2k})\,k$$
$$\begin{aligned} q_1 q_2 = {} & (q_{1r}q_{2r} - q_{1i}q_{2i} - q_{1j}q_{2j} - q_{1k}q_{2k}) \\ & + (q_{1r}q_{2i} + q_{1i}q_{2r} + q_{1j}q_{2k} - q_{1k}q_{2j})\,i \\ & + (q_{1r}q_{2j} - q_{1i}q_{2k} + q_{1j}q_{2r} + q_{1k}q_{2i})\,j \\ & + (q_{1r}q_{2k} + q_{1i}q_{2j} - q_{1j}q_{2i} + q_{1k}q_{2r})\,k \end{aligned}$$
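The multiplication rule and the complex form $q = A + B\cdot j$ can be cross-checked with a short NumPy sketch (function names hypothetical). Quaternions are stored as arrays $(q_r, q_i, q_j, q_k)$; the second function computes the same product from the complex pairs $(A, B)$ using the identity $j z = \bar{z} j$ for complex $z$, which gives $(A_1 + B_1 j)(A_2 + B_2 j) = (A_1A_2 - B_1\bar{B}_2) + (A_1B_2 + B_1\bar{A}_2)\,j$.

```python
import numpy as np


def qmul(p, q):
    """Hamilton product of quaternions stored as (r, i, j, k)."""
    pr, pi, pj, pk = p
    qr, qi, qj, qk = q
    return np.array([
        pr * qr - pi * qi - pj * qj - pk * qk,
        pr * qi + pi * qr + pj * qk - pk * qj,
        pr * qj - pi * qk + pj * qr + pk * qi,
        pr * qk + pi * qj - pj * qi + pk * qr,
    ])


def to_complex_pair(q):
    """q = A + B*j with A = q_r + q_i*i and B = q_j + q_k*i (i acts as the complex unit)."""
    r, i, j, k = q
    return complex(r, i), complex(j, k)


def qmul_complex(p, q):
    """Same product via complex pairs, using j*z = conj(z)*j for complex z."""
    A1, B1 = to_complex_pair(p)
    A2, B2 = to_complex_pair(q)
    A = A1 * A2 - B1 * B2.conjugate()
    B = A1 * B2 + B1 * A2.conjugate()
    return np.array([A.real, A.imag, B.real, B.imag])
```

Both functions agree on arbitrary inputs, and the product is not commutative: $ij = k$ but $ji = -k$.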

Proposed scheme

Preprocess
Median filtering is very effective at removing speckle noise and salt-and-pepper noise. However, the noise in real images is usually relatively weak. The features that a convolutional neural network extracts from images are mainly texture, color and similar content, whereas the high-frequency features that reflect the difference between an original image and its median filtered version are relatively few. Moreover, the large number of texture and color features may even interfere with the discrimination. Previous deep-learning studies on median filtering detection therefore found that feeding the original image directly into the network gave unsatisfactory results, so the image is preprocessed here. The filter layer of Chen et al. [Chen, Kang, Liu et al. (2015)] is used: a 3 × 3 median filter window, as shown in Fig. 1, filters the original image $x(i,j)$, and the original image is subtracted from the output $y(i,j)$ to extract the fingerprint of the median filtering operation. The result is the median filtering residual (MFR). This processing emphasizes the high-frequency components and thereby improves classification accuracy.
Quaternions are a generalization of complex numbers. In existing research, a color image is commonly represented with quaternions by encoding its RGB channels as the imaginary part of a quaternion, and the same representation is used in this paper.
In this way, the pixels can be processed as whole vectors in a multidimensional space, so that the correlation between colors is taken into account. Many classical tools previously used for grayscale image processing, such as the Fourier transform, the wavelet transform and neural networks, have been successfully extended to color image processing by means of the quaternion representation. Unlike those methods, however, the quaternion convolution layer designed in this study requires four input channels while a color image has only three. To match the two, an extra channel is added to the color image to form a four-channel image, and this added dimension is filled with zeros.
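A minimal sketch of the four-channel expansion, assuming the zero channel is placed first as the real part so that each pixel becomes the pure quaternion $0 + R\cdot i + G\cdot j + B\cdot k$; the function name and channel order are illustrative assumptions.

```python
import numpy as np


def to_quaternion_channels(rgb: np.ndarray) -> np.ndarray:
    """Map an (H, W, 3) array to an (H, W, 4) pure-quaternion array.

    Channel 0 is the real part, filled with zeros; channels 1-3 carry R, G, B
    as the i, j, k imaginary parts (channel ordering is an assumption).
    """
    rgb = np.asarray(rgb, dtype=np.float32)
    real = np.zeros(rgb.shape[:-1] + (1,), dtype=rgb.dtype)
    return np.concatenate([real, rgb], axis=-1)
```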

Quaternion convolution layer
Local pixels of an image are closely correlated, while distant pixels are only weakly correlated. Therefore, each neuron does not need to perceive the whole image, only a local region; global information can then be obtained by integrating local information at higher layers. This is called the local receptive field, and it is why the convolution layer is so important in a neural network. The key to building a quaternion neural network is thus the design of the quaternion convolution layer. When two quaternions are multiplied, each component of one is combined with every component of the other. For example, consider a quaternion convolution kernel matrix $F = a + b\cdot i + c\cdot j + d\cdot k$ and a quaternion matrix $M = w + x\cdot i + y\cdot j + z\cdot k$. Convolving $M$ with the quaternion kernel $F$ can be expressed by the following formula, where $*$ denotes real-valued convolution:

$$F \otimes M = (a*w - b*x - c*y - d*z) + (a*x + b*w + c*z - d*y)\cdot i + (a*y - b*z + c*w + d*x)\cdot j + (a*z + b*y - c*x + d*w)\cdot k$$
Quaternion convolution effectively merges the three channels of a color image. The quaternion convolution process is shown in Fig. 2, and Fig. 2(a) describes the overall process. First, the input feature map (channels R, I, J, K) is convolved four times, with the same input and four different convolution kernels, all randomly initialized. This differs from convolution in a conventional CNN, where each kernel has as many channels as the input feature map: for an $F \times F$ kernel and an input with $d$ channels, the kernel has size $F \times F \times d$, and convolving it with the input yields one channel of the output feature map. In this study, instead, every channel of the input feature map is convolved with the same kernel, so the output has the same number of channels as the input; these output channels are concatenated to form a new feature map. This process, shown in Fig. 2(b), is called corresponding convolution; in labels such as MK, M denotes a feature map and K the convolution kernel. Each channel of a feature map obtained by corresponding convolution carries the characteristics of the corresponding input channel. Four corresponding convolutions yield four feature maps (MK, MJ, MI, MR). The channels at different positions of these feature maps are then combined according to the formula shown in Fig. 2(c), which gives the final output of the quaternion convolution layer. Each channel of this output contains feature information from all channels of the input feature map.
In this way, the color information is fused and features are extracted more completely.
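The procedure above can be sketched for a single quaternion filter in NumPy. Under the assumptions stated here (one input and one output quaternion feature map, "valid" borders, no bias, hypothetical function names), the 16 channel-wise convolutions $f_p * m_q$ play the role of the corresponding convolutions, and they are combined with the sign pattern of the quaternion product $F \otimes M$ given earlier.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def corr2d_valid(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation (what CNN libraries call convolution)."""
    windows = sliding_window_view(x, k.shape)          # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, k)


def quaternion_conv2d(m: np.ndarray, f: np.ndarray) -> np.ndarray:
    """One quaternion filter applied to one quaternion feature map.

    m: (4, H, W) input channels (w, x, y, z) = (R, I, J, K).
    f: (4, kh, kw) kernel components (a, b, c, d).
    """
    # Corresponding convolution: every kernel component is applied to every
    # input channel, giving c[p, q] = f[p] * m[q] (16 maps in total).
    c = np.array([[corr2d_valid(m[q], f[p]) for q in range(4)] for p in range(4)])
    # Combine the maps following the quaternion product F x M.
    r = c[0, 0] - c[1, 1] - c[2, 2] - c[3, 3]
    i = c[0, 1] + c[1, 0] + c[2, 3] - c[3, 2]
    j = c[0, 2] - c[1, 3] + c[2, 0] + c[3, 1]
    k = c[0, 3] + c[1, 2] - c[2, 1] + c[3, 0]
    return np.stack([r, i, j, k])
```

With a $1 \times 1$ kernel the layer reduces to a pixel-wise quaternion product, which gives a quick correctness check: a kernel equal to $i$ applied to a map whose pixels are all $j$ yields $k$ everywhere.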

Quaternion pooling layer
The pooling layer greatly improves convolutional neural networks. It introduces translation invariance and focuses on whether a salient feature is present rather than on its exact location. Its most direct effect is to reduce the input size of the next layer, and with it the amount of computation and the number of parameters; it also helps prevent overfitting to some extent. The pooling layer is therefore an important part of a convolutional neural network, and a quaternion pooling layer is specially designed here for the quaternion convolutional neural network. Quaternion pooling operates on the output of the quaternion convolution layer: the amplitude of each quaternion in the convolution result is computed, and the maximum is taken within each pooling window. The amplitude is computed as
$$|q_{ij}| = \sqrt{a_{ij}^2 + b_{ij}^2 + c_{ij}^2 + d_{ij}^2}$$
where $ij$ is the position of the element within a channel, and $a$, $b$, $c$ and $d$ are the values of the four channels at that position.
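One plausible reading of the quaternion pooling layer is sketched below: within each non-overlapping $p \times p$ window, the quaternion with the largest amplitude is kept with all four components, so the next quaternion convolution still receives four channels. Whether the original layer outputs the whole quaternion or only its amplitude is not stated, so this choice, the window size and the function name are assumptions.

```python
import numpy as np


def quaternion_max_pool(q: np.ndarray, p: int = 2) -> np.ndarray:
    """Amplitude-based max pooling over non-overlapping p x p windows.

    q: (4, H, W) quaternion feature map, with H and W divisible by p.
    Returns (4, H/p, W/p): in each window, the quaternion with the largest
    amplitude sqrt(a^2 + b^2 + c^2 + d^2) is kept with all four components.
    """
    _, H, W = q.shape
    blocks = (q.reshape(4, H // p, p, W // p, p)
                .transpose(0, 1, 3, 2, 4)
                .reshape(4, H // p, W // p, p * p))
    amp = np.sqrt((blocks ** 2).sum(axis=0))            # (H/p, W/p, p*p)
    idx = amp.argmax(axis=-1)                           # winner in each window
    picked = np.take_along_axis(blocks, idx[None, :, :, None], axis=-1)
    return picked[..., 0]
```

Unlike channel-wise max pooling, which can assemble components from different pixels into a quaternion that never appeared in the input, amplitude-based selection keeps each pooled quaternion intact.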

Other layers
The remaining layers are those used in conventional convolutional neural networks. For the activation layer, sigmoid, tanh and ReLU are the most commonly used functions. The gradients of sigmoid and tanh are very flat in their saturated regions and close to zero, which easily causes vanishing gradients and slows convergence, whereas ReLU is easier to learn and optimize. Because ReLU is piecewise linear, both its forward pass and its derivative in the backward pass are piecewise linear and cheap to compute. ReLU is therefore selected as the activation function of this structure. As the network deepens or training proceeds, the distribution of the inputs to each layer's nonlinear transformation gradually shifts, which leads to vanishing gradients in the lower layers during backpropagation. Batch normalization counters this by pulling the input distribution of every neuron in each layer back to a standard normal distribution with zero mean and unit variance (followed by a learned scale and shift), so that the gradients become larger and the vanishing gradient problem is avoided. Larger gradients also mean faster convergence, so training can be greatly accelerated. A BatchNorm layer is therefore added to this structure, using the batch normalization implementation provided by TensorFlow.
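For reference, the computation performed by a batch normalization layer in training mode, together with ReLU, can be sketched in NumPy. This is an illustrative sketch rather than TensorFlow's implementation: it normalizes each channel with the statistics of the current mini-batch and omits the moving averages used at inference time; the function names are hypothetical.

```python
import numpy as np


def relu(x):
    """Rectified linear unit: max(x, 0) element-wise."""
    return np.maximum(x, 0.0)


def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for x of shape (N, C) or (N, H, W, C).

    Each channel is normalized with the mean and variance of the current
    mini-batch, then scaled by gamma and shifted by beta (both learned).
    """
    axes = tuple(range(x.ndim - 1))               # every axis except channels
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

With `gamma = 1` and `beta = 0`, each output channel has zero mean and (up to `eps`) unit variance over the batch, whatever the scale of the input.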

Experiment
This section presents experiments that demonstrate the effectiveness of median filtering forensics with a quaternion convolutional neural network. We also use a conventional convolutional neural network for comparison to show the improvement of our scheme.

Introduction of datasets
To run the experiments, we first built a dataset from the CASIA TIDE v2.0 and UCID datasets. CASIA TIDE v2.0 contains 12,614 color images with sizes ranging from 240 × 160 to 900 × 600 pixels in three formats: JPEG, BMP and TIFF. The UCID dataset contains 886 color images of size 384 × 512 or 512 × 384. Next, we median filtered each original image with a 3 × 3 sliding window to generate a set of tampered images, obtaining 13,500 tampered images and 27,000 images in total. We then split them into training and testing sets at a ratio of 2:1, i.e., 18,000 images for training and the remaining 9,000 for testing. In addition, all images are cut to a size of 256 × 256.
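The dataset bookkeeping can be checked with a short sketch: 12,614 + 886 = 13,500 originals, one median filtered copy each gives 27,000 images, and a 2:1 split gives 18,000 training and 9,000 testing images. The random, image-level shuffling and the name `build_split` are assumptions; the text does not say how the split was drawn, for example whether an original and its filtered copy are kept on the same side.

```python
import numpy as np

CASIA_TIDE_V2 = 12_614   # color images (from the text)
UCID = 886               # color images (from the text)


def build_split(seed: int = 0):
    """Return shuffled (train_idx, test_idx, labels) over originals + filtered copies.

    labels[n] is the label of image n: 0 = original, 1 = median filtered.
    """
    n_original = CASIA_TIDE_V2 + UCID            # 13,500
    n_total = 2 * n_original                     # one 3x3-filtered copy per image
    labels = np.repeat([0, 1], n_original)
    idx = np.random.default_rng(seed).permutation(n_total)
    n_train = n_total * 2 // 3                   # 2:1 train/test ratio
    return idx[:n_train], idx[n_train:], labels
```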

Experimental environment and parameter settings
Two experiments are conducted on the dataset, one with a conventional convolutional neural network and one with the proposed quaternion convolutional neural network. Both networks consist of three convolution layers, three pooling layers and three fully connected layers, followed by a softmax layer. Apart from the convolution and pooling layers, the quaternion network has the same structure as the conventional network, including the Flatten operation before the fully connected layers and the softmax layer. ReLU is used as the activation function in both networks to introduce nonlinearity, and both use cross-entropy as the loss function. Median filtering detection is essentially a classification task that separates tampered images from original ones, so the models are evaluated by the accuracy with which test images are assigned their correct labels. Both experiments are run on an NVIDIA GeForce GTX 1080Ti GPU.

Figure 3: Testing accuracy versus number of convolution layers
When training the proposed model, stochastic gradient descent (SGD) is used with momentum = 0.9, decay = 5e-4 and a learning rate e = 1e-6 that is reduced by a factor f = 0.5 every 5 epochs. Both models are trained for 50 epochs.
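The step schedule can be written as $\eta_t = 10^{-6} \cdot 0.5^{\lfloor t/5 \rfloor}$ for epoch $t$ (counted from 0). A minimal sketch follows; the function name is hypothetical, and the `decay = 5e-4` setting is left out because the text does not say whether it is weight decay or a per-iteration learning-rate decay.

```python
def step_lr(epoch: int, base_lr: float = 1e-6, factor: float = 0.5, step: int = 5) -> float:
    """Learning rate for a 0-indexed epoch: base_lr multiplied by factor every `step` epochs."""
    return base_lr * factor ** (epoch // step)


schedule = [step_lr(e) for e in range(50)]   # one value per training epoch
```

Over 50 epochs the rate is halved nine times, ending at $10^{-6}/2^{9} \approx 1.95 \times 10^{-9}$.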

Performance of the proposed scheme
To obtain a better model for median filtering forensic detection, a self-comparison experiment was designed. Our initial architecture used two quaternion convolution layers and two quaternion pooling layers to learn the associations across quaternion features, followed by three fully connected layers for classification. Because the appropriate number of quaternion convolution layers cannot be determined in advance, different depths were tried in a set of experiments. The performance at each depth is shown in Fig. 3. The overall detection rate improves as quaternion convolution layers are added, up to the best depth. The best performance is achieved with the architecture described above, i.e., three quaternion convolution layers, each followed by a quaternion pooling layer, and three fully connected layers. With four quaternion convolution layers, the detection rate drops by 0.94% compared with the best result, which is attributed to the four-layer network overfitting more easily than the three-layer one. Hence, a deeper quaternion convolutional network does not necessarily give better detection results in color image manipulation forensics.

Comparison with other methods
The experiments were conducted on the datasets described above. The test accuracies of the two comparative experiments are plotted together in Fig. 4, which shows the detection accuracy versus the number of training epochs for the proposed QCNN and the conventional CNN. The QCNN outperforms the conventional CNN over most of the training period, and the best detection rates of the two networks are 95.10% and 93.46%, respectively, as listed in Tab. 1. The conventional CNN converges very quickly, but the proposed scheme performs better after about 15 epochs. The conventional approach reaches a lower detection rate because it does not fully exploit the correlation among the three color channels, so the trained network is suboptimal. In contrast, the proposed QCNN can capture color-channel features that real-valued convolution may miss, and because it is better suited to color information, it extracts richer features than the conventional CNN, leading to higher detection accuracy.

Conclusion
In this paper, a quaternion-based neural network is proposed for median filtering detection. A color image has only three channels, whereas a quaternion has four components, so matching the quaternion to the dimensions of a color image is important. After reviewing the relevant literature, we adopted the method of extending the color image by one dimension, which gave better results than the direct input method. The quaternion convolution layer effectively combines the convolutional neural network with quaternions, and, given the importance of pooling to neural networks, a quaternion pooling layer was specially designed. Many problems remain unsolved, such as how best to fill the fourth dimension of a color image; we will continue to explore these issues in future work.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.