A Survey on Digital Image Copy-Move Forgery Localization Using Passive Techniques

Abstract: Digital images can be tampered with easily using simple image editing software tools. Therefore, forensic investigation of the authenticity of digital image content is increasingly important. Copy-move is one of the most common types of image forgery. This paper therefore presents an overview of traditional and recent copy-move forgery localization methods that use passive techniques. These methods are classified into three types: block-based methods, keypoint-based methods, and deep learning-based methods. In addition, the strengths and weaknesses of these methods are compared and analyzed in terms of robustness and computational cost. Finally, further research directions are discussed.


Introduction
With the rapid development of low-cost and sophisticated image processing software tools, digital images can be tampered with easily and without obvious visual traces. If digital images are used as supporting evidence in court, it is vitally important to guarantee their source, integrity, and authenticity. Moreover, the illegal use of forged images jeopardizes social security, fairness, and justice. Verifying digital images with forensic techniques is therefore meaningful. Image authentication solutions are broadly categorized into active techniques and passive techniques. Active forgery localization techniques, such as digital watermarking and digital signatures, require additional information to be embedded in advance, which degrades image quality. Moreover, active techniques are ineffective when no information has been pre-embedded (e.g., for digital images on the Internet). In contrast, passive techniques exploit the inherent properties of images and analyze their features to find traces of tampering, so image quality is not affected. Copy-move is one of the most convenient and common types of image tampering, in which one or more areas of an image are copied and pasted onto another region within the same image [Warif, Wahab, Idris et al. (2016)]. Two examples are shown in Fig. 1. This paper surveys copy-move forgery localization using passive techniques. Most copy-move forgery localization methods follow the common processing pipeline presented in Christlein et al. [Christlein, Riess, Jordan et al. (2012)], as shown in Fig. 2. The pipeline comprises four stages: 1) Pre-processing. This stage makes image feature extraction more effective. 2) Feature extraction. In this stage, feature information is extracted from neighboring pixels or from keypoints. 3) Feature matching. Feature matching algorithms search for similar features among the extracted features. 4) Post-processing. Spurious matched pixels are filtered out and scattered pixels are merged into adjacent larger areas in this stage.
In this paper, both traditional methods and recent deep learning-based methods are reviewed. For the traditional methods, block-based and keypoint-based methods are discussed in turn. The remainder of this paper is organized as follows. Section 2 describes block-based methods. Section 3 reviews keypoint-based methods. Section 4 covers the latest deep learning-based methods. Future research directions are discussed in Section 5. Finally, Section 6 concludes the paper.

Block-based methods
In the pre-processing stage of block-based methods, an image is divided into overlapping or non-overlapping blocks. Suitable features of these blocks are extracted in the feature extraction stage. Then, in the feature matching stage, these features are sorted and arranged in suitable data structures so that similar features can be found. Finally, the forged regions are localized after post-processing operations. According to the type of feature used, block-based methods can be grouped into four categories: frequency transform-based methods, texture-based methods, moment invariant-based methods, and dimension reduction-based methods.
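The block division step described above can be sketched as follows. This is a minimal illustration and is not taken from any of the surveyed papers: the function name `overlapping_blocks`, the `step` parameter, and the toy image are assumptions made for the example, and the image is represented as a plain list of pixel rows.

```python
def overlapping_blocks(image, block_size, step=1):
    """Yield (row, col, block) for every block_size x block_size window.

    step=1 gives fully overlapping blocks; step=block_size gives
    non-overlapping blocks (hypothetical parameter for illustration).
    """
    height, width = len(image), len(image[0])
    for r in range(0, height - block_size + 1, step):
        for c in range(0, width - block_size + 1, step):
            block = [row[c:c + block_size] for row in image[r:r + block_size]]
            yield r, c, block


# Toy 6 x 8 grayscale image with arbitrary pixel values.
image = [[(3 * r + 5 * c) % 256 for c in range(8)] for r in range(6)]
blocks = list(overlapping_blocks(image, block_size=4))
tiles = list(overlapping_blocks(image, block_size=2, step=2))
```

An H x W image yields (H - b + 1)(W - b + 1) overlapping b x b blocks, which is why the number of features to match grows quickly with image size.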

Frequency transform-based methods
Frequency transforms are the most popular features used in block-based methods. Commonly used frequency transforms include the Discrete Cosine Transform (DCT), the Discrete Wavelet Transform (DWT), the Dyadic Wavelet Transform (DyWT), and the Fourier Transform (FT).
1) DCT: Among the frequency transforms, DCT is the most widely used in the copy-move forgery localization field. DCT-based algorithms are robust against noise and JPEG compression attacks. Cao et al. [Cao, Gao, Fan et al. (2012)] first divided the image into blocks. They then extracted only the low-frequency DCT coefficients of each block to reduce the computational load. Finally, lexicographical sorting was used for feature matching. This method is robust against noise and JPEG compression attacks.
2) DWT: The discrete wavelet transform is broadly applied in image processing and image analysis because of its time-frequency localization property. Ji et al. [Ji, Yu, Jin et al. (2017)] first used the DWT algorithm to decompose the image and then used the sub-bands of each layer as an approximation of the image content to reduce the computational load. Finally, the Oriented FAST and Rotated BRIEF (ORB) algorithm was used to extract block features from the sub-bands, and the Hamming distance was used to match similar features. The experimental results showed that this method has a certain robustness against translation and rotation with added noise. Compared with DWT, DyWT is translation invariant and more suitable for data analysis. Muhammad et al. [Muhammad, Hussain and Bebis (2012)] developed a DyWT-based method to extract features efficiently. This method first decomposed the image into two sub-bands, a low-frequency component and a high-frequency component, and then matched similar blocks in each sub-band by calculating the Euclidean distance between overlapping blocks. This method not only keeps the advantages of DWT-based methods but also significantly improves the localization accuracy.
3) FT: DWT- and FFT-based algorithms cannot localize tampered regions that have undergone geometric transformation attacks, so several improved methods were proposed to overcome this difficulty. Dixit et al. [Dixit and Naskar (2018)] first applied the Fourier-Mellin Transform (FMT) algorithm to extract the feature of each block. In this method, the low-frequency components were extracted with the DWT algorithm, and the low-frequency sub-bands were then divided into blocks. Finally, they calculated the Euclidean distance between blocks using the K-means algorithm. The experimental results showed that this method locates forged regions accurately even if these regions are rotated or scaled with additional operations, but it is unable to localize incompletely scaled regions and has high computational complexity. Zhong et al. [Zhong and Gan (2016)] improved the FMT algorithm and presented a method that combines the Analytical Fourier-Mellin Transform (AFMT) with the properties of image moments. This method constructed a circular auxiliary template to extract the moment invariant feature of each block and analyzed these invariant features. It is robust against geometric transformation and Gaussian noise attacks, especially geometric distortion. However, it underperforms under affine transformation and compact tampering. Yang et al. [Yang, Bai, Yin et al. (2015)] proposed a method based on the Fractional Fourier Transform (FrFT). First, the low-frequency components of the one-level DWT coefficient matrix of the image were retained. Second, the FrFT algorithm was used to extract the feature of each block. Finally, lexicographical sorting was adopted to search for correlated features. The experimental results demonstrated that this method has a certain robustness to geometric transformation. However, it is not suitable for locating small forged regions.
In short, DCT- and DWT-based methods are robust to many additional operations, e.g., JPEG compression, blur, and noise attacks. However, these methods cannot handle geometric transformation operations because the DCT and DWT are not invariant to geometric transformation. Some improved methods based on the FMT algorithm perform well in handling geometric transformation operations.
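The combination of low-frequency DCT features and lexicographical sorting described above can be illustrated with a small, self-contained sketch. It is a generic illustration under stated assumptions, not the algorithm of Cao et al. or of any other surveyed paper: the names `dct_low_freq` and `dominant_shift`, the block size, the 2 x 2 low-frequency window, the rounding precision, and the `min_shift` filter are all hypothetical choices, and the test image is synthetic random data with one exactly copied region.

```python
import math
import random
from collections import Counter


def dct_low_freq(block, keep=2):
    """Return the keep x keep low-frequency 2-D DCT-II coefficients of a square block."""
    n = len(block)
    coeffs = []
    for u in range(keep):
        for v in range(keep):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            scale_u = math.sqrt((1 if u == 0 else 2) / n)
            scale_v = math.sqrt((1 if v == 0 else 2) / n)
            # Rounding acts as a crude quantization of the feature.
            coeffs.append(round(scale_u * scale_v * total, 3))
    return tuple(coeffs)


def dominant_shift(image, block_size=4, min_shift=4):
    """Sort block features lexicographically and vote on shift vectors of equal neighbors."""
    height, width = len(image), len(image[0])
    rows = []
    for r in range(height - block_size + 1):
        for c in range(width - block_size + 1):
            block = [row[c:c + block_size] for row in image[r:r + block_size]]
            rows.append((dct_low_freq(block), r, c))
    rows.sort()  # lexicographical sort: similar features become adjacent
    shifts = Counter()
    for (f1, r1, c1), (f2, r2, c2) in zip(rows, rows[1:]):
        dr, dc = r2 - r1, c2 - c1
        if f1 == f2 and abs(dr) + abs(dc) >= min_shift:
            shifts[(dr, dc)] += 1
    return shifts.most_common(1)[0] if shifts else None


# Synthetic example: copy an 8 x 8 region from (2, 2) to (14, 16).
rng = random.Random(0)
image = [[rng.randrange(256) for _ in range(28)] for _ in range(24)]
for r in range(8):
    for c in range(8):
        image[14 + r][16 + c] = image[2 + r][2 + c]
result = dominant_shift(image)  # (shift vector, number of supporting block pairs)
```

Voting on shift vectors is one simple post-processing choice: isolated accidental matches rarely agree on a shift, while blocks inside a copied region all share the same one. Because the features here are compared for exact equality, this sketch only finds unmodified copies; practical methods compare features with a distance threshold.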

Texture-based methods
Texture is present in natural scenes that contain grass, clouds, trees, and ground. It can be represented by regular image attributes such as smoothness and roughness. When an image is tampered with, these properties are preserved and transferred to other areas. Therefore, texture can be used as a feature in the similarity matching stage. Ardizzone et al. [Ardizzone, Bruno and Mazzola (2010)] proposed an edge histogram-based method to extract the feature of each block. The tampered areas can be located accurately in a short computing time even if these areas are small. Lee et al. [Lee, Chang and Chen (2015)] presented a directional gradient histogram-based method to extract features. An average gray value-based method was introduced by Lynch et al. [Lynch, Shih and Liao (2013)]. The methods proposed in Ardizzone et al. [Ardizzone, Bruno and Mazzola (2010); Lee, Chang and Chen (2015); Lynch, Shih and Liao (2013)] are robust against JPEG compression, blur, and altered contrast. Despite these benefits, the methods are limited when geometric transformation operations are applied to the duplicated regions. Color information is invariant to JPEG compression, translation, and rotation. Bravo-Solorio et al. [Bravo-Solorio and Nandi (2011)] proposed a method to localize tampered regions that have undergone geometric transformation. This method mapped the color-dependent feature of image pixels. However, it underperforms under JPEG compression and Gaussian blur, and its performance on large areas with little color information needs to be improved. Novozámský et al. [Novozámský and Šorel (2018)] developed a novel method to improve the localization performance, especially for small regions containing texture. This method combined the average gray value with Tamura texture to extract and analyze texture features. In summary, texture-based methods are robust against contrast changes, Gaussian blur, and JPEG compression attacks, but they do not perform well under geometric transformation.

Moment invariant-based methods
Additional operations such as translation, rotation, and scaling are ordinarily applied after an image is tampered with. Image moments are a set of features that are invariant to these operations. They were initially introduced by Hu [Hu (1962)] for the pattern recognition community. Mahdian et al. [Mahdian and Saic (2007)] were the first to propose a moment-based method in the copy-move forgery localization field, using blur moments to extract 24 invariant features from each block. To solve various problems of blur moments, such as data redundancy, improved methods based on Krawtchouk moments, Zernike moments, and exponential moments have been proposed. Kushol et al. [Kushol, Salekin, Kabir et al. (2016)] implemented a method based on seven Krawtchouk moments and three color features of image segmentation. This method is robust against geometric transformation but is not suitable for locating regions that have undergone mirror reflection. Ryu et al. [Ryu, Lee and Lee (2010)] proposed a method based on Zernike moments for feature matching. In this method, the Principal Component Analysis (PCA) algorithm is used to reduce the computational load and the k-Nearest Neighbor (kNN) algorithm is employed to filter spurious matched pixels. This method can accurately locate tampered areas that have been translated and rotated but fails to locate forged regions that have been scaled. Hu et al. [Hu, Zhang, Shao et al. (2014)] showed that Exponential-Fourier moments (EFM) can improve the performance of feature matching. The experimental results showed that this method is robust against noise and smooth distortion. In comparison with the Zernike moments-based method [Ryu, Lee and Lee (2010)], the radial function of EFM has lower computational complexity and a more uniform distribution of zeros, which lets EFM represent image features better. Zhong et al. [Zhong and Xu (2013)] suggested combining exponential moments with histogram moments. This method has a shorter processing time and increased robustness against geometric transformation, altered brightness, and contrast changes. However, its ability to locate small forged areas still needs improvement. Some block-based methods use circular blocks rather than the conventional square blocks in the block dividing stage. Li et al. [Li, Zhao, Liao et al. (2012)] proposed a Polar Harmonic Transform (PHT)-based method to analyze features in circular blocks. This method is robust against rotation, noise, and JPEG compression. Compared with Zernike moments-based methods, the PCT-based methods not only are more robust against noise and JPEG compression but also have lower computational complexity. Wo et al. [Wo, Yang, Han et al. (2017)] presented an improved method based on the Polar Complex Exponential Transform (PCET). They extracted multi-scale features of the image to retain sufficient detail. This method is robust against translation, scaling, and JPEG compression, but it has high time complexity. In short, moment invariant-based methods are robust against geometric transformation. However, these methods have higher computational complexity than the other methods.
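The invariance property that motivates moment-based features can be checked numerically. The sketch below computes the first two of Hu's classical moment invariants from normalized central moments and compares them for a block, its 90-degree rotation, and a translated copy. It illustrates only the general idea of moment invariants, not the blur, Krawtchouk, Zernike, or exponential moments used in the surveyed methods; the helper names (`hu_phi1_phi2`, `rotate90`, `embed`) and the toy block are assumptions made for the example.

```python
import math


def hu_phi1_phi2(img):
    """Return Hu's first two moment invariants of a 2-D intensity array."""
    m00 = sum(sum(row) for row in img)
    xbar = sum(x * v for x, row in enumerate(img) for v in row) / m00
    ybar = sum(y * v for row in img for y, v in enumerate(row)) / m00

    def eta(p, q):
        # Central moment mu_pq, normalized for scale by m00^(1 + (p + q) / 2).
        mu = sum(((x - xbar) ** p) * ((y - ybar) ** q) * v
                 for x, row in enumerate(img) for y, v in enumerate(row))
        return mu / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2


def rotate90(img):
    """Rotate a 2-D array by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def embed(img, top, left, size):
    """Place img on a zero canvas of size x size to simulate a translation."""
    canvas = [[0] * size for _ in range(size)]
    for r, row in enumerate(img):
        for c, v in enumerate(row):
            canvas[top + r][left + c] = v
    return canvas


block = [[1, 2, 3, 0],
         [4, 9, 2, 1],
         [0, 5, 7, 3],
         [2, 1, 6, 8]]
original = hu_phi1_phi2(block)
rotated = hu_phi1_phi2(rotate90(block))
shifted = hu_phi1_phi2(embed(block, top=5, left=3, size=10))
```

Up to floating-point rounding, the three pairs of values agree, which is why a copied block keeps (nearly) the same moment features after being moved or rotated.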

Dimension reduction-based methods
Dimension reduction-based methods reduce the dimensionality of the image features and thereby improve performance. These algorithms rarely affect the localization accuracy, and mainly include PCA, Singular Value Decomposition (SVD), and Local Linear Embedding (LLE). 1) PCA: Popescu et al. [Popescu and Farid (2005)] extracted the image feature based on the PCA algorithm. This method is robust against noise and JPEG compression. Bashar et al. [Bashar, Noda, Ohnishi et al. (2010)] used DWT and Kernel-PCA (KPCA) to extract two robust features. The DWT-based method performs better than PCA and KPCA in the absence of noise and JPEG compression, while the KPCA-based approach is robust against noise and JPEG compression. 2) SVD: Zhang et al. [Zhang and Wang (2009)] proposed a method based on SVD and the kd-tree algorithm to extract features and match blocks. This method performs well under scaling and rotation but is not robust against JPEG compression. Li et al. [Li, Wu, Tu et al. (2007)] used the DWT and SVD algorithms to extract features. This method applies the low-frequency wavelet components of the SVD coefficients to match blocks. It can localize the duplicated regions accurately even when the image has been highly compressed or edge processed. 3) LLE: Zhao [Zhao (2010)] used the LLE algorithm to reduce high-dimensional features and discover the relations between high-dimensional and low-dimensional features. This method can map high-dimensional data to low-dimensional data without changing the relative location. Compared with the PCA-based method [Popescu and Farid (2005)], the LLE algorithm is able to preserve the boundary feature of tampered regions. However, the PCA-based method [Popescu and Farid (2005)] requires less processing time.

Comparative advantages and disadvantages of block-based methods
The block-based copy-move forgery localization methods have been reviewed in the previous subsections. In this subsection, these methods are compared in terms of robustness and computational cost. The comparison is shown in Tab. 1. In summary, most frequency-based methods are robust to some additional operations, but these methods have limited capability in dealing with geometric transformation. Texture features are usually combined with other inherent image information to increase robustness against additional operations, especially for small duplicated regions. However, texture-based methods also fail to localize forged regions that have undergone geometric transformation. Therefore, some methods based on image moments have been proposed to handle geometric transformation. Dimension reduction-based methods such as PCA and SVD have then been introduced to increase the processing speed.

Keypoint-based methods
Section 2 has reviewed block-based copy-move forgery localization methods. Although most block-based methods can localize duplicated regions accurately, they have a high computational cost in the feature matching stage because a large number of blocks need to be processed. Therefore, David Lowe [Lowe (1999, 2004)] proposed the keypoint-based method. In keypoint-based methods, the block division step is removed from the pre-processing phase. Keypoint-based methods describe local features extracted at extreme points, which occur at corners, spots, and edges. Each local feature is a set of descriptors generated in the neighborhood of the extreme points, which increases the effectiveness of the features. Each descriptor is matched against the other descriptors to find duplicated regions. Keypoint-based methods can be broadly classified into two categories: Scale Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF). SIFT-based methods comprise four steps: scale-space extrema detection, keypoint localization, orientation assignment, and keypoint description. SURF-based methods comprise three steps: fast keypoint detection, orientation assignment, and construction of 64-element keypoint descriptors using the Hessian matrix and integral images.

SIFT techniques
SIFT was originally proposed by David Lowe [Lowe (1999)]. It is a local feature descriptor for image processing. This descriptor can localize duplicated regions owing to its scale invariance.
Hailing et al. [Hailing, Qiang and Yu (2008)] calculated the Euclidean distance between SIFT feature descriptors to conduct feature matching. This method is robust against rotation and scaling. Su et al. [Su and Zhu (2012)] proposed a fast feature extraction method based on the Locality Preserving Projection (LPP) algorithm for reducing the dimension of SIFT descriptors. This method is robust against rotation, scaling, and JPEG compression. Jaberi et al. [Jaberi, Bebis, Hussain et al. (2013)] improved SIFT features and proposed a method based on the Mirror Reflection Invariant Feature Transform (MIFT). This method is robust against mirror reflection, especially for small regions. Li et al. [Li, Yang, Meng et al. (2014)] introduced a method with lower computational complexity than the SIFT-based method [Hailing, Qiang and Yu (2008)]. They used the PCA algorithm to reduce the feature dimension. Hashmi et al. [Hashmi, Hambarde and Keskar (2014)] extracted SIFT features using the low-frequency components of DWT coefficients. This method is robust against rotation and scaling with a low computational load. However, the detailed information in the high-frequency components is ignored, so small tampered areas are located ineffectively. Anand et al. improved the method of Hashmi et al. [Hashmi, Hambarde and Keskar (2014)] and were the first to combine the DyWT algorithm with the SIFT technique. Experimental results demonstrated that this method is more robust. The methods in Hailing et al. [Hailing, Qiang and Yu (2008) (2015)] proposed a method which integrated SIFT and Zernike moments for localizing smooth regions. The weakness of this method is that it is computationally complex. In summary, forged regions that have undergone geometric transformation can be located accurately because SIFT-based methods rely on detecting keypoints and generating descriptors. However, methods based on SIFT alone cannot locate smooth regions.
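Matching descriptors by Euclidean distance, as described above, can be sketched in a few lines. The sketch assumes that keypoint descriptors have already been extracted (the four-dimensional vectors below are made-up toy values, not real 128-dimensional SIFT descriptors). The acceptance rule, which compares the nearest and second-nearest distances against a `ratio` threshold, is a commonly used criterion for SIFT matching and is an assumption here; the surveyed papers may use different criteria. The function name `match_descriptors` is hypothetical.

```python
import math


def match_descriptors(descriptors, ratio=0.6):
    """Return index pairs (i, j), i < j, whose descriptors are distinctively close.

    For each descriptor, the nearest and second-nearest other descriptors are
    found by Euclidean distance; the nearest one is accepted only if it is
    clearly closer than the second-nearest (ratio test, assumed criterion).
    """
    matches = []
    for i, d in enumerate(descriptors):
        dists = sorted((math.dist(d, other), j)
                       for j, other in enumerate(descriptors) if j != i)
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second and i < j:  # i < j avoids reporting a pair twice
            matches.append((i, j))
    return matches


# Toy descriptors: keypoint 3 is a near copy of keypoint 0.
descriptors = [
    (0.10, 0.80, 0.30, 0.50),
    (0.90, 0.10, 0.40, 0.20),
    (0.50, 0.50, 0.90, 0.10),
    (0.11, 0.79, 0.31, 0.50),
    (0.20, 0.30, 0.10, 0.90),
]
matches = match_descriptors(descriptors)
```

The brute-force search compares every descriptor with every other one, which illustrates why matching cost grows with the number of keypoints and the descriptor dimension, and why dimension reduction is attractive.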

SURF techniques
Although SIFT-based methods can localize tampered regions accurately, the computational cost of the feature matching stage is high because the generated feature descriptors are high-dimensional, especially for high-resolution images. Therefore, Bay et al. [Bay, Ess, Tuytelaars et al. (2008)] proposed the SURF technique to reduce the feature dimension. The dimension of the feature descriptors in the Bay et al. [Bay, Ess, Tuytelaars et al. (2008)] method was extended to 128 dimensions by Xu et al. [Xu, Wang, Liu et al. (2010)]. Experiments showed that SURF can reduce the number of spurious matched pixels. This method is robust against geometric transformation and some additional operations. Lin et al. [Lin and Wu (2011)] extracted SURF features from DCT coefficients. They performed experiments on a small dataset. This method is robust against geometric transformation. Hashmi et al. extracted SURF features from the low-frequency components of the DWT coefficient matrix. Descriptor matching is based on the Best-Bin-First algorithm. In addition, they replaced DWT with DyWT to strengthen the robustness against translation. Pandey et al. [Pandey, Singh, Shukla et al. (2015)] proposed a method based on SURF and SIFT. The forged areas can be localized quickly and accurately. However, the methods proposed in Lin et al. [Lin and Wu (2011); Hashmi, Anand and Keskar (2014); Pandey, Singh, Shukla et al. (2015)] cannot localize smooth areas effectively. Zhang et al. [Zhang, Yang, Niu et al. (2017)] presented a method that combines SURF with the FMT algorithm. This method resolved the issue in Lin et al. [Lin and Wu (2011); Hashmi, Anand and Keskar (2014); Pandey, Singh, Shukla et al. (2015)].
In summary, SURF techniques are faster overall than SIFT techniques because of the lower dimension of their descriptors. However, SIFT techniques achieve better localization accuracy. Therefore, combining the strengths of both can preserve accuracy while accelerating computation.

Comparative advantages and disadvantages of keypoint-based methods
The keypoint-based copy-move forgery localization methods have been surveyed in the previous subsections. In this subsection, these methods are compared in terms of robustness and computational cost. The comparison is given in Tab. 2. Keypoint-based methods are broadly applied since they are robust against geometric transformation. However, as they need to detect many extreme points to generate descriptors, the computational cost is higher in the feature matching stage. Hence, some dimension reduction algorithms like PCA and DWT are implemented. Moreover, some moment invariant-based methods such as the method in Pandey et al. [Pandey, Singh, Shukla et al. (2015)] are combined with SIFT or SURF to solve the problem in the smooth areas.

Deep learning-based methods
Although the traditional methods perform well, each of them can only handle a certain type of forgery, and it is impossible to know which tampering method has been adopted without prior information [Liu and Pun (2018)]. In the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), Alex Krizhevsky and his advisor Geoffrey Hinton won the challenge using a Convolutional Neural Network (CNN), AlexNet. The top-5 accuracy of this model reached 83.6%, surpassing the runner-up's 74.2% (obtained with traditional computer vision methods). Deep learning caused a huge sensation. Since then, deep learning-based methods have performed well in computer vision (CV) tasks such as image classification, object detection, and image segmentation. These methods do not require explicit feature extraction. Instead, they learn the relevant features automatically during the network training stage [Liu and Pun (2018)]. Consequently, some deep learning-based methods [Liu, Guan and Zhao (2018); Wu, Abd-Almageed and Natarajan (2018)] have been proposed for copy-move forgery localization. More specifically, Liu et al. [Liu, Guan and Zhao (2018)] decomposed an image adaptively using the Convolutional Oriented Boundaries (COB) algorithm. They performed feature matching via the kNN algorithm based on the Convolutional Kernel Network (CKN). Compared with the traditional methods, this method is robust against multiple additional operations including geometric transformation. Wu et al. [Wu, Abd-Almageed and Natarajan (2018)] proposed a novel two-branch architecture consisting of a target localization branch and a similarity localization branch. They exploited the Fully Convolutional Network (FCN) [Shelhamer, Long and Darrell (2017)] and the Inception structure [Chen, Papandreou, Kokkinos et al. (2018)] to improve the performance of the target localization branch. In addition, they used the Pearson correlation coefficient to quantify the feature similarity in the similarity localization branch.
The traditional methods can merely localize similar regions, whereas this method not only locates similar regions but also distinguishes the source region from the target region. Although this method is robust against various known additional operations, it still needs improvement, because spurious matched pixels and misjudged pixels at the boundary of tampered regions affect the overall accuracy.
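The Pearson correlation coefficient mentioned above as a feature similarity measure can be written out in a few lines. This is a generic sketch of the coefficient itself, not the implementation of Wu et al.; the function name `pearson` and the toy feature vectors are assumptions made for the example.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length feature vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


feature = [1.0, 2.0, 3.0, 4.0]
scaled_and_offset = [2 * v + 1 for v in feature]   # same pattern, different gain and bias
reversed_pattern = feature[::-1]

same = pearson(feature, scaled_and_offset)      # close to +1
opposite = pearson(feature, reversed_pattern)   # close to -1
```

Because the coefficient subtracts the mean and normalizes by the standard deviations, it is unaffected by a positive gain or an offset applied to a feature vector, which is a useful property when copied regions have undergone brightness or contrast changes.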

Future research directions
The existing methods have achieved fine performance in copy-move forgery localization, but there still remain some unresolved issues.

Extract effective local feature descriptors
SIFT and SURF features are widely used in keypoint-based methods because both are good local characterizations. The existing literature shows that keypoint-based methods cannot localize smooth regions accurately. Recently, researchers have proposed a few improved methods to address this limitation. Yang et al. [Yang, Sun, Guo et al. (2018)] developed a method for detecting keypoints using an adaptive threshold. The threshold controls the number of keypoints so that more keypoints are generated in smooth areas. In addition, this method used a circular region instead of a square one to improve the performance under mirror reflection. Wang et al. [Wang, Li, Niu et al. (2017)] presented a method based on local information entropy that divided each non-overlapping block into irregular superpixels and then extracted robust keypoints from each superpixel. Finally, they used exponential moments to construct the local features of each keypoint. The main drawback of this method is that its high computational cost makes it infeasible in practical applications. Therefore, effective local feature descriptors are still needed to locate smooth regions.

Robustness against some additional operations for conventional methods
Forgery localization methods must be robust against additional operations in practical applications. The existing methods achieve satisfactory performance in the absence of additional operations. However, their localization accuracy may drop sharply when one or more additional operations are applied. Therefore, methods that are robust against additional operations deserve more attention.

Consideration of additional operations in the training process of deep learning-based methods
Although deep learning-based methods perform well in the absence of additional operations, their performance still needs to be improved when additional operations are present. The main reason is that the additional operations are not considered in the training process. Therefore, future work needs to construct network models that simulate the additional operations and to incorporate these networks into the training process to enhance robustness against these operations.
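One simple way to expose a network to additional operations during training is to synthesize forged samples in which the copied patch is randomly modified before it is pasted. The sketch below illustrates this plain data-augmentation idea; it is a simpler substitute for, not an implementation of, the learned operation-simulating networks suggested above. All names (`make_forged_sample`, `add_noise`, `change_brightness`), the set of operations, and the parameter ranges are assumptions made for the example.

```python
import random


def rotate90(patch):
    """Rotate a square patch by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def add_noise(patch, rng, sigma=5.0):
    """Add Gaussian noise and clip to the 8-bit range."""
    return [[min(255, max(0, round(v + rng.gauss(0, sigma)))) for v in row]
            for row in patch]


def change_brightness(patch, rng):
    """Scale intensities by a random factor and clip to the 8-bit range."""
    factor = rng.uniform(0.8, 1.2)
    return [[min(255, max(0, round(v * factor))) for v in row] for row in patch]


def make_forged_sample(image, src, dst, size, rng):
    """Copy a size x size patch from src to dst after one random operation.

    Returns the forged image, a binary mask of the pasted region (usable as
    the localization ground truth), and the name of the applied operation.
    """
    (sr, sc), (dr, dc) = src, dst
    patch = [row[sc:sc + size] for row in image[sr:sr + size]]
    operation = rng.choice(["rotate", "noise", "brightness"])
    if operation == "rotate":
        patch = rotate90(patch)
    elif operation == "noise":
        patch = add_noise(patch, rng)
    else:
        patch = change_brightness(patch, rng)
    forged = [row[:] for row in image]
    mask = [[0] * len(image[0]) for _ in image]
    for r in range(size):
        for c in range(size):
            forged[dr + r][dc + c] = patch[r][c]
            mask[dr + r][dc + c] = 1
    return forged, mask, operation


rng = random.Random(42)
image = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
forged, mask, operation = make_forged_sample(image, src=(1, 1), dst=(9, 8), size=5, rng=rng)
```

Generating such samples on the fly gives each training batch a different mix of operations, at the cost of only covering the operations that were explicitly coded.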

Conclusion
This paper starts by introducing image forensic investigation and then analyzes the necessity of copy-move forgery localization using passive techniques. The state-of-the-art copy-move forgery localization methods are classified and summarized. Specifically, these methods are categorized as block-based methods, keypoint-based methods, and deep learning-based methods. This paper outlines the process of these methods and compares their performance in terms of robustness and computational cost. Moreover, we discuss the existing open issues and present future research directions.