Perceptual Hashing-Based Robust Image Authentication Scheme for Wireless Multimedia Sensor Networks

Image authentication is critical for secure image transmission and storage in a wireless multimedia sensor network (WMSN). In this paper, we propose a perceptual hashing-based robust image authentication scheme, which applies a distributed processing strategy to perceptual image hashing and provides compactness, visual fragility, perceptual robustness, and security in digital image authentication for WMSNs. In the proposed scheme, the cluster head node first generates a secure pseudorandom chaotic sequence with keys and sends it to the image capturing node. The image capturing node then uses the received chaotic sequence to randomly divide the captured image into several overlapping rectangles. After that, two gravity centers are calculated for each rectangular block, and the binary distance between the two gravity centers is calculated in each general cluster member node. The cluster head node receives the binary distance sequences from all of the general cluster member nodes and combines them to form the perceptual hashing sequence, which is sent to the base station for image authentication. Experimental results show that the proposed scheme has satisfactory authentication ability and ensures not only visual fragility for perceptually distinct images but also robustness for perceptually identical images under image rotation, JPEG compression, and noise blurring.


Introduction
Wireless sensor networks (WSNs) are expected to be widely used in the near future because of their breadth of applications in the military, exploration, research, and so on. Most of the existing research is concerned with scalar sensor networks that measure physical phenomena, such as temperature, light, humidity, pressure, acoustic signals, or the location of objects, which can be conveyed through low-bandwidth and delay-tolerant data streams. Recently, the focus has been shifting toward revisiting the sensor network paradigm to enable delivery of multimedia information, such as monitoring data, digital images, and audio and video streams, as well as scalar data [1]. This effort will result in distributed, networked systems, referred to in this paper as wireless multimedia sensor networks (WMSNs). WMSNs will enable new applications such as multimedia surveillance, traffic enforcement and control systems, advanced health care delivery, structural health monitoring, and industrial process control. Consequently, they will bring new security challenges as well as new opportunities. Secure and robust multimedia communication is becoming increasingly important for energy-constrained WMSNs [2]. As one of the security techniques, image authentication is critical for secure image transmission and storage in WMSNs. However, conventional data authentication schemes cannot be applied directly to WMSNs because of the limited energy and computing resources of sensor nodes. These constraints pose great challenges to WMSN development and motivate us to design a proper authentication strategy for WMSNs.
Because they use open communication channels, WSNs are vulnerable to various attacks, and security in WSNs is required. A secure hierarchical key management scheme for WSNs was proposed in [3]. The security analysis and simulation show that the scheme can prevent several attacks effectively and reduce the energy consumption. For WSN communications, Han et al. [4] described six types of attacks: communication attacks, attacks against privacy, sensor node targeted attacks, power consumption attacks, policy attacks, and cryptology attacks on key management. In communication attacks, an eavesdropper can easily access or even manipulate messages by injecting, cropping, or tampering. The receiver therefore needs to make sure that the data used in any decision-making process originates from the correct source. Data authentication prevents unauthorized parties from participating in the network, and legitimate nodes should be able to detect messages from unauthorized nodes and reject them. In [5], a robust user authentication scheme for WSNs was proposed. The scheme takes advantage of the two-factor authentication concept to provide a secure authentication system offering balanced features in terms of security and performance.
The authenticity of data and commands is also a critical requirement for the correct behavior of a WMSN [6]. Digital images are becoming widely used in WMSNs as a common kind of multimedia information. Therefore, the key establishment technique should guarantee that image communication and storage provide a way to verify the authenticity, creditability, and integrity of the received image in WMSNs. However, resource constraints in sensor networks (such as limited battery power and bandwidth/computation capability) pose challenges for image authentication techniques. Conventional binary data authentication schemes provide data integrity in a strict sense regardless of multimedia content. However, those schemes are not applicable to WMSNs because even simple bit errors during data transmission cause the authentication to fail although the multimedia content is preserved [7,8]. On the other hand, watermarking-based image content authentication techniques are robust against bit errors, packet losses, and compression distortion. However, watermark embedding creates extra source coding overhead and complicates transmission protocol design in WMSNs [9], so these techniques do not adapt well to WMSNs given the limited energy and computing resources.
In [10], an optimized content-aware authentication scheme for JPEG 2000 images over lossy channels was proposed. An acyclic authentication graph was developed to optimize the trade-off between the expected image distortion and the cryptohash tagging cost, through computation based on the packet loss probability and the visual importance level of the image packets. The work reported in [11] proposed a JPEG 2000 compatible stream authentication scheme that significantly reduced the computational complexity and had only a minimal authentication dependency overhead in WMSNs. Moreover, an authentication-aware wireless network resource allocation scheme was developed to reduce image distortion and energy consumption during transmission. The scheme significantly improved the authenticated image quality even under strict communication energy consumption constraints in WMSNs. In [12], a rate-distortion optimization authentication scheme for H.264 video transmission was proposed. A video packet transmission scheduler was designed to minimize the visual distortion under the limitation of total bit budget and authentication dependency. Other related work on image or video authentication that is robust to bit errors was given in [13][14][15]. However, none of these approaches can be applied directly to WMSNs because of the energy constraint. In this paper, we propose a perceptual hashing-based robust authentication scheme for WMSNs. Based on a distributed processing strategy for perceptual image hashes, the proposed scheme provides compactness, visual fragility, perceptual robustness, and security for image authentication in WMSNs.

Perceptual Image Hashing Extraction
A perceptual image hashing function maps an image to a short binary string that serves as a digest of the image's appearance to the human eye. Perceptual image hashing is a class of one-way mappings from image representations to a perceptual hash value in terms of their perceptual content. Given an image $I$ and its perceptually similar copy $I'$ with minor distortion, the image hashing function $H_K(\cdot)$ depends on the secret key $K$. As described in [16], the desirable properties of the perceptual hashing function $H_K(\cdot)$ can be summarized as follows.
(i) One-Way Function. Ideally, the hash generation should be noninvertible.
(ii) Compactness. The size of the perceptual hashing value should be much smaller than that of the original image.
(iii) Perceptual Robustness. Two images that look the same should map to the same hash value.
(iv) Visual Fragility. Perceptually distinct images should generate different hash values.
(v) Security. The perceptual hashing is intractable without the secret key.
All of the parameters appearing in the above properties should be close to zero. Based on the above properties, perceptual image hashing can be applied to image content identification, image indexing, content authentication, and so forth. In particular, a perceptual hash function should have the property that two images that look the same map to the same hash value, even if the images have small bit-level differences. This differentiates a perceptual hash from traditional cryptographic hashes, such as SHA-1 and MD5. In cryptography, the hash function is typically used in digital signatures to authenticate the message being sent so that the receiver can verify its source. A key feature of conventional hashing algorithms such as SHA-1 and MD5 is that they are extremely sensitive to the input data; that is, changing even one bit of the input message changes the output dramatically.
However, image data often undergoes various content-preserving manipulations such as lossy compression, additive channel noise, image enhancement, scaling, bit errors, and packet losses during wireless transmission and storage in WMSNs. These distortions are usually perceptually insignificant, and image hashes should be robust to them. By contrast, some malicious manipulations, for example, object insertion, removal, and substitution, introduce perceptually significant distortions, and it is desirable that the image hash be sensitive to such attacks. Therefore, for image authentication purposes, image hashes should be robust to nonmalicious distortions but sensitive to malicious manipulations [17].

Image Random Blocking by Chaotic Sequence.
In order to enhance the security of the perceptual hashing algorithm, we use a secure pseudorandom sequence generated with the key to randomly divide the digital image into overlapping rectangles; the key controls both the number of rectangles and the pseudorandom sequence. Image blocking also compensates for the disadvantage that features extracted from the whole image can only describe its global characteristics.
As a phenomenon found in a nonlinear dynamic system, chaos is deterministic and random-like. Based upon the sensitive dependence of chaotic systems on their initial conditions, a large number of nonperiodic, continuous broadband frequency spectrum, noise-like, yet deterministic, and reproducible signals can be generated. So chaos is very useful for generating secure pseudorandom sequences.
The chaotic sequences are generated by the chaotic maps (6) and (7), where the chaotic system parameter of (6) lies in the interval $(3.57, 4]$. The curve of the function $\Omega(\cdot)$ is shown in Figure 1. From Figure 1, it is seen that $\Omega \approx 0$ when its argument is $10^{-19}$, so the size of the value space of the first of the key-related quantities is $10^{19}$. Similarly, the sizes of the value spaces of the other three quantities, read from Figures 1 and 2, are $1 \times 10^{15}$, $3 \times 10^{14}$, and $3 \times 10^{16}$, respectively. Therefore, two different chaotic sequences are generated whenever the difference in the initial values or chaotic parameters of the chaotic system exceeds a specific value: $10^{-19}$ for the first quantity, and similarly for the others. The system parameter in (6) and the initial value with subscript 0 serve as the secret keys and are denoted by $K_1$ and $K_2$, respectively. Consequently, we use the chaotic sequences generated with keys $K_1$ and $K_2$ to randomly divide the digital image into overlapping rectangles for security purposes.
According to the image size, we adaptively select appropriate bits of the chaotic sequence as the $x$-axis and $y$-axis coordinates, the length, and the width of each random region, chosen so that the region does not cross the image boundary. Each region is denoted by a quadruple ($x$, $y$, length, width). The image is thus divided into $N$ overlapping rectangular regions, as shown in Figure 3. Because the quadruples are randomly generated, the rectangular areas are random.
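The key-controlled blocking described above might be sketched in Python as follows. This is only an illustration under stated assumptions: the logistic map stands in for the chaotic maps (6) and (7), which are not reproduced in this sketch, and the names `logistic_sequence` and `random_blocks`, the burn-in length, the minimum block size, and the rule that scales sequence values into coordinates are all hypothetical rather than taken from the scheme. The initial value 0.3 is likewise a placeholder that satisfies the logistic map's range.

```python
def logistic_sequence(mu, x0, count, burn_in=100):
    """Iterate x <- mu * x * (1 - x) and return `count` values.

    Assumption: the logistic map is a stand-in chaotic generator.
    """
    if not (3.57 < mu <= 4.0) or not (0.0 < x0 < 1.0):
        raise ValueError("expected 3.57 < mu <= 4 and 0 < x0 < 1")
    x = x0
    for _ in range(burn_in):              # discard the initial transient
        x = mu * x * (1.0 - x)
    values = []
    for _ in range(count):
        x = mu * x * (1.0 - x)
        values.append(x)
    return values


def random_blocks(height, width, n_blocks, mu, x0, min_size=8):
    """Return n_blocks quadruples (x, y, length, width) inside the image."""
    seq = logistic_sequence(mu, x0, 4 * n_blocks)
    blocks = []
    for i in range(n_blocks):
        a, b, c, d = seq[4 * i: 4 * i + 4]
        x = int(a * (width - min_size))   # left edge of the block
        y = int(b * (height - min_size))  # top edge of the block
        # Sizes are scaled to the remaining room, so a block cannot
        # extend past the right or bottom border of the image.
        block_length = min_size + int(c * (width - x - min_size))
        block_width = min_size + int(d * (height - y - min_size))
        blocks.append((x, y, block_length, block_width))
    return blocks


# Hypothetical key values: mu plays the role of one key, x0 of the other.
blocks = random_blocks(height=300, width=300, n_blocks=150, mu=3.9, x0=0.3)
```

Because every quadruple is derived from the keyed sequence, the same keys reproduce the same partition, while a slightly different key yields a different one.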

Robust Local Feature Points Based on Gravity Center of Random Blocks.
The local features of the image should be not only stable under geometric transforms such as rotation and scaling but also robust to insignificant distortions such as additive noise and blurring, bit errors, and packet losses during transmission in WMSNs. Based on the image random blocking in Section 2.1, we propose a robust local feature point extraction method using the gravity center of random blocks. The extracted robust local features will then be used to generate perceptual hashes in Section 2.3.
The two-dimensional (2D) moment can be computed directly on the regions of interest without separating them from the whole image. High-order moments are more sensitive to noise, whereas low-order moments are insensitive to noise and bit errors, which benefits the characterization of each region as a whole.
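The 2D $(p + q)$th order origin moment of a continuous image is defined as [18]
$$m_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^{p} y^{q} f(x, y)\,dx\,dy, \quad p, q = 0, 1, 2, \ldots,$$
where $f(x, y)$ represents the gray-level value at location $(x, y)$. For a digital image, the integrals are replaced by summations and $m_{pq}$ becomes
$$m_{pq} = \sum_{x}\sum_{y} x^{p} y^{q} f(x, y).$$
The gravity center $(\bar{x}, \bar{y})$ of the image is defined in terms of the zero-order moment and first-order moments as follows:
$$\bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}},$$
where the zero-order moment $m_{00}$ represents the area of the image.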
In order to obtain the local features, the coordinates of the gravity center $(\bar{x}_i, \bar{y}_i)$ of each random rectangular block are calculated as
$$\bar{x}_i = \frac{\sum_{x}\sum_{y} x f_i(x, y)}{\sum_{x}\sum_{y} f_i(x, y)}, \quad \bar{y}_i = \frac{\sum_{x}\sum_{y} y f_i(x, y)}{\sum_{x}\sum_{y} f_i(x, y)}, \quad i = 1, 2, \ldots, N,$$
where $f_i(x, y)$ represents the gray-level value at location $(x, y)$ in the $i$th random rectangular block and $N$ is the number of random rectangular blocks. Thus, there are $N$ gravity centers in total for the pseudorandom rectangular blocks, denoted by $G = \{g_1, g_2, \ldots, g_i, \ldots, g_N\}$. The local feature information of the image is obtained by calculating each block's gravity center, which improves the ability to distinguish different images.
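As a minimal sketch of the per-block computation above, the following Python function evaluates the zero-order and first-order moments of one block and returns its gravity center. The function name `gravity_center`, the representation of a block as a list of rows, and the fallback for an all-zero block are assumptions made for illustration only.

```python
def gravity_center(block):
    """Return (x_bar, y_bar) for a block given as a list of rows of gray levels.

    x indexes columns and y indexes rows, both starting at 0.
    """
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(block):
        for x, value in enumerate(row):
            m00 += value          # zero-order moment
            m10 += x * value      # first-order moment in x
            m01 += y * value      # first-order moment in y
    if m00 == 0:
        # Assumption: an all-black block falls back to its geometric center.
        rows, cols = len(block), len(block[0])
        return ((cols - 1) / 2.0, (rows - 1) / 2.0)
    return (m10 / m00, m01 / m00)
```

Only additions and multiplications over the block's pixels are involved, which is in line with the emphasis on low-order moments for resource-limited nodes.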
The gravity center of the image is geometrically invariant. In this paper, rotation invariance is analyzed as an example. After a rotation by an angle $\theta$ about the origin, the first-order moments are given by
$$m'_{10} = m_{10}\cos\theta - m_{01}\sin\theta, \quad m'_{01} = m_{10}\sin\theta + m_{01}\cos\theta,$$
while the zero-order moment is unchanged, $m'_{00} = m_{00}$. Namely,
$$\bar{x}' = \bar{x}\cos\theta - \bar{y}\sin\theta, \quad \bar{y}' = \bar{x}\sin\theta + \bar{y}\cos\theta.$$
Thus, the gravity center rotates together with the image, and rotation invariance holds. By similar analysis, other geometrical invariance properties can also be obtained. Figure 4 shows the robustness of gravity centers under geometric distortions and common image processing. Note that × denotes the actual locations of the gravity centers in the distorted image and o denotes the theoretical locations of the gravity centers. Where the two symbols coincide, the geometrical invariance of the gravity centers and their strong robustness to additive noise and JPEG compression are confirmed.
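The rotation argument above can be checked numerically with a short Python sketch. It treats a few pixels as weighted points with continuous coordinates, so it deliberately ignores the resampling and interpolation that a real rotated digital image would involve; the sample values and the counterclockwise sign convention are assumptions.

```python
import math


def centroid(points):
    """points: list of (x, y, gray_level); returns (x_bar, y_bar)."""
    m00 = sum(f for _, _, f in points)
    m10 = sum(x * f for x, _, f in points)
    m01 = sum(y * f for _, y, f in points)
    return (m10 / m00, m01 / m00)


def rotate(x, y, theta):
    """Rotate (x, y) counterclockwise by theta radians about the origin."""
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))


# Hypothetical sample: three weighted points standing in for pixels.
pixels = [(1.0, 2.0, 10.0), (4.0, 0.5, 30.0), (2.5, 3.0, 60.0)]
theta = math.radians(5.0)

rotated_pixels = [(*rotate(x, y, theta), f) for x, y, f in pixels]
center_then_rotate = rotate(*centroid(pixels), theta)
rotate_then_center = centroid(rotated_pixels)
```

Up to floating-point rounding, `center_then_rotate` and `rotate_then_center` agree, which is the statement that the gravity center moves with the rotated image.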

Perceptual Image Hashes Generation. The supplement image block of each random rectangular block is defined as
$$\tilde{f}_i(x, y) = \text{level} - f_i(x, y),$$
where level is the maximum gray-scale level of the image. Likewise, we obtain the gravity center of the supplement image block. For each rectangular block, we calculate the gravity center of its supplement image block and obtain $N$ gravity centers of the supplement image blocks in total. The gravity center of an image usually lies near the center of the image, and so does that of its supplement image. Thus the distance between the gravity centers of an image and its supplement image is short. To address this, we modify the gravity center. The improved gravity center $\hat{g}_i = (\hat{x}_i, \hat{y}_i)$ of the supplement image block of the $i$th rectangular block is computed with a quantification step $\Delta_1 > 0$. On the one hand, this modification enlarges the distance between the two gravity centers. On the other hand, the parameter $\Delta_1$ guarantees the robustness against malicious attacks in calculating the two gravity centers.
In order to generate the perceptual image hashes, we calculate the distance between the gravity center of each rectangular block and that of its supplement image block. Considering the constraints on limited energy and computing resources in WMSNs, we use the $\infty$-norm to measure the spatial distance between the locations of the two gravity centers for each rectangular block as follows:
$$d_i = \max\left(|\bar{x}_i - \hat{x}_i|, |\bar{y}_i - \hat{y}_i|\right).$$
Let $\Delta_2 > 0$ be another quantification step; the distance between the two gravity centers can then be quantified as
$$\hat{d}_i = \left\lfloor \frac{d_i}{\Delta_2} \right\rfloor,$$
where $\lfloor\cdot\rfloor$ denotes the floor function. Obviously, the quantified distance is a decimal integer. Moreover, the quantified distance is more robust against common image processing and bit errors during transmission.
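The last two steps, the ∞-norm distance and its quantization with the step Δ 2, could look roughly like the Python sketch below. It takes the two gravity centers as given, so the Δ 1 modification is not modeled. The function name `distance_bits`, the fixed bit width `n_bits`, and the saturation of large values are assumptions introduced so that every block contributes a bit string of the same length.

```python
import math


def distance_bits(center, supplement_center, delta2, n_bits):
    """Quantize the infinity-norm distance between two gravity centers.

    Returns the quantized distance as an n_bits-long binary string.
    """
    # Infinity norm: the larger of the two coordinate differences.
    distance = max(abs(center[0] - supplement_center[0]),
                   abs(center[1] - supplement_center[1]))
    quantized = math.floor(distance / delta2)
    # Assumption: saturate so the value always fits in n_bits.
    quantized = min(quantized, 2 ** n_bits - 1)
    return format(quantized, "0{}b".format(n_bits))
```

For instance, centers (10.0, 20.0) and (31.5, 12.0) are 21.5 apart in the ∞-norm; with a step of 4 this quantizes to 5, which is `0101` in four bits. Small perturbations of either center leave the result unchanged unless the distance crosses a multiple of the step.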

Distributed Processing Strategy for Perceptual Image Hashes
In WSNs, clustering expedites many desirable functions and provides many advantages such as load balancing, energy savings, and distributed key management. The most prominent benefit of clustering is that it can greatly reduce the energy consumption of nodes and lengthen the network lifetime [19].
In this paper, in order to adapt to the limited power resources and computational capabilities of sensor nodes, we consider clustering-based WMSNs with densely distributed nodes. The structure of a cluster is shown in Figure 5. Each cluster consists of a cluster head (CH), several general cluster member nodes, and a camera sensor that captures the digital image. In Figure 5, BS represents the base station, CH is the cluster head node, the camera sensor is the image capturing node, and the remaining nodes are the general cluster member nodes, each of which is assigned a fixed ID.
The distributed processing strategy is as follows.
Step 1. The cluster head node CH generates a secure pseudorandom chaotic sequence with the keys $K_1$ and $K_2$ according to the method in Section 2.1 and sends this chaotic sequence to the image capturing node. In addition, the ID of each general cluster member node $i$ ($i = 1, 2, \ldots, n$) is also sent to the image capturing node.
Step 2. When the image capturing node captures an image, it uses the received chaotic sequence to randomly divide the captured image into overlapping rectangles and sends each rectangular block to a general cluster member node. Note that the elements of the chaotic sequence are mapped one to one to the IDs of the general cluster member nodes.
Step 3. Each general cluster member node $i$ calculates the distance between the two gravity centers of its rectangular block, generates the corresponding binary sequence $(b_{i,1}, b_{i,2}, \ldots, b_{i,m})_2$ by the method in Section 2.3, and sends it to the cluster head node CH.
Step 4. The cluster head node CH receives the binary sequences $(b_{i,1}, b_{i,2}, \ldots, b_{i,m})_2$, $i = 1, 2, \ldots, n$, from all of the general cluster member nodes, combines them to form the perceptual hashing sequence, and finally sends the perceptual hashing sequence to the base station for image authentication.
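The message flow of Steps 2 to 4 can be mimicked in a single process with the Python sketch below. It is a toy model rather than a network implementation: the function name `run_cluster_round`, the tuple layout of the messages, and the round-robin assignment used when there are more blocks than member nodes are all assumptions, and the per-node computation is passed in as a callable so that any block-to-bits routine can be plugged in.

```python
def run_cluster_round(blocks, member_ids, block_to_bits):
    """Simulate Steps 2-4: dispatch blocks, collect bits, concatenate.

    blocks        : list of image blocks, in chaotic-sequence order
    member_ids    : list of member-node IDs known to the capturing node
    block_to_bits : callable run on a member node, block -> bit string
    """
    # Step 2: the capturing node tags each block with its position and
    # the ID of the member node that will process it (round-robin here).
    outbox = [(index, member_ids[index % len(member_ids)], block)
              for index, block in enumerate(blocks)]

    # Step 3: each member node computes the bit string for its block.
    replies = [(index, node_id, block_to_bits(block))
               for index, node_id, block in outbox]

    # Step 4: the cluster head restores block order and concatenates
    # the bit strings into the perceptual hashing sequence.
    replies.sort(key=lambda reply: reply[0])
    return "".join(bits for _, _, bits in replies)
```

Keeping the block index in every message lets the cluster head assemble the hash in a fixed order even if replies arrive out of order.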

Perceptual Hashing-Based Image Authentication
When we authenticate a received image, the perceptual hashing sequence is first obtained from the base station and matched against the perceptual hashing sequence generated from the image in question. If the normalized Hamming distance between the two perceptual hashing sequences does not exceed the specified threshold, the image is deemed authentic. Otherwise, the image is deemed forged. The image authentication framework is shown in Figure 6. The steps are as follows.
Step 1. The perceptual hashing sequence $h_1$ is received from the base station.
Step 2. The image in question is partitioned into random rectangular blocks using the same secret keys $K_1$ and $K_2$, as in the perceptual hashing generation process described in Section 2.1; then the distance between the two gravity centers of each rectangular block is calculated and quantified to extract the robust feature. Thus another perceptual hashing sequence $h_2$ can be reconstructed as described in Section 2.3.
Step 3. Given a threshold $\tau > 0$, the normalized Hamming distance is calculated by
$$D(h_1, h_2) = \frac{1}{L}\sum_{k=1}^{L} \left|h_1(k) - h_2(k)\right|,$$
where $L$ is the length of the perceptual hashing sequence. If $D(h_1, h_2) \le \tau$, the image is deemed authentic. Otherwise, if $D(h_1, h_2) > \tau$, the image is deemed forged. The smaller the normalized Hamming distance is, the stronger the robustness is. Ideally, the normalized Hamming distance for perceptually identical images is close to 0, while the normalized Hamming distance between two different images is close to 0.5.
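The decision rule in Step 3 amounts to a few lines of Python, sketched here with hashes represented as equal-length strings of "0" and "1". The function names and the string representation are assumptions; the threshold is left as a parameter because its value is a design choice.

```python
def normalized_hamming(h1, h2):
    """Fraction of positions at which two equal-length bit strings differ."""
    if len(h1) != len(h2) or not h1:
        raise ValueError("hashes must be non-empty and of equal length")
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)


def is_authentic(h1, h2, threshold):
    """Accept the image when the normalized distance is within the threshold."""
    return normalized_hamming(h1, h2) <= threshold
```

With a hypothetical threshold of 0.3, the pair `"1010"` and `"1000"` (distance 0.25) would be accepted, whereas `"1111"` and `"0000"` (distance 1.0) would be rejected.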

Visual Fragility of Perceptual Hashes.
Visual fragility means that perceptually distinct images generate different perceptual hashes. We randomly select 80 images of size 300 × 300. The parameters are set as follows: $K_1 = 3.9$, $K_2 = 1.2$, $\Delta_1 = 10$, $\Delta_2 = 4$, and the number of random rectangular blocks is 150. Then we calculate their perceptual hashes and the Hamming distance $D(h_1, h_2)$ between every two perceptual hashing values. In total, 6320 matching results (80 × 79 = 6320) are obtained. The statistical histogram distribution is shown in Figure 7. As a result, the conflict probability is very small. Hence, the proposed perceptual hashing ensures visual fragility.

Perceptual Robustness.
To test the perceptual robustness to geometric transforms, we rotate the 512 × 512 "Lena" image by different angles, calculate the perceptual hashes, and compare the perceptual hashing value of each rotated image with that of the original. The relationship between the rotation angle and the normalized Hamming distance is shown in Figure 8. When the image rotation angle is within 5°, the normalized Hamming distance is less than 0.3, so the proposed perceptual hashing algorithm is robust to image rotation within 5°.
Figure 9 shows the robustness to image compression. When the quality factor of JPEG compression is changed from 90 to 30, the normalized Hamming distance between the original image and the compressed image remains less than 0.3, so the proposed perceptual hashing algorithm is robust to image compression. The smaller the quality factor is, the higher the degree of compression is. From Figure 9, we see that the robustness increases as the quality factor increases.
Figure 10 shows the robustness to additive Gaussian white noise and uniformly distributed multiplicative random noise. We calculate the perceptual hashes of images corrupted by noise with variance ranging from 6.554 to 65.54. From Figure 10, we see that, when the variance is less than 65.54, the normalized Hamming distance between the original image and the noisy image is less than 0.3; that is to say, the proposed algorithm is robust to Gaussian white noise and uniformly distributed random multiplicative noise during image transmission and storage.
Table 1: Segments of the perceptual image hashes generated by different keys.
Segment 1 ($K_1 = 3.9$, $K_2 = 1.2$): ···100101110001101100001101000···
Segment 2 ($K_1 = 3.91$, $K_2 = 1.2$): ···100011011001010010011000100···
Segment 3 ($K_1 = 3.9$, $K_2 = 1.21$): ···101010011000100111001001110···

Security.
Because a chaotic sequence is nonperiodic and sensitive to its initial value, the chaotic maps (6) and (7) are used to generate the pseudorandom sequence in this paper. The digital image is then randomly divided into overlapping rectangular regions by the chaotic sequence for perceptual image hash extraction. Thus, if the chaotic initial values are changed, that is, if the keys are different, the extracted perceptual image hash will be different. Table 1 lists segments of the perceptual image hashes generated with different keys, which indicates that the perceptual hashing is intractable without the secret key. The normalized Hamming distance between the corresponding locations of perceptual hashing segment 1 and segment 2 is 0.4167, and that between segment 1 and segment 3 is 0.4283. Both normalized Hamming distances are close to 0.5, so the security meets the application requirement. Therefore, even if the perceptual image hashing algorithm is known, the correct perceptual hashing value of an image cannot be derived without knowing the key.

Conclusions
In this paper, we have proposed a robust image authentication methodology, based on the perceptual hashing technique, for verifying the authenticity, creditability, and integrity of images during transmission and storage in WMSNs. First, a gravity center-based perceptual image hashing algorithm is proposed that offers compactness, perceptual robustness, visual fragility, and security. The generated perceptual hashing value is a short binary string that serves as a digest of the image, which addresses the severe energy constraints and perceptual image redundancy in WMSNs. Furthermore, a distributed processing strategy for perceptual image hashes is developed to cope with the limited computing resources of sensor nodes in WMSNs. The experimental results demonstrate that our scheme achieves satisfactory authentication performance for both perceptually distinct and perceptually identical images.