Privacy Protection in Surveillance Videos Using Block Scrambling-Based Encryption and DCNN-Based Face Detection

Surveillance cameras have become important in nearly every aspect of life. They are now used everywhere, including homes, schools, banks, hospitals, companies, and public streets, to monitor what happens in those places and keep them safe. However, the pervasiveness of surveillance cameras has become an issue for people's privacy. This paper proposes a novel method for surveillance video privacy protection using block scrambling-based encryption and face detection based on a deep convolutional neural network (DCNN). An object detection model based on the DCNN You Only Look Once version 3 (YOLOv3) is used to detect people's faces. Then, the detected faces are scrambled using a fast block scrambling technique. Finally, the scrambled faces are encrypted using a secret key produced from a chaotic logistic map. The bounding boxes output by YOLOv3 are modified to include the entire edges of the detected faces to prevent any leaks of the sensitive regions. The simulation results and security analysis confirm the proposed method's effectiveness in protecting the privacy of surveillance videos.


I. INTRODUCTION
Internet of Things (IoT) applications could bring massive value to our lives. With revolutionary computing capabilities, newer wireless networks, and superior sensors, IoT could be the next frontier in the race for its share of the wallet. Imagine an intelligent device such as a traffic camera. The camera can monitor the streets for accidents, weather conditions, traffic congestion, etc. Similarly, users can track anything that happened or is happening in their homes on their mobile devices. Because surveillance videos are streamed over different networks, an attacker may violate the privacy of the people who appear in them, so the privacy of surveillance videos must be protected. Encryption is one of the most efficient mechanisms to protect surveillance videos [1], [2]. Surveillance video encryption techniques can be used in two manners: 1) encrypting the entire video [3], [4], [5], [6], and 2) encrypting only the regions of interest (ROIs) that are considered sensitive information [7], [8], [9], [10], [11], [12]. There is no need to encrypt the entire surveillance video, especially if the video shows public places and does not contain sensitive information in all regions. Figure 1 shows the two mentioned manners.
The associate editor coordinating the review of this manuscript and approving it for publication was Zhenhua Guo.
Additionally, encrypting entire frames of a surveillance video is computationally expensive. So encrypting only the ROIs and keeping the non-ROIs unprotected increases the efficiency of the surveillance video protection process. This work focuses on ROI-based protection, and the ROIs represent people's faces. The proposed technique uses an object detector to extract the locations of the ROIs from the surveillance video. Then the proposed encryption algorithm encrypts the ROIs using a fast block scrambling technique and keys from a chaotic map. The work's contributions are summarized as follows:
1. The edges of the detected ROI are protected to prevent any sensitive leaks.
2. A novel splitting technique is used to generate blocks and sub-blocks from the ROI.
3. The correlation between ROI pixels is removed using a zigzag pattern, rotation, and block permutation to generate a scrambled ROI.
4. Different ROIs are encrypted with different keys, which increases the security of the encrypted ROIs.
5. The key used in the encryption process is based on the logistic map and the input ROI.
The rest of this paper is organized as follows. Section II reviews the related work. Section III describes the proposed privacy protection method in detail. Section IV presents the simulation results and security analysis. Section V concludes the work.

II. REVIEW OF RELATED WORKS
Traditional encryption techniques such as 3-DES and AES are used in [13] and [14] to protect sensitive information. However, these techniques have a high computational cost [13], and they are not the best solution for digital surveillance video privacy protection since surveillance video frames have a high correlation between neighboring pixels and a large amount of redundancy. Recently, several techniques for privacy protection have been proposed [15], [16], [17], [18]. A secure video surveillance model is proposed in [15], in which a secure authentication protocol is implemented to resist replay attacks and man-in-the-middle attacks. Lee and Park [16] exploited blockchain technology in the network of surveillance systems. Their method ensures the high security of cloud-based intelligent surveillance systems. Du et al. [17] proposed a method to quickly find the ROIs in videos and then protect the privacy of the videos by encrypting the ROIs. Chu et al. [18] proposed a method for real-time privacy-preserving moving object detection in the cloud. The method has some cons: 1) the encryption method is not strong if the chaotic map is not used in its randomness functions, and 2) the contours of the foreground objects are available at the server, violating privacy. Newton et al. [19] proposed a technique to protect privacy by de-identifying faces. The similarity between faces is calculated using a distance metric; then, new faces are generated by averaging components of the image. Korshunov et al. [20] developed a method to obfuscate faces in video surveillance based on well-known warping techniques. Ma et al. [21] proposed a reversible full privacy region protection method for cloud video surveillance. Du et al. [22] developed a privacy protection method in video surveillance that addresses video anonymization, behavior preservation, recoverability, and compressibility problems in one unified system. Rahman et al. [23] presented a scrambling technique based on chaos cryptography to protect ROIs that contain sensitive data in video surveillance. Zhang et al. [24] proposed a lightweight encryption technique based on layered cellular automata for privacy protection in surveillance videos. The ROIs are encrypted and stored on the camera side, where only authenticated users can access the encrypted ROIs. Any user can watch the surveillance videos without the ROIs.
ROI-based protection techniques protect the regions that have been located by a detection algorithm. The security of ROI protection should depend not only on the design of the encryption algorithm but also on the detection algorithm, which must detect the entire object to prevent any leaks of sensitive regions. The detection algorithm should detect objects accurately and efficiently. Wen et al. [25] use a geometric active contour model to detect the target regions. Kanso et al. [26] presented a technique to locate ROIs in a medical image. The input image is divided into blocks. Then each block is processed to determine whether it is a significant region or not based on a statistical measure. In [27], ROIs with irregular shapes are chosen and detected arbitrarily. A Gaussian mixture model and HOG feature extraction are used in [27] and [28].
Motivated by these vulnerabilities, a fast and efficient encryption technique is proposed in this paper to protect the ROIs in surveillance videos and address such drawbacks. The proposed technique uses an object detector to extract the locations of the ROIs. Then the proposed encryption algorithm protects the ROIs. An object detector called YOLOv3 [30] is used to locate multiple faces in a surveillance video. YOLOv3 is an improved version of YOLO [31] and has advantages in the object detection process regarding speed and accuracy. Bounding boxes are generated from the detection process and represent the locations of objects in a surveillance video. The proposed work modifies the bounding boxes to protect the edges of detected faces and prevent any sensitive information leaks. The proposed encryption algorithm consists of four steps. First, the input ROI is split into blocks and sub-blocks. Second, a series of operations is applied to the blocks and sub-blocks to scramble the ROI. Third, a secret key is generated using a chaotic logistic map. Finally, the scrambled ROI is encrypted using the generated secret key by applying the XOR operation.

III. THE PROPOSED PRIVACY PROTECTION METHOD
This section describes all steps of the proposed method in detail. YOLOv3 processes the surveillance video to locate the ROIs in each frame. Then the ROIs are fed into the encryption algorithm to encrypt the sensitive data. Finally, the encrypted ROIs are placed back at their original locations in the video frames. Figure 2 shows the pipeline of the proposed method.

A. FACE DETECTION PROCESS
Face detection is the first process in the proposed method and determines the locations of people's faces in the surveillance video. YOLOv3 is used to detect the faces in each frame. A pre-trained YOLOv3 weights file [32], trained on the WIDER FACE face detection benchmark dataset [33], is used in this work. The bounding boxes that represent the location of each face are modified by scaling them to include the edges of the detected faces and prevent any sensitive leaks. Figure 3 shows a comparison between the original bounding boxes and the modified ones.
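The bounding-box modification can be illustrated with a minimal sketch. The text does not state the scaling factor or how boxes are clipped, so the function name `expand_box`, the default `scale=1.2`, and the clipping to the frame borders are assumptions made only for illustration.

```python
def expand_box(x, y, w, h, frame_w, frame_h, scale=1.2):
    """Scale a box (top-left x, y, width w, height h) about its center.

    The scale value is a hypothetical choice; the result is clipped so the
    enlarged box never leaves the frame.
    """
    cx, cy = x + w / 2.0, y + h / 2.0          # box center
    new_w, new_h = w * scale, h * scale        # enlarged size
    left = max(0, int(round(cx - new_w / 2.0)))
    top = max(0, int(round(cy - new_h / 2.0)))
    right = min(frame_w, int(round(cx + new_w / 2.0)))
    bottom = min(frame_h, int(round(cy + new_h / 2.0)))
    return left, top, right - left, bottom - top
```

A larger scale covers more of the hair and chin around a face at the cost of encrypting more pixels per ROI.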

B. ENCRYPTION PROCESS
Before the detected ROIs are encrypted, they need to be preprocessed. The encryption and decryption algorithms require the width and height of the input to be multiples of the block size used in the ROI channel splitting step. Therefore, the size of each ROI is checked, and the ROI is padded with neighboring pixels if needed. The proposed method for encrypting an ROI has four steps, which are applied to the three channels (red, green, and blue) of the ROI. Figure 4 shows the steps of the ROI encryption process.
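One possible reading of this preprocessing is to grow each ROI box until its width and height are multiples of the block size, so that the extra rows and columns are real neighboring pixels of the frame. The sketch below follows that reading; the helper names and the rule of shifting the box back inside the frame are assumptions, and it presumes the frame is at least as large as the padded box.

```python
def padded_size(length, b):
    """Smallest multiple of b that is greater than or equal to length."""
    return ((length + b - 1) // b) * b


def fit_box_to_blocks(x, y, w, h, frame_w, frame_h, b=16):
    """Grow a box so that its width and height are multiples of b.

    The box is shifted left or up when growing it would cross the frame
    border, so the added pixels always come from the frame itself.
    """
    new_w, new_h = padded_size(w, b), padded_size(h, b)
    x = max(0, min(x, frame_w - new_w))
    y = max(0, min(y, frame_h - new_h))
    return x, y, new_w, new_h
```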

1) ROI CHANNEL SPLITTING STEP
An ROI channel is split into blocks of the same size. The block sizes the user can choose are 16, 32, and 64. Then a random number is generated for each block. Depending on its random number, each block is either split into sub-blocks or kept as it is.
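The splitting step can be sketched as follows. The text does not say how the random number decides whether a block is split or how many sub-blocks are produced, so the 50% threshold and the split into four quadrants are assumptions, as are the function names.

```python
import random


def split_blocks(channel, b):
    """Split a channel (list of rows) into b x b blocks in row-major order.

    The channel height and width are assumed to be multiples of b.
    """
    rows, cols = len(channel), len(channel[0])
    blocks = []
    for r in range(0, rows, b):
        for c in range(0, cols, b):
            blocks.append([row[c:c + b] for row in channel[r:r + b]])
    return blocks


def split_roi_channel(channel, b, rng):
    """Return one entry per block: [block] or its four quadrant sub-blocks."""
    result = []
    for block in split_blocks(channel, b):
        if rng.random() < 0.5:          # hypothetical rule: keep the block whole
            result.append([block])
        else:                           # hypothetical rule: four quadrants
            result.append(split_blocks(block, b // 2))
    return result
```

For example, `split_roi_channel(channel, 16, random.Random(42))` yields a reproducible mix of whole blocks and 8 x 8 sub-blocks.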

2) ROI CHANNEL CONFUSION STEP
In this step, the positions of the pixels, blocks, and sub-blocks in an ROI channel are reordered by applying more than one operation:
a. Change the arrangement of the pixels in each block and sub-block by using a zigzag pattern.
b. Rotate the blocks and sub-blocks by ninety degrees.
c. Generate a random sequence with a length equal to the number of blocks in the ROI channel.
d. Generate a confused ROI channel by changing the arrangement of the blocks depending on the random sequence.
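The four operations above, together with their inverses, can be sketched on a list of square blocks. This is an illustration under stated assumptions: the zigzag is taken to be the usual anti-diagonal scan written back row by row, the rotation is taken to be clockwise, and a seeded pseudo-random shuffle stands in for the random sequence, whose generator the text does not specify.

```python
import random


def zigzag_order(n):
    """(row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for d in range(2 * n - 1):
        cells = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        order.extend(cells if d % 2 else cells[::-1])
    return order


def zigzag_scramble(block):
    """Read pixels in zigzag order and write them back row by row."""
    n = len(block)
    flat = [block[r][c] for r, c in zigzag_order(n)]
    return [flat[i * n:(i + 1) * n] for i in range(n)]


def zigzag_unscramble(block):
    """Inverse of zigzag_scramble."""
    n = len(block)
    flat = [v for row in block for v in row]
    out = [[0] * n for _ in range(n)]
    for v, (r, c) in zip(flat, zigzag_order(n)):
        out[r][c] = v
    return out


def rotate90(block):
    """Rotate a block by ninety degrees clockwise."""
    return [list(row) for row in zip(*block[::-1])]


def rotate90_ccw(block):
    """Rotate a block by ninety degrees counterclockwise."""
    return [list(row) for row in zip(*block)][::-1]


def confuse(blocks, seed):
    """Apply zigzag, rotation, and a seeded block permutation."""
    perm = list(range(len(blocks)))
    random.Random(seed).shuffle(perm)
    mixed = [rotate90(zigzag_scramble(b)) for b in blocks]
    return [mixed[i] for i in perm], perm


def unconfuse(scrambled, perm):
    """Undo the permutation, then the rotation, then the zigzag."""
    mixed = [None] * len(scrambled)
    for pos, i in enumerate(perm):
        mixed[i] = scrambled[pos]
    return [zigzag_unscramble(rotate90_ccw(b)) for b in mixed]
```

Because every operation is a pure reordering of pixels, `unconfuse(*confuse(blocks, seed))` returns the original blocks exactly.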

3) ROI KEY GENERATION STEP
Each ROI in the video is encrypted using a different key to make the proposed method more robust. The chaotic logistic map is used to generate the keys, and the generated key is used in the ROI diffusion step. The logistic map is defined by:
$$X_{n+1} = \lambda X_n (1 - X_n) \qquad (1)$$
where the initial value $X_0$ is $X_n$ when $n = 0$, $0 < X_0 < 1$, and $0 < \lambda \le 4$. When $\lambda \in [3.57, 4]$, the map is chaotic. The initial value $X_0$ depends on the original ROI. The steps for key generation are:
a. Generate $X_0$ of the logistic map by formula (2), where M and N are the sizes of the input ROI.
b. Generate a sequence called $S_{temp}$ by calculating formula (1) $N_0 + MN$ times, where $N_0$ is a user-defined variable.
c. Generate a new sequence called S by skipping the first $N_0$ values of $S_{temp}$.
d. Calculate the key vector K from the sequence S.
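As a rough illustration of the key generation steps, the sketch below iterates the logistic map, discards the first N_0 values, and quantizes the rest to bytes. The way X_0 is derived from the ROI (`initial_value`) and the way the chaotic values are turned into key bytes (`key_vector`) are hypothetical stand-ins chosen for this example, not the formulas used in the paper.

```python
def logistic_sequence(x0, lam, n0, length):
    """Iterate X_{n+1} = lam * X_n * (1 - X_n), skipping the first n0 values."""
    x = x0
    for _ in range(n0):                 # transient values that are discarded
        x = lam * x * (1.0 - x)
    seq = []
    for _ in range(length):
        x = lam * x * (1.0 - x)
        seq.append(x)
    return seq


def initial_value(roi_channel):
    """Hypothetical X_0: mean intensity scaled into the open interval (0, 1)."""
    total = sum(sum(row) for row in roi_channel)
    count = len(roi_channel) * len(roi_channel[0])
    x0 = total / (count * 256.0)
    return min(max(x0, 1e-6), 1.0 - 1e-6)


def key_vector(seq):
    """Hypothetical quantization of chaotic values to bytes in 0..255."""
    return [int(s * 1e14) % 256 for s in seq]


def generate_key(roi_channel, lam=3.9, n0=1500):
    """Key with one byte per pixel of an M x N ROI channel."""
    m, n = len(roi_channel), len(roi_channel[0])
    seq = logistic_sequence(initial_value(roi_channel), lam, n0, m * n)
    return key_vector(seq)
```

Because X_0 is computed from the ROI itself, two different ROIs generally receive two different keys, which matches the intent described above.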

4) ROI CHANNEL DIFFUSION STEP
In this step, the pixel values of an ROI channel are substituted with other values by applying the XOR operation with the generated key vector K to generate the final encrypted ROI channel. The encryption steps make clear that the encryption process is strong and can protect the privacy of videos against different attacks without affecting the normal use of the video, as shown in figure 5. If the encryption quality is reduced, the video may still appear well encrypted to the naked eye, which has a relatively low resolution, but attackers may be able to exploit the sensitive data. For example, the encryption complexity would be reduced if all ROIs in the same video frame, or all video frames, were encrypted with the same key instead of one key for every ROI. However, if the attacker could then decrypt one ROI, all ROIs in the video would be leaked. Algorithm 1 presents the encryption process in detail.

Algorithm 1 The Proposed Encryption Process for the Detected ROI
Input: Plain ROI P, block size b, a parameter of the logistic map λ, and the iterations number $N_0$.
Output: Encrypted ROI E
1) Separate the channels of P, so $C_1$ contains the red channel, $C_2$ contains the blue channel and $C_3$ contains the green channel.
2) Create an empty matrix E with a size equal to P to store the encrypted channels of the ROI.
3) Calculate the initial value of the logistic map by formula (2).
4) Execute formula (1) $N_0 + MN$ times, then skip the first $N_0$ elements and store the result in a vector S.
End for
14) Change the pixel positions of the blocks and sub-blocks by using a zigzag scan.
15) Change the pixel positions of the blocks and sub-blocks by rotating them 90°.
17) Change the positions of the blocks using the sequence R to obtain a permuted ROI Y.
E(:, :, j) = X(j).
22) End for
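The XOR-based diffusion described in this subsection can be sketched in a few lines. The function name and the row-major pairing of pixels with key bytes are assumptions made for illustration.

```python
def xor_diffuse(channel, key):
    """XOR each pixel with the key byte at the same row-major position.

    channel is a list of rows of 8-bit values and key holds at least one
    byte per pixel. Applying the function twice with the same key returns
    the original channel.
    """
    cols = len(channel[0])
    return [[pix ^ key[r * cols + c] for c, pix in enumerate(row)]
            for r, row in enumerate(channel)]
```

Since XOR is its own inverse, the same function serves for encryption and for the first decryption step.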

C. DECRYPTION PROCESS
In this process, the original surveillance video frames are retrieved from the encrypted ones by inverting the encryption steps, given the locations of the encrypted ROIs and the keys used in the encryption process. The steps of the decryption process are:
1. Generate the scrambled ROI from the encrypted one by applying the XOR operation between the encrypted ROI and its key vector.
2. Restore the original positions of the blocks of the ROI channels using the generated random sequences.
3. Rotate the blocks and sub-blocks of the ROI channels by ninety degrees in the opposite direction.
4. Apply an inverse zigzag pattern to the blocks and sub-blocks of the ROI channels to generate the plain ROI channels.
VOLUME 10, 2022

IV. SIMULATION RESULTS AND SECURITY ANALYSIS
This section uses different statistical tests and measurements to analyze the proposed work. Test videos from [34] are used as surveillance videos. Table 1 shows the properties of the test videos. MATLAB (R2021a) is used to execute the proposed work on a device with an Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz 2.21 GHz, 16 GB of memory, and Windows 11. The parameters used in the proposed algorithm are b = 16, λ = 3.9, and $N_0$ = 1500.

A. VISUAL ANALYSIS
All detected faces in the test videos are encrypted and decrypted to evaluate the proposed work by visual inspection. All results are shown in figure 5. The figure shows that the proposed work protects the privacy of the persons in the videos without any visible leaks. Also, the decryption process retrieves the original videos successfully.

B. HISTOGRAM ANALYSIS
A histogram is an important tool for representing the frequency distributions of the intensities in a grayscale image.

C. INFORMATION ENTROPY ANALYSIS
Entropy is used to measure the unpredictability and randomness of the original ROIs and their corresponding encrypted ROIs of the test videos. The entropy is defined by:
$$H(X) = -\sum_{i} p(X_i) \log_2 p(X_i)$$
where $X_i$ represents the gray level value of the input ROI channel, and the probability of $X_i$ is $p(X_i)$. The larger the ROI entropy is, the more random the gray values are. The vidyo1 test video is used in this experiment.
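The entropy measure can be computed directly from the gray-level frequencies of a channel. The sketch below is a minimal illustration with a hypothetical function name; for 8-bit data, the maximum attainable value is 8 bits, reached when all 256 gray levels are equally likely.

```python
import math
from collections import Counter


def entropy(channel):
    """Shannon entropy, in bits, of the gray levels in a channel (list of rows)."""
    pixels = [v for row in channel for v in row]
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())
```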

D. CORRELATION ANALYSIS
Usually, there are strong correlations between neighboring pixels because they have similar values. Such relationships among pixels must be eliminated by any effective encryption algorithm. The correlation coefficient between adjacent pixels in the horizontal, vertical, and diagonal directions can be calculated to evaluate the correlation strength. Mathematically, it is calculated by:
$$r_{uv} = \frac{\mathrm{cov}(u, v)}{\sqrt{D(u)}\,\sqrt{D(v)}}$$
$$\mathrm{cov}(u, v) = \frac{1}{s} \sum_{i=1}^{s} (u_i - E(u))(v_i - E(v)), \quad D(u) = \frac{1}{s} \sum_{i=1}^{s} (u_i - E(u))^2, \quad E(u) = \frac{1}{s} \sum_{i=1}^{s} u_i$$
where u and v represent two adjacent pixel values, and s is the number of sampled pixel pairs. Five thousand pairs of neighboring pixels in the vertical, horizontal, and diagonal directions are sampled from the color channels of the original ROIs and their corresponding encrypted ROIs of the vidyo1 test video. Figures 9-11 show the correlation distribution in the three directions for the left, middle, and right ROIs in frame 312. Table 3 shows the correlation coefficients for frame number 312. The coefficient values are close to one in the original ROIs, while the values are close to zero in the encrypted ROIs. Consequently, the proposed method has removed the correlation between pixels and can resist statistical attacks.
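The correlation test can be sketched as two small helpers: one that samples adjacent pixel pairs in a chosen direction and one that computes the correlation coefficient of the sampled pairs. The function names, the seeded sampling, and the direction labels are assumptions for illustration, and the sketch presumes that the sampled values are not all identical, since the coefficient is undefined for zero variance.

```python
import math
import random


def correlation(u, v):
    """Correlation coefficient of two equally long lists of pixel values."""
    s = len(u)
    mean_u, mean_v = sum(u) / s, sum(v) / s
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / s
    var_u = sum((a - mean_u) ** 2 for a in u) / s
    var_v = sum((b - mean_v) ** 2 for b in v) / s
    return cov / math.sqrt(var_u * var_v)


def sample_adjacent_pairs(channel, direction, count, seed=0):
    """Randomly sample `count` pairs of adjacent pixels from a channel."""
    dr, dc = {"horizontal": (0, 1), "vertical": (1, 0), "diagonal": (1, 1)}[direction]
    rng = random.Random(seed)
    rows, cols = len(channel), len(channel[0])
    u, v = [], []
    for _ in range(count):
        r = rng.randrange(rows - dr)
        c = rng.randrange(cols - dc)
        u.append(channel[r][c])
        v.append(channel[r + dr][c + dc])
    return u, v
```

For a smooth gradient image the coefficient is close to one in every direction, while a well-encrypted ROI should give values near zero.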

E. DIFFERENTIAL ATTACK ANALYSIS
In a differential attack, the adversary modifies a plain ROI $p_1$ by changing one bit of a pixel to get a modified plain ROI $p_2$. Then, $p_1$ and $p_2$ are encrypted using the same key to get two encrypted ROIs $I_1$ and $I_2$. The adversary then searches for relationships between the plain and encrypted ROIs. The encryption algorithm must therefore be sensitive to any small change in the original ROI. NPCR (Number of Pixels Changing Rate) and UACI (Unified Average Changing Intensity) are two criteria used to analyze this sensitivity, and they are defined by:
$$\mathrm{NPCR} = \frac{1}{W \times H} \sum_{i,j} D(i, j) \times 100\%$$
$$\mathrm{UACI} = \frac{1}{W \times H} \sum_{i,j} \frac{|I_1(i, j) - I_2(i, j)|}{255} \times 100\%$$
where
$$D(i, j) = \begin{cases} 0, & I_1(i, j) = I_2(i, j) \\ 1, & I_1(i, j) \ne I_2(i, j) \end{cases}$$
and $I_1$ and $I_2$ represent the two encrypted ROIs. The ROI width and height are represented by W and H. The theoretical values of NPCR and UACI are 99.6094% and 33.4635%, respectively. Two versions of the vidyo1 test video are used in this experiment. The first version is the plain video, and the second version is the plain video with a one-bit change in a pixel of each ROI. Both video versions are encrypted, and then the NPCR and UACI values are calculated. Table 4 shows the results for frame number 312. The results are near the expected values, so the developed method can withstand the differential attack.
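Both criteria can be computed in a single pass over two encrypted ROI channels of equal size. The sketch below assumes 8-bit pixel values; the function name is illustrative.

```python
def npcr_uaci(i1, i2):
    """Return (NPCR, UACI) in percent for two equally sized 8-bit channels."""
    total = len(i1) * len(i1[0])
    changed = 0       # number of positions where the two channels differ
    diff = 0          # accumulated absolute intensity difference
    for row1, row2 in zip(i1, i2):
        for a, b in zip(row1, row2):
            changed += a != b
            diff += abs(a - b)
    return 100.0 * changed / total, 100.0 * diff / (255.0 * total)
```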

F. PSNR, SSIM, AND FSIM ANALYSIS
In this section, peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and feature similarity (FSIM) values are calculated to assess the quality of the encryption and decryption processes. The PSNR measures the ratio between the maximum value of a pixel and the mean square error (MSE) between the encrypted ROI and the original ROI and is defined by:
$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}}, \qquad \mathrm{MSE} = \frac{1}{M N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left(R_1(i, j) - R_2(i, j)\right)^2$$
where $R_1$ and $R_2$ are the original and the encrypted ROI. The encryption process is good if the value of PSNR between the original ROI and the corresponding encrypted one is small.
The SSIM between an original ROI o and an encrypted ROI e is calculated by:
$$\mathrm{SSIM}(o, e) = \frac{(2 \mu_o \mu_e + c_1)(2 \sigma_{oe} + c_2)}{(\mu_o^2 + \mu_e^2 + c_1)(\sigma_o^2 + \sigma_e^2 + c_2)}$$
where $\mu_o$ and $\mu_e$ are the means of the original and encrypted ROIs, $\sigma_o^2$ is the variance of the original ROI, $\sigma_e^2$ is the variance of the encrypted ROI, $\sigma_{oe}$ represents the covariance of o and e, and $c_1$ and $c_2$ are constants. The SSIM values between the original and encrypted ROIs in frame number 250 of the vidyo1 test video are presented in table 5. Lower SSIM values between the original and encrypted ROIs indicate a more effective encryption process. From the table, the SSIM values are small, indicating that the quality of the proposed encryption process is high. Also, the SSIM values between the original and decrypted ROIs in frame number 250 of the vidyo1 test video are presented in table 6. Higher SSIM values between the original and decrypted ROIs indicate a more effective decryption process. From the table, the SSIM values are equal to 1, indicating that the quality of the proposed decryption process is high.
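The two measures can be sketched for a single channel as follows. Two simplifications are assumed here: the SSIM is computed over the whole channel as one window rather than averaged over local windows as standard implementations do, and the constants use the commonly chosen values (0.01 * 255)^2 and (0.03 * 255)^2, which the text does not specify.

```python
import math


def mse(a, b):
    """Mean square error between two equally sized channels (lists of rows)."""
    total = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / total


def psnr(a, b, peak=255.0):
    """PSNR in dB; infinite when the two channels are identical."""
    err = mse(a, b)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)


def global_ssim(a, b, peak=255.0):
    """Single-window SSIM over the whole channel (a simplification)."""
    x = [p for row in a for p in row]
    y = [q for row in b for q in row]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((p - mx) ** 2 for p in x) / n
    vy = sum((q - my) ** 2 for q in y) / n
    cxy = sum((p - mx) * (q - my) for p, q in zip(x, y)) / n
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2   # assumed constants
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

A low PSNR and an SSIM near zero between the original and encrypted ROI, and an SSIM of 1 between the original and decrypted ROI, correspond to the behavior reported above.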
FSIM calculates the local similarity between the original ROI and the corresponding encrypted one. It is calculated by:
$$\mathrm{FSIM} = \frac{\sum_{x \in \Omega} S_L(x)\, PC_m(x)}{\sum_{x \in \Omega} PC_m(x)}$$
where $S_L(x)$ defines the total anticipated similarity between the two ROIs, $\Omega$ denotes the spatial domain of the ROI, and $PC_m(x)$ represents the phase congruency value.
The FSIM values between the original and encrypted ROIs in frame number 250 of the vidyo1 test video are presented in table 5. Small FSIM values between the original and encrypted ROIs indicate a more effective encryption process. From the table, the FSIM values are low, indicating that the quality of the proposed encryption process is high. Also, the FSIM values between the original and decrypted ROIs in frame number 250 of the vidyo1 test video are presented in table 6. Higher FSIM values between the original and decrypted ROIs indicate a more effective decryption process. From the table, the FSIM values are equal to 1, indicating that the quality of the proposed decryption process is high.

G. EDGE DETECTION ANALYSIS
The proposed method should protect the information on the edges of the encrypted ROIs. The edge differential ratio (EDR) metric is used to estimate the edge distortion. It is calculated as follows:
$$\mathrm{EDR} = \frac{\sum_{i,j} \left|P(i, j) - \bar{P}(i, j)\right|}{\sum_{i,j} \left|P(i, j) + \bar{P}(i, j)\right|}$$
where $P(i, j)$ and $\bar{P}(i, j)$ denote the pixel values in the edges within the binary form of the original ROI and the corresponding encrypted one, respectively. The value of EDR should be close to 1 to ensure the dissimilarity between the original ROI and the corresponding encrypted one. The EDR values between the original and corresponding encrypted ROIs in frame number 250 of the vidyo1 test video are presented in table 7. The values in the table are close to one; hence, the original and the corresponding encrypted ROIs are different. Also, the Laplacian of Gaussian edge detection results for the original, corresponding encrypted, and decrypted ROIs in frame number 250 of the vidyo1 test video are displayed in figure 12. There is a big difference between the edges of the original and encrypted ROIs in the displayed results, so the proposed method hides the main details in the encrypted videos. Also, the edges in the decrypted ROIs are similar to those in the original ROIs, which confirms the efficiency of the proposed method in the decryption process.
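Given two binary edge maps, for example the output of a Laplacian of Gaussian detector thresholded to 0 and 1, the EDR can be computed as in the sketch below. The edge detection itself is outside the sketch, the function name is illustrative, and returning 0 when neither map contains an edge pixel is an assumption made here to avoid a division by zero.

```python
def edr(edges_plain, edges_cipher):
    """Edge differential ratio of two binary (0/1) edge maps of equal size."""
    num = 0   # sum of |P - P_bar|: positions where exactly one map has an edge
    den = 0   # sum of |P + P_bar|: edge pixels counted over both maps
    for row_p, row_c in zip(edges_plain, edges_cipher):
        for p, c in zip(row_p, row_c):
            num += abs(p - c)
            den += abs(p + c)
    return num / den if den else 0.0
```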

H. KEYSPACE ANALYSIS
The keyspace used in the encryption of ROIs must be large enough to make the proposed work secure against brute-force attacks. An encryption method can resist brute-force attacks when the keyspace is at least $2^{100}$. In this work, the logistic map initial value $X_0$, the logistic map parameter λ, and the iterations number $N_0$ are used to generate a secret key. The precision of $X_0$ and λ is considered to be $10^{16}$, and the precision of $N_0$ is considered to be $10^3$. So the total keyspace is $10^{16} \times 10^{16} \times 10^{3} = 10^{35} \approx 2^{116}$, which is larger than $2^{100}$.

I. KEY SENSITIVITY ANALYSIS
FIGURE 15. Gaussian noise.
The attacker may use a key similar to the original key to break encrypted ROIs, so the proposed encryption algorithm must be sensitive to the secret key. To test the key sensitivity, the vidyo1 test video ROIs are encrypted using a key vector $K_1$. The initial values of the logistic map used to generate $K_1$ are saved in a vector $X_0$. The encrypted video is decrypted twice: once using the key vector $K_1$, and again using a key vector $K_2$, where the initial values of the logistic map used to generate $K_2$ are saved in a vector $XX_0 = X_0 + 10^{-10}$. Figure 13 shows the results of this experiment. The figure shows that the encryption algorithm is sensitive to the secret key.
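The sensitivity of the logistic map to its initial value can be checked numerically. The sketch below generates two byte sequences whose initial values differ by a small amount and reports the fraction of positions at which they differ. The byte quantization in `to_bytes` is a hypothetical choice for illustration, and the default parameters mirror the values λ = 3.9 and N_0 = 1500 stated in the simulation setup.

```python
def logistic_sequence(x0, lam, n0, length):
    """Iterate X_{n+1} = lam * X_n * (1 - X_n), skipping the first n0 values."""
    x = x0
    for _ in range(n0):
        x = lam * x * (1.0 - x)
    seq = []
    for _ in range(length):
        x = lam * x * (1.0 - x)
        seq.append(x)
    return seq


def to_bytes(seq):
    """Hypothetical quantization of chaotic values to bytes in 0..255."""
    return [int(s * 1e14) % 256 for s in seq]


def differing_fraction(x0, delta, lam=3.9, n0=1500, length=1000):
    """Fraction of key bytes that change when x0 is perturbed by delta."""
    k1 = to_bytes(logistic_sequence(x0, lam, n0, length))
    k2 = to_bytes(logistic_sequence(x0 + delta, lam, n0, length))
    return sum(a != b for a, b in zip(k1, k2)) / length
```

With `delta = 1e-10`, nearly all key bytes differ, so decrypting with the perturbed key yields noise rather than the original ROI.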

J. NOISE ATTACKS ANALYSIS
The encrypted video may be affected by noise during transmission through different communication channels. The security method should be robust to such noise so that the original video can still be recovered. In this experiment, different types of noise are used to evaluate the robustness of the proposed method.
Salt & Pepper Noise: This noise produces black and white dots in the affected regions. Salt and pepper noise is added to the encrypted ROIs of different test videos with a variance value of 0.005. Then the noisy ROIs are decrypted. Figure 14 shows the noisy encrypted ROIs and the corresponding decrypted ones for various test videos. The figure shows that the decrypted ROIs are still intelligible despite the effect of the noise. Consequently, the proposed method is robust to salt and pepper noise.
Gaussian Noise: The limitations of the sensor during video acquisition under low-light conditions may cause this type of noise. Gaussian noise is added to the encrypted ROIs of different test videos with a variance value of 0.005. Then the noisy ROIs are decrypted. Figure 15 shows the noisy encrypted ROIs and the corresponding decrypted ones for various test videos. The figure shows that the decrypted ROIs are still intelligible despite the effect of the noise. Consequently, the proposed method is robust to Gaussian noise.
Occlusion Noise: This type of noise may occur when part of the encrypted video is dropped or lost during transmission. In this experiment, occlusion is applied to the encrypted ROIs of different test videos. Then the noisy ROIs are decrypted. Figure 16 shows the noisy encrypted ROIs and the corresponding decrypted ones for various test videos. The figure shows that the decrypted ROIs are still intelligible despite the effect of the noise. Consequently, the proposed method is robust to occlusion noise.

K. ENCRYPTION TIME ANALYSIS
An efficient security method should protect the privacy of surveillance videos with a low processing time. In this experiment, various test videos are used to estimate the encryption time of the proposed method. The experiment is performed multiple times, and the average values are calculated. Table 8 presents the average encryption time for various test videos. From the table, the encryption time is low, which shows that the proposed method can quickly protect videos that will be stored in the cloud for future use or transmitted over a network. Additionally, the encryption time may be further reduced for real-time use if the technique is implemented in parallel.

V. CONCLUSION
This paper proposed a practical method for surveillance video privacy protection based on block scrambling and face detection using YOLOv3. Multiple faces in a video frame can be detected and encrypted, with a separate key for each detected face to increase security. The proposed method is reversible, so the faces can be displayed to an authorized person. The performance of the proposed method was evaluated using visual analysis, histogram analysis, information entropy analysis, correlation analysis, differential attack analysis, PSNR, SSIM, FSIM, edge detection analysis, keyspace analysis, key sensitivity analysis, noise attack analysis, and encryption time analysis. The results showed that the proposed mechanism could successfully detect and protect people's faces without any leaks and that the method could withstand potential attacks.