A Privacy-Preserving Image Retrieval Based on AC-Coefficients and Color Histograms in Cloud Environment

Abstract: Content-based image retrieval (CBIR) techniques have been widely deployed in many applications to exploit the abundant information contained in images. Because CBIR has large storage and computational requirements, outsourcing image search to a cloud provider is an attractive option for owners with resource-limited devices. However, images often contain private content, so outsourcing retrieval directly to the cloud provider raises privacy problems, and the images should be protected carefully before outsourcing. This paper presents a secure retrieval scheme for encrypted images in the YUV color space. In this scheme, the discrete cosine transform (DCT) is performed on the Y component. The resulting DC coefficients are encrypted with a stream cipher, and the resulting AC coefficients as well as the other two color components are encrypted with value permutation and position scrambling. The image owner then transmits the encrypted images to the cloud server. On receiving a query trapdoor from a query user, the server extracts an AC-coefficients histogram from the encrypted Y component and two color histograms from the other two color components. The similarity between the query trapdoor and a database image is measured by the Manhattan distance between their respective histograms. Finally, the encrypted images closest to the query image are returned to the query user.


Introduction
With the rapid development of digital devices, a large number of images are generated and shared. Image data contain rich information and have been explored in many fields, such as feature extraction [Hu, Wang, Wang et al. (2016)], information hiding [Cao, Zhou, Sun et al. (2018)] and image retrieval [Zhang, Liu, Dundar et al. (2015); Cavallaro, Lagendijk, Erkin et al. (2017)]. Searching for an intended image in a huge dataset has attracted increasing attention, and many advanced retrieval technologies have (2017)], where a data owner outsources text documents to a server and is able to retrieve the desired documents with keyword search. Song et al. [Song, Wagner and Perrig (2000)] proposed the first retrieval scheme over encrypted data, in which encryption and retrieval are performed on a word-by-word basis, but this scheme is not efficient enough because it has no index mechanism. To improve efficiency, Curtmola et al. [Curtmola, Garay, Kamara et al. (2006)] constructed a secure inverted index for encrypted text documents, where the computing cost is proportional to the number of documents. They presented two encryption schemes: the first is secure against non-adaptive chosen-keyword attacks, and the second is secure against adaptive chosen-keyword attacks. Afterwards, researchers extended the functionality of searchable encryption to design practical schemes, such as fuzzy keyword search [Fu, Wu, Guan et al. (2017)], multi-keyword ranked search [Fu, Sun, Linge et al. (2014)] and dynamic search [Xia, Wang, Sun et al. (2016)]. Image retrieval in the encrypted domain has also attracted much attention; it aims at executing the retrieval work on the server while keeping the images secret.  proposed the solutions to realize image retrieval in the encrypted domain for the first time. 
They introduced two secure indexing schemes, in which the word frequency distribution is protected by order-preserving encryption and Min-hash. In another work ], they employed signal processing and cryptographic techniques to achieve secure distance calculation without divulging image content. They compared three encryption algorithms: plane randomization, random projection and randomized unary encoding. Their experimental results show that the first two algorithms support the ordered computation of the Hamming distance and the third algorithm supports the approximate computation of the L1 distance. Lu et al. [Lu, Varna and Wu (2014)] compared homomorphic encryption with the proposed encryption scheme of work ] in terms of retrieval accuracy, retrieval efficiency and storage overhead. The experimental results show that homomorphic encryption is not advantageous in these aspects. In [Abdulsada, Ali, Abdulabbat et al. (2013); Yuan, Wang, Wang et al. (2014); Yuan, Yu and Guo (2015)], tree indexes and the locality-sensitive hashing (LSH) method are used to reduce retrieval time. Abdulsada et al. [Abdulsada, Ali, Hashim et al. (2013)] established a searchable index by using the LSH method. Images are protected with the advanced encryption standard (AES), and image features are protected by a reversible matrix. To improve retrieval efficiency further, Yuan et al. [Yuan, Wang, Wang et al. (2014)] proposed to combine LSH and Cuckoo hashing to obtain a faster and more efficient similarity search. Image features are extracted by using the bag-of-words (BOW) model to generate visual word vectors. The image is protected by attribute-based encryption, and its hash value is protected by a one-way function. In Yuan et al. [Yuan, Yu and Guo (2015)], a secure k-nearest neighbors (kNN) algorithm is used to realize secure image retrieval, and a tree index is constructed to improve search efficiency. In addition, Xia et al. [Xia, Zhu, Sun et al. (2013)] proposed a secure image retrieval scheme that uses an invertible matrix to protect image feature vectors and achieves an order-preserving calculation of the Euclidean distance. Abduljabbar et al. [Abduljabbar, Jin, Ibrahim et al. (2017)] extracted local speeded up robust features (SURF) to represent image features, and AES is used to protect the images in the database. The similarity between the query image and a database image is measured by the Euclidean distance between their corresponding feature vectors.
Although the previous outsourced CBIR schemes solve the privacy problem, the computational workload on the user is still heavy, since it is the image owner's task to perform feature extraction and index generation, both of which require considerable resources. Moreover, the large amount of computing resources required and the problem of ciphertext expansion make homomorphic encryption impractical [Bellafqira, Coatrieux, Bouslimi et al. (2015); Zhang, Jung, Liu et al. (2017)]. A lot of work has been proposed to solve the above problems. ] introduced an encrypted JPEG image retrieval scheme using block feature comparison. The AC-coefficient histogram of a block is used to form a local feature descriptor, and the AC coefficients and DC coefficients are encrypted with permutation encryption and stream encryption, respectively. The similarity between the query image and a database image is measured by comparing the distance between their corresponding local features. Cheng et al. [Cheng, Zhang, Yu et al. (2015)] proposed a retrieval scheme based on a Markov process. The scheme uses a stream cipher to encrypt the coded data and extracts Markov features from the encrypted data directly. The similarity between the query image and a database image is measured by the distance between their corresponding Markov features. Ferreira et al. [Ferreira, Rodrigues, Leitao et al. (2017)] proposed an encrypted content-based image retrieval scheme. In this scheme, color information is protected by random permutation encryption, and the rows and columns are shuffled to protect the texture information of images. Liu et al. [Liu, Shen, Xia et al. (2017)] proposed an image retrieval scheme based on difference histograms. In their scheme, two kinds of difference matrices (order difference and random order difference) are proposed, and value replacement and location scrambling are used to encrypt the difference matrix. The difference histogram is extracted as the image feature by the server.
In the above schemes, the cloud server undertakes the workload of feature extraction and image search, so the image owners only need to encrypt images. Inspired by those outsourced CBIR schemes, we propose a secure retrieval scheme that combines AC-coefficients histograms and color histograms in the encrypted domain. The retrieval results show that the combination outperforms either feature used individually, and our retrieval accuracy is six percent higher than that of the scheme proposed in Liu et al. [Liu, Shen, Xia et al. (2017)].

System model
Our scheme mainly consists of three entities, i.e., the image owner, the query user and the cloud server, as shown in Fig. 1. The tasks assigned to these three entities are as follows.

Image owner: The image owner possesses a large image dataset, denoted as $\mathcal{I} = \{I_i\}_{i=1}^{n}$, with a corresponding identity number set $\mathcal{I}_{ID} = \{ID_i\}_{i=1}^{n}$. The image owner outsources the images to the server for cost saving and flexible utilization, and the outsourced images are encrypted to prevent the disclosure of private content. The generated encrypted image set is denoted as $\mathcal{C} = \{C_i\}_{i=1}^{n}$. The image owner only needs to encrypt the images and upload the encrypted images to the server.

Cloud server: The server stores the encrypted images from the image owner and undertakes the tasks of index generation and image search. On receiving a search request, the server finds the most similar encrypted images in the encrypted image database and returns them to the query user.

Query user: The query user wants to search for an intended image in the encrypted image database. To protect the query image, the query user encrypts it to generate a query trapdoor and transmits the trapdoor to the server. The query image is encrypted in the same way as the database images.

Security models and assumptions
As in previous works [Kuzu, Islam and Kantarcioglu (2012); Ferreira, Rodrigues, Leitao (2017)], we assume that the cloud server is honest but curious. In other words, the server carries out its tasks in accordance with the protocol, but it may analyze the image data and try to infer information from it. In our scheme, the query users are assumed to be trustworthy, so they will not disclose any private information about the images to the server during communication. The cloud has access only to the encrypted images, and the security strength of the encrypted images is discussed in Section 5.

The proposed scheme

Discrete cosine transform (DCT)
The discrete cosine transform is an efficient transform that represents the texture information of an image in the frequency domain. Typically, the DCT is performed on each 8 × 8 sub-block. For a block of size M × N, the value $f(x,y)$ represents the pixel value at position $(x,y)$ in the original image block. The transformation can be formulated as:

$$F(u,v) = \frac{2}{\sqrt{MN}}\, c(u)\, c(v) \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) \cos\frac{(2x+1)u\pi}{2M} \cos\frac{(2y+1)v\pi}{2N}, \qquad (1)$$

where $c(w) = 1/\sqrt{2}$ if $w = 0$ and $c(w) = 1$ otherwise, and $u = 0,1,\cdots,M-1$, $v = 0,1,\cdots,N-1$. The resulting $F(u,v)$ is the DC coefficient when $u = 0$ and $v = 0$, and an AC coefficient otherwise, i.e., when $u \neq 0$ or $v \neq 0$.
Each DCT sub-block consists of one DC coefficient and 63 AC coefficients. The DC coefficient represents the average energy of the sub-block and carries its main content. The remaining AC coefficients can be divided into three categories according to their frequency. Most of the energy in each block is concentrated in the low-frequency and middle-frequency coefficients (i.e., in the upper left corner of the block), and most of the high-frequency coefficients in the lower right corner are equal to zero. A previous study [Fang, Cheng, Lin et al. (2012)] shows that the AC coefficients can represent the texture information of a sub-block. The DC coefficients contain important image information, and their histogram is important statistical information. In the proposed scheme, we therefore do not use the DC coefficients for image retrieval and encrypt them with a stream cipher. The AC coefficients, in contrast, contain rich edge and texture information of the image, so we extract features from the AC coefficients after encryption.
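As a rough illustration of the transform described above, the following sketch computes the 2D DCT of one 8 × 8 block directly from its definition and separates the DC coefficient from the 63 AC coefficients. It assumes the common orthonormal DCT-II scaling; the function name `dct2` and the sample block are hypothetical, and a practical implementation would use a fast transform rather than this quadruple loop.

```python
import math


def dct2(block):
    """2D DCT-II of an M x N block (list of rows), orthonormal scaling (assumed)."""
    m, n = len(block), len(block[0])

    def c(w):
        # Scaling factor: 1/sqrt(2) for the zero frequency, 1 otherwise.
        return 1 / math.sqrt(2) if w == 0 else 1.0

    out = [[0.0] * n for _ in range(m)]
    for u in range(m):
        for v in range(n):
            s = 0.0
            for x in range(m):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * m))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = 2 / math.sqrt(m * n) * c(u) * c(v) * s
    return out


# Hypothetical sample: a flat block has all its energy in the DC coefficient.
flat_block = [[100] * 8 for _ in range(8)]
coeffs = dct2(flat_block)
dc = coeffs[0][0]
ac = [coeffs[u][v] for u in range(8) for v in range(8) if (u, v) != (0, 0)]
```

With this scaling, the DC coefficient of an 8 × 8 block equals eight times the block mean, which is why a flat block yields a single non-zero coefficient.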

Quantization and truncation operation
The dynamic range of the AC coefficients is an important factor. Our experiment on all database images shows that the AC coefficients of the Y component vary over the large range [−682, 683]. To achieve a reasonable level of efficiency, we need to truncate the coefficients to a limited range. Truncation causes unrecoverable quality degradation, so it is a tradeoff between efficiency and quality. In practice, truncation does not affect image recovery much, as discussed in Section 6. During AC-coefficient encryption, the AC coefficients are first quantized as:

$$F_Q(u,v) = \mathrm{round}\left(\frac{F(u,v)}{Q}\right), \qquad (2)$$

where $Q$ is a quantization factor equal to 1, and $F(u,v)$ denotes the AC coefficient located at $(u,v)$ in an image block of the Y component. Quantization causes some loss of image information; however, the main content of the image is barely affected.

Figure 2: The distribution of AC coefficients in Y component
Although the AC coefficients vary over a large range, most of them lie close to zero, as shown in Fig. 2. From the statistical histogram of the AC coefficients, we can observe that they are mostly within the range [−20, 20]. For efficiency, we truncate the AC coefficients to a small interval $[-T, T]$ without losing much useful information. Tab. 1 presents the percentages of AC coefficients in different ranges, which shows that more than 90% of the AC coefficients fall within the truncated range when $T$ is greater than 9. Therefore, the boundary coefficient $T$ is chosen to be greater than 9 in the following experiments. The truncation operation is defined as:

$$F_T(u,v) = \begin{cases} T, & F_Q(u,v) > T \\ F_Q(u,v), & -T \le F_Q(u,v) \le T \\ -T, & F_Q(u,v) < -T \end{cases} \qquad (3)$$

where $F_Q(u,v)$ denotes a quantized AC coefficient.
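A minimal sketch of quantization followed by truncation is given here, assuming that quantization rounds to the nearest integer and that truncation clamps each value to the boundary of the interval. The names `quantize`, `truncate` and `coverage`, the boundary value 20 and the sample coefficients are hypothetical.

```python
def quantize(coefficient, q=1):
    """Quantize one AC coefficient with factor q (rounding is an assumption)."""
    return round(coefficient / q)


def truncate(value, t):
    """Clamp a quantized coefficient to the interval [-t, t]."""
    return max(-t, min(t, value))


def coverage(values, t):
    """Fraction of values that already lie in [-t, t], as tabulated per range."""
    inside = sum(1 for v in values if -t <= v <= t)
    return inside / len(values)


# Hypothetical AC coefficients spanning the reported dynamic range.
raw = [-682.4, -21.7, -3.2, 0.4, 0.0, 1.6, 8.9, 19.4, 683.0]
quantized = [quantize(c) for c in raw]
truncated = [truncate(v, 20) for v in quantized]
kept = coverage(quantized, 20)
```

After truncation, every coefficient takes one of 2 × 20 + 1 = 41 integer values, which fixes the number of histogram bins used later for the Y component.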

Image encryption
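For encryption, we divide the image data into three classes: the DC coefficients of the Y component, the AC coefficients of the Y component, and the U and V color components. The encryption process is shown in Fig. 3. First, the discrete cosine transform is performed on the Y component, and the AC coefficients are quantized and truncated. Then, the image owner encrypts the AC coefficients with value replacement and position scrambling, and encrypts the DC coefficients with a stream cipher. The U and V color components are encrypted with the same method as the AC coefficients but with different keys. We denote the set of all keys used in the encryption process as $K = \{k_{DC}, k_{AC}, \{k_{*}^{v}\}_{* \in \{U,V\}}, \{k_{*}^{p}\}_{* \in \{Y,U,V\}}\}$, where $k_{DC}$ is the stream-cipher key for the DC coefficients, $k_{AC}$ is the value-replacement key for the AC coefficients, $k_{*}^{v}$ are the value-replacement keys for the U and V components, and $k_{*}^{p}$ are the position-scrambling keys for the three components.

Encryption of DC coefficients
The DC coefficients are encrypted with the stream cipher as $DC' = DC \oplus k_{DC}$, where $DC'$ denotes the encrypted DC coefficients and the private key $k_{DC} = (k_1, k_2, \cdots, k_m)$ has the same length as $DC$.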
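The stream encryption of the DC coefficients can be sketched as a bitwise XOR with a key stream of the same length. The sketch below rests on several assumptions that are not part of the scheme description: the DC coefficients are taken to be non-negative integers that fit in 11 bits, and the key stream comes from Python's `random.Random`, which is only a stand-in and is not cryptographically secure. The names `keystream` and `xor_stream` are hypothetical.

```python
import random

DC_BITS = 11  # assumption: rounded DC values of 8 x 8 blocks fit in 11 bits


def keystream(seed, length, bits=DC_BITS):
    """Generate one key word per DC coefficient.

    random.Random is a stand-in for illustration only; a real deployment
    would need a cryptographically secure stream cipher.
    """
    rng = random.Random(seed)
    return [rng.getrandbits(bits) for _ in range(length)]


def xor_stream(values, key):
    """Encrypt or decrypt by XOR; the same call performs both directions."""
    if len(values) != len(key):
        raise ValueError("key stream must have the same length as the data")
    return [v ^ k for v, k in zip(values, key)]


# Hypothetical DC coefficients of four blocks.
dc_values = [800, 1023, 0, 2040]
key = keystream(seed=2024, length=len(dc_values))
encrypted = xor_stream(dc_values, key)
decrypted = xor_stream(encrypted, key)
```

Because XOR is its own inverse, applying the same key stream twice restores the original coefficients, so an authorized user who holds the key can recover the DC values. Negative or fractional coefficients would first need an integer encoding, which this sketch leaves out.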

Encryption of AC coefficients and U and V components
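The private key $k_{AC}$ is used to encrypt the quantized and truncated AC coefficients in the Y component by value replacement: each coefficient value in $[-T, T]$, where $T$ is the truncation boundary, is replaced by the value that the key assigns to it.
The private keys $\{k_{*}^{v}\}_{* \in \{U,V\}}$ are used to encrypt the color values in the U and V components. The values of each key are in the range [0, 255]. We denote an original pixel value as $p$ and the corresponding encrypted pixel value as $p'$. The encryption process is described as $p' = k_{*}^{v}(p)$, i.e., the key maps each pixel value to its replacement. The private keys $\{k_{*}^{p}\}_{* \in \{Y,U,V\}}$ are used to scramble the positions of the AC coefficients and color values. The values of each key are in the range $[1, S]$, where $S$ is the size of the image in the database. After these two encryption steps, the image owner obtains the encrypted image set $\mathcal{C} = \{C_i\}_{i=1}^{n}$ and sends it together with the identity set $\mathcal{I}_{ID} = \{ID_i\}_{i=1}^{n}$ to the server. It is worth noting that the frequency information of each value in the image is not changed by encryption: the histogram of an encrypted component is the histogram of the original component with its bins relabeled by the key.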
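The two encryption steps can be sketched as follows, assuming that a value key is a random permutation of the admissible values and that a position key is a random permutation of the positions. The helper names, the seeds and the use of `random.Random` (not cryptographically secure) are hypothetical, and positions are indexed from 0 here rather than from 1.

```python
import random
from collections import Counter


def make_value_key(lo, hi, seed):
    """Random permutation of the integers lo..hi, stored as a lookup table."""
    values = list(range(lo, hi + 1))
    shuffled = values[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(values, shuffled))


def make_position_key(size, seed):
    """Random permutation of the positions 0..size-1."""
    order = list(range(size))
    random.Random(seed).shuffle(order)
    return order


def encrypt(values, value_key, position_key):
    """Replace every value, then move the results to scrambled positions."""
    replaced = [value_key[v] for v in values]
    return [replaced[p] for p in position_key]


def decrypt(cipher, value_key, position_key):
    """Undo the position scrambling, then undo the value replacement."""
    inverse = {c: v for v, c in value_key.items()}
    replaced = [None] * len(cipher)
    for out_index, p in enumerate(position_key):
        replaced[p] = cipher[out_index]
    return [inverse[c] for c in replaced]


# Hypothetical U-channel pixels: one value occurs three times, one twice, one once.
pixels = [12, 12, 200, 7, 12, 200]
value_key = make_value_key(0, 255, seed=1)
position_key = make_position_key(len(pixels), seed=2)
cipher = encrypt(pixels, value_key, position_key)
recovered = decrypt(cipher, value_key, position_key)
```

The occurrence counts of the ciphertext values match those of the plaintext values, which is the property that lets the server compare histograms without decrypting.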

Index generation and search operation
Once receiving the encrypted image set, the cloud server extracts feature vectors from the encrypted images. In our proposed scheme, the server provides index construction and image query services, which greatly reduces computational burden of image owner. The following describes the index generation and search process.

Index generation
In the YUV color space, the U and V channels represent chrominance information, and their AC coefficients provide little texture information. Therefore, two kinds of histograms are considered in our scheme: the AC-coefficients histogram extracted from the Y component and the two color histograms extracted from the U and V channels. These three histograms are combined into one feature vector to represent the image. First, the AC-coefficients histogram is extracted from the Y channel, where the range of the AC coefficients is $[-T, T]$ with truncation boundary $T$, as mentioned in Section 4.2. We denote the AC-coefficients histogram as $H_{AC}$ and its length as $L_{AC} = 2T + 1$. Then, the color histograms are extracted from the encrypted U and V channels, where the range of the pixel values is [0, 255]. We denote the color histograms as $H_{*}$, $* \in \{U, V\}$, and their lengths as $L_{*} = 256$. Finally, the encrypted feature vector of the $i$-th image can be expressed as $f_i = \{H_{AC}, H_U, H_V\} = (f_{i,1}, \cdots, f_{i,j}, \cdots, f_{i,L})$ with length $L = L_{AC} + L_U + L_V$, where $i \in \{1, 2, \cdots, n\}$ and $n$ is the total number of images in the database. To speed up the subsequent retrieval, the cloud server establishes a one-to-one mapping between each encrypted database image and its feature vector, as shown in Tab. 2.
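The index generation step can be sketched as three histograms concatenated into one vector per image, plus a table from image identity to feature vector. This is an illustration under assumptions: the histograms hold raw counts (whether they are normalized is not stated), the truncation boundary is 20, and the names `histogram`, `feature_vector` and the identity `"img_001"` are hypothetical.

```python
def histogram(values, lo, hi):
    """Count occurrences of each integer in [lo, hi]; one bin per value."""
    bins = [0] * (hi - lo + 1)
    for v in values:
        bins[v - lo] += 1
    return bins


def feature_vector(ac_y, u_pixels, v_pixels, t):
    """Concatenate the Y AC-coefficients histogram with the U and V color histograms."""
    return (histogram(ac_y, -t, t)
            + histogram(u_pixels, 0, 255)
            + histogram(v_pixels, 0, 255))


# Hypothetical encrypted data of one small image.
T = 20
ac_y = [-20, -3, 0, 0, 2, 9, 20]
u_pixels = [12, 12, 200, 7]
v_pixels = [255, 0, 0, 31]

feature = feature_vector(ac_y, u_pixels, v_pixels, T)

# One-to-one mapping from image identity to feature vector (the index).
index = {"img_001": feature}
```

With a boundary of 20, the vector has 41 + 256 + 256 = 553 entries, so the cost of comparing two images is fixed regardless of image size.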

Search operation
Before sending a query image to the server, the query user encrypts the query image in the same way as the image owner to generate a query trapdoor. Once a query trapdoor is received from a query user, the cloud server extracts a query feature vector from the trapdoor, denoted as $f_q = (f_{q,1}, \cdots, f_{q,j}, \cdots, f_{q,L})$, where $L$ is the feature length. The similarity between an encrypted database image and the query image is measured by the Manhattan distance between their respective feature vectors. Finally, the server returns the most similar images to the query user. The Manhattan distance is calculated as:

$$D(f_q, f_i) = \sum_{j=1}^{L} \left| f_{q,j} - f_{i,j} \right|,$$

where $f_i = (f_{i,1}, \cdots, f_{i,L})$ is the feature vector of the $i$-th database image.
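The search step amounts to a linear scan that ranks database features by their Manhattan distance to the query feature. The sketch below uses short hypothetical vectors and hypothetical names (`manhattan`, `search`); real feature vectors would be the concatenated histograms.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def search(index, query, k):
    """Return the identities of the k database images closest to the query."""
    ranked = sorted(index, key=lambda image_id: manhattan(index[image_id], query))
    return ranked[:k]


# Hypothetical three-bin features for three database images and one trapdoor.
index = {"img1": [4, 0, 2], "img2": [1, 3, 2], "img3": [0, 0, 6]}
query = [3, 1, 2]
top2 = search(index, query, 2)
```

In this toy example the distances are 2, 4 and 8 for `img1`, `img2` and `img3`, so the first two identities are returned.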

Security analysis
As stated above, the cloud server is assumed to be honest but curious. In addition to executing and completing its designated tasks, the cloud server may analyze the uploaded image data and collect statistics on it. The security strength of our scheme is analyzed under the ciphertext-only attack (COA) and the brute-force attack. We summarize the functionality ℱ and the corresponding leaked information of the proposed scheme under the COA model in Fig. 4. In the real environment, the interaction in our scheme involves three kinds of participants: the image owner, the cloud server and the query user. The honest-but-curious cloud server is considered a potential attacker in our scheme. In the ideal environment, we define a simulator that can simulate the information leakage from the attacker's view by using the functionality ℱ. Our proposed scheme can be proved secure if the difference between the two environments is negligible. For the security analysis, we expose three kinds of information to the cloud server, i.e., the encrypted images, the features and the query images. Therefore, the security analysis is performed on these three aspects. On the basis of the existing encryption algorithms, we consider it computationally difficult for the server to obtain the plaintext images without knowing the private keys.
Figure 4: The functionality ℱ of the proposed scheme and the leaked information

1. ℱ.StoreImg($\mathcal{I}$, $\mathcal{I}_{ID}$, $K$):
   Functionality. The image owner encrypts all database images in $\mathcal{I}$, and then sends the encrypted image set $\mathcal{C}$ and the corresponding identity set $\mathcal{I}_{ID}$ to the cloud server.
   Storage leakage. The leaked information includes $\mathcal{C}$, $\mathcal{I}_{ID}$, the size of each image and the total number of images.
2. ℱ.Feature($\mathcal{C}$):
   Functionality. The cloud server extracts the AC-coefficients histogram and the color histograms from each encrypted image in $\mathcal{C}$ as an image feature vector.
   Feature leakage. The leaked information includes the encrypted feature vectors, the similarities between the feature vectors and the frequency distribution information of the histograms.
3. ℱ.QueryImg($I_q$):
   Functionality. The query user encrypts the query image $I_q$ and sends the encrypted query image as a query trapdoor to the cloud server. After completing the retrieval work, the server returns the encrypted images closest to the query trapdoor.
   Query leakage. The leaked information includes the query trapdoor and the similarity between each database image and the query trapdoor.

Security of image content
The simulator knows the size of the image database and the size of each image in it. Therefore, the simulator can construct a fictitious image database $\mathcal{I}'$ that resembles the real image database $\mathcal{I}$. However, the simulator can only fill these images with random sequences. As described in Section 4, the discrete cosine transform is performed on the images, and the resulting DC coefficients are encrypted by the stream cipher. Therefore, if the simulator wants to obtain the original image, it needs to solve some random sequences. The simulator needs to solve random sequences for discrete cosine transform and 2 random sequences for stream cipher technology. The color values and the position information are protected by value permutation and position scrambling, respectively, and different keys are used in the three components. Therefore, the simulator needs to solve $(2T+1)! \times (256!)^2$ random sequences for the value permutation encryption, where $T$ is the truncation boundary of the AC coefficients. The security strength of the position information depends on the image size $S$, so it is $(S!)^3$. The key space for a database image is the product of all of these factors, i.e., the stream-cipher and DCT terms above multiplied by $(2T+1)! \times (256!)^2 \times (S!)^3$.

Security of image features
The image features in the proposed scheme are the combination of the AC-coefficients histogram of the Y component and the color histograms of the U and V components. The simulator can similarly extract the histograms from the fictitious image database, and the security strength of the image features depends on the value permutation. The key space for the image features can be expressed as $(2T+1)! \times (256!)^2$, where $T$ is the truncation boundary of the AC coefficients.
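To give a feeling for the size of this key space, the following sketch expresses the product of the AC-value permutations and the two color-value permutations in bits, using the log-gamma function to avoid handling huge integers. The function names and the boundary value 20 are hypothetical choices for illustration.

```python
import math


def log2_factorial(n):
    """log2(n!) computed through the log-gamma function."""
    return math.lgamma(n + 1) / math.log(2)


def feature_key_bits(t):
    """Bits of the value-permutation key space: (2t+1)! for Y, (256!)^2 for U and V."""
    return log2_factorial(2 * t + 1) + 2 * log2_factorial(256)


bits = feature_key_bits(20)
```

For a boundary of 20 this evaluates to about 3,500 bits, far beyond the reach of exhaustive search. A large key space does not by itself rule out statistical attacks on permutation ciphers, so the figure only addresses the brute-force setting considered here.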

Security of trapdoor
The query image and the database images are encrypted with the same method, so the security strength of the encrypted query image is the same as that of an encrypted database image, and the key space for the query image is equal to the key space for a database image. This key space is large enough to withstand the brute-force attack.

Experimental results
The scheme is implemented in MATLAB R2012b, and the experiments are performed on a computer with a 3.3 GHz Intel Pentium CPU and 4 GB of memory. The database we use is the INRIA Holidays database [Chen and Shi (2008)]. It is composed of 1491 images divided into 500 categories, and the first image in each category is used as a query image. To examine whether the truncation and quantization processes affect image recovery, we calculate the peak signal-to-noise ratio (PSNR) between the original image and the recovered image. The PSNR values in Tab. 3 show that the truncation and quantization operations are acceptable and that the encrypted images can be restored after decryption.
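For reference, the PSNR between an original and a recovered image can be computed as in the sketch below, assuming 8-bit samples with a peak value of 255 and images given as flat lists of equal length. The function name `psnr` and the sample data are hypothetical.

```python
import math


def psnr(original, recovered, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(original) != len(recovered) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, recovered)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical samples: two of four pixels differ by one gray level.
value = psnr([100, 100, 100, 100], [101, 99, 100, 100])
```

Higher values indicate a recovered image closer to the original, and identical images give an infinite PSNR.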

Retrieval accuracy
The mean average precision (mAP) is used to measure the retrieval performance of the proposed scheme. For the mAP calculation, the retrieval precision is defined as the number of retrieved relevant images divided by the total number of retrieved images. Similarly, the recall rate is defined as the number of retrieved relevant images divided by the total number of relevant images in the image database. The average precision of a query combines both measures by averaging the precision at the rank of each relevant image, and the mAP is the mean of the average precision over all queries, which overcomes the limitation of using either measure alone. We use the evaluation package of the INRIA Holidays database in a Python environment to compute the mAP of our scheme.
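A common way to compute the mAP is sketched below: for each query, the precision is averaged at the rank of every relevant image, and the result is averaged over queries. This follows the usual textbook definition and is not the code of the evaluation package mentioned above, which may differ in details such as interpolation or the handling of the query image itself. The function names and the sample rankings are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average of the precision values at the ranks where relevant images appear."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, image_id in enumerate(ranked, start=1):
        if image_id in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(results):
    """Mean of the average precision over (ranked list, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)


# Hypothetical results for two queries.
results = [
    (["a", "x", "b"], {"a", "b"}),  # relevant images at ranks 1 and 3
    (["x", "c"], {"c"}),            # relevant image at rank 2
]
score = mean_average_precision(results)
```

Relevant images that are never retrieved contribute zero, so the measure penalizes both low precision and low recall.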

Retrieval accuracy in different AC-coefficient ranges
AC coefficients in different intervals are selected for the experiment, as shown in Tab. 4. The best retrieval accuracy is 52.938, obtained when the range of the AC coefficients is [−60, 60].
We can also observe that the choice of interval has no great influence on the retrieval accuracy. The reason is that the chosen intervals all contain more than 90% of the AC coefficients, so they represent most of the image information.

Retrieval accuracies of our schemes
We have also calculated the retrieval accuracies for three different types of feature sets, and the results are shown in Tab. 6. The first feature set is denoted as AC_YUV, where AC-coefficients histograms are extracted from the Y, U and V color components. We also give the retrieval accuracy of the AC-coefficients histogram extracted from each single component, denoted as AC_Y, AC_U and AC_V. The second feature set is denoted as color_YUV, where color histograms are extracted from the Y, U and V color components. The third feature set is denoted as ACCH, where the AC-coefficients histogram is extracted from the Y component and color histograms are extracted from the other two color components. Finally, we summarize some observations from the above experiments. Tab. 6 shows that the retrieval accuracy of the Y component (AC_Y) is better than that of the other components (AC_U, AC_V) and is close to that of the three components together (AC_YUV). The retrieval accuracy of the combination of AC-coefficients and color histograms (ACCH) is better than that of color histograms only (color_YUV) or AC-coefficients histograms only (AC_YUV). This is because the Y component carries most of the texture information, while the U and V components carry most of the color information, which explains why the retrieval accuracy of the third set (ACCH) is higher than that of the first two sets (AC_YUV and color_YUV).

Retrieval efficiency
Efficiency is an important indicator of the usability of the proposed scheme. In our scheme, the retrieval efficiency covers the time consumption of image encryption, index construction and image searching. The time for image encryption covers value permutation and position scrambling. The time for index construction covers feature extraction and indexing. On receiving an encrypted query image, the server searches the index for similar images. A linear index is built in our scheme, so the search time depends on the length of the feature vectors. Tab. 7 lists the time consumption of the three operations.

Conclusion
This paper proposes a secure image retrieval scheme that combines AC-coefficients histograms and color histograms. The proposed scheme consists of three operations, i.e., image encryption, index construction and image search. Except for image encryption, the operations are outsourced to the cloud server. The images are protected by a stream cipher, value replacement and position scrambling. The security strength of the exposed information is analyzed in terms of key space. We further examined and compared the retrieval accuracy of three feature sets. The comparison shows that the combination of AC-coefficients and color histograms achieves the highest retrieval accuracy. In future work, we will consider combining a color feature from the DC coefficients with a texture feature from the AC coefficients to achieve higher retrieval accuracy.