Deep learning for content-based image retrieval in FHE algorithms

Sura Mahmood Abdullah; Mustafa Musa Jaber

doi:10.1515/jisys-2022-0222

Open Access. Published by De Gruyter, February 15, 2023

Journal of Intelligent Systems

Abstract

Content-based image retrieval (CBIR) is a technique for retrieving images from an image database. However, CBIR suffers from low accuracy when retrieving many images from a large image database, and it does not ensure the privacy of the images. This article addresses the accuracy issue with deep learning techniques, namely a convolutional neural network (CNN), and provides the necessary image privacy with the Cheon–Kim–Kim–Song (CKKS) fully homomorphic encryption scheme. The proposed system, RCNN_CKKS, has two parts. The first part (offline processing) automatically extracts high-level features from the flatten layer of a CNN and stores them in a new dataset. In the second part (online processing), the client sends an encrypted image to the server, which uses the trained CNN model to extract the features of the received image. The extracted features are then compared with the stored features using the Hamming distance to retrieve all similar images. Finally, the server encrypts the retrieved images and sends them to the client. On plain images, the deep learning model achieved 97.87% for classification and 98.94% for image retrieval. In addition, the NIST test was used to check the security of CKKS when applied to the Canadian Institute for Advanced Research (CIFAR-10) dataset. From these results, the researchers conclude that deep learning is an effective method for image retrieval and that CKKS is appropriate for protecting image privacy.

Keywords: CKKS; CNN; CBIR; HE; CIFAR-10; Random Forest

1 Introduction

Given the exponential growth of digital images, content-based image retrieval (CBIR) is the most practical method for meaningful image searching. Most traditional image searches rely on the metadata of the image, which is retrieved with text queries. However, to obtain relevant images, every image must carry accurate metadata, and the choice of appropriate metadata for an image is one of the factors that limit search precision. Querying by image is therefore preferable. Numerous studies have used CBIR systems for image queries: a sample image is supplied to the CBIR system, which then uses low-level features to search a large collection of images for matching pictures [1]. CBIR distinguishes between high-level and low-level features for image retrieval. Early systems that used individual color, shape, and texture features produced good retrieval results because a variety of visual attributes were available. Machine learning methods, which are highly effective at automatically extracting features from images, are also employed [2]. Fully homomorphic encryption (FHE) allows an unlimited number of computations to be performed on ciphertext, such that decrypting the result gives the same value as performing the computations on the plaintext. Equation (1) expresses this property for a function f composed of arithmetic operations such as addition and multiplication, where E denotes encryption [3].

$$ f(E(m_1)) = E(f(m_1)) \qquad (1) $$
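As a small, deliberately insecure illustration of Eq. (1), unpadded ("textbook") RSA satisfies it for f(x) = x^k mod n, because (m^e)^k = (m^k)^e mod n. The sketch below assumes tiny toy primes and k = 3. RSA is only partially (multiplicatively) homomorphic, so this is not the FHE/CKKS scheme used in this article; it only shows what the equation states.

```python
# Toy check of Eq. (1), f(E(m)) = E(f(m)), using textbook RSA with f(x) = x^k mod n.
# Tiny primes and no padding: insecure, for illustration only (Python 3.8+ for pow(e, -1, phi)).

def make_toy_rsa(p: int = 61, q: int = 53, e: int = 17):
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)          # private exponent: modular inverse of e mod phi
    return n, e, d

def encrypt(m: int, n: int, e: int) -> int:
    return pow(m, e, n)

def decrypt(c: int, n: int, d: int) -> int:
    return pow(c, d, n)

def f(x: int, n: int, k: int = 3) -> int:
    """Function evaluated homomorphically: x -> x^k mod n."""
    return pow(x, k, n)

n, e, d = make_toy_rsa()
m1 = 42
lhs = f(encrypt(m1, n, e), n)    # apply f directly to the ciphertext
rhs = encrypt(f(m1, n), n, e)    # encrypt the result of f on the plaintext
print(lhs == rhs, decrypt(lhs, n, d) == f(m1, n))  # True True
```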

Cryptography uses advanced algorithms to protect the secrecy of data while it is exchanged and stored. Since Diffie and Hellman's seminal paper was published in 1976, numerous public-key cryptosystems have been created. Most of them are based on two hard mathematical problems, the discrete logarithm problem and the factorization problem (e.g., RSA, the ElGamal cryptosystem, ECC, and many others). These cryptosystems are known to be expensive and slow, even though they are highly secure and have a large key space [4,5]. In 2008 [6], Armknecht and Sadeghi created an algebraically homomorphic cryptographic method, and in 2009 [7], Gentry built on earlier research to propose the first fully homomorphic scheme. Plantard et al. [8] designed an FHE scheme based on a hidden ideal lattice. Chung et al. proposed a homomorphic comparison for point numbers with user-controllable precision and its applications [9]. Pedrouzo-Ulloa et al. revisited multivariate ring learning with errors and its applications to lattice-based cryptography [10]. Figure 1 shows the classification of homomorphic cryptosystems.

Figure 1

HE classification.

The Cheon–Kim–Kim–Song (CKKS) homomorphic encryption (HE) scheme is the state of the art for approximate homomorphic computation on real and complex numbers. CKKS has already been applied in real-world settings such as machine learning [11].

Traditional learning approaches rely on manually engineered features. Deep learning, one of the most effective machine learning techniques, can remove this dependence. Deep learning has two phases: training, which aims to improve the model's accuracy, and inference, which uses the model for analysis such as classification or prediction (see the next section for more details). In recent years, deep learning has been used across a wide range of sectors, including big data analytics and applications such as pattern recognition, speech recognition, and computer vision. However, deep learning raises privacy issues, especially when used with cloud services and in collaborative settings [12], where privacy risks can affect both users and the server. Privacy concerns also extend from data privacy to model privacy, because an attacker with complete knowledge of the training process may gain access to the model parameters [13]. In collaborative learning, leakage of participants' private information must also be taken into account, as must leaks of sensitive data between users and external infrastructure, particularly over the Internet [12].

From the related research [14,15,16,17,18,19], the researchers concluded that there are two important CBIR-related concerns. First, user privacy and data confidentiality must be maintained while balancing security and efficiency: lightweight encryption-based strategies, such as permutation or substitution, are efficient but insecure, whereas heavyweight encryption-based strategies are secure but computationally expensive. Second, there is a trade-off between retrieval efficiency and retrieval accuracy. Image retrieval techniques often use low-level features such as color, texture, and shape; however, because of the "semantic gap" between visual features and the richness of human semantics, their retrieval accuracy often falls short of the needs of practical applications [20]. Trademark recognition and retrieval is an important application of CBIR; reducing the semantic gap, achieving higher accuracy, and reducing computational complexity, and hence execution time, are the major challenges in designing and developing a trademark retrieval system. Ref. [21] devises a security scheme based on FHE that encrypts and decrypts data locally and computes on encrypted data rather than decrypting it into its original form, for applications with deadline and mobility constraints. Investigations have shown that HE is a highly effective approach for protecting data privacy, and it has become a popular and effective cryptographic technique for cloud computing applications. That work also provides a security analysis against all known attacks with respect to message expansion and homomorphic operations.

To address the high cost of the encryption/decryption operations, the researchers split the computational burden between the client and the server. This study also addresses the balance between accuracy and efficiency by integrating deep learning with Random Forest (RF) to narrow down and select the relevant features; as a result, processing time was cut in half while accuracy increased. The motivation of the proposed work is to build a secure AI model that balances image retrieval accuracy, reduces the processing time of retrieval operations, and preserves the privacy of data transmitted over an insecure medium.

The following are the major contributions to this article:

Balancing the trade-off between security and accuracy in CBIR.
Proposing an effective method for image retrieval based on deep learning techniques.
Encrypting the retrieved and sent images using FHE-CKKS algorithms to ensure their security.
Improving the accuracy of the classification CNN by training it on augmented images.
Using the RF method to select the best features, which increases accuracy and decreases retrieval time.

This article is organized as follows. Section 2 reviews related work on HE, image retrieval, and deep learning. Sections 3 to 5 briefly explain the deep learning, HE, and CKKS background used in this article. Section 6 presents the proposed system, and Section 7 gives the experimental results and performance analysis, followed by the conclusion.

2 Literature review

A large amount of research has been published in the fields of CBIR and HE.

In the study by Das and Namasudra [16], high-level image features were extracted using convolutional neural networks (CNNs) and dimension reduction: multilinear principal component analysis was used to reduce feature dimensions that were excessively large and strongly correlated. After reduction, the features are binary encoded for efficiency.

Manisha and Raman [17] proposed a method that uses the local neighborhood difference pattern to capture local features. This technique converts each pixel of an image into a binary pattern based on its neighboring pixels, so that information is extracted from local pixel intensities.

Selvam and Kannan [18] proposed a technique that integrates a genetic algorithm with the HARP aggregation algorithm to improve retrieval accuracy while reducing processing time when retrieving relevant images with CBIR.

Kuo et al. [22] applied deep CNNs to image retrieval, training the network weights with deep learning to extract high-level image features.

Huang et al. [23] proposed an ensemble of CNN models for image retrieval. To extract image features, the classifier combines Network in Network, an efficient deep learning architecture, with AlexNet, and it computes weighted-average feature vectors for retrieval.

Khan et al. [24] created an effective CBIR system that retrieves semantically relevant images. For this purpose, they proposed a hybrid feature descriptor made up of color and texture features.

The approach of Ali and Mohammed [25] relies on two retrieval mechanisms. They first extract visual features from the histogram and then compute statistical features (mean and standard deviation); the T-test is then used to investigate the association between images.

Challa and Gunta [26] proposed a symmetric-key fully HE scheme based on a modified Reed-Muller code that supports unlimited addition and multiplication operations (mod 2). The security proof is supported by a mathematical analysis and the degree of difficulty of attacks. The work also examines the security of homomorphic operations and message expansion against all known attacks and threats.

Syed et al. [27] recommended using HE to train both deep learning and conventional machine learning models while preserving data security and privacy. The proposed methodology is evaluated on smart grid applications such as load forecasting and fault localization. According to the fault localization results, the classification accuracy of the privacy-preserving deep learning model using HE is comparable with the model's accuracy on plain data.

Lou and Jiang [28] proposed a shift-accumulation-based, LHE-enabled deep neural network that can process encrypted data. The binary-operation-friendly Leveled Fast Homomorphic Encryption over Torus (LTFHE) scheme is used to implement ReLU activations and max pooling, and inference is accelerated by replacing expensive LTFHE multiplications with cheaper LTFHE shifts.

Obla et al. [29] proposed a systematic strategy for producing HE-friendly activation functions for CNNs. They first analyzed widely used functions such as Sigmoid and rectified linear units (ReLU) to identify the characteristics of an effective activation function that improves performance, and then compared various polynomial approximation techniques to determine the best approximation range for polynomial activations. They also proposed a new weighted polynomial approximation technique based on the output distribution of the batch normalization layer. Finally, they demonstrated the effectiveness of their approach on several datasets, including MNIST, FMNIST, and CIFAR-10.

Kwabena et al. [30] created a framework called MSCryptoNet that enables models to be executed, converted, and scaled while preserving privacy. In this framework, the sigmoid and rectified linear unit activation functions are approximated by low-degree polynomials so that they can be evaluated under HE.

Clet et al. [31] exhaustively examined the training of feed-forward neural networks on the MNIST dataset under the three most popular homomorphic cryptosystems: BFV, CKKS, and TFHE.

3 Deep learning

Deep learning extracts complex features from high-dimensional data to build a model that maps inputs to outputs (such as classes). Deep learning architectures are usually built as multilayer networks that compute higher-level features as nonlinear functions of lower-level ones; multilayer neural networks are the most typical form [32]. This section presents these layers and how each of them affects evaluation under HE.

3.1 CNN

A CNN, which is frequently used for image classification, is built from convolutional layers that learn features from the dataset. A convolutional layer with an N × N kernel computes the dot product between the kernel and each N × N neighborhood of its input, so it involves only addition and multiplication. It therefore works on HE-encrypted data without modification [33].
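To make the add-and-multiply structure concrete, the following sketch computes a single-channel convolution in plain Python. It assumes no padding ("valid") and a single channel for brevity, whereas Table 1 lists "same" padding; the function name is illustrative.

```python
from typing import List

Matrix = List[List[float]]

def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2D convolution (cross-correlation, as in CNN libraries), no padding.
    Each output value is a sum of products, so it maps onto HE addition and multiplication."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += image[i + u][j + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out

img = [[1, 2, 3, 0],
       [4, 5, 6, 1],
       [7, 8, 9, 2],
       [1, 0, 1, 3]]
k = [[1, 0, -1],
     [1, 0, -1],
     [1, 0, -1]]
print(conv2d_valid(img, k))  # [[-6.0, 12.0], [-4.0, 7.0]]
```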

3.2 Activation layer

The activation layer applies a nonlinear function to the output of the convolutional layer. Because these functions are nonlinear, evaluating them on HE data is much more difficult. Designers must therefore replace them with functions, typically polynomials, that need only multiplication and addition [29].
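One common replacement is a low-degree polynomial fitted to the activation over an assumed input range. The sketch below fits a degree-2 least-squares polynomial to ReLU on [-1, 1] (both the degree and the range are assumptions for illustration, not the parameters of [29]) and evaluates it with Horner's rule, which uses only additions and multiplications.

```python
from typing import List

def relu(x: float) -> float:
    return x if x > 0 else 0.0

def fit_poly_least_squares(xs: List[float], ys: List[float], degree: int) -> List[float]:
    """Least-squares fit via the normal equations (A^T A) c = A^T y.
    Returns c with p(x) = c[0] + c[1] x + ... + c[degree] x^degree."""
    n = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for r in range(col + 1, n):
            factor = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= factor * ata[col][c]
            aty[r] -= factor * aty[col]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aty[r] - sum(ata[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = s / ata[r][r]
    return coeffs

def poly_eval(coeffs: List[float], x: float) -> float:
    """Horner's rule: only additions and multiplications (HE-friendly)."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

xs = [-1 + 2 * i / 200 for i in range(201)]      # assumed input range [-1, 1]
coeffs = fit_poly_least_squares(xs, [relu(x) for x in xs], degree=2)
max_err = max(abs(poly_eval(coeffs, x) - relu(x)) for x in xs)
print([round(c, 3) for c in coeffs], round(max_err, 3))
```

The approximation is only accurate inside the fitted range, which is why the choice of input range matters for HE-friendly activations.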

3.3 Pooling layer

The objective of this downsampling layer is to reduce the amount of data. There are various types of pooling, including max pooling and average (mean) pooling. Average pooling computes the mean of the values using only addition and multiplication by a constant, both of which are permitted in HE, so it is used in HE instead of max pooling [28].
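As a sketch (plain Python, 2 × 2 windows with stride 2, names illustrative), average pooling reduces to a sum of four values and one multiplication by the public constant 1/4, whereas max pooling would require comparisons:

```python
from typing import List

Matrix = List[List[float]]

def avg_pool_2x2(fm: Matrix) -> Matrix:
    """2 x 2 average pooling with stride 2.
    Uses only additions and a multiplication by the constant 0.25,
    both supported by HE schemes; max pooling would need comparisons."""
    out = []
    for i in range(0, len(fm) - 1, 2):
        row = []
        for j in range(0, len(fm[0]) - 1, 2):
            s = fm[i][j] + fm[i][j + 1] + fm[i + 1][j] + fm[i + 1][j + 1]
            row.append(s * 0.25)
        out.append(row)
    return out

fm = [[1, 3, 2, 4],
      [5, 7, 6, 8],
      [0, 2, 1, 1],
      [2, 4, 3, 3]]
print(avg_pool_2x2(fm))  # [[4.0, 5.0], [2.0, 2.0]]
```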

3.4 Fully connected layer and dropout layer

This layer is called a "fully connected layer" because every neuron is connected to every neuron in the previous layer. It performs only a dot product, which consists of additions and multiplications, so it can be evaluated on ciphertext [34]. The dropout layer randomly deactivates neurons during training to reduce overfitting; without it, a classifier may report good results that actually reflect bias in the training data [35].

4 HE

Different strategies, such as differential privacy and HE, are used to safeguard privacy. HE allows many types of calculations to be performed on ciphertexts, generating an encrypted result. HE schemes fall into three groups [36]. Partially homomorphic encryption (PHE) supports only one operation on encrypted data, either addition or multiplication. Somewhat homomorphic encryption (SWHE) supports both addition and multiplication, but the total number of operations is limited. FHE supports both addition and multiplication with no limit on the number of operations.

4.1 Four stages of HE schemes [37]

Key generation (KeyGen): In this stage, the keys are generated from the security parameters. In the symmetric type, a single key is generated, while in the asymmetric type, a pair of secret and public keys is generated.
Encryption process (Enc): This stage encrypts the plaintext input message m ∈ M, where M is the plaintext space, with the encryption key. The ciphertext is generated as c = Enc(m), where c ∈ C and C is the ciphertext space.
Decryption process (Dec): In this stage, the original message is recovered by decrypting the ciphertext c with the decryption key, so that Dec(c) = m.
Evaluation algorithm (Eval): This stage evaluates a function f on the ciphertexts c1 = Enc(m1) and c2 = Enc(m2) without revealing the messages m1 and m2, producing a ciphertext c = Eval(f, c1, c2) such that Dec(c) = f(m1, m2).
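The four stages can be traced end to end with the Paillier cryptosystem, used here only as a compact stand-in: it is partially homomorphic (addition only), not the CKKS scheme of this article, and the tiny primes below are insecure. In this sketch, Eval multiplies two ciphertexts, which adds the underlying plaintexts.

```python
import math
import random

def keygen(p: int = 293, q: int = 433):
    """KeyGen: toy Paillier key pair from two small primes (insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    g = n + 1                                            # common simple choice of g
    n2 = n * n
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)          # mu = L(g^lam mod n^2)^-1 mod n
    return (n, g), (lam, mu)

def enc(pk, m: int, rng: random.Random) -> int:
    """Enc: c = g^m * r^n mod n^2 for a random r coprime to n."""
    n, g = pk
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(pk, sk, c: int) -> int:
    """Dec: m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    n, _ = pk
    lam, mu = sk
    return ((pow(c, lam, n * n) - 1) // n) * mu % n

def eval_add(pk, c1: int, c2: int) -> int:
    """Eval: multiplying ciphertexts yields an encryption of m1 + m2 (mod n)."""
    n, _ = pk
    return (c1 * c2) % (n * n)

rng = random.Random(0)
pk, sk = keygen()
c1, c2 = enc(pk, 120, rng), enc(pk, 35, rng)
print(dec(pk, sk, eval_add(pk, c1, c2)))   # 155
```

Because the randomizer r differs per encryption, encrypting the same message twice gives different ciphertexts, while Dec still recovers the message.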

5 CKKS HE scheme

The CKKS scheme is a leveled HE scheme whose security is based on the hardness of the ring learning with errors (RLWE) problem. Unlike other HE schemes, CKKS supports approximate arithmetic on both real and complex values: it treats the decryption noise as part of the approximation error of the real-valued result. This makes it well suited to algorithms such as machine learning, in which most computations are approximate. CKKS is turned into an FHE scheme by applying a bootstrapping procedure, as described in [38].
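The RLWE structure behind CKKS, and the way decryption noise appears as a small approximation error, can be sketched with a toy public-key scheme over Z_q[X]/(X^N + 1): pk = (b, a) with b = -a·s + e, a ciphertext (c0, c1) = (v·b + e0 + m, v·a + e1), and decryption c0 + c1·s = m + small noise. This is a simplified illustration under assumed, insecure parameters (N = 8, q = 2^40, scale 2^20, secret and error coefficients in {-1, 0, 1}); it omits CKKS's canonical-embedding encoding, rescaling, and relinearization, and the pixel values are hypothetical.

```python
import random
from typing import List

N = 8            # ring degree (toy value)
Q = 2 ** 40      # ciphertext modulus q (toy value)

Poly = List[int]

def poly_mul(a: Poly, b: Poly) -> Poly:
    """Multiply in Z_q[X]/(X^N + 1): X^N wraps around to -1 (negacyclic)."""
    res = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                res[k] += a[i] * b[j]
            else:
                res[k - N] -= a[i] * b[j]
    return [x % Q for x in res]

def poly_add(*ps: Poly) -> Poly:
    return [sum(cs) % Q for cs in zip(*ps)]

def small(rng: random.Random) -> Poly:
    """Small polynomial with coefficients in {-1, 0, 1} (secret key, masks, errors)."""
    return [rng.choice((-1, 0, 1)) % Q for _ in range(N)]

def keygen(rng: random.Random):
    s = small(rng)
    a = [rng.randrange(Q) for _ in range(N)]
    e = small(rng)
    b = poly_add([-x % Q for x in poly_mul(a, s)], e)   # b = -a*s + e
    return s, (b, a)

def encrypt(pk, m: Poly, rng: random.Random):
    b, a = pk
    v, e0, e1 = small(rng), small(rng), small(rng)
    c0 = poly_add(poly_mul(v, b), e0, [x % Q for x in m])  # v*b + e0 + m
    c1 = poly_add(poly_mul(v, a), e1)                      # v*a + e1
    return c0, c1

def decrypt(s: Poly, ct) -> Poly:
    c0, c1 = ct
    raw = poly_add(c0, poly_mul(c1, s))                    # = m + v*e + e0 + e1*s (mod q)
    return [x - Q if x > Q // 2 else x for x in raw]       # centre into (-q/2, q/2]

rng = random.Random(1)
s, pk = keygen(rng)
delta = 2 ** 20                                            # scaling factor
pixels = [0, 17, 128, 255, 64, 3, 200, 99]                 # hypothetical pixel values
m = [p * delta for p in pixels]
recovered = [round(x / delta) for x in decrypt(s, encrypt(pk, m, rng))]
print(recovered == pixels)                                 # True
```

Here the noise term v·e + e0 + e1·s has coefficients of magnitude at most 2N + 1, so after dividing by the scaling factor it is only a tiny error on each value, which is the sense in which CKKS treats noise as approximation error.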

6 Proposed system

The proposed system consists of two parts: offline and online. The offline part of the protocol runs on the server, whereas the online part runs on both the server and the client. The offline stage consists of generating the CNN model, which includes the training phase: three procedures are carried out on the plaintext training data to build a classifier, which is then applied to the plaintext test data to obtain the trained model. As depicted in Figure 2, the interactive (online) phase involves four processes on the client side and eight on the server side, including sending the public key to the client in the second step, receiving the encrypted image in the third step, decrypting the image in the fourth step, classifying the decrypted image with the trained model in the fifth step, and returning the encrypted retrieved images in the sixth step. Figure 2 illustrates the proposed model.

Figure 2

Offline and online phases of the RCNN_CKKS protocol.

6.1 Materials and methods

The CIFAR-10 dataset is widely used to train and evaluate computer vision and machine learning algorithms. Its 60,000 images, all 32 × 32 pixels in size and in PNG format, are divided into 10 classes, as shown in Figure 3: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The dataset is split into 50,000 training images and 10,000 test images; the former are used for training and the latter for evaluating and testing the proposed model. The test batch contains exactly 1,000 randomly selected images from each class, and the training batches contain the remaining images in random order, 5,000 from each class in total.

Figure 3

CIFAR-10 dataset.

6.2 Offline phase

The offline phase has two main stages: the first produces a trained model (M), and the second uses this model to extract the features of each image in the training dataset.

6.3 Generation CNN model phase

In this phase, the classification model is built in several steps, which are explained below. Figure 3 shows the main steps for obtaining the trained model. CIFAR-10 (Canadian Institute for Advanced Research), a collection of images used to train machine learning models, is one of the image datasets used in this article.

CIFAR-10 contains 60,000 color images of 32 × 32 pixels in PNG format, in 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The dataset has two components: the training set of 50,000 images (about 83% of the dataset) used to train the network, and the test set of 10,000 images (about 17%) used to test it. The proposed system uses eight augmentation techniques, including rotation, horizontal shifting, vertical shifting, and flipping. Rotating the original training images by 15 degrees, with bilinear interpolation, produces 50,000 new images.

Horizontal shift augmentation moves an image horizontally while keeping its original dimensions, and vertical shift augmentation shifts every pixel vertically while keeping the image size. The local binary pattern (LBP) transformation thresholds the neighborhood of each pixel against the value of that pixel to produce a binary number. Gaussian blurring is also used to create new images, as are two noise-generating methods: salt-and-pepper noise and Gaussian noise. In total, 450,000 training images are obtained. The proposed CNN contains ten layers, as follows:

Convolutional layer 1: In this layer, a 3 × 3 filter slides over the 32 × 32 image to create a feature map that summarizes the presence of detected features in the original input image.
Convolutional layer 2: This layer repeats the steps of the convolutional layer 1 on the 32 × 32 image with filter 2 and depth 32.
Max Pooling layer 1: After applying two convolutional and ReLU layers, the max pooling layer is applied to extract maximum features from the image using a filter with size 2 × 2 and then uses dropout with 20% to prevent the network from overfitting.
Convolutional layer 3: This layer repeats the steps of convolutional layer 2 on the 16 × 16 image with filter 3 and depth 64 to obtain robust features before the weaker responses are discarded by max pooling layer 2.
Convolutional layer 4: This layer repeats the same steps as convolutional layer 3 on the 16 × 16 image with filter 4 and depth 64.
Max Pooling layer 2: The max pooling layer is applied after convolutional layer 4 and extracts the maximum features from the image using a filter with size 2 × 2 and then uses dropout with 30% to prevent the network from overfitting.
Convolutional layer 5: This layer repeats the steps of the convolutional layer 4 on the 8 × 8 image with filter 5 and depth 128.
Convolutional layer 6: This layer repeats the steps of the convolutional layer 5 on the 8 × 8 image with filter 6 and depth 128.
Max Pooling layer 3: The max pooling layer is applied after convolutional layer 6 and extracts the maximum features from the image using a filter of size 2 × 2, after which the image becomes 4 × 4. It then uses 40% dropout to prevent the network from overfitting.
Flatten layer: This layer converts the 4 × 4 × 128 output of the convolutional and max pooling layers into a single vector of higher-level features of the input image.
Fully connected layer 1: The feature vector from the flatten layer forms the input nodes of the fully connected layer, which acts as the classifier and assigns class labels to images based on the training dataset.
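The layer list above implies the following feature-map sizes. The sketch below traces them and counts trainable parameters, assuming an RGB 32 × 32 × 3 input, a depth of 32 for convolutional layer 1 (not stated explicitly), 3 × 3 kernels with "same" padding, and a 10-way output for the fully connected layer; dropout is omitted because it has no parameters. Under these assumptions, the flatten layer yields a 2,048-dimensional feature vector.

```python
from typing import List, Tuple

# (layer name, kind, output channels) following the layer list above.
# Assumptions: conv1 depth 32, 3 x 3 kernels, "same" padding, RGB input, 10 classes.
LAYERS: List[Tuple[str, str, int]] = [
    ("conv1", "conv", 32), ("conv2", "conv", 32), ("pool1", "pool", 0),
    ("conv3", "conv", 64), ("conv4", "conv", 64), ("pool2", "pool", 0),
    ("conv5", "conv", 128), ("conv6", "conv", 128), ("pool3", "pool", 0),
    ("flatten", "flatten", 0), ("fc1", "dense", 10),
]

def summarize(h: int = 32, w: int = 32, c: int = 3, k: int = 3):
    """Return (name, output shape, parameter count) per layer and the total."""
    rows = []
    total = 0
    for name, kind, out_c in LAYERS:
        params = 0
        if kind == "conv":            # "same" padding keeps h and w
            params = k * k * c * out_c + out_c
            c = out_c
        elif kind == "pool":          # 2 x 2 max pooling halves h and w
            h, w = h // 2, w // 2
        elif kind == "flatten":       # h x w x c values become one vector
            h, w, c = 1, 1, h * w * c
        elif kind == "dense":
            params = c * out_c + out_c
            c = out_c
        total += params
        rows.append((name, (h, w, c), params))
    return rows, total

rows, total = summarize()
for name, shape, params in rows:
    print(f"{name:8s} {str(shape):15s} {params:>8d}")
print("total trainable parameters:", total)   # 307498 under the assumptions above
```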

Three key algorithms make up the proposed system: the CKKS method for encryption, the CNN algorithm for high-level feature extraction and prediction, and the RF approach for feature selection. The parameters of these algorithms, which affect the effectiveness of the proposed system, are listed in Tables 1–3.

Table 1

Parameters in CNN algorithm

Para. name	Para. value	Description
Kernel	3 × 3	Filter size used to extract features from the images
Padding	Same	The number of pixels added to an image when it is processed by the CNN kernel
Dropout	0.2, 0.3, 0.4	During training, randomly selected neurons are dropped from evaluation; this study uses three different rates in the network
Pooling (max pooling)	2 × 2	Max pooling is used
Optimizer (RMSprop)	Learning rate = 0.02	The optimizer adjusts the weights and learning rate of the neural network to minimize the loss; this article uses RMSprop
Loss function	Categorical cross-entropy
Epochs	100	The CIFAR-10 training data are passed through the neural network 100 times
Batch size	100	In a single batch, there are 100 images for training purposes
Weight decay	0.0001

Table 2

Parameters in CKKS algorithm

Para. name	Para. value	Description
Polynomial degree	8 bits	All of these parameters are described in Section 6.5 below
Cipher modulus	600 bits
Big modulus	1,200 bits
Scaling factors	120 bits
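To illustrate how the polynomial degree and the scaling factor interact, the sketch below implements a simplified CKKS-style encoder: N/2 complex slots are mapped to an integer polynomial through the inverse canonical embedding, scaled by Δ, and rounded. It assumes a toy degree N = 8 and plain coefficient-wise rounding, and it performs no encryption. With this rounding, the decoding error is at most N/(2Δ), so a larger Δ gives more precision.

```python
import cmath
from typing import List

N = 8   # toy ring degree: polynomials in Z[X]/(X^N + 1), N/2 = 4 complex slots
ROOTS = [cmath.exp(1j * cmath.pi * (2 * k + 1) / N) for k in range(N)]  # primitive 2N-th roots

def encode(z: List[complex], delta: float) -> List[int]:
    """Simplified CKKS-style encoding of N/2 complex slots into an integer polynomial.
    Expand to a conjugate-symmetric vector, apply the inverse canonical embedding,
    scale by delta, and round each coefficient."""
    assert len(z) == N // 2
    u = list(z) + [v.conjugate() for v in reversed(z)]
    # The Vandermonde matrix V[k][j] = ROOTS[k]**j satisfies V^H V = N I, so V^-1 = V^H / N.
    coeffs = [sum(u[k] * ROOTS[k].conjugate() ** j for k in range(N)) / N for j in range(N)]
    return [round(delta * c.real) for c in coeffs]   # imaginary parts are ~0 by symmetry

def decode(poly: List[int], delta: float) -> List[complex]:
    """Evaluate the polynomial at the first N/2 roots and remove the scale."""
    return [sum(a * ROOTS[k] ** j for j, a in enumerate(poly)) / delta for k in range(N // 2)]

z = [3 + 4j, 2 - 1j, 0.5 + 0j, -1.25 + 0.75j]
for delta in (2 ** 10, 2 ** 20):
    err = max(abs(a - b) for a, b in zip(decode(encode(z, delta), delta), z))
    print(delta, err)   # the error shrinks as the scaling factor grows
```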

Table 3

Parameters in RF algorithm

Para. name	Para. value	Description
n_estimators	30	Number of trees in the forest
criterion	Gini, entropy	Function used to measure the quality of a split: "gini" for Gini impurity and "entropy" for information gain
max_depth	10	Maximum depth of a tree
Min samples split	2	Minimum number of samples required to split an internal node
Min samples leaf	1	Minimum number of samples required at each leaf node
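The criterion parameter in Table 3 decides how each tree scores candidate splits. As a minimal sketch in plain Python (not the RF implementation used in this article), the functions below compute Gini impurity, entropy, and the weighted impurity decrease of a split. Impurity-based importance scores, such as those reported by common RF libraries, accumulate these decreases per feature, which is one way an RF can rank features for selection.

```python
import math
from collections import Counter
from typing import Sequence

def gini(labels: Sequence[int]) -> float:
    """Gini impurity: 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((cnt / n) ** 2 for cnt in Counter(labels).values())

def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy in bits: -sum_k p_k log2 p_k."""
    n = len(labels)
    return -sum((cnt / n) * math.log2(cnt / n) for cnt in Counter(labels).values())

def split_gain(parent, left, right, criterion=gini) -> float:
    """Impurity decrease of a split, weighting each child by its share of samples.
    A tree picks the split with the largest gain under the chosen criterion."""
    n = len(parent)
    child = len(left) / n * criterion(left) + len(right) / n * criterion(right)
    return criterion(parent) - child

parent = [0, 0, 0, 0, 1, 1, 1, 1]
good = split_gain(parent, [0, 0, 0, 0], [1, 1, 1, 1])   # separates the classes perfectly
poor = split_gain(parent, [0, 0, 1, 1], [0, 0, 1, 1])   # no separation at all
print(good, poor)                                        # 0.5 0.0
print(split_gain(parent, [0, 0, 0, 0], [1, 1, 1, 1], entropy))  # 1.0
```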

6.4 Feature extraction phase

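In this phase, the trained model (M) is applied to every image in the training dataset, and the output of its flatten layer is taken as the image's high-level feature vector. These feature vectors are stored in a new dataset for use in the similarity-matching step of the online phase. As listed in Table 1, the convolutions use 3 × 3 kernels with "same" padding, that is, pixels are added around the image border when it is processed by the CNN kernel.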

6.5 Online processing phase

This stage starts when the client requests the server's public key, uses it to encrypt the image, and sends the encrypted image to the server. Using CKKS as the FHE method, the client converts a message M into a pair of ciphertexts (ct1 mod q, ct2 mod q) computed with the public key pk, where ct1 and ct2 are ciphertexts in polynomial form and q is the ciphertext modulus. The server and client sides of this phase are implemented as follows:

Initial security parameter λ step (server side, client side): the polynomial degree N, a power of two, defines the quotient ring R = Z[X]/(X^N + 1) in which plaintexts and ciphertexts are represented.

The ciphertext modulus q:

Q is the larger (big) modulus. A bigger modulus q allows more homomorphic operations to be executed before the noise becomes too large for correct decryption. In this work, 41 different values of q, ranging from 40 to 1,200 bits, are used. However, q and Q are bounded from above by the security parameter and the polynomial degree [36].
The scaling factor Δ: A larger scaling factor gives more precision but restricts the number of homomorphic operations. More specifically, for a plaintext polynomial m(x) ∈ R_q with m(x) = Encode(z), decoding m(x) + e(x), where e(x) is the accumulated error, yields an approximation of z whose error shrinks as Δ grows; in practice, Δ is chosen using this estimate. For instance, choosing Δ = 2^10 results in a final decoded message with 4 bits of precision, and Δ = 2^15 should be chosen if 6 bits of precision are required. If the magnitude of the decoded slots decreases, several procedures require increasing the scaling factor [37].

Key generation step (server side, client side): the server and the client each generate a public key (pk) and a private key (sk) and use their respective keys for encryption and decryption.
Encrypting image (server side, client side): the client encrypts the image after receiving the server's public key. The image is resized to 32 × 32 using bilinear interpolation to match the input of the model on the server side, and is then read pixel by pixel, with each pixel encoded individually using CKKS encoding. In this step, every pixel of the image is converted into a polynomial of degree 8. Likewise, the server encrypts the retrieved images with the client's public key.
Decrypting image (server side, client side): the image encrypted on the client side with the server's public key is sent to the server to request the five images most similar to it. The server decrypts it with its private key and applies a decoding operation to the decrypted result, which is in polynomial form, to obtain the actual pixel values. In the same way, the client uses its private key to decrypt the images it receives from the server.
Similarity matching (server side only): the model (M) extracts the features of the received image from the flatten layer. These features are compared with the feature vectors stored during the training phase using the Hamming distance, and the five images closest to the query image are retrieved.
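Hamming distance is defined on binary codes, while flatten-layer features are real-valued, so some binarization step is needed before matching. The sketch below assumes thresholding at zero (the text does not specify the binarization), packs each code into a Python integer, and returns the five closest stored images; the feature vectors and image names are hypothetical 8-dimensional stand-ins.

```python
from typing import Dict, List, Tuple

def binarize(features: List[float], threshold: float = 0.0) -> int:
    """Pack a real-valued feature vector into an integer bit code (bit i set if feature i > threshold)."""
    code = 0
    for i, x in enumerate(features):
        if x > threshold:
            code |= 1 << i
    return code

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")

def top_k(query: List[float], database: Dict[str, List[float]], k: int = 5) -> List[Tuple[str, int]]:
    """Rank stored feature vectors by Hamming distance to the query and keep the k closest."""
    q = binarize(query)
    scored = [(name, hamming(q, binarize(vec))) for name, vec in database.items()]
    scored.sort(key=lambda item: (item[1], item[0]))  # distance first; name breaks ties
    return scored[:k]

# Hypothetical stand-ins for stored flatten-layer features.
database = {
    "cat_01":   [0.8, -0.1, 0.5, -0.3, 1.1, -0.9, 0.2, -0.4],
    "cat_02":   [0.7, 0.3, 0.6, -0.2, 0.9, -0.5, -0.1, -0.3],
    "dog_01":   [0.2, -0.4, -0.3, 0.5, 0.8, -0.2, 0.4, 0.1],
    "frog_01":  [0.6, -0.3, 0.2, -0.1, 0.3, 0.4, 0.5, -0.2],
    "ship_01":  [-0.5, 0.6, -0.2, 0.7, -0.1, 0.9, -0.3, 0.8],
    "truck_01": [-0.3, 0.2, 0.1, 0.4, -0.6, 0.5, -0.2, 0.3],
}
query = [0.9, -0.2, 0.4, 0.0, 1.3, -0.7, 0.1, -0.1]
print(top_k(query, database))
# [('cat_01', 0), ('frog_01', 1), ('cat_02', 2), ('dog_01', 3), ('truck_01', 7)]
```

Packing codes into integers keeps each distance computation to one XOR and a bit count, which is the usual reason Hamming matching is cheap for large stored feature sets.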

7 Results

The results of the proposed model are presented below.

7.1 Augmentation data result

Figure 4 shows four of the augmentations used: rotation, horizontal shifting, vertical shifting, and flipping. Rotating the original training images by 15 degrees produces 50,000 new images, as shown in Table 4; the missing pixel values are filled in by bilinear interpolation. According to Table 4, another 50,000 new images were created by shifting each training image horizontally by one pixel in the x-direction, from right to left. Vertical shifting and horizontal flipping produce a further 100,000 new images. Four more techniques were also applied to create new training data: Gaussian blurring, local binary pattern, Gaussian noise, and salt-and-pepper noise.
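Several of these augmentations are simple array operations. The sketch below shows horizontal flipping, one-pixel horizontal and vertical shifts, salt-and-pepper noise, and Gaussian noise on a tiny single-channel image in plain Python. The fill value for vacated pixels (0) and the single channel are assumptions for illustration; rotation with bilinear interpolation, blurring, and LBP are omitted for brevity. The last line checks the dataset total in Table 4: the original set plus eight augmented copies of 50,000 images.

```python
import random
from typing import List

Image = List[List[int]]   # H x W, values 0..255 (one channel for brevity)

def hflip(img: Image) -> Image:
    """Mirror each row (horizontal flip)."""
    return [list(reversed(row)) for row in img]

def shift_left(img: Image, px: int = 1, fill: int = 0) -> Image:
    """Horizontal shift right-to-left by px pixels; vacated columns get `fill` (assumed 0)."""
    return [row[px:] + [fill] * px for row in img]

def shift_up(img: Image, px: int = 1, fill: int = 0) -> Image:
    """Vertical shift by px pixels, keeping the image size."""
    width = len(img[0])
    return [row[:] for row in img[px:]] + [[fill] * width for _ in range(px)]

def salt_and_pepper(img: Image, amount: float, rng: random.Random) -> Image:
    """Set roughly a fraction `amount` of pixels to 0 (pepper) or 255 (salt)."""
    out = [row[:] for row in img]
    for r in range(len(out)):
        for c in range(len(out[0])):
            if rng.random() < amount:
                out[r][c] = 255 if rng.random() < 0.5 else 0
    return out

def gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise and clip to the valid 0..255 range."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row] for row in img]

img = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
rng = random.Random(0)
print(hflip(img))        # [[30, 20, 10], [60, 50, 40], [90, 80, 70]]
print(shift_left(img))   # [[20, 30, 0], [50, 60, 0], [80, 90, 0]]
print(shift_up(img))     # [[40, 50, 60], [70, 80, 90], [0, 0, 0]]
print(50_000 * (1 + 8))  # 450000
```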

Figure 4

Result of augmentation process.

Table 4

Dataset size after applied data augmentation

Dataset type	No. of images
Original training dataset	50,000
Rotation dataset	50,000
Horizontal shift dataset	50,000
Vertical shift dataset	50,000
Gaussian blurring	50,000
Horizontal flipping dataset	50,000
Local binary pattern	50,000
Gaussian noise	50,000
Salt-and-pepper noise	50,000
Total	450,000

7.2 CNN layers

The proposed model was implemented with ten CNN layers: six convolutional layers, three max pooling layers, and one fully connected layer. Every layer is applied to every image in the training set, transforming each input image into the output used for classification.

7.3 Training results

The model was trained with the RMSprop optimizer on the 450,000 training images (each with R, G, and B channels), using an initial learning rate of 0.001.
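For reference, the textbook RMSprop update keeps a running average of squared gradients and divides each gradient by its root mean square. The sketch below applies it to a single scalar parameter with the initial learning rate of 0.001 used here; the decay rate rho = 0.9 and epsilon = 1e-7 are assumed values, and deep learning frameworks may differ in details such as where epsilon is added.

```python
import math
from typing import Tuple


def rmsprop_step(w: float, grad: float, v: float, lr: float = 0.001,
                 rho: float = 0.9, eps: float = 1e-7) -> Tuple[float, float]:
    """One RMSprop update of a scalar parameter (rho and eps are assumed defaults).

    v is the exponential moving average of squared gradients; dividing the
    gradient by sqrt(v) makes the step size roughly lr regardless of the
    gradient's scale.
    """
    v = rho * v + (1.0 - rho) * grad * grad
    w = w - lr * grad / (math.sqrt(v) + eps)
    return w, v


# Usage: minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(10_000):
    w, v = rmsprop_step(w, 2.0 * (w - 3.0), v)
print(round(w, 2))
```

Because each step is normalized by the gradient's recent magnitude, progress is roughly lr per step, which is why about 3,000 steps are needed to move from 0 to 3 in this toy problem.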

The model was trained for 100 epochs; in each epoch, the weights are adjusted to bring the network's predictions closer to the target classes. Table 5 lists the accuracy and loss of the training stage together with the related hyperparameters: a batch size of 64 was used, and the training and validation samples were passed through the network 100 times. Over the course of training, the learning rate decreased from its initial value of 0.001 to 0.0003.

Table 5

Results in training phase

Iterations	Batch size	Initial learning rate	Final learning rate	Loss function	Optimizer	Epochs	ETA	Validation loss	Validation accuracy	Training loss	Training accuracy
100	64	0.001	0.0003	Categorical cross-entropy	RMSprop	100	605 s	0.3324	0.9843	0.3276	0.9787

7.4 Testing results

The 10,000 CIFAR-10 test images (one-sixth of the 60,000-image dataset) were passed through the model's layers and classified into the ten classes using the saved trained parameters, that is, the weights the network arrived at during training. Table 6 shows the test accuracy, the testing time estimate (ETA), and the loss.

Table 6

Testing performance

ETA	Loss	Accuracy (%)
37 s, 4 ms	0.334	98.94

The trained network was evaluated using the confusion matrix in Table 7, which shows the classification results on the CIFAR-10 dataset, and by computing precision, recall, and F1-score. The predicted classes of the test images were compared with their actual classes, and Table 8 reports the resulting metrics for each class.

Table 7

The confusion matrix (rows: actual class; columns: predicted class)

	Airplane	Automobile	Bird	Cat	Deer	Dog	Frog	Horse	Ship	Truck
Airplane	997	1	0	0	0	1	1	0	0	0
Automobile	1	994	0	0	0	1	0	1	1	2
Bird	3	2	980	5	1	2	1	0	5	1
Cat	0	0	1	999	0	0	0	0	0	0
Deer	2	1	0	2	984	1	4	3	1	2
Dog	1	1	1	2	1	991	1	0	2	0
Frog	3	2	1	1	1	1	983	2	1	5
Horse	0	0	0	0	0	0	0	998	0	2
Ship	1	0	0	1	1	0	2	2	992	1
Truck	6	1	1	3	5	8	1	2	4	969

Table 8

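Classification results for each class (classes 0–9 follow the order of Table 7: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck)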

Class	Precision	Recall	F1-score	Support
0	0.97	0.99	0.96	1,000
1	0.99	0.98	0.95	1,000
2	0.98	0.99	0.99	1,000
3	0.97	0.96	0.98	1,000
4	0.98	0.95	0.99	1,000
5	0.96	0.98	0.96	1,000
6	0.98	0.98	0.94	1,000
7	0.99	0.96	0.98	1,000
8	0.99	0.95	0.99	1,000
9	0.98	0.99	0.98	1,000
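Per-class precision, recall, and F1-score follow from a confusion matrix whose rows are actual classes and whose columns are predicted classes: precision divides the diagonal entry by its column sum, recall divides it by its row sum, and F1 is their harmonic mean, so it always lies between precision and recall. The sketch below applies these standard definitions to the counts in Table 7; it is an illustration, not the authors' evaluation script.

```python
from typing import List, Sequence, Tuple

CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]

# Table 7: rows are actual classes, columns are predicted classes.
CONFUSION = [
    [997, 1, 0, 0, 0, 1, 1, 0, 0, 0],
    [1, 994, 0, 0, 0, 1, 0, 1, 1, 2],
    [3, 2, 980, 5, 1, 2, 1, 0, 5, 1],
    [0, 0, 1, 999, 0, 0, 0, 0, 0, 0],
    [2, 1, 0, 2, 984, 1, 4, 3, 1, 2],
    [1, 1, 1, 2, 1, 991, 1, 0, 2, 0],
    [3, 2, 1, 1, 1, 1, 983, 2, 1, 5],
    [0, 0, 0, 0, 0, 0, 0, 998, 0, 2],
    [1, 0, 0, 1, 1, 0, 2, 2, 992, 1],
    [6, 1, 1, 3, 5, 8, 1, 2, 4, 969],
]


def per_class_metrics(cm: Sequence[Sequence[int]]) -> List[Tuple[float, float, float, int]]:
    """(precision, recall, F1, support) for each class."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[i][k] for i in range(n))  # column sum
        support = sum(cm[k])                         # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results.append((precision, recall, f1, support))
    return results


def accuracy(cm: Sequence[Sequence[int]]) -> float:
    """Fraction of all test images on the diagonal (correctly classified)."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


for name, (p, r, f1, s) in zip(CLASSES, per_class_metrics(CONFUSION)):
    print(f"{name:10s} P={p:.3f} R={r:.3f} F1={f1:.3f} support={s}")
print(f"overall accuracy = {accuracy(CONFUSION):.4f}")
```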

7.5 CNN model analysis

Visualization is key to machine learning: practitioners and researchers often use it to track output metrics and learned parameters during model training. TensorBoard, a visualization dashboard, includes modules for monitoring the distribution of tensors as well as images and audio, and a module for graph visualization. Figure 5 shows how the loss and accuracy vary from epoch to epoch. Following the loss and accuracy as training advances, and noting when these metrics level off, is crucial for understanding the forward and backward propagation performed each time the complete dataset passes through the neural network, and this scalar graph makes overfitting detectable. In this figure, the loss and accuracy curves of the training and validation samples converge, which indicates that the network proposed in this study does not suffer from overfitting.
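The argument above reads the convergence of the training and validation curves as evidence against overfitting. One common heuristic for making that check explicit, sketched below, flags overfitting when validation loss has stayed above its best value for several epochs while training loss keeps falling. The `patience` and `tol` thresholds are illustrative assumptions, and the generalization gap is computed from the final accuracies in Table 5.

```python
from typing import Sequence


def overfitting_flag(train_loss: Sequence[float], val_loss: Sequence[float],
                     patience: int = 5, tol: float = 1e-3) -> bool:
    """Heuristic overfitting check on per-epoch loss curves (thresholds are assumptions).

    Returns True when validation loss has been above its best value (by more
    than tol) for at least `patience` epochs while training loss kept falling
    over the same window.
    """
    if len(val_loss) <= patience:
        return False
    best_epoch = min(range(len(val_loss)), key=lambda e: val_loss[e])
    val_worse = (len(val_loss) - 1 - best_epoch >= patience
                 and val_loss[-1] > val_loss[best_epoch] + tol)
    window = train_loss[-(patience + 1):]
    train_improving = window[-1] < window[0] - tol
    return val_worse and train_improving


def generalization_gap(train_acc: float, val_acc: float) -> float:
    """Positive values mean the model fits the training data better than unseen data."""
    return train_acc - val_acc


# Final training and validation accuracies reported in Table 5.
print(round(generalization_gap(0.9787, 0.9843), 4))  # -> -0.0056
```

A negative gap, as here, means validation accuracy slightly exceeds training accuracy, which is consistent with the no-overfitting reading of Figure 5.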

Figure 5

Analysis of the CNN model.

7.6 NIST test results

This section presents the NIST test results for the CKKS algorithm on the CIFAR-10 dataset. For this test, the ciphertext data, which is in polynomial form, was converted to binary, and the NIST SP 800-22 statistical tests were then applied to the resulting bit sequence. The outcomes of this test are shown in Table 9.

Table 9

NIST test results for the CKKS algorithm

#	Test	P-value (CKKS)	Pass
1	Runs	0.683633	True
2	Serial	0.272892	True
3	Random excursion variant	0.821127	True
4	Random excursion	0.699500	True
5	Non-overlapping template matching	0.429871	True
6	Frequency Monobit	0.272104	True
7	Maurer’s universal statistical	0.713554	True
8	The longest run of ones in a block	0.899260	True
9	Linear complexity	0.952996	True
10	Frequency test within a block	0.917504	True
11	Discrete Fourier transform	0.888236	True
12	Cumulative sums	0.338377	True
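Two of the tests in Table 9 are short enough to sketch directly from the formulas in NIST SP 800-22: the frequency (monobit) test and the runs test, each passed when the P-value is at least the recommended significance level of 0.01. How the CKKS ciphertext polynomials were serialized to bits is not specified; `coefficients_to_bits`, which writes each coefficient as a fixed-width binary string, is a hypothetical stand-in.

```python
import math
from typing import List, Sequence

ALPHA = 0.01  # significance level recommended in NIST SP 800-22


def coefficients_to_bits(coeffs: Sequence[int], width: int) -> List[int]:
    """Hypothetical serialisation: each coefficient, assumed to lie in
    [0, 2**width), is written as a fixed-width big-endian bit string."""
    bits: List[int] = []
    for c in coeffs:
        if not 0 <= c < (1 << width):
            raise ValueError(f"coefficient {c} does not fit in {width} bits")
        bits.extend((c >> i) & 1 for i in range(width - 1, -1, -1))
    return bits


def monobit_test(bits: Sequence[int]) -> float:
    """Frequency (monobit) test: are ones and zeros roughly equally common?"""
    n = len(bits)
    s = sum(2 * b - 1 for b in bits)  # +1 for each one, -1 for each zero
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def runs_test(bits: Sequence[int]) -> float:
    """Runs test: is the number of uninterrupted runs of identical bits as expected?"""
    n = len(bits)
    pi = sum(bits) / n
    # Frequency prerequisite; a constant sequence also cannot be tested.
    if pi in (0.0, 1.0) or abs(pi - 0.5) >= 2 / math.sqrt(n):
        return 0.0
    v_obs = 1 + sum(1 for k in range(n - 1) if bits[k] != bits[k + 1])
    num = abs(v_obs - 2 * n * pi * (1 - pi))
    den = 2 * math.sqrt(2 * n) * pi * (1 - pi)
    return math.erfc(num / den)


# Worked examples from SP 800-22 (n = 10 is far too short for a real assessment).
print(round(monobit_test([1, 0, 1, 1, 0, 1, 0, 1, 0, 1]), 6))  # -> 0.527089
print(round(runs_test([1, 0, 0, 1, 1, 0, 1, 0, 1, 1]), 6))     # -> 0.147232
```

On real ciphertext, the bit sequence would be `coefficients_to_bits(...)` applied to the ciphertext polynomials and would be far longer; a test is passed when its P-value is at least `ALPHA`.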

7.7 Timing test

Table 10 shows the per-image encryption and decryption times of the CKKS scheme, the training and testing times of the CNN model, and the per-image retrieval time. All tests were carried out on a PC with a 2.8 GHz dual-core CPU and 8 GB of RAM, running Windows 8.

Table 10

Time results

CKKS encryption per image (s)	CKKS decryption per image (s)	CNN training (s)	CNN testing (s)	Retrieval per image (s)
22	19	36,800	2,670	0.03

8 Comparison with previous studies

A number of techniques have been proposed to improve image retrieval. Table 11 compares image classification accuracy on CIFAR-10, and Table 12 compares the retrieval performance of various research projects on CIFAR-10 in terms of mean average precision (mAP).

Table 11

Image classification accuracy in CIFAR-10

Method	Image classification accuracy (%)
Kuo et al. [22]	95.9
Huang et al. [23]	91.19
RCNN_CKKS proposed system	97.87

Table 12

Image retrieval mAP on CIFAR-10

Method	mAP
Kuo et al. [22]	0.707
Huang et al. [23]	0.867
Khan et al. [24]	0.913
RCNN_CKKS proposed system	0.956
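Mean average precision averages, over all queries, the precision measured at each rank where a relevant image is retrieved. The sketch below normalizes each query's score by the number of relevant images in its ranked list by default; some papers normalize by the total number of relevant images in the database instead, and which convention underlies each figure in Table 12 is not stated.

```python
from typing import Optional, Sequence


def average_precision(relevant: Sequence[bool], num_relevant: Optional[int] = None) -> float:
    """AP of one ranked result list (relevant[i] is True if rank i + 1 is relevant).

    By default the sum of precisions is divided by the number of relevant items
    in the list; pass num_relevant to divide by the total in the database instead.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    denominator = hits if num_relevant is None else num_relevant
    return precision_sum / denominator if denominator else 0.0


def mean_average_precision(result_lists: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precisions."""
    return sum(average_precision(r) for r in result_lists) / len(result_lists)


queries = [[True, False, True], [False, True]]
print(round(mean_average_precision(queries), 4))  # -> 0.6667
```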

Tables 11 and 12 show that the proposed system performs better than these methods. Section 7.5 used TensorBoard to analyze the proposed system, and the optimum CNN architecture and settings were selected on the basis of those findings. In addition, the RF approach selected features that are useful for measuring the distance between the stored feature vectors and the features of the input image.

9 Discussion

The results show that the CNN proposed in this study outperforms previous studies in both classification and retrieval. Because the attributes of the flatten layer depend on the weights learned during the training phase, this layer supplied effective features for retrieving similar images. The RF approach was used to select the features that give the best results in terms of accuracy and time.
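Selecting features with a random forest typically means ranking features by their importance scores and keeping the top k (600 of the flatten-layer features in this work). The sketch below shows only the ranking and projection steps in pure Python; obtaining the importances, for example from scikit-learn's `RandomForestClassifier.feature_importances_` after fitting on the training features, is an assumption and is not shown to be the authors' setup.

```python
from typing import List, Sequence


def select_top_k(importances: Sequence[float], k: int) -> List[int]:
    """Indices of the k most important features, returned in their original order."""
    # Highest importance first; ties broken by index for a deterministic result.
    ranked = sorted(range(len(importances)), key=lambda i: (-importances[i], i))
    return sorted(ranked[:k])


def project(features: Sequence[float], keep: Sequence[int]) -> List[float]:
    """Reduce a feature vector to the selected columns."""
    return [features[i] for i in keep]


# Assumed source of importances (not executed here), e.g. with scikit-learn:
#   RandomForestClassifier(n_estimators=100).fit(X_train, y_train).feature_importances_
# The reduction described in the text would then be select_top_k(importances, 600).
importances = [0.10, 0.50, 0.20, 0.15, 0.05]
keep = select_top_k(importances, 3)
print(keep, project([10.0, 11.0, 12.0, 13.0, 14.0], keep))  # -> [1, 2, 3] [11.0, 12.0, 13.0]
```

The same `keep` indices must be applied both to the stored feature vectors (offline) and to each query's features (online) so that the Hamming distances compare matching columns.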

Earlier studies did not address the security of the data transferred across the network. In this work, the proposed data augmentation approach increased the number of dataset samples, which counters overfitting and increases training accuracy. Because most applications that require image retrieval must preserve the privacy of the sending individual's data, a privacy-preserving protocol was designed.

10 Conclusions

A CBIR system that can semantically retrieve encrypted images with good retrieval performance is presented. Multiple algorithms were combined to produce the best results. The CKKS algorithm, one of the FHE techniques, was used for encryption. To develop the deep learning model, the best features were extracted using the CNN and RF algorithms, and the Hamming distance method was employed to measure the distance between the input and stored feature vectors. The proposed image retrieval method is based on a CNN and exploits the flatten layer, which extracts 2,028 image features encoded in a single feature vector. Following this layer, the RF technique is used for feature selection, producing 600 features that lead to greater accuracy than an earlier study. Using CKKS, which offers a high level of security, a secure protocol was created to protect the data transmitted over an unsecured connection between the client and the server. Increasing the number of training images with eight distinct augmentation techniques also mitigates overfitting and increases the accuracy of the model. CKKS is a powerful encryption technique, but it is slow, and the cipher images occupy more space than the plain images. Classification and retrieval accuracy reached 97.87% and 98.94%, respectively. The security of CKKS was also evaluated using the NIST test, which matters given the importance of guaranteeing the security of image retrieval applications.

The first contribution of this work is a trade-off between the accuracy of image retrieval and a secure environment for image transmission; building a CNN model with high classification accuracy helped extract the best features from the images. The second contribution, which leads to high classification accuracy, is expanding the dataset by applying a set of anticipated distortions to the images, as demonstrated above. The third and final contribution is using the RF technique to select the best features from the flatten layer.

Using these selected features to measure how far each stored image's features are from those of the input image improves both the accuracy and the speed of image retrieval. The approach has several practical applications, including secure search engines and the health field, where patient–hospital communication requires a secure environment. In future work, the ability of fully homomorphic encryption algorithms to perform addition and multiplication on encrypted data will be used to build a deep learning model that performs image retrieval directly on encrypted images.

Conflict of interest: Authors state no conflict of interest.

References

[1] Rout NK, Atulkar M, Ahirwal MK. A review on content-based image retrieval system: Present trends and future challenges. Int J Comput Vis Rob. 2021;11(5):461–85. 10.1504/IJCVR.2021.117578.

[2] Murala S, Maheshwari RP, Balasubramanian R. Local tetra patterns: A new feature descriptor for content-based image retrieval. IEEE Trans Image Process. 2012;21(5):2874–86. 10.1109/TIP.2012.2188809.

[3] Onoufriou G, Mayfield P, Leontidis G. Fully homomorphically encrypted deep learning as a service. Mach Learn Knowl Extr. 2021;3(4):819–34. 10.3390/make3040041.

[4] Denning DER. Cryptography and Data Security. Boston, MA, United States: Addison-Wesley Longman Publishing Co., Inc; 1982. http://portal.acm.org/citation.cfm?id=SERIES11430.539308.

[5] Yassein HR, Al-Saidi NMG, Farhan AK. A new NTRU cryptosystem outperforms three highly secured NTRU-analog systems through an innovational algebraic structure. J Discret Math Sci Cryptogr. 2020;25(June):523–42. 10.1080/09720529.2020.1741218.

[6] Armknecht F, Katzenbeisser S, Peter A. Group homomorphic encryption: Characterizations, impossibility results, and applications. Des Codes Cryptogr. 2013;67(2):209–32. 10.1007/s10623-011-9601-2.

[7] Gentry C. A Fully homomorphic encryption scheme. PhD [dissertation]. Stanford University; 2009 (September). http://cs.au.dk/∼stm/local-cache/gentry-thesis.pdf.

[8] Plantard T, Susilo W, Zhang Z. Fully homomorphic encryption using hidden ideal lattice. IEEE Trans Inf Forensics Secur. 2013;8(12):2127–37. 10.1109/TIFS.2013.2287732.

[9] Chung H, Kim M, Al Badawi A, Aung KMM, Veeravalli B. Homomorphic comparison for point numbers with user-controllable precision and its applications. Symmetry (Basel). 2020;12(5):1–22. 10.3390/SYM12050788.

[10] Pedrouzo-Ulloa A, Troncoso-Pastoriza JR, Gama N, Georgieva M, Pérez-González F. Revisiting multivariate ring learning with errors and its applications on lattice-based cryptography. Mathematics. 2021;9(8):1–42. 10.3390/math9080858.

[11] Liu J, Wang C, Tu Z, Wang XA, Lin C, Li Z. Secure KNN classification scheme based on homomorphic encryption for cyberspace. Secur Commun Netw. 2021;2021:1–12. 10.1155/2021/8759922.

[12] Boulemtafes A, Derhab A, Challal Y. A review of privacy-preserving techniques for deep learning. Neurocomputing. 2020;384(4):21–45. 10.1016/j.neucom.2019.11.041.

[13] Kadhim AF, Kamal ZA. Generating dynamic S-BOX based on particle swarm optimization and chaos theory for AES. Iraqi J Sci. 2018;59(3):1733–45. 10.24996/IJS.2018.59.3C.18.

[14] Xu Y, Zhao X, Gong J. A large-scale secure image retrieval method in cloud environment. IEEE Access. 2019;7:160082–90. 10.1109/ACCESS.2019.2951175.

[15] Namasudra S, Sharma P. Achieving a decentralized and secure cab sharing system using blockchain technology. IEEE Trans Intell Transp Syst. 2022;1–10. 10.1109/TITS.2022.3186361.

[16] Das S, Namasudra S. MACPABE: Multi authority-based CP-ABE with efficient attribute revocation for IoT-enabled healthcare infrastructure. Int J Netw Manag. 2022. 10.1002/NEM.2200.

[17] Manisha M, Raman B. Local neighborhood difference pattern: A new feature descriptor for natural and texture image retrieval. Multimed Tools Appl. 2018;77:11843–66. 10.1007/s11042-017-4834-3.

[18] Selvam S, Kannan ST. A new architecture for image retrieval optimization with HARP algorithm. Asian J Comput Sci Technol. 2017;6(1):1–5.

[19] Du A, Wang L, Cheng S, Ao N. A privacy-protected image retrieval scheme for fast and secure image search. Symmetry. 2020;12(2):1–17. 10.3390/sym12020282.

[20] Pinjarkar L, Sharma M, Selot S. Deep CNN combined with relevance feedback for trademark image retrieval. J Intell Syst. 2020;29(1):894–909. 10.1515/jisys-2018-0083.

[21] Lakhan A, Mohammed MA, Garcia-Zapirain B, Nedoma J, Martinek R, Tiwari P, et al. Fully homomorphic enabled secure task offloading and scheduling system for transport applications. IEEE Trans Vehicular Technol. 2022;71:12140–53. 10.1109/TVT.2022.3190490.

[22] Kuo CH, Chou YH, Chang PC. Using deep convolutional neural networks for image retrieval. In: Visual Information Processing and Communication. IS&T Int. Symp. Electron. Imaging Sci Technol; 2016. p. 1–6. 10.2352/ISSN.2470-1173.2016.2.VIPC-231.

[23] Huang HK, Chiu CF, Kuo CH, Wu YC, Chu NNY, Chang PC. Mixture of deep CNN-based ensemble model for image retrieval. 2016 IEEE 5th Glob Conf Consum Electron GCCE 2016. Vol. 2; 2016. p. 5–6. 10.1109/GCCE.2016.7800375.

[24] Khan UA, Javed A, Ashraf R. An effective hybrid framework for content based image retrieval (CBIR). Multimed Tools Appl. 2021;80(17):26911–37. 10.1007/s11042-021-10530-x.

[25] Ali F, Mohammed AH. Content based image retrieval (CBIR) by statistical methods. Baghdad Sci J. 2020;17:694–700. 10.21123/bsj.2020.17.2(SI).0694.

[26] Challa RK, Gunta VK. A modified symmetric key fully homomorphic encryption scheme based on Read-Muller Code. Baghdad Sci J. 2021;18(2):899–906. 10.21123/bsj.2021.18.2(Suppl.).0899.

[27] Syed D, Refaat SS, Bouhali O. Privacy preservation of data-driven models in smart grids using homomorphic encryption. Information. 2020;11(7):1–17. 10.3390/info11070357.

[28] Lou Q, Jiang L. SHE: A fast and accurate deep neural network for encrypted data. Adv Neural Inf Process Syst. 2019;32:1–9.

[29] Obla S, Gong X, Aloufi A, Hu P, Takabi D. Effective activation functions for homomorphic evaluation of deep neural networks. IEEE Access. 2020;8:153098–112. 10.1109/ACCESS.2020.3017436.

[30] Kwabena OA, Qin Z, Qin Z, Zhuang T. MSCryptoNet: Multi-scheme privacy-preserving deep learning in cloud computing. IEEE Access. 2019;7:29344–54. 10.1109/ACCESS.2019.2901219.

[31] Clet P-E, Stan O, Zuber M. BFV, CKKS, TFHE: Which One Is the Best for a Secure Neural Network Evaluation in the Cloud? Springer International Publishing; 2021. 10.1007/978-3-030-81645-2_16.

[32] Zhang Q, Zhang M, Chen T, Sun Z, Ma Y, Yu B. Recent advances in convolutional neural network acceleration. Neurocomputing. 2019;323:37–51. 10.1016/j.neucom.2018.09.038.

[33] Tzelepi M, Tefas A. Deep convolutional learning for Content Based Image Retrieval. Neurocomputing. 2018;275:2467–78. 10.1016/j.neucom.2017.11.022.

[34] Bologna G. A simple convolutional neural network with rule extraction. Appl Sci. 2019;9(12):2411. 10.3390/app9122411.

[35] Hussien ZK, Dhannoon BN. Anomaly detection approach based on deep neural network and dropout. Baghdad Sci J. 2020;17:701–9. 10.21123/bsj.2020.17.2(SI).0701.

[36] Will MA, Ko RKL. A guide to homomorphic encryption. Waltham, MA, USA: Elsevier Inc; 2015. 10.1016/B978-0-12-801595-7.00005-7.

[37] Shrestha R, Kim S. Integration of IoT with blockchain and homomorphic encryption: Challenging issues and opportunities. Adv Comput. 2019;115:293–331. 10.1016/bs.adcom.2019.06.002.

[38] Cheon JH, Kim A, Kim M, Song Y. Homomorphic encryption for arithmetic of approximate numbers. In: International conference on the theory and application of cryptology and information security. Springer; 2017. p. 409–37. 10.1007/978-3-319-78381-9_14.

Received: 2022-07-28

Revised: 2022-08-18

Accepted: 2022-09-05

Published Online: 2023-02-15

This work is licensed under the Creative Commons Attribution 4.0 International License.
