Efficient Search over Encrypted Medical Data with Known-Plaintext/Background Models and Unlinkability

In advanced health care systems, patients' medical data can be outsourced to cloud servers so that remote healthcare service providers can access and analyze the data from any location to provide better treatment. However, outsourcing sensitive medical data makes data owners, i.e., patients, concerned about their privacy because private companies run the cloud service and can access the data. Therefore, it is important to encrypt the data in the form of documents before outsourcing them to the cloud in a way that enables a data user, i.e., a doctor, to search over these documents without allowing the cloud provider to learn any private information about patients. Several schemes have been proposed to enable search over encrypted medical cloud data while preserving patient privacy, but the existing schemes suffer from high communication/computation overhead because they are designed for a single-data-owner setting. Moreover, they are not secure against known-plaintext/background and linkability attacks, and they do not allow doctors to customize their search to avoid downloading irrelevant documents. In this paper, we develop an efficient search scheme over encrypted data for the multi-data-owner setting. To secure our scheme, the cloud server obtains only noisy similarity scores, and doctors de-noise them to download the most relevant documents. Our scheme enables doctors to prescribe search conditions to customize the search without revealing the conditions to the server. Our formal proof and analysis indicate that our scheme preserves privacy and is secure against known-plaintext/background and linkability attacks, and the results of extensive experiments demonstrate the efficiency of our scheme compared to existing works.


I. INTRODUCTION
Due to the capability of cloud computing to store large-scale databases [1], patients' medical data can be outsourced to cloud servers through high-speed cellular networks, e.g., 5G networks and beyond [2], [3]. The cloud enables remote healthcare service providers to access patients' data from any location and to analyze this data using data mining [4] and machine learning [5] techniques for providing better treatment [6], [7].
Well-known examples of cloud-based health systems are the national e-health infrastructures in Finland and Croatia [8]. Also, the USA is widely implementing cloud-based health services, and the market cap is expected to exceed $40 billion by 2026 [9]. However, outsourcing sensitive medical data makes data owners, i.e., patients, concerned about their privacy because private companies run the cloud service and can access the data. For instance, over 113 million clinical records were hacked in the US in 2015 [10].
Therefore, it is essential to encrypt the data in the form of documents before outsourcing them to the cloud in a way that enables a data user, i.e., a doctor, to search over these documents without allowing the cloud provider to learn any private information about patients. To enable doctors to download documents of interest without revealing any information to the server, several schemes have been developed for searching over encrypted data [11]-[15]. The idea is that patients attach to each document an encrypted vector (called an index) for the keywords of the document. Then, a doctor encrypts a vector (called a trapdoor) that contains the keywords of the documents he/she wants to download and sends it to the cloud server. The server can compute the similarity score of an index and a trapdoor without being able to learn their keywords, and it returns the relevant documents to the doctor.
Motivations. The existing schemes suffer from several limitations.
Firstly, these schemes suffer from high communication/computation overhead and need a large number of keys because they are designed for a single-data-owner setting (one patient and multiple doctors). In medical applications, a multi-data-owner setting (multiple patients and multiple doctors) is more appropriate because a doctor treats several patients and thus should be able to search the documents of these patients efficiently. In the existing schemes, a doctor needs a unique key for each patient to be able to search his/her documents, which makes key management at the doctor side inefficient.
Secondly, in the existing schemes, doctors cannot customize their search scope to download only the documents that satisfy certain search conditions, which may result in downloading irrelevant documents and thus wasting communication and computation resources. An example of a search condition is laboratory reports with a certain issuance date.
Thirdly, the existing schemes are vulnerable to known plaintext/background attacks and linkability. In the known plaintext attack, an adversary can decrypt encrypted data (indices and trapdoors) if he possesses a set of plaintext/ciphertext pairs. In the known background attacks, an adversary uses background (or statistical) information, such as the frequency of keywords, to infer the keywords of the documents by analyzing the frequency of downloading these documents, which may reveal sensitive information on the patients' health condition. The existing schemes also suffer from linkability attacks in which the server can link the trapdoors (or indices) that have the same keywords. The existing schemes try to thwart this attack by using random numbers in the encryption so that two trapdoors having the same keywords look different, but this is not enough because the server can link the trapdoors by observing that they give the same scores when they are matched against all the documents.
Contributions. To address the aforementioned limitations, we propose EPSM: an Efficient and Privacy-preserving Search over Medical cloud data with known plaintext/background and unlinkability security. We provide a formal proof and privacy analysis for EPSM to prove that our scheme is secure and can preserve the privacy of the patients. Moreover, we conduct extensive experiments to evaluate the performance of our scheme and compare it to the existing works. Specifically, the main contributions of this paper are listed as follows:
• EPSM enables customized search in a multi-data-owner and multi-data-user setting so that doctors can prescribe search conditions in trapdoors to limit the search scope to the documents that satisfy the conditions, without revealing the conditions to the server. In EPSM, the cloud server computes noisy similarity scores for indices and trapdoors, and doctors de-noise them to download the most relevant documents. Moreover, unlike the existing schemes, EPSM allows each doctor to use only one key to search the data of all patients he treats.
• Our security analysis proves that EPSM is secure under known plaintext/background models, and the cloud server cannot link two trapdoors (or indices) that have the same keywords.
• Extensive experiments are conducted, and the results indicate that EPSM requires low overhead compared to the existing schemes.
The remainder of this paper is organized as follows. Section II reviews the related works. The network and threat models and design goals are presented in Section III. In Section IV, the proposed EPSM is explained in detail. Then, we analyze the security and privacy of EPSM in Section V. In Section VI, we present the performance evaluation of EPSM. Finally, conclusions are drawn in Section VII.

II. RELATED WORK
In this section, we review the related works and compare them to EPSM.
Song et al. [18] and Boneh et al. [19] have proposed secure searchable symmetric encryption (SSE) schemes based on k nearest neighbour (kNN) technique. However, these schemes are designed to support single keyword search over encrypted data, which is very restrictive because searching documents needs multiple keywords to give accurate results. The schemes also suffer from high computation/communication overheads.
Wang et al. [20] have proposed a ranked search scheme. In this scheme, the cloud server executes the search process and sends back only the most relevant documents to the user. However, this scheme only considers single-keyword search. Then, Cao et al. [11] have proposed a privacy-preserving multi-keyword ranked search scheme. This scheme has the limitation that it does not consider the keyword frequency, which may result in inaccurate search results. Xia et al. [16] have proposed a searchable encryption scheme for the single-data-owner and multi-data-user setting. The scheme assumes that the server knows the term frequency of each keyword and uses this background information to guess the keywords of a trapdoor and an index from the similarity score it computes. In order to thwart this attack, the server ranks the documents using inaccurate similarity scores, but this leads to inaccurate search results and to downloading irrelevant documents, which may cause misdiagnosis by doctors. Also, this scheme is designed for the single-data-owner setting, which is not suitable for medical applications where a doctor typically treats several patients, and it is inefficient to use single-data-owner schemes in a multiple-data-owner setting, as explained in Section VI.
Xiangya et al. [17] have proposed a privacy-preserving keyword search scheme for the single-data-owner and single-data-user setting. This setting is not suitable for medical applications that have multiple patients and multiple doctors. To secure the scheme against the known plaintext model, the server ranks the documents using inaccurate similarity scores, which may result in downloading irrelevant documents. Also, there is a tradeoff between accuracy and security: higher security is achieved by increasing the inaccuracy of the similarity scores, which makes downloading wrong documents more likely.
Zhang et al. [14] have proposed a scheme for multi-keyword ranked search. The scheme uses an additive order function to retrieve the relevant search results. After receiving a trapdoor from a search user, the cloud server compares each encrypted keyword in the trapdoor with all the keywords of each data owner. Then, the cloud server adds up each document's scores for all the matched keywords. However, because it compares the individual keywords in the trapdoor with all the keywords, this scheme requires high computation overhead. In [12], Li et al. have proposed a searchable encryption scheme over medical cloud data. To prevent linking the indices/trapdoors that have the same keywords, the scheme uses random numbers in the encryption so that they look different even if they have the same keywords. However, the scheme is designed for a single-data-owner setting, and the cloud server can link trapdoors (or indices) having the same keywords by observing that they give exactly the same scores when they are matched against the documents' indices (or doctors' trapdoors).
We provide Table 1 to summarize the differences between EPSM and the aforementioned schemes. Unlike the existing schemes, EPSM supports multi-data-owner and multi-data-user settings. Also, EPSM provides a customized search feature that allows doctors to customize their search results. EPSM ensures the unlinkability of indices/trapdoors having the same keywords and ensures that the indices (or trapdoors) computed by a patient (or a doctor) cannot be decrypted by other patients (or doctors). Our scheme is secure against the known plaintext and known background models.

III. SYSTEM MODELS AND DESIGN GOALS
In this section, we present the network and threat models and design goals considered in this paper.
A. NETWORK MODEL
• Offline key distribution center (KDC): The KDC is an offline entity that is not involved in the searching process. It computes and distributes the data owners' and data users' keys. The KDC can be run by the health department that is interested in the security of the system.
• Data owners (DO): The data owner is either a patient or a hospital, and it manages the patient's medical records. For each document, DO outsources to the cloud server an encrypted document, an encrypted vector containing the keywords of the document (called index), and an encrypted random number used in the index to mask the similarity scores.
• Data users (DU): Data users include doctors, nurses, pharmacists, researchers, etc. Each data user sends an encrypted query (called trapdoor) containing the keywords of the documents he wants to download from the cloud server. The data user receives the documents' noisy similarity scores, de-noises the scores, and sends the identifiers of the documents with the highest similarity scores to the cloud server to download them.
• Cloud server (CS): After receiving a trapdoor, the cloud server computes the noisy similarity scores of the trapdoor and the index of each document (that achieves the search conditions) and returns to the user the noisy scores. Then, after receiving the identifiers of the documents requested by the data user, the cloud server sends the documents.
In the rest of the paper, for simplicity, we will refer to DO and DU as patients and doctors, respectively.

B. THREAT MODEL
In EPSM, the attackers can be the cloud server and eavesdroppers. The cloud server is honest-but-curious: it follows our scheme correctly, but it is curious to infer sensitive information, such as the health condition of the patients, by analyzing the data it receives [16], [20]-[26]. Eavesdroppers can capture all the communications in the system and analyze them to infer sensitive information. The server should not be able to infer the keywords of the indices and the trapdoors, or link two given trapdoors (or indices) if they have the same keywords or are sent from the same doctor. Moreover, EPSM should also be secure against the following attack models.
1) Known ciphertext model. In this model, the adversary only knows the encrypted indices and trapdoors [16], [27].
2) Known plaintext model. In this stronger model, the adversary has a set of tuples of indices (or trapdoors) and their corresponding plaintext keyword vectors. Using these plaintext-ciphertext pairs, the adversary may try to infer the keywords or the search conditions of other indices and trapdoors [24], [28].
3) Known background model. In this model, the adversary possesses statistical information, such as the frequency of some keywords (or search conditions), i.e., the probability of querying documents with certain keywords. Using this information, the adversary tries to identify the keywords and the search conditions of the indices/trapdoors [16], [29].

C. DESIGN GOALS
To enable efficient and privacy-preserving search, EPSM should achieve the following design goals.
(1) Customized Search. EPSM should enable doctors to prescribe conditions in trapdoors so that the server returns only the documents that can satisfy these conditions without being able to learn the conditions.
(2) Security and Privacy Preservation. EPSM should prevent the cloud server from inferring any information about the content of documents, indices, and trapdoors. EPSM should also be secure against the known plaintext and known background models so that the server cannot identify the keywords or the search conditions of given indices/trapdoors. Also, the trapdoors (and indices) that have the same keywords and conditions or are sent from the same doctor should not be linkable. Eavesdroppers should not be able to infer any sensitive information.
(3) Scalability and Efficiency. EPSM should efficiently support search for a large number of patients/doctors with a small number of keys for efficient key management. It should also need low search time and computation/communication overhead.

IV. PROPOSED SYSTEM
EPSM consists of four phases. In the system initialization phase, the KDC generates and distributes secret keys to patients and doctors. In the index generation phase, for each document, the patient composes the corresponding index, encrypts the random number used to mask the similarity score, and outsources both to the cloud. In the trapdoor generation phase, the doctor encrypts a vector containing the keywords and search conditions of the documents he wants to download, and sends the ciphertext, called a trapdoor, to the cloud server. Finally, in the query matching phase, the server calculates the noisy similarity scores of the trapdoor and the indices of the documents that satisfy the search conditions. Then, it returns the noisy scores to the doctor, who de-noises them and sends to the server the identifiers of the documents he wants to download, i.e., the documents that have the highest scores. The cloud server then returns these documents to the doctor. Table 2 gives the main notations used in the paper. Figure 2 shows an overview of EPSM.

A. SYSTEM INITIALIZATION
The KDC runs the following algorithms to compute the secret keys of the patients and the doctors.
This algorithm takes the security parameter $1^m$ as an input and outputs two keys $SK_1$ and $SK_2$. The first key is
$$SK_1 = \{S, M_1, M_2, N_1, \ldots, N_8\},$$
where $S$ is a random binary vector of length $(m + e + 2)$, and $\{M_1, M_2, N_1, \ldots, N_8\}$ is a set of random invertible matrices of size $(m + e + 2) \times (m + e + 2)$, where $m$ and $e$ are the sizes of the keywords and search conditions, respectively. The second key is
$$SK_2 = \{J, V_1, V_2, U_1, \ldots, U_8\},$$
where $J$ is a random binary vector of length $n$ and $\{V_1, V_2, U_1, \ldots, U_8\}$ is a set of random invertible matrices of size $n \times n$, where $n$ is the bit length of the random number the patient uses to mask the similarity score.
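As an illustration of the key material described above, the following minimal sketch draws a binary split vector and ten random invertible matrices with NumPy. The function names, the Gaussian entries, and the conditioning threshold are assumptions made for this sketch only; the text does not specify how the KDC samples the matrices.

```python
import numpy as np

def random_invertible(size, rng):
    """Draw random matrices until one is well-conditioned enough to invert."""
    while True:
        mat = rng.standard_normal((size, size))
        if np.linalg.cond(mat) < 1e8:   # assumed threshold, not from the paper
            return mat

def generate_master_key(dim, rng):
    """Return (binary split vector, [M1, M2], [N1..N8]) for vectors of length dim."""
    split_vector = rng.integers(0, 2, size=dim)
    m_matrices = [random_invertible(dim, rng) for _ in range(2)]
    n_matrices = [random_invertible(dim, rng) for _ in range(8)]
    return split_vector, m_matrices, n_matrices

rng = np.random.default_rng(0)
m, e = 5, 3                      # toy sizes for keywords and search conditions
S, M, N = generate_master_key(m + e + 2, rng)
```

The second key can be sketched the same way with dimension $n$ in place of $m + e + 2$.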
For each patient $P_i$, this algorithm outputs two secret keys $SK^1_{P_i}$ and $SK^2_{P_i}$. $SK^1_{P_i}$ is used to encrypt the keyword vectors to calculate the indices, and it is computed as follows:

$SK^2_{P_i}$ is used to encrypt the random number $P_i$ uses to mask the similarity score, and it is computed as follows:

For each doctor $D_x$, this algorithm outputs two secret keys $SK^1_{D_x}$ and $SK^2_{D_x}$. $SK^1_{D_x}$ is used to encrypt the vectors of keywords to compose trapdoors, and it is computed as follows:

$SK^2_{D_x}$ is used to decrypt the random numbers of the patients to de-noise the similarity scores, and it is computed as follows:

Finally, the KDC sends $SK^1_{D_x}$ and $SK^2_{D_x}$ to the doctor.

B. INDEX GENERATION
To outsource a document, a patient $P_i$ computes an index and an encrypted random number and sends them to the cloud server. To do so, the patient executes the following algorithm.
CreateIndex$(SK^1_{P_i}, SK^2_{P_i}, V_{i,j}, a_{i,j}) \rightarrow I_{V_{i,j}}, I_{a_{i,j}}$: This algorithm takes as input the patient's secret keys $SK^1_{P_i}$ and $SK^2_{P_i}$, a keyword vector $V_{i,j}$ corresponding to the document, and a random number $a_{i,j}$, and outputs the index of the document ($I_{V_{i,j}}$) and the encrypted random number ($I_{a_{i,j}}$).
For a document $j$, $P_i$ chooses a keyword set $\{w_{i,j,1}, w_{i,j,2}, \ldots\}$ to generate an $m$-element keyword vector $V_{i,j}$. Every element in $V_{i,j}$ contains the TF-IDF (Term Frequency-Inverse Document Frequency) relevance score [30], [31], which represents the significance of keyword $w_{i,j,k}$ within the whole document collection, and it is computed as follows.
where $freq_{w_{i,j,k}, d_{i,j}}$ is the frequency of the keyword $w_{i,j,k}$, $N$ represents the total number of keywords in the documents set, and $n_{w_{i,j,k}}$ is the total number of documents the keyword appears in. Then, $P_i$ chooses a random number $a_{i,j}$ for the $(m + 1)$-th element in the vector $V_{i,j}$. After that, $P_i$ extends $V_{i,j}$ to an $(m + e + 2)$-element vector $V_{i,j} \| EF_{i,j}$, where $EF_{i,j}$ has $(e + 1)$ elements for the search conditions.
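To make the TF-IDF relevance score concrete, the sketch below fills a small keyword vector. The specific weighting $(1 + \ln f) \cdot \ln(1 + N / n_w)$ is one common TF-IDF variant and is an assumption of this sketch, as are the function names, the toy dictionary, and the choice of taking the collection size as the number of documents; it is not claimed to be the exact formula used by EPSM.

```python
import math

def tf_idf(freq, collection_size, docs_with_keyword):
    """One common TF-IDF variant (assumed here): (1 + ln f) * ln(1 + N / n_w)."""
    if freq == 0 or docs_with_keyword == 0:
        return 0.0
    return (1.0 + math.log(freq)) * math.log(1.0 + collection_size / docs_with_keyword)

def keyword_vector(doc_tokens, dictionary, collection_size, doc_counts):
    """Build the m-element vector of relevance scores for one document."""
    return [
        tf_idf(doc_tokens.count(w), collection_size, doc_counts.get(w, 0))
        for w in dictionary
    ]

dictionary = ["diabetes", "insulin", "fracture"]
doc_counts = {"diabetes": 2, "insulin": 1, "fracture": 4}  # documents containing each keyword
vec = keyword_vector(["insulin", "diabetes", "insulin"], dictionary, 4, doc_counts)
```

A keyword that is absent from the document gets a zero entry, and a keyword that is frequent in the document but rare in the collection gets the largest score.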

VOLUME , 2021
For example, assuming that there is one condition on the issuance year, an example of the extended vector $V_{i,j}$ is shown in Fig. 3. The figure shows that the element that represents the issuance year stores one (2021 in the figure) and the other elements store zeros. $e$ elements are used to represent the years, and one element always stores one. For simplicity, the figure shows the vector with one condition, but the idea can be extended to include multiple conditions. Then, in order to encrypt $V_{i,j}$, $P_i$ first splits it into two column vectors $v'_{i,j}$ and $v''_{i,j}$ using the secret $S$. For every element in $V_{i,j}$, $P_i$ checks the value of the corresponding element in $S$. If it is zero, $P_i$ sets the corresponding elements in $v'_{i,j}$ and $v''_{i,j}$ to the value of the element in $V_{i,j}$. Otherwise, $P_i$ chooses two random numbers for this element in $v'_{i,j}$ and $v''_{i,j}$ whose sum is equal to the value of the corresponding element in $V_{i,j}$.
Then, to encrypt the random number $a_{i,j}$, $P_i$ first splits it into two column vectors $a'_{i,j}$ and $a''_{i,j}$ using the secret $J$. For every element in $a_{i,j}$, $P_i$ checks the corresponding element in $J$. If it is zero, the corresponding elements in $a'_{i,j}$ and $a''_{i,j}$ are set to the value of the element of $a_{i,j}$. Otherwise, two random numbers are chosen for this element in $a'_{i,j}$ and $a''_{i,j}$ whose sum is equal to the corresponding element in $a_{i,j}$. Finally, the encryption of $a_{i,j}$ ($I_{a_{i,j}}$) is computed using $SK^2_{P_i}$ as follows.
where $I_{a_{i,j}}$ is a column vector of size $8n$. Finally, for each document, $P_i$ sends to the cloud server the corresponding index $I_{V_{i,j}}$ and the encryption of $a_{i,j}$ ($I_{a_{i,j}}$).

Algorithm 1: Query matching at the cloud server
  for each index $I_{V_{i,j}}$ do
    Score$(Q_{x,y} \cdot V_{i,j}) \leftarrow$ Match$(I_{Q_{x,y}}, I_{V_{i,j}})$  // Score$(Q_{x,y} \cdot V_{i,j})$ is the dot-product noisy score of $Q_{x,y}$ and $V_{i,j}$
    if Score$(Q_{x,y} \cdot V_{i,j}) \geq$ maxscore then
      ignore this index and continue
    else
      $\Lambda \leftarrow$ Scorelist.Append$(\Lambda,$ Score$(Q_{x,y} \cdot V_{i,j}))$
    end
  end
  Output: Send $\Lambda$ and the corresponding $I_{a_{i,j}}$ of each document to $D_x$
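The element-wise splitting used on the patient side can be sketched as follows. This is only an illustration of the splitting rule stated above (copy where the secret bit is zero, random additive shares where it is one); the function names and the range of the random shares are assumptions, and the subsequent matrix encryption is not shown.

```python
import random

def split_index_vector(vector, secret_bits, rng):
    """Split an index-side vector into two shares under a secret bit vector.

    Where the secret bit is 0 both shares copy the element; where it is 1
    the shares are random numbers that sum to the element.
    """
    share_a, share_b = [], []
    for value, bit in zip(vector, secret_bits):
        if bit == 0:
            share_a.append(value)
            share_b.append(value)
        else:
            r = rng.uniform(-10.0, 10.0)   # assumed range for the random share
            share_a.append(r)
            share_b.append(value - r)
    return share_a, share_b

def recombine(share_a, share_b, secret_bits):
    """Invert the split: copy where the bit is 0, add the shares where it is 1."""
    return [a if bit == 0 else a + b for a, b, bit in zip(share_a, share_b, secret_bits)]

rng = random.Random(7)
S = [0, 1, 1, 0, 1]
V = [0.5, 1.2, 0.0, 2.0, 1.0]
v1, v2 = split_index_vector(V, S, rng)
restored = recombine(v1, v2, S)
```

Without the secret bit vector, an observer of the two shares cannot tell which positions were copied and which were randomized once the shares are further multiplied by secret matrices.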

C. TRAPDOOR GENERATION
In this phase, to search for documents of interest, a doctor composes a query ($Q_{x,y}$) containing the keywords of interest and search conditions, and then uses the following algorithm to encrypt it and obtain the trapdoor $I_{Q_{x,y}}$.
CreateTrapdoor$(SK^1_{D_x}, Q_{x,y}) \rightarrow I_{Q_{x,y}}$: This algorithm takes the doctor's secret key $SK^1_{D_x}$ and the query vector $Q_{x,y}$ as input, and computes the trapdoor $I_{Q_{x,y}}$.
Firstly, the doctor $D_x$ composes the $(m + e + 2)$-element query vector $Q_{x,y}$. The first $m$ elements contain the keywords of interest, where each element stores one or zero to indicate whether or not the corresponding keyword exists in the documents of interest. Specifically, $Q_{x,y}[k] = 1$ if the doctor is interested in keyword $k$, and $Q_{x,y}[k] = 0$ if the doctor is not interested in the keyword. Then, a random number $b_{x,y}$ is selected for the $(m + 1)$-th element. After that, $D_x$ uses the following $e + 1$ elements to prescribe the search conditions as follows.
where $c$ is a random number that is greater than the maximum noisy similarity score, $\bar{F} \subset [m + 2, m + e + 1]$ is the set of the element positions of the document issuance years that the doctor wants to search, and $g$ is the length of $\bar{F}$. For example, if the doctor wants to search for and download the documents issued in 2021, as shown in Fig. 3c, he/she stores $-c$ in the element corresponding to 2021, $c$ in the last element, and zeros in the other elements. Moreover, as illustrated in Fig. 3b, if the document is issued in 2021, the patient stores one in the element corresponding to 2021 and in the last element, and zero in the other elements of the index vector. By doing so, if the condition is satisfied, the dot product of the condition elements in the index and the trapdoor is equal to zero. Otherwise, it is $c$, whose value is greater than the maximum noisy similarity score. So, if the noisy similarity score obtained by the cloud server is greater than the maximum score, this indicates that the document does not satisfy the search conditions; otherwise, all the conditions are satisfied. For simplicity, Fig. 3 shows only one condition, but it can be extended to add additional conditions.
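The single-condition example of Fig. 3 can be checked with a few lines of code. The sketch below covers only that example (one wanted issuance year); the function names, the list of years, and the value of `c` are hypothetical, and the general equation involving $g$ is not reproduced.

```python
def condition_slots_index(years, issue_year):
    """Index side: one-hot issuance year followed by a constant 1 (e + 1 elements)."""
    return [1 if y == issue_year else 0 for y in years] + [1]

def condition_slots_query(years, wanted_year, c):
    """Query side: -c at the wanted year, c in the last element, zeros elsewhere."""
    return [-c if y == wanted_year else 0 for y in years] + [c]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

years = [2019, 2020, 2021]
c = 1000                                    # assumed to exceed any noisy score
query = condition_slots_query(years, 2021, c)
match = dot(condition_slots_index(years, 2021), query)     # condition satisfied
mismatch = dot(condition_slots_index(years, 2019), query)  # condition violated
```

A document issued in the wanted year contributes zero to the score, while any other document contributes `c`, which pushes its noisy score above the maximum and lets the server discard it.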
To encrypt the query vector $Q_{x,y}$ and obtain the trapdoor $I_{Q_{x,y}}$, $Q_{x,y}$ is first split into two row vectors $q'_{x,y}$ and $q''_{x,y}$ using the secret $S$, as follows. For every element in $Q_{x,y}$, $D_x$ checks the corresponding element in $S$. If it is one, the corresponding elements in $q'_{x,y}$ and $q''_{x,y}$ are set to the value of the element of $Q_{x,y}$. Otherwise, two random numbers are chosen for this element in $q'_{x,y}$ and $q''_{x,y}$ whose sum is equal to the corresponding element in $Q_{x,y}$. Finally, the trapdoor $I_{Q_{x,y}}$ is computed using $SK^1_{D_x}$ as follows.
where $I_{Q_{x,y}}$ is an $8(m + e + 2)$-element row vector. Finally, $D_x$ sends the trapdoor $I_{Q_{x,y}}$ to the cloud server.
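The reason complementary splitting preserves the dot product can be illustrated with the basic secure-kNN style construction below. This is a deliberately simplified sketch with a single shared key (one split vector and two matrices), not EPSM's per-patient and per-doctor keys built from $N_1, \ldots, N_8$; all names and the toy dimension are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 6
S = rng.integers(0, 2, size=dim)            # secret split vector
M1 = rng.standard_normal((dim, dim))        # secret matrices (invertible with probability 1)
M2 = rng.standard_normal((dim, dim))

def split(vec, split_when):
    """Randomly split elements where S == split_when; copy the others."""
    noise = rng.standard_normal(dim)
    mask = (S == split_when)
    first = np.where(mask, noise, vec)
    second = np.where(mask, vec - noise, vec)
    return first, second

def encrypt_index(v):
    v1, v2 = split(v, split_when=1)         # index side splits where S is 1
    return np.concatenate([M1.T @ v1, M2.T @ v2])

def encrypt_query(q):
    q1, q2 = split(q, split_when=0)         # query side splits where S is 0
    return np.concatenate([np.linalg.inv(M1) @ q1, np.linalg.inv(M2) @ q2])

v = rng.random(dim)
q = rng.integers(0, 2, size=dim).astype(float)
score = encrypt_index(v) @ encrypt_query(q)
```

At every position exactly one side is split and the other is copied, so the two partial products add up to the plaintext product, and the matrices cancel in the dot product; `score` therefore equals `v @ q` up to floating-point error.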

D. QUERY MATCHING
In this phase, the cloud server computes the noisy similarity score of the trapdoor and the index of each document that achieves the search conditions without being able to learn the real score. Then, the server sends to the doctor the noisy scores and the encryptions of the random numbers the patients used to mask the scores as indicated in Algorithm 1. After that, the doctor de-noises the scores and sends to the cloud server the identifiers of the documents he wants to download. These documents include the ones that have high similarity scores in addition to redundant documents that are downloaded to protect against known-background attacks by preventing the server from learning the documents of interest and guessing the keywords of these documents. Finally, the cloud server returns to the doctor the documents he requested. The following algorithms are used in this phase.
Match$(I_{Q_{x,y}}, I_{V_{i,j}}) \rightarrow NoisyScore$: This algorithm takes a trapdoor $I_{Q_{x,y}}$ and an index $I_{V_{i,j}}$ as input, and produces the noisy similarity score of $Q_{x,y}$ and $V_{i,j}$ by computing the dot product $(I_{Q_{x,y}} \cdot I_{V_{i,j}})$.
Theorem IV.1. The server can obtain the noisy similarity score of indices and trapdoors using the dot product operation.
Proof.
If all the search conditions prescribed in the trapdoor are satisfied, $Q_{x,y} \cdot V_{i,j} = KeywordScore + a_{i,j} b_{x,y}$, which gives the noisy similarity score that is equal to the similarity score of the keywords part in vectors $V_{i,j}$ and $Q_{x,y}$ ($KeywordScore$) masked by the random number $a_{i,j} b_{x,y}$, where $a_{i,j}$ is added by the patient in the document index and $b_{x,y}$ is added by the doctor in the trapdoor. If at least one condition is not satisfied, $Q_{x,y} \cdot V_{i,j} = KeywordScore + a_{i,j} b_{x,y} + c$, and by selecting $c$ to be greater than the maximum noisy similarity score, the server can learn that the document does not satisfy at least one condition and that it should discard the document. Finally, for each document that satisfies the doctor's conditions, the cloud server returns to the doctor the noisy similarity score and the encryption of the random number $a_{i,j}$ ($I_{a_{i,j}}$) used by the patient to mask the similarity score.
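The decomposition of the plaintext dot product can be written out term by term. The expansion below assumes the vector layout described earlier (keywords in elements $1$ to $m$, the random numbers in element $m + 1$, and the condition elements in positions $m + 2$ to $m + e + 2$) and the single-condition example of Fig. 3; it is a worked illustration rather than the proof itself.

```latex
\begin{aligned}
Q_{x,y} \cdot V_{i,j}
  &= \underbrace{\sum_{k=1}^{m} Q_{x,y}[k]\, V_{i,j}[k]}_{KeywordScore}
   + \underbrace{b_{x,y}\, a_{i,j}}_{\text{element } m+1}
   + \underbrace{\sum_{k=m+2}^{m+e+2} Q_{x,y}[k]\, V_{i,j}[k]}_{\text{condition term}}, \\
\text{condition term}
  &= \begin{cases}
       (-c)(1) + (c)(1) = 0, & \text{if the issuance year matches}, \\
       (-c)(0) + (c)(1) = c, & \text{otherwise}.
     \end{cases}
\end{aligned}
```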
For each document, the doctor decrypts $I_{a_{i,j}}$ to obtain $a_{i,j}$ using the algorithm DecryptRandomNumber(). Then, using this random number and his trapdoor's random number $b_{x,y}$, the doctor de-noises the noisy scores (by subtracting $a_{i,j} b_{x,y}$) to obtain the real scores. Then, the doctor sends to the cloud server the identifiers of the documents he wants to download, i.e., the documents that have the highest scores. The doctor should also download redundant documents to protect against known-background attacks by preventing the server from learning the documents of interest and guessing the keywords of these documents. Finally, the cloud server returns these documents to the doctor.
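The doctor-side processing can be sketched as follows. The de-noising step (subtracting $a_{i,j} b_{x,y}$) follows the text; how many redundant documents to request and how to choose them is not specified, so the random choice of decoys, the function name, and the toy values are assumptions of this sketch.

```python
import random

def denoise_and_select(noisy_scores, masks, b, top_k, decoys, rng):
    """Subtract a*b from each noisy score, then pick top-k ids plus random decoys.

    noisy_scores: {doc_id: noisy score}; masks: {doc_id: decrypted a}.
    """
    real = {doc: s - masks[doc] * b for doc, s in noisy_scores.items()}
    ranked = sorted(real, key=real.get, reverse=True)
    wanted = ranked[:top_k]
    others = ranked[top_k:]
    extra = rng.sample(others, min(decoys, len(others)))
    request = wanted + extra
    rng.shuffle(request)                    # hide which ids are the real hits
    return wanted, request

b = 3.0
masks = {"d1": 2.0, "d2": 5.0, "d3": 1.0, "d4": 4.0}
true_scores = {"d1": 0.9, "d2": 0.1, "d3": 0.4, "d4": 0.7}
noisy = {d: true_scores[d] + masks[d] * b for d in masks}
wanted, request = denoise_and_select(noisy, masks, b, top_k=2, decoys=1, rng=random.Random(0))
```

In this toy run the noisy scores rank `d2` first although its real score is the lowest, which shows why the server cannot rank documents itself and why the doctor must de-noise before choosing.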

DecryptRandomNumber$(SK^2_{D_x}, I_{a_{i,j}}) \rightarrow a_{i,j}$: This algorithm takes the doctor's secret key $SK^2_{D_x}$ and the encrypted random number $I_{a_{i,j}}$, and outputs the random number $a_{i,j}$. The algorithm multiplies $SK^2_{D_x}$ by $I_{a_{i,j}}$ to obtain $a'_{i,j}$ and $a''_{i,j}$, and then the splitting vector $J$ is used to obtain $a_{i,j}$ as follows. For each element $k$, if $J[k] = 0$, then $a_{i,j}[k] = a'_{i,j}[k]$; otherwise, $a_{i,j}[k] = a'_{i,j}[k] + a''_{i,j}[k]$.
Theorem IV.2. The doctor can decrypt the encrypted random number $I_{a_{i,j}}$ by multiplying it by $SK^2_{D_x}$.
Proof.

V. SECURITY AND PRIVACY ANALYSIS
Our formal proof of the security/privacy-preservation of our scheme follows the logic and model presented in [32]. The goal of the proof is to prove that the cloud server can compute the noisy similarity score of an index and a trapdoor without revealing their keywords and search conditions. We will also prove that external attackers cannot reveal the keywords and search conditions. The server and external attackers also cannot learn the similarity scores of the indices and trapdoors.
Proposition 1. The cloud server can calculate the noisy similarity score of an index and a trapdoor without being able to learn the keywords or the search conditions.
Proof. History. The history consists of two sets: a set of $n$ indices corresponding to the documents of patients ($I_V = \{I_{V_{i,1}}, I_{V_{i,2}}, \ldots, I_{V_{i,n}}\}$ for each patient $P_i$, generated by encrypting a set of keyword vectors $V = \{V_{i,1}, V_{i,2}, \ldots, V_{i,n}\}$) and a set of $u$ trapdoors corresponding to the doctors' queries ($I_Q = \{I_{Q_{x,1}}, I_{Q_{x,2}}, \ldots, I_{Q_{x,u}}\}$ for each doctor $D_x$, generated by encrypting a set of query vectors $Q = \{Q_{x,1}, Q_{x,2}, \ldots, Q_{x,u}\}$).
Trace. A trace $Trace(H)$ represents the information of the history $H$ that is deduced by the cloud server, e.g., from the search patterns.
View. The view $W(I_V, I_Q, Trace(H))$ contains the encrypted history and its trace, and it is the observation of the server.
A simulator $S$ can produce a fake view $W'$ that is indistinguishable from the original view $W$ by executing these steps.
Step 1: $S$ generates the secret key $sk' = SK'$.
Step 2: $S$ generates a set of random documents $D' = \{d'_1, \ldots, d'_n\}$ such that $|d'_i| = |d_i|$, $1 \leq i \leq n$, $d'_i = \{w_1, w_2, \ldots\}$, where $|d_i|$ is the number of keywords in $d_i$.
Step 3: $S$ generates a set of queries $Q' = \{Q'_{x,1}, Q'_{x,2}, \ldots, Q'_{x,u}\}$, where $Q'$ is a random copy of $Q$.
Step 4: $S$ generates a set of keyword vectors $V'$ which is a random copy of $V$, where $V' = \{V'_{i,1}, V'_{i,2}, \ldots, V'_{i,n}\}$.
Step 5: $S$ generates indices $I'_V$ and trapdoors $I'_Q$ using the secret $sk'$.
From the previous construction, EPSM is indistinguishable and secure if $S$ has a trace $Trace(H')$ of the history $H' = (I'_V, I'_Q)$ that is similar to the original trace $Trace(H)$ such that no probabilistic polynomial-time adversary can differentiate between the original view $W(I_V, I_Q)$ and the fake view $W'(I'_V, I'_Q)$ with non-negligible advantage, where the correctness of the construction implies this conclusion.
Proposition 2. EPSM ensures that adversaries cannot reveal any keyword or search condition from trapdoors and/or indices, i.e., EPSM is secure in the known-ciphertext model.
Proof. In EPSM, the confidentiality of the indices and trapdoors is protected using encryption. For each patient/doctor, the matrix $M$ is split into two randomly chosen matrices which are multiplied by another matrix $N$, and thus no patient/doctor is able to reconstruct the matrix $M$. This is important because, by knowing $M$, adversaries can compute the keywords or the search conditions from the indices or trapdoors. This means that the keywords and the conditions are protected in the known-ciphertext model because no information about them can be leaked.

Preposition 3. EPSM ensures that the indices (or trapdoors) computed by a patient (or a doctor) cannot be decrypted by other patients (or doctors).
Proof. If all patients (or doctors) shared the same key, the indices (or trapdoors) computed by one patient (or doctor) could be decrypted by the other patients (or doctors), and patients' sensitive information, e.g., their health conditions, could be revealed to other patients. To avoid this problem, each patient/doctor in EPSM has a unique key. Although different keys are used to encrypt the indices/trapdoors, the cloud server is still able to compute the dot product of the keyword and query vectors and obtain the noisy similarity score.

Proof. Under the known-plaintext model, the adversary possesses a set of plaintexts (keyword vectors and queries) and their ciphertexts (indices and trapdoors). The adversary tries to use this set to attack the encryption scheme, e.g., by decrypting a new ciphertext. Most of the existing schemes are not secure under the known-plaintext model because the server can learn the similarity score by calculating the dot product of an index and a trapdoor. Therefore, if an index has n elements (i.e., n unknowns), the server needs n trapdoors with known plaintexts to create n linear equations and solve them for the n elements of the index. To protect against this attack, the server in our scheme never learns the similarity score. It only learns the noisy similarity score (real score + $a_{i,j} b_{x,y}$), where the random numbers $a_{i,j}$ and $b_{x,y}$ are known only to the patient and the doctor. Patients should use a different $a_{i,j}$ for each index and doctors should use a different $b_{x,y}$ for each trapdoor, so that $a_{i,j} b_{x,y}$ is always different even if the same query is used multiple times. If $a_{i,j}$ and $b_{x,y}$ were reused, the server could subtract two equations to cancel the term $a_{i,j} b_{x,y}$ and obtain the difference between the two scores, and could thus collect enough equations to recover the keyword vector of an index. By changing $a_{i,j} b_{x,y}$ continuously, each new equation introduces a new unknown, so the server never has enough equations to solve. For the same reasons, the server cannot create equations to decrypt trapdoors.
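The subtraction argument in the proof above can be illustrated with a minimal Python sketch. It models only the additive noise term on the similarity score and omits the actual encryption of the vectors. The function names and all numeric values are hypothetical and chosen for illustration; they are not taken from the EPSM construction.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def noisy_score(keyword_vec, query_vec, a, b):
    """What the server observes: the real score plus the noise term a*b."""
    return dot(keyword_vec, query_vec) + a * b

# Hypothetical values, for illustration only.
v = [1, 0, 1, 1]    # keyword vector of one index (unknown to the server)
q1 = [1, 1, 0, 1]   # two queries whose plaintexts the adversary knows
q2 = [0, 1, 1, 0]
a = 5               # patient's random number for this index

true_diff = dot(v, q1) - dot(v, q2)

# Case 1: the doctor reuses the same b in both trapdoors.
b_reused = 3
leaked_diff = noisy_score(v, q1, a, b_reused) - noisy_score(v, q2, a, b_reused)

# Case 2: the doctor draws a fresh b for every trapdoor.
b1, b2 = 3, 8
masked_diff = noisy_score(v, q1, a, b1) - noisy_score(v, q2, a, b2)

print(true_diff, leaked_diff, masked_diff)
```

With reused noise, the term a*b cancels and the server obtains a clean linear equation in the elements of v. With fresh noise, the difference still contains the unknown a*(b1 - b2), so every additional equation brings a new unknown.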
Proposition 5. EPSM ensures that the cloud server or an external eavesdropper cannot identify the keywords and the conditions of the documents/trapdoors under the known background model.
Proof. In known-background attacks, an adversary uses background (or statistical) information, such as the frequency of keywords, to infer the keywords of documents by analyzing how frequently these documents are downloaded, which may reveal sensitive information about the patients such as their diseases. To protect against this attack, the server in our scheme should not learn the real download frequency of documents. This is achieved by having doctors download redundant documents (that do not have the highest similarity scores). Because the similarity scores of the documents are hidden in our scheme, the server cannot identify these redundant documents.

Proposition 6. EPSM ensures unlinkability of indices/trapdoors sent from the same patient/doctor or having the same keywords and search conditions.
Proof. Existing schemes suffer from linkability attacks in which the server can link trapdoors (or indices) that have the same keywords. They try to thwart this attack by using random numbers in the encryption, so that two trapdoors (or indices) having the same keywords look different. However, this is not enough because the server can still link two trapdoors (or indices) by observing that they give exactly the same similarity scores when they are matched against a set of indices (or trapdoors). EPSM ensures that encrypted indices/trapdoors that have the same keywords or are sent from the same patients/doctors look different because random numbers are used in splitting the vectors $V_{i,j}$ and $Q_{x,y}$. Moreover, our scheme hides the similarity score from the server using the random numbers $a_{i,j}$ and $b_{x,y}$. Therefore, when the server computes the noisy similarity scores for two trapdoors with the same keywords, the scores look different because different $b_{x,y}$ values are used in the two trapdoors.

Figure 4: (a) Index/trapdoor generation. (b) Query matching. Vertical axis: computation time (ms); curves: [12], [16], and EPSM.
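The score-based linking discussed in the unlinkability proof can be sketched as follows. This is a simplified model, not the EPSM construction: it represents each index as a hypothetical pair (keyword vector, a) and each trapdoor as a query with its own b, and it leaves out the vector splitting and encryption. All values are illustrative assumptions.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def server_view(indices, query, b):
    """Noisy scores the server sees when one trapdoor is matched against all indices.

    indices: list of (keyword_vector, a) pairs, one per document.
    b: the random number embedded in this trapdoor.
    """
    return [dot(v, query) + a * b for v, a in indices]

# Hypothetical indices and query, for illustration only.
indices = [([1, 0, 1], 4), ([0, 1, 1], 9), ([1, 1, 0], 2)]
query = [1, 0, 1]

# The same query sent twice, each time with a different b.
view_first = server_view(indices, query, b=3)
view_second = server_view(indices, query, b=7)

# Without noise (b = 0), repeated queries give identical score vectors.
view_no_noise = server_view(indices, query, b=0)

print(view_first, view_second, view_no_noise)
```

Without the noise term, two trapdoors carrying the same keywords produce the same score vector and are trivially linkable. With a different b per trapdoor, the two score vectors observed by the server differ.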

VI. PERFORMANCE EVALUATION
In this section, we compare the performance of EPSM with the existing schemes.

1) Experiment Setup
To evaluate the communication and computation overheads of EPSM, we performed our experiments using Python running on an Intel® Core i7-8700 CPU @ 3.20 GHz with 16 GB RAM. The computation and communication overheads of EPSM are compared to those of the schemes proposed in [12] and [16] after applying them in a multi-data-owner setting. All the results presented in this section are averaged over 1,000 trials for 2,000 documents, 10 patients, 10 doctors, and 2 bytes for each element in the ciphertext vector.
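As an illustration of how a per-operation time averaged over many trials can be measured in Python, the following minimal sketch times a plain dot product. The helper `average_time_ms`, the vector length, and the pure-Python dot product are assumptions made for this example; they are a stand-in and not the measurement code used in the experiments.

```python
import random
import time

def dot(u, v):
    """Plain dot product, standing in for the server-side matching step."""
    return sum(x * y for x, y in zip(u, v))

def average_time_ms(fn, trials=1000):
    """Run fn() `trials` times and return the mean wall-clock time in milliseconds."""
    start = time.perf_counter()
    for _ in range(trials):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / trials

rng = random.Random(42)
n = 1000  # hypothetical vector length
index = [rng.random() for _ in range(n)]
trapdoor = [rng.random() for _ in range(n)]

avg_ms = average_time_ms(lambda: dot(index, trapdoor), trials=1000)
print(f"average time per dot product: {avg_ms:.4f} ms")
```

Timing the whole loop and dividing by the number of trials reduces the effect of timer resolution on very short operations.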

2) Performance Metrics
Three performance metrics are used for comparison and assessment of our scheme.
1) Computation overhead. The time needed by patients/doctors to generate the indices/trapdoors to be sent to the server, and the time needed by the cloud server to calculate the similarity scores when searching the documents.
2) Communication overhead. The amount of data transmitted between the patients/doctors and the server.
3) Key management. The number of keys a doctor needs to search all the documents of all patients.

Fig. 4a gives the computation overhead of generating indices/trapdoors versus the number of keywords. The figure shows that the computation overhead increases as the number of keywords increases because the matrices and vectors become larger. Because EPSM supports the multi-data-owner setting, each patient generates one index for each document and the doctor needs to generate only one trapdoor to search over the documents of all patients. Also, the same computation time is needed to generate the indices and the trapdoors because their vectors have the same size. It can also be seen from the figure that EPSM is more efficient than [12] and [16], because to use these schemes in a multi-data-owner setting, the doctor needs to compute one trapdoor for each patient to be able to search their documents.

Fig. 4b gives the time needed by the cloud server to calculate the similarity score versus the number of keywords. As shown in the figure, this time increases as the number of keywords increases because the vector size increases. The figure also shows that EPSM needs less time than [12] and [16]. Although the computation time of EPSM increases at a higher rate because it needs eight dot product operations to support the multi-data-owner setting, it remains low (on the order of milliseconds) even with a large number of keywords (2000).

Figure 5: Communication overhead. Curves: index in [12], index in [16], trapdoor in [12], trapdoor in [16], and index/trapdoor in EPSM.

2) Communication Overhead
In EPSM, each patient sends an index ($I_{V_{i,j}}$) for each document. The overhead is $|I_{V_{i,j}}|$, the size of the index. If each element in the ciphertext is represented by 2 bytes, the ciphertext size in our scheme is 16(m + e + 2) bytes. Similarly, the trapdoor size is 16(m + e + 2) bytes. Fig. 5 gives the index/trapdoor communication overhead versus the number of keywords. It can be seen that the communication overhead increases linearly with the number of keywords because the vector size increases. Moreover, the schemes in [16] and [12] incur more overhead than EPSM because they need to extend the vectors by the maximum possible number of documents before encrypting them, and because the doctor needs to send as many trapdoors as there are patients to search their documents, whereas in our scheme only one trapdoor is sent to search the data of all patients.

Fig. 6 gives the number of a doctor's keys versus the number of patients. As shown in the figure, in EPSM each doctor has only one key that is used to search all the documents of all patients. However, in [16] and [12], because the schemes are designed for a single-data-owner setting, each doctor needs to share a key with each patient, so the number of keys of a doctor increases linearly with the number of patients. In e-health applications, a doctor typically treats several patients, so the multi-data-owner setting is appropriate. Using many keys in the system makes key management inefficient.
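The size expression and the key counts discussed above can be checked with a short Python sketch. The function names are hypothetical. The parameters m and e are the vector-size parameters appearing in 16(m + e + 2), and the example values m = 1000 and e = 10 are assumptions chosen only for illustration. The factor 8 follows from dividing 16 bytes by the 2 bytes used per element.

```python
def epsm_ciphertext_bytes(m, e, bytes_per_element=2):
    """Size of one EPSM index or trapdoor.

    With 2 bytes per element this equals 16 * (m + e + 2), i.e. the
    ciphertext holds 8 * (m + e + 2) elements.
    """
    return 8 * (m + e + 2) * bytes_per_element

def doctor_keys(num_patients, multi_owner):
    """Keys a doctor holds: one in a multi-data-owner scheme such as EPSM,
    one per patient when a single-data-owner scheme is reused per patient."""
    return 1 if multi_owner else num_patients

# Hypothetical parameter values, for illustration only.
m, e = 1000, 10
size = epsm_ciphertext_bytes(m, e)
print(size)                                   # bytes per index/trapdoor
print(doctor_keys(10, multi_owner=True))      # EPSM-style setting
print(doctor_keys(10, multi_owner=False))     # single-data-owner schemes
```

Because the size is linear in m, doubling the number of keywords roughly doubles the bytes sent per index or trapdoor, which matches the linear growth described for Fig. 5.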

VII. CONCLUSION
In this paper, we have proposed EPSM, an efficient and secure search scheme over encrypted medical cloud data in the multi-data-owner setting. To secure EPSM, the cloud server cannot learn the similarity scores of indices and trapdoors. Instead, it computes noisy scores and sends them to the doctor, who de-noises them. Moreover, EPSM offers a new feature that allows doctors to customize their search results by expressing search conditions in the trapdoors. Our formal proof and security analysis demonstrate that EPSM can preserve patient privacy and is secure under the known-plaintext and known-background models. Also, EPSM ensures the unlinkability of indices/trapdoors having the same keywords. Finally, our extensive experiments demonstrate that EPSM requires low computation and communication overheads and a small number of keys because it is designed for the multi-data-owner setting, which is more suitable for medical applications.
For future work, we will investigate denial of service (DoS) attacks against the centralized server. Specifically, we will try to replace the central server with a blockchain network. We will also investigate the use of machine learning technology to diagnose diseases in e-health systems.