Secure Symmetric Keyword Search with Keyword Privacy for Cloud Storage Services

As Internet services are widely used on various mobile devices, the amount of data produced by users steadily increases. Meanwhile, the storage capacity of these devices is too limited to hold the increasing amount of data. Therefore, Internet-connected storage that can be accessed anytime and anywhere is becoming increasingly important for storing and utilizing huge amounts of data. To use remote storage, the data to be stored need to be encrypted for privacy. The storage manager should also be able to search the data without decrypting them in response to a query. In contrast to the traditional environment, a query to Internet-connected storage is conveyed through an open channel, and hence its secrecy should be guaranteed. We propose a secure symmetric keyword search scheme that provides query privacy and is tailored to the equality test on encrypted data. The proposed scheme is efficient since it is based on prime-order bilinear groups. We formally prove that our construction satisfies ciphertext confidentiality and keyword privacy based on the hardness of the bilinear Diffie–Hellman (DH) assumption and the decisional 3-party DH assumption.


Introduction
With the development of IT technologies, including communications and computation, the use of small devices in daily life is increasing. Along with this change, the so-called Internet of Everything (IoE) is getting closer to our lives. In the IoE world, billions of devices are used for various IT services, including social networking websites and applications, which deal with users' personal data to provide better services [1]. Not all data can be stored and managed in small, low-powered devices, so we need Internet-connected storage. Although Internet-connected storage makes it possible to use much more data without keeping it in local storage, we must take care of the security of data stored and managed in remote storage. The main security concern in using Internet-connected storage such as cloud storage is data privacy [2][3][4]. The storage inevitably stores and manages an increasing amount of sensitive client data, and clients of the storage service must entrust their data to a service provider [5]. Encryption has been the most classical method to provide data privacy. To provide encryption-based access control for clients' sensitive data, a number of value-added encryption techniques have been studied, including attribute-based encryption techniques [6].
The General Data Protection Regulation (GDPR) has forced companies to encrypt personal data to reduce the probability of a data breach [7]. Accordingly, companies are encrypting, storing, and managing customers' personal information. When encryption is used for data privacy, however, we face another obstacle: the storage server must be able to identify exactly the documents a client wants to retrieve without decrypting them. As one of the basic steps toward resolving this difficulty, secure keyword search over encrypted data is receiving much attention. Secure keyword search enables a user to search encrypted data with a keyword without revealing any information about the data. When an encrypted document is uploaded to a server, a set of ciphertexts of the keywords in the document is appended to the encrypted document. Let $CT_w$ denote the ciphertext of keyword $w$. For a given query (also called a trapdoor) $T_{w'}$, the server runs the function test with inputs $CT_w$ and $T_{w'}$ to determine whether $w = w'$. Only a user who can generate a query $T_{w'}$ such that test($CT_w$, $T_{w'}$) = 1 can retrieve the encrypted documents containing keyword $w$. Secure keyword search is a primitive for constructing various queries and can be extended to complex queries such as range queries and inner-product queries [8].
Secure keyword search systems can be classified into two types: asymmetric and symmetric settings. In the asymmetric setting [9][10][11], known as public key encryption with keyword search (PEKS), the ciphertext $CT_w$ of keyword $w$ is generated under a public key, and only the owner of the corresponding secret key can generate the trapdoor $T_w$. Hence, PEKS is suitable for store-and-forward systems such as e-mail. In the symmetric setting [12][13][14][15][16][17][18], the ciphertext $CT_w$ of keyword $w$ is generated under a symmetric key, and only the owner of that key can generate the trapdoor $T_w$. Here, the symmetric key is not shared but owned by a single client. The symmetric setting is suitable for personal storage services as well as blog and web-hard services, where the same client uploads and downloads his/her data.

Necessity of Keyword Privacy.
The formal notion of secure keyword search has considered ciphertext confidentiality, i.e., semantic security against an attacker who obtains the ciphertexts of keywords of her choice. Given a query and a ciphertext of a keyword, the server can decide whether the ciphertext is related to the query by running the function test. Therefore, ciphertext confidentiality cannot be guaranteed without also guaranteeing the secrecy of the query (keyword privacy).
As stated in [19,20], keyword privacy of the trapdoor cannot be provided in PEKS, a searchable encryption scheme in the asymmetric setting, because anyone can generate the ciphertext of a guessed keyword under the public key. Hence, an adversary can obtain the test result for the ciphertext of a guessed keyword and a given trapdoor. In [11], Rhee et al. first defined the notion of keyword privacy in the asymmetric setting. They proposed an enhanced PEKS scheme in which keyword privacy is provided only in situations where only the server can test whether a ciphertext and a trapdoor are related.
There have been several works providing keyword privacy in the symmetric setting. Shen et al. [20] first proposed a symmetric predicate encryption scheme for an inner-product operation on two vectors, which are used in generating a ciphertext and a token (like a trapdoor in PEKS), and considered keyword privacy in symmetric predicate encryption. Their scheme is constructed on composite-order bilinear groups, where exponentiations and pairings are about 25 and 30 times, respectively, as costly as those in prime-order groups [21]. Recently, Blundo et al. [12] proposed a symmetric hidden vector encryption scheme in asymmetric prime-order bilinear groups. However, there is no efficiently computable morphism between the two groups used, and its security depends on the hardness of the nonstandard (d, m)-Q assumption.
In general, however, predicate encryption differs from searchable encryption in that decryption occurs at the same time as the test process. Since the administrator of the server is not commonly trusted in cloud environments, the decryption and test processes need to be separated so that an untrusted server cannot perform decryption. For this reason, the previous results on symmetric predicate encryption and symmetric hidden vector encryption cannot be immediately adopted for symmetric keyword search.
Protocols providing access pattern privacy were also proposed in [22,23]; that is, no one can learn which documents contain the keyword. However, access pattern privacy does not provide keyword privacy for the given queries. Keyword privacy is more intuitive than access pattern privacy: once information about the keyword is revealed from the queries, the privacy of the corresponding ciphertexts cannot be guaranteed even if access pattern privacy holds. In addition, the protocols providing access pattern privacy do not satisfy search correctness; that is, they can cause search errors and require additional effort to fix them.

Our Contributions.
Our contributions in this paper are twofold. (1) We are the first to define "trapdoor indistinguishability" for keyword privacy in symmetric keyword search against an active adversary who can obtain trapdoors as well as ciphertexts for any nontarget keyword of his choice. This security notion guarantees that a trapdoor does not reveal any information about its keyword.
(2) We construct a practical and secure keyword search scheme, called secure symmetric keyword search (SSKS), which is tailored for Internet-connected storage services. To construct SSKS, we build on well-known results on PEKS. The proposed scheme achieves both ciphertext confidentiality and keyword privacy. Unlike the scheme in [20], our construction is efficient since it is based on prime-order bilinear groups. Its security relies on standard assumptions: ciphertext confidentiality is based on the hardness of the bilinear DH problem, and keyword privacy is based on the hardness of the decisional 3-party DH problem.

Preliminaries
To give a concrete description, we use pairing-related operations. In this section, we describe the fundamental definitions for pairings, the hard problems defined over them, and the formal definitions of the scheme and its security features.

Underlying Mathematical Problems.
The pairing operation is defined over an elliptic curve. We give only a simple definition of the operation, since our scheme can be understood with knowledge of the so-called bilinearity property of pairings.

Definition 1 (bilinear map).
The definition of bilinear groups follows [9]. Let $G$ and $G_T$ be two (multiplicative) cyclic groups of prime order $p$, and let $g$ be a generator of $G$. A map $e: G \times G \to G_T$ is a bilinear map if it has the following properties:
(1) Bilinearity: for all $u, v \in G$ and $a, b \in \mathbb{Z}_p$, $e(u^a, v^b) = e(u, v)^{ab}$.
(2) Non-degeneracy: $e(g, g) \neq 1$.
(3) Computability: there is an efficient algorithm to compute $e$.

To prove the security of our scheme, we use the bilinear Diffie-Hellman assumption and the decisional 3-party Diffie-Hellman assumption, which are defined as follows.
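To make bilinearity concrete, the sketch below uses a deliberately insecure toy model (an assumption for illustration only, not a real pairing): an element $g^x \in G$ is stored as its exponent $x \bmod p$, and likewise $e(g,g)^y \in G_T$ is stored as $y$, so the map $e(g^x, g^y) = e(g,g)^{xy}$ becomes multiplication of exponents. The class names `G`, `GT` and the function `pair` are hypothetical, and the Mersenne prime $2^{61} - 1$ merely stands in for $p$. Real pairings are computed on elliptic curves, where discrete logarithms are hard and this shortcut is unavailable.

```python
import secrets

# Toy model (insecure by design): a group element is represented by its discrete log.
P = 2**61 - 1  # Mersenne prime, standing in for the group order p


class _Elem:
    """Element of a cyclic group of order P, written as g^x and stored as x mod P."""

    def __init__(self, x: int):
        self.x = x % P

    def __mul__(self, other: "_Elem") -> "_Elem":
        if type(self) is not type(other):
            raise TypeError("cannot multiply elements of different groups")
        return type(self)(self.x + other.x)  # g^x * g^y = g^(x+y)

    def __pow__(self, a: int) -> "_Elem":
        return type(self)(self.x * a)  # (g^x)^a = g^(xa)

    def __eq__(self, other: object) -> bool:
        return type(self) is type(other) and self.x == other.x

    def __hash__(self) -> int:
        return hash((type(self).__name__, self.x))


class G(_Elem):
    """Source group G."""


class GT(_Elem):
    """Target group G_T."""


g = G(1)       # generator g of G
ONE_T = GT(0)  # identity element of G_T


def pair(u: G, v: G) -> GT:
    """Toy bilinear map e: G x G -> G_T with e(g^x, g^y) = e(g, g)^(x*y)."""
    return GT(u.x * v.x)


if __name__ == "__main__":
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    print(pair(g**a, g**b) == pair(g, g) ** (a * b))  # bilinearity: True
    print(pair(g, g) != ONE_T)                          # non-degeneracy: True
```

Keeping `G` and `GT` as distinct types makes the sketch reject products that mix the source and target groups, mirroring the fact that $e$ is the only bridge between them.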
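Definition 2 (bilinear Diffie-Hellman assumption (BDH)). The BDH problem [9] is as follows: given $g, g^a, g^b, g^c \in G$ for random $a, b, c \in \mathbb{Z}_p$, compute $e(g, g)^{abc} \in G_T$. The BDH assumption is that every polynomial-time algorithm has only negligible advantage in solving the BDH problem.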
Definition 3 (decision 3-party Diffie-Hellman assumption (3-party DH)). The decision 3-party Diffie-Hellman problem [24] is as follows: given $(g, g^a, g^b, g^c, Z) \in G^5$ as input, determine whether $Z = g^{abc}$ or $Z$ is random in $G$.
The 3-party DH assumption is that every polynomial-time algorithm has only negligible advantage in solving the decisional 3-party DH problem.
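The two problems can be phrased as instance generators plus a success measure. The sketch below uses a deliberately insecure toy representation (an assumption for illustration) in which an element $g^x$ is written as the integer $x \bmod p$, with $p = 2^{61} - 1$ as a stand-in. Because exponents are visible, the hypothetical `trivial_distinguisher` wins with advantage close to 1; the point is only to show the shape of an instance and of the advantage $|\Pr[D = 1 \mid Z = g^{abc}] - \Pr[D = 1 \mid Z \text{ random}]|$. In real groups, the assumptions state that no polynomial-time algorithm achieves non-negligible advantage. All function names are illustrative.

```python
import secrets
from typing import Callable, Tuple

P = 2**61 - 1  # toy prime group order p; group elements g^x are written as x mod P


def rand_exp() -> int:
    """Uniform nonzero exponent in Z_p."""
    return secrets.randbelow(P - 1) + 1


def bdh_instance() -> Tuple[Tuple[int, int, int, int], int]:
    """BDH: input (g, g^a, g^b, g^c); target e(g, g)^(abc), also written as an exponent."""
    a, b, c = rand_exp(), rand_exp(), rand_exp()
    return (1, a, b, c), (a * b * c) % P


def dec3dh_instance() -> Tuple[Tuple[int, ...], int]:
    """Decision 3-party DH: (g, g^a, g^b, g^c, Z) and the hidden bit
    (1 means Z = g^(abc), 0 means Z is uniform in G)."""
    a, b, c = rand_exp(), rand_exp(), rand_exp()
    real = secrets.randbits(1)
    z = (a * b * c) % P if real else secrets.randbelow(P)
    return (1, a, b, c, z), real


def estimate_advantage(dist: Callable[[Tuple[int, ...]], int], trials: int = 2000) -> float:
    """Estimate |Pr[D = 1 | Z = g^abc] - Pr[D = 1 | Z random]| by sampling."""
    said_one = [0, 0]
    seen = [0, 0]
    for _ in range(trials):
        inst, bit = dec3dh_instance()
        said_one[bit] += dist(inst)
        seen[bit] += 1
    return abs(said_one[1] / max(seen[1], 1) - said_one[0] / max(seen[0], 1))


def trivial_distinguisher(inst: Tuple[int, ...]) -> int:
    """Only works because the toy model exposes exponents."""
    _, ga, gb, gc, z = inst
    return int(z == (ga * gb * gc) % P)


if __name__ == "__main__":
    print(estimate_advantage(trivial_distinguisher))  # close to 1.0 in the toy model
```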

Formal Definitions for SSKS.
We begin by reviewing the formal definition of a symmetric keyword search scheme, which consists of the following four algorithms.
(1) KG(k) takes a security parameter $k \in \mathbb{Z}^+$ and generates a secret key SK.
(2) SEKS(SK, w) takes the secret key SK and a keyword $w$ and outputs a ciphertext $CT_w$.
(3) STd(SK, w) takes the secret key SK and a keyword $w$ and outputs a trapdoor $T_w$.
(4) Test($CT_w$, $T_{w'}$) takes a ciphertext $CT_w$ and a trapdoor $T_{w'}$ as input. If $w = w'$, it outputs "1"; otherwise, it outputs "0."
Any such scheme must work as intended: if $w$ is identical to $w'$, then Test($CT_w$, $T_{w'}$) outputs 1. For an SSKS, we define correctness as follows.
Definition 5 (correctness). Let SSKS = (KG, SEKS, STd, Test) be a symmetric keyword search scheme defined over a keyword space KW, let $k$ be the security parameter, and let SK ← KG(k). SSKS satisfies correctness if, for any keywords $w, w' \in KW$,
$$\Pr[\mathrm{Test}(CT_w, T_{w'}) = 1] = 1 \text{ if } w = w', \qquad \Pr[\mathrm{Test}(CT_w, T_{w'}) = 1] \le \mathrm{neg}(k) \text{ if } w \neq w',$$
where $CT_w \leftarrow$ SEKS(SK, w) is a valid ciphertext for keyword $w$ and $T_{w'} \leftarrow$ STd(SK, $w'$) is a valid trapdoor for keyword $w'$. Here, a function neg is negligible if, for every constant $c$, there exists $N$ such that $\mathrm{neg}(n) < 1/n^c$ for all $n > N$.
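Definition 5 can be checked empirically against any concrete instantiation of the four-algorithm interface. In the sketch below, the instantiation is a hypothetical HMAC-based stand-in (trapdoor = HMAC of the keyword under SK, ciphertext = fresh nonce plus an HMAC of the nonce keyed by that trapdoor), not the pairing-based scheme of this paper; the function names simply mirror KG, SEKS, STd and Test, and `check_correctness` is an illustrative helper. It counts misses ($w = w'$ but Test outputs 0) and false matches ($w \neq w'$ but Test outputs 1). A sampling test of this kind can reveal correctness bugs but cannot prove the negligible bound.

```python
import hashlib
import hmac
import secrets
from typing import Iterable, Tuple

# Hypothetical stand-in with the SSKS interface (KG, SEKS, STd, Test);
# it is an HMAC-based construction, NOT the paper's pairing-based scheme.


def KG(k: int = 256) -> bytes:
    return secrets.token_bytes(k // 8)


def STd(sk: bytes, w: str) -> bytes:
    return hmac.new(sk, w.encode(), hashlib.sha256).digest()


def SEKS(sk: bytes, w: str) -> Tuple[bytes, bytes]:
    nonce = secrets.token_bytes(16)  # fresh randomness for every ciphertext
    return nonce, hmac.new(STd(sk, w), nonce, hashlib.sha256).digest()


def Test(ct: Tuple[bytes, bytes], td: bytes) -> int:
    nonce, tag = ct
    expected = hmac.new(td, nonce, hashlib.sha256).digest()
    return int(hmac.compare_digest(expected, tag))


def check_correctness(keywords: Iterable[str], trials: int = 50) -> Tuple[int, int]:
    """Sample keys and keyword pairs; return (misses, false_matches)."""
    keywords = list(keywords)
    misses = false_matches = 0
    for _ in range(trials):
        sk = KG()
        for w in keywords:
            ct = SEKS(sk, w)
            for w2 in keywords:
                result = Test(ct, STd(sk, w2))
                if w == w2 and result != 1:
                    misses += 1
                elif w != w2 and result != 0:
                    false_matches += 1
    return misses, false_matches


if __name__ == "__main__":
    print(check_correctness(["cloud", "privacy", "pairing"]))  # (0, 0)
```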

Ciphertext Confidentiality.
Our definition of ciphertext confidentiality (SEKS-IND-CPA-security) follows the general framework of the definitions given in [9,12,13].
Let A be a probabilistic polynomial-time adversary whose running time is bounded by t, which is polynomial in the security parameter k. In the experiment of Table 1, A chooses keywords $w_0$ and $w_1$ in the find stage. Given the challenge ciphertext $CT^*_{w_b}$ in the guess stage, A tries to guess b correctly. St is used to retain state information between the two stages.
A is allowed to obtain trapdoors and ciphertexts by querying the trapdoor oracle STd(SK, w) and the encryption oracle SEKS(SK, w), respectively. However, A is not allowed to obtain the trapdoor of $w_0$ or $w_1$; otherwise, A could run Test on the challenge ciphertext to find out b.
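Here, the trapdoor oracle STd(w) returns STd(SK, w) for any queried keyword $w \notin \{w_0, w_1\}$, and the encryption oracle SEKS(w) returns a fresh ciphertext SEKS(SK, w) for any queried keyword $w$.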
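The find/guess experiment and its oracle restriction can be sketched as a small game runner. This is an illustration under assumptions: it instantiates the scheme with a hypothetical HMAC-based stand-in rather than the proposed construction, models the adversary as an object with hypothetical `find` and `guess` methods, and rejects any adversary that requests a trapdoor for $w_0$ or $w_1$ at any point. The empirical advantage is reported as $|\Pr[b' = b] - 1/2|$.

```python
import hashlib
import hmac
import secrets


# Hypothetical stand-in scheme (NOT the paper's SSKS): randomized HMAC tags.
def KG() -> bytes:
    return secrets.token_bytes(32)


def STd(sk: bytes, w: str) -> bytes:
    return hmac.new(sk, w.encode(), hashlib.sha256).digest()


def SEKS(sk: bytes, w: str):
    nonce = secrets.token_bytes(16)
    return nonce, hmac.new(STd(sk, w), nonce, hashlib.sha256).digest()


def Test(ct, td) -> int:
    nonce, tag = ct
    return int(hmac.compare_digest(hmac.new(td, nonce, hashlib.sha256).digest(), tag))


def seks_ind_cpa(adversary) -> int:
    """One run of the ciphertext-confidentiality experiment; returns 1 iff b' = b."""
    sk = KG()
    td_queried, forbidden = set(), set()

    def td_oracle(w: str) -> bytes:
        if w in forbidden:
            raise PermissionError("trapdoor of a challenge keyword requested")
        td_queried.add(w)
        return STd(sk, w)

    def enc_oracle(w: str):
        return SEKS(sk, w)  # encryption queries are unrestricted

    w0, w1, st = adversary.find(td_oracle, enc_oracle)  # find stage
    if w0 == w1 or td_queried & {w0, w1}:
        raise PermissionError("invalid challenge keywords")
    forbidden.update({w0, w1})
    b = secrets.randbits(1)
    ct_star = SEKS(sk, w1 if b else w0)  # challenge CT*_{w_b}
    b_guess = adversary.guess(ct_star, st, td_oracle, enc_oracle)  # guess stage
    return int(b_guess == b)


def advantage(make_adversary, runs: int = 1000) -> float:
    wins = sum(seks_ind_cpa(make_adversary()) for _ in range(runs))
    return abs(wins / runs - 0.5)


class RandomGuesser:
    def find(self, td_oracle, enc_oracle):
        return "alpha", "beta", None

    def guess(self, ct_star, st, td_oracle, enc_oracle):
        return secrets.randbits(1)


class Cheater(RandomGuesser):
    """Tries to win by testing CT* against the trapdoor of w_0 (disallowed)."""

    def guess(self, ct_star, st, td_oracle, enc_oracle):
        return 0 if Test(ct_star, td_oracle("alpha")) else 1
```

With the stand-in scheme, the random guesser's advantage stays near 0, while `Cheater`, which would otherwise always win, is rejected by the game; that restriction is what keeps the experiment meaningful.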
We now define keyword privacy (KEY-IND-CPA-security) for SSKS. In the experiment of Table 2, A tries to guess the bit b of the challenge trapdoor $T^*_{w_b}$ correctly. A is allowed to query the trapdoor oracle and the encryption oracle. However, A must not obtain the ciphertext of $w_0$ or $w_1$; otherwise, A could run Test to find out b. A must also be prevented from obtaining the trapdoor of $w_0$ or $w_1$; otherwise, A might find a ciphertext CT in the database such that Test(CT, $T_{w_b}$) = 1. If the trapdoor of $w_0$ (or $w_1$) is available, A can then easily decide the value of b. The advantage of A in attacking keyword privacy is defined as $\mathrm{Adv}^{\text{KEY-IND-CPA}}_{A}(k) = |\Pr[b' = b] - 1/2|$.
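The keyword-privacy experiment differs from the ciphertext-confidentiality one in two ways: the challenge is a trapdoor, and both oracles refuse the challenge keywords. The sketch below encodes exactly those two differences. It is illustrative only: the scheme is a hypothetical HMAC-based stand-in (not the proposed construction), and `key_ind_cpa`, `find`, `guess` and the adversary classes are assumed names.

```python
import hashlib
import hmac
import secrets


# Hypothetical stand-in scheme (NOT the paper's SSKS).
def KG() -> bytes:
    return secrets.token_bytes(32)


def STd(sk: bytes, w: str) -> bytes:
    return hmac.new(sk, w.encode(), hashlib.sha256).digest()


def SEKS(sk: bytes, w: str):
    nonce = secrets.token_bytes(16)
    return nonce, hmac.new(STd(sk, w), nonce, hashlib.sha256).digest()


def Test(ct, td) -> int:
    nonce, tag = ct
    return int(hmac.compare_digest(hmac.new(td, nonce, hashlib.sha256).digest(), tag))


def key_ind_cpa(adversary) -> int:
    """One run of the keyword-privacy experiment; returns 1 iff b' = b.
    Both the trapdoor and the encryption oracle refuse w0 and w1."""
    sk = KG()
    queried, forbidden = set(), set()

    def restricted(alg):
        def oracle(w: str):
            if w in forbidden:
                raise PermissionError("query on a challenge keyword")
            queried.add(w)
            return alg(sk, w)
        return oracle

    td_oracle, enc_oracle = restricted(STd), restricted(SEKS)
    w0, w1, st = adversary.find(td_oracle, enc_oracle)
    if w0 == w1 or queried & {w0, w1}:
        raise PermissionError("invalid challenge keywords")
    forbidden.update({w0, w1})
    b = secrets.randbits(1)
    td_star = STd(sk, w1 if b else w0)  # challenge trapdoor T*_{w_b}
    return int(adversary.guess(td_star, st, td_oracle, enc_oracle) == b)


class RandomGuesser:
    def find(self, td_oracle, enc_oracle):
        return "alpha", "beta", None

    def guess(self, td_star, st, td_oracle, enc_oracle):
        return secrets.randbits(1)


class CiphertextCheater(RandomGuesser):
    """Tries to win by testing T* against a ciphertext of w_0 (disallowed)."""

    def guess(self, td_star, st, td_oracle, enc_oracle):
        return 0 if Test(enc_oracle("alpha"), td_star) else 1
```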

New Symmetric Keyword Search with Keyword Privacy
In this section, we give a detailed description of our symmetric keyword search scheme with keyword privacy. Let $G$ and $G_T$ be groups of prime order $p$, and let $e: G \times G \to G_T$ be a bilinear map. We use hash functions $H_1: \{0,1\}^* \to G$ and $H_2: G_T \to \{0,1\}^{\log p}$. Our construction works as follows.
(1) KG(k) takes the security parameter $k \in \mathbb{Z}^+$ and picks a random exponent $\alpha \in \mathbb{Z}_p$, a generator $g \in G$, and a random value $u \in G$ ($u \neq g$). It outputs the secret key SK.
(2) SEKS(SK, w) takes as input the secret key SK and a keyword $w \in KW$, where KW is the keyword space. It picks a random exponent $s \in \mathbb{Z}_p$ and returns the ciphertext $CT = [C_1, C_2, C_3]$.
(3) STd(SK, w) takes as input the secret key SK and a keyword $w$. It picks a random exponent $r \in \mathbb{Z}_p$ and outputs the corresponding trapdoor $T_w = [T_1, T_2]$.
(4) Test(CT, $T_w$) takes as input a ciphertext CT and a trapdoor $T_w$, parses CT as $[C_1, C_2, C_3]$ and $T_w$ as $[T_1, T_2]$, and checks the verification equality between these components. If it holds, it outputs "1"; otherwise, it outputs "0."
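The two hash functions fix the interfaces between bit strings, $G$, and the pairing target group $G_T$. The sketch below shows only those domains and ranges, under an insecure toy representation (an assumption for illustration): an element $g^x \in G$ or $e(g,g)^y \in G_T$ is identified with its exponent modulo a stand-in prime $p = 2^{61} - 1$. The names `H1` and `H2`, the SHA-256 instantiation, and the domain-separation prefixes are hypothetical choices. A real implementation would hash onto the elliptic-curve group with a proper hash-to-curve method, such as those specified in RFC 9380, which this toy does not attempt.

```python
import hashlib

P = 2**61 - 1              # stand-in prime group order p
OUT_BITS = P.bit_length()  # H2 outputs about log p bits (61 here)


def H1(w: bytes) -> int:
    """Toy H1: {0,1}* -> G. Returns x such that H1(w) = g^x (exponent exposed: insecure)."""
    digest = hashlib.sha256(b"SSKS-H1|" + w).digest()
    return int.from_bytes(digest, "big") % P


def H2(y: int) -> int:
    """Toy H2: G_T -> {0,1}^OUT_BITS, applied to the exponent y of e(g, g)^y."""
    data = b"SSKS-H2|" + (y % P).to_bytes(8, "big")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - OUT_BITS)  # keep the top OUT_BITS bits


if __name__ == "__main__":
    print(OUT_BITS)                           # 61
    print(H1(b"privacy") == H1(b"privacy"))   # deterministic: True
```

The prefixes keep the two hash functions independent even though both are built from SHA-256.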

Analysis
In this section, we analyze the security of the proposed scheme with respect to the security notions discussed in Section 2.2.
We also compare the proposed scheme with existing schemes to show that it guarantees stronger security than they do.

Security.
We now prove that our construction satisfies ciphertext confidentiality and keyword privacy. Ciphertext confidentiality is proved in the same manner as in [9].

Theorem 1. If the (t, ε)-BDH assumption holds in G, then our SSKS scheme is SEKS-IND-CPA-secure.
Proof. Suppose A is an adversary that has advantage ε in breaking ciphertext confidentiality. We construct an algorithm B that solves the BDH problem with probability at least ε′, a quantity that depends on ε, on the base e of the natural logarithm, and on $q_{H_2}$ and $q_T$, the numbers of $H_2$ queries and trapdoor queries, respectively. Given $g$, $u_1 = g^a$, $u_2 = g^b$, and $u_3 = g^c \in G$, the goal of B is to compute $e(g, g)^{abc} \in G_T$. B interacts with A as follows.
Setup. B picks a random $t \in \mathbb{Z}_p$ and sets $K_2 = u_1 = g^a$ and $K_3 = u = g^t$.
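$H_1$, $H_2$-Queries. B simulates the $H_1$ and $H_2$ oracles in the same manner as in [9]; for $H_1$, it maintains an $H_1$-list of tuples $\langle w_i, h_i, e_i, c_i \rangle$. For an $H_2$ query on $t$, if a pair $(t, V)$ already exists on the $H_2$-list, B responds with $V$. Otherwise, B picks a random $V \in \{0,1\}^{\log p}$, sets $H_2(t) = V$, responds with $V$, and adds $(t, V)$ to the $H_2$-list.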
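Query Phase 1. A makes trapdoor and ciphertext queries, which B answers as follows.
Trapdoor queries. For a trapdoor query on $w_i$, B looks up the tuple $\langle w_i, h_i, e_i, c_i \rangle$ on the $H_1$-list. If $c_i = 0$, B reports failure and terminates. Otherwise, since $h_i = g^{e_i} \in G$ for a known $e_i \in \mathbb{Z}_p$, B picks a random $r' \in \mathbb{Z}_p$, sets $K_1 = g^{r'}$, and returns the resulting trapdoor for $w_i$ to A.
Ciphertext queries. For a ciphertext query on $w$, B picks a random $s \in \mathbb{Z}_p$ and sets $C_1 = u^s$, $C_2 = g^s$, and $C_3 = H_2(e(u_1^s, H_1(w)))$, where $u \in G$ is the value set in the setup phase. B returns $CT = [C_1, C_2, C_3]$ for $w$ to A.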
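The coin-based programming of $H_1$ that drives B's abort behaviour can be sketched as follows: each new keyword gets a tuple $\langle w_i, h_i, e_i, c_i \rangle$ with $h_i = u_2^{e_i}$ when $c_i = 0$ and $h_i = g^{e_i}$ otherwise, where $\Pr[c_i = 0] = 1/(q_T + 1)$, following the technique of [9]. Group elements use an insecure toy exponent representation; the class name `H1Oracle`, its methods, and the use of Python's `random` module for simulation randomness are illustrative assumptions.

```python
import random
from typing import Dict, Optional, Tuple

P = 2**61 - 1  # toy group order; an element g^x is stored as the integer x


class H1Oracle:
    """B's H1-list: w -> (h, e, c); h = u2^e if c == 0, else h = g^e."""

    def __init__(self, b_exp: int, q_t: int, rng: Optional[random.Random] = None):
        # The toy needs the exponent b of u2 = g^b only because elements are stored
        # as exponents; a real B computes u2^e directly without knowing b.
        self.b_exp = b_exp % P
        self.delta = 1 / (q_t + 1)  # Pr[c = 0]
        self.rng = rng or random.Random()
        self.table: Dict[str, Tuple[int, int, int]] = {}

    def query(self, w: str) -> int:
        """Answer H1(w) consistently, programming a fresh tuple on first use."""
        if w not in self.table:
            e = self.rng.randrange(1, P)
            c = 0 if self.rng.random() < self.delta else 1
            h = (self.b_exp * e) % P if c == 0 else e  # u2^e or g^e
            self.table[w] = (h, e, c)
        return self.table[w][0]

    def trapdoor_exponent(self, w: str) -> int:
        """Trapdoor queries are answerable only when c = 1, i.e. h = g^e with e known."""
        self.query(w)
        _, e, c = self.table[w]
        if c == 0:
            raise RuntimeError("B reports failure and terminates")
        return e
```

Keywords with $c = 0$ are the ones B can embed the challenge into, which is why a trapdoor query on such a keyword forces an abort.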

Challenge.
A outputs challenge keywords $w_0$ and $w_1$. B issues $H_1$ queries on $w_0$ and $w_1$ to obtain $h_0, h_1 \in G$ such that $H_1(w_0) = h_0$ and $H_1(w_1) = h_1$. Let $\langle w_b, h_b, e_b, c_b \rangle$ be the corresponding tuples on the $H_1$-list ($b = 0, 1$). If both $c_0 = 1$ and $c_1 = 1$, B aborts. Otherwise, there exists $b \in \{0,1\}$ such that $c_b = 0$; B picks a random $z \in \{0,1\}^{\log p}$ and sets $C_1^* = u_3^t$, $C_2^* = u_3$, and $C_3^* = z$, where $t \in \mathbb{Z}_p$ is the value selected in the setup phase (this implicitly sets $s = c$). B responds with $CT^* = [C_1^*, C_2^*, C_3^*]$.
Query Phase 2. B answers the queries in the same manner as in phase 1, under the restriction that trapdoor queries satisfy $w \neq w_0, w_1$.
Output. A outputs its guess $b' \in \{0,1\}$. In the simulation of the $H_1$ queries, $h_i = u_2^{e_i}$ was set with probability $1/(q_T + 1)$. Since A queries the $H_2$ oracle on the value $e(K_2, H_1(w_b))^s = e(g^a, (g^{bc})^{e_b})$ with the same probability $1/(q_T + 1)$, there exists a pair of the form $(e(g, g)^{abc e_b}, H_2(e(g^a, (g^{bc})^{e_b})))$ on the $H_2$-list. Therefore, B picks a random pair $(t, V)$ from the $H_2$-list and outputs $t^{1/e_b}$ as its guess for $e(g, g)^{abc}$, where $e_b$ is the value selected in the challenge phase. To show that B outputs $e(g, g)^{abc}$ with probability at least ε′, we must analyze the probability that B does not abort during the simulation.
To this end, we define the events under which B does not abort and relate the simulation to the real attack. The probabilities of these events can be bounded in the same manner as in [9]; we omit the detailed description.
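As a small numerical aid to the omitted analysis, the sketch below evaluates the two non-abort events induced by the coin distribution of the $H_1$ simulation, assuming (as in [9]) independent coins with $\Pr[c = 0] = 1/(q_T + 1)$: surviving $q_T$ trapdoor queries, and having at least one of $c_0, c_1$ equal to 0 at the challenge. It also checks the elementary inequality $(1 - 1/(q_T+1))^{q_T} \ge 1/e$, which is the usual source of the factor involving e in bounds of this kind. The function names are hypothetical, and the sketch does not reproduce the exact bound ε′.

```python
import math


def trapdoor_phase_survival(q_t: int) -> float:
    """Pr[every one of q_t trapdoor queries hits a keyword with c = 1]."""
    delta = 1 / (q_t + 1)  # Pr[c = 0]
    return (1 - delta) ** q_t


def challenge_phase_success(q_t: int) -> float:
    """Pr[c_0 = 0 or c_1 = 0], i.e. B does not abort in the challenge phase."""
    delta = 1 / (q_t + 1)
    return 1 - (1 - delta) ** 2


if __name__ == "__main__":
    for q_t in (1, 10, 1000):
        s = trapdoor_phase_survival(q_t)
        print(q_t, round(s, 4), s >= 1 / math.e, round(challenge_phase_success(q_t), 4))
```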
To prove keyword privacy, we consider a hybrid game that differs only in which challenge trapdoor $T^*_\beta$ the challenger gives to A: either the real trapdoor $T_{w_\beta}$, which is the challenge trapdoor given to the adversary in the real security game, or a random element $R \in G$.

Theorem 2. If the (t, ε)-decisional 3-party DH assumption holds, then our SSKS scheme is KEY-IND-CPA-secure.
Proof. Suppose that there exists a t-time adversary A with a nonnegligible difference ε between its advantage for the challenge $T_{w^*}$ and its advantage for a random challenge $R \in G$. We construct an algorithm B that solves the decisional 3-party DH problem in G. Given a random challenge $(g, g^a, g^b, g^c, Z)$, B outputs 1 if $Z = g^{abc}$ and 0 otherwise. B interacts with A as follows.
Init. A outputs challenge keywords $w_0^*, w_1^* \in KW$. B internally flips a coin to obtain $\beta \in \{0,1\}$.
Setup. B chooses a random $\tau \in \mathbb{Z}_p$ and implicitly sets the unknown secret values $K_2 = g^{ab}$ and $K_3 = (g^a)^\tau$. B also chooses random exponents $e_\beta^*, e_1, e_2, \ldots, e_{q_{H_1}} \in \mathbb{Z}_p$ for programming the $H_1$ outputs.
Query Phase 1. A makes trapdoor and ciphertext queries on keywords $w_i \neq w_\beta^*$, which B answers using the values chosen in the setup phase.
Challenge Phase. B chooses a random exponent $r^* \in \mathbb{Z}_p$ and outputs the challenge trapdoor for keyword $w_\beta^*$, computed from $r^*$ and the problem instance.
Query Phase 2. B answers the queries in the same manner as in phase 1, under the restriction that trapdoor and ciphertext queries satisfy $w \neq w_0^*, w_1^*$.
B's advantage in solving the decisional 3-party DH problem follows directly from A's advantage in distinguishing $T_{w_\beta^*}$ from $R$, except with negligible probability at most $q_T q_C / p$, where $q_C$ denotes the number of ciphertext queries. □

Comparison.
Since symmetric searchable encryption is related both to symmetric predicate encryption and to symmetric searchable encryption providing access pattern privacy, we compare our scheme against each of these two categories.

Symmetric Predicate Encryption.
There have been several studies providing keyword privacy in the symmetric setting. In [20], symmetric predicate encryption schemes for an inner-product operation on two vectors were proposed, and keyword privacy was considered in a symmetric predicate encryption scheme. Their scheme is constructed on composite-order bilinear groups, where exponentiations and pairings are about 25 and 30 times, respectively, as costly as those in prime-order groups. Recently, Blundo et al. [12] proposed a symmetric hidden vector encryption scheme in asymmetric prime-order bilinear groups. However, there is no efficiently computable morphism between the two groups used, and its security depends on the hardness of the nonstandard (d, m)-Q assumption. The comparisons with [20,21] are shown in Table 3. As mentioned before, however, predicate encryption cannot be immediately adopted for symmetric keyword search on cloud storage services.

Symmetric Searchable Encryption.
In previous works on symmetric searchable encryption [12][13][14][15][16][17][18], only ciphertext confidentiality was guaranteed. In searchable encryption schemes, if a trapdoor $T_w$ is given, the server can run the test function on $T_w$ and any ciphertext $C_{w'}$. When the keyword $w$ of a trapdoor $T_w$ is revealed, the confidentiality of the ciphertexts $C_w$ associated with $T_w$ cannot be guaranteed. Once keyword privacy is ensured, even if we know that a given trapdoor and ciphertext are associated, we cannot learn which keyword they are related to. As shown in Table 4, we are the first to define keyword privacy for symmetric searchable encryption, and we present an efficient SSKS scheme with ciphertext confidentiality as well as keyword privacy.

Efficiency.
Since access pattern privacy is not the main security goal of this work, as mentioned at the beginning of Section 4.2, we compare our scheme with predicate encryption schemes. In Table 5, we compare our scheme and the other schemes in terms of functionality and performance. The table shows that our scheme provides both ciphertext confidentiality and keyword privacy and enables search on encrypted data without any decryption. Predicate encryption schemes, however, require decryption operations to implement the test function. Therefore, the predicate encryption schemes have the restriction that the test function can be

Conclusions
In this paper, we proposed a practical and secure keyword search scheme, called secure symmetric keyword search (SSKS), which is tailored for cloud storage services. We first defined keyword privacy for symmetric searchable encryption and presented an SSKS scheme that guarantees ciphertext confidentiality as well as keyword privacy. Our SSKS scheme and the new security model for keyword privacy in the symmetric setting can be further exploited to construct symmetric searchable encryption schemes supporting extended queries such as conjunctive and inner-product queries [25][26][27].

Conflicts of Interest
The authors declare that they have no conflicts of interest.