Research on the Ranked Searchable Encryption Scheme Based on an Access Tree in IoTs

With the continuous development of the Internet of things (IoTs), data security and privacy protection in the IoTs are becoming increasingly important. To address the large volume, redundancy, and heterogeneity of data in the IoTs, this paper proposes a ranked searchable encryption scheme based on an access tree. First, the scheme introduces parameters such as the word position and word span into the calculation of the relevance score of keywords to build a more accurate document index. Second, the scheme builds a semantic relationship graph based on mutual information to expand the query semantics, which improves the precision and recall of retrieval. Third, the scheme uses an access tree to control user authority and realizes fine-grained access management of data by data owners in the IoTs. Finally, a security analysis of the scheme and an efficiency comparison with existing schemes are given.


Introduction
With the rapid development of IoTs technology, it has been used in all walks of life and has been widely recognized in fields such as medical care, smart transportation, government work, smart home, and environmental monitoring [1][2][3][4][5]. At the same time, the volume of information generated by users is growing by orders of magnitude. Cloud storage is widely used because of its low cost and good scalability, and it addresses the storage and management of the data generated by the IoTs. However, frequent privacy data leakage incidents have caused severe social impacts and disrupted economic development [6][7][8]. Therefore, protecting user privacy and data security has become a technical bottleneck restricting the further development of IoT applications [9][10][11][12][13]. An effective way to prevent data privacy leakage is to encrypt data before storing it on a cloud server. Encryption prevents unauthorized servers from accessing user data and also protects user data when the server is attacked. However, because the cloud server stores encrypted data that no longer retain the plaintext structure, the cloud server cannot directly return the data a user searches for. The simplest workaround is to download all encrypted data, decrypt them locally, and then search. This method does not make use of the computing power of the cloud server and wastes considerable time, bandwidth, and power, so it cannot meet the actual needs of cloud storage in the IoTs. Therefore, how to securely retrieve ciphertext data is an urgent problem to be solved in the IoTs [14].
The searchable encryption (SE) technology is a special encryption technology that can realize keyword ciphertext retrieval and ensure that attackers cannot obtain the keyword information queried by users through keyword ciphertext or search trapdoors [15]. At present, searchable encryption technologies mainly include symmetric searchable encryption (SSE) and asymmetric searchable encryption (ASE) [16,17]. In 2000, Song et al. [18] proposed a single-keyword searchable encryption scheme based on symmetric key encryption, which finds the ciphertext of a keyword by linearly scanning the entire ciphertext document. In 2004, Boneh et al. [19] proposed an asymmetric searchable encryption scheme, Public-Key Encryption with Keyword Search (PEKS), for mail routing application scenarios. Since then, researchers have built a large body of work on this basis.
In practical applications, users are usually more concerned with finding the top K documents most relevant to multiple keywords. To meet this demand, various multikeyword ranking searchable encryption schemes have been proposed in recent years. In 2011, Cao et al. [20] first proposed a multikeyword ranking searchable encryption scheme based on vector inner product calculation, building on the secure KNN technique [21]. The scheme represents each document and each query as a 0/1 vector and obtains the relevance score of a document by counting the positions where both vectors have the value 1. However, this scheme does not consider the importance of different keywords in the document. Therefore, Sun et al. [22] extended the scheme by introducing keyword weights when constructing document vectors and query vectors and by calculating the relevance through the vector cosine to improve the accuracy of ranking.
The above ranking searchable encryption schemes all focus on the exact matching of keywords and do not take into account the semantic expansion of keywords, so many documents that meet the query conditions are not retrieved. Yang et al. [23] proposed a fast multikeyword semantic ranking search scheme, which introduced the concept of domain weighted scoring into document scoring and semantically expanded search keywords to improve the accuracy of the document index. In addition, the document vector is divided into blocks to filter out a large number of irrelevant documents, which improves the efficiency of the scheme. However, this scheme does not involve access control; it is limited to queries from a single legitimate user and does not suit practical environments in which multiple users query the same data. Sun et al. [24] proposed an attribute-based keyword search scheme, which only returns authorized documents to search users. However, the search results cannot be ranked. Li et al. [25] proposed an authorized multikeyword ranking search scheme over encrypted cloud data using an attribute-based encryption strategy and symmetric searchable encryption. This scheme satisfies the confidentiality of files, the unlinkability of trapdoors, and the resistance to collusion attacks. Moreover, the scheme enables the same data to be queried by multiple users but does not consider the semantic relevance of search keywords.
Therefore, research on searchable encryption schemes for the cloud storage environment of the IoTs must not only protect the privacy of data to achieve secret search but also ensure search efficiency. At the same time, it must consider multiple users accessing and searching the data, which is typical of the IoT cloud storage environment. In the proposed scheme, the access tree is used to set user access permissions, which allows only authorized users to retrieve cloud data and obtain the K most relevant documents. In addition, the document vectors are encrypted based on the secure KNN method to ensure the security and correctness of index creation and trapdoor generation.
The main contributions of this paper are as follows:
(1) This paper introduces parameters such as word position and word span into the calculation of the relevance score of keywords and assigns more accurate weights to keywords at different positions in the document, thereby constructing a more accurate document index.
(2) This paper builds a semantic relationship graph based on mutual information to expand the query vector semantics, which improves the precision and recall during retrieval.
(3) This paper combines multikeyword search with access control. The access tree is used to control user access rights. Only users whose attributes meet the access policy defined by the data owner can search encrypted data with multiple keywords, so as to realize the fine-grained access management of the data by the data owner in the IoTs.

Preliminaries
2.1. Vector Space Model. The vector space model [23] represents the document set in a single vector space. Each document corresponds to a document vector whose dimension is equal to the size of the keyword collection; each dimension corresponds to a keyword, and its value is the weight of that keyword in the document. The user's query is also regarded as a vector in the same space, called the query vector, whose dimensions correspond to the same keywords as those of the document vector. The relevance score of the query and each file is equal to the inner product of the document vector and the query vector.
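As a minimal illustration of the vector space model, the sketch below scores two plaintext document vectors against a query vector by inner product. The keyword dictionary, weights, and document names are made-up values for illustration only.

```python
# Keyword dictionary shared by documents and queries (illustrative values).
keywords = ["sensor", "privacy", "cloud", "gateway"]


def relevance(doc_vec, query_vec):
    """Relevance score = inner product of document vector and query vector."""
    if len(doc_vec) != len(query_vec):
        raise ValueError("vectors must be built over the same keyword dictionary")
    return sum(d * q for d, q in zip(doc_vec, query_vec))


# One weight per keyword; 0 means the keyword does not occur in the document.
doc_vectors = {
    "f1": [0.8, 0.0, 0.3, 0.0],
    "f2": [0.1, 0.9, 0.0, 0.2],
}
query = [1, 1, 0, 0]  # the user asks for "sensor" and "privacy"

scores = {name: relevance(vec, query) for name, vec in doc_vectors.items()}
ranked = sorted(scores, key=scores.get, reverse=True)  # most relevant first
```

Here `f2` ranks above `f1` because its weight on "privacy" dominates the sum.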

2.2. Word Span. Word span [19] refers to the distance between the first and last occurrence of a word or phrase in the document. The larger the word span, the more important the word is to the topic of the document. Word span reduces the impact of local keywords on keyword extraction: a local keyword occurs frequently within one part of a document, so a purely frequency-based measure may promote it to a keyword of the entire document and reduce the accuracy of extraction. The word span formula is shown in formula (1).
Here, $\mathit{first}_{ij}$ is the location identifier where the keyword $w_j$ first appears in the document $f_i$, $\mathit{last}_{ij}$ is the location identifier where $w_j$ last appears in $f_i$, and $\mathit{sum}_{ij}$ is the total number of keywords in $f_i$ obtained after word segmentation.
Wireless Communications and Mobile Computing

2.3. Word Position. The word position [26] refers to the area where a keyword appears in a document, which is of great value for judging the importance of the keyword. The title and abstract are the central ideas extracted by the author through the summary of the whole article, so the keywords appearing in these two positions are more important than those appearing in the main text. This paper divides the word position into three areas: title, abstract (or first paragraph), and body. The position value $\mathit{area}_{ij}$ of the keyword $w_j$ in the document $f_i$ is set to 3, 2, or 1 for the title, abstract, and body, respectively. There are two situations where a keyword appears multiple times:
(1) If the keyword appears multiple times in the same area, it is recorded only once.
(2) If the keyword appears in different areas, the highest value is used.
The word position formula is shown in formula (2).
2.4. Relevance Score. This paper uses the TF-IDF (term frequency-inverse document frequency) method to evaluate the importance of keywords in documents. TF represents the frequency with which the keyword appears in the document, and IDF represents the inverse document frequency: the fewer the documents containing the keyword, the greater the IDF value, indicating that the keyword has a strong distinguishing ability. The TF-IDF formula is shown in formula (3).
where $tf_{ij}$ represents the frequency of the keyword $w_j$ in the document $f_i$, $N$ represents the total number of documents, and $n_j$ represents the number of documents containing the keyword $w_j$. When calculating the relevance score of keywords, the word position and word span of the keywords should also be considered.
Therefore, the relevance score formula used in this paper is shown in formula (4).
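The three ingredients of the relevance score can be sketched as follows. Formulas (1) to (4) are not reproduced in this text, so every functional form here is an assumption: the span is taken as (last - first + 1) / sum, TF-IDF as tf * log(N / n_j), and the combined score as the plain product of the three factors. The paper's actual formula (4) may weight or combine them differently.

```python
import math

# Position values from Section 2.3: title = 3, abstract = 2, body = 1.
AREA_VALUE = {"title": 3, "abstract": 2, "body": 1}


def word_span(tokens, word):
    """Assumed form of formula (1): (last - first + 1) / total token count."""
    positions = [i for i, t in enumerate(tokens) if t == word]
    if not positions:
        return 0.0
    return (positions[-1] - positions[0] + 1) / len(tokens)


def word_position(areas):
    """Highest area value among the areas where the word occurs.

    Taking the maximum covers both rules: repeats in one area count once,
    and occurrences in several areas use the highest value.
    """
    return max((AREA_VALUE[a] for a in areas), default=0)


def tf_idf(tf, n_docs, n_docs_with_word):
    """Assumed form of formula (3): tf * log(N / n_j)."""
    return tf * math.log(n_docs / n_docs_with_word)


def relevance_score(tokens, word, areas, n_docs, n_docs_with_word):
    """Hypothetical combination (product); stands in for formula (4)."""
    tf = tokens.count(word) / len(tokens)
    return (tf_idf(tf, n_docs, n_docs_with_word)
            * word_position(areas)
            * word_span(tokens, word))


tokens = ["iot", "data", "cloud", "iot"]
example = relevance_score(tokens, "iot", ["title", "body"], n_docs=10, n_docs_with_word=2)
```

A keyword that occurs in every document gets a TF-IDF of zero under this form, so its score vanishes regardless of position or span.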

2.5. Semantic Relation Graph. Mutual information allows users to analyze the correlation between keywords. Constructing a semantic relationship graph [27] based on mutual information to expand the query semantics can effectively improve the precision and recall during retrieval.
For keywords $x$ and $y$, their mutual information $I(x, y)$ [28] is expressed as shown in formula (5).
where $p(x)$ represents the probability of a document containing $x$, and $p(x, y)$ represents the probability of a document containing both $x$ and $y$. Then, the mutual information is normalized.
where $I_{max}$ represents the maximum mutual information value among all $I(x, y)$. Figure 1 is a small-scale semantic relationship graph $G(V, E)$, where a node $v$ represents a keyword and the edge weight $e_{ij}$ represents the normalized mutual information value of two related keywords $v_i, v_j$.
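The construction of the graph can be sketched as below. Formulas (5) and (6) are not reproduced in this text, so the sketch assumes the common pointwise form I(x, y) = log(p(x, y) / (p(x) p(y))) with probabilities estimated by document frequency, and normalization by division by I_max. Keeping only pairs with positive mutual information is a further assumption made here so that edge weights fall in (0, 1].

```python
import math
from itertools import combinations


def semantic_graph(docs):
    """docs: list of keyword sets, one per document.

    Returns {(x, y): e_xy} with e_xy = I(x, y) / I_max for related pairs.
    """
    n = len(docs)
    vocab = sorted(set().union(*docs))
    p = {w: sum(w in d for d in docs) / n for w in vocab}

    mi = {}
    for x, y in combinations(vocab, 2):
        p_xy = sum(x in d and y in d for d in docs) / n
        if p_xy > 0:
            value = math.log(p_xy / (p[x] * p[y]))
            if value > 0:  # assumption: keep positively associated pairs only
                mi[(x, y)] = value

    i_max = max(mi.values(), default=1.0)
    return {pair: value / i_max for pair, value in mi.items()}


docs = [{"a", "b"}, {"a", "b"}, {"c"}, {"c", "d"}]
graph = semantic_graph(docs)
```

In this toy collection only the pairs (a, b) and (c, d) co-occur, so they are the only edges of the graph.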
2.6. Access Tree. The scheme in this paper uses the access tree defined by the CP-ABE [29] scheme to represent the access structure. The access tree can be flexibly and efficiently applied to access authority control, which is defined as follows.
Let $\Upsilon$ denote the access tree. Each nonleaf node in $\Upsilon$ represents a threshold gate. If node $x$ has $num_x$ child nodes and its threshold is $k_x$, then $0 < k_x \le num_x$. When $k_x = 1$, the node represents an OR gate; when $k_x = num_x$, it represents an AND gate. Each leaf node in $\Upsilon$ represents an attribute and has threshold $k_x = 1$.
When checking whether the user authority meets the access tree $\Upsilon$, let $R$ be the root node of $\Upsilon$ and let $\Upsilon_x$ be the subtree rooted at node $x$. If the attribute set $Att$ satisfies the policy represented by $\Upsilon_x$, denote $\Upsilon_x(Att) = 1$. $\Upsilon_x(Att)$ is calculated by the following recursive algorithm.
If $x$ is a nonleaf node, calculate $\Upsilon_{x'}(Att)$ for each child node $x'$ of $x$. If the number of child nodes satisfying $\Upsilon_{x'}(Att) = 1$ is greater than or equal to $k_x$, let $\Upsilon_x(Att) = 1$; otherwise, it is NULL. If $x$ is a leaf node, let $\Upsilon_x(Att) = 1$ if the corresponding attribute satisfies $attr(x) \in Att$; otherwise, it is NULL.
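The recursive check can be written directly from the definition. The sketch below is a plaintext illustration only (no cryptography); the `Node` class, the helper names, and the example policy are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    threshold: int = 1                 # k_x; leaf nodes use 1
    attribute: Optional[str] = None    # set only on leaf nodes
    children: List["Node"] = field(default_factory=list)


def satisfies(node: Node, att: Set[str]) -> bool:
    """True when the attribute set satisfies the subtree rooted at node."""
    if not node.children:              # leaf: attr(x) must be in Att
        return node.attribute in att
    satisfied = sum(satisfies(child, att) for child in node.children)
    return satisfied >= node.threshold  # at least k_x children satisfied


def leaf(attribute: str) -> Node:
    return Node(attribute=attribute)


def gate(k: int, *children: Node) -> Node:
    return Node(threshold=k, children=list(children))


# (doctor AND cardiology) OR admin
policy = gate(1, gate(2, leaf("doctor"), leaf("cardiology")), leaf("admin"))
```

With this policy, `{"doctor", "cardiology"}` and `{"admin"}` are accepted, while `{"doctor"}` alone is rejected.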

2.7. Bilinear Mapping. Let $G$ and $G_T$ be two multiplicative cyclic groups of prime order $p$, and let $g$ be a generator of $G$. A map $e : G \times G \to G_T$ is a bilinear map [30] if the following three properties are satisfied:
(1) Bilinearity. For all $a, b \in Z_p$ and all $g_1, g_2 \in G$, $e(g_1^a, g_2^b) = e(g_1, g_2)^{ab}$.
(2) Nondegeneracy. $e(g, g) \ne 1$.
(3) Computability. There is an efficient algorithm for computing $e(g_1, g_2)$ for any $g_1, g_2 \in G$.
A map $e$ with these properties is called an efficient bilinear map from $G \times G$ to $G_T$.

Problem Description
3.1. System Model. The scheme involves four entities: data owners (DO), data users (DU), the IoT cloud server (CS), and the trust authority (TA). The system model is shown in Figure 2.
(1) Data Owner. The data owner is responsible for encrypting the original document, establishing a secure index and uploading the ciphertext document and the secure index. First, the data owner extracts the keyword collection from the original document collection and encrypts it according to the keyword collection and the data access strategy to generate a security index and then uses the symmetric key to encrypt the original document collection to generate a ciphertext document collection. Finally, the ciphertext document collection and the security index are uploaded and stored to the cloud server together.
(2) IoT Cloud Server. The IoT cloud server is mainly responsible for receiving and storing the data uploaded by the data owner and for answering the query requests of authorized users. When receiving a user's query request, the cloud server first reviews the user's permissions. If the user is authorized, the cloud server uses the stored security index and the trapdoor to calculate the similarity score of each document, sorts the query results, and returns the TOP-K most relevant documents to the user. Only authorized users can perform a correct search; unauthorized users cannot obtain search results.
(3) Trust Authority. It is mainly responsible for generating system keys and generating user private keys based on user attribute sets.
(4) Data User. The data user submits a query request to the cloud storage server of the IoTs to query the files of interest. The user sends his own set of attributes to the trust authority to obtain the user's private key, uses the private key and query keywords to generate trapdoors and permission tags, and uploads them to the cloud server. Finally, authorized users receive the TOP-K most relevant query results sent by the cloud server.

3.2. Safety Requirements. This paper assumes that the trust authority is completely credible. The cloud server is honest but curious (semihonest): it correctly executes the user's query request as required by the scheme and does not delete or modify the data uploaded by the data owner, but it may try to obtain additional information from the security index and trapdoor. Therefore, the scheme in this paper mainly considers the following four types of security requirements:
(1) Confidentiality of Documents. The data owner does not want unauthorized entities (cloud servers or data users) to know the content of the documents, so the documents must be encrypted before they are sent to the cloud servers and the unauthorized entities do not have decryption keys.
(2) Anonymity of Indexes and Trapdoors. The cloud server knows the ciphertext information stored by the data owner, including ciphertext document collection, security indexes, and trapdoors, but does not know the key.
(3) Anonymous Access. Data users can access IoT data without giving their detailed identity information.
(4) Collusion Resistance. Two or more data users cannot combine their private keys to access a document that none of them is authorized to access.
The scheme consists of the following five algorithms:
(1) $Setup(1^\kappa)$. TA runs the initialization algorithm with the system security parameter $\kappa$ as input and generates the system master key $MSK$, the index key $IK$, and the system public parameters $PK$.
(2) $KeyGen(MSK, Att)$. This algorithm is the user's private key generation algorithm, which is executed by TA. The algorithm inputs the system master key $MSK$ and the user attribute set $Att$ and outputs the user private key $SK$.
(3) $Encrypt(IK, PK, FF, \Upsilon)$. The data owner executes the encryption algorithm. The algorithm inputs the index key $IK$, the system public parameters $PK$, the plaintext document collection $FF$, and the access tree $\Upsilon$ and outputs the security index $I$ and the ciphertext document collection $CC$.
(4) $Trapdoor(W_Q, IK, PK, SK)$. The data user uses the algorithm to generate search credentials corresponding to the keywords that need to be queried. The algorithm inputs the query keyword set $W_Q$, the index key $IK$, the system public parameters $PK$, and the user private key $SK$ and outputs the search credentials $TP$.
(5) $Search(I, TP, K)$. The keyword search algorithm is executed by the cloud server. The algorithm inputs the security index $I$, the search credentials $TP$, and the parameter $K$ and outputs the TOP-K documents most relevant to the query keyword set. Only users who meet the access control authority get the correct results; otherwise, the search fails.
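The output contract of the search algorithm (TOP-K for authorized users, failure otherwise) can be sketched as follows. The function name, the `authorized` flag, and the score values are illustrative assumptions; in the scheme the authorization decision comes from the access tree check and the scores from the encrypted index.

```python
import heapq


def search(scores, k, authorized):
    """Return the ids of the k highest-scoring documents, best first.

    scores: {document id: relevance score}. Returns None when the user
    does not satisfy the access policy, mirroring the failed search.
    """
    if not authorized:
        return None
    return heapq.nlargest(k, scores, key=scores.get)


scores = {"c1": 0.2, "c2": 0.9, "c3": 0.5}
top_two = search(scores, 2, authorized=True)
denied = search(scores, 2, authorized=False)
```

Using `heapq.nlargest` avoids sorting the whole collection when K is much smaller than the number of documents.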

Scheme Description
(1) $Setup(1^\kappa) \to \{MSK, IK, PK\}$. TA selects a large prime number $p$. Let $G$ and $G_T$ be multiplicative cyclic groups of order $p$, and let $g$ be a generator of $G$. TA generates a bilinear map $e : G \times G \to G_T$ and a hash function $H_1 : \{0, 1\}^* \to G$. In addition, TA randomly generates an $(m + \varepsilon)$-dimensional segmentation vector $S$ and two $(m + \varepsilon) \times (m + \varepsilon)$ invertible matrices $\{M_1, M_2\}$, where $\varepsilon$ is the number of confusion bits and $m$ is the number of keywords, and generates the index key $IK = \{S, M_1, M_2\}$. Finally, TA randomly selects $\alpha, \beta \in Z_p$ and generates the system master key $MSK = \{\alpha, \beta\}$ and the system public parameters $PK = \{g, G, G_T, e(g, g), e(g, g)^{\alpha}, H_1, g^{\alpha}, g^{\beta}\}$.
(2) $KeyGen(MSK, Att) \to SK$. TA selects a random number $r \in Z_p$, randomly selects $r_j \in Z_p$ for each attribute $a_j$ in the attribute set $Att$, and generates the user's private key $SK = \{K = g^{(\alpha + r)/\beta}, \forall a_j \in Att : D_j = g^{r} H_1(a_j)^{r_j}, D_j' = g^{r_j}\}$. The system transfers the user's private key $SK$ to the data user.
(3) $Encrypt(IK, PK, FF, \Upsilon) \to \{I, CC\}$. The data owner extracts keywords from the plaintext document collection $FF = \{f_1, f_2, \cdots, f_m\}$ to obtain the keyword collection $W = \{w_1, w_2, \cdots, w_n\}$.
The data owner uses the symmetric key $ek$ to encrypt each document to obtain the ciphertext set $CC = \{c_1, c_2, \cdots, c_m\}$.
Based on the vector space model, the data owner generates a document vector $D_i$ for each document $f_i$. If the document contains the keyword $w_j$, formula (4) is used to calculate the relevance score $D_i[j] = score_{ij}$ of the keyword $w_j$ in the document; otherwise, $D_i[j] = 0$. The data owner expands each document vector $D_i$ from $m$ dimensions to $m + \varepsilon$ dimensions and sets $D_i[m + t] = \eta_t$, where $1 \le t \le \varepsilon$ and the $\eta_t$ are random numbers drawn from the normal distribution $N(\mu, \sigma^2)$. Then, the data owner splits each document vector $D_i$ into two vectors $\{D_i', D_i''\}$ according to the segmentation vector $S$ and encrypts them with the invertible matrices $M_1$ and $M_2$ to obtain the subindex $I_i$.
According to the access tree $\Upsilon$, a polynomial $q_x$ is selected for each node $x$ in $\Upsilon$ as follows. Starting from the root node $R$ of $\Upsilon$, a recursive algorithm runs from top to bottom. For each node $x$, let the degree $d_x$ of the polynomial $q_x$ be one less than the threshold $k_x$ of the node, that is, $d_x = k_x - 1$. First, select $s \in Z_p$ randomly for the root node $R$, let $q_R(0) = s$, and randomly select the other coefficients of $q_R$. For any other node $x$, define the functions $parent(x)$ and $index(x)$: the former returns the parent node of node $x$, and the latter returns the position of node $x$ among the children of its parent. Let $q_x(0) = q_{parent(x)}(index(x))$, and randomly select the other coefficients of $q_x$. According to the above algorithm, $C_v, C_v'$ are generated for all nodes in $\Upsilon$, namely, $I_\Upsilon = \{C = ek \cdot e(g, g)^{\alpha s}, W_1 = g^{\beta s}, \forall v \in \Upsilon : C_v = g^{q_v(0)}, C_v' = H_1(attr(v))^{q_v(0)}\}$. Finally, the security index $I = \{I_i, I_\Upsilon\}$ is generated.
Finally, the data owner uploads the security index I and the ciphertext document collection CC to the cloud server.
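The top-down polynomial assignment over the access tree is threshold secret sharing. The sketch below illustrates it on exponents only: it works in Z_p directly and skips the group elements g^{q_v(0)} and H_1(attr(v))^{q_v(0)}. The prime, the tuple encoding of tree nodes, and the assumption that each attribute labels at most one leaf are choices made for this illustration.

```python
import random

P = 2**127 - 1  # a prime standing in for the group order p (toy choice)


def eval_poly(coeffs, x):
    """Evaluate a polynomial with coefficients [c0, c1, ...] at x mod P."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % P
    return result


def share(node, secret, rng, out):
    """node is ("leaf", attr) or ("gate", k, [children]).

    Sets q_x(0) = secret, and for leaves stores out[attr] = q_x(0).
    """
    if node[0] == "leaf":
        out[node[1]] = secret
        return
    _, k, children = node
    # degree d_x = k_x - 1, constant term q_x(0) = secret
    poly = [secret] + [rng.randrange(P) for _ in range(k - 1)]
    for index, child in enumerate(children, start=1):
        share(child, eval_poly(poly, index), rng, out)  # q_child(0) = q_x(index)


def lagrange_at_zero(points):
    """Interpolate the polynomial through {index: value} at 0 mod P."""
    total = 0
    for i, y in points.items():
        num, den = 1, 1
        for j in points:
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        total = (total + y * num * pow(den, -1, P)) % P
    return total


def recover(node, shares, att):
    """Rebuild q_x(0) from the leaf shares the attribute set unlocks."""
    if node[0] == "leaf":
        return shares[node[1]] if node[1] in att else None
    _, k, children = node
    got = {}
    for index, child in enumerate(children, start=1):
        value = recover(child, shares, att)
        if value is not None and len(got) < k:
            got[index] = value
    return lagrange_at_zero(got) if len(got) == k else None


policy = ("gate", 2, [("leaf", "a"), ("leaf", "b"), ("leaf", "c")])  # 2-of-3
shares = {}
share(policy, 12345, random.Random(0), shares)
```

Any two of the three attributes recover the root secret `12345`; a single attribute yields `None`, which corresponds to the NULL outcome in the scheme.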
(4) $Trapdoor(W_Q, IK, PK, SK) \to TP$. First, the data user performs semantic expansion on the keyword set $W_Q$ according to the semantic relationship graph to obtain the expanded keyword set $W_Q'$.
Based on the vector space model, the query vector $Q$ is constructed. If $w_j \in W_Q$, then $Q[j] = 1$; if the expansion word $w_j$ corresponds to one original keyword $w_i$, then $Q[j] = e_{ij}$; if the expansion word $w_j$ corresponds to multiple original keywords $w_i$, then $Q[j] = \max\{e_{ij}\}$. Finally, the query vector $Q$ is extended from $m$ dimensions to $m + \varepsilon$ dimensions with $Q[m + t] = \tau_t$, where $\tau_t$ is a random number and $1 \le t \le \varepsilon$.
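The construction of the expanded query vector can be sketched as below. The dictionary, the graph weights, and the function name are illustrative assumptions, and the random padding entries for the extra dimensions are left out for brevity.

```python
def build_query_vector(dictionary, query_words, graph):
    """Build Q over the keyword dictionary.

    graph: {(w_a, w_b): e_ab}, an undirected semantic relationship graph
    with normalized mutual information as edge weights.
    """
    def weight(a, b):
        return graph.get((a, b), graph.get((b, a), 0.0))

    q = []
    for w in dictionary:
        if w in query_words:
            q.append(1.0)  # original query keyword
        else:
            # expansion word: strongest link to any original keyword, else 0
            q.append(max((weight(w, orig) for orig in query_words), default=0.0))
    return q


dictionary = ["sensor", "device", "privacy", "cloud"]
graph = {("sensor", "device"): 0.7, ("device", "privacy"): 0.4}
Q = build_query_vector(dictionary, {"sensor", "privacy"}, graph)
```

The word "device" is linked to both query keywords, so it receives the larger of the two edge weights, while "cloud" has no link and stays at zero.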
Then, the data user divides the query vector $Q$ into two vectors $\{Q', Q''\}$ according to the segmentation vector $S$, in the opposite way to the splitting of the document vector $D_i$. For $j = 1, 2, \cdots, m + \varepsilon$, if $S[j] = 1$, then $Q'[j] = Q''[j] = Q[j]$; if $S[j] = 0$, then $Q'[j]$ and $Q''[j]$ are random values with $Q'[j] + Q''[j] = Q[j]$. The data user encrypts $Q'$ and $Q''$ with the index key $IK$ to get the trapdoor $TD = \{M_1^{-1} Q', M_2^{-1} Q''\}$. Then, the data user randomly selects $\theta \in Z_p$ and generates the search credentials $TP = \{TD, T_1 = K^{\theta} = g^{(\alpha + r)\theta/\beta}, \forall a_j \in Att : \ldots\}$. Finally, the data user sends the search credentials $TP$ to the cloud server.
(5) $Search(I, TP, K)$. When the cloud server receives a query request from the data user, it performs the following steps. First, the cloud server checks whether the user attributes satisfy the access tree defined by the data owner. For a node $x$ in the access tree $\Upsilon$, the cloud server executes the following recursive algorithm $DecryptNode(x)$. If node $x$ is a leaf node, let $a_j = attr(x)$ be the attribute corresponding to node $x$. If $a_j \in Att$, then $DecryptNode(x)$ evaluates to an element of $G_T$; otherwise, $DecryptNode(x) = NULL$.
If node $x$ is a nonleaf node, calculate $F_z = DecryptNode(z)$ for each child node $z$ of node $x$. Let $S_x$ be a set of $k_x$ child nodes that satisfy $F_z \ne NULL$. If no such set $S_x$ exists, then $DecryptNode(x) = NULL$, which means that the access requirements are not met. Otherwise, $DecryptNode(x)$ is calculated from the values $F_z$, $z \in S_x$, by formula (8). Using Lagrange interpolation, $V = DecryptNode(R) = e(g, g)^{\theta r s}$ can be obtained, which shows that the data user is an authorized user who can perform the data query.
The cloud server uses formula (9) to calculate the relevance between each subindex $I_i$ of the security index and the query trapdoor $TD$ and returns the TOP-K documents most relevant to the query keyword set to the authorized user. If the user does not meet the access rights, the search fails and NULL is returned.
The formula for calculating document relevance is shown in formula (9).
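The reason the cloud server can rank documents without seeing the vectors is that the secure KNN encryption preserves the inner product. The sketch below assumes the usual secure KNN form, in which the subindex is {M_1^T D', M_2^T D''}; this form is consistent with the trapdoor {M_1^{-1} Q', M_2^{-1} Q''} but is not stated explicitly in this text. Exact fractions, small integer matrices, and the omission of the padding dimensions are choices made for the illustration.

```python
import random
from fractions import Fraction


def transpose(m):
    return [list(row) for row in zip(*m)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def inverse(m):
    """Gauss-Jordan inverse over exact fractions; raises if singular."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def random_invertible(n, rng):
    while True:
        m = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
        try:
            return m, inverse(m)
        except ValueError:
            continue  # singular draw, try again


def keygen(n, rng):
    s = [rng.randint(0, 1) for _ in range(n)]  # segmentation vector S
    m1, m1_inv = random_invertible(n, rng)
    m2, m2_inv = random_invertible(n, rng)
    return s, m1, m2, m1_inv, m2_inv


def split(vec, s, split_when, rng):
    """Additively split entries where s[j] == split_when; copy the others."""
    a, b = [], []
    for x, bit in zip(vec, s):
        if bit == split_when:
            r = Fraction(rng.randint(-9, 9))
            a.append(r)
            b.append(Fraction(x) - r)
        else:
            a.append(Fraction(x))
            b.append(Fraction(x))
    return a, b


def encrypt_index(d, key, rng):
    s, m1, m2, _, _ = key
    d1, d2 = split(d, s, 1, rng)  # document: split where S[j] = 1
    return mat_vec(transpose(m1), d1), mat_vec(transpose(m2), d2)


def make_trapdoor(q, key, rng):
    s, _, _, m1_inv, m2_inv = key
    q1, q2 = split(q, s, 0, rng)  # query: split where S[j] = 0
    return mat_vec(m1_inv, q1), mat_vec(m2_inv, q2)


def score(index, trapdoor):
    return sum(a * b for part_i, part_t in zip(index, trapdoor)
               for a, b in zip(part_i, part_t))


rng = random.Random(7)
key = keygen(4, rng)
index = encrypt_index([3, 0, 2, 5], key, rng)
trapdoor = make_trapdoor([1, 0, 1, 0], key, rng)
encrypted_score = score(index, trapdoor)
```

The encrypted score equals the plaintext inner product 3·1 + 2·1 = 5, because the matrices cancel and, at every position, one side is copied while the other is split additively.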
If the user's attribute set $Att$ satisfies the access tree $\Upsilon$, the user can obtain $V = e(g, g)^{\theta r s}$ according to formula (8) and calculate the document encryption key $ek$ by formulas (10) and (11).
Finally, the user uses ek to decrypt the obtained document to obtain a collection of plaintext documents.
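The key recovery can be checked with a short derivation. The attribute components of the search credentials and formulas (7), (10), and (11) are not reproduced in this text, so the derivation below assumes $T_j = D_j^{\theta}$ and $T_j' = D_j'^{\theta}$ for each $a_j \in Att$, which is one choice consistent with $V = e(g, g)^{\theta r s}$. It is a sketch of why the scheme is correct under that assumption, not necessarily the paper's exact formulas; $1/\theta$ denotes the inverse of $\theta$ modulo $p$.

```latex
% Assumption (not in the recovered text): T_j = D_j^{\theta}, T_j' = D_j'^{\theta}.
\begin{align*}
\text{Leaf } x,\ a_j = attr(x):\quad
\frac{e(T_j, C_x)}{e(T_j', C_x')}
  &= \frac{e\big(g^{r\theta} H_1(a_j)^{r_j\theta},\ g^{q_x(0)}\big)}
          {e\big(g^{r_j\theta},\ H_1(a_j)^{q_x(0)}\big)}
   = e(g, g)^{r\theta q_x(0)} \\
\text{Root } R:\quad
V &= e(g, g)^{r\theta q_R(0)} = e(g, g)^{\theta r s} \\
e(T_1, W_1)
  &= e\big(g^{(\alpha + r)\theta/\beta},\ g^{\beta s}\big)
   = e(g, g)^{(\alpha + r)\theta s} \\
\frac{e(T_1, W_1)}{V} &= e(g, g)^{\alpha\theta s} \\
\frac{C}{\big(e(T_1, W_1)/V\big)^{1/\theta}}
  &= \frac{ek \cdot e(g, g)^{\alpha s}}{e(g, g)^{\alpha s}} = ek
\end{align*}
```

Only a user who knows $\theta$ and whose attributes yield $V$ can strip the blinding factor $e(g, g)^{\alpha s}$ from $C$.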

Safety Analysis
4.1. Confidentiality of Documents. The document is encrypted with a symmetric key before being uploaded to the cloud server, and only data users who meet the access policy defined by the data owner can search for the document and further obtain the decryption key to decrypt the obtained ciphertext document. Therefore, this solution guarantees the confidentiality of the document.
4.2. Anonymity of Indexes and Trapdoors. The document vectors and query vectors are encrypted with the index key, where the segmentation vector $S$ and the two invertible matrices $M_1, M_2$ are the encryption keys of this scheme. In the encryption equations of the index, $M_1$ and $M_2$ are $(m + \varepsilon) \times (m + \varepsilon)$ matrices and $D_u'$ and $D_u''$ are $(m + \varepsilon)$-dimensional vectors (here, $\varepsilon$ is taken to be 0), so a set containing $n$ documents yields $2mn$ equations. However, there are $2m^2$ unknowns in $M_1, M_2$ and $2mn$ unknowns in $D_u', D_u''$. A system in which the number of equations is less than the number of unknowns cannot be solved uniquely, so the cloud server cannot deduce $M_1$, $M_2$, $D_u'$, and $D_u''$. Similarly, the query vector $Q$ can be regarded as two $m$-dimensional vectors $\{Q', Q''\}$, that is, the number of unknowns in the query is $2m$, and there are $2m^2$ further unknowns in $M_1, M_2$. However, the number of equations for solving the query vector is only $2m$, so the query vector $Q$ and the invertible matrices $M_1, M_2$ cannot be solved either. Therefore, this scheme ensures the security of indexes and query vectors.

4.3. Anonymous Access. The scheme uses attributes as the minimum granularity of access control. When an access request is made, the IoTs does not care about the user's identity and only needs to verify whether the user's attributes meet the access structure to decide whether to provide the user with decrypted data.

4.4. Collusion Resistance. Collusion resistance means that users with different attributes cannot decrypt the corresponding ciphertext even if they combine their private keys.
In a searchable encryption scheme, it is required that even if users collude, they cannot search for unauthorized keyword ciphertexts. In this scheme, a random number $r \in Z_p$ is selected for each user when the private key is generated. Since $r$ is chosen independently for each user, the private key components of the same attribute held by different users are different, so the secret values $e(g, g)^{r\theta s}$ recovered from the components of different users are different and cannot be combined. Therefore, this scheme is collusion resistant.

4.5. Function and Safety Comparison. In this section, we compare the expression ability and supported functions of the proposed scheme with some existing schemes. The summary is shown in Table 1.

Efficiency Analysis
The following analyzes the computational cost of this scheme in the private key generation, index generation, trapdoor generation, and search stages, compares its efficiency with that of the scheme in [31], and then reports an experimental simulation of the scheme. The cost of the following operations can be ignored.
(1) Index Generation Stage. To encrypt each document index, the data owner multiplies two $(m + \varepsilon)$-dimensional vectors by $(m + \varepsilon) \times (m + \varepsilon)$ matrices with a complexity of $O((m + \varepsilon)^2)$, where $m + \varepsilon$ is the number of keywords after expansion. Compared with the exponentiation and bilinear pairing operations on the groups $G, G_T$, the time spent on the matrix multiplication is negligible.
(2) Trapdoor Generation Stage. To calculate the encrypted query vector, the data user multiplies two $(m + \varepsilon)$-dimensional vectors by $(m + \varepsilon) \times (m + \varepsilon)$ matrices, and the time spent on this multiplication can also be ignored.
(3) Search Stage. If the user meets the access rights, the cloud server performs a search. The main operation is the inner product of two $(m + \varepsilon)$-dimensional vectors. The computational complexity is $O(m(m + \varepsilon))$, where $m$ here is the number of documents in the collection. The time spent on the vector inner product can also be ignored.
Here, let $T_g$ and $T_{gt}$ denote the time of an exponentiation in the groups $G$ and $G_T$, respectively, $T_p$ the time of a bilinear pairing operation, and $T_h$ the time of a hash operation; $n$ is the number of attributes in the system, $s$ is the number of user attributes, $|F|$ is the number of files, and $|W|$ is the number of keywords.
The efficiency comparison of the scheme in literature [31] and the scheme in this paper is shown in Table 2.
In order to verify the effectiveness of the scheme, this paper compares the performance of the scheme in [31] with the performance of this scheme. We conduct experiments on a Windows 10 64-bit operating system with an Intel(R) Core(TM) i7-7700 CPU @ 3.60 GHz and 8 GB RAM to measure the actual execution time. Here, we set the number of keywords in the dictionary to be the same as the number of query keywords in the trapdoor ($|F| = |W|$) and set the number of attributes in the system equal to the number of user attributes ($n = s$), with $n = s = 10$ and $|F| = |W| = 30$.
As shown in Figure 3, we found that compared with the computational cost in [31], in a large-scale data sharing system, the algorithm in this scheme is more computationally efficient, which means that this scheme is more effective and practical.
As shown in Figure 4, we compare the execution time of the search operation of one single subindex. The computational overhead of the search phase is mainly affected by the number of user attributes. We see that the computational overhead of the search phase of these two schemes increases linearly with the increase in the number of user attributes.

Conclusion
For the special application scenario of the IoTs environment, this paper proposes an attribute-based multikeyword ranking search scheme. The scheme realizes both keyword search based on semantic expansion and user access control. First, the scheme takes into account the weight difference of keywords at different positions and introduces parameters such as word position and word span into the calculation of the relevance score of keywords to build a more accurate document index. Second, the scheme expands the query keywords semantically according to the semantic relationship graph to find more keywords with similar meanings, thereby improving the precision and recall during retrieval. Third, the scheme uses an access tree to control the access authority of data users and realizes attribute-based fine-grained management by data owners. Finally, the functional and security analyses and the comparison with other schemes show that the scheme provides document confidentiality, index and trapdoor anonymity, anonymous access, and resistance to collusion attacks. In addition, the efficiency of the scheme is analyzed theoretically, and the results show that this scheme has advantages over the compared scheme.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.