Integrity Audit of Shared Cloud Data with Identity Tracking

Abstract. More and more users are uploading their data to the cloud without keeping any local copies. Under the premise that cloud users cannot fully trust cloud service providers, ensuring the integrity of users' shared data in the cloud storage environment is one of the current research hotspots. In this paper, we propose a secure and effective data sharing scheme for dynamic user groups. (1) To support user identity tracking and the addition and deletion of dynamic group users, we introduce a new role called the Rights Distribution Center (RDC). (2) To protect identity privacy, the third party auditor cannot determine which specific user issued an audit request while verifying data integrity, which promotes the fairness of the audit. (3) We define a new integrity audit model for shared cloud data, in which the user sends the encrypted data to the cloud and the data tags to the Rights Distribution Center (RDC) by using a data blinding technique. Finally, we prove the security of the scheme through provable security theory. In addition, the experimental data show that our proposed scheme is more efficient and scalable than state-of-the-art solutions.


Introduction
As an emerging network storage technology, cloud storage has been extended and developed from cloud computing. Cloud computing systems become cloud storage systems when the core of computing and processing is the storage and management of massive data. In simple terms, cloud storage is an emerging solution that puts storage resources on the cloud for people to access.
The user can easily access data on the cloud through any connected device, whenever and wherever. Through the data storage and sharing services in cloud computing, members can share data in the form of a group. As a member of a group, a user can not only access the shared data but also modify it. While cloud computing makes it easier for users to share data, users are still concerned about the security of their data, especially its integrity, due to security risks in cloud storage. An effective way to address this is to use a third party auditor (TPA) to validate shared data integrity. However, the TPA can obtain the block identifiers (that is, the identity of each shared block's signer) during the process of verifying data integrity. If this identity information and the confidential information in the shared data group are not effectively protected, they will be leaked to the TPA, for example, when a user in the group plays a crucial role or a data block in the shared data has a higher value.
Although current public auditing schemes for shared data solve the problem of user identity protection and support dynamic changes in the group, the identity of group members who maliciously modify the shared data cannot be traced. We can also observe that the amount of computation required for cloud users to sign data blocks is comparatively large, which takes resource-limited users a long time. This paper proposes an auditing scheme that supports user identity tracking and lightweight sharing of cloud data, which enables traceability of user identities and reduces the burden on resource-constrained users. Using the data storage and sharing services provided by the cloud server, legitimate users can easily form a group by sharing data with each other. That is to say, users can create data and share it with others in the group, and users in the group can not only access the shared data but also modify it. Although cloud service providers try to provide users with a secure and reliable storage environment, data integrity can still be compromised: for example, operational errors and hardware or software failures may lead to data tampering and data loss [1].

Related Work
Users always pay more attention to data security in the cloud. In recent years, data integrity schemes have become one of the research hotspots. With the help of data integrity schemes, any data corruption or deletion can be discovered in time and then necessary measures can be taken to recover the data. To develop a better understanding of data integrity schemes, we carry out the relevant work from the audit model, soundness, and other aspects.
Performance. Many researchers have proposed a series of schemes for this problem. On the one hand, how can user revocation be handled? Wang et al. [2][3][4][5][6][7] noticed the problem of shared data integrity verification and proposed a public auditing method that supports efficient user revocation for shared data. In summary, this scheme introduces proxy re-signature technology to solve the problem. However, when a user is revoked, the cloud server is allowed to re-sign the data blocks previously signed by the revoked user on behalf of a remaining legal group member, which can cause efficiency problems. In addition, in scheme [8], the authors enable efficient user revocation in identity-based cloud storage auditing for shared big data. On the other hand, Yu et al. [8,9] addressed the issue of key security among cloud users. In these schemes, key exposure in one time period does not affect the security of cloud storage auditing in other time periods, and key updates can be verifiably outsourced.
Identity Privacy. With the development of related technologies in cloud computing, public auditing of shared data integrity has attracted more and more attention. Yu et al. [10] observed that the storage and sharing services of cloud servers allow users to share data in the form of a group; as group members, they have the right to view and modify shared data. Although users can easily share data, data integrity issues remain [11,12], and using a TPA for public auditing can result in the leakage of users' identity privacy [13]. Wang et al. [14] fully considered the confidentiality of the data in the public audit process and proposed a privacy-preserving scheme that uses ring signatures to protect group members. Adopting ring signatures ensures that the TPA protects the user's identity privacy while verifying the integrity of the data. However, the efficiency of the scheme decreases as the number of group members increases, and the client also bears a heavy computational burden; therefore, the scheme does not apply to large user groups. Shen et al. [15] proposed a lightweight auditing scheme for shared data privacy protection, taking full account of the computational limitations of resource-constrained clients. Using data blinding methods, the scheme allows a Third Party Medium (TPM) to sign the data instead of the group users. This not only reduces the burden on the client but also ensures identity privacy during public auditing, so the identity of the data owner can be protected. However, this scheme supports neither group dynamics nor the traceability of data blocks. Wang et al. [16] proposed another public auditing method for shared data privacy protection. Using dynamic broadcast technology, group members can sign as the owner of the data when modifying the shared data, thereby protecting the privacy of the group members. It not only realizes dynamic operations on data by group members but also supports group dynamics.
However, this scheme does not protect the identity of the data owner, allowing the TPA to learn the owner's identity during public auditing, and it does not support the traceability of data blocks.
Public Auditability/Private Auditability. The first method [17] allows only the data owner to audit; the second method [18] allows a third party auditor to audit. In both approaches, the audit is performed without retrieving the remote data. If only the data owner can verify the integrity of the outsourced data, the scheme is said to provide private auditability. However, in some cases it is not practically feasible for the data owner to remain online at all times for data integrity verification. Hence, the data owner can delegate the responsibility for integrity verification to a third party auditor or other users. A data integrity scheme must have the public auditability property to support this audit delegation.
Dynamic Data Handling. Data can be either static (backup or archival data) or dynamic in nature (supporting operations such as insertion, deletion, and modification). Providing integrity for dynamic data is more challenging than for static or append-only data. Most of the schemes proposed in the literature, such as [19,20], are not able to handle dynamic data. The dynamic data handling characteristic demands that data integrity remain verifiable even after insertion, deletion, or modification.
Soundness. An untrusted server should not be able to pass a challenge request dishonestly. In the schemes of Wang [21] and Zhang et al. [22], the soundness property of data integrity schemes ensures data reliability. Data integrity schemes are designed to prevent tampering. Therefore, if metadata is tampered with or corrupted, intentionally or unintentionally, by the CSP, this should be promptly identified by the data integrity scheme. If the CSP could pass a challenge request without holding the data, or while holding corrupted data, a client would never be able to identify data corruption in time, and the value of the data would be lost. Therefore, a good data integrity scheme requires that the server's response be reliable.
Privacy Preserving. Privacy protection should be emphasized in the process of data integrity verification. As involved in the scheme [23], privacy concerns are introduced due to public verifiability. On the premise that the data owner will not allow the disclosure of his private data to a third party auditor, the privacy preservation property demands that a third party auditor should not obtain any confidential information about the user's data but can still verify the integrity of outsourced data.
Fairness. In the scheme [24], fairness means that a data integrity scheme should provide protection for an honest CSP against legitimate but dishonest users, who may attempt to accuse CSP of manipulating the outsourced data. If a data integrity scheme does not support fairness, it means dishonest users can damage CSP reputation.
Organization. The organization of the paper is as follows: the first part introduces the research status and background of cloud sharing data; the second part introduces the relevant work; the third part introduces the relevant knowledge; the fourth part describes the system model and each function of its entity, and describes the integrity audit scheme in detail; the fifth part analyses the security of the scheme, including the correctness analysis, unforgeability analysis, and proof of identity privacy by using provable security theory; the sixth part analyses the performance of the proposed scheme, including the functional comparison and efficiency analysis among different schemes. Finally, according to the advantages and disadvantages of this paper, we will formulate our next research direction.

Bilinear Pairings.
Let G1 and G2 be two multiplicative groups of prime order q, and let g1, g2 be generators of group G1. A bilinear pairing is a map ê: G1 × G1 → G2 with the following properties: (1) Bilinearity: ê(u^a, v^b) = ê(u, v)^(ab) for all u, v ∈ G1 and a, b ∈ Z*q. (2) Non-degeneracy: ê(g1, g2) ≠ 1. (3) Computability: there is an efficient algorithm to compute ê(u, v) for all u, v ∈ G1.

Data Blindness.
In general, data blindness means that user A passes blinded data to user B, and user B cannot infer user A's plaintext from that data; the user's privacy is thereby protected. This paper adopts a simple scheme with low computational cost to blind the data. The method is as follows: user A blinds each data block using a pseudorandom function and sends the result to user B, and user B cannot obtain the original data.
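The blinding step above can be sketched in a few lines. This is a minimal illustration, not the paper's exact construction: we assume the pseudorandom function f_k is instantiated with HMAC-SHA256 (a standard PRF choice), the blinded block is m' = m + f_k(i) mod q, and the holder of the key k can remove the blinding.

```python
import hashlib
import hmac

q = 2**127 - 1  # toy modulus; a real scheme would use the group order q


def prf(key: bytes, block_id: int) -> int:
    """Pseudorandom function f_k(i): HMAC-SHA256 over the block index, reduced mod q."""
    mac = hmac.new(key, str(block_id).encode(), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % q


def blind(m: int, key: bytes, block_id: int) -> int:
    """User A: compute the blinded block m' = m + f_k(i) mod q."""
    return (m + prf(key, block_id)) % q


def unblind(m_blinded: int, key: bytes, block_id: int) -> int:
    """Holder of k: recover m = m' - f_k(i) mod q; without k this is infeasible."""
    return (m_blinded - prf(key, block_id)) % q


key = b"shared-prf-key"
m = 123456789
mb = blind(m, key, 7)
assert mb != m                      # an observer sees only the blinded value
assert unblind(mb, key, 7) == m     # the key holder recovers the plaintext
```

Without the key, the blinded value is indistinguishable from random, which is exactly the property the scheme relies on to hide data from the TPA.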

Security Theory Assumption
Definition 1 (DL problem). For unknown a ← Z*q and a generator g of G1, given g and g^a, compute a.

Definition 2 (DL assumption). For any algorithm B running in probabilistic polynomial time, the advantage of B in solving the DL problem, Adv(B) = Pr[B(g, g^a) = a], is negligible.

Definition 3 (DCDH problem). For unknown a, b ← Z*q, given g^(1/a) and g^b, compute g^(ab).

Definition 4 (DCDH assumption). For any algorithm B running in probabilistic polynomial time, the probability that B solves the DCDH problem is negligible.
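To see why the DL assumption only holds for large groups, consider a brute-force attack in a toy Schnorr group. The parameters below (p = 227, q = 113, g = 4) are illustrative values chosen for this sketch; exhaustive search succeeds here precisely because q is tiny, whereas for a cryptographic q (hundreds of bits) the same loop is infeasible.

```python
# Toy Schnorr group: p = 2q + 1 with p, q prime; g generates the order-q subgroup.
p, q = 227, 113  # 227 = 2*113 + 1
g = 4            # 4 = 2^2 has order 113 in Z_227^*


def dlog_bruteforce(g: int, y: int, p: int, q: int):
    """Solve g^a = y (mod p) by exhaustive search -- feasible only for tiny q."""
    acc = 1
    for a in range(q):
        if acc == y:
            return a
        acc = (acc * g) % p
    return None  # y is not in the subgroup generated by g


a = 57
y = pow(g, a, p)
assert dlog_bruteforce(g, y, p, q) == a  # trivial here, hopeless for ~256-bit q
```

The scheme's unforgeability proof reduces evidence forgery to solving exactly this problem in a cryptographically large group.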

Dynamic Broadcast Technology.
Broadcast encryption (BE) technology is capable of transmitting encrypted information to group members over a broadcast channel; during the dissemination of this information, only members of the group can decrypt the message. Compared with traditional BE, dynamic broadcast encryption can effectively support dynamic changes of the group.

Data Sharing Integrity Verification Threat Target
Cloud Server Storage Problem. Cloud servers face situations in which data is lost or preserved incompletely. Considering their own interests and to protect their reputation, cloud service providers may attempt to defraud public auditors.
Data Leakage Problem. In the process of integrity auditing performed by the third party audit, when the cloud service provider submits the certificate to the TPA for complete public verification, the cloud service provider also sends the linear combination value of the data to the third party audit. This leads to the possibility that third parties may steal content from shared data and infer the identity of the relevant user.
Data Tamper Problem. Group members may make malicious changes to shared data in the cloud, rendering the shared data unusable. However, because operations cannot be traced back to a particular cloud user, even when data is found to have been tampered with, the identity of the responsible user cannot be determined.

System Architecture
Rights Distribution Center (RDC). Figure 1 shows the cloud shared data model. In the process of data integrity verification, the users, third party auditor, and cloud service provider are often involved in privacy disclosure and user identity traceability issues. In this paper, by introducing the Rights Distribution Center, as shown in Figure 2, the users are reasonably grouped and the RDC records the operations performed on the data by each user. The RDC first performs an initialization operation to set the global parameters (G1, G2, ê, g, u, PK) for the system. The RDC selects x as its own private key and x_j ∈ Z*q as the private key of member M_j, and sets a hash function H: Z*q → G1. Secondly, the RDC generates auxiliary information for the relevant data according to the (id_i, σ_i) sent by the user; the relevant information is recorded in a table. Finally, when the user requests to operate on the data, the RDC records the operation of the corresponding user to achieve identity tracing.
User. As a member of the cloud data sharing service, after registering an account, the user can insert, modify, and delete his or her own data, as shown in Figure 3. In this scheme, before sending data to the cloud service provider, the user first performs a data blinding operation: the user blinds the data using the pseudorandom function and sends the blinded blocks (Equation (5)) to the cloud service provider.
On the other hand, the user sends the tag σ_i generated for each data block to the Rights Distribution Center. Finally, the cloud user generates its own integrity verification request: according to the auxiliary information sent by the RDC and its own private key, an audit request is sent to the TPA. The third party audit center verifies the integrity of the data and returns the result to the user.
Cloud Service Provider (CSP). The cloud storage service provides data owners with data storage capabilities, so that the client does not need to keep local backups, reducing the pressure on local storage. When the cloud service provider receives a challenge from the TPA, it generates evidence of the integrity of the data from the stored file and sends it to the TPA. According to the proposed scheme, on the one hand, the cloud service provider processes the data sent by the user: it uses the pseudorandom key k to recover the original data m_i and then stores the data. On the other hand, according to the challenge sent by the TPA, the cloud service provider computes the σ_i corresponding to each m_i, calculates the linear combination value μ of the sampled blocks, and sends proof = (σ, μ) to the TPA, from which the TPA detects whether the data is complete. Figure 4 provides a brief description of the cloud service provider.
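The CSP's proof generation can be illustrated with a simplified, non-pairing analogue. This is our sketch, not the paper's construction: the paper's tags are group elements verified via the bilinear map, while here we use a MAC-like tag σ_i = x·(h(i) + m_i) mod q so that the same aggregation structure (σ = Σ v_i·σ_i, μ = Σ v_i·m_i) can be checked with plain modular arithmetic. The names h, x, gen_proof, and verify are our own.

```python
import hashlib

q = 113  # toy group order

def h(idx: int) -> int:
    """Stand-in for the hash H applied to a block identifier."""
    return int(hashlib.sha256(str(idx).encode()).hexdigest(), 16) % q

x = 29                                                      # signing key
blocks = {i: (i * 37 + 5) % q for i in range(10)}           # data blocks m_i
tags   = {i: (x * (h(i) + blocks[i])) % q for i in blocks}  # toy tags sigma_i

def gen_proof(chal):
    """CSP: aggregate sigma = sum v_i*sigma_i and mu = sum v_i*m_i over sampled blocks."""
    sigma = sum(v * tags[i] for i, v in chal) % q
    mu = sum(v * blocks[i] for i, v in chal) % q
    return sigma, mu

def verify(chal, sigma, mu, x):
    """Verifier: sigma must equal x * (sum v_i*h(i) + mu) mod q."""
    return sigma == (x * ((sum(v * h(i) for i, v in chal) + mu) % q)) % q

chal = [(2, 7), (5, 11), (8, 3)]      # challenged indices with coefficients v_i
sigma, mu = gen_proof(chal)
assert verify(chal, sigma, mu, x)
```

The key point carried over from the real scheme is that the verifier checks a single aggregated value instead of retrieving the challenged blocks, so communication is constant in the number of sampled blocks.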
Third Party Audit (TPA). As shown in Figure 5, when receiving a user's audit request, the TPA first sends a challenge to the cloud service provider and then verifies the data based on the evidence returned by the cloud service provider to determine whether the data is complete. Finally, the TPA returns the result of the integrity verification to the user: if the data is complete, it returns 1; otherwise it returns 0. In this scheme, we first register the user's identity hash value with the TPA, which is used to verify the identity of the user. After the user's identity is verified, the TPA sends a challenge to the cloud service provider. On receiving the evidence returned by the cloud service provider, the TPA checks whether Equation (7) holds to judge the integrity of the data.
Cloud Data Privacy. In our scheme, we need to make sure that TPA does not know the real data from the user. At the same time, it cannot get the content of the real data from the cloud response in the audit phase.
Audit Soundness. When the cloud stores the data intact, the cloud server can pass the TPA's verification.
Identity Privacy. The TPA cannot determine which user sent the audit request during the validation of data integrity. The cloud sharing model mentioned in this paper includes the RDC, CSP, TPA, and Client. In the following description, the relevant notations are shown in Table 1, and the details of the algorithm are shown in Figure 6.
(1) Setup. Users are denoted M_j (j = 1, 2, ..., s). The initialization work is completed by the RDC. The RDC generates two multiplicative groups G1, G2, selects two independent generators g, u ∈ G1, chooses a hash function H: Z*q → G1, and calculates PK = g^x, where x is selected as its own private key; it also selects x_j ∈ Z*q as the private key of member M_j. It selects r_j and calculates g^(r_j). The public parameters are (G1, G2, ê, g, u, PK). The RDC distributes the private keys to the users.
(2) Encryption. The user selects a file and divides it into blocks M = {m_1, m_2, m_3, ..., m_n}. The user's identity is denoted id_j. For each file block we perform the following operations. First, the file block is blinded using the pseudorandom function: we compute α_i = f_k(id, i), and each blinded file block is m'_i = m_i + α_i. Second, the user generates a tag for each file block by using a short signature; for convenience, we use σ_i to represent it. On the one hand, the user sends (id_i, m'_i) to the CSP. On the other hand, the user sends (id_i, σ_i) to the RDC. Once the RDC receives the user's (id_i, σ_i), it generates the user's identity table, referred to as UIT, which is shown in Table 2.
(3) Audit Request. The RDC sends the auxiliary information to the user and sends the hash value of the user's identity to the TPA. As shown in Table 3, the TPA keeps a copy of the legal users' identity table. The user then sends an audit request containing the hash value H(id_j) to the TPA.
(4) Send Challenge. The TPA receives the request and uses the lookup table to determine whether the identity is valid. If the user is invalid, the result is returned to the user. If the user is legitimate, the TPA sends the corresponding challenge to the cloud service provider: the TPA randomly selects v_i ∈ Z*q and sends the challenge (Equation (13)) to the CSP.
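The challenge itself is just a random sample of block indices, each paired with a random coefficient from Z*q. A minimal sketch of the TPA side (the function name and parameters are ours):

```python
import secrets

def gen_challenge(n_blocks: int, c: int, q: int):
    """TPA: sample c distinct block indices, each with a random coefficient v_i in Z_q^*."""
    rng = secrets.SystemRandom()
    indices = rng.sample(range(n_blocks), c)  # distinct challenged positions
    return [(i, secrets.randbelow(q - 1) + 1) for i in indices]  # v_i in [1, q-1]

# e.g. challenge 400 of 1000 blocks, coefficients drawn from a toy-sized Z_q^*
chal = gen_challenge(n_blocks=1000, c=400, q=2**31 - 1)
assert len(chal) == 400
assert all(0 <= i < 1000 and 1 <= v < 2**31 - 1 for i, v in chal)
```

Sampling a fixed number of random blocks rather than all of them is what keeps the challenge cost independent of the file size, which matches the experimental observation later in the paper that challenging 400 blocks takes only tens of milliseconds.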
(5) Generate Proof. The cloud server uses the pseudorandom key k to remove the blinding, thus restoring the original data m_i. According to the random values in the challenge, the CSP computes the σ_i corresponding to each m_i, aggregates them, and calculates the linear combination μ of the sampled blocks. Then the CSP sends proof = (σ, μ) to the TPA as evidence of whether the data is complete.

(6) User Addition and Revocation. When a new user joins the group, the RDC distributes a private key to the user. The user who gets the key has the same rights as the other users and can process the shared data. At the same time, the RDC adds this new user to the user identity table. When a user wants to leave the group, or when a malicious user is removed forcibly, the RDC marks the user's key for special treatment: when a user with that key logs in again, the user can no longer view or modify the data.

(7) Members Modify Data and Achieve Identity Tracking.
When a user wants to modify his own data, he sends a request to the CSP. After the CSP authenticates the request, it immediately informs the RDC, and the RDC uses the dynamic broadcast list to broadcast the change within the user's group, so that the members can receive information about the data change. If there is no objection, the RDC records the identity of the member, and the CSP receives the user's modified data. When there is a dispute about the operation on data block m_i, the RDC can find the dishonest member by looking up the operations of the relevant users: the RDC finds the corresponding entry by looking up the list of (id_i, σ_i) pairs and finally identifies the responsible cloud user.
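The tracing step reduces to a lookup in the RDC's identity table. The sketch below (our illustration; the table layout in the paper's UIT may differ) shows the essential mapping from a disputed block to the member who last operated on it:

```python
# RDC-side user identity table (UIT): block id -> (tag, user id of last signer)
uit: dict[str, tuple[int, str]] = {}

def record(block_id: str, tag: int, user_id: str) -> None:
    """RDC: log which member last signed/modified the block."""
    uit[block_id] = (tag, user_id)

def trace(block_id: str):
    """Resolve a disputed block back to the member who last signed it, or None."""
    entry = uit.get(block_id)
    return entry[1] if entry else None

record("blk-001", 0xAB12, "M_3")
record("blk-002", 0xCD34, "M_7")
assert trace("blk-001") == "M_3"
assert trace("blk-999") is None   # unknown blocks cannot be attributed
```

Because only the RDC holds this table, group members remain anonymous to the TPA while still being accountable to the RDC, which is the traceability/privacy balance the scheme aims for.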

Security Analysis
In this section, we prove the correctness, unforgeability, identity privacy protection, data confidentiality, and identity traceability of the scheme in detail. Through these proofs we conclude that the proposed scheme achieves a high level of security.

Correctness Analysis.
In this paper, correctness firstly means that a cloud user can upload data to a cloud server only after receiving permission from the RDC, which is obtained by applying for authentication. Only legitimate users can obtain this right; malicious users are flagged and locked out in time.
Secondly, correctness means that after a cloud user obtains reasonable authority and sends an audit request to the TPA, the TPA receives the evidence sent by the cloud service provider to perform data integrity audit. Therefore, the correctness of the scheme is that TPA can complete the integrity verification through the evidence provided by the cloud service provider, thus giving the cloud user an accurate answer to the data integrity audit. If the data is complete, the result is 1 and if the data is incomplete, 0 is returned. Now it is proved in detail as follows.
We can prove that the validation result is correct; that is, the left side of the verification equation equals the right. Firstly, we simplify the left-hand side; secondly, we expand the right-hand side using the bilinearity of ê. This completes the proof, so we know that when the cloud server saves the data correctly, the integrity of the data can be verified through the evidence sent by the cloud service provider.
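As a sketch of how such a correctness argument typically unfolds, assume BLS-style tags of the form σ_i = (H(id_i)·u^(m_i))^x, which is consistent with the setup parameters (g, u, PK = g^x, H: Z*q → G1) described above but is our reconstruction rather than the paper's exact Equation (7). With aggregated proof σ = ∏ σ_i^(v_i) and μ = Σ v_i·m_i over the challenged set I:

```latex
\hat{e}(\sigma, g)
  = \hat{e}\!\Big(\prod_{i\in I}\big(H(\mathrm{id}_i)\,u^{m_i}\big)^{x v_i},\, g\Big)
  = \hat{e}\!\Big(\prod_{i\in I} H(\mathrm{id}_i)^{v_i}\cdot u^{\sum_{i\in I} v_i m_i},\, g^{x}\Big)
  = \hat{e}\!\Big(\prod_{i\in I} H(\mathrm{id}_i)^{v_i}\cdot u^{\mu},\, \mathrm{PK}\Big).
```

The first step collects the exponent x·v_i of each tag, the second moves x into the right-hand argument by bilinearity, and the last substitutes PK = g^x, so an honest proof always passes.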

Unforgeability Analysis.
Based on the security definition of the discrete logarithm problem, assume there is a malicious attacker who can forge evidence and successfully pass the third party's verification. Then there must be an algorithm that solves the discrete logarithm problem with non-negligible probability. To show that the evidence in the scheme cannot be forged, we define the following game.
Game. Assume there is shared data M. When the third party auditor sends a challenge to the cloud service provider, the challenge is {(i, v_i)}. The evidence generated from the original data is (σ, μ). Suppose the cloud service provider, based on corrupted data M' ≠ M, generates evidence (σ', μ'), and we specify μ' ≠ μ. If this evidence passes the TPA's integrity verification, we say that the cloud service provider has won the game.
When the cloud service provider wins the game, we obtain the two TPA equations for verifying data integrity, one for each piece of evidence. From these two formulas, and knowing that g, u are generators of G1 and that PK = g^x, so PK is also a generator of group G1, we can apply the relevant properties of the bilinear map to derive further equations. From these we conclude that the value of x can be solved whenever the resulting equation holds; from our game definition we know that μ' − μ ≠ 0, so the equation degenerates only when r_2 is zero. The probability of finding x in the group Z_q is 1 − 1/q, and since q is a large prime, this probability is non-negligible. That is, when the cloud service provider wins this game, we can solve the discrete logarithm problem with a non-negligible advantage, which contradicts the hardness of the discrete logarithm. Therefore, the cloud service provider in this scheme can pass the TPA's verification only by providing correct evidence, which shows that the proposed scheme is unforgeable.

Identity Privacy.
As described in this scheme, the user's identity privacy means that when the TPA receives the audit request sent by the user, it cannot obtain the identity of the user from the audit request.
When we perform data integrity verification, we should pay attention to the protection of the user's identity privacy. During integrity auditing by the third party auditor, the identity verification process hides the identity of the user by exploiting the one-wayness of the hash function, so as to better protect the user's identity privacy. Specifically, on the one hand, when the TPA authenticates the user during the integrity audit, it does not compare the user's specific id value directly but instead compares the received hash value with those stored by the third party audit center. If a matching hash value exists, the identity of the sender of the audit request is verified, and the third party audit center can send a challenge to the cloud service provider. On the other hand, the TPA cannot infer any information about the user's identity from the audit request sent by the user.
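The hash-based membership check described above can be sketched as follows. This is an illustrative simplification (the names h_id, legal_table, and authenticate are ours): the TPA stores only hashes of legitimate identities and compares the hash carried in an audit request against that table, never seeing a raw id.

```python
import hashlib

def h_id(user_id: str) -> str:
    """Hash an identity; the TPA only ever stores and compares these digests."""
    return hashlib.sha256(user_id.encode()).hexdigest()

# Table of legal users' identity hashes, provisioned by the RDC
legal_table = {h_id(u) for u in ("M_1", "M_2", "M_3")}

def authenticate(request_hash: str) -> bool:
    """TPA: accept the audit request iff its hash appears in the legal table."""
    return request_hash in legal_table

assert authenticate(h_id("M_2"))        # registered member passes
assert not authenticate(h_id("M_9"))    # unregistered identity is rejected
```

Note that a plain hash of a small identity space is linkable across requests; a deployed scheme would salt or randomize the committed value, which is presumably the role of the additional element in the audit request.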

Data Privacy.
In the scheme proposed in this paper, data privacy means that when a user sends a data authentication request, on the one hand, information about the user's data cannot be acquired by any party other than the server; on the other hand, when the user combines data into an audit request, the user's data information is not leaked to the third party audit center during the processing of the audit request.

Algorithm Function Analysis.
In cloud computing, data is usually shared by several users. Through comparative analysis of different schemes, as shown in Table 4, we can compare and analyze the different functions involved in the scheme, including identity tracking, data block privacy, dynamic groups, and identity privacy. Therefore, on the one hand, we can have a basic understanding of our scheme's function. On the other hand, we can better conduct the next step of research by comparing different schemes.

Algorithm Performance Analysis.
In this section, we performed the following experiments to assess the workload of the involved entities. The experiments were carried out on a server running Linux with an Intel Pentium 2.70 GHz processor and 4 GB of memory. In terms of audit generation time efficiency, we evaluated the authentication algorithm. In terms of running time, we compared the efficiency of three schemes (Yang [31], Ateniese et al. [32], and Wang et al. [14]); the experimental results are shown in Figures 7 and 8. Our signature scheme is based on the BLS signature scheme and is similar to Yang's scheme [31]. The scheme of Ateniese et al. [32] is based on proxy re-signatures; its computational cost is dominated by re-signing data blocks and modular exponentiation in the group G1. The scheme of Wang et al. [14] is based on RSA signatures; its computational complexity is similar to that of ring signatures, and the amount of computation is also large. It can be seen from the figures that Ateniese et al. [32] and Wang et al. [14] are very time consuming, so our scheme has an advantage.
We also compare computation time as the number of challenged blocks varies; the running time is shown in Figure 9 for three schemes: ours, Dongare [33], and Yuan and Yu [34]. The computation of all three schemes grows linearly with the number of challenged data blocks: the more blocks are challenged, the more time the calculation takes. In the same experimental environment, our scheme spends less computation time than Yuan and Yu's scheme, and more than Dongare's scheme; however, that scheme achieves only identity privacy and cannot implement identity traceability. In terms of overall feasibility, our scheme has clear advantages. Specifically, generating a challenge message that specifies 400 random blocks takes only about 20 milliseconds, while specifying 1000 blocks increases this to 50 milliseconds.
The scheme meets the current mainstream cloud server configuration and it has strong feasibility.

Conclusions
According to the above analysis, we can see that our proposed scheme is able to realize the desired security goals. In this paper, we establish a data sharing framework in cloud environment and propose a public auditing scheme with identity privacy and identity traceability for group members. The proposed auditing scheme achieves the security requirements that a well-constructed auditing scheme for shared cloud data should satisfy. As far as future work is concerned, we will continue to study how to improve the allocation of rights in the data integrity audit process and how to improve the security level of user data and protect identity privacy. The above will be the focus of our next research.

Data Availability
The data source of this paper is true and reliable. The relevant code for this paper is available at https://github.com/xiaofeixue123/Integrity-audit.