SM2-Based Offline/Online Efficient Data Integrity Verification Scheme for Multiple Application Scenarios

With the rapid development of cloud storage and cloud computing, users increasingly store their data in the cloud for more convenient services. To ensure the integrity of cloud data, researchers have proposed cloud data integrity verification schemes. Storage environments for Internet of Things (IoT) big data and medical big data have an even stronger need for integrity verification, and they also require such schemes to offer a more comprehensive set of functions. Most existing data integrity verification schemes target general cloud storage and cannot be applied directly to IoT big data storage or medical big data storage. To solve this problem, and taking into account the characteristics and requirements of IoT data storage and medical data storage, we designed an SM2-based offline/online efficient data integrity verification scheme. The scheme uses the SM4 block cipher to protect the privacy of the data content and a dynamic hash table to support dynamic data updates. Based on the SM2 signature algorithm, the scheme also supports offline tag generation and batch audits, which reduces the computational burden on users. The security proof and efficiency analysis show that the scheme is secure and efficient and can be used in a variety of application scenarios.


Introduction
Cloud storage technology is convenient and flexible, its use growing rapidly at home and abroad [1]. Big data from Internet of Things (IoT) devices and medical big data also use cloud storage technology to provide services. However, after users have stored data in the cloud, although they can thereby access convenient storage and management services, they also lose the power to control the data directly. Therefore, ensuring data integrity in the cloud has become a hot research topic for scholars [2]. Data integrity verification technology uses cryptography-related technology to design appropriate schemes that convince users that their data, when stored in the cloud server, is secure and complete, by means of a series of interactions between the auditor and the cloud server. Using this technique can effectively deter cloud service providers (CSP) from deliberately concealing the issues of data loss or corruption from users due to their fear of damaging their reputations. It also effectively stops users from unreasonably making accusations or claims against CSPs simply because of suspicion, thus effectively protecting the legitimate rights of both users and CSPs [3].
IoT devices have been widely used and have become a convenient and universal access terminal for Internet services. However, IoT devices have limited storage space and weak
(1) Public auditing: anyone can perform the audit. Generally, experienced and skilled TPAs are entrusted by the users to perform the audit task.
(2) Dynamic updating of cloud data: users can insert, delete, and modify the data stored in the cloud at any time.
(3) Privacy protection: the TPA cannot know the contents of the user data. It is also preferable that the CSP should not know the contents of the user data.
(4) Lightweight computation: the users' computational overhead should be as small as possible.
(5) Batch audits for multiple users: the most appropriate scheme is able to implement batch audits for multi-user data.
However, we found that most existing cloud storage schemes do not meet the above five conditions well. Therefore, we designed an efficient offline/online data integrity Sensors 2023, 23, 4307 3 of 16 verification scheme. The proposed scheme is not only applicable to the integrity audit of cloud data but is also applicable to the integrity verification of IoT data and medical data.

Related Works
In early remote data integrity verification schemes, the auditor had to download all data from the cloud and use locally stored metadata to confirm its integrity. This incurs high communication and computation costs, takes a long time, and wastes a great deal of computing power. In 2007, Ateniese et al. [8] proposed the first provable data possession (PDP) scheme. Their scheme divides the data file into blocks, and the auditor only needs to check a sample of the data blocks held by the CSP to verify the integrity of all data with high probability. For 1,000,000 blocks of 4 KB each, assuming that 1% of the blocks have been deleted or tampered with by the CSP, the auditor only needs to verify 460 blocks to detect the damage with a probability greater than 99%. Also in 2007, Juels et al. [9] proposed the first proofs of retrievability (POR) scheme for auditing data. Their scheme uses error correction codes and sampling detection so that damaged data can be recovered once an integrity violation is detected. However, their scheme does not support public auditing, and the number of audits is limited.
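The sampling figure quoted above can be checked with a short calculation. The sketch below assumes that challenged blocks are drawn independently, which is a close approximation to sampling without replacement when the file has far more blocks than the challenge; the function names are illustrative and do not come from [8].

```python
import math


def detection_probability(corrupted_fraction: float, challenged: int) -> float:
    """Probability that at least one of `challenged` sampled blocks is corrupted,
    assuming independent draws (an approximation for large files)."""
    return 1.0 - (1.0 - corrupted_fraction) ** challenged


def blocks_needed(corrupted_fraction: float, confidence: float) -> int:
    """Smallest challenge size c with 1 - (1 - f)^c >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - corrupted_fraction))


if __name__ == "__main__":
    # 1% of the blocks corrupted, 460 blocks challenged.
    print(detection_probability(0.01, 460))
    print(blocks_needed(0.01, 0.99))
```

Under this approximation the required challenge size depends only on the corrupted fraction and the target confidence, not on the total number of blocks, which is why sampling-based audits scale to very large files.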
With the increasing demands of users, scholars have extended the scheme proposed by Ateniese et al. [8] with various functions. A dynamic data updating function was added to cloud audit schemes so that users can modify the data stored in the cloud more flexibly. If the cloud data are modified directly, the tags and indexes no longer match, and subsequent verification cannot be completed. Therefore, various data structures have been proposed to support dynamic data updates. To prevent malicious auditors from colluding with CSPs or learning users' private data, random masking and blockchain technology have been combined with cloud audit schemes. To enable auditors to audit the data integrity of more than one user at a time, a batch audit function was added, which improves the efficiency of large-scale audits. By meeting these needs one after another, cloud data audit schemes have gradually matured. However, the existing cloud audit schemes are not fully applicable to the cloud storage environment for IoT and medical data.
The cloud audit scheme proposed in [10] constructs a multi-leaf authentication method based on the Merkle tree. The scheme can authenticate multiple leaf nodes simultaneously and supports batch data updates. It also supports log auditing: users can verify whether the auditors perform their audit work honestly by checking the log files that the auditors generate. However, the scheme does not address comprehensive privacy protection, and it has a security problem in that attackers can forge data tags to pass the audit. Hou et al. [11] designed a public audit protocol supporting blockless verification and batch verification; the protocol uses a chameleon certification tree to implement efficient dynamic operations on outsourced data, reduces the computational cost caused by data updates, and further improves audit efficiency. Nevertheless, the scheme does not describe how to achieve privacy protection for users and requires the computation of many bilinear pairings during the upload block verification and batch audit phases. Based on the BLS signature, Mishra et al. [12] used a binomial binary tree and an indexed hash table data structure to construct an efficient and dynamically updatable cloud audit scheme. However, the scheme cannot achieve batch audits.
Fan et al. [13] built a flexible auditing scheme that supports efficient dynamic updating based on a consortium blockchain. However, the scheme does not consider batch audits for large numbers of users. The ID-based offline/online PDP protocol constructed in [14] is based on an offline/online signature. The scheme supports batch verification and fully dynamic data operations but cannot protect the privacy of the data content from cloud servers. The audit scheme introduced in [15] is an ID-based scheme with compressed cloud storage, and it only uses encrypted data blocks in a self-verified way to audit the cloud data. Xu et al. [16] introduced the concept of transparent integrity auditing. They proposed a concrete scheme, based on the blockchain, which does not rely on third-party auditors while freeing users from high communication costs in data integrity auditing.
Ji et al. [17] proposed an ID-based data integrity verification scheme with a designated auditor. In their scheme, only the auditor designated by the user can take part in the audit task, which improves security compared with previous ID-based audit schemes. However, the functions of the scheme are not comprehensive. Li et al. [18] proposed an audit scheme based on a redactable signature. The CSP can transform the signature directly, without an additional sanitizer, while sharing sensitive data. The signature can also be used to authenticate the source of the shared data. Lin et al. [19] proposed a consortium blockchain-based audit protocol. This protocol can check the abnormal behavior of auditors, but it does not support batch audits. In addition, during the audit process, the above schemes use numerous high-cost operations, such as exponentiation, hash-to-point functions, and bilinear pairings, and therefore incur high computing costs; thus, they cannot be fully applied to the cloud storage environment for IoT data and medical data.
Our Contributions. In this paper, we propose an efficient offline/online data integrity verification scheme for multiple application scenarios. Our contributions can be summarized as follows: (1) Based on the SM2 signature algorithm and the SM4 block encryption algorithm, we have constructed an offline/online remote data audit scheme. The scheme supports dynamic data updates, comprehensive privacy protection, and batch audit capability. Based on the advantages of offline tags and scheme design, our scheme has low computational overheads and is suitable for lightweight environments. (2) We have carried out a security analysis and proof of the scheme. The scheme is resistant to forgery attacks from the storage side and achieves comprehensive privacy protection; even the storage side cannot obtain the real content of the data. (3) We analyzed the scheme's efficiency and compared the functions and computing costs with the existing schemes, proving the comprehensiveness of the scheme's functions and its high efficiency.

Organization.
We have organized the rest of this paper as follows. Section 3 introduces the system model and the security model. Section 4 introduces the background knowledge used in the construction of the scheme. In Section 5, the concrete scheme is described. We analyze the scheme's performance and compare it with other schemes in Section 6. In Section 7, we conclude our work. We analyze the security of the scheme in Appendix A.

System Model
The system model of the scheme is shown in Figure 1. Three interacting entities are included: the CSP provides data storage services to users for payment, but it is not trusted and may delete data from the cloud or pry into the data privacy of its users for profit. The data owner (DO) is the owner of the data, uploading the data to the cloud to save their own storage overhead, but does not want the data privacy to be compromised. The TPA is a semi-honest auditor commissioned by users. They will faithfully perform the task of auditing the integrity of the data in the cloud, on the one hand, but on the other hand, they are curious about the content of the data.
The operation process of the proposed audit scheme includes the following algorithms:
(1) Setup: the CSP runs the algorithm, which inputs the security parameter $\lambda$ and generates the public parameters $\{E, G, q, g\}$.
(2) KeyGen: the DO runs the algorithm, which outputs the private key $k_s$ and the public key $k_p$.
(3) OffTagGen: the DO runs the algorithm, which inputs $k_s$ and the random numbers $d_i$, $l$, and outputs the offline tags $\{r'_i, s'_i\}$.
(4) OnTagGen: the DO runs the algorithm, which inputs $\{r'_i, s'_i\}$ and the data blocks $m_i$, and outputs the online tags $\{r_i, s_i\}$.
(5) ChalGen: the TPA runs the algorithm, which inputs the random number $\pi$ and outputs the indexes $\{i_j\}_{1 \le j \le c}$.
(6) ProofGen: the CSP runs the algorithm, which inputs $\{m_{i_j}, r_{i_j}, s_{i_j}, i_j\}_{1 \le j \le c}$ and outputs the proof $\{\rho, s, r\}$.
(7) VerifyProof: the TPA runs the algorithm, which inputs the proof $\{\rho, s, r\}$ and outputs "true" or "false" to indicate the integrity of the data.

Security Model
In the existing data integrity audit schemes, security analysis often considers the CSP to be unreliable; it may forge tags in an attempt to pass the audit. Therefore, the security analysis mainly proves the unforgeability of the scheme: if the DO's data are corrupted, this must be detected by the interaction between the CSP and the TPA when the scheme is executed. That is, the CSP cannot forge integrity evidence and pass the data integrity audit when the data have been damaged; thus, it must carefully maintain the cloud data. We define the unforgeability of the scheme with the following game.
Game: Let $C$ be the challenger. $C$ runs the Setup algorithm to generate the system parameters and sends them to an adversary, $A$. In this security model, we assume that the adversary $A$ has great privileges, although these privileges are unlikely to be available in a real situation. In Appendix A, we show that even if $A$ has all the privileges assumed here, he/she is unable to break the auditing scheme proposed in this paper, which demonstrates that the scheme has high security strength. Except for the target user that $A$ wants to attack, $A$ can inquire about any other user's information. Specifically, $A$ can make the following queries:
(1) Public key query: when $A$ queries the public key of $ID_w$, $C$ runs the KeyGen algorithm to generate $k_{wp}$ and returns $k_{wp}$ to $A$.
(2) Private key query: when $A$ queries the private key of $ID_w$, $C$ runs the KeyGen algorithm to generate $k_{ws}$ and returns $k_{ws}$ to $A$.
After these queries, $A$ is challenged. $A$ wins the game if it outputs the aggregate tag $\{\rho^*_w, s^*_w, r^*_w\}$ with $ID^*_w$ and $k^*_{wp}$, and the following conditions are met. Our scheme is forgery-resistant if no polynomial-time adversary can win the game with non-negligible probability.
Condition 1: The forged aggregate tag $\{\rho^*_w, s^*_w, r^*_w\}$ satisfies the verification equations.
Condition 2: There is no interruption of the public key query.
Condition 3: Tags have been queried for all the blocks $m^*_{wi}$ of $ID^*_w$.

Chinese Commercial Cryptography Algorithm
In 2010, the State Cryptography Administration of China released the elliptic curve-based SM2 cryptographic algorithm. The SM2 algorithm offers high cryptographic strength, fast processing, and low resource consumption. Its security has been proven by the authors of [20], and SM2 is more secure against generalized key substitution attacks. In 2012, the Security Commercial Code Administration Office of China released the SM4 block cipher standard. SM4 is similar to AES-128, with simplified round key generation, and it is mainly used for data encryption. Encryption and decryption both use a 32-round nonlinear iterative structure, and the S-box has a fixed 8-bit input and 8-bit output. The large number of rounds and the added nonlinear transformations make the cipher more effective in defending against key-leaking Trojans [21]. The SM2 and SM4 algorithms have been incorporated into ISO/IEC international standards. Given their security and performance, we expect them to be recognized or adopted by more and more organizations and individuals inside and outside China.
Our scheme uses the SM2 digital signature algorithm to construct the audit scheme and the specific steps of the SM2 digital signature algorithm are as follows [22]. To facilitate understanding, we define and explain the various notations that appear in this paper in Table 1. Table 1. Notations used in this paper.

Notations  Descriptions
$\lambda$  The system initialization parameter.
$E$  The elliptic curve.
$G$  The additive cyclic group.
$q$  A large prime number.
$g$  The generator of $G$.
$k_s$  The user's secret key.
$k_p$  The user's public key.
$M$  The user's data file.
$(m_1, \ldots, m_n)$  $n$ data blocks.
$ID$  The identity of the file.
$v_i$  The version number of $m_i$.
$n$  The number of total data blocks.
$c$  The number of challenged blocks.
per  The pseudo-random function.
$\pi$  The input parameter of per.
$\{\rho, s, r\}$  The proof of data possession.
(1) Key generation: the selected elliptic curve equation is $y^2 = x^3 + ax + b$. Let $g$ be the base point on the elliptic curve; the integer $k_s \in Z_q^*$ is randomly selected as the private key, and the public key $k_p = k_s \cdot g$ is calculated.
(2) Signature: Let the data to be signed be $m$. The signer first selects a random integer $d \in Z_q^*$, sets $d \cdot g = (x_1, y_1)$, and computes $r = m + x_1$ and $s = (1 + k_s)^{-1}(d - r k_s)$; the signature of the message $m$ is $\{r, s\}$.
(3) Verification: After receiving $m$ and $\{r, s\}$, the verifier calculates $t = r + s$, $(x'_1, y'_1) = s \cdot g + t \cdot k_p$, and $r' = x'_1 + m$. If the values of $r'$ and $r$ are equal, the signature is correct.
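The three steps above can be exercised on a toy curve. The following is a minimal sketch, not an SM2 implementation: it assumes the textbook curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ with base point $(5, 1)$ of prime order 19, takes the message $m$ directly as an integer instead of a hash value, and omits the range checks that the SM2 standard places on $r$, $s$, and $t$. All function names are illustrative.

```python
import random

# Toy parameters (assumption): y^2 = x^3 + 2x + 2 over F_17, base point of order 19.
P, A, Q = 17, 2, 19
G = (5, 1)


def ec_add(p1, p2):
    """Add two points on the toy curve; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def ec_mul(k, pt):
    """Double-and-add scalar multiplication; k is reduced modulo the group order."""
    k %= Q
    result = None
    while k:
        if k & 1:
            result = ec_add(result, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return result


def keygen(rng):
    ks = rng.randrange(1, Q - 1)          # keeps 1 + ks invertible modulo Q
    return ks, ec_mul(ks, G)


def sign(m, ks, rng):
    d = rng.randrange(1, Q)
    x1, _ = ec_mul(d, G)
    r = (m + x1) % Q
    s = pow(1 + ks, -1, Q) * (d - r * ks) % Q
    return r, s


def verify(m, sig, kp):
    r, s = sig
    t = (r + s) % Q
    pt = ec_add(ec_mul(s, G), ec_mul(t, kp))   # equals d*g for an honest signature
    return pt is not None and (pt[0] + m) % Q == r


if __name__ == "__main__":
    rng = random.Random(1)
    ks, kp = keygen(rng)
    sig = sign(7, ks, rng)
    print(verify(7, sig, kp), verify(8, sig, kp))
```

Verification works because $s(1 + k_s) = d - r k_s$ implies $s + (r + s) k_s = d$, so $s \cdot g + t \cdot k_p = d \cdot g$. A group of order 19 offers no security and is used here only so that the arithmetic can be followed by hand.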

Dynamic Hash Table
Our scheme uses the dynamic hash table data structure proposed in Reference [23] to achieve a dynamic update of the data in the cloud. The dynamic hash table is a two-dimensional data structure, as shown in Figure 2. The table includes both file and data block elements. In the file element, NO. indicates the index value of the corresponding file, and ID indicates the identification of the corresponding file; the file element also holds a pointer to the first data block of this file. In the data block element, $t_i$ indicates the timestamp of the data block, and $v_i$ indicates the version number of the data block. The version number is initially set to 1 and is incremented by 1 for each change of the data block. The data block elements in the dynamic hash table are connected by a linked list, and each data block element is a node in the list; each node includes the version number of the data block, the timestamp, and a pointer to the next node. Once the dynamic hash table is established, operations such as search, insertion, deletion, and modification can be performed at either the file level or the data block level.
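The structure described above can be sketched as a dictionary of file elements, each pointing to a singly linked list of block nodes. This is an illustrative sketch under our own naming (`DynamicHashTable`, `BlockNode`, `FileElement`, and the method names are hypothetical); Reference [23] may organize the table differently, and timestamps are passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class BlockNode:
    version: int                          # v_i, starts at 1
    timestamp: float                      # t_i
    next: Optional["BlockNode"] = None    # pointer to the next block node


@dataclass
class FileElement:
    no: int                               # index value of the file
    file_id: str                          # ID of the file
    head: Optional[BlockNode] = None      # pointer to the first block node


class DynamicHashTable:
    def __init__(self) -> None:
        self.files: Dict[str, FileElement] = {}

    def add_file(self, file_id: str, n_blocks: int, now: float) -> None:
        head: Optional[BlockNode] = None
        for _ in range(n_blocks):
            head = BlockNode(1, now, head)
        self.files[file_id] = FileElement(len(self.files) + 1, file_id, head)

    def _node(self, file_id: str, i: int) -> BlockNode:
        """Return the i-th node (1-based) of the file's linked list."""
        if i < 1:
            raise IndexError(i)
        node = self.files[file_id].head
        for _ in range(i - 1):
            if node is None:
                break
            node = node.next
        if node is None:
            raise IndexError(i)
        return node

    def modify(self, file_id: str, i: int, now: float) -> None:
        node = self._node(file_id, i)
        node.version += 1                 # each change increments the version
        node.timestamp = now

    def insert_before(self, file_id: str, i: int, now: float) -> None:
        f = self.files[file_id]
        if i == 1:
            f.head = BlockNode(1, now, f.head)
        else:
            prev = self._node(file_id, i - 1)
            prev.next = BlockNode(1, now, prev.next)

    def delete(self, file_id: str, i: int) -> None:
        f = self.files[file_id]
        if i == 1:
            f.head = self._node(file_id, 1).next
        else:
            prev = self._node(file_id, i - 1)
            if prev.next is None:
                raise IndexError(i)
            prev.next = prev.next.next

    def snapshot(self, file_id: str) -> List[Tuple[int, float]]:
        out, node = [], self.files[file_id].head
        while node is not None:
            out.append((node.version, node.timestamp))
            node = node.next
        return out
```

Because only the version number and timestamp are stored per block, an insertion or deletion touches a single pointer and does not force the tags of the other blocks to be recomputed.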

Elliptic Curve Discrete Logarithm Problem
The elliptic curve discrete logarithm problem (ECDLP): Let $G$ be an additive cyclic group of elliptic curve points whose order is the large prime $q$, and let $g \in G$ be a generator. The ECDLP asks an attacker $A$, given $g, a \cdot g \in G$, to calculate $a \in Z_q^*$. The probability that the attacker $A$ can solve the ECDLP in polynomial time is negligible:
$$\Pr[A(g, a \cdot g) = a] \le \varepsilon,$$
where $\varepsilon$ represents the negligible probability; that is, it is computationally infeasible to solve the ECDLP.

SM2-Based Offline/Online Efficient Data Integrity Verification Scheme
In this section, we give a detailed description of the proposed scheme.
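(1) Setup($\lambda$) → $(E, G, q, g)$: the CSP inputs the security parameter $\lambda$ and generates the public parameters $\{E, G, q, g\}$. $E: y^2 = x^3 + ax + b \bmod p$ is the elliptic curve, $p$ and $q$ are large prime numbers, $G$ is an additive cyclic group of order $q$ defined on $E$, and $g$ is the generator of the group $G$.
(2) KeyGen → $(k_s, k_p)$: the DO randomly selects $k_s \in Z_q^*$ as the private key and calculates $k_p = k_s \cdot g \in G$ as the public key.
(3) OffTagGen($k_s, d_i, l$) → $\{r'_i, s'_i\}$: the DO randomly selects $d_i, l \in Z_q^*$ $(1 \le i \le n)$, calculates $D_i = d_i \cdot g \in G$, and sets the coordinates of $D_i$ to $\{x_i, y_i\}$. For $i \in [1, n]$, the DO calculates
$$r'_i = x_i + l, \qquad s'_i = (1 + k_s)^{-1}(d_i - r'_i k_s),$$
and obtains the offline tags $\{r'_i, s'_i\}_{1 \le i \le n}$.
(4) OnTagGen($r'_i, s'_i, m_i$) → $\{r_i, s_i\}$: the DO uses the SM4 block cipher algorithm to encrypt the data file $M$ with the identity $ID$, and then divides $M$ into $n$ blocks $m_i \in Z_q^*$ $(1 \le i \le n)$. For each data block $m_i$, the DO generates the corresponding timestamp $t_i$ and version number $v_i$, and calculates
$$r_i = m_i + r'_i, \qquad s_i = s'_i - k_s (1 + k_s)^{-1} m_i.$$
The DO obtains the online tags $\{r_i, s_i\}_{1 \le i \le n}$, then sends $\{ID, i, m_i, r_i, s_i, t_i, v_i\}_{1 \le i \le n}$ to the CSP, sends $\{ID, i, t_i, v_i, D_i, l\}_{1 \le i \le n}$ to the TPA, and finally deletes the local data.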
(5) ChalGen($\pi$) → $\{i_j\}$: the TPA selects the random number $\pi \in Z_q^*$ and sends it to the cloud server. Both parties take $\pi$ as input, run the same pseudo-random function, per, and obtain $c$ random numbers $i_j$ $(1 \le j \le c)$ in $[1, n]$ as the indexes of the challenged data blocks.
(6) ProofGen($\{m_{i_j}, r_{i_j}, s_{i_j}, i_j\}_{1 \le j \le c}$) → proof: after the CSP receives the audit request and generates the indexes of the challenged data blocks, it calculates $\rho = \sum_{j=1}^{c} m_{i_j}$, $s = \sum_{j=1}^{c} s_{i_j}$, and $r = \sum_{j=1}^{c} r_{i_j}$, and sends $\{\rho, s, r\}$ to the TPA as the proof of data possession.
(7) VerifyProof($\rho, s, r, k_p, D_{i_j}, x_{i_j}$) → true/false: the TPA receives the proof $\{\rho, s, r\}$, calculates $t = r + s$, $D = \sum_{j=1}^{c} D_{i_j}$, and $x = \sum_{j=1}^{c} x_{i_j}$, and verifies whether the following equations hold:
$$s \cdot g + t \cdot k_p = D \quad (6)$$
$$x + \rho + cl = r \quad (7)$$
If Equations (6) and (7) hold, the DO is informed that the data integrity is not compromised. The correctness of them is derived as follows:
$$s \cdot g + t \cdot k_p = \sum_{j=1}^{c} s_{i_j} g + \sum_{j=1}^{c} (r_{i_j} + s_{i_j}) k_s g = \sum_{j=1}^{c} (1 + k_s) s_{i_j} g + \sum_{j=1}^{c} r_{i_j} k_s g = \sum_{j=1}^{c} (d_{i_j} g - r_{i_j} k_s g + r_{i_j} k_s g) = \sum_{j=1}^{c} D_{i_j} = D$$
$$x + \rho + cl = \sum_{j=1}^{c} (x_{i_j} + m_{i_j} + l) = \sum_{j=1}^{c} r_{i_j} = r$$
(8) DynamicUpdate: our scheme enables dynamic update operations on the cloud data, including insertion, deletion, and modification. Since the number of data blocks involved in a dynamic update is small, offline tags are not required in the dynamic update process. When a data block $m_i$ needs to be modified to $m_j$, the DO selects a random number $d_j$ and calculates $D_j = d_j \cdot g \in G$, where the coordinates of $D_j$ are set to $\{x_j, y_j\}$. Then, $v_j$ and $t_j$ are generated for the data block $m_j$, and the tags $r_j = m_j + x_j + l$ and $s_j = (1 + k_s)^{-1} \cdot (d_j - r_j \cdot k_s)$ are calculated. Finally, $\{ID, i, m_j, r_j, s_j\}$ and $\{ID, i, D_j, t_j, v_j\}$ are sent to the CSP and TPA, respectively. After receiving $\{ID, i, D_j, t_j, v_j\}$, the TPA finds the $i$-th node of the linked list corresponding to the file $M$ in the dynamic hash table and replaces $v_i$ and $t_i$ with $v_j$ and $t_j$. After receiving $\{ID, i, m_j, r_j, s_j\}$, the CSP finds the location of $m_i$ and replaces $m_i, r_i, s_i$ with $m_j, r_j, s_j$.
When the DO needs to insert the data block $m_j$ in front of the data block $m_i$, they first select a random number $d_j$, calculate $D_j = d_j \cdot g$, and set the coordinates of $D_j$ to $(x_j, y_j)$. Then, they generate $v_j$ and $t_j$ for the data block $m_j$ and calculate the tags $r_j = m_j + x_j + l$ and $s_j = (1 + k_s)^{-1} \cdot (d_j - r_j \cdot k_s)$. Finally, the DO sends $\{ID, i, m_j, r_j, s_j\}$ and $\{ID, i, D_j, t_j, v_j\}$ to the CSP and TPA, respectively. After receiving $\{ID, i, D_j, t_j, v_j\}$, the TPA finds the $i$-th node of the linked list corresponding to the file $M$ in the dynamic hash table and inserts a new node with the content $v_j, t_j$ in front of the $i$-th node, so that the order of the nodes matches the order of the blocks held by the CSP. After receiving $\{ID, i, m_j, r_j, s_j\}$, the CSP finds the location of $m_i$, $r_i$, and $s_i$ according to $i$ and $ID$, and inserts $m_j, r_j, s_j$ in front of them.
When the data block $m_i$ needs to be deleted, $\{ID, i\}$ is sent to the CSP and TPA. After receiving $\{ID, i\}$, the TPA deletes the $i$-th node of the linked list corresponding to the file $M$ in the dynamic hash table. After receiving $\{ID, i\}$, the CSP deletes $m_i$, $r_i$, and $s_i$ according to $i$.
(9) BatchAudit: the scheme can implement a batch audit for multi-user cloud data. Each DO $u_w$ $(1 \le w \le x)$ randomly selects the private key $k_{ws} \in Z_q^*$ and calculates the public key $k_{wp} = k_{ws} \cdot g \in G$. Each DO $u_w$ randomly selects $d_{wi}, l_w \in Z_q^*$ $(1 \le i \le n)$, calculates $D_{wi} = d_{wi} \cdot g \in G$, sets the coordinates of $D_{wi}$ to $\{x_{wi}, y_{wi}\}$, and, for $i \in [1, n]$, calculates
$$r'_{wi} = x_{wi} + l_w, \qquad s'_{wi} = (1 + k_{ws})^{-1}(d_{wi} - r'_{wi} k_{ws}),$$
to obtain the offline tags $\{r'_{wi}, s'_{wi}\}_{1 \le i \le n}$. The DO $u_w$ uses the SM4 block cipher algorithm to encrypt the data file $M_w$ with the identity $ID_w$, and then divides $M_w$ into $n$ blocks $m_{wi} \in Z_q^*$ $(1 \le i \le n)$. For each data block $m_{wi}$, the DO $u_w$ generates the corresponding timestamp $t_{wi}$ and version number $v_{wi}$, and calculates
$$r_{wi} = m_{wi} + r'_{wi}, \qquad s_{wi} = s'_{wi} - k_{ws}(1 + k_{ws})^{-1} m_{wi},$$
as the online tags $\{r_{wi}, s_{wi}\}_{1 \le i \le n}$. The DO then sends $\{ID_w, i_w, m_{wi}, r_{wi}, s_{wi}, v_{wi}, t_{wi}\}_{1 \le i \le n}$ to the CSP, sends $\{ID_w, i_w, t_{wi}, v_{wi}, D_{wi}, l_w\}_{1 \le i \le n}$ to the TPA, and finally deletes the local data. The TPA selects a random number $\pi$ as the parameter of per and sends it to the CSP. Both sides run the same pseudo-random function, per, and obtain the random numbers $i_{wj}$ $(1 \le j \le c)$ as the indexes of the challenged data blocks. After the CSP generates the indexes of the challenged data blocks, it calculates $\rho = \sum_{w=1}^{x} \sum_{j=1}^{c} m_{wi_j}$, $s = \sum_{w=1}^{x} \sum_{j=1}^{c} s_{wi_j}$, and $r = \sum_{w=1}^{x} \sum_{j=1}^{c} r_{wi_j}$, and sends $\{\rho, s, r\}$ to the TPA as the proof. The TPA receives the proof, computes $t = r + s$, $D = \sum_{w=1}^{x} \sum_{j=1}^{c} D_{wi_j}$, and $x = \sum_{w=1}^{x} \sum_{j=1}^{c} x_{wi_j}$, and verifies the following equations:
$$x + \rho + c \sum_{w=1}^{x} l_w = r \quad (11)$$
If Equations (10) and (11) hold, the TPA informs all $x$ DOs that data integrity has not been compromised.
The correctness of them is derived as follows:
$$\sum_{w=1}^{x} \Big( \sum_{j=1}^{c} \big( s_{wi_j} g + k_{ws} s_{wi_j} g \big) + \sum_{j=1}^{c} r_{wi_j} k_{ws} g \Big) = \sum_{w=1}^{x} \Big( \sum_{j=1}^{c} (1 + k_{ws}) s_{wi_j} g + \sum_{j=1}^{c} r_{wi_j} k_{ws} g \Big) = \sum_{w=1}^{x} \sum_{j=1}^{c} \big( d_{wi_j} g - r'_{wi_j} k_{ws} g - k_{ws} m_{wi_j} g + r_{wi_j} k_{ws} g \big) = \sum_{w=1}^{x} \sum_{j=1}^{c} d_{wi_j} g = D \quad (12)$$
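The single-user flow from OffTagGen to VerifyProof can be traced end to end with the following minimal sketch. It rests on several assumptions that are not part of the scheme: a toy curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ with a base point of order 19 stands in for the real parameters, SHA-256 stands in for the pseudo-random function per, indexes are 0-based, blocks are assumed to be already SM4-encrypted and mapped into $Z_q$, and all function names are illustrative.

```python
import hashlib
import random

# Toy parameters (assumption): y^2 = x^3 + 2x + 2 over F_17, base point of order 19.
P, A, Q = 17, 2, 19
G = (5, 1)


def ec_add(p1, p2):
    """Add two curve points; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def ec_mul(k, pt):
    """Double-and-add scalar multiplication, with k reduced modulo Q."""
    k %= Q
    result = None
    while k:
        if k & 1:
            result = ec_add(result, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return result


def key_gen(rng):
    ks = rng.randrange(1, Q - 1)              # keeps 1 + ks invertible modulo Q
    return ks, ec_mul(ks, G)


def off_tag_gen(ks, n, rng):
    """Offline phase, independent of the data: returns l and (D_i, r'_i, s'_i)."""
    l = rng.randrange(1, Q)
    inv = pow(1 + ks, -1, Q)
    tags = []
    for _ in range(n):
        d = rng.randrange(1, Q)
        D = ec_mul(d, G)
        r_off = (D[0] + l) % Q
        tags.append((D, r_off, inv * (d - r_off * ks) % Q))
    return l, tags


def on_tag_gen(ks, blocks, off_tags):
    """Online phase: one modular multiplication and two additions per block."""
    c = ks * pow(1 + ks, -1, Q) % Q           # constant, computed once
    return [((m + r_off) % Q, (s_off - c * m) % Q)
            for m, (_, r_off, s_off) in zip(blocks, off_tags)]


def chal_gen(pi, n, c):
    """Derive c block indexes from pi; both the TPA and the CSP run this."""
    return [int.from_bytes(hashlib.sha256(f"{pi}|{j}".encode()).digest(), "big") % n
            for j in range(c)]


def proof_gen(blocks, tags, idx):
    rho = sum(blocks[i] for i in idx) % Q
    s = sum(tags[i][1] for i in idx) % Q
    r = sum(tags[i][0] for i in idx) % Q
    return rho, s, r


def verify_proof(proof, kp, Ds, l, idx):
    rho, s, r = proof
    t = (r + s) % Q
    D = None
    for i in idx:
        D = ec_add(D, Ds[i])
    x = sum(Ds[i][0] for i in idx) % Q
    eq6 = ec_add(ec_mul(s, G), ec_mul(t, kp)) == D      # Equation (6)
    eq7 = (x + rho + len(idx) * l) % Q == r             # Equation (7)
    return eq6 and eq7
```

In this sketch the only elliptic curve operation on the user side sits in `off_tag_gen`, which can run before the data exist; `on_tag_gen` uses modular arithmetic only. A group of order 19 gives no security, so the example is suited to checking the algebra rather than to measuring cost or resistance to forgery.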

Performance Analysis
In this section, the computational overhead of the scheme and the advantage of the offline/online tags are first analyzed, then we compare the functions of our scheme with existing schemes [10][11][12][13][14], which proves that our scheme is more suitable for the IoT data storage environment and medical data storage environment. The schemes in Refs. [10][11][12][13][14] are novel cloud data audit schemes proposed in recent years. They are not out of date and, at the same time, they have been tested by scholars in the past two years. Then, we compare the computational overhead of our scheme with the schemes in Refs. [10][11][12][13][14] numerically. Finally, we experimentally verify the results of the numerical analysis of computational overhead to visualize the performance of our scheme.
We set $G_1$ and $G_2$ to be the additive cyclic group of $E: y^2 = x^3 + ax + b \bmod p$ and the multiplicative cyclic group, respectively; $p$ is a 512-bit prime number and $q$ is a 160-bit prime number. The experiment was run on a 64-bit Windows 10 operating system with an i5 CPU at 2.5 GHz and 4 GB of memory, using the JPBC library. After selecting a Type A elliptic curve and defining each operation, we ran each operation 10,000 times to obtain the average time overhead. The meaning of each operation and the corresponding time cost are shown in Table 2. To simplify the description, $n$ denotes the total number of data blocks, and $c$ denotes the number of challenged data blocks. Because the values of $n$ and $c$ are large, we omit operations that occur only once from our analysis of the computational overhead.
In the OffTagGen phase, the user needs to compute $D_i = d_i \cdot g$ and $r'_i = x_i + l$, so the computational overhead is about $n|M_{G1}| + n|A_Z|$. In the OnTagGen phase, the user needs to compute $r_i = m_i + r'_i$ and $s_i = s'_i - k_s(1 + k_s)^{-1} m_i$, so the computational overhead is about $n|M_Z| + 2n|A_Z|$. In the ProofGen phase, the CSP computes $\rho = \sum_{j=1}^{c} m_{i_j}$, $s = \sum_{j=1}^{c} s_{i_j}$, and $r = \sum_{j=1}^{c} r_{i_j}$, and the computational overhead is about $3c|A_Z|$. In the VerifyProof phase, after computing $t = r + s$, $D = \sum_{j=1}^{c} D_{i_j}$, and $x = \sum_{j=1}^{c} x_{i_j}$, the auditor also verifies the equations $s \cdot g + t \cdot k_p = D$ and $x + \rho + cl = r$, and the computational overhead is about $c|A_Z| + c|A_{G1}|$. With the offline/online tags, the computational overhead of the user in the scheme is about $n|M_{G1}| + 3n|A_Z| + n|M_Z|$. If offline/online tags are not used, the user needs to calculate $D_i = d_i \cdot g$, $r_i = m_i + x_i + l$, and $s_i = (1 + k_s)^{-1} \cdot (d_i - r_i \cdot k_s)$; the computational overhead of the user is then about $n|M_{G1}| + 3n|A_Z| + 2n|M_Z|$.
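The user-side formulas above can be evaluated with a small cost model. The sketch below only restates those formulas; the unit costs passed in the usage example are arbitrary placeholders chosen for illustration, not the measured values of Table 2, and the function name is hypothetical.

```python
def user_tag_cost(n: int, m_g1: float, a_z: float, m_z: float) -> dict:
    """User-side tag generation cost for n blocks.

    m_g1: cost of one scalar multiplication in G1
    a_z:  cost of one addition in Z_q
    m_z:  cost of one multiplication in Z_q
    """
    offline = n * m_g1 + n * a_z                      # OffTagGen
    online = n * m_z + 2 * n * a_z                    # OnTagGen
    unsplit = n * m_g1 + 3 * n * a_z + 2 * n * m_z    # tags computed in one step
    return {
        "offline": offline,
        "online": online,
        "split_total": offline + online,
        "unsplit_total": unsplit,
    }


if __name__ == "__main__":
    # Placeholder unit costs in arbitrary units (assumption, not measured data).
    print(user_tag_cost(1000, m_g1=1000, a_z=1, m_z=2))
```

The point the model makes visible is that the scalar multiplication term appears only in the offline part, so the work left for the moment the data arrive is limited to additions and multiplications in $Z_q$.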
We compared our scheme with the existing certificateless schemes; the function comparison is shown in Table 3. As can be seen from Table 3, although other schemes are novel, their functions are not comprehensive. Our proposed scheme is the most comprehensive and the most suitable for the cloud storage environment of IoT data and medical data. The numerical computational overhead comparison of our scheme and other existing schemes is shown in Table 4. In the current cloud data audit schemes, the calculation overhead of the ProofGen and VerifyProof stages is borne by the CSP and TPA, respectively, while the calculation overhead of the TagGen stage is borne by the users themselves; the users only need to bear the calculation overhead in the TagGen stage. Because of the strong computing capability of the CSP and TPA, in the design of cloud data audit schemes, more emphasis should be placed on reducing the computing cost of the user side, that is, reducing the computing cost of the audit scheme in the TagGen stage. It can be seen from Table 4 that in the TagGen stage, the computational overhead of this scheme and the scheme in [14] is the smallest and is significantly smaller than other schemes. Therefore, this scheme and the scheme in [14] are more user-friendly and can be applied to equipment with lower computational power, which is more reasonable and efficient in its design. At the ProofGen stage, the computational overhead of our scheme is also significantly lower than that of other schemes. In the case where the number of challenged data blocks, c, increases gradually, the computational overhead of the other schemes increases at a faster and more dramatic rate than that of this scheme, and the advantages of our scheme are more significant. Table 4. Comparison of the computational overhead.

Columns: Scheme, TagGen, GenProof, VerifyProof.
Scheme [12]: $n(|H_{G_2}| + |E_{G_2}| + |E_Z|) \approx 1.9601n$ (TagGen)
Scheme [13]: $n(s + 1)(|E_{G_2}| + |M_{G_2}|) + n|H_{G_2}| \approx 10.6066n$ (TagGen); $(c + s)(|M_{G_2}| + |E_{G_2}|) + 2|P| + c|H_{G_2}| \approx 1.9886c + 2|P|$
Scheme [14]: $n(2|A_Z| + |M_Z|) \approx 0.0012n$ (TagGen)

To test the performance of the schemes in practical application and to compare their computational costs more intuitively, each scheme was run in the experimental environment, and the time costs of the TagGen, ProofGen, and VerifyProof stages were recorded, as shown in Figures 3-5. The number of sectors s is set at 10 [23].

Figure 5. The time cost of the VerifyProof phase (Schemes 1-5 correspond to references [10–14], respectively).

Figure 3 shows the time cost of each scheme in the TagGen phase when the total number of data blocks is set to 2000, 4000, 6000, 8000, and 10,000, respectively. The time cost of every scheme increases with the number of data blocks, but the time costs of the scheme in [14] and of our scheme do not increase significantly. This is because the schemes in Refs. [10–13] use exponentiation operations, which consume a significant amount of computational capacity. In our proposed scheme, by contrast, the computation of tags is divided into two stages: OffTagGen and OnTagGen.
For users, the computational burden that matters most is the online tag computation. In our scheme, the online tag computation requires only simple addition and multiplication operations, resulting in a small computational overhead. Even with a large amount of data, it does not impose a significant computational burden on users. For the same number of data blocks, the time cost of the schemes in Refs. [10–13] is significantly higher than that of the scheme in Ref. [14] and of our scheme.
The time costs of the GenProof and VerifyProof phases are shown in Figures 4 and 5, where the number of challenged blocks is set to 200, 400, 600, 800, and 1000, respectively. In the GenProof stage, the time costs of the schemes in Refs. [10,14] and of our scheme are relatively low, and ours is the lowest. The scheme in Ref. [12] has the highest time cost. In the VerifyProof stage, the time costs of our scheme and the schemes in Refs. [10,12,14] are significantly lower than those of the schemes in Refs. [11,13]. As the number of challenged blocks increases, the audit efficiency advantage of our scheme becomes more prominent.
According to the above performance analysis, our scheme has more comprehensive functions and a lower time cost at each stage, especially in the TagGen stage, so it is more compatible with lightweight devices. Therefore, our scheme is more suitable for the IoT storage environment and the medical data storage environment.

Conclusions
In this paper, we constructed an efficient SM2-based offline/online data integrity verification scheme for IoT and medical data. In the data preprocessing stage of the scheme, users encrypt data with the SM4 symmetric encryption algorithm. The encrypted data are used to generate tags and are then uploaded to the cloud, thus protecting the privacy of the data content. In the data uploading stage, users employ the SM2 signature algorithm to construct data tags. The scheme divides tags into offline parts and online parts, and users can calculate the offline tags in advance to reduce computing costs. The scheme uses a dynamic hash table to support the dynamic update of cloud data and realizes batch audits of multi-user data, so it can adapt to the IoT and medical data storage environments. The theoretical security analysis proves the security of the scheme, and a comparison with five existing schemes demonstrates its efficiency. In future work, we will focus on adding more functions to the existing audit schemes to meet the increasing needs of users in the cloud storage environment.
Author Contributions: X.L. and Z.Y. contributed equally to this work; X.L. was responsible for the writing of the article and the construction of the scheme. Z.Y. was responsible for the derivation of the formulas in the article and contributed some significant ideas. R.L. was responsible for the validation and formal analysis. X.-A.W. was responsible for collecting the resources related to this article. H.L. was responsible for the verification of the security of the scheme. X.Y. revised the finished manuscript. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement: All relevant data have been provided in the article. Readers with any further needs can contact the authors by email.

Conflicts of Interest: The authors declare that they have no conflict of interest to report regarding the present study.

and solve the DL problem.
We define the events as follows. Event $E_1$ indicates that there is no interruption in the public key query. Event $E_2$ indicates that the forged aggregation tags $\{\rho_w^*, s_w^*, r_w^*\}$ are valid. Event $E_3$ indicates that the tags of all the blocks $m_{wi}^*$ of $ID_w^*$ have been queried. Therefore:

$$Adv_{C}^{DL} = \Pr[E_1 \wedge E_2 \wedge E_3] \geq \varepsilon \left(1 - \frac{1}{q_{pk}}\right)^{q_{pk}} \frac{n(n-1)\cdots(n-c+1)}{q_t(q_t-1)\cdots(q_t-c+1)} \tag{A5}$$

and $C$ uses the time $t'$:

$$t' < t + t_{inv} + (3q_t + 1)t_a + 2q_t t_m + q_t t_M \tag{A6}$$

We can reach the following conclusion: under the random oracle model, if $A$ can break our scheme with a non-negligible advantage $\varepsilon$ within time $t$, then there is an algorithm $C$ that can solve the DL problem with advantage $\varepsilon' \geq \varepsilon \left(1 - \frac{1}{q_{pk}}\right)^{q_{pk}} \frac{n(n-1)\cdots(n-c+1)}{q_t(q_t-1)\cdots(q_t-c+1)}$ in time $t' < t + t_{inv} + (3q_t + 1)t_a + 2q_t t_m + q_t t_M$.

Theorem A2 (Privacy protection). The scheme protects the privacy of the user's data and private key against both the CSP and the TPA.
Proof. In the OnTagGen stage of the scheme, the user first employs the SM4 block encryption algorithm to encrypt the original data file and obtains the encrypted data blocks, $m_i$. The online tags are calculated from the encrypted data blocks, $m_i$, and the uploaded data are also the encrypted data. Therefore, even though the cloud stores a large quantity of data and tags, the CSP cannot learn the original data content. In the VerifyProof stage, the TPA is unable to calculate the original data values from the aggregate data and the aggregate tag that it obtains. As a result, no entity other than the user can learn the contents of the user's data.
The user's private key, $k_s$, is only related to $\{s_i\}_{1 \leq i \leq n}$ in $\{ID, i, t_i, v_i, m_i, r_i, s_i\}_{1 \leq i \leq n}$, stored at the cloud server. Therefore, the cloud server obtains the following system of equations when it tries to recover the private key:

$$\begin{cases} s_1 = (1 + k_s)^{-1} \cdot (d_1 - r_1 \cdot k_s) \\ s_2 = (1 + k_s)^{-1} \cdot (d_2 - r_2 \cdot k_s) \\ \quad \vdots \\ s_n = (1 + k_s)^{-1} \cdot (d_n - r_n \cdot k_s) \end{cases} \tag{A7}$$

$k_s$ and $d_i$ are unknown to the CSP. Since there are n + 2 unknowns in n equations, the number of unknowns is always greater than the number of equations, so the private key $k_s$ cannot be calculated.
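The counting argument above can be illustrated numerically. The following is a minimal sketch under stated assumptions: scalars live in the integers modulo a known prime `Q` that merely stands in for the real group order, the function names are hypothetical, and only the equations in (A7) are modeled (any other information available to the server is ignored). It shows that for an arbitrary guess of the private key, there is a set of values $d_i$ that reproduces exactly the stored $s_i$, so the equations alone do not single out $k_s$.

```python
import secrets

Q = 2**127 - 1   # known prime; stands in for the group order (assumption)


def inv(a):
    """Modular inverse in Z_Q (Python 3.8+)."""
    return pow(a, -1, Q)


def rand_scalar():
    """Random scalar in 1..Q-2, so that 1 + k is invertible mod Q."""
    return secrets.randbelow(Q - 2) + 1


def tag_s(k, d_i, r_i):
    """s_i = (1 + k)^(-1) * (d_i - r_i * k), as in the system (A7)."""
    return inv(1 + k) * (d_i - r_i * k) % Q


def consistent_d(k_guess, r_i, s_i):
    """Solve one equation of (A7) for d_i, given a guessed key."""
    return (s_i * (1 + k_guess) + r_i * k_guess) % Q


n = 4
k_s = rand_scalar()                         # real private key
d = [rand_scalar() for _ in range(n)]       # real per-block secrets
r = [rand_scalar() for _ in range(n)]       # stored r_i (known to the server)
s = [tag_s(k_s, d[i], r[i]) for i in range(n)]   # stored s_i

# Any guessed key can be "explained" by a matching choice of the d_i.
k_guess = rand_scalar()
d_guess = [consistent_d(k_guess, r[i], s[i]) for i in range(n)]
explains = all(tag_s(k_guess, d_guess[i], r[i]) == s[i] for i in range(n))
```

Here `explains` is true for every admissible `k_guess`, which is the underdetermination that the proof relies on.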