Verification of data integrity and co-operative loss recovery for secure data storage in cloud computing

Abstract
In cloud computing, data stored on external servers may be tampered with or deleted by unauthorized persons or selfish Cloud Service Providers (CSPs). Hence, the Cloud Data Owners (CDOs) must be assured of the integrity and correctness of the data stored on the server. In this paper, a Verification of Data Integrity and Co-operative Loss Recovery technique for secure data storage in cloud computing is proposed. In this technique, a ciphertext file is split into several cipher blocks, which the CDO distributes to randomly selected CSPs. If a Cloud Data User (CDU) wants to access a file, the corresponding ciphertext file is reconstructed from the blocks and downloaded by the user. The file can be decrypted if the attribute set of the user matches the access policy of the application. Simulation results show that the proposed technique enhances data integrity and confidentiality.

PUBLIC INTEREST STATEMENT
This research investigates the security aspects of cloud data storage. Cloud computing offers customers a more flexible way to obtain computation and storage resources on demand. In this environment, customers can rent the necessary resources as soon as, and for as long as, they need them. However, the data stored on external servers may be tampered with or deleted by unauthorized persons or by the owner of the server. This research work designs a technique to protect the stored data from loss and unauthorized access. In this technique, the data owner encrypts a plaintext file and splits it into many blocks. Each block is then distributed to a randomly selected server. If any part of the file is tampered with, the tampering can be detected by verifying the integrity of the blocks. Experiments show that the time involved in distributing and reconstructing the blocks is significantly lower than in the compared schemes.

Introduction
The recent development of cloud computing has shown its potential to reshape the way IT hardware is designed and purchased. Cloud computing offers customers a more flexible way to obtain computation and storage resources on demand. Rather than owning (and maintaining) a large and expensive IT infrastructure, customers can now rent the necessary resources as soon as, and for as long as, they need them. It provides rich benefits to cloud clients, such as low-cost services, elasticity of resources and easy access through the Internet (Vurukonda & Thirumala Rao, 2016).
Cloud computing requires data security mechanisms that can prevent leakage and loss of user data. Unlike the traditional computing model, in which users have full control of data storage and computation, cloud computing entails that the management of physical data and machines is delegated to the CSPs (Wei et al., 2013). However, the CSPs are usually not trustworthy. They may conceal data loss or errors from the users for their own benefit. They might even delete rarely accessed user data to conserve storage space (Cao, He, Guo, & Feng, 2016).
Traditional cryptographic technology cannot be applied directly to cloud data security, as the users lose control over data storage. Hence, the correctness of data stored in external storage must be verified. More advanced technology is needed to avoid data loss from cloud storage (El Mrabti, Ammari, & De Montfort, 2016; Chatterjee, Sarkar, & Dhaka, 2015).

Problem identification and objectives of the work
The main threats to data storage security are (i) modification of the stored data, which compromises data integrity; (ii) unauthorized access to the stored data; and (iii) loss of part or all of the data.
Hence, existing solutions should address these threats for efficient storage security. A detailed discussion of existing solutions is presented in the next section. Based on the problems identified in those works, an efficient solution is required with the following objectives:
• The stored data should be protected from data loss and unauthorized access.
• The integrity of data blocks should be ensured.
• The confidentiality of data should be ensured.
• The storage cost should be reduced.
In order to meet these objectives, a Verification of Data Integrity and Co-operative Loss Recovery (VDI-CLR) technique for secure data storage in cloud computing is proposed.

Related works
While the works (Cao et al., 2016) and (Zhu, Hu, Ahn, & Yu, 2012) deal with data integrity verification, they did not ensure recovery of lost blocks and confidentiality.
To ensure the correctness of data and prevent data loss, the works (El Mrabti et al., 2016) and (Al-Anzi, Salman, & Jacob, 2014) split the encrypted data into several cipher blocks and distribute them among different service providers. However, the data blocks are distributed equally to all CSPs, which leaves a chance that an external adversary or a malicious CSP could later fetch the blocks. Moreover, the integrity of each block is not checked, so corrupted blocks may go undetected.
A Steganographic Approach using Huffman Coding (SAHC) (Chatterjee et al., 2015) was applied to prevent unauthorized users from accessing data in cloud storage. In (Nikam & Potey, 2016), authorization is achieved at several levels: a static username-password pair provides entry-level authentication, followed by an OTP based on a token generator technique. Both works (Chatterjee et al., 2015) and (Nikam & Potey, 2016) protect against unauthorized access and ensure confidentiality, but result in huge storage overhead.
The privacy-preserving auditing protocol (Yang & Jia, 2012) handles frequent data updates and maintains the consistency of data, but does not provide data confidentiality. In (Yin, Qin, Zhang, Ou, & Keqin, 2017), a secure, easily integrated and fine-grained query results verification scheme for secure search over encrypted cloud data is explained. This scheme can verify the correctness of each encrypted query result, or further determine how many or which qualified data files are returned by a dishonest cloud server. A short signature technique is designed to guarantee the authenticity of the verification object itself.
In , an iterative proximal algorithm (IPA) to compute a Nash equilibrium solution is proposed. The convergence of the IPA is also analyzed and it is shown to converge to a Nash equilibrium. However, the communication cost is not clearly analyzed. In (Liu, Li, Tang, & Li, 2018), a CA that can find the NBS very efficiently is proposed. For the general case, the authors propose an IA based on duality theory, and its convergence is also analyzed. However, the dynamic configuration of multiple servers in the cloud is not explained.

Overview
In this paper, a Verification of Data Integrity and Co-operative Loss Recovery (VDI-CLR) technique for secure data storage in cloud computing is proposed. In this technique, the CDO encrypts the original file using the ciphertext-policy attribute-based encryption (CP-ABE) scheme and splits it into n/2 cipher blocks, where n denotes the number of CSPs. The cipher blocks are then distributed to n/2 randomly selected CSPs. So that the data blocks can be reconstructed, the parity information P corresponding to these data blocks is stored in k CSPs, where k = n - n/2. Each block is represented by a block-tag pair (m_i, β_i), where β_i is a short signature tag of block m_i generated by a set of secrets. A probabilistic verification scheme is applied to prove the integrity of each block. If a CDU wants to access the file, the cipher file F' is reconstructed from the respective CSPs after the integrity of each block has been verified. The CDU can then decrypt the cipher file if and only if its attribute set satisfies the access policy.

System model
The scheme consists of multiple Cloud Service Providers (CSP_1, CSP_2, ..., CSP_n), a Cloud Data Owner (CDO) and a Cloud Data User (CDU). A Trusted Server (TS) is responsible for the system initialization and the generation of the master secret key. The Verification Server (VS) is used to verify the integrity of user data. Figure 1 shows the proposed system model and its basic operations. Let F be the original data that the client wants to store in the cloud storage. The original data F is encrypted to form F'. Then, F' is split into cipher blocks, which are stored in n/2 randomly chosen CSPs.

File fragmentation and distribution by CDO
The CDO encrypts the file, fragments it and distributes the fragments to randomly selected CSPs. The steps involved in this process are presented in the following algorithm. For example, consider Figure 2. Suppose n = 4. Let the ciphertext file F_1' be split into two cipher blocks, B_11 and B_12. Suppose data block B_11 is stored in CSP_1 and data block B_12 is stored in CSP_3. The parity information A associated with B_11 is stored in CSP_2. Similarly, the parity information B associated with block B_12 is stored in CSP_4. If any CDU requests the file F_1, it can be reconstructed using the parity information A in CSP_2 and the parity information B in CSP_4. With this parity scheme, the data can be reconstructed effectively even if a double disk failure occurs.
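The fragmentation and distribution step can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the ciphertext is treated as an opaque byte string, a single XOR parity block is an assumed stand-in for the parity information (whose exact construction is not given in this text), and the function names `split_blocks`, `xor_parity`, `distribute` and `recover_block` are hypothetical.

```python
import random
from functools import reduce


def split_blocks(ciphertext: bytes, count: int) -> list[bytes]:
    """Split ciphertext into `count` equal-size blocks, zero-padding the tail."""
    size = -(-len(ciphertext) // count)  # ceiling division
    padded = ciphertext.ljust(size * count, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(count)]


def xor_parity(blocks: list[bytes]) -> bytes:
    """XOR all blocks together byte by byte (assumed stand-in parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def distribute(ciphertext: bytes, n: int, rng: random.Random) -> dict:
    """Place n // 2 data blocks on random CSPs and parity on the other k CSPs."""
    data_count = n // 2
    blocks = split_blocks(ciphertext, data_count)
    csp_ids = list(range(1, n + 1))
    data_csps = rng.sample(csp_ids, data_count)
    parity_csps = [c for c in csp_ids if c not in data_csps]  # k = n - n // 2
    parity = xor_parity(blocks)
    return {
        "data": dict(zip(data_csps, blocks)),
        "parity": {c: parity for c in parity_csps},
    }


def recover_block(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_parity(surviving + [parity])
```

With n = 4, `distribute` yields two data blocks and two parity copies, matching the block counts of the Figure 2 example. Under this simplified XOR assumption only one missing data block can be rebuilt at a time; a scheme that tolerates more failures would need a stronger code.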

Integrity verification by VA
The VA applies a probabilistic verification method to prove the integrity of the blocks stored in various CSPs. The steps involved in this process are shown in the following algorithm. Initially, based on an access policy (AP), the CDO encrypts a file F and creates a ciphertext file F'. If a CDU wants to access the file F, it sends a request to the VA. The VA reconstructs the cipher file F' after verifying the integrity of each block. The CDU downloads the cipher file F' and decrypts it if and only if its attribute set satisfies AP. If the CDO modifies the file F, it creates an updated cipher file F'' and repeats the above process.
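A probabilistic check of block-tag pairs can be sketched as follows. This is a simplified illustration under stated assumptions, not the verification protocol of the paper: HMAC-SHA256 stands in for the short signature tag, the verifier is assumed to hold the secret and to read the sampled blocks directly, and the names `make_tag`, `spot_check` and `detection_probability` are hypothetical.

```python
import hashlib
import hmac
import random
from math import comb


def make_tag(secret: bytes, index: int, block: bytes) -> bytes:
    """Tag for block `index`; HMAC-SHA256 is an assumed stand-in signature."""
    message = index.to_bytes(8, "big") + block
    return hmac.new(secret, message, hashlib.sha256).digest()


def spot_check(secret, stored, tags, sample_size, rng):
    """Check a random sample of stored blocks against their tags.

    `stored` and `tags` map block index to block bytes and tag bytes.
    Returns the indices of sampled blocks whose tags do not match.
    """
    indices = rng.sample(sorted(stored), min(sample_size, len(stored)))
    return [
        i for i in indices
        if not hmac.compare_digest(make_tag(secret, i, stored[i]), tags[i])
    ]


def detection_probability(total: int, corrupted: int, sample_size: int) -> float:
    """Chance that sampling without replacement hits at least one bad block."""
    return 1 - comb(total - corrupted, sample_size) / comb(total, sample_size)
```

Sampling trades certainty for cost: checking 10 of 100 blocks when exactly one is corrupted detects the corruption with probability 0.1, so the sample size has to be chosen according to how much undetected corruption is tolerable.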

Experimental results
To validate the VDI-CLR technique presented in the previous section, we have implemented it in Java based on the CP-ABE toolkit and the Java Pairing-Based Cryptography library (JPBC) (De Caro & Iovino, 2011). To compare experimental results of the proposed scheme, we also simulate the typical CSS-FRS (El Mrabti et al., 2016) and RSSNS (Al-Anzi et al., 2014) schemes.
The following experiments are conducted using Java on a system with an Intel Core processor at 3.00 GHz and 4 GB of RAM running Windows 7 Ultimate. All the results are averages of 10 trials. As illustrated in (Yin et al., 2017), the complexity of encryption and decryption in the CP-ABE scheme can be measured by two factors: (i) the time cost of encryption and decryption, and (ii) the storage cost of the ciphertext.
For example, assume that a patient shares three files, i.e., M = {m_1, m_2, m_3}, with three access levels. In the FH-CP-ABE scheme, the access policy is designed as {(a_1, a_2, …, a_i, i of i) AND a_{i+1} AND a_{i+2}}. Accordingly, the patient should construct three access policies for the CP-ABE scheme: {(a_1, a_2, …, a_i, i of i) AND a_{i+1} AND a_{i+2}}, {(a_1, a_2, …, a_i, i of i) AND a_{i+1}}, and {a_1, a_2, …, a_i, i of i}. The policies contain only AND gates to ensure that all the ciphertext components are computed in the decryption algorithm.
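The logic of these three AND-only policies can be illustrated with a short Python sketch. It models only the set logic of policy satisfaction, not the CP-ABE cryptography. The attribute names `a1`, `a2`, ..., the choice i = 2 in the usage lines, the assignment of the strictest policy to file m1, and the function names are all assumptions made for illustration.

```python
def and_policy(i: int, extra: int) -> frozenset[str]:
    """AND-only policy: attributes a1..ai (i of i) plus `extra` further ones."""
    return frozenset(f"a{j}" for j in range(1, i + extra + 1))


def satisfies(attributes: set[str], policy: frozenset[str]) -> bool:
    """An AND-only policy is satisfied when every required attribute is held."""
    return policy <= attributes


def accessible_files(attributes: set[str],
                     policies: dict[str, frozenset[str]]) -> list[str]:
    """Names of the files whose policy the attribute set satisfies."""
    return sorted(name for name, policy in policies.items()
                  if satisfies(attributes, policy))


# Three access levels with i = 2 (assumed mapping of files to policies).
policies = {
    "m1": and_policy(2, 2),  # (a1, a2, 2 of 2) AND a3 AND a4
    "m2": and_policy(2, 1),  # (a1, a2, 2 of 2) AND a3
    "m3": and_policy(2, 0),  # (a1, a2, 2 of 2)
}
```

Because every gate is an AND, a user who can open the most restricted file can open the other two as well, which is what makes the three levels hierarchical.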

Results for varying the number of files
The experimental results in this section are obtained by varying the number of files to be downloaded by the CDU. The number of attributes used in the weighted policy is fixed at 4. The average file size is fixed at around 5 MB. The number of CSPs maintained is 4.
The time cost is measured in terms of the data distribution and reconstruction times. Data distribution time comprises the time for encryption, for splitting and for distributing the blocks to the various CSPs. Similarly, data reconstruction time comprises the time for fetching the blocks from the various CSPs, for combining the blocks as per the parity information, and for decryption. Figures 3 and 4 show the data distribution and reconstruction times measured for 1 to 5 files.
As can be seen from Figure 3, the CSS-FRS scheme has the highest distribution time, which increases from 10.15 s to 15.25 s when the number of files is increased from 1 to 5. The distribution time of RSSNS falls in the range of 11.45 s to 13.5 s. Since in both CSS-FRS and RSSNS the data blocks are distributed equally to all CSPs, they need a higher distribution time. The proposed VDI-CLR scheme attains the lowest distribution time, in the range of 8.05 s to 10.2 s, since it replicates only in half of the available CSPs.
Hence, the percentage improvements of VDI-CLR over the CSS-FRS and RSSNS schemes are 31.1% and 26.9%, respectively.
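The text does not state how the percentage improvement is computed. A minimal sketch, assuming it is the relative reduction in time with respect to the baseline scheme, applied to the endpoint values quoted above, is given below. The function name `improvement` is hypothetical, and the pairing of each range's lower and upper bound with 1 and 5 files is an assumption.

```python
def improvement(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` versus `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Endpoint distribution times in seconds, as quoted in the text
# (assumed to correspond to 1 file and 5 files, respectively).
css_frs = (10.15, 15.25)
rssns = (11.45, 13.5)
vdi_clr = (8.05, 10.2)

for name, times in (("CSS-FRS", css_frs), ("RSSNS", rssns)):
    gains = [improvement(b, p) for b, p in zip(times, vdi_clr)]
    print(name, [round(g, 1) for g in gains])
```

Under this assumption, the endpoint improvements over CSS-FRS are roughly 20.7% at 1 file and 33.1% at 5 files. The figures reported in the text presumably aggregate all five data points of Figure 3, which are not listed here, so the endpoints alone are not expected to reproduce them exactly.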
As can be seen from Figure 4, the CSS-FRS scheme has the highest data reconstruction time, which increases from 12.15 s to 17.45 s when the number of files is increased. The reconstruction time of RSSNS falls in the range of 13.45 s to 15.57 s. Since in both CSS-FRS and RSSNS the data blocks are distributed equally to all CSPs, they need a higher reconstruction time. The proposed VDI-CLR scheme attains the lowest reconstruction time, in the range of 8.25 s to 11.21 s, since it fetches the parity information from a limited number of CSPs and needs to obtain the blocks from only half of the CSPs. Hence, the percentage improvements of VDI-CLR over the CSS-FRS and RSSNS schemes are 36.2% and 32.1%, respectively.
The storage cost for the ciphertext is measured in terms of the size of the encrypted files. Figure 5 shows the storage cost measured for 1 to 5 files.
As can be seen from Figure 5, the CSS-FRS scheme has the highest storage cost, which increases from 2750 KB to 10480 KB when the number of files is increased. The storage cost of RSSNS falls in the range of 2175 KB to 9284 KB. However, the proposed VDI-CLR scheme attains the lowest storage cost, in the range of 1270 KB to 6870 KB, since it stores the replicated data in half of the total available CSPs. Hence, the percentage improvements of VDI-CLR over the CSS-FRS and RSSNS schemes are 45% and 37%, respectively. Figure 6 shows the correctness of downloaded data at the CDUs. As can be seen from the figure, the correctness of data degrades slightly for all the techniques when the number of files is increased. Owing to the integrity verification of each block, VDI-CLR has 10% and 7% higher correctness than the CSS-FRS and RSSNS techniques, respectively.

Results for varying file size
The experimental results in this section are obtained by varying the size of the files to be downloaded by the CDU. The number of attributes used in the weighted policy is fixed at 4. The number of files is fixed at two.
The time cost is measured in terms of the data distribution and reconstruction times. Figures 7 and 8 show the data distribution and reconstruction times measured for file sizes from 5 to 25 MB.
As can be seen from Figure 7, the RSSNS scheme has the highest distribution time, which increases from 11.45 s to 12.9 s when the file size is increased. The distribution time of CSS-FRS falls in the range of 10.15 s to 12.11 s only. However, the proposed VDI-CLR scheme attains the lowest distribution time, in the range of 8.05 s to 9.36 s, since it replicates only in half of the available CSPs. Hence, the percentage improvements of VDI-CLR over the CSS-FRS and RSSNS schemes are 20% and 27%, respectively.
As can be seen from Figure 8, the RSSNS scheme has the highest data reconstruction time, which increases from 13.45 s to 15.23 s when the file size is increased. The reconstruction time of CSS-FRS falls in the range of 12.15 s to 14.22 s only. However, the proposed VDI-CLR scheme attains the lowest data reconstruction time, in the range of 8.25 s to 9.88 s, since it fetches the parity information from a limited number of CSPs and needs to obtain the blocks from only half of the CSPs. Hence, the percentage improvements of VDI-CLR over the CSS-FRS and RSSNS schemes are 29% and 34%, respectively.
The storage cost for the ciphertext is measured in terms of the size of the encrypted files. Figure 9 shows the results of storage cost measured for file sizes 5 to 25 MB.
As can be seen from Figure 9, the CSS-FRS scheme has the highest storage cost, which increases from 2750 KB to 5380 KB when the file size is increased. The storage cost of RSSNS falls in the range of 2175 KB to 4394 KB. However, the proposed VDI-CLR scheme attains the lowest storage cost, in the range of 1270 KB to 3770 KB, since it stores the replicated data in half of the total available CSPs. Hence, the percentage improvements of VDI-CLR over the CSS-FRS and RSSNS schemes are 42% and 25%, respectively.

Conclusion and future work
In this paper, we have proposed a Verification of Data Integrity and Co-operative Loss Recovery (VDI-CLR) technique for secure data storage in cloud computing. In this technique, a ciphertext file is split into several cipher blocks, which the cloud data owner (CDO) distributes to randomly selected cloud service providers (CSPs). If a cloud data user (CDU) wants to access a file, the corresponding ciphertext file is reconstructed from the blocks and downloaded by the user. The proposed VDI-CLR technique has been compared with the CSS-FRS and RSSNS techniques. The performance is measured in terms of data distribution time, data reconstruction time, correctness of data and storage cost, for different numbers of files and different file sizes. From the performance results, it can be concluded that VDI-CLR has the lowest data distribution and reconstruction times and the lowest storage overhead, along with improved correctness of data.
In a cloud computing environment, the cloud users and the service providers should have a trusted relationship. However, most existing works consider the trust of cloud users alone, failing to address the trustworthiness of cloud servers. To address this issue, future work aims to design a trust-based access control framework for a multi-cloud environment.